Tool System: Orchestrator
1. Behavior and Context
In the framework's architecture, the Orchestrator is the primary interface used by the Agent. It abstracts away the low-level details of Docker management and JSON wire protocols.
Key behaviors:
- Mode Strategy: It supports two distinct execution patterns:
- PLAN Mode: A "One-Shot" approach where the LLM writes a single script to solve the entire task. It handles retries automatically if the script fails.
- INTERACTIVE Mode: A "Step-by-Step" approach where the LLM writes one step, sees the result, and then writes the next step. This is ideal for complex data exploration.
- Telemetry Aggregation: It produces a comprehensive RunRecord for every task, capturing every attempt, token usage, latency, and full Python tracebacks for auditing.
- Intelligent Retries: When a script fails (either via AST validation or a runtime crash), the Orchestrator provides the specific error back to the generator to create a corrected version.
- Loop Detection: In interactive sessions, it monitors for "Stagnation" (where the LLM produces identical results multiple times without finishing). It can proactively force the session to conclude to prevent infinite token waste.
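The stagnation check described above can be sketched as a comparison of consecutive step results. This is a minimal sketch; the `is_stagnating` helper is hypothetical and illustrative, not the framework's API:

```python
def is_stagnating(step_results: list[str]) -> bool:
    """Return True when the last two step results are identical."""
    return len(step_results) >= 2 and step_results[-1] == step_results[-2]

# Two identical consecutive results -> force the session to conclude
history = ["rows: 120", "rows: 120"]
force_finish = is_stagnating(history)  # True
```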
2. Purpose
- Task Resolution: Bridging the gap between natural language intent and executable Python logic.
- Resiliency: Implementing a robust "Try-Correct-Retry" loop to handle the inherent non-determinism of LLM-generated code.
- Observability: Providing a high-fidelity audit trail for every technical action the agent performs.
- Resource Management: Coordinating with the SandboxPool to ensure containers are checked out and returned safely.
3. High-Level API & Examples
The ToolOrchestrator requires a ToolRegistry, a SandboxPool, and an LLMCaller adapter.
Example: Executing a Task in PLAN Mode
```python
from jazzmine.core.tools import ToolOrchestrator, LLMCallResult

# 1. Define an adapter for your LLM provider
async def my_llm_adapter(prompt: str) -> LLMCallResult:
    # prompt is the generation instruction from the Orchestrator
    # ... call LLM API ...
    return LLMCallResult(
        text="emit_result({'status': 'ok'})",
        prompt_tokens=150,
        completion_tokens=25,
        model="gpt-4o"
    )

# 2. Initialize Orchestrator
orch = ToolOrchestrator(
    registry=my_registry,
    pool=my_pool,
    llm_call=my_llm_adapter,
    max_retries=3
)

# 3. Execute
result = await orch.execute(
    task="Calculate the average age of users in the 'users' database.",
    sandbox_name="database_env"
)

if result.success:
    print(f"Data: {result.final_data}")
    # Format for the Agent's context
    print(result.to_agent_context())
```

5. Detailed Class Functionality
LLMCallResult [Dataclass]
The required return type for the LLMCaller function.
- text: The generated Python code body.
- prompt_tokens / completion_tokens: Usage metrics used for aggregate turn costing.
- latency_ms: Generation time.
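The fields above can be pictured as a plain dataclass. This is a hedged sketch based only on the documented fields; defaults and any additional fields in the real LLMCallResult may differ:

```python
from dataclasses import dataclass

@dataclass
class LLMCallResult:
    text: str                   # generated Python code body
    prompt_tokens: int = 0      # usage metrics for turn costing
    completion_tokens: int = 0
    latency_ms: float = 0.0     # generation time
    model: str = ""

r = LLMCallResult(text="emit_result({'ok': True})",
                  prompt_tokens=150, completion_tokens=25)
```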
RunRecord [Dataclass]
The "Master Log" for a single execute() or execute_interactive() call.
- attempts: List of AttemptRecord (for PLAN mode).
- steps: List of StepRecord (for INTERACTIVE mode).
- total_prompt_tokens: The sum of all tokens used across all retry attempts.
- last_traceback(): Automatically retrieves the most recent Python error stack trace from the failed attempts.
ToolExecutionResult [Dataclass]
The object returned to the framework after execution.
- final_data: The payload from the sandbox.
- to_agent_context(): Generates an XML block (e.g., <tool_execution ...>) containing the result. This is designed to be appended to the agent's message history so the model can "read" the data it requested.
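The shape of that XML block is only illustrated here, since the exact attributes of <tool_execution> are not specified above; this hypothetical sketch mirrors the documented idea:

```python
import json

def to_agent_context(final_data, success=True) -> str:
    """Wrap the sandbox payload in an XML block for the agent's history."""
    payload = json.dumps(final_data)
    return f'<tool_execution success="{success}">{payload}</tool_execution>'

to_agent_context({"avg_age": 34.2})
# e.g. '<tool_execution success="True">{"avg_age": 34.2}</tool_execution>'
```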
ToolOrchestrator [Main Class]
execute(task, sandbox_name)
Pattern: Plan Mode.
- Calls ScriptGenerator to build a prompt.
- Requests a script from the LLM.
- Assembles the script and runs AST Validation.
- If valid, checks out a container and calls ScriptExecutor.
- If execution fails, it repeats the loop (up to max_retries) using the previous error as feedback.
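The retry loop above can be sketched as follows. generate_script, validate_ast, and run_in_sandbox are hypothetical stand-ins for the real ScriptGenerator, AST validation, and ScriptExecutor collaborators:

```python
async def plan_mode(task, max_retries=3, *,
                    generate_script, validate_ast, run_in_sandbox):
    """Sketch of the PLAN-mode Try-Correct-Retry loop."""
    feedback = None
    for attempt in range(1, max_retries + 1):
        script = await generate_script(task, feedback)  # feedback = prior error
        violations = validate_ast(script)
        if violations:
            feedback = f"AST violations: {violations}"  # soft failure -> retry
            continue
        result = await run_in_sandbox(script)
        if result.get("success"):
            return result
        feedback = result.get("traceback")              # runtime crash -> retry
    return {"success": False, "error": feedback}
```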
execute_interactive(task, sandbox_name, max_steps)
Pattern: Interactive Mode.
- Maintains a step_history.
- On each step, it builds a prompt containing all data collected so far.
- Loop Detection: If the results of Step N and Step N-1 are identical, it sets force_finish=True. All data collected so far is then passed into a variable named collected in the script scope, and the LLM is instructed to output the final answer directly.
- Ends when a script calls emit_result() or max_steps is reached.
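Putting the interactive steps together, a minimal sketch of the loop; gen_step and run_step are hypothetical stand-ins for prompt building and sandbox execution, and the collected keyword mirrors the documented variable:

```python
async def interactive_mode(task, max_steps, *, gen_step, run_step):
    """Sketch of the step-by-step loop with stagnation detection."""
    step_history = []
    force_finish = False
    for _ in range(max_steps):
        code = await gen_step(task, step_history, force_finish)
        result = await run_step(code, collected=step_history)
        if result.get("final"):                 # script called emit_result()
            return result
        if step_history and result == step_history[-1]:
            force_finish = True                 # same result twice: stagnation
        step_history.append(result)
    return {"final": False, "steps": step_history}
```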
6. Error Handling
Infrastructure Failures
If the SandboxPool fails to provide a container (e.g., Docker is down), the Orchestrator catches the error, marks the RunRecord as failed, and returns the error message to the agent.
Validation Failures
If the LLM generates code that uses a forbidden builtin (like open()) or forgets to call emit_result(), the Orchestrator intercepts the AST violations. It treats these as a "soft failure" and uses the violation list to prompt the LLM for a corrected script.
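A validation pass of this kind can be sketched with Python's ast module. The FORBIDDEN set here is illustrative, not the framework's actual policy:

```python
import ast

FORBIDDEN = {"open", "exec", "eval"}  # illustrative policy only

def find_violations(source: str) -> list[str]:
    """Flag forbidden builtin calls and a missing emit_result() call."""
    tree = ast.parse(source)
    violations, emits = [], False
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in FORBIDDEN:
                violations.append(f"forbidden builtin: {node.func.id}()")
            elif node.func.id == "emit_result":
                emits = True
    if not emits:
        violations.append("script never calls emit_result()")
    return violations

find_violations("data = open('x').read()")
# ["forbidden builtin: open()", "script never calls emit_result()"]
```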
Max Retries
If the max_retries limit is reached without success, the Orchestrator returns a ToolExecutionResult with success=False. It includes the specific error from the last attempt, ensuring the agent can explain the failure to the user.
7. Remarks
Correlation & Logging
The Orchestrator is designed to work with structured logging. It emits script_generated and script_executed events. Because the agent typically binds context (such as a trace_id) to thread- or task-local variables, these orchestrator logs are automatically correlated with the specific user turn.
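One way to get that correlation with stdlib tools is contextvars. This is a sketch of the pattern, not the framework's actual logging setup:

```python
import contextvars
import json
import logging

# Context-local binding: each task/thread sees the trace_id it set
trace_id = contextvars.ContextVar("trace_id", default=None)

def emit_event(event: str, **fields):
    """Attach the bound trace_id so orchestrator events line up with the turn."""
    record = {"event": event, "trace_id": trace_id.get(), **fields}
    logging.getLogger("orchestrator").info(json.dumps(record))
    return record

trace_id.set("turn-42")
emit_event("script_generated", attempt=1)
# record carries {"event": "script_generated", "trace_id": "turn-42", ...}
```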
Token Optimization
In Interactive Mode, the Orchestrator tells the LLM: "Do NOT re-fetch data already shown above." This instruction, combined with the structured step_history, prevents the model from wasting tokens by repeatedly calling the same tools.
LLM Adapter Flexibility
The LLMCaller type is just a callable. This allows you to use different LLMs for different tasks: for example, a high-reasoning model for the Agent's reasoning, but a specialized, cheaper coding model for the Orchestrator's script generation.
```python
# Use a coding model for the orchestrator
coder_llm = OpenAICompatibleLLM(model="deepseek-coder", ...)
orch = ToolOrchestrator(..., llm_call=coder_llm.agenerate)
```