Tool System: Executor Harness
1. Behavior and Context
In the framework's architecture, the harness is the ENTRYPOINT of the Docker image.
- Persistent Execution: Unlike a standard script that runs and exits, the harness is a long-lived process. It initializes once and then enters a loop, reading execution requests from stdin and streaming results to stdout.
- Isolation: It runs under a restricted user (sandbox) with all Linux capabilities dropped and a read-only root filesystem.
- Async-to-Sync Bridging: It manages a dedicated background thread and a private asyncio event loop. It automatically wraps async def tools so that LLM-generated scripts (which are written as standard synchronous Python) can call them without await syntax.
- Event-Driven Communication: It communicates with the host via a newline-terminated JSON wire protocol, emitting granular events for logs, intermediate data, and final results.
2. Purpose
- Secure Code Evaluation: Providing a restricted environment for executing untrusted code via Python's exec() and compile() built-ins.
- Resource Monitoring: Enforcing internal wall-clock timeouts using Unix signals (SIGALRM) to ensure a runaway script cannot hang the container.
- Dynamic Capability: Loading tools from a volume-mounted directory at runtime, allowing the same base image to support different sets of skills.
- Handshake Orchestration: Emitting standard signals (like ready and script_done) to allow the host-side SandboxPool and ScriptExecutor to synchronize with the container's state.
3. High-Level API (Internal Script Scope)
The "API" of the harness is what is visible to the LLM-generated script body. The harness injects a specific set of helpers and pre-loaded tools into the global namespace.
Example: Script Body Logic
```python
# The LLM generates this body. The harness provides the tools and helpers.

# 1. Call a pre-loaded tool (the harness handles the async/sync bridging)
res = fetch_user_profile(user_id="user_99")

# 2. Logic check
if res.success:
    # 3. Emit an intermediate update
    emit_intermediate("profile_found", {"username": res.data["name"]})
    # 4. Finalize the task
    emit_result({"status": "completed", "points": res.data["loyalty_points"]})
else:
    # 5. Log a warning and report the failure
    emit_log(f"Failed to find user: {res.message}", level="warning")
    emit_result({"error": "user_not_found"})
```
4. Detailed Functionality
Core Helpers (Callable by Scripts)
emit_result(data)
- Functionality: Finalizes the current task and returns the final payload to the Agent.
- Parameters: data (Any): A JSON-serializable object.
- Note: Calling this finishes the logic portion of the script.
emit_intermediate(label, data)
- Functionality: Streams partial results back to the Agent during execution.
- Parameters: label (str), data (Any).
- Use Case: Providing the Agent with data to "think" about before the final answer is ready.
emit_log(message, level="info")
- Functionality: Sends a diagnostic message to the host logs without affecting the task result.
- Parameters: message (str), level (str, defaults to "info").
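The three helpers can be pictured as thin wrappers around one function that writes a single JSON line. The following is a minimal sketch, not the harness source: the event type names come from the wire protocol table, while the other field names (data, label, level, message), the _emit helper, and the stream parameter (added so the sketch can be exercised without touching real stdout) are assumptions.

```python
import json
import sys
import threading

# Lock name taken from the Remarks section; everything else here is illustrative.
_stdout_lock = threading.Lock()


def _emit(event, stream=None):
    """Serialize one event as a single newline-terminated JSON line and flush it."""
    out = sys.stdout if stream is None else stream
    line = json.dumps(event, default=str)  # default=str is an assumed fallback for odd types
    with _stdout_lock:
        out.write(line + "\n")
        out.flush()


def emit_result(data, stream=None):
    _emit({"type": "final_result", "data": data}, stream)


def emit_intermediate(label, data, stream=None):
    _emit({"type": "intermediate", "label": label, "data": data}, stream)


def emit_log(message, level="info", stream=None):
    _emit({"type": "log", "level": level, "message": message}, stream)
```

Writing the line and flushing under the same lock keeps each event on its own line even when a tool running on another thread emits at the same time.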
Internal Runtime Logic
_load_tools()
- Functionality: Scans the /tools/ directory for .py files.
- Mechanism: It reads each file, compiles it, and executes it within the _base_globals dictionary. This makes every function defined in those files available for the LLM to call.
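A minimal sketch of the loading mechanism described above, assuming nothing beyond "read, compile, exec into a shared dictionary". The function name load_tools and its parameters are hypothetical; the real _load_tools may differ in error handling and ordering.

```python
from pathlib import Path


def load_tools(tools_dir, base_globals):
    """Execute every .py file in tools_dir inside base_globals; return the file names loaded."""
    loaded = []
    for path in sorted(Path(tools_dir).glob("*.py")):
        source = path.read_text(encoding="utf-8")
        # Passing the real path as the filename keeps tracebacks readable.
        code = compile(source, str(path), "exec")
        exec(code, base_globals)
        loaded.append(path.name)
    return loaded
```

Because every file runs in the same dictionary, a later file can silently overwrite a function defined by an earlier one; sorting the paths at least makes that order deterministic.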
_wrap_async_tools()
- Functionality: Automates asyncio integration.
- Mechanism: It inspects all loaded tool functions. If a function is an async def, it is wrapped in a synchronous closure that uses asyncio.run_coroutine_threadsafe to execute the task on the dedicated tool-loop thread.
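The bridging step can be sketched as follows. This is an illustration under stated assumptions rather than the harness code: start_tool_loop, wrap_async_tools, and _make_sync are hypothetical names, and only asyncio.run_coroutine_threadsafe is confirmed by the description above.

```python
import asyncio
import functools
import inspect
import threading


def start_tool_loop():
    """Start a private event loop on a daemon thread and return it."""
    loop = asyncio.new_event_loop()
    threading.Thread(target=loop.run_forever, name="tool-loop", daemon=True).start()
    return loop


def _make_sync(fn, loop):
    @functools.wraps(fn)
    def sync_wrapper(*args, **kwargs):
        # Schedule the coroutine on the tool loop and block the calling thread on its result.
        future = asyncio.run_coroutine_threadsafe(fn(*args, **kwargs), loop)
        return future.result()
    return sync_wrapper


def wrap_async_tools(namespace, loop):
    """Replace each coroutine function in namespace with a blocking wrapper."""
    for name, obj in list(namespace.items()):
        if inspect.iscoroutinefunction(obj):
            namespace[name] = _make_sync(obj, loop)
```

The wrapper must be called from a thread other than the one running the loop; calling it from inside the tool loop itself would block that loop and deadlock.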
_run_script(script, execution_id, timeout, mode)
- Functionality: The evaluation core.
- Context Isolation: Creates a fresh dict of globals for each run.
- Timeout: Arms signal.alarm(timeout) before the script starts.
- Execution: Runs the compiled script bytecode.
- Error Capture: If an exception occurs, it captures the output of traceback.format_exc() and emits an error event.
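The steps above might fit together roughly as shown here. This is a sketch that assumes a Unix host and execution on the main thread (both required by signal.alarm); the ScriptTimeout class, the simplified signature, and the choice to return the error event instead of emitting it are assumptions made to keep the example self-contained.

```python
import signal
import traceback
from typing import Optional


class ScriptTimeout(BaseException):
    """Raised by the SIGALRM handler. Deriving from BaseException (an assumption)
    keeps a script's own `except Exception` from swallowing the timeout."""


def _on_alarm(signum, frame):
    raise ScriptTimeout()


def run_script(script: str, timeout: int, base_globals: dict) -> Optional[dict]:
    """Run script under a wall-clock limit; return an error event, or None on success."""
    script_globals = dict(base_globals)  # fresh namespace for this run
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(timeout)
    try:
        code = compile(script, "<script>", "exec")
        exec(code, script_globals)
        return None
    except ScriptTimeout:
        return {"type": "error", "message": f"Script timed out after {timeout}s"}
    except Exception:
        return {"type": "error", "message": "Script failed",
                "traceback": traceback.format_exc()}
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM,
                      previous if previous is not None else signal.SIG_DFL)
```

A signal handler only runs between Python bytecodes, so a script stuck inside a long-running C call would not be interrupted until that call returns.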
main()
- Functionality: The request-response loop.
- It emits {"type": "ready"} upon successful startup.
- It reads JSON requests from sys.stdin.
- It performs _check_secrets to ensure required environment variables are present before execution.
- It always emits {"type": "script_done"} after a script finishes, regardless of whether it succeeded, failed, or timed out.
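The loop described above can be sketched with injected streams so it runs without a container. The function name serve, the run_script callback shape, and the "script" request field are assumptions; the secret check is left out for brevity. What the sketch is meant to show is the try/finally that guarantees script_done after every request.

```python
import json


def serve(stdin, stdout, run_script):
    """Read one JSON request per line and always finish each with script_done."""
    def emit(event):
        stdout.write(json.dumps(event) + "\n")
        stdout.flush()

    emit({"type": "ready"})
    for line in stdin:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between requests
        try:
            request = json.loads(line)
            run_script(request["script"], emit)  # "script" is an assumed field name
        except Exception as exc:
            emit({"type": "error", "message": str(exc)})
        finally:
            # Emitted on success, failure, and malformed requests alike.
            emit({"type": "script_done"})
```

In a real entrypoint this would be called as serve(sys.stdin, sys.stdout, ...), with the callback performing compilation, the timeout, and execution.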
5. The Wire Protocol (Stdout Events)
The harness communicates with the host via these JSON event types:
| Event Type | Description |
|---|---|
| ready | Sent once when the container is fully initialized and tools are loaded. |
| final_result | Sent when the script calls emit_result(). Contains the data payload. |
| intermediate | Sent when the script calls emit_intermediate(). |
| error | Sent if the script crashes or times out. Includes message and traceback. |
| log | Sent when the script calls emit_log(). |
| script_done | Crucial: Sent after every execution turn. Signals the host to stop waiting. |
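On the host side, a consumer of this protocol would typically read lines until it sees script_done. The sketch below assumes only the event type names from the table; events_for_one_run is a hypothetical helper and does not describe how SandboxPool or ScriptExecutor are actually written.

```python
import json
from typing import Iterable, Iterator


def events_for_one_run(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed events for one execution, stopping after script_done."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        yield event
        if event.get("type") == "script_done":
            return  # leave the rest of the stream for the next execution
```

Because the generator stops right after script_done, the same stdout stream can be passed in again for the next request without losing events.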
6. Error Handling
- Timeouts: If the script exceeds its allotted time, the kernel delivers SIGALRM to the harness process. The harness handles the signal, cancels the alarm, and emits a structured error event with the message "Script timed out after Xs", where X is the timeout in seconds.
- Security Violations: If the script attempts to call sys.exit(), the harness catches the SystemExit exception and emits a standard error event instead of allowing the process to die.
- Secret Validation: The harness checks for missing environment variables before running the script. If a required secret (e.g., STRIPE_KEY) is missing, it aborts and returns an error message listing the missing keys.
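Two of these cases are easy to illustrate in isolation. The sketch below is an assumption-laden example, not the harness implementation: check_secrets and guarded_exec are hypothetical names, the env parameter exists only to make the check testable, and the error message wording is invented.

```python
import os


def check_secrets(required, env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


def guarded_exec(code, script_globals):
    """Run compiled code; turn an exit attempt into an error event instead of dying."""
    try:
        exec(code, script_globals)
    except SystemExit as exc:
        # SystemExit derives from BaseException, so it needs its own except clause.
        return {"type": "error", "message": f"Script called sys.exit({exc.code!r})"}
    return None
```

A caller would run check_secrets first and skip execution entirely when the returned list is non-empty, reporting those names in the error event.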
7. Remarks
- IO Thread Safety: The harness uses a threading.Lock (_stdout_lock) to ensure that JSON events from the main thread and the async-tool thread do not overlap or corrupt the stdout stream.
- Synchronous LLM Scripts: The decision to wrap async tools as sync functions is intentional. It lets the Agent write plain synchronous Python without reasoning about event loops or await, which is intended to improve code-generation success rates.
- State Management: By using a dedicated globals dictionary per execution, the harness ensures that variables defined in a previous failed attempt do not pollute the namespace of a retry attempt.
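The isolation property can be demonstrated in a few lines. This sketch assumes the per-run dictionary is a shallow copy of the shared tool namespace; fresh_globals is a hypothetical name, and the harness may build its globals differently.

```python
def fresh_globals(base_globals):
    """Return a new top-level namespace seeded with the shared tools."""
    return dict(base_globals)


base = {"greet": lambda name: f"hello {name}"}

first = fresh_globals(base)
exec("leftover = greet('a')", first)   # a failed attempt leaves a variable behind

second = fresh_globals(base)           # the retry starts without it
```

A shallow copy isolates names, not objects: if a tool exposes a mutable object (a list or a cache) in the base namespace, changes made to it by one script are still visible to the next.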