Executor Harness | Jazzmine Core

Tool System: Executor Harness

1. Behavior and Context

In the framework's architecture, the harness is the ENTRYPOINT of the Docker image.

Persistent Execution: Unlike a standard script that runs and exits, the harness is a long-lived process. It initializes once and then enters a loop, reading execution requests from stdin and streaming results to stdout.
Isolation: It runs under a restricted user (sandbox) with all Linux capabilities dropped and a read-only root filesystem.
Async-to-Sync Bridging: It manages a dedicated background thread and a private asyncio event loop. It automatically wraps async def tools so that LLM-generated scripts (which are written as standard synchronous Python) can call them without await syntax.
Event-Driven Communication: It communicates with the host via a newline-terminated JSON wire protocol, emitting granular events for logs, intermediate data, and final results.

2. Purpose

Secure Code Evaluation: Providing a restricted environment for executing untrusted code via Python's exec() and compile() built-ins.
Resource Monitoring: Enforcing internal wall-clock timeouts using Unix signals (SIGALRM) to ensure a runaway script cannot hang the container.
Dynamic Capability: Loading tools from a volume-mounted directory at runtime, allowing the same base image to support different sets of skills.
Handshake Orchestration: Emitting standard signals (like ready and script_done) to allow the host-side SandboxPool and ScriptExecutor to synchronize with the container's state.

3. High-Level API (Internal Script Scope)

The "API" of the harness is what is visible to the LLM-generated script body. The harness injects a specific set of helpers and pre-loaded tools into the global namespace.

Example: Script Body Logic

python

# The LLM generates this body. The harness provides the tools and helpers.

# 1. Call a pre-loaded tool (Harness handles the async/sync bridging)
res = fetch_user_profile(user_id="user_99")

# 2. Logic check
if res.success:
    # 3. Emit an intermediate update
    emit_intermediate("profile_found", {"username": res.data["name"]})
    
    # 4. Finalize the task
    emit_result({"status": "completed", "points": res.data["loyalty_points"]})
else:
    # 5. Log a debug message and fail
    emit_log(f"Failed to find user: {res.message}", level="warning")
    emit_result({"error": "user_not_found"})

4. Detailed Functionality

Core Helpers (Callable by Scripts)

emit_result(data)

Functionality: Finalizes the current task and returns the final payload to the Agent.
Parameters: data (Any): A JSON-serializable object.
Note: Calling this finishes the logic portion of the script.

emit_intermediate(label, data)

Functionality: Streams partial results back to the Agent during execution.
Parameters: label (str), data (Any).
Use Case: Providing the Agent with data to "think" about before the final answer is ready.

emit_log(message, level="info")

Functionality: Sends a diagnostic message to the host logs without affecting the task result.
Parameters: message (str), level (str, defaults to "info").
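The three helpers above could be built on one shared emit function, roughly as sketched below. This is an illustration under assumptions, not the harness source: the event type names come from the wire protocol table, while the other field names (data, label, level, message), the _emit helper, and the stream parameter (added so the example can write to a buffer) are hypothetical.

```python
import io
import json
import sys
import threading

_stdout_lock = threading.Lock()  # serializes writers, as described in the Remarks


def _emit(event, stream=None):
    """Write one event as a single newline-terminated JSON line."""
    stream = stream if stream is not None else sys.stdout
    line = json.dumps(event, default=str)  # default=str is an assumption
    with _stdout_lock:
        stream.write(line + "\n")
        stream.flush()


def emit_result(data, stream=None):
    _emit({"type": "final_result", "data": data}, stream)


def emit_intermediate(label, data, stream=None):
    _emit({"type": "intermediate", "label": label, "data": data}, stream)


def emit_log(message, level="info", stream=None):
    _emit({"type": "log", "level": level, "message": message}, stream)


# Usage: capture the events in a buffer instead of the real stdout.
buf = io.StringIO()
emit_intermediate("profile_found", {"username": "ada"}, stream=buf)
emit_log("lookup finished", level="warning", stream=buf)
emit_result({"status": "completed"}, stream=buf)
events = [json.loads(line) for line in buf.getvalue().splitlines()]
```

Each call produces exactly one line, so the host can parse the stream with a plain line reader.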

Internal Runtime Logic

_load_tools()

Functionality: Scans the /tools/ directory for .py files.
Mechanism: It reads each file, compiles it, and executes it within the _base_globals dictionary. This makes every function defined in those files available for the LLM to call.
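The read, compile, and exec steps described for _load_tools() can be sketched as follows. The function name load_tools, its parameters, the sorted load order, and the math_tools.py file are assumptions made for illustration; the real harness reads from /tools/ and writes into _base_globals.

```python
import pathlib
import tempfile


def load_tools(tools_dir, base_globals):
    """Compile and exec every .py file in tools_dir into base_globals."""
    for path in sorted(pathlib.Path(tools_dir).glob("*.py")):  # order is an assumption
        source = path.read_text(encoding="utf-8")
        code = compile(source, str(path), "exec")  # filename improves tracebacks
        exec(code, base_globals)  # top-level defs land in the shared dict
    return base_globals


# Usage: a temporary directory stands in for the volume-mounted /tools/.
with tempfile.TemporaryDirectory() as tools_dir:
    pathlib.Path(tools_dir, "math_tools.py").write_text(
        "def add(a, b):\n    return a + b\n", encoding="utf-8"
    )
    tool_globals = load_tools(tools_dir, {})

total = tool_globals["add"](2, 3)
```

Because every file executes in the same dictionary, a later file can overwrite a name defined by an earlier one, which is one reason a deterministic load order is worth considering.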

_wrap_async_tools()

Functionality: Automates asyncio integration.
Mechanism: It inspects all loaded tool functions. If a function is an async def, it is wrapped in a synchronous closure that uses asyncio.run_coroutine_threadsafe to execute the task on the dedicated tool-loop thread.
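A minimal sketch of this bridging pattern is shown below. It assumes one event loop running forever on a daemon thread; the names _tool_loop, wrap_async_tools, and _make_sync are hypothetical, and fetch_user_profile here is a stand-in tool, not the real one.

```python
import asyncio
import functools
import inspect
import threading

# Private event loop that runs forever on a dedicated background thread.
_tool_loop = asyncio.new_event_loop()
threading.Thread(target=_tool_loop.run_forever, name="tool-loop", daemon=True).start()


def _make_sync(coro_fn):
    """Return a blocking wrapper around an async def function."""
    @functools.wraps(coro_fn)
    def sync_wrapper(*args, **kwargs):
        future = asyncio.run_coroutine_threadsafe(coro_fn(*args, **kwargs), _tool_loop)
        return future.result()  # block the calling thread until the coroutine ends
    return sync_wrapper


def wrap_async_tools(namespace):
    """Replace each coroutine function in namespace with its sync wrapper."""
    for name, obj in list(namespace.items()):
        if inspect.iscoroutinefunction(obj):
            namespace[name] = _make_sync(obj)


async def fetch_user_profile(user_id):  # stand-in tool for this sketch
    await asyncio.sleep(0)
    return {"user_id": user_id}


tools = {"fetch_user_profile": fetch_user_profile, "len": len}
wrap_async_tools(tools)
profile = tools["fetch_user_profile"](user_id="user_99")  # no await needed
```

The wrapper must be called from a thread other than the tool-loop thread; calling it from inside a coroutine on that loop would block the loop and deadlock.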

_run_script(script, execution_id, timeout, mode)

Functionality: The evaluation core.
Context Isolation: Creates a fresh dict of globals for each run.
Timeout: Arms a wall-clock alarm with signal.alarm(timeout), where timeout is in whole seconds.
Execution: Runs the compiled script bytecode.
Error Capture: If an exception occurs, it captures the traceback with traceback.format_exc() and emits an error event.
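The four steps above might fit together as in the following sketch. It is a simplification under stated assumptions: run_script, ScriptTimeout, and the returned list of event dicts are hypothetical (the real harness streams events to stdout), and SIGALRM is available only on Unix and only in the main thread.

```python
import signal
import traceback


class ScriptTimeout(BaseException):
    """Raised by the SIGALRM handler. BaseException, so that a script's own
    'except Exception' cannot swallow it (a design assumption of this sketch)."""


def _on_alarm(signum, frame):
    raise ScriptTimeout()


def run_script(script, base_globals, timeout):
    """Run script in a fresh globals dict and return the events it produced."""
    events = []
    script_globals = dict(base_globals)  # fresh namespace for every run
    script_globals["emit_result"] = lambda data: events.append(
        {"type": "final_result", "data": data}
    )
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(timeout)  # whole seconds only
    try:
        exec(compile(script, "<script>", "exec"), script_globals)
    except ScriptTimeout:
        events.append({"type": "error",
                       "message": f"Script timed out after {timeout}s"})
    except SystemExit:
        events.append({"type": "error", "message": "Script called sys.exit()"})
    except Exception as exc:
        events.append({"type": "error", "message": str(exc),
                       "traceback": traceback.format_exc()})
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)
        events.append({"type": "script_done"})
    return events
```

Cancelling the alarm in the finally block matters: an alarm left armed after a fast script would otherwise fire during a later, unrelated run.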

main()

Functionality: The request-response loop.
It emits {"type": "ready"} upon successful startup.
It reads JSON requests from sys.stdin.
It performs _check_secrets to ensure required environment variables are present before execution.
It always emits {"type": "script_done"} after a script finishes, regardless of whether it succeeded, failed, or timed out.
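The loop described above can be sketched as a function over two text streams. The request field names (script, required_secrets), the serve and check_secrets names, and the run callback are assumptions for illustration; only the ready and script_done events and the secret check itself come from the description.

```python
import io
import json
import os


def check_secrets(required):
    """Return the names in required that are missing from the environment."""
    return [name for name in required if not os.environ.get(name)]


def serve(stdin, stdout, run):
    """Read one JSON request per line and always close the turn with script_done."""
    def emit(event):
        stdout.write(json.dumps(event) + "\n")
        stdout.flush()

    emit({"type": "ready"})
    for line in stdin:
        line = line.strip()
        if not line:
            continue
        try:
            request = json.loads(line)
            missing = check_secrets(request.get("required_secrets", []))
            if missing:
                emit({"type": "error",
                      "message": "Missing secrets: " + ", ".join(missing)})
            else:
                for event in run(request):  # run yields event dicts
                    emit(event)
        except Exception as exc:
            emit({"type": "error", "message": str(exc)})
        finally:
            emit({"type": "script_done"})  # success, failure, or bad input


# Usage: in-memory streams stand in for sys.stdin and sys.stdout.
os.environ.pop("HYPOTHETICAL_UNSET_SECRET", None)
requests = io.StringIO(
    json.dumps({"script": "a"}) + "\n"
    + json.dumps({"script": "b", "required_secrets": ["HYPOTHETICAL_UNSET_SECRET"]}) + "\n"
)
output = io.StringIO()
serve(requests, output, lambda req: [{"type": "final_result", "data": req["script"]}])
event_types = [json.loads(line)["type"] for line in output.getvalue().splitlines()]
```

Placing the script_done emit in a finally block is what keeps the host from waiting indefinitely when a request fails before the script even starts.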

5. The Wire Protocol (Stdout Events)

The harness communicates with the host via these JSON event types:

Event Type	Description
ready	Sent once when the container is fully initialized and tools are loaded.
final_result	Sent when the script calls emit_result(). Contains the data payload.
intermediate	Sent when the script calls emit_intermediate().
error	Sent if the script crashes or times out. Includes message and traceback.
log	Sent when the script calls emit_log().
script_done	Crucial: Sent after every execution turn. Signals the host to stop waiting.
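On the host side, a consumer of this protocol could read lines until it sees script_done. The sketch below is an assumption about how such a reader might look, not the actual SandboxPool or ScriptExecutor code; read_turn is a hypothetical name, and any fields beyond type are illustrative.

```python
import io
import json


def read_turn(stream):
    """Collect events from stream up to and including script_done."""
    events = []
    for line in stream:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between events
        event = json.loads(line)
        events.append(event)
        if event.get("type") == "script_done":
            break  # the turn is over; stop waiting
    return events


# Usage: two consecutive turns on the same stream.
container_stdout = io.StringIO(
    '{"type": "log", "message": "starting"}\n'
    '{"type": "final_result", "data": 1}\n'
    '{"type": "script_done"}\n'
    '{"type": "error", "message": "Script timed out after 5s"}\n'
    '{"type": "script_done"}\n'
)
first_turn = read_turn(container_stdout)
second_turn = read_turn(container_stdout)
```

Because script_done closes every turn, the reader does not need to know in advance whether the script will succeed, fail, or time out.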

6. Error Handling

Timeouts: If the script exceeds its allotted time, the kernel delivers SIGALRM to the harness process. The harness catches the signal, clears the alarm, and returns a structured error: "Script timed out after Xs", where X is the timeout in seconds.
Security Violations: If the script attempts to call sys.exit(), the harness catches the SystemExit exception and emits a standard error event instead of allowing the process to die.
Secret Validation: The harness checks for missing environment variables before running the script. If a required secret (e.g., STRIPE_KEY) is missing, it aborts and returns an error message listing the missing keys.

7. Remarks

IO Thread Safety: The harness uses a threading.Lock (_stdout_lock) to ensure that JSON events from the main thread and the async-tool thread do not overlap or corrupt the stdout stream.
Synchronous LLM Scripts: The decision to wrap async tools as sync functions is intentional. It allows the Agent to write standard Python logic without understanding complex asynchronous concepts, leading to significantly higher code-generation success rates.
State Management: By using a dedicated globals dictionary per execution, the harness ensures that variables defined in a previous failed attempt do not pollute the namespace of a retry attempt.
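The state-management remark can be illustrated with a short sketch in which each run receives a copy of the base namespace. The names base_globals and run_isolated are hypothetical, and the copy here is shallow, which is an assumption about the approach rather than a statement about the harness.

```python
base_globals = {"TOOL_VERSION": 1}  # stands in for the pre-loaded tools


def run_isolated(script):
    """Execute script in a per-run copy of base_globals and return that copy."""
    scope = dict(base_globals)  # shallow copy: names are isolated, objects are shared
    exec(script, scope)
    return scope


first = run_isolated("leftover = 'from attempt 1'")
second = run_isolated("seen = 'leftover' in globals()")
```

In this sketch the retry does not see leftover, and base_globals itself is unchanged. A shallow copy isolates name bindings only, so a script that mutates a shared mutable object in place would still affect later runs.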