
Tool System: Script Executor

The ScriptExecutor is the low-level communication controller for the jazzmine tool system. It is responsible for transmitting Python scripts to a ManagedContainer, monitoring the container's output stream in real-time, and parsing the resulting event sequence into a structured result. It manages the delicate bidirectional bridge between the host application and the isolated sandbox environment.

1. Behavior and Context

In the tool execution pipeline, the ScriptExecutor is invoked by the ToolOrchestrator after a container has been successfully checked out from the SandboxPool.

  • Protocol Management: It implements a newline-terminated JSON protocol over standard I/O pipes (stdin/stdout). It serializes execution requests into the sandbox and deserializes events (logs, intermediates, results) as they are emitted.
  • Event-Driven Architecture: The executor does not wait for the script to finish before acting. It processes events asynchronously, allowing for real-time log streaming and progress updates via callbacks.
  • Mode Awareness:
      • In PLAN mode, it waits specifically for a final_result or an error event.
      • In INTERACTIVE mode, it also recognizes the script_done event, which signals that a specific step in a multi-part task has concluded.
  • Safety Thresholds: It strictly enforces resource limits at the communication layer, such as maximum output size (in bytes) and wall-clock execution timeouts.
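The newline-terminated JSON protocol described above can be sketched as a tolerant line parser. This is a minimal illustration; the exact event schema (including the `type` field name) is an assumption, not the documented wire format:

```python
import json
from typing import Optional


def parse_event_line(line: str) -> Optional[dict]:
    """Parse one newline-terminated JSON event from the sandbox stream.

    Non-JSON lines (e.g. stray print() output) are skipped rather than
    raising, mirroring the executor's tolerance for plain text.
    """
    line = line.strip()
    if not line:
        return None
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        return None
    # Only dict-shaped events carrying a "type" field are meaningful here.
    if isinstance(event, dict) and "type" in event:
        return event
    return None
```

Skipping rather than raising on malformed lines is what keeps the event loop alive when a script uses bare print statements.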

2. Purpose

  • Bidirectional Communication: Managing the low-level asyncio streams of a running Docker container.
  • Runtime Observability: Capturing and aggregating logs, intermediate data slices, and full Python stack traces.
  • Resource Enforcement: Protecting the host system from "Output Flooding" (scripts generating infinite text) and hanging processes.
  • Structured Feedback: Transforming raw container stdout into a strongly-typed ExecutionResult that the Agent can "read" to understand the outcome of its work.

3. High-Level API

The ScriptExecutor is initialized with a policy and an optional callback for real-time data streaming.

Example: Executing a Script with Real-time Callbacks

```python
import asyncio
from jazzmine.core.tools import ScriptExecutor, ResourceLimits, ExecutionMode

# 1. Define a callback to handle partial results (e.g., updating a UI)
async def on_data(event: dict):
    print(f"Streaming data update: {event['label']} -> {event['data']}")

# 2. Set up the executor with limits
executor = ScriptExecutor(
    limits=ResourceLimits(execution_timeout_sec=10),
    mode=ExecutionMode.PLAN,
    on_intermediate=on_data
)

# 3. (Scenario) Execute in a container obtained from the pool
async with pool.checkout("default") as container:
    script = "emit_intermediate('progress', 50); emit_result({'status': 'ok'})"

    result = await executor.run(
        container=container,
        script=script,
        execution_id="unique_turn_id_123"
    )

    if result.success:
        print(f"Success: {result.final_data}")
```

4. Detailed Functionality

ExecutionResult [Dataclass]

The comprehensive outcome of an execution attempt.

| Attribute | Type | Description |
| --- | --- | --- |
| success | bool | True if the script completed and called emit_result. |
| execution_id | str | The unique ID echoed back by the container. |
| final_data | Any | The JSON payload returned by the script. |
| intermediates | List[dict] | All partial results emitted via emit_intermediate. |
| logs | List[dict] | All debug messages emitted via emit_log. |
| error | str | A human-readable summary of what went wrong. |
| traceback | str | The full Python stack trace (only if the script crashed). |
| duration_ms | int | Total time spent in the run() call. |
| output_bytes | int | Total size of the JSON stream read from the container. |
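The attribute table above maps naturally onto a dataclass. A minimal sketch, based only on the table (the field defaults are illustrative assumptions):

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class ExecutionResult:
    """Structured outcome of one script execution attempt."""
    success: bool
    execution_id: str
    final_data: Any = None
    intermediates: List[dict] = field(default_factory=list)
    logs: List[dict] = field(default_factory=list)
    error: Optional[str] = None
    traceback: Optional[str] = None
    duration_ms: int = 0
    output_bytes: int = 0
```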

ScriptExecutor [Class]

run(container, script, required_secrets=None, execution_id=None)

Functionality: The main entry point for script execution.

How it works:

  1. Generates a unique execution_id if not provided.
  2. Wraps the script and metadata into a JSON request.
  3. Writes the request to the container's stdin_pipe and flushes it.
  4. Delegates to _collect_events to monitor the output.
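Steps 1–3 above can be sketched as a request serializer. The field names in the JSON envelope are assumptions for illustration, not the documented request format:

```python
import json
import uuid
from typing import List, Optional


def build_request(script: str,
                  execution_id: Optional[str] = None,
                  required_secrets: Optional[List[str]] = None) -> bytes:
    """Serialize an execution request as one newline-terminated JSON line,
    ready to be written to the container's stdin_pipe and flushed."""
    request = {
        # A fresh ID is generated when the caller does not supply one.
        "execution_id": execution_id or uuid.uuid4().hex,
        "script": script,
        "required_secrets": required_secrets or [],
    }
    return (json.dumps(request) + "\n").encode("utf-8")
```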

_collect_events(container, execution_id, timeout, t_start) [Internal]

Functionality: The asynchronous event loop that monitors the container's life.

Events Handled:

| Event | Behavior |
| --- | --- |
| final_result | Sets success=True, saves the payload, and returns. |
| error | Sets success=False, captures the message and traceback, and returns. |
| script_done | In INTERACTIVE mode, triggers a return with final_data=None, indicating the current step is finished even if no final result was emitted. In PLAN mode, it is ignored while waiting for a result. |
| intermediate | Appends to the local list and triggers the on_intermediate callback. |
| log | Captures internal sandbox logs for the audit trail. |
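The event table can be sketched as a synchronous dispatch function. Event field names (`type`, `data`, `message`, `traceback`) and the success flag on script_done are assumptions for illustration:

```python
def dispatch_event(event: dict, state: dict, mode: str = "PLAN") -> bool:
    """Apply one parsed event to the mutable collection state.

    Returns True when a terminal event has been reached.
    """
    kind = event.get("type")
    if kind == "final_result":
        state["success"] = True
        state["final_data"] = event.get("data")
        return True
    if kind == "error":
        state["success"] = False
        state["error"] = event.get("message")
        state["traceback"] = event.get("traceback")
        return True
    if kind == "script_done":
        # Terminal only in INTERACTIVE mode; PLAN keeps waiting for a result.
        if mode == "INTERACTIVE":
            state["success"] = True
            state["final_data"] = None
            return True
        return False
    if kind == "intermediate":
        state.setdefault("intermediates", []).append(event)
    elif kind == "log":
        state.setdefault("logs", []).append(event)
    return False
```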

5. Error Handling

  • Timeouts: If the container does not emit a terminal event before the ResourceLimits.execution_timeout_sec (plus a 5‑second buffer) is reached, the executor returns a failure result with the error message: "Timed out waiting for container response".
  • Sandbox Crashes: If the Python process inside the container crashes or is killed by the Docker OOM manager, the executor catches the IncompleteReadError and returns: "Container stdout closed unexpectedly".
  • Output Flooding: The executor tracks the cumulative bytes read from the stream. If a script generates more than the max_output_bytes limit (default 1 MB), the executor proactively returns a failure and closes the stream to prevent host memory exhaustion.
  • Invalid JSON: Any non-JSON line encountered in the container output is logged as a debug message and skipped, ensuring the event loop doesn't crash on standard print statements.
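The output-flooding guard can be sketched as a generator that tracks cumulative bytes. This is a simplified, synchronous illustration of the budget check, not the real asyncio stream handling:

```python
from typing import Iterable, Iterator


def enforce_output_limit(lines: Iterable[bytes],
                         max_output_bytes: int = 1_000_000) -> Iterator[bytes]:
    """Yield lines until the cumulative size exceeds the budget.

    Raises RuntimeError once the limit is crossed, so the caller can
    return a failure result and close the stream.
    """
    bytes_read = 0
    for line in lines:
        bytes_read += len(line)
        if bytes_read > max_output_bytes:
            raise RuntimeError("max_output_bytes exceeded")
        yield line
```

Checking before yielding means an oversized line is never handed downstream, so the host never buffers more than the budget plus one line.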

6. Remarks

  • Handshake Reliability: The use of execution_id ensures that events from previous turns (that might still be in the pipe buffer) are ignored.
  • Callback Latency: The on_intermediate callback is await‑ed. It is highly recommended that this callback logic be extremely efficient, as a slow callback will block the executor from reading the next events from the sandbox.
  • DooD Support: Since the executor interacts with standard Python streams, it is fully agnostic of whether it is communicating with a local Docker socket or a remote one.
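One way to keep a slow on_intermediate callback from blocking the reader, as cautioned above, is to decouple it behind a queue. A hypothetical sketch (`buffered_callback` is not part of the jazzmine API):

```python
import asyncio
from typing import Awaitable, Callable


async def buffered_callback(inner: Callable[[dict], Awaitable[None]],
                            maxsize: int = 256):
    """Wrap a slow async callback behind a queue so the reader never blocks.

    Returns (fast_callback, worker_task): the fast callback merely enqueues,
    while the worker task drains the queue and awaits the slow inner callback.
    Enqueue None as a sentinel to shut the worker down.
    """
    queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)

    async def fast(event) -> None:
        # Near-instant unless the queue is full (natural backpressure).
        await queue.put(event)

    async def worker() -> None:
        while True:
            event = await queue.get()
            if event is None:  # shutdown sentinel
                break
            await inner(event)

    return fast, asyncio.create_task(worker())
```

Passing `fast` as the executor's on_intermediate keeps the stdout read loop responsive even when the downstream consumer (e.g. a UI update) is slow.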