1. Behavior and Context
In the tool execution pipeline, the ScriptExecutor is invoked by the ToolOrchestrator after a container has been successfully checked out from the SandboxPool.
- Protocol Management: It implements a newline-terminated JSON protocol over standard I/O pipes (stdin/stdout). It serializes execution requests into the sandbox and deserializes events (logs, intermediates, results) as they are emitted.
- Event-Driven Architecture: The executor does not wait for the script to finish before acting. It processes events asynchronously, allowing for real-time log streaming and progress updates via callbacks.
- Mode Awareness:
- In PLAN mode, it waits specifically for a final_result or an error event.
- In INTERACTIVE mode, it also recognizes the script_done event, which signals that a specific step in a multi-part task has concluded.
- Safety Thresholds: It strictly enforces resource limits at the communication layer, such as maximum output size (in bytes) and wall-clock execution timeouts.
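The mode-aware termination rules above can be sketched as a small predicate. This is a minimal illustration using the event names from this document; the actual internal check may differ.

```python
from enum import Enum

class ExecutionMode(Enum):
    PLAN = "plan"
    INTERACTIVE = "interactive"

def is_terminal(event_type: str, mode: ExecutionMode) -> bool:
    """Decide whether an event ends the wait loop in the given mode."""
    if event_type in ("final_result", "error"):
        return True  # terminal in every mode
    if event_type == "script_done":
        # Only ends the current step in INTERACTIVE mode; ignored in PLAN mode.
        return mode is ExecutionMode.INTERACTIVE
    return False
```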
2. Purpose
The ScriptExecutor is responsible for four concerns:
- Bidirectional Communication: Managing the low-level asyncio streams of a running Docker container.
- Runtime Observability: Capturing and aggregating logs, intermediate data slices, and full Python stack traces.
- Resource Enforcement: Protecting the host system from "Output Flooding" (scripts generating infinite text) and hanging processes.
- Structured Feedback: Transforming raw container stdout into a strongly-typed ExecutionResult that the Agent can "read" to understand the outcome of its work.
3. High-Level API
The ScriptExecutor is initialized with a policy and an optional callback for real-time data streaming.
Example: Executing a Script with Real-time Callbacks
```python
import asyncio
from jazzmine.core.tools import ScriptExecutor, ResourceLimits, ExecutionMode

# 1. Define a callback to handle partial results (e.g., updating a UI)
async def on_data(event: dict):
    print(f"Streaming data update: {event['label']} -> {event['data']}")

# 2. Set up the executor with limits
executor = ScriptExecutor(
    limits=ResourceLimits(execution_timeout_sec=10),
    mode=ExecutionMode.PLAN,
    on_intermediate=on_data,
)

# 3. (Scenario) Execute in a container obtained from the pool
#    (assumes `pool` is a SandboxPool instance created elsewhere)
async def main():
    async with pool.checkout("default") as container:
        script = "emit_intermediate('progress', 50); emit_result({'status': 'ok'})"
        result = await executor.run(
            container=container,
            script=script,
            execution_id="unique_turn_id_123",
        )
        if result.success:
            print(f"Success: {result.final_data}")
```
4. Detailed Functionality
ExecutionResult [Dataclass]
The comprehensive outcome of an execution attempt.
| Attribute | Type | Description |
|---|---|---|
| success | bool | True if the script completed and called emit_result. |
| execution_id | str | The unique ID echoed back by the container. |
| final_data | Any | The JSON payload returned by the script. |
| intermediates | List[dict] | All partial results emitted via emit_intermediate. |
| logs | List[dict] | All debug messages emitted via emit_log. |
| error | str | A human-readable summary of what went wrong. |
| traceback | str | The full Python stack trace (only if the script crashed). |
| duration_ms | int | Total time spent in the run() call. |
| output_bytes | int | Total size of the JSON stream read from the container. |
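A minimal sketch of how the dataclass above might be consumed by calling code. The field defaults and the `summarize` helper are illustrative assumptions, not part of the documented API.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional

@dataclass
class ExecutionResult:
    """Shape of the execution outcome as described in the table above."""
    success: bool
    execution_id: str
    final_data: Any = None
    intermediates: List[dict] = field(default_factory=list)
    logs: List[dict] = field(default_factory=list)
    error: Optional[str] = None
    traceback: Optional[str] = None
    duration_ms: int = 0
    output_bytes: int = 0

def summarize(result: ExecutionResult) -> str:
    """Render a one-line summary an agent or audit log could consume."""
    if result.success:
        return f"[{result.execution_id}] ok in {result.duration_ms}ms: {result.final_data!r}"
    return f"[{result.execution_id}] failed: {result.error}"
```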
ScriptExecutor [Class]
run(container, script, required_secrets=None, execution_id=None)
Functionality: The main entry point for script execution.
How it works:
- Generates a unique execution_id if not provided.
- Wraps the script and metadata into a JSON request.
- Writes the request to the container's stdin_pipe and flushes it.
- Delegates to _collect_events to monitor the output.
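The request-building steps above can be sketched as follows. The wire-format field names (`execution_id`, `script`, `required_secrets`) are assumptions for illustration; only the newline-terminated JSON framing is documented.

```python
import json
import uuid

def build_request(script: str, execution_id=None, required_secrets=None) -> bytes:
    """Assemble the single JSON line that run() writes to the container's stdin."""
    # Step 1: generate a unique execution_id if the caller did not provide one.
    execution_id = execution_id or uuid.uuid4().hex
    # Step 2: wrap the script and metadata into a JSON request.
    request = {
        "execution_id": execution_id,
        "script": script,
        "required_secrets": required_secrets or [],
    }
    # Step 3: newline-terminate so the sandbox can frame the message on its stdin.
    return (json.dumps(request) + "\n").encode("utf-8")
```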
_collect_events(container, execution_id, timeout, t_start) [Internal]
Functionality: The asynchronous event loop that monitors the container's life.
Events Handled:
| Event | Behavior |
|---|---|
| final_result | Sets success=True, saves the payload, and returns. |
| error | Sets success=False, captures the message and traceback, and returns. |
| script_done | In INTERACTIVE mode, this triggers a return with final_data=None, indicating the current step is finished even if no final result was emitted. In PLAN mode, it is ignored while waiting for a result. |
| intermediate | Appends to the local list and triggers the on_intermediate callback. |
| log | Captures internal sandbox logs for the audit trail. |
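The dispatch table above can be expressed as a loop over decoded events. This is a simplified synchronous sketch (the real loop is async and reads from the container's stdout); the event payload keys are assumptions.

```python
def collect_events(events, execution_id, mode="PLAN"):
    """Consume events until a terminal one arrives, mirroring the table above."""
    intermediates, logs = [], []
    for event in events:
        if event.get("execution_id") != execution_id:
            continue  # stale event from a previous turn still in the pipe buffer
        etype = event.get("type")
        if etype == "final_result":
            return {"success": True, "final_data": event.get("data"),
                    "intermediates": intermediates, "logs": logs}
        if etype == "error":
            return {"success": False, "error": event.get("message"),
                    "traceback": event.get("traceback"),
                    "intermediates": intermediates, "logs": logs}
        if etype == "script_done" and mode == "INTERACTIVE":
            # Step finished without a final result; ignored in PLAN mode.
            return {"success": True, "final_data": None,
                    "intermediates": intermediates, "logs": logs}
        if etype == "intermediate":
            intermediates.append(event)  # real executor also awaits on_intermediate
        elif etype == "log":
            logs.append(event)
    return {"success": False, "error": "stream ended without a terminal event",
            "intermediates": intermediates, "logs": logs}
```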
5. Error Handling
- Timeouts: If the container does not emit a terminal event before the ResourceLimits.execution_timeout_sec (plus a 5‑second buffer) is reached, the executor returns a failure result with the error message: "Timed out waiting for container response".
- Sandbox Crashes: If the Python process inside the container crashes or is killed by the Docker OOM manager, the executor catches the IncompleteReadError and returns: "Container stdout closed unexpectedly".
- Output Flooding: The executor tracks the cumulative bytes read from the stream. If a script generates more than the max_output_bytes limit (default 1MB), the executor proactively returns a failure and closes the stream to prevent host memory exhaustion.
- Invalid JSON: Any non-JSON line encountered in the container output is logged as a debug message and skipped, ensuring the event loop doesn't crash on standard print statements.
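The two read-side protections (output flooding and invalid JSON) can be sketched together in a single guarded reader. The 1MB default and the error message below are illustrative assumptions.

```python
import json

MAX_OUTPUT_BYTES = 1_000_000  # assumed 1MB default from the resource limits

def read_events(lines, max_output_bytes=MAX_OUTPUT_BYTES):
    """Yield decoded events, enforcing the byte cap and skipping non-JSON lines."""
    bytes_read = 0
    for line in lines:
        bytes_read += len(line)
        if bytes_read > max_output_bytes:
            # Fail proactively instead of buffering unbounded output on the host.
            yield {"type": "error", "message": "Output limit exceeded"}
            return
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a bare print() from the script; log-and-skip in practice
```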
6. Remarks
- Handshake Reliability: The use of execution_id ensures that events from previous turns (that might still be in the pipe buffer) are ignored.
- Callback Latency: The on_intermediate callback is await‑ed. It is highly recommended that this callback logic be extremely efficient, as a slow callback will block the executor from reading the next events from the sandbox.
- DooD Support: Since the executor interacts with standard Python streams, it is fully agnostic of whether it is communicating with a local Docker socket or a remote one.
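Regarding callback latency: one common pattern for keeping the callback fast is to hand events off to a queue and process them in a separate task, so the executor can keep draining the sandbox pipe. This is an illustrative sketch, not part of the documented API.

```python
import asyncio

async def main():
    queue: asyncio.Queue = asyncio.Queue()
    handled = []

    async def slow_consumer():
        """Drain the queue in its own task; slowness here never blocks reads."""
        while True:
            event = await queue.get()
            if event is None:
                break  # sentinel: no more events
            await asyncio.sleep(0.01)  # simulate a slow UI update
            handled.append(event)

    async def on_intermediate(event: dict):
        """The callback handed to the executor: enqueue and return immediately."""
        queue.put_nowait(event)

    consumer = asyncio.create_task(slow_consumer())
    for i in range(3):  # stand-in for the executor awaiting the callback
        await on_intermediate({"label": "progress", "data": i})
    queue.put_nowait(None)
    await consumer
    return handled

results = asyncio.run(main())
```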