
Tool System: Script Generator

The ScriptGenerator is the code-authoring and security-enforcement engine of the jazzmine tool system. It bridges an LLM's high-level reasoning and a secure Docker sandbox: it builds specialized prompts that guide the LLM in writing executable Python, assembles the returned text into a standardized script format, and runs static analysis on the script's AST to check that the code is safe and actually emits a result before it ever reaches a container.

1. Behavior and Context

In the orchestration lifecycle, the ScriptGenerator acts as the "Compiler and Security Guard."

  • Layered Construction: It enforces a three-layer script structure: a Preamble (framework-provided context), a Body (LLM-generated logic), and a Postamble (execution cleanup).
  • Prompt Specialization: It dynamically builds system prompts based on whether it is a first attempt or a retry. In retry scenarios, it provides targeted feedback (e.g., explaining KeyError vs. index access) to help the LLM correct itself.
  • Static Guarding: Before execution, it parses the LLM's code into an Abstract Syntax Tree (AST). It inspects every node to block dangerous operations (like open or exec) and ensures the script complies with the framework's communication protocol.

2. Purpose

  • Security: Preventing unauthorized filesystem access, network bypasses, or code injection by strictly controlling allowed Python built-ins.
  • Reliability: Verifying that every script actually returns data by enforcing calls to emit_result or emit_intermediate.
  • Efficiency: Caching the "Tools Prompt" XML block to reduce overhead during multi-step interactive sessions.
  • Error Recovery: Transforming raw Python tracebacks into human-readable instructions that the LLM can use to fix its own code.

3. High-Level API

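The ScriptGenerator is initialized with a ToolRegistry. It keeps no per-task state (the only data it retains is the cached tools prompt for each sandbox), so a single instance is safe to use across multiple concurrent execution tasks.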

Example: The Full Generation & Validation Cycle

```python
from jazzmine.core.tools import ScriptGenerator, registry

# 1. Initialize
generator = ScriptGenerator(registry)

# 2. Build the prompt for the LLM
prompt = generator.build_prompt(
    task="Calculate the total revenue from the 'orders' list.",
    sandbox_name="ecommerce"
)

# 3. (Scenario) LLM generates the script body
llm_code = """
items = get_orders().data['orders']
total = sum(i['price'] for i in items)
emit_result({'total': total})
"""

# 4. Assemble into a full script
full_script = generator.assemble_script(llm_code, sandbox_name="ecommerce")

# 5. Validate security and logic
violations = generator.validate(full_script, sandbox_name="ecommerce")

if not violations:
    print("Script is safe and valid.")
else:
    print(generator.validation_error_message(violations))
```

4. Detailed Functionality

ScriptGenerator [Class]

build_prompt(task, sandbox_name, previous_script=None, previous_error=None, attempt=1)

Functionality: Generates the comprehensive instruction set for the LLM.

Parameters:

  • task (str): The natural language description of what the script must do.
  • sandbox_name (str): Used to retrieve the relevant tool documentation.
  • previous_script / previous_error: Provided during retries to facilitate self-correction.
  • attempt (int): The current retry counter.

How it works: If attempt == 1, it uses a standard system prompt containing 10 strict rules for script safety and logic. If attempt > 1, it switches to a retry template that includes the previous failure. It features a specialized regex "Hinter" that detects KeyError and warns the LLM not to use integer indices on dictionary-based tool results.
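The retry path described above can be sketched roughly as follows. This is a minimal illustration, not jazzmine's actual implementation: the names `_KEY_ERROR_RE`, `hint_for_error`, and `build_retry_prompt`, the regex itself, and the wording of the hints are all assumptions made for the example.

```python
import re
from typing import Optional

# Hypothetical pattern: matches the final line of a KeyError traceback,
# e.g. "KeyError: 0" or "KeyError: 'price'".
_KEY_ERROR_RE = re.compile(r"KeyError: (?P<key>.+)")


def hint_for_error(previous_error: str) -> Optional[str]:
    """Return a targeted hint for a recognized failure, or None."""
    match = _KEY_ERROR_RE.search(previous_error)
    if match is None:
        return None
    key = match.group("key").strip()
    if key.lstrip("-").isdigit():
        # An integer key suggests the result was indexed like a list.
        return (
            f"KeyError on integer key {key}: tool results are dictionaries. "
            "Access fields by name, not by position."
        )
    return f"KeyError on key {key}: check the field names in the tool documentation."


def build_retry_prompt(task: str, previous_script: str,
                       previous_error: str, attempt: int) -> str:
    """Assemble an illustrative retry prompt that includes the previous failure."""
    parts = [
        f"Attempt {attempt}. Task: {task}",
        "Your previous script failed:",
        previous_script.strip(),
        "Error:",
        previous_error.strip(),
    ]
    hint = hint_for_error(previous_error)
    if hint:
        parts.append(f"Hint: {hint}")
    return "\n".join(parts)
```

Keeping the hint logic separate from prompt assembly means new failure patterns can be added without touching the retry template.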


assemble_script(body, sandbox_name)

Functionality: Wraps the LLM's logic into the final executable file format.

How it works:

  1. Calls _strip_fences to remove any Markdown code fences (e.g., ```python).
  2. Injects a Preamble Comment listing the available tools and mandatory helpers. Because it is a comment, the Python runtime ignores it; it serves as a lightweight "reminder" for human auditors.
  3. Appends the body and a newline postamble.
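The three steps above might look like the following sketch. The fence regex, the preamble wording, and the `tool_names` parameter are assumptions for illustration; the real `assemble_script` takes a `sandbox_name` and looks the tools up itself.

```python
import re
from typing import Iterable

# Hypothetical fence pattern: opening backticks, an optional language tag,
# the code, then closing backticks.
_FENCE_RE = re.compile(
    r"^\s*```[a-zA-Z0-9_+-]*[ \t]*\n(?P<code>.*?)\n?```\s*$", re.DOTALL
)


def strip_fences(text: str) -> str:
    """Remove a single surrounding Markdown code fence, if present."""
    match = _FENCE_RE.match(text)
    return match.group("code") if match else text.strip("\n")


def assemble_script(body: str, tool_names: Iterable[str]) -> str:
    """Build: preamble comment + cleaned body + newline postamble."""
    preamble = "\n".join([
        "# --- preamble (illustrative wording) ---",
        "# Available tools: " + ", ".join(sorted(tool_names)),
        "# Mandatory helpers: emit_result, emit_intermediate",
        "# --- body ---",
    ])
    return f"{preamble}\n{strip_fences(body)}\n"


script = assemble_script(
    "```python\nemit_result({'ok': True})\n```",
    tool_names=["get_orders"],
)
```

Here `script` contains the comment preamble followed by the unfenced body and a trailing newline, so it can be handed straight to the validator.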

validate(script, sandbox_name)

Functionality: Performs deep static analysis using the ast module.

How it works: It executes two specialized AST visitors:

  • _SecurityVisitor:
      • Blocks all import and from ... import statements (all imports are pre-provided in the sandbox).
      • Blocks exec, eval, compile, open, __import__, exit, and input.
      • The "Parentheses" Guard: Detects the common LLM error of assigning a tool function to a variable without calling it (e.g., r = get_user instead of r = get_user()).
  • _EmitCallChecker:
      • Verifies that the script contains at least one call to emit_result or emit_intermediate. If missing, the script is rejected as "unproductive."
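As a rough sketch of how two such visitors could be written with the standard `ast` module: the class names mirror the ones above without the leading underscore, but the bodies, the `check` helper, and the message wording are assumptions rather than jazzmine's source.

```python
import ast
from typing import List, Set

FORBIDDEN_CALLS = {"exec", "eval", "compile", "open", "__import__", "exit", "input"}
EMIT_NAMES = {"emit_result", "emit_intermediate"}


class SecurityVisitor(ast.NodeVisitor):
    """Collects violations for imports, forbidden calls, and uncalled tools."""

    def __init__(self, allowed_tools: Set[str]) -> None:
        self.allowed_tools = allowed_tools
        self.violations: List[str] = []

    def visit_Import(self, node: ast.Import) -> None:
        self.violations.append(f"Line {node.lineno}: import statements are forbidden")

    def visit_ImportFrom(self, node: ast.ImportFrom) -> None:
        self.violations.append(f"Line {node.lineno}: import statements are forbidden")

    def visit_Call(self, node: ast.Call) -> None:
        if isinstance(node.func, ast.Name) and node.func.id in FORBIDDEN_CALLS:
            self.violations.append(
                f"Line {node.lineno}: call to '{node.func.id}' is forbidden"
            )
        self.generic_visit(node)  # keep walking into arguments

    def visit_Assign(self, node: ast.Assign) -> None:
        # "Parentheses" guard: `r = get_user` binds the function, not its result.
        if isinstance(node.value, ast.Name) and node.value.id in self.allowed_tools:
            name = node.value.id
            self.violations.append(
                f"Line {node.lineno}: '{name}' is assigned without being called; use {name}()"
            )
        self.generic_visit(node)


class EmitCallChecker(ast.NodeVisitor):
    """Records whether any emit_result / emit_intermediate call exists."""

    def __init__(self) -> None:
        self.found = False

    def visit_Call(self, node: ast.Call) -> None:
        if isinstance(node.func, ast.Name) and node.func.id in EMIT_NAMES:
            self.found = True
        self.generic_visit(node)


def check(source: str, allowed_tools: Set[str]) -> List[str]:
    """Run both visitors over the parsed source and merge their findings."""
    tree = ast.parse(source)
    security = SecurityVisitor(allowed_tools)
    security.visit(tree)
    emit = EmitCallChecker()
    emit.visit(tree)
    violations = list(security.violations)
    if not emit.found:
        violations.append("Script never calls emit_result or emit_intermediate")
    return violations
```

For example, `check("import os\nr = get_user\nopen('x')", {"get_user"})` reports the import, the uncalled tool, the `open` call, and the missing emit call. Note that this sketch only matches calls by bare name; attribute access such as `builtins.open` would need additional handling.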

validate_script(script, allowed_tools) [Module Function]

Functionality: The standalone engine used by the ScriptGenerator.validate method. It parses the source code into a tree and returns a list of human-readable violation strings.


5. Safe Built-ins Reference

The following Python built-ins are explicitly allowed within the sandbox:

| Category    | Built-ins                                                                            |
|-------------|--------------------------------------------------------------------------------------|
| Collections | list, dict, set, tuple, len, range, enumerate, zip, map, filter, sorted, reversed    |
| Types       | str, int, float, bool, type, isinstance, hasattr, getattr                            |
| Math        | min, max, sum, abs, round                                                            |
| Logic/Debug | any, all, print, repr, format, vars, dir                                             |
| Framework   | emit_intermediate, emit_result, emit_log                                             |
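One way to check a script against this allowlist statically is sketched below. The source does not say whether jazzmine enforces the list at validation time or inside the sandbox, so the `disallowed_builtins` helper and its approach are purely illustrative assumptions.

```python
import ast
import builtins
from typing import List

# Mirrors the reference table above.
SAFE_BUILTINS = {
    "list", "dict", "set", "tuple", "len", "range", "enumerate", "zip",
    "map", "filter", "sorted", "reversed",
    "str", "int", "float", "bool", "type", "isinstance", "hasattr", "getattr",
    "min", "max", "sum", "abs", "round",
    "any", "all", "print", "repr", "format", "vars", "dir",
    "emit_intermediate", "emit_result", "emit_log",
}


def disallowed_builtins(source: str) -> List[str]:
    """Return sorted Python built-in names used in `source` but not allowlisted."""
    all_builtins = set(dir(builtins))
    used = {
        node.id
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Name)
    }
    return sorted((used & all_builtins) - SAFE_BUILTINS)
```

A script that only uses `sum` and `len` yields an empty list, while one that references `open` or `eval` gets those names back.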

6. Error Handling

  • Syntax Errors: If the LLM generates unparseable Python, validate_script catches the SyntaxError and returns the specific line number and error message to the Orchestrator.
  • Security Violations: Every violation found by the AST visitors is returned as a clear, instructional string (e.g., "Line 5: call to 'open' is forbidden").
  • Empty Scripts: If the LLM returns an empty string or just whitespace, the validator rejects it as having no emit calls.
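The syntax-error and empty-script cases can be illustrated with a small sketch. The function name `syntax_or_emit_violations` and the message formats are assumptions; only the behavior (report instead of raise, reject scripts with no emit call) comes from the description above.

```python
import ast
from typing import List

EMIT_NAMES = {"emit_result", "emit_intermediate"}


def syntax_or_emit_violations(source: str) -> List[str]:
    """Report unparseable code or a missing emit call as violation strings."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        # Surface the line number and message instead of raising.
        return [f"Line {exc.lineno}: syntax error: {exc.msg}"]

    has_emit = any(
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id in EMIT_NAMES
        for node in ast.walk(tree)
    )
    if not has_emit:
        return ["Script never calls emit_result or emit_intermediate"]
    return []
```

An empty or whitespace-only string parses successfully into a module with no statements, which is why it falls through to the missing-emit rejection rather than a syntax error.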

7. Remarks

  • No Markdown: The generator instructions explicitly command the LLM to return "Script body only — no markdown, no code fences." However, the _strip_fences utility is included as a robust safety net.
  • Pre-import Philosophy: jazzmine sandboxes are designed with a "Batteries Included" approach. Common libraries like json, time, and math are pre-imported by the harness. This is why the generator forbids the import statement: doing so eliminates the risk of an agent importing dangerous system libraries.
  • Deterministic Output: The "Tools Prompt" for each sandbox is cached. Since tools are registered at startup and are immutable at runtime, the cached block never goes stale, and after the first call retrieving it is a constant-time lookup instead of a re-render.
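The caching behavior can be pictured with a minimal per-sandbox cache. `ToolsPromptCache` and `render_tools_xml` are hypothetical names, and the XML shape is a placeholder; the sketch only shows the render-once, look-up-afterwards pattern.

```python
from typing import Callable, Dict, List


class ToolsPromptCache:
    """Renders the tools block once per sandbox, then serves it from a dict."""

    def __init__(self, render: Callable[[str], str]) -> None:
        self._render = render
        self._cache: Dict[str, str] = {}

    def get(self, sandbox_name: str) -> str:
        if sandbox_name not in self._cache:
            # First request for this sandbox: render and remember.
            self._cache[sandbox_name] = self._render(sandbox_name)
        return self._cache[sandbox_name]


render_calls: List[str] = []


def render_tools_xml(sandbox_name: str) -> str:
    """Stand-in renderer that records how often it is invoked."""
    render_calls.append(sandbox_name)
    return f"<tools sandbox='{sandbox_name}'></tools>"


cache = ToolsPromptCache(render_tools_xml)
first = cache.get("ecommerce")
second = cache.get("ecommerce")  # served from the cache, no second render
```

Because the rendered block depends only on tools that do not change at runtime, two concurrent first calls would at worst render the same string twice, which is harmless in this sketch.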