Agent
Core reference

Telemetry

The telemetry module is the "Black Box Recorder" for the jazzmine Agent. It provides a mutable accumulator class, TurnTelemetry, which travels with the request through every phase of the Agent.chat() loop. It is responsible for capturing granular performance metrics, token consumption, and execution events as they occur, which are eventually crystallized into a persistent TurnTrace.


1. Behavior and Context

In the jazzmine architecture, TurnTelemetry acts as the transient state for observability:

  • Turn-Scoped Lifecycle: An instance is created at the exact moment Agent.chat() is entered and discarded once the turn's data is safely written to the MessageStore.
  • Multisystem Capture: It records data from disparate systems: LLM providers (latency and tokens), the Sandbox orchestrator (scripts and results), and the Working Memory (slot and flow transitions).
  • Mode Awareness: It contains specialized logic to flatten the complex step-attempt hierarchy of "Interactive Mode" tool execution into a linear "Tool Trace" suitable for auditing.
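The turn-scoped lifecycle above can be sketched as a minimal accumulator. The field names llm_calls, tool_traces, and started_at_ms come from later sections of this reference; the class body itself is an illustrative assumption, not the real implementation.

```python
import time
from dataclasses import dataclass, field

@dataclass
class TurnTelemetrySketch:
    # Captured at instantiation, i.e. the moment Agent.chat() is entered.
    started_at_ms: int = field(default_factory=lambda: int(time.time() * 1000))
    # Reasoning/extraction calls appended by record_llm_ok / record_llm_error.
    llm_calls: list = field(default_factory=list)
    # High-level traces derived from sandbox RunRecords.
    tool_traces: list = field(default_factory=list)

telemetry = TurnTelemetrySketch()
```

Once the turn's data is persisted to the MessageStore, the instance is simply discarded; nothing is shared across turns.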

2. Purpose

  • Audit Fidelity: Ensuring every line of code generated by the LLM and every response from the sandbox is preserved for post-mortem analysis.
  • Precise Token Accounting: Aggregating usage from "Reasoning" calls (the main loop) and "Coding" calls (script generation) to provide an accurate total turn cost.
  • Latency Attribution: Providing timestamps for every sub-operation to help identify bottlenecks in the agent's response time.
  • Event Reconstruction: Linking high-level flow changes to the specific tool results that triggered them.

3. High-Level API (Internal Implementation)

While TurnTelemetry is primarily an internal component, understanding its API is vital for extending the agent's logging capabilities.

Example: Recording an LLM interaction

```python
from jazzmine.core.agent.telemetry import TurnTelemetry

telemetry = TurnTelemetry()

# 1. Record a successful LLM call
telemetry.record_llm_ok(
    purpose="agent_loop",
    response=llm_response_obj,  # an LLMResponse carrying text, model, and usage
    latency_ms=450,
)

# 2. Add a trace from a sandbox execution (RunRecord)
telemetry.add_tool_trace(
    run=orchestrator_run_record,
    task_index=1,
    description="Retrieve account balance",
)

# 3. Access aggregated totals
print(f"Total Prompt Tokens: {telemetry.total_prompt_tokens()}")
```

4. Detailed Functionality

TurnTelemetry [Class]

record_llm_ok(purpose, response, latency_ms)

Functionality: Extracts metadata from a successful LLMResponse and appends an LLMCallRecord to the internal list.

  • purpose: String label (e.g., "enhancement", "agent_loop", "slot_extraction").
  • latency_ms: Measured network/generation time, in milliseconds.
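As a sketch of what this method does, the snippet below appends a plain dict where the real code appends an LLMCallRecord; the attribute names read off the response object are assumptions. The getattr() defaults keep a mock response without usage fields from raising.

```python
class TelemetrySketch:
    def __init__(self):
        self.llm_calls = []

    def record_llm_ok(self, purpose, response, latency_ms):
        # getattr() defaults (0 / "") tolerate responses missing standard fields.
        self.llm_calls.append({
            "purpose": purpose,
            "model": getattr(response, "model", ""),
            "prompt_tokens": getattr(response, "prompt_tokens", 0),
            "completion_tokens": getattr(response, "completion_tokens", 0),
            "latency_ms": latency_ms,
        })

class BareResponse:
    """A mock LLM response with no usage metadata at all."""

t = TelemetrySketch()
t.record_llm_ok(purpose="agent_loop", response=BareResponse(), latency_ms=450)
```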

record_llm_error(purpose, error, model="")

Functionality: Records a failed attempt to reach an LLM provider. This ensures that even if the agent fails, the audit trail shows the exact exception that occurred.


add_tool_trace(run, task_index, description)

Functionality: The primary bridge between the ToolOrchestrator and the Agent. It takes a low-level RunRecord (which contains raw sandbox logs and scripts) and transforms it into a high-level ToolTrace.


total_prompt_tokens() / total_completion_tokens()

Functionality: Performs a turn-wide summation.

  • Direct Tokens: Counts tokens from reasoning calls (stored in llm_calls).
  • Generated Tokens: Counts tokens from script generation attempts (stored within tool_traces).
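The two-source summation can be illustrated with plain dicts standing in for the Pydantic records; the key names and numbers here are illustrative assumptions.

```python
# Direct tokens: reasoning/extraction calls recorded in llm_calls.
llm_calls = [
    {"purpose": "agent_loop", "prompt_tokens": 320},
    {"purpose": "slot_extraction", "prompt_tokens": 80},
]
# Generated tokens: script-generation attempts nested inside tool_traces.
tool_traces = [
    {"attempts": [{"prompt_tokens": 150}, {"prompt_tokens": 170}]},
]

def total_prompt_tokens():
    direct = sum(c["prompt_tokens"] for c in llm_calls)
    generated = sum(a["prompt_tokens"] for t in tool_traces for a in t["attempts"])
    return direct + generated

print(total_prompt_tokens())  # 720
```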

run_record_to_tool_trace(...) [Module Helper]

Functionality: Implements the transformation logic for sandbox telemetry.

How it works:

  • Plan Mode: Maps every attempt 1:1 to a ScriptGenAttempt.
  • Interactive Mode: Flattens the nested step-attempt structure into a single chronological list of attempts, assigning a "Global Attempt Number" to each script generation.
  • Latency Calculation: In interactive mode, it sums the durations of the last successful attempt of every step to provide a realistic execution latency metric.
  • Result Capture: Aggregates all intermediate results (emitted via emit_intermediate) and the final structured data into the trace.
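The interactive-mode behavior described above can be sketched as follows; the step/attempt dict shapes are illustrative, not the real RunRecord schema.

```python
# Two steps: the first needed a retry, the second succeeded immediately.
steps = [
    {"attempts": [{"ok": False, "duration_ms": 120}, {"ok": True, "duration_ms": 90}]},
    {"attempts": [{"ok": True, "duration_ms": 60}]},
]

# Flatten the nested step-attempt structure into one chronological list,
# tagging each script generation with a global attempt number.
flat = []
global_n = 0
for step_idx, step in enumerate(steps):
    for attempt in step["attempts"]:
        global_n += 1
        flat.append({"step": step_idx, "global_attempt": global_n, **attempt})

# Latency: sum the last successful attempt of every step.
latency_ms = sum(
    next(a["duration_ms"] for a in reversed(step["attempts"]) if a["ok"])
    for step in steps
)
print(latency_ms)  # 150
```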

5. Token Accounting Logic

The module implements a strict "Split-Sum" strategy:

  • Reasoning Tokens: Tokens consumed by the LLM when deciding what to do.
  • Coding Tokens: Tokens consumed by the LLM (or a specialized coder model) when writing the actual Python script for the sandbox.
  • Slot Extraction Tokens: Tokens consumed when the agent is trying to parse user inputs into specific form fields.

By summing these across llm_calls and tool_traces, the agent provides a complete financial and resource usage picture for the turn.


6. Error Handling

  • Attribute Safety: record_llm_ok uses getattr() with defaults of 0 or "" for response objects. This prevents the telemetry system from crashing if a mock LLM or an edge-case provider response is missing standard fields.
  • JSON Serialization: All data captured here is designed to be compatible with the MessageStore persistence logic, which relies on JSON serialization of the Pydantic models.
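A minimal way to check the JSON round-trip that MessageStore persistence relies on, shown with a plain dict standing in for the actual Pydantic models:

```python
import json

# An illustrative captured record; the real records are Pydantic models
# whose fields must likewise survive json round-tripping.
record = {"purpose": "agent_loop", "prompt_tokens": 320, "latency_ms": 450}
restored = json.loads(json.dumps(record))
print(restored == record)  # True
```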

7. Remarks

  • Interactive Mode Trace: Note that in "Interactive Mode," multiple Python scripts might be generated for a single task. The ToolTrace resulting from this mode includes all of them, allowing developers to see the step-by-step "thought process" of the sandbox logic.
  • Correlation: The started_at_ms field is captured the moment the class is instantiated. This provides a baseline to calculate the total wall-clock time for the turn.