Agent: telemetry
1. Behavior and Context
In the jazzmine architecture, TurnTelemetry acts as the transient state for observability:
- Turn-Scoped Lifecycle: An instance is created at the exact moment Agent.chat() is entered and discarded once the turn's data is safely written to the MessageStore.
- Multisystem Capture: It records data from disparate systems: LLM providers (latency and tokens), the Sandbox orchestrator (scripts and results), and the Working Memory (slot and flow transitions).
- Mode Awareness: It contains specialized logic to flatten the complex step-attempt hierarchy of "Interactive Mode" tool execution into a linear "Tool Trace" suitable for auditing.
2. Purpose
- Audit Fidelity: Ensuring every line of code generated by the LLM and every response from the sandbox is preserved for post-mortem analysis.
- Precise Token Accounting: Aggregating usage from "Reasoning" calls (the main loop) and "Coding" calls (script generation) to provide an accurate total turn cost.
- Latency Attribution: Providing timestamps for every sub-operation to help identify bottlenecks in the agent's response time.
- Event Reconstruction: Linking high-level flow changes to the specific tool results that triggered them.
3. High-Level API (Internal Implementation)
While TurnTelemetry is primarily an internal component, understanding its API is vital for extending the agent's logging capabilities.
Example: Recording an LLM interaction
```python
from jazzmine.core.agent.telemetry import TurnTelemetry

telemetry = TurnTelemetry()

# 1. Record a successful LLM call
telemetry.record_llm_ok(
    purpose="agent_loop",
    response=llm_response_obj,  # contains text, model, and usage
    latency_ms=450,
)

# 2. Add a trace from a sandbox execution (RunRecord)
telemetry.add_tool_trace(
    run=orchestrator_run_record,
    task_index=1,
    description="Retrieve account balance",
)

# 3. Access aggregated totals
print(f"Total prompt tokens: {telemetry.total_prompt_tokens()}")
```
5. Detailed Functionality
TurnTelemetry [Class]
record_llm_ok(purpose, response, latency_ms)
Functionality: Extracts metadata from a successful LLMResponse and appends an LLMCallRecord to the internal list.
- purpose: String label (e.g., "enhancement", "agent_loop", "slot_extraction").
- latency_ms: Measured network/generation time.
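The recording logic can be sketched as follows. This is a minimal illustration, not the actual implementation: the `LLMCallRecord` field names and the `llm_calls` list are assumptions inferred from the descriptions in this section, and the `getattr()` defaults reflect the attribute-safety behavior described under Error Handling below.

```python
from dataclasses import dataclass, field


@dataclass
class LLMCallRecord:
    # Hypothetical record shape inferred from the field list above.
    purpose: str
    model: str
    prompt_tokens: int
    completion_tokens: int
    latency_ms: int


@dataclass
class TurnTelemetrySketch:
    llm_calls: list = field(default_factory=list)

    def record_llm_ok(self, purpose, response, latency_ms):
        # getattr() with defaults keeps telemetry robust against mock
        # or partial provider responses (see Error Handling, section 7).
        usage = getattr(response, "usage", None)
        self.llm_calls.append(LLMCallRecord(
            purpose=purpose,
            model=getattr(response, "model", ""),
            prompt_tokens=getattr(usage, "prompt_tokens", 0),
            completion_tokens=getattr(usage, "completion_tokens", 0),
            latency_ms=latency_ms,
        ))
```

Because every field access goes through `getattr()` with a default, a response object missing `usage` entirely still produces a well-formed record with zeroed token counts.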
record_llm_error(purpose, error, model="")
Functionality: Records a failed attempt to reach an LLM provider. This ensures that even if the agent fails, the audit trail shows the exact exception that occurred.
add_tool_trace(run, task_index, description)
Functionality: The primary bridge between the ToolOrchestrator and the Agent. It takes a low-level RunRecord (which contains raw sandbox logs and scripts) and transforms it into a high-level ToolTrace.
total_prompt_tokens() / total_completion_tokens()
Functionality: Performs a turn-wide summation across both token sources:
- Direct Tokens: Counts tokens from reasoning calls (stored in llm_calls).
- Generated Tokens: Counts tokens from script generation attempts (stored within tool_traces).
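The two-source summation can be sketched like this. The `ScriptGenAttempt` and `ToolTrace` shapes are assumptions for illustration; only the token fields and the `attempts` list are taken from the descriptions above.

```python
class ScriptGenAttempt:
    # Hypothetical attempt record; only the token fields matter here.
    def __init__(self, prompt_tokens, completion_tokens):
        self.prompt_tokens = prompt_tokens
        self.completion_tokens = completion_tokens


class ToolTrace:
    # Hypothetical trace holding all script-generation attempts.
    def __init__(self, attempts):
        self.attempts = attempts


def total_prompt_tokens(llm_calls, tool_traces):
    # Direct tokens from reasoning calls...
    direct = sum(call.prompt_tokens for call in llm_calls)
    # ...plus generated tokens from every script-generation attempt.
    generated = sum(a.prompt_tokens for t in tool_traces for a in t.attempts)
    return direct + generated
```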
run_record_to_tool_trace(...) [Module Helper]
Functionality: Implements the transformation logic for sandbox telemetry.
How it works:
- Plan Mode: Maps every attempt 1:1 to a ScriptGenAttempt.
- Interactive Mode: Flattens the nested step-attempt structure into a single chronological list of attempts, assigning a "Global Attempt Number" to each script generation.
- Latency Calculation: In interactive mode, it sums the durations of the last successful attempt of every step to provide a realistic execution latency metric.
- Result Capture: Aggregates all intermediate results (emitted via emit_intermediate) and the final structured data into the trace.
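The interactive-mode flattening and latency rules can be sketched as below. The step/attempt dict shape is an assumption for illustration; the global-attempt numbering and last-attempt latency summation follow the description above.

```python
def flatten_interactive(steps):
    """Flatten a nested step->attempt structure into one chronological
    list, assigning a global attempt number to each script generation.
    The dict shapes here are illustrative assumptions."""
    flat, latency_ms, n = [], 0, 0
    for step in steps:
        for attempt in step["attempts"]:
            n += 1
            flat.append({"global_attempt": n, **attempt})
        # Sum the duration of each step's last (successful) attempt
        # to get a realistic execution-latency metric.
        if step["attempts"]:
            latency_ms += step["attempts"][-1]["duration_ms"]
    return flat, latency_ms
```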
6. Token Accounting Logic
The module implements a strict "Split-Sum" strategy:
- Reasoning Tokens: Tokens consumed by the LLM when deciding what to do.
- Coding Tokens: Tokens consumed by the LLM (or a specialized coder model) when writing the actual Python script for the sandbox.
- Slot Extraction Tokens: Tokens consumed when the agent is trying to parse user inputs into specific form fields.
By summing these across llm_calls and tool_traces, the agent provides a complete financial and resource usage picture for the turn.
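A per-purpose breakdown of the split-sum strategy might look like the sketch below; the `Call` class is a minimal stand-in for an `LLMCallRecord`, and the purpose labels come from the section above.

```python
class Call:
    # Minimal stand-in for an LLMCallRecord (shape assumed).
    def __init__(self, purpose, completion_tokens):
        self.purpose = purpose
        self.completion_tokens = completion_tokens


def split_sum(calls):
    # Aggregate completion tokens per purpose label, e.g. "agent_loop"
    # (reasoning), "coding", "slot_extraction".
    totals = {}
    for c in calls:
        totals[c.purpose] = totals.get(c.purpose, 0) + c.completion_tokens
    return totals
```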
7. Error Handling
- Attribute Safety: record_llm_ok uses getattr() with defaults of 0 or "" for response objects. This prevents the telemetry system from crashing if a mock LLM or an edge-case provider response is missing standard fields.
- JSON Serialization: All data captured here is designed to be compatible with the MessageStore persistence logic, which relies on JSON serialization of the Pydantic models.
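The round-trip the persistence layer depends on can be illustrated as follows. The real records are Pydantic models; this stdlib dataclass (with assumed field names) stands in to show the JSON serialization requirement.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ToolTraceRecord:
    # Hypothetical field names; real traces are Pydantic models.
    task_index: int
    description: str
    scripts: list


def to_json(record):
    # Every captured value must survive a plain JSON round-trip.
    return json.dumps(asdict(record))
```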
8. Remarks
- Interactive Mode Trace: Note that in "Interactive Mode," multiple Python scripts might be generated for a single task. The ToolTrace resulting from this mode includes all of them, allowing developers to see the step-by-step "thought process" of the sandbox logic.
- Correlation: The started_at_ms field is captured the moment the class is instantiated. This provides a baseline to calculate the total wall-clock time for the turn.