Agent
Core reference

Telemetry

The telemetry module is the "Black Box Recorder" for the jazzmine Agent. It provides a mutable accumulator class, TurnTelemetry, which travels with the request through every phase of the Agent.chat() loop. It is responsible for capturing granular performance metrics, token consumption, and execution events as they occur, which are eventually crystallized into a persistent TurnTrace.


1. Behavior and Context

In the jazzmine architecture, TurnTelemetry acts as the transient state for observability:

  • Turn-Scoped Lifecycle: An instance is created at the exact moment Agent.chat() is entered and discarded once the turn's data is safely written to the MessageStore.
  • Multisystem Capture: It records data from disparate systems: LLM providers (latency and tokens), the Sandbox orchestrator (scripts and results), and the Working Memory (slot and flow transitions).
  • Mode Awareness: It contains specialized logic to flatten the complex step-attempt hierarchy of "Interactive Mode" tool execution into a linear "Tool Trace" suitable for auditing.
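The turn-scoped lifecycle above can be sketched as a minimal accumulator. The field names llm_calls, tool_traces, and started_at_ms come from later sections of this reference; the class body itself is an illustrative assumption, not the real implementation.

```python
import time
from dataclasses import dataclass, field

@dataclass
class TurnTelemetrySketch:
    # Captured at instantiation, i.e. the moment Agent.chat() is entered.
    started_at_ms: int = field(default_factory=lambda: int(time.time() * 1000))
    # Reasoning/extraction calls appended by record_llm_ok / record_llm_error.
    llm_calls: list = field(default_factory=list)
    # High-level traces derived from sandbox RunRecords.
    tool_traces: list = field(default_factory=list)

telemetry = TurnTelemetrySketch()
```

Once the turn's data is persisted to the MessageStore, the instance is simply discarded; nothing is shared across turns.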

2. Purpose

  • Audit Fidelity: Ensuring every line of code generated by the LLM and every response from the sandbox is preserved for post-mortem analysis.
  • Precise Token Accounting: Aggregating usage from "Reasoning" calls (the main loop) and "Coding" calls (script generation) to provide an accurate total turn cost.
  • Latency Attribution: Providing timestamps for every sub-operation to help identify bottlenecks in the agent's response time.
  • Event Reconstruction: Linking high-level flow changes to the specific tool results that triggered them.

3. High-Level API (Internal Implementation)

While TurnTelemetry is primarily an internal component, understanding its API is vital for extending the agent's logging capabilities.

Example: Recording an LLM interaction

```python
from jazzmine.core.agent.telemetry import TurnTelemetry

telemetry = TurnTelemetry()

# 1. Record a successful LLM call
telemetry.record_llm_ok(
    purpose="agent_loop",
    response=llm_response_obj,  # an LLMResponse carrying text, model, and usage
    latency_ms=450,
)

# 2. Add a trace from a sandbox execution (RunRecord)
telemetry.add_tool_trace(
    run=orchestrator_run_record,
    task_index=1,
    description="Retrieve account balance",
)

# 3. Access aggregated totals
print(f"Total Prompt Tokens: {telemetry.total_prompt_tokens()}")
```

4. Detailed Functionality

TurnTelemetry [Class]

record_llm_ok(purpose, response, latency_ms)

Functionality: Extracts metadata from a successful LLMResponse and appends an LLMCallRecord to the internal list.

  • purpose: String label (e.g., "enhancement", "agent_loop", "slot_extraction").
  • latency_ms: Measured network/generation time, in milliseconds.
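As a sketch of what this method does, the snippet below appends a plain dict where the real code appends an LLMCallRecord; the attribute names read off the response object are assumptions. The getattr() defaults keep a mock response without usage fields from raising.

```python
class TelemetrySketch:
    def __init__(self):
        self.llm_calls = []

    def record_llm_ok(self, purpose, response, latency_ms):
        # getattr() defaults (0 / "") tolerate responses missing standard fields.
        self.llm_calls.append({
            "purpose": purpose,
            "model": getattr(response, "model", ""),
            "prompt_tokens": getattr(response, "prompt_tokens", 0),
            "completion_tokens": getattr(response, "completion_tokens", 0),
            "latency_ms": latency_ms,
        })

class BareResponse:
    """A mock LLM response with no usage metadata at all."""

t = TelemetrySketch()
t.record_llm_ok(purpose="agent_loop", response=BareResponse(), latency_ms=450)
```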

record_llm_error(purpose, error, model="")

Functionality: Records a failed attempt to reach an LLM provider. This ensures that even if the agent fails, the audit trail shows the exact exception that occurred.


add_tool_trace(run, task_index, description)

Functionality: The primary bridge between the ToolOrchestrator and the Agent. It takes a low-level RunRecord (which contains raw sandbox logs and scripts) and transforms it into a high-level ToolTrace.


total_prompt_tokens() / total_completion_tokens()

Functionality: Performs a turn-wide summation.

  • Direct Tokens: Counts tokens from reasoning calls (stored in llm_calls).
  • Generated Tokens: Counts tokens from script generation attempts (stored within tool_traces).
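The two-source summation can be illustrated with plain dicts standing in for the Pydantic records; the key names and numbers here are illustrative assumptions.

```python
# Direct tokens: reasoning/extraction calls recorded in llm_calls.
llm_calls = [
    {"purpose": "agent_loop", "prompt_tokens": 320},
    {"purpose": "slot_extraction", "prompt_tokens": 80},
]
# Generated tokens: script-generation attempts nested inside tool_traces.
tool_traces = [
    {"attempts": [{"prompt_tokens": 150}, {"prompt_tokens": 170}]},
]

def total_prompt_tokens():
    direct = sum(c["prompt_tokens"] for c in llm_calls)
    generated = sum(a["prompt_tokens"] for t in tool_traces for a in t["attempts"])
    return direct + generated

print(total_prompt_tokens())  # 720
```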

run_record_to_tool_trace(...) [Module Helper]

Functionality: Implements the transformation logic for sandbox telemetry.

How it works:

  • Plan Mode: Maps every attempt 1:1 to a ScriptGenAttempt.
  • Interactive Mode: Flattens the nested step-attempt structure into a single chronological list of attempts, assigning a "Global Attempt Number" to each script generation.
  • Latency Calculation: In interactive mode, it sums the durations of the last successful attempt of every step to provide a realistic execution latency metric.
  • Result Capture: Aggregates all intermediate results (emitted via emit_intermediate) and the final structured data into the trace.
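The interactive-mode behavior described above can be sketched as follows; the step/attempt dict shapes are illustrative, not the real RunRecord schema.

```python
# Two steps: the first needed a retry, the second succeeded immediately.
steps = [
    {"attempts": [{"ok": False, "duration_ms": 120}, {"ok": True, "duration_ms": 90}]},
    {"attempts": [{"ok": True, "duration_ms": 60}]},
]

# Flatten the nested step-attempt structure into one chronological list,
# tagging each script generation with a global attempt number.
flat = []
global_n = 0
for step_idx, step in enumerate(steps):
    for attempt in step["attempts"]:
        global_n += 1
        flat.append({"step": step_idx, "global_attempt": global_n, **attempt})

# Latency: sum the last successful attempt of every step.
latency_ms = sum(
    next(a["duration_ms"] for a in reversed(step["attempts"]) if a["ok"])
    for step in steps
)
print(latency_ms)  # 150
```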

5. Token Accounting Logic

The module implements a strict "Split-Sum" strategy:

  • Reasoning Tokens: Tokens consumed by the LLM when deciding what to do.
  • Coding Tokens: Tokens consumed by the LLM (or a specialized coder model) when writing the actual Python script for the sandbox.
  • Slot Extraction Tokens: Tokens consumed when the agent is trying to parse user inputs into specific form fields.

By summing these across llm_calls and tool_traces, the agent provides a complete financial and resource usage picture for the turn.


6. Error Handling

  • Attribute Safety: record_llm_ok uses getattr() with defaults of 0 or "" for response objects. This prevents the telemetry system from crashing if a mock LLM or an edge-case provider response is missing standard fields.
  • JSON Serialization: All data captured here is designed to be compatible with the MessageStore persistence logic, which relies on JSON serialization of the Pydantic models.
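A minimal way to check the JSON round-trip that MessageStore persistence relies on, shown with a plain dict standing in for the actual Pydantic models:

```python
import json

# An illustrative captured record; the real records are Pydantic models
# whose fields must likewise survive json round-tripping.
record = {"purpose": "agent_loop", "prompt_tokens": 320, "latency_ms": 450}
restored = json.loads(json.dumps(record))
print(restored == record)  # True
```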

7. Remarks

  • Interactive Mode Trace: Note that in "Interactive Mode," multiple Python scripts might be generated for a single task. The ToolTrace resulting from this mode includes all of them, allowing developers to see the step-by-step "thought process" of the sandbox logic.
  • Correlation: The started_at_ms field is captured the moment the class is instantiated. This provides a baseline to calculate the total wall-clock time for the turn.