1. Behavior and Context
In the jazzmine architecture, the utils module acts as a "Data Sanitizer" for the LLM providers.
- Fallback Estimation: When using local models (via LocalLLM) or providers that omit usage metadata, the module provides a heuristic estimator.
- Standardization: It maps various provider-specific dictionaries into the unified LLMUsage dataclass defined in the types module.
2. Purpose
- Usage Consistency: To provide a single source of truth for creating LLMUsage objects.
- Predictability: To ensure that even if an API call fails to return metadata, the system can still estimate the "weight" of the turn for context-window management.
- Abstraction: To keep provider-specific parsing logic out of the core agent loop.
3. High-Level API
The utility functions are used internally by classes like OpenAICompatibleLLM and GeminiLLM, but they can be used independently for pre-computation.
Example: Estimating costs before a call
from jazzmine.core.llm.utils import estimate_tokens, normalize_usage
prompt = "Translate the following text to French: 'Hello world'"
# Get a quick estimate of tokens
tokens = estimate_tokens(prompt)
print(f"Estimated prompt tokens: {tokens}")
# Create a usage object manually
usage = normalize_usage(prompt=prompt, completion="Bonjour le monde", provider_usage=None)
print(f"Total Turn Tokens: {usage.total_tokens}")
4. Detailed Functionality
estimate_tokens(text: str) -> int
Functionality: Performs a conservative heuristic estimation of the number of tokens in a string.
Parameters:
- text (str): The raw string to measure.
How it works: It uses a common industry "rule of thumb" where approximately 4 characters equate to 1 token (based on Byte-Pair Encoding averages). It ensures a minimum of 1 token is returned for non-empty strings.
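A minimal sketch of this heuristic, assuming the 4-characters-per-token rule described above (the real implementation may round differently):

```python
import math

def estimate_tokens(text: str) -> int:
    """Rough token count using the ~4 chars/token BPE rule of thumb."""
    if not text:
        return 0
    # Round up, and guarantee at least 1 token for any non-empty string.
    return max(1, math.ceil(len(text) / 4))
```

For example, a 9-character string estimates to 3 tokens, while a single character still counts as 1.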
normalize_usage(...)
Functionality: Constructs a standardized LLMUsage object from either raw strings or provider-provided metadata.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| prompt | str | The original input text sent to the model. |
| completion | str | The resulting text generated by the model. |
| provider_usage | Optional[dict] | The raw usage dictionary returned by the API (e.g., OpenAI's usage field). |
How it works:
- Check Provider Data: If provider_usage is present, it attempts to extract prompt_tokens, completion_tokens, total_tokens, and cost directly from the dict keys.
- Fallback to Estimation: If no provider data is available, it calls estimate_tokens on both the prompt and completion strings to fill in the metrics.
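Putting the two paths together, a hedged sketch of the logic (the LLMUsage fields shown here are inferred from this section; the actual dataclass lives in the types module):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LLMUsage:
    # Stand-in for jazzmine.core.llm.types.LLMUsage (field names assumed).
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    cost: float = 0.0

def _estimate_tokens(text: str) -> int:
    # Local copy of the ~4 chars/token heuristic, for self-containment.
    return max(1, len(text) // 4) if text else 0

def normalize_usage(prompt: str, completion: str,
                    provider_usage: Optional[dict] = None) -> LLMUsage:
    if provider_usage is not None:
        # Provider path: read each field defensively with .get(..., 0).
        return LLMUsage(
            prompt_tokens=provider_usage.get("prompt_tokens", 0),
            completion_tokens=provider_usage.get("completion_tokens", 0),
            total_tokens=provider_usage.get("total_tokens", 0),
            cost=provider_usage.get("cost", 0.0),
        )
    # Fallback path: estimate both sides of the turn.
    pt = _estimate_tokens(prompt)
    ct = _estimate_tokens(completion)
    return LLMUsage(pt, ct, pt + ct)
```

Note that the fallback path fills total_tokens from the two estimates, so callers always get a usable number for context-window accounting.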
5. Error Handling
- Missing Dictionary Keys: normalize_usage uses .get(..., 0) when reading from provider dictionaries. This prevents KeyError if a specific provider uses slightly different naming conventions or omits a field (like cost).
- Type Safety: The functions assume the input text is a string. Passing None or other types will result in a standard Python TypeError.
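The defensive .get pattern can be seen in isolation; the dictionary below is a hypothetical provider payload that omits two fields:

```python
# Hypothetical provider payload missing 'total_tokens' and 'cost'.
provider_usage = {"prompt_tokens": 12, "completion_tokens": 5}

# .get(..., default) degrades gracefully instead of raising KeyError.
prompt_tokens = provider_usage.get("prompt_tokens", 0)  # present: 12
total_tokens = provider_usage.get("total_tokens", 0)    # absent: falls back to 0
cost = provider_usage.get("cost", 0.0)                  # absent: falls back to 0.0
```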
6. Remarks
- Accuracy Warning: estimate_tokens is a heuristic, not an exact count. Every model (Llama, GPT, Claude) uses a different tokenizer. For precise billing or strict context-limit enforcement, always rely on the provider_usage data if available.
- BPE Approximation: The estimation logic is specifically tuned to approximate GPT-style tokenization, which is the most common standard for the providers supported by jazzmine.