
LLM Core: utils

The utils module provides the helper logic required to handle token accounting and data normalization across different LLM providers. In a multi-provider environment, not every API returns token usage in the same format—or at all. This module ensures that the framework has a consistent baseline for tracking consumption and costs.

1. Behavior and Context

In the jazzmine architecture, the utils module acts as a "Data Sanitizer" for the LLM providers.

  • Fallback Estimation: When using local models (via LocalLLM) or providers that omit usage metadata, the module provides a heuristic estimator.
  • Standardization: It maps various provider-specific dictionaries into the unified LLMUsage dataclass defined in the types module.

2. Purpose

  • Usage Consistency: To provide a single source of truth for creating LLMUsage objects.
  • Predictability: To ensure that even if an API call fails to return metadata, the system can still estimate the "weight" of the turn for context-window management.
  • Abstraction: To keep provider-specific parsing logic out of the core agent loop.

3. High-Level API

The utility functions are used internally by classes like OpenAICompatibleLLM and GeminiLLM, but they can be used independently for pre-computation.

Example: Estimating costs before a call

```python
from jazzmine.core.llm.utils import estimate_tokens, normalize_usage

prompt = "Translate the following text to French: 'Hello world'"
# Get a quick estimate of tokens
tokens = estimate_tokens(prompt)
print(f"Estimated prompt tokens: {tokens}")

# Create a usage object manually
usage = normalize_usage(prompt=prompt, completion="Bonjour le monde", provider_usage=None)
print(f"Total Turn Tokens: {usage.total_tokens}")
```

4. Detailed Functionality

estimate_tokens(text: str) -> int

Functionality: Performs a conservative heuristic estimation of the number of tokens in a string.

Parameters:

  • text (str): The raw string to measure.

How it works: It uses a common industry "rule of thumb" where approximately 4 characters equate to 1 token (based on Byte-Pair Encoding averages). It ensures a minimum of 1 token is returned for non-empty strings.
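A minimal sketch of this heuristic, assuming integer division with a floor of 1 for non-empty input (the behavior for an empty string is not specified above; returning 0 is an assumption here):

```python
def estimate_tokens(text: str) -> int:
    """Heuristic token estimate: roughly 4 characters per token (BPE average)."""
    if not text:
        return 0  # assumed behavior for empty input
    # Integer-divide by 4, but never report less than 1 token for non-empty text.
    return max(1, len(text) // 4)
```

Because the ratio is an average, short strings are systematically overcounted relative to their length, which keeps the estimate conservative for context-window budgeting.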


normalize_usage(prompt: str, completion: str, provider_usage: Optional[dict]) -> LLMUsage

Functionality: Constructs a standardized LLMUsage object from either raw strings or provider-provided metadata.

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| prompt | str | The original input text sent to the model. |
| completion | str | The resulting text generated by the model. |
| provider_usage | Optional[dict] | The raw usage dictionary returned by the API (e.g., OpenAI's usage field). |
How it works:

  1. Check Provider Data: If provider_usage is present, it attempts to extract prompt_tokens, completion_tokens, total_tokens, and cost directly from the dict keys.
  2. Fallback to Estimation: If no provider data is available, it calls estimate_tokens on both the prompt and completion strings to fill in the metrics.
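The two-branch logic above can be sketched as follows. The LLMUsage field names are assumed from the keys listed in step 1 (the real dataclass lives in the types module), and the inline estimate_tokens stand-in mirrors the heuristic described earlier:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LLMUsage:  # stand-in for the dataclass defined in the types module
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    cost: float = 0.0

def estimate_tokens(text: str) -> int:
    # ~4 characters per token, minimum 1 for non-empty strings
    return max(1, len(text) // 4) if text else 0

def normalize_usage(prompt: str, completion: str,
                    provider_usage: Optional[dict] = None) -> LLMUsage:
    if provider_usage is not None:
        # Step 1: trust provider data; .get(..., 0) tolerates missing keys.
        return LLMUsage(
            prompt_tokens=provider_usage.get("prompt_tokens", 0),
            completion_tokens=provider_usage.get("completion_tokens", 0),
            total_tokens=provider_usage.get("total_tokens", 0),
            cost=provider_usage.get("cost", 0),
        )
    # Step 2: no provider data, so estimate both sides of the turn.
    pt = estimate_tokens(prompt)
    ct = estimate_tokens(completion)
    return LLMUsage(prompt_tokens=pt, completion_tokens=ct, total_tokens=pt + ct)
```

Keeping both branches behind one function means callers never have to branch on whether a provider reported usage.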

5. Error Handling

  • Missing Dictionary Keys: normalize_usage uses .get(..., 0) when reading from provider dictionaries. This prevents KeyError if a specific provider uses slightly different naming conventions or omits a field (like cost).
  • Type Safety: The functions assume the input text is a string. Passing None or other types will result in a standard Python TypeError.

6. Remarks

  • Accuracy Warning: estimate_tokens is a heuristic, not an exact count. Every model (Llama, GPT, Claude) uses a different tokenizer. For precise billing or strict context-limit enforcement, always rely on the provider_usage data if available.
  • BPE Approximation: The estimation logic is specifically tuned to approximate GPT-style tokenization, which is the most common standard for the providers supported by jazzmine.