LLM Providers
Core reference

LLM Providers: CohereLLM

The CohereLLM provider integrates Cohere’s powerful family of models (such as Command R and Command R+) into the jazzmine framework. These models are specifically optimized for long-context understanding, Retrieval-Augmented Generation (RAG), and complex tool-use scenarios. This provider utilizes the Cohere V2 Chat API, ensuring compatibility with the latest features and improved message structuring.

1. Behavior and Context

In the jazzmine architecture, CohereLLM functions as a specialized high-reasoning backend.

  • V2 Protocol: It communicates with the https://api.cohere.com/v2 endpoint, moving away from older legacy structures to a more standardized chat completion format.
  • Dual-Client Management: Like other providers, it maintains both an httpx.Client and httpx.AsyncClient to handle synchronous background tasks and asynchronous user-facing chat loops without blocking.
  • Exact Token Mapping: Cohere provides high-fidelity token usage metadata in its response. This provider extracts input_tokens and output_tokens directly, allowing for precise cost tracking and context management.

2. Purpose

  • Enterprise Reasoning: Leveraging models optimized for business logic and structured data extraction.
  • RAG Optimization: Ideal for agents that rely heavily on EpisodicMemory and large context windows, as Command R is built to cite and reason over long documents.
  • Cost Efficiency: Providing a balance of high performance and competitive pricing for the "Agent Reasoning" loop.

3. High-Level API Examples

Example: Initializing the Cohere Provider

```python
from jazzmine.core.llm import CohereLLM

# Initialize with the Command R+ model for maximum intelligence
llm = CohereLLM(
    model="command-r-plus",
    api_key="your-cohere-api-key",
    temperature=0.3,
    max_tokens=2048,
    timeout=30.0,
)

# Standard async generation (messages is your framework's message list)
response = await llm.agenerate(messages)
print(f"Cohere says: {response.text}")
print(f"Turn cost: {response.usage.total_tokens} tokens")
```

4. Detailed Functionality

__init__(api_key, model, **kwargs)

Functionality: Sets up the API credentials and initializes both the synchronous and asynchronous connection pools.

Parameters:

  • api_key (str): Your Cohere API key.
  • model (str): The model ID. Defaults to "command-r-plus".
  • **kwargs: Inherited parameters like temperature, max_tokens, and timeout.

_prepare_payload(messages, stream) [Internal]

Functionality: Converts the framework's MessagePart list into the Cohere V2 JSON schema.

How it works: It maps the standard roles (user, assistant, system) directly to the Cohere messages array and injects the model, stream status, and sampling configuration.
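The conversion can be sketched as a plain function. The `(role, text)` tuple shape and the function signature below are assumptions for illustration; the real method operates on the framework's `MessagePart` objects:

```python
def prepare_payload(messages, model, stream, temperature=0.3, max_tokens=2048):
    """Sketch: convert (role, text) message pairs into a Cohere V2 chat payload."""
    return {
        "model": model,
        "stream": stream,
        "temperature": temperature,
        "max_tokens": max_tokens,
        # Roles map 1:1 -- user / assistant / system are accepted as-is by V2.
        "messages": [{"role": role, "content": text} for role, text in messages],
    }


payload = prepare_payload(
    [("system", "You are terse."), ("user", "Hi")],
    model="command-r-plus",
    stream=False,
)
```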


_parse_response(data, start_time) [Internal]

Functionality: Navigates the Cohere V2 response tree to extract text and usage data.

How it works:

  • Text Extraction: Accesses the text content via the response path message -> content[0] -> text.
  • Usage Normalization: It maps Cohere's input_tokens and output_tokens fields to the standardized LLMUsage object.
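A minimal sketch of this extraction is shown below. The exact nesting of the usage block (here assumed under usage -> billed_units) and the returned dict shape are assumptions; the real method builds an LLMUsage object:

```python
import time


def parse_response(data: dict, start_time: float) -> dict:
    """Sketch: extract text and normalized usage from a Cohere V2 response."""
    # Text lives at message -> content[0] -> text in the V2 response tree.
    text = data["message"]["content"][0]["text"]
    # Assumed nesting of the token counts; verify against a raw response.
    tokens = data.get("usage", {}).get("billed_units", {})
    input_tokens = tokens.get("input_tokens", 0)
    output_tokens = tokens.get("output_tokens", 0)
    return {
        "text": text,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": input_tokens + output_tokens,
        "latency_s": time.monotonic() - start_time,
    }


sample = {
    "message": {"content": [{"type": "text", "text": "Hello!"}]},
    "usage": {"billed_units": {"input_tokens": 12, "output_tokens": 3}},
}
result = parse_response(sample, start_time=time.monotonic())
```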

stream / astream

Functionality: Processes real-time token events from the Cohere API.

How it works: It listens for JSON events emitted by the /chat endpoint. It specifically identifies events of type: "content-delta". It then extracts the incremental text change from the delta object and yields it to the caller.
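The per-event logic can be sketched as follows. The delta path used here (delta -> message -> content -> text) is an assumption about the V2 event schema; malformed chunks are skipped rather than raised, matching the JSON-resilience behavior described in the Error Handling section:

```python
import json


def extract_stream_text(raw_line: str):
    """Sketch: pull incremental text from one streamed Cohere V2 event line."""
    try:
        event = json.loads(raw_line)
    except json.JSONDecodeError:
        return None  # partial/malformed chunk: skip, don't crash the turn
    if event.get("type") != "content-delta":
        return None  # ignore stream-start, tool events, etc.
    # Assumed delta path; verify against a live stream from your account.
    return event.get("delta", {}).get("message", {}).get("content", {}).get("text")


chunks = [
    '{"type": "message-start"}',
    '{"type": "content-delta", "delta": {"message": {"content": {"text": "Hel"}}}}',
    'not-json',
    '{"type": "content-delta", "delta": {"message": {"content": {"text": "lo"}}}}',
]
text = "".join(t for line in chunks if (t := extract_stream_text(line)))
```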


5. Error Handling

  • LLMRateLimitError: Explicitly caught when the Cohere API returns a 429 status code, indicating that the request rate or plan quota has been exceeded.
  • LLMTimeoutError: Raised if the request exceeds the timeout (in seconds) specified in the constructor, or the httpx default when none is given.
  • LLMInternalError: Raised for general server-side issues or non-200 status codes not covered by specific error types.
  • JSON Resilience: Both the standard and streaming parsers include try...except blocks for JSONDecodeError, ensuring that malformed partial chunks do not crash the entire agent turn.
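The status-code mapping described above can be sketched as follows; the exception classes here are stand-in stubs mirroring the names listed, not imports from the framework:

```python
# Stand-in stubs mirroring the framework's error hierarchy.
class LLMRateLimitError(Exception):
    pass


class LLMInternalError(Exception):
    pass


def raise_for_status(status_code: int, body: str) -> None:
    """Sketch: map HTTP status codes to the framework's error types."""
    if status_code == 429:
        # Specific error first, so callers can back off and retry.
        raise LLMRateLimitError(f"Cohere rate limit/quota exceeded: {body}")
    if status_code != 200:
        # Catch-all for server-side issues not covered above.
        raise LLMInternalError(f"Cohere API error {status_code}: {body}")
```

Checking for the specific 429 case before the generic non-200 case lets callers implement targeted backoff for rate limits while still failing loudly on other errors.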

6. Remarks

  • V2 Endpoint: This provider targets https://api.cohere.com/v2. If you are using an older version of the Cohere API, ensure you check for endpoint compatibility.
  • Command R optimization: When using this provider for the main agent loop, it is recommended to set a lower temperature (e.g., 0.1 to 0.3) to maximize the consistency of the tool-calling logic.
  • Context Management: Always call await llm.aclose() at the end of your session to ensure the httpx clients are closed properly.
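One way to guarantee cleanup is a try...finally around the session. The `_FakeLLM` class below stands in for a real CohereLLM instance so the pattern is self-contained; in real code you would import CohereLLM from jazzmine.core.llm instead:

```python
import asyncio


class _FakeLLM:
    """Stand-in for CohereLLM, used only to demonstrate the shutdown pattern."""

    def __init__(self):
        self.closed = False

    async def aclose(self):
        self.closed = True


async def run_session(llm):
    try:
        pass  # ... agenerate / astream calls go here ...
    finally:
        await llm.aclose()  # always runs, even if the turn raised


llm = _FakeLLM()
asyncio.run(run_session(llm))
```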