1. Behavior and Context
In the jazzmine architecture, CohereLLM functions as a specialized high-reasoning backend.
- V2 Protocol: It communicates with the https://api.cohere.com/v2 endpoint, moving away from legacy request structures to a standardized chat-completion format.
- Dual-Client Management: Like other providers, it maintains both an httpx.Client and httpx.AsyncClient to handle synchronous background tasks and asynchronous user-facing chat loops without blocking.
- Exact Token Mapping: Cohere provides high-fidelity token usage metadata in its response. This provider extracts input_tokens and output_tokens directly, allowing for precise cost tracking and context management.
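The exact token mapping can be sketched as follows. `LLMUsage` here is a simplified stand-in for the framework's usage type, and the `usage -> tokens` path is an assumption about the V2 payload shape:

```python
from dataclasses import dataclass

@dataclass
class LLMUsage:
    # Simplified stand-in for the framework's usage object.
    input_tokens: int
    output_tokens: int

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens

def usage_from_cohere(data: dict) -> LLMUsage:
    # Pull the counts straight from the response; the "tokens" block
    # under "usage" is an assumption about the V2 payload shape.
    tokens = data["usage"]["tokens"]
    return LLMUsage(input_tokens=tokens["input_tokens"],
                    output_tokens=tokens["output_tokens"])
```

Because the counts come from the API rather than a local tokenizer estimate, cost tracking stays accurate across model versions.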
2. Purpose
- Enterprise Reasoning: Leveraging models optimized for business logic and structured data extraction.
- RAG Optimization: Ideal for agents that rely heavily on EpisodicMemory and large context windows, as Command R is built to cite and reason over long documents.
- Cost Efficiency: Providing a balance of high performance and competitive pricing for the "Agent Reasoning" loop.
3. High-Level API Examples
Example: Initializing the Cohere Provider
from jazzmine.core.llm import CohereLLM
# Initialize with the Command R+ model for maximum intelligence
llm = CohereLLM(
    model="command-r-plus",
    api_key="your-cohere-api-key",
    temperature=0.3,
    max_tokens=2048,
    timeout=30.0
)
# Standard async generation
response = await llm.agenerate(messages)
print(f"Cohere says: {response.text}")
print(f"Turn cost: {response.usage.total_tokens} tokens")
4. Detailed Functionality
__init__(api_key, model, **kwargs)
Functionality: Sets up the API credentials and initializes the synchronous and asynchronous HTTP clients.
Parameters:
- api_key (str): Your Cohere API key.
- model (str): The model ID. Defaults to "command-r-plus".
- **kwargs: Inherited parameters like temperature, max_tokens, and timeout.
_prepare_payload(messages, stream) [Internal]
Functionality: Converts the framework's MessagePart list into the Cohere V2 JSON schema.
How it works: It maps the standard roles (user, assistant, system) directly to the Cohere messages array and injects the model, stream status, and sampling configuration.
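A standalone sketch of this conversion, assuming plain role/content dictionaries as input (the real method operates on the framework's `MessagePart` objects):

```python
def prepare_payload(messages, model="command-r-plus", stream=False,
                    temperature=0.3, max_tokens=2048):
    # Roles (user / assistant / system) pass through to Cohere V2 unchanged;
    # sampling configuration rides alongside the messages array.
    return {
        "model": model,
        "stream": stream,
        "temperature": temperature,
        "max_tokens": max_tokens,
        "messages": [{"role": m["role"], "content": m["content"]}
                     for m in messages],
    }
```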
_parse_response(data, start_time) [Internal]
Functionality: Navigates the Cohere V2 response tree to extract text and usage data.
How it works:
- Text Extraction: Accesses the text content via the response path message -> content[0] -> text.
- Usage Normalization: It maps Cohere's input_tokens and output_tokens fields to the standardized LLMUsage object.
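The parsing steps above can be sketched as a standalone function; the `usage -> tokens` block is an assumption about the V2 payload shape, and a plain dict stands in for the framework's response object:

```python
import time

def parse_response(data: dict, start_time: float) -> dict:
    # Text lives at message -> content[0] -> text in the V2 response tree.
    text = data["message"]["content"][0]["text"]
    # Assumption: token counts sit under usage -> tokens.
    tokens = data["usage"]["tokens"]
    return {
        "text": text,
        "input_tokens": tokens["input_tokens"],
        "output_tokens": tokens["output_tokens"],
        "latency_s": time.monotonic() - start_time,
    }
```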
stream / astream
Functionality: Processes real-time token events from the Cohere API.
How it works: It listens for JSON events emitted by the /chat endpoint. It specifically identifies events of type: "content-delta". It then extracts the incremental text change from the delta object and yields it to the caller.
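The event-filtering logic can be sketched as a generator over raw chunk strings; the `delta -> message -> content -> text` path is an assumption about the streamed event shape:

```python
import json

def iter_content_deltas(raw_lines):
    # Yield incremental text from "content-delta" events; malformed
    # partial chunks are skipped rather than crashing the turn.
    for raw in raw_lines:
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue
        if event.get("type") == "content-delta":
            yield event["delta"]["message"]["content"]["text"]
```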
5. Error Handling
- LLMRateLimitError: Explicitly caught when the Cohere API returns a 429 status code, indicating that the plan's quota has been exceeded.
- LLMTimeoutError: Raised if the request exceeds the timeout (in seconds) specified in the constructor, falling back to httpx's default timeout when none is set.
- LLMInternalError: Raised for general server-side issues or non-200 status codes not covered by specific error types.
- JSON Resilience: Both the standard and streaming parsers include try...except blocks for JSONDecodeError, ensuring that malformed partial chunks do not crash the entire agent turn.
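The status-code mapping above can be sketched as follows. The exception classes are stand-ins for the framework's hierarchy; LLMTimeoutError is raised from httpx's timeout exception rather than a status code, so it is omitted here:

```python
class LLMRateLimitError(Exception):
    """Plan quota exceeded (HTTP 429)."""

class LLMInternalError(Exception):
    """Server-side issue or unexpected status code."""

def raise_for_status(status_code: int) -> None:
    # 429 gets its own exception type; any other non-200 status
    # falls through to the generic internal error.
    if status_code == 429:
        raise LLMRateLimitError("Cohere API rate limit exceeded (HTTP 429)")
    if status_code != 200:
        raise LLMInternalError(f"Cohere API returned HTTP {status_code}")
```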
6. Remarks
- V2 Endpoint: This provider targets https://api.cohere.com/v2. If you are using an older version of the Cohere API, ensure you check for endpoint compatibility.
- Command R optimization: When using this provider for the main agent loop, it is recommended to set a lower temperature (e.g., 0.1 to 0.3) to maximize the consistency of the tool-calling logic.
- Context Management: Always call await llm.aclose() at the end of your session to ensure the httpx clients are closed properly.
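One way to guarantee that cleanup is to wrap the session in try/finally; `DummyLLM` below is a hypothetical stand-in to keep the sketch self-contained:

```python
import asyncio

async def run_session(llm, messages):
    # Ensure both httpx clients are released even if generation raises.
    try:
        return await llm.agenerate(messages)
    finally:
        await llm.aclose()

class DummyLLM:
    # Minimal stand-in exposing the two methods the helper relies on.
    def __init__(self):
        self.closed = False

    async def agenerate(self, messages):
        return f"echo: {messages[-1]['content']}"

    async def aclose(self):
        self.closed = True
```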