LLM Providers: GeminiLLM

The GeminiLLM provider connects the jazzmine framework to Google's Gemini family of models (e.g., Gemini 1.5 Pro, Gemini 1.5 Flash) via the Google AI (Generative Language) API. Gemini models are notable for their very large context windows, low latency and cost, and strong performance on multimodal and reasoning tasks.

1. Behavior and Context

Google's API has a distinct architecture that GeminiLLM abstracts for the framework:

  • Authentication via URL: Unlike other providers that use Bearer tokens in headers, Gemini requires the API key as a query parameter in the request URL (?key=...).
  • Role Translation: Google uses the role "model" instead of "assistant". GeminiLLM automatically maps jazzmine roles to the correct Google equivalents.
  • Native System Instructions: It supports Google's native systemInstruction field, which provides stronger adherence to core personality and safety constraints than simply placing instructions in the general message history.
  • Safety Filtering: Google implements aggressive safety filters. If a prompt or a response is flagged, the API returns a "blocked" status instead of text, which this provider handles as a specific error.
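To make the first two points concrete, here is an illustrative sketch (not the provider's internal code; the model name and key are placeholders) of how a Gemini request URL differs from a Bearer-token provider:

```python
model = "gemini-1.5-flash"
api_key = "AIza-example-key"  # placeholder, not a real key

# Most providers authenticate via a request header:
bearer_headers = {"Authorization": f"Bearer {api_key}"}

# Gemini instead expects the key as a URL query parameter:
gemini_url = (
    "https://generativelanguage.googleapis.com/v1beta/"
    f"models/{model}:generateContent?key={api_key}"
)
```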

2. Purpose

  • Large Context Windows: Ideal for agents that need to process extremely long documents or extensive conversation histories (up to 2 million tokens).
  • Cost Efficiency: Gemini 1.5 Flash offers near-instant responses at a very low cost, making it perfect for high-frequency tasks like message enhancement.
  • Safety Compliance: Leverages Google's built-in safety infrastructure to ensure agent responses remain within defined ethical boundaries.

3. High-Level API Examples

Example: Basic Initialization

```python
from jazzmine.core.llm import GeminiLLM

# Initialize for Gemini 1.5 Flash
llm = GeminiLLM(
    model="gemini-1.5-flash",
    api_key="AIza...",  # Your Google AI Studio key
    temperature=0.7,
    max_tokens=2048,
    timeout=30.0,
)

# Standard async generation (inside an async context)
response = await llm.agenerate(messages)
print(response.text)
```

4. Detailed Functionality

__init__(api_key, model, base_url, ...)

Functionality: Configures the endpoint and initializes the asynchronous HTTP clients.

Parameters:

  • api_key (str): Your Google AI Studio API key.
  • model (str): The model ID (e.g., "gemini-1.5-pro").
  • base_url (str): Defaults to https://generativelanguage.googleapis.com.

_prepare_payload(messages) [Internal]

Functionality: Translates a list of MessagePart objects into the Google contents and systemInstruction format.

How it works:

  • System Extraction: Filters all system role messages and packages them into the systemInstruction block.
  • Assistant Mapping: Converts the role "assistant" to "model".
  • Content Nesting: Wraps text in Google's required {"parts": [{"text": "..."}]} array structure.
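The translation can be sketched as follows (a simplified stand-in for the internal implementation; the `(role, text)` tuples are a hypothetical input shape, not the framework's MessagePart type):

```python
def prepare_payload(messages):
    """Translate (role, text) pairs into Gemini's request body shape."""
    role_map = {"user": "user", "assistant": "model"}  # Google uses "model"
    system_parts, contents = [], []
    for role, text in messages:
        if role == "system":
            # System messages are extracted into systemInstruction
            system_parts.append({"text": text})
        else:
            # Everything else is nested in the required parts array
            contents.append({"role": role_map[role], "parts": [{"text": text}]})
    payload = {"contents": contents}
    if system_parts:
        payload["systemInstruction"] = {"parts": system_parts}
    return payload
```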

_parse_response(data, start_time) [Internal]

Functionality: Processes the model's response candidates and handles safety rejections.

How it works:

  • Candidate Validation: Retrieves text from the first generated candidate.
  • Safety Logic: If no candidates are returned or if the finishReason is "SAFETY", it raises an LLMInvalidRequestError containing the feedback from Google's filters.
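In outline, the safety handling looks roughly like this (a simplified sketch; the local exception class below stands in for the framework's LLMInvalidRequestError):

```python
class LLMInvalidRequestError(Exception):
    """Stand-in for the framework's exception class."""
    def __init__(self, message, feedback=None):
        super().__init__(message)
        self.feedback = feedback

def parse_response(data):
    """Extract text from the first candidate, surfacing safety blocks."""
    candidates = data.get("candidates") or []
    if not candidates or candidates[0].get("finishReason") == "SAFETY":
        # Attach Google's promptFeedback so callers can see which
        # safety category triggered the block
        raise LLMInvalidRequestError(
            "Response blocked by safety filters",
            feedback=data.get("promptFeedback"),
        )
    return candidates[0]["content"]["parts"][0]["text"]
```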

stream / astream

Functionality: Implements Google's Server-Sent Events (SSE) streaming using the :streamGenerateContent endpoint.

How it works: It parses incoming JSON chunks from the stream and yields the incremental text found within the deep path candidates[0].content.parts[0].text.
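Extracting the incremental text from a single streamed JSON chunk can be sketched as follows (simplified; real chunks may omit some keys, e.g. trailing metadata-only chunks):

```python
import json

def chunk_text(raw_chunk: str) -> str:
    """Pull the incremental text out of one streamed JSON chunk."""
    data = json.loads(raw_chunk)
    try:
        # The text sits at candidates[0].content.parts[0].text
        return data["candidates"][0]["content"]["parts"][0]["text"]
    except (KeyError, IndexError):
        # Chunks without text yield nothing rather than erroring
        return ""
```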


5. Error Handling

  • LLMInvalidRequestError: Raised when Google's safety filters block a response or if the prompt is deemed inappropriate.
  • LLMInternalError: Raised if Google returns a 500 or 503 status code, indicating service interruptions.
  • Blocked Feedback: When a request is blocked, the exception includes the promptFeedback dictionary, which helps developers identify which safety category (e.g., harassment, hate speech) triggered the block.

6. Remarks

  • API Version: This provider uses the v1beta API endpoint to ensure access to the latest features like System Instructions.
  • Google AI vs. Vertex AI: This class is designed specifically for Google AI Studio keys. Google Cloud Vertex AI uses a different authentication method (IAM) and is not compatible with this specific provider.
  • Token Usage: Since Gemini models often have unique tokenizers, if the API does not return explicit usage data, the provider falls back to the framework's character-based estimator.
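A character-based fallback of the kind described might look like this (the four-characters-per-token ratio is a common rule of thumb, not the framework's documented constant):

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token count when the API returns no usage metadata."""
    return max(1, round(len(text) / chars_per_token))
```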