1. Behavior and Context
Google's API has a distinct architecture that GeminiLLM abstracts for the framework:
- Authentication via URL: Unlike other providers that use Bearer tokens in headers, Gemini requires the API key as a query parameter in the request URL (?key=...).
- Role Translation: Google uses the role "model" instead of "assistant". GeminiLLM automatically maps jazzmine roles to the correct Google equivalents.
- Native System Instructions: It supports Google's native systemInstruction field, which provides stronger adherence to core personality and safety constraints than simply placing instructions in the general message history.
- Safety Filtering: Google implements aggressive safety filters. If a prompt or a response is flagged, the API returns a "blocked" status instead of text, which this provider handles as a specific error.
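The first two behaviors above (URL-keyed authentication and role translation) can be sketched as follows. The helper names `build_url` and `map_role` are illustrative, not jazzmine's actual internals:

```python
# Sketch of Gemini's URL-keyed auth and role translation.
# build_url and map_role are illustrative helpers, not jazzmine internals.

BASE_URL = "https://generativelanguage.googleapis.com"

def build_url(model: str, api_key: str) -> str:
    # Gemini expects the API key as a query parameter (?key=...),
    # not as a Bearer token in an Authorization header.
    return f"{BASE_URL}/v1beta/models/{model}:generateContent?key={api_key}"

def map_role(role: str) -> str:
    # Google uses "model" where most providers use "assistant".
    return "model" if role == "assistant" else role
```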
2. Purpose
- Large Context Windows: Ideal for agents that need to process extremely long documents or extensive conversation histories (up to 2 million tokens).
- Cost Efficiency: Gemini 1.5 Flash offers near-instant responses at a very low cost, making it perfect for high-frequency tasks like message enhancement.
- Safety Compliance: Leverages Google's built-in safety infrastructure to ensure agent responses remain within defined ethical boundaries.
3. High-Level API Examples
Example: Basic Initialization
from jazzmine.core.llm import GeminiLLM
# Initialize for Gemini 1.5 Flash
llm = GeminiLLM(
    model="gemini-1.5-flash",
    api_key="AIza...",  # Your Google AI Studio key
    temperature=0.7,
    max_tokens=2048,
    timeout=30.0,
)
# Standard async generation
response = await llm.agenerate(messages)
print(response.text)
4. Detailed Functionality
__init__(api_key, model, base_url, ...)
Functionality: Configures the endpoint and initializes the asynchronous HTTP clients.
Parameters:
- api_key (str): Your Google AI Studio API key.
- model (str): The model ID (e.g., "gemini-1.5-pro").
- base_url (str): Defaults to https://generativelanguage.googleapis.com.
_prepare_payload(messages) [Internal]
Functionality: Translates a list of MessagePart objects into the Google contents and systemInstruction format.
How it works:
- System Extraction: Filters all system role messages and packages them into the systemInstruction block.
- Assistant Mapping: Converts the role "assistant" to "model".
- Content Nesting: Wraps text in Google's required {"parts": [{"text": "..."}]} array structure.
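The three steps above can be sketched as a single translation function. `prepare_payload` is an illustrative stand-in for the provider's internal method, not its actual implementation:

```python
# Sketch of translating framework messages into Google's contents /
# systemInstruction format. prepare_payload is illustrative only.

def prepare_payload(messages: list) -> dict:
    system_parts = []
    contents = []
    for msg in messages:
        if msg["role"] == "system":
            # System messages are extracted into the systemInstruction block.
            system_parts.append({"text": msg["content"]})
        else:
            # "assistant" becomes Google's "model" role.
            role = "model" if msg["role"] == "assistant" else msg["role"]
            # Text is nested inside Google's required parts array.
            contents.append({"role": role, "parts": [{"text": msg["content"]}]})
    payload = {"contents": contents}
    if system_parts:
        payload["systemInstruction"] = {"parts": system_parts}
    return payload
```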
_parse_response(data, start_time) [Internal]
Functionality: Processes the model's response candidates and handles safety rejections.
How it works:
- Candidate Validation: Retrieves text from the first generated candidate.
- Safety Logic: If no candidates are returned or if the finishReason is "SAFETY", it raises an LLMInvalidRequestError containing the feedback from Google's filters.
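The validation logic above can be sketched as follows. The exception class and `parse_response` function are illustrative stand-ins for the provider's internals:

```python
# Sketch of candidate validation and safety rejection handling.
# Both names below are illustrative, not jazzmine's real definitions.

class LLMInvalidRequestError(Exception):
    def __init__(self, message, feedback=None):
        super().__init__(message)
        self.feedback = feedback  # carries Google's promptFeedback

def parse_response(data: dict) -> str:
    candidates = data.get("candidates") or []
    if not candidates or candidates[0].get("finishReason") == "SAFETY":
        # No candidates, or the first candidate was stopped by a safety
        # filter: surface Google's feedback instead of returning text.
        raise LLMInvalidRequestError(
            "Response blocked by safety filters",
            feedback=data.get("promptFeedback"),
        )
    # Text lives at candidates[0].content.parts[0].text.
    return candidates[0]["content"]["parts"][0]["text"]
```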
stream / astream
Functionality: Implements Google's Server-Sent Events (SSE) streaming using the :streamGenerateContent endpoint.
How it works: It parses incoming JSON chunks from the stream and yields the incremental text found within the deep path candidates[0].content.parts[0].text.
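The chunk-parsing step can be sketched as a small helper that pulls the incremental text out of one SSE data line. `extract_delta` is illustrative, not the provider's actual method:

```python
import json

# Sketch of extracting incremental text from a streamGenerateContent
# SSE line. extract_delta is an illustrative helper.

def extract_delta(sse_line: str):
    # SSE data lines carry a JSON chunk prefixed with "data: ".
    if not sse_line.startswith("data: "):
        return None
    chunk = json.loads(sse_line[len("data: "):])
    try:
        # Incremental text sits at candidates[0].content.parts[0].text.
        return chunk["candidates"][0]["content"]["parts"][0]["text"]
    except (KeyError, IndexError):
        # Keep-alive or metadata chunks yield no text.
        return None
```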
5. Error Handling
- LLMInvalidRequestError: Raised when Google's safety filters block a response or if the prompt is deemed inappropriate.
- LLMInternalError: Raised if Google returns a 500 or 503 status code, indicating service interruptions.
- Blocked Feedback: When a request is blocked, the exception includes the promptFeedback dictionary, which helps developers identify which safety category (e.g., harassment, hate speech) triggered the block.
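Inspecting the attached promptFeedback can be sketched as below. The feedback shape follows Google's documented `promptFeedback` / `safetyRatings` structure, and `triggered_categories` is an illustrative helper:

```python
# Sketch of finding which safety category triggered a block.
# triggered_categories is illustrative; the feedback dict mirrors the
# promptFeedback structure described above.

def triggered_categories(prompt_feedback: dict) -> list:
    # Each safety rating names a category; "blocked": True marks the
    # filter that rejected the prompt.
    return [
        r["category"]
        for r in prompt_feedback.get("safetyRatings", [])
        if r.get("blocked")
    ]
```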
6. Remarks
- API Version: This provider uses the v1beta API endpoint to ensure access to the latest features like System Instructions.
- Google AI vs. Vertex AI: This class is designed specifically for Google AI Studio keys. Google Cloud Vertex AI uses a different authentication method (IAM) and is not compatible with this specific provider.
- Token Usage: Since Gemini models often have unique tokenizers, if the API does not return explicit usage data, the provider falls back to the framework's character-based estimator.
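A character-based fallback estimator can be sketched as below. The 4-characters-per-token ratio is a common rough heuristic, not jazzmine's exact estimator:

```python
# Sketch of a character-based token estimate used when the API omits
# usage data. The ratio is an assumed heuristic, not the framework's.

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    # Round to the nearest token and never report zero for non-usage.
    return max(1, round(len(text) / chars_per_token))
```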