1. Behavior and Context
Google's API has a distinct architecture that GeminiLLM abstracts for the framework:
- Authentication via URL: Unlike other providers that use Bearer tokens in headers, Gemini requires the API key as a query parameter in the request URL (?key=...).
- Role Translation: Google uses the role "model" instead of "assistant". GeminiLLM automatically maps jazzmine roles to the correct Google equivalents.
- Native System Instructions: It supports Google's native systemInstruction field, which provides stronger adherence to core personality and safety constraints than simply placing instructions in the general message history.
- Safety Filtering: Google implements aggressive safety filters. If a prompt or a response is flagged, the API returns a "blocked" status instead of text, which this provider handles as a specific error.
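The first two behaviors above (URL-keyed authentication and role translation) can be sketched as follows. The helper names `build_url` and `map_role` are illustrative, not jazzmine's actual internals:

```python
# Sketch of Gemini's URL-keyed auth and role translation.
# build_url and map_role are illustrative helpers, not jazzmine internals.

BASE_URL = "https://generativelanguage.googleapis.com"

def build_url(model: str, api_key: str) -> str:
    # Gemini expects the API key as a query parameter (?key=...),
    # not as a Bearer token in an Authorization header.
    return f"{BASE_URL}/v1beta/models/{model}:generateContent?key={api_key}"

def map_role(role: str) -> str:
    # Google uses "model" where most providers use "assistant".
    return "model" if role == "assistant" else role
```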
2. Purpose
- Large Context Windows: Ideal for agents that need to process extremely long documents or extensive conversation histories (up to 2 million tokens).
- Cost Efficiency: Gemini 1.5 Flash offers near-instant responses at a very low cost, making it perfect for high-frequency tasks like message enhancement.
- Safety Compliance: Leverages Google's built-in safety infrastructure to ensure agent responses remain within defined ethical boundaries.
3. High-Level API Examples
Example: Basic Initialization
from jazzmine.core.llm import GeminiLLM
# Initialize for Gemini 1.5 Flash
llm = GeminiLLM(
    model="gemini-1.5-flash",
    api_key="AIza...",  # Your Google AI Studio key
    temperature=0.7,
    max_tokens=2048,
    timeout=30.0,
)
# Standard async generation
response = await llm.agenerate(messages)
print(response.text)
4. Detailed Functionality
__init__(api_key, model, base_url, ...)
Functionality: Configures the endpoint and initializes the asynchronous HTTP clients.
Parameters:
- api_key (str): Your Google AI Studio API key.
- model (str): The model ID (e.g., "gemini-1.5-pro").
- base_url (str): Defaults to https://generativelanguage.googleapis.com.
_prepare_payload(messages) [Internal]
Functionality: Translates a list of MessagePart objects into the Google contents and systemInstruction format.
How it works:
- System Extraction: Filters all system role messages and packages them into the systemInstruction block.
- Assistant Mapping: Converts the role "assistant" to "model".
- Content Nesting: Wraps text in Google's required {"parts": [{"text": "..."}]} array structure.
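The three steps above can be sketched as a single translation function. `prepare_payload` is an illustrative stand-in for the provider's internal method, not its actual implementation:

```python
# Sketch of translating framework messages into Google's contents /
# systemInstruction format. prepare_payload is illustrative only.

def prepare_payload(messages: list) -> dict:
    system_parts = []
    contents = []
    for msg in messages:
        if msg["role"] == "system":
            # System messages are extracted into the systemInstruction block.
            system_parts.append({"text": msg["content"]})
        else:
            # "assistant" becomes Google's "model" role.
            role = "model" if msg["role"] == "assistant" else msg["role"]
            # Text is nested inside Google's required parts array.
            contents.append({"role": role, "parts": [{"text": msg["content"]}]})
    payload = {"contents": contents}
    if system_parts:
        payload["systemInstruction"] = {"parts": system_parts}
    return payload
```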
_parse_response(data, start_time) [Internal]
Functionality: Processes the model's response candidates and handles safety rejections.
How it works:
- Candidate Validation: Retrieves text from the first generated candidate.
- Safety Logic: If no candidates are returned or if the finishReason is "SAFETY", it raises an LLMInvalidRequestError containing the feedback from Google's filters.
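The validation logic above can be sketched as follows. The exception class and `parse_response` function are illustrative stand-ins for the provider's internals:

```python
# Sketch of candidate validation and safety rejection handling.
# Both names below are illustrative, not jazzmine's real definitions.

class LLMInvalidRequestError(Exception):
    def __init__(self, message, feedback=None):
        super().__init__(message)
        self.feedback = feedback  # carries Google's promptFeedback

def parse_response(data: dict) -> str:
    candidates = data.get("candidates") or []
    if not candidates or candidates[0].get("finishReason") == "SAFETY":
        # No candidates, or the first candidate was stopped by a safety
        # filter: surface Google's feedback instead of returning text.
        raise LLMInvalidRequestError(
            "Response blocked by safety filters",
            feedback=data.get("promptFeedback"),
        )
    # Text lives at candidates[0].content.parts[0].text.
    return candidates[0]["content"]["parts"][0]["text"]
```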
stream / astream
Functionality: Implements Google's Server-Sent Events (SSE) streaming using the :streamGenerateContent endpoint.
How it works: It parses incoming JSON chunks from the stream and yields the incremental text found within the deep path candidates[0].content.parts[0].text.
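The chunk-parsing step can be sketched as a small helper that pulls the incremental text out of one SSE data line. `extract_delta` is illustrative, not the provider's actual method:

```python
import json

# Sketch of extracting incremental text from a streamGenerateContent
# SSE line. extract_delta is an illustrative helper.

def extract_delta(sse_line: str):
    # SSE data lines carry a JSON chunk prefixed with "data: ".
    if not sse_line.startswith("data: "):
        return None
    chunk = json.loads(sse_line[len("data: "):])
    try:
        # Incremental text sits at candidates[0].content.parts[0].text.
        return chunk["candidates"][0]["content"]["parts"][0]["text"]
    except (KeyError, IndexError):
        # Keep-alive or metadata chunks yield no text.
        return None
```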
5. Error Handling
- LLMInvalidRequestError: Raised when Google's safety filters block a response or if the prompt is deemed inappropriate.
- LLMInternalError: Raised if Google returns a 500 or 503 status code, indicating service interruptions.
- Blocked Feedback: When a request is blocked, the exception includes the promptFeedback dictionary, which helps developers identify which safety category (e.g., harassment, hate speech) triggered the block.
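Inspecting the attached promptFeedback can be sketched as below. The feedback shape follows Google's documented `promptFeedback` / `safetyRatings` structure, and `triggered_categories` is an illustrative helper:

```python
# Sketch of finding which safety category triggered a block.
# triggered_categories is illustrative; the feedback dict mirrors the
# promptFeedback structure described above.

def triggered_categories(prompt_feedback: dict) -> list:
    # Each safety rating names a category; "blocked": True marks the
    # filter that rejected the prompt.
    return [
        r["category"]
        for r in prompt_feedback.get("safetyRatings", [])
        if r.get("blocked")
    ]
```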
6. Remarks
- API Version: This provider uses the v1beta API endpoint to ensure access to the latest features like System Instructions.
- Google AI vs. Vertex AI: This class is designed specifically for Google AI Studio keys. Google Cloud Vertex AI uses a different authentication method (IAM) and is not compatible with this specific provider.
- Token Usage: Since Gemini models often have unique tokenizers, if the API does not return explicit usage data, the provider falls back to the framework's character-based estimator.
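A character-based fallback estimator can be sketched as below. The 4-characters-per-token ratio is a common rough heuristic, not jazzmine's exact estimator:

```python
# Sketch of a character-based token estimate used when the API omits
# usage data. The ratio is an assumed heuristic, not the framework's.

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    # Round to the nearest token and never report zero for non-usage.
    return max(1, round(len(text) / chars_per_token))
```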