1. Behavior and Context
In the jazzmine architecture, the Summarizer acts as a "Post-Processor."
- Trigger Logic: It is invoked after each message is ingested. It counts "unsummarized" messages (those with episode_id = 0) and triggers a run only when that count crosses a configured threshold.
- Non-Blocking Execution: To maintain low latency for the user, the summarization process runs as a "fire-and-forget" background task.
- Episode Segmentation: It uses the is_continuation flag (generated by the MessageEnhancer) to find natural breaks in conversation. It also enforces a "hard cap" on episode size to prevent context window overflow.
- Contextual Overlap: It implements a "sliding window" overlap, where the tail of one episode is included in the prompt for the next to provide the LLM with the necessary transition context.
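The trigger check and the sliding-window overlap described above can be sketched as follows. This is a minimal illustration, not the jazzmine API: the `should_summarize` and `with_overlap` helper names, and the default values, are assumptions.

```python
# Minimal sketch of the trigger and overlap behavior described above.
# Helper names and defaults are illustrative assumptions.

def should_summarize(unsummarized_count: int, trigger: int = 10) -> bool:
    """Return True once the number of messages with episode_id == 0
    crosses the configured trigger threshold."""
    return unsummarized_count >= trigger

def with_overlap(previous_episode: list, next_messages: list, overlap: int = 2) -> list:
    """Prepend the last `overlap` messages of the prior episode so the
    LLM sees the transition context for the next one."""
    tail = previous_episode[-overlap:] if overlap else []
    return tail + next_messages
```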
2. Purpose
- Context Compression: Transforming dozens of chat turns into 1-2 sentence "short summaries" for fast vector recall and 3-8 sentence "long summaries" for deep context.
- Memory Hygiene: Organizing raw database records into logical chapters (Episodes).
- Behavioral Synthesis: Aggregating episode-level metrics, such as average user sentiment, tools invoked, and flows activated, to help the agent understand high-level interaction patterns.
- Cost Efficiency: Reducing the number of tokens the agent must "read" from history by providing distilled summaries instead of full transcripts.
3. High-Level API (Usage)
The ConversationSummarizer requires an LLM, an EpisodicMemory instance (from the Rust core), and a MessageStore.
Example: Initializing and Triggering
from jazzmine.core.conversation_summarizer import ConversationSummarizer
# 1. Setup the summarizer
summarizer = ConversationSummarizer(
    llm=my_llm,
    episodic_memory=my_episodic_rust_obj,
    message_store=my_store,
    trigger=10,           # Start summarizing after 10 new messages
    max_episode_size=20,  # Max number of messages in one episode
    overlap=2,            # Carry over 2 messages of context
)
# 2. Trigger the check (usually called in the Agent's chat loop)
# This returns immediately and runs the work in the background.
await summarizer.maybe_summarize(
    conversation_id="conv_123",
    user_id="user_888",
    agent_id="support_bot_01",
)

4. Detailed Functionality
Episode [Internal Dataclass]
A structured container used to calculate metrics for a message segment before it is summarized.
- core_messages: The subset of messages that belong strictly to this episode (excluding overlap).
- flows_activated / tools_invoked: Unique lists of skills used during the episode segment.
- average_sentiment / sentiment_variance: Mathematical aggregates of the user's emotional state throughout the episode.
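A hypothetical sketch of the Episode container follows. The field names mirror the description above, but the message shape (dicts with a "sentiment" score) and the statistics helpers are assumptions, not the real implementation.

```python
# Hypothetical sketch of the internal Episode dataclass; the message
# shape and metric helpers are illustrative assumptions.
from dataclasses import dataclass, field
from statistics import mean, pvariance

@dataclass
class Episode:
    core_messages: list = field(default_factory=list)    # excludes overlap
    flows_activated: list = field(default_factory=list)  # unique flow names
    tools_invoked: list = field(default_factory=list)    # unique tool names

    @property
    def average_sentiment(self) -> float:
        scores = [m["sentiment"] for m in self.core_messages if "sentiment" in m]
        return mean(scores) if scores else 0.0

    @property
    def sentiment_variance(self) -> float:
        scores = [m["sentiment"] for m in self.core_messages if "sentiment" in m]
        return pvariance(scores) if len(scores) > 1 else 0.0
```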
maybe_summarize(conversation_id, user_id, agent_id)
Functionality: Determines if a summarization run is needed and initiates it under a lock.
Process:
- Queries the store for messages where episode_id == 0.
- If the count is less than the trigger threshold, it exits.
- Acquires an asyncio.Lock specific to that conversation_id to prevent concurrent summarization of the same turns.
- Executes _safe_run to perform the LLM work.
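The steps above can be sketched with the per-conversation lock pattern. This is a simplified stand-in: `count_unsummarized` and `_safe_run` are placeholders for the real store query and LLM work, not the actual jazzmine methods.

```python
# Sketch of maybe_summarize with per-conversation locking; the method
# bodies are placeholder assumptions, not the real implementation.
import asyncio
from collections import defaultdict

class SummarizerSketch:
    def __init__(self, trigger: int = 10):
        self.trigger = trigger
        self._locks: defaultdict[str, asyncio.Lock] = defaultdict(asyncio.Lock)
        self.runs = 0

    async def count_unsummarized(self, conversation_id: str) -> int:
        return 12  # stand-in for a store query on episode_id == 0

    async def _safe_run(self, conversation_id: str) -> None:
        self.runs += 1  # stand-in for the real summarization work

    async def _locked_run(self, conversation_id: str) -> None:
        # Only one summarization task per conversation at a time.
        async with self._locks[conversation_id]:
            await self._safe_run(conversation_id)

    async def maybe_summarize(self, conversation_id: str) -> None:
        if await self.count_unsummarized(conversation_id) < self.trigger:
            return  # below threshold: nothing to do
        # Fire-and-forget: schedule the locked work and return at once.
        # (Real code should keep a task reference so drain() can await it.)
        asyncio.create_task(self._locked_run(conversation_id))
```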
run(conversation_id, user_id, agent_id)
Functionality: The core orchestration of the summarization logic.
How it works:
- Fetches all unsummarized messages for the session.
- Passes them to _segment to split the list into Episode objects.
- Calculates the current episode counter for the session.
- Iterates through new episodes, calling _process_episode for each.
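The orchestration above reduces to a short loop. In this sketch, `segment_fn` and `process_fn` are hypothetical stand-ins for _segment and _process_episode, and the episode counter is passed in rather than computed from the store.

```python
# High-level sketch of run(); segment_fn and process_fn are hypothetical
# stand-ins for _segment and _process_episode.

def run_sketch(unsummarized_messages, last_episode_id, segment_fn, process_fn):
    """Split pending messages into episodes and process each one with an
    incrementing episode counter. Returns the new last episode id."""
    episodes = segment_fn(unsummarized_messages)
    for offset, episode in enumerate(episodes, start=1):
        process_fn(episode, episode_id=last_episode_id + offset)
    return last_episode_id + len(episodes)
```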
_segment(messages) [Private]
Functionality: Divides a flat list of messages into logical clusters.
Rules:
- Topic Shift: If is_continuation is False, a new episode starts (provided the current one has reached min_episode_size).
- Hard Cap: If the current segment reaches max_episode_size, it is closed regardless of topic.
- Overlap: When an episode is closed, the last n messages (defined by overlap) are cloned and used as the "header" for the next episode.
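The three rules combine into a single pass over the message list. This is an illustrative sketch: the message shape (dicts carrying an is_continuation flag) and the treatment of overlap messages as part of the next segment are assumptions.

```python
# Illustrative one-pass segmentation following the three rules above.
# The dict-based message shape is an assumption.

def segment(messages, min_episode_size=4, max_episode_size=20, overlap=2):
    episodes, current = [], []
    for msg in messages:
        topic_shift = not msg.get("is_continuation", True)
        if current and (
            (topic_shift and len(current) >= min_episode_size)  # topic shift
            or len(current) >= max_episode_size                 # hard cap
        ):
            episodes.append(current)
            # Clone the tail as the next episode's context header.
            current = current[-overlap:] if overlap else []
        current.append(msg)
    if current:
        episodes.append(current)
    return episodes
```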
_process_episode(...) [Private]
Functionality: Generates the summary and updates the databases.
How it works:
- Calls the LLM with the formatted transcript of the episode.
- Requests a JSON response containing short_summary and long_summary.
- Calls episodic_memory.memorize(...) to save the vectors and telemetry to Qdrant.
- Updates every core message in the MessageStore with the new episode_id so they are never summarized again.
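The summary-generation step, including the fallback described under Error Handling, can be sketched as below. Here `llm_call` is a hypothetical synchronous stand-in for the real LLM client, and the 1000-character fallback follows the behavior described in this document.

```python
# Sketch of the JSON summary request with the documented fallback;
# llm_call is a hypothetical stand-in for the real LLM client.
import json

def summarize_episode(transcript: str, llm_call) -> dict:
    """Ask the LLM for JSON summaries; fall back to a truncated
    transcript if the response is malformed or times out."""
    try:
        raw = llm_call(transcript)
        data = json.loads(raw)
        return {
            "short_summary": data["short_summary"],
            "long_summary": data["long_summary"],
        }
    except (json.JSONDecodeError, KeyError, TimeoutError):
        # Fallback keeps the episode indexed and searchable.
        fallback = transcript[:1000]
        return {"short_summary": fallback, "long_summary": fallback}
```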
5. Error Handling
- LLM Failures: If the LLM returns malformed JSON or times out, the summarizer logs a warning and uses the first 1000 characters of the raw transcript as a "fallback" summary. This ensures the episode is still indexed and searchable.
- Concurrency: Using per-conversation locks in _locks ensures that if multiple messages arrive in milliseconds, only one summarization task is active at a time for that user.
- Validation: The constructor raises ValueError if overlap is larger than the max_episode_size.
6. Remarks
- Episodic Identity: While overlap messages are included in the LLM prompt for better summary quality, their episode_id in the database remains unchanged. Only "core" messages receive the new ID.
- Resource Management: In high-concurrency environments, ensure the trigger value is high enough (e.g., 15-20) to avoid excessive LLM costs, but low enough to keep the agent's memory fresh.
- Drain Requirement: Before stopping your application, ensure you await Agent.drain(), which waits for all active ConversationSummarizer background tasks to complete.
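A common pattern for making fire-and-forget tasks drainable is shown below. The `TaskDrainer` class and its method names are illustrative assumptions, not the Agent.drain() implementation.

```python
# Illustrative drain pattern for fire-and-forget background tasks;
# the TaskDrainer class is an assumption, not Agent.drain() itself.
import asyncio

class TaskDrainer:
    def __init__(self):
        self._tasks: set[asyncio.Task] = set()

    def spawn(self, coro) -> asyncio.Task:
        # Keep a strong reference so the task is not garbage-collected,
        # and drop it automatically once the task finishes.
        task = asyncio.create_task(coro)
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)
        return task

    async def drain(self) -> None:
        """Wait for all in-flight background tasks before shutdown."""
        if self._tasks:
            await asyncio.gather(*self._tasks)
```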