1. Behavior and Context
In the jazzmine architecture, the Summarizer acts as a "Post-Processor."
- Trigger Logic: It is invoked after each message is ingested. It counts "unsummarized" messages (those with episode_id = 0) and triggers a run only when that count crosses a configured threshold.
- Non-Blocking Execution: To maintain low latency for the user, the summarization process runs as a "fire-and-forget" background task.
- Episode Segmentation: It uses the is_continuation flag (generated by the MessageEnhancer) to find natural breaks in conversation. It also enforces a "hard cap" on episode size to prevent context window overflow.
- Contextual Overlap: It implements a "sliding window" overlap, where the tail of one episode is included in the prompt for the next to provide the LLM with the necessary transition context.
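The trigger check and the sliding-window overlap described above can be sketched as follows. This is a minimal illustration, not the jazzmine API: the `should_summarize` and `with_overlap` helper names, and the default values, are assumptions.

```python
# Minimal sketch of the trigger and overlap behavior described above.
# Helper names and defaults are illustrative assumptions.

def should_summarize(unsummarized_count: int, trigger: int = 10) -> bool:
    """Return True once the number of messages with episode_id == 0
    crosses the configured trigger threshold."""
    return unsummarized_count >= trigger

def with_overlap(previous_episode: list, next_messages: list, overlap: int = 2) -> list:
    """Prepend the last `overlap` messages of the prior episode so the
    LLM sees the transition context for the next one."""
    tail = previous_episode[-overlap:] if overlap else []
    return tail + next_messages
```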
2. Purpose
- Context Compression: Transforming dozens of chat turns into 1-2 sentence "short summaries" for fast vector recall and 3-8 sentence "long summaries" for deep context.
- Memory Hygiene: Organizing raw database records into logical chapters (Episodes).
- Behavioral Synthesis: Aggregating episode-level metrics, such as average user sentiment, tools invoked, and flows activated, to help the agent understand high-level interaction patterns.
- Cost Efficiency: Reducing the number of tokens the agent must "read" from history by providing distilled summaries instead of full transcripts.
3. High-Level API (Usage)
The ConversationSummarizer requires an LLM, an EpisodicMemory instance (from the Rust core), and a MessageStore.
Example: Initializing and Triggering
from jazzmine.core.conversation_summarizer import ConversationSummarizer
# 1. Setup the summarizer
summarizer = ConversationSummarizer(
    llm=my_llm,
    episodic_memory=my_episodic_rust_obj,
    message_store=my_store,
    trigger=10,           # Start summarizing after 10 new messages
    max_episode_size=20,  # Max number of messages in one episode
    overlap=2,            # Carry over 2 messages of context
)
# 2. Trigger the check (usually called in the Agent's chat loop)
# This returns immediately and runs the work in the background.
await summarizer.maybe_summarize(
    conversation_id="conv_123",
    user_id="user_888",
    agent_id="support_bot_01",
)

4. Detailed Functionality
Episode [Internal Dataclass]
A structured container used to calculate metrics for a message segment before it is summarized.
- core_messages: The subset of messages that belong strictly to this episode (excluding overlap).
- flows_activated / tools_invoked: Unique lists of skills used during the episode segment.
- average_sentiment / sentiment_variance: Mathematical aggregates of the user's emotional state throughout the episode.
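A hypothetical sketch of the Episode container follows. The field names mirror the description above, but the message shape (dicts with a "sentiment" score) and the statistics helpers are assumptions, not the real implementation.

```python
# Hypothetical sketch of the internal Episode dataclass; the message
# shape and metric helpers are illustrative assumptions.
from dataclasses import dataclass, field
from statistics import mean, pvariance

@dataclass
class Episode:
    core_messages: list = field(default_factory=list)    # excludes overlap
    flows_activated: list = field(default_factory=list)  # unique flow names
    tools_invoked: list = field(default_factory=list)    # unique tool names

    @property
    def average_sentiment(self) -> float:
        scores = [m["sentiment"] for m in self.core_messages if "sentiment" in m]
        return mean(scores) if scores else 0.0

    @property
    def sentiment_variance(self) -> float:
        scores = [m["sentiment"] for m in self.core_messages if "sentiment" in m]
        return pvariance(scores) if len(scores) > 1 else 0.0
```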
maybe_summarize(conversation_id, user_id, agent_id)
Functionality: Determines if a summarization run is needed and initiates it under a lock.
Process:
- Queries the store for messages where episode_id == 0.
- If the count is less than the trigger threshold, it exits.
- Acquires an asyncio.Lock specific to that conversation_id to prevent concurrent summarization of the same turns.
- Executes _safe_run to perform the LLM work.
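The steps above can be sketched with the per-conversation lock pattern. This is a simplified stand-in: `count_unsummarized` and `_safe_run` are placeholders for the real store query and LLM work, not the actual jazzmine methods.

```python
# Sketch of maybe_summarize with per-conversation locking; the method
# bodies are placeholder assumptions, not the real implementation.
import asyncio
from collections import defaultdict

class SummarizerSketch:
    def __init__(self, trigger: int = 10):
        self.trigger = trigger
        self._locks: defaultdict[str, asyncio.Lock] = defaultdict(asyncio.Lock)
        self.runs = 0

    async def count_unsummarized(self, conversation_id: str) -> int:
        return 12  # stand-in for a store query on episode_id == 0

    async def _safe_run(self, conversation_id: str) -> None:
        self.runs += 1  # stand-in for the real summarization work

    async def _locked_run(self, conversation_id: str) -> None:
        # Only one summarization task per conversation at a time.
        async with self._locks[conversation_id]:
            await self._safe_run(conversation_id)

    async def maybe_summarize(self, conversation_id: str) -> None:
        if await self.count_unsummarized(conversation_id) < self.trigger:
            return  # below threshold: nothing to do
        # Fire-and-forget: schedule the locked work and return at once.
        # (Real code should keep a task reference so drain() can await it.)
        asyncio.create_task(self._locked_run(conversation_id))
```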
run(conversation_id, user_id, agent_id)
Functionality: The core orchestration of the summarization logic.
How it works:
- Fetches all unsummarized messages for the session.
- Passes them to _segment to split the list into Episode objects.
- Calculates the current episode counter for the session.
- Iterates through new episodes, calling _process_episode for each.
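The orchestration above reduces to a short loop. In this sketch, `segment_fn` and `process_fn` are hypothetical stand-ins for _segment and _process_episode, and the episode counter is passed in rather than computed from the store.

```python
# High-level sketch of run(); segment_fn and process_fn are hypothetical
# stand-ins for _segment and _process_episode.

def run_sketch(unsummarized_messages, last_episode_id, segment_fn, process_fn):
    """Split pending messages into episodes and process each one with an
    incrementing episode counter. Returns the new last episode id."""
    episodes = segment_fn(unsummarized_messages)
    for offset, episode in enumerate(episodes, start=1):
        process_fn(episode, episode_id=last_episode_id + offset)
    return last_episode_id + len(episodes)
```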
_segment(messages) [Private]
Functionality: Divides a flat list of messages into logical clusters.
Rules:
- Topic Shift: If is_continuation is False, a new episode starts (provided the current one has reached min_episode_size).
- Hard Cap: If the current segment reaches max_episode_size, it is closed regardless of topic.
- Overlap: When an episode is closed, the last n messages (defined by overlap) are cloned and used as the "header" for the next episode.
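The three rules combine into a single pass over the message list. This is an illustrative sketch: the message shape (dicts carrying an is_continuation flag) and the treatment of overlap messages as part of the next segment are assumptions.

```python
# Illustrative one-pass segmentation following the three rules above.
# The dict-based message shape is an assumption.

def segment(messages, min_episode_size=4, max_episode_size=20, overlap=2):
    episodes, current = [], []
    for msg in messages:
        topic_shift = not msg.get("is_continuation", True)
        if current and (
            (topic_shift and len(current) >= min_episode_size)  # topic shift
            or len(current) >= max_episode_size                 # hard cap
        ):
            episodes.append(current)
            # Clone the tail as the next episode's context header.
            current = current[-overlap:] if overlap else []
        current.append(msg)
    if current:
        episodes.append(current)
    return episodes
```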
_process_episode(...) [Private]
Functionality: Generates the summary and updates the databases.
How it works:
- Calls the LLM with the formatted transcript of the episode.
- Requests a JSON response containing short_summary and long_summary.
- Calls episodic_memory.memorize(...) to save the vectors and telemetry to Qdrant.
- Updates every core message in the MessageStore with the new episode_id so they are never summarized again.
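The summary-generation step, including the fallback described under Error Handling, can be sketched as below. Here `llm_call` is a hypothetical synchronous stand-in for the real LLM client, and the 1000-character fallback follows the behavior described in this document.

```python
# Sketch of the JSON summary request with the documented fallback;
# llm_call is a hypothetical stand-in for the real LLM client.
import json

def summarize_episode(transcript: str, llm_call) -> dict:
    """Ask the LLM for JSON summaries; fall back to a truncated
    transcript if the response is malformed or times out."""
    try:
        raw = llm_call(transcript)
        data = json.loads(raw)
        return {
            "short_summary": data["short_summary"],
            "long_summary": data["long_summary"],
        }
    except (json.JSONDecodeError, KeyError, TimeoutError):
        # Fallback keeps the episode indexed and searchable.
        fallback = transcript[:1000]
        return {"short_summary": fallback, "long_summary": fallback}
```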
5. Error Handling
- LLM Failures: If the LLM returns malformed JSON or times out, the summarizer logs a warning and uses the first 1000 characters of the raw transcript as a "fallback" summary. This ensures the episode is still indexed and searchable.
- Concurrency: Using per-conversation locks in _locks ensures that if multiple messages arrive in milliseconds, only one summarization task is active at a time for that user.
- Validation: The constructor raises ValueError if overlap is larger than the max_episode_size.
6. Remarks
- Episodic Identity: While overlap messages are included in the LLM prompt for better summary quality, their episode_id in the database remains unchanged. Only "core" messages receive the new ID.
- Resource Management: In high-concurrency environments, ensure the trigger value is high enough (e.g., 15-20) to avoid excessive LLM costs, but low enough to keep the agent's memory fresh.
- Drain Requirement: Before stopping your application, ensure you await Agent.drain(), which waits for all active ConversationSummarizer background tasks to complete.
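A common pattern for making fire-and-forget tasks drainable is shown below. The `TaskDrainer` class and its method names are illustrative assumptions, not the Agent.drain() implementation.

```python
# Illustrative drain pattern for fire-and-forget background tasks;
# the TaskDrainer class is an assumption, not Agent.drain() itself.
import asyncio

class TaskDrainer:
    def __init__(self):
        self._tasks: set[asyncio.Task] = set()

    def spawn(self, coro) -> asyncio.Task:
        # Keep a strong reference so the task is not garbage-collected,
        # and drop it automatically once the task finishes.
        task = asyncio.create_task(coro)
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)
        return task

    async def drain(self) -> None:
        """Wait for all in-flight background tasks before shutdown."""
        if self._tasks:
            await asyncio.gather(*self._tasks)
```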