Memory
Core reference

EpisodicMemory

EpisodicMemory serves as the "autobiographical" record of a jazzmine agent. It stores and retrieves high-fidelity summaries of past conversation segments, known as "episodes." By combining multiple dense embeddings with sparse keyword vectors in a hybrid search strategy, it lets an agent recall both conceptually similar past experiences and specific factual details from previous interactions.

1. Behavior and Context

In the jazzmine ecosystem, EpisodicMemory is the primary store for long-term context.

  • Hybrid Architecture: Every episode is indexed by three distinct vectors: a short_summary (dense), a long_summary (dense), and a bm25 (sparse) vector.
  • Dual-Backend Flexibility: Like the Procedural memory, it can be configured to use local ONNX models for privacy or remote APIs (OpenAI, Gemini, etc.) for performance.
  • Fusion Retrieval: It implements Reciprocal Rank Fusion (RRF) to merge results from its three internal search streams, ensuring that the most relevant episodes surface regardless of whether they match semantically or via specific keywords.
  • User/Agent Isolation: Every operation is strictly filtered by user_id and agent_id to prevent data leakage between different users or agents.
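The three-vector layout and strict identity filtering above can be sketched in plain Python. The dict shape below is illustrative only (field names mirror this document; it is not the actual Qdrant point schema):

```python
# Hypothetical, simplified shape of one stored episode:
# three named vectors plus an identity-scoped payload.
episode = {
    "vectors": {
        "short_summary": [0.1, 0.2, 0.3],      # dense
        "long_summary":  [0.4, 0.5, 0.6],      # dense
        "bm25":          {17: 1.2, 905: 0.4},  # sparse: token id -> weight
    },
    "payload": {
        "user_id": "user_123",
        "agent_id": "support_bot",
        "conversation_id": "conv_abc",
        "short_summary_text": "User asked about password resets.",
    },
}

def visible_to(episodes, user_id, agent_id):
    """Every operation filters by BOTH user_id and agent_id (isolation)."""
    return [
        e for e in episodes
        if e["payload"]["user_id"] == user_id
        and e["payload"]["agent_id"] == agent_id
    ]

print(len(visible_to([episode], "user_123", "support_bot")))  # 1
print(len(visible_to([episode], "user_456", "support_bot")))  # 0
```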

2. Purpose

  • Experience Recall: Allowing agents to "remember" what was discussed with a specific user in previous sessions.
  • Continuity: Providing context from past turns to help resolve current ambiguous queries.
  • Behavioral Analytics: Storing metadata like flows_activated, tools_invoked, and user_sentiment to build a profile of user interaction over time.
  • Context Compression: Storing distilled summaries instead of raw chat logs to keep the LLM's context window efficient.

3. High-Level API (Python)

EpisodicMemory is a Python-exposed class that requires a QdrantManager for database connectivity.

Example: Initialization and Storage

```python
from memory import EpisodicMemory, QdrantManager

# 1. Set up infrastructure
mgr = QdrantManager(url="http://localhost:6334", vector_size=384)

# 2. Initialize episodic memory (local ONNX example)
episodic = EpisodicMemory(
    qdrant_manager=mgr,
    tokenizer_path="./models/tokenizer.json",
    model_dir="./models/bge-small",
    quantized=True
)

# 3. Store a conversation episode
await episodic.memorize(
    short_summary="User asked about password resets.",
    long_summary="The user was frustrated because they couldn't log in. I guided them through the recovery tool.",
    start_index=0,       # Message index in the raw store
    end_index=15,        # Message index end
    user_id="user_123",
    agent_id="support_bot",
    conversation_id="conv_abc",
    overlap=2,           # Context overlap with the previous episode
    flows_activated=["reset_password_flow"],
    tools_invoked=["send_email_tool"],
    timestamp_begin=1712690000,
    timestamp_end=1712691000,
    average_user_sentiment=-0.5,
    user_sentiment_variance=0.1
)
```

Example: Hybrid Recall

```python
# Search for relevant history across all conversations for this user
results = await episodic.recall(
    user_id="user_123",
    agent_id="support_bot",
    conversation_id="conv_xyz",  # Current conversation
    query="Did we talk about security before?",
    previous_conversation_limit=5,
    same_conversation_limit=2
)

# Access the results
for ep in results["previous_conversations"]:
    print(f"Past Summary: {ep['short_summary_text']} (Score: {ep['score']})")
```

4. Detailed Functionality

EpisodicMemory(...) [Constructor]

Initializes the memory module and the internal Embedder service.

Parameters:

| Parameter | Type | Description |
|---|---|---|
| `qdrant_manager` | `Py<QdrantManager>` | The manager providing the Qdrant client. |
| `tokenizer_path` | `str` | Path to the `tokenizer.json` for BM25. |
| `model_dir` | `Optional[str]` | Path to local ONNX model files (Local Backend). |
| `quantized` | `bool` | If `True`, uses INT8 models (Local Backend). |
| `api_key` | `Optional[str]` | API key for remote providers (Remote Backend). |
| `provider` | `str` | Cloud provider name (e.g., `"openai"`, `"cohere"`). |
| `hidden_size` | `int` | Vector dimension (e.g., 384, 1536). |
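The local/remote split implies a simple validation rule at construction time (see also §5, PyValueError). A minimal sketch of that check, assuming the constructor rejects configurations that provide neither a model_dir nor an api_key:

```python
from typing import Optional

def resolve_backend(model_dir: Optional[str], api_key: Optional[str]) -> str:
    """Pick the embedding backend the way the constructor is documented to:
    local ONNX if model_dir is given, otherwise a remote API if api_key is given.
    This helper is illustrative, not the module's actual internal function."""
    if model_dir is not None:
        return "local"   # local ONNX models (optionally INT8-quantized)
    if api_key is not None:
        return "remote"  # OpenAI, Cohere, etc.
    # Mirrors the documented PyValueError behavior.
    raise ValueError("Provide either model_dir (local) or api_key (remote).")

print(resolve_backend("./models/bge-small", None))  # local
print(resolve_backend(None, "sk-example"))          # remote
```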

memorize(...)

Functionality: Encodes an interaction segment and persists it to Qdrant.

  • Concurrency Logic: It attempts to generate dense embeddings for both summaries concurrently. If the underlying model engine (ORT) encounters a shape‑mismatch error, it automatically falls back to sequential processing for maximum reliability.
  • Vector Construction:
      • short_summary: Dense vector.
      • long_summary: Dense vector.
      • bm25: Sparse vector generated from the concatenation of both summaries.
  • Payload: Stores all provided metadata and the original summary text.
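The concurrency-with-fallback logic above can be sketched with asyncio. The embed coroutine and the shape-mismatch exception below are stand-ins; the real module calls its internal ORT-backed Embedder:

```python
import asyncio

class ShapeMismatchError(RuntimeError):
    """Stand-in for the ORT shape-mismatch failure described above."""

async def embed(text: str) -> list[float]:
    # Placeholder for a real dense-embedding call.
    return [float(len(text))]

async def embed_summaries(short: str, long: str) -> tuple[list[float], list[float]]:
    try:
        # Fast path: embed both summaries concurrently.
        return tuple(await asyncio.gather(embed(short), embed(long)))
    except ShapeMismatchError:
        # Fallback: sequential processing for maximum reliability.
        return (await embed(short), await embed(long))

short_vec, long_vec = asyncio.run(embed_summaries("short text", "much longer text"))
print(short_vec, long_vec)
```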

recall(...)

Functionality: Performs a three‑way hybrid search using RRF.

  • Stage 1: Generates dense and sparse vectors for the query string.
  • Stage 2: Executes three parallel searches in Qdrant (short_summary field, long_summary field, and bm25 field).
  • Stage 3: Merges results based on their ranks in each list. RRF ensures that an episode appearing in multiple search streams ranks higher.
  • Result Categorization: Returns a dictionary with two keys:
      • same_conversation: Matches belonging to the current conversation_id.
      • previous_conversations: Matches from other sessions for the same user.
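Stage 3 can be sketched directly. The weights and rrf_k default used here follow §6; this is a plain-Python illustration of weighted RRF, not the module's actual implementation:

```python
def rrf_merge(streams: dict[str, list[str]],
              weights: dict[str, float],
              rrf_k: int = 10) -> list[tuple[str, float]]:
    """Merge ranked result lists with weighted Reciprocal Rank Fusion.

    Each stream maps a name ('short_summary', 'long_summary', 'bm25')
    to a list of point IDs ordered best-first. An ID appearing in several
    streams accumulates score from each, so it ranks higher overall.
    """
    scores: dict[str, float] = {}
    for name, ranked_ids in streams.items():
        w = weights.get(name, 0.0)
        for rank, point_id in enumerate(ranked_ids):  # rank 0 = best
            scores[point_id] = scores.get(point_id, 0.0) + w / (rrf_k + rank + 1)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

merged = rrf_merge(
    streams={
        "short_summary": ["ep1", "ep2"],
        "long_summary":  ["ep2", "ep3"],
        "bm25":          ["ep2", "ep1"],
    },
    weights={"short_summary": 0.25, "long_summary": 0.25, "bm25": 0.5},
)
print(merged[0][0])  # ep2 appears in all three streams, so it wins
```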

5. Error Handling

  • PyRuntimeError: Raised if Qdrant is unreachable, the collection is missing, or the embedding process fails (e.g., ONNX session panic).
  • PyValueError: Raised during initialization if neither a model_dir nor an api_key is provided.
  • Serialization Errors: Occur if non‑JSON‑serializable types are passed into list‑based payload fields.

6. Remarks

  • RRF Tuning: The recall method accepts short_weight, long_weight, and bm25_weight. By default, BM25 has the highest weight (0.5), as keyword matches for specific entities or IDs are often more “truthful” in episodic history than semantic “vibes.”
  • Identity: Point IDs are generated using Uuid::new_v4(). Every call to memorize creates a new entry, unlike SemanticMemory which is deterministic.
  • RRF Constant: The rrf_k parameter (default 10) controls the “smoothness” of the rank fusion; lower values prioritize the absolute top‑ranked results across streams.
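The effect of rrf_k can be seen numerically. Assuming the standard RRF form where a result at rank r contributes 1 / (rrf_k + r), the gap between the top result and a lower-ranked one shrinks as rrf_k grows:

```python
def rank_contribution(rrf_k: int, rank: int) -> float:
    # Contribution of a single result at the given rank (1 = best),
    # using the standard reciprocal-rank form.
    return 1.0 / (rrf_k + rank)

for k in (10, 60):
    top, tenth = rank_contribution(k, 1), rank_contribution(k, 10)
    # Ratio between the best result and the 10th-best result:
    print(f"rrf_k={k}: top/tenth = {top / tenth:.2f}")
```

With rrf_k=10 the top result counts nearly twice as much as the 10th; with a larger constant the contributions flatten out, which is why lower values prioritize the absolute top-ranked results across streams.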