1. Behavior and Context
In the jazzmine ecosystem, EpisodicMemory is the primary store for long-term context.
- Hybrid Architecture: Every episode is indexed by three distinct vectors: a short_summary (dense), a long_summary (dense), and a bm25 (sparse) vector.
- Dual-Backend Flexibility: Like the Procedural memory, it can be configured to use local ONNX models for privacy or remote APIs (OpenAI, Gemini, etc.) for performance.
- Fusion Retrieval: It implements Reciprocal Rank Fusion (RRF) to merge results from its three internal search streams, ensuring that the most relevant episodes surface regardless of whether they match semantically or via specific keywords.
- User/Agent Isolation: Every operation is strictly filtered by user_id and agent_id to prevent data leakage between different users or agents.
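The isolation rule above can be pictured as a Qdrant-style payload filter. The sketch below is illustrative only: the field names mirror the `memorize` parameters, but the actual filter construction happens inside the library, not in user code.

```python
# Illustrative only: the shape of the payload filter that user/agent
# isolation implies on the Qdrant side. Field names mirror memorize().
def isolation_filter(user_id: str, agent_id: str) -> dict:
    """Build a Qdrant-style 'must' filter scoping a query to one user/agent pair."""
    return {
        "must": [
            {"key": "user_id", "match": {"value": user_id}},
            {"key": "agent_id", "match": {"value": agent_id}},
        ]
    }
```

Because both conditions sit under `must`, an episode is only visible when it matches the user *and* the agent, which is what prevents cross-tenant leakage.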
2. Purpose
- Experience Recall: Allowing agents to "remember" what was discussed with a specific user in previous sessions.
- Continuity: Providing context from past turns to help resolve current ambiguous queries.
- Behavioral Analytics: Storing metadata like flows_activated, tools_invoked, and user_sentiment to build a profile of user interaction over time.
- Context Compression: Storing distilled summaries instead of raw chat logs to keep the LLM's context window efficient.
3. High-Level API (Python)
EpisodicMemory is a Python-exposed class that requires a QdrantManager for database connectivity.
Example: Initialization and Storage
from memory import EpisodicMemory, QdrantManager
# 1. Setup Infrastructure
mgr = QdrantManager(url="http://localhost:6334", vector_size=384)
# 2. Initialize Episodic Memory (Local ONNX example)
episodic = EpisodicMemory(
    qdrant_manager=mgr,
    tokenizer_path="./models/tokenizer.json",
    model_dir="./models/bge-small",
    quantized=True,
)
# 3. Store a conversation episode
await episodic.memorize(
    short_summary="User asked about password resets.",
    long_summary="The user was frustrated because they couldn't log in. I guided them through the recovery tool.",
    start_index=0,    # Message index in the raw store
    end_index=15,     # Message index end
    user_id="user_123",
    agent_id="support_bot",
    conversation_id="conv_abc",
    overlap=2,        # Context overlap with previous episode
    flows_activated=["reset_password_flow"],
    tools_invoked=["send_email_tool"],
    timestamp_begin=1712690000,
    timestamp_end=1712691000,
    average_user_sentiment=-0.5,
    user_sentiment_variance=0.1,
)
# Example: Hybrid Recall
# Search for relevant history across all conversations for this user
results = await episodic.recall(
    user_id="user_123",
    agent_id="support_bot",
    conversation_id="conv_xyz",  # Current conversation
    query="Did we talk about security before?",
    previous_conversation_limit=5,
    same_conversation_limit=2,
)
# Accessing results
for ep in results["previous_conversations"]:
    print(f"Past Summary: {ep['short_summary_text']} (Score: {ep['score']})")
4. Detailed Functionality
EpisodicMemory(...) [Constructor]
Initializes the memory module and the internal Embedder service.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| qdrant_manager | Py<QdrantManager> | The manager providing the Qdrant client. |
| tokenizer_path | str | Path to the tokenizer.json for BM25. |
| model_dir | Optional[str] | Path to local ONNX model files (Local Backend). |
| quantized | bool | If True, uses INT8 models (Local Backend). |
| api_key | Optional[str] | API key for remote providers (Remote Backend). |
| provider | str | Cloud provider name (e.g., "openai", "cohere"). |
| hidden_size | int | Vector dimension (e.g., 384, 1536). |
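The `hidden_size` value must agree with the embedding model's output dimension, or every stored vector will be rejected or truncated. As a sanity check, the snippet below pairs two well-known models with their dimensions (bge-small = 384, OpenAI's text-embedding-3-small = 1536); the mapping and helper are illustrative, not part of the library.

```python
# Illustrative helper: hidden_size must match the embedding model's output
# dimension. Well-known defaults: bge-small = 384, text-embedding-3-small = 1536.
KNOWN_DIMS = {
    "bge-small": 384,
    "text-embedding-3-small": 1536,
}

def check_hidden_size(model: str, hidden_size: int) -> bool:
    """Return True when hidden_size agrees with the model's known dimension
    (unknown models pass, since we cannot verify them)."""
    expected = KNOWN_DIMS.get(model)
    return expected is None or expected == hidden_size
```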
memorize(...)
Functionality: Encodes an interaction segment and persists it to Qdrant.
- Concurrency Logic: It attempts to generate dense embeddings for both summaries concurrently. If the underlying model engine (ORT) encounters a shape‑mismatch error, it automatically falls back to sequential processing for maximum reliability.
- Vector Construction:
- short_summary: Dense vector.
- long_summary: Dense vector.
- bm25: Sparse vector generated from the concatenation of both summaries.
- Payload: Stores all provided metadata and the original summary text.
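The concurrent-with-fallback pattern described above can be sketched in plain asyncio. Here `embed` is a hypothetical stand-in for the dense-embedding call; the real fallback lives in the Rust/ORT layer, not in Python.

```python
import asyncio

async def embed(text: str) -> list[float]:
    # Stand-in embedder: a fixed-size dummy vector (real code calls ONNX/remote API).
    return [float(len(text))] * 4

async def embed_summaries(short: str, long: str) -> tuple[list[float], list[float]]:
    try:
        # Happy path: embed both summaries concurrently.
        return tuple(await asyncio.gather(embed(short), embed(long)))
    except RuntimeError:
        # e.g. an ORT shape-mismatch: retry one at a time for reliability.
        return await embed(short), await embed(long)
```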
recall(...)
Functionality: Performs a three‑way hybrid search using RRF.
- Stage 1: Generates dense and sparse vectors for the query string.
- Stage 2: Executes three parallel searches in Qdrant (short_summary field, long_summary field, and bm25 field).
- Stage 3: Merges results based on their ranks in each list. RRF ensures that an episode appearing in multiple search streams ranks higher.
- Result Categorization: Returns a dictionary with two keys:
- same_conversation: Matches belonging to the current conversation_id.
- previous_conversations: Matches from other sessions for the same user.
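A minimal sketch of consuming that dictionary, assuming only the two documented keys and the `short_summary_text` payload field shown in the earlier example:

```python
# Hypothetical post-processing of a recall() result: flatten both buckets
# into one context string, labelling where each match came from.
def build_context(results: dict) -> str:
    lines = []
    for ep in results.get("same_conversation", []):
        lines.append(f"[this conversation] {ep['short_summary_text']}")
    for ep in results.get("previous_conversations", []):
        lines.append(f"[earlier session] {ep['short_summary_text']}")
    return "\n".join(lines)
```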
5. Error Handling
- PyRuntimeError: Raised if Qdrant is unreachable, the collection is missing, or the embedding process fails (e.g., ONNX session panic).
- PyValueError: Raised during initialization if neither a model_dir nor an api_key is provided.
- Serialization Errors: Occur if non‑JSON‑serializable types are passed into list‑based payload fields.
6. Remarks
- RRF Tuning: The recall method accepts short_weight, long_weight, and bm25_weight. By default, BM25 has the highest weight (0.5), as keyword matches for specific entities or IDs are often more “truthful” in episodic history than semantic “vibes.”
- Identity: Point IDs are generated with Uuid::new_v4(), so every call to memorize creates a new entry. This differs from SemanticMemory, whose point IDs are deterministic.
- RRF Constant: The rrf_k parameter (default 10) controls the “smoothness” of the rank fusion; lower values prioritize the absolute top‑ranked results across streams.
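The weighted RRF described in these remarks can be written out directly: each episode scores the sum over streams of weight / (rrf_k + rank). Only the default bm25 weight (0.5) and rrf_k (10) are documented above; the even 0.25/0.25 split across the two dense streams is an assumption for illustration.

```python
# Weighted Reciprocal Rank Fusion: score(id) = sum over streams of w / (k + rank).
def rrf_fuse(streams: dict[str, list[str]],
             weights: dict[str, float],
             k: int = 10) -> dict[str, float]:
    scores: dict[str, float] = {}
    for name, ranked_ids in streams.items():
        w = weights[name]
        for rank, pid in enumerate(ranked_ids, start=1):
            scores[pid] = scores.get(pid, 0.0) + w / (k + rank)
    return scores

streams = {
    "short_summary": ["ep1", "ep2"],
    "long_summary":  ["ep2", "ep3"],
    "bm25":          ["ep2", "ep1"],
}
weights = {"short_summary": 0.25, "long_summary": 0.25, "bm25": 0.5}
fused = rrf_fuse(streams, weights)
# ep2 appears in all three streams, so it outranks ep1 and ep3.
```

Lowering `k` makes the 1/(k + rank) curve steeper, so rank-1 hits dominate the fused score; raising it flattens the curve and lets depth of agreement across streams matter more.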