Memory
Core reference

SemanticMemory

SemanticMemory is a specialized, high-efficiency memory module for storing domain-specific terminology, technical abbreviations, and company jargon. Unlike the Episodic and Procedural modules, SemanticMemory is BM25-only (sparse). It intentionally avoids dense embeddings in order to prioritize exact and near-exact keyword matching, so that unique terms like "JWT", "KYC", or "k8s" are retrieved precisely, without the "semantic noise" that dense vector models often introduce.

1. Behavior and Context

In the jazzmine architecture, SemanticMemory acts as the agent's "Living Glossary."

  • Keyword Saliency: It uses BM25 with IDF (Inverse Document Frequency) weighting. This naturally handles term rarity: uncommon internal project names score highly when mentioned, while common stop-words are downweighted.
  • Deterministic Identity: Entry IDs are computed using a UUID5 hash of the agent_id and the lowercased key. This makes the memory inherently idempotent: upserting the same key twice simply overwrites the previous definition, and deleting a term requires no prior search.
  • Lightweight Footprint: Since it only requires a tokenizer and no heavy ONNX models or API calls for embedding, it is the fastest and most resource-light memory component in the framework.
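The IDF weighting behind "Keyword Saliency" can be sketched in a few lines. This is an illustrative toy, not the module's implementation; the corpus is invented, and the formula is the standard BM25 IDF variant (an assumption about which variant the module uses):

```python
import math

def bm25_idf(term: str, corpus: list[list[str]]) -> float:
    """Standard BM25 IDF: rare terms score high, ubiquitous terms near zero."""
    n = len(corpus)
    df = sum(1 for doc in corpus if term in doc)  # document frequency
    return math.log((n - df + 0.5) / (df + 0.5) + 1)

# Toy corpus of tokenized glossary entries
corpus = [
    ["jwt", "json", "web", "token", "auth"],
    ["kyc", "know", "your", "customer", "compliance"],
    ["k8s", "kubernetes", "container", "orchestration"],
    ["auth", "login", "session", "token"],
]

# A rare internal term outweighs a more common one
print(bm25_idf("kyc", corpus))   # high: appears in 1 of 4 entries
print(bm25_idf("auth", corpus))  # lower: appears in 2 of 4 entries
```

This is why an uncommon project name dominates the ranking the moment it appears in a query, with no tuning required.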

2. Purpose

  • Jargon Resolution: Providing the agent with exact definitions for industry or company-specific terms that may not exist in a general-purpose LLM's training data.
  • Alias Mapping: Linking multiple surface forms (e.g., "JSON Web Token" and "jwt") to a single canonical definition.
  • Deterministic Knowledge Management: Allowing for reliable "forgetting" or updating of specific terminology by key name.
  • Contextual Grounding: Ensuring the agent uses the correct internal terminology when generating responses.

3. High-Level API (Python)

SemanticMemory is exposed to Python and requires a QdrantManager and a path to a standard tokenizer file.

Example: Defining and Retrieving Jargon

```python
from memory import SemanticMemory, QdrantManager

# 1. Setup Infrastructure
mgr = QdrantManager(url="http://localhost:6334", vector_size=384)

# 2. Initialize Semantic Memory (No model_dir needed!)
semantic = SemanticMemory(
    qdrant_manager=mgr,
    tokenizer_path="./models/tokenizer.json"
)

# 3. Memorize a term with aliases
await semantic.memorize(
    key="PII",
    value="Personally Identifiable Information",
    category="abbreviation",
    agent_id="compliance_bot",
    aliases=["private data", "sensitive info"],
    description="Any data that could be used to identify a specific individual."
)

# 4. Recall based on a query
results = await semantic.recall(
    query="How do we handle private data?",
    agent_id="compliance_bot",
    top_k=3,
    score_threshold=1.5
)

for term in results:
    print(f"Found: {term['key']} -> {term['value']} (BM25 Score: {term['score']})")
```

4. Detailed Functionality

SemanticMemory(qdrant_manager, tokenizer_path) [Constructor]

Initializes the memory module. It loads the tokenizer from the filesystem.

  • Tokenizer Requirement: This should be the same tokenizer used by your other memory modules to ensure token consistency, though SemanticMemory uses it strictly for term frequency counting.
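What "term frequency counting" means in practice can be sketched with the standard library. The real module counts tokens produced by the loaded tokenizer.json; the whitespace split below is a stand-in assumption for illustration only:

```python
from collections import Counter

def term_frequencies(text: str) -> Counter:
    """Tokenize (crudely, by whitespace) and count occurrences of each token.
    BM25 only needs these counts -- no embeddings, no model inference."""
    tokens = text.lower().split()
    return Counter(tokens)

tf = term_frequencies("PII means Personally Identifiable Information. PII is sensitive.")
print(tf["pii"])  # 2
```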

memorize(...)

Functionality: Stores or overwrites a term definition.

  • Indexing Text: It joins the key, value, every alias, and the description into a single block of text for tokenization. This maximizes recall; searching for a word in the description will still surface the key.
  • Deterministic ID: Generates a UUID5 from the string "{agent_id}::{key}", with the key lowercased first.
  • Wait for Consistency: Uses wait=true during the Qdrant upsert to ensure the term is searchable immediately after the call returns.
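The deterministic ID scheme can be sketched with the standard library. The namespace the module actually uses is not specified here, so `uuid.NAMESPACE_DNS` below is a placeholder assumption; the idempotency property holds for any fixed namespace:

```python
import uuid

def entry_id(agent_id: str, key: str) -> uuid.UUID:
    """Deterministic UUID5 over agent + lowercased key.
    NAMESPACE_DNS is a stand-in; any fixed namespace preserves idempotency."""
    return uuid.uuid5(uuid.NAMESPACE_DNS, f"{agent_id}::{key.lower()}")

# Same agent + same key (any casing) -> same ID, so upserts overwrite in place
assert entry_id("compliance_bot", "PII") == entry_id("compliance_bot", "pii")
# Different agent -> different ID, so each agent's glossary stays isolated
assert entry_id("compliance_bot", "PII") != entry_id("sales_bot", "PII")
```

This is also what makes forget/forget_many searchless: the target ID can always be recomputed from the key alone.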

recall(query, agent_id, top_k=5, score_threshold=None)

Functionality: Retrieves the most relevant term definitions.

  • Scoring: Results are ranked by BM25 score.
  • Filtering: Strictly scoped to the provided agent_id.
  • Thresholding: The score_threshold is highly recommended for SemanticMemory. A score of 1.0 to 3.0 usually filters out irrelevant “near‑miss” keyword matches.
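How BM25 scores interact with score_threshold can be shown end-to-end on a toy corpus. The scoring function below is the standard BM25 formula with conventional k1/b defaults, and the corpus is invented; the module's internal parameters may differ:

```python
import math

def bm25_score(query: list[str], doc: list[str], corpus: list[list[str]],
               k1: float = 1.2, b: float = 0.75) -> float:
    """Standard BM25: sum of IDF-weighted, length-normalized term frequencies."""
    n = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n
    score = 0.0
    for term in query:
        df = sum(1 for d in corpus if term in d)
        if df == 0:
            continue
        idf = math.log((n - df + 0.5) / (df + 0.5) + 1)
        tf = doc.count(term)
        score += idf * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * len(doc) / avg_len))
    return score

corpus = [
    ["pii", "personally", "identifiable", "information", "private", "data"],
    ["jwt", "json", "web", "token"],
    ["k8s", "kubernetes", "container", "orchestration"],
]

query = ["private", "data"]
scored = [(doc, bm25_score(query, doc, corpus)) for doc in corpus]
# Apply a score_threshold: only genuine keyword matches survive
hits = [(doc, s) for doc, s in scored if s >= 1.0]
print(hits)  # only the PII entry matches
```

Entries that share no tokens with the query score exactly zero, which is why even a modest threshold cleanly separates real matches from noise.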

forget(key, agent_id) / forget_many(keys, agent_id)

Functionality: Permanently deletes definitions.

  • Efficiency: Because IDs are deterministic, these methods do not perform a search. They compute the target UUID(s) and send a direct delete command to Qdrant.

list_all(agent_id, category=None)

Functionality: Scrolls through all definitions for an agent.

  • Sorting: Returns a list of dictionaries sorted alphabetically by key.
  • Filtering: Can optionally return only terms within a specific category (e.g., "technical").

entry_count(agent_id)

Functionality: Returns the exact number of terms stored for a specific agent.


5. Error Handling

  • PyRuntimeError (Tokenization): Raised if the provided string is incompatible with the tokenizer (rare) or if the tokenizer file is corrupted.
  • PyRuntimeError (Qdrant): Raised if the batch delete or scroll operations fail due to network or database‑side timeouts.
  • Case Insensitivity: The system automatically lowercases the key for ID generation, but preserves the original casing in the payload for display.

6. Remarks

  • Why No Dense Vectors? SemanticMemory handles “Entity‑like” data. If you use dense vectors for "JWT", a search for "Session Token" might retrieve it. While conceptually similar, in a technical context, they are different things. BM25 ensures that if the user didn’t mention the term or its specific aliases, the agent doesn’t guess incorrectly.
  • Alias Power: Use aliases extensively. Adding "k8s" as an alias for "Kubernetes" allows SemanticMemory to act as a bridge between user shorthand and formal documentation.
  • Memory Usage: This is the most memory‑efficient module. Thousands of terms can be stored and searched with negligible RAM usage on the agent side, as only the tokenizer is resident in memory.