Build a Persistent AI Agent OS with Hierarchical Memory and FAISS Retrieval

These articles are AI-generated summaries. Please check the original sources for full details.

How to Build an EverMem-Style Persistent AI Agent OS with Hierarchical Memory, FAISS Vector Retrieval, SQLite Storage, and Automated Memory Consolidation

Michal Sutter introduces a persistent AI agent architecture that combines short-term conversational context with long-term vector retrieval. The system uses FAISS for semantic search and SQLite for structured metadata, so the agent behaves consistently across interaction turns.

Why This Matters

Standard LLM agents are often stateless or bounded by the context window, so user preferences and facts drop out of the conversation over time. Hierarchical memory with automated consolidation lets an agent behave like a persistent memory OS: it keeps high-value information available without exceeding token limits. Importance scoring decides what is worth keeping, and vector-based long-term memory (LTM) keeps signals such as preferences and decisions retrievable later. The example implementation triggers consolidation at 1,400 tokens, which illustrates one way to bound memory growth.
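The consolidation trigger can be sketched in a few lines. This is a minimal illustration, not the article's code: it assumes tokens are approximated by a whitespace word count and that the count is taken over the stored turns. The function names are hypothetical. The 1,400-token threshold, the 8-turn interval, and the top-18 selection are the values listed in this summary.

```python
TOKEN_THRESHOLD = 1400  # token threshold quoted in the article
TURN_INTERVAL = 8       # turn interval quoted under Key Insights
TOP_N = 18              # number of memories summarized per consolidation


def approx_tokens(text):
    # Assumption: a whitespace word count stands in for a real tokenizer.
    return len(text.split())


def should_consolidate(texts, turn_count):
    """Return True when either the token budget or the turn interval is hit."""
    total = sum(approx_tokens(t) for t in texts)
    on_interval = turn_count > 0 and turn_count % TURN_INTERVAL == 0
    return total >= TOKEN_THRESHOLD or on_interval


def select_for_consolidation(memories, top_n=TOP_N):
    """Pick the highest-importance (text, importance) pairs for summarization."""
    return sorted(memories, key=lambda m: m[1], reverse=True)[:top_n]
```

Checking both conditions in one place keeps the policy easy to tune. A real tokenizer would replace approx_tokens if exact budgets matter.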

Key Insights

  • FAISS Vector Retrieval: Embeds text with sentence-transformers/all-MiniLM-L6-v2 and uses a FAISS index for semantic search over long-term memory.
  • SQLite Metadata Persistence: Stores structured records including timestamps, importance scores (0.0 to 1.0), and specific memory signals like preference or task.
  • Automated Consolidation: Summarizes the 18 highest-importance memories once the system reaches 1,400 tokens, or every 8 turns.
  • Importance Scoring Algorithm: Calculates scores based on text length, role bonuses (user vs assistant), presence of digits, and explicit metadata pins.
  • Hierarchical Memory Structure: Maintains a rolling short-term context (STM) of up to 10 turns while retrieving top-K relevant long-term memories for every query.
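The two memory tiers can be sketched without any external dependencies. The example below is an illustration under stated assumptions: a plain Python cosine-similarity scan stands in for the FAISS index, toy 2-D vectors stand in for all-MiniLM-L6-v2 embeddings, and the HierarchicalMemory class and its method names are hypothetical. The 10-turn window and top-K of 6 follow the defaults quoted in this summary.

```python
import math
from collections import deque


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class HierarchicalMemory:
    """Hypothetical sketch: rolling STM window plus a brute-force LTM search."""

    def __init__(self, stm_max_turns=10, ltm_topk=6):
        # deque(maxlen=...) silently drops the oldest turn once the window is full.
        self.stm = deque(maxlen=stm_max_turns)
        self.ltm = []  # list of (vector, text); a FAISS index would replace this
        self.ltm_topk = ltm_topk

    def add_turn(self, role, text, vector):
        self.stm.append((role, text))
        self.ltm.append((vector, text))

    def retrieve(self, query_vector):
        """Return the texts of the top-K most similar long-term memories."""
        ranked = sorted(
            self.ltm,
            key=lambda item: cosine(query_vector, item[0]),
            reverse=True,
        )
        return [text for _, text in ranked[: self.ltm_topk]]


mem = HierarchicalMemory()
for i in range(12):
    # Toy vectors: similarity to [1, 0] decreases as i grows.
    mem.add_turn("user", f"turn {i}", [1.0, float(i)])
hits = mem.retrieve([1.0, 0.0])
```

After twelve turns the short-term window holds only the ten most recent, while retrieval can still surface the earliest turns from long-term memory.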

Working Examples

Core initialization and importance scoring logic for the EverMem-style Agent OS.

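import math

from sentence_transformers import SentenceTransformer
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Excerpt only: _init_db, _init_faiss, _sha, and _now_ts are not shown here.
class EverMemAgentOS:
    def __init__(self, workdir="/content/evermem_agent_os", db_name="evermem.sqlite",
                 embedding_model="sentence-transformers/all-MiniLM-L6-v2",
                 gen_model="google/flan-t5-small", stm_max_turns=10, ltm_topk=6):
        self.workdir = workdir
        # Stored so the storage and retrieval code (not shown in this excerpt) can read them.
        self.db_name, self.stm_max_turns, self.ltm_topk = db_name, stm_max_turns, ltm_topk
        self.embedder = SentenceTransformer(embedding_model)      # text -> embedding vectors
        self.tokenizer = AutoTokenizer.from_pretrained(gen_model)
        self.model = AutoModelForSeq2SeqLM.from_pretrained(gen_model)
        self._init_db()
        self._init_faiss()

    def _importance_score(self, role, text, meta):
        # Additive score capped at 1.0; the length bonus grows logarithmically, up to 0.45.
        base = 0.35
        length_bonus = min(0.45, math.log1p(len(text)) / 20.0)
        role_bonus = 0.08 if role == "user" else 0.03
        signal_bonus = 0.18 if meta.get("signal") in {"decision", "preference", "fact", "task"} else 0.0
        return float(min(1.0, base + length_bonus + role_bonus + signal_bonus))

    def add_memory(self, role, text, meta=None):
        # Memory ID: hash of timestamp, role, and the first 80 characters of the text.
        mid = f"m:{_sha(f'{_now_ts()}::{role}::{text[:80]}')}"
        importance = self._importance_score(role, text, meta or {})
        # ... SQL insert and FAISS index update logic ...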
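The excerpt elides the storage step, so the following is only a sketch of how scoring and the SQL insert might fit together. The table layout, the make_id helper (SHA-1 truncated to 16 hex characters, standing in for the unseen _sha), and the standalone function form are all assumptions. The scoring terms mirror the ones shown above. The digit and pin terms mentioned under Key Insights do not appear in the excerpt and are left out. The FAISS index update is also omitted.

```python
import hashlib
import math
import sqlite3
import time

SIGNALS = {"decision", "preference", "fact", "task"}


def importance_score(role, text, meta):
    """Same additive terms as the excerpt's _importance_score."""
    base = 0.35
    length_bonus = min(0.45, math.log1p(len(text)) / 20.0)
    role_bonus = 0.08 if role == "user" else 0.03
    signal_bonus = 0.18 if meta.get("signal") in SIGNALS else 0.0
    return float(min(1.0, base + length_bonus + role_bonus + signal_bonus))


def make_id(ts, role, text):
    # Assumption: truncated SHA-1 stands in for the excerpt's _sha helper.
    raw = f"{ts}::{role}::{text[:80]}".encode("utf-8")
    return f"m:{hashlib.sha1(raw).hexdigest()[:16]}"


def add_memory(conn, role, text, meta=None, ts=None):
    """Score a memory and persist it; returns (memory id, importance)."""
    meta = meta or {}
    ts = time.time() if ts is None else ts
    mid = make_id(ts, role, text)
    score = importance_score(role, text, meta)
    conn.execute(
        "INSERT OR REPLACE INTO memories (mid, ts, role, text, importance, signal) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (mid, ts, role, text, score, meta.get("signal")),
    )
    conn.commit()
    return mid, score


# Hypothetical schema; an in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE memories (mid TEXT PRIMARY KEY, ts REAL, role TEXT, "
    "text TEXT, importance REAL, signal TEXT)"
)
mid, score = add_memory(
    conn, "user", "I prefer concise answers.", {"signal": "preference"}, ts=0.0
)
```

For this 25-character user message with a preference signal, the score is 0.35 + log1p(25)/20 + 0.08 + 0.18, roughly 0.77. With only the terms shown in the excerpt, the length bonus reaches its 0.45 cap only for texts of several thousand characters.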

Practical Applications

  • Personalized Assistant (EverMem-style): Uses pinned metadata and high importance scores (0.95+) to ensure user preferences, such as concise response styles, are never forgotten. Pitfall: Over-retrieval of irrelevant LTM can pollute the prompt context if top-K is set too high.
  • Task Management Agent: Periodically consolidates multiple session notes into a compact memory summary under 520 characters to preserve long-horizon goals. Pitfall: Using lightweight models like flan-t5-small for consolidation may lead to loss of technical nuances compared to larger LLMs.
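Both pitfalls suggest simple guards. The sketch below is illustrative only: clamp_summary enforces the 520-character bound on a consolidation summary, and filter_hits adds a similarity floor so that a generous top-K does not pull weak matches into the prompt. The function names and the 0.3 floor are hypothetical, and the (score, text) hit format is an assumption about what the retriever returns.

```python
MAX_SUMMARY_CHARS = 520  # summaries are kept under this length, per the article


def clamp_summary(summary, limit=MAX_SUMMARY_CHARS):
    """Collapse whitespace and keep the summary strictly under `limit` characters."""
    text = " ".join(summary.split())
    if len(text) < limit:
        return text
    cut = text[: limit - 1]
    # Drop the trailing (possibly partial) word so the cut lands on a word boundary.
    head, sep, _ = cut.rpartition(" ")
    return head if sep else cut


def filter_hits(hits, top_k=6, min_score=0.3):
    """Keep at most top_k (score, text) hits whose score clears the floor.

    Assumes higher scores mean more similar; min_score is a hypothetical value.
    """
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)
    return [h for h in ranked if h[0] >= min_score][:top_k]
```

The similarity floor is a judgment call: set too high, it hides useful memories, so it is worth tuning against real queries.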
