
Scaling LLM Knowledge Bases: Why RAG is Necessary After 100 Articles


These articles are AI-generated summaries. Please check the original sources for full details.

Karpathy’s Obsidian Wiki Broke at 100 Articles - RAG Fixed It

Zafer Dace implemented Andrej Karpathy’s LLM wiki workflow but found it failed once the vault reached approximately 80 to 100 articles. At this scale, the total token count hit 200-400K, causing the LLM to provide ‘confident but wrong’ answers by blending disparate notes.
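A rough estimate is enough to see how close a vault is to that range. The sketch below is a minimal version that assumes about 4 characters per token for English text and a 200K-token window. The 4-character ratio is a common rule of thumb, not Claude's actual tokenizer. `estimate_tokens`, `vault_tokens`, `fits`, and the 8K reserve for the question and answer are hypothetical names and values.

```python
import math
from pathlib import Path

CHARS_PER_TOKEN = 4  # rough heuristic; real tokenizers vary by model and language


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length, rounded up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def vault_tokens(vault: str, pattern: str = "*.md") -> int:
    """Approximate tokens needed to paste every matching file into one prompt."""
    return sum(
        estimate_tokens(p.read_text(encoding="utf-8", errors="ignore"))
        for p in Path(vault).rglob(pattern)
        if p.is_file()
    )


def fits(total_tokens: int, window: int = 200_000, reserve: int = 8_000) -> bool:
    """True if the content plus room for the question and answer fits the window."""
    return total_tokens + reserve <= window
```

For 100 notes averaging 500 tokens, `fits(100 * 500)` holds comfortably. Running `vault_tokens` with a pattern that also matches the raw source files gives a closer picture of what a full-vault prompt would cost.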

Why This Matters

Technical workflows often fail at retrieval rather than generation. Models like Claude or Gemini offer context windows of 200K tokens or more. However, when large volumes of notes, raw documents, and indices are stuffed into a single prompt, the model starts ‘skimming’. The result is information blending, in which the model hallucinates connections between unrelated concepts. This makes RAG essential for maintaining accuracy once a knowledge base grows beyond 50-80 entries. The ‘intelligence’ of a knowledge base depends on what is placed in front of the model, not just on the model’s inherent reasoning capabilities.

Key Insights

  • Context Window Saturation: At 100 Obsidian articles averaging 500 tokens each, the articles alone total about 50K tokens. Adding the raw source documents and index files pushes the full context toward 400K tokens, well past Claude’s 200K window.
  • Semantic Search Efficiency: For a 100-article wiki, RAG reduces the per-query token load from ~50,000 to ~2,500 (50,000 / 2,500 = 20x). The overall reduction in compute overhead is put at 20-40x.
  • Chunk-Level Indexing: Split each markdown note into sections at its H1-H3 headers and index those sections in ChromaDB. A query then retrieves only the relevant passages, which keeps the model from blending unrelated articles such as ReAct and Chain of Thought.
  • Automated Synchronization: A PostToolUse hook in Claude Code re-indexes a file automatically whenever Claude writes or edits it, which keeps the local vector database current. Edits made directly in Obsidian do not trigger the hook.
  • Discovery vs. Retrieval: Obsidian’s graph view is optimized for discovery and finding unknown connections, whereas RAG is required for high-precision retrieval of known information.
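The sketch below shows one way to do the header-based chunking described above. It is a minimal version that assumes notes are plain markdown with ATX headings (`#`, `##`, `###`) and that ChromaDB's default embedding function is used. `chunk_markdown`, `index_vault`, and the `source`/`heading` metadata fields are illustrative names, not the author's code. The `chroma_db` path and `wiki` collection name match the query example.

```python
import re
from pathlib import Path

# ATX headings of level 1-3; "####" and deeper stay inside their parent section,
# and Obsidian tags like "#react" (no space after '#') are not treated as headings
HEADING = re.compile(r"^#{1,3}\s+(.+?)\s*$")


def chunk_markdown(text: str, source: str) -> list[dict]:
    """Split one note into sections, starting a new chunk at every H1-H3 heading."""
    sections: list[tuple[str, list[str]]] = []
    heading, lines, in_fence = Path(source).stem, [], False  # preamble uses the note name
    for line in text.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # '#' comments inside code fences are not headings
        match = None if in_fence else HEADING.match(line)
        if match:
            sections.append((heading, lines))
            heading, lines = match.group(1), []
        lines.append(line)
    sections.append((heading, lines))

    chunks = []
    for title, body in sections:
        doc = "\n".join(body).strip()
        if doc:  # skip an empty preamble before the first heading
            chunks.append({
                "id": f"{source}#{len(chunks)}",
                "document": doc,
                "metadata": {"source": source, "heading": title},
            })
    return chunks


def index_vault(vault: str, db_path: str = "chroma_db", name: str = "wiki") -> int:
    """Chunk every note in the vault and upsert the chunks into a ChromaDB collection."""
    import chromadb  # deferred so chunk_markdown can be used without ChromaDB installed

    collection = chromadb.PersistentClient(path=db_path).get_or_create_collection(name)
    total = 0
    for path in Path(vault).rglob("*.md"):
        source = str(path.relative_to(vault))
        chunks = chunk_markdown(path.read_text(encoding="utf-8"), source)
        if chunks:
            collection.upsert(
                ids=[c["id"] for c in chunks],
                documents=[c["document"] for c in chunks],
                metadatas=[c["metadata"] for c in chunks],
            )
            total += len(chunks)
    return total
```

Stopping at H3 keeps deeper subsections attached to their parent, so chunks do not become too small to answer a question on their own. Very long sections may still need a size-based split, which this sketch omits.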

Working Examples

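A basic semantic query against a local ChromaDB collection:

```python
import chromadb

# Open the persistent vector store written by the indexing step
client = chromadb.PersistentClient(path="chroma_db")
collection = client.get_collection("wiki")
# Embed the question and fetch the 5 nearest chunks instead of the whole vault
results = collection.query(query_texts=["how does the ReAct pattern work"], n_results=5)
print(results["documents"][0])
```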
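The query only returns candidate chunks, and something still has to put them in front of the model. The sketch below is a minimal version of that step. It assumes chunks were indexed with hypothetical `source` and `heading` metadata fields. `build_prompt` and its instruction wording are illustrative, not part of ChromaDB or the author's workflow.

```python
def build_prompt(question: str, results: dict) -> str:
    """Format the chunks from one ChromaDB query into a grounded prompt."""
    docs = results["documents"][0]  # query() returns one list per query text
    metas = (results.get("metadatas") or [[None] * len(docs)])[0]
    blocks = []
    for doc, meta in zip(docs, metas):
        meta = meta or {}  # chunks indexed without metadata come back as None
        label = meta.get("source", "unknown")
        if "heading" in meta:
            label += f" > {meta['heading']}"
        blocks.append(f"[{label}]\n{doc}")
    context = "\n\n".join(blocks)
    return (
        "Answer using only the notes below; if they do not cover the question, say so.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

Labelling each chunk with its note and section lets the model cite where an answer came from. It also makes blended answers easier to spot.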

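A Claude Code hook (in .claude/settings.json) can re-index a file automatically whenever Claude writes or edits it. Claude Code passes the tool call to the hook as JSON on stdin, so the edited path is read from tool_input.file_path with jq (assumed to be installed). The timeout value is measured in seconds.

```json
{
  "hooks": {
    "PostToolUse": [{
      "matcher": "Write|Edit",
      "hooks": [{
        "type": "command",
        "command": "python3 /path/to/reindex_file.py /path/to/vault \"$(jq -r '.tool_input.file_path')\"",
        "timeout": 5
      }]
    }]
  }
}
```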
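The hook hands reindex_file.py a vault path and a file path, but the script itself is not shown. The sketch below covers its core logic and uses hypothetical names throughout. It assumes each chunk stores its vault-relative path in a `source` metadata field, and it takes the ChromaDB collection and a splitter function as arguments. It deletes a note's old chunks before upserting new ones, so removed or renamed sections do not linger in the index.

```python
from pathlib import Path
from typing import Callable

# A splitter takes (note_text, source) and returns dicts with "id", "document", "metadata"
Splitter = Callable[[str, str], list[dict]]


def reindex_file(vault: str, file_path: str, collection, split: Splitter) -> int:
    """Replace all indexed chunks of one note; returns how many chunks were written."""
    path = Path(file_path)
    if path.suffix != ".md":
        return 0  # Write/Edit also fire for code and config files; only notes are indexed
    try:
        source = str(path.resolve().relative_to(Path(vault).resolve()))
    except ValueError:
        return 0  # the edited file lives outside the vault
    # Remove the note's previous chunks so renamed or deleted sections do not linger
    collection.delete(where={"source": source})
    if not path.exists():
        return 0
    chunks = split(path.read_text(encoding="utf-8"), source)
    if chunks:
        collection.upsert(
            ids=[c["id"] for c in chunks],
            documents=[c["document"] for c in chunks],
            metadatas=[c["metadata"] for c in chunks],
        )
    return len(chunks)
```

In the script the hook runs, `vault` and `file_path` would come from `sys.argv[1]` and `sys.argv[2]`. The `collection` would come from `chromadb.PersistentClient(path=...).get_collection(...)`.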

Practical Applications

  • Engineering wikis with 100+ articles: Use ChromaDB to retrieve the top 5 relevant chunks for precise technical answers instead of loading full vaults.
  • Pitfall: Loading the entire wiki into the context window leads to ‘degrading’ or ‘unreliable’ answers once the article count exceeds 100 entries.
  • Automated Knowledge Management: Python-based re-indexing hooks keep the local vector index up to date without manual metadata maintenance.
  • Pitfall: Relying solely on graph views for information retrieval; while good for discovery, they lack the semantic precision of RAG for specific technical queries.
