
Implementing RAG: Solving LLM Hallucinations with Retrieval Augmented Generation

These articles are AI-generated summaries. Please check the original sources for full details.

The Complete RAG Pipeline

Retrieval Augmented Generation (RAG) gives an LLM access to external documents at query time to reduce factual fabrication. It lets a system cite the exact source passages behind an answer instead of relying solely on static training data.
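
To make the citation idea concrete, here is a minimal sketch of the prompt-assembly step, assuming a retriever has already returned (source, text) pairs. The function name build_grounded_prompt, the instruction wording, and the sample policy passage are hypothetical; numbering each passage is one common way to let the model cite [1], [2], and so on.

```python
from typing import List, Tuple


def build_grounded_prompt(question: str, passages: List[Tuple[str, str]]) -> str:
    """Assemble a prompt that restricts the model to retrieved passages.

    passages: (source_name, text) pairs, assumed to come from a retriever.
    Each passage is numbered so the model can cite it as [1], [2], ...
    """
    context_lines = [
        f"[{i}] ({source}) {text}" for i, (source, text) in enumerate(passages, start=1)
    ]
    return (
        "Answer the question using only the numbered sources below. "
        "Cite sources like [1]. If the sources do not contain the answer, say so.\n\n"
        + "\n".join(context_lines)
        + f"\n\nQuestion: {question}\nAnswer:"
    )


# Hypothetical question and passage, for illustration only
prompt = build_grounded_prompt(
    "How many remote days are allowed per week?",
    [("remote_policy.md", "Employees may work remotely up to 3 days per week.")],
)
print(prompt)
```

The instruction to admit when the sources lack an answer is a design choice aimed at discouraging the model from falling back on its training data.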

Why This Matters

Standard LLMs generate text from their training data alone, so questions about internal or recently changed company policies can produce confident but incorrect ‘hallucinations’. Fine-tuning updates model weights and suits changes to style and behavior, but it is expensive, has to be repeated when the facts change, and cannot easily cite sources. RAG addresses this by treating the model as a reasoning engine over an external knowledge base that can be updated instantly, without retraining.
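
A toy illustration of the "instantly updatable" point, assuming simple word-overlap scoring as a stand-in for embeddings (the KeywordIndex class and the travel-policy text are hypothetical): overwriting a document changes what the next query retrieves, with no model retraining.

```python
import re
from typing import Dict, List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    # Lowercase word tokens; a crude stand-in for an embedding model
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class KeywordIndex:
    """Hypothetical toy index: ranks documents by word overlap with the query."""

    def __init__(self) -> None:
        self.docs: Dict[str, str] = {}

    def upsert(self, name: str, text: str) -> None:
        # Adding or replacing a document is visible to the very next query
        self.docs[name] = text

    def search(self, query: str, k: int = 1) -> List[Tuple[str, str]]:
        q = tokenize(query)
        ranked = sorted(
            self.docs.items(),
            key=lambda item: len(q & tokenize(item[1])),
            reverse=True,
        )
        # Keep only the top k documents that share at least one word with the query
        return [(name, text) for name, text in ranked[:k] if q & tokenize(text)]


index = KeywordIndex()
index.upsert("travel_policy", "Hotel stays are capped at 150 USD per night.")
print(index.search("hotel cap per night"))

# The policy changes: overwrite the document instead of retraining a model
index.upsert("travel_policy", "Hotel stays are capped at 200 USD per night.")
print(index.search("hotel cap per night"))
```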

Key Insights

  • RAG vs Fine-Tuning: Use fine-tuning for behavior/style changes and RAG for factual knowledge and frequently changing data.
  • Chunking Strategy: Paragraph-aware chunking generally preserves semantic units better than fixed-size splitting; recommended chunk sizes are roughly 300-600 characters.
  • Vector Indexing: Indexing splits documents into chunks, converts each chunk into an embedding (e.g., with all-MiniLM-L6-v2), and stores the vectors in a vector database such as ChromaDB.
  • Evaluation Frameworks: Production RAG quality can be measured with frameworks such as RAGAS, which automatically scores faithfulness, answer relevancy, and context precision.
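
For intuition about context precision, here is a simplified, non-LLM sketch. It assumes each retrieved chunk already carries a binary relevance label; RAGAS itself obtains such judgments with an LLM, so treat this as an approximation of the metric's idea rather than the library's implementation. The score averages precision@k over the ranks that hold relevant chunks, which rewards retrievers that place relevant chunks first.

```python
from typing import List


def context_precision(relevance: List[int]) -> float:
    """Rank-aware precision of retrieved chunks (simplified sketch).

    relevance[i] is 1 if the chunk at rank i+1 is relevant, else 0.
    Averages precision@k over the ranks k that hold a relevant chunk.
    """
    hits = 0
    total = 0.0
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / k  # precision@k at a relevant position
    return total / hits if hits else 0.0


print(context_precision([1, 1, 0]))  # relevant chunks ranked first
print(context_precision([0, 1, 1]))  # same number of relevant chunks, ranked lower
```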

Working Examples

Sentence-aware chunking that packs whole sentences into chunks of at most max_chunk_size characters, so no sentence is cut mid-way.

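import re
from typing import List

def chunk_by_sentences(text: str, max_chunk_size: int = 500) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_chunk_size characters.

    A single sentence longer than max_chunk_size becomes its own (oversized) chunk.
    """
    # Split after ., ! or ? followed by whitespace, keeping the punctuation
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    chunks = []
    current = ""
    for sentence in sentences:
        # Count the joining space when current already holds text
        added = len(sentence) + (1 if current else 0)
        if len(current) + added <= max_chunk_size:
            current = current + " " + sentence if current else sentence
        else:
            # Current chunk is full: flush it and start a new one
            if current:
                chunks.append(current.strip())
            current = sentence
    if current:
        chunks.append(current.strip())
    return chunks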
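
The key insights favor paragraph-aware chunking, which the sentence-based function does not show. Here is a minimal sketch, assuming paragraphs are separated by blank lines; the function name chunk_by_paragraphs and the sample text are hypothetical.

```python
import re
from typing import List


def chunk_by_paragraphs(text: str, max_chunk_size: int = 500) -> List[str]:
    """Pack whole paragraphs (separated by blank lines) into chunks.

    Paragraphs are never split, so one longer than max_chunk_size
    becomes its own oversized chunk.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: List[str] = []
    current = ""
    for para in paragraphs:
        # +2 for the blank-line separator kept between packed paragraphs
        extra = len(para) + (2 if current else 0)
        if current and len(current) + extra > max_chunk_size:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


# Hypothetical sample document
doc = "Refunds are issued within 14 days.\n\nShipping is free over 50 USD.\n\nReturns need a receipt."
chunks = chunk_by_paragraphs(doc, max_chunk_size=70)
print(chunks)
```

Like chunk_by_sentences, it never cuts a unit in half, so an unusually long paragraph becomes its own oversized chunk; a production version might fall back to sentence splitting in that case.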

Implementing a full RAG pipeline using LangChain abstractions.

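from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.chains import RetrievalQA
from langchain_community.llms import HuggingFacePipeline
from langchain_core.documents import Document
from transformers import pipeline as hf_pipeline

# Assumption: knowledge_base is a dict mapping document name -> document text
docs = [Document(page_content=content, metadata={'source': name}) for name, content in knowledge_base.items()]

# Try paragraph breaks first, then lines, sentences, words, and finally characters
splitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=50, separators=['\n\n', '\n', '. ', ' ', ''])
chunks = splitter.split_documents(docs)

# Embed each chunk and index it in an in-memory Chroma collection
embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')
vectorstore = Chroma.from_documents(chunks, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={'k': 3})  # top-3 chunks per query

# Local seq2seq generator; 'stuff' places all retrieved chunks into a single prompt
gen_pipe = hf_pipeline('text2text-generation', model='google/flan-t5-base', max_new_tokens=200)
llm = HuggingFacePipeline(pipeline=gen_pipe)
qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever, chain_type='stuff', return_source_documents=True)

# Hypothetical question; the returned source documents support citations
result = qa_chain.invoke({'query': 'What is the refund window?'})
print(result['result'])
for doc in result['source_documents']:
    print(doc.metadata['source'])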
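
Conceptually, a retriever configured with k=3 ranks stored chunk embeddings by similarity to the query embedding and keeps the top three. The brute-force cosine-similarity sketch below assumes tiny hand-written 3-dimensional vectors and hypothetical chunk ids (all-MiniLM-L6-v2 actually produces 384-dimensional embeddings); Chroma's configurable distance metric and approximate nearest-neighbor index will behave differently at scale.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero length
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k chunk ids whose vectors are most similar to the query."""
    scored = [(chunk_id, cosine(query_vec, vec)) for chunk_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-d vectors standing in for real embeddings
index = {
    "refund_policy#0": [0.9, 0.1, 0.0],
    "refund_policy#1": [0.7, 0.3, 0.1],
    "shipping#0": [0.1, 0.9, 0.2],
    "holidays#0": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.0]
for chunk_id, score in top_k(query, index, k=3):
    print(chunk_id, round(score, 3))
```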

