AI Agents from Scratch Part 4: Human-in-the-Loop Validation (Research Report Generator)

Previously in This Series

We’ve built tools and memory. Our agent can search the web, extract content, and remember what it’s done.

But here’s the scary part: it runs autonomously.

What if it searches for the wrong things? What if it extracts incorrect facts? What if the final report misrepresents the sources? Today, we add the guardrails.

The Series:

Understanding the ReAct Pattern
Building the Tool System
State Management & Memory Architecture
Human-in-the-Loop Validation (You are here)
The Agent Core & Loop
Complete Agent & Best Practices

Why Autonomous Agents Are Dangerous

Without oversight, agents can:

Go down rabbit holes — Wasting time and tokens on irrelevant tangents
Compound errors — A wrong assumption early ruins everything downstream
Lose user trust — If users can’t see or influence the process, they won’t rely on it

The solution: Human-in-the-Loop (HITL) checkpoints.

Checkpoint Comparison

Without human checkpoints, autonomous agents risk going off-track with no opportunity for course correction. When a user requests research on AI in healthcare, an unchecked agent might misinterpret the request and spend valuable time researching the wrong topic, delivering a lengthy report on an unrelated subject like AI in gaming. This wastes computational resources and frustrates users who have no visibility or control over the process.

With checkpoints placed at key points in the workflow, users keep oversight of the critical decisions. When the agent proposes its research plan, the user can catch a misunderstanding immediately and correct it. This human-in-the-loop validation helps keep the agent aligned with user intent, stops mistakes from compounding, and builds trust through transparency and control.

Where to Insert Checkpoints

Too few checkpoints = loss of control.
Too many checkpoints = tedious interruptions.

The sweet spots:

Checkpoint	Why It Matters
After planning	Before expensive operations, validate the approach
Source selection	Which articles are worth reading in full?
Fact verification	Remove incorrect or biased information
Outline approval	Ensure the structure matches expectations
Draft review	Final chance to request changes
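
One way to make the "too few vs. too many" trade-off adjustable is to name the checkpoints and let the caller choose which ones pause for input. The sketch below is an assumption and not part of this series' code: `Checkpoint`, the two presets, and `should_pause` are hypothetical names used only for illustration.

```python
from enum import Enum


class Checkpoint(Enum):
    """The five review points from the table above (names are illustrative)."""
    PLAN = "plan"
    SOURCES = "sources"
    FACTS = "facts"
    OUTLINE = "outline"
    DRAFT = "draft"


# Hypothetical presets: a cautious user reviews everything,
# a hurried user reviews only the plan and the final draft.
ALL_CHECKPOINTS = frozenset(Checkpoint)
MINIMAL_CHECKPOINTS = frozenset({Checkpoint.PLAN, Checkpoint.DRAFT})


def should_pause(checkpoint: Checkpoint, enabled: frozenset) -> bool:
    """Return True if the agent should stop and ask the user at this point."""
    return checkpoint in enabled
```

With a preset like `MINIMAL_CHECKPOINTS`, the agent would still stop before expensive work and before finalizing, and skip the prompts in between.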

Building the Checkpoint System

We’ll use the rich library for beautiful terminal output:

# human_loop.py
from rich.console import Console
from rich.panel import Panel
from rich.prompt import Prompt, Confirm
from rich.table import Table
from rich.markdown import Markdown
from typing import Optional

console = Console()

class HumanCheckpoint:
    """
    Handles human-in-the-loop interactions.
    Every major decision goes through here.
    """

Checkpoint 1: Plan Approval

After the agent creates a research plan, users review it:

@staticmethod
def approve_plan(questions: list[str], queries: list[str]) -> tuple[list[str], list[str], bool]:
    """
    Show research plan and get user approval/modification.
    Returns (modified_questions, modified_queries, approved).
    """
    console.print(Panel.fit(
        "[bold cyan]🔬 RESEARCH PLAN[/bold cyan]\n\n"
        "The agent has created a research plan. Please review:",
        title="Checkpoint 1: Planning"
    ))

    # Display questions
    console.print("\n[bold]Research Questions:[/bold]")
    for i, q in enumerate(questions, 1):
        console.print(f"  {i}. {q}")

    # Display queries
    console.print("\n[bold]Search Queries:[/bold]")
    for i, q in enumerate(queries, 1):
        console.print(f"  {i}. {q}")

    console.print()

    # User decision
    choice = Prompt.ask(
        "What would you like to do?",
        choices=["approve", "modify", "add", "reject"],
        default="approve"
    )

    if choice == "approve":
        return questions, queries, True

    elif choice == "modify":
        console.print("\n[dim]Enter new questions (one per line, empty line to finish):[/dim]")
        new_questions = []
        while True:
            q = Prompt.ask("Question", default="")
            if not q:
                break
            new_questions.append(q)

        console.print("\n[dim]Enter new search queries (one per line, empty line to finish):[/dim]")
        new_queries = []
        while True:
            q = Prompt.ask("Query", default="")
            if not q:
                break
            new_queries.append(q)

        return (
            new_questions if new_questions else questions,
            new_queries if new_queries else queries,
            True
        )

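    elif choice == "add":
        # Append to copies so the caller's original lists are not mutated in place
        questions, queries = list(questions), list(queries)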
        console.print("\n[dim]Add additional questions:[/dim]")
        while True:
            q = Prompt.ask("Question (empty to stop)", default="")
            if not q:
                break
            questions.append(q)

        console.print("\n[dim]Add additional queries:[/dim]")
        while True:
            q = Prompt.ask("Query (empty to stop)", default="")
            if not q:
                break
            queries.append(q)

        return questions, queries, True

    else:  # reject
        return [], [], False

Users can:

✅ Approve — Proceed with the plan
✏️ Modify — Replace questions/queries entirely
➕ Add — Append to existing lists
❌ Reject — Abort and restart

Checkpoint 2: Source Selection

Not all search results are worth reading. Let users choose:

@staticmethod
def select_sources(sources: list[dict]) -> list[dict]:
    """Show search results and let user select which to fetch."""
    console.print(Panel.fit(
        "[bold cyan]📚 SOURCE SELECTION[/bold cyan]\n\n"
        "These sources were found. Select which ones to read in detail:",
        title="Checkpoint 2: Source Selection"
    ))

    # Build a nice table
    table = Table(show_header=True, header_style="bold")
    table.add_column("#", width=4)
    table.add_column("Title", width=40)
    table.add_column("URL", width=30)
    table.add_column("Snippet", width=50)

    for i, source in enumerate(sources, 1):
        table.add_row(
            str(i),
            source.get("title", "")[:40],
            source.get("url", "")[:30],
            source.get("snippet", "")[:50]
        )

    console.print(table)

    selection = Prompt.ask(
        "\nEnter source numbers to fetch (comma-separated, or 'all')",
        default="all"
    )

    if selection.lower() == "all":
        return sources

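    try:
        # Convert 1-based input to 0-based indices; blank entries (e.g. a trailing comma) are skipped
        indices = [int(x.strip()) - 1 for x in selection.split(",") if x.strip()]
    except ValueError:
        console.print("[yellow]Invalid selection, using all sources[/yellow]")
        return sources

    # Out-of-range numbers are ignored rather than treated as errors
    return [sources[i] for i in indices if 0 <= i < len(sources)]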

This is huge for efficiency. Instead of fetching 20 pages, users might select 5 that look most relevant.
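
The comma-separated parsing is the part most likely to hit odd input, so it can help to pull it into a pure function that is testable without a terminal. This is a minimal sketch under that assumption; `parse_selection` is a hypothetical helper, not something defined in `human_loop.py`. Unlike the method above, it also drops duplicate numbers.

```python
def parse_selection(text: str, count: int) -> list[int]:
    """
    Turn a string like "1, 3, 3, 9" into zero-based indices.
    Duplicates and out-of-range numbers are dropped; order is preserved.
    Raises ValueError if any entry is not an integer.
    """
    indices: list[int] = []
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate trailing commas such as "1,2,"
        index = int(part) - 1  # user input is 1-based
        if 0 <= index < count and index not in indices:
            indices.append(index)
    return indices
```

A caller could then write `[sources[i] for i in parse_selection(selection, len(sources))]` and handle `ValueError` in one place.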

Checkpoint 3: Fact Verification

The agent extracts facts from sources. But LLMs hallucinate. Users verify:

@staticmethod
def review_facts(facts: list[dict]) -> list[dict]:
    """Show extracted facts for review. User can remove incorrect ones."""
    console.print(Panel.fit(
        "[bold cyan]✓ FACT REVIEW[/bold cyan]\n\n"
        "Review the extracted facts. Remove any that are incorrect:",
        title="Checkpoint 3: Fact Verification"
    ))

    for i, fact in enumerate(facts, 1):
        console.print(f"\n[bold]Fact {i}:[/bold]")
        console.print(f"  {fact['fact']}")
        console.print(f"  [dim]Source: {fact['source_url']}[/dim]")

    console.print()
    remove = Prompt.ask(
        "Enter fact numbers to REMOVE (comma-separated, or 'none')",
        default="none"
    )

    if remove.lower() == "none":
        return facts

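    try:
        # Convert 1-based input to a set of 0-based indices
        remove_indices = {int(x.strip()) - 1 for x in remove.split(",")}
        return [f for i, f in enumerate(facts) if i not in remove_indices]
    except ValueError:
        # Tell the user nothing was removed instead of failing silently
        console.print("[yellow]Invalid input, keeping all facts[/yellow]")
        return facts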

This is critical for trust. Users see exactly what goes into the report and can remove anything suspicious.

Checkpoint 4: Outline Approval

Before writing, confirm the structure:

@staticmethod
def approve_outline(outline: list[str]) -> tuple[list[str], bool]:
    """Show report outline for approval."""
    console.print(Panel.fit(
        "[bold cyan]📝 REPORT OUTLINE[/bold cyan]\n\n"
        "Review the proposed structure:",
        title="Checkpoint 4: Outline Review"
    ))

    for i, section in enumerate(outline, 1):
        console.print(f"  {i}. {section}")

    choice = Prompt.ask(
        "\nApprove this outline?",
        choices=["yes", "modify", "no"],
        default="yes"
    )

    if choice == "yes":
        return outline, True
    elif choice == "modify":
        console.print("\n[dim]Enter new outline (one section per line, empty to finish):[/dim]")
        new_outline = []
        while True:
            section = Prompt.ask("Section", default="")
            if not section:
                break
            new_outline.append(section)
        return (new_outline if new_outline else outline), True
    else:
        return outline, False

Checkpoint 5: Draft Review

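The most interactive checkpoint. Each call collects one decision plus optional feedback, and the caller invokes it again after every rewrite, so users can revise as many times as they need: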

@staticmethod
def review_draft(draft: str) -> tuple[str, str]:
    """Show draft and get feedback. Returns (feedback, action)."""
    console.print(Panel.fit(
        "[bold cyan]📄 DRAFT REVIEW[/bold cyan]",
        title="Checkpoint 5: Draft Review"
    ))

    # Render markdown
    console.print(Markdown(draft))

    console.print()
    action = Prompt.ask(
        "What would you like to do?",
        choices=["approve", "revise", "expand", "shorten", "restart"],
        default="approve"
    )

    feedback = ""
    if action in ["revise", "expand", "shorten"]:
        feedback = Prompt.ask("Provide specific feedback")

    return feedback, action

Actions available:

✅ Approve — Done, finalize
✏️ Revise — Make specific changes
➕ Expand — Add more detail to sections
➖ Shorten — Condense the content
🔄 Restart — Scrap and redo
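
Since `review_draft` returns a single decision, the repetition has to live in the caller. The loop below is a sketch of how that might look, not the series' implementation: `draft_review_loop`, the `rewrite` callback, and the `max_rounds` cap are all assumptions. The `review` argument is expected to behave like `HumanCheckpoint.review_draft` and return `(feedback, action)`.

```python
from typing import Callable


def draft_review_loop(
    draft: str,
    review: Callable[[str], tuple[str, str]],
    rewrite: Callable[[str, str, str], str],
    max_rounds: int = 5,
) -> tuple[str, str]:
    """
    Alternate review and rewrite until the user approves or restarts.
    Returns (draft, action); action is "max_rounds" if the cap was hit
    before the user made a final decision.
    """
    for _ in range(max_rounds):
        feedback, action = review(draft)
        if action in ("approve", "restart"):
            return draft, action
        # "revise", "expand" and "shorten" each produce a new draft to review
        draft = rewrite(draft, action, feedback)
    return draft, "max_rounds"
```

Passing the two callbacks in keeps the loop independent of both the terminal UI and the LLM, which makes it easy to exercise with scripted responses. The cap is a guard against an endless revise cycle; whether that is desirable depends on how much each rewrite costs.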

Utility Methods

Some helper methods for consistent UI:

@staticmethod
def get_user_input(prompt: str, default: str = "") -> str:
    """Generic user input with nice formatting."""
    return Prompt.ask(f"[bold]{prompt}[/bold]", default=default)

@staticmethod
def confirm(message: str) -> bool:
    """Yes/no confirmation."""
    return Confirm.ask(message)

@staticmethod
def show_progress(phase: str, message: str):
    """Show progress indicator."""
    console.print(f"[bold blue]⚙ {phase}:[/bold blue] {message}")

@staticmethod
def show_error(message: str):
    """Show error message."""
    console.print(f"[bold red]✗ Error:[/bold red] {message}")

@staticmethod
def show_success(message: str):
    """Show success message."""
    console.print(f"[bold green]✓[/bold green] {message}")

The Flow with Checkpoints

Here’s how the research agent now flows:

Agent Flow with Checkpoints

The complete research workflow integrates five strategic checkpoints that keep humans in control throughout the entire process. Starting with user input defining the research topic, the agent enters the planning phase where the LLM generates research questions and search queries. Checkpoint 1 allows users to approve or modify this plan before any expensive operations begin.

Once approved, the searching phase executes the queries, followed by Checkpoint 2 where users select which sources are worth reading in detail. This prevents wasting resources on irrelevant pages. The agent then fetches the selected pages and moves to the synthesizing phase, extracting key facts from the content.

Checkpoint 3 provides fact verification, letting users remove incorrect or biased information before it enters the report. The writing phase begins with outline creation, validated at Checkpoint 4 so the structure matches expectations. After drafting the report, Checkpoint 5 enables iterative review where users can request revisions, expansions, or a shorter version until satisfied. Finally, the agent saves the approved report, completing a workflow in which a human can catch errors at each critical decision point.
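
The flow described above can be sketched as a single function. Part 5 builds the real agent loop, so treat this only as an outline of the ordering: every method on `agent` (`plan`, `search`, `fetch`, `extract_facts`, `make_outline`, `write`, `revise`) is a hypothetical placeholder, and `checkpoints` is assumed to expose the five methods of `HumanCheckpoint`.

```python
from typing import Any, Optional


def run_research(topic: str, agent: Any, checkpoints: Any) -> Optional[str]:
    """Run the five-checkpoint flow once; return the report, or None if the user aborts."""
    # Planning -> Checkpoint 1 (agent.* methods are hypothetical placeholders)
    questions, queries = agent.plan(topic)
    questions, queries, approved = checkpoints.approve_plan(questions, queries)
    if not approved:
        return None

    # Searching -> Checkpoint 2
    sources = checkpoints.select_sources(agent.search(queries))

    # Fetching and synthesizing -> Checkpoint 3
    pages = agent.fetch(sources)
    facts = checkpoints.review_facts(agent.extract_facts(pages, questions))

    # Outlining -> Checkpoint 4
    outline, approved = checkpoints.approve_outline(agent.make_outline(questions, facts))
    if not approved:
        return None

    # Drafting -> Checkpoint 5, repeated until the user approves or restarts
    draft = agent.write(outline, facts)
    while True:
        feedback, action = checkpoints.review_draft(draft)
        if action == "approve":
            return draft
        if action == "restart":
            return None
        draft = agent.revise(draft, action, feedback)
```

Returning `None` on rejection leaves the decision to re-plan or stop with the caller, which is one reasonable reading of "abort and restart".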

What’s Coming Next

We have all the pieces:

✅ Tools (Part 2)
✅ State & Memory (Part 3)
✅ Human Checkpoints (Part 4)

In Part 5, we build the Agent Core—the actual loop that ties everything together:

The system prompt that guides the LLM
Tool execution and result handling
The ReAct loop in code
Phase handlers that orchestrate the workflow

This is where it all comes together into a working agent.

Key Takeaways

Autonomous agents are risky — They can go off-track with no way to correct
Strategic checkpoints — After planning, before expensive operations, when finalizing
Give users options — Approve, modify, add, reject
Show your work — Display exactly what the agent found/extracted
Enable iteration — Draft review should loop until satisfied

Ready to build the brain? Continue to Part 5: The Agent Core →
