How I Built an AI System That Writes Full-Length Books

What if you could describe a book you wanted, like “a technical deep-dive into distributed systems for senior engineers” or “a dark fantasy novel about mercenaries and ancient gods,” and have an AI system actually write it? Not a summary. Not an outline. A complete, coherent, publication-ready book.

That’s what I built. You can see the generated books here. And the interesting part isn’t that it works; it’s how it works.

The Core Insight: Specialists Beat Generalists

Early experiments with a single prompt (“write me a book about X”) failed spectacularly. The output was shallow, repetitive, and lost coherence after a few thousand words. The model would forget earlier decisions, contradict itself, and produce that unmistakable AI slop we’ve all learned to recognize.

The breakthrough came from treating book writing like a publishing house, not a one-person operation. Most published books aren't produced by one person doing everything. They're the work of specialists: an author who writes, a researcher who gathers materials, an editor who ensures quality, and a project manager who keeps everything on track.

So I built five specialized agents:

The Architect designs the book’s structure: chapters, sections, style rules, and what each part must accomplish
The Librarian researches each section, gathering facts from both internal knowledge and internet searches
The Writer transforms research into prose
The Editor reviews every draft for quality, hallucination, and consistency
The Polisher applies final copy-editing before publication

Each agent has a narrow job, specific instructions, and structured outputs. The Librarian doesn’t write. The Writer doesn’t research. The Editor doesn’t edit; it only judges.

Structuring a Book as a Dependency Graph

Books have structure. Chapter 3 might depend on concepts introduced in Chapter 2. A fantasy novel's climax depends on character development from earlier chapters. A technical book's advanced section assumes readers have completed the fundamentals.

I modeled this as a directed acyclic graph (DAG). Each node represents a chapter, section, or subsection. Nodes have explicit prerequisites: other nodes that must be completed first.

CH1 ──────────────────▶ CH2 ──────────────────▶ CH3
 │                       │                       │
 ├──▶ CH1-S1             ├──▶ CH2-S1             ├──▶ CH3-S1
 │                       │
 └──▶ CH1-S2             └──▶ CH2-S2
       │                       │
       └──▶ CH1-S2-SUB1        └──▶ CH2-S2-SUB1
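
One way to turn a graph like this into a generation order is a topological sort. The sketch below is illustrative rather than the system's actual code: it uses Python's standard-library graphlib, and the node IDs and the `prerequisites` mapping are assumptions modeled on the diagram, with each section treated as depending on its parent.

```python
from graphlib import TopologicalSorter

# Hypothetical node IDs mirroring the diagram; each node maps to the set of
# nodes that must be completed before it.
prerequisites = {
    "CH1": set(),
    "CH1-S1": {"CH1"},
    "CH1-S2": {"CH1"},
    "CH1-S2-SUB1": {"CH1-S2"},
    "CH2": {"CH1"},
    "CH2-S1": {"CH2"},
    "CH2-S2": {"CH2"},
    "CH2-S2-SUB1": {"CH2-S2"},
    "CH3": {"CH2"},
    "CH3-S1": {"CH3"},
}


def generation_order(graph):
    """Return node IDs so that every prerequisite precedes its dependents."""
    # static_order() raises graphlib.CycleError if the graph contains a cycle.
    return list(TopologicalSorter(graph).static_order())


order = generation_order(prerequisites)
```

Any order that respects the prerequisites is valid, so nodes with no path between them (CH1-S1 and CH2, for example) could in principle be generated in parallel.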

The Architect generates this structure upfront, specifying for each node:

A precise goal (“reader implements LRU and LFU caches, understands eviction trade-offs”)
Specific requirements (“Observer pattern with 3 examples, sequence diagram”)
Boundary constraints (“Skip performance optimization, that’s Chapter 5”)
Research queries if external data is needed

This exhaustive specification is crucial. Vague goals like “explain caching” produce vague content. Precise goals produce precise content.
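
A node specification of this kind could be modeled roughly as follows. This is a minimal sketch under stated assumptions: the `NodeSpec` class, its field names, and the word-count heuristic for spotting vague goals are all hypothetical, chosen only to illustrate the four items listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Hypothetical per-node specification produced by the Architect."""
    node_id: str
    goal: str
    requirements: tuple[str, ...] = ()
    boundaries: tuple[str, ...] = ()
    research_queries: tuple[str, ...] = ()
    prerequisites: tuple[str, ...] = ()

    def problems(self, min_goal_words: int = 5) -> list[str]:
        """Return reasons this spec is too vague to generate from."""
        found = []
        # Assumed heuristic: a goal of only a few words is probably vague.
        if len(self.goal.split()) < min_goal_words:
            found.append(f"goal is too vague: {self.goal!r}")
        if not self.requirements:
            found.append("no specific requirements listed")
        return found


vague = NodeSpec(node_id="CH4-S1", goal="explain caching")
precise = NodeSpec(
    node_id="CH4-S1",
    goal="reader implements LRU and LFU caches, understands eviction trade-offs",
    requirements=("3 code examples",),
    boundaries=("Skip performance optimization, that's Chapter 5",),
    prerequisites=("CH3",),
)
```

Here `vague.problems()` reports two issues while `precise.problems()` reports none. A real check would need something smarter than a word count, but rejecting weak specifications before any text is generated is cheaper than discovering them at review time.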

The Generation Loop

For each node in the graph, the system runs a tight loop:

Research → Write → Review → (Pass or Retry)

The Librarian gathers facts, definitions, code examples, and citations. It combines internal LLM knowledge with optional internet search, producing a dense “context packet” of verified information.

The Writer transforms this packet into prose, following strict rules: use only the provided facts, cite external sources with numbered references, never invent content.

The Editor scores the draft on two dimensions: goal fulfillment (did it accomplish what the node specified?) and style compliance (does it follow the book’s voice, avoid forbidden phrases, format citations correctly?). It outputs one of three verdicts: PASS, PASS_WITH_EDITS, or FAIL.

Failed drafts trigger a retry. The Editor’s feedback tells the Librarian where to research deeper or tells the Writer what to fix. After three failed attempts, the system keeps the best draft and moves on, flagging it for human review.
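
The control flow of that loop can be sketched in a few lines. Everything here is an assumption for illustration: the three agents are passed in as plain callables, the `Review` and `NodeResult` types are hypothetical, "best draft" is taken to mean the highest-scoring one, and PASS_WITH_EDITS is treated as a passing verdict.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Review:
    verdict: str  # "PASS", "PASS_WITH_EDITS", or "FAIL"
    score: float  # assumed single quality score used to rank drafts
    feedback: str = ""


@dataclass
class NodeResult:
    draft: str
    review: Review
    attempts: int
    needs_human_review: bool


def run_node(
    research: Callable[[str], str],
    write: Callable[[str, str], str],
    review: Callable[[str, str], Review],
    max_attempts: int = 3,
) -> NodeResult:
    """Research -> Write -> Review, retrying with the Editor's feedback."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    best = None
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        packet = research(feedback)     # Librarian builds the context packet
        draft = write(packet, feedback)  # Writer turns the packet into prose
        result = review(draft, packet)   # Editor judges the draft
        if best is None or result.score > best[1].score:
            best = (draft, result)
        if result.verdict in ("PASS", "PASS_WITH_EDITS"):
            return NodeResult(draft, result, attempt, needs_human_review=False)
        feedback = result.feedback
    # Every attempt failed: keep the best draft and flag it for a human.
    draft, result = best
    return NodeResult(draft, result, max_attempts, needs_human_review=True)
```

Passing the agents in as callables keeps the loop testable with stubs, without any model calls.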

Preventing Hallucination

This is where many AI writing systems fail. Without guardrails, models confidently generate fake citations, fabricated statistics, and invented quotes.

I implemented four layers of hallucination prevention:

Layer 1: Source Validation. The Librarian can only cite URLs that actually appeared in search results. If it claims a fact comes from a webpage, that webpage must exist in the provided search data.

Layer 2: Writer Constraints. The Writer receives a context packet and must write only from that material. No reaching into its training data for convenient facts.

Layer 3: Editor Verification. The Editor cross-checks every factual claim against the context packet. Citation indices must match actual sources. Claims without backing get flagged.

Layer 4: Programmatic Override. Even when the Editor LLM says “PASS,” the code can override. Missing required materials? Automatic FAIL. Style score below threshold? Downgrade to PASS_WITH_EDITS.
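
The Layer 4 override is plain code and is easy to sketch. The function name, its parameters, and the 0.8 default threshold below are assumptions; only the two rules (missing materials force FAIL, a low style score downgrades PASS) come from the description above.

```python
def final_verdict(
    llm_verdict: str,
    style_score: float,
    required_materials,
    provided_materials,
    style_threshold: float = 0.8,  # assumed value; the real threshold is not stated
) -> str:
    """Apply deterministic rules on top of the Editor LLM's verdict."""
    missing = set(required_materials) - set(provided_materials)
    if missing:
        # Required materials are absent: fail regardless of what the LLM said.
        return "FAIL"
    if llm_verdict == "PASS" and style_score < style_threshold:
        # The LLM was too lenient on style: downgrade rather than accept.
        return "PASS_WITH_EDITS"
    return llm_verdict
```

The override only ever makes a verdict stricter, so a FAIL from the Editor is never upgraded.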

This layered approach catches most fabrications. The system isn't perfect (no AI system is), but it's vastly better than naive generation.
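
The citation checks in Layers 1 and 3 are similarly mechanical. The sketch below assumes citations appear in drafts as bracketed numbers such as [1], numbered from 1; that format and both function names are assumptions, not details confirmed by the system.

```python
import re

# Assumed citation format: a bracketed integer such as [2].
CITATION = re.compile(r"\[(\d+)\]")


def unknown_urls(cited_urls, search_result_urls):
    """Layer 1: return cited URLs that never appeared in the search results."""
    allowed = set(search_result_urls)
    return [url for url in cited_urls if url not in allowed]


def dangling_citations(draft: str, source_count: int):
    """Layer 3: return citation indices that point at no actual source."""
    indices = {int(match) for match in CITATION.findall(draft)}
    return sorted(i for i in indices if not 1 <= i <= source_count)
```

A non-empty result from either function is the kind of signal that could feed an automatic FAIL.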

Shared State for Coherence

A 60,000-word book needs consistency. Character names can’t change. Technical terminology must stay stable. Facts established in Chapter 1 can’t be contradicted in Chapter 8.

Three mechanisms maintain coherence:

Truth Table: As the Editor approves content, it extracts atomic facts like “the protagonist is 34 years old” or “the API uses JWT authentication.” These become immutable constraints for later nodes.

Terminology Registry: Technical terms and character names get locked in with their definitions. Later agents must use these exact terms.

Shared Materials: Code examples, interfaces, and citations from earlier chapters become available for later chapters to reference or extend.

When writing Chapter 5, the system has access to every verified fact, term, and code block from Chapters 1-4. This context grows with the book, maintaining coherence across tens of thousands of words.
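
A minimal sketch of the first two mechanisms, assuming facts and terms are stored as key-value pairs; the `SharedState` class, the dotted fact keys, and the prompt layout are all hypothetical. The point it illustrates is that an approved fact or term can be repeated but never silently changed.

```python
from dataclasses import dataclass, field


class ConflictError(ValueError):
    """Raised when new content contradicts something already locked in."""


def _lock(table: dict, key: str, value: str, kind: str) -> None:
    existing = table.get(key)
    if existing is not None and existing != value:
        raise ConflictError(f"{kind} {key!r} is already {existing!r}; got {value!r}")
    table[key] = value


@dataclass
class SharedState:
    truths: dict[str, str] = field(default_factory=dict)  # truth table
    terms: dict[str, str] = field(default_factory=dict)   # terminology registry

    def record_truth(self, key: str, value: str) -> None:
        _lock(self.truths, key, value, "fact")

    def define_term(self, term: str, definition: str) -> None:
        _lock(self.terms, term, definition, "term")

    def context_for_prompt(self) -> str:
        """Render the accumulated state as text for a later node's prompt."""
        lines = ["Established facts:"]
        lines += [f"- {k}: {v}" for k, v in sorted(self.truths.items())]
        lines.append("Locked terminology:")
        lines += [f"- {k}: {v}" for k, v in sorted(self.terms.items())]
        return "\n".join(lines)
```

For example, after `record_truth("protagonist.age", "34")`, a later attempt to record the age as "35" raises `ConflictError` instead of overwriting the earlier fact.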

Genre Flexibility

The same pipeline works for radically different books. The key is the style guide.

For a technical book on distributed systems:

Voice: rigorous and prescriptive
All code in Java 21+
Citations required for external claims
Forbidden phrases include “basically” and “it’s important to note”

For a fantasy novel:

Voice: immersive and evocative
Third-person limited POV, present tense
No citations (it’s fiction)
Internet search disabled

For a history book:

Voice: analytical narrative
Chicago-style citations required
Primary sources quoted with attribution
Temporal accuracy verified against the truth table

The Architect generates appropriate style guides based on the input theme. Downstream agents enforce these rules strictly.
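
A style guide like the ones above could be represented as data, with the simplest rules enforced in code. This is a sketch under assumptions: the `StyleGuide` fields and the substring-based phrase check are hypothetical, and a real check would likely need word-boundary matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StyleGuide:
    """Hypothetical style guide; field names are illustrative."""
    voice: str
    forbidden_phrases: tuple[str, ...] = ()
    citations_required: bool = False
    web_search_enabled: bool = True


def forbidden_hits(text: str, guide: StyleGuide) -> list[str]:
    """Return the forbidden phrases that occur in the text, ignoring case."""
    # Normalize curly apostrophes so "it’s" and "it's" compare equal.
    normalized = text.replace("\u2019", "'").lower()
    return [p for p in guide.forbidden_phrases if p.lower() in normalized]


technical = StyleGuide(
    voice="rigorous and prescriptive",
    forbidden_phrases=("basically", "it's important to note"),
    citations_required=True,
)
fantasy = StyleGuide(
    voice="immersive and evocative",
    citations_required=False,
    web_search_enabled=False,
)
```

With this representation, `forbidden_hits("Basically, a quorum is a majority.", technical)` returns `["basically"]`, which is the kind of finding that could lower a style score.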

Research Integration

For non-fiction, accurate research matters. The system integrates web search with semantic reranking.

The flow:

Execute search queries against a search API
Fetch and extract content from result pages
Chunk long documents and embed them
Semantically rerank chunks against the original query
Return the most relevant snippets to the Librarian

PDFs and documents get converted to markdown. Low-relevance results get filtered. Duplicate content gets removed.

The Librarian receives cleaned, ranked snippets and synthesizes them with internal knowledge. It distinguishes between internally known facts (no citation needed) and externally verified facts (citation required).
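
The chunk, embed, rerank, filter, and deduplicate steps can be sketched end to end. To keep the example dependency-free, `embed` below is a stand-in that builds a bag-of-words count vector; a real system would call an embedding model there. The function names, chunk size, and score threshold are all assumptions.

```python
import math
import re
from collections import Counter


def chunk(text: str, max_words: int = 50) -> list[str]:
    """Split a document into pieces of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rerank(query: str, documents, top_k: int = 3, min_score: float = 0.1) -> list[str]:
    """Return the top_k chunks most similar to the query."""
    query_vec = embed(query)
    seen = set()
    scored = []
    for doc in documents:
        for piece in chunk(doc):
            key = " ".join(piece.lower().split())
            if key in seen:  # drop exact duplicates (case-insensitive)
                continue
            seen.add(key)
            score = cosine(query_vec, embed(piece))
            if score >= min_score:  # filter low-relevance chunks
                scored.append((score, piece))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [piece for _, piece in scored[:top_k]]
```

Swapping `embed` for a real embedding model changes the quality of the ranking but not the shape of the pipeline.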

Resumable Generation

A medium-length book might take hours to generate. Interruptions happen. The system supports resumption through deterministic run IDs.

When you request a book, the system hashes your input parameters into a unique identifier. If a partial generation exists with that ID, it picks up where it left off. The same inputs always produce the same run ID, so repeating a request resumes the existing run instead of starting a new one.

Book state (completed nodes, truth tables, terminology) persists to disk after each successful node. Power failure at Chapter 7? Restart and continue from Chapter 7.
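
Deterministic run IDs and checkpointing might look roughly like this. It is a sketch under assumptions: the hash function (SHA-256 over canonical JSON), the 16-character ID, the one-file-per-run layout, and the state keys are all hypothetical choices.

```python
import hashlib
import json
from pathlib import Path


def run_id(params: dict) -> str:
    """Hash input parameters into a stable ID, independent of key order."""
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def save_state(directory, rid: str, state: dict) -> Path:
    """Persist book state after a successful node."""
    path = Path(directory) / f"{rid}.json"
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(state), encoding="utf-8")
    # Write to a temp file, then rename, so a crash mid-write
    # cannot leave a half-written checkpoint behind.
    tmp.replace(path)
    return path


def load_state(directory, rid: str) -> dict:
    """Load an existing run's state, or start fresh if none exists."""
    path = Path(directory) / f"{rid}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"completed_nodes": [], "truths": {}, "terms": {}}
```

On restart, the caller would compute the same `run_id`, call `load_state`, and skip every node already listed in `completed_nodes`.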

What I Learned

Structured outputs change everything. Getting JSON instead of freeform text from LLMs (via Pydantic models) makes the pipeline tractable. You can validate, transform, and route structured data. You can’t do that with paragraphs.
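
To make "validate" concrete, here is a small sketch of parsing an Editor response. The system described here uses Pydantic models for this; to stay dependency-free, the example does the same kind of validation by hand with the standard library. The `EditorReview` fields and the 0-to-1 score range are assumptions.

```python
import json
from dataclasses import dataclass

VERDICTS = {"PASS", "PASS_WITH_EDITS", "FAIL"}


@dataclass(frozen=True)
class EditorReview:
    """Hypothetical shape of the Editor's structured output."""
    verdict: str
    goal_score: float
    style_score: float
    feedback: str


def parse_review(raw: str) -> EditorReview:
    """Parse and validate the Editor's JSON; raise ValueError if malformed."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    verdict = data.get("verdict")
    if verdict not in VERDICTS:
        raise ValueError(f"unknown verdict: {verdict!r}")
    scores = {}
    for name in ("goal_score", "style_score"):
        value = data.get(name)
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"{name} must be a number")
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be between 0 and 1")
        scores[name] = float(value)
    return EditorReview(
        verdict, scores["goal_score"], scores["style_score"], str(data.get("feedback", ""))
    )
```

A response that fails to parse can be retried or rejected outright, which is much harder to do reliably with a freeform paragraph.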

Explicit specifications beat implicit understanding. Telling an LLM “write a good chapter” fails. Telling it “the reader must be able to implement an LRU cache after reading this, you must include 3 code examples, and you must not discuss performance optimization” succeeds.

Multi-agent systems need hard constraints. LLM-based agents are unreliable. They skip steps, hallucinate confidence, and drift from instructions. Programmatic validation catches what they miss.

State management is the hard part. Generating one good paragraph is easy. Maintaining consistency across 50,000 words is hard. The truth table, terminology registry, and shared materials are what make this work.

Trade-offs and Limitations

This isn’t magic. The system has real constraints:

Quality ceiling: Output quality depends on the underlying models. Better models produce better books.
Cost: A full book run uses significant token volume. This isn’t cheap.
Human review still needed: The system catches many errors but not all. Final human review remains essential.
Creative limits: Fiction produced this way is competent but rarely surprising. Human creativity still wins for genuine artistry.

The sweet spot is technical and educational content: books where accuracy, structure, and completeness matter more than lyrical prose. Handbooks, tutorials, historical surveys, and reference guides all work well.

The Bigger Picture

This project convinced me that useful AI systems are less about raw model capability and more about architecture. A single LLM call, no matter how good the model, can’t produce a coherent book. But orchestrate multiple specialized calls with shared state, validation layers, and iterative refinement? Now you’re getting somewhere.

The same pattern (specialized agents with structured communication, explicit state management, and programmatic guardrails) applies beyond book writing. It’s a template for any complex generation task where coherence and accuracy matter.

The models will keep getting better. The architecture patterns that make them useful? Those we need to figure out now.
