Why Event Sourcing

[Figure: CRUD vs Event Sourcing comparison]

CRUD systems store only the current state. Every UPDATE destroys the previous value. Event sourcing stores the sequence of state transitions, preserving the complete history. The diagram above captures the fundamental difference: destructive mutation versus append-only accumulation.

A customer places an order. Payment succeeds. Inventory is reserved. Then the customer calls support and says the order total is wrong. The support agent opens the admin panel. The order shows $47.98. The customer says it should be $42.98 because a promotion was applied. The database has one row in the orders table with total = 47.98. There is no record of the promotion. There is no record of the price calculation. There is no record of when the total changed or why.

The support agent escalates. An engineer queries the database. The orders table has 14 columns. Every UPDATE overwrites the previous value. The engineer checks the audit log. The audit log captures that a row was modified but not what the previous state was, because the audit trigger was written to capture row identity, not row content. The promotion service has no log of the calculation because it returns a response and discards the computation.

This is a $5 discrepancy. The investigation costs the company two hours of engineering time and erodes the customer’s trust. Multiply this by the share of daily orders that end in a dispute, and the cost becomes structural.

Event sourcing solves this problem by never overwriting state. Every state change is recorded as an immutable event. The current state of any entity is derived by replaying its events in order. The $5 discrepancy investigation becomes trivial: read the event stream for the order, find the PromotionApplied event (or its absence), and the answer is immediate.
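To make the replay idea concrete, here is a minimal sketch of deriving an order's total from its event stream. The event types, field names, and the ReplaySketch class are assumptions made for illustration, not the model developed in later chapters. It relies on Java 21 pattern matching for switch.

```java
import java.math.BigDecimal;
import java.util.List;

final class ReplaySketch {
    // Hypothetical event types; the names and fields are illustrative assumptions.
    sealed interface OrderEvent permits OrderPlaced, PromotionApplied {}
    record OrderPlaced(String orderId, BigDecimal total) implements OrderEvent {}
    record PromotionApplied(String orderId, String code, BigDecimal discount) implements OrderEvent {}

    // Current state is never stored; it is computed from the events.
    record OrderState(BigDecimal total, boolean promotionApplied) {
        OrderState apply(OrderEvent event) {
            return switch (event) {
                case OrderPlaced e -> new OrderState(e.total(), promotionApplied);
                case PromotionApplied e -> new OrderState(total.subtract(e.discount()), true);
            };
        }
    }

    // Replaying is a left fold over the stream, in the order the events were recorded.
    static OrderState replay(List<OrderEvent> stream) {
        OrderState state = new OrderState(BigDecimal.ZERO, false);
        for (OrderEvent event : stream) {
            state = state.apply(event);
        }
        return state;
    }
}
```

Replaying a stream that contains OrderPlaced with a total of 47.98 followed by PromotionApplied with a discount of 5.00 yields a total of 42.98. Replaying the same stream without the promotion event yields 47.98 and a promotionApplied flag of false, which is exactly the answer the support agent needed.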

But event sourcing introduces its own costs. Storage grows continuously because nothing is deleted. Read queries become complex because the current state must be derived from events. Schema changes require careful migration strategies because old events persist forever. Projections, the read models built from events, require dedicated engineering effort to maintain.

This chapter establishes the problems that justify event sourcing, introduces the domain that runs through every chapter, states the opinions that drive every design decision, and draws the line between justified complexity and accidental complexity.

What CRUD Systems Cannot Express

A CRUD system stores current state. An UPDATE statement replaces the previous value. This design decision, made at the storage layer, propagates upward into every aspect of the system.

Lost history. When a customer changes their shipping address after order placement but before fulfilment, a CRUD system overwrites the address. If the fulfilment team shipped to the old address because they read it before the update, there is no record of what address they saw. The database contains only the current address. Reconstructing what happened requires correlating application logs, fulfilment service logs, and timestamps, assuming those logs exist and are queryable.

Inability to replay state. A finance team needs to know the state of all orders at the end of last quarter for reconciliation. A CRUD system cannot answer this question because the orders table contains only the current state. Some teams solve this with a separate reporting database that captures snapshots, but that database is a parallel system with its own consistency problems.
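With an event stream, the quarter-end question becomes a replay that stops at a cutoff. The sketch below assumes a hypothetical StatusChanged event carrying the time it was recorded, and a stream already ordered by that time. Both are illustrative assumptions.

```java
import java.time.Instant;
import java.util.List;

final class TemporalQuerySketch {
    // Hypothetical minimal event: a status transition stamped with its recording time.
    record StatusChanged(String orderId, String newStatus, Instant recordedAt) {}

    // Returns the status the order had at the cutoff by replaying only the events
    // recorded at or before it. Assumes the stream is ordered by recordedAt.
    static String statusAsOf(List<StatusChanged> stream, Instant cutoff) {
        String status = "NONE"; // placeholder for "no events yet"
        for (StatusChanged event : stream) {
            if (event.recordedAt().isAfter(cutoff)) {
                break;
            }
            status = event.newStatus();
        }
        return status;
    }
}
```

An order placed on 1 March and fulfilled on 2 April reports PLACED when queried as of 31 March, regardless of what its latest state is today.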

Race conditions on concurrent updates. Two support agents modify the same order simultaneously. One applies a discount. The other changes the shipping method. With a CRUD system, the last write wins unless the application implements optimistic locking. With optimistic locking, one agent’s change is rejected and must be retried. Neither agent sees a complete picture of the order’s history to understand why the conflict occurred.

Coupling between write and read concerns. The orders table serves the customer-facing API, the admin panel, the fulfilment dashboard, and the analytics pipeline. Each consumer needs different columns, different joins, different aggregations. The table’s schema is a compromise that serves none of them well. Adding a column for analytics affects the write path’s performance. Indexing for the admin panel’s search queries slows down order placement.

These are not theoretical problems. They are the daily reality of any system that manages entities with complex lifecycles.

The E-Commerce Order Management Platform

Every chapter in this book uses the same domain: an e-commerce order management platform with five bounded contexts that surface every event sourcing challenge naturally.

Order Management. The core aggregate. An order is placed, confirmed, modified, fulfilled, and potentially refunded. The order lifecycle has at least eight distinct states and multiple transition paths. An order can be partially fulfilled. An order can be partially refunded. These partial states are where CRUD systems struggle and event sourcing provides clarity.

Payment Processing. Payments are authorized, captured, partially refunded, and disputed. A payment is associated with an order but has its own lifecycle. A chargeback on a payment triggers a compensating action on the order. This cross-aggregate coordination is where sagas become necessary.

Inventory Reservation. When an order is placed, inventory is reserved. When an order is cancelled, the reservation is released. When inventory is physically depleted below the reserved quantity (damaged goods, miscounts), the system must reconcile. This bounded context surfaces the tension between strong consistency (reserve atomically with order placement) and eventual consistency (reserve asynchronously and handle failures).

Fulfilment Dispatch. Once payment is captured and inventory is confirmed, a fulfilment request is created. Fulfilment involves picking, packing, shipping, and delivery confirmation. Each step generates events. A fulfilment failure after partial shipment requires compensating events on the order.

Refund Handling. Refunds are requested, approved, processed, and completed. A refund can be full or partial. A partial refund against a partially fulfilled order requires calculating which items were fulfilled, which were returned, and what the correct refund amount is. This calculation depends on the full history of the order, not just its current state.
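As a rough illustration of why the refund amount depends on history, the sketch below folds over a simplified stream. The event types and the refund rule (value of returned items minus refunds already completed) are assumptions chosen for the example. The real rules are developed later in the book.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class RefundSketch {
    // Hypothetical events; names, fields, and the refund rule are illustrative assumptions.
    sealed interface Event permits ItemFulfilled, ItemReturned, RefundCompleted {}
    record ItemFulfilled(String sku, int quantity, long unitPriceCents) implements Event {}
    record ItemReturned(String sku, int quantity) implements Event {}
    record RefundCompleted(long amountCents) implements Event {}

    // Refundable amount = value of returned items at the price they were fulfilled at,
    // minus whatever has already been refunded. None of this is visible in a single
    // "current state" row; it has to be accumulated from the history.
    static long refundableCents(List<Event> history) {
        Map<String, Long> unitPriceBySku = new HashMap<>();
        long returnedValue = 0;
        long alreadyRefunded = 0;
        for (Event event : history) {
            switch (event) {
                case ItemFulfilled e -> unitPriceBySku.put(e.sku(), e.unitPriceCents());
                case ItemReturned e -> returnedValue += e.quantity() * unitPriceBySku.getOrDefault(e.sku(), 0L);
                case RefundCompleted e -> alreadyRefunded += e.amountCents();
            }
        }
        return Math.max(0, returnedValue - alreadyRefunded);
    }
}
```

For two units fulfilled at 1999 cents each, one unit returned, and a 500-cent refund already completed, the remaining refundable amount is 1499 cents.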

Each bounded context has its own aggregate, its own event stream, and its own read models. Cross-context coordination happens through domain events published to Kafka via the outbox pattern. The domain is rich enough to surface every challenge in the book: multi-step workflows, multiple projection targets, event schema evolution, and storage growth.

The Four Opinions

This book has a point of view and defends it consistently.

Build It from Scratch Before Using a Framework

Every core component in this book (the event store, the aggregate repository, the projection engine, and the saga coordinator) is implemented in plain Java 21 and PostgreSQL before any framework is introduced. This is not an academic exercise. It is the difference between understanding a system and configuring one.

An engineer who has written an optimistic concurrency check against an event stream, handled the ConcurrentModificationException, and reasoned about the retry logic understands immediately why Axon Framework’s @Aggregate annotation exists and what it is hiding. The annotation manages aggregate loading, event application, concurrency control, and snapshotting. That is a lot of hidden behavior. An engineer who starts with the annotation and encounters a concurrency failure in production has no mental model for diagnosing the problem.
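The optimistic concurrency check mentioned here can be sketched in a few lines. This is an in-memory stand-in, not the PostgreSQL implementation built later. The class name, the StreamVersionConflict exception, and the use of plain strings as events are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class OptimisticAppendSketch {
    // Hypothetical exception type for this sketch; the book's own exception may differ.
    static final class StreamVersionConflict extends RuntimeException {
        StreamVersionConflict(String streamId, long expected, long actual) {
            super("stream " + streamId + ": expected version " + expected + " but found " + actual);
        }
    }

    private final Map<String, List<String>> streams = new HashMap<>();

    // Appends only if the stream is still at the version the caller loaded.
    // Returns the new stream version.
    synchronized long append(String streamId, long expectedVersion, String event) {
        List<String> stream = streams.computeIfAbsent(streamId, id -> new ArrayList<>());
        long actualVersion = stream.size();
        if (actualVersion != expectedVersion) {
            throw new StreamVersionConflict(streamId, expectedVersion, actualVersion);
        }
        stream.add(event);
        return stream.size();
    }

    // Returns an immutable copy of the stream, oldest event first.
    synchronized List<String> read(String streamId) {
        return List.copyOf(streams.getOrDefault(streamId, List.of()));
    }
}
```

If two callers both load the stream at version 1, the first append succeeds and moves it to version 2. The second is rejected and must reload, re-validate its command against the new events, and retry with the new expected version. That reload-and-retry loop is part of what a framework annotation handles on the caller's behalf.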

Axon Framework is referenced throughout this book as a production alternative. It is good software. It solves real problems. But it is referenced after the internals are established, not as the starting point.

PostgreSQL Is the Event Store

EventStoreDB is a dedicated database designed for event sourcing. It provides built-in stream management, subscriptions, projections, and optimistic concurrency. It is a fine piece of engineering.

For the majority of Java applications, a PostgreSQL table with an append-only constraint, a stream identifier, a sequence number, and a JSONB payload column is a correct, production-capable event store. The team already knows PostgreSQL. The operations team already monitors PostgreSQL. The backup strategy already covers PostgreSQL. The connection pooling is already configured.
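One possible shape for such a table is sketched below as SQL held in Java constants, the way a migration or repository class might carry it. The table name, column names, and the recorded_at column are assumptions. The append-only rule itself is not enforced by this DDL and would need something extra, for example revoking UPDATE and DELETE privileges or adding a trigger.

```java
final class EventTableSketch {
    // Hypothetical schema; names are illustrative. The composite primary key rejects a
    // second event at the same position in a stream, which is what an optimistic
    // concurrency check can build on.
    static final String CREATE_EVENTS_TABLE = """
        CREATE TABLE events (
            stream_id       UUID        NOT NULL,
            sequence_number BIGINT      NOT NULL,
            event_type      TEXT        NOT NULL,
            payload         JSONB       NOT NULL,
            recorded_at     TIMESTAMPTZ NOT NULL DEFAULT now(),
            PRIMARY KEY (stream_id, sequence_number)
        )
        """;

    // Parameterized append, intended for a JDBC PreparedStatement.
    static final String APPEND_EVENT = """
        INSERT INTO events (stream_id, sequence_number, event_type, payload)
        VALUES (?, ?, ?, CAST(? AS jsonb))
        """;
}
```

Two writers that both try to insert sequence number 3 for the same stream cannot both succeed: the second insert violates the primary key, and the application can treat that violation as a concurrency conflict.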

The threshold for introducing EventStoreDB is specific: when your event throughput exceeds what a single PostgreSQL instance can handle for appends (typically above 50,000 events per second sustained), or when you need built-in cross-service event subscriptions with catch-up semantics that Kafka’s consumer group model does not match. Below those thresholds, a dedicated event store adds operational complexity without proportional benefit.

CQRS Without Event Sourcing Is Valid

CQRS, separating the write model from the read model, is independently valuable. A system that writes to a normalized relational schema and projects to denormalized read models via change data capture is a CQRS system. It does not require event sourcing.

Event sourcing adds three capabilities beyond CQRS: a complete audit trail, the ability to replay state at any point in time, and the ability to build new read models from historical events without backfilling from the write model. If your system does not need any of these three capabilities, event sourcing is accidental complexity. CQRS with a traditional database and change data capture is the simpler, correct choice.
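The third capability is the least obvious, so here is a minimal sketch of it. The OrderPlaced event and both read models are hypothetical. The point is only that a read model invented after the fact can be populated from events recorded before it existed.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ProjectionSketch {
    // Hypothetical event; field names are illustrative assumptions.
    record OrderPlaced(String orderId, String customerId, long totalCents) {}

    // Read model that existed from day one: total per order.
    static Map<String, Long> totalsByOrder(List<OrderPlaced> history) {
        Map<String, Long> totals = new HashMap<>();
        for (OrderPlaced event : history) {
            totals.put(event.orderId(), event.totalCents());
        }
        return totals;
    }

    // Read model added later: lifetime spend per customer. It is built by replaying
    // the same history, with no backfill from a write-side table.
    static Map<String, Long> spendByCustomer(List<OrderPlaced> history) {
        Map<String, Long> spend = new HashMap<>();
        for (OrderPlaced event : history) {
            spend.merge(event.customerId(), event.totalCents(), Long::sum);
        }
        return spend;
    }
}
```

With CQRS over change data capture, the second read model could only see changes captured from the moment it was deployed. Earlier data would have to be backfilled from whatever the write model still holds.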

This book states this plainly here and returns to it in Chapter 16. Every chapter that introduces event sourcing complexity acknowledges the simpler CQRS-only alternative and the specific condition that justifies the additional complexity.

The Outbox Pattern Is Non-Negotiable

Publishing events to Kafka after writing them to the event store involves two systems: PostgreSQL and Kafka. Writing to both in sequence is a dual write. If the PostgreSQL transaction commits and the Kafka publish fails, the event store and the message broker are inconsistent. If the Kafka publish succeeds and the PostgreSQL transaction rolls back, consumers process an event that never happened.

The transactional outbox solves this by writing events to an outbox table within the same PostgreSQL transaction that writes to the event store. A separate process reads the outbox table and publishes to Kafka. Because a failed or unacknowledged publish is simply retried, delivery is at-least-once, and the retry is safe only if consumers are idempotent. The outbox table and the event store are always consistent because they share a transaction.
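The moving parts can be sketched in memory before any database or broker is involved. In the sketch below, two lists stand in for the event store and outbox tables, a synchronized method stands in for the shared PostgreSQL transaction, and a Consumer stands in for the Kafka publish. All names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

final class OutboxSketch {
    record Event(String eventId, String type) {}

    private final List<Event> eventStore = new ArrayList<>();
    private final List<Event> outbox = new ArrayList<>();

    // Stands in for one database transaction: the event row and the outbox row
    // are written together.
    synchronized void appendInTransaction(Event event) {
        eventStore.add(event);
        outbox.add(event);
    }

    // Relay: publish the oldest pending row and delete it only after the publish
    // call returns. A failure leaves the row in place, so the next run retries it.
    // Returns how many rows were published and removed in this run.
    synchronized int relay(Consumer<Event> publisher) {
        int published = 0;
        while (!outbox.isEmpty()) {
            Event next = outbox.get(0);
            try {
                publisher.accept(next);
            } catch (RuntimeException publishFailed) {
                break; // keep the row; retry on the next run
            }
            outbox.remove(0);
            published++;
        }
        return published;
    }

    // Idempotent consumer: remembers event ids it has handled and ignores repeats.
    static final class IdempotentConsumer implements Consumer<Event> {
        private final Set<String> seenIds = new HashSet<>();
        final List<Event> handled = new ArrayList<>();

        @Override
        public void accept(Event event) {
            if (seenIds.add(event.eventId())) {
                handled.add(event);
            }
        }
    }
}
```

If the broker accepts a message but the acknowledgement is lost, the relay cannot tell the difference from a failed publish and sends the event again. The consumer's id check turns that duplicate into a no-op, which is why idempotent consumers are part of the pattern rather than an optional extra.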

This is not one option among many. It is the only correct approach for bridging the event store and an external message broker. Chapter 10 implements it in full.

The Cost of Event Sourcing

Event sourcing is not free. Every chapter that introduces a capability also introduces a cost. Here are the costs stated upfront.

Storage growth. Events are never deleted (in the general case). An order that goes through placement, confirmation, payment, fulfilment, and delivery generates at least five events. An order that is modified, partially fulfilled, and partially refunded generates twenty or more. Multiply by order volume and time, and the event store grows continuously. Chapter 13 covers retention, archiving, and the storage planning that production systems require.

Eventual consistency. The read model is always behind the write model by the time it takes to process an event and update the projection. For most use cases, this delay is milliseconds. For a projection that fell behind during a traffic spike, this delay can be hours. Chapters 7 through 9 cover projection engineering in depth.

Schema evolution complexity. An event stored today must be readable in five years. The OrderPlaced event written with version 1 of the schema must be processable by version 5 of the application. This requires careful schema design, version tracking, and upcasting chains. Chapter 12 covers this as a first-class engineering problem.
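An upcasting chain can be pictured as a list of small functions, each lifting a payload by one schema version. The sketch below uses plain maps as payloads. The field names and the two schema changes (a rename and an added field with a default) are invented for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

final class UpcastingSketch {
    record Versioned(int version, Map<String, Object> payload) {}

    // UPCASTERS.get(i) lifts a payload from version i + 1 to version i + 2.
    // Both schema changes are hypothetical.
    static final List<UnaryOperator<Map<String, Object>>> UPCASTERS = List.of(
        v1 -> { // v1 -> v2: "amount" was renamed to "totalCents"
            Map<String, Object> v2 = new HashMap<>(v1);
            v2.put("totalCents", v2.remove("amount"));
            return v2;
        },
        v2 -> { // v2 -> v3: "currency" was added; old events get a default
            Map<String, Object> v3 = new HashMap<>(v2);
            v3.putIfAbsent("currency", "USD");
            return v3;
        });

    // Applies every upcaster from the stored version up to the latest one.
    static Versioned upcastToLatest(Versioned stored) {
        Map<String, Object> payload = stored.payload();
        for (int version = stored.version(); version <= UPCASTERS.size(); version++) {
            payload = UPCASTERS.get(version - 1).apply(payload);
        }
        return new Versioned(UPCASTERS.size() + 1, payload);
    }
}
```

The stored event is never rewritten. A version 1 payload passes through both steps on every read, a version 2 payload passes through only the second, and a version 3 payload is returned unchanged.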

Projection maintenance. Every read model is a separate piece of infrastructure that must be built, deployed, monitored, and rebuilt when the projection logic changes. A CRUD system has one source of truth. An event-sourced system has one write model and N read models, each with its own consistency characteristics and failure modes.

Debugging complexity. When a projection shows incorrect data, the debugging path is: inspect the projection state, inspect the events that fed the projection, inspect the projection logic, and determine whether the bug is in the event production, the projection logic, or a schema mismatch. This is more complex than debugging a CRUD system where the data and the logic are co-located.

If these costs do not buy you something specific, such as audit trails, temporal queries, or event-driven integration, they are pure overhead.

What This Book Is Not

This book is not an Axon Framework tutorial. Axon is referenced as a production alternative after the internals are established. The goal is understanding, not framework proficiency.

This book is not a DDD theory textbook. Domain-driven design concepts like aggregates, bounded contexts, and domain events are used throughout, but they are applied to a specific implementation, not discussed as abstract patterns.

This book is not a Kafka operations guide. Kafka configuration is shown where it intersects with event publishing, but Kafka cluster management, partition strategies, and consumer group coordination beyond what event sourcing requires are out of scope.

This book is not a reactive programming manual. The implementations use blocking I/O with Spring MVC, not WebFlux. Reactive event processing is a valid approach, but it adds a dimension of complexity orthogonal to event sourcing. One new paradigm per book.

The reader already builds Java services with Spring Boot and PostgreSQL. This book shows what changes when state is derived from events instead of overwritten in place, builds every component from scratch to establish genuine understanding, and delivers a production-grade implementation using the tools the reader already knows.

The Road Ahead

Part I establishes the conceptual foundation: what events are, what an event store is, and why the write side and read side separate. Part II builds the write side from scratch: aggregates, the event store, and snapshotting. Part III builds the read side: projections, multiple read model targets, and projection rebuilding. Part IV connects the system to the outside world: Kafka, sagas, and schema evolution. Part V covers production operations: storage, debugging, and observability. Part VI makes the hard decisions: when not to use event sourcing and how to migrate incrementally.

Every chapter follows the same structure: the problem, the mechanism, the from-scratch implementation, what the implementation reveals, the production path with Spring Boot, and the test. No section is skipped. No section is reordered.