Skip to main content
data systems from the ground up

Polyglot Persistence Without the Polyglot Disaster

5 min read Chapter 21 of 36

Polyglot Persistence Without the Polyglot Disaster

The Black Box

The logistics platform uses PostgreSQL, Kafka, ClickHouse, and Redis. The team added each store for a legitimate reason. Now they maintain four different backup strategies, four monitoring dashboards, four upgrade procedures, and four sets of access controls. A new team member spends two weeks learning the data architecture before writing a feature. The operational cost of four databases is not four times one database. It is closer to eight times, because the interactions between stores create their own failure modes.

The Mechanism

Every additional database introduces five categories of operational cost:

  1. Deployment. Each database needs its own container, configuration, health check, and upgrade path. A PostgreSQL upgrade requires downtime planning. A Kafka upgrade requires rolling restarts with ISR awareness. A ClickHouse upgrade requires testing with production data volumes. These cannot be combined.

  2. Monitoring. PostgreSQL needs pg_stat_activity, pg_stat_replication, and autovacuum monitoring. Kafka needs consumer lag, ISR health, and partition balance monitoring. ClickHouse needs merge status and query queue monitoring. Redis needs memory usage and eviction rate monitoring. Four separate alerting configurations.

  3. Backup and recovery. PostgreSQL uses pg_basebackup and WAL archiving. Kafka relies on replication (and optionally MirrorMaker). ClickHouse uses clickhouse-backup. Redis uses RDB snapshots and AOF. Four recovery procedures to test quarterly.

  4. Security. Four separate authentication systems, TLS configurations, and access control policies. A security audit covers four attack surfaces instead of one.

  5. Consistency. Data flows between stores. When the flow breaks, stores disagree. Debugging cross-store inconsistencies requires understanding all four stores’ consistency models.

Why Dual Writes Fail

The tempting approach: when a package status changes, write to PostgreSQL AND Redis AND Kafka in the application code.

// BLACK BOX: dual write pattern (DO NOT USE)
void updateStatus(String packageId, String status) {
    // Write 1: PostgreSQL
    jdbc.update("UPDATE packages SET status = ? WHERE package_id = ?",
        status, packageId);

    // Write 2: Redis
    redis.set("pkg:status:" + packageId, status);

    // Write 3: Kafka
    producer.send(new ProducerRecord<>("package-events", packageId, status));
}
// If Write 1 succeeds and Write 2 fails:
//   PostgreSQL has the new status. Redis has the old status. Kafka never gets the event.
//   No transaction spans all three systems.
//   Wrapping in a try-catch and rolling back PostgreSQL on Redis failure
//   does not help: the Kafka write may have already succeeded.

Dual writes fail because there is no distributed transaction across PostgreSQL, Redis, and Kafka. Any write can fail independently, leaving the stores inconsistent. Retry logic helps but does not guarantee consistency: a crash between Write 1 and Write 2 is unrecoverable without a separate coordination mechanism.

The Single Source of Truth Pattern

// MECHANISM: single source of truth with derived stores
void updateStatus(String packageId, String status) {
    // Write to ONE store: PostgreSQL (the source of truth)
    jdbc.update("UPDATE packages SET status = ? WHERE package_id = ?",
        status, packageId);
    // That is it. The application does not write to Redis or Kafka.
}

// Debezium (covered in CH9) captures the PostgreSQL WAL change
// and publishes it to Kafka automatically.
// A Kafka consumer updates Redis.
// A Kafka consumer writes to ClickHouse.
// All derived stores are eventually consistent with PostgreSQL.
// If any derived store falls behind, it catches up from the Kafka topic.
// If a derived store is lost entirely, it is rebuilt from Kafka or PostgreSQL.

The application writes to one place. Everything else is derived. If the derivation pipeline fails, the source of truth is intact, and the pipeline can be replayed.

The Observable Consequence

Operational cost comparison:

MetricSingle PostgreSQLPostgreSQL + Kafka + ClickHouse + Redis
Deployment configs14
Monitoring dashboards14
Backup procedures14
Cross-store consistency issues/month02-5
Time to diagnose a data inconsistencyN/A2-8 hours
Time for new engineer to understand data layer1 week3 weeks

The numbers argue for minimalism. Every database added must justify its operational cost through a measurable improvement in a metric that matters: query latency, write throughput, or cost savings that exceed the operational overhead.

The Decision Rule

Add a new database only when the current architecture has a measurable bottleneck that the new store directly addresses. “It might be useful” is not sufficient. “The analytics query takes 4 minutes and causes transactional latency spikes, and a column store reduces it to 1.2 seconds” is sufficient.

When you do add a database, use CDC (Chapter 9) to flow data from the source of truth, not dual writes. Accept eventual consistency for derived stores. Define and measure the staleness budget: how many seconds of lag between the source and derived store is acceptable.

Consider consolidation when a derived store is underutilized. If ClickHouse is queried 48 times per day and costs $500/month to operate, evaluate whether PostgreSQL with a materialized view (refreshed nightly) can serve those 48 queries at acceptable latency. If it can, remove ClickHouse.