Skip to main content
surviving the spike

Data Replication and Consistency Across Regions

7 min read Chapter 62 of 66

Data Replication and Consistency Across Regions

The Symptom

The EU-West region is live. DNS geo-routing sends European riders to eu-west. Latency dropped from 520ms to 85ms. The product team is celebrating. Then a rider in Berlin updates their payment method. They open the app in London two hours later. The old payment method is showing. They update it again. Fly to New York. Old payment method.

The support team files three tickets in one week: “User profile changes not persisting.” The changes are persisting. They are persisting in the region where the user made them. The other regions have not received the update yet because the replication subscription was paused by a schema mismatch nobody noticed for 9 hours.

Multi-region is live. Multi-region consistency is not.

The Cause

PostgreSQL logical replication is asynchronous. Changes on the publisher (US-East) are streamed to subscribers (EU-West) with a delay that ranges from sub-second under normal conditions to minutes or hours when things go wrong. “Things going wrong” includes: schema changes applied to the publisher but not the subscriber, network interruptions, subscriber falling behind on a large batch insert, or the subscriber’s max_worker_processes being too low.

The ride-hailing platform has four categories of data, each with different consistency requirements:

Data Type         Scope      Consistency Need     Replication Strategy
Trip data         Regional   Strong in-region     No replication (stays local)
Driver location   Regional   Eventual, < 1s       No replication (regional Redis)
User profiles     Global     Eventual, < 5s       PG logical replication
Fare config       Global     Eventual, < 60s      PG logical replication
Surge zones       Global     Eventual, < 60s      PG logical replication
Payment methods   Global     Eventual, < 5s       PG logical replication

Trip data is regional. A trip from Berlin Mitte to Kreuzberg exists only in EU-West. A trip from Manhattan to JFK exists only in US-East. Replicating trip data globally wastes bandwidth and creates unnecessary conflict potential.

Driver location is regional and ephemeral. A driver in Berlin is irrelevant to a rider in New York. Each region’s Redis holds only its drivers.

User profiles, fare configuration, and surge zone definitions are global. A rider who creates an account in Berlin and travels to New York needs their profile in both regions.

The Baseline

Initial replication setup: the publication exists, the subscription exists, but monitoring does not.

-- BOTTLENECK: Replication with no monitoring
-- On US-East (publisher)
CREATE PUBLICATION ride_platform_pub
    FOR TABLE user_profiles, fare_config, surge_zones,
              payment_methods
    WITH (publish = 'insert, update, delete');

-- On EU-West (subscriber)
CREATE SUBSCRIPTION ride_platform_sub
    CONNECTION 'host=pg-us-east.internal port=5432
                dbname=rides user=replicator'
    PUBLICATION ride_platform_pub;

-- No lag monitoring
-- No alerting
-- No conflict resolution strategy
-- Schema changes break the subscription silently

When the subscription breaks, the subscriber stops receiving updates. No error in the application logs. No alert. The data drifts until a user reports it.

The Fix

Replication Lag Monitoring

-- SCALED: Monitor replication lag on the publisher
-- This query runs on US-East and reports lag per subscriber
SELECT
    client_addr,
    application_name,
    state,
    sent_lsn,
    write_lsn,
    flush_lsn,
    replay_lsn,
    pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replay_lag_bytes,
    write_lag,
    flush_lag,
    replay_lag
FROM pg_stat_replication;

Prometheus Exporter for Replication Lag

# SCALED: postgres_exporter query for replication lag
pg_replication:
  query: |
    SELECT
      application_name,
      EXTRACT(EPOCH FROM replay_lag) AS replay_lag_seconds,
      pg_wal_lsn_diff(sent_lsn, replay_lsn) AS lag_bytes
    FROM pg_stat_replication
  metrics:
    - application_name:
        usage: "LABEL"
    - replay_lag_seconds:
        usage: "GAUGE"
        description: "Replication lag in seconds"
    - lag_bytes:
        usage: "GAUGE"
        description: "Replication lag in bytes"

Alert at 5 Seconds

# SCALED: Prometheus alert for replication lag
groups:
  - name: replication
    rules:
      - alert: ReplicationLagHigh
        expr: >
          pg_replication_replay_lag_seconds{
            application_name="ride_platform_sub"
          } > 5
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: >
            Replication lag to
            {{ $labels.application_name }}
            is {{ $value }}s
          runbook: >
            Check subscriber status with
            SELECT * FROM pg_stat_subscription;

      - alert: ReplicationLagCritical
        expr: >
          pg_replication_replay_lag_seconds{
            application_name="ride_platform_sub"
          } > 30
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: >
            Replication lag to
            {{ $labels.application_name }}
            exceeds 30s. Possible subscription failure.

Data Partitioning: Regional vs Global

// SCALED: Repository layer that routes writes correctly
@Repository
public class TripRepository {

    private final R2dbcEntityTemplate localDb;
    private final RegionRouter regionRouter;

    // Trips write to the LOCAL region's database only.
    // A trip in Berlin writes to EU-West PG.
    // A trip in NYC writes to US-East PG.
    // No cross-region replication for trip data.
    public Mono<Trip> save(Trip trip) {
        trip.setRegion(regionRouter.getCurrentRegion());
        return localDb.insert(trip);
    }

    // Trip queries are region-scoped.
    // A rider in Berlin sees only their EU trips.
    // If they need US trip history, a cross-region
    // API call fetches it on demand (rare).
    public Flux<Trip> findByRiderId(String riderId) {
        return localDb.select(Trip.class)
            .matching(query(where("riderId").is(riderId)
                .and("region").is(
                    regionRouter.getCurrentRegion())))
            .all();
    }
}
// SCALED: Global data writes to US-East primary,
// replicates to EU-West via PG logical replication
@Repository
public class UserProfileRepository {

    private final R2dbcEntityTemplate primaryDb;
    private final R2dbcEntityTemplate localDb;
    private final RegionRouter regionRouter;
    private final KafkaTemplate<String, ProfileUpdateEvent>
        kafkaTemplate;

    public Mono<UserProfile> save(UserProfile profile) {
        if (regionRouter.isPrimary()) {
            // US-East: write directly, replication
            // handles EU-West
            return primaryDb.update(profile);
        } else {
            // EU-West: write to local DB
            // (which accepts writes for profiles)
            // AND publish event for cache invalidation
            return localDb.update(profile)
                .doOnSuccess(p -> kafkaTemplate.send(
                    "profile-updates",
                    p.getUserId(),
                    new ProfileUpdateEvent(
                        p.getUserId(),
                        regionRouter.getCurrentRegion(),
                        Instant.now())));
        }
    }
}

Conflict Resolution: Last-Write-Wins

-- SCALED: Conflict resolution for payment methods
-- PG logical replication does not handle conflicts
-- automatically. The application must.

-- Add a last_modified column for conflict resolution
ALTER TABLE payment_methods
    ADD COLUMN last_modified TIMESTAMPTZ
    DEFAULT now();

-- Application-level conflict resolution:
-- When a profile update arrives via replication AND
-- a local update exists, the later timestamp wins.
// SCALED: Last-write-wins conflict resolution
@KafkaListener(
    topics = "profile-updates",
    groupId = "${app.region}-profile-sync")
public class ProfileConflictResolver {

    private final UserProfileRepository profileRepo;
    private final ReactiveRedisTemplate<String, String> redis;

    @KafkaHandler
    public void onProfileUpdate(ProfileUpdateEvent event) {
        // Check if local version is newer
        profileRepo.findById(event.getUserId())
            .flatMap(local -> {
                if (local.getLastModified()
                        .isAfter(event.getTimestamp())) {
                    // Local is newer, ignore remote update
                    return Mono.empty();
                }
                // Remote is newer, invalidate cache
                // Next read fetches from replicated PG
                return redis.delete(
                    "profile:" + event.getUserId());
            })
            .subscribe();
    }
}

Redis: Regional Instances, Never Replicated

// SCALED: Driver location is purely regional
@Service
public class DriverLocationService {

    private final ReactiveRedisTemplate<String, String>
        regionalRedis;

    // Driver locations are stored in the regional Redis
    // only. A driver in Berlin exists in EU-West Redis.
    // US-East Redis does not know about them.
    public Mono<Boolean> updateLocation(
            String driverId, double lat, double lng) {
        return regionalRedis.opsForGeo()
            .add("drivers:active",
                new Point(lng, lat), driverId)
            .map(added -> added > 0);
    }

    // Nearby driver search is always regional.
    // "Find drivers near me" only finds drivers
    // in the same region.
    public Flux<String> findNearby(
            double lat, double lng, double radiusKm) {
        return regionalRedis.opsForGeo()
            .radius("drivers:active",
                new Circle(new Point(lng, lat),
                    new Distance(radiusKm,
                        Metrics.KILOMETERS)))
            .map(result -> result.getContent().getName());
    }
}

CockroachDB and Spanner offer globally consistent reads and writes. They pay for this with higher write latency (consensus across regions) and significantly higher cost. For the ride-hailing platform, eventual consistency with explicit per-data-type SLOs is the right tradeoff. A rider seeing a 3-second-old profile is acceptable. A rider seeing a 3-second-old driver location is also acceptable (the driver moved 30 meters). A rider not seeing their trip at all because it exists in another region is not acceptable, which is why trip data is regional and queries are region-scoped.

The Proof

Replication lag monitoring after deploying the full setup:

Steady State (72-hour observation):
  Replication lag (avg):       0.3 seconds
  Replication lag (p99):       1.8 seconds
  Replication lag (max):       4.2 seconds
  Subscription interruptions:  0
  Alert fires:                 0

Profile update propagation test:
  Update in US-East, read in EU-West:
    p50: 0.8 seconds
    p99: 2.1 seconds

  Update in EU-West, cache invalidation:
    p50: 1.2 seconds (Kafka cross-region)
    p99: 3.4 seconds

Consistency SLO compliance:
  Driver location  (< 1s):    99.97% met
  User profiles    (< 5s):    99.94% met
  Fare config      (< 60s):   100% met
  Surge zones      (< 60s):   100% met

Every data type meets its consistency SLO. The monitoring catches drift before users do. The conflict resolution handles the rare case of simultaneous profile updates across regions. The separation of regional data (trips, driver location) from global data (profiles, config) keeps replication bandwidth manageable and conflict potential low.