Read Replicas and Connection Routing

The Symptom

The trip history endpoint serves 4,200 RPS during evening surge. Every request hits the primary PostgreSQL instance. Read queries compete with write transactions for CPU, buffer pool pages, and disk I/O. The primary’s CPU sits at 91% and connection pool acquisition time spikes to 1,800ms at p99. Write latency for trip completion (the revenue-critical path) degrades from 12ms to 340ms because reads are starving writes.

The Cause

The ride-hailing platform’s read/write ratio is 19:1. For every trip creation, there are 19 trip history lookups, driver earnings checks, and dashboard polls. All 20 queries hit the same PostgreSQL instance. The primary does double duty: accepting WAL writes for trip events and serving full table scans for history queries. Read replicas exist for this exact pattern. Route the 95% of read traffic elsewhere. Let the primary do what only the primary can do: write.

The Baseline

# load-tests/replica_comparison_locustfile.py
from locust import HttpUser, task, between, LoadTestShape

class TripUser(HttpUser):
    wait_time = between(0.3, 0.8)

    @task(19)
    def get_trip_history(self):
        rider_id = f"rider-{self.environment.runner.user_count % 50000}"
        self.client.get(
            f"/api/trips/history?riderId={rider_id}&page=0&size=20",
            name="/api/trips/history [READ]"
        )

    @task(1)
    def complete_trip(self):
        self.client.post(
            "/api/trips/complete",
            json={
                "tripId": f"trip-{self.environment.runner.user_count}-{id(self)}",
                "fare": 18.75,
                "distance": 6.2
            },
            name="/api/trips/complete [WRITE]"
        )


class ReplicaTestShape(LoadTestShape):
    stages = [
        {"duration": 60,  "users": 200,  "spawn_rate": 20},
        {"duration": 120, "users": 400,  "spawn_rate": 20},
        {"duration": 180, "users": 600,  "spawn_rate": 20},
        {"duration": 240, "users": 800,  "spawn_rate": 20},
        {"duration": 300, "users": 1000, "spawn_rate": 20},
    ]

    def tick(self):
        run_time = self.get_run_time()
        for stage in self.stages:
            if run_time < stage["duration"]:
                return (stage["users"], stage["spawn_rate"])
        return None

Single primary, no replicas, 50M rows in the trips table:

Users	Read RPS	Write RPS	p99 Read (ms)	p99 Write (ms)	PG CPU
200	350	19	180	22	41%
400	620	33	540	85	68%
600	780	41	1,400	210	84%
800	810	43	2,600	340	91%
1000	790	38	5,200	890	95%

Write p99 tripled from 22ms to 890ms. Not because writes got slower. Because reads consumed all available CPU and I/O, forcing writes to wait.

The Fix

Step 1: DataSource Routing

Spring’s AbstractRoutingDataSource selects a DataSource based on a lookup key. The key is determined per-transaction: readOnly = true routes to a replica, everything else routes to primary.

// SCALED: Enum for routing targets
public enum DataSourceType {
    PRIMARY,
    REPLICA
}

// SCALED: ThreadLocal holds the routing key for the current transaction
public class DataSourceContext {

    private static final ThreadLocal<DataSourceType> CONTEXT =
        ThreadLocal.withInitial(() -> DataSourceType.PRIMARY);

    public static void set(DataSourceType type) {
        CONTEXT.set(type);
    }

    public static DataSourceType get() {
        return CONTEXT.get();
    }

    public static void clear() {
        CONTEXT.remove();
    }
}

// SCALED: Routes to primary or replica based on ThreadLocal
public class ReadWriteRoutingDataSource extends AbstractRoutingDataSource {

    @Override
    protected Object determineCurrentLookupKey() {
        return DataSourceContext.get();
    }
}

Step 2: AOP Aspect for Automatic Routing

An aspect intercepts @Transactional(readOnly = true) methods and sets the routing context before the transaction begins.

// SCALED: AOP sets routing key before transaction opens connection
@Aspect
@Component
@Order(Ordered.HIGHEST_PRECEDENCE)
public class ReadOnlyRoutingAspect {

    @Around("@annotation(transactional)")
    public Object route(ProceedingJoinPoint joinPoint,
                        Transactional transactional) throws Throwable {
        if (transactional.readOnly()) {
            DataSourceContext.set(DataSourceType.REPLICA);
        } else {
            DataSourceContext.set(DataSourceType.PRIMARY);
        }
        try {
            return joinPoint.proceed();
        } finally {
            DataSourceContext.clear();
        }
    }
}

Step 3: DataSource Configuration

Two HikariCP pools: one for the primary, one for replicas. The replica pool uses round-robin across two replica hosts.

// SCALED: Separate connection pools for primary and replicas
@Configuration
public class DataSourceConfig {

    @Bean
    public DataSource dataSource(
            @Qualifier("primaryDataSource") DataSource primary,
            @Qualifier("replicaDataSource") DataSource replica) {

        var router = new ReadWriteRoutingDataSource();
        var dataSources = Map.<Object, Object>of(
            DataSourceType.PRIMARY, primary,
            DataSourceType.REPLICA, replica
        );
        router.setTargetDataSources(dataSources);
        router.setDefaultTargetDataSource(primary);
        return router;
    }

    @Bean("primaryDataSource")
    @ConfigurationProperties("spring.datasource.primary")
    public DataSource primaryDataSource() {
        return DataSourceBuilder.create()
            .type(HikariDataSource.class)
            .build();
    }

    @Bean("replicaDataSource")
    @ConfigurationProperties("spring.datasource.replica")
    public DataSource replicaDataSource() {
        return DataSourceBuilder.create()
            .type(HikariDataSource.class)
            .build();
    }
}

# application.yml
spring:
  datasource:
    primary:
      jdbc-url: jdbc:postgresql://pg-primary:5432/ridehail
      username: app
      password: ${DB_PASSWORD}
      hikari:
        maximum-pool-size: 20
        connection-timeout: 5000
    replica:
      jdbc-url: jdbc:postgresql://pg-replica-1:5432/ridehail,pg-replica-2:5432/ridehail
      username: app_readonly
      password: ${DB_READONLY_PASSWORD}
      hikari:
        maximum-pool-size: 40
        connection-timeout: 5000
        read-only: true

The replica pool is larger (40 vs 20) because it handles 19x the traffic. The read-only: true on HikariCP sets the JDBC connection to read-only mode, which PostgreSQL uses to reject any accidental writes.

Step 4: Replication Lag Monitoring

Streaming replication has lag. Under normal load, lag is sub-millisecond. Under write bursts (surge pricing recalculation, batch trip imports), lag can hit seconds. Monitor it.

-- Run on the primary: check replication lag per replica
SELECT
    client_addr,
    state,
    sent_lsn,
    replay_lsn,
    pg_wal_lsn_diff(sent_lsn, replay_lsn) AS lag_bytes,
    replay_lag
FROM pg_stat_replication;

Expose it as a Prometheus metric:

// SCALED: Custom gauge for replication lag per replica
@Component
public class ReplicationLagMonitor {

    private final JdbcTemplate primaryJdbc;
    private final MeterRegistry registry;

    @Scheduled(fixedRate = 5000)
    public void checkReplicationLag() {
        primaryJdbc.query(
            """
            SELECT client_addr,
                   EXTRACT(EPOCH FROM replay_lag) AS lag_seconds
            FROM pg_stat_replication
            """,
            rs -> {
                while (rs.next()) {
                    Gauge.builder("pg.replication.lag.seconds")
                        .tag("replica", rs.getString("client_addr"))
                        .register(registry)
                        .set(rs.getDouble("lag_seconds"));
                }
            }
        );
    }
}

Alert when lag exceeds 500ms:

# prometheus/alerts.yml
- alert: PostgresReplicationLagHigh
  expr: pg_replication_lag_seconds > 0.5
  for: 30s
  labels:
    severity: warning
  annotations:
    summary: "Replication lag on {{ $labels.replica }} is {{ $value }}s"

Step 5: Read-Your-Writes Consistency

A rider completes a trip and immediately opens trip history. The trip was written to the primary 50ms ago. The replica has not replayed it yet. The rider sees stale data. They tap refresh. Still stale. They contact support.

Fix: route reads to the primary for users who have written in the last N seconds.

// SCALED: Track recent writers, route their reads to primary
@Component
public class RecentWriterTracker {

    private final RedisTemplate<String, String> redis;
    private static final Duration CONSISTENCY_WINDOW = Duration.ofSeconds(5);

    public void markWriter(String userId) {
        String key = "recent-writer:" + userId;
        redis.opsForValue().set(key, "1", CONSISTENCY_WINDOW);
    }

    public boolean isRecentWriter(String userId) {
        return Boolean.TRUE.equals(redis.hasKey("recent-writer:" + userId));
    }
}

// SCALED: Modified routing aspect checks for recent writers
@Aspect
@Component
@Order(Ordered.HIGHEST_PRECEDENCE)
public class ReadOnlyRoutingAspect {

    private final RecentWriterTracker recentWriterTracker;

    @Around("@annotation(transactional)")
    public Object route(ProceedingJoinPoint joinPoint,
                        Transactional transactional) throws Throwable {
        if (transactional.readOnly()) {
            String userId = SecurityContextHolder.getContext()
                .getAuthentication().getName();

            // Recent writers read from primary for consistency
            if (recentWriterTracker.isRecentWriter(userId)) {
                DataSourceContext.set(DataSourceType.PRIMARY);  // BOTTLENECK: bypasses replica
            } else {
                DataSourceContext.set(DataSourceType.REPLICA);  // SCALED: normal read path
            }
        } else {
            DataSourceContext.set(DataSourceType.PRIMARY);
        }
        try {
            return joinPoint.proceed();
        } finally {
            DataSourceContext.clear();
        }
    }
}

// SCALED: Trip completion marks the rider as a recent writer
@Service
public class TripCompletionService {

    private final RecentWriterTracker recentWriterTracker;
    private final TripRepository tripRepository;

    @Transactional
    public Trip completeTrip(TripCompletionRequest request) {
        Trip trip = tripRepository.findById(request.getTripId())
            .orElseThrow();
        trip.complete(request.getFare(), request.getDistance());
        Trip saved = tripRepository.save(trip);
        recentWriterTracker.markWriter(trip.getRiderId());
        return saved;
    }
}

The 5-second consistency window means ~2% of read traffic routes to primary (riders who completed a trip in the last 5 seconds). The other 98% goes to replicas. The primary is not overloaded. The rider sees their completed trip immediately.

The Proof

Same Locust test, now with 1 primary + 2 replicas + read-your-writes routing:

Users	Read RPS	Write RPS	p99 Read (ms)	p99 Write (ms)	Primary CPU	Replica CPU
200	380	20	32	14	8%	18%
400	750	40	38	16	11%	24%
600	1,120	59	42	18	14%	31%
800	1,480	78	48	19	16%	38%
1000	1,840	97	55	21	19%	44%

Metric	Single Primary	Primary + 2 Replicas	Improvement
Max read RPS	810	1,840	2.3x
p99 read at 800 users	2,600ms	48ms	54x
p99 write at 800 users	340ms	19ms	18x
Primary CPU at 800 users	91%	16%	5.7x reduction
Error rate at 1000 users	timeouts	0%	Eliminated

Write latency improved 18x without changing any write code. Removing read load from the primary freed CPU and I/O for writes. The replicas each run at 38% CPU at 800 users, meaning each replica has headroom for 2x more read traffic before needing a third replica.

The zero-code-change claim is real for the repository layer. The TripRepository interface has no idea it talks to replicas. The routing is entirely in the DataSource layer and the AOP aspect. Add a new @Transactional(readOnly = true) service method and it automatically routes to replicas.