Database Scaling: Read Replicas, Sharding, and When PostgreSQL Is Not the Problem

The Symptom

The trip history endpoint is slow. At 500 RPS, p99 latency is 3,400ms. The dashboard shows PostgreSQL CPU at 92%. The on-call engineer opens a ticket: “Need to shard the database.” The team spends three weeks evaluating sharding strategies. The fix turns out to be a composite index that takes 30 seconds to create.

This chapter covers the full spectrum of PostgreSQL scaling: read replicas for read-heavy workloads, sharding for write-heavy workloads, and the discipline to run EXPLAIN ANALYZE before doing either.

Database scaling progression from free optimizations to expensive infrastructure changes

The staircase shows the correct order for database scaling decisions. Steps 1 through 3—adding indexes, optimizing queries with EXPLAIN ANALYZE, and tuning connection pools—are free and take hours, not days. Steps 4 and 5—read replicas and sharding—cost real money and introduce operational complexity like replication lag and cross-shard joins. Most teams jump to step 4 or 5 when their actual problem is a missing composite index that takes 30 seconds to create. Run the free steps first. Measure after each one. Most platforms never need sharding.

The Cause

The ride-hailing platform has a common read/write ratio: 95% reads, 5% writes. Every rider checking trip history, every driver viewing earnings, every ops dashboard polling for active rides. All reads hitting a single PostgreSQL primary. The primary handles writes (trip creation, status updates, payments) and reads (trip history, driver earnings, analytics) on the same instance.

The CPU is high not because of write contention. The CPU is high because 10,000 concurrent read queries compete with write transactions for CPU cycles, buffer pool pages, and I/O bandwidth.

The Baseline

# load-tests/trip_history_locustfile.py
from locust import HttpUser, task, between, LoadTestShape

class TripHistoryUser(HttpUser):
    wait_time = between(0.5, 1.0)

    @task(9)
    def get_trip_history(self):
        rider_id = f"rider-{self.environment.runner.user_count % 10000}"
        self.client.get(
            f"/api/trips/history?riderId={rider_id}&page=0&size=20",
            name="/api/trips/history"
        )

    @task(1)
    def complete_trip(self):
        self.client.post(
            "/api/trips/complete",
            json={
                "tripId": f"trip-{self.environment.runner.user_count}",
                "fare": 24.50,
                "distance": 8.3
            },
            name="/api/trips/complete"
        )


class RampLoad(LoadTestShape):
    stages = [
        {"duration": 60,  "users": 100,  "spawn_rate": 20},
        {"duration": 120, "users": 300,  "spawn_rate": 20},
        {"duration": 180, "users": 500,  "spawn_rate": 20},
        {"duration": 240, "users": 800,  "spawn_rate": 20},
        {"duration": 300, "users": 1000, "spawn_rate": 20},
    ]

    def tick(self):
        run_time = self.get_run_time()
        for stage in self.stages:
            if run_time < stage["duration"]:
                return (stage["users"], stage["spawn_rate"])
        return None

Single PostgreSQL instance, 50M trips, no read replicas:

Users	RPS	p50 (ms)	p99 (ms)	PG CPU	Errors
100	180	45	210	28%	0%
300	490	120	890	61%	0%
500	680	340	2,100	85%	0.2%
800	720	1,200	3,400	92%	4.1%
1000	710	2,800	8,100	95%	12.3%

RPS flatlines at ~720. The database cannot push more throughput. Every read query after that point sits in queue.

The Fix: Three Layers

Three interventions, applied in order of complexity and operational cost:

Layer 1: Read Replicas

Route the 95% of read traffic to replicas. The primary handles only writes. Implementation: Spring’s AbstractRoutingDataSource with @Transactional(readOnly = true) driving the routing decision.

// SCALED: Read/write routing based on transaction type
public class ReadWriteRoutingDataSource extends AbstractRoutingDataSource {

    @Override
    protected Object determineCurrentLookupKey() {
        return TransactionSynchronizationManager.isCurrentTransactionReadOnly()
            ? DataSourceType.REPLICA
            : DataSourceType.PRIMARY;
    }
}

// SCALED: Service layer declares read-only intent
@Service
public class TripHistoryService {

    @Transactional(readOnly = true)
    public Page<TripSummary> getTripHistory(String riderId, Pageable pageable) {
        // This query routes to a replica automatically
        return tripRepository.findByRiderIdOrderByCompletedAtDesc(riderId, pageable);
    }

    @Transactional
    public Trip completeTrip(TripCompletionRequest request) {
        // This routes to primary automatically
        Trip trip = tripRepository.findById(request.getTripId())
            .orElseThrow();
        trip.complete(request.getFare(), request.getDistance());
        return tripRepository.save(trip);
    }
}

Section CH8-S1 covers the full implementation: DataSource configuration, AOP-based routing, replication lag monitoring, and read-your-writes consistency.

Layer 2: Index Optimization

Before sharding, run EXPLAIN ANALYZE. The trip history query:

-- BOTTLENECK: Sequential scan on 50M rows
EXPLAIN ANALYZE
SELECT * FROM trips
WHERE rider_id = 'rider-4821'
ORDER BY completed_at DESC
LIMIT 20;

-- Seq Scan on trips  (cost=0.00..1284567.00 rows=50000000 width=248)
--   Filter: (rider_id = 'rider-4821'::text)
--   Rows Removed by Filter: 49999482
-- Planning Time: 0.15 ms
-- Execution Time: 2,143.82 ms

The fix:

-- SCALED: Composite index for the exact query pattern
CREATE INDEX CONCURRENTLY idx_trips_rider_completed
ON trips (rider_id, completed_at DESC);

After the index:

-- Index Scan Backward using idx_trips_rider_completed on trips
--   Index Cond: (rider_id = 'rider-4821'::text)
-- Planning Time: 0.12 ms
-- Execution Time: 3.87 ms

From 2,143ms to 3.87ms. No sharding. No new infrastructure. One DDL statement.

Layer 3: Sharding (When You Actually Need It)

Sharding is for when a single PostgreSQL instance cannot handle the write throughput. Not the read throughput (replicas solve that). Not the query latency (indexes solve that). The write throughput.

For the ride-hailing platform, the decision point: when trip creation rate exceeds what a single primary can handle with WAL writes, index updates, and replication to replicas. For most platforms, that threshold is 10,000-50,000 writes/second on properly tuned hardware.

Section CH8-S2 covers sharding strategies, the ShardResolver implementation, and why the trip history problem was never a sharding problem.

The Proof

After read replicas + composite index, same Locust test:

Users	RPS	p50 (ms)	p99 (ms)	PG Primary CPU	Errors
100	195	4	18	6%	0%
300	580	5	22	9%	0%
500	960	6	28	12%	0%
800	1,520	7	35	15%	0%
1000	1,890	9	42	18%	0%

Metric	Before	After	Factor
Max RPS	720	1,890	2.6x
p99 at 500 users	2,100ms	28ms	75x
PG CPU at 800 users	92%	15%	6x reduction
Error rate at 1000 users	12.3%	0%	Eliminated

The 75x p99 improvement came from two changes: routing reads to replicas (removing load from primary) and adding a composite index (removing sequential scans from every query). Neither required sharding. Neither required new hardware beyond two replica instances.

The Decision Framework

Before proposing database scaling, walk through this checklist:

Run EXPLAIN ANALYZE on the slow queries. If any show sequential scans on tables with more than 1M rows, add indexes first.
Check connection pool metrics. If hikaricp_connections_acquire_seconds p99 > 50ms, the pool is undersized (Chapter 4).
Check cache hit rates. If the same query runs thousands of times per minute with identical parameters, add caching (Chapter 6).
Add read replicas if read RPS exceeds what a single instance can serve after indexes and caching are in place.
Shard only when write throughput on a single primary is the bottleneck, after all of the above.

Most teams jump to step 5. The ride-hailing platform’s trip history problem was step 1.