Horizontal Scaling and the State Problem
Horizontal Scaling and the State Problem
Adding pods should increase throughput linearly. Deploy 2 pods, handle 2x the traffic. Deploy 8, handle 8x. This is the promise of horizontal scaling and it is a lie whenever your service holds state in local memory.
The ride-hailing platform has three pieces of state that prevent linear scaling:
- Driver location cache: a
ConcurrentHashMap<String, DriverLocation>updated by location pings and queried by the matching algorithm. Each pod has a partial view of driver positions. - Surge pricing multiplier: computed every 30 seconds from driver supply and rider demand. Stored in a local variable. Each pod computes a different multiplier from its own traffic sample.
- HTTP session: rider authentication state stored in Tomcat’s session map, pinned to a specific pod by a session affinity cookie.
With 2 pods behind a round-robin load balancer, each pod sees roughly 50% of driver location updates. The matching algorithm on pod A does not know about drivers that sent their last update to pod B. Half the drivers are invisible to half the matching queries. The rider sees “no drivers available” while three drivers are 200 meters away, reporting their location to the other pod.
Scaling from 2 pods to 8 makes it worse. Each pod sees 12.5% of driver updates. The matching accuracy degrades proportionally.
The State Spectrum
Not all state is equally problematic. Classify it:
| State | Scope | Staleness tolerance | Fix |
|---|---|---|---|
| Driver location | Global, real-time | < 3 seconds | Redis GeoSet |
| Surge multiplier | Global, computed | < 30 seconds | Redis Hash |
| HTTP session | Per-user, sticky | Session duration | Spring Session + Redis |
| Request-scoped data | Per-request | None (ephemeral) | No fix needed |
| JVM-level cache (Caffeine) | Per-pod, warm | Minutes | Redis or accept inconsistency |
Request-scoped data is fine. It lives and dies with the request. JVM-level Caffeine caches are acceptable for non-critical data where staleness across pods is tolerable, for example, static configuration that changes weekly. Everything else must be externalized.
The diagram contrasts the two approaches. On the left, each server holds its own session data and the load balancer must route users to the same server (sticky sessions). If a server dies, those sessions are lost. On the right, all state lives in Redis, so any server can handle any request and failover is seamless. The throughput numbers at the bottom tell the story: with 8 pods, stateful architecture achieves only 3.1x throughput while stateless achieves 7.2x, approaching the ideal linear scaling.
The Cost of Externalization
Moving state from local memory to Redis introduces a network hop. Local ConcurrentHashMap.get() completes in ~100 nanoseconds. Redis GET completes in 1-3 milliseconds. That is a 10,000x slowdown per access.
This is the trade. Pods become stateless and scale linearly, but every state access costs 1-3ms instead of 100ns. The question is whether the linear scaling outweighs the per-request cost. For the ride-hailing platform, it does. At 8 pods, the aggregate throughput with externalized state is 7.2x the single-pod throughput. With local state, it is 3.1x because pod-local driver caches are incomplete and matching queries produce incorrect results that trigger retries.
The Locust test that proves this is in section CH3-S2. First, understand what sticky sessions cost.