Real-Time at Scale: WebSockets, SSE, and 100k Concurrent Connections

The Symptom

The ride-hailing platform polls for driver location updates. Every rider with an active booking sends GET /api/drivers/{id}/location every 2 seconds. During Friday evening peak, 50,000 riders are tracking drivers. That is 25,000 requests per second to a single endpoint that reads from Redis, serializes JSON, and sends an HTTP response with headers, cookies, and compression overhead. Each response is 180 bytes of location data wrapped in 1,200 bytes of HTTP overhead.

The polling endpoint’s p99 is 45ms. The infrastructure cost is the problem. 25,000 RPS of polling burns CPU on TLS handshakes, connection setup, request parsing, and response serialization. 85% of polls return the same location because the driver has not moved in 2 seconds. The server does the same work 25,000 times per second to deliver 3,750 actual updates.

The Cause

HTTP request-response is the wrong protocol for continuous data streams. Each poll carries the full cost of an HTTP transaction: TCP connection (or reuse negotiation), TLS, request headers, response headers, and connection teardown. For data that changes 3-5 times per second and flows in one direction (server to client), this overhead dominates.

Two protocols solve this: WebSockets and Server-Sent Events (SSE). They solve different problems.

SSE is HTTP-based, text-only, server-to-client. The browser opens a single HTTP connection. The server holds it open and pushes events as text/event-stream. The browser has a built-in EventSource API with automatic reconnection. Proxies and CDNs handle SSE because it is standard HTTP. The connection costs one file descriptor and ~8KB of memory on the server.

WebSocket is a full-duplex binary protocol. After an HTTP upgrade handshake, the connection switches to the WebSocket protocol. Both sides can send messages at any time. There is no built-in reconnection. Proxies require explicit WebSocket upgrade support. The connection costs one file descriptor and ~12KB of memory on the server.

The ride-hailing platform needs both:

Feature	Direction	Protocol	Reason
Driver location to rider	Server → Client	SSE	One-directional, reconnection matters, CDN-friendly
Ride acceptance by driver	Bidirectional	WebSocket	Driver sends accept/reject, server sends ride details
Surge pricing updates	Server → Client	SSE	Broadcast, one-directional, HTTP-compatible
Driver chat with rider	Bidirectional	WebSocket	Both sides send messages

Using WebSocket for everything is a common mistake. SSE has lower overhead, automatic reconnection, and better proxy compatibility. For server-to-client streaming, SSE is the right tool.

The Baseline

Current polling architecture:

Metric                          Value
Polling RPS (peak)              25,000
Avg response size (with headers) 1,380 bytes
Useful payload                  180 bytes
Payload efficiency              13%
Responses with new data         15%
CPU for polling endpoint        12 cores
Monthly bandwidth (polling)     2.8TB

85% of responses carry stale data. 87% of each response is HTTP overhead. The system works. The cost is waste.

Target architecture with SSE for driver location:

Metric                          Value
Concurrent SSE connections      50,000
Avg event size                  210 bytes (data + SSE framing)
Events with new data            100% (only sent on change)
Estimated CPU                   3 cores
Monthly bandwidth               180GB

The bandwidth drops from 2.8TB to 180GB because the server only sends data when the driver moves. The CPU drops from 12 cores to 3 because there is no request parsing, no header serialization, no TLS renegotiation per update.

The Fix

SSE for driver location streaming

Spring WebFlux serves SSE connections as reactive streams. Each connection is a Flux that emits ServerSentEvent objects:

// SCALED: SSE endpoint for driver location streaming
@RestController
public class DriverLocationSseController {

    private final DriverLocationService locationService;

    @GetMapping(value = "/api/sse/drivers/{driverId}/location",
                produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<ServerSentEvent<DriverLocation>> streamDriverLocation(
            @PathVariable String driverId) {

        return locationService.locationStream(driverId)
            .map(location -> ServerSentEvent.<DriverLocation>builder()
                .id(String.valueOf(location.timestamp()))
                .event("location")
                .data(location)
                .retry(Duration.ofSeconds(5))
                .build()
            )
            .doOnCancel(() ->
                log.info("SSE connection closed for driver {}", driverId)
            );
    }
}

The DriverLocationService exposes a Flux backed by Redis Pub/Sub:

// SCALED: Location stream from Redis Pub/Sub
@Service
public class DriverLocationService {

    private final ReactiveRedisTemplate<String, DriverLocation> redisTemplate;

    public Flux<DriverLocation> locationStream(String driverId) {
        String channel = "driver:location:" + driverId;

        return redisTemplate.listenToChannel(channel)
            .map(ReactiveSubscription.Message::getMessage)
            .distinctUntilChanged(DriverLocation::gridCell)
            .onBackpressureLatest();
    }
}

distinctUntilChanged(DriverLocation::gridCell) suppresses updates when the driver has not moved to a different grid cell. A grid cell is a ~50 meter square. Sub-cell movements are invisible on the rider’s map and waste bandwidth.

onBackpressureLatest() handles the case where the rider’s connection is slow: it drops intermediate locations and sends the latest. The rider sees the current position, not a replay of every past position.

WebSocket for ride acceptance

The driver app needs bidirectional communication: the server sends ride requests, the driver sends accept or reject. WebSocket is the correct protocol:

// SCALED: WebSocket handler for ride acceptance
@Component
public class RideAcceptanceHandler implements WebSocketHandler {

    private final RideMatchingService matchingService;
    private final ObjectMapper objectMapper;

    @Override
    public Mono<Void> handle(WebSocketSession session) {
        String driverId = extractDriverId(session);

        Flux<WebSocketMessage> outgoing = matchingService
            .rideRequestsForDriver(driverId)
            .map(request -> {
                String json = objectMapper.writeValueAsString(request);
                return session.textMessage(json);
            });

        Mono<Void> incoming = session.receive()
            .map(msg -> objectMapper.readValue(
                msg.getPayloadAsText(), RideResponse.class))
            .flatMap(response ->
                matchingService.processDriverResponse(driverId, response))
            .then();

        return Mono.zip(
            session.send(outgoing),
            incoming
        ).then();
    }

    private String extractDriverId(WebSocketSession session) {
        return session.getHandshakeInfo()
            .getHeaders()
            .getFirst("X-Driver-Id");
    }
}

// WebSocket configuration
@Configuration
public class WebSocketConfig {

    @Bean
    public HandlerMapping webSocketMapping(RideAcceptanceHandler handler) {
        Map<String, WebSocketHandler> map = Map.of(
            "/ws/rides", handler
        );
        SimpleUrlHandlerMapping mapping = new SimpleUrlHandlerMapping();
        mapping.setUrlMap(map);
        mapping.setOrder(-1);
        return mapping;
    }

    @Bean
    public WebSocketHandlerAdapter handlerAdapter() {
        return new WebSocketHandlerAdapter();
    }
}

The handler reads and writes concurrently using Mono.zip. Incoming messages (driver responses) process in parallel with outgoing messages (ride requests). The reactive pipeline handles backpressure: if ride requests arrive faster than the driver can respond, the pipeline buffers them in the reactive stream.

Redis Pub/Sub for cross-instance broadcasting

The ride-hailing platform runs 8 instances behind a load balancer. A driver’s location update arrives at instance 3. The 500 riders tracking that driver are spread across all 8 instances. Redis Pub/Sub broadcasts the update:

// SCALED: Publishing driver location to Redis channel
@Service
public class DriverLocationPublisher {

    private final ReactiveRedisTemplate<String, DriverLocation> redisTemplate;

    public Mono<Void> publishLocation(DriverLocation location) {
        String channel = "driver:location:" + location.driverId();
        return redisTemplate.convertAndSend(channel, location).then();
    }
}

Each instance subscribes to channels for the drivers its connected riders are tracking. When a rider opens an SSE connection for driver-123, that instance subscribes to driver:location:driver-123 on Redis. When the driver’s location update publishes to that channel, every subscribed instance receives it and pushes it to its local SSE connections.

This is the fan-out problem: 1 driver update reaches 500 riders across 8 instances. Redis handles the broadcast. Each instance filters to its local connections. The alternative, each instance querying Redis on a timer, reintroduces polling at the infrastructure level.

The Proof

After migrating from polling to SSE for driver location and WebSocket for ride acceptance:

Metric                     Polling       SSE/WS        Delta
Peak RPS (location)        25,000        0 (push)      -100%
Concurrent connections     0             52,000        N/A
CPU (location updates)     12 cores      2.8 cores     -77%
Monthly bandwidth          2.8TB         180GB         -94%
Update latency (p99)       2,000ms       45ms          -98%
  (poll interval)          (next poll)   (instant)

The update latency deserves explanation. With polling every 2 seconds, the average latency for a rider to see a driver’s new position is 1 second (half the poll interval). The p99 is 2 seconds (worst case: the driver moved right after the last poll). With SSE, the update arrives within the Redis Pub/Sub propagation time plus the SSE write time. The p99 is 45ms.

The Locust simulation for 100k concurrent SSE connections and the Kubernetes resource configuration for running this in production are covered in CH12-S1 and CH12-S2.

One constraint: Redis Pub/Sub delivers messages to all subscribers or none. There is no persistence. If an instance restarts and misses a location update, the rider’s map shows the last known position until the next update arrives (typically within 3-5 seconds). For location tracking, this is acceptable. For ride acceptance, where missing a message means a rider waits indefinitely, the WebSocket handler includes acknowledgment logic and the matching service has a timeout-and-reassign path.

Chapter 12-S1 covers the protocol comparison and implementation details. Chapter 12-S2 covers scaling to 100k connections with memory management, file descriptors, and Kubernetes configuration.