Async by Default: Kafka, Backpressure, and the Queue That Saved the Service
Async by Default: Kafka, Backpressure, and the Queue That Saved the Service
The Symptom
Friday evening surge. The trip completion endpoint handles 3,800 RPS. Every trip completion triggers four downstream operations inline: fare finalization, analytics aggregation, receipt email, and surge price recalculation. The endpoint’s p99 is 450ms. Riders stare at a spinner for half a second after every trip. Driver apps lag behind because the same thread pool is blocked processing analytics writes. At 4,200 RPS the thread pool saturates and the error rate hits 12%.
None of the downstream work requires a synchronous response. The rider does not wait for analytics. The rider does not wait for surge recalculation. The rider waits for the fare. Everything else is a side effect.
The top row shows the synchronous design: every trip completion forces the rider to wait while the API processes fare calculation, analytics writes, email notifications, and surge recalculation in sequence—450ms of latency, only 15ms of which the rider actually needs. The bottom row shows the async redesign: the API handles only fare calculation on the critical path and returns in 30ms, then publishes a trip completion event to Kafka. Independent consumers process analytics, notifications, and surge recalculation asynchronously, with zero impact on rider-facing latency. This decoupling also means a slow analytics service no longer cascades into rider-visible errors.
The Cause
The trip completion handler is a synchronous monolith inside a single HTTP request. One thread picks up the request, calculates the fare, writes the trip record, calls the analytics service, calls the notification service, triggers surge recalculation, and returns. Four network calls in sequence. Each one adds latency. Each one adds a failure mode. When the analytics service is slow (and it is always slow during surge), every trip completion pays for it.
// BOTTLENECK: Four synchronous calls in one request thread
@PostMapping("/api/trips/complete")
public ResponseEntity<TripReceipt> completeTrip(@RequestBody TripCompletionRequest request) {
Trip trip = tripService.finalizeFare(request); // 15ms - the only work the rider cares about
analyticsService.recordTrip(trip); // 120ms - writes to analytics DB
notificationService.sendReceipt(trip); // 85ms - calls email provider API
surgeService.recalculateZone(trip.getPickupZoneId()); // 230ms - recomputes zone pricing
return ResponseEntity.ok(trip.toReceipt());
}
Total: 450ms. The rider pays the latency tax for all four calls. The fare finalization takes 15ms. The other 435ms is wasted rider-facing latency.
The Baseline
# load-tests/trip_completion_sync_locustfile.py
from locust import HttpUser, task, between, LoadTestShape
import random
import string
class TripCompletionUser(HttpUser):
wait_time = between(0.1, 0.3)
@task
def complete_trip(self):
trip_id = ''.join(random.choices(string.ascii_lowercase, k=12))
self.client.post(
"/api/trips/complete",
json={
"tripId": trip_id,
"riderId": f"rider-{random.randint(1, 100000)}",
"driverId": f"driver-{random.randint(1, 20000)}",
"fare": round(random.uniform(5.0, 85.0), 2),
"distance": round(random.uniform(0.5, 30.0), 1),
"pickupZoneId": f"zone-{random.randint(1, 500)}"
},
name="/api/trips/complete [SYNC]"
)
class SyncTripShape(LoadTestShape):
stages = [
{"duration": 60, "users": 500, "spawn_rate": 50},
{"duration": 120, "users": 1000, "spawn_rate": 50},
{"duration": 180, "users": 2000, "spawn_rate": 50},
{"duration": 240, "users": 3000, "spawn_rate": 50},
{"duration": 300, "users": 4000, "spawn_rate": 50},
]
def tick(self):
run_time = self.get_run_time()
for stage in self.stages:
if run_time < stage["duration"]:
return (stage["users"], stage["spawn_rate"])
return None
Synchronous trip completion, all four downstream calls inline:
| Users | RPS | p50 (ms) | p99 (ms) | Error Rate | Tomcat Threads Active |
|---|---|---|---|---|---|
| 500 | 1,200 | 180 | 310 | 0% | 85 |
| 1000 | 2,100 | 220 | 380 | 0% | 142 |
| 2000 | 3,200 | 280 | 450 | 0.3% | 198 |
| 3000 | 3,600 | 350 | 680 | 4.1% | 200 (maxed) |
| 4000 | 3,400 | 520 | 1,200 | 12.4% | 200 (maxed) |
The thread pool maxes out at 200 threads. Each thread is blocked for 450ms on average. At 3,000+ users, requests queue up waiting for a thread. The error rate is thread pool rejection.
The Fix
The decision rule: if the rider is staring at a spinner waiting for the response, it stays synchronous. If the outcome is a side effect, it goes to Kafka.
Fare finalization: synchronous. The rider needs the fare amount. Analytics aggregation: async. No rider waits for analytics writes. Receipt email: async. The email arrives in seconds, not milliseconds. Surge recalculation: async. The next rider benefits, not the current one.
// SCALED: Trip completion publishes event, returns immediately after fare
@PostMapping("/api/trips/complete")
public ResponseEntity<TripReceipt> completeTrip(@RequestBody TripCompletionRequest request) {
Trip trip = tripService.finalizeFare(request); // 15ms - synchronous
kafkaTemplate.send("ride-completed", trip.getId(),
new RideCompletedEvent(trip)); // <1ms - fire and forget to Kafka
return ResponseEntity.ok(trip.toReceipt()); // Total: ~16ms
}
The Kafka producer config makes this safe:
# application.yml
spring:
kafka:
producer:
bootstrap-servers: kafka-1:9092,kafka-2:9092,kafka-3:9092
key-serializer: org.apache.kafka.common.serialization.StringSerializer
value-serializer: org.springframework.kafka.support.serializer.JsonSerializer
acks: all # Wait for all in-sync replicas
retries: 3 # Retry on transient failures
properties:
enable.idempotence: true # Prevent duplicate events on retry
max.in.flight.requests.per.connection: 5 # Safe with idempotence enabled
delivery.timeout.ms: 30000
linger.ms: 5 # Batch small, latency matters
batch.size: 16384
acks=all with enable.idempotence=true guarantees that the event reaches Kafka exactly once and is replicated before the producer gets an acknowledgment. The send() call returns a CompletableFuture. The endpoint does not block on it. If Kafka is down, the retry mechanism handles it.
Three consumers pick up the event independently:
// SCALED: Analytics consumer processes trip data asynchronously
@Component
public class TripAnalyticsConsumer {
@KafkaListener(
topics = "ride-completed",
groupId = "trip-analytics",
concurrency = "3"
)
public void onRideCompleted(RideCompletedEvent event) {
analyticsService.recordTrip(event.toTrip());
}
}
// SCALED: Notification consumer sends receipt asynchronously
@Component
public class NotificationConsumer {
@KafkaListener(
topics = "ride-completed",
groupId = "trip-notifications",
concurrency = "2"
)
public void onRideCompleted(RideCompletedEvent event) {
notificationService.sendReceipt(event.toTrip());
}
}
// SCALED: Surge consumer recalculates zone pricing asynchronously
@Component
public class SurgeRecalculationConsumer {
@KafkaListener(
topics = "ride-completed",
groupId = "surge-recalculation",
concurrency = "3"
)
public void onRideCompleted(RideCompletedEvent event) {
surgeService.recalculateZone(event.getPickupZoneId());
}
}
Each consumer group gets its own copy of every event. Analytics, notifications, and surge recalculation are fully decoupled. If the analytics consumer falls behind, the rider never knows. If the notification service throws an exception, the trip is already completed and the fare is already charged.
The Rebalance Storm
During the first deployment of consumers, a rebalance storm hit. Three consumers started, joined the group, and Kafka reassigned partitions. During rebalancing, no consumer in the group processes messages. For a group with 12 partitions and 3 consumers, the rebalance took 8 seconds. Then one consumer restarted due to a failed health check, triggering another rebalance. 30 seconds of zero throughput.
Fix: use the CooperativeStickyAssignor. Instead of revoking all partitions and reassigning, it incrementally migrates partitions. Rebalance time dropped from 8 seconds to under 200ms.
# application.yml
spring:
kafka:
consumer:
properties:
partition.assignment.strategy: org.apache.kafka.clients.consumer.CooperativeStickyAssignor
session.timeout.ms: 30000
heartbeat.interval.ms: 10000
max.poll.interval.ms: 300000
The Proof
Same Locust test, trip completion now publishes to Kafka instead of calling downstream services inline:
| Users | RPS | p50 (ms) | p99 (ms) | Error Rate | Tomcat Threads Active |
|---|---|---|---|---|---|
| 500 | 1,400 | 12 | 28 | 0% | 22 |
| 1000 | 2,800 | 14 | 32 | 0% | 38 |
| 2000 | 5,400 | 15 | 45 | 0% | 62 |
| 3000 | 7,800 | 16 | 58 | 0% | 84 |
| 4000 | 9,600 | 18 | 85 | 0% | 108 |
| Metric | Sync (Inline) | Async (Kafka) | Improvement |
|---|---|---|---|
| p99 at 2000 users | 450ms | 45ms | 10x |
| p99 at 4000 users | 1,200ms | 85ms | 14x |
| Max RPS before errors | 3,200 | 9,600+ | 3x |
| Threads at 4000 users | 200 (maxed) | 108 | 46% reduction |
| Error rate at 4000 users | 12.4% | 0% | Eliminated |
The endpoint does 16ms of actual work. Kafka’s send() adds less than 1ms. The thread is free in 17ms instead of 450ms. The same 200-thread pool now handles 9,600 RPS because each thread is occupied for 17ms instead of 450ms. That is a 26x improvement in thread utilization.
The downstream services still do the same work. Analytics still takes 120ms. Receipt emails still take 85ms. Surge recalculation still takes 230ms. But that work happens on consumer threads, not on request-serving threads. The rider does not pay for it.