Async by Default: Kafka, Backpressure, and the Queue That Saved the Service

The Symptom

Friday evening surge. The trip completion endpoint handles 3,800 RPS. Every trip completion triggers four downstream operations inline: fare finalization, analytics aggregation, receipt email, and surge price recalculation. The endpoint’s p99 is 450ms. Riders stare at a spinner for half a second after every trip. Driver apps lag behind because the same thread pool is blocked processing analytics writes. At 4,200 RPS the thread pool saturates and the error rate hits 12%.

None of the downstream work requires a synchronous response. The rider does not wait for analytics. The rider does not wait for surge recalculation. The rider waits for the fare. Everything else is a side effect.

Figure: Synchronous vs. asynchronous trip completion processing flow

The top row shows the synchronous design: every trip completion forces the rider to wait while the API processes fare calculation, analytics writes, email notifications, and surge recalculation in sequence. That is 450ms of latency, only 15ms of which the rider actually needs. The bottom row shows the async redesign: the API handles only fare calculation on the critical path and returns in 30ms, then publishes a trip completion event to Kafka. Independent consumers process analytics, notifications, and surge recalculation asynchronously, with no impact on rider-facing latency. This decoupling also means a slow analytics service no longer cascades into rider-visible errors.

The Cause

The trip completion handler is a synchronous monolith inside a single HTTP request. One thread picks up the request, calculates the fare, writes the trip record, calls the analytics service, calls the notification service, triggers surge recalculation, and returns. Four network calls in sequence. Each one adds latency. Each one adds a failure mode. When the analytics service is slow (and it is always slow during surge), every trip completion pays for it.

// BOTTLENECK: Four synchronous calls in one request thread
@PostMapping("/api/trips/complete")
public ResponseEntity<TripReceipt> completeTrip(@RequestBody TripCompletionRequest request) {
    Trip trip = tripService.finalizeFare(request);           // 15ms - the only work the rider cares about
    analyticsService.recordTrip(trip);                       // 120ms - writes to analytics DB
    notificationService.sendReceipt(trip);                   // 85ms - calls email provider API
    surgeService.recalculateZone(trip.getPickupZoneId());    // 230ms - recomputes zone pricing
    return ResponseEntity.ok(trip.toReceipt());
}

Total: 450ms. The rider pays the latency tax for all four calls. The fare finalization takes 15ms. The other 435ms is wasted rider-facing latency.

The Baseline

# load-tests/trip_completion_sync_locustfile.py
from locust import HttpUser, task, between, LoadTestShape
import random
import string

class TripCompletionUser(HttpUser):
    wait_time = between(0.1, 0.3)

    @task
    def complete_trip(self):
        trip_id = ''.join(random.choices(string.ascii_lowercase, k=12))
        self.client.post(
            "/api/trips/complete",
            json={
                "tripId": trip_id,
                "riderId": f"rider-{random.randint(1, 100000)}",
                "driverId": f"driver-{random.randint(1, 20000)}",
                "fare": round(random.uniform(5.0, 85.0), 2),
                "distance": round(random.uniform(0.5, 30.0), 1),
                "pickupZoneId": f"zone-{random.randint(1, 500)}"
            },
            name="/api/trips/complete [SYNC]"
        )


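class SyncTripShape(LoadTestShape):
    # Each "duration" is the cumulative run time (seconds) at which that stage ends,
    # so every stage lasts 60 seconds and the full ramp takes 5 minutes.
    stages = [
        {"duration": 60,  "users": 500,  "spawn_rate": 50},
        {"duration": 120, "users": 1000, "spawn_rate": 50},
        {"duration": 180, "users": 2000, "spawn_rate": 50},
        {"duration": 240, "users": 3000, "spawn_rate": 50},
        {"duration": 300, "users": 4000, "spawn_rate": 50},
    ]

    def tick(self):
        # Locust calls tick() periodically to get the target (users, spawn_rate).
        run_time = self.get_run_time()
        for stage in self.stages:
            if run_time < stage["duration"]:
                return (stage["users"], stage["spawn_rate"])
        return None  # past the last stage: stop the test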
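
To make the stage boundaries concrete, here is a small sketch that mirrors the tick() lookup without importing Locust. The helper name stage_at is hypothetical. The stage values are copied from the shape above.

```python
# Standalone mirror of SyncTripShape.tick(); stage_at is a hypothetical helper
# that takes the run time as an argument instead of asking Locust for it.
STAGES = [
    {"duration": 60,  "users": 500,  "spawn_rate": 50},
    {"duration": 120, "users": 1000, "spawn_rate": 50},
    {"duration": 180, "users": 2000, "spawn_rate": 50},
    {"duration": 240, "users": 3000, "spawn_rate": 50},
    {"duration": 300, "users": 4000, "spawn_rate": 50},
]


def stage_at(run_time):
    """Return (users, spawn_rate) for a run time in seconds, or None once the test should stop."""
    for stage in STAGES:
        if run_time < stage["duration"]:
            return (stage["users"], stage["spawn_rate"])
    return None


# Print the target load at a few points, including both sides of a stage boundary.
for t in (0, 59, 60, 299, 300):
    print(t, stage_at(t))
```

The boundaries are exclusive: at exactly 60 seconds the shape has already moved to the 1,000-user stage, and at 300 seconds it returns None and the run ends.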

Synchronous trip completion, all four downstream calls inline:

Users	RPS	p50 (ms)	p99 (ms)	Error Rate	Tomcat Threads Active
500	1,200	180	310	0%	85
1000	2,100	220	380	0%	142
2000	3,200	280	450	0.3%	198
3000	3,600	350	680	4.1%	200 (maxed)
4000	3,400	520	1,200	12.4%	200 (maxed)

The thread pool maxes out at 200 threads. Each request holds its thread for the full duration of all four calls, 450ms by the latency budget above. At 3,000 users and beyond, requests queue up waiting for a thread. The errors are thread pool rejections.

The Fix

The decision rule: if the rider is staring at a spinner waiting for the response, it stays synchronous. If the outcome is a side effect, it goes to Kafka.

Fare finalization: synchronous. The rider needs the fare amount. Analytics aggregation: async. No rider waits for analytics writes. Receipt email: async. The email arrives in seconds, not milliseconds. Surge recalculation: async. The next rider benefits, not the current one.
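
The decision rule can be sketched as a simple partition over the four operations. This is an illustration, not production code: the Operation class and the rider_waits flag are hypothetical names, and the latencies are the per-call figures from the synchronous handler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    name: str
    latency_ms: int
    rider_waits: bool  # True only if the rider needs the result in the response


# Latencies are the per-call figures from the synchronous handler.
OPERATIONS = [
    Operation("fare finalization", 15, True),
    Operation("analytics aggregation", 120, False),
    Operation("receipt email", 85, False),
    Operation("surge recalculation", 230, False),
]


def split_by_rule(operations):
    """Apply the decision rule: rider-blocking work stays inline, side effects become events."""
    inline = [op for op in operations if op.rider_waits]
    deferred = [op for op in operations if not op.rider_waits]
    return inline, deferred


inline, deferred = split_by_rule(OPERATIONS)
critical_path_ms = sum(op.latency_ms for op in inline)
deferred_ms = sum(op.latency_ms for op in deferred)

print("inline:", [op.name for op in inline], critical_path_ms, "ms")
print("deferred:", [op.name for op in deferred], deferred_ms, "ms")
```

Only fare finalization stays on the critical path. The other three operations account for the 435ms that moves off the request thread.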

// SCALED: Trip completion publishes event, returns immediately after fare
@PostMapping("/api/trips/complete")
public ResponseEntity<TripReceipt> completeTrip(@RequestBody TripCompletionRequest request) {
    Trip trip = tripService.finalizeFare(request);           // 15ms - synchronous
    kafkaTemplate.send("ride-completed", trip.getId(),
        new RideCompletedEvent(trip));                       // <1ms - fire and forget to Kafka
    return ResponseEntity.ok(trip.toReceipt());              // Total: ~16ms
}

The Kafka producer config hardens the publish against transient broker failures:

# application.yml
spring:
  kafka:
    producer:
      bootstrap-servers: kafka-1:9092,kafka-2:9092,kafka-3:9092
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: org.springframework.kafka.support.serializer.JsonSerializer
      acks: all # Wait for all in-sync replicas
      retries: 3 # Retry on transient failures
      properties:
        enable.idempotence: true # Prevent duplicate events on retry
        max.in.flight.requests.per.connection: 5 # Safe with idempotence enabled
        delivery.timeout.ms: 30000
        linger.ms: 5 # Batch small, latency matters
        batch.size: 16384

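With acks=all, the producer gets an acknowledgment only after every in-sync replica has the event. enable.idempotence=true lets the broker discard duplicates created by producer retries, so a retried send is written to its partition once. That is not end-to-end exactly-once delivery: a consumer can still see the same event again after a rebalance or restart, so the downstream handlers should tolerate duplicates. The send() call returns a CompletableFuture (in Spring for Apache Kafka 3.x), and the endpoint does not block on it. Retries cover transient failures only until delivery.timeout.ms (30 seconds here) expires. After that the future completes exceptionally, and because the endpoint has already returned, the event is lost unless something handles the failure on that future. send() itself can also block the request thread for up to max.block.ms (60 seconds by default) when broker metadata is unavailable or the producer buffer is full, so a prolonged Kafka outage still reaches the request path.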
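
One way to notice a send that fails after the retries run out is to attach a callback to the returned future. The sketch below shows that pattern with Python's standard concurrent.futures.Future rather than a real Kafka client. fake_send, publish, and failed_events are hypothetical stand-ins, and the fake completes the future up front to keep the example deterministic.

```python
from concurrent.futures import Future

failed_events = []  # hypothetical stand-in for an error log or a retry table


def fake_send(topic, key, value, broker_up=True):
    """Hypothetical stand-in for an async producer send: returns a future right away."""
    future = Future()
    if broker_up:
        future.set_result({"topic": topic, "key": key})
    else:
        # Models the producer giving up once its delivery timeout expires.
        future.set_exception(TimeoutError("delivery timeout expired"))
    return future


def publish(topic, key, value, broker_up=True):
    """Send without blocking the caller, but record the failure if the send never lands."""
    future = fake_send(topic, key, value, broker_up)

    def on_done(completed):
        error = completed.exception()
        if error is not None:
            failed_events.append((key, str(error)))

    # The callback runs when the future completes (immediately here, since it already has).
    future.add_done_callback(on_done)
    return future


publish("ride-completed", "trip-1", {"fare": 12.5})
publish("ride-completed", "trip-2", {"fare": 30.0}, broker_up=False)
print(failed_events)
```

In the Java handler the equivalent hook would be a callback on the CompletableFuture returned by send(). What to do with a failed event (log it, alert, or store it for replay) is a design decision the original code leaves open.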

Three consumers pick up the event independently:

// SCALED: Analytics consumer processes trip data asynchronously
@Component
public class TripAnalyticsConsumer {

    @KafkaListener(
        topics = "ride-completed",
        groupId = "trip-analytics",
        concurrency = "3"
    )
    public void onRideCompleted(RideCompletedEvent event) {
        analyticsService.recordTrip(event.toTrip());
    }
}

// SCALED: Notification consumer sends receipt asynchronously
@Component
public class NotificationConsumer {

    @KafkaListener(
        topics = "ride-completed",
        groupId = "trip-notifications",
        concurrency = "2"
    )
    public void onRideCompleted(RideCompletedEvent event) {
        notificationService.sendReceipt(event.toTrip());
    }
}

// SCALED: Surge consumer recalculates zone pricing asynchronously
@Component
public class SurgeRecalculationConsumer {

    @KafkaListener(
        topics = "ride-completed",
        groupId = "surge-recalculation",
        concurrency = "3"
    )
    public void onRideCompleted(RideCompletedEvent event) {
        surgeService.recalculateZone(event.getPickupZoneId());
    }
}

Each consumer group gets its own copy of every event. Analytics, notifications, and surge recalculation are fully decoupled. If the analytics consumer falls behind, the rider never knows. If the notification service throws an exception, the trip is already completed and the fare is already charged.
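
The reason a slow group cannot hold back the others is that each group tracks its own read position in the same log. The toy model below illustrates that idea with a single in-memory partition. TopicLog and its methods are hypothetical and far simpler than Kafka itself; the group ids match the listeners above.

```python
class TopicLog:
    """Toy single-partition log: events are stored once, each group keeps its own offset."""

    def __init__(self):
        self.events = []
        self.offsets = {}  # group id -> index of the next event that group will read

    def publish(self, event):
        self.events.append(event)

    def poll(self, group_id, max_records=10):
        """Return the next batch for this group and advance only this group's offset."""
        start = self.offsets.get(group_id, 0)
        batch = self.events[start:start + max_records]
        self.offsets[group_id] = start + len(batch)
        return batch

    def lag(self, group_id):
        """Number of events this group has not read yet."""
        return len(self.events) - self.offsets.get(group_id, 0)


log = TopicLog()
for trip_id in ("trip-1", "trip-2", "trip-3"):
    log.publish(trip_id)

notifications = log.poll("trip-notifications")
surge = log.poll("surge-recalculation")
analytics = log.poll("trip-analytics", max_records=1)  # simulate a slow consumer

print("notifications:", notifications, "lag", log.lag("trip-notifications"))
print("surge:", surge, "lag", log.lag("surge-recalculation"))
print("analytics:", analytics, "lag", log.lag("trip-analytics"))
```

The analytics group ends up two events behind while the other two groups are fully caught up, which is the behavior the consumer lag metric exposes in a real cluster.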

The Rebalance Storm

During the first deployment of the consumers, a rebalance storm hit. Three consumers started, joined the group, and Kafka reassigned partitions. Under the eager rebalance protocol, every consumer gives up all of its partitions for the duration of a rebalance, so no consumer in the group processes messages. For a group with 12 partitions and 3 consumers, the rebalance took 8 seconds. Then one consumer restarted due to a failed health check, triggering another rebalance. The result was 30 seconds of zero throughput.

Fix: use the CooperativeStickyAssignor. Instead of revoking all partitions and reassigning them, it migrates only the partitions that need a new owner while consumers keep processing the rest. Rebalance time dropped from 8 seconds to under 200ms.

# application.yml
spring:
  kafka:
    consumer:
      properties:
        partition.assignment.strategy: org.apache.kafka.clients.consumer.CooperativeStickyAssignor
        session.timeout.ms: 30000
        heartbeat.interval.ms: 10000
        max.poll.interval.ms: 300000
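
The difference between the two strategies comes down to how many partitions stop being processed during a rebalance. The sketch below is a simplified model of that difference for the 12-partition, 3-consumer group, with one consumer leaving. It is not Kafka's actual assignment algorithm, and the function and consumer names are hypothetical.

```python
def eager_revocations(assignment):
    """Eager protocol: every member gives up every partition before reassignment."""
    return sum(len(partitions) for partitions in assignment.values())


def cooperative_reassign(assignment, departed):
    """Sticky sketch: survivors keep what they own; only the departed member's partitions move."""
    survivors = {m: list(p) for m, p in assignment.items() if m != departed}
    moved = list(assignment[departed])
    for partition in moved:
        # Hand each orphaned partition to the survivor with the fewest partitions (name breaks ties).
        target = min(survivors, key=lambda m: (len(survivors[m]), m))
        survivors[target].append(partition)
    return survivors, len(moved)


assignment = {
    "consumer-a": [0, 1, 2, 3],
    "consumer-b": [4, 5, 6, 7],
    "consumer-c": [8, 9, 10, 11],
}

paused_eager = eager_revocations(assignment)
new_assignment, paused_cooperative = cooperative_reassign(assignment, "consumer-c")

print("eager: partitions paused =", paused_eager)
print("cooperative: partitions moved =", paused_cooperative)
print(new_assignment)
```

In this model the eager approach pauses all 12 partitions, while the sticky approach touches only the 4 that lost their owner. The 8 partitions held by the surviving consumers never stop.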

The Proof

Same Locust test, trip completion now publishes to Kafka instead of calling downstream services inline:

Users	RPS	p50 (ms)	p99 (ms)	Error Rate	Tomcat Threads Active
500	1,400	12	28	0%	22
1000	2,800	14	32	0%	38
2000	5,400	15	45	0%	62
3000	7,800	16	58	0%	84
4000	9,600	18	85	0%	108

Metric	Sync (Inline)	Async (Kafka)	Improvement
p99 at 2000 users	450ms	45ms	10x
p99 at 4000 users	1,200ms	85ms	14x
Max RPS with under 1% errors	3,200	9,600+	3x
Threads at 4000 users	200 (maxed)	108	46% reduction
Error rate at 4000 users	12.4%	0%	Eliminated
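
The improvement column can be reproduced from the two load-test tables. The short check below recomputes each factor from the published numbers. The dictionary keys are hypothetical labels.

```python
# Figures copied from the sync and async load-test tables.
sync = {"p99_2000": 450, "p99_4000": 1200, "max_rps": 3200, "threads_4000": 200}
async_ = {"p99_2000": 45, "p99_4000": 85, "max_rps": 9600, "threads_4000": 108}

# Latency factors: how many times lower the async p99 is.
p99_2000_factor = sync["p99_2000"] / async_["p99_2000"]
p99_4000_factor = sync["p99_4000"] / async_["p99_4000"]

# Throughput factor and thread reduction at 4000 users.
rps_factor = async_["max_rps"] / sync["max_rps"]
thread_reduction_pct = (
    100 * (sync["threads_4000"] - async_["threads_4000"]) / sync["threads_4000"]
)

print(f"p99 at 2000 users: {p99_2000_factor:.1f}x")
print(f"p99 at 4000 users: {p99_4000_factor:.1f}x")
print(f"max RPS: {rps_factor:.1f}x")
print(f"threads at 4000 users: {thread_reduction_pct:.0f}% reduction")
```

The async run never reached its error threshold, so the 3x throughput figure is a lower bound rather than a measured ceiling.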

The endpoint does 15ms of actual work. Kafka’s send() adds less than 1ms. The thread is free in about 16ms instead of 450ms, which is how the same 200-thread pool now absorbs 9,600 RPS with only 108 threads active. Each request holds its thread for roughly 1/28th of the time it used to.

The downstream services still do the same work. Analytics still takes 120ms. Receipt emails still take 85ms. Surge recalculation still takes 230ms. But that work happens on consumer threads, not on request-serving threads. The rider does not pay for it.