KEDA, Event-Driven Scaling, and the Metrics That Matter
The Symptom
The trip analytics consumer processes driver location updates, fare calculations, and ride completions. Each event triggers an aggregation pipeline that updates real-time dashboards. During normal hours, 2 consumer pods handle 400 events/second with 50ms processing time per event. Consumer lag hovers around 100 messages.
Friday at 6:15 PM, trip volume more than triples. The event production rate jumps from 400 to 1,400 events/second. The 2 consumers still process 400 events/second total. The lag climbs: a deficit of 1,000 events/second times 60 seconds = 60,000 messages of lag in the first minute. By 6:47 PM, lag reaches 847,000 messages.
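The first-minute arithmetic can be sketched in a few lines. This is a minimal model that assumes both rates stay constant; the function name lag_after is illustrative and not part of any Kafka or KEDA API.

```python
def lag_after(seconds, produce_eps, consume_eps, start_lag=0):
    """Backlog after `seconds`, assuming both rates stay constant."""
    deficit = produce_eps - consume_eps  # events/second the consumers fall behind
    # Lag cannot go negative: once the backlog is drained, consumers simply idle.
    return max(0, start_lag + deficit * seconds)


# First minute of the Friday surge: 1,400 eps produced, 400 eps consumed.
print(lag_after(60, produce_eps=1400, consume_eps=400))  # 60000
```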
The real-time dashboard shows data from 32 minutes ago. The operations team sees surge pricing decisions based on stale demand data. Drivers are dispatched to areas where demand peaked half an hour ago. Riders in currently high-demand areas see no surge pricing because the analytics pipeline has not processed the current data.
There is no autoscaler watching this. HPA does not know Kafka exists. The consumer pods’ CPU is at 35%. Memory is at 28%. From Kubernetes’ perspective, everything is fine.
The Cause
HPA scales on Kubernetes-native metrics: CPU, memory, and custom metrics exposed via the metrics API. Kafka consumer lag is not a Kubernetes metric. It lives in the Kafka broker’s consumer group coordinator.
KEDA bridges this gap. It is a Kubernetes operator that creates and manages HPA resources based on external event sources. KEDA’s architecture:
    Kafka Broker                  KEDA                   Kubernetes
    +-----------+    query    +------------+   create    +-----+
    | Consumer  | <---------- | ScaledObj  | ----------> | HPA |
    | Group Lag |             | Controller |             +-----+
    +-----------+             +------------+                |
                                    |                       v
                               +----------+           +------------+
                               | Metrics  |           | Deployment |
                               | Adapter  |           | Scale      |
                               +----------+           +------------+
KEDA polls the external metric source (Kafka, Prometheus, RabbitMQ, and 50+ others) and exposes the metric to Kubernetes as an external metric. It then creates or patches an HPA resource that targets the deployment and uses the external metric for scaling decisions.
The scaling formula for the Kafka trigger:
desiredReplicas = ceil(currentLag / lagThreshold)
With a current lag of 847,000 and a lagThreshold of 1,000:
desiredReplicas = ceil(847000 / 1000) = 847
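Capped at maxReplicaCount: 20 replicas. KEDA scales to 20 consumers. At the baseline rate of 200 events/second per pod (400 events/second across 2 pods), and assuming throughput scales roughly linearly, the fleet can consume faster than the 1,400 events/second being produced, so lag starts decreasing. Once the backlog drains, each consumer settles at ~70 events/second, for a combined 1,400 events/second that matches the production rate.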
One critical detail: Kafka partitions limit parallelism. If the topic has 16 partitions, scaling to 20 consumers wastes 4 pods. 4 consumers sit idle because there are no partitions for them. The maxReplicaCount must not exceed the partition count.
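The formula, the maxReplicaCount cap, and the partition limit can be combined in a short sketch. This is an illustration of the arithmetic described above, not KEDA's actual code; the function and parameter names are made up for the example.

```python
import math


def desired_replicas(lag, lag_threshold, min_replicas, max_replicas, partitions=None):
    """Replica count from total consumer lag, clamped to the configured bounds."""
    wanted = math.ceil(lag / lag_threshold)
    # Clamp to minReplicaCount / maxReplicaCount.
    wanted = max(min_replicas, min(wanted, max_replicas))
    # Consumers beyond the partition count receive no assignments, so cap there too.
    if partitions is not None:
        wanted = min(wanted, partitions)
    return wanted


print(desired_replicas(847_000, 1_000, 2, 20))                 # 20
print(desired_replicas(847_000, 1_000, 2, 20, partitions=16))  # 16
```

The raw formula asks for 847 replicas; the clamp is what turns that into 20, or 16 when the topic has only 16 partitions.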
The Baseline
Current state without KEDA:
    Metric                     Normal Hours   Friday Peak
    Event production rate      400 eps        1,400 eps
    Consumer replicas          2              2
    Consumer throughput        400 eps        400 eps
    Consumer lag               ~100           847,000
    Dashboard data staleness   <1s            32 min
    Surge pricing accuracy     >95%           <40%
Target state with KEDA:
    Metric                     Normal Hours   Friday Peak
    Event production rate      400 eps        1,400 eps
    Consumer replicas          2              15
    Consumer throughput        400 eps        1,400 eps
    Consumer lag               ~100           <1,000
    Dashboard data staleness   <1s            <3s
    Surge pricing accuracy     >95%           >95%
The Fix
KEDA Kafka trigger for the trip analytics consumer
Install KEDA in the cluster:
helm repo add kedacore https://kedacore.github.io/charts
helm install keda kedacore/keda --namespace keda --create-namespace
Create the ScaledObject:
# SCALED: KEDA ScaledObject for trip analytics consumer
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: trip-analytics-consumer
  namespace: ridehailing
spec:
  scaleTargetRef:
    name: trip-analytics-consumer
  minReplicaCount: 2
  maxReplicaCount: 20
  cooldownPeriod: 120
  pollingInterval: 15
  advanced:
    restoreToOriginalReplicaCount: false
    horizontalPodAutoscalerConfig:
      behavior:
        scaleUp:
          stabilizationWindowSeconds: 0
          policies:
            - type: Pods
              value: 5
              periodSeconds: 30
        scaleDown:
          stabilizationWindowSeconds: 120
          policies:
            - type: Percent
              value: 25
              periodSeconds: 60
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka-headless.kafka:9092
        consumerGroup: trip-analytics
        topic: trip-events
        lagThreshold: "1000"
        offsetResetPolicy: earliest
        allowIdleConsumers: "false"
        excludePersistentLag: "false"
Key configuration choices:
pollingInterval: 15 checks Kafka lag every 15 seconds. Faster polling (5s) causes flapping when lag oscillates around the threshold. Slower polling (60s) delays scaling by a full minute.
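cooldownPeriod: 120 is the time KEDA waits after the last active trigger before scaling a workload to zero. With minReplicaCount: 2 it does not come into play here; scale-down between 20 and 2 replicas is paced by the HPA scaleDown settings, where stabilizationWindowSeconds: 120 holds the replica count for 2 minutes before reducing it. Friday surges have 5-10 minute lulls between peaks. Scaling down during a lull and back up during the next peak wastes startup time and increases lag.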
allowIdleConsumers: "false" prevents scaling beyond the partition count. With 20 partitions on the trip-events topic, this caps effective scaling at 20. If the topic had 16 partitions, KEDA would stop at 16 replicas even with maxReplicaCount: 20, because the remaining 4 consumers would receive no partition assignments and sit idle.
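The scaleUp policy in the ScaledObject (type: Pods, value: 5, periodSeconds: 30) limits how fast replicas can be added. The sketch below is a simplified model of that rate limit: it assumes the metric keeps demanding the target throughout and ignores polling delay and pod startup time. The function name is illustrative.

```python
def scale_up_steps(current, target, pods_per_period=5):
    """Replica counts after each policy period when adding at most N pods per period."""
    steps = []
    while current < target:
        current = min(target, current + pods_per_period)
        steps.append(current)
    return steps


# From the 2-replica baseline to the 20-replica cap, 5 pods per 30-second period.
print(scale_up_steps(2, 20))  # [7, 12, 17, 20]
```

Under this model the fleet needs four policy periods to reach the cap, which is why a large lag does not translate into 20 pods immediately.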
KEDA Prometheus trigger for the driver matching service
The driver matching service receives HTTP requests to find available drivers for a ride request. During surge, pending match requests queue up. A Prometheus metric tracks the queue depth:
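// SCALED: Prometheus metric for pending match requests
import java.util.concurrent.atomic.AtomicInteger;

import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.stereotype.Component;

@Component
public class MatchingMetrics {

    // Backing value for the gauge; the registry reads it whenever metrics are collected.
    private final AtomicInteger pendingMatches;
    private final MeterRegistry registry;

    public MatchingMetrics(MeterRegistry registry) {
        this.registry = registry;
        // gauge() registers the metric and returns the same AtomicInteger instance.
        this.pendingMatches = registry.gauge(
            "matching_pending_requests",
            new AtomicInteger(0)
        );
    }

    // Call when a match request enters the queue.
    public void incrementPending() {
        pendingMatches.incrementAndGet();
    }

    // Call when a queued match request leaves the queue.
    public void decrementPending() {
        pendingMatches.decrementAndGet();
    }
}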
# SCALED: KEDA Prometheus trigger for matching service
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: driver-matching-scaler
  namespace: ridehailing
spec:
  scaleTargetRef:
    name: driver-matching
  minReplicaCount: 3
  maxReplicaCount: 30
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090
        metricName: matching_pending_total
        query: |
          sum(matching_pending_requests{namespace="ridehailing"})
        threshold: "100"
        activationThreshold: "10"
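activationThreshold: "10" sets the point at which the trigger counts as active, which governs scaling between 0 and 1 replicas. Because minReplicaCount is 3 here, the deployment never scales to zero and this setting has no practical effect; it matters only if the minimum is later lowered to 0. The threshold of 100 means KEDA targets one replica per 100 pending requests. At 500 pending requests: ceil(500/100) = 5 replicas.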
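To sanity-check the trigger before deploying it, the same query can be run against the Prometheus HTTP API and the result fed through the threshold arithmetic. The sketch below parses an instant-query response and applies the bounds from the ScaledObject. The sample response body is hypothetical, and the helper names are illustrative; this approximates the calculation and is not KEDA's implementation.

```python
import json
import math


def pending_from_response(body):
    """Extract the scalar value from a Prometheus instant-query JSON response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError("Prometheus query failed")
    result = payload["data"]["result"]
    # sum(...) yields at most one series; an empty result means no samples yet.
    if not result:
        return 0.0
    _timestamp, value = result[0]["value"]  # the sample value arrives as a string
    return float(value)


def replicas_for(pending, threshold=100, min_replicas=3, max_replicas=30):
    """One replica per `threshold` pending requests, within the configured bounds."""
    return max(min_replicas, min(math.ceil(pending / threshold), max_replicas))


# Hypothetical response body for the sum(matching_pending_requests{...}) query.
sample = (
    '{"status":"success","data":{"resultType":"vector",'
    '"result":[{"metric":{},"value":[1700000000.0,"500"]}]}}'
)
print(replicas_for(pending_from_response(sample)))  # 5
print(replicas_for(120))                            # 3
```

The second call shows the floor at work: 120 pending requests would call for 2 replicas, but minReplicaCount keeps the deployment at 3.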
Scale-to-zero for batch workloads
The nightly fare reconciliation job consumes events from a fare-reconciliation topic. During the day, the topic is empty. At midnight, a batch job publishes the day’s fare adjustments. Running consumer pods 24 hours for a workload that has events for 45 minutes is waste.
# SCALED: Scale-to-zero for fare reconciliation consumer
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: fare-reconciliation-consumer
  namespace: ridehailing
spec:
  scaleTargetRef:
    name: fare-reconciliation
  minReplicaCount: 0
  maxReplicaCount: 10
  cooldownPeriod: 300
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka-headless.kafka:9092
        consumerGroup: fare-reconciliation
        topic: fare-reconciliation-events
        lagThreshold: "500"
        activationLagThreshold: "1"
minReplicaCount: 0 enables scale-to-zero. When no messages are pending, KEDA scales the deployment to 0 replicas. Zero pods running, zero resources consumed.
activationLagThreshold: "1" triggers the first scale-up as soon as lag exceeds one message. The deployment goes from 0 to 1 replica shortly after the batch producer starts publishing. From there, normal lag-based scaling kicks in: at 5,000 pending messages, KEDA scales to ceil(5000/500) = 10 replicas.
The cold-start penalty: going from 0 to 1 takes 30-60 seconds (image pull, JVM startup, Kafka consumer group rebalance). For a nightly batch job, 60 seconds of startup delay is acceptable. For a latency-sensitive service, scale-to-zero is the wrong choice.
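The two phases, activation from zero and then lag-based scaling, can be modelled as a small decision function. This is a sketch under a stated assumption: the scaler is treated as active only when lag is strictly greater than the activation threshold, so check the KEDA scaler documentation for the exact boundary behavior. The function name and defaults mirror the ScaledObject above but are otherwise illustrative.

```python
import math


def fare_recon_replicas(lag, lag_threshold=500, activation_lag=1, max_replicas=10):
    """Two-phase decision: stay at zero until activated, then scale on lag."""
    # Activation phase (0 <-> 1). Assumption: active only when lag is strictly
    # greater than the activation threshold.
    if lag <= activation_lag:
        return 0
    # Scaling phase (1 <-> N): one replica per `lag_threshold` messages, capped.
    return max(1, min(math.ceil(lag / lag_threshold), max_replicas))


for lag in (0, 2, 5_000, 50_000):
    print(lag, fare_recon_replicas(lag))
# 0 0
# 2 1
# 5000 10
# 50000 10
```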
Combining HPA and KEDA
The ride-hailing platform uses both, plus VPA for one memory-bound service:
    Service                   Scaler   Metric                     Reason
    rider-api                 HPA      HTTP requests/sec          Request-driven, synchronous
    trip-analytics-consumer   KEDA     Kafka consumer lag         Event-driven, async
    driver-matching           KEDA     Prometheus pending count   Queue depth
    fare-reconciliation       KEDA     Kafka lag + zero-scale     Batch, cost optimization
    surge-pricing-calc        VPA      Memory recommendation      Memory-bound, stateful
HPA and KEDA do not conflict because they target different deployments. Within a single deployment, use only one: KEDA creates its own HPA resource. Running both on the same deployment causes the two HPA resources to fight.
Locust: Friday evening scaling timeline
# SCALED: Locust simulating Friday evening event surge
import time

from locust import HttpUser, task, between


class AnalyticsEventProducer(HttpUser):
    """Simulates Kafka producer load by sending events via HTTP ingestion endpoint"""

    # Each simulated user waits 50-200 ms between events.
    wait_time = between(0.05, 0.2)

    # Task weights give a 5:3:1 mix of location, completion, and surge events.
    @task(5)
    def driver_location_update(self):
        self.client.post("/api/events/driver-location", json={
            "driver_id": f"driver-{self.environment.runner.user_count % 5000}",
            "lat": 40.7128 + (self.environment.runner.user_count % 100) * 0.001,
            "lng": -74.0060 + (self.environment.runner.user_count % 100) * 0.001,
            "timestamp": int(time.time() * 1000)
        })

    @task(3)
    def trip_completion(self):
        self.client.post("/api/events/trip-complete", json={
            "trip_id": f"trip-{int(time.time() * 1000)}",
            "fare": 24.50,
            "distance_km": 8.3,
            "duration_min": 22
        })

    @task(1)
    def surge_update(self):
        self.client.post("/api/events/surge-update", json={
            "zone_id": f"zone-{self.environment.runner.user_count % 200}",
            "multiplier": 1.8,
            "demand_score": 85
        })
Run the ramp:
locust -f locust_keda_test.py \
  --host=https://event-ingestion.ridehailing.internal \
  --users 5000 \
  --spawn-rate 100 \
  --run-time 600s \
  --headless \
  --csv=keda_scaling_test
Expected scaling timeline:
    Time    Event Rate   Consumer Lag   KEDA Replicas   Action
    T+0     400 eps      100            2               Baseline
    T+30    800 eps      12,000         2               Lag building
    T+45    1,000 eps    24,000         2→5             KEDA first scale
    T+60    1,200 eps    35,000         5→8             KEDA second scale
    T+90    1,400 eps    28,000         8→12            Lag stabilizing
    T+120   1,400 eps    8,000          12→15           Catching up
    T+180   1,400 eps    980            15              Steady state
    T+360   800 eps      200            15              Cool-down wait
    T+480   400 eps      100            15→8            Scale-down starts
    T+600   400 eps      80             8→4             Scale-down continues
KEDA’s first scale event fires at T+45, by which point lag has already climbed to 24,000, well past the lagThreshold of 1,000. By T+180, 15 consumers have caught up with the 1,400 eps production rate. Lag stabilizes below 1,000 messages. The scale-down stabilization window (120s) prevents premature scale-down during brief traffic dips.
The Proof
After deploying KEDA for the analytics pipeline:
    Metric                       Before KEDA   After KEDA   Delta
    Peak consumer lag            847,000       980          -99.9%
    Dashboard staleness (peak)   32 min        <3s          -99.8%
    Surge pricing accuracy       <40%          >95%         +137%
    Consumer pods (peak)         2             15           +650%
    Consumer pods (off-peak)     2             2            No change
    Fare recon pods (idle)       2 (24/7)      0            -100%
    Monthly compute cost         $840          $620         -26%
The compute cost decreased despite running more pods during peak. The savings come from scale-to-zero on the fare reconciliation consumers (eliminated 2 pods running 23 hours/day with no work) and from KEDA scaling down the analytics consumers during off-peak hours instead of running a fixed over-provisioned fleet.
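The Delta figures are plain relative changes and can be reproduced with a one-line helper. The function name is illustrative; the inputs are the before and after values reported in the results.

```python
def pct_change(before, after):
    """Relative change from `before` to `after`, as a percentage."""
    return (after - before) / before * 100


print(round(pct_change(840, 620)))         # -26   (monthly compute cost)
print(round(pct_change(2, 15)))            # 650   (consumer pods at peak)
print(round(pct_change(847_000, 980), 1))  # -99.9 (peak consumer lag)
```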
The dashboard now shows real-time data during the Friday surge. Surge pricing reacts to current demand, not 32-minute-old demand. Drivers are dispatched to areas where riders are requesting rides now, not where they were requesting rides half an hour ago.