The Scaling Ceiling Diagnostic
The Symptom
The team has been fighting fires for three months. Every Friday evening surge pushes the ride-hailing platform past its limits. The on-call rotation is burning people out. The engineering manager schedules a meeting titled “Architecture Discussion.” Everyone knows what that means. Someone will say “microservices.” Someone will say “Rust.” Someone will say “rewrite.”
Before that meeting happens, the principal engineer runs the diagnostic. Ten steps. Cheapest first. The answer comes at step 1.
The Cause
Scaling problems cluster into two categories. Operational problems are misconfigurations, missing optimizations, absent infrastructure that can be added without changing the codebase’s structure. Architectural problems are structural limitations that cannot be fixed by tuning or adding infrastructure.
The distinction matters because the costs differ by orders of magnitude. An index takes minutes. A pool size change takes a deploy. A cache layer takes a sprint. A full rewrite takes 12-18 months, and there is no guarantee it solves the problem.
The diagnostic checklist orders ten scaling techniques from cheapest to most expensive. Each step references the chapter where it was covered in depth, and each ends with an “SLO met?” checkpoint: if the answer is yes, stop immediately. Green steps (database indexes, pool sizing, thread config) cost nothing and take hours at most. Blue steps (caching, autoscaling) require minimal spend and take days. Yellow steps (async processing, rate limiting, circuit breakers) are medium-cost and take weeks. Red steps (read replicas, sharding) are expensive and take weeks to months. If all ten techniques are exhausted and the SLO is still violated, the problem is architectural and requires the decision framework in CH22-S2.
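The stop-at-first-success rule can be written down as a loop. The sketch below is illustrative only: `Step`, `run_diagnostic`, and `replay` are hypothetical names, the `apply` hooks do nothing, and the measurements are replayed from the driver matching readings reported later in this section rather than taken from a live load test.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    number: int
    name: str
    apply: Callable[[], None]  # hypothetical hook that applies the change


def run_diagnostic(
    steps: List[Step],
    measure_p99_ms: Callable[[], float],
    slo_p99_ms: float,
) -> Tuple[Optional[Step], float]:
    """Apply steps cheapest-first and stop at the first one that meets the SLO.

    Returns (step, p99). The step is None when every step has been applied
    and the SLO is still violated: the signal for an architectural review.
    """
    p99 = float("inf")  # nothing measured yet
    for step in steps:
        step.apply()
        p99 = measure_p99_ms()
        if p99 < slo_p99_ms:
            return step, p99
    return None, p99


# The ten techniques, in the order the checklist lists them.
TECHNIQUES = [
    "database indexes", "pool sizing", "thread config",
    "caching", "autoscaling",
    "async processing", "rate limiting", "circuit breakers",
    "read replicas", "sharding",
]


def replay(readings):
    """Stand-in for a load test: returns the next recorded p99 on each call."""
    it = iter(readings)
    return lambda: next(it)


steps = [Step(i + 1, name, apply=lambda: None) for i, name in enumerate(TECHNIQUES)]

# Driver matching: p99 after steps 1-4 was 400, 350, 320, 180 ms; SLO is 200 ms.
hit, p99 = run_diagnostic(steps, replay([400, 350, 320, 180]), slo_p99_ms=200)
print(hit.number, hit.name, p99)  # 4 caching 180
```

The loop never looks ahead. Steps 5 through 10 are not evaluated once step 4 meets the SLO, which is the point of ordering by cost.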
The Baseline
The fare calculation service. 10,000 RPS target. Current state:
// BOTTLENECK: Fare calculation service
// No indexes. Default pool sizes. No caching.
// No autoscaling. Synchronous surge pricing call.
import org.springframework.jdbc.core.BeanPropertyRowMapper;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

@Service
public class FareCalculationService {
    private final JdbcTemplate jdbcTemplate;
    private final RestTemplate surgeClient;

    public FareCalculationService(JdbcTemplate jdbcTemplate,
                                  RestTemplate surgeClient) {
        this.jdbcTemplate = jdbcTemplate;
        this.surgeClient = surgeClient;
    }

    public FareEstimate calculate(FareRequest request) {
        // Step 1 candidate: this query does a seq scan.
        // A three-column row needs a RowMapper; the Class-based
        // queryForObject overload only handles single-column results.
        // BeanPropertyRowMapper assumes FareConfig has a no-arg
        // constructor and setters.
        FareConfig config = jdbcTemplate.queryForObject(
            """
            SELECT base_fare, per_km_rate, per_min_rate
            FROM fare_config fc
            JOIN surge_zones sz
              ON fc.zone_id = sz.zone_id
            JOIN time_factors tf
              ON fc.time_slot = tf.time_slot
            WHERE sz.zone_id = ?
              AND tf.day_of_week = ?
              AND tf.hour_of_day = ?
              AND fc.vehicle_type = ?
              AND fc.effective_date <= CURRENT_DATE
              AND (fc.expiry_date IS NULL
                   OR fc.expiry_date > CURRENT_DATE)
            """,
            new BeanPropertyRowMapper<>(FareConfig.class),
            request.getZoneId(),
            request.getDayOfWeek(),
            request.getHour(),
            request.getVehicleType());

        // Step 2 candidate: pool exhaustion under load
        // Step 6 candidate: synchronous HTTP call
        double surge = surgeClient.getForObject(
            "http://surge-service/api/surge/"
                + request.getZoneId(),
            Double.class);

        return new FareEstimate(
            config.getBaseFare()
                + config.getPerKmRate()
                    * request.getDistanceKm()
                + config.getPerMinRate()
                    * request.getEstimatedMinutes(),
            surge);
    }
}
Baseline Locust at 10,000 RPS:
Before diagnostic:
/api/fares/estimate p50=450ms p99=2400ms errors=8.2%
PostgreSQL CPU: 98%
Connection pool waits: 4,200/sec
Thread pool queue depth: 890
The Fix
Step 1: Check Indexes
-- SCALED: The diagnostic starts here
-- Run EXPLAIN ANALYZE on the hot query
EXPLAIN (ANALYZE, BUFFERS) SELECT ...
-- Output: Seq Scan on fare_config (rows=340,000)
-- Cost: 45ms per execution
-- Fix: composite index
CREATE INDEX idx_fare_config_lookup
ON fare_config (zone_id, vehicle_type, effective_date)
INCLUDE (base_fare, per_km_rate, per_min_rate);
CREATE INDEX idx_time_factors_lookup
ON time_factors (day_of_week, hour_of_day, time_slot);
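The plan change that step 1 looks for can be reproduced without a PostgreSQL instance. The sketch below uses SQLite from the Python standard library purely as a stand-in: its `EXPLAIN QUERY PLAN` output differs from PostgreSQL's `EXPLAIN (ANALYZE, BUFFERS)`, it has no `INCLUDE` clause, and the table is a simplified, empty copy of `fare_config` with assumed column types and no joins. What carries over is the shape of the check: read the plan, add the index, read the plan again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE fare_config (
        zone_id TEXT, vehicle_type TEXT, time_slot TEXT,
        effective_date TEXT, expiry_date TEXT,
        base_fare REAL, per_km_rate REAL, per_min_rate REAL
    )
    """
)

# Simplified version of the hot query: the fare_config filters only.
QUERY = """
    SELECT base_fare, per_km_rate, per_min_rate
    FROM fare_config
    WHERE zone_id = ? AND vehicle_type = ?
      AND effective_date <= DATE('now')
"""
PARAMS = ("manhattan-midtown", "standard")


def plan(connection):
    """Return the plan's human-readable detail column, joined into one line."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, PARAMS).fetchall()
    return " | ".join(row[3] for row in rows)


before = plan(conn)
conn.execute(
    "CREATE INDEX idx_fare_config_lookup "
    "ON fare_config (zone_id, vehicle_type, effective_date)"
)
after = plan(conn)

print("before:", before)  # a full-table SCAN
print("after: ", after)   # a SEARCH using idx_fare_config_lookup
```

The exact wording of the plan lines varies between SQLite versions, so the comments describe only the part that matters: scan before, index search after.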
After Step 1:
/api/fares/estimate p50=35ms p99=280ms errors=0.3%
PostgreSQL CPU: 18%
Query time: 0.3ms (down from 45ms)
SLO (p99 < 500ms): MET
For the fare calculation, step 1 was sufficient. The diagnostic stops here.
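Little's Law gives a quick sanity check on why one index removed the connection pool waits as well as the CPU load. It states that average concurrency equals arrival rate multiplied by time in the system. The sketch below assumes each request holds one database connection for exactly the query time, and it ignores the surge HTTP call; `in_flight` is a hypothetical helper name.

```python
def in_flight(rps: int, service_time_us: int) -> float:
    """Little's Law: average concurrency = arrival rate x time in system."""
    return rps * service_time_us / 1_000_000


TARGET_RPS = 10_000

before = in_flight(TARGET_RPS, 45_000)  # 45 ms sequential scan
after = in_flight(TARGET_RPS, 300)      # 0.3 ms index lookup

print(f"connections busy before: {before:.0f}, after: {after:.0f}")
# connections busy before: 450, after: 3
```

Under these assumptions, the target load would keep roughly 450 connections busy on the sequential scan and about 3 on the index lookup. A default-sized pool cannot supply the first number, which is consistent with the pool waits seen in the baseline.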
The Counter-Example: Driver Matching
The driver matching service has a different problem. At 50,000 match requests per second:
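// BOTTLENECK: Driver matching coupled to trip service
import java.util.Comparator;
import org.springframework.stereotype.Service;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

@Service
public class DriverMatchingService {
    private final TripRepository tripRepository;
    private final DriverRepository driverRepository;

    public DriverMatchingService(TripRepository tripRepository,
                                 DriverRepository driverRepository) {
        this.tripRepository = tripRepository;
        this.driverRepository = driverRepository;
    }

    public Mono<DriverMatch> findBestDriver(
            MatchRequest request) {
        // Problem 1: reads from the same database as trips
        // Under trip write load, matching queries compete
        // for connections
        return driverRepository
            .findAvailableInZone(request.getZoneId())
            .collectList()
            .flatMap(drivers -> {
                // Problem 2: for each driver candidate,
                // queries trip history to check reliability
                return Flux.fromIterable(drivers)
                    .flatMap(driver ->
                        tripRepository
                            .countCompletedTrips(
                                driver.getId())
                            .map(count ->
                                new ScoredDriver(
                                    driver, count)))
                    .sort(Comparator.comparing(
                        ScoredDriver::score).reversed())
                    .next();
            })
            // Convert the winner to the declared return type.
            // Assumption: DriverMatch has a constructor taking the
            // driver and ScoredDriver exposes a driver() accessor.
            .map(best -> new DriverMatch(best.driver()));
    }
}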
Running the diagnostic on driver matching:
Step 1 (Indexes): Applied. p99 drops from 800ms to 400ms.
Still above 200ms SLO. Continue.
Step 2 (Pool sizes): Increased to 30. p99 drops to 350ms.
Still above SLO. Continue.
Step 3 (Threads): Optimized. p99 drops to 320ms.
Still above SLO. Continue.
Step 4 (Caching): Driver scores cached in Redis.
p99 drops to 180ms. SLO MET.
For driver matching, step 4 resolved it. Still no rewrite.
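Step 4 works because the per-candidate trip count query stops running on every match request. A minimal cache-aside sketch follows, with a dictionary standing in for Redis and a hypothetical `ScoreCache` class; the TTL, driver IDs, and trip counts are made up for illustration, and a production version would also need to decide how stale a reliability score is allowed to be.

```python
import time
from typing import Callable, Dict, Tuple


class ScoreCache:
    """Cache-aside lookup with a TTL. A dict stands in for Redis here."""

    def __init__(
        self,
        load_score: Callable[[str], int],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._load_score = load_score
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, int]] = {}  # id -> (expiry, score)

    def get(self, driver_id: str) -> int:
        now = self._clock()
        entry = self._entries.get(driver_id)
        if entry is not None and entry[0] > now:
            return entry[1]  # hit: no database round trip
        score = self._load_score(driver_id)  # miss or expired: query once
        self._entries[driver_id] = (now + self._ttl, score)
        return score


db_calls = []


def count_completed_trips(driver_id: str) -> int:
    """Stand-in for the trip history query; records each database call."""
    db_calls.append(driver_id)
    return {"d1": 120, "d2": 45}[driver_id]  # made-up counts


cache = ScoreCache(count_completed_trips, ttl_seconds=60)

# 1,000 match requests over the same two candidates.
for _ in range(1000):
    best = max(["d1", "d2"], key=cache.get)

print(best, len(db_calls))  # d1 2
```

Two database calls serve a thousand match requests in this toy run. The real ratio depends on the TTL and on how often the same drivers appear as candidates.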
When the Diagnostic Reaches Step 10
If all 10 steps are applied and the SLO is still violated, three architectural indicators confirm the problem is structural:
Architectural Ceiling Indicators:
1. Deployment coupling
The monolith deploys in 35 minutes because every
component deploys together. Scaling the fare service
requires scaling everything else.
2. Conflicting scaling requirements
Driver matching needs 50 pods for CPU-intensive
scoring. Trip management needs 8 pods with large
memory for caching. Scaling the monolith gives both
components the wrong resource profile.
3. Cross-context data coupling
The driver matching query joins trip_history,
driver_profiles, and fare_config in a single query.
These are three bounded contexts forced into one
database. Sharding cannot separate them because
the joins require co-location.
// BOTTLENECK: Architectural ceiling
// These three concerns cannot be scaled independently
// in the current monolith
@Service
public class CoupledMonolith {
    // Scaling requirement: 50 pods, CPU-heavy
    public DriverMatch matchDriver(MatchRequest req) {
        // Joins driver_profiles + trip_history + fare_config
        // Cannot separate without breaking the join
        throw new UnsupportedOperationException("body elided");
    }

    // Scaling requirement: 8 pods, memory-heavy
    public TripDetails getTrip(String tripId) {
        // Large result sets, needs RAM for caching
        throw new UnsupportedOperationException("body elided");
    }

    // Scaling requirement: 20 pods, IO-heavy
    public FareEstimate calculateFare(FareRequest req) {
        // High RPS, needs many connections
        throw new UnsupportedOperationException("body elided");
    }
}
When the diagnostic confirms an architectural ceiling, the response is targeted extraction (strangler fig), not a full rewrite. CH22-S2 covers the extraction process and the decision framework.
Locust Diagnostic in Action
Run the diagnostic as a series of Locust tests, applying one step at a time:
# SCALED: Locust diagnostic runner
from locust import HttpUser, task, between


class DiagnosticUser(HttpUser):
    # Each simulated user pauses 50-100 ms between requests
    wait_time = between(0.05, 0.1)
    host = "http://rider-api:8080"

    @task
    def fare_estimate(self):
        self.client.get(
            "/api/fares/estimate"
            + "?zoneId=manhattan-midtown"
            + "&vehicleType=standard"
            + "&dayOfWeek=FRIDAY"
            + "&hour=18")
Diagnostic Results (cumulative):
Step  Change            p99     RPS Cap  PG CPU  Errors
Base  No optimizations  2400ms  2,000    98%     8.2%
1     Add indexes       280ms   12,000   18%     0.3%
*** SLO MET - STOP ***
For fare calculation, one step. 12 seconds of work.
Diagnostic Results for driver matching (cumulative):
Step  Change               p99    RPS Cap  PG CPU  Errors
Base  No optimizations     800ms  8,000    82%     4.1%
1     Add indexes          400ms  15,000   55%     1.2%
2     Pool size 15→30      350ms  20,000   55%     0.8%
3     Thread optimization  320ms  25,000   52%     0.5%
4     Redis score cache    180ms  50,000   30%     0.1%
*** SLO MET - STOP ***
Four steps. Three days of work. No rewrite.
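Every checkpoint in these tables compares a p99 against the SLO, so it helps to be precise about what that number is. The sketch below computes a nearest-rank percentile over raw latency samples. It is one common definition, not necessarily the method Locust uses internally, and the sample set is synthetic: it was chosen only so that the output matches the p50 and p99 reported after step 1 for fare calculation.

```python
def percentile(samples_ms, pct: int):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    # Ceiling of pct * n / 100, in integer arithmetic to avoid float rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


# Synthetic run: 985 fast requests and 15 slow ones.
samples = [35] * 985 + [280] * 15

p50 = percentile(samples, 50)
p99 = percentile(samples, 99)
slo_ms = 500

print(p50, p99, "MET" if p99 < slo_ms else "VIOLATED")  # 35 280 MET
```

Only 1.5% of the requests in this sample are slow, and the median does not move at all, yet the p99 lands on the slow group. That is why the checkpoints use p99 rather than p50.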
The Proof
The diagnostic answered both questions without a rewrite discussion:
Fare calculation:
Problem: Missing index (operational)
Fix: 2 CREATE INDEX statements
Time: 12 seconds
Result: p99 from 2,400ms to 280ms
Driver matching:
Problem: Missing indexes and cache, suboptimal pools (operational)
Fix: Indexes + pool and thread tuning + Redis cache
Time: 3 days
Result: p99 from 800ms to 180ms
Proposed rewrite:
Estimated time: 18 months
Estimated cost: $1.2M (6 engineers)
Expected result: Same throughput (same patterns)
The diagnostic is a checklist, not a judgment call. Run it before every scaling discussion. Start at step 1. Stop when the SLO is met. If you reach step 10 and the SLO is still violated, you have data to justify the architectural conversation. Until then, the answer is operational.