The Scaling Ceiling Diagnostic

The Symptom

The team has been fighting fires for three months. Every Friday evening surge pushes the ride-hailing platform past its limits. The on-call rotation is burning people out. The engineering manager schedules a meeting titled “Architecture Discussion.” Everyone knows what that means. Someone will say “microservices.” Someone will say “Rust.” Someone will say “rewrite.”

Before that meeting happens, the principal engineer runs the diagnostic. Ten steps. Cheapest first. The answer comes at step 1.

The Cause

Scaling problems cluster into two categories. Operational problems are misconfigurations, missing optimizations, or absent infrastructure, all of which can be fixed without changing the codebase’s structure. Architectural problems are structural limitations that tuning and added infrastructure cannot fix.

The distinction matters because the cost difference is at least three orders of magnitude. An index takes minutes. A pool size change takes a deploy. A cache layer takes a sprint. A full rewrite takes 12-18 months, and there is no guarantee it solves the problem.

The diagnostic checklist orders techniques by cost. Each step references the chapter where it was covered in depth. Stop when the SLO is met.

Figure: Scaling diagnostic flowchart showing 10 techniques ordered by cost (free in green, low in blue, medium in yellow, high in red), with an SLO checkpoint after each step.

The diagnostic orders ten scaling techniques from cheapest to most expensive. Each step has an “SLO met?” checkpoint—if yes, stop immediately. Green steps (database indexes, pool sizing, thread config) cost nothing and take hours. Blue steps (caching, autoscaling) require minimal spend and take days. Yellow steps (async processing, rate limiting, circuit breakers) are medium-cost and take weeks. Red steps (read replicas, sharding) are expensive and take weeks to months. If all ten techniques are exhausted and the SLO is still violated, the problem is architectural and requires the decision framework in CH22-S2.
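The checkpoint loop is mechanical enough to write down. Below is a minimal Python sketch of it, using the same language as the chapter's Locust scripts. The step names and cost tiers come from the flowchart description. `apply_step` and `measure_p99_ms` are hypothetical callbacks that stand in for deploying one change and re-running the load test.

```python
from dataclasses import dataclass
from typing import Callable

# The ten techniques in cost order, per the flowchart description:
# green = free, blue = low, yellow = medium, red = high.
STEPS: list[tuple[str, str]] = [
    ("database indexes", "free"),
    ("pool sizing", "free"),
    ("thread config", "free"),
    ("caching", "low"),
    ("autoscaling", "low"),
    ("async processing", "medium"),
    ("rate limiting", "medium"),
    ("circuit breakers", "medium"),
    ("read replicas", "high"),
    ("sharding", "high"),
]


@dataclass
class DiagnosticResult:
    steps_applied: list[str]
    final_p99_ms: float
    slo_met: bool

    @property
    def verdict(self) -> str:
        # Exhausting all ten steps without meeting the SLO points to an
        # architectural ceiling; stopping earlier means it was operational.
        return "operational" if self.slo_met else "architectural"


def run_diagnostic(
    apply_step: Callable[[str], None],    # hypothetical: deploys one change
    measure_p99_ms: Callable[[], float],  # hypothetical: runs the load test
    slo_p99_ms: float,
) -> DiagnosticResult:
    applied: list[str] = []
    p99 = measure_p99_ms()                # baseline before any change
    if p99 < slo_p99_ms:
        return DiagnosticResult(applied, p99, True)
    for name, _cost in STEPS:             # cheapest first
        apply_step(name)                  # exactly one change per iteration
        applied.append(name)
        p99 = measure_p99_ms()            # re-measure after the change
        if p99 < slo_p99_ms:              # the "SLO met?" checkpoint
            return DiagnosticResult(applied, p99, True)
    return DiagnosticResult(applied, p99, False)
```

The fare service's numbers are 2,400ms and then 280ms against a 500ms SLO, so the loop stops after one step. The driver matching sequence is 800, 400, 350, 320 and 180ms against 200ms, so it stops after caching.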

The Baseline

The fare calculation service. 10,000 RPS target. Current state:

// BOTTLENECK: Fare calculation service
// No indexes. Default pool sizes. No caching.
// No autoscaling. Synchronous surge pricing call.
import org.springframework.jdbc.core.BeanPropertyRowMapper;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

// FareConfig, FareRequest, and FareEstimate are the
// service's own domain classes (not shown).
@Service
public class FareCalculationService {

    private final JdbcTemplate jdbcTemplate;
    private final RestTemplate surgeClient;

    public FareCalculationService(JdbcTemplate jdbcTemplate,
                                  RestTemplate surgeClient) {
        this.jdbcTemplate = jdbcTemplate;
        this.surgeClient = surgeClient;
    }

    public FareEstimate calculate(FareRequest request) {
        // Step 1 candidate: this query does a seq scan
        FareConfig config = jdbcTemplate.queryForObject(
            """
            SELECT base_fare, per_km_rate, per_min_rate
            FROM fare_config fc
            JOIN surge_zones sz
              ON fc.zone_id = sz.zone_id
            JOIN time_factors tf
              ON fc.time_slot = tf.time_slot
            WHERE sz.zone_id = ?
              AND tf.day_of_week = ?
              AND tf.hour_of_day = ?
              AND fc.vehicle_type = ?
              AND fc.effective_date <= CURRENT_DATE
              AND (fc.expiry_date IS NULL
                   OR fc.expiry_date > CURRENT_DATE)
            """,
            // Three columns need a row mapper; the
            // Class<T> overload maps a single column only
            new BeanPropertyRowMapper<>(FareConfig.class),
            request.getZoneId(),
            request.getDayOfWeek(),
            request.getHour(),
            request.getVehicleType());

        // Step 2 candidate: pool exhaustion under load
        // Step 6 candidate: synchronous HTTP call
        double surge = surgeClient.getForObject(
            "http://surge-service/api/surge/"
                + request.getZoneId(),
            Double.class);

        return new FareEstimate(
            config.getBaseFare()
                + config.getPerKmRate()
                    * request.getDistanceKm()
                + config.getPerMinRate()
                    * request.getEstimatedMinutes(),
            surge);
    }
}

Baseline Locust at 10,000 RPS:

Before diagnostic:
  /api/fares/estimate  p50=450ms  p99=2400ms  errors=8.2%
  PostgreSQL CPU: 98%
  Connection pool waits: 4,200/sec
  Thread pool queue depth: 890

The Fix

Step 1: Check Indexes

-- SCALED: The diagnostic starts here
-- Run EXPLAIN ANALYZE on the hot query
EXPLAIN (ANALYZE, BUFFERS) SELECT ...

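-- Output: Seq Scan on fare_config (rows=340,000)
-- Execution time: 45ms per call

-- Fix: composite indexes matching the WHERE and JOIN columns.
-- On a table taking live writes, CREATE INDEX CONCURRENTLY
-- avoids blocking writes during the build.
CREATE INDEX idx_fare_config_lookup
    ON fare_config (zone_id, vehicle_type, effective_date)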
    INCLUDE (base_fare, per_km_rate, per_min_rate);

CREATE INDEX idx_time_factors_lookup
    ON time_factors (day_of_week, hour_of_day, time_slot);
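Reading plans by eye does not scale beyond a handful of hot queries. The step 1 check can be scripted once plans are captured with `EXPLAIN (ANALYZE, FORMAT JSON)`. Below is a minimal Python sketch that flags large sequential scans. It assumes the plan text has already been fetched, for example through a database driver. The function name and the 10,000-row threshold are illustrative choices, not part of the original diagnostic.

```python
import json


def find_seq_scans(explain_json: str, min_rows: int = 10_000) -> list[tuple[str, int]]:
    """List (relation, rows_scanned) for large Seq Scan nodes in a JSON plan.

    Expects the text produced by EXPLAIN (ANALYZE, FORMAT JSON): a JSON array
    whose first element holds the root node under "Plan"; child nodes sit
    under "Plans".
    """
    root = json.loads(explain_json)[0]["Plan"]
    hits: list[tuple[str, int]] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.get("Node Type") == "Seq Scan":
            loops = node.get("Actual Loops", 1)
            # "Actual Rows" (rows that passed the filter) and
            # "Rows Removed by Filter" are per-loop averages; their sum
            # times the loop count approximates the rows actually read.
            per_loop = node.get("Actual Rows", 0) + node.get("Rows Removed by Filter", 0)
            scanned = int(per_loop * loops)
            if scanned >= min_rows:
                hits.append((node.get("Relation Name", "?"), scanned))
        stack.extend(node.get("Plans", []))
    return hits
```

Before the indexes exist, a plan like the one above would report `fare_config` with roughly 340,000 rows read. After the indexes are built, the same check should come back empty.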

After Step 1:
  /api/fares/estimate  p50=35ms   p99=280ms   errors=0.3%
  PostgreSQL CPU: 18%
  Query time: 0.3ms (down from 45ms)

SLO (p99 < 500ms): MET

For the fare calculation, step 1 was sufficient. The diagnostic stops here.
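Little's law explains why one index also cleared the pool waits. The average number of connections in use equals the request rate times how long each request holds a connection. Below is a back-of-the-envelope Python sketch using this section's numbers. It assumes each fare request holds one database connection for roughly the query time and ignores checkout and network overhead.

```python
def busy_connections(rps: float, hold_time_ms: float) -> float:
    """Little's law, L = lambda * W: average connections held at once."""
    return rps * (hold_time_ms / 1000.0)


before = busy_connections(10_000, 45.0)  # seq-scan query, 45ms
after = busy_connections(10_000, 0.3)    # index-backed query, 0.3ms
print(f"before: ~{before:.0f} connections busy across the fleet")
print(f"after:  ~{after:.0f} connections busy across the fleet")
```

The sketch prints about 450 before and 3 after. HikariCP is Spring Boot's default pool, and its default `maximumPoolSize` is 10 (whether this service uses it is an assumption). At that size, 450 concurrent connections would need 45 instances' worth of default pools, while 3 fits in one. These are averages, so bursts need headroom on top.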

The Counter-Example: Driver Matching

The driver matching service has a different problem. At 50,000 match requests per second:

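// BOTTLENECK: Driver matching coupled to trip service
import java.util.Comparator;

import org.springframework.stereotype.Service;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

// Domain types and reactive repositories are not shown.
@Service
public class DriverMatchingService {

    private final TripRepository tripRepository;
    private final DriverRepository driverRepository;

    public DriverMatchingService(TripRepository tripRepository,
                                 DriverRepository driverRepository) {
        this.tripRepository = tripRepository;
        this.driverRepository = driverRepository;
    }

    public Mono<DriverMatch> findBestDriver(
            MatchRequest request) {
        // Problem 1: reads from the same database as trips.
        // Under trip write load, matching queries compete
        // for connections.
        return driverRepository
            .findAvailableInZone(request.getZoneId())
            .collectList()
            .flatMap(drivers -> {
                // Problem 2: N+1 queries. For each driver
                // candidate, query trip history to check
                // reliability.
                return Flux.fromIterable(drivers)
                    .flatMap(driver ->
                        tripRepository
                            .countCompletedTrips(
                                driver.getId())
                            .map(count ->
                                new ScoredDriver(
                                    driver, count)))
                    .sort(Comparator.comparing(
                        ScoredDriver::score).reversed())
                    .next()
                    // toMatch() is a hypothetical
                    // ScoredDriver -> DriverMatch conversion
                    .map(ScoredDriver::toMatch);
            });
    }
}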
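Problem 2 is the N+1 query pattern. The service runs one query for the candidate list, then one trip-history query per candidate, so round trips grow with the number of nearby drivers. Below is a Python sketch with a hypothetical in-memory repository that counts round trips. It compares the per-driver shape with a single batched lookup. The batch method is an assumed alternative, not part of the original service; the diagnostic itself resolved matching with caching at step 4. Scores here are raw completed-trip counts, a simplification of `ScoredDriver::score`.

```python
class FakeTripRepository:
    """Hypothetical stand-in for TripRepository that counts database round trips."""

    def __init__(self, completed: dict[str, int]) -> None:
        self._completed = completed
        self.round_trips = 0

    def count_completed_trips(self, driver_id: str) -> int:
        self.round_trips += 1                  # one query per driver (the "N")
        return self._completed.get(driver_id, 0)

    def count_completed_trips_batch(self, driver_ids: list[str]) -> dict[str, int]:
        self.round_trips += 1                  # one grouped query for every driver
        return {d: self._completed.get(d, 0) for d in driver_ids}


def best_driver_n_plus_one(repo: FakeTripRepository, drivers: list[str]) -> str:
    # max() calls the key function once per candidate: one query each.
    return max(drivers, key=repo.count_completed_trips)


def best_driver_batched(repo: FakeTripRepository, drivers: list[str]) -> str:
    counts = repo.count_completed_trips_batch(drivers)
    return max(drivers, key=counts.__getitem__)
```

With 3 candidates, the first version makes 3 trip-history queries and the second makes 1. The "+1" candidate-list query is the same in both and is not counted here.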

Running the diagnostic on driver matching:

Step 1 (Indexes):     Applied. p99 drops from 800ms to 400ms.
                      Still above 200ms SLO. Continue.
Step 2 (Pool sizes):  Increased to 30. p99 drops to 350ms.
                      Still above SLO. Continue.
Step 3 (Threads):     Optimized. p99 drops to 320ms.
                      Still above SLO. Continue.
Step 4 (Caching):     Driver scores cached in Redis.
                      p99 drops to 180ms. SLO MET.
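Step 4 cached driver scores in Redis using the cache-aside pattern:

- Read the cache first.
- On a miss, compute from the database.
- Store the result with an expiry.

Below is a Python sketch of that pattern, using an in-process dict in place of Redis so it runs on its own. With redis-py, the same shape maps onto `get` and `set(name, value, ex=seconds)`. The 5-minute TTL and the `driver:<id>` key format are hypothetical. A short TTL trades a little staleness in reliability scores for far fewer trip-history queries.

```python
import time
from typing import Callable


class TtlCache:
    """Cache-aside with a per-entry TTL; an in-process stand-in for Redis."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries: dict[str, tuple[float, float]] = {}  # key -> (expires_at, score)

    def get_or_load(self, key: str, loader: Callable[[str], float]) -> float:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                      # hit: no database round trip
        score = loader(key)                      # miss or expired: query trip history
        self._entries[key] = (now + self._ttl, score)
        return score


# Usage: the loader stands in for the trip-history query.
cache = TtlCache(ttl_seconds=300)                # hypothetical 5-minute TTL
score = cache.get_or_load("driver:d2", lambda key: 40.0)
```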

For driver matching, step 4 resolved it. Still no rewrite.

When the Diagnostic Reaches Step 10

If all 10 steps are applied and the SLO is still violated, look for three architectural indicators that confirm the problem is structural:

Architectural Ceiling Indicators:

1. Deployment coupling
   The monolith deploys in 35 minutes because every
   component deploys together. Scaling the fare service
   requires scaling everything else.

2. Conflicting scaling requirements
   Driver matching needs 50 pods for CPU-intensive
   scoring. Trip management needs 8 pods with large
   memory for caching. Scaling the monolith gives both
   components the wrong resource profile.

3. Cross-context data coupling
   The driver matching query joins trip_history,
   driver_profiles, and fare_config in a single query.
   These are three bounded contexts forced into one
   database. Sharding cannot separate them because
   the joins require co-location.

// BOTTLENECK: Architectural ceiling
// These three concerns cannot be scaled independently
// in the current monolith
@Service
public class CoupledMonolith {

    // Scaling requirement: 50 pods, CPU-heavy
    public DriverMatch matchDriver(MatchRequest req) {
        // Joins driver_profiles + trip_history + fare_config
        // Cannot separate without breaking the join
        throw new UnsupportedOperationException("body omitted");
    }

    // Scaling requirement: 8 pods, memory-heavy
    public TripDetails getTrip(String tripId) {
        // Large result sets, needs RAM for caching
        throw new UnsupportedOperationException("body omitted");
    }

    // Scaling requirement: 20 pods, IO-heavy
    public FareEstimate calculateFare(FareRequest req) {
        // High RPS, needs many connections
        throw new UnsupportedOperationException("body omitted");
    }
}

When the diagnostic confirms an architectural ceiling, the response is targeted extraction (strangler fig), not a full rewrite. CH22-S2 covers the extraction process and the decision framework.
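Targeted extraction usually starts at the routing layer. Requests for the extracted context go to the new service, and everything else still reaches the monolith. Below is a toy Python sketch of that routing decision. The path prefix and host names are hypothetical. In production this rule typically lives in an API gateway or reverse proxy rather than in application code.

```python
# Hypothetical routing table: prefixes already extracted from the monolith.
EXTRACTED_ROUTES: dict[str, str] = {
    "/api/matching/": "http://driver-matching:8080",
}
MONOLITH = "http://monolith:8080"


def route(path: str) -> str:
    """Pick a backend: extracted prefixes go to new services, the rest to the monolith."""
    for prefix, backend in EXTRACTED_ROUTES.items():
        if path.startswith(prefix):
            return backend
    return MONOLITH
```

Each later extraction adds one entry to the table, and the monolith keeps serving every path that has not moved.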

Locust Diagnostic in Action

Run the diagnostic as a series of Locust tests, applying one step at a time:

# SCALED: Locust diagnostic runner
from locust import HttpUser, between, task


class DiagnosticUser(HttpUser):
    # Each simulated user pauses 50-100ms between requests
    wait_time = between(0.05, 0.1)
    host = "http://rider-api:8080"

    @task
    def fare_estimate(self):
        self.client.get(
            "/api/fares/estimate",
            params={
                "zoneId": "manhattan-midtown",
                "vehicleType": "standard",
                "dayOfWeek": "FRIDAY",
                "hour": 18,
            },
            # One stats entry for the endpoint, regardless
            # of query string
            name="/api/fares/estimate")
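Each step's result has to feed the SLO checkpoint. One way is to run the same locustfile headless with CSV output, for example `locust -f locustfile.py --headless -u 1000 -r 100 --run-time 5m --csv step1`. The user count, spawn rate, and duration are placeholders. The command writes `step1_stats.csv`. Below is a Python sketch that pulls p99 latency and error rate for one endpoint from that file. The column names (`Name`, `Request Count`, `Failure Count`, `99%`) match recent Locust 2.x releases but are worth checking against your version.

```python
import csv
import io


def endpoint_summary(stats_csv: str, name: str) -> tuple[float, float]:
    """Return (p99_ms, error_rate) for one request name in a Locust *_stats.csv."""
    for row in csv.DictReader(io.StringIO(stats_csv)):
        if row["Name"] == name:
            requests = int(row["Request Count"])
            failures = int(row["Failure Count"])
            error_rate = failures / requests if requests else 0.0
            return float(row["99%"]), error_rate
    raise KeyError(f"no stats row named {name!r}")
```

For example, `endpoint_summary(open("step1_stats.csv").read(), "/api/fares/estimate")` returns the two numbers to compare against the SLO before deciding whether to apply the next step.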

Diagnostic Results (cumulative):

Step  Change              p99     RPS Cap  PG CPU  Errors
Base  No optimizations    2400ms  2,000    98%     8.2%
 1    Add indexes         280ms   12,000   18%     0.3%
      *** SLO MET - STOP ***

For fare calculation, one step. 12 seconds of work.

Diagnostic Results for driver matching (cumulative):

Step  Change              p99     RPS Cap  PG CPU  Errors
Base  No optimizations    800ms   8,000    82%     4.1%
 1    Add indexes         400ms   15,000   55%     1.2%
 2    Pool size 15→30     350ms   20,000   55%     0.8%
 3    Thread optimization 320ms   25,000   52%     0.5%
 4    Redis score cache   180ms   50,000   30%     0.1%
      *** SLO MET - STOP ***

Four steps. Three days of work. No rewrite.

The Proof

The diagnostic answered both questions without a rewrite discussion:

Fare calculation:
  Problem: Missing index (operational)
  Fix: 2 CREATE INDEX statements
  Time: 12 seconds
  Result: p99 from 2,400ms to 280ms

Driver matching:
  Problem: Missing cache + suboptimal pools (operational)
  Fix: Redis cache + pool tuning
  Time: 3 days
  Result: p99 from 800ms to 180ms

Proposed rewrite:
  Estimated time: 18 months
  Estimated cost: $1.2M (6 engineers)
  Expected result: Same throughput (same patterns)

The diagnostic is a checklist, not a judgment call. Run it before every scaling discussion. Start at step 1. Stop when the SLO is met. If you reach step 10 and the SLO is still violated, you have data to justify the architectural conversation. Until then, the answer is operational.