Surviving the Spike

A Production Engineer's Guide to Scalable Web Services.

This book targets senior developers and architects who have shipped web services and watched them fall over under load. You know what a REST endpoint is. You know what a cache is. You have a running distributed system. This book explains why it does not scale and how to fix it.

Every chapter uses the same domain: a ride-hailing platform modelled on Uber's architecture, covering driver location ingestion, rider matching, fare calculation, trip history, and surge pricing. Code examples, diagrams, and failure scenarios all refer to this platform, so abstract scalability problems stay concrete and consistent across the book.

Three opinions run through every chapter:

Redis is the definitive caching layer. Not Memcached, not Hazelcast, not Caffeine alone. Redis, used correctly, solves the majority of caching problems. Every caching chapter uses Redis. When a different tool is genuinely better for a specific sub-problem, the narrow case is stated explicitly, then the book returns to Redis as the default.

Reactive over blocking for I/O-bound services. Spring WebFlux with Project Reactor is the correct default for services that spend most of their time waiting. Spring MVC is not wrong, but when the bottleneck is I/O, blocking threads is waste.

Measure first, always. No optimization chapter proceeds without establishing a baseline with Locust. An opinion without a number is a preference. A number without a baseline is noise.

Code examples use Java 21, Spring Boot 3, and Spring WebFlux. Redis examples use Lettuce. Kubernetes manifests accompany every infrastructure component. Locust scripts measure every claim. Every chapter follows the same structure: the symptom, the cause, a Locust baseline that reproduces the problem, the fix, and the same Locust test re-run with the delta shown and explained.

This book was generated using AI assistance.

22 Chapters

8h 42m total

104,212 words

May 24, 2026

About This Book

Voice: War-Room-Principal-Engineer

Tone: Direct, opinionated, battle-scarred. Write as a principal engineer who has been in the war room during a Black Friday incident and has the runbook scars to prove it. Specific. Direct. When a common industry pattern is overused or misapplied, say so by name and explain what it actually costs.

Categories

Performance, Scalability, Infrastructure, Observability, Reliability

Table of Contents

1

Measuring Before Optimizing: Latency Percentiles, Throughput, and the Lies Averages Tell

8 min read
1. The Lies Averages Tell and What Percentiles Reveal
  5 min read
2. Building the Locust Baseline for the Ride-Hailing Platform
  7 min read
2

The Request Lifecycle Under Load: DNS, Load Balancer, JVM, Database, and Back

6 min read
1. From DNS to the JVM: The First Half of the Request
  6 min read
2. From Database to Response: The Second Half and Where It Stalls
  6 min read
3

Horizontal Scaling and the State Problem

3 min read
1. Session Affinity and Its Cost
  5 min read
2. Externalizing State to Redis
  6 min read
4

Connection Pools, Thread Pools, and Where Requests Go to Die

5 min read
1. Connection Pool Sizing: The Math and the Mistakes
  5 min read
2. Thread Pool Exhaustion Under Load
  5 min read
5

Caching Layer One: HTTP Cache Controls and CDN Behavior

10 min read
1. Cache-Control, ETag, and Last-Modified: The Headers That Save Your Origin
  9 min read
2. CDN Behavior and the Vary Header Trap
  11 min read
6

Caching Layer Two: Application and Query Result Caching with Redis

11 min read
1. Spring Cache Abstraction with Redis
  10 min read
2. Cache-Aside vs Read-Through vs Write-Through
  13 min read
7

Caching Layer Three: Computed Aggregates, Invalidation, and What Goes Wrong

10 min read
1. Computed Aggregate Caching and Invalidation Strategies
  8 min read
2. Cache Failure Modes and How to Survive Them
  14 min read
8

Database Scaling: Read Replicas, Sharding, and When PostgreSQL Is Not the Problem

7 min read
1. Read Replicas and Connection Routing
  8 min read
2. Sharding Strategies and When the Database Is Not the Problem
  10 min read
9

Async by Default: Kafka, Backpressure, and the Queue That Saved the Service

8 min read
1. Decoupling with Kafka: What to Make Async and What to Keep Synchronous
  9 min read
2. Backpressure, Consumer Lag, and Dead Letter Queues
  11 min read
10

Rate Limiting: Algorithms, Redis, and Atomic Operations

10 min read
1. Rate Limiting Algorithms: From Token Bucket to Sliding Window
  10 min read
2. Distributed Rate Limiting with Redis and Lua
  13 min read
11

The Browser Is a Client You Cannot Optimize: Bundle Size and Core Web Vitals

6 min read
1. Bundle Size, Code Splitting, and Time to Interactive
  8 min read
2. Core Web Vitals as SLOs and API Design for Frontend Performance
  9 min read
12

Real-Time at Scale: WebSockets, SSE, and 100k Concurrent Connections

7 min read
1. WebSockets vs SSE: Architecture and the Right Tool for Each Feature
  9 min read
2. 100k Concurrent Connections: Memory, File Descriptors, and Redis Pub/Sub
  11 min read
13

Kubernetes Scaling: HPA, VPA, KEDA, and the Metrics That Drive Them

9 min read
1. HPA, VPA, and Why CPU-Based Scaling Fails for I/O-Bound Services
  8 min read
2. KEDA, Event-Driven Scaling, and the Metrics That Matter
  8 min read
14

Load Balancers: Algorithms, Health Checks, and the Sticky Session Trap

9 min read
1. Algorithms and Health Checks That Actually Work
  8 min read
2. The Sticky Session Trap and Connection Draining
  9 min read
15

CI/CD as a Safety Gate: Performance Regression Testing in the Pipeline

10 min read
1. Performance Regression Detection in CI
  11 min read
2. Tracking Performance Trends and the GitLab CI Equivalent
  10 min read
16

Distributed Tracing with OpenTelemetry: Finding the Slow Thing

8 min read
1. OpenTelemetry Instrumentation for the Ride-Hailing Platform
  7 min read
2. Finding the Slow Thing in Production
  8 min read
17

SLOs, Error Budgets, and Escaping Alert Fatigue

9 min read
1. Defining SLOs for the Ride-Hailing Platform
  8 min read
2. Burn Rate Alerting and Escaping Alert Fatigue
  10 min read
18

Cascading Failures, Circuit Breakers, and Bulkheads

7 min read
1. Circuit Breakers with Resilience4j
  8 min read
2. Bulkhead Isolation and Retry Strategies
  8 min read
19

Degraded Mode Design: What the System Does When Half of It Is Gone

8 min read
1. Feature Criticality and Graceful Degradation Chains
  8 min read
2. Partial Availability Architectures
  11 min read
20

Chaos Engineering: Breaking the Ride-Hailing Platform on Purpose

6 min read
1. Chaos Toolkit and Steady State Hypotheses
  8 min read
2. Breaking the Platform: Four Experiments with Results
  14 min read
21

Multi-Region: The Complexity Tax and the Conditions That Justify It

7 min read
1. Data Replication and Consistency Across Regions
  7 min read
2. The Complexity Tax: When Multi-Region Is Not Worth It
  7 min read
22

When to Rewrite vs When to Scale: The Honest Conversation

7 min read
1. The Scaling Ceiling Diagnostic
  7 min read
2. The Honest Decision Framework
  10 min read