Caching Layer One: HTTP Cache Controls and CDN Behavior
Your servers are burning CPU to render the same fare estimate 14,000 times per minute. The fare for the airport-to-downtown corridor during non-surge hours changes once every 60 seconds. Every one of those 14,000 requests hits the fare calculation service, runs the distance matrix lookup, applies the pricing model, and returns the same JSON payload. This is not a scaling problem. This is a caching problem, and the HTTP specification solved it in 1999.
HTTP caching is the first layer of defense between your users and your origin servers. Before Redis, before application-level memoization, before any code change at all, the correct Cache-Control headers can eliminate 80% or more of your origin traffic. The CDN does the work. Your origin does not even see the requests.
This chapter covers the full HTTP caching stack: the headers that control it, the conditional requests that validate it, and the CDN behavior that makes or breaks it. Every example targets the ride-hailing platform. Every metric comes from a Locust test.
This diagram illustrates the cache-aside pattern used throughout the platform. A request first checks the L1 in-process Caffeine cache (sub-millisecond). On a miss, it falls through to L2 Redis (shared across pods, ~5ms). Only on a double miss does the request reach the L3 PostgreSQL database (~50ms). When a lower layer returns data, it populates all the layers above it on the way back. With typical hit rates of 60-80% at L1, the average response latency drops to ~4ms, and the database handles only 5-10% of total read traffic.
The Symptom
The fare estimation endpoint handles 14,000 requests per minute during peak. CPU on the fare service pods sits at 78%. Response times at p95 are 340ms. The team is planning to scale horizontally, adding four more pods. The monthly infrastructure bill is about to increase by $4,200.
The fare for a given origin-destination pair during non-surge periods changes at most once per minute. During surge, the surge multiplier updates every 10 seconds. Even in the worst case, the same fare is recomputed hundreds of times within its validity window.
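A rough way to size the waste is to divide the request rate across the routes that receive traffic and scale by the validity window. The sketch below uses the 14,000 requests per minute quoted above; the figure of 20 equally busy route pairs is a made-up assumption for illustration, not a measurement from the platform.

```python
def recomputations_per_window(requests_per_minute: float,
                              window_seconds: float,
                              distinct_routes: int) -> float:
    """Average number of times one route's fare is computed inside one validity window."""
    requests_per_second = requests_per_minute / 60.0
    return requests_per_second * window_seconds / distinct_routes


PEAK_RPM = 14_000          # from the symptom description
ASSUMED_HOT_ROUTES = 20    # hypothetical: route pairs sharing the traffic evenly

non_surge = recomputations_per_window(PEAK_RPM, 60, ASSUMED_HOT_ROUTES)
surge = recomputations_per_window(PEAK_RPM, 10, ASSUMED_HOT_ROUTES)

print(f"non-surge: ~{non_surge:.0f} identical computations per 60 s window")
print(f"surge:     ~{surge:.0f} identical computations per 10 s window")
```

Every computation after the first one in a window produces a payload the platform already had. Fewer hot routes make the waste per route larger, more routes make it smaller.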
The Cause
Every request bypasses all caching layers and hits the origin. The API returns responses with no Cache-Control headers. The CDN (Cloudflare, in this case) treats every response as uncacheable and forwards every request to the origin. The browser makes a fresh request every time the user opens the fare screen.
// BOTTLENECK: No cache headers, every request hits origin
@GetMapping("/api/v1/fare/estimate")
public Mono<FareEstimate> estimateFare(
        @RequestParam String originZone,
        @RequestParam String destZone) {
    return fareService.calculate(originZone, destZone);
}
The response headers look like this:
HTTP/1.1 200 OK
Content-Type: application/json
// No Cache-Control
// No ETag
// No Last-Modified
The CDN sees no caching instructions. It forwards everything. The origin processes everything.
Cache-Control Directives
Cache-Control is the primary mechanism for controlling HTTP caching. Each directive serves a specific purpose. Misunderstanding any one of them leads to either over-caching (serving stale data) or under-caching (wasting origin resources).
The Directives That Matter
| Directive | Where it applies | What it does |
|---|---|---|
| max-age=N | Browser + CDN | Response is fresh for N seconds from the time the origin generated it (time already spent in upstream caches counts against N) |
| s-maxage=N | CDN only | Overrides max-age for shared caches (CDNs, proxies). Browser ignores this |
| no-cache | Browser + CDN | Must revalidate with the origin before using cached copy. Does NOT mean “don’t cache” |
| no-store | Browser + CDN | Do not cache at all. Not in browser, not in CDN, not on disk |
| private | Browser only | Only the browser may cache. CDN must not cache |
| public | Browser + CDN | Explicitly cacheable by any cache, including CDNs |
| immutable | Browser | Content will not change while it is fresh. Browser should not revalidate even on reload |
| stale-while-revalidate=N | CDN (varies) | After the response goes stale, serve it for up to N more seconds while revalidating in the background |
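To make the interaction between these directives concrete, here is a minimal sketch of how a cache might interpret a Cache-Control value. It is a simplification written for this chapter, not the algorithm any particular browser or CDN runs: it ignores heuristic freshness, the Expires header, and directive arguments such as no-cache="field". The function names are hypothetical.

```python
def parse_cache_control(header: str) -> dict:
    """Parse a Cache-Control value into {directive: int seconds, string, or True}."""
    directives = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        value = value.strip().strip('"')
        directives[name.strip().lower()] = int(value) if value.isdigit() else (value or True)
    return directives


def freshness_lifetime(directives: dict, shared: bool):
    """Seconds the response may be reused without contacting the origin.

    Returns None when this kind of cache must not store the response at all.
    """
    if "no-store" in directives:
        return None
    if shared and "private" in directives:
        return None
    if "no-cache" in directives:
        return 0  # storable, but every reuse needs revalidation
    if shared and "s-maxage" in directives:
        return directives["s-maxage"]
    return directives.get("max-age", 0)


def classify(age: int, directives: dict, shared: bool) -> str:
    """Decide how a cache treats a stored response that is `age` seconds old."""
    lifetime = freshness_lifetime(directives, shared)
    if lifetime is None:
        return "not-stored"
    if age < lifetime:
        return "fresh"
    swr = directives.get("stale-while-revalidate", 0)
    if "no-cache" not in directives and age < lifetime + swr:
        return "serve-stale-and-revalidate"
    return "revalidate-first"


fare = parse_cache_control("s-maxage=60, max-age=30, stale-while-revalidate=30")
print(classify(45, fare, shared=True))    # CDN view of a 45-second-old response
print(classify(45, fare, shared=False))   # browser view of the same response
```

The same response is treated differently by the two kinds of cache: the shared cache applies s-maxage, the browser falls back to max-age.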
Ride-Hailing Endpoint Classification
Not every endpoint is cacheable. Real-time state must never be cached. Semi-static data can be cached aggressively.
| Endpoint | Cacheability | Headers | Reason |
|---|---|---|---|
| Fare estimate (non-surge) | CDN + Browser | s-maxage=60, max-age=30, stale-while-revalidate=30 | Changes at most once per minute |
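| Fare estimate (surge active) | CDN only, short | s-maxage=10, max-age=0 | Surge multiplier changes every 10s. max-age=0 keeps browsers revalidating; no-cache would force the CDN to revalidate every request as well |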
| Driver availability zones | CDN only, short | s-maxage=10, stale-while-revalidate=5 | Zone aggregates update every 10s |
| Trip history | Browser only | private, max-age=300 | User-specific, sensitive |
| Real-time driver location | Not cacheable | no-store | Changes every second |
| Active trip status | Not cacheable | no-store | Real-time state |
| Surge pricing map tile | CDN + Browser | s-maxage=15, max-age=10 | Tile-based, updates on surge recalculation |
ETag and Last-Modified: Conditional Requests
Cache-Control tells caches how long content is fresh. ETags and Last-Modified tell caches how to check if content has changed once it goes stale.
The Conditional Request Flow
- Origin sends response with ETag: "a1b2c3" and Cache-Control: s-maxage=60
- CDN caches the response
- After 60 seconds, CDN receives a new request for the same resource
- CDN sends the request to the origin with If-None-Match: "a1b2c3"
- Origin checks: has the fare changed? If not, it returns 304 Not Modified with no body
- CDN serves the cached copy for another s-maxage period
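The 304 response is tiny: no body, no JSON serialization, and almost no bandwidth. Whether it also skips the fare calculation depends on how the ETag is produced. An ETag derived from the computed fare, as in the ETag Generation code below, still runs the calculation on every revalidation. Only a validator that is cheaper to obtain than the fare itself (a pricing version or a last-update timestamp, for instance) lets the origin answer without doing that work. Either way, the origin confirms “nothing changed” and the CDN does the rest.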
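The flow above can be sketched as a toy simulation with one origin and one shared cache. The class names, the example payload, and the integer clock are all illustrative assumptions; a real CDN tracks far more state than this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    status: int
    etag: str
    body: Optional[str] = None


class Origin:
    """Toy origin that answers conditional requests for a single fare resource."""

    def __init__(self, fare_json: str, etag: str):
        self.fare_json = fare_json
        self.etag = etag
        self.full_responses = 0
        self.not_modified = 0

    def get(self, if_none_match: Optional[str] = None) -> Response:
        if if_none_match == self.etag:
            self.not_modified += 1
            return Response(304, self.etag)            # validator matches: no body
        self.full_responses += 1
        return Response(200, self.etag, self.fare_json)


class EdgeCache:
    """Toy shared cache holding one entry that stays fresh for s_maxage seconds."""

    def __init__(self, origin: Origin, s_maxage: int):
        self.origin = origin
        self.s_maxage = s_maxage
        self.body: Optional[str] = None
        self.etag: Optional[str] = None
        self.stored_at = 0

    def get(self, now: int) -> str:
        if self.body is not None and now - self.stored_at < self.s_maxage:
            return self.body                            # fresh hit, origin not contacted
        response = self.origin.get(if_none_match=self.etag)
        if response.status == 200:
            self.body, self.etag = response.body, response.etag
        self.stored_at = now                            # 200 and 304 both restart freshness
        return self.body


origin = Origin('{"totalCents": 3450}', '"a1b2c3"')
edge = EdgeCache(origin, s_maxage=60)
edge.get(now=0)     # cold cache: full 200 from the origin
edge.get(now=30)    # still fresh: served by the edge
edge.get(now=61)    # stale: revalidated, origin answers 304
print(origin.full_responses, origin.not_modified)
```

Three client requests cost the origin one full response and one bodiless revalidation.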
ETag Generation
// SCALED: ETag from content hash
@GetMapping("/api/v1/fare/estimate")
public Mono<ResponseEntity<FareEstimate>> estimateFare(
        @RequestParam String originZone,
        @RequestParam String destZone,
        ServerHttpRequest request) {
    return fareService.calculate(originZone, destZone)
        .map(estimate -> {
            String etag = generateETag(estimate);
            // Check if client already has current version
            if (request.getHeaders().getIfNoneMatch().contains(etag)) {
                // Explicit type argument so both branches return ResponseEntity<FareEstimate>
                return ResponseEntity.status(HttpStatus.NOT_MODIFIED)
                    .eTag(etag)
                    .<FareEstimate>build();
            }
            return ResponseEntity.ok()
                .eTag(etag)
                .cacheControl(CacheControl.maxAge(Duration.ofSeconds(30))
                    .sMaxAge(Duration.ofSeconds(60))
                    .staleWhileRevalidate(Duration.ofSeconds(30)))
                .body(estimate);
        });
}

// Builds a quoted ETag from the fields that determine the fare
private String generateETag(FareEstimate estimate) {
    String content = estimate.originZone() + ":"
        + estimate.destZone() + ":"
        + estimate.totalCents() + ":"
        + estimate.surgeMultiplier();
    return "\"" + Integer.toHexString(content.hashCode()) + "\"";
}
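The hashCode-based ETag above is only 32 bits wide, so two different fares can in principle collide and produce a false 304. A wider digest makes that risk negligible. The sketch below is a swapped-in alternative using SHA-256, not the platform's implementation; the function names are hypothetical. It also shows weak comparison for If-None-Match, which covers validators carrying a W/ prefix. Splitting the header on commas is a simplification that assumes the ETags themselves contain no commas.

```python
import hashlib


def generate_etag(origin_zone: str, dest_zone: str,
                  total_cents: int, surge_multiplier: float) -> str:
    """Quoted ETag: first 16 hex characters of SHA-256 over the fare-defining fields."""
    content = f"{origin_zone}:{dest_zone}:{total_cents}:{surge_multiplier}"
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    return f'"{digest[:16]}"'


def _strip_weak(tag: str) -> str:
    """Drop the W/ prefix so weak and strong forms of a validator compare equal."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def if_none_match_matches(header_value: str, current_etag: str) -> bool:
    """Weak comparison of an If-None-Match header value against the current ETag."""
    if header_value.strip() == "*":
        return True
    candidates = {_strip_weak(tag) for tag in header_value.split(",")}
    return _strip_weak(current_etag) in candidates


etag = generate_etag("airport", "downtown", 3450, 1.0)
print(etag)
print(if_none_match_matches(f'W/{etag}, "some-older-tag"', etag))
```

The ETag changes whenever any of the four inputs changes and stays identical across pods, because it depends only on the content and not on process state.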
The Vary Header
The Vary header tells caches: “This response depends on these request headers. Different values of these headers produce different responses. Cache them separately.”
Vary: Accept-Encoding is safe. There are only two or three common encodings (gzip, br, identity). The CDN stores two or three variants. Hit rates remain high.
Vary: Cookie is a disaster. Every user has a different cookie. The CDN creates a separate cache entry for every unique cookie value. Cache hit rate drops to near zero. The CDN becomes a very expensive proxy.
Vary: Authorization is the same disaster. Every user has a unique token. If your response varies by user, use Cache-Control: private instead and let only the browser cache.
// BOTTLENECK: Vary: Authorization on a public endpoint
@GetMapping("/api/v1/zones/availability")
public Mono<ResponseEntity<ZoneAvailability>> getAvailability(
        @RequestParam String zoneId) {
    return zoneService.getAvailability(zoneId)
        .map(avail -> ResponseEntity.ok()
            .varyBy("Authorization") // Every user gets a separate cache entry
            .cacheControl(CacheControl.maxAge(Duration.ofSeconds(10)))
            .body(avail));
}
Zone availability is the same for every user. The Vary: Authorization header turns a single cacheable response into millions of cache entries, one per user token. The fix: remove Vary: Authorization entirely.
// SCALED: No Vary on public data, correct s-maxage
@GetMapping("/api/v1/zones/availability")
public Mono<ResponseEntity<ZoneAvailability>> getAvailability(
        @RequestParam String zoneId) {
    return zoneService.getAvailability(zoneId)
        .map(avail -> ResponseEntity.ok()
            .cacheControl(CacheControl.maxAge(Duration.ofSeconds(5))
                .sMaxAge(Duration.ofSeconds(10))
                .staleWhileRevalidate(Duration.ofSeconds(5)))
            .body(avail));
}
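The effect of Vary on cache cardinality can be sketched by modeling the cache key as the URL plus the value of every request header named in Vary. The traffic below (1,000 users, unique tokens, two encodings) is invented for illustration, and real CDNs apply extra normalization that this model leaves out.

```python
def cache_key(url: str, request_headers: dict, vary: list) -> tuple:
    """Cache key: the URL plus the value of each request header listed in Vary."""
    lowered = {name.lower(): value for name, value in request_headers.items()}
    varied = tuple((name.lower(), lowered.get(name.lower(), ""))
                   for name in sorted(vary))
    return (url, varied)


def count_entries(requests: list, vary: list) -> int:
    """Number of distinct entries a shared cache would store for these requests."""
    return len({cache_key(url, headers, vary) for url, headers in requests})


URL = "/api/v1/zones/availability?zoneId=airport"

# Hypothetical traffic: 1,000 users, each with a unique token, split across two encodings
requests = [
    (URL, {"Authorization": f"Bearer token-{i}",
           "Accept-Encoding": "gzip" if i % 2 else "br"})
    for i in range(1000)
]

print("no Vary:              ", count_entries(requests, vary=[]))
print("Vary: Accept-Encoding:", count_entries(requests, vary=["Accept-Encoding"]))
print("Vary: Authorization:  ", count_entries(requests, vary=["Authorization"]))
```

With Vary: Authorization the number of entries equals the number of users, so no request can ever be served from an entry created by someone else.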
Spring WebFlux CacheControl Builder
Spring provides CacheControl as a fluent builder. Use it. Do not construct Cache-Control header strings manually.
// Common patterns for the ride-hailing platform

// Fare estimates: CDN caches 60s, browser 30s, serve stale for 30s during revalidation
CacheControl fareEstimate = CacheControl
    .maxAge(Duration.ofSeconds(30))
    .sMaxAge(Duration.ofSeconds(60))
    .staleWhileRevalidate(Duration.ofSeconds(30));

// Trip history: browser only, 5 minutes
CacheControl tripHistory = CacheControl
    .maxAge(Duration.ofSeconds(300))
    .cachePrivate();

// Real-time driver location: never cache
CacheControl driverLocation = CacheControl.noStore();

// Static zone boundaries: long cache, immutable content hash in URL
CacheControl zoneBoundaries = CacheControl
    .maxAge(Duration.ofDays(365))
    .cachePublic()
    .immutable();
The Baseline: Locust Test
The Locust test simulates peak traffic: riders requesting fare estimates, checking driver availability, and viewing trip history. First, against the origin with no caching. Then, with CDN caching enabled via correct headers.
import time
import uuid

from locust import HttpUser, task, between


class RideHailingUser(HttpUser):
    # Each simulated rider pauses 0.5 to 2 seconds between requests
    wait_time = between(0.5, 2)

    zones = [
        ("airport", "downtown"),
        ("downtown", "suburbs"),
        ("airport", "midtown"),
        ("suburbs", "airport"),
        ("midtown", "downtown"),
    ]

    def on_start(self):
        # One token per simulated user. The runner's user_count is the same for
        # every user, so it cannot serve as a per-user identifier.
        self.token = f"user-token-{uuid.uuid4().hex}"

    @task(5)
    def fare_estimate(self):
        # All users rotate through the route pairs together, one pair per second
        origin, dest = self.zones[int(time.time()) % len(self.zones)]
        self.client.get(
            f"/api/v1/fare/estimate?originZone={origin}&destZone={dest}",
            name="/api/v1/fare/estimate"
        )

    @task(3)
    def zone_availability(self):
        zone = self.zones[int(time.time()) % len(self.zones)][0]
        self.client.get(
            f"/api/v1/zones/availability?zoneId={zone}",
            name="/api/v1/zones/availability"
        )

    @task(1)
    def trip_history(self):
        self.client.get(
            "/api/v1/trips/history?limit=10",
            name="/api/v1/trips/history",
            headers={"Authorization": f"Bearer {self.token}"}
        )
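To report a CDN hit rate alongside the latency numbers, the test needs to know which responses came from the edge. Cloudflare exposes the cache outcome in the CF-Cache-Status response header. The helper below is a hypothetical sketch that tallies those values; counting only HIT as cache-served is a conservative assumption, and which other statuses to include is a judgment call.

```python
from collections import Counter


def cdn_hit_rate(cache_statuses, served_from_cache=("HIT",)) -> float:
    """Fraction of responses whose CF-Cache-Status is in `served_from_cache`.

    Missing headers (None or empty strings) are ignored rather than counted as misses.
    """
    counts = Counter(status.upper() for status in cache_statuses if status)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    hits = sum(counts[status] for status in served_from_cache)
    return hits / total


# Hypothetical header values collected during a short run
observed = ["HIT", "HIT", "MISS", "HIT", "DYNAMIC", None]
print(f"CDN hit rate: {cdn_hit_rate(observed):.0%}")
```

In a Locust task, the value could be collected from response.headers.get("CF-Cache-Status") on the object returned by self.client.get and appended to a per-endpoint list.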
The Fix
Apply correct Cache-Control headers to every endpoint. The CDN absorbs the cacheable traffic. The origin only handles cache misses and personalized requests.
// SCALED: Complete fare controller with caching
@RestController
@RequestMapping("/api/v1/fare")
public class FareController {

    private final FareService fareService;
    private final SurgeService surgeService;

    // Constructor injection; the final fields must be initialized for the class to compile
    public FareController(FareService fareService, SurgeService surgeService) {
        this.fareService = fareService;
        this.surgeService = surgeService;
    }

    @GetMapping("/estimate")
    public Mono<ResponseEntity<FareEstimate>> estimateFare(
            @RequestParam String originZone,
            @RequestParam String destZone,
            ServerHttpRequest request) {
        return surgeService.isActive(originZone)
            .flatMap(surgeActive -> {
                // Surge: the CDN holds the response for 10s and browsers always revalidate.
                // max-age=0 is used rather than no-cache, because no-cache also applies to
                // the CDN and would force it to revalidate on every request.
                CacheControl cacheControl = surgeActive
                    ? CacheControl.maxAge(Duration.ZERO)
                        .sMaxAge(Duration.ofSeconds(10))
                    : CacheControl.maxAge(Duration.ofSeconds(30))
                        .sMaxAge(Duration.ofSeconds(60))
                        .staleWhileRevalidate(Duration.ofSeconds(30));
                return fareService.calculate(originZone, destZone)
                    .map(estimate -> {
                        // generateETag is the helper from the ETag Generation section
                        String etag = generateETag(estimate);
                        if (request.getHeaders()
                                .getIfNoneMatch().contains(etag)) {
                            return ResponseEntity
                                .status(HttpStatus.NOT_MODIFIED)
                                .eTag(etag)
                                .cacheControl(cacheControl)
                                .<FareEstimate>build();
                        }
                        return ResponseEntity.ok()
                            .eTag(etag)
                            .cacheControl(cacheControl)
                            .body(estimate);
                    });
            });
    }
}
The Proof
Locust results with 500 concurrent users over 5 minutes. Origin hit directly (no CDN) vs. origin behind Cloudflare with correct Cache-Control headers.
Before: No Cache Headers (Origin Direct)
| Metric | Fare Estimate | Zone Availability | Trip History |
|---|---|---|---|
| Requests/sec | 2,340 | 1,404 | 468 |
| p50 latency | 42ms | 38ms | 65ms |
| p95 latency | 340ms | 290ms | 410ms |
| p99 latency | 1,200ms | 980ms | 1,450ms |
| Origin CPU | 78% | | |
| Origin requests/sec | 4,212 (total) | | |
After: Correct Cache-Control Headers + CDN
| Metric | Fare Estimate | Zone Availability | Trip History |
|---|---|---|---|
| Requests/sec | 2,340 | 1,404 | 468 |
| p50 latency | 8ms | 6ms | 62ms |
| p95 latency | 22ms | 18ms | 390ms |
| p99 latency | 85ms | 64ms | 1,380ms |
| CDN cache hit rate | 87% | 82% | 0% (private) |
| Origin CPU | 18% | | |
| Origin requests/sec | 648 (total) | | |
The CDN absorbed 85% of the total traffic. Origin CPU dropped from 78% to 18%. The p95 for fare estimates dropped from 340ms to 22ms because CDN edge nodes are 5ms from the client, not 45ms.
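The offload figure follows directly from the two tables: total client traffic is the sum of the per-endpoint request rates, and the origin request rate after the change shows how much of it still gets through. A short sketch of that arithmetic, using only numbers reported above (the function name is hypothetical):

```python
def origin_offload(total_rps: float, origin_rps: float) -> float:
    """Share of client requests that never reached the origin."""
    return (total_rps - origin_rps) / total_rps


total_rps = 2_340 + 1_404 + 468   # fare estimate + zone availability + trip history
origin_rps_after = 648            # origin requests/sec with CDN caching enabled

offload = origin_offload(total_rps, origin_rps_after)
p95_speedup = 340 / 22            # fare estimate p95 latency, before vs. after

print(f"total client requests/sec: {total_rps}")
print(f"origin offload:            {offload:.1%}")
print(f"fare estimate p95 speedup: {p95_speedup:.1f}x")
```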
Trip history shows no improvement because it is marked private. The CDN correctly does not cache it. This is the expected behavior.
The four pods the team planned to add are no longer needed. The infrastructure cost increase of $4,200/month is replaced by correct HTTP headers that cost nothing.
What This Chapter Does Not Cover
Application-level caching with Redis is covered in CH6. Frontend static asset caching is covered in CH11. This chapter addresses only the HTTP caching layer between clients, CDNs, and the origin.
The next two sections dig into the specifics: CH5-S1 covers the header mechanics and Spring WebFlux implementation in detail. CH5-S2 covers CDN behavior, the Vary header trap, and how to measure CDN effectiveness.