HTTP/2 Multiplexing and the End of Connection Limits
The main chapter showed HTTP/2 eliminating head-of-line blocking at the application layer. This section examines the mechanics: how frames interleave, how HPACK compresses repetitive headers, how flow control prevents fast senders from overwhelming slow receivers, and why server push was a good idea that failed in practice.
Frame Interleaving on the Wire
In HTTP/1.1, a response must complete before the next request can use the connection. In HTTP/2, the server sends DATA frames from multiple streams in any order. The client reassembles each stream independently:
// Wire capture: HTTP/2 serving article list + recommendation API concurrently
//
// Frame 1: HEADERS stream=1 :status=200, content-type=application/json
// Frame 2: HEADERS stream=3 :status=200, content-type=application/json
// Frame 3: DATA stream=1 [first 16KB of article list response]
// Frame 4: DATA stream=3 [first 16KB of recommendations response]
// Frame 5: DATA stream=1 [next 16KB of article list] END_STREAM
// Frame 6: DATA stream=3 [remaining recommendations] END_STREAM
//
// Both responses progress concurrently on one connection; neither waits for the other to finish
// Client reassembles stream 1 frames into article list
// Client reassembles stream 3 frames into recommendations
// HTTP/1.1 equivalent requires 2 connections or serial responses:
// Conn 1: [---article list response (full)---][---next request---]
// Conn 2: [---recommendations response (full)---]
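The interleaving above can be modeled in a few lines. This is a minimal Python sketch, not a real HTTP/2 codec. Frames are plain `(stream_id, payload, end_stream)` tuples, the strict round-robin order is an assumption, and the 20KB recommendations size is a hypothetical value chosen for illustration.

```python
import itertools

MAX_FRAME_SIZE = 16_384  # default SETTINGS_MAX_FRAME_SIZE


def split_into_data_frames(stream_id, body, max_frame_size=MAX_FRAME_SIZE):
    """Split one response body into (stream_id, payload, end_stream) DATA frames."""
    chunks = [body[i:i + max_frame_size] for i in range(0, len(body), max_frame_size)]
    chunks = chunks or [b""]  # an empty body still needs one frame carrying END_STREAM
    last = len(chunks) - 1
    return [(stream_id, chunk, i == last) for i, chunk in enumerate(chunks)]


def interleave(*streams):
    """Server side: take one frame from each stream in turn (simple round-robin)."""
    wire = []
    for round_frames in itertools.zip_longest(*streams):
        wire.extend(frame for frame in round_frames if frame is not None)
    return wire


def reassemble(wire):
    """Client side: rebuild every stream independently from the interleaved frames."""
    buffers, finished = {}, set()
    for stream_id, payload, end_stream in wire:
        buffers.setdefault(stream_id, bytearray()).extend(payload)
        if end_stream:
            finished.add(stream_id)
    return {sid: bytes(buf) for sid, buf in buffers.items()}, finished


article_list = b"a" * (37 * 1024)     # 37KB minified article list -> 3 frames
recommendations = b"r" * (20 * 1024)  # hypothetical 20KB response -> 2 frames

wire = interleave(
    split_into_data_frames(1, article_list),
    split_into_data_frames(3, recommendations),
)
bodies, finished = reassemble(wire)
print([(sid, len(payload), end) for sid, payload, end in wire])
# [(1, 16384, False), (3, 16384, False), (1, 16384, False), (3, 4096, True), (1, 5120, True)]
```

The client never needs frames in order across streams. It only needs each stream's own frames in order, which a single TCP connection guarantees.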
The maximum frame size (SETTINGS_MAX_FRAME_SIZE) defaults to 16,384 bytes (16KB). For the content platform’s 37KB minified article list response, this means 3 DATA frames per response (37KB / 16KB ≈ 2.3, rounded up). With 50 concurrent article list requests, 150 DATA frames interleave with frames from other streams:
// Server-side frame scheduling (simplified Netty H2 behavior):
// 1. Check all streams with pending data
// 2. Apply priority/weight to determine send order
// 3. Send up to SETTINGS_MAX_FRAME_SIZE bytes per stream per round
// 4. Move to next stream
// 5. Repeat until all streams drained or connection window exhausted
// Netty's default scheduling: weighted fair queuing
// Stream with weight 256 gets 2x bandwidth of stream with weight 128
// Content platform: article-list API at weight 220, analytics at weight 32
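The weighting rule in step 2 can be illustrated with deficit round robin, a simple relative of weighted fair queuing. This Python sketch is not Netty's actual WeightedFairQueueByteDistributor, and the 1,024-bytes-per-weight quantum is an assumption. It shows that under contention each stream's byte share tracks its weight.

```python
MAX_FRAME_SIZE = 16_384


def drr_schedule(streams, quantum_per_weight=1024, max_frame_size=MAX_FRAME_SIZE):
    """Deficit round robin over HTTP/2 streams.

    streams: {stream_id: (weight, pending_bytes)}, weight in 1..256.
    Returns the frame writes in send order as (stream_id, payload_bytes).
    """
    weight = {sid: w for sid, (w, _) in streams.items()}
    pending = {sid: size for sid, (_, size) in streams.items()}
    deficit = dict.fromkeys(streams, 0)
    writes = []
    while any(pending.values()):
        for sid in streams:
            if pending[sid] == 0:
                continue
            deficit[sid] += quantum_per_weight * weight[sid]  # credit for this round
            while pending[sid] > 0 and deficit[sid] > 0:
                size = min(pending[sid], max_frame_size, deficit[sid])
                writes.append((sid, size))
                pending[sid] -= size
                deficit[sid] -= size
            if pending[sid] == 0:
                deficit[sid] = 0  # drained streams do not bank credit
    return writes


def bytes_sent(writes, budget):
    """Bytes each stream got within the first `budget` bytes put on the wire."""
    sent, total = {}, 0
    for sid, size in writes:
        if total >= budget:
            break
        sent[sid] = sent.get(sid, 0) + size
        total += size
    return sent


# Article-list API (weight 220) vs analytics (weight 32), both with 2MB queued.
writes = drr_schedule({1: (220, 2_000_000), 3: (32, 2_000_000)})
first_round = bytes_sent(writes, (220 + 32) * 1024)
print(first_round)  # {1: 225280, 3: 32768}, a 6.875:1 split
```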
HPACK Header Compression
HTTP/1.1 headers are plain text, repeated on every request. A typical content platform API request sends:
GET /api/articles?page_size=50 HTTP/1.1
Host: api.contentplatform.com
Accept: application/json
Accept-Encoding: gzip, deflate, br
Authorization: Bearer eyJhbGciOiJSUzI1NiIs... (800+ bytes)
Cookie: session=abc123; preferences=dark-mode; consent=granted
User-Agent: Mozilla/5.0 (Linux; Android 14) AppleWebKit/537.36...
X-Request-ID: 7f8a9b2c-3d4e-5f6a-7b8c-9d0e1f2a3b4c
X-Correlation-ID: trace-2024-abc-def
Total: approximately 1,200 bytes per request. Across 15 resources on a page load, that is 18KB of headers alone, and most of it is identical between requests.
HPACK compresses headers with two indexing tables (literal strings can additionally be Huffman-coded):
// HPACK Static Table (61 pre-defined headers)
// Index 1: :authority
// Index 2: :method GET
// Index 3: :method POST
// Index 8: :status 200
// Index 16: accept-encoding: gzip, deflate
// Index 31: content-type
// ... 61 entries total
// HPACK Dynamic Table (connection-scoped, shared between requests)
// Entries added as headers are transmitted
// Default max size: 4096 bytes (SETTINGS_HEADER_TABLE_SIZE)
// First request: full header values transmitted, added to dynamic table
// Second request: reference by index (1 byte instead of 800+ bytes for Auth)
// Compression ratio for content platform headers:
// Request 1: 1,200 bytes -> 1,180 bytes (static table refs only)
// Request 2: 1,200 bytes -> 68 bytes (dynamic table refs for repeated headers; the per-request X-Request-ID is still sent as a literal)
// Request 3+: 1,200 bytes -> 42 bytes (stable dynamic table)
//
// 96.5% header compression after warmup
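The indexing behavior described above can be sketched with a toy encoder. This Python model is deliberately simplified and is not a conforming HPACK implementation. It uses only three static-table entries (indices 2, 3 and 8 from RFC 7541), emits symbolic representations instead of bytes, and skips Huffman coding and name-only indexing. The header values are hypothetical. It does follow RFC 7541's entry sizing (name + value + 32 bytes) and oldest-first eviction.

```python
STATIC_FULL = {(":method", "GET"): 2, (":method", "POST"): 3, (":status", "200"): 8}
ENTRY_OVERHEAD = 32  # RFC 7541: entry size = len(name) + len(value) + 32


class DynamicTable:
    """HPACK dynamic table: newest entry at index 62, oldest entry evicted first."""

    def __init__(self, max_size=4096):
        self.max_size = max_size
        self.entries = []  # newest first
        self.size = 0

    def add(self, name, value):
        entry_size = len(name) + len(value) + ENTRY_OVERHEAD
        while self.entries and self.size + entry_size > self.max_size:
            old_name, old_value = self.entries.pop()  # evict the oldest entry
            self.size -= len(old_name) + len(old_value) + ENTRY_OVERHEAD
        if entry_size <= self.max_size:  # an oversized entry just empties the table
            self.entries.insert(0, (name, value))
            self.size += entry_size

    def index_of(self, name, value):
        if (name, value) in self.entries:
            return 62 + self.entries.index((name, value))
        return None


def encode(headers, table):
    """Pick a representation per header: an index, or a literal that is then indexed."""
    out = []
    for name, value in headers:
        index = STATIC_FULL.get((name, value)) or table.index_of(name, value)
        if index is not None:
            out.append(("indexed", index))  # one byte on the wire while index < 127
        else:
            out.append(("literal", name))   # name and value sent in full
            table.add(name, value)
    return out


TOKEN = "Bearer " + "x" * 800  # stand-in for the 800+ byte JWT


def request(request_id):
    return [
        (":method", "GET"),
        (":path", "/api/articles?page_size=50"),
        ("authorization", TOKEN),
        ("x-request-id", request_id),
    ]


table = DynamicTable(4096)
print(encode(request("req-1"), table))
# [('indexed', 2), ('literal', ':path'), ('literal', 'authorization'), ('literal', 'x-request-id')]
print(encode(request("req-2"), table))
# [('indexed', 2), ('indexed', 64), ('indexed', 63), ('literal', 'x-request-id')]
```

The second request shrinks to three one-byte references plus the unique X-Request-ID. A header whose value changes on every request can never be served from the table.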
The dynamic table has critical performance implications. If the table is too small, entries get evicted, forcing retransmission. For the content platform’s 800-byte Authorization header:
// SETTINGS_HEADER_TABLE_SIZE tuning:
//
// Default (4096 bytes): Fits ~4 entries of ~800 bytes before eviction (each entry costs name + value + 32 bytes)
// Problem: With 15 unique header combinations, eviction thrashing occurs
//
// Recommended (8192 bytes): Fits all common header combinations
// Result: Stable compression after first page load
// Spring Boot only exposes server.http2.enabled=true; there is no header-table-size
// property, so set the value through a Reactor Netty server customizer:
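import org.springframework.boot.web.embedded.netty.NettyReactiveWebServerFactory;
import org.springframework.boot.web.server.WebServerFactoryCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class HpackConfig {

    @Bean
    public WebServerFactoryCustomizer<NettyReactiveWebServerFactory> hpackCustomizer() {
        // SETTINGS_HEADER_TABLE_SIZE advertised by the server bounds the HPACK table
        // the client may use when encoding request headers (e.g., Authorization)
        return factory -> factory.addServerCustomizers(httpServer ->
                httpServer.http2Settings(settings ->
                        settings.headerTableSize(8192)
                                .maxHeaderListSize(16384)));
    }
}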
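The eviction-thrashing argument in the tuning notes can be checked with a small FIFO model of the dynamic table. This is a hedged Python sketch. The five distinct 800-byte values are hypothetical stand-ins for the platform's large headers. With the 13-byte name `authorization`, each entry costs 845 bytes under RFC 7541's name + value + 32 rule. The model assumes no other headers compete for space.

```python
from collections import deque

ENTRY_OVERHEAD = 32  # RFC 7541 per-entry overhead


def count_misses(table_size, headers, rounds):
    """Replay `headers` `rounds` times through a FIFO table of `table_size` bytes.

    A miss means the header is sent as a literal instead of a 1-byte index.
    """
    table, used, misses = deque(), 0, 0
    for _ in range(rounds):
        for name, value in headers:
            if (name, value) in table:
                continue
            misses += 1
            size = len(name) + len(value) + ENTRY_OVERHEAD
            while table and used + size > table_size:
                old_name, old_value = table.popleft()  # oldest entry goes first
                used -= len(old_name) + len(old_value) + ENTRY_OVERHEAD
            if size <= table_size:
                table.append((name, value))
                used += size
    return misses


# Five distinct 800-byte header values (hypothetical tokens): 845 bytes per entry.
large_headers = [("authorization", str(k).ljust(800, "x")) for k in range(5)]
for size in (4096, 8192):
    print(size, size // 845, count_misses(size, large_headers, rounds=3))
# 4096 4 15  -> every lookup misses: FIFO eviction thrashes
# 8192 9 5   -> only the first pass misses
```

The cliff is the point. One entry more than the table holds turns a cyclic access pattern into 100% misses, because FIFO always evicts the entry needed next.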
Flow Control
HTTP/2 implements flow control at two levels: connection-level and stream-level. Each level maintains a window, advertised by the receiver, that limits how many bytes of DATA frame payload the sender may transmit before it receives a WINDOW_UPDATE frame (other frame types are not flow-controlled):
// Flow control windows:
//
// Connection window: Total bytes allowed across ALL streams
// Stream window: Bytes allowed for THIS stream
//
// Effective allowed = min(connection_window, stream_window)
//
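// Default: 65,535 bytes (~64KB) for both
// SETTINGS_INITIAL_WINDOW_SIZE changes only stream windows; the connection
// window can be enlarged only with WINDOW_UPDATE frames on stream 0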
//
// Problem for content platform:
// 50 concurrent streams * 37KB average response = 1,850KB needed
// Connection window of 64KB means constant WINDOW_UPDATE ping-pong
// Each WINDOW_UPDATE is only 13 bytes (9-byte frame header + 4-byte increment), but the sender can stall for up to a round trip waiting for it
// Window too small: throughput limited by WINDOW_UPDATE latency
// Window too large: the receiver commits to buffering that much data (a risk for memory-limited mobile clients)
// Optimal connection window for content platform:
// Target: sustain 50 concurrent responses without stalling
// Calculation: 50 streams * 16KB frame * 2 frames in flight = 1,600KB
// Setting: 2MB connection window (2,097,152 bytes)
// Stream window: 256KB per stream (sufficient for largest single response)
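Configuration. Because windows are advertised by the receiver, the server’s initialWindowSize limits how much request-body data a client may send per stream; how fast responses download depends on the windows the client advertises: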
@Bean
public WebServerFactoryCustomizer<NettyReactiveWebServerFactory> flowControlCustomizer() {
    return factory -> factory.addServerCustomizers(httpServer ->
            httpServer.http2Settings(settings ->
                    settings.initialWindowSize(262144))); // 256KB per stream
}
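// Client-side (for internal services calling other services):
import java.net.http.HttpClient;

// HttpClient preferring HTTP/2 (falls back to HTTP/1.1 if h2 is not negotiated)
HttpClient httpClient = HttpClient.newBuilder()
        .version(HttpClient.Version.HTTP_2)
        .build();
// Note: Java's HttpClient builder does not expose HTTP/2 window tuning; the JDK
// only reads JVM-wide system properties (jdk.httpclient.windowsize,
// jdk.httpclient.connectionWindowSize) that apply to every client in the process
// For internal services needing per-client control, use Netty directly or gRPC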
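The window arithmetic from the flow-control notes can be sketched as follows. This Python model is a simplification under stated assumptions. It tracks the sender's view of one connection window and one stream window. `window_limited_rtts` assumes the sender can ship at most one window of DATA per round trip, and it ignores bandwidth and RTT variation.

```python
import math

MAX_WINDOW = 2**31 - 1  # RFC 9113: a window must never exceed 2^31-1


class FlowControlWindow:
    """Sender-side view of a window the receiver advertised."""

    def __init__(self, initial=65_535):
        self.available = initial

    def consume(self, n):
        """Account for n bytes of DATA payload sent."""
        if n > self.available:
            raise ValueError("sending would exceed the flow-control window")
        self.available -= n

    def window_update(self, increment):
        """Apply a WINDOW_UPDATE frame received from the peer."""
        if not 1 <= increment <= MAX_WINDOW:
            raise ValueError("WINDOW_UPDATE increment must be 1..2^31-1")
        if self.available + increment > MAX_WINDOW:
            raise OverflowError("FLOW_CONTROL_ERROR: window above 2^31-1")
        self.available += increment


def sendable(connection, stream, pending):
    """DATA bytes allowed right now: min(connection window, stream window, pending)."""
    return min(connection.available, stream.available, pending)


def window_limited_rtts(total_bytes, window):
    """Round trips needed if at most one window of DATA can be in flight per RTT."""
    return math.ceil(total_bytes / window)


burst = 50 * 37 * 1024                        # 50 concurrent 37KB responses
print(window_limited_rtts(burst, 65_535))     # 29 round trips with the default window
print(window_limited_rtts(burst, 2_097_152))  # 1 round trip with a 2MB window

# Raising the connection window to 2MB takes a WINDOW_UPDATE on stream 0.
conn = FlowControlWindow()
conn.window_update(2_097_152 - 65_535)
stream = FlowControlWindow(262_144)
print(sendable(conn, stream, 37 * 1024))      # 37888: the response fits in one go
```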
Server Push (Deprecated)
HTTP/2 originally included server push: the server sends resources before the client requests them. The content platform could push CSS and critical JS alongside the HTML response:
// Server push concept (DEPRECATED):
// Client requests: GET /articles
// Server pushes: /css/main.css, /js/critical.js (without waiting for client)
//
// Why it failed:
// 1. Cache invalidation: Server pushes resources the client already has cached
// Wasted bandwidth on repeated visits (80%+ of content platform traffic)
//
// 2. Priority inversion: Pushed resources compete with requested resources
// Large pushed CSS blocks small critical API response
//
// 3. Client cancellation: Client must RST_STREAM to cancel unwanted push
// By then, bytes already on the wire
//
// 4. No good heuristic: Server cannot know client cache state
// Cache digest proposals never standardized
//
// Chrome removed server push support in 2022 (Chrome 106)
// Replacement: 103 Early Hints (informational response with Link headers)
The replacement pattern for the content platform:
// 103 Early Hints: the browser preloads resources while the server computes.
// This controller does not emit the 103; the Nginx/CDN edge sends it as soon as
// the request arrives. Browsers that support Early Hints act on them only for
// navigation requests (the HTML document), so this applies to the page route.
//
// Edge-to-browser exchange (HTTP/2, decoded HEADERS frames):
//   :status: 103
//   link: </css/main.css>; rel=preload; as=style
//   link: </js/critical.js>; rel=preload; as=script
//   ... origin processes for 45ms ...
//   :status: 200
//   content-type: text/html
//   [rendered article page]
@GetMapping("/articles")
public String getArticles(Model model) {
    // The model feeds the HTML page; "articles" is a hypothetical template name
    model.addAttribute("articles", articleService.getArticleList(50));
    return "articles";
}
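If the edge's Link values are generated from an asset manifest, a small helper keeps them consistent. This is a hypothetical Python helper: `early_hints_links` is not part of any framework. It reflects two preload rules. `as` tells the browser what kind of resource it is fetching, and font preloads need `crossorigin` because fonts are fetched in CORS mode.

```python
def early_hints_links(resources):
    """Build Link header values for a 103 Early Hints response.

    resources: iterable of (path, destination) pairs such as ("/css/main.css", "style").
    """
    links = []
    for path, destination in resources:
        value = f"<{path}>; rel=preload; as={destination}"
        if destination == "font":
            value += "; crossorigin"  # fonts are fetched in CORS mode, even same-origin
        links.append(value)
    return links


for link in early_hints_links([("/css/main.css", "style"), ("/js/critical.js", "script")]):
    print("Link:", link)
# Link: </css/main.css>; rel=preload; as=style
# Link: </js/critical.js>; rel=preload; as=script
```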
Stream Prioritization
HTTP/2 originally used a dependency tree model for stream priorities. It proved too complex for implementations to get right; RFC 9113 deprecated it, and RFC 9218 Extensible Priorities replaced it with a simpler signal:
// RFC 9218 Priority signals:
// Priority: u=N (urgency, 0-7, lower = more important)
// Priority: u=N, i (incremental: can be processed as received)
//
// Content platform priority mapping:
// u=0: HTML document (critical rendering path)
// u=1: CSS, critical JS (render-blocking)
// u=2: Article list API, visible images (above the fold)
// u=3: Font files (needed for text rendering)
// u=4: Non-critical JS, below-fold images
// u=5: Recommendation API, analytics
// u=6: Prefetch for next page
// u=7: Background sync, beacon
// Response Priority header: the origin's view of urgency, which an intermediary
// such as the CDN edge can combine with the browser's request signal (RFC 9218):
@GetMapping("/api/articles")
public ResponseEntity<ArticleListResponse> getArticles() {
    ArticleListResponse articles = articleService.getArticleList(50);
    return ResponseEntity.ok()
            .header("Priority", "u=2")
            .body(articles);
}

@GetMapping("/api/recommendations")
public ResponseEntity<RecommendationResponse> getRecommendations() {
    RecommendationResponse recs = recommendationService.getForUser(currentUser());
    return ResponseEntity.ok()
            .header("Priority", "u=5, i") // lower urgency, incremental
            .body(recs);
}
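A server or CDN might read these signals roughly as follows. This is a simplified Python sketch, not a full Structured Fields parser. It accepts `u=<0-7>`, `i`, `i=?1` and `i=?0`, and falls back to the RFC 9218 defaults (urgency 3, non-incremental). It ignores unknown or out-of-range values. `send_order` only sorts by urgency. A real scheduler would also round-robin incremental responses that share an urgency level.

```python
DEFAULT_URGENCY = 3  # RFC 9218 default; incremental defaults to false


def parse_priority(field):
    """Return (urgency, incremental) from a Priority field such as "u=5, i"."""
    urgency, incremental = DEFAULT_URGENCY, False
    for member in field.split(","):
        key, _, value = member.strip().partition("=")
        if key == "u" and value.isdigit() and 0 <= int(value) <= 7:
            urgency = int(value)
        elif key == "i" and value in ("", "?1"):  # bare "i" means true
            incremental = True
        elif key == "i" and value == "?0":
            incremental = False
        # anything else (unknown keys, u=9, u=-1) is ignored
    return urgency, incremental


def send_order(streams):
    """Order stream ids by urgency (lower first), keeping arrival order within a level."""
    return [sid for sid, _ in sorted(streams.items(), key=lambda kv: parse_priority(kv[1])[0])]


print(parse_priority("u=5, i"))  # (5, True)
print(send_order({1: "u=5, i", 3: "u=2", 5: "u=0", 7: ""}))  # [5, 3, 7, 1]
```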
Benchmark: HTTP/1.1 vs HTTP/2 Under Load
Locust test simulating the content platform’s page load pattern:
# locust_protocol_comparison.py
import time

from locust import HttpUser, between, events, task

# HttpUser's client (python-requests) speaks HTTP/1.1 and the calls below run
# one after another; an HTTP/2 run needs an HTTP/2-capable client (e.g. httpx
# with http2=True) issuing the page's requests concurrently.


class ContentPlatformPageLoad(HttpUser):
    """Simulates a full page load: the 14 subresource requests below."""

    wait_time = between(1, 3)

    @task
    def full_page_load(self):
        """Load all resources a content page needs."""
        start = time.time()

        # Critical path (blocks rendering)
        self.client.get("/api/articles?page_size=50", name="article-list")
        self.client.get("/css/main.css", name="css-main")
        self.client.get("/js/critical.js", name="js-critical")

        # Above the fold
        self.client.get("/api/recommendations", name="recommendations")
        for i in range(3):
            self.client.get(f"/images/hero-{i}.webp", name="hero-image")

        # Fonts
        for i in range(4):
            self.client.get(f"/fonts/inter-{i}.woff2", name="font")

        # Below the fold
        self.client.get("/js/analytics.js", name="js-analytics")
        self.client.get("/api/user/state", name="user-state")
        self.client.get("/js/lazy.js", name="js-lazy")

        elapsed = time.time() - start
        # Report the whole page as one synthetic "PAGE" entry in Locust's stats
        events.request.fire(
            request_type="PAGE",
            name="full-page-load",
            response_time=elapsed * 1000,  # milliseconds
            response_length=0,
            exception=None,
            context={},
        )
Results at 500 concurrent users, 80ms simulated RTT:
HTTP/1.1 (6 connections per user, TLS 1.2):
Page load P50: 1,240ms
Page load P99: 3,890ms
Requests/sec: 4,200
Active connections: 3,000
Server CPU: 72%
Server memory: 3.1GB (connection buffers)
HTTP/2 (1 connection per user, TLS 1.3):
Page load P50: 380ms
Page load P99: 820ms
Requests/sec: 12,600
Active connections: 500
Server CPU: 45%
Server memory: 1.2GB
Improvement (note that the two runs also differ in TLS version, 1.2 vs 1.3):
P50 page load: 3.3x faster
P99 page load: 4.7x faster
Throughput: 3.0x higher
Connections: 6.0x fewer
Memory: 2.6x less
The P99 improvement (4.7x) exceeds the P50 improvement (3.3x) because HTTP/1.1’s connection limit causes queuing: under load, once all six connections are busy, a seventh request waits for one to free up. That wait is highly variable, which inflates tail latency.
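The queuing effect can be seen with a deterministic list-scheduling model, where each request goes to whichever connection frees up first. This Python sketch assumes every request costs one 80ms round trip and ignores bandwidth, handshakes and server time. For HTTP/2 it assumes enough stream slots for the whole page.

```python
import heapq


def page_load_time(request_ms, connections=None):
    """Makespan when requests are dispatched in order to the first free connection.

    connections=None models HTTP/2 with enough stream slots: everything starts at t=0.
    """
    if connections is None:
        return float(max(request_ms))
    free_at = [0.0] * connections  # min-heap of times each connection becomes free
    finish = 0.0
    for duration in request_ms:
        start = heapq.heappop(free_at)  # the 7th request waits here for a free connection
        end = start + duration
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish


page = [80] * 15  # 15 resources, one assumed 80ms round trip each
print(page_load_time(page, connections=6))  # 240.0: three waves of requests
print(page_load_time(page))                 # 80.0: all streams in flight at once
```

Here the requests are identical, so the waiting is uniform. In the load test, each wait depends on how long the slowest in-flight response on the busy connections takes, which is what spreads out the HTTP/1.1 tail.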
Max Concurrent Streams Tuning
The server controls how many streams a client can open simultaneously via SETTINGS_MAX_CONCURRENT_STREAMS:
// Too low (e.g., 100, a common server default and the minimum RFC 9113 recommends): client forced to queue requests
// Effect: reintroduces HTTP/1.1-style waiting for "free" streams
//
// Too high (unlimited): risk of resource exhaustion under attack
// Effect: single client opens 10,000 streams, exhausts server memory
//
// Content platform calculation:
// Typical page: 15 resources
// Prefetch next page: 10 resources
// Background sync: 5 requests
// Safety margin: 2x
// Target: (15 + 10 + 5) * 2 = 60
//
// But: HTTP/2 connections are often shared across browser tabs
// User with 4 tabs: 60 * 4 = 240
// Setting: 250 max concurrent streams
// Protection against abuse (maxConcurrentStreams alone does not enforce this):
// Rate limit: max 1000 new streams per second per connection
// If exceeded: GOAWAY frame, force new connection
@Bean
public WebServerFactoryCustomizer<NettyReactiveWebServerFactory> streamLimitCustomizer() {
    return factory -> factory.addServerCustomizers(httpServer ->
            httpServer.http2Settings(settings ->
                    settings.maxConcurrentStreams(250)));
}
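The rate-limit rule is a separate check from the advertised stream cap. Below is a hypothetical per-connection guard that combines both. The `StreamGuard` class is illustrative and is not a Netty or Reactor Netty API. Its thresholds mirror the text (250 concurrent streams, 1,000 new streams per second). RFC 9113 lets a server answer a stream that exceeds its advertised limit with REFUSED_STREAM, and GOAWAY closes the connection for abusive churn.

```python
from collections import deque


class StreamGuard:
    """Per-connection guard: concurrent-stream cap plus new-stream rate limit."""

    def __init__(self, max_concurrent=250, max_new_per_sec=1000):
        self.max_concurrent = max_concurrent
        self.max_new_per_sec = max_new_per_sec
        self.open_streams = set()
        self.recent_opens = deque()  # timestamps of stream opens in the last second

    def on_headers(self, stream_id, now):
        """Decide on a new stream: "accept", "REFUSED_STREAM" or "GOAWAY"."""
        while self.recent_opens and now - self.recent_opens[0] >= 1.0:
            self.recent_opens.popleft()  # drop opens older than the 1-second window
        self.recent_opens.append(now)    # refused streams still count toward the rate
        if len(self.recent_opens) > self.max_new_per_sec:
            return "GOAWAY"              # abusive churn: close the whole connection
        if len(self.open_streams) >= self.max_concurrent:
            return "REFUSED_STREAM"      # over the advertised limit: reset this stream
        self.open_streams.add(stream_id)
        return "accept"

    def on_close(self, stream_id):
        self.open_streams.discard(stream_id)


guard = StreamGuard()  # 250 concurrent, 1,000 new streams per second
print(guard.on_headers(1, now=0.0))  # accept
```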
When HTTP/2 Hurts
HTTP/2 is not universally faster. Scenarios where it underperforms HTTP/1.1:
// Scenario 1: Single large download (video, large file)
// HTTP/1.1: Simple byte stream, no framing overhead
// HTTP/2: Every 16KB wrapped in 9-byte frame header
// Overhead: 9 / 16384 = 0.055% (negligible, but no multiplexing benefit either)
// Verdict: No benefit, no harm
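// Scenario 2: TLS certificate chain > 16KB (multiple large certs in chain)
// Both protocols must finish the TLS handshake before any HTTP bytes flow
// A chain larger than TCP's initial congestion window (10 segments, ~14KB)
// adds a round trip to the handshake
// HTTP/1.1: pays that round trip on each of its connections, in parallel
// HTTP/2: pays it once, but SETTINGS and every stream wait behind it
// Verdict: not HTTP/2-specific; HTTP/2 concentrates the delay on one connection
// Mitigation: Use ECDSA certs (smaller than RSA), OCSP stapling (saves the client a separate revocation lookup)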
// Scenario 3: Lossy network (2%+ packet loss)
// HTTP/1.1: 6 connections, loss on one does not affect others
// HTTP/2: 1 connection, any TCP loss blocks ALL streams
// Impact at 2% loss, 80ms RTT:
// HTTP/1.1 page load P99: 2,100ms
// HTTP/2 page load P99: 2,800ms (TCP HOL blocking)
// Solution: HTTP/3 (QUIC eliminates TCP-level HOL blocking)
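Scenario 1's overhead figure generalizes to any body size. The helper below is a small Python sketch that assumes one DATA frame per max-size chunk. It counts only the 9-byte frame headers and ignores HEADERS frames and TLS record overhead.

```python
import math

FRAME_HEADER_BYTES = 9  # every HTTP/2 frame starts with a 9-byte header


def data_framing_overhead(body_bytes, max_frame_size=16_384):
    """Fraction of extra bytes that DATA frame headers add to a response body."""
    if body_bytes <= 0:
        raise ValueError("body_bytes must be positive")
    frames = math.ceil(body_bytes / max_frame_size)
    return frames * FRAME_HEADER_BYTES / body_bytes


print(f"{data_framing_overhead(1 << 30):.3%}")    # 1GB video: 0.055%
print(f"{data_framing_overhead(37 * 1024):.3%}")  # 37KB article list (3 frames): 0.071%
```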
The content platform’s traffic profile (many small API calls, 15+ resources per page, global users) is the ideal HTTP/2 workload. The 3.3x P50 improvement held up in production: median page load dropped from 1.2s to 360ms after HTTP/2 was enabled at the CDN edge.