Timeout Coordination Across All Layers
Every resilience pattern adds a timeout surface. When they are configured independently, they conflict. This section calculates the correct timeout at every layer for the fraud detection call chain.
The Complete Fraud Detection Call Chain
API Gateway (8s timeout)
  -> Payment Service Tomcat (connection-timeout: 5s)
    -> TimeLimiter (2s)
      -> Retry (3 attempts, 200ms + 400ms backoff)
        -> CircuitBreaker (records timeout as failure)
          -> Bulkhead (100ms max wait)
            -> HTTP Client (connection: 1s, read: 500ms)
              -> Fraud Detection Service
                -> External Scoring API (3s timeout)
Layer-by-Layer Calculation
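Layer 7: External Scoring API timeout (3s). The fraud detection service sets this. Not controlled by the payment service. Because 3s is longer than the payment service's 500ms read timeout, the payment service gives up on a slow scoring call well before the fraud detection service does.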
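Layer 6: HTTP Client read timeout (500ms). This is the payment service’s timeout for the fraud detection HTTP call. If fraud detection does not send response data within 500ms, the call fails with a timeout exception. The 1s connection timeout applies separately, and only when a new TCP connection has to be opened.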
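As a rough illustration of the two client-side values, the sketch below configures the JDK's built-in `java.net.http` client. This is an assumption: the section does not say which HTTP client the payment service uses, and the endpoint URI is hypothetical. Note also that the JDK's per-request timeout bounds the wait for the response, which is close to, but not identical to, a socket read timeout in other clients.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

class FraudClientTimeouts {
    // Hypothetical endpoint, used only for illustration.
    static final URI FRAUD_URI = URI.create("http://fraud-detection.internal/score");

    // Connect timeout: upper bound on establishing the connection (1s in the chain above).
    static HttpClient newClient() {
        return HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(1))
                .build();
    }

    // Request timeout: sending this request fails with HttpTimeoutException
    // if no response arrives within 500ms.
    static HttpRequest newScoreRequest() {
        return HttpRequest.newBuilder(FRAUD_URI)
                .timeout(Duration.ofMillis(500))
                .GET()
                .build();
    }
}
```

Keeping the connect and response timeouts as separate settings mirrors the split in the call chain: one bounds connection setup, the other bounds the wait for fraud detection's answer.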
Layer 5: Bulkhead max wait (100ms). If all bulkhead permits are in use, wait up to 100ms for one to free. After 100ms, reject with BulkheadFullException.
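The max-wait behaviour can be pictured with a plain semaphore. The class below is a minimal sketch of the semantics only, not Resilience4j's implementation; `SimpleBulkhead` is a hypothetical name, and `IllegalStateException` stands in for `BulkheadFullException`.

```java
import java.time.Duration;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class SimpleBulkhead {
    private final Semaphore permits;
    private final Duration maxWait;

    SimpleBulkhead(int maxConcurrentCalls, Duration maxWait) {
        this.permits = new Semaphore(maxConcurrentCalls, true);
        this.maxWait = maxWait;
    }

    <T> T execute(Supplier<T> call) {
        boolean acquired;
        try {
            // Wait up to maxWait for a permit; a zero wait means immediate rejection.
            acquired = permits.tryAcquire(maxWait.toMillis(), TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for a permit", e);
        }
        if (!acquired) {
            // Stand-in for Resilience4j's BulkheadFullException.
            throw new IllegalStateException("Bulkhead full");
        }
        try {
            return call.get();
        } finally {
            // Always return the permit, even if the call throws.
            permits.release();
        }
    }
}
```

With `new SimpleBulkhead(n, Duration.ofMillis(100))`, a caller that finds all `n` permits taken blocks for at most 100ms before being rejected, which is the 100ms that appears in each attempt of the retry calculation.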
Layer 4: CircuitBreaker. No timeout of its own. Records the outcome (success, failure, timeout) of each call.
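Layer 3: Retry. 3 attempts with exponential backoff. Worst case, assuming the HTTP connection is already established (a fresh connect could add up to 1s per attempt):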
- Attempt 1: 100ms (bulkhead wait) + 500ms (HTTP timeout) = 600ms
- Backoff: ~200ms (with jitter)
- Attempt 2: 100ms + 500ms = 600ms
- Backoff: ~400ms (with jitter)
- Attempt 3: 100ms + 500ms = 600ms
- Total worst case: 600ms + 200ms + 600ms + 400ms + 600ms = 2,400ms
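The arithmetic in the list above can be reproduced with a few lines of Java. This is a sketch under stated assumptions: the backoff doubles on each retry (inferred from the 200ms and 400ms figures), jitter is ignored, and `RetryBudget` is a hypothetical helper name.

```java
import java.time.Duration;

class RetryBudget {
    // Worst case for one attempt: wait for a bulkhead permit, then hit the HTTP read timeout.
    static Duration attempt(Duration bulkheadWait, Duration readTimeout) {
        return bulkheadWait.plus(readTimeout);
    }

    // Worst case for the chain: every attempt times out, with a backoff that
    // doubles between attempts and no wait after the final attempt.
    static Duration worstCase(int maxAttempts, Duration perAttempt, Duration initialBackoff) {
        Duration total = Duration.ZERO;
        Duration backoff = initialBackoff;
        for (int i = 1; i <= maxAttempts; i++) {
            total = total.plus(perAttempt);
            if (i < maxAttempts) {
                total = total.plus(backoff);
                backoff = backoff.multipliedBy(2);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Duration perAttempt = attempt(Duration.ofMillis(100), Duration.ofMillis(500));
        Duration total = worstCase(3, perAttempt, Duration.ofMillis(200));
        System.out.println(total.toMillis() + " ms"); // prints "2400 ms"
    }
}
```

Encoding the calculation this way makes it easy to re-check the budget whenever an attempt count, backoff, or inner timeout changes.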
Layer 2: TimeLimiter (2s). Caps the entire chain at 2 seconds. The worst-case retry chain (2,400ms) exceeds the TimeLimiter (2,000ms). In that worst case the third attempt starts at 1,800ms (600ms + 200ms + 600ms + 400ms), leaving it only 200ms before the TimeLimiter fires, so it may be cut short. This is intentional: the TimeLimiter prevents runaway retry chains from exceeding the time budget.
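The effect of an outer time budget on a slow inner chain can be sketched with the JDK alone. This is not how Resilience4j's TimeLimiter is implemented; `TimeBudget` and `callWithin` are hypothetical names, and returning a fallback value is just one possible way to handle the timeout.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

class TimeBudget {
    // Runs the chain asynchronously and stops waiting once the budget is spent.
    static <T> T callWithin(Duration budget, Supplier<T> chain, T fallback) {
        CompletableFuture<T> future = CompletableFuture.supplyAsync(chain);
        try {
            return future.get(budget.toMillis(), TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // The caller gives up here. Cancelling a CompletableFuture does not
            // interrupt the running task, so the chain may keep executing in the background.
            future.cancel(true);
            return fallback;
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Called with a 2s budget around the retry chain, the caller gets an answer at 2,000ms at the latest, even when the chain's own worst case would run to 2,400ms.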
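Layer 1: Tomcat connection-timeout (5s). This is the server-side timeout on the incoming connection to the payment service. Strictly, it governs how long the Tomcat connector waits for the client to send request data after the connection is accepted; it does not cap how long the request handler may run. Keeping it above the TimeLimiter timeout is therefore a conservative convention rather than a hard requirement for the payment service to complete its work and return a response.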
Layer 0: API Gateway timeout (8s). The gateway’s timeout on the payment service call. Must be greater than the payment service’s total processing time for the entire payment flow (fraud + balance + gateway + notification + audit).
The Complete Configuration Reference
# PRODUCTION - Complete timeout configuration for the payment service
# All values are justified by the calculations above.
server:
  tomcat:
    connection-timeout: 5000 # 5s: above the fraud (2s) and balance (1.5s) TimeLimiters
# HTTP Client timeouts (per dependency)
fraud:
  client:
    connect-timeout-ms: 1000 # 1s: TCP handshake to fraud service
    read-timeout-ms: 500 # 500ms: fraud response (normal p99: 120ms)
    connection-request-timeout-ms: 500 # 500ms: pool acquisition
balance:
  client:
    connect-timeout-ms: 1000
    read-timeout-ms: 500 # Balance is fast or broken
gateway:
  client:
    connect-timeout-ms: 2000 # External service, allow more
    read-timeout-ms: 5000 # Payment processing takes time
# Resilience4J timeouts
resilience4j:
  timelimiter:
    instances:
      fraudDetection:
        timeout-duration: 2s # Total budget for fraud call chain
      balanceCheck:
        timeout-duration: 1500ms
      paymentGateway:
        timeout-duration: 8s # Budget for up to 3 attempts; truncates the chain if each runs to the 5s read timeout
  bulkhead:
    instances:
      fraudDetection:
        max-wait-duration: 100ms # Brief wait for permit
      balanceCheck:
        max-wait-duration: 50ms
      notification:
        max-wait-duration: 0ms # Immediate rejection
  retry:
    instances:
      paymentGateway:
        wait-duration: 200ms # Between retries
        max-attempts: 3
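The rule: at every layer, timeout(outer) > timeout(inner) + processing overhead, so that each inner layer fails before the layer around it and errors stay clearly layered. The fraud chain follows this rule with one deliberate exception: the TimeLimiter (2s) exceeds a single retry attempt (600ms) but may truncate the full worst-case retry chain (2,400ms). The values listed above do not satisfy the rule everywhere, however. The Tomcat connection timeout (5s) exceeds the fraudDetection (2s) and balanceCheck (1.5s) TimeLimiters but not the paymentGateway TimeLimiter (8s). The API gateway timeout (8s) only equals the paymentGateway TimeLimiter, and the three listed TimeLimiters sum to 11.5s (2s + 1.5s + 8s) if the calls run sequentially. For the API gateway to outlast the whole payment flow, either its timeout must be raised or the inner budgets reduced.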
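A check of the layering rule against the values in this section might look like the sketch below. `TimeoutLayering` is a hypothetical helper, the 50ms overhead allowance is an assumed figure, and only the three TimeLimiter instances listed in the configuration are summed (the notification and audit calls have no TimeLimiter shown).

```java
import java.time.Duration;
import java.util.List;

class TimeoutLayering {
    // True when the outer timeout leaves room for the inner one plus overhead.
    static boolean nests(Duration outer, Duration inner, Duration overhead) {
        return outer.compareTo(inner.plus(overhead)) > 0;
    }

    // Total budget for calls that run one after another.
    static Duration sequentialBudget(List<Duration> timeLimiters) {
        return timeLimiters.stream().reduce(Duration.ZERO, Duration::plus);
    }

    public static void main(String[] args) {
        Duration overhead = Duration.ofMillis(50); // assumed allowance, not from the source
        Duration flow = sequentialBudget(List.of(
                Duration.ofSeconds(2),      // fraudDetection
                Duration.ofMillis(1500),    // balanceCheck
                Duration.ofSeconds(8)));    // paymentGateway

        // TimeLimiter (2s) vs. one retry attempt (600ms)
        System.out.println(nests(Duration.ofSeconds(2), Duration.ofMillis(600), overhead));  // true
        // TimeLimiter (2s) vs. full worst-case retry chain (2,400ms)
        System.out.println(nests(Duration.ofSeconds(2), Duration.ofMillis(2400), overhead)); // false
        // API gateway (8s) vs. sum of the listed TimeLimiters (11.5s)
        System.out.println(nests(Duration.ofSeconds(8), flow, overhead));                    // false
    }
}
```

Running a check like this as part of a configuration test is one way to catch a timeout that was changed in isolation and no longer nests inside the layer around it.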