resilience patterns in production

Timeout Coordination Across All Layers

Chapter 18 of 40

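Every resilience pattern adds a timeout surface. When those timeouts are configured independently, they conflict: an outer layer can give up while an inner layer is still working, or a retry chain can outlive the budget around it. This section calculates the correct timeout at every layer of the fraud detection call chain.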

The Complete Fraud Detection Call Chain

API Gateway (8s timeout)
  -> Payment Service Tomcat (connection-timeout: 5s)
    -> TimeLimiter (2s)
      -> Retry (3 attempts, 200ms + 400ms backoff)
        -> CircuitBreaker (records timeout as failure)
          -> Bulkhead (100ms max wait)
            -> HTTP Client (connection: 1s, read: 500ms)
              -> Fraud Detection Service
                -> External Scoring API (3s timeout)

Layer-by-Layer Calculation

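Layer 7: External Scoring API timeout (3s). The fraud detection service sets this; the payment service does not control it. Because 3s is far longer than the payment service’s 500ms read timeout (Layer 6), a slow scoring call reaches the payment service as a read timeout while the fraud detection service may still be waiting on the scoring API.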

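Layer 6: HTTP Client read timeout (500ms). This is the payment service’s timeout for the fraud detection HTTP call. If fraud detection does not respond within 500ms, the call fails with a timeout exception. Strictly, a read timeout bounds each wait for data on the socket rather than the whole response; for a small scoring response the two are effectively the same.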
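The source does not name its HTTP client library, so as an illustration only, here is how the connect and response limits could be expressed with the JDK’s built-in java.net.http.HttpClient (Java 11+). FRAUD_URI is a hypothetical endpoint. Note two differences from the configuration reference: the JDK client’s per-request timeout bounds the wait for the response rather than each socket read, and it has no direct equivalent of the pool-acquisition timeout.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

// Illustrative only: the text does not name its HTTP client library.
// FRAUD_URI is a hypothetical endpoint, not one from the text.
final class FraudHttpClientSketch {
    static final URI FRAUD_URI = URI.create("http://fraud-detection.internal/score");

    // Connect timeout (1s): limits establishing a new connection; exceeding it
    // makes send() throw java.net.http.HttpConnectTimeoutException.
    static HttpClient client() {
        return HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(1))
                .build();
    }

    // Request timeout (500ms): if the response is not received in time,
    // send() throws java.net.http.HttpTimeoutException.
    static HttpRequest scoreRequest(String jsonBody) {
        return HttpRequest.newBuilder(FRAUD_URI)
                .timeout(Duration.ofMillis(500))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }
}
```

A caller would send with client().send(scoreRequest(body), HttpResponse.BodyHandlers.ofString()) and let the resulting HttpTimeoutException flow outward as the timeout failure that the layers above react to.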

Layer 5: Bulkhead max wait (100ms). If all bulkhead permits are in use, the call waits up to 100ms for one to be released. If none is released in time, the call is rejected with BulkheadFullException.
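A semaphore with a bounded wait captures the behaviour described here. This is a hand-rolled sketch, not Resilience4J’s code: BulkheadRejected is a hypothetical stand-in for BulkheadFullException, and maxConcurrentCalls is an assumed parameter because the text does not give the permit count.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical stand-in for BulkheadFullException.
final class BulkheadRejected extends RuntimeException {
    BulkheadRejected(String message) { super(message); }
}

// Minimal semaphore bulkhead sketch with a bounded wait for a permit.
final class SemaphoreBulkheadSketch {
    private final Semaphore permits;
    private final long maxWaitMillis;

    SemaphoreBulkheadSketch(int maxConcurrentCalls, long maxWaitMillis) {
        this.permits = new Semaphore(maxConcurrentCalls, true); // fair: waiters acquire in FIFO order
        this.maxWaitMillis = maxWaitMillis;
    }

    <T> T call(Supplier<T> work) {
        boolean acquired;
        try {
            // Wait at most maxWaitMillis for a permit; a wait of 0 rejects immediately.
            acquired = permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new BulkheadRejected("interrupted while waiting for a permit");
        }
        if (!acquired) {
            throw new BulkheadRejected("no permit within " + maxWaitMillis + "ms");
        }
        try {
            return work.get();
        } finally {
            permits.release(); // always return the permit, even if work throws
        }
    }

    int availablePermits() { return permits.availablePermits(); }
}
```

With new SemaphoreBulkheadSketch(1, 100), a second call that arrives while the only permit is held waits about 100ms and then throws BulkheadRejected; a wait of 0 gives the immediate rejection configured for the notification bulkhead.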

Layer 4: CircuitBreaker. No timeout of its own. Records the outcome (success, failure, timeout) of each call.
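Recording outcomes usually means keeping a sliding window of recent results and deriving a failure rate from it. The class below is a hypothetical, hand-rolled count-based window, not Resilience4J’s implementation; it only records and reports, and it counts a timeout exactly like any other failure, as the text describes.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical count-based outcome window. It does not open, close, or
// half-open anything; it only tracks the failure rate over the last `size` calls.
final class OutcomeWindowSketch {
    private final int size;
    private final Deque<Boolean> failedFlags = new ArrayDeque<>();
    private int failures;

    OutcomeWindowSketch(int size) { this.size = size; }

    // error == null records a success; any exception, including a
    // java.util.concurrent.TimeoutException, records a failure.
    void record(Throwable error) {
        boolean failed = error != null;
        if (failedFlags.size() == size && failedFlags.removeFirst()) {
            failures--; // the oldest outcome was a failure and just left the window
        }
        failedFlags.addLast(failed);
        if (failed) {
            failures++;
        }
    }

    double failureRatePercent() {
        return failedFlags.isEmpty() ? 0.0 : 100.0 * failures / failedFlags.size();
    }
}
```

Two successes followed by a timeout and an HTTP error in a window of 4 give a 50% failure rate; as later successes push those failures out of the window, the rate falls again.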

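Layer 3: Retry. 3 attempts with exponential backoff. Worst case, assuming each attempt reuses a pooled connection (so neither the 1s connect timeout nor pool acquisition adds time):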

  • Attempt 1: 100ms (bulkhead wait) + 500ms (HTTP timeout) = 600ms
  • Backoff: ~200ms (with jitter)
  • Attempt 2: 100ms + 500ms = 600ms
  • Backoff: ~400ms (with jitter)
  • Attempt 3: 100ms + 500ms = 600ms
  • Total worst case: 600ms + 200ms + 600ms + 400ms + 600ms = 2,400ms
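The worst-case sum above can be reproduced in a few lines. This sketch assumes, as the list does, that every attempt costs the full bulkhead wait plus the full read timeout, and it ignores jitter; the method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Reproduces the worst-case retry arithmetic. Assumes each attempt costs the
// full bulkhead wait plus the full read timeout; jitter is ignored.
final class RetryBudgetSketch {
    // Backoff before retry i (0-based) = initialMs * multiplier^i, rounded to ms.
    static List<Long> exponentialBackoffs(long initialMs, double multiplier, int count) {
        List<Long> backoffs = new ArrayList<>();
        double wait = initialMs;
        for (int i = 0; i < count; i++) {
            backoffs.add(Math.round(wait));
            wait *= multiplier;
        }
        return backoffs;
    }

    static long worstCaseMillis(int attempts, long bulkheadWaitMs, long readTimeoutMs,
                                List<Long> backoffsMs) {
        if (attempts < 1 || backoffsMs.size() != attempts - 1) {
            throw new IllegalArgumentException("need exactly one backoff between consecutive attempts");
        }
        long perAttemptMs = bulkheadWaitMs + readTimeoutMs;
        long backoffTotalMs = backoffsMs.stream().mapToLong(Long::longValue).sum();
        return attempts * perAttemptMs + backoffTotalMs;
    }
}
```

worstCaseMillis(3, 100, 500, exponentialBackoffs(200, 2.0, 2)) returns 2400, matching the list. Passing a larger per-attempt cost, for example one that includes the connect timeout for a fresh connection, shows how quickly the chain outgrows the TimeLimiter.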

Layer 2: TimeLimiter (2s). Caps the entire chain at 2 seconds. The worst-case retry chain (2,400ms) exceeds the TimeLimiter (2,000ms): the third attempt starts at 1,800ms (600 + 200 + 600 + 400) and is cut off at 2,000ms, after only 200ms of its 600ms. This is intentional: the TimeLimiter prevents runaway retry chains from exceeding the time budget.
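Resilience4J’s TimeLimiter is not reproduced here; as a JDK-only sketch (Java 9+), CompletableFuture.orTimeout can impose the same budget around the whole retry chain. retryChain is a hypothetical supplier that runs all attempts and backoffs. One caveat worth knowing: orTimeout only fails the returned future, and the thread running the chain keeps going unless the chain itself checks for interruption or cancellation.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// JDK-only sketch of a time limiter placed around the entire retry chain.
final class ChainTimeLimiterSketch {
    // Fails the returned future with java.util.concurrent.TimeoutException once
    // budgetMillis elapses. The task running retryChain is NOT interrupted.
    static <T> CompletableFuture<T> withBudget(Supplier<T> retryChain, long budgetMillis) {
        return CompletableFuture.supplyAsync(retryChain)
                .orTimeout(budgetMillis, TimeUnit.MILLISECONDS);
    }

    // Start offset (ms) of a 1-based attempt, assuming each earlier attempt used
    // its full perAttemptMs and was followed by backoffsMs[i - 1].
    static long attemptStartMillis(int attempt, long perAttemptMs, long[] backoffsMs) {
        long start = 0;
        for (int i = 1; i < attempt; i++) {
            start += perAttemptMs + backoffsMs[i - 1];
        }
        return start;
    }
}
```

attemptStartMillis(3, 600, new long[] {200, 400}) returns 1800, leaving the third attempt 200ms of a 2,000ms budget; withBudget(chain, 2000).join() then throws a CompletionException whose cause is the TimeoutException.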

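Layer 1: Tomcat connection-timeout (5s). This server-side setting applies to incoming connections to the payment service: it is how long the connector waits, after accepting a connection, for the request line to arrive, and by default it also serves as the keep-alive timeout. It does not cap how long the payment service spends processing a request, so it does not bound the TimeLimiter chain. Keeping it above the fraud TimeLimiter is harmless; the end-to-end limit for the caller comes from the TimeLimiters inside the service and the API gateway outside it.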

Layer 0: API Gateway timeout (8s). The gateway’s timeout on the payment service call. Must be greater than the payment service’s total processing time for the entire payment flow (fraud + balance + gateway + notification + audit).

The Complete Configuration Reference

# PRODUCTION - Complete timeout configuration for the payment service
# Fraud detection values are derived in the layer-by-layer calculation above.

server:
  tomcat:
    connection-timeout: 5000 # 5s: wait for the request line after accept (does not cap processing time)

# HTTP Client timeouts (per dependency)
fraud:
  client:
    connect-timeout-ms: 1000 # 1s: TCP handshake to fraud service
    read-timeout-ms: 500 # 500ms: fraud response (normal p99: 120ms)
    connection-request-timeout-ms: 500 # 500ms: pool acquisition

balance:
  client:
    connect-timeout-ms: 1000
    read-timeout-ms: 500 # Balance is fast or broken

gateway:
  client:
    connect-timeout-ms: 2000 # External service, allow more
    read-timeout-ms: 5000 # Payment processing takes time

# Resilience4J timeouts
resilience4j:
  timelimiter:
    instances:
      fraudDetection:
        timeout-duration: 2s # Total budget for fraud call chain
      balanceCheck:
        timeout-duration: 1500ms
      paymentGateway:
        timeout-duration: 8s # Covers up to 3 attempts of a slow call (max-attempts counts the first call)

  bulkhead:
    instances:
      fraudDetection:
        max-wait-duration: 100ms # Brief wait for permit
      balanceCheck:
        max-wait-duration: 50ms
      notification:
        max-wait-duration: 0ms # Immediate rejection

  retry:
    instances:
      paymentGateway:
        wait-duration: 200ms # Between retries
        max-attempts: 3

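The rule: at every layer, timeout(outer) > timeout(inner) + processing overhead. The API gateway timeout (8s) must exceed the sum of the TimeLimiter values for all sequential calls in the payment flow. The Tomcat connection timeout (5s) is longer than the fraudDetection and balanceCheck TimeLimiters, although it limits how long the connector waits for a request to arrive, not how long processing may take. The fraudDetection TimeLimiter (2s) exceeds a single 600ms attempt but truncates the 2,400ms worst-case retry chain. When each inner layer fails before the layer around it, every failure surfaces at a predictable layer with a clear error.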
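The rule can be turned into two small checks, one for nested layers and one for sequential calls under a single outer timeout. This is a sketch with hypothetical names; the overhead argument is a caller-chosen allowance, not a value from the text.

```java
import java.util.List;

// The layering rule as two checks. Budgets are in milliseconds; overheadMs is
// an assumed allowance for processing overhead chosen by the caller.
final class TimeoutRuleSketch {
    // Nested layers: the outer timeout must exceed the inner one plus overhead.
    static boolean nestedOk(long outerMs, long innerMs, long overheadMs) {
        return outerMs > innerMs + overheadMs;
    }

    // Sequential calls under one outer timeout: the outer timeout must exceed
    // the sum of their budgets plus overhead.
    static boolean sequentialOk(long outerMs, List<Long> sequentialBudgetsMs, long overheadMs) {
        long totalMs = sequentialBudgetsMs.stream().mapToLong(Long::longValue).sum();
        return outerMs > totalMs + overheadMs;
    }
}
```

With zero overhead, nestedOk(2000, 600, 0) holds and nestedOk(2000, 2400, 0) does not, matching the truncation described for Layer 2. Run against the three TimeLimiter values in the configuration reference, sequentialOk(8000, List.of(2000L, 1500L, 8000L), 0) returns false: 2s + 1.5s + 8s = 11.5s already exceeds the 8s API gateway timeout before notification and audit are added, so either the paymentGateway budget or the gateway timeout has to change for the rule to hold.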