Resilience Patterns in Production
Circuit Breakers, Bulkheads, and the Failure Modes That Define System Character
This book targets senior Java developers who have lived through cascading failures, added timeouts that did not help, and written retry logic that made things worse. You already have a distributed system that fails in non-obvious ways. This book explains why, and shows how to fix it.
Every pattern is implemented from scratch in plain Java 21 before Resilience4j is introduced. Not as an exercise, but as the only way to correctly configure a library whose defaults can silently take down your service if you do not understand what each parameter actually controls. An engineer who has written the state machine for a circuit breaker knows immediately why a half-open state with one probe request is dangerous under high concurrency. One who has only configured Resilience4j annotations does not.
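To make the half-open concern concrete, here is a minimal sketch in plain Java 21 of the kind of state machine the from-scratch chapters build. It is an illustration under stated assumptions, not the book's implementation and not how Resilience4j works internally: the class name `SketchCircuitBreaker`, the consecutive-failure threshold, and the injectable millisecond clock are all hypothetical. It shows where the concurrency hazard lives: when the cool-down expires, many threads can observe OPEN at the same moment, and only an atomic compare-and-set transition keeps the "single" probe single.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.LongSupplier;
import java.util.function.UnaryOperator;

// Hypothetical, minimal circuit breaker used only to show the state machine.
// NOT Resilience4j's implementation; names and thresholds are illustrative.
final class SketchCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    // One immutable snapshot per state, so every transition is a single compareAndSet.
    // In HALF_OPEN, exactly one probe call is outstanding by construction.
    private record Snapshot(State state, int failures, long openedAtMillis) {}

    private static final Snapshot CLOSED_RESET = new Snapshot(State.CLOSED, 0, 0L);

    private final int failureThreshold;     // consecutive failures that trip the breaker (assumed policy)
    private final long openMillis;          // cool-down before a probe is allowed
    private final LongSupplier clockMillis; // injectable clock so tests can control time
    private final AtomicReference<Snapshot> ref = new AtomicReference<>(CLOSED_RESET);

    SketchCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clockMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clockMillis = clockMillis;
    }

    State state() { return ref.get().state(); }

    /** Returns true if the caller may make the protected call. */
    boolean tryAcquirePermission() {
        while (true) {
            Snapshot cur = ref.get();
            switch (cur.state()) {
                case CLOSED -> { return true; }
                case HALF_OPEN -> { return false; } // a probe is already in flight
                case OPEN -> {
                    if (clockMillis.getAsLong() - cur.openedAtMillis() < openMillis) return false;
                    // Cool-down elapsed. Many threads may reach this line at once;
                    // the CAS lets exactly one become the probe, the rest re-read and see HALF_OPEN.
                    if (ref.compareAndSet(cur, new Snapshot(State.HALF_OPEN, 0, cur.openedAtMillis()))) {
                        return true;
                    }
                }
            }
        }
    }

    void onSuccess() {
        // A successful probe, or any success while CLOSED, resets to CLOSED.
        // A late success arriving while OPEN is ignored.
        // Simplification: a late success from a call admitted before the breaker opened
        // cannot be told apart from the probe's result if it arrives during HALF_OPEN.
        transition(cur -> cur.state() == State.OPEN ? cur : CLOSED_RESET);
    }

    void onFailure() {
        long now = clockMillis.getAsLong();
        transition(cur -> switch (cur.state()) {
            case HALF_OPEN -> new Snapshot(State.OPEN, 0, now); // probe failed: reopen with a fresh cool-down
            case CLOSED -> cur.failures() + 1 >= failureThreshold
                    ? new Snapshot(State.OPEN, 0, now)
                    : new Snapshot(State.CLOSED, cur.failures() + 1, 0L);
            case OPEN -> cur; // stale failure: already open
        });
    }

    // Retries the CAS until the computed transition is applied (or is a no-op).
    private void transition(UnaryOperator<Snapshot> f) {
        while (true) {
            Snapshot cur = ref.get();
            Snapshot next = f.apply(cur);
            if (next == cur || ref.compareAndSet(cur, next)) return;
        }
    }
}
```

Keeping state, failure count, and open timestamp in one immutable snapshot avoids the torn reads that separate fields would allow when a failed probe reopens the breaker while other threads are still checking the cool-down.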
After the from-scratch implementation demonstrates the internals, every pattern chapter switches to Resilience4j as the definitive production implementation. The from-scratch code is the explanation. Resilience4j is the production code.
The through-line system is a financial transaction processing platform: payment initiation, fraud detection, account balance checks, notification dispatch, and audit logging. Each service has realistic latency profiles and realistic failure modes.
Code examples use Java 21, Spring Boot 3, and Resilience4j 2.x. Every network call has an explicit timeout. Performance impact is measured, not hand-waved.
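As a rough picture of what "an explicit timeout on every network call" means, the sketch below uses the JDK's own `java.net.http` client as a neutral stand-in; the book's chapters may use Spring Boot 3 clients instead. The class name `TimeoutConventions`, the `/fraud-checks/` path, and the 250 ms and 800 ms budgets are hypothetical values chosen only for illustration, not figures from the platform.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

// Hypothetical helper: endpoint path and budgets are illustrative assumptions.
final class TimeoutConventions {
    static final Duration CONNECT_TIMEOUT = Duration.ofMillis(250);
    static final Duration REQUEST_TIMEOUT = Duration.ofMillis(800);

    static HttpClient client() {
        return HttpClient.newBuilder()
                .connectTimeout(CONNECT_TIMEOUT) // bounds connection establishment only
                .build();
    }

    static HttpRequest fraudCheckRequest(String baseUrl, String paymentId) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/fraud-checks/" + paymentId))
                .timeout(REQUEST_TIMEOUT) // HttpTimeoutException if no response arrives in time
                .GET()
                .build();
    }
}
```

The two settings are deliberately separate: a connect timeout alone does nothing for a server that accepts the connection and then stalls, which is why the per-request timeout is set as well.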
This book was generated using AI assistance.