Architecting Resilient Distributed Systems: High-Scale Engineering and Failure Mode Mitigation

Isolation Patterns: Bulkheads and Sidecars

Chapter 5 of 13
Summary

Bulkheads and sidecars are isolation strategies that prevent fault propagation in distributed systems.

Introduction to Bulkheads

The Bulkhead pattern is a critical isolation strategy in distributed systems, named after the watertight partitions in a ship's hull that keep the vessel afloat when one section is breached [1]. The pattern partitions system resources into independent units, or compartments, so that a failure or overload in one unit can exhaust only that unit's resources rather than those of the entire system. By isolating resources this way, bulkheads keep a single failing component from triggering a cascade of failures throughout the system.
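A minimal Python sketch of the compartment idea follows, assuming a semaphore-based bulkhead that rejects callers as soon as its compartment is full. The `Bulkhead` and `BulkheadFullError` classes, the compartment names, and the limits are hypothetical, chosen only for illustration.

```python
import threading
from contextlib import contextmanager


class BulkheadFullError(Exception):
    """Raised when a compartment has no free capacity (hypothetical name)."""


class Bulkhead:
    """One compartment: admits at most `max_concurrent` callers at a time."""

    def __init__(self, name: str, max_concurrent: int) -> None:
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)

    @contextmanager
    def enter(self):
        # Non-blocking acquire: reject at once instead of queueing behind a stuck call
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError(f"bulkhead '{self.name}' is full")
        try:
            yield
        finally:
            self._slots.release()  # always return the slot, even if the call fails


# One compartment per downstream dependency (hypothetical names and limits)
bulkheads = {
    "payments": Bulkhead("payments", max_concurrent=2),
    "recommendations": Bulkhead("recommendations", max_concurrent=2),
}


def demo() -> tuple:
    recs = bulkheads["recommendations"]
    # Two stuck calls occupy every slot in the recommendations compartment
    with recs.enter(), recs.enter():
        try:
            with recs.enter():
                pass
            rejected = False
        except BulkheadFullError:
            rejected = True
        # The payments compartment still has capacity of its own
        with bulkheads["payments"].enter():
            payments_ok = True
    return rejected, payments_ok


if __name__ == "__main__":
    print(demo())  # (True, True)
```

Rejecting immediately rather than queueing is one design choice; a bulkhead could instead wait on the semaphore with a timeout, trading added latency for fewer rejections.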

Implementing Bulkheads

Bulkheads can be implemented at multiple levels, including thread pools, processes, containers, and physical hardware clusters. For instance, giving critical and non-critical tasks separate thread pools means that a slow or hanging downstream call can tie up only the threads in its own pool, rather than every worker thread in the service. Similarly, reserving a separate connection pool for critical database queries keeps connections available for them during load spikes. The right isolation level depends on the system's requirements and on which resources need protecting.
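The thread-pool variant can be sketched with Python's standard `concurrent.futures`. This assumes two hypothetical workloads: `checkout` (critical) and `slow_recommendation_call` (non-critical, simulated as a call that hangs until released). The pool sizes are illustrative.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def slow_recommendation_call(release: threading.Event) -> str:
    # Hypothetical non-critical dependency: hangs until `release` is set
    release.wait()
    return "recommendations"


def checkout(order_id: int) -> str:
    # Hypothetical critical operation
    return f"order {order_id} confirmed"


def run_demo() -> str:
    release = threading.Event()
    # Separate pools act as bulkheads: each workload can only exhaust its own workers
    critical = ThreadPoolExecutor(max_workers=4, thread_name_prefix="critical")
    noncritical = ThreadPoolExecutor(max_workers=2, thread_name_prefix="noncritical")
    try:
        # Two hung calls occupy both non-critical workers; two more queue behind them
        for _ in range(4):
            noncritical.submit(slow_recommendation_call, release)
        # Critical work runs on its own pool and completes promptly
        return critical.submit(checkout, 42).result(timeout=1.0)
    finally:
        release.set()  # unblock the hung calls so both pools can shut down cleanly
        noncritical.shutdown(wait=True)
        critical.shutdown(wait=True)


if __name__ == "__main__":
    print(run_demo())  # order 42 confirmed
```

If both workloads shared a single two-worker pool, `checkout` would queue behind the hung calls and `result(timeout=1.0)` would time out. Keeping a separate connection pool per workload applies the same reasoning to database connections.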

Sidecar Pattern

The Sidecar pattern is another key isolation strategy, in which a peripheral component is deployed alongside a primary application to extend its features without modifying the application code. Sidecars are commonly used in service meshes such as Istio to manage traffic, security, and observability. For example, Istio deploys an Envoy proxy as a sidecar that intercepts and manages all network traffic to and from a service. The sidecar shares the lifecycle of the primary application; on Kubernetes it typically runs as an additional container in the same Pod.
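To make the interception idea concrete, here is a minimal sketch of a sidecar-style HTTP forwarding proxy built only from the Python standard library. It assumes an application that serves plain HTTP GET requests; the proxy adds a request counter, an upstream timeout, and a response header (`X-Sidecar`, a hypothetical name) without any change to the application's handler. Both servers run in one Python process only so the example is self-contained. In a real deployment the proxy would be a separate container in the same Pod, as Envoy is under Istio, and traffic would be redirected to it transparently rather than addressed to its port explicitly.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Opener that ignores any http_proxy environment settings
_direct = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class AppHandler(BaseHTTPRequestHandler):
    """Primary application: no timeouts, metrics, or extra headers of its own."""

    def do_GET(self):
        body = b"hello from app"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):  # silence per-request logging
        pass


def make_sidecar_handler(upstream: str, stats: dict):
    """Hypothetical sidecar: forwards each GET to `upstream`, adding cross-cutting behavior."""

    class SidecarHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            stats["requests"] += 1  # observability added outside the app
            try:
                # Timeout enforced by the sidecar, not by the app
                with _direct.open(upstream + self.path, timeout=2) as resp:
                    status, body = resp.status, resp.read()
            except OSError:  # connection errors, timeouts, and HTTP error statuses
                stats["errors"] += 1
                status, body = 502, b"upstream unavailable"
            self.send_response(status)
            self.send_header("Content-Length", str(len(body)))
            self.send_header("X-Sidecar", "demo")  # hypothetical header
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass

    return SidecarHandler


def serve_in_background(handler) -> HTTPServer:
    server = HTTPServer(("127.0.0.1", 0), handler)  # port 0: the OS picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def run_demo():
    stats = {"requests": 0, "errors": 0}
    app = serve_in_background(AppHandler)
    upstream = f"http://127.0.0.1:{app.server_address[1]}"
    sidecar = serve_in_background(make_sidecar_handler(upstream, stats))
    try:
        # The client talks to the sidecar's port; the app code is unchanged
        url = f"http://127.0.0.1:{sidecar.server_address[1]}/greet"
        with _direct.open(url, timeout=5) as resp:
            return resp.read(), resp.headers["X-Sidecar"], dict(stats)
    finally:
        for server in (sidecar, app):
            server.shutdown()
            server.server_close()


if __name__ == "__main__":
    print(run_demo())
```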

Benefits of Sidecars

Sidecars offer several benefits. They offload complex networking logic from the primary application and help contain network failures so they do not propagate into it. They can also implement circuit breakers, which detect failures and stop a system from repeatedly attempting an operation that is likely to fail. Sidecars can implement bulkheads as well, for example by capping the number of concurrent connections or pending requests a service may hold against each upstream, so that one failing dependency cannot exhaust the whole service.
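A circuit breaker of the kind described above can be sketched as a small state machine with closed, open, and half-open states. This is a single-threaded sketch under stated assumptions: the `CircuitBreaker` class, its thresholds, and the injectable `clock` parameter (used so the example can run without sleeping) are hypothetical, not the API of any particular sidecar or library.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is short-circuited because the breaker is open."""


class CircuitBreaker:
    """Hypothetical single-threaded breaker: closed -> open -> half-open -> closed."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"  # trial calls are let through
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("failing fast: dependency marked unhealthy")
        try:
            result = fn()
        except Exception:
            self._failures += 1
            # A failed trial call, or too many failures while closed, (re)opens the breaker
            if self.state == "half-open" or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Any success closes the breaker and resets the failure count
        self._failures = 0
        self._opened_at = None
        return result


if __name__ == "__main__":
    now = [0.0]  # fake clock for the demo
    breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, clock=lambda: now[0])

    def flaky():
        raise ConnectionError("dependency down")

    for _ in range(2):
        try:
            breaker.call(flaky)
        except ConnectionError:
            pass
    print(breaker.state)  # open
    now[0] = 10.0
    print(breaker.state)  # half-open
    print(breaker.call(lambda: "ok"), breaker.state)  # ok closed
```

In a service mesh, comparable policies are configured on the sidecar instead of being coded into each service; Istio, for instance, exposes connection limits and outlier detection through `DestinationRule` traffic policies.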

Comparison of Isolation Strategies

The following table compares four isolation strategies (thread pools, connection pools, namespaces or clusters, and sidecar proxies) by the resource each isolates and its primary benefit.

| Strategy | Resource Isolated | Primary Benefit |
|---|---|---|
| Thread Pool | CPU/Execution | Prevents one service call from hanging all worker threads. |
| Connection Pool | Database/Socket | Ensures availability for critical DB queries during load spikes. |
| Namespace/Cluster | Memory/CPU/Network | Prevents total system collapse via physical or logical hardware separation. |
| Sidecar Proxy | Network/Security | Offloads complex logic; prevents network failure propagation. |

Conclusion

Bulkheads and sidecars are two critical isolation patterns for preventing fault propagation in distributed systems. Bulkheads partition system resources into independent units, and sidecars move cross-cutting concerns such as traffic management, security, and observability into a component deployed alongside the primary application; together they help keep complex systems resilient and available. As the comparison table shows, each strategy isolates a different resource and offers a different benefit, so the right choice depends on which resources the system most needs to protect.

Sources

[1] https://www.geeksforgeeks.org/system-design/bulkhead-pattern/
[2] https://istio.io/latest/docs/ops/common-problems/injection/
[3] https://www.freecodecamp.org/news/design-patterns-for-distributed-systems/