Bulkhead

A circuit breaker stops calling a broken dependency. A bulkhead limits the resources a single dependency can consume, preventing it from affecting other dependencies that share the same process.

The name comes from ship design. A ship’s hull is divided into watertight compartments (bulkheads). If one compartment floods, the bulkhead walls prevent water from spreading to other compartments. The ship stays afloat.

In software, the ship is your service’s thread pool. The flood is a slow dependency consuming all threads. The bulkheads are isolated thread pools or semaphore permits, one per dependency.

The Failure Mode

Without a bulkhead, the payment service has a single shared thread pool of 200 threads. All five downstream calls (fraud, balance, payment gateway, notification, audit) compete for threads from this pool. When fraud detection slows down, fraud detection calls consume an increasing share of the pool. At 100 requests per second with a 5-second fraud response time, fraud detection alone needs 500 threads. The pool has 200. Fraud detection consumes all of them. Balance checks, payment processing, notifications, and audit writes all queue or fail. A single slow dependency monopolizes the entire service.
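The 500-thread figure follows from Little's Law: the average number of threads held equals the arrival rate multiplied by how long each request holds a thread. The sketch below reproduces that arithmetic. The class and method names are illustrative, and it assumes every request calls fraud detection and that no timeout cuts the call short.

```java
// Sketch only: Little's Law applied to thread demand. Names are hypothetical.
final class ThreadDemand {

    /** Threads held on average = requests per second x seconds each request holds a thread. */
    static long threadsNeeded(long requestsPerSecond, long latencyMillis) {
        // Integer ceiling division avoids floating-point rounding surprises.
        return (requestsPerSecond * latencyMillis + 999) / 1000;
    }

    /** Milliseconds until a pool of the given size is fully occupied, if no thread is released in that time. */
    static long millisUntilExhausted(long poolSize, long requestsPerSecond) {
        return poolSize * 1000 / requestsPerSecond;
    }

    public static void main(String[] args) {
        long poolSize = 200;
        long demand = threadsNeeded(100, 5_000); // fraud detection degraded to 5 s
        System.out.println("threads needed: " + demand);
        System.out.println("pool exhausted: " + (demand > poolSize));
        System.out.println("ms until exhausted: " + millisUntilExhausted(poolSize, 100));
    }
}
```

Under these assumptions the shared pool fills roughly two seconds after the slowdown begins, which is why the failure looks sudden from the outside.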

Bulkhead: Thread Pool Isolation

The left side shows the problem: a shared thread pool where one slow dependency (Fraud Service) consumes 180 of 200 threads, leaving only 20 for everything else. The right side shows the solution: separate pools for each dependency, each with a fixed maximum. When the Fraud Pool fills to its 20-thread maximum, overflow requests are rejected immediately. The Payment Pool, Balance Pool, and Notification Pool continue operating normally. The key insight, stated at the bottom: the bulkhead does not fix the slow dependency. It prevents the slow dependency from consuming resources needed by healthy dependencies.

The Internals: From Scratch

There are two bulkhead implementations: thread pool isolation and semaphore isolation.

A semaphore bulkhead limits the number of concurrent calls using a semaphore. The call executes on the caller’s thread. If no permit becomes available within the configured wait, the call is rejected. This is simpler and has less overhead, but the caller’s thread is still blocked during the call.

A thread pool bulkhead executes the call on a separate thread pool. The caller submits the work and waits for the result with a timeout. If the pool and its queue are full, the call is rejected. This provides stronger isolation: a hung call occupies a bulkhead thread rather than the caller’s thread, and the caller gets its own thread back no later than the timeout.

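// FROM SCRATCH - Semaphore bulkhead
import java.time.Duration;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class SemaphoreBulkhead {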

    private final Semaphore semaphore;
    private final Duration maxWaitDuration;
    private final String name;
    private final AtomicInteger rejectedCount = new AtomicInteger(0);

    /**
     * @param maxConcurrentCalls Maximum number of concurrent calls permitted
     * @param maxWaitDuration How long to wait for a permit before rejecting
     */
    public SemaphoreBulkhead(String name, int maxConcurrentCalls,
                              Duration maxWaitDuration) {
        this.name = name;
        this.semaphore = new Semaphore(maxConcurrentCalls, true); // fair ordering
        this.maxWaitDuration = maxWaitDuration;
    }

    public <T> T execute(Supplier<T> supplier) {
        boolean acquired;
        try {
            acquired = semaphore.tryAcquire(
                    maxWaitDuration.toMillis(), TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new BulkheadRejectedException(name, "Interrupted waiting for permit");
        }

        if (!acquired) {
            rejectedCount.incrementAndGet();
            throw new BulkheadRejectedException(name,
                    "No permit available within " + maxWaitDuration.toMillis() + "ms. "
                    + "Available permits: " + semaphore.availablePermits()
                    + ", Queue length: " + semaphore.getQueueLength());
        }

        try {
            return supplier.get();
        } finally {
            semaphore.release();
        }
    }

    public int availablePermits() {
        return semaphore.availablePermits();
    }

    public int rejectedCount() {
        return rejectedCount.get();
    }
}

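// FROM SCRATCH - Thread pool bulkhead
import java.time.Duration;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class ThreadPoolBulkhead {

    private final ExecutorService executor;
    private final Duration timeout;
    private final String name;
    private final AtomicInteger rejectedCount = new AtomicInteger(0);

    public ThreadPoolBulkhead(String name, int corePoolSize, int maxPoolSize,
                               int queueCapacity, Duration timeout) {
        this.name = name;
        this.timeout = timeout;
        this.executor = new ThreadPoolExecutor(
                corePoolSize,
                maxPoolSize,
                60L, TimeUnit.SECONDS,
                // ArrayBlockingQueue requires capacity >= 1, so a queue of 0
                // is expressed as a SynchronousQueue (direct hand-off, no buffering).
                queueCapacity > 0
                        ? new ArrayBlockingQueue<Runnable>(queueCapacity)
                        : new SynchronousQueue<Runnable>(),
                new ThreadPoolExecutor.AbortPolicy() // Reject when full
        );
    }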

    public <T> T execute(Supplier<T> supplier) {
        Future<T> future;
        try {
            future = executor.submit(supplier::get);
        } catch (RejectedExecutionException e) {
            rejectedCount.incrementAndGet();
            throw new BulkheadRejectedException(name,
                    "Thread pool and queue are full");
        }

        try {
            return future.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (java.util.concurrent.TimeoutException e) {
            // Interrupts the worker. The bulkhead thread is only freed early
            // if the underlying call responds to interruption.
            future.cancel(true);
            throw new BulkheadTimeoutException(name,
                    "Call did not complete within " + timeout.toMillis() + "ms");
        } catch (ExecutionException e) {
            // Rethrow the supplier's own exception instead of wrapping it again
            Throwable cause = e.getCause();
            if (cause instanceof RuntimeException) throw (RuntimeException) cause;
            if (cause instanceof Error) throw (Error) cause;
            throw new RuntimeException(cause);
        } catch (InterruptedException e) {
            future.cancel(true); // Do not leave abandoned work running
            Thread.currentThread().interrupt();
            throw new RuntimeException("Bulkhead execution interrupted", e);
        }
    }
}

What the Scratch Implementation Reveals

Semaphore bulkheads do not free the caller’s thread. The semaphore limits concurrency, but the calling thread is blocked during the entire execution. If you have 20 semaphore permits and each call takes 5 seconds, 20 of your Tomcat threads are blocked. The semaphore prevents the 21st thread from being consumed, but it does not prevent 20 threads from being consumed. For the transaction platform, this is acceptable because the alternative (no bulkhead) would consume all 200 threads.

Thread pool bulkheads add a thread hand-off and lose thread-local context. Each call is executed on a separate thread, which costs a context switch and means the request context (MDC, security context, transaction context) is not automatically propagated. You must explicitly copy context to the bulkhead thread. This is a source of bugs: logging loses the request ID, security checks fail, and transaction boundaries are wrong.
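One way to copy context is to wrap the task before submitting it. The sketch below uses a plain ThreadLocal as a stand-in for MDC or a security context. The names REQUEST_ID and withRequestId are hypothetical, and a real service would typically capture several contexts in the same wrapper.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Sketch only: propagating thread-local request state to a bulkhead thread.
final class ContextPropagation {

    // Stand-in for MDC or a security context: any per-thread request state.
    static final ThreadLocal<String> REQUEST_ID = new ThreadLocal<>();

    /** Captures the caller's value now and installs it on whichever thread runs the task. */
    static <T> Supplier<T> withRequestId(Supplier<T> task) {
        String captured = REQUEST_ID.get();          // evaluated on the caller's thread
        return () -> {
            String previous = REQUEST_ID.get();      // evaluated on the pool thread
            REQUEST_ID.set(captured);
            try {
                return task.get();
            } finally {
                // Pool threads are reused, so restore the old state to avoid leaking it.
                if (previous == null) REQUEST_ID.remove(); else REQUEST_ID.set(previous);
            }
        };
    }

    /** Returns what a pool thread sees without and with the wrapper. */
    static String[] demo() {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            REQUEST_ID.set("req-42");
            Supplier<String> read = () -> String.valueOf(REQUEST_ID.get());
            Callable<String> bare = read::get;
            Callable<String> wrapped = withRequestId(read)::get;
            return new String[] { pool.submit(bare).get(), pool.submit(wrapped).get() };
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            REQUEST_ID.remove();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] seen = demo();
        System.out.println("without wrapper: " + seen[0]); // null
        System.out.println("with wrapper:    " + seen[1]); // req-42
    }
}
```

The capture happens when the wrapper is created, not when it runs, so the wrapper has to be built on the request thread before the work is handed to the bulkhead.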

The queue capacity determines the behavior under load. A thread pool bulkhead with a queue of 0 rejects immediately when all threads are busy (in Java this means a SynchronousQueue, because ArrayBlockingQueue requires a capacity of at least 1). A queue of 10 buffers 10 additional requests. The queue adds latency under load (requests wait in the queue) but improves throughput when the load is just slightly above capacity. For the transaction platform, a small queue (5 to 10) is appropriate: absorb brief bursts but reject sustained overload.
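The effect of the queue can be observed directly with a ThreadPoolExecutor whose workers are held busy. The sketch below is a minimal demonstration, not the bulkhead itself: the class name and the sizes are illustrative, and tasks block on a latch so the outcome does not depend on timing.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch only: how many submissions a bounded pool rejects while all workers are busy.
final class QueueCapacityDemo {

    /** Submits blocking tasks and returns how many were rejected. queueCapacity must be >= 1. */
    static int rejectedOutOf(int attempts, int threads, int queueCapacity) {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.AbortPolicy());
        int rejected = 0;
        try {
            for (int i = 0; i < attempts; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await(); // simulate a dependency that never answers
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++; // all threads busy and the queue is full
                }
            }
        } finally {
            release.countDown(); // let the workers finish
            pool.shutdown();
        }
        return rejected;
    }

    public static void main(String[] args) {
        // 2 threads + 1 queue slot accept 3 of 5 submissions.
        System.out.println(rejectedOutOf(5, 2, 1)); // 2
        // 2 threads + 3 queue slots accept all 5.
        System.out.println(rejectedOutOf(5, 2, 3)); // 0
    }
}
```

The accepted-but-queued tasks are the ones that pay the extra latency: they are not rejected, but they do not start until a worker frees up.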

The Production Implementation

# PRODUCTION - application.yml
resilience4j:
  bulkhead:
    instances:
      fraudDetection:
        max-concurrent-calls: 20
        # Maximum 20 concurrent calls to fraud detection.
        # Sizing: 100 rps * 0.12s (p99 latency) * 1.5 (safety) = 18, rounded to 20.
        # If fraud degrades to 5s response, these 20 threads are occupied
        # but the other 180 Tomcat threads are available for other work.

        max-wait-duration: 100ms
        # Wait at most 100ms for a permit.
        # Short: if the bulkhead is full, the dependency is already degraded.
        # Waiting longer just queues work that will likely time out.

      balanceCheck:
        max-concurrent-calls: 30
        max-wait-duration: 50ms

      notification:
        max-concurrent-calls: 10
        # Notifications are non-critical. Limit them to 10 concurrent calls.
        # If this fills, notifications are queued via fallback.
        max-wait-duration: 0ms
        # Zero wait: if no permit available, fall back immediately.

// PRODUCTION - Bulkhead with annotation
@Service
public class FraudDetectionService {

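    private final FraudDetectionClient fraudClient;
    private final FraudFallback fallback;

    public FraudDetectionService(FraudDetectionClient fraudClient, FraudFallback fallback) {
        this.fraudClient = fraudClient;
        this.fallback = fallback;
    }

    // With Resilience4j's default aspect order the bulkhead runs inside the circuit breaker.
    // A fallback on the inner annotation would return normally and hide failures from the
    // breaker, so the fallback is declared once, on the outer annotation.
    @io.github.resilience4j.circuitbreaker.annotation.CircuitBreaker(
            name = "fraudDetection", fallbackMethod = "fraudFallback")
    @io.github.resilience4j.bulkhead.annotation.Bulkhead(name = "fraudDetection")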
    public FraudScore checkFraud(PaymentRequest request) {
        return fraudClient.score(request);
    }

    private FraudScore fraudFallback(PaymentRequest request, Throwable cause) {
        return fallback.fallbackScore(request, cause);
    }
}

The Test

// PRODUCTION - Test proving bulkhead isolation
@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT)
@Testcontainers
class BulkheadIntegrationTest {

    @Container
    static GenericContainer<?> wireMock = new GenericContainer<>(
            DockerImageName.parse("wiremock/wiremock:latest"))
            .withExposedPorts(8080);

    @DynamicPropertySource
    static void configureProperties(DynamicPropertyRegistry registry) {
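        // Use the container's host rather than assuming localhost (remote Docker, CI runners)
        registry.add("fraud.service.url", () ->
                "http://" + wireMock.getHost() + ":" + wireMock.getMappedPort(8080));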
        registry.add("resilience4j.bulkhead.instances.fraudDetection.max-concurrent-calls",
                () -> "5"); // Small pool for testing
        registry.add("resilience4j.bulkhead.instances.fraudDetection.max-wait-duration",
                () -> "0ms");
    }

    @Autowired
    private FraudDetectionService fraudDetectionService;

    @Autowired
    private BulkheadRegistry bulkheadRegistry;

    @Test
    void bulkheadRejects_whenConcurrencyExceeded() throws Exception {
        // Fraud service delays 5 seconds, so permitted calls hold their permits throughout
        stubFraudServiceWithDelay(Duration.ofSeconds(5));

        // checkFraud has a fallback that accepts Throwable, so BulkheadFullException
        // never reaches the caller. Count rejections from the bulkhead's own events.
        AtomicInteger rejections = new AtomicInteger(0);
        bulkheadRegistry.bulkhead("fraudDetection").getEventPublisher()
                .onCallRejected(event -> rejections.incrementAndGet());

        // Submit 10 concurrent calls against a bulkhead of 5
        ExecutorService executor = Executors.newFixedThreadPool(10);
        CountDownLatch startLatch = new CountDownLatch(1);
        CountDownLatch doneLatch = new CountDownLatch(10);

        try {
            for (int i = 0; i < 10; i++) {
                executor.submit(() -> {
                    try {
                        startLatch.await();
                        fraudDetectionService.checkFraud(samplePayment());
                    } catch (Exception ignored) {
                        // The outcome of individual calls is not under test here
                    } finally {
                        doneLatch.countDown();
                    }
                });
            }

            startLatch.countDown(); // Release all threads simultaneously
            assertThat(doneLatch.await(10, TimeUnit.SECONDS)).isTrue();

            // 5 calls should pass (bulkhead size), 5 should be rejected
            assertThat(rejections.get()).isEqualTo(5);
        } finally {
            executor.shutdownNow();
        }
    }
}

The Observable Signal

# Bulkhead metrics
resilience4j_bulkhead_available_concurrent_calls{name="fraudDetection"}
resilience4j_bulkhead_max_allowed_concurrent_calls{name="fraudDetection"}

The Grafana panel plots available / max as a percentage. When this drops below 20%, the bulkhead is more than 80% utilized, which means the dependency is consuming close to its allocated share. This is a leading indicator of degradation. Alert when available permits stay at 0 for more than 30 seconds: the bulkhead is full, and excess requests are being rejected.
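The two thresholds can be expressed as plain logic over sampled gauge values. The sketch below only illustrates the rule semantics and is not a replacement for an alerting system. The class name, the sampling model, and the idea of passing timestamps in explicitly are assumptions made for the example.

```java
// Sketch only: the warning and alert rules applied to sampled gauge values.
final class BulkheadSaturation {

    private static final long ALERT_AFTER_MILLIS = 30_000;
    private long zeroSinceMillis = -1; // -1 means permits are currently available

    /** Percentage of permits still free: available / max * 100. */
    static double availablePercent(int available, int maxAllowed) {
        return 100.0 * available / maxAllowed;
    }

    /** Leading indicator: fewer than 20% of permits free. */
    static boolean isWarning(int available, int maxAllowed) {
        return availablePercent(available, maxAllowed) < 20.0;
    }

    /** Feed one sample; returns true once permits have stayed at 0 for more than 30 seconds. */
    boolean sample(int available, long nowMillis) {
        if (available > 0) {
            zeroSinceMillis = -1; // any free permit resets the timer
            return false;
        }
        if (zeroSinceMillis < 0) {
            zeroSinceMillis = nowMillis;
        }
        return nowMillis - zeroSinceMillis > ALERT_AFTER_MILLIS;
    }

    public static void main(String[] args) {
        System.out.println(isWarning(3, 20)); // true: 15% free
        BulkheadSaturation fraud = new BulkheadSaturation();
        System.out.println(fraud.sample(0, 0));      // false: just became full
        System.out.println(fraud.sample(0, 30_001)); // true: full for more than 30 s
    }
}
```

Requiring the condition to hold for 30 seconds keeps a brief burst that momentarily fills the bulkhead from paging anyone.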