Bulkhead Pattern Low-Level Design: Thread Pool Isolation, Semaphore Bulkheads, and Resource Partitioning

The Bulkhead Analogy

A ship's hull is divided into watertight compartments (bulkheads). If one compartment floods, the others remain dry and the ship stays afloat. In software, a bulkhead keeps a failure in one downstream dependency from affecting calls to the others. Without bulkheads, a single slow service can exhaust the shared thread pool and bring down the entire application.

The Problem: Shared Thread Pool Exhaustion

A typical web server has a shared thread pool (e.g., 200 Tomcat threads). If service A becomes slow (10s response time), every in-flight call to A holds a request thread for the full 10 seconds. At only 20 requests per second to A, 20 * 10 = 200 threads are blocked waiting and the pool is exhausted. Incoming requests for services B and C, which are healthy, queue up, find no available threads, and time out. A failure in A cascades to B and C through shared resource exhaustion, not through any direct dependency.
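The starvation described above can be reproduced in a few lines. This is a minimal sketch, not a benchmark: the class and method names are hypothetical, a latch stands in for the slow service A, and a 200ms wait is assumed to be long enough to show that the call to B never gets a thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class SharedPoolStarvation {

    /** Returns true if a healthy call to B was starved while calls to A held every thread. */
    static boolean healthyCallIsStarved(int poolSize) {
        ExecutorService shared = Executors.newFixedThreadPool(poolSize);
        CountDownLatch slowServiceA = new CountDownLatch(1); // stays closed while A "hangs"
        CountDownLatch allBusy = new CountDownLatch(poolSize);
        try {
            // Calls to slow service A occupy every thread in the shared pool.
            for (int i = 0; i < poolSize; i++) {
                shared.submit(() -> {
                    allBusy.countDown();
                    slowServiceA.await();
                    return "A";
                });
            }
            allBusy.await();

            // Service B is healthy, but its call cannot get a thread.
            Future<String> callToB = shared.submit(() -> "B");
            try {
                callToB.get(200, TimeUnit.MILLISECONDS);
                return false; // B ran: no starvation
            } catch (TimeoutException starved) {
                return true;  // B sat in the queue behind A's blocked calls
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            slowServiceA.countDown(); // let A "recover" so the pool can drain
            shared.shutdownNow();
        }
    }
}
```

B does no slow work at all here; it times out only because it shares a pool with A.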

Thread Pool Isolation

Assign a dedicated thread pool to each downstream dependency. Calls to that dependency execute in its pool, not the shared request pool:

ExecutorService paymentPool       = Executors.newFixedThreadPool(20);
ExecutorService recommendationPool = Executors.newFixedThreadPool(10);
ExecutorService adsPool            = Executors.newFixedThreadPool(5);

// Call payment service in its own pool.
// PaymentResult is a placeholder for whatever charge() returns.
Future<PaymentResult> result = paymentPool.submit(() -> paymentService.charge(request));

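If the payment service hangs and exhausts its 20 threads, recommendation and ads calls are unaffected because they use their own pools. The main request handler threads are also protected, provided they wait on the Future with a bound (result.get(timeout, unit)) instead of blocking indefinitely.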
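Isolation only protects the request thread if that thread does not itself wait forever on the Future. One possible shape for a bounded wait with a fallback is sketched below; IsolatedCall and callOrFallback are hypothetical names, and returning a fallback on every failure mode is an assumption that will not suit every dependency (a payment call, for example, may need to surface the error instead).

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class IsolatedCall {

    /**
     * Runs task on the dependency's own pool and waits at most timeoutMillis.
     * Returns fallback if the pool rejects the task, the call fails, or it times out.
     */
    static <T> T callOrFallback(ExecutorService pool, Callable<T> task,
                                long timeoutMillis, T fallback) {
        Future<T> future;
        try {
            future = pool.submit(task);
        } catch (RejectedExecutionException poolFull) {
            return fallback; // bulkhead is saturated: fail fast
        }
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException slow) {
            future.cancel(true); // attempt to interrupt the worker so its thread frees up
            return fallback;
        } catch (ExecutionException failed) {
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback;
        }
    }
}
```

Note that cancel(true) only helps if the downstream client responds to interruption; a client-level socket timeout is still advisable.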

Thread Pool Sizing

Size each pool based on expected concurrency and acceptable latency:

pool_size = (requests_per_second * avg_latency_seconds) + headroom
-- Little's Law: throughput = concurrency / latency

For a service handling 100 RPS with 100ms average latency: 100 * 0.1 = 10 threads at steady state. Add 50-100% headroom for spikes: 15-20 threads.
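The same arithmetic as a small helper, in case it is useful in a capacity-planning script. The names are hypothetical, and latency is taken in whole milliseconds purely to keep the arithmetic exact.

```java
final class PoolSizing {

    /**
     * Little's Law estimate: concurrency = throughput * latency, plus fractional headroom.
     * A headroom of 0.5 means 50% extra capacity.
     */
    static int poolSize(int requestsPerSecond, int avgLatencyMillis, double headroom) {
        double steadyState = requestsPerSecond * avgLatencyMillis / 1000.0;
        return (int) Math.ceil(steadyState * (1.0 + headroom));
    }
}
```

poolSize(100, 100, 0.5) gives 15 and poolSize(100, 100, 1.0) gives 20, matching the range above. If latency degrades to 10s, poolSize(100, 10000, 0.0) gives 1000, which is why the pool is capped rather than sized for the failure case.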

Semaphore Bulkhead

Thread pool isolation adds overhead: each call is handed off to another thread (a context switch), and every thread costs extra memory. For fast, synchronous in-process calls (no I/O blocking), a semaphore bulkhead limits concurrency without a separate thread pool:

Semaphore semaphore = new Semaphore(20);  // max 20 concurrent calls

// Wait up to `timeout` ms for a permit; tryAcquire throws InterruptedException if interrupted
boolean acquired = semaphore.tryAcquire(timeout, TimeUnit.MILLISECONDS);
if (!acquired) {
    throw new BulkheadFullException("Service unavailable");
}
try {
    return callDownstream();
} finally {
    semaphore.release();
}

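Semaphore bulkheads add minimal overhead, but the call still runs on the caller's thread. The semaphore caps how many request threads one dependency can hold, yet it cannot abandon a call that hangs on blocking I/O. To time out and walk away from a blocked call, thread pool isolation (or a client-level timeout) is required.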
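The acquire/release pattern can be wrapped once so call sites cannot forget the finally block. The following is a sketch under stated assumptions: SemaphoreBulkhead and its nested BulkheadFullException are hypothetical classes written for this example, not the Resilience4j types of similar name.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class SemaphoreBulkhead {

    /** Thrown when no permit becomes available within the wait budget. */
    static final class BulkheadFullException extends RuntimeException {
        BulkheadFullException(String message) { super(message); }
    }

    private final Semaphore permits;
    private final long maxWaitMillis;

    SemaphoreBulkhead(int maxConcurrentCalls, long maxWaitMillis) {
        this.permits = new Semaphore(maxConcurrentCalls);
        this.maxWaitMillis = maxWaitMillis;
    }

    /** Runs call on the caller's thread if a permit is free in time; otherwise rejects. */
    <T> T execute(Supplier<T> call) {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new BulkheadFullException("Interrupted while waiting for a permit");
        }
        if (!acquired) {
            throw new BulkheadFullException("Bulkhead full");
        }
        try {
            return call.get();
        } finally {
            permits.release(); // always return the permit, even if the call throws
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

A call site then reads bulkhead.execute(() -> callDownstream()) and catches BulkheadFullException to serve a fallback.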

Queue Depth Cap

Thread pools have a task queue for requests waiting for a thread. An unbounded queue causes latency spikes — requests queue for seconds while earlier requests complete. Cap the queue:

ThreadPoolExecutor executor = new ThreadPoolExecutor(
    10,     // core pool size
    20,     // max pool size
    60, TimeUnit.SECONDS,
    new ArrayBlockingQueue<>(50),     // bounded queue, not an unbounded LinkedBlockingQueue
    new ThreadPoolExecutor.AbortPolicy()  // reject when queue full
);

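With a bounded queue of 50 and max pool of 20, the bulkhead can hold 70 requests at once (20 running, 50 queued) before rejecting. Note that ThreadPoolExecutor starts threads beyond the core 10 only after the queue is full, so the queue fills before the pool grows. Rejection is fast: it fails immediately rather than queuing indefinitely. Callers handle rejection with a fallback or circuit breaker.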
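The capacity figure can be checked empirically. In this sketch (hypothetical class and method names), every task blocks on a latch to simulate a hung downstream, and tasks are submitted until the executor rejects one.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class BoundedQueueDemo {

    /** Submits blocked tasks until the executor rejects one; returns how many were accepted. */
    static int acceptedBeforeRejection(int core, int max, int queueCapacity) {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
            core, max, 60, TimeUnit.SECONDS,
            new ArrayBlockingQueue<>(queueCapacity),
            new ThreadPoolExecutor.AbortPolicy());
        CountDownLatch downstreamHung = new CountDownLatch(1); // opened only during cleanup
        int accepted = 0;
        try {
            while (true) {
                executor.execute(() -> {
                    try {
                        downstreamHung.await(); // simulate a call that does not return
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                accepted++;
            }
        } catch (RejectedExecutionException full) {
            return accepted; // threads and queue are both saturated
        } finally {
            downstreamHung.countDown();
            executor.shutdownNow();
        }
    }
}
```

acceptedBeforeRejection(10, 20, 50) returns 70: ten tasks start core threads, fifty fill the queue, ten more start the extra threads, and the next one is rejected.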

Timeout Per Bulkhead

Set a timeout for each bulkhead call independent of the main request timeout. A downstream service that consistently hits the bulkhead timeout is a candidate for circuit breaker tripping:

BulkheadConfig config = BulkheadConfig.custom()
    .maxConcurrentCalls(20)
    .maxWaitDuration(Duration.ofMillis(100))  // fail fast if no slot in 100ms
    .build();

The maxWaitDuration controls how long a caller waits for a slot in the semaphore; it does not bound how long the call itself runs once a slot is acquired. Set it low (50-200ms) for interactive request paths to fail fast rather than queue.

Metrics: What to Monitor

  • Active count: Current concurrent calls in the bulkhead. Sustained values near the limit indicate the pool is undersized or the downstream is slow.
  • Rejected count: Requests rejected due to full pool/semaphore. Alert if non-zero in production.
  • Queue size (thread pool): Requests waiting for a thread. High queue = latency spike incoming.
  • Wait time (semaphore): How long callers wait for a slot. High wait = pool undersized or downstream slow.
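For a plain ThreadPoolExecutor, the active count and queue size are available from the executor itself, while rejections have to be counted in the rejection handler. A minimal sketch follows; CountingAbortPolicy is a hypothetical class, and a real service would export these values to its metrics library rather than format a string.

```java
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.atomic.AtomicLong;

final class CountingAbortPolicy implements RejectedExecutionHandler {

    private final AtomicLong rejected = new AtomicLong();

    /** Counts the rejection, then aborts like ThreadPoolExecutor.AbortPolicy. */
    @Override
    public void rejectedExecution(Runnable task, ThreadPoolExecutor executor) {
        rejected.incrementAndGet();
        throw new RejectedExecutionException("Bulkhead full: " + executor);
    }

    long rejectedCount() {
        return rejected.get();
    }

    /** One-line snapshot suitable for a log line or a metrics exporter. */
    static String snapshot(ThreadPoolExecutor executor, CountingAbortPolicy policy) {
        return "active=" + executor.getActiveCount()
            + " queued=" + executor.getQueue().size()
            + " rejected=" + policy.rejectedCount();
    }
}
```

getActiveCount() is documented as approximate, which is fine for dashboards but not for exact accounting.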

Combining Bulkhead with Circuit Breaker and Retry

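With Resilience4j's Decorators builder, each with* call wraps the decorators applied before it, so the last one applied is the outermost. A reasonable order (outermost to innermost):

  1. Retry: Outermost. Retries the entire decorated call.
  2. Circuit Breaker: Tracks outcomes; trips on high failure rate.
  3. Bulkhead: Limits concurrency of the calls the circuit breaker lets through.
  4. Timeout: Innermost. Bounds individual call duration.

// Applied innermost first; withRetry comes last, so it is outermost.
// Decorators.ofSupplier has no timeout decorator: bound the call with a client-level
// timeout inside callDownstream(), or use a TimeLimiter with the CompletionStage variant.
// Response is a placeholder for whatever callDownstream() returns.
Supplier<Response> decorated = Decorators.ofSupplier(this::callDownstream)
    .withBulkhead(bulkhead)
    .withCircuitBreaker(circuitBreaker)
    .withRetry(retry)
    .decorate();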
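Why the order matters is easier to see with hand-rolled decorators. The sketch below is a plain-Java stand-in, not Resilience4j: withBulkhead and withRetry are hypothetical helpers that only mimic the wrapping behavior. Because retry wraps the bulkhead, each attempt acquires and releases its own permit, and a rejected attempt is retried like any other failure.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

final class DecorationOrder {

    /** Inner decorator: holds a permit only for the duration of one attempt. */
    static <T> Supplier<T> withBulkhead(Supplier<T> call, Semaphore permits) {
        return () -> {
            if (!permits.tryAcquire()) {
                throw new IllegalStateException("Bulkhead full");
            }
            try {
                return call.get();
            } finally {
                permits.release();
            }
        };
    }

    /** Outer decorator: re-invokes everything it wraps, up to maxAttempts times. */
    static <T> Supplier<T> withRetry(Supplier<T> call, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        return () -> {
            RuntimeException last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    return call.get();
                } catch (RuntimeException e) {
                    last = e; // remember the failure and try again
                }
            }
            throw last;
        };
    }
}
```

Composed as withRetry(withBulkhead(call, permits), 3), a call that fails twice and then succeeds runs three times and leaves every permit released. A production retry would also back off between attempts, which is omitted here.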

Real-World Example: Search API

A search API calls three downstream services: product catalog, recommendations, and ads. Without bulkheads, a slow ads service (50ms P99 becomes 5s) exhausts threads and breaks product results. With bulkheads:

  • Ads pool: 5 threads. When full, reject ads calls immediately, return empty ads.
  • Recommendations pool: 10 threads. Independent of ads pool.
  • Product catalog pool: 30 threads. Core dependency — larger pool, no fallback to empty.

The ads slowdown fills the ads pool, triggers fast rejection, and the page renders without ads. Product results and recommendations are unaffected.
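One way to express the "reject fast, return empty" behavior for an optional dependency such as ads is sketched below. It assumes Java 9 or later (for completeOnTimeout and List.of); SearchFanOut and optional are hypothetical names, and the per-call timeout is an addition not specified above.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class SearchFanOut {

    /**
     * Runs an optional dependency call on its own pool. If the pool rejects the task,
     * the call fails, or it exceeds timeoutMillis, the result degrades to an empty list.
     */
    static <T> CompletableFuture<List<T>> optional(Supplier<List<T>> call, Executor pool,
                                                   long timeoutMillis) {
        List<T> empty = List.of();
        try {
            return CompletableFuture.supplyAsync(call, pool)
                .completeOnTimeout(empty, timeoutMillis, TimeUnit.MILLISECONDS)
                .exceptionally(error -> empty);
        } catch (RejectedExecutionException poolFull) {
            // supplyAsync hands the task to the pool immediately, so a full pool throws here.
            return CompletableFuture.completedFuture(empty);
        }
    }
}
```

completeOnTimeout does not cancel the underlying task, so a hung ads call keeps its pool thread until it returns. That is the point of the small pool: at most five threads can ever be stuck on ads. The product catalog call would not use this helper, since it has no empty fallback.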

Resilience4j BulkheadConfig Parameters

// Semaphore bulkhead
BulkheadConfig.custom()
    .maxConcurrentCalls(25)
    .maxWaitDuration(Duration.ofMillis(50))
    .build();

// Thread pool bulkhead
ThreadPoolBulkheadConfig.custom()
    .maxThreadPoolSize(10)
    .coreThreadPoolSize(5)
    .queueCapacity(20)
    .keepAliveDuration(Duration.ofMillis(20))
    .build();
