The Bulkhead Analogy
A ship's hull is divided into watertight compartments (bulkheads). If one compartment floods, the others remain dry — the ship stays afloat. In software, a bulkhead isolates failures in one downstream dependency from affecting other dependencies. Without bulkheads, a single slow service can exhaust the shared thread pool and bring down the entire application.
The Problem: Shared Thread Pool Exhaustion
A typical web server has a shared thread pool (e.g., 200 Tomcat threads). If service A becomes slow (10s response time) and receives 50 concurrent requests, those 50 threads are blocked waiting. Incoming requests for services B and C — which are healthy — queue up, find no available threads, and time out. A failure in A cascades to B and C through shared resource exhaustion, not through any direct dependency.
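This failure mode can be reproduced in a few lines. The sketch below is illustrative (pool size and latencies scaled down from the 200-thread example); the service calls are stubbed with sleeps.

```java
import java.util.concurrent.*;

// Sketch: a shared pool saturated by one slow dependency starves healthy calls.
public class SharedPoolStarvation {

    // Returns true if the healthy call timed out while the pool was saturated.
    static boolean healthyCallStarved() throws InterruptedException {
        ExecutorService sharedPool = Executors.newFixedThreadPool(10);
        try {
            // 10 requests to the slow service occupy every shared thread...
            for (int i = 0; i < 10; i++) {
                sharedPool.submit(() -> { Thread.sleep(5_000); return "A"; });
            }
            // ...so a request to the healthy service cannot get a thread in time.
            Future<String> b = sharedPool.submit(() -> "B");
            b.get(500, TimeUnit.MILLISECONDS);
            return false; // healthy call got through
        } catch (ExecutionException | TimeoutException e) {
            return true;  // healthy call starved by the slow dependency
        } finally {
            sharedPool.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("healthy call starved: " + healthyCallStarved());
    }
}
```

The healthy call fails not because service B is broken, but because the slow tasks hold every shared thread.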
Thread Pool Isolation
Assign a dedicated thread pool to each downstream dependency. Calls to that dependency execute in its pool, not the shared request pool:
ExecutorService paymentPool = Executors.newFixedThreadPool(20);
ExecutorService recommendationPool = Executors.newFixedThreadPool(10);
ExecutorService adsPool = Executors.newFixedThreadPool(5);
// Call payment service in its own pool
Future<?> result = paymentPool.submit(() -> paymentService.charge(request));
If the payment service hangs and exhausts its 20 threads, recommendation and ads calls are unaffected — they use their own pools. The main request handler threads are also unaffected.
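The same scenario with dedicated pools shows the isolation. This is a scaled-down sketch; the pool sizes and stubbed service calls are illustrative.

```java
import java.util.concurrent.*;

// Sketch: the slow dependency saturates only its own pool; other calls proceed.
public class BulkheadIsolation {

    static boolean healthyCallSurvives() throws Exception {
        ExecutorService paymentPool = Executors.newFixedThreadPool(10);       // slow dependency
        ExecutorService recommendationPool = Executors.newFixedThreadPool(5); // healthy dependency
        try {
            // Saturate the payment pool with slow calls.
            for (int i = 0; i < 10; i++) {
                paymentPool.submit(() -> { Thread.sleep(5_000); return "charged"; });
            }
            // The recommendation call runs in its own pool and is unaffected.
            Future<String> recs = recommendationPool.submit(() -> "recommendations");
            return "recommendations".equals(recs.get(500, TimeUnit.MILLISECONDS));
        } finally {
            paymentPool.shutdownNow();
            recommendationPool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("healthy call survives: " + healthyCallSurvives());
    }
}
```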
Thread Pool Sizing
Size each pool based on expected concurrency and acceptable latency:
pool_size = (requests_per_second * avg_latency_seconds) + headroom
// Little's Law: steady-state concurrency = throughput * latency
For a service handling 100 RPS with 100ms average latency: 100 * 0.1 = 10 threads at steady state. Add 50-100% headroom for spikes: 15-20 threads.
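The sizing arithmetic above can be captured in a small helper. The method name and headroom parameter are illustrative; the numbers match the 100 RPS / 100 ms example.

```java
// Sketch of the pool sizing formula above.
public class PoolSizing {

    // Little's Law: steady-state concurrency = throughput * latency.
    static int poolSize(double requestsPerSecond, double avgLatencySeconds, double headroomFactor) {
        double steadyState = requestsPerSecond * avgLatencySeconds;
        return (int) Math.ceil(steadyState * (1.0 + headroomFactor));
    }

    public static void main(String[] args) {
        // 100 RPS at 100 ms => 10 threads steady state; 50-100% headroom => 15-20.
        System.out.println(poolSize(100, 0.1, 0.5)); // 15
        System.out.println(poolSize(100, 0.1, 1.0)); // 20
    }
}
```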
Semaphore Bulkhead
Thread pool isolation adds overhead: context switching between thread pools, extra memory per thread. For synchronous in-process calls (no I/O blocking), a semaphore bulkhead limits concurrency without a separate thread pool:
Semaphore semaphore = new Semaphore(20); // max 20 concurrent calls

boolean acquired = semaphore.tryAcquire(100, TimeUnit.MILLISECONDS); // wait up to 100ms for a slot
if (!acquired) {
    throw new BulkheadFullException("Service unavailable");
}
try {
    return callDownstream();
} finally {
    semaphore.release(); // always release, even if the call throws
}
Semaphore bulkheads add minimal overhead but do not protect against thread exhaustion from blocking I/O — for that, thread pool isolation is required.
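The acquire/release pattern above can be packaged into a small reusable wrapper. This is a minimal sketch: the class name is invented, and a JDK `IllegalStateException` stands in for the article's `BulkheadFullException` so the code is self-contained.

```java
import java.util.concurrent.*;
import java.util.function.Supplier;

// A minimal reusable wrapper around the semaphore bulkhead pattern above.
public class SemaphoreBulkhead {
    private final Semaphore semaphore;
    private final long acquireTimeoutMs;

    public SemaphoreBulkhead(int maxConcurrent, long acquireTimeoutMs) {
        this.semaphore = new Semaphore(maxConcurrent);
        this.acquireTimeoutMs = acquireTimeoutMs;
    }

    public <T> T execute(Supplier<T> call) throws InterruptedException {
        if (!semaphore.tryAcquire(acquireTimeoutMs, TimeUnit.MILLISECONDS)) {
            throw new IllegalStateException("Bulkhead full"); // stand-in for BulkheadFullException
        }
        try {
            return call.get();
        } finally {
            semaphore.release(); // always release, even if the call throws
        }
    }
}
```

A caller would wrap each downstream call: `bulkhead.execute(() -> client.fetch(id))`, with one wrapper instance per dependency.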
Queue Depth Cap
Thread pools have a task queue for requests waiting for a thread. An unbounded queue causes latency spikes — requests queue for seconds while earlier requests complete. Cap the queue:
ThreadPoolExecutor executor = new ThreadPoolExecutor(
    10, // core pool size
    20, // max pool size
    60, TimeUnit.SECONDS, // keep-alive for threads above core
    new ArrayBlockingQueue<>(50), // bounded queue, not an unbounded LinkedBlockingQueue
    new ThreadPoolExecutor.AbortPolicy() // reject when queue full
);
With a bounded queue of 50 and a max pool of 20, the bulkhead can hold 70 in-flight requests before rejecting. Note that ThreadPoolExecutor only creates threads beyond the core size once the queue is full, so the queue fills before the pool grows from 10 to 20. Rejection is fast: the request fails immediately rather than queuing indefinitely, and callers handle the rejection with a fallback or circuit breaker.
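The rejection behavior can be demonstrated with a scaled-down executor (max 2 threads plus a queue of 3, so capacity 5 in flight); the sizes and sleep durations are illustrative.

```java
import java.util.concurrent.*;

// Sketch: a bounded-queue executor rejects submissions beyond pool + queue capacity.
public class BoundedQueueDemo {

    static int countRejections(int tasks) {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                2, 2, 60, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(3),
                new ThreadPoolExecutor.AbortPolicy());
        int rejected = 0;
        for (int i = 0; i < tasks; i++) {
            try {
                executor.submit(() -> { Thread.sleep(1_000); return null; });
            } catch (RejectedExecutionException e) {
                rejected++; // fast failure: no free thread and no queue slot
            }
        }
        executor.shutdownNow();
        return rejected;
    }

    public static void main(String[] args) {
        // 8 rapid submissions against capacity 5 (2 running + 3 queued) => 3 rejected
        System.out.println(countRejections(8));
    }
}
```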
Timeout Per Bulkhead
Set a timeout for each bulkhead call independent of the main request timeout. A downstream service that consistently hits the bulkhead timeout is a candidate for circuit breaker tripping:
BulkheadConfig config = BulkheadConfig.custom()
.maxConcurrentCalls(20)
.maxWaitDuration(Duration.ofMillis(100)) // fail fast if no slot in 100ms
.build();
The maxWaitDuration controls how long a caller waits for a slot in the semaphore. Set low (50-200ms) for interactive request paths to fail fast rather than queue.
Metrics: What to Monitor
- Active count: Current concurrent calls in the bulkhead. Sustained high values indicate the pool is undersized.
- Rejected count: Requests rejected due to a full pool/semaphore. Alert on a sustained non-zero rate in production.
- Queue size (thread pool): Requests waiting for a thread. High queue = latency spike incoming.
- Wait time (semaphore): How long callers wait for a slot. High wait = pool undersized or downstream slow.
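For a plain thread pool bulkhead, the first three metrics can be read straight off the JDK executor. This sketch uses a counting rejection handler as a stand-in for a metrics library; sizes are scaled down for illustration.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;

// Sketch: reading bulkhead gauges from a plain ThreadPoolExecutor.
public class BulkheadMetrics {
    static final AtomicLong rejectedCount = new AtomicLong();

    public static void main(String[] args) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2, 60, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(2),
                (task, executor) -> rejectedCount.incrementAndGet()); // count rejections, don't throw

        for (int i = 0; i < 6; i++) {
            pool.execute(() -> {
                try { Thread.sleep(200); } catch (InterruptedException e) { }
            });
        }
        // Poll these gauges periodically and export them to your metrics system.
        System.out.println("active:   " + pool.getActiveCount());  // concurrent calls in flight
        System.out.println("queued:   " + pool.getQueue().size()); // latency spike indicator
        System.out.println("rejected: " + rejectedCount.get());    // alert on sustained growth
        pool.shutdownNow();
    }
}
```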
Combining Bulkhead with Circuit Breaker and Retry
In Resilience4j, decorate in this order (outermost to innermost):
- Retry: Outermost — retries the entire decorated call.
- Circuit Breaker: Tracks outcomes; trips on high failure rate.
- Bulkhead: Limits concurrency entering the circuit breaker.
- Timeout: Innermost — bounds individual call duration.
Supplier<Response> decorated = Decorators.ofSupplier(this::callDownstream)
    .withBulkhead(bulkhead)             // first declared = innermost
    .withCircuitBreaker(circuitBreaker)
    .withRetry(retry)                   // last declared = outermost
    .decorate();
// Note: Resilience4j's timeout (TimeLimiter) decorates async calls only; for a
// timeout, use Decorators.ofCompletionStage(...).withTimeLimiter(timeLimiter, scheduler).
Real-World Example: Search API
A search API calls three downstream services: product catalog, recommendations, and ads. Without bulkheads, a slow ads service (50ms P99 becomes 5s) exhausts threads and breaks product results. With bulkheads:
- Ads pool: 5 threads. When full, reject ads calls immediately, return empty ads.
- Recommendations pool: 10 threads. Independent of ads pool.
- Product catalog pool: 30 threads. Core dependency — larger pool, no fallback to empty.
The ads slowdown fills the ads pool, triggers fast rejection, and the page renders without ads. Product results and recommendations are unaffected.
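The page composition above can be sketched with plain JDK primitives. This is illustrative, not the article's implementation: the service calls are stubbed, and the ads pool uses a SynchronousQueue so a full pool rejects immediately. Note that when an executor rejects a task, `CompletableFuture.supplyAsync` throws `RejectedExecutionException` to the caller rather than failing the future, so the fallback must catch it.

```java
import java.util.concurrent.*;

// Sketch of the search API: three dedicated pools, ads degrade to empty on rejection.
public class SearchPage {
    static final ExecutorService catalogPool = Executors.newFixedThreadPool(30);
    static final ExecutorService recsPool = Executors.newFixedThreadPool(10);
    static final ExecutorService adsPool = new ThreadPoolExecutor(
            5, 5, 60, TimeUnit.SECONDS,
            new SynchronousQueue<>(),                // no queue: reject immediately when full
            new ThreadPoolExecutor.AbortPolicy());

    static String render(String query) throws Exception {
        CompletableFuture<String> products =
                CompletableFuture.supplyAsync(() -> "products:" + query, catalogPool);
        CompletableFuture<String> recs =
                CompletableFuture.supplyAsync(() -> "recs:" + query, recsPool);
        CompletableFuture<String> ads;
        try {
            ads = CompletableFuture.supplyAsync(() -> "ads:" + query, adsPool)
                    .exceptionally(e -> ""); // failed ads call: render without ads
        } catch (RejectedExecutionException e) {
            ads = CompletableFuture.completedFuture(""); // ads bulkhead full: empty ads
        }
        return String.join("|", products.get(), recs.get(), ads.get());
    }

    public static void main(String[] args) throws Exception {
        System.out.println(render("shoes")); // products:shoes|recs:shoes|ads:shoes
        catalogPool.shutdown(); recsPool.shutdown(); adsPool.shutdown();
    }
}
```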
Resilience4j BulkheadConfig Parameters
// Semaphore bulkhead
BulkheadConfig.custom()
.maxConcurrentCalls(25)
.maxWaitDuration(Duration.ofMillis(50))
.build();
// Thread pool bulkhead
ThreadPoolBulkheadConfig.custom()
.maxThreadPoolSize(10)
.coreThreadPoolSize(5)
.queueCapacity(20)
.keepAliveDuration(Duration.ofMillis(20))
.build();