The thundering herd problem occurs when many processes or threads simultaneously wake up to compete for the same scarce resource — overwhelming the system. Common causes: a cache expiry that triggers simultaneous database queries from hundreds of servers, a circuit breaker opening that sends all traffic to a backup system simultaneously, or a spike of connections on server restart. Solving thundering herd requires preventing simultaneous contention before it occurs.
Cache Stampede
Cache stampede (cache dogpile) is the most common thundering herd variant. A popular cache key expires; simultaneously, hundreds of servers query the database to regenerate it — the database is overwhelmed by identical queries. Solutions: mutex locking (only one server may regenerate the cache at a time; others wait or return stale data), early cache expiration (probabilistic early recomputation — before the TTL actually expires, each request has a small probability of recomputing the cache; the probability increases as expiration approaches, spreading the regeneration load), and stale-while-revalidate (return the stale cached value immediately while a background process regenerates the cache — eliminates the stampede entirely; requires accepting briefly stale data).
Probabilistic Early Expiration
XFetch (Optimal Probabilistic Cache Stampede Prevention): when a cache entry is fetched, compute whether to regenerate it before expiry. Recompute if: -beta * delta * ln(random()) > time_to_expiry. Where delta = time to recompute the value (seconds), beta = tuning parameter (1.0 is a good default). This formula ensures that as time_to_expiry decreases, more requests trigger early recomputation. On average, recomputation happens once before expiry, spread randomly across requests — no stampede. The key insight: don’t wait for expiry; start recomputing probabilistically while the current value is still valid.
Mutex Lock Approach
When a cache miss occurs, acquire a distributed lock (Redis SET NX with TTL) before querying the database. If the lock is acquired, query the database, populate the cache, and release the lock. If the lock is not acquired (another process is regenerating), either: wait for the lock and then read the freshly-populated cache (reduces stampede but adds latency for waiters), or return the stale cached value (if available) while another process regenerates (no additional latency). The lock must have a TTL slightly longer than the maximum expected regeneration time — a crashed process holding the lock must not block indefinitely. Use a shorter lock TTL than the regeneration time with lock renewal (extend TTL while regeneration is in progress).
TTL Jitter
When many cache entries are set simultaneously (e.g., all users’ sessions at startup, all product prices at midnight batch update), they expire simultaneously — causing a mass stampede. TTL jitter randomizes expiry to spread the load: instead of TTL = 3600s, use TTL = 3600 + random(-300, 300). This ensures expirations are spread over a 10-minute window rather than all occurring at the same second. Apply jitter to: session TTLs (spread logouts across a window), cache warming TTLs (spread cache coldness), and cron-triggered cache invalidation (add jitter to the cron trigger). Jitter is a simple, effective defense against correlated cache expirations.
Thundering Herd on Server Restart
When a backend server restarts, clients that were connected re-establish connections simultaneously — overloading the server before it is ready. Solutions: exponential backoff with jitter in client reconnection logic (all clients wait different intervals before reconnecting); connection draining on shutdown (the server stops accepting new connections before terminating, giving existing clients time to reconnect to other servers); and graceful warm-up (the load balancer routes a small fraction of traffic to a new instance initially, increasing as the instance’s CPU and memory warm up — avoids the cold-start spike). Kubernetes readiness probes prevent traffic to pods that are not fully initialized.
Request Coalescing
Request coalescing deduplicates in-flight identical requests: if 100 requests for the same cache miss arrive simultaneously, only one query is sent to the database; the other 99 wait for the first to complete and return the same result. Implementation: a singleflight map keyed by the request’s cache key. When a request arrives: if a singleflight is already in-progress for this key, subscribe to the in-progress result (don’t issue a new query); if no singleflight exists, start a new one and issue the query. When the query completes, fan the result out to all waiting subscribers. Go’s golang.org/x/sync/singleflight package implements this pattern. Request coalescing eliminates duplicate database queries for the same data regardless of how many concurrent requests arrive.
Circuit Breaker Thundering Herd
When a circuit breaker opens and then transitions to half-open, it typically allows one request through. If the request succeeds, the breaker closes and all queued requests flood through simultaneously — a thundering herd at the dependency. Mitigations: half-open gradual recovery (allow a small percentage of requests through, not a single binary allow-all), exponential backoff before allowing requests through the half-open breaker (delay the first recovery attempt, then gradually allow more), and sliding window recovery (gradually increase the allowed fraction over a recovery period rather than snapping to 100% immediately). Configure the maximum number of concurrent requests allowed through in half-open state and increase gradually as the dependency proves healthy.
See also: Meta Interview Guide 2026: Facebook, Instagram, WhatsApp Engineering
See also: Uber Interview Guide 2026: Dispatch Systems, Geospatial Algorithms, and Marketplace Engineering
See also: Scale AI Interview Guide 2026: Data Infrastructure, RLHF Pipelines, and ML Engineering
See also: Airbnb Interview Guide 2026: Search Systems, Trust and Safety, and Full-Stack Engineering
See also: Databricks Interview Guide 2026: Spark Internals, Delta Lake, and Lakehouse Architecture
See also: Anthropic Interview Guide 2026: Process, Questions, and AI Safety
See also: Atlassian Interview Guide
See also: Coinbase Interview Guide
See also: Shopify Interview Guide
See also: Snap Interview Guide
See also: Lyft Interview Guide 2026: Rideshare Engineering, Real-Time Dispatch, and Safety Systems
See also: Stripe Interview Guide 2026: Process, Bug Bash Round, and Payment Systems