Cache-Aside Pattern Low-Level Design: Lazy Loading, Cache Warming, Thundering Herd, and Eviction Policies

What Is the Cache-Aside Pattern?

Cache-aside (also called lazy loading) is the most common caching strategy in production systems. The application owns the interaction with both the cache and the database: it checks the cache first, fetches from the database only on a miss, writes the result into the cache, and returns the value to the caller. Subsequent reads for the same key hit the cache and never touch the database.

This contrasts with read-through caching, where the cache layer itself fetches from the database. Cache-aside gives the application explicit control over what gets cached and for how long.

Cache-Aside Flow

  1. Application checks the cache for key.
  2. Cache hit: return cached value immediately.
  3. Cache miss: query the database for the record.
  4. Write the database result into the cache with a TTL.
  5. Return the value to the caller.

The lazy nature means the cache is only populated on demand. Cold caches (after a restart or flush) will experience a burst of cache misses until the working set warms up.
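The five steps above can be sketched minimally in Python. This is a sketch, not production code: a plain dict stands in for the cache, `db_fetch` is a hypothetical database helper, and TTL handling is omitted (it is covered below).

```python
# Minimal cache-aside sketch: a dict stands in for the cache,
# and db_fetch simulates a database lookup.
cache = {}

def db_fetch(key):
    # Simulated database; a real implementation would run a query.
    return {"user:42": '{"id": 42, "name": "Alice"}'}.get(key)

def get_or_load(key):
    value = cache.get(key)          # 1. check the cache
    if value is not None:
        return value                # 2. hit: return immediately
    value = db_fetch(key)           # 3. miss: query the database
    if value is not None:
        cache[key] = value          # 4. populate the cache (TTL omitted here)
    return value                    # 5. return to the caller
```

The first call for a key takes the miss path; every subsequent call returns from the dict without touching `db_fetch`, which is exactly the lazy-loading behavior described above.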

TTL Strategy Per Entity Type

TTL should reflect how frequently the underlying data changes and how much staleness is acceptable:

  • User profile: 5 minutes — changes occasionally, but staleness is noticeable.
  • Product catalog: 1 hour — changes infrequently; short staleness acceptable.
  • Static configuration: 24 hours — rarely changes; long TTL safe.
  • Session data: sliding TTL, reset on each access.

Using a single global TTL is an anti-pattern. Per-entity TTLs reduce unnecessary database load while keeping stale data risk proportional to business tolerance.
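One simple way to encode per-entity TTLs is a lookup keyed by the key's entity prefix. The prefixes and helper name below are illustrative; the values follow the guidance above.

```python
# Illustrative per-entity TTLs, in seconds.
ENTITY_TTL_SECONDS = {
    "user": 300,       # user profile: 5 minutes
    "product": 3600,   # product catalog: 1 hour
    "config": 86400,   # static configuration: 24 hours
}
DEFAULT_TTL_SECONDS = 300

def ttl_for(key: str) -> int:
    # Keys are assumed to look like "user:42" or "product:7".
    prefix = key.split(":", 1)[0]
    return ENTITY_TTL_SECONDS.get(prefix, DEFAULT_TTL_SECONDS)
```

Sliding TTLs for sessions are handled differently: reset the expiry on each access rather than at write time (in Redis, GETEX or an EXPIRE after each GET).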

Cache Warming

On deployment or after a cache flush, all keys are cold. If traffic arrives before the cache warms, every request misses and goes to the database at once: a thundering herd at deployment scale.

Proactive cache warming runs a background job before routing live traffic. It queries the database for the most-accessed keys (derived from analytics or query logs) and populates the cache. Only after warming is complete does the load balancer shift traffic to the new instance.

Thundering Herd on Cold Start

When a popular key expires or is evicted, many concurrent requests may all experience a cache miss at the same moment and all query the database for the same record. This spike can saturate the database.

Distributed lock solution (Redis SET NX): when a miss is detected, the first caller acquires a lock using SET with the NX and PX options, the atomic successor to the legacy SETNX command (which cannot set a TTL). It fetches from the database and populates the cache, then releases the lock. Other callers that failed to acquire the lock wait briefly and retry the cache read, by which time the first caller has populated it.

SET lock:user:42 "1" NX PX 500
# only one caller wins; others sleep and retry

The lock TTL must exceed the expected database fetch time but be short enough to not block callers indefinitely if the lock holder crashes.

Probabilistic Early Expiration (XFetch)

An alternative to lock-based stampede prevention. XFetch recomputes the cache value before it expires, with a probability that increases as the TTL approaches zero. On each hit, the caller applies the test:

recompute if: now - delta * beta * ln(rand()) >= expiry_time

where delta is the observed time a recomputation takes, beta >= 1 tunes how aggressively to recompute early, and rand() is uniform on (0, 1). Because ln(rand()) is negative, the left side randomly overshoots the current time, and the overshoot crosses expiry_time more often as expiry approaches. This spreads recomputation across multiple callers over time, eliminating the sharp spike at expiry. No lock is needed. It works best when the computation is cheap enough to occasionally run early.
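A minimal sketch of the XFetch test as a function (the function name is ours; in practice delta would be measured from real recompute timings):

```python
import math
import random
import time

def should_recompute_early(expiry_time: float, delta: float, beta: float = 1.0) -> bool:
    """XFetch test: recompute if now - delta * beta * ln(rand()) >= expiry_time.

    delta: observed recompute duration in seconds.
    beta:  > 1 recomputes earlier, < 1 later.
    """
    # 1 - random() lies in (0, 1], which avoids log(0); its distribution
    # is the same as random() for this purpose.
    gap = -delta * beta * math.log(1.0 - random.random())
    return time.time() + gap >= expiry_time
```

Each reader runs this check on a cache hit; when it returns True, that reader recomputes and rewrites the entry, so a hot key is refreshed before its hard expiry ever arrives.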

Eviction Policies

When the cache reaches its size limit, an eviction policy determines which entries to remove:

  • LRU (Least Recently Used): evict the entry that was accessed longest ago. Implemented with a doubly linked list (O(1) move-to-front) plus a hash map for O(1) lookup. Works well when recency of access correlates with future access.
  • LFU (Least Frequently Used): evict the entry with the lowest access count. Implemented with a min-heap on frequency counters (O(log n) updates), or with linked frequency buckets for O(1). Works well for stable hot sets, but formerly hot entries keep high counts after going cold and resist eviction; frequency aging (periodically decaying counts) helps.
  • TTL-based passive expiry: entries are not proactively evicted; they are removed on next access after TTL expires, or lazily purged by a background sweep. Redis uses this combined with periodic random sampling.
  • FIFO: simple but rarely optimal; ignores access patterns.
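The LRU design described above (hash map plus recency order) is exactly what Python's collections.OrderedDict provides in one structure; a minimal sketch:

```python
from collections import OrderedDict

class LRUCache:
    """Hash map + recency order in one structure; get and put are O(1)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._entries = OrderedDict()

    def get(self, key):
        if key not in self._entries:
            return None                        # miss
        self._entries.move_to_end(key)         # mark as most recently used
        return self._entries[key]

    def put(self, key, value):
        if key in self._entries:
            self._entries.move_to_end(key)
        self._entries[key] = value
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used
```

A dedicated doubly linked list plus dict gives the same asymptotics; OrderedDict simply bundles the two.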

Write Invalidation Strategy

When the database record changes, the cache entry becomes stale. Two approaches:

  • Delete on write (cache-aside invalidation): after writing to the database, delete the cache key. The next read will be a miss and will reload the fresh value. Simple and correct; temporary miss spike is acceptable.
  • Update on write (write-through hybrid): after writing to the database, also write the new value to cache. Avoids the miss spike but requires the application to always have the full serialized value available at write time, which is not always the case.

Delete on write is safer and more common. Write-through is appropriate when the write path always produces the full value needed for caching (e.g., a computed aggregate).
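A sketch of the delete-on-write ordering (dicts stand in for the database and cache; the helper name is ours). Committing the database write first and deleting the key second keeps the stale window to the gap between the two operations:

```python
def update_user(db: dict, cache: dict, user_id: int, profile: str) -> None:
    db[user_id] = profile               # 1. commit the write to the database
    cache.pop(f"user:{user_id}", None)  # 2. then delete the cache key;
                                        #    the next read reloads fresh data
```

Deleting before the write would let a concurrent reader repopulate the cache with the old value while the write is still in flight, so the database-first ordering matters.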

SQL: Schema Design

-- Cache configuration per entity type
CREATE TABLE CacheConfig (
    cache_name         VARCHAR(128) PRIMARY KEY,
    default_ttl_seconds INT          NOT NULL,
    max_size           INT          NOT NULL,   -- max entries
    eviction_policy    VARCHAR(16)  NOT NULL    -- LRU | LFU | TTL
        CHECK (eviction_policy IN ('LRU','LFU','TTL')),
    created_at         TIMESTAMP    NOT NULL DEFAULT NOW(),
    updated_at         TIMESTAMP    NOT NULL DEFAULT NOW()
);

-- Sampled metrics for cache health monitoring
CREATE TABLE CacheMetric (
    id          BIGSERIAL   PRIMARY KEY,
    cache_name  VARCHAR(128) NOT NULL REFERENCES CacheConfig(cache_name),
    hits        BIGINT      NOT NULL DEFAULT 0,
    misses      BIGINT      NOT NULL DEFAULT 0,
    evictions   BIGINT      NOT NULL DEFAULT 0,
    sampled_at  TIMESTAMP   NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_cache_metric_name_time ON CacheMetric(cache_name, sampled_at DESC);

Python: CacheAsideClient with Stampede Prevention

import time
import redis
from typing import Callable, Any, Optional

r = redis.Redis(host='localhost', port=6379, decode_responses=True)

class CacheAsideClient:
    def __init__(self, redis_client: redis.Redis, default_ttl: int = 300):
        self.r = redis_client
        self.default_ttl = default_ttl
        self.LOCK_TTL_MS = 500  # lock held for max 500ms

    def _lock_key(self, key: str) -> str:
        return f'lock:{key}'

    def get_or_load(
        self,
        key: str,
        loader_fn: Callable[[], Any],
        ttl: Optional[int] = None,
    ) -> Any:
        """
        Fetch from cache. On miss, acquire distributed lock,
        load from DB, populate cache. Others wait and retry.
        """
        ttl = ttl or self.default_ttl

        # Fast path: cache hit
        cached = self.r.get(key)
        if cached is not None:
            return cached

        lock_key = self._lock_key(key)
        # Unique-enough token so we only ever release our own lock.
        token = f'{id(self)}:{time.monotonic_ns()}'
        acquired = self.r.set(lock_key, token, px=self.LOCK_TTL_MS, nx=True)

        if acquired:
            try:
                # Re-check after acquiring lock (another caller may have populated)
                cached = self.r.get(key)
                if cached is not None:
                    return cached
                value = loader_fn()
                self.r.setex(key, ttl, value)
                return value
            finally:
                # Delete only if we still hold the lock; an unconditional
                # delete could remove a lock another caller acquired after
                # ours expired mid-load.
                release = (
                    "if redis.call('get', KEYS[1]) == ARGV[1] then "
                    "return redis.call('del', KEYS[1]) end"
                )
                self.r.eval(release, 1, lock_key, token)
        else:
            # Wait for the lock holder to populate the cache
            deadline = time.monotonic() + (self.LOCK_TTL_MS / 1000) + 0.1
            while time.monotonic() < deadline:
                time.sleep(0.05)
                cached = self.r.get(key)
                if cached is not None:
                    return cached
            # Last resort: load directly (lock holder may have crashed)
            return loader_fn()

    def warm_cache(self, keys: list, loader_fn: Callable[[str], Any], ttl: Optional[int] = None):
        """Proactively populate cache keys before serving traffic."""
        ttl = ttl or self.default_ttl
        pipeline = self.r.pipeline()
        for key in keys:
            value = loader_fn(key)
            pipeline.setex(key, ttl, value)
        pipeline.execute()
        print(f'Warmed {len(keys)} keys')

    def invalidate(self, key: str):
        """Delete cache key on write (cache-aside invalidation)."""
        self.r.delete(key)
        print(f'Invalidated: {key}')


# Usage example
def load_user_profile(user_id: int) -> str:
    # Simulate DB fetch
    return f'{{"id": {user_id}, "name": "Alice"}}'

client = CacheAsideClient(r, default_ttl=300)
profile = client.get_or_load(
    key='user:42',
    loader_fn=lambda: load_user_profile(42),
    ttl=300,
)
print(profile)

# Invalidate after update
client.invalidate('user:42')

Design Trade-offs Summary

  • Cold start stampede: distributed lock (SET NX). Serializes the first fetch; slight delay for waiters.
  • Stampede (alternative): XFetch probabilistic early recompute. Occasional early recompute; no coordination needed.
  • Eviction: LRU. Good for recency-sensitive workloads.
  • Eviction: LFU. Good for stable hot sets; needs frequency aging.
  • Write consistency: delete on write. Brief miss window; always consistent.
  • Write consistency: write-through. No miss window; requires the full value at write time.

Frequently Asked Questions

How does a distributed lock prevent the thundering herd problem in cache-aside?
When many concurrent requests experience a cache miss for the same key, only one caller acquires the Redis lock. That caller fetches from the database and populates the cache. All other callers detect the lock, wait briefly, and then read from cache once it is populated. This serializes the database fetch to a single call instead of N simultaneous calls.

When should you use LRU vs LFU eviction?
Use LRU when access patterns are recency-driven: recently accessed items are likely to be accessed again soon. Use LFU when there is a stable hot set of items accessed repeatedly over a long period. LFU requires frequency aging to prevent old high-frequency items from blocking new hot items.

How do you tune TTL for cached entities?
TTL should reflect the acceptable staleness for each entity type. User profiles might tolerate 5 minutes of staleness, product catalogs an hour, and static configuration 24 hours. Monitor cache hit rate and stale read rate together: a very high hit rate with frequent stale reads indicates TTL is too long; a low hit rate indicates TTL is too short or the working set exceeds cache capacity.

Should you delete or update the cache key after a database write?
Delete (invalidate) is safer and more common. After writing to the database, delete the cache key. The next read will miss and reload the fresh value. Updating the cache on write (write-through hybrid) avoids the miss window but requires the full serialized value to be available at write time, which is not always the case in complex systems.


