What Is the Cache-Aside Pattern?
Cache-aside (also called lazy loading) is the most common caching strategy in production systems. The application owns the interaction with both the cache and the database: it checks the cache first, fetches from the database only on a miss, writes the result into the cache, and returns the value to the caller. Subsequent reads for the same key hit the cache and never touch the database.
This contrasts with read-through caching, where the cache layer itself fetches from the database. Cache-aside gives the application explicit control over what gets cached and for how long.
Cache-Aside Flow
- Application checks the cache for
key. - Cache hit: return cached value immediately.
- Cache miss: query the database for the record.
- Write the database result into the cache with a TTL.
- Return the value to the caller.
The lazy nature means the cache is only populated on demand. Cold caches (after a restart or flush) will experience a burst of cache misses until the working set warms up.
TTL Strategy Per Entity Type
TTL should reflect how frequently the underlying data changes and how much staleness is acceptable:
- User profile: 5 minutes — changes occasionally, but staleness is noticeable.
- Product catalog: 1 hour — changes infrequently; short staleness acceptable.
- Static configuration: 24 hours — rarely changes; long TTL safe.
- Session data: sliding TTL, reset on each access.
Using a single global TTL is an anti-pattern. Per-entity TTLs reduce unnecessary database load while keeping stale data risk proportional to business tolerance.
Cache Warming
On deployment or after a cache flush, all keys are cold. If traffic arrives before the cache warms, every request hits the database simultaneously — the same thundering herd problem at deployment scale.
Proactive cache warming runs a background job before routing live traffic. It queries the database for the most-accessed keys (derived from analytics or query logs) and populates the cache. Only after warming is complete does the load balancer shift traffic to the new instance.
Thundering Herd on Cold Start
When a popular key expires or is evicted, many concurrent requests may all experience a cache miss at the same moment and all query the database for the same record. This spike can saturate the database.
Distributed lock solution (Redis SETNX): when a miss is detected, the first caller acquires a lock (SETNX on a lock key with a short TTL). It fetches from the database and populates the cache, then releases the lock. Other callers that failed to acquire the lock wait briefly and retry the cache read — by which time the first caller has populated it.
SETNX lock:user:42 "1" PX 500
# only one caller wins; others sleep and retry
The lock TTL must exceed the expected database fetch time but be short enough to not block callers indefinitely if the lock holder crashes.
Probabilistic Early Expiration (XFetch)
An alternative to lock-based stampede prevention. XFetch recomputes the cache value before it expires, with a probability that increases as the TTL approaches zero. The formula is:
recompute if: now - delta * beta * ln(rand()) >= expiry_time
This spreads recomputation across multiple callers over time, eliminating the sharp spike at expiry. No lock is needed. It works best when the computation is cheap enough to occasionally run early.
Eviction Policies
When the cache reaches its size limit, an eviction policy determines which entries to remove:
- LRU (Least Recently Used): evict the entry that was accessed longest ago. Implemented with a doubly linked list (O(1) move-to-front) plus a hash map for O(1) lookup. Works well when recency of access correlates with future access.
- LFU (Least Frequently Used): evict the entry with the lowest access count. Implemented with a min-heap on frequency counters. Works well for stable hot sets, but cold entries can accumulate high frequency counts over time (frequency aging helps).
- TTL-based passive expiry: entries are not proactively evicted; they are removed on next access after TTL expires, or lazily purged by a background sweep. Redis uses this combined with periodic random sampling.
- FIFO: simple but rarely optimal; ignores access patterns.
Write Invalidation Strategy
When the database record changes, the cache entry becomes stale. Two approaches:
- Delete on write (cache-aside invalidation): after writing to the database, delete the cache key. The next read will be a miss and will reload the fresh value. Simple and correct; temporary miss spike is acceptable.
- Update on write (write-through hybrid): after writing to the database, also write the new value to cache. Avoids the miss spike but requires the application to always have the full serialized value available at write time, which is not always the case.
Delete on write is safer and more common. Write-through is appropriate when the write path always produces the full value needed for caching (e.g., a computed aggregate).
SQL: Schema Design
-- Cache configuration per entity type
CREATE TABLE CacheConfig (
cache_name VARCHAR(128) PRIMARY KEY,
default_ttl_seconds INT NOT NULL,
max_size INT NOT NULL, -- max entries
eviction_policy VARCHAR(16) NOT NULL -- LRU | LFU | TTL
CHECK (eviction_policy IN ('LRU','LFU','TTL')),
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
updated_at TIMESTAMP NOT NULL DEFAULT NOW()
);
-- Sampled metrics for cache health monitoring
CREATE TABLE CacheMetric (
id BIGSERIAL PRIMARY KEY,
cache_name VARCHAR(128) NOT NULL REFERENCES CacheConfig(cache_name),
hits BIGINT NOT NULL DEFAULT 0,
misses BIGINT NOT NULL DEFAULT 0,
evictions BIGINT NOT NULL DEFAULT 0,
sampled_at TIMESTAMP NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_cache_metric_name_time ON CacheMetric(cache_name, sampled_at DESC);
Python: CacheAsideClient with Stampede Prevention
import time
import redis
import hashlib
import math
import random
from typing import Callable, Any, Optional
r = redis.Redis(host='localhost', port=6379, decode_responses=True)
class CacheAsideClient:
def __init__(self, redis_client: redis.Redis, default_ttl: int = 300):
self.r = redis_client
self.default_ttl = default_ttl
self.LOCK_TTL_MS = 500 # lock held for max 500ms
def _lock_key(self, key: str) -> str:
return f'lock:{key}'
def get_or_load(
self,
key: str,
loader_fn: Callable[[], Any],
ttl: Optional[int] = None,
) -> Any:
"""
Fetch from cache. On miss, acquire distributed lock,
load from DB, populate cache. Others wait and retry.
"""
ttl = ttl or self.default_ttl
# Fast path: cache hit
cached = self.r.get(key)
if cached is not None:
return cached
lock_key = self._lock_key(key)
acquired = self.r.set(lock_key, '1', px=self.LOCK_TTL_MS, nx=True)
if acquired:
try:
# Re-check after acquiring lock (another caller may have populated)
cached = self.r.get(key)
if cached is not None:
return cached
value = loader_fn()
self.r.setex(key, ttl, value)
return value
finally:
self.r.delete(lock_key)
else:
# Wait for the lock holder to populate the cache
deadline = time.monotonic() + (self.LOCK_TTL_MS / 1000) + 0.1
while time.monotonic() < deadline:
time.sleep(0.05)
cached = self.r.get(key)
if cached is not None:
return cached
# Last resort: load directly (lock holder may have crashed)
return loader_fn()
def warm_cache(self, keys: list, loader_fn: Callable[[str], Any], ttl: Optional[int] = None):
"""Proactively populate cache keys before serving traffic."""
ttl = ttl or self.default_ttl
pipeline = self.r.pipeline()
for key in keys:
value = loader_fn(key)
pipeline.setex(key, ttl, value)
pipeline.execute()
print(f'Warmed {len(keys)} keys')
def invalidate(self, key: str):
"""Delete cache key on write (cache-aside invalidation)."""
self.r.delete(key)
print(f'Invalidated: {key}')
# Usage example
def load_user_profile(user_id: int) -> str:
# Simulate DB fetch
return f'{{"id": {user_id}, "name": "Alice"}}'
client = CacheAsideClient(r, default_ttl=300)
profile = client.get_or_load(
key=f'user:42',
loader_fn=lambda: load_user_profile(42),
ttl=300,
)
print(profile)
# Invalidate after update
client.invalidate('user:42')
Design Trade-offs Summary
| Concern | Approach | Trade-off |
|---|---|---|
| Cold start stampede | Distributed lock (SETNX) | Serializes first fetch; slight delay for waiters |
| Stampede (alternative) | XFetch probabilistic early recompute | Occasional early recompute; no coordination |
| Eviction | LRU | Good for recency-sensitive workloads |
| Eviction | LFU | Good for stable hot sets; needs aging |
| Write consistency | Delete on write | Brief miss window; always consistent |
| Write consistency | Write-through | No miss window; requires full value at write |
See also: Meta Interview Guide 2026: Facebook, Instagram, WhatsApp Engineering
See also: Anthropic Interview Guide 2026: Process, Questions, and AI Safety