Low Level Design: Caching Strategies

Why Caching Matters

Caching is one of the highest-leverage tools in system design. A database read can take 1–10 ms; a cache hit from Redis or Memcached takes under 1 ms. At scale, that gap compounds: a service handling 100k requests/second that saves even 2 ms per request is eliminating 200 seconds of aggregate DB wait per second. Beyond latency, caching reduces DB load, cuts infrastructure costs, and protects downstream systems during traffic spikes.

Cache-Aside (Lazy Loading)

Cache-aside is the most common pattern. The application owns the cache interaction: on a read, check the cache first. On a miss, fetch from the database, populate the cache, and return the result. On a write, invalidate or update the cache entry.

def get_user(user_id):
    user = cache.get(f"user:{user_id}")
    if user is None:
        # Miss: fetch from the DB, then populate the cache for later reads
        user = db.query("SELECT * FROM users WHERE id = %s", user_id)
        cache.set(f"user:{user_id}", user, ttl=300)
    return user

def update_user(user_id, name):
    db.execute("UPDATE users SET name = %s WHERE id = %s", name, user_id)
    cache.delete(f"user:{user_id}")  # invalidate; the next read repopulates

Trade-offs: Cold start problem — a freshly deployed or flushed cache means every request is a cache miss until data warms up. Stale data risk exists between DB writes and cache invalidation. Cache-aside is resilient to cache failures since the app falls back to the DB automatically.

Read-Through Cache

In a read-through pattern, the cache sits in front of the database. The application only ever talks to the cache. On a miss, the cache itself fetches the data from the DB, populates the entry, and returns it. This simplifies application code — the app doesn’t need cache-miss handling logic. Services like AWS DAX (for DynamoDB) implement read-through transparently.

Trade-offs: The first read for any key is always slow (cache miss + DB fetch). The cache layer must understand your data model, which can add coupling. Works best when your cache provider supports it natively.
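The pattern can be sketched in-process. This is a minimal illustration, not a production cache: the `loader` callback stands in for a DB fetch, and a plain dict stands in for the backing store.

```python
class ReadThroughCache:
    """The application only calls get(); the cache fills misses itself
    via a loader callback supplied at construction."""

    def __init__(self, loader):
        self._loader = loader  # stands in for a DB fetch
        self._store = {}

    def get(self, key):
        if key not in self._store:
            # The cache, not the application, owns the miss path.
            self._store[key] = self._loader(key)
        return self._store[key]

# A dict stands in for the backing database.
db = {"user:1": {"name": "Ada"}}
users = ReadThroughCache(loader=db.get)
```

The application code reduces to `users.get(key)` — no miss-handling branch anywhere.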

Write-Through Cache

Every write goes to both the cache and the database synchronously, in the same operation. The write is not acknowledged until both succeed. This guarantees strong consistency — reads always see the latest data.

Trade-offs: Writes are slower because you pay the cost of two writes on every mutation. Cache is populated with data that may never be read (write-heavy workloads with infrequent reads waste cache space). Most appropriate when read latency and consistency are both critical.
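A toy sketch of the synchronous dual write, with a dict standing in for the real database:

```python
class WriteThroughCache:
    """Every write hits both the cache and the backing store before
    returning, so reads always agree with the database."""

    def __init__(self, db):
        self._db = db        # dict stands in for the real database
        self._store = {}

    def put(self, key, value):
        # Synchronous dual write: the operation "succeeds" only once
        # both the backing store and the cache hold the new value.
        self._db[key] = value
        self._store[key] = value

    def get(self, key):
        return self._store.get(key)
```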

Write-Behind (Write-Back) Cache

Writes go to the cache immediately, and the cache flushes dirty entries to the database asynchronously — typically on a timer or when an entry is evicted. This maximizes write throughput since the application only waits for an in-memory write.

Trade-offs: Risk of data loss if the cache node crashes before flushing. DB and cache are temporarily inconsistent. Appropriate for high write throughput workloads where occasional data loss is acceptable (e.g., analytics counters, activity feeds). Not appropriate for financial transactions.
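The dirty-entry bookkeeping can be sketched as follows; here the flush is an explicit method call, whereas a real implementation would run it on a timer or at eviction:

```python
class WriteBehindCache:
    """Writes land in memory immediately; dirty entries are flushed to
    the backing store later."""

    def __init__(self, db):
        self._db = db         # dict stands in for the real database
        self._store = {}
        self._dirty = set()

    def put(self, key, value):
        # Hot path: only an in-memory write, so the caller returns fast.
        self._store[key] = value
        self._dirty.add(key)

    def get(self, key):
        return self._store.get(key)

    def flush(self):
        # Anything still in _dirty is lost if the node crashes first —
        # the data-loss risk described above.
        for key in self._dirty:
            self._db[key] = self._store[key]
        self._dirty.clear()
```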

Write-Around Cache

Writes go directly to the database, bypassing the cache entirely. Only reads populate the cache. This avoids polluting the cache with write-once or rarely-read data — for example, bulk data imports, audit logs, or archival writes that will never be read through the hot path.

Trade-offs: The first read after a write is always a cache miss. Useful when write patterns differ significantly from read patterns and you want to preserve cache space for frequently-read data.
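A minimal sketch of the two paths, with dicts standing in for the database and cache:

```python
def write_around_put(db, cache, key, value):
    # Writes go straight to the DB; the cache is never populated here.
    db[key] = value
    cache.pop(key, None)  # drop any stale copy so reads don't see old data

def read(db, cache, key):
    # Only reads populate the cache (cache-aside on the read path).
    if key not in cache:
        cache[key] = db[key]  # first read after a write is always a miss
    return cache[key]
```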

Eviction Policies

When the cache is full, an eviction policy determines which entries to remove:

  • LRU (Least Recently Used): Evict the entry that was accessed longest ago. Works well for most workloads. Redis offers an approximated LRU via its allkeys-lru policy (the out-of-the-box default is noeviction). Implemented with a doubly linked list + hash map (O(1) get and put).
  • LFU (Least Frequently Used): Evict the entry accessed fewest times. Better for workloads with a stable hot set. More complex to implement. Redis supports it via allkeys-lfu policy.
  • TTL (Time-To-Live): Entries expire after a fixed duration regardless of access pattern. Simple and predictable. Essential for data freshness guarantees.
  • FIFO (First In, First Out): Evict the oldest-inserted entry. Simple but ignores access frequency — rarely optimal.
  • Random: Evict a random entry. Surprisingly competitive with LRU in some workloads, with much lower overhead.
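The doubly-linked-list-plus-hash-map structure behind LRU is what Python's `collections.OrderedDict` provides, which makes a compact sketch possible:

```python
from collections import OrderedDict

class LRUCache:
    """O(1) get/put LRU. OrderedDict is a hash map over a doubly linked
    list internally — the same structure described above."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._store = OrderedDict()

    def get(self, key):
        if key not in self._store:
            return None
        self._store.move_to_end(key)  # mark as most recently used
        return self._store[key]

    def put(self, key, value):
        if key in self._store:
            self._store.move_to_end(key)
        self._store[key] = value
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least recently used
```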

TTL Selection

TTL is a critical knob. Too short, and you get frequent cache misses and high DB load (though data stays fresh). Too long, and users may be served stale data (though hit rates stay high and DB load stays low). The right value depends on how often the underlying data changes and how much staleness is acceptable.

Practical strategies: use shorter TTLs for mutable user-facing data (profile info: 60–300s), longer TTLs for slow-changing reference data (product catalog: hours), and no TTL for immutable data (content addressed by hash). Jitter TTLs (add ±10–20% randomness) to prevent synchronized mass expiry across a cluster.
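Jittering is a one-liner in practice; a sketch with ±15% randomness:

```python
import random

def jittered_ttl(base_ttl, jitter=0.15):
    """Spread expiry by +/- jitter (15% here) so keys set at the same
    moment don't all expire in the same instant."""
    factor = 1 + random.uniform(-jitter, jitter)
    return int(base_ttl * factor)
```

A key set with `cache.set(key, value, ttl=jittered_ttl(300))` then expires somewhere in the 255–345 s window rather than at exactly 300 s.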

Cache Stampede (Thundering Herd)

When a popular cache key expires, many concurrent requests simultaneously miss the cache and all rush to the database to recompute the same value. This thundering herd can overwhelm the DB and cause cascading failures.

Solutions:

  • Mutex/lock-based cache fill: Only one request acquires a lock to recompute the value; others wait or serve stale data. Simple but adds latency for waiting requests.
  • Probabilistic early expiry (PER): Before the TTL expires, each request has a small increasing probability of proactively refreshing the cache. Spreads the refresh work over time with no coordination needed.
  • Request coalescing: Deduplicate in-flight requests for the same key. Return the same future/promise to all concurrent waiters. First requester fetches; others get the result when it arrives.
  • Background refresh: A background job refreshes popular keys before they expire, ensuring the cache is always warm.
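Request coalescing can be sketched with an in-flight table keyed by cache key; the `fetch` callback stands in for the expensive DB recompute. This is a simplified illustration, not a hardened implementation:

```python
import threading

class CoalescingLoader:
    """Deduplicate concurrent fetches for the same key: the first caller
    (the leader) computes; concurrent callers wait on the same event and
    share the leader's result."""

    def __init__(self, fetch):
        self._fetch = fetch      # stands in for the expensive DB call
        self._lock = threading.Lock()
        self._inflight = {}      # key -> (event, result holder)

    def load(self, key):
        with self._lock:
            entry = self._inflight.get(key)
            leader = entry is None
            if leader:
                entry = (threading.Event(), {})
                self._inflight[key] = entry
        event, holder = entry
        if leader:
            try:
                holder["value"] = self._fetch(key)  # only one DB call
            finally:
                event.set()
                with self._lock:
                    del self._inflight[key]
        else:
            event.wait()  # piggyback on the leader's fetch
        return holder["value"]
```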

Negative Caching

If your application queries the DB for a key that doesn’t exist (e.g., a user lookup for a non-existent ID), and you don’t cache the miss, every repeat request goes to the DB. Negative caching means storing a sentinel value (e.g., NULL or an empty object) with a short TTL for keys that produce no result. This prevents DB hammering from repeated lookups of non-existent entities — common in abuse scenarios (bots probing IDs) or after deletions.
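A sketch of the sentinel approach (a dict stands in for the user table, and the short negative TTL is noted in a comment rather than modeled):

```python
_MISSING = object()  # sentinel meaning "we checked; this key has no row"

class NegativeCache:
    def __init__(self, db):
        self._db = db        # dict stands in for the real user table
        self._store = {}

    def get_user(self, user_id):
        key = f"user:{user_id}"
        if key in self._store:
            hit = self._store[key]
            # A cached _MISSING answers "no such user" without a DB hit.
            return None if hit is _MISSING else hit
        user = self._db.get(user_id)
        # Cache the miss too (in production: with a short TTL, e.g. 30s).
        self._store[key] = _MISSING if user is None else user
        return user
```

The sentinel matters because `None` is ambiguous: it could mean "not cached" or "cached, and known not to exist".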

Distributed Cache Topology and Consistency

Client-side cache: Each application instance maintains its own local cache (e.g., in-process HashMap or Caffeine). No network hop, lowest latency, but each instance has its own copy — consistency is hard and memory usage multiplies with instance count.

Dedicated cache cluster (Redis, Memcached): All application instances share a single cache. Consistent view, but adds a network hop (~0.5–1 ms intra-datacenter). Sharding across cache nodes via consistent hashing minimizes key remapping when nodes are added or removed.
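A toy consistent-hash ring illustrates why adding or removing a node remaps only a small arc of the keyspace (virtual nodes smooth out the distribution; node names here are hypothetical):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Map keys to cache nodes on a hash ring. Each node owns many
    virtual points, so load spreads evenly and removing one node only
    remaps the keys in that node's arcs."""

    def __init__(self, nodes, vnodes=100):
        self._ring = []  # sorted list of (hash, node)
        for node in nodes:
            self.add_node(node, vnodes)

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def add_node(self, node, vnodes=100):
        for i in range(vnodes):
            self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()

    def node_for(self, key):
        # First ring point clockwise from the key's hash owns the key.
        idx = bisect.bisect(self._ring, (self._hash(key), "")) % len(self._ring)
        return self._ring[idx][1]
```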

Read-your-writes consistency: After a user writes data, they expect to read their own write immediately. With cache replication, their next read might hit a replica that hasn’t received the invalidation yet. Solutions: route that user’s reads to the primary for a short window after their write, or use sticky sessions to a specific cache node.

Frequently Asked Questions

What is the difference between cache-aside and read-through caching?

Cache-aside (lazy loading): the application code first checks the cache. On miss, the application fetches from the database and populates the cache. The application manages cache population. Read-through: the cache sits in front of the database and automatically fetches from the database on miss, transparently to the application. Cache-aside gives the application full control over what is cached. Read-through simplifies application code but all misses go to the same backing store.

What is write-through caching and when should it be used?

Write-through: every write goes to both the cache and the database synchronously before returning success to the client. The cache is always consistent with the database. Use when: reads are frequent and data must be fresh, the latency of a synchronous dual write is acceptable, and data loss on crash is unacceptable. Write-through increases write latency (two writes per operation) but eliminates stale cache reads. It pairs well with read-through for a fully transparent cache layer.

What is the cache stampede problem and how do you solve it?

A cache stampede (thundering herd) occurs when a popular cached item expires and many concurrent requests simultaneously miss the cache and all query the database at once. This can overwhelm the database. Solutions: (1) Mutex / distributed lock: the first miss acquires a lock, fetches from DB, populates cache; subsequent misses wait for the lock then read from cache. (2) Probabilistic early expiry: before the TTL expires, each request has a small probability of refreshing early, spreading the load over time. (3) Request coalescing: deduplicate concurrent misses for the same key, making only one DB call.

When should you use write-behind (write-back) caching?

Write-behind: writes go to the cache immediately and are asynchronously flushed to the database in batches. This maximizes write throughput (cache absorbs write bursts) and reduces database write amplification. Use when: write throughput exceeds DB capacity, slight data loss on cache crash is acceptable (or mitigated by WAL), and the application can tolerate eventual database consistency. Write-behind is used in gaming leaderboards, IoT sensor ingestion, and analytics counters where losing a few recent writes is acceptable.

How do you prevent cache pollution from large sequential scans?

Large scans (like bulk exports or analytics queries) can evict frequently-used items from the cache if using LRU. Solutions: (1) Scan-resistant eviction: use LRU-K (evict based on K-th most recent access) or CLOCK-Pro, which resists one-time scans. (2) Separate cache pools: route scan queries to a dedicated cache pool with aggressive eviction, protecting the main pool. (3) Bypass cache for scans: detect large range queries and route them directly to the database or a read replica. (4) Prefetch-aware caching: explicitly mark scanned items as low-priority.
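The scan resistance of LRU-K comes from ranking entries by their K-th most recent access; a toy LRU-2 with a logical clock shows the effect (items touched only once rank lowest, so a one-off scan cannot flush the hot set):

```python
class LRU2Cache:
    """Toy LRU-2: evict the entry whose second-most-recent access is
    oldest. Single-access items (e.g. a sequential scan) have no second
    access, so they are evicted before twice-accessed hot items."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._store = {}
        self._hist = {}   # key -> [second-most-recent, most-recent] times
        self._clock = 0   # logical clock; ticks on every access

    def _touch(self, key):
        self._clock += 1
        prev = self._hist.get(key, [0, 0])[1]
        self._hist[key] = [prev, self._clock]

    def get(self, key):
        if key in self._store:
            self._touch(key)
            return self._store[key]
        return None

    def put(self, key, value):
        if key not in self._store and len(self._store) >= self.capacity:
            # Victim: smallest second-most-recent access time.
            victim = min(self._store, key=lambda k: self._hist[k][0])
            del self._store[victim]
            del self._hist[victim]
        self._store[key] = value
        self._touch(key)
```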
