Caching is the single highest-leverage optimization in most system designs. Before you reach for database sharding or complex infrastructure, a well-placed cache can reduce database load by 90% and cut response times from hundreds of milliseconds to single digits. Interviewers know this, and they expect you to know it too.
The question usually comes after you’ve described a system: “Your database is getting hammered by reads. How do you scale it?” The right first answer is almost always a cache.
Strategy
Don’t just say “add a cache.” Walk through where the cache sits, what gets cached, how data gets in and out, and what happens when cache and database disagree. These are the four caching patterns. Know them by name.
The Four Caching Patterns
1. Cache-Aside (Lazy Loading)
The application code manages the cache directly. On a read:
- Check the cache. If hit → return data.
- If miss → query the database, store result in cache, return data.
```python
def get_user(user_id):
    # 1. Check cache
    user = cache.get(f"user:{user_id}")
    if user:
        return user
    # 2. Cache miss — go to DB
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)
    # 3. Populate cache for next time
    cache.set(f"user:{user_id}", user, ttl=3600)
    return user
```
Pros: Only caches what’s actually requested. Cache failures are non-fatal — the app falls back to the DB gracefully. Works well for read-heavy workloads with uneven access patterns.
Cons: First request after a cache miss (or expiry) is slow. Under high load, many simultaneous cache misses for the same key cause a thundering herd — dozens of requests all hitting the DB simultaneously. Fix with cache stampede protection (mutex lock or probabilistic early expiration).
When to use: Most web applications. User profiles, product catalogs, content feeds — anything where data is read far more than written.
2. Write-Through
Every write goes to the cache and the database synchronously, in the same operation.
```python
def update_user(user_id, data):
    # Write to DB first
    db.execute("UPDATE users SET ... WHERE id = ?", user_id, data)
    # Then update cache
    cache.set(f"user:{user_id}", data, ttl=3600)
```
Pros: Cache is always warm and consistent with the DB. No cache misses after writes. Good for read-heavy workloads where reads should always hit a warm cache.
Cons: Every write pays double latency (DB + cache). You cache data that may never be read again (write-heavy, read-light data wastes cache space). Use TTLs to evict stale entries.
When to use: Systems where data consistency between cache and DB is critical and you can afford slightly slower writes. User preference settings, configuration data.
3. Write-Behind (Write-Back)
Write to the cache immediately, acknowledge the write to the client, and asynchronously flush to the database later.
```python
def update_user(user_id, data):
    cache.set(f"user:{user_id}", data)                    # fast, synchronous
    write_queue.push({"user_id": user_id, "data": data})  # async
    return "OK"

# Background worker
def flush_worker():
    while True:
        item = write_queue.pop()
        db.execute("UPDATE users SET ... WHERE id = ?", item["user_id"], item["data"])
```
Pros: Extremely fast writes — the client doesn’t wait for the DB. Great for write-heavy workloads (gaming leaderboards, analytics counters, IoT sensor data).
Cons: Data loss risk — if the cache crashes before flushing, writes are lost. The async queue is a new failure point. Harder to implement correctly.
When to use: High-throughput write scenarios where occasional data loss is acceptable, or where you batch many small writes into fewer large DB writes (counter aggregation, view counts).
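The batching payoff mentioned above can be sketched as follows. This is a minimal illustration, not a production queue: an in-memory dict stands in for the cache, and the hypothetical `db_execute` callable stands in for a real DB client.

```python
from collections import defaultdict

class BatchedCounter:
    """Write-behind counter aggregation: absorb many small increments
    in memory, then flush one DB write per key."""
    def __init__(self, db_execute):
        self.pending = defaultdict(int)  # key -> accumulated delta
        self.db_execute = db_execute

    def incr(self, key, delta=1):
        # Fast path: bump the in-memory counter only.
        self.pending[key] += delta

    def flush(self):
        # One DB write per key, no matter how many increments arrived.
        batch, self.pending = self.pending, defaultdict(int)
        for key, delta in batch.items():
            self.db_execute(key, delta)
        return len(batch)
```

A hundred `incr` calls for one key become a single DB write on `flush` — exactly the view-count aggregation case. The trade-off is the same as any write-behind design: increments buffered between flushes are lost if the process dies.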
4. Read-Through
The cache sits in front of the database as a transparent proxy. The application only talks to the cache; the cache fetches from the DB on a miss automatically.
```python
# Application code — clean, no DB logic
user = cache.get(f"user:{user_id}")
# Cache handles the miss internally, populates itself, returns data
```
Pros: Cleaner application code — no explicit cache management logic. Same behavior as cache-aside but encapsulated in the cache layer.
Cons: First request for any key is always slow (cold start). Requires a caching layer that supports read-through (Redis doesn't natively; you'd build a thin wrapper library, or use a cache that does, such as DynamoDB Accelerator). Harder to debug cache misses.
When to use: When you want to abstract caching completely from application code. Common in ORM-level caching (Hibernate second-level cache, Django’s cache framework).
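The "encapsulated in the cache layer" idea is easy to see in a wrapper. A minimal sketch, using a plain dict as the store and a hypothetical `loader` callable as the backing DB query (a real implementation would also handle TTLs and serialization):

```python
class ReadThroughCache:
    """Minimal read-through cache: the application only calls get();
    the cache itself fetches from the backing store on a miss."""
    def __init__(self, loader):
        self.loader = loader  # e.g. a function that queries the DB
        self.store = {}

    def get(self, key):
        if key not in self.store:
            # Miss: the cache, not the application, talks to the DB.
            self.store[key] = self.loader(key)
        return self.store[key]
```

From the application's point of view this looks identical to the one-liner above: it calls `get` and never sees the DB.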
Cache Eviction Policies
When the cache is full, something must go. The policy determines what:
- LRU (Least Recently Used): Evict the key that hasn’t been accessed for the longest time. Default for most caches. Best general-purpose choice.
- LFU (Least Frequently Used): Evict the key accessed the fewest times. Better when some keys are accessed constantly (hot keys) and others are accessed in bursts but then forgotten.
- FIFO: Evict the oldest-inserted key regardless of access. Simple but rarely optimal.
- Random: Evict a random key. Surprisingly decent in practice; avoids the overhead of tracking access order.
- TTL-based: Keys expire after a set time. Not an eviction policy per se, but layered on top of LRU/LFU in production Redis deployments.
Redis's default maxmemory-policy is noeviction (writes fail with an error once maxmemory is reached); you almost always want to set this to allkeys-lru or volatile-lru in production.
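LRU is simple enough to sketch in a few lines, which interviewers sometimes ask for directly. A toy version using Python's OrderedDict (real caches use approximated LRU for performance, as Redis does):

```python
from collections import OrderedDict

class LRUCache:
    """Toy LRU cache: recently used keys move to the end of the
    ordered dict; eviction pops from the front (least recent)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def set(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used
```

Both `get` and `set` are O(1); that's why LRU is the default nearly everywhere.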
Redis vs. Memcached
This is a common follow-up question:
| | Redis | Memcached |
|---|---|---|
| Data types | Strings, hashes, lists, sets, sorted sets, bitmaps, streams | Strings only |
| Persistence | Optional (RDB snapshots, AOF logs) | None |
| Replication | Yes (primary-replica, Sentinel, Cluster) | No native replication |
| Pub/Sub | Yes | No |
| Lua scripting | Yes | No |
| Multi-threading | Single-threaded command execution (I/O threads since Redis 6) | Multi-threaded |
| Use case | General-purpose, session store, pub/sub, leaderboards, rate limiting | Pure cache, high-throughput simple key-value |
Choose Redis when you need persistence, replication, rich data structures (sorted sets for leaderboards, pub/sub for notifications), or atomic operations.
Choose Memcached when you have a pure caching workload at very high throughput and want the simplicity of a multi-threaded daemon with no persistence overhead. In 2026, most teams default to Redis.
Cache Problems You Must Know
Cache Stampede (Thundering Herd): A popular key expires; hundreds of simultaneous requests all miss the cache and hammer the DB. Fix: use a mutex (only one thread refills the cache), or probabilistic early expiration (start refreshing before the TTL expires, randomly).
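The mutex fix can be sketched as below. This assumes a single process and uses Python's threading.Lock with an in-memory dict standing in for Redis; in a distributed setup you'd use a distributed lock (e.g. Redis SET NX with an expiry) instead.

```python
import threading

_locks = {}
_locks_guard = threading.Lock()
cache = {}  # in-memory stand-in for Redis

def get_with_lock(key, load_from_db):
    value = cache.get(key)
    if value is not None:
        return value
    # One lock per key: only one thread refills; the rest wait.
    with _locks_guard:
        lock = _locks.setdefault(key, threading.Lock())
    with lock:
        value = cache.get(key)  # re-check: another thread may have refilled
        if value is None:
            value = load_from_db()
            cache[key] = value  # a real cache would also set a TTL here
        return value
```

The re-check inside the lock is the important line: without it, every waiting thread would still hit the DB one after another once the lock was released.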
Cache Penetration: Requests for keys that don't exist in the cache or the DB (often malicious). Every miss hits the DB. Fix: cache negative results (cache.set("user:9999", None, ttl=60)), or use a Bloom filter to reject clearly nonexistent keys before they reach the cache.
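Negative caching needs one subtlety: distinguishing "cached as not found" from "not cached at all". A minimal sketch, using a sentinel value and a dict with expiry timestamps as a stand-in for Redis (the hypothetical `db_query` returns None for missing rows):

```python
import time

cache = {}          # key -> (value, expires_at); stand-in for Redis
MISSING = object()  # sentinel: "we checked, the row does not exist"
NEGATIVE_TTL = 60   # keep "not found" markers short-lived

def get_user(user_id, db_query):
    key = f"user:{user_id}"
    entry = cache.get(key)
    if entry and entry[1] > time.time():
        value = entry[0]
        return None if value is MISSING else value
    row = db_query(user_id)  # None if the user does not exist
    if row is None:
        # Cache the negative result so repeated probes skip the DB.
        cache[key] = (MISSING, time.time() + NEGATIVE_TTL)
        return None
    cache[key] = (row, time.time() + 3600)
    return row
```

The short negative TTL matters: if the row is later created, the stale "not found" marker ages out within a minute.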
Cache Avalanche: Many cache keys expire simultaneously (e.g., you just deployed and set all TTLs to 3600s at the same second). The DB gets flooded. Fix: add jitter to TTLs (ttl = 3600 + random.randint(-300, 300)).
Hot Key Problem: One key is so popular that a single cache shard is overwhelmed. Fix: replicate the hot key across multiple shards with a suffix (user:1:0, user:1:1, …), read from a random replica.
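The suffix trick is a few lines of code. A sketch with hypothetical helper names and a plain dict standing in for a sharded cache cluster (in a real cluster, the suffixed keys hash to different shards, which is the whole point):

```python
import random

N_REPLICAS = 4  # tune to how hot the key is

def hot_key_write(cache, key, value):
    # Write the value under every replica suffix so any copy is valid.
    for i in range(N_REPLICAS):
        cache[f"{key}:{i}"] = value

def hot_key_read(cache, key):
    # Spread reads across replicas; each suffixed key lands on a
    # different shard when keys are hash-partitioned.
    return cache.get(f"{key}:{random.randrange(N_REPLICAS)}")
```

The cost is N writes per update and N copies in memory, which is why you only do this for genuinely hot keys, not everything.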
What to Cache and What Not To
Cache: read-heavy data that changes infrequently — user profiles, product details, rendered HTML, API responses, session tokens, computed aggregates.
Don’t cache: data that must be strongly consistent in real time (bank balances, inventory for flash sales), data too large to fit in memory, data accessed so rarely that a cache miss is acceptable.
Summary
Cache-aside is the default for most applications: lazy, resilient, and easy to reason about. Write-through keeps the cache hot at the cost of slower writes. Write-behind maximizes write throughput at the cost of durability. Read-through abstracts caching from the application. LRU eviction and Redis are the industry defaults. The three failure modes to know — stampede, penetration, and avalanche — each have standard fixes. State all of this confidently in an interview and you’ll stand out from candidates who just say “add Redis.”
Related System Design Topics
Caching works alongside other distributed systems components:
- Consistent Hashing — distributed caches use consistent hashing to assign keys to cache nodes, making it easy to add capacity without invalidating everything.
- Database Sharding — caching should always be evaluated before sharding; many systems that look like they need sharding just need a better caching strategy.
- Load Balancing — cache servers are typically deployed behind a load balancer; the LB must route the same key to the same cache server (consistent hashing or IP hash) to maximize cache hit rate.
- Message Queues — cache invalidation at scale is often handled asynchronously via a message queue: a write event triggers a queue message that tells all cache nodes to evict the affected key.
Also see: API Design (REST vs GraphQL vs gRPC) and SQL vs NoSQL — the remaining two system design foundations.
See also: Design Search Autocomplete — Redis caching of top-K suggestions per prefix, and Design a Ride-sharing App — caching driver locations and ETA in Redis.
See also: Design a Proximity Service — geohash-keyed result caching for nearby search, and Design a Hotel / Airbnb Reservation System — availability calendar caching with write-invalidation on booking.
See also: Design a CDN — CDN edge nodes are the outermost cache tier; understanding CDN caching behavior is essential for setting correct Cache-Control headers at the origin.
See also: Design a Monitoring & Alerting System — Prometheus recording rules as precomputed metric aggregates; same pattern as materialized view caching.