Caching is the single most impactful performance optimization in distributed systems. A well-designed caching layer can reduce database load by 10-100x and cut response latency from 50ms to sub-millisecond. This guide provides a deep technical dive into distributed caching — architecture, patterns, consistency challenges, and production pitfalls — essential knowledge for system design interviews and real-world architecture.
Redis vs Memcached: When to Use Each
Redis: an in-memory data structure server supporting strings, hashes, lists, sets, sorted sets, streams, and more. Supports persistence (RDB snapshots, AOF log), replication (primary-replica), clustering (Redis Cluster with automatic sharding), pub/sub messaging, Lua scripting, and transactions. Commands execute on a single-threaded event loop (Redis 6+ adds optional I/O threading for network operations). Typical latency: 0.1-0.5ms. Memcached: a high-performance, distributed memory caching system. Supports only key-value strings with a maximum value size of 1MB by default. No persistence, no replication (by design), no data structures beyond strings. Its multi-threaded architecture scales better on multi-core machines for simple get/set workloads. When to use Redis: you need data structures (sorted sets for leaderboards, lists for queues, hashes for objects), persistence, pub/sub, or atomic operations beyond simple get/set. When to use Memcached: you need a simple, high-throughput cache for HTML fragments, serialized objects, or database query results, and your workload is purely get/set with no need for persistence. In practice, Redis dominates — its versatility and feature set make it the default choice for most applications.
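The leaderboard use case can be mocked in a few lines of pure Python (no Redis server required); with a real client such as redis-py, the same operations map to the ZADD and ZREVRANGE commands:

```python
# Minimal in-memory mock of a Redis sorted-set leaderboard.
# With a real Redis client this corresponds to ZADD / ZREVRANGE ... WITHSCORES.

def zadd(board: dict, member: str, score: float) -> None:
    """ZADD: set (or update) a member's score."""
    board[member] = score

def zrevrange_withscores(board: dict, start: int, stop: int):
    """ZREVRANGE ... WITHSCORES: members ordered by score, highest first."""
    ranked = sorted(board.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[start:stop + 1]   # Redis ranges are inclusive

leaderboard = {}
zadd(leaderboard, "alice", 310)
zadd(leaderboard, "bob", 120)
zadd(leaderboard, "carol", 450)
top2 = zrevrange_withscores(leaderboard, 0, 1)
```

The point of the sketch is the API shape: Redis maintains the score ordering server-side, so a top-N query is O(log N + N) rather than a full sort per request.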
Caching Patterns: Cache-Aside, Write-Through, Write-Behind
Cache-Aside (Lazy Loading): the application checks the cache first. On cache hit, return the cached value. On cache miss, query the database, populate the cache, and return the value. The application manages both the cache and the database. Pros: only requested data is cached (no wasted memory), cache failures do not prevent reads (fall back to database). Cons: cache miss penalty (extra round-trip), potential for stale data (cache may hold an outdated value after the database is updated). Write-Through: every write goes to both the cache and the database. The cache is always consistent with the database. Pros: cache is never stale. Cons: write latency increases (must write to both), data that is written but never read wastes cache memory. Write-Behind (Write-Back): writes go to the cache immediately and are asynchronously written to the database later (batched). Pros: low write latency (only cache write on the critical path), write batching reduces database load. Cons: risk of data loss if the cache fails before the async write completes. Read-Through: similar to cache-aside but the cache itself fetches from the database on a miss (the application only talks to the cache). Simplifies application code but requires cache library support.
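The cache-aside and write-through flows above can be sketched with plain dicts standing in for the cache and the database; this is an illustrative mock, not a production client:

```python
# Sketch of cache-aside vs write-through. Plain dicts stand in for the
# cache and the database so the flows are runnable without external services.

db = {}     # stands in for the database
cache = {}  # stands in for Redis/Memcached

def cache_aside_read(key):
    """Cache-aside: check the cache first; on miss, load from DB and populate."""
    if key in cache:
        return cache[key]           # cache hit
    value = db.get(key)             # cache miss: extra round-trip to the DB
    if value is not None:
        cache[key] = value          # populate for subsequent reads
    return value

def cache_aside_write(key, value):
    """Cache-aside write: update the DB, then invalidate (delete, not update)."""
    db[key] = value
    cache.pop(key, None)

def write_through(key, value):
    """Write-through: every write goes to both the cache and the database."""
    db[key] = value
    cache[key] = value
```

Note that the cache-aside write deletes rather than updates the cached entry, which is the delete-on-write pattern discussed under consistency below.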
Cache Eviction Policies
When the cache is full, an eviction policy decides which entries to remove. Common policies: (1) LRU (Least Recently Used) — evict the entry that has not been accessed for the longest time. Redis implements an approximated LRU using a sampling algorithm rather than exact ordering; note that the out-of-the-box maxmemory-policy is noeviction, so an LRU policy must be enabled explicitly. Good for most workloads where recently accessed items are likely to be accessed again. (2) LFU (Least Frequently Used) — evict the entry with the fewest accesses. Better than LRU for workloads with popular items that are accessed repeatedly (cache hot items even if they were not accessed in the last few seconds). Redis supports LFU since Redis 4.0. (3) TTL-based — entries expire after a time-to-live. Set TTL based on how quickly the underlying data changes: user profiles (TTL 1 hour), product prices (TTL 5 minutes), session data (TTL 30 minutes). (4) Random eviction — evict a random entry. Surprisingly effective for uniform access patterns and has no overhead for tracking access history. Redis maxmemory-policy options: allkeys-lru, volatile-lru (only evict keys with TTL set), allkeys-lfu, volatile-lfu, allkeys-random, noeviction (return error when memory is full). Choose allkeys-lru as the default for most applications.
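For intuition, here is a minimal exact-LRU cache built on an ordered dict. Redis approximates this by sampling a few candidate keys per eviction instead of maintaining the exact recency ordering shown here:

```python
# Tiny exact-LRU cache illustrating the eviction policy.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()       # insertion order = recency order

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)      # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry
```

Exact LRU costs a pointer update on every access; Redis's sampled approximation trades a little eviction accuracy for avoiding that per-access bookkeeping.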
Cache Stampede and Thundering Herd
Cache stampede occurs when a popular cache entry expires and many concurrent requests simultaneously miss the cache and hit the database. If 1000 requests per second access a cached product page, and the cache entry expires, all 1000 requests flood the database in the same second — potentially overwhelming it. Solutions: (1) Lock-based approach — when a cache miss occurs, acquire a distributed lock (Redis SETNX). Only one request queries the database and repopulates the cache. Other requests wait for the lock to be released and then read from the cache. (2) Early recomputation — refresh the cache entry before it expires. Track the TTL remaining and trigger a background refresh when TTL drops below a threshold (e.g., 20% remaining). The cache never fully expires. (3) Stale-while-revalidate — serve the stale cached value while asynchronously refreshing it in the background. The first request after TTL triggers the refresh; all requests continue receiving the stale value until the refresh completes. This is the HTTP Cache-Control: stale-while-revalidate pattern applied to application caching. (4) Probabilistic early expiration — each request has a small probability of refreshing the cache before the TTL expires. One well-known formulation refreshes when now - delta * beta * ln(random()) >= expiry, where delta is the time the recomputation takes and beta >= 1 tunes eagerness; because ln(random()) is negative, the condition becomes easier to satisfy as expiry approaches. This spreads refreshes randomly to avoid synchronization.
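The probabilistic early-expiration check can be sketched as a small helper. The delta and beta parameters follow the description above; the injectable rng parameter is an addition here to make the behavior testable:

```python
# Probabilistic early expiration: refresh when
#   now - delta * beta * ln(random()) >= expiry.
# ln(random()) is negative, so the subtracted term pushes "now" forward;
# the closer the entry is to expiry, the more likely the check fires.
import math
import random

def should_refresh(now, expiry, delta, beta=1.0, rng=random.random):
    """delta: observed recompute time (seconds); beta >= 1 refreshes more eagerly."""
    return now - delta * beta * math.log(rng()) >= expiry
```

Each request calls should_refresh before serving a cached value; exactly because the trigger is randomized, refreshes are spread across many requests instead of synchronizing on the expiry instant.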
Cache Consistency Challenges
The fundamental challenge: keeping the cache in sync with the database. Race condition scenario: Thread A reads from DB (gets value V1). Thread B updates DB to V2. Thread B invalidates cache. Thread A writes V1 to cache (stale!). Now the cache holds V1 while the database has V2. Solutions: (1) Cache invalidation (delete) instead of cache update — on database write, delete the cache entry rather than updating it. The next read will miss the cache and fetch the fresh value from the database. This narrows the race window substantially (a stale write now requires a read-miss interleaved exactly with a concurrent write-plus-delete) but does not eliminate it entirely, which is why the TTL safety net in (4) still matters. (2) Versioned cache entries — include a version number in the cache key or value. When updating, compare versions before writing to the cache. Reject stale writes. (3) Write-through with the database as source of truth — always read from the database for the most recent write, cache for subsequent reads. (4) Short TTLs as a safety net — even with invalidation, set a TTL (e.g., 5 minutes) so stale data self-corrects. The TTL bounds the staleness window. In practice, cache-aside with delete-on-write and a TTL safety net is the most common and reliable pattern.
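Versioned cache entries can be sketched as a compare-before-write helper; a dict stands in for the cache, and the version is assumed to come from the database (e.g., a row version or update timestamp):

```python
# Versioned cache entries: every write carries a monotonically increasing
# version, and the cache rejects writes older than what it already holds.

cache = {}  # key -> (version, value)

def versioned_set(key, version, value):
    """Store only if this version is newer than the cached one."""
    current = cache.get(key)
    if current is not None and current[0] >= version:
        return False                # stale write rejected
    cache[key] = (version, value)
    return True
```

In the race above, Thread A's late write of V1 carries an older version than Thread B's V2, so the compare rejects it and the cache stays fresh.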
Redis Cluster Architecture
Redis Cluster provides automatic sharding across multiple Redis nodes. The key space is divided into 16,384 hash slots. Each key is assigned to a slot: slot = CRC16(key) % 16384. Each node in the cluster is responsible for a subset of slots. A 3-node cluster: node A owns slots 0-5460, node B owns 5461-10922, node C owns 10923-16383. Each node has one or more replicas for fault tolerance. When a primary node fails, its replica is promoted automatically. Client routing: Redis Cluster clients (Jedis, redis-py, ioredis) maintain a slot-to-node mapping. The client hashes the key, determines the slot, and sends the command directly to the correct node. If the mapping is stale (after a resharding), the node responds with a MOVED redirect, and the client updates its mapping. Multi-key operations (MGET, transactions) require all keys to be on the same node. Use hash tags to force co-location: {user:123}:profile and {user:123}:settings hash to the same slot because Redis only hashes the content within curly braces.
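The slot computation is easy to reproduce: CRC16 (the XMODEM variant, polynomial 0x1021) of the key, mod 16384, with hash-tag extraction first. This sketch follows the published Redis Cluster key distribution rules:

```python
# Redis Cluster hash-slot computation: CRC16 (XMODEM) mod 16384.
# If the key contains a hash tag {...}, only the content inside the first
# non-empty pair of braces is hashed, forcing co-location of related keys.

def crc16(data: bytes) -> int:
    """CRC16-XMODEM: polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:             # non-empty tag: hash only its content
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384
```

This is what cluster-aware clients run on every command to pick the target node, so {user:123}:profile and {user:123}:settings land on the same slot.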
Caching Strategy for System Design Interviews
When to add caching in your design: (1) Hot data accessed frequently — user profiles, product listings, configuration. Cache-aside with LRU eviction and TTL. (2) Expensive computations — aggregated analytics, recommendation scores, search results. Cache the result with a TTL matching the acceptable staleness. (3) Session storage — user sessions in a stateless application. Redis with TTL matching the session timeout. (4) Rate limiting — token bucket or sliding window counters stored in Redis for atomic increment operations. What NOT to cache: data that changes on every request (unique request IDs, real-time sensor data), highly sensitive data (plaintext passwords, unencrypted PII), and data with strict consistency requirements where any staleness is unacceptable. Sizing the cache: use the Pareto principle — 20% of data serves 80% of requests. Cache the hot 20%. Monitor cache hit rate (target: 90%+). If hit rate is below 80%, the cache is either too small or the TTL is too short. If hit rate is above 99%, you may be over-caching (wasting memory on cold data).
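The rate-limiting use case can be sketched as a fixed-window counter. With Redis this is typically an atomic INCR on a per-window key plus an EXPIRE; a dict stands in here so the logic runs without a server:

```python
# Fixed-window rate limiter sketch. In Redis, the counter per (client, window)
# key would be maintained with INCR and given a TTL via EXPIRE; a dict mocks
# that storage here.
import time

counters = {}  # (client_id, window_index) -> request count

def allow(client_id, limit, window_secs=60, now=None):
    """Return True if this request is within the limit for the current window."""
    now = time.time() if now is None else now
    window = int(now // window_secs)
    key = (client_id, window)
    counters[key] = counters.get(key, 0) + 1
    return counters[key] <= limit
```

Fixed windows are the simplest variant; a burst straddling a window boundary can briefly exceed the limit, which is why sliding-window or token-bucket counters (also easy to express as Redis atomic operations) are often preferred.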