System Design: Distributed Cache — Eviction Policies, Consistency, and Scaling (2025)

Why Distributed Caching?

A single application server’s in-memory cache is lost on restart and is not shared across instances. A distributed cache (Redis, Memcached) provides a shared, optionally persistent, low-latency key-value store accessible by all service instances.

Key metrics: cache hit rate (aim for > 95% for hot data), cache latency (< 1 ms p99 within the same region), and memory efficiency (how well the eviction policy uses available memory).

When to cache: caching pays off when reads far outnumber writes, the same data is read repeatedly, the computation or database query is expensive, and the data tolerates some staleness (it does not need to be perfectly fresh).
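
These metrics combine into a simple expected-latency model: a hit pays only the cache lookup, while a miss pays the lookup plus the DB query. A back-of-envelope sketch, with illustrative numbers (1 ms cache, 20 ms DB, both assumed, not measured):

```python
def effective_read_latency_ms(hit_rate: float, cache_ms: float, db_ms: float) -> float:
    """Expected latency per read: hits are served by the cache alone;
    misses pay the cache lookup and then fall through to the DB."""
    return hit_rate * cache_ms + (1.0 - hit_rate) * (cache_ms + db_ms)

# 95% hit rate, 1 ms cache lookup, 20 ms DB query
latency = effective_read_latency_ms(0.95, 1.0, 20.0)  # 2.0 ms expected per read
```

At a 95% hit rate the expected read drops from 20 ms to about 2 ms; note how quickly the benefit erodes as the hit rate falls, which is why the > 95% target matters for hot data.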

Eviction Policies

LRU (Least Recently Used): evict the item accessed least recently. Works well when recent access predicts future access (temporal locality). Implemented with a doubly linked list plus a hashmap, giving O(1) get and put. Redis approximates LRU rather than tracking exact order (too expensive): it samples N random keys (maxmemory-samples, default 5) and evicts the least recently used of the sample.

LFU (Least Frequently Used): evict the item accessed least often. Better for skewed access patterns: popular items stay cached even when not accessed recently. Harder to implement efficiently; Redis 4.0+ ships an LFU policy.

TTL-based eviction: every key carries a time-to-live and is evicted on expiry. Usually combined with LRU/LFU to reclaim the remaining memory.

Write strategies (orthogonal to eviction): write-through updates cache and DB synchronously on every write (strong consistency, higher write latency). Write-behind (write-back) updates the cache immediately and the DB asynchronously (lower write latency, but data is lost if the cache crashes before the DB write completes).
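
A minimal sketch of the exact-LRU structure described above, using Python's OrderedDict in place of the hand-rolled doubly linked list + hashmap (this is the exact policy that Redis approximates by sampling):

```python
from collections import OrderedDict


class LRUCache:
    """Exact LRU: O(1) get and put. OrderedDict plays the role of the
    doubly linked list (recency order) + hashmap (key lookup)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used


cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # touch "a", so "b" is now the LRU entry
cache.put("c", 3)  # over capacity: evicts "b"
```

After this sequence, "b" is gone while "a" and "c" remain, because the read of "a" refreshed its recency before the eviction.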

Cache Aside (Lazy Loading) Pattern

class CacheAsideService:
    def get_user(self, user_id: int) -> User | None:
        # 1. Check cache
        cached = self.redis.get(f"user:{user_id}")
        if cached:
            return User.from_json(cached)

        # 2. Cache miss: load from DB
        user = self.db.query("SELECT * FROM users WHERE id = %s", user_id)
        if not user:
            return None

        # 3. Populate cache with TTL
        self.redis.setex(
            f"user:{user_id}",
            3600,  # TTL: 1 hour
            user.to_json()
        )
        return user

    def update_user(self, user_id: int, data: dict) -> User:
        # Update DB first (source of truth)
        user = self.db.execute(
            "UPDATE users SET ... WHERE id = %s", user_id, data
        )
        # Invalidate cache (delete, not update -- avoid race conditions)
        self.redis.delete(f"user:{user_id}")
        return user

Why delete on update rather than update the cache? Race condition: if two threads simultaneously update the user, one may write a stale value into the cache after the other has already written the fresh value. Delete + lazy reload (on next read) avoids this race. This is the “cache-aside with invalidation” pattern.
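
The race can be replayed deterministically with plain dicts standing in for the DB and the cache; the interleaving below is one legal ordering of two concurrent writers:

```python
db, cache = {}, {}

# --- "Update the cache on write" (racy) ---
# Thread A writes value 1, thread B writes value 2.
db["user:1"] = 1       # A writes DB
db["user:1"] = 2       # B writes DB (B is now the latest write)
cache["user:1"] = 2    # B writes cache
cache["user:1"] = 1    # A's delayed cache write lands last -- stale!
stale = cache["user:1"] != db["user:1"]  # True: cache says 1, DB says 2

# --- "Delete on write, lazy reload" (safe) ---
db["user:1"] = 1
db["user:1"] = 2
cache.pop("user:1", None)  # both writers just invalidate
# Next read misses and reloads from the DB, the source of truth:
value = cache.get("user:1") or db["user:1"]  # 2, matching the DB
```

Whatever order the two deletes land in, the key ends up absent, so the next read always reloads the DB's current value.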

Consistent Hashing for Cache Sharding

With N cache nodes, a naive modulo hash (key % N) remaps nearly all keys when a node is added or removed. Consistent hashing maps both keys and nodes onto a virtual ring (0 to 2^32); each key is served by the first node clockwise from its position. On node addition or removal, only keys in the affected segment are remapped (approximately 1/N of all keys). Virtual nodes (vnodes): each physical node is represented by K virtual nodes spread around the ring, which evens out load (without vnodes, node placement can be uneven and create hotspots). K in the low hundreds is a common choice; Cassandra's num_tokens defaulted to 256 (reduced to 16 in 4.0). Implementation: a sorted array of (hash, node) pairs with binary search to find the node responsible for a key.
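
A sketch of that implementation: two parallel sorted arrays for the ring, and binary search (bisect) for lookups. The md5-based position function and the vnode count are illustrative choices, not a prescribed scheme:

```python
import bisect
import hashlib


def _position(key: str) -> int:
    """Stable 32-bit ring position (md5 keeps placement identical across processes)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2**32)


class ConsistentHashRing:
    def __init__(self, nodes=(), vnodes: int = 150):
        self.vnodes = vnodes
        self._hashes: list[int] = []  # sorted ring positions
        self._nodes: list[str] = []   # node owning each position
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Place K virtual nodes for this physical node around the ring.
        for i in range(self.vnodes):
            h = _position(f"{node}#{i}")
            idx = bisect.bisect(self._hashes, h)
            self._hashes.insert(idx, h)
            self._nodes.insert(idx, node)

    def remove_node(self, node: str) -> None:
        keep = [(h, n) for h, n in zip(self._hashes, self._nodes) if n != node]
        self._hashes = [h for h, _ in keep]
        self._nodes = [n for _, n in keep]

    def get_node(self, key: str) -> str:
        # First vnode clockwise from the key's position (wrapping past 2^32 to 0).
        idx = bisect.bisect(self._hashes, _position(key)) % len(self._hashes)
        return self._nodes[idx]
```

Removing a node only remaps the keys that node owned: every other key still lands on the same vnode it did before, which is the 1/N disruption bound described above.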

Cache Stampede and Thundering Herd Prevention

Cache stampede: a popular key expires, and hundreds of concurrent requests all miss and query the database simultaneously, overwhelming it. Solutions:

Probabilistic early expiration: before a key expires, individual requests recompute it early with a probability that rises as expiry approaches. The standard formula (XFetch) recomputes when current_time - recompute_cost * beta * ln(rand()) >= expiry_time. This spreads recomputation over time rather than concentrating it at the expiry instant.

Mutex/lock on miss: the first request to detect a cache miss acquires a distributed lock (Redis SET key token NX PX ttl, the modern form of SETNX with a TTL). Only the lock holder queries the DB and repopulates the cache; other requests wait briefly or return stale data if available.

Background refresh: a background job refreshes cache entries before they expire, prioritized by recent access. Popular items never actually expire: they are continuously refreshed while they have traffic.



