Low Level Design: Distributed Cache Design and Internals

A distributed cache stores frequently accessed data in memory across a cluster of nodes, reducing latency and database load. Redis and Memcached are the dominant solutions, each with different tradeoffs. Designing a distributed cache correctly — choosing eviction policies, handling cache stampedes, managing consistency, and sizing capacity — dramatically impacts system performance at scale. Cache design is one of the most common topics in system design interviews because nearly every large-scale system relies on caching to achieve low latency at high throughput.

Cache Eviction Policies

When the cache reaches capacity, it must evict entries to make room for new ones. Common eviction policies:

  • LRU (Least Recently Used): evict the entry not accessed for the longest time. Effective for workloads with temporal locality (recently accessed data is likely to be accessed again). Implementation: doubly-linked list + hash map, giving O(1) access and eviction. Redis approximates LRU by sampling N random keys and evicting the least recently used among them.
  • LFU (Least Frequently Used): evict the entry with the fewest accesses. Better for frequency-based access patterns (popular items are never evicted even if not accessed recently). Redis 4.0+ supports LFU with frequency-counter decay.
  • TTL (Time To Live): entries expire after a fixed duration regardless of access. The simplest consistency guarantee. Often combined with LRU: TTL for correctness, LRU for capacity management.
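The doubly-linked-list-plus-hash-map structure behind O(1) LRU can be sketched with the standard library's container/list. This is a minimal, single-threaded sketch; a production cache would add locking and generic value types:

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is stored in each list node so eviction can find the map key.
type entry struct {
	key   string
	value int
}

// LRUCache: hash map for O(1) lookup + doubly-linked list for O(1)
// recency updates and eviction. Front = most recently used.
type LRUCache struct {
	capacity int
	ll       *list.List
	items    map[string]*list.Element
}

func NewLRUCache(capacity int) *LRUCache {
	return &LRUCache{
		capacity: capacity,
		ll:       list.New(),
		items:    make(map[string]*list.Element),
	}
}

func (c *LRUCache) Get(key string) (int, bool) {
	if el, ok := c.items[key]; ok {
		c.ll.MoveToFront(el) // mark as most recently used
		return el.Value.(*entry).value, true
	}
	return 0, false
}

func (c *LRUCache) Put(key string, value int) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.ll.MoveToFront(el)
		return
	}
	if c.ll.Len() >= c.capacity {
		// Evict the least recently used entry (back of the list).
		oldest := c.ll.Back()
		c.ll.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
	c.items[key] = c.ll.PushFront(&entry{key, value})
}

func main() {
	c := NewLRUCache(2)
	c.Put("a", 1)
	c.Put("b", 2)
	c.Get("a")    // touch "a", so "b" becomes least recently used
	c.Put("c", 3) // capacity exceeded: evicts "b"
	_, ok := c.Get("b")
	fmt.Println(ok) // false
}
```

Both Get and Put touch only one map entry and one list node, which is what keeps every operation O(1).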

// Cache-aside pattern (most common)
func getUser(ctx context.Context, userID string) (*User, error) {
    cacheKey := "user:" + userID

    // 1. Check cache first. redis.Nil means a miss; any other error
    // (e.g. Redis unavailable) also falls through to the database.
    cached, err := redis.Get(ctx, cacheKey).Result()
    if err == nil {
        var user User
        if err := json.Unmarshal([]byte(cached), &user); err == nil {
            return &user, nil // cache hit
        }
        // Corrupt cache entry: fall through and reload from the database.
    }

    // 2. Cache miss: load from database
    user, err := db.GetUser(ctx, userID)
    if err != nil { return nil, err }

    // 3. Populate cache with TTL (best effort: if Set fails, the only
    // cost is a cache miss on the next read)
    serialized, _ := json.Marshal(user)
    redis.Set(ctx, cacheKey, serialized, 5*time.Minute)

    return user, nil
}

// Cache invalidation on update
func updateUser(ctx context.Context, user *User) error {
    if err := db.UpdateUser(ctx, user); err != nil { return err }
    // Invalidate cache: the next read reloads from the DB. Deleting
    // (rather than setting the new value) avoids racing with concurrent
    // writers; if Del fails, readers see stale data until the TTL expires.
    redis.Del(ctx, "user:"+user.ID)
    return nil
}

Cache Stampede Prevention

Cache stampede (thundering herd): a popular cache entry expires; simultaneously, thousands of requests find a cache miss and all query the database at once, overwhelming it. Prevention strategies:

  • Probabilistic early expiry: before a key expires, proactively regenerate it. Each request computes a probability of early refresh based on the time remaining and the cost of the fetch; one request refreshes while the others keep using the about-to-expire value.
  • Distributed lock on cache miss: on a miss, acquire a distributed lock before querying the database; subsequent requests that miss the cache wait for the lock holder to populate the cache, then read from it.
  • Background refresh: serve stale cached data immediately while a background job refreshes the cache asynchronously.
  • Jittered TTLs: add random jitter to TTLs (TTL = base ± 10-20%) so many related keys do not all expire in sync.

Redis Cluster Architecture

Redis Cluster shards data across multiple primary nodes using hash slots (16384 total). Each key hashes to a slot (CRC16(key) mod 16384), and each primary owns a range of slots. When a client sends a command to a node that does not own the key's slot, that node replies with a MOVED redirect and the client retries against the correct node (smart clients cache the slot map to avoid redirects). Each primary has one or more replicas for failover: if a primary fails, a replica is promoted within seconds. Nodes exchange gossip messages (the cluster cron runs roughly every 100ms) about slot assignments, node health, and the config epoch (configuration version); cluster topology propagates to all nodes this way. Hash tags: in keys like {user:123}:profile and {user:123}:settings, only the content inside {} is hashed, so both keys land on the same slot, which enables multi-key commands and Lua scripts across related keys.
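The slot computation and the hash-tag rule can be sketched directly. Redis uses the CRC-16/XMODEM variant (polynomial 0x1021, initial value 0); this is an illustrative reimplementation, not code from the Redis source:

```go
package main

import (
	"fmt"
	"strings"
)

// crc16 implements CRC-16/XMODEM, the variant Redis Cluster uses
// for key hashing.
func crc16(data []byte) uint16 {
	var crc uint16
	for _, b := range data {
		crc ^= uint16(b) << 8
		for i := 0; i < 8; i++ {
			if crc&0x8000 != 0 {
				crc = crc<<1 ^ 0x1021
			} else {
				crc <<= 1
			}
		}
	}
	return crc
}

// keySlot maps a key to one of the 16384 hash slots. If the key contains
// a non-empty hash tag {...}, only the tag content is hashed (first '{'
// and the first '}' after it; an empty "{}" hashes the whole key).
func keySlot(key string) uint16 {
	if start := strings.IndexByte(key, '{'); start >= 0 {
		if end := strings.IndexByte(key[start+1:], '}'); end > 0 {
			key = key[start+1 : start+1+end]
		}
	}
	return crc16([]byte(key)) % 16384
}

func main() {
	a := keySlot("{user:123}:profile")
	b := keySlot("{user:123}:settings")
	fmt.Println(a == b) // true: both keys hash only "user:123"
}
```

Because both keys resolve to the same slot, a MULTI/EXEC transaction or Lua script touching both will not be rejected with a CROSSSLOT error.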

Key Interview Discussion Points

  • Write-through vs write-behind: write-through updates cache and database synchronously on every write (consistent but slower writes); write-behind (write-back) updates cache and asynchronously writes to database (faster writes, risk of data loss if cache fails before flush)
  • Cache warming: after a cold start (new cache cluster, cache flush), traffic spikes hit the database as the cache is empty; warm the cache by pre-loading popular keys or using a lazy warm-up period with reduced load
  • Hotkey problem: a single very popular key (celebrity profile, trending item) can overload one Redis node; solutions: replicate the hot key to multiple nodes and read-balance, or use local in-process caches (Caffeine, Guava) to reduce Redis requests for the hottest keys
  • Memcached vs Redis: Memcached is multi-threaded, simple key-value only, no persistence: optimal for pure caching of large homogeneous objects. Redis executes commands on a single thread (with optional I/O threads since 6.0), offers rich data structures (lists, sets, sorted sets, streams) and optional persistence: better for complex data patterns and features beyond pure caching
  • Cache consistency: cache-aside with TTL accepts eventual consistency (stale for up to TTL); for stronger consistency, use write-through or event-driven invalidation (database CDC triggers cache delete on row change)