Caching is one of the highest-leverage performance optimizations available — serving data from memory (nanoseconds to microseconds) instead of disk or network (milliseconds). But every caching decision involves tradeoffs: consistency vs. performance, simplicity vs. flexibility, memory cost vs. latency savings. Cache-aside, write-through, write-behind, read-through, and refresh-ahead are the five foundational patterns. Beyond patterns, cache invalidation, eviction policies, and consistency guarantees determine whether a cache helps or creates subtle bugs. Facebook, Twitter, and Netflix have each published detailed analyses of how they use caching at scale.
The Five Caching Patterns
- Cache-aside (lazy loading): the application checks the cache first; on a miss it loads from the DB and populates the cache. The application controls population, and the system is resilient to cache failure (it falls back to the DB). The cache may serve stale data for up to the TTL after a write.
- Read-through: the cache sits in front of the DB; all reads go through the cache, which loads from the DB transparently on a miss. Simpler application code; recently read keys are always present.
- Write-through: every write goes to the cache AND the DB synchronously. The cache stays consistent with the DB, at the cost of higher write latency (two writes per operation).
- Write-behind (write-back): writes go to the cache only; the cache flushes to the DB asynchronously. Lowest write latency, but data is lost if the cache fails before flushing.
- Refresh-ahead: the cache proactively refreshes entries before they expire, based on predicted access patterns. No cache-miss latency for popular items, but resources are wasted refreshing rarely accessed entries.
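The write paths are where these patterns differ most. A minimal sketch contrasting write-through and write-behind, using in-memory maps as stand-ins for the cache and the database (the `Store` type, its channel-based queue, and `Flush` are hypothetical illustrations, not a library API):

```go
package main

import (
	"fmt"
	"sync"
)

// Store is an illustrative stand-in: one map plays the cache,
// another plays the database.
type Store struct {
	mu    sync.Mutex
	cache map[string]string
	db    map[string]string
	queue chan [2]string // pending write-behind flushes
}

func NewStore() *Store {
	return &Store{
		cache: map[string]string{},
		db:    map[string]string{},
		queue: make(chan [2]string, 100),
	}
}

// WriteThrough updates cache and DB synchronously,
// so the DB is always consistent with the cache.
func (s *Store) WriteThrough(k, v string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.cache[k] = v
	s.db[k] = v
}

// WriteBehind updates only the cache on the hot path;
// the DB write is queued and applied later by Flush.
func (s *Store) WriteBehind(k, v string) {
	s.mu.Lock()
	s.cache[k] = v
	s.mu.Unlock()
	s.queue <- [2]string{k, v} // lost if the process dies before Flush
}

// Flush drains queued writes into the DB
// (normally a background goroutine on a timer).
func (s *Store) Flush() {
	for {
		select {
		case kv := <-s.queue:
			s.mu.Lock()
			s.db[kv[0]] = kv[1]
			s.mu.Unlock()
		default:
			return
		}
	}
}

func main() {
	s := NewStore()
	s.WriteThrough("a", "1")
	s.WriteBehind("b", "2")
	fmt.Println(s.db["a"] == "1", s.db["b"] == "") // write-behind not yet durable
	s.Flush()
	fmt.Println(s.db["b"] == "2") // durable after flush
}
```

The gap between `WriteBehind` returning and `Flush` running is exactly the data-loss window the pattern trades for latency.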
// Multi-level caching (L1: local, L2: Redis, L3: DB)
// Used by Facebook (TAO), Twitter, and large-scale systems
import (
    "context"
    "database/sql"
    "sync"
    "time"

    "github.com/redis/go-redis/v9"
)

type MultiLevelCache struct {
    l1    *sync.Map     // in-process cache (nanoseconds, limited size)
    l2    *redis.Client // distributed Redis (microseconds, shared)
    db    *sql.DB       // database (milliseconds)
    l1TTL time.Duration // short TTL to limit staleness (enforced by an eviction loop, omitted here)
    l2TTL time.Duration
}

func (c *MultiLevelCache) Get(ctx context.Context, key string) (string, error) {
    // L1: in-process cache (fastest, no network)
    if val, ok := c.l1.Load(key); ok {
        return val.(string), nil
    }
    // L2: distributed cache (fast, shared across instances)
    if val, err := c.l2.Get(ctx, key).Result(); err == nil {
        c.l1.Store(key, val) // backfill L1
        return val, nil
    }
    // L3: database (slowest, authoritative)
    var val string
    err := c.db.QueryRowContext(ctx, "SELECT data FROM items WHERE id = $1", key).Scan(&val)
    if err != nil {
        return "", err
    }
    // Populate both cache levels on the way back up
    c.l2.Set(ctx, key, val, c.l2TTL)
    c.l1.Store(key, val)
    return val, nil
}
Cache Invalidation Strategies
Cache invalidation is the hardest part of caching. Strategies:
- TTL expiry: entries expire after a fixed duration. Simple, but trades staleness (long TTL) against cache miss rate (short TTL). Good for slowly changing data (user profiles, product catalogs).
- Event-driven invalidation: when data changes in the DB, publish an invalidation event; cache consumers delete or refresh the affected key. Lower staleness, but requires reliable event delivery (use CDC/Kafka, not in-process pub/sub that can silently drop events).
- Write-through invalidation: delete the cache key immediately when writing to the DB, so the next read fetches fresh data. Simple, but causes a cache-miss storm after writes to popular keys.
- Tag-based invalidation: tag cache entries with related entity IDs; when an entity changes, invalidate all entries carrying its tag. Useful when one write affects multiple derived cache entries (an author update invalidates all their cached posts).
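Tag-based invalidation is the least obvious of these, so here is a minimal sketch: a reverse index from tag to keys lets one entity change sweep out every derived entry. The `TaggedCache` type and method names are illustrative, not from any library:

```go
package main

import (
	"fmt"
	"sync"
)

// TaggedCache keeps a reverse index (tag -> set of keys) so that
// invalidating a tag drops every entry that carries it.
type TaggedCache struct {
	mu      sync.Mutex
	entries map[string]string
	byTag   map[string]map[string]struct{}
}

func NewTaggedCache() *TaggedCache {
	return &TaggedCache{
		entries: map[string]string{},
		byTag:   map[string]map[string]struct{}{},
	}
}

func (c *TaggedCache) Set(key, val string, tags ...string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[key] = val
	for _, t := range tags {
		if c.byTag[t] == nil {
			c.byTag[t] = map[string]struct{}{}
		}
		c.byTag[t][key] = struct{}{}
	}
}

func (c *TaggedCache) Get(key string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	v, ok := c.entries[key]
	return v, ok
}

// InvalidateTag removes every entry tagged with t in one sweep,
// e.g. one author update invalidating all their cached posts.
func (c *TaggedCache) InvalidateTag(t string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for key := range c.byTag[t] {
		delete(c.entries, key)
	}
	delete(c.byTag, t)
}

func main() {
	c := NewTaggedCache()
	c.Set("post:1", "hello", "author:42")
	c.Set("post:2", "world", "author:42")
	c.Set("post:3", "other", "author:7")
	c.InvalidateTag("author:42")
	_, ok1 := c.Get("post:1")
	_, ok3 := c.Get("post:3")
	fmt.Println(ok1, ok3) // false true
}
```

The cost is memory for the reverse index and extra bookkeeping on every Set; the benefit is a single O(entries-per-tag) invalidation instead of enumerating derived keys by hand.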
Local vs. Distributed Cache Tradeoffs
- Local (in-process) cache: fastest possible access (often served from L1/L2 CPU cache, nanoseconds), no serialization overhead, no network hop. Disadvantages: not shared across instances (each server has its own cache, so a write on server A does not invalidate server B's cache), limited by a single process's memory, and lost on process restart. Suitable for immutable reference data (country codes, config), computed results that are expensive to recalculate, and request-scoped caching within a single request.
- Distributed cache (Redis, Memcached): shared across all instances (one consistent view across the fleet), survives process restarts (Redis with persistence), and supports larger datasets. Disadvantages: a network round-trip (~0.5 ms within the same datacenter), serialization overhead, and operational complexity.
- Optimal strategy: layer both (Caffeine/Guava as the in-process L1, Redis as the distributed L2), with short L1 TTLs to bound staleness.
Key Interview Discussion Points
- Cache penetration: malicious requests for keys that never exist cause every request to reach the database (cache never populated); mitigate by caching null/not-found responses with a short TTL, or using a Bloom filter to reject requests for non-existent keys before they hit the DB
- Cache breakdown: a popular key expires and a flood of concurrent requests all hit the database simultaneously; mitigate with mutex-based single-flight (only one request fetches from DB, others wait), probabilistic early expiry, or background refresh
- Cache avalanche: many keys expire simultaneously (e.g., all populated at startup with the same TTL); mitigate with jittered TTLs (TTL = base + rand(0, jitter)), warming the cache before expiry, or staggering key creation
- Read-your-writes consistency: after a write, the next read should reflect the write even if the write is not yet in cache; route reads to the primary database for a brief window after a write, or use write-through to immediately update the cache
- Cache capacity planning: each cached item costs key size + value size + per-key overhead (~60-100 bytes in Redis); for a cache of 10M user profiles at 1KB each: 10GB of values + ~1GB of overhead = ~11GB — plan capacity with eviction in mind (LRU evicts the least-recently-used entries once the cache is full)