Cache System Low-Level Design (LRU, Write Policies, Redis Cluster)

Why Caching?

A cache stores copies of frequently accessed data in fast storage (RAM) to avoid expensive re-computation or round trips to slower storage (DB, external API). Rule of thumb: if a read takes > 10ms and happens more than 100 times/second, cache it. Caching reduces DB load, cuts p99 latency, and lets the application tier scale horizontally without scaling the database.

Cache Eviction Policies

  • LRU (Least Recently Used): evict the item accessed least recently. Best for general-purpose caches where recently accessed items are likely to be accessed again. Implemented with a doubly-linked list + hashmap (O(1) get/put). Redis uses an approximated LRU (random key sampling) for its *-lru eviction policies.
  • LFU (Least Frequently Used): evict the item accessed fewest times. Better than LRU when access frequency is a better predictor than recency (e.g., popular items stay in cache even if not accessed in the last minute). Harder to implement efficiently.
  • FIFO: evict the oldest item. Simple but ignores access patterns.
  • TTL (Time-to-Live): evict items after a fixed expiry. Used for data with a natural staleness bound (session tokens, DNS records, rate limit counters).

LRU Cache Implementation (LC 146)

from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.cap = capacity
        self.cache = OrderedDict()  # maintains insertion/access order

    def get(self, key):
        if key not in self.cache: return -1
        self.cache.move_to_end(key)  # mark as recently used
        return self.cache[key]

    def put(self, key, value):
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.cap:
            self.cache.popitem(last=False)  # evict least recently used (front)
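
For contrast with the LRU code above: a minimal O(1) LFU sketch, using the standard frequency-bucket design with LRU tie-breaking inside each bucket. Illustrative, not production code.

```python
from collections import defaultdict, OrderedDict

class LFUCache:
    def __init__(self, capacity):
        self.cap = capacity
        self.vals = {}                            # key -> value
        self.freq = {}                            # key -> access count
        self.buckets = defaultdict(OrderedDict)   # count -> keys in LRU order
        self.min_freq = 0

    def _touch(self, key):
        """Promote key to the next frequency bucket."""
        f = self.freq[key]
        del self.buckets[f][key]
        if not self.buckets[f]:
            del self.buckets[f]
            if self.min_freq == f:
                self.min_freq = f + 1
        self.freq[key] = f + 1
        self.buckets[f + 1][key] = None

    def get(self, key):
        if key not in self.vals:
            return -1
        self._touch(key)
        return self.vals[key]

    def put(self, key, value):
        if self.cap <= 0:
            return
        if key in self.vals:
            self.vals[key] = value
            self._touch(key)
            return
        if len(self.vals) >= self.cap:
            # Evict the least recently used key among the least frequent.
            evict, _ = self.buckets[self.min_freq].popitem(last=False)
            if not self.buckets[self.min_freq]:
                del self.buckets[self.min_freq]
            del self.vals[evict]
            del self.freq[evict]
        self.vals[key] = value
        self.freq[key] = 1
        self.buckets[1][key] = None
        self.min_freq = 1
```

Every operation touches only hashmaps and the tail/head of an OrderedDict, so get and put stay O(1).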

Cache Write Policies

  • Write-through: write to cache AND DB synchronously. Cache is always consistent with DB. Slower writes (wait for both). Good for read-heavy with important consistency (user profile).
  • Write-behind (write-back): write to cache immediately, write to DB asynchronously. Faster writes. Risk: data loss if cache fails before DB write. Good for counters, analytics.
  • Cache-aside (lazy loading): app reads from cache; on miss, reads from DB and populates cache. App writes directly to DB, invalidates/updates cache. Most flexible. Cache may serve stale data between write and invalidation.
  • Read-through: cache sits in front of DB. On miss, cache fetches from DB. App only talks to cache. Simpler app logic; cache must understand the data model.

Cache Invalidation Strategies

Cache invalidation is notoriously hard. Options:

  1. TTL expiry: simplest. Cache is eventually consistent — staleness bounded by TTL. No coordination needed.
  2. Event-driven invalidation: on DB write, publish an invalidation event (Kafka); cache consumers delete the key. Near-real-time consistency. Requires event infrastructure.
  3. Write-through: update cache on every DB write. Always consistent, but every write touches the cache (acceptable for low write volume).

Best practice: combine TTL (safety net for missed events) with event-driven invalidation (near-real-time) for important caches.
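
A sketch of the combined approach. The 10% jitter figure and the shape of the invalidation event are illustrative assumptions, and a plain dict stands in for the cache client:

```python
import random

BASE_TTL = 300
cache = {}

def jittered_ttl(base=BASE_TTL, jitter=0.10):
    """TTL with +/-10% random jitter, so keys written together
    don't all expire in the same instant."""
    return base * (1 + random.uniform(-jitter, jitter))

def on_invalidation(event):
    """Event-driven path: a DB writer published {'key': ...}; drop the key.
    The TTL remains the backstop if this event is ever lost."""
    cache.pop(event["key"], None)   # idempotent: missing key is fine
```

In a real system on_invalidation would be a Kafka consumer callback; the TTL ceiling bounds staleness even when the event pipeline drops a message.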

Distributed Cache Architecture

App servers → Redis Cluster (keys partitioned across N nodes; Redis Cluster itself uses 16,384 fixed hash slots, a close cousin of consistent hashing)
              Each node: primary + 1 replica (failover via Redis Sentinel or Redis Cluster)
              Consistent hashing: add/remove nodes with minimal key redistribution

Consistent hashing places nodes on a virtual ring. Each key hashes to a position on the ring and is stored on the next node clockwise. Adding a node moves only the keys between it and its predecessor (taken from its successor); removing a node moves only its keys to the successor. This minimizes cache misses during cluster scaling.
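
The ring can be sketched with Python's bisect module. HashRing, the vnode count, and the md5 choice are illustrative (and, as noted above, Redis Cluster actually uses fixed hash slots rather than a ring):

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""
    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self.ring = []                  # sorted list of (hash, node)
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        # Each physical node appears at vnodes positions for even load.
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self.ring = [(h, n) for h, n in self.ring if n != node]

    def get_node(self, key):
        """A key is owned by the first vnode clockwise from its hash."""
        idx = bisect.bisect_left(self.ring, (self._hash(key),))
        if idx == len(self.ring):
            idx = 0                     # wrap around the ring
        return self.ring[idx][1]
```

Removing a node leaves every key owned by a surviving node exactly where it was, which is the whole point versus key % N.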

Cache Stampede (Thundering Herd)

When a popular key expires, thousands of requests all miss the cache simultaneously and hammer the DB. Solutions:

  • Probabilistic early expiration: before the TTL expires, proactively refresh the cache with a small probability that increases as expiry approaches.
  • Mutex lock on miss: only one request fetches from the DB; others wait or return slightly stale data.
  • Background refresh: a background job refreshes popular keys before they expire.
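
Probabilistic early expiration has a neat closed form (the XFetch formula from the "Optimal Probabilistic Cache Stampede Prevention" paper). A sketch, where recompute_cost is the measured time to rebuild the value and beta = 1.0 is the usual default:

```python
import math
import random
import time

def should_refresh_early(expires_at, recompute_cost, beta=1.0, now=None):
    """XFetch: refresh probabilistically before expiry. The exponential
    term makes a refresh very unlikely far from expiry and near-certain
    just before it; beta > 1 shifts refreshes earlier."""
    now = time.time() if now is None else now
    # 1 - random.random() is in (0, 1], so the log is always defined (<= 0).
    return now - recompute_cost * beta * math.log(1.0 - random.random()) >= expires_at
```

On a cache hit the caller checks should_refresh_early; if it fires, that one request recomputes and rewrites the key while everyone else keeps serving the still-valid value, so the expiry never arrives for hot keys.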

Key Design Decisions

  • LRU for general-purpose caches; TTL for time-bounded data
  • Cache-aside for most web applications — flexible, easy to reason about
  • Redis Cluster with consistent hashing for distributed horizontal scaling
  • TTL + event-driven invalidation combined — TTL as backstop, events for freshness
  • Mutex or probabilistic refresh to prevent cache stampede on popular keys
