Cache System Low-Level Design (LRU, Write Policies, Redis Cluster)

Why Caching?

A cache stores copies of frequently-accessed data in fast storage (RAM) to avoid expensive re-computation or round trips to slower storage (DB, external API). Rule of thumb: if a read takes > 10ms and happens more than 100 times/second, cache it. Caching reduces DB load, cuts p99 latency, and enables horizontal scaling without scaling the database.

Cache Eviction Policies

  • LRU (Least Recently Used): evict the item accessed least recently. Best for general-purpose caches where recently accessed items are more likely to be accessed again. Implemented with a doubly-linked list + hashmap (O(1) get/put). Used by Redis default.
  • LFU (Least Frequently Used): evict the item accessed fewest times. Better than LRU when access frequency is a better predictor than recency (e.g., popular items stay in cache even if not accessed in the last minute). Harder to implement efficiently.
  • FIFO: evict the oldest item. Simple but ignores access patterns.
  • TTL (Time-to-Live): evict items after a fixed expiry. Used for data with a natural staleness bound (session tokens, DNS records, rate limit counters).

LRU Cache Implementation (LC 146)

from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.cap = capacity
        self.cache = OrderedDict()  # maintains insertion/access order

    def get(self, key):
        if key not in self.cache: return -1
        self.cache.move_to_end(key)  # mark as recently used
        return self.cache[key]

    def put(self, key, value):
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.cap:
            self.cache.popitem(last=False)  # evict least recently used (front)

Cache Write Policies

  • Write-through: write to cache AND DB synchronously. Cache is always consistent with DB. Slower writes (wait for both). Good for read-heavy with important consistency (user profile).
  • Write-behind (write-back): write to cache immediately, write to DB asynchronously. Faster writes. Risk: data loss if cache fails before DB write. Good for counters, analytics.
  • Cache-aside (lazy loading): app reads from cache; on miss, reads from DB and populates cache. App writes directly to DB, invalidates/updates cache. Most flexible. Cache may serve stale data between write and invalidation.
  • Read-through: cache sits in front of DB. On miss, cache fetches from DB. App only talks to cache. Simpler app logic; cache must understand the data model.

Cache Invalidation Strategies

Cache invalidation is notoriously hard. Options:

  1. TTL expiry: simplest. Cache is eventually consistent — staleness bounded by TTL. No coordination needed.
  2. Event-driven invalidation: on DB write, publish an invalidation event (Kafka); cache consumers delete the key. Near-real-time consistency. Requires event infrastructure.
  3. Write-through: update cache on every DB write. Always consistent, but every write touches the cache (acceptable for low write volume).

Best practice: combine TTL (safety net for missed events) with event-driven invalidation (near-real-time) for important caches.

Distributed Cache Architecture

App servers → Redis Cluster (consistent hashing across N nodes)
              Each node: primary + 1 replica (Redis Sentinel / Redis Cluster)
              Consistent hashing: add/remove nodes with minimal key redistribution

Consistent hashing places nodes on a virtual ring. Each key hashes to a position on the ring; it is stored on the next clockwise node. Adding a node only moves keys from the successor node. Removing a node moves its keys to the successor. Minimizes cache misses during cluster scaling.

Cache Stampede (Thundering Herd)

When a popular key expires, thousands of requests all miss the cache simultaneously and hammer the DB. Solutions: (1) Probabilistic early expiration: before the TTL expires, proactively refresh the cache with a small probability (increases with proximity to expiry). (2) Mutex lock on miss: only one request fetches from DB; others wait or return slightly stale data. (3) Background refresh: a background job refreshes popular keys before they expire.

Key Design Decisions

  • LRU for general-purpose caches; TTL for time-bounded data
  • Cache-aside for most web applications — flexible, easy to reason about
  • Redis Cluster with consistent hashing for distributed horizontal scaling
  • TTL + event-driven invalidation combined — TTL as backstop, events for freshness
  • Mutex or probabilistic refresh to prevent cache stampede on popular keys

Uber system design covers distributed caching and Redis. See common questions for Uber interview: caching and Redis system design.

Amazon system design covers caching layers and distributed cache. Review patterns for Amazon interview: cache system design and DynamoDB.

Databricks system design covers caching and data access patterns. See design patterns for Databricks interview: caching and data layer design.

Scroll to Top