Caching is the first answer to scaling — but designing a correct, consistent caching layer requires careful thought about invalidation, eviction, failure modes, and hot key problems. This deep dive goes beyond “add Redis” to cover the actual design decisions.
Cache Strategies
"""
Cache-Aside (Lazy Loading) — most common:
Read: check cache → miss → read DB → write cache → return
Write: write DB → invalidate (or update) cache
Pros: only caches what is actually read; resilient (cache fails gracefully)
Cons: cold start misses; stale data risk; cache-DB inconsistency window
Write-Through:
Write: write cache → write DB (synchronously)
Pros: cache always fresh; read always hits
Cons: write latency (two writes); cache may store data never read
Write-Behind (Write-Back):
Write: write cache immediately → DB write is async (batched/delayed)
Pros: very low write latency; batching reduces DB load
Cons: data loss risk if cache crashes before DB write; complex implementation
Read-Through:
Cache is in front of DB; cache handles DB reads on miss (not application)
Application only talks to cache; cache fetches from DB on miss
Good when using a cache service that supports it (e.g., Amazon DynamoDB Accelerator, DAX)
Refresh-Ahead:
Proactively refresh entries before TTL expires (predict what will be needed)
Pros: no cold misses for popular items
Cons: may refresh items that won't be read again; complexity
"""
```python
import redis
import json
import time
from typing import Optional, Callable, TypeVar

T = TypeVar("T")

class CacheAside:
    def __init__(self, redis_client: redis.Redis, default_ttl: int = 3600):
        self.cache = redis_client
        self.default_ttl = default_ttl

    def get(self, key: str, fetch_fn: Callable[[], T], ttl: Optional[int] = None) -> T:
        """
        Cache-aside with dogpile (thundering herd) prevention.
        Uses a short lock so only one caller refreshes a missing key.
        """
        ttl = ttl or self.default_ttl
        raw = self.cache.get(key)
        if raw is not None:
            return json.loads(raw)
        # Cache miss: try to acquire a short lock before fetching
        lock_key = f"lock:{key}"
        acquired = self.cache.set(lock_key, "1", nx=True, ex=5)  # 5s lock
        if acquired:
            # We hold the lock: fetch and populate cache
            try:
                data = fetch_fn()
                self.cache.setex(key, ttl, json.dumps(data))
                return data
            finally:
                self.cache.delete(lock_key)
        else:
            # Another thread is fetching: wait briefly and retry
            time.sleep(0.05)
            raw = self.cache.get(key)
            if raw is not None:
                return json.loads(raw)
            return fetch_fn()  # Fallback: fetch without cache

    def invalidate(self, key: str):
        self.cache.delete(key)

    def invalidate_pattern(self, pattern: str):
        """Invalidate all keys matching a pattern. Use cautiously."""
        for key in self.cache.scan_iter(match=pattern, count=100):
            self.cache.delete(key)
```
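The cache-aside class handles the read path; write-through inverts the tradeoff by paying at write time. A minimal sketch for contrast — `db_write` is a hypothetical callable standing in for the real database layer, and the client only needs a `setex` method:

```python
import json

class WriteThroughCache:
    """Writes go to the DB and the cache synchronously, so reads stay fresh."""

    def __init__(self, redis_client, db_write, ttl: int = 3600):
        self.cache = redis_client  # anything with setex(key, ttl, value)
        self.db_write = db_write   # hypothetical callable: db_write(key, value)
        self.ttl = ttl

    def put(self, key: str, value):
        # DB first: if the DB write fails, the cache never holds data the
        # database doesn't; if the cache write fails, reads fall back to DB.
        self.db_write(key, value)
        self.cache.setex(key, self.ttl, json.dumps(value))
```

Note the ordering choice: writing the database before the cache trades a brief staleness window for never caching an unpersisted value.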
Redis vs Memcached
| Feature | Redis | Memcached |
|---|---|---|
| Data types | String, List, Set, Sorted Set, Hash, Stream, Bitmap, HLL | String only |
| Persistence | RDB snapshots + AOF log | None (memory only) |
| Replication | Primary-replica + Sentinel + Cluster | No built-in (client-side sharding) |
| Pub/Sub | Yes | No |
| Lua scripting | Yes (atomic multi-op) | No |
| Multi-threading | I/O multi-threaded (Redis 6+); commands single-threaded | Fully multi-threaded |
| Memory efficiency | Slightly less (overhead per key) | Better for simple string KV |
| Use when | Rich data structures, persistence, Lua atomicity | Pure caching, maximum throughput, simple values |
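The Lua row is worth making concrete: Redis runs a script atomically server-side, so a read-modify-write like a rate-limit counter needs no client-side locking — something Memcached cannot do. A sketch (key format and limits are illustrative; `client` is assumed to be a `redis.Redis` instance, whose `eval` signature is `eval(script, numkeys, *keys_and_args)`):

```python
# INCR and EXPIRE execute as one atomic unit on the server:
# no other command can interleave between them.
RATE_LIMIT_LUA = """
local current = redis.call('INCR', KEYS[1])
if current == 1 then
    redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return current
"""

def allow_request(client, user_id: str, limit: int = 100, window_s: int = 60) -> bool:
    """Fixed-window rate limit: allow at most `limit` calls per window."""
    count = client.eval(RATE_LIMIT_LUA, 1, f"rate:{user_id}", window_s)
    return count <= limit
```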
Eviction Policies and Cache Sizing
"""
Redis eviction policies (set via maxmemory-policy):
noeviction: Return error when full. Use when losing data is unacceptable.
allkeys-lru: Evict least recently used key from all keys. Good default.
volatile-lru: Evict LRU key from keys with TTL set. For mixed TTL/non-TTL.
allkeys-lfu: Evict least frequently used. Better for Zipf-distributed access.
volatile-lfu: LFU from keys with TTL.
allkeys-random: Random eviction. Almost never right.
volatile-ttl: Evict key with shortest remaining TTL. Predictable expiry.
Choosing eviction policy:
Hot-key workload (few keys accessed often): LFU (keeps hot keys in cache)
General cache (mixed access): LRU
Bounded cache (fixed TTL keys): volatile-lru or volatile-ttl
Cache sizing formula:
Memory needed ≈ total_items * hit_ratio_target * avg_item_size
(size the cache to hold the active fraction that serves your target hit rate)
Example: 10M user profiles, avg 2KB each, target 80% hit rate
Active dataset = 10M * 0.80 = 8M profiles
Memory = 8M * 2KB = 16GB of cache
Rule of thumb: cache the working set (20% of data = 80% of reads, Pareto).
"""
```python
# LRU cache implementation (for interviews) — O(1) get and put
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.cap = capacity
        self.cache = OrderedDict()

    def get(self, key: int) -> int:
        if key not in self.cache:
            return -1
        self.cache.move_to_end(key)  # Mark as recently used
        return self.cache[key]

    def put(self, key: int, value: int):
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.cap:
            self.cache.popitem(last=False)  # Remove least recently used
```
```python
# LFU cache — O(1) using frequency buckets
from collections import OrderedDict, defaultdict

class LFUCache:
    def __init__(self, capacity: int):
        self.cap = capacity
        self.min_freq = 0
        self.key_val = {}   # key -> value
        self.key_freq = {}  # key -> frequency
        self.freq_keys = defaultdict(OrderedDict)  # freq -> {key: None} (insertion order)

    def get(self, key: int) -> int:
        if key not in self.key_val:
            return -1
        self._increment_freq(key)
        return self.key_val[key]

    def put(self, key: int, value: int):
        if self.cap <= 0:
            return
        if key in self.key_val:
            # Existing key: update value and bump frequency
            self.key_val[key] = value
            self._increment_freq(key)
            return
        if len(self.key_val) >= self.cap:
            # Evict LFU key; ties broken by LRU within the lowest-freq bucket
            evict_key, _ = self.freq_keys[self.min_freq].popitem(last=False)
            del self.key_val[evict_key]
            del self.key_freq[evict_key]
        self.key_val[key] = value
        self.key_freq[key] = 1
        self.freq_keys[1][key] = None
        self.min_freq = 1

    def _increment_freq(self, key: int):
        freq = self.key_freq[key]
        del self.freq_keys[freq][key]
        if not self.freq_keys[freq] and freq == self.min_freq:
            self.min_freq += 1
        self.key_freq[key] = freq + 1
        self.freq_keys[freq + 1][key] = None
```
Hot Key Problem and Solutions
"""
Hot key: a small number of cache keys receive disproportionate traffic.
Example: a viral tweet ID or celebrity user profile → 100k reads/sec
on a single Redis node that handles 50k ops/sec total.
Solutions:
1. Local in-process cache:
App server maintains a small (1000-key) in-memory cache.
Hot keys served from local cache → zero Redis calls.
Stale for up to a few seconds; fine for most use cases.
2. Read-through replicas:
Redis cluster with read replicas. Route hot key reads to replicas.
Load-balance reads: consistent hashing with virtual nodes.
3. Key sharding / fan-out:
Split hot key into N variants: user:123:shard:0 .. user:123:shard:4
Writes update all N shards. Reads pick random shard.
Reduces per-shard load by N.
4. Cache-local deduplication:
At L7 proxy / edge: coalesce concurrent identical requests.
Only one backend request per hot key per time window.
"""
```python
import json
import random
import threading
import time

class LocalL1Cache:
    """Process-local L1 cache to absorb hot-key traffic before hitting Redis."""

    def __init__(self, max_size: int = 1000, ttl_s: float = 1.0):
        self._cache = {}  # key -> (value, expires_at)
        self._lock = threading.Lock()
        self._max = max_size
        self._ttl = ttl_s

    def get(self, key: str):
        with self._lock:
            entry = self._cache.get(key)
            if entry and entry[1] > time.monotonic():
                return entry[0]
            return None

    def set(self, key: str, value):
        with self._lock:
            if len(self._cache) >= self._max:
                # Simple eviction: remove a random 10% of entries
                remove = random.sample(list(self._cache), self._max // 10)
                for k in remove:
                    del self._cache[k]
            self._cache[key] = (value, time.monotonic() + self._ttl)

class TwoLevelCache:
    """L1: local in-process. L2: Redis. Reduces hot-key pressure."""

    def __init__(self, redis_client, l1_ttl: float = 1.0):
        self.l1 = LocalL1Cache(ttl_s=l1_ttl)
        self.l2 = redis_client

    def get(self, key: str, fetch_fn, ttl: int = 3600):
        # L1 hit
        val = self.l1.get(key)
        if val is not None:
            return val
        # L2 hit
        raw = self.l2.get(key)
        if raw:
            val = json.loads(raw)
            self.l1.set(key, val)
            return val
        # Both miss: fetch from origin
        val = fetch_fn()
        self.l2.setex(key, ttl, json.dumps(val))
        self.l1.set(key, val)
        return val
```
Frequently Asked Questions
What is the difference between cache-aside and write-through caching?
Cache-aside (lazy loading): the application checks the cache first; on miss, reads from DB and populates the cache. Simple, resilient (cache failure degrades gracefully), but cold-start misses are expensive and data can become stale. Write-through: every write goes to the cache AND the database synchronously. Cache always has fresh data, no cold start for recently written keys, but write latency doubles and you cache data that may never be read. Cache-aside is more common for read-heavy workloads; write-through for read-after-write consistency requirements.
What is the cache stampede (thundering herd) problem and how do you prevent it?
Cache stampede occurs when a popular cached key expires and many concurrent requests simultaneously miss the cache, all hitting the database at once. Prevention strategies: (1) mutex/lock — only one request refreshes the cache, others wait; (2) probabilistic early expiration — probabilistically refresh before TTL expires (XFetch algorithm); (3) background refresh — a separate thread refreshes keys before they expire; (4) stale-while-revalidate — serve the stale value immediately while refreshing asynchronously. The mutex approach is simplest but creates a bottleneck; background refresh is best for high-traffic keys.
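Strategy (2), probabilistic early expiration, is compact enough to show. A sketch of the XFetch idea — refresh with a probability that rises as expiry approaches and as the recompute gets more expensive (`beta` tunes eagerness; 1.0 is the usual starting point):

```python
import math
import random
import time

def should_refresh_early(expires_at: float, recompute_cost_s: float,
                         beta: float = 1.0) -> bool:
    """XFetch-style check: True means refresh now, before the TTL expires.

    -ln(U) for U ~ Uniform(0, 1) is a positive jitter that is usually small
    but occasionally large, so refreshes are spread out instead of all
    callers stampeding at the exact moment of expiry.
    """
    jitter = recompute_cost_s * beta * -math.log(random.random())
    return time.time() + jitter >= expires_at
```

Each request that gets a cache hit runs this check; the rare winner recomputes while everyone else keeps serving the cached value, so the key (ideally) never actually expires under load.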
When should you use Redis vs Memcached?
Choose Redis when: you need data structures beyond strings (lists, sets, sorted sets, hashes, streams), persistence (RDB snapshots or AOF), pub/sub messaging, Lua scripting, transactions (MULTI/EXEC), or TTL per key with complex eviction policies. Choose Memcached when: your use case is pure key-value caching, you need simpler horizontal scaling with consistent hashing, or you want lower memory overhead per key. Redis is the default choice for most modern systems — its additional features rarely hurt and often help. Memcached has an edge only at very large scale with simple get/set workloads.