System Design: Distributed Cache (Memcached) — Slab Allocation, Consistent Hashing, Hot Keys, Thundering Herd

Memcached is a high-performance, distributed memory caching system used by Facebook, Twitter, and Wikipedia to reduce database load and accelerate response times. While Redis has overtaken Memcached in popularity, understanding Memcached architecture provides deep insight into distributed cache design: memory management, consistent hashing, hot key mitigation, and cache cluster operations. This guide covers these internals — essential for system design interviews involving caching at scale.

Memcached Architecture

Memcached is a distributed hash table spread across multiple servers. There is no inter-server communication — each server is independent. The client library handles distribution: it hashes the key to determine which server stores it, sends the request directly to that server, and handles failover. This shared-nothing architecture makes Memcached horizontally scalable: adding a server increases total cache capacity proportionally. Each Memcached server is a simple key-value store holding data in memory. Maximum value size: 1 MB. No data structures beyond strings (unlike Redis). No persistence — all data is lost on restart. No replication between servers. Operations: GET (retrieve by key), SET (store), DELETE (remove), INCR/DECR (atomic counter operations), and CAS (compare-and-swap for optimistic concurrency). The simplicity is the strength: Memcached achieves 200,000-500,000 operations per second per server with sub-millisecond latency because it does only one thing — in-memory key-value storage — with minimal overhead.

Slab Allocation: Memory Management

Memcached uses slab allocation to avoid memory fragmentation. Instead of malloc/free for each item (which causes fragmentation over time), memory is pre-allocated in slabs. A slab is a 1 MB chunk of memory divided into fixed-size slots (chunks). Slab classes: each class has a different chunk size. Class 1: 96-byte chunks. Class 2: 120-byte chunks. Class 3: 152-byte chunks (each class is ~1.25x the previous). Up to ~40 classes covering sizes up to 1 MB. When an item is stored: Memcached picks the smallest slab class that fits the item (key + value + metadata). The item is stored in a free chunk of that class. If no free chunk exists, LRU eviction removes the least recently used item within that slab class. Slab calcification problem: if your workload shifts (initially many small items, later many large items), the memory allocated to small-slab classes cannot be reused by large-slab classes. Memory is “calcified” in the wrong distribution. Memcached slab automover (newer versions) detects imbalanced slab usage and reassigns slabs between classes. Monitoring: the stats slabs command shows per-class usage, eviction counts, and free chunks — essential for identifying slab imbalance.

Client-Side Consistent Hashing

The Memcached client distributes keys across servers using consistent hashing. Each server is placed at multiple points on a hash ring (virtual nodes). A key is hashed to a position on the ring and assigned to the nearest server clockwise. Adding a server: only ~1/N of keys remap (from the adjacent server). Without consistent hashing (modular hashing: server = hash(key) % N), adding a server remaps ~80% of keys — causing a massive cache miss storm. Failover: if a server goes down, the client library detects it (connection failure) and remaps its keys to the next server on the ring. The failed server keys experience cache misses (fetched from the database and cached on the new server). The rest of the cluster is unaffected. Rehashing storm mitigation: when a server fails, its keys suddenly hit the database. If the failed server held popular keys, this can overwhelm the database. Mitigation: use a gutter pool — a small set of spare Memcached servers that temporarily absorb traffic from the failed server. Facebook uses this approach: instead of remapping to the next ring server (which may also become overloaded), remap to a dedicated gutter server with short TTLs.

Hot Keys and Thundering Herd

Hot key problem: a single key receives disproportionate traffic (a viral tweet, a popular product listing). Since Memcached maps each key to one server, that server becomes a bottleneck. Solutions: (1) Client-side replication — store the hot key on multiple servers with a suffix: hot_key_1, hot_key_2, …, hot_key_R. The client randomly picks one replica for each read. Distributes load across R servers. (2) Local caching — cache extremely hot keys in the application process memory (L1 cache) with a very short TTL (1-5 seconds). Requests are served from local memory without any network call. (3) Lease-based protection (Facebook approach) — when a cache miss occurs, Memcached issues a lease (a token) to the requesting client. Only the client with the lease fetches from the database and sets the cache. Other clients requesting the same key during this period receive a “wait” signal and retry after a short delay (e.g., 10ms). This prevents the thundering herd: instead of 1000 concurrent database queries for the same key, only 1 query is made. The lease approach is the most effective thundering herd mitigation at scale.

Facebook Memcached at Scale

Facebook published a seminal paper on scaling Memcached to billions of requests per second. Key insights: (1) Demand-filled cache — caches are populated lazily. A web server reads from Memcached; on miss, it reads from the database and sets the cache. On database writes, the web server invalidates the cache (delete, not set — avoids race conditions). (2) Reducing latency — batch GET requests using multi-get (fetch 100+ keys in one round-trip). Use UDP for GET requests (lower overhead than TCP for small payloads). Use TCP for SET/DELETE (reliability required). (3) Regions and replication — each datacenter region has its own Memcached cluster. Cross-region invalidation: when a database write occurs in the primary region, an invalidation daemon broadcasts delete commands to Memcached in all regions. This ensures eventual consistency with low latency (local reads, cross-region invalidation). (4) Cold cluster warm-up — when deploying a new Memcached cluster, it starts with a 0% hit rate. The “cold cluster” feature allows the new cluster to fetch from an existing warm cluster on cache miss (instead of hitting the database), rapidly warming the new cache. These techniques enabled Facebook to serve 2+ billion Memcached requests per second across their infrastructure.

Scroll to Top