Question 1

How does Memcached distribute keys across servers?

Accepted Answer

Memcached servers are independent: there is no inter-server communication. The client library handles distribution, typically with consistent hashing: each server is placed at multiple points on a hash ring (virtual nodes). A key is hashed to a ring position and assigned to the first server point clockwise from it. Adding a server to an N-server cluster remaps only about 1/(N+1) of keys (minimal disruption). With a plain modular hash (server = hash(key) % N), adding a server remaps about N/(N+1) of keys, for example ~80% when growing from 4 to 5 servers, causing a massive cache miss storm. Failover: if a server dies and is removed from the ring, its keys remap to the next servers on the ring. Those keys experience cache misses (fetched from the database, then cached on the new server). Keys on the other servers keep their placement. Facebook uses a gutter pool instead: a small set of spare servers that temporarily absorb traffic for failed servers, with short TTLs, so the main cluster is not overloaded by rehashed traffic.
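To make the remapping difference concrete, here is a minimal sketch of a hash ring with virtual nodes, compared against modular hashing when a fifth server is added. The names HashRing, modulo_server, and moved_fraction, the MD5-based ring position, and the choice of 100 virtual nodes are illustrative assumptions, not the behavior of any particular client library.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a 64-bit ring position (MD5 is used only for spread)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring; each server owns `vnodes` points."""

    def __init__(self, servers, vnodes=100):
        points = []
        for server in servers:
            for i in range(vnodes):
                points.append((_hash(f"{server}#{i}"), server))
        points.sort()
        self._points = points
        self._positions = [pos for pos, _ in points]

    def server_for(self, key: str) -> str:
        # First ring point clockwise from the key; wrap around at the end.
        idx = bisect.bisect_right(self._positions, _hash(key)) % len(self._points)
        return self._points[idx][1]


def modulo_server(key, servers):
    """Plain modular placement: server = hash(key) % N."""
    return servers[_hash(key) % len(servers)]


def moved_fraction(before, after, keys):
    """Fraction of keys whose server changes between two placement functions."""
    return sum(before(k) != after(k) for k in keys) / len(keys)


keys = [f"user:{i}" for i in range(10_000)]
old = ["cache1", "cache2", "cache3", "cache4"]
new = old + ["cache5"]

ring_old, ring_new = HashRing(old), HashRing(new)
ring_moved = moved_fraction(ring_old.server_for, ring_new.server_for, keys)
mod_moved = moved_fraction(
    lambda k: modulo_server(k, old), lambda k: modulo_server(k, new), keys
)
print(f"ring: {ring_moved:.0%} of keys moved, modulo: {mod_moved:.0%} of keys moved")
```

With the ring, every key that moves lands on the new server; with modular hashing, keys shuffle between all servers. More virtual nodes give a more even split at the cost of a larger lookup table.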

Question 2

What is the thundering herd problem and how does Facebook solve it?

Accepted Answer

Thundering herd: a popular cached key expires or is invalidated. Hundreds of concurrent requests then miss the cache at the same time and all send the same query to the database, which may be overwhelmed. Facebook's solution is leases: on a cache miss, Memcached issues a lease (a token bound to that key) to ONE requesting client. Only that client fetches from the database and sets the cache, presenting the token with the set. Other clients that request the same key while the lease is outstanding are told to wait and retry after a short delay; the lease holder usually fills the key within a few milliseconds, so the retry is typically a hit. (In Facebook's paper, the server issues at most one token per key every 10 seconds by default.) Result: instead of 1000 concurrent database queries for the same key, only 1 query is made. The other 999 requests either wait briefly or, where the application tolerates it, receive a slightly stale value. Leases were an important part of how Facebook scaled Memcached to billions of requests per second.
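The lease flow can be sketched with a small in-memory simulation. This is not the real Memcached protocol or Facebook's implementation: LeaseCache, its get/set_with_lease/delete methods, the status strings, and load_from_db are hypothetical names, and time is passed in explicitly so the run is deterministic.

```python
import itertools


class LeaseCache:
    """In-memory stand-in for a lease-aware cache server (hypothetical API)."""

    def __init__(self, lease_ttl=10.0):
        self._data = {}
        self._leases = {}  # key -> (token, expires_at)
        self._tokens = itertools.count(1)
        self._lease_ttl = lease_ttl

    def get(self, key, now):
        if key in self._data:
            return ("hit", self._data[key])
        lease = self._leases.get(key)
        if lease is not None and now < lease[1]:
            # Someone else is already refilling this key: back off and retry.
            return ("wait", None)
        token = next(self._tokens)
        self._leases[key] = (token, now + self._lease_ttl)
        return ("lease", token)

    def set_with_lease(self, key, value, token):
        lease = self._leases.get(key)
        if lease is None or lease[0] != token:
            return False  # lease was invalidated or superseded; drop the set
        del self._leases[key]
        self._data[key] = value
        return True

    def delete(self, key):
        # An invalidation also cancels any outstanding lease for the key.
        self._data.pop(key, None)
        self._leases.pop(key, None)


db_calls = []


def load_from_db(key):
    """Pretend database read; records each call so we can count them."""
    db_calls.append(key)
    return f"row-for-{key}"


cache = LeaseCache()

# 1000 clients miss on the same key at the same instant.
first_round = [cache.get("profile:42", now=0.0) for _ in range(1000)]
holders = [payload for status, payload in first_round if status == "lease"]
waiters = sum(status == "wait" for status, _ in first_round)

# Only the lease holder queries the database and fills the cache.
for token in holders:
    cache.set_with_lease("profile:42", load_from_db("profile:42"), token)

# A waiter retrying shortly afterwards gets a hit.
retry = cache.get("profile:42", now=0.01)
print(len(holders), waiters, len(db_calls), retry[0])
```

Cancelling the lease on delete is a design choice in this sketch: it means a client that read the database before an invalidation cannot later write that old value into the cache.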

Question 3

How does Memcached slab allocation prevent memory fragmentation?

Accepted Answer

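Instead of calling malloc/free per item (which fragments memory over time), Memcached allocates memory in 1 MB pages (slabs), each divided into fixed-size chunks. Slab classes have different chunk sizes. With default settings on a 64-bit build: class 1 = 96 bytes, class 2 = 120 bytes, class 3 = 152 bytes, and so on, each about 1.25x larger than the previous (the growth factor, configurable with -f), up to the maximum item size (1 MB by default). Exact sizes vary by version and build. When storing an item, Memcached picks the smallest class whose chunk fits the whole item (key, value, and item header) and stores it in a free chunk of that class. The unused tail of the chunk is wasted, but because all chunks in a class are the same size, freeing one never leaves an unusable hole. When no free chunk exists and no new page can be allocated, LRU eviction removes the least recently used item WITHIN that class. Slab calcification problem: if the workload shifts (initially small items, later large items), memory already assigned to small classes cannot serve large items. Newer Memcached versions include a slab automover that detects imbalance and reassigns pages between classes. Monitor with stats slabs (chunk size, free chunks, and pages per class) and stats items (evictions per class).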
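The class sizing and "smallest class that fits" rule can be sketched as follows. This is a simplified approximation of how the chunk sizes are derived, not Memcached's actual code: slab_classes, pick_class, the 8-byte alignment, and the way the last class is capped are assumptions for illustration.

```python
import bisect

PAGE_SIZE = 1024 * 1024  # 1 MB page


def slab_classes(first_chunk=96, growth=1.25, max_chunk=PAGE_SIZE, align=8):
    """Return chunk sizes per class: each ~growth times the last, aligned up."""
    sizes = []
    size = first_chunk
    while size < max_chunk:
        sizes.append(size)
        size = int(size * growth)
        if size % align:
            size += align - size % align  # round up to the alignment boundary
    sizes.append(max_chunk)  # final class holds the largest allowed item
    return sizes


def pick_class(sizes, item_size):
    """Return (class_id, chunk_size) of the smallest class that fits the item."""
    idx = bisect.bisect_left(sizes, item_size)
    if idx == len(sizes):
        raise ValueError("item larger than the largest chunk")
    return idx + 1, sizes[idx]


sizes = slab_classes()
print(sizes[:6])

for item_size in (60, 100, 121, 5000):
    class_id, chunk = pick_class(sizes, item_size)
    wasted = chunk - item_size
    per_page = PAGE_SIZE // chunk
    print(f"{item_size} B item -> class {class_id} ({chunk} B chunk), "
          f"{wasted} B unused, {per_page} chunks per page")
```

Here item_size stands for the full stored size (key, value, and header). A smaller growth factor wastes less space per item but creates more classes, which makes calcification more likely when the size mix changes.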

Question 4

How did Facebook scale Memcached to handle billions of requests per second?

Accepted Answer

Key techniques, most of them described in Facebook's published paper (Scaling Memcache at Facebook, NSDI 2013):

(1) Demand-filled cache: populate lazily on a miss; on database writes, invalidate the key (delete, not set) to avoid race conditions between concurrent writers.
(2) Multi-get batching: fetch many keys (100+) in one round-trip. Use UDP for GET requests (lower overhead) and TCP for SET/DELETE (reliability needed).
(3) Regional replication: each datacenter has its own Memcached clusters. Invalidations are driven by a daemon (mcsqueal in the paper) that tails the database commit log and issues deletes to the Memcached clusters. Local reads plus asynchronous invalidation give eventual consistency with low latency.
(4) Lease-based thundering herd prevention: only one client fetches from the database per cache miss. Others wait briefly.
(5) Cold cluster warm-up: clients of a new (cold) cluster fetch from an existing warm cluster on a miss instead of hitting the database, rapidly building the cache hit rate.
(6) Hot key replication: replicate hot keys across multiple servers so reads spread out, for example client-side by storing copies under suffixed keys.

Combined, these techniques enabled billions of requests per second across Facebook's infrastructure.
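Technique (6) can be sketched as a thin client-side wrapper. This is an illustrative assumption about how suffixed-key replication might look, not code from Facebook or from any Memcached client: HotKeyReplicator, the "#r<i>" suffix, and server_for are hypothetical, and a plain dict stands in for the cache cluster.

```python
import hashlib
import random
from collections import Counter


def server_for(key, servers):
    """Stand-in for the client's key-to-server mapping (hypothetical)."""
    digest = hashlib.md5(key.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


class HotKeyReplicator:
    """Store a hot key under several suffixed keys; read from a random copy."""

    def __init__(self, cache, replicas=8, rng=None):
        self.cache = cache  # dict used as a stand-in for the cache cluster
        self.replicas = replicas
        self.rng = rng or random.Random()

    def replica_keys(self, key):
        return [f"{key}#r{i}" for i in range(self.replicas)]

    def set(self, key, value):
        # Writes fan out to every copy.
        for replica_key in self.replica_keys(key):
            self.cache[replica_key] = value

    def get(self, key):
        # Each read picks one copy at random, spreading load across servers.
        return self.cache.get(self.rng.choice(self.replica_keys(key)))

    def delete(self, key):
        # Invalidation must remove every copy, or readers may see stale data.
        for replica_key in self.replica_keys(key):
            self.cache.pop(replica_key, None)


cache = {}
hot = HotKeyReplicator(cache, replicas=8, rng=random.Random(0))
hot.set("celebrity:1", "profile-blob")

servers = ["cache1", "cache2", "cache3", "cache4"]
spread = Counter(server_for(rk, servers) for rk in hot.replica_keys("celebrity:1"))
print(spread)  # how the suffixed copies would land across servers
```

The trade-off is that every write and invalidation costs one operation per replica, so this is worth applying only to keys that are known to be hot.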

System Design: Distributed Cache (Memcached) — Slab Allocation, Consistent Hashing, Hot Keys, Thundering Herd

Memcached Architecture

Slab Allocation: Memory Management

Client-Side Consistent Hashing

Hot Keys and Thundering Herd

Facebook Memcached at Scale