Memcached is a high-performance, distributed memory caching system used by Facebook, Twitter, and Wikipedia to reduce database load and accelerate response times. While Redis has overtaken Memcached in popularity, understanding Memcached architecture provides deep insight into distributed cache design: memory management, consistent hashing, hot key mitigation, and cache cluster operations. This guide covers these internals — essential for system design interviews involving caching at scale.
Memcached Architecture
Memcached is a distributed hash table spread across multiple servers. There is no inter-server communication — each server is independent. The client library handles distribution: it hashes the key to determine which server stores it, sends the request directly to that server, and handles failover. This shared-nothing architecture makes Memcached horizontally scalable: adding a server increases total cache capacity proportionally. Each Memcached server is a simple key-value store holding data in memory. Maximum value size: 1 MB by default (raisable with the -I server option in newer versions). No data structures beyond strings (unlike Redis). No persistence — all data is lost on restart. No replication between servers. Operations: GET (retrieve by key), SET (store), DELETE (remove), INCR/DECR (atomic counter operations), and CAS (compare-and-swap for optimistic concurrency). The simplicity is the strength: Memcached achieves 200,000-500,000 operations per second per server with sub-millisecond latency because it does only one thing — in-memory key-value storage — with minimal overhead.
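A minimal sketch of the shared-nothing model: the servers never coordinate, and the client alone decides which server owns a key. Here each "server" is simulated as an in-process dict, and the class name and addresses are illustrative, not a real client library.

```python
import hashlib

class ToyMemcachedClient:
    """Toy model of client-side distribution: servers are independent;
    the client hashes the key to pick the owning server."""

    def __init__(self, servers):
        self.servers = servers
        # Each "server" is simulated here as an independent dict.
        self.stores = [dict() for _ in servers]

    def _server_for(self, key):
        # Hash the key to pick a server (real clients use consistent
        # hashing; simple modulo is shown here for clarity).
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return h % len(self.servers)

    def set(self, key, value):
        if len(value) > 1_000_000:  # default ~1 MB value limit
            raise ValueError("value exceeds item size limit")
        self.stores[self._server_for(key)][key] = value

    def get(self, key):
        return self.stores[self._server_for(key)].get(key)

client = ToyMemcachedClient(["10.0.0.1:11211", "10.0.0.2:11211"])
client.set("user:42", b"profile-blob")
print(client.get("user:42"))
```

Every request goes straight to the owning server; there is no broadcast, no coordination, and no second hop, which is why each added server contributes its full capacity.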
Slab Allocation: Memory Management
Memcached uses slab allocation to avoid memory fragmentation. Instead of malloc/free for each item (which causes fragmentation over time), memory is pre-allocated in slabs. A slab is a 1 MB chunk of memory divided into fixed-size slots (chunks). Slab classes: each class has a different chunk size. Class 1: 96-byte chunks. Class 2: 120-byte chunks. Class 3: 152-byte chunks (each class is ~1.25x the previous, rounded up for 8-byte alignment). Up to ~40 classes covering sizes up to 1 MB. When an item is stored, Memcached picks the smallest slab class that fits the item (key + value + metadata), and the item is stored in a free chunk of that class. If no free chunk exists, LRU eviction removes the least recently used item within that slab class. Slab calcification problem: if your workload shifts (initially many small items, later many large items), the memory allocated to small-slab classes cannot be reused by large-slab classes. Memory is “calcified” in the wrong distribution. The slab automover in newer Memcached versions detects imbalanced slab usage and reassigns slabs between classes. Monitoring: the stats slabs command shows per-class usage, eviction counts, and free chunks — essential for identifying slab imbalance.
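The class-size ladder above can be reproduced with a short calculation. This is a sketch of the sizing rule (base size, ~1.25x growth factor, 8-byte alignment), not Memcached's actual source; real builds vary with the configured growth factor.

```python
# Derive slab class chunk sizes: start at a base size and multiply by
# a growth factor (default ~1.25), rounding up to 8-byte alignment,
# until the 1 MB item-size limit is reached.
def slab_classes(base=96, growth=1.25, max_size=1_048_576):
    sizes = []
    size = base
    while size < max_size:
        sizes.append(size)
        size = ((int(size * growth) + 7) // 8) * 8  # align to 8 bytes
    sizes.append(max_size)
    return sizes

def class_for(item_size, sizes):
    # Smallest class whose chunk fits key + value + metadata.
    for s in sizes:
        if item_size <= s:
            return s
    raise ValueError("item exceeds maximum size")

sizes = slab_classes()
print(sizes[:3])              # → [96, 120, 152]
print(class_for(100, sizes))  # → 120: a 100-byte item uses 120-byte chunks
```

Note the waste this implies: a 100-byte item occupies a 120-byte chunk, so ~17% of that chunk is overhead. A smaller growth factor reduces per-chunk waste but creates more classes, which worsens calcification risk.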
Client-Side Consistent Hashing
The Memcached client distributes keys across servers using consistent hashing. Each server is placed at multiple points on a hash ring (virtual nodes). A key is hashed to a position on the ring and assigned to the nearest server clockwise. Adding a server: only ~1/N of keys remap (taken from adjacent servers). Without consistent hashing (modular hashing: server = hash(key) % N), adding a server remaps N/(N+1) of keys — 80% when going from 4 to 5 servers — causing a massive cache miss storm. Failover: if a server goes down, the client library detects it (connection failure) and remaps its keys to the next server on the ring. The failed server's keys experience cache misses (fetched from the database and cached on the new server); the rest of the cluster is unaffected. Rehashing storm mitigation: when a server fails, its keys suddenly hit the database. If the failed server held popular keys, this can overwhelm the database. Mitigation: use a gutter pool — a small set of spare Memcached servers that temporarily absorbs traffic from the failed server. Facebook uses this approach: instead of remapping to the next ring server (which may also become overloaded), requests remap to a dedicated gutter server with short TTLs.
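A small ring with virtual nodes is enough to see the ~1/N remapping property. This is a simplified sketch of what a client library does internally; the vnode count and server names are illustrative.

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring with virtual nodes, roughly as a
    Memcached client library would implement it (simplified sketch)."""

    def __init__(self, servers, vnodes=100):
        self.ring = []  # sorted list of (position, server)
        for server in servers:
            for i in range(vnodes):
                self.ring.append((self._hash(f"{server}#{i}"), server))
        self.ring.sort()
        self.positions = [pos for pos, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def server_for(self, key):
        # Nearest server clockwise from the key's position (wraps at 0).
        idx = bisect.bisect(self.positions, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring3 = HashRing(["cache1", "cache2", "cache3"])
before = {f"key{i}": ring3.server_for(f"key{i}") for i in range(1000)}
ring4 = HashRing(["cache1", "cache2", "cache3", "cache4"])
moved = sum(1 for k, s in before.items() if ring4.server_for(k) != s)
print(f"{moved / 1000:.0%} of keys remapped")  # roughly 1/N, not most keys
```

With modular hashing the same experiment would remap around 75% of keys when going from 3 to 4 servers; on the ring, only the keys that land on the new server's virtual nodes move.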
Hot Keys and Thundering Herd
Hot key problem: a single key receives disproportionate traffic (a viral tweet, a popular product listing). Since Memcached maps each key to one server, that server becomes a bottleneck. Solutions: (1) Client-side replication — store the hot key on multiple servers with a suffix: hot_key_1, hot_key_2, …, hot_key_R. The client randomly picks one replica for each read, spreading the load across R servers. (2) Local caching — cache extremely hot keys in the application process memory (L1 cache) with a very short TTL (1-5 seconds), so requests are served from local memory without any network call. (3) Lease-based protection (Facebook approach) — when a cache miss occurs, Memcached issues a lease (a token) to the requesting client. Only the client holding the lease fetches from the database and sets the cache; other clients requesting the same key during this period receive a “wait” signal and retry after a short delay (e.g., 10 ms). This prevents the thundering herd: instead of 1,000 concurrent database queries for the same key, only one query is made. The lease approach is the most effective thundering herd mitigation at scale.
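The lease mechanism can be sketched as a small state machine: a miss hands out a token, concurrent misses are told to wait, and only the token holder may fill the cache. This is a hypothetical in-process model of the protocol, not Memcached's actual lease implementation.

```python
import threading
import uuid

class LeaseCache:
    """Sketch of lease-based miss handling: on a miss, only the first
    caller gets a lease token and may fill the cache; others wait."""

    def __init__(self):
        self.data = {}
        self.leases = {}  # key -> outstanding lease token
        self.lock = threading.Lock()

    def get(self, key):
        with self.lock:
            if key in self.data:
                return "hit", self.data[key]
            if key in self.leases:
                return "wait", None    # someone else is filling; retry soon
            token = uuid.uuid4().hex
            self.leases[key] = token
            return "miss", token       # caller should query the database

    def set(self, key, value, token):
        with self.lock:
            # Reject a stale lease (e.g. the key was invalidated meanwhile).
            if self.leases.get(key) != token:
                return False
            del self.leases[key]
            self.data[key] = value
            return True

cache = LeaseCache()
status, token = cache.get("feed:99")   # first miss: lease granted
assert status == "miss"
status2, _ = cache.get("feed:99")      # concurrent miss: told to wait
assert status2 == "wait"
cache.set("feed:99", "timeline", token)
print(cache.get("feed:99"))            # → ('hit', 'timeline')
```

The stale-token check in set is what makes leases safe against races: if a write invalidated the key after the lease was issued, the late fill is rejected instead of caching stale data.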
Facebook Memcached at Scale
Facebook's paper “Scaling Memcache at Facebook” (NSDI 2013) describes scaling Memcached to billions of requests per second. Key insights: (1) Demand-filled cache — caches are populated lazily. A web server reads from Memcached; on miss, it reads from the database and sets the cache. On database writes, the web server invalidates the cache (delete, not set — avoids race conditions). (2) Reducing latency — batch GET requests using multi-get (fetch 100+ keys in one round-trip). Use UDP for GET requests (lower overhead than TCP for small payloads) and TCP for SET/DELETE (reliability required). (3) Regions and replication — each datacenter region has its own Memcached cluster. Cross-region invalidation: when a database write occurs in the primary region, an invalidation daemon broadcasts delete commands to Memcached in all regions. This ensures eventual consistency with low latency (local reads, cross-region invalidation). (4) Cold cluster warm-up — when deploying a new Memcached cluster, it starts with a 0% hit rate. The “cold cluster” feature allows the new cluster to fetch from an existing warm cluster on cache miss (instead of hitting the database), rapidly warming the new cache. These techniques enabled Facebook to serve 2+ billion Memcached requests per second across its infrastructure.
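The demand-filled pattern from insight (1) fits in a few lines. Here db and cache are plain dicts standing in for MySQL and Memcached; the point is the asymmetry: reads fill the cache, writes delete from it.

```python
# Sketch of a demand-filled cache: the read path fills the cache lazily
# on a miss; the write path invalidates with a delete (never a set), so
# a racing reader cannot pin a stale value indefinitely.
db, cache = {"user:1": "alice"}, {}

def read(key):
    if key in cache:
        return cache[key]          # cache hit
    value = db.get(key)            # miss: go to the database
    if value is not None:
        cache[key] = value         # demand-fill
    return value

def write(key, value):
    db[key] = value
    cache.pop(key, None)           # invalidate; the next read refills

print(read("user:1"))   # miss, fills cache → alice
write("user:1", "bob")  # cache entry deleted
print(read("user:1"))   # miss again, refills → bob
```

Deleting instead of setting on write matters because a set can race: if the writer sets the cache while a slower reader is mid-flight with an older database value, the reader's late set could overwrite the fresh one. A delete is idempotent, so the worst case is an extra miss, not stale data.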