Consistent Hashing: System Design Deep Dive
Consistent hashing is a distributed hashing scheme that minimises key remapping when nodes are added or removed. It is the backbone of distributed caches, distributed databases, and load balancers. Almost every senior-level system design interview expects you to explain it precisely.
The Problem with Naive Hashing
With a cluster of N servers, the naive approach routes key k to server hash(k) % N. When a node is added or removed, N changes and nearly every key remaps to a different server — O(K) keys move, where K is the total number of keys. For a 10-node cache cluster that loses one node, ~90% of the keyspace remaps, causing a thundering-herd cache miss storm.
```python
# Naive hashing – BAD for dynamic clusters
class NaiveHashRing:
    def __init__(self, nodes):
        self.nodes = list(nodes)

    def get_node(self, key):
        return self.nodes[hash(key) % len(self.nodes)]

    def add_node(self, node):
        self.nodes.append(node)
        # ~K*(N/(N+1)) keys now map to different nodes – catastrophic

    def remove_node(self, node):
        self.nodes.remove(node)
        # Again, ~K*((N-1)/N) keys remap – nearly all of them
```
The Ring Abstraction
Consistent hashing maps both nodes and keys onto a virtual ring — conceptually 2^32 positions, though implementations often use the hash function's full output space (MD5, xxHash, etc.). Each key is assigned to the first node clockwise from its position on the ring.
- Add a node: Only keys between the new node and its predecessor migrate — O(K/N) keys.
- Remove a node: Only keys owned by the removed node migrate to its successor — O(K/N) keys.
- Lookup: Binary search over sorted ring positions — O(log N) per key.
Implementation: Basic Ring
```python
import hashlib
import bisect

class ConsistentHashRing:
    def __init__(self):
        self._ring = {}   # position -> node
        self._keys = []   # sorted positions

    def _hash(self, value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add_node(self, node: str) -> None:
        pos = self._hash(node)
        self._ring[pos] = node
        bisect.insort(self._keys, pos)

    def remove_node(self, node: str) -> None:
        pos = self._hash(node)
        del self._ring[pos]
        idx = bisect.bisect_left(self._keys, pos)
        self._keys.pop(idx)

    def get_node(self, key: str) -> str | None:
        if not self._ring:
            return None
        pos = self._hash(key)
        # First node clockwise; modulo wraps past the last position.
        idx = bisect.bisect(self._keys, pos) % len(self._keys)
        return self._ring[self._keys[idx]]
```
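The O(K/N) migration claim is easy to verify empirically. The standalone sketch below rebuilds the ring inline (same MD5 scheme as above; node and key names are illustrative) and counts how many keys change owner when an 11th node joins, compared with modular hashing:

```python
# Standalone check of the O(K/N) migration claim.
import hashlib
import bisect

def h(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

def owner(positions, ring, key):
    idx = bisect.bisect(positions, h(key)) % len(positions)
    return ring[positions[idx]]

nodes = [f"node-{i}" for i in range(10)]
ring = {h(n): n for n in nodes}
positions = sorted(ring)

keys = [f"key-{i}" for i in range(10_000)]
before = {k: owner(positions, ring, k) for k in keys}

ring[h("node-10")] = "node-10"       # an 11th node joins
positions = sorted(ring)
after = {k: owner(positions, ring, k) for k in keys}

moved = sum(before[k] != after[k] for k in keys)
mod_moved = sum(h(k) % 10 != h(k) % 11 for k in keys)
print(f"ring: {moved} moved, mod-N: {mod_moved} moved")
# Every moved key lands on the new node; mod-N remaps nearly everything.
```

Note that on a ring, a joining node can only *steal* keys — every remapped key now belongs to the new node, which is why migration is local.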
The Critical Problem: Hot Spots
With a small number of real nodes, positions on the ring are uneven. One node may own 40% of the ring while another owns 5%. This causes non-uniform load distribution — exactly the problem we want to avoid.
Virtual Nodes (vnodes)
Each physical node is mapped to V positions on the ring (virtual nodes). Instead of a single hash("node-A"), we insert hash("node-A#0"), hash("node-A#1"), …, hash("node-A#V-1"). With V=150, the load imbalance drops below 5% (empirically). Cassandra and DynamoDB both use virtual nodes.
```python
class VNodeHashRing:
    def __init__(self, virtual_nodes: int = 150):
        self.V = virtual_nodes
        self._ring = {}
        self._keys = []

    def _hash(self, value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add_node(self, node: str) -> None:
        for i in range(self.V):
            pos = self._hash(f"{node}#{i}")
            self._ring[pos] = node
            bisect.insort(self._keys, pos)

    def remove_node(self, node: str) -> None:
        for i in range(self.V):
            pos = self._hash(f"{node}#{i}")
            del self._ring[pos]
            idx = bisect.bisect_left(self._keys, pos)
            self._keys.pop(idx)

    def get_node(self, key: str) -> str | None:
        if not self._ring:
            return None
        pos = self._hash(key)
        idx = bisect.bisect(self._keys, pos) % len(self._keys)
        return self._ring[self._keys[idx]]

    def get_nodes(self, key: str, n: int) -> list[str]:
        """Return n distinct physical nodes for replication."""
        if not self._ring:
            return []
        pos = self._hash(key)
        idx = bisect.bisect(self._keys, pos) % len(self._keys)
        seen = set()
        nodes = []
        # Walk clockwise, skipping vnodes of already-chosen physical nodes.
        for _ in range(len(self._keys)):
            node = self._ring[self._keys[idx % len(self._keys)]]
            if node not in seen:
                seen.add(node)
                nodes.append(node)
                if len(nodes) == n:
                    break
            idx += 1
        return nodes
```
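The balancing effect of vnodes can be measured directly. This standalone sketch builds a 10-node ring with V=150 (ring logic inlined so it runs on its own) and reports the worst per-node deviation from an even split; the exact figure depends on the hash and the illustrative node names, so treat it as indicative:

```python
# Measure per-node load with V=150 virtual nodes per physical node.
import hashlib
import bisect
from collections import Counter

def h(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

V = 150
nodes = [f"node-{i}" for i in range(10)]
ring = {h(f"{n}#{i}"): n for n in nodes for i in range(V)}
positions = sorted(ring)

counts = Counter()
for i in range(100_000):
    idx = bisect.bisect(positions, h(f"key-{i}")) % len(positions)
    counts[ring[positions[idx]]] += 1

ideal = 100_000 / len(nodes)
worst = max(abs(c - ideal) / ideal for c in counts.values())
print(f"worst relative imbalance: {worst:.1%}")
```

Rerunning with V=1 (one position per node) makes the imbalance dramatically worse, which is the hot-spot problem vnodes exist to fix.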
Replication with Preference Lists
Amazon's Dynamo (the design behind DynamoDB) uses the get_nodes(key, N) pattern above — each key has a preference list of N distinct physical nodes. With N=3 and quorum writes (W=2) and reads (R=2), the system tolerates a single-node failure, and because W + R > N every read quorum overlaps every write quorum — a read always touches at least one replica holding the latest acknowledged write.
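The quorum-overlap condition is simple arithmetic worth stating precisely; a minimal sketch:

```python
# The quorum overlap rule: N replicas, W write acks, R read fan-in.
def quorums_overlap(n: int, w: int, r: int) -> bool:
    """True iff every read quorum must intersect every write quorum."""
    return w + r > n

# N=3, W=2, R=2: one node may be down for reads or writes, yet any
# read still touches a replica with the latest acknowledged write.
print(quorums_overlap(3, 2, 2))   # True
print(quorums_overlap(3, 1, 1))   # False — stale reads possible
```

Tuning W and R trades latency for consistency: W=1, R=1 is fastest but allows stale reads; W=3, R=1 pushes the cost to writes.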
Bounded Load Consistent Hashing
Google’s 2017 paper (Mirrokni et al., “Consistent Hashing with Bounded Loads”) adds a load cap: a node rejects a key if its current load exceeds (1+ε) × average_load, and the key is forwarded clockwise to the next node. This bounds every node’s load to at most 1+ε times the average (1.25× for ε = 0.25), preventing hot-spot overload during traffic spikes.
```python
class BoundedLoadRing(VNodeHashRing):
    def __init__(self, virtual_nodes=150, epsilon=0.25):
        super().__init__(virtual_nodes)
        self.epsilon = epsilon
        self.load = {}        # node -> current key count
        self.total_keys = 0

    def get_node_bounded(self, key: str) -> str | None:
        if not self._ring:
            return None
        self.total_keys += 1
        average = self.total_keys / max(len(set(self._ring.values())), 1)
        cap = (1 + self.epsilon) * average
        pos = self._hash(key)
        idx = bisect.bisect(self._keys, pos) % len(self._keys)
        # Walk clockwise until a node below the load cap accepts the key.
        for _ in range(len(self._keys)):
            node = self._ring[self._keys[idx % len(self._keys)]]
            if self.load.get(node, 0) < cap:
                self.load[node] = self.load.get(node, 0) + 1
                return node
            idx += 1
        return None   # all overloaded — shouldn't happen while cap > average
```
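The same bounded-load walk can be exercised standalone (ring rebuilt inline; the 5-node / 10k-key sizes are arbitrary). Because the sum of the caps always exceeds the number of keys placed so far, every key finds a home, and no node ends meaningfully above the cap:

```python
# Standalone sketch of bounded-load routing; sizes are arbitrary.
import hashlib
import bisect

def h(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

V, epsilon = 150, 0.25
nodes = [f"node-{i}" for i in range(5)]
ring = {h(f"{n}#{i}"): n for n in nodes for i in range(V)}
positions = sorted(ring)

load = {n: 0 for n in nodes}
total = 0
for i in range(10_000):
    total += 1
    cap = (1 + epsilon) * total / len(nodes)      # (1+ε) × average load
    idx = bisect.bisect(positions, h(f"key-{i}")) % len(positions)
    # Walk clockwise until a node under the cap accepts the key.
    for _ in range(len(positions)):
        node = ring[positions[idx % len(positions)]]
        if load[node] < cap:
            load[node] += 1
            break
        idx += 1

print(load)
```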
Real-World Usage
| System | Usage | V-nodes |
|---|---|---|
| Apache Cassandra | Token ring partitioning, virtual nodes for load balance | 256 |
| Amazon DynamoDB | Preference list, sloppy quorum via consistent hashing | yes |
| Redis Cluster | 16384 hash slots assigned to nodes (simplified consistent hashing) | slots |
| Memcached (libketama) | Client-side ring for cache routing | 150 |
| Nginx upstream hash | Sticky routing for session affinity | configurable |
| Envoy proxy | Ring hash load balancing policy | configurable |
Complexity Summary
| Operation | Basic Ring | VNode Ring |
|---|---|---|
| Add node | O(log N) insert, O(K/N) keys migrate | O(V log VN) insert, O(K/N) migrate |
| Remove node | O(log N) delete, O(K/N) keys migrate | O(V log VN) delete, O(K/N) migrate |
| Get node (lookup) | O(log N) | O(log VN) |
| Load balance | Poor (arc sizes vary widely with few nodes) | Good (~5% imbalance at V=150) |
Interview Questions
How would you add weighted nodes to consistent hashing?
Assign a number of virtual nodes proportional to the node’s capacity. A node with 2x CPU/RAM gets 2V vnodes. This naturally routes more traffic to powerful nodes without any extra routing logic.
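As a sketch (node names and weights hypothetical), weighting is just a multiplier on the vnode count:

```python
# Hypothetical weighted ring: vnode count scales with node capacity.
import hashlib
import bisect
from collections import Counter

def h(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

BASE_V = 100
weights = {"node-a": 1, "node-b": 1, "node-big": 2}   # node-big: 2x capacity
ring = {}
for node, w in weights.items():
    for i in range(BASE_V * w):
        ring[h(f"{node}#{i}")] = node
positions = sorted(ring)

counts = Counter()
for i in range(40_000):
    idx = bisect.bisect(positions, h(f"key-{i}")) % len(positions)
    counts[ring[positions[idx]]] += 1

print(counts)   # node-big receives roughly half the keys (2 of 4 weight units)
```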
What happens during a node failure — how do you avoid data loss?
With replication (N=3), each key lives on 3 consecutive nodes. When node B fails, reads/writes fall through to the next node in the preference list (hinted handoff in Dynamo-style systems). When B recovers, hints are replayed to restore full replication.
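A toy model of hinted handoff (class and method names are illustrative, not Dynamo's actual API):

```python
# Toy hinted handoff: writes destined for a down node are parked as
# "hints" and replayed when the node recovers.
from collections import defaultdict

class HintedHandoff:
    def __init__(self, nodes):
        self.up = set(nodes)
        self.store = defaultdict(dict)   # node -> {key: value}
        self.hints = defaultdict(list)   # down node -> [(key, value), ...]

    def write(self, preference_list, key, value):
        for node in preference_list:
            if node in self.up:
                self.store[node][key] = value
            else:
                self.hints[node].append((key, value))  # park for replay

    def recover(self, node):
        self.up.add(node)
        for key, value in self.hints.pop(node, []):
            self.store[node][key] = value             # replay hints

cluster = HintedHandoff(["A", "B", "C"])
cluster.up.discard("B")                     # B fails
cluster.write(["A", "B", "C"], "k1", "v1")  # hint queued for B
cluster.recover("B")                        # hint replayed
print(cluster.store["B"])                   # {'k1': 'v1'}
```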
How does Redis Cluster differ from pure consistent hashing?
Redis Cluster uses 16384 fixed hash slots assigned to master nodes (slot = CRC16(key) % 16384). This is analogous to consistent hashing with ~16384/num_nodes slots per node, but the slot assignments are explicit cluster state, not computed on-the-fly. Slot migration (e.g. to relieve a hot node) is manual (CLUSTER SETSLOT).
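For reference, Redis's slot function is CRC16 (the XMODEM variant) mod 16384. A minimal sketch — it ignores hash tags ("{...}"), which Redis uses to force related keys into the same slot:

```python
# Sketch of Redis Cluster's key-to-slot mapping (CRC16/XMODEM % 16384).
def crc16_xmodem(data: bytes) -> int:
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    return crc16_xmodem(key.encode()) % 16384

# Standard check value for this CRC variant:
print(hex(crc16_xmodem(b"123456789")))   # 0x31c3
```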
How do you detect key skew (hot keys) on the ring?
Track per-node request rates with exponentially weighted moving averages. If any node exceeds 2× average rate, emit a HOTSPOT alert. For hot keys specifically, add a key-level frequency counter (Count-Min Sketch) and replicate hot keys to extra nodes.
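A minimal Count-Min Sketch is only a few lines; the width and depth below are illustrative defaults, not tuned values:

```python
# Illustrative Count-Min Sketch for hot-key detection.
import hashlib

class CountMinSketch:
    def __init__(self, width: int = 2048, depth: int = 4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, key: str):
        # One independent hash (via a row-salted MD5) per row.
        for row in range(self.depth):
            digest = hashlib.md5(f"{row}:{key}".encode()).hexdigest()
            yield row, int(digest, 16) % self.width

    def add(self, key: str, count: int = 1) -> None:
        for row, col in self._cells(key):
            self.table[row][col] += count

    def estimate(self, key: str) -> int:
        # Never underestimates; collisions can only inflate counts.
        return min(self.table[row][col] for row, col in self._cells(key))

cms = CountMinSketch()
for i in range(1000):
    cms.add(f"cold-key-{i}")   # long tail: one hit each
for _ in range(500):
    cms.add("hot-key")         # one heavy hitter

print(cms.estimate("hot-key"))   # >= 500 (exact or slightly inflated)
```

The sketch uses O(width × depth) memory regardless of key cardinality, which is what makes per-key counting feasible at cache-router scale.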