Consistent Hashing Low-Level Design: Distributed Key Routing and Ring Design

Consistent hashing distributes keys across a cluster of nodes such that adding or removing a node only remaps approximately 1/N of the keys, not all of them. Standard modulo hashing (key % N) remaps every key when N changes — catastrophic for caches (cold start) and distributed databases (massive data movement). Consistent hashing solves this by placing both nodes and keys on a virtual ring and assigning each key to the nearest node clockwise. It is the foundation of DynamoDB, Cassandra, and distributed caches like Memcached and Redis Cluster.

Ring Implementation

import hashlib
import bisect

class ConsistentHashRing:
    def __init__(self, virtual_nodes: int = 150):
        self.virtual_nodes = virtual_nodes  # replicas per physical node
        self.ring: dict[int, str] = {}       # hash position -> node name
        self.sorted_keys: list[int] = []     # sorted list of hash positions

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node: str):
        """Add a node with virtual_nodes replicas spread around the ring."""
        for i in range(self.virtual_nodes):
            virtual_key = f"{node}:vnode:{i}"
            h = self._hash(virtual_key)
            self.ring[h] = node
            bisect.insort(self.sorted_keys, h)

    def remove_node(self, node: str):
        for i in range(self.virtual_nodes):
            virtual_key = f"{node}:vnode:{i}"
            h = self._hash(virtual_key)
            del self.ring[h]
            self.sorted_keys.remove(h)

    def get_node(self, key: str) -> str:
        """Find the node responsible for a given key."""
        if not self.ring:
            raise ValueError("Ring is empty")
        h = self._hash(key)
        # Find the first position on the ring >= h (clockwise)
        idx = bisect.bisect_left(self.sorted_keys, h)
        if idx == len(self.sorted_keys):
            idx = 0  # wrap around the ring
        return self.ring[self.sorted_keys[idx]]

    def get_nodes(self, key: str, n: int) -> list[str]:
        """Get n distinct nodes for replication (walk clockwise)."""
        if not self.ring:
            return []
        h = self._hash(key)
        idx = bisect.bisect_left(self.sorted_keys, h)
        nodes = []
        seen = set()
        for offset in range(len(self.sorted_keys)):
            pos = (idx + offset) % len(self.sorted_keys)
            node = self.ring[self.sorted_keys[pos]]
            if node not in seen:
                nodes.append(node)
                seen.add(node)
            if len(nodes) == n:
                break
        return nodes

Why Virtual Nodes Are Essential

Without virtual nodes (one position per physical node on the ring), the ring positions are determined by the hash of the node names. With only a few nodes, their hash positions cluster unevenly — some nodes end up responsible for a large arc of the ring and get more keys. With N=150 virtual nodes per physical node, each physical node is placed at 150 positions, and the arcs between adjacent positions are roughly equal on average. Load is distributed more evenly across nodes.

# Compare key distribution with and without virtual nodes:
# 4 physical nodes, 10000 keys:

# Without virtual nodes (1 vnode each):
# node-1: 2847 keys (28.5%)
# node-2: 3201 keys (32.0%)
# node-3: 1998 keys (20.0%)
# node-4: 1954 keys (19.5%)

# With 150 virtual nodes:
# node-1: 2512 keys (25.1%)
# node-2: 2489 keys (24.9%)
# node-3: 2503 keys (25.0%)
# node-4: 2496 keys (25.0%)
# -- much more even distribution

Adding and Removing Nodes

# Adding a node: only keys in the new node's arcs migrate
ring = ConsistentHashRing(virtual_nodes=150)
ring.add_node('cache-1')
ring.add_node('cache-2')
ring.add_node('cache-3')

# Keys currently mapped:
# user:1001 -> cache-2
# user:1002 -> cache-1
# user:1003 -> cache-3

ring.add_node('cache-4')

# After adding cache-4:
# user:1001 -> cache-4  (moved -- falls in cache-4's arc)
# user:1002 -> cache-1  (unchanged)
# user:1003 -> cache-3  (unchanged)
# ~25% of keys moved, not 100%

# Removing a node: only keys on that node need to migrate
ring.remove_node('cache-2')
# Keys previously on cache-2 move to their next clockwise node
# All other keys are unaffected

Usage in Distributed Cache Routing

class DistributedCache:
    def __init__(self, nodes: list[str]):
        self.ring = ConsistentHashRing(virtual_nodes=150)
        self.clients = {}
        for node in nodes:
            self.ring.add_node(node)
            self.clients[node] = RedisClient(node)

    def get(self, key: str):
        node = self.ring.get_node(key)
        return self.clients[node].get(key)

    def set(self, key: str, value, ttl: int = 300):
        node = self.ring.get_node(key)
        self.clients[node].setex(key, ttl, value)

    def add_node(self, node: str):
        self.ring.add_node(node)
        self.clients[node] = RedisClient(node)
        # Existing keys cached on previous nodes are not migrated immediately
        # They expire naturally via TTL or are served as cache misses until re-cached

    def remove_node(self, node: str):
        self.ring.remove_node(node)
        del self.clients[node]

Key Interview Points

  • With standard hash(key) % N, adding one node remaps ~N/(N+1) of all keys — effectively a full cache flush. With consistent hashing, adding one node remaps ~1/(N+1) of keys — a proportional migration.
  • Virtual nodes solve uneven distribution and make it easy to give higher-capacity nodes more weight: assign 300 virtual nodes to a node with 2x the memory instead of the standard 150.
  • Consistent hashing does not guarantee zero key movement — it guarantees minimal key movement. Only keys whose responsible arc changes move.
  • In Cassandra, the consistent hash ring is called the token ring. Each node owns a range of tokens; vnodes (virtual nodes) give each node multiple non-contiguous token ranges.
  • For replication, walk clockwise from the key’s position and collect N distinct physical nodes (get_nodes with n=3 for RF=3). This is exactly how Cassandra and DynamoDB select replica sets.
  • Hotspot mitigation: if one key is overwhelmingly popular (e.g., a viral post), consistent hashing still routes it to one node. Mitigate with application-level local caching or key sharding (append a random suffix to spread the hot key across multiple nodes).

Consistent hashing and distributed cache routing design is discussed in Amazon system design interview questions.

Consistent hashing and distributed data partitioning design is covered in Databricks system design interview preparation.

Consistent hashing and distributed systems design is discussed in Netflix system design interview guide.

See also: Netflix Interview Guide 2026: Streaming Architecture, Recommendation Systems, and Engineering Excellence

See also: Uber Interview Guide 2026: Dispatch Systems, Geospatial Algorithms, and Marketplace Engineering

Scroll to Top