How would you design a server that has to process a large number of function requests per second?
What if you don't know how many requests you will be getting?
What if requests have different priorities?
2026 Update: Fair Load Distribution — Consistent Hashing and Weighted Round Robin
Distributing work fairly across servers is a fundamental system design problem. Several algorithms exist, each with different trade-offs.
```python
import hashlib
from bisect import bisect_left, insort

# Algorithm 1: Round Robin — equal distribution, sequential
class RoundRobin:
    def __init__(self, servers: list):
        self.servers = servers
        self.index = 0

    def get_server(self, request=None) -> str:
        server = self.servers[self.index]
        self.index = (self.index + 1) % len(self.servers)
        return server
```
```python
# Algorithm 2: Weighted Round Robin — proportional distribution
class WeightedRoundRobin:
    def __init__(self, servers_weights: dict):
        """servers_weights: {'server1': 3, 'server2': 1} means 75/25 split."""
        self.sequence = []
        for server, weight in servers_weights.items():
            self.sequence.extend([server] * weight)
        self.index = 0

    def get_server(self) -> str:
        server = self.sequence[self.index]
        self.index = (self.index + 1) % len(self.sequence)
        return server
```
```python
from typing import Optional

# Algorithm 3: Consistent Hashing — minimal redistribution on add/remove
class ConsistentHashRing:
    def __init__(self, virtual_nodes: int = 150):
        self.ring = {}          # hash -> server
        self.sorted_keys = []   # sorted hashes, for binary search
        self.virtual_nodes = virtual_nodes

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_server(self, server: str) -> None:
        for i in range(self.virtual_nodes):
            virtual_key = f"{server}#{i}"
            h = self._hash(virtual_key)
            self.ring[h] = server
            insort(self.sorted_keys, h)

    def remove_server(self, server: str) -> None:
        for i in range(self.virtual_nodes):
            virtual_key = f"{server}#{i}"
            h = self._hash(virtual_key)
            if h in self.ring:
                del self.ring[h]
                self.sorted_keys.remove(h)

    def get_server(self, request: str) -> Optional[str]:
        """Route request to the nearest server clockwise on the ring."""
        if not self.ring:
            return None
        h = self._hash(request)
        idx = bisect_left(self.sorted_keys, h)
        if idx == len(self.sorted_keys):
            idx = 0  # wrap around the ring
        return self.ring[self.sorted_keys[idx]]
```
```python
from collections import Counter

# Test consistent hashing
ring = ConsistentHashRing(virtual_nodes=100)
servers = ['server-A', 'server-B', 'server-C']
for s in servers:
    ring.add_server(s)

# Distribute 1000 requests
distribution = Counter(ring.get_server(f"request-{i}") for i in range(1000))
print("Consistent hashing distribution:")
for server, count in sorted(distribution.items()):
    print(f"  {server}: {count} requests ({count / 10:.1f}%)")

# Removing a server: most requests keep their original assignment
ring.remove_server('server-B')
distribution2 = Counter(ring.get_server(f"request-{i}") for i in range(1000))
print("\nAfter removing server-B:")
for server, count in sorted(distribution2.items()):
    print(f"  {server}: {count} requests ({count / 10:.1f}%)")
```
Algorithm comparison:
- Round robin: Simple, equal distribution, stateless. Bad if requests have different costs
- Weighted: Handles heterogeneous server capacity. Still requires knowing weights upfront
- Consistent hashing: When servers are added/removed, only ~1/n of requests need redistribution (vs n/n for modulo hashing). Used by: Amazon DynamoDB, Apache Cassandra, Akamai CDN, Redis Cluster
- Least connections: Route to server with fewest active connections — best for variable request duration
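The least-connections strategy is easy to sketch as well. A minimal single-process version — the class name and `acquire`/`release` API are illustrative, not from the original post:

```python
# Minimal least-connections balancer (illustrative sketch).
class LeastConnections:
    def __init__(self, servers: list):
        # Track the number of in-flight requests per server.
        self.active = {s: 0 for s in servers}

    def acquire(self) -> str:
        # Pick the server with the fewest active connections
        # (ties broken by insertion order).
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        # Call when the request completes.
        self.active[server] -= 1

lb = LeastConnections(['server-A', 'server-B'])
a = lb.acquire()   # server-A
b = lb.acquire()   # server-B
lb.release(a)      # server-A finishes first, so it gets the next request
```

Because routing depends on live connection counts rather than a fixed schedule, slow requests naturally push new traffic toward less-loaded servers.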
💡Strategies for Solving This Problem
Load Balancing and Request Handling
Got this at Meta in 2024. It's about designing scalable systems that handle variable load. Tests understanding of queuing, load balancing, and priority systems.
The Problem
Design a server that processes thousands of function requests per second. Challenges:
- Unknown request rate
- Different priorities
- Need fairness
- Can't block
Approach 1: Simple Queue
Single FIFO queue. Process requests as they come. Simple but:
- No priorities
- High-priority requests wait behind low-priority
- Doesn't scale well
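A minimal sketch of this approach using the standard library's thread-safe `queue.Queue` — the worker setup here is illustrative, assuming one consumer thread:

```python
import queue
import threading

# Single FIFO queue, one worker: strict arrival order, no priorities.
requests = queue.Queue()
results = []

def worker():
    while True:
        item = requests.get()
        results.append(item)   # stand-in for actual request processing
        requests.task_done()

threading.Thread(target=worker, daemon=True).start()

for i in range(3):
    requests.put(i)
requests.join()  # block until every queued request has been processed
```

The weakness is visible in the data structure itself: `Queue` has no notion of priority, so an urgent request enqueued last is processed last.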
Approach 2: Priority Queue
Use heap or priority queue. High-priority requests first. Better, but:
- Starvation: low-priority may never execute
- Not truly "fair"
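A sketch with `heapq`, the usual way to build this in Python (function names are mine; lower number means higher priority, and a counter breaks ties in FIFO order):

```python
import heapq
import itertools

heap = []
counter = itertools.count()  # tie-breaker so equal priorities stay FIFO

def submit(priority: int, request: str) -> None:
    heapq.heappush(heap, (priority, next(counter), request))

def next_request() -> str:
    _, _, request = heapq.heappop(heap)
    return request

submit(2, "low")
submit(0, "urgent")
submit(1, "normal")
order = [next_request() for _ in range(3)]  # ['urgent', 'normal', 'low']
```

The starvation problem follows directly: as long as `submit(0, ...)` keeps arriving, a priority-2 item never reaches the top of the heap.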
Approach 3: Multiple Queues + Round Robin
Separate queue per priority level. Use weighted round-robin to process:
- 60% time on high priority
- 30% on medium
- 10% on low
Prevents starvation while maintaining priorities.
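A compact way to sketch this: per-priority queues served with fixed quotas per round. The weights 6/3/1 approximate the 60/30/10 split above; all names here are illustrative:

```python
from collections import deque

queues = {'high': deque(), 'medium': deque(), 'low': deque()}
weights = {'high': 6, 'medium': 3, 'low': 1}

def drain_round() -> list:
    """One scheduling round: serve up to `weight` items from each queue."""
    processed = []
    for level, q in queues.items():
        for _ in range(weights[level]):
            if q:
                processed.append(q.popleft())
    return processed

queues['high'].extend(f"hi-{i}" for i in range(5))
queues['low'].extend(f"lo-{i}" for i in range(5))
round1 = drain_round()  # all 5 high items, plus 1 low item
```

Even with high-priority work queued, every round guarantees the low queue some service, which is exactly what prevents starvation.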
Approach 4: Thread Pool with Work Stealing
Multiple worker threads. Each has local queue. If idle, "steal" work from others. Good for:
- Load balancing
- Utilizing all cores
- Handling variable load
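A toy single-threaded simulation of the stealing rule (a real implementation needs per-queue locks or lock-free deques; the names are illustrative):

```python
from collections import deque

local_queues = [deque() for _ in range(4)]  # one local queue per worker

def get_task(worker_id: int):
    own = local_queues[worker_id]
    if own:
        return own.popleft()        # serve own queue FIFO
    for victim in local_queues:     # idle: steal from the first busy worker
        if victim:
            return victim.pop()     # steal from the tail to reduce contention
    return None                     # no work anywhere

local_queues[0].extend(['task-a', 'task-b'])
stolen = get_task(3)  # worker 3 is idle, steals 'task-b' from worker 0's tail
own = get_task(0)     # worker 0 still gets 'task-a' from its own head
```

Stealing from the tail while owners pop from the head is the classic trick: the two ends rarely collide, so contention stays low.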
Key Components
Rate limiting: Prevent overwhelming server
Back pressure: Reject requests when overloaded
Monitoring: Track queue sizes, latencies
Autoscaling: Add workers when load increases
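The rate-limiting and back-pressure pieces are commonly combined in a token bucket. A sketch — the class name and parameters are illustrative, not from the original answer:

```python
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # overloaded: reject or shed — this is the back pressure

bucket = TokenBucket(rate=100, capacity=5)
# A burst of 20 near-instantaneous requests admits only about `capacity`.
admitted = sum(bucket.allow() for _ in range(20))
```

Returning `False` instead of blocking is the key design choice: the caller decides whether to reject, queue, or retry, so the server itself never stalls.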
At Meta
They wanted to see if I understood the trade-offs. I started with a simple queue; they pushed on priorities, then on fairness and starvation, and finally we discussed scaling and monitoring.