Server to Process Fair Number of Functions

How would you design a server that has to process a fair number of function requests per second?
What if you are not aware of how many requests you will be getting?
What if requests have different priorities?

2026 Update: Fair Load Distribution — Consistent Hashing and Weighted Round Robin

Distributing work fairly across servers is a fundamental system design problem. Several algorithms exist, each with different trade-offs.

import hashlib
from bisect import bisect_left, insort

# Algorithm 1: Round Robin — equal distribution, sequential
class RoundRobin:
    def __init__(self, servers: list):
        self.servers = servers
        self.index = 0

    def get_server(self, request=None) -> str:
        server = self.servers[self.index]
        self.index = (self.index + 1) % len(self.servers)
        return server

# Algorithm 2: Weighted Round Robin — proportional distribution
class WeightedRoundRobin:
    def __init__(self, servers_weights: dict):
        """servers_weights: {'server1': 3, 'server2': 1} means 75/25 split."""
        self.sequence = []
        for server, weight in servers_weights.items():
            self.sequence.extend([server] * weight)
        self.index = 0

    def get_server(self) -> str:
        server = self.sequence[self.index]
        self.index = (self.index + 1) % len(self.sequence)
        return server

# Algorithm 3: Consistent Hashing — minimal redistribution on add/remove
class ConsistentHashRing:
    def __init__(self, virtual_nodes: int = 150):
        self.ring = {}
        self.sorted_keys = []
        self.virtual_nodes = virtual_nodes

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_server(self, server: str) -> None:
        for i in range(self.virtual_nodes):
            virtual_key = f"{server}#{i}"
            h = self._hash(virtual_key)
            self.ring[h] = server
            insort(self.sorted_keys, h)

    def remove_server(self, server: str) -> None:
        for i in range(self.virtual_nodes):
            virtual_key = f"{server}#{i}"
            h = self._hash(virtual_key)
            if h in self.ring:
                del self.ring[h]
                self.sorted_keys.remove(h)

    def get_server(self, request: str) -> str | None:
        """Route the request to the nearest server clockwise on the ring."""
        if not self.ring:
            return None
        h = self._hash(request)
        idx = bisect_left(self.sorted_keys, h)
        if idx == len(self.sorted_keys):
            idx = 0
        return self.ring[self.sorted_keys[idx]]

# Test consistent hashing
ring = ConsistentHashRing(virtual_nodes=100)
servers = ['server-A', 'server-B', 'server-C']
for s in servers:
    ring.add_server(s)

# Distribute 1000 requests
from collections import Counter
distribution = Counter(ring.get_server(f"request-{i}") for i in range(1000))
print("Consistent hashing distribution:")
for server, count in sorted(distribution.items()):
    print(f"  {server}: {count} requests ({count/10:.1f}%)")

# Removing a server: minimal redistribution
ring.remove_server('server-B')
distribution2 = Counter(ring.get_server(f"request-{i}") for i in range(1000))
print("\nAfter removing server-B:")
for server, count in sorted(distribution2.items()):
    print(f"  {server}: {count} requests ({count/10:.1f}%)")

Algorithm comparison:

  • Round robin: Simple, equal distribution, stateless. Bad if requests have different costs
  • Weighted: Handles heterogeneous server capacity. Still requires knowing weights upfront
  • Consistent hashing: When servers are added/removed, only ~1/n of requests need redistribution (vs n/n for modulo hashing). Used by: Amazon DynamoDB, Apache Cassandra, Akamai CDN, Redis Cluster
  • Least connections: Route to server with fewest active connections — best for variable request duration
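The least-connections strategy in the last bullet can be sketched in a few lines. This is a minimal single-process illustration (class and method names are my own, not from any particular load balancer): track in-flight requests per server and always route to the least loaded one.

```python
class LeastConnections:
    def __init__(self, servers: list):
        # Active (in-flight) connection count per server.
        self.connections = {s: 0 for s in servers}

    def get_server(self) -> str:
        # Pick the server with the fewest active connections.
        return min(self.connections, key=self.connections.get)

    def on_request_start(self, server: str) -> None:
        self.connections[server] += 1

    def on_request_end(self, server: str) -> None:
        self.connections[server] -= 1
```

The caller must report request start/end, which is the price of being load-aware; in exchange, a server stuck on one slow request stops receiving new ones.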

💡Strategies for Solving This Problem

Load Balancing and Request Handling

Got this at Meta in 2024. It's about designing scalable systems that handle variable load. Tests understanding of queuing, load balancing, and priority systems.

The Problem

Design a server that processes thousands of function requests per second. Challenges:

  • Unknown request rate
  • Different priorities
  • Need fairness
  • Can't block

Approach 1: Simple Queue

Single FIFO queue. Process requests as they come. Simple but:

  • No priorities
  • High-priority requests wait behind low-priority
  • Doesn't scale well
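A minimal sketch of Approach 1, using a `deque` as the FIFO queue (names are illustrative):

```python
from collections import deque

class SimpleQueueServer:
    def __init__(self):
        self.queue = deque()

    def submit(self, request) -> None:
        self.queue.append(request)

    def process_one(self):
        # Requests are handled strictly in arrival order, regardless
        # of priority — this is exactly the limitation listed above.
        if self.queue:
            return self.queue.popleft()
        return None
```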

Approach 2: Priority Queue

Use heap or priority queue. High-priority requests first. Better, but:

  • Starvation: low-priority may never execute
  • Not truly "fair"
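Approach 2 with Python's `heapq` might look like this (a sketch, not a production queue). A monotonic counter breaks ties so equal-priority requests stay FIFO; lower numbers mean higher priority.

```python
import heapq
import itertools

class PriorityQueueServer:
    def __init__(self):
        self.heap = []
        self.counter = itertools.count()  # tie-breaker for equal priorities

    def submit(self, priority: int, request) -> None:
        heapq.heappush(self.heap, (priority, next(self.counter), request))

    def process_one(self):
        # Always pops the lowest (priority, arrival) tuple — so a steady
        # stream of priority-0 requests starves priority-2 forever.
        if self.heap:
            return heapq.heappop(self.heap)[2]
        return None
```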

Approach 3: Multiple Queues + Round Robin

Separate queue per priority level. Use weighted round-robin to process:

  • 60% time on high priority
  • 30% on medium
  • 10% on low

Prevents starvation while maintaining priorities.
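Approach 3 can be sketched by combining per-priority queues with the weighted round-robin idea from earlier. The weights below (6/3/1, roughly the 60/30/10 split above) are illustrative:

```python
from collections import deque

class MultiQueueScheduler:
    def __init__(self, weights=None):
        weights = weights or {'high': 6, 'medium': 3, 'low': 1}
        self.queues = {level: deque() for level in weights}
        # Expand weights into a repeating service sequence.
        self.sequence = [lvl for lvl, w in weights.items() for _ in range(w)]
        self.index = 0

    def submit(self, level: str, request) -> None:
        self.queues[level].append(request)

    def process_one(self):
        # Scan up to one full rotation so a non-empty low-priority
        # queue is never skipped forever (no starvation).
        for _ in range(len(self.sequence)):
            level = self.sequence[self.index]
            self.index = (self.index + 1) % len(self.sequence)
            if self.queues[level]:
                return self.queues[level].popleft()
        return None
```

High-priority requests still dominate, but every cycle guarantees low-priority queues a service slot.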

Approach 4: Thread Pool with Work Stealing

Multiple worker threads. Each has local queue. If idle, "steal" work from others. Good for:

  • Load balancing
  • Utilizing all cores
  • Handling variable load
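A much-simplified sketch of Approach 4 (single lock, illustrative names): each worker drains its own deque from the front and, when idle, steals from the back of the fullest other queue. A real work stealer uses per-queue synchronization or lock-free deques; this just shows the routing logic.

```python
import threading
from collections import deque

class WorkStealingPool:
    def __init__(self, num_workers: int):
        self.queues = [deque() for _ in range(num_workers)]
        self.lock = threading.Lock()  # coarse lock; real pools avoid this

    def submit(self, worker_id: int, task) -> None:
        with self.lock:
            self.queues[worker_id].append(task)

    def get_task(self, worker_id: int):
        with self.lock:
            # Prefer the worker's own queue (FIFO locally).
            if self.queues[worker_id]:
                return self.queues[worker_id].popleft()
            # Otherwise steal from the back of the fullest other queue.
            victim = max(self.queues, key=len)
            if victim:
                return victim.pop()
        return None
```

Stealing from the *back* of the victim's queue reduces contention with the victim, which keeps popping from the front.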

Key Components

Rate limiting: Prevent overwhelming server

Back pressure: Reject requests when overloaded

Monitoring: Track queue sizes, latencies

Autoscaling: Add workers when load increases
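Rate limiting and back pressure are often combined in a token bucket: refill tokens at a steady rate up to a burst capacity, and reject requests when the bucket is empty. A minimal sketch (parameters are illustrative):

```python
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill based on elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # back pressure: reject now, client retries later
```

Rejecting fast when `allow()` returns `False` (e.g. with HTTP 429) is what keeps queues bounded when the incoming rate is unknown.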

At Meta

They wanted to see if I understood the tradeoffs. I started with a simple queue; they asked about priorities, then about fairness and starvation, and finally we discussed scaling and monitoring.
