Load Balancer Low-Level Design
A load balancer distributes incoming network traffic across a pool of upstream servers, ensuring no single server becomes a bottleneck. At the low level, the three decisions that define a load balancer are: which algorithm selects the upstream, how the balancer knows upstreams are alive, and how sessions are preserved across requests.
Algorithm Comparison
Round-Robin cycles through upstreams in order. It is optimal for stateless services where every request has similar cost, because it achieves even distribution with O(1) selection and zero state.
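When upstreams have unequal capacity, plain round-robin is commonly extended to weighted round-robin. Below is a sketch of the "smooth" weighted variant (the interleaving scheme nginx popularized); the `WeightedUpstream` class and `current_weight` bookkeeping field are illustrative, not part of the schema defined later.

```python
from dataclasses import dataclass

@dataclass
class WeightedUpstream:
    address: str
    weight: int              # static configured capacity weight
    current_weight: int = 0  # per-round bookkeeping

def smooth_weighted_rr(pool: list) -> WeightedUpstream:
    # Each round every upstream gains its weight; the current leader is
    # picked and "pays back" the total, so picks interleave smoothly
    # instead of bursting all of one upstream's share at once.
    total = sum(u.weight for u in pool)
    for u in pool:
        u.current_weight += u.weight
    best = max(pool, key=lambda u: u.current_weight)
    best.current_weight -= total
    return best

pool = [WeightedUpstream("10.0.0.1", 5), WeightedUpstream("10.0.0.2", 1)]
picks = [smooth_weighted_rr(pool).address for _ in range(6)]
# Over a 6-pick cycle, the weight-5 upstream is chosen 5 times.
```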
Least-Connections routes each new request to the upstream with the fewest active connections. It is optimal for services with variable request cost (e.g., mixed short and long queries) because it prevents a slow upstream from accumulating a backlog while idle peers wait.
Consistent Hash maps a hash of the request key (URL path, user ID, or IP) onto a ring of upstream tokens. The same key always maps to the same upstream node unless the ring changes. This is optimal when upstreams maintain local caches (object storage proxies, key-value shards) because cache hit rates degrade proportionally to how often requests are misrouted.
In practice, production balancers combine algorithms: consistent hash for cache-affinity tiers, least-connections for compute tiers, and round-robin for lightweight API tiers.
Active Health Checks
The balancer sends periodic HTTP or TCP probes to each upstream independently of real traffic. Each probe records latency and HTTP status. After failure_threshold consecutive failed probes, the upstream transitions to UNHEALTHY and is removed from the selection pool. After recovery_threshold consecutive successes, it is re-added.
The probe interval must be short enough to detect failures quickly but not so short that probe traffic saturates small upstreams. A typical configuration is a 5-second interval with a 2-second timeout and a threshold of 3 consecutive failures.
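The threshold logic can be sketched as a small per-upstream state machine. The class name is illustrative; the default thresholds follow the configuration above.

```python
class HealthTracker:
    """Flips an upstream between HEALTHY and UNHEALTHY based on
    consecutive probe results."""

    def __init__(self, failure_threshold: int = 3, recovery_threshold: int = 2):
        self.failure_threshold = failure_threshold
        self.recovery_threshold = recovery_threshold
        self.status = "HEALTHY"
        self._streak = 0  # consecutive results counting toward a transition

    def record(self, success: bool) -> str:
        if self.status == "HEALTHY":
            # Count consecutive failures; any success resets the streak.
            self._streak = 0 if success else self._streak + 1
            if self._streak >= self.failure_threshold:
                self.status, self._streak = "UNHEALTHY", 0
        else:
            # Count consecutive successes; any failure resets the streak.
            self._streak = self._streak + 1 if success else 0
            if self._streak >= self.recovery_threshold:
                self.status, self._streak = "HEALTHY", 0
        return self.status

tracker = HealthTracker()
for _ in range(3):
    tracker.record(False)  # three consecutive failed probes
# tracker.status is now "UNHEALTHY": removed from the selection pool
```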
Circuit tripping is an extension: if an upstream returns 5xx responses above a configurable error rate on real traffic, it is tripped immediately without waiting for the health probe cycle.
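A minimal error-rate trip over a sliding window of real responses might look like the following; the window size and rate threshold are illustrative defaults, not from any particular balancer.

```python
from collections import deque

class CircuitTripper:
    """Trips an upstream when the 5xx rate over the last `window`
    real responses exceeds `max_error_rate`."""

    def __init__(self, window: int = 100, max_error_rate: float = 0.5):
        self.results = deque(maxlen=window)  # True = 5xx response
        self.max_error_rate = max_error_rate

    def record_response(self, status_code: int) -> bool:
        """Record one real response; return True if the upstream
        should be tripped immediately."""
        self.results.append(status_code >= 500)
        if len(self.results) < self.results.maxlen:
            return False  # not enough samples to judge yet
        return sum(self.results) / len(self.results) > self.max_error_rate
```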
Session Affinity via Cookie
For stateful applications (shopping carts, authenticated sessions stored in-process), requests from the same client must return to the same upstream. The balancer inserts a cookie (e.g., LB_AFFINITY) containing a hash of the selected upstream's address. On subsequent requests, the balancer reads the cookie and routes directly to that upstream if it is healthy. If the upstream is unhealthy, the balancer re-selects and updates the cookie.
Cookie affinity is preferable to IP-hash affinity because it survives NAT, proxies, and mobile network changes.
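A sketch of the affinity lookup follows. The cookie name matches the example above; the token scheme and helper names are illustrative, and a production balancer would additionally sign or encrypt the cookie value.

```python
import hashlib

COOKIE_NAME = "LB_AFFINITY"

def upstream_token(address: str, port: int) -> str:
    # Opaque token so the cookie does not leak internal addresses.
    return hashlib.sha256(f"{address}:{port}".encode()).hexdigest()[:16]

def route_with_affinity(cookies: dict, healthy: list) -> tuple:
    """Return (upstream, new_cookie_value_or_None).

    `healthy` is a list of (address, port) pairs currently in the pool;
    `cookies` maps cookie names to values as parsed from the request.
    """
    token = cookies.get(COOKIE_NAME)
    if token:
        for addr, port in healthy:
            if upstream_token(addr, port) == token:
                return (addr, port), None  # sticky hit; no Set-Cookie needed
    # Cookie missing, or the pinned upstream is no longer healthy:
    # re-select and hand back a fresh cookie value.
    choice = healthy[0]  # stand-in for the configured selection algorithm
    return choice, upstream_token(*choice)
```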
SSL Termination
In SSL termination mode, the balancer handles the full TLS handshake with the client, decrypts the traffic, and forwards plaintext HTTP to upstreams over a private network. This offloads certificate management and TLS computation from application servers and allows the balancer to inspect and route on HTTP headers.
In SSL passthrough mode, the balancer forwards the raw TLS stream to the upstream, which performs its own TLS handshake. Passthrough is required when end-to-end encryption is mandated or when the application needs to present its own certificate (e.g., mTLS to clients).
The trade-off: termination enables L7 routing and header inspection; passthrough provides E2E encryption but limits routing to L4 (IP/port only).
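As a concrete illustration, the two modes map to different nginx configurations (upstream names, addresses, and certificate paths below are placeholders; the two servers are alternatives and would not share port 443 on one instance):

```nginx
# SSL termination: nginx holds the certificate; upstreams see plain HTTP.
http {
    upstream app_pool { server 10.0.0.1:8080; server 10.0.0.2:8080; }
    server {
        listen 443 ssl;
        ssl_certificate     /etc/nginx/certs/example.crt;
        ssl_certificate_key /etc/nginx/certs/example.key;
        location / { proxy_pass http://app_pool; }  # L7: can route on headers
    }
}

# SSL passthrough: the stream module relays raw TLS bytes at L4.
stream {
    upstream tls_pool { server 10.0.0.1:8443; server 10.0.0.2:8443; }
    server {
        listen 443;
        proxy_pass tls_pool;  # no decryption, no header visibility
    }
}
```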
Connection Draining
When an upstream is removed from the pool (health failure, deployment, scale-in), in-flight requests must complete gracefully. The balancer marks the upstream as draining: no new requests are routed to it, but existing connections are allowed to finish. After a drain timeout (typically 30 seconds), remaining connections are forcibly closed. This prevents request errors during rolling deployments.
SQL Schema
CREATE TABLE Upstream (
    id                 BIGSERIAL PRIMARY KEY,
    address            TEXT NOT NULL,
    port               INTEGER NOT NULL,
    weight             INTEGER NOT NULL DEFAULT 1,
    status             TEXT NOT NULL DEFAULT 'HEALTHY', -- HEALTHY | DRAINING | UNHEALTHY
    active_connections INTEGER NOT NULL DEFAULT 0,
    created_at         TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE HealthProbe (
    id          BIGSERIAL PRIMARY KEY,
    upstream_id BIGINT NOT NULL REFERENCES Upstream(id),
    status      TEXT NOT NULL, -- SUCCESS | FAILURE
    latency_ms  INTEGER,
    checked_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX ON HealthProbe (upstream_id, checked_at DESC);

CREATE TABLE StickySession (
    cookie_value TEXT PRIMARY KEY,
    upstream_id  BIGINT NOT NULL REFERENCES Upstream(id),
    expires_at   TIMESTAMPTZ NOT NULL
);

CREATE INDEX ON StickySession (upstream_id);
CREATE INDEX ON StickySession (expires_at);
Python Implementation
import hashlib
import time
import urllib.request
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Upstream:
    id: int
    address: str
    port: int
    weight: int = 1
    status: str = "HEALTHY"
    active_connections: int = 0

def select_upstream(algorithm: str, upstreams: List[Upstream], request_key: str = "") -> Optional[Upstream]:
    healthy = [u for u in upstreams if u.status == "HEALTHY"]
    if not healthy:
        return None
    if algorithm == "round_robin":
        # Stateless approximation: rotate by wall-clock second.
        # A production balancer keeps a per-pool counter instead.
        return healthy[int(time.time()) % len(healthy)]
    if algorithm == "least_connections":
        return min(healthy, key=lambda u: u.active_connections)
    if algorithm == "consistent_hash":
        ring_points = 150  # virtual nodes per upstream
        ring = {}
        for u in healthy:
            for i in range(ring_points):
                h = int(hashlib.md5(f"{u.id}:{i}".encode()).hexdigest(), 16)
                ring[h] = u
        key_hash = int(hashlib.md5(request_key.encode()).hexdigest(), 16)
        sorted_keys = sorted(ring.keys())
        # Walk clockwise: route to the first ring point at or after the key.
        for k in sorted_keys:
            if key_hash <= k:
                return ring[k]
        return ring[sorted_keys[0]]  # wrap around the ring
    return None  # unknown algorithm

def probe_health(upstream: Upstream) -> dict:
    url = f"http://{upstream.address}:{upstream.port}/health"
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            latency_ms = int((time.monotonic() - start) * 1000)
            return {"status": "SUCCESS" if resp.status == 200 else "FAILURE",
                    "latency_ms": latency_ms}
    except Exception:
        latency_ms = int((time.monotonic() - start) * 1000)
        return {"status": "FAILURE", "latency_ms": latency_ms}

def drain_upstream(upstream_id: int, upstreams: List[Upstream], drain_timeout_s: int = 30) -> None:
    for u in upstreams:
        if u.id == upstream_id:
            u.status = "DRAINING"
            print(f"Upstream {upstream_id} set to DRAINING; waiting up to {drain_timeout_s}s")
            deadline = time.monotonic() + drain_timeout_s
            while time.monotonic() < deadline:
                if u.active_connections == 0:
                    break
                time.sleep(1)
            u.status = "UNHEALTHY"
            print(f"Upstream {upstream_id} drained and marked UNHEALTHY")
            return
    print(f"Upstream {upstream_id} not found")
Frequently Asked Questions
How does consistent hashing minimize redistribution when upstreams change?
The ring assigns each upstream multiple virtual node positions. When one upstream is removed, only the keys that mapped to that upstream's virtual nodes are redistributed to the next node on the ring; all other keys remain unaffected. With 150 virtual nodes per upstream, removing one upstream redistributes approximately 1/N of keys rather than all keys.
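The ~1/N claim can be checked empirically with the same ring construction used in the implementation above (MD5, 150 virtual nodes); the key names below are arbitrary test data, and the measured fraction is approximate by nature.

```python
import bisect
import hashlib

def build_ring(upstream_ids, vnodes=150):
    # Sorted (hash, owner) ring, 150 virtual nodes per upstream.
    points = sorted(
        (int(hashlib.md5(f"{uid}:{i}".encode()).hexdigest(), 16), uid)
        for uid in upstream_ids
        for i in range(vnodes)
    )
    return [h for h, _ in points], [uid for _, uid in points]

def lookup(hashes, owners, key):
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    idx = bisect.bisect_left(hashes, h) % len(hashes)  # clockwise, with wrap
    return owners[idx]

keys = [f"user:{n}" for n in range(10_000)]
h4, o4 = build_ring([1, 2, 3, 4])
h3, o3 = build_ring([1, 2, 3])  # upstream 4 removed
moved = sum(lookup(h4, o4, k) != lookup(h3, o3, k) for k in keys)
print(f"{moved / len(keys):.1%} of keys remapped")  # roughly 1/N = 25% expected
```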
What is connection draining and why is it necessary during deployments?
Connection draining allows in-flight requests to complete on an upstream that has been removed from the selection pool. Without draining, a rolling deployment would terminate upstream processes mid-request, causing HTTP 502 or 504 errors for users whose requests were in progress. The balancer stops routing new requests to the draining upstream but waits for active connections to close naturally.
Why is cookie-based session affinity preferred over IP-hash?
IP-hash breaks under NAT (many users share one IP) and under mobile clients that change IP between requests. A load balancer-inserted cookie is tied to the individual browser session and survives network changes, giving deterministic routing per client without depending on IP stability.
What is the difference between SSL termination and SSL passthrough?
SSL termination decrypts TLS at the load balancer and forwards plaintext to upstreams, enabling L7 routing on HTTP headers and centralizing certificate management. SSL passthrough forwards the raw encrypted stream to the upstream, which handles its own TLS, providing end-to-end encryption but restricting routing to L4 (IP/port) with no header visibility.