Load Balancer Low-Level Design
A load balancer distributes incoming network traffic across a pool of upstream servers, ensuring no single server becomes a bottleneck. At the low level, the three decisions that define a load balancer are: which algorithm selects the upstream, how the balancer knows upstreams are alive, and how sessions are preserved across requests.
Algorithm Comparison
Round-Robin cycles through upstreams in order. It is optimal for stateless services where every request has similar cost, because it achieves even distribution with O(1) selection and zero state.
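When upstreams have unequal capacity, plain round-robin is commonly extended to weighted round-robin. Below is a sketch of the "smooth" weighted variant (the interleaving scheme nginx popularized); the `WeightedUpstream` class and `current_weight` bookkeeping field are illustrative, not part of the schema defined later.

```python
from dataclasses import dataclass

@dataclass
class WeightedUpstream:
    address: str
    weight: int              # static configured capacity weight
    current_weight: int = 0  # per-round bookkeeping

def smooth_weighted_rr(pool: list) -> WeightedUpstream:
    # Each round every upstream gains its weight; the current leader is
    # picked and "pays back" the total, so picks interleave smoothly
    # instead of bursting all of one upstream's share at once.
    total = sum(u.weight for u in pool)
    for u in pool:
        u.current_weight += u.weight
    best = max(pool, key=lambda u: u.current_weight)
    best.current_weight -= total
    return best

pool = [WeightedUpstream("10.0.0.1", 5), WeightedUpstream("10.0.0.2", 1)]
picks = [smooth_weighted_rr(pool).address for _ in range(6)]
# Over a 6-pick cycle, the weight-5 upstream is chosen 5 times.
```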
Least-Connections routes each new request to the upstream with the fewest active connections. It is optimal for services with variable request cost (e.g., mixed short and long queries) because it prevents a slow upstream from accumulating a backlog while idle peers wait.
Consistent Hash maps a hash of the request key (URL path, user ID, or IP) onto a ring of upstream tokens. The same key always maps to the same upstream node unless the ring changes. This is optimal when upstreams maintain local caches (object storage proxies, key-value shards) because cache hit rates degrade proportionally to how often requests are misrouted.
In practice, production balancers combine algorithms: consistent hash for cache-affinity tiers, least-connections for compute tiers, and round-robin for lightweight API tiers.
Active Health Checks
The balancer sends periodic HTTP or TCP probes to each upstream independently of real traffic. Each probe records latency and HTTP status. After failure_threshold consecutive failed probes, the upstream transitions to UNHEALTHY and is removed from the selection pool. After recovery_threshold consecutive successes, it is re-added.
The probe interval must be short enough to detect failures quickly but not so short that probe traffic saturates small upstreams. A typical configuration is a 5-second interval with a 2-second timeout and a threshold of 3 consecutive failures.
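The threshold logic can be sketched as a small per-upstream state machine. The class name is illustrative; the default thresholds follow the configuration above.

```python
class HealthTracker:
    """Flips an upstream between HEALTHY and UNHEALTHY based on
    consecutive probe results."""

    def __init__(self, failure_threshold: int = 3, recovery_threshold: int = 2):
        self.failure_threshold = failure_threshold
        self.recovery_threshold = recovery_threshold
        self.status = "HEALTHY"
        self._streak = 0  # consecutive results counting toward a transition

    def record(self, success: bool) -> str:
        if self.status == "HEALTHY":
            # Count consecutive failures; any success resets the streak.
            self._streak = 0 if success else self._streak + 1
            if self._streak >= self.failure_threshold:
                self.status, self._streak = "UNHEALTHY", 0
        else:
            # Count consecutive successes; any failure resets the streak.
            self._streak = self._streak + 1 if success else 0
            if self._streak >= self.recovery_threshold:
                self.status, self._streak = "HEALTHY", 0
        return self.status

tracker = HealthTracker()
for _ in range(3):
    tracker.record(False)  # three consecutive failed probes
# tracker.status is now "UNHEALTHY": removed from the selection pool
```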
Circuit tripping is an extension: if an upstream returns 5xx responses above a configurable error rate on real traffic, it is tripped immediately without waiting for the health probe cycle.
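A minimal error-rate trip over a sliding window of real responses might look like the following; the window size and rate threshold are illustrative defaults, not from any particular balancer.

```python
from collections import deque

class CircuitTripper:
    """Trips an upstream when the 5xx rate over the last `window`
    real responses exceeds `max_error_rate`."""

    def __init__(self, window: int = 100, max_error_rate: float = 0.5):
        self.results = deque(maxlen=window)  # True = 5xx response
        self.max_error_rate = max_error_rate

    def record_response(self, status_code: int) -> bool:
        """Record one real response; return True if the upstream
        should be tripped immediately."""
        self.results.append(status_code >= 500)
        if len(self.results) < self.results.maxlen:
            return False  # not enough samples to judge yet
        return sum(self.results) / len(self.results) > self.max_error_rate
```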
Session Affinity via Cookie
For stateful applications (shopping carts, authenticated sessions stored in-process), requests from the same client must return to the same upstream. The balancer inserts a cookie (e.g., LB_AFFINITY) containing a hash of the selected upstream's address. On subsequent requests, the balancer reads the cookie and routes directly to that upstream if it is healthy. If the upstream is unhealthy, the balancer re-selects and updates the cookie.
Cookie affinity is preferable to IP-hash affinity because it survives NAT, proxies, and mobile network changes.
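A sketch of the affinity lookup follows. The cookie name matches the example above; the token scheme and helper names are illustrative, and a production balancer would additionally sign or encrypt the cookie value.

```python
import hashlib

COOKIE_NAME = "LB_AFFINITY"

def upstream_token(address: str, port: int) -> str:
    # Opaque token so the cookie does not leak internal addresses.
    return hashlib.sha256(f"{address}:{port}".encode()).hexdigest()[:16]

def route_with_affinity(cookies: dict, healthy: list) -> tuple:
    """Return (upstream, new_cookie_value_or_None).

    `healthy` is a list of (address, port) pairs currently in the pool;
    `cookies` maps cookie names to values as parsed from the request.
    """
    token = cookies.get(COOKIE_NAME)
    if token:
        for addr, port in healthy:
            if upstream_token(addr, port) == token:
                return (addr, port), None  # sticky hit; no Set-Cookie needed
    # Cookie missing, or the pinned upstream is no longer healthy:
    # re-select and hand back a fresh cookie value.
    choice = healthy[0]  # stand-in for the configured selection algorithm
    return choice, upstream_token(*choice)
```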
SSL Termination
In SSL termination mode, the balancer handles the full TLS handshake with the client, decrypts the traffic, and forwards plaintext HTTP to upstreams over a private network. This offloads certificate management and TLS computation from application servers and allows the balancer to inspect and route on HTTP headers.
In SSL passthrough mode, the balancer forwards the raw TLS stream to the upstream, which performs its own TLS handshake. Passthrough is required when end-to-end encryption is mandated or when the application needs to present its own certificate (e.g., mTLS to clients).
The trade-off: termination enables L7 routing and header inspection; passthrough provides E2E encryption but limits routing to L4 (IP/port only).
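As a concrete illustration, the two modes map to different nginx configurations (upstream names, addresses, and certificate paths below are placeholders; the two servers are alternatives and would not share port 443 on one instance):

```nginx
# SSL termination: nginx holds the certificate; upstreams see plain HTTP.
http {
    upstream app_pool { server 10.0.0.1:8080; server 10.0.0.2:8080; }
    server {
        listen 443 ssl;
        ssl_certificate     /etc/nginx/certs/example.crt;
        ssl_certificate_key /etc/nginx/certs/example.key;
        location / { proxy_pass http://app_pool; }  # L7: can route on headers
    }
}

# SSL passthrough: the stream module relays raw TLS bytes at L4.
stream {
    upstream tls_pool { server 10.0.0.1:8443; server 10.0.0.2:8443; }
    server {
        listen 443;
        proxy_pass tls_pool;  # no decryption, no header visibility
    }
}
```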
Connection Draining
When an upstream is removed from the pool (health failure, deployment, scale-in), in-flight requests must complete gracefully. The balancer marks the upstream as draining: no new requests are routed to it, but existing connections are allowed to finish. After a drain timeout (typically 30 seconds), remaining connections are forcibly closed. This prevents request errors during rolling deployments.
SQL Schema
CREATE TABLE Upstream (
    id                 BIGSERIAL PRIMARY KEY,
    address            TEXT NOT NULL,
    port               INTEGER NOT NULL,
    weight             INTEGER NOT NULL DEFAULT 1,
    status             TEXT NOT NULL DEFAULT 'HEALTHY', -- HEALTHY | DRAINING | UNHEALTHY
    active_connections INTEGER NOT NULL DEFAULT 0,
    created_at         TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE HealthProbe (
    id          BIGSERIAL PRIMARY KEY,
    upstream_id BIGINT NOT NULL REFERENCES Upstream(id),
    status      TEXT NOT NULL, -- SUCCESS | FAILURE
    latency_ms  INTEGER,
    checked_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX ON HealthProbe (upstream_id, checked_at DESC);

CREATE TABLE StickySession (
    cookie_value TEXT PRIMARY KEY,
    upstream_id  BIGINT NOT NULL REFERENCES Upstream(id),
    expires_at   TIMESTAMPTZ NOT NULL
);

CREATE INDEX ON StickySession (upstream_id);
CREATE INDEX ON StickySession (expires_at);
Python Implementation
import hashlib
import time
import urllib.request
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Upstream:
    id: int
    address: str
    port: int
    weight: int = 1
    status: str = "HEALTHY"
    active_connections: int = 0

def select_upstream(algorithm: str, upstreams: List[Upstream], request_key: str = "") -> Optional[Upstream]:
    healthy = [u for u in upstreams if u.status == "HEALTHY"]
    if not healthy:
        return None
    if algorithm == "round_robin":
        # Stateless approximation: rotate by wall-clock second.
        # A production balancer keeps a per-pool counter instead.
        return healthy[int(time.time()) % len(healthy)]
    if algorithm == "least_connections":
        return min(healthy, key=lambda u: u.active_connections)
    if algorithm == "consistent_hash":
        ring_points = 150  # virtual nodes per upstream
        ring = {}
        for u in healthy:
            for i in range(ring_points):
                h = int(hashlib.md5(f"{u.id}:{i}".encode()).hexdigest(), 16)
                ring[h] = u
        key_hash = int(hashlib.md5(request_key.encode()).hexdigest(), 16)
        sorted_keys = sorted(ring.keys())
        # Walk clockwise: route to the first ring point at or after the key.
        for k in sorted_keys:
            if key_hash <= k:
                return ring[k]
        return ring[sorted_keys[0]]  # wrap around the ring
    return None  # unknown algorithm

def probe_health(upstream: Upstream) -> dict:
    url = f"http://{upstream.address}:{upstream.port}/health"
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            latency_ms = int((time.monotonic() - start) * 1000)
            return {"status": "SUCCESS" if resp.status == 200 else "FAILURE",
                    "latency_ms": latency_ms}
    except Exception:
        latency_ms = int((time.monotonic() - start) * 1000)
        return {"status": "FAILURE", "latency_ms": latency_ms}

def drain_upstream(upstream_id: int, upstreams: List[Upstream], drain_timeout_s: int = 30) -> None:
    for u in upstreams:
        if u.id == upstream_id:
            u.status = "DRAINING"
            print(f"Upstream {upstream_id} set to DRAINING; waiting up to {drain_timeout_s}s")
            deadline = time.monotonic() + drain_timeout_s
            while time.monotonic() < deadline:
                if u.active_connections == 0:
                    break
                time.sleep(1)
            u.status = "UNHEALTHY"
            print(f"Upstream {upstream_id} drained and marked UNHEALTHY")
            return
    print(f"Upstream {upstream_id} not found")
Frequently Asked Questions
How does consistent hashing minimize redistribution when upstreams change?
The ring assigns each upstream multiple virtual node positions. When one upstream is removed, only the keys that mapped to that upstream's virtual nodes are redistributed to the next node on the ring; all other keys remain unaffected. With 150 virtual nodes per upstream, removing one upstream redistributes approximately 1/N of keys rather than all keys.
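The ~1/N claim can be checked empirically with the same ring construction used in the implementation above (MD5, 150 virtual nodes); the key names below are arbitrary test data, and the measured fraction is approximate by nature.

```python
import bisect
import hashlib

def build_ring(upstream_ids, vnodes=150):
    # Sorted (hash, owner) ring, 150 virtual nodes per upstream.
    points = sorted(
        (int(hashlib.md5(f"{uid}:{i}".encode()).hexdigest(), 16), uid)
        for uid in upstream_ids
        for i in range(vnodes)
    )
    return [h for h, _ in points], [uid for _, uid in points]

def lookup(hashes, owners, key):
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    idx = bisect.bisect_left(hashes, h) % len(hashes)  # clockwise, with wrap
    return owners[idx]

keys = [f"user:{n}" for n in range(10_000)]
h4, o4 = build_ring([1, 2, 3, 4])
h3, o3 = build_ring([1, 2, 3])  # upstream 4 removed
moved = sum(lookup(h4, o4, k) != lookup(h3, o3, k) for k in keys)
print(f"{moved / len(keys):.1%} of keys remapped")  # roughly 1/N = 25% expected
```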
What is connection draining and why is it necessary during deployments?
Connection draining allows in-flight requests to complete on an upstream that has been removed from the selection pool. Without draining, a rolling deployment would terminate upstream processes mid-request, causing HTTP 502 or 504 errors for users whose requests were in progress. The balancer stops routing new requests to the draining upstream but waits for active connections to close naturally.
Why is cookie-based session affinity preferred over IP-hash?
IP-hash breaks under NAT (many users share one IP) and under mobile clients that change IP between requests. A load balancer-inserted cookie is tied to the individual browser session and survives network changes, giving deterministic routing per client without depending on IP stability.
What is the difference between SSL termination and SSL passthrough?
SSL termination decrypts TLS at the load balancer and forwards plaintext to upstreams, enabling L7 routing on HTTP headers and centralizing certificate management. SSL passthrough forwards the raw encrypted stream to the upstream, which handles its own TLS, providing end-to-end encryption but restricting routing to L4 (IP/port) with no header visibility.