A load balancer sits in front of a pool of servers and distributes incoming requests across them. It is one of the first components you add when a single server can no longer handle your traffic, and it appears in virtually every system design interview. The question is rarely “do you need a load balancer?” — it’s “which algorithm, which layer, and how do you handle failures?”
Strategy
Before describing a load balancer, make sure you’re solving the right problem:
- Throughput: One server is at 100% CPU — distribute requests across more servers.
- Availability: If one server crashes, traffic must automatically route to healthy ones.
- Geographic distribution: Route users to the nearest data center (DNS-based load balancing, Anycast).
L4 vs. L7 Load Balancers
This is the most important distinction to get right.
L4 (Transport Layer)
Operates at the TCP/UDP level. Routes traffic based on IP address and port number. It doesn’t inspect the content of packets — it just forwards TCP connections.
- Fast and low-overhead — minimal processing per packet.
- Can’t make routing decisions based on content (URL path, headers, cookies).
- One TCP connection from client goes to one backend for its lifetime (connection-level routing).
Use when: Raw TCP throughput, non-HTTP protocols (databases, game servers, streaming), or when you need minimal latency overhead.
L7 (Application Layer)
Operates at the HTTP/HTTPS level. Inspects request content — URL, headers, cookies, body. Can make intelligent routing decisions.
- Route `/api/*` to API servers and `/static/*` to a CDN or file servers.
- Route based on the `Host` header (virtual hosting).
- Terminate TLS — decrypt HTTPS once at the load balancer, forward plain HTTP to backends.
- Set sticky sessions via cookies.
- Higher overhead than L4 (must parse HTTP).
Use when: Web applications, APIs, microservices, anything where you need to route by URL or header. nginx, HAProxy (in HTTP mode), AWS ALB, and Cloudflare are L7 load balancers.
```nginx
# nginx L7 routing example
upstream api_servers {
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
}

upstream static_servers {
    server 10.0.0.3:80;
}

server {
    location /api/ {
        proxy_pass http://api_servers;
    }
    location /static/ {
        proxy_pass http://static_servers;
    }
}
```
Load Balancing Algorithms
Round Robin
Requests are distributed sequentially across servers. Request 1 → Server A, Request 2 → Server B, Request 3 → Server C, Request 4 → Server A, and so on.
Pros: Simple. Works well when all servers have equal capacity and requests have similar cost.
Cons: Doesn’t account for server load. If one server is processing a slow request, it still receives the next one in rotation. Weighted round-robin fixes this: assign more capacity to more powerful servers.
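As a minimal sketch (server names and weights are placeholders), weighted round robin can be implemented by expanding each server into the rotation proportionally to its weight:

```python
from itertools import cycle

class RoundRobin:
    """Cycle through servers; weights expand a server's share of the rotation."""
    def __init__(self, servers, weights=None):
        weights = weights or {s: 1 for s in servers}
        # Weighted round robin: repeat each server in proportion to its weight.
        expanded = [s for s in servers for _ in range(weights[s])]
        self._iter = cycle(expanded)

    def next_server(self):
        return next(self._iter)

lb = RoundRobin(["a", "b"], weights={"a": 2, "b": 1})
# Rotation: a, a, b, a, a, b, ...
```

Real implementations (e.g., nginx's smooth weighted round robin) interleave servers more evenly, but the proportional share is the same idea.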
Least Connections
Route each new request to the server with the fewest active connections.
Pros: Adapts to varying request durations. If one server is processing many long-running requests, it gets fewer new ones.
Cons: Requires the load balancer to track active connections — more state, slightly more overhead.
When to use: Workloads with variable request duration (long-polling, streaming, WebSockets).
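The bookkeeping is simple: count active connections per backend and pick the minimum. A sketch (server names are placeholders):

```python
class LeastConnections:
    """Pick the backend with the fewest active connections (illustrative sketch)."""
    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def acquire(self):
        # min() breaks ties by iteration order of the dict
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1

lb = LeastConnections(["a", "b"])
s1 = lb.acquire()   # "a"
s2 = lb.acquire()   # "b"
lb.release(s1)      # "a" finishes its long-running request
s3 = lb.acquire()   # "a" again — it now has fewer active connections
```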
IP Hash (Sticky by IP)
Hash the client’s IP address to consistently route them to the same server.
Pros: Natural sticky sessions — the same user always hits the same server (useful if in-memory session state lives on the server).
Cons: Uneven distribution if many users share an IP (corporate NAT, CDN). Doesn’t adapt when a server is overloaded. Poor choice for applications behind a proxy.
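The core of IP hash is a stable hash of the client address modulo the pool size. A sketch (IPs are placeholders; `hashlib` rather than Python's `hash()` keeps the mapping stable across processes):

```python
import hashlib

def pick_server(client_ip, servers):
    """Deterministically map a client IP to one backend."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
# The same client always maps to the same backend:
pick_server("203.0.113.7", servers) == pick_server("203.0.113.7", servers)  # True
```

Note the modulo: if the pool size changes, most clients remap to different servers. Consistent hashing (covered later) avoids that mass reshuffle.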
Least Response Time
Route to the server with the lowest combination of average response time and active connections. The most adaptive option; offered by more advanced load balancers (HAProxy, F5).
Random
Pick a random server. Surprisingly effective at scale — the law of large numbers produces even distribution. Used by Netflix’s Ribbon client-side load balancer for inter-service calls.
Health Checks
A load balancer must know when a backend is unhealthy and stop sending traffic to it. Two types:
Passive health checks: The load balancer watches real traffic. If a backend returns 5xx errors or times out repeatedly, it’s marked unhealthy. Low overhead but slow to detect failures.
Active health checks: The load balancer periodically sends synthetic requests to a health endpoint.
```nginx
# nginx active health check
upstream api_servers {
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
    # requires nginx Plus (commercial) or the upstream_check module
    check interval=3000 rise=2 fall=3 timeout=1000 type=http;
    check_http_send "GET /health HTTP/1.0\r\n\r\n";
    check_http_expect_alive http_2xx;
}
```
Health endpoint design: GET /health should check that the service can actually handle requests — DB connection alive, dependencies reachable — not just that the HTTP server is up.
```python
# Good health endpoint (Flask; db and cache are assumed app-level clients)
from flask import Flask

app = Flask(__name__)

@app.route("/health")
def health():
    try:
        db.execute("SELECT 1")   # verify DB connection
        cache.ping()             # verify cache connection
        return {"status": "ok"}, 200
    except Exception as e:
        return {"status": "error", "detail": str(e)}, 503
```
Sticky Sessions
Some applications store session state on the server (in-memory). The load balancer must route a user’s requests to the same server every time.
Cookie-based stickiness: The load balancer sets a cookie identifying the backend server. On subsequent requests, it reads the cookie and routes accordingly. This is the L7 approach — AWS ALB calls this “sticky sessions.”
Problems with sticky sessions:
- Uneven load — one server may accumulate many heavy users while others are idle.
- Server failure breaks all sticky sessions on that server.
Better approach: Don’t store session state on the server. Use a shared session store (Redis) so any backend can handle any request. Then sticky sessions are unnecessary.
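A sketch of the shared-store pattern — a plain dict stands in for Redis here so the example is self-contained; in production you would use `redis.Redis` with the same get/set calls:

```python
import json
import uuid

store = {}  # stand-in for a shared Redis instance reachable by all backends

def create_session(user_id):
    """Any backend can create a session; the ID travels in a cookie."""
    session_id = str(uuid.uuid4())
    store[session_id] = json.dumps({"user_id": user_id})
    return session_id

def load_session(session_id):
    """Any backend can load it — no stickiness required."""
    raw = store.get(session_id)
    return json.loads(raw) if raw else None

sid = create_session(user_id=42)
load_session(sid)  # {"user_id": 42}, regardless of which server handles the request
```

With Redis you would also set a TTL on each session key so abandoned sessions expire.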
The Load Balancer as a Single Point of Failure
Interviewers will ask: “What if the load balancer itself goes down?” The answer: run two load balancers in active-passive or active-active mode.
- Active-passive: One load balancer handles traffic; the other is on standby. keepalived (implementing the VRRP protocol) detects the failure and promotes the standby; the virtual IP floats to the new active node.
- Active-active: Both load balancers handle traffic. DNS round-robin points to both IPs. More complex but no idle capacity.
- Cloud managed: AWS ELB, GCP Cloud Load Balancing, and Cloudflare are managed services with redundancy built in — no SPOF to worry about.
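For the active-passive setup, the failover is typically wired up with keepalived. A minimal configuration sketch — interface name, router ID, priority, and virtual IP are all placeholders:

```
# keepalived VRRP sketch (values are placeholders)
vrrp_instance VI_1 {
    state MASTER            # the passive node uses state BACKUP
    interface eth0
    virtual_router_id 51
    priority 100            # the backup node gets a lower priority
    advert_int 1
    virtual_ipaddress {
        10.0.0.100          # the floating virtual IP clients connect to
    }
}
```

Clients connect only to the virtual IP; when the master stops sending VRRP advertisements, the backup claims the IP and traffic continues.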
DNS Load Balancing
Return multiple A records for a domain. The client picks one (usually the first). Simple but crude — DNS TTLs are long, so failover is slow and you can’t control which record clients use.
Used for geographic routing (Route 53 latency-based routing, Cloudflare Anycast) and as an outer load balancing layer that routes to regional clusters, each of which has its own L7 load balancer.
Summary
Load balancers distribute traffic for throughput and availability. L4 balancers route by IP/port with minimal overhead; L7 balancers route by HTTP content and terminate TLS. Round-robin works for homogeneous workloads; least-connections adapts to variable request duration. Health checks detect failed backends quickly — design a real health endpoint, not just a ping. Eliminate sticky sessions by moving session state to Redis. Avoid making the load balancer itself a single point of failure. In a cloud environment, use managed load balancers (ELB, ALB, GCP LB) and get redundancy for free.
Related System Design Topics
Load balancing sits at the front of most distributed architectures:
- Consistent Hashing — some load balancers use consistent hashing on a key (user ID, session ID) to route requests to the same backend, achieving sticky sessions without cookies.
- Caching Strategies — reverse proxy caches (nginx, Varnish) often sit at the same layer as load balancers and serve cached responses before requests ever reach application servers.
- Message Queues — when load spikes overwhelm application servers, queuing requests is an alternative to throwing more servers behind a load balancer.
- Database Sharding — a shard router performs the same conceptual role as a load balancer, but at the database tier instead of the application tier.
Also see: API Design (REST vs GraphQL vs gRPC) and SQL vs NoSQL — the remaining two system design foundations.
See also: Design a Ride-sharing App — sticky WebSocket routing for real-time driver tracking, and Design a Notification System — load-balancing push/email/SMS workers.
See also: Design an LLM Inference API — prefix-aware load balancing routes requests with shared system prompts to the same GPU pod for KV cache reuse.