Low Level Design: Load Balancer Internals

A load balancer distributes incoming traffic across multiple backend instances to maximize throughput, minimize latency, and avoid overloading any single server. Understanding load balancer internals — from the OSI layer at which they operate to the algorithms they use and the mechanisms that keep backends healthy — is a core topic in system design interviews.

L4 vs L7 Load Balancing

Load balancers are classified by the OSI layer at which they make routing decisions:

  • L4 (Transport Layer): Routes based on IP address and TCP/UDP port. The load balancer sees the TCP connection but not the HTTP content. It’s fast (low overhead, no TLS termination required), handles any TCP/UDP protocol, and is suitable for raw throughput. AWS Network Load Balancer (NLB) operates at L4. Because it never parses the payload, it cannot route based on URL path, HTTP headers, or cookie values.
  • L7 (Application Layer): Routes based on HTTP headers, URL paths, cookies, query parameters, or request body content. Can terminate TLS (offloading crypto from backends), rewrite URLs, add/remove headers, and route /api/* to one pool and /static/* to another. AWS Application Load Balancer (ALB), Nginx, and HAProxy in HTTP mode operate at L7. Higher CPU overhead than L4 due to full HTTP parsing.
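The core of the L4/L7 distinction can be shown as a minimal L7 routing decision — something an L4 balancer cannot do, since it never sees the path. A sketch (pool names are illustrative, not from any real product):

```python
# Minimal sketch of L7 content-based routing. An L4 load balancer only
# sees IPs and ports, so a decision like this requires parsing HTTP.
def route_l7(path: str) -> str:
    """Pick a backend pool from the URL path (pool names are hypothetical)."""
    if path.startswith("/api/"):
        return "api-pool"
    if path.startswith("/static/"):
        return "static-pool"
    return "default-pool"

print(route_l7("/api/users"))     # api-pool
print(route_l7("/static/a.css"))  # static-pool
```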

Load Balancing Algorithms

The algorithm determines which backend instance receives each request:

  • Round Robin: Requests distributed sequentially across backends. Simple, stateless, works well when all backends have equal capacity and request cost is uniform.
  • Weighted Round Robin: Each backend has a weight proportional to its capacity. A backend with weight 3 receives 3x the traffic of one with weight 1. Handles heterogeneous backends (different instance types).
  • Least Connections: Routes new requests to the backend with the fewest active connections. Better than round robin when request duration varies significantly (e.g., mix of fast and slow queries).
  • Least Response Time: Routes to the backend with the lowest combination of active connections and average response time. More adaptive than least connections; requires the load balancer to track response latency per backend.
  • IP Hash: Hashes the client IP to select a backend. Provides session affinity (same client always hits same backend) without cookies. Brittle — reshuffles all mappings when backends are added or removed.
  • Consistent Hash: Uses a hash ring so adding or removing a backend only remaps a fraction of clients. Critical for stateful backends (caches, session stores) where you want to minimize cache misses on topology changes. Commonly used with memcached/Redis clusters.
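The consistent-hash property — removing a backend only remaps the keys that were on it — can be demonstrated with a small hash ring. This is an illustrative sketch (md5 and the virtual-node count are arbitrary choices, not a production implementation):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Hash ring with virtual nodes. Removing a backend only remaps the
    keys that were on that backend; everything else stays put."""

    def __init__(self, backends, vnodes=100):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (hash, backend) points on the ring
        for b in backends:
            self.add(b)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, backend: str):
        # Each backend gets many virtual nodes for even distribution.
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{backend}#{i}"), backend))

    def remove(self, backend: str):
        self.ring = [(h, b) for h, b in self.ring if b != backend]

    def get(self, key: str) -> str:
        # Walk clockwise to the first virtual node at or after the key's hash.
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]
```

Contrast with IP hash (`hash(ip) % n`): changing `n` reshuffles nearly every client, while the ring leaves all keys not owned by the removed backend untouched.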

Health Checks

The load balancer must know which backends are healthy before routing to them. Two approaches:

  • Active health checks: The load balancer periodically sends probe requests to each backend — TCP connect (just checks the port is open) or HTTP GET to a /health endpoint (checks application-level health). If N consecutive probes fail, the backend is marked unhealthy and removed from rotation. When probes succeed again, it’s re-added.
  • Passive health checks (outlier detection): The load balancer monitors real traffic responses. If a backend returns 5xx errors or times out above a threshold, it’s ejected from the pool. Lower overhead (no extra probe traffic) but slower to detect failure (requires real requests to fail first).

In production, both are typically combined: active checks for fast detection of total failure, passive checks for graceful degradation under partial failure.
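The active-check state machine described above — mark unhealthy after N consecutive failures, re-add after M consecutive successes — can be sketched as follows (the thresholds and TCP probe are illustrative):

```python
import socket

def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Active TCP health check: succeeds if the port accepts a connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

class HealthTracker:
    """Tracks probe outcomes for one backend. Consecutive-count thresholds
    prevent flapping on a single lost probe (3 and 2 are illustrative)."""

    def __init__(self, fail_threshold=3, rise_threshold=2):
        self.fail_threshold = fail_threshold
        self.rise_threshold = rise_threshold
        self.fails = 0
        self.successes = 0
        self.healthy = True

    def record(self, ok: bool):
        if ok:
            self.successes += 1
            self.fails = 0
            if not self.healthy and self.successes >= self.rise_threshold:
                self.healthy = True   # re-added to rotation
        else:
            self.fails += 1
            self.successes = 0
            if self.healthy and self.fails >= self.fail_threshold:
                self.healthy = False  # removed from rotation
```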

Session Persistence (Sticky Sessions)

Some applications store session state locally on the backend server. Session persistence ensures a client always hits the same backend for the duration of its session:

  • Cookie-based stickiness: The load balancer sets a cookie (e.g., AWSALB) on the first response identifying which backend handled the request. Subsequent requests from the same client include the cookie, and the load balancer routes accordingly. Works at L7 only.
  • IP hash stickiness: Hash the client IP to select a backend deterministically. Works at L4. Breaks under NAT (many clients share one IP) or CGNAT.

Sticky sessions are an antipattern for scalability — they cause uneven load distribution and complicate rolling deployments. The preferred solution is to externalize session state to a shared store (Redis, DynamoDB) so any backend can handle any request.
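Cookie-based stickiness reduces to a simple check on each request: honor the cookie if the named backend is still in the pool, otherwise pick fresh and set the cookie. A sketch (the cookie name `lb_backend` is made up for illustration; ALB's real cookie is AWSALB):

```python
def route_sticky(cookies: dict, backends: list, pick) -> tuple:
    """Return (chosen_backend, extra_response_headers).

    'pick' is any fallback algorithm (round robin, least connections, ...).
    If the stickiness cookie names a backend no longer in the pool (e.g.
    it was drained during a deploy), we fall through and re-pin.
    """
    stuck = cookies.get("lb_backend")
    if stuck in backends:
        return stuck, {}  # existing session: keep routing to the same backend
    chosen = pick(backends)
    return chosen, {"Set-Cookie": f"lb_backend={chosen}"}
```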

Connection Pooling

Opening a TCP connection (and TLS handshake) for every request is expensive. L7 load balancers maintain persistent connection pools to upstream backends, reusing established connections across multiple client requests. This decouples frontend connections (many, short-lived) from backend connections (fewer, long-lived).

HAProxy bounds backend concurrency with a per-server maxconn and queues excess requests until a connection slot frees up (subject to maxqueue and timeout queue). Nginx’s upstream keepalive directive maintains a pool of idle connections to upstreams for reuse.
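The Nginx side of this can be sketched as a minimal upstream block (server addresses are placeholders; keepalive sets the number of idle upstream connections cached per worker process):

```nginx
upstream app_pool {
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
    keepalive 32;                      # idle connections cached per worker
}

server {
    location / {
        proxy_pass http://app_pool;
        proxy_http_version 1.1;        # upstream keepalive needs HTTP/1.1
        proxy_set_header Connection "";  # clear "Connection: close" default
    }
}
```

Without the last two directives, Nginx speaks HTTP/1.0 with `Connection: close` to upstreams and the keepalive pool is never used.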

Anycast and Geographic Load Balancing

Anycast assigns the same IP address to multiple data centers globally. BGP routing directs each client to the topologically nearest data center. Used by CDNs (Cloudflare, Fastly) and DNS providers for global load balancing with no application-layer overhead. Failover is handled at the routing layer when a data center withdraws its BGP announcement.

ECMP (Equal-Cost Multipath)

ECMP is a routing technique where multiple paths of equal cost to a destination exist simultaneously, and traffic is distributed across them. Used inside data centers to load balance traffic across multiple spine switches or to multiple next-hop routers. The hash function for path selection typically uses a 5-tuple (source IP, destination IP, source port, destination port, protocol) to keep flows on a single path (avoiding TCP reordering).

HAProxy vs Nginx vs AWS ALB/NLB

  • HAProxy: Purpose-built load balancer and proxy. Extremely high performance (event-driven architecture; historically single-process, with native multithreading since version 1.8). Rich ACL-based routing, detailed statistics, fine-grained connection and timeout control. The reference implementation for software load balancers. Config is declarative (haproxy.cfg).
  • Nginx: Web server that also functions as a reverse proxy and load balancer. More versatile (serves static files, handles SSL, rewrites URLs), slightly less tuned for pure load balancing than HAProxy. Widely used due to familiarity. nginx.conf upstream blocks configure backend pools.
  • AWS ALB: Managed L7 load balancer. Native integration with EC2, ECS, Lambda targets. Supports path-based and host-based routing, WebSocket, HTTP/2, WAF integration. Auto-scales transparently. No infrastructure to manage.
  • AWS NLB: Managed L4 load balancer. Handles millions of requests per second with ultra-low latency. Supports static IPs (important for firewall whitelisting), TLS termination (optional), and preserves source IP to backends.

DSR (Direct Server Return)

In standard NAT-based load balancing, both inbound and outbound traffic flows through the load balancer, which can become a bottleneck for high-throughput responses (e.g., video streaming, large file downloads).

With Direct Server Return, inbound requests go through the load balancer (which rewrites the destination MAC address, not the IP), but outbound responses go directly from the backend to the client, bypassing the load balancer entirely. This requires backends to have the load balancer’s VIP configured as a loopback address (so they accept the packets) and requires L2 adjacency (all backends on the same subnet). DSR is used by high-throughput L4 load balancers (LVS/IPVS, some hardware load balancers) but is operationally complex.

Interview Checklist

  • Explain L4 vs L7 load balancing: what each layer can and cannot do, and when to choose each.
  • Walk through at least 4 load balancing algorithms: round robin, weighted round robin, least connections, consistent hash.
  • Explain why consistent hash is preferred for stateful backends and how it differs from IP hash.
  • Describe active vs passive health checks and why both are used together.
  • Explain sticky sessions, the two mechanisms, and why externalizing session state is the better approach.
  • Know what connection pooling does and why it matters at L7.
  • Explain DSR at a high level — why it exists and what the tradeoff is.
  • Know when to pick HAProxy vs Nginx vs AWS ALB vs NLB.