Load balancers distribute incoming traffic across multiple servers to ensure no single server becomes a bottleneck. They are a fundamental component of every scalable web architecture. This guide covers load balancer internals — Layer 4 vs Layer 7, routing algorithms, health checking, SSL termination, and production deployment patterns — essential knowledge for system design interviews and infrastructure engineering.
Layer 4 vs Layer 7 Load Balancing
Layer 4 (transport layer) load balancers route traffic based on IP address and TCP/UDP port without inspecting the application payload. They forward raw TCP connections to backend servers, seeing only the source IP, destination IP, source port, and destination port — nothing about HTTP headers, URLs, or cookies. Advantages: extremely fast (no payload parsing) and protocol-agnostic (works with any TCP/UDP application). Disadvantages: cannot make routing decisions based on content (no URL-based routing, no cookie-based session affinity). Examples: AWS NLB, Linux IPVS, F5 BIG-IP in L4 mode.

Layer 7 (application layer) load balancers inspect the HTTP request and make routing decisions based on URL path, Host header, HTTP method, headers, cookies, and request body. Advantages: content-based routing (/api -> api-servers, /static -> CDN), cookie-based session affinity, HTTP/2 multiplexing, request/response modification, SSL termination, and Web Application Firewall (WAF) integration. Disadvantages: higher latency (must parse HTTP) and higher resource usage (TLS termination is CPU-intensive). Examples: AWS ALB, Nginx, HAProxy, Envoy.
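As a concrete illustration of L7 content-based routing, here is a minimal Nginx sketch (the upstream name, addresses, and paths are hypothetical) that sends /api traffic to an application pool and serves /static locally:

```nginx
# L7 routing sketch: upstream name and addresses are illustrative.
upstream api_servers {
    server 10.0.1.10:8080;
    server 10.0.1.11:8080;
}

server {
    listen 80;

    location /api/ {
        proxy_pass http://api_servers;   # routing decision based on URL path
    }

    location /static/ {
        root /var/www;                   # serve directly, or proxy to a CDN origin
    }
}
```

A pure L4 balancer could not express these location rules at all: it sees only the TCP 4-tuple, so every connection to port 80 would go to the same pool.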
Load Balancing Algorithms
Common algorithms:

(1) Round Robin — distribute requests sequentially: server 1, server 2, server 3, server 1, … Simple and effective when all servers have equal capacity and all requests have similar cost.
(2) Weighted Round Robin — assign weights proportional to server capacity. A server with weight 3 receives 3x the requests of a server with weight 1. Use when servers have different hardware specs.
(3) Least Connections — route to the server with the fewest active connections. Better than round robin when request durations vary widely (a slow request keeps a connection open longer).
(4) Least Response Time — route to the server with the lowest average response time and fewest active connections. Adapts to server performance in real time.
(5) IP Hash — hash the client IP to determine the server. The same client always reaches the same server (sticky sessions without cookies). Downside: uneven distribution if client IPs are not uniformly distributed (NAT, corporate proxies).
(6) Consistent Hashing — similar to IP Hash but minimizes redistribution when servers are added or removed. Used by CDNs and distributed caches.

For most web applications, Least Connections is the best default — it naturally adapts to varying request costs and server speeds.
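Several of these selection strategies can be sketched in a few lines of Python (server names, connection counts, and the virtual-node count are illustrative):

```python
import bisect
import hashlib
from itertools import cycle

servers = ["s1", "s2", "s3"]

# (1) Round robin: cycle through servers in order.
rr = cycle(servers)
order = [next(rr) for _ in range(4)]  # s1, s2, s3, s1

# (2) Weighted round robin: a weight-3 server appears 3x in the rotation.
weighted = [s for s, w in [("big", 3), ("small", 1)] for _ in range(w)]
wrr = cycle(weighted)  # big, big, big, small, big, ...

# (3) Least connections: pick the server with the fewest active connections.
active = {"s1": 12, "s2": 4, "s3": 9}
least_conn = min(active, key=active.get)  # "s2"

# (5) IP hash: the same client IP always maps to the same server.
def ip_hash(client_ip: str, pool: list) -> str:
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return pool[int(digest, 16) % len(pool)]

# (6) Consistent hashing: place servers (with virtual nodes) on a ring;
# a key routes to the first server clockwise from its hash, so removing
# one server only remaps the keys that pointed at it.
class HashRing:
    def __init__(self, nodes, vnodes=100):
        self.ring = sorted(
            (self._hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes)
        )

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get(self, key: str) -> str:
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]
```

Note the simple modulo in ip_hash: if the pool size changes, almost every client remaps to a different server, which is exactly the problem the hash ring avoids.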
Health Checks and Failover
Health checks determine whether a backend server is capable of receiving traffic. Without health checks, the load balancer sends requests to dead servers, causing errors. Types:

(1) TCP health check — attempt a TCP connection to the server port. If the connection succeeds, the server is healthy. Fast, but only verifies the port is open, not that the application is functioning.
(2) HTTP health check — send an HTTP GET request to a health endpoint (e.g., /health). The server returns 200 if healthy. The health endpoint should verify dependencies: database connectivity, cache connectivity, sufficient disk space. If any dependency is unhealthy, return 503.
(3) gRPC health check — use the standard gRPC health checking protocol (grpc.health.v1.Health).

Health check parameters: interval (how often to check — typically every 5-10 seconds), timeout (how long to wait for a response — typically 2-3 seconds), unhealthy threshold (consecutive failures before marking a server unhealthy — typically 3), and healthy threshold (consecutive successes before marking it healthy again — typically 2). The healthy threshold prevents flapping: a server that passes one check after failing should not immediately receive full traffic.
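The threshold logic can be sketched as a small state machine (the class name is hypothetical; the 3-failure/2-success defaults mirror the typical values above):

```python
class HealthTracker:
    """Tracks consecutive check results and flips state only at thresholds,
    so one flaky check in either direction does not change routing."""

    def __init__(self, unhealthy_threshold: int = 3, healthy_threshold: int = 2):
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold
        self.healthy = True   # servers start in the healthy pool
        self._fails = 0       # consecutive failed checks
        self._successes = 0   # consecutive passed checks

    def record(self, check_passed: bool) -> bool:
        """Record one health check result; return current health state."""
        if check_passed:
            self._fails = 0
            self._successes += 1
            if not self.healthy and self._successes >= self.healthy_threshold:
                self.healthy = True
        else:
            self._successes = 0
            self._fails += 1
            if self.healthy and self._fails >= self.unhealthy_threshold:
                self.healthy = False
        return self.healthy
```

Resetting the opposite counter on every result is what makes the thresholds mean *consecutive* failures or successes, not cumulative ones.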
SSL/TLS Termination
SSL termination decrypts HTTPS traffic at the load balancer and forwards plain HTTP to backend servers. Benefits:

(1) Certificate management — manage TLS certificates in one place (the load balancer) instead of on every backend server.
(2) CPU offload — TLS handshakes and encryption/decryption are CPU-intensive. The load balancer handles this, freeing backend servers to process application logic. Modern load balancers use hardware TLS acceleration.
(3) HTTP inspection — after decryption, the load balancer can inspect HTTP content for routing decisions.
(4) HTTP/2 to HTTP/1.1 translation — terminate HTTP/2 at the load balancer and proxy HTTP/1.1 to backends that do not support HTTP/2.

Security consideration: traffic between the load balancer and backend servers is unencrypted (plain HTTP). In a trusted network (same VPC, same datacenter), this is often acceptable. For zero-trust environments, use SSL re-encryption: the load balancer decrypts, inspects, and re-encrypts before forwarding. Alternatively, use SSL passthrough: the load balancer forwards the encrypted traffic without decryption, but loses the ability to inspect HTTP content. AWS ALB supports SSL termination by default, with certificates managed through AWS Certificate Manager (ACM) at no additional cost.
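A minimal Nginx termination sketch, assuming hypothetical certificate paths and backend addresses (the commented line shows the re-encryption variant):

```nginx
# TLS termination sketch: cert paths and backend addresses are illustrative.
server {
    listen 443 ssl;
    server_name example.com;

    ssl_certificate     /etc/nginx/tls/example.com.crt;
    ssl_certificate_key /etc/nginx/tls/example.com.key;

    location / {
        # Decrypted here; forwarded as plain HTTP inside the trusted network.
        proxy_pass http://10.0.2.10:8080;
        # Re-encryption variant for zero-trust networks:
        # proxy_pass https://10.0.2.10:8443;
    }
}
```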
Nginx and HAProxy in Production
Nginx: originally a web server, now one of the most widely deployed reverse proxies and load balancers. Its event-driven architecture handles thousands of concurrent connections with minimal memory. Configuration: define upstream blocks with backend servers and proxy_pass in location blocks. Supports round robin, least connections (least_conn), IP hash (ip_hash), and consistent hashing. Dynamic upstream management requires the commercial Nginx Plus (which adds active health checks, monitoring, and API-driven configuration); open-source Nginx relies on passive health checks (mark a server as failed after N errors).

HAProxy: purpose-built for high-performance load balancing. Supports both L4 and L7 modes. Advanced health checking (HTTP, TCP, custom scripts), connection queuing, rate limiting, and request buffering. The HAProxy stats page provides real-time metrics (requests per second, error rates, connection counts per backend). Configuration is declarative: define frontends (incoming traffic), backends (server pools), and ACLs (routing rules). HAProxy is a common default choice for high-throughput, low-latency load balancing.

In Kubernetes environments, Ingress controllers (nginx-ingress, HAProxy Ingress) provide the same load balancing functionality, configured via Kubernetes Ingress or Gateway API resources.
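A minimal HAProxy sketch of the frontend/backend/ACL structure described above (names, addresses, and the /api ACL are illustrative; the health check parameters mirror the typical values from the health check section):

```haproxy
# HAProxy sketch: names, addresses, and the ACL are illustrative.
frontend web_in
    bind *:80
    acl is_api path_beg /api        # L7 routing rule
    use_backend api_pool if is_api
    default_backend web_pool

backend api_pool
    balance leastconn               # adapts to varying request durations
    option httpchk GET /health
    server api1 10.0.1.10:8080 check inter 5s fall 3 rise 2
    server api2 10.0.1.11:8080 check inter 5s fall 3 rise 2

backend web_pool
    balance roundrobin
    server web1 10.0.1.20:8080 check
```

Here `inter 5s fall 3 rise 2` encodes the health check interval, unhealthy threshold, and healthy threshold directly in the server line.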
Global Server Load Balancing (GSLB)
GSLB distributes traffic across multiple geographic regions, typically via DNS-based routing: when a user queries your domain, the DNS server returns the IP address of the nearest (or healthiest) datacenter. Methods:

(1) GeoDNS — return different IP addresses based on the client's geographic location (derived from the DNS resolver IP). A user in Europe resolves to the EU datacenter; a user in Asia resolves to the APAC datacenter. Examples: AWS Route 53 geolocation routing, Cloudflare load balancing.
(2) Latency-based routing — return the IP of the datacenter with the lowest latency to the client. AWS Route 53 measures latency from each region and routes accordingly.
(3) Weighted routing — distribute a percentage of traffic to each datacenter. Use for gradual migration or A/B testing across regions.
(4) Failover routing — return the primary datacenter IP; if the primary's health check fails, return the secondary datacenter IP. When the primary recovers, traffic gradually shifts back (subject to DNS TTL propagation).

DNS TTL tradeoff: a short TTL (30 seconds) enables fast failover but increases DNS query volume; a long TTL (5 minutes) reduces DNS load but delays failover. A 60-second TTL is a common compromise for production systems.
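The failover and weighted policies reduce to simple selection logic over DNS record sets. A Python sketch (region names and IPs are hypothetical, drawn from documentation-reserved address ranges):

```python
import random

# Hypothetical record sets for two regions; IPs are from reserved
# documentation ranges (RFC 5737), not real endpoints.
REGIONS = {
    "primary":   {"ip": "198.51.100.10", "healthy": True},
    "secondary": {"ip": "203.0.113.20",  "healthy": True},
}

def failover_answer(regions: dict) -> str:
    """Failover routing: answer with the primary IP while its health
    check passes, otherwise fall back to the secondary IP."""
    if regions["primary"]["healthy"]:
        return regions["primary"]["ip"]
    return regions["secondary"]["ip"]

def weighted_answer(weights: dict, rng=random) -> str:
    """Weighted routing: pick a region in proportion to its weight,
    e.g. {"eu": 90, "us": 10} sends ~10% of resolutions to "us"."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names])[0]
```

In a real deployment this decision runs inside the authoritative DNS service (e.g., Route 53), and each answer is cached by resolvers for the record's TTL, which is why failover speed is bounded by the TTL.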