System Design Interview: Design a Load Balancer


Load balancers are one of the most fundamental infrastructure components you’ll design in system design interviews. They distribute incoming traffic across multiple servers to maximize throughput, minimize latency, and eliminate single points of failure.

What a Load Balancer Does

  • Traffic distribution — spreads requests across healthy backend servers
  • Health checking — detects and routes around failed servers
  • SSL termination — decrypts HTTPS at the LB, passes HTTP to backends
  • Session persistence — sticky sessions route same client to same server
  • DDoS protection — absorbs traffic spikes, rate-limits abusive IPs

Layer 4 vs Layer 7 Load Balancing

| Dimension           | Layer 4 (Transport)        | Layer 7 (Application)            |
|---------------------|----------------------------|----------------------------------|
| Operates on         | TCP/UDP packets            | HTTP/HTTPS requests              |
| Content awareness   | No — sees IP/port only     | Yes — sees URL, headers, cookies |
| Performance         | Faster (no parsing)        | Slower (full HTTP parsing)       |
| Routing flexibility | Low                        | High (path-based, header-based)  |
| Example tools       | AWS NLB, HAProxy (TCP mode)| AWS ALB, NGINX, Envoy            |

Load Balancing Algorithms

Round Robin

Requests rotate sequentially across servers. Simple, works well when servers are homogeneous. Weighted round robin assigns more requests to higher-capacity servers.
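Weighted round robin can be sketched by expanding each server into the rotation in proportion to its weight. This is a minimal illustration (server names and weights are made up); production implementations such as NGINX use a smoother interleaving that avoids sending bursts to one server.

```python
from itertools import cycle

class WeightedRoundRobin:
    """Minimal weighted round robin: each server appears in the
    rotation in proportion to its integer weight."""
    def __init__(self, weights):
        # weights: {"server-name": weight}; expand and rotate forever.
        pool = [s for server, w in weights.items() for s in [server] * w]
        self._cycle = cycle(pool)

    def next_server(self):
        return next(self._cycle)

# s1 has 3x the capacity of s2, so it receives 3 of every 4 requests.
wrr = WeightedRoundRobin({"s1": 3, "s2": 1})
picks = [wrr.next_server() for _ in range(4)]
```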

Least Connections

Routes to the server with fewest active connections. Better than round robin for long-lived connections (WebSockets, file uploads). Least Response Time adds latency measurement.
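The selection rule is simple to state in code: pick the minimum of a per-server connection counter. The sketch below is illustrative; a real load balancer updates these counters on connection open/close events.

```python
class LeastConnections:
    """Route each new request to the backend with the fewest
    active connections (illustrative sketch)."""
    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def acquire(self):
        # min() breaks ties by iteration order of the dict.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Called when the connection closes.
        self.active[server] -= 1

lb = LeastConnections(["a", "b"])
first = lb.acquire()   # one backend gets the first connection
second = lb.acquire()  # the other now has fewer, so it gets the next
```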

Consistent Hashing

Maps requests to servers via a hash ring. The same client (IP or session ID) always routes to the same server unless that server fails. When servers are added or removed, only ~1/n of keys are remapped, which minimizes cache invalidation. Used by distributed caches (Memcached clusters) and partitioned datastores (DynamoDB).

import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, servers, virtual_nodes=150):
        # Each physical server gets many virtual nodes on the ring
        # so load stays roughly uniform even with few servers.
        self.ring = {}
        self.sorted_keys = []
        for server in servers:
            for i in range(virtual_nodes):
                key = self._hash(f"{server}-{i}")
                self.ring[key] = server
                self.sorted_keys.append(key)
        self.sorted_keys.sort()

    def _hash(self, s):
        # MD5 is fine here: we need uniformity, not cryptographic strength.
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def get_server(self, request_key):
        # Binary search for the first ring position >= the request's hash
        # (the nearest server clockwise), wrapping past the end of the ring.
        h = self._hash(request_key)
        idx = bisect.bisect_left(self.sorted_keys, h)
        if idx == len(self.sorted_keys):
            idx = 0  # wrap around to the start of the ring
        return self.ring[self.sorted_keys[idx]]

Health Checks

Load balancers detect unhealthy backends via:

  • TCP health check — can the server accept a connection?
  • HTTP health check — does GET /health return 200?
  • Application health check — does /health verify DB connectivity, cache, dependencies?

Typical config: check every 5s, mark unhealthy after 2 failures, restore after 3 successes. Circuit breaker pattern adds retry budgets and exponential backoff.
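The threshold logic above can be sketched as a small state tracker. The default thresholds are the example values from the text (2 consecutive failures to eject, 3 consecutive successes to restore); real load balancers run this per backend on a timer.

```python
class HealthTracker:
    """Per-backend health state with consecutive-failure ejection
    and consecutive-success restoration."""
    def __init__(self, unhealthy_after=2, healthy_after=3):
        self.unhealthy_after = unhealthy_after
        self.healthy_after = healthy_after
        self.healthy = True
        self._fails = 0
        self._oks = 0

    def record(self, check_passed):
        if check_passed:
            self._fails = 0
            self._oks += 1
            if not self.healthy and self._oks >= self.healthy_after:
                self.healthy = True   # restored after enough successes
        else:
            self._oks = 0
            self._fails += 1
            if self.healthy and self._fails >= self.unhealthy_after:
                self.healthy = False  # ejected from the pool
        return self.healthy

t = HealthTracker()
t.record(False)        # 1 failure: still healthy
t.record(False)        # 2 failures: marked unhealthy
for _ in range(3):
    t.record(True)     # 3 consecutive successes: restored
```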

High Availability Architecture

Internet
    │
┌───▼───────────────────────────────────────┐
│  DNS (Route 53 / Cloudflare)              │
│  GeoDNS → nearest PoP                     │
└───┬───────────────────────────────────────┘
    │
┌───▼───────────────────────────────────────┐
│  Edge / CDN layer (static, caching)       │
└───┬───────────────────────────────────────┘
    │
┌───▼──────────────┐   ┌────────────────────┐
│  LB Primary      │──▶│  LB Standby        │
│  (active)        │   │  (heartbeat/VRRP)  │
└───┬──────────────┘   └────────────────────┘
    │
    ├──▶ App Server 1
    ├──▶ App Server 2
    └──▶ App Server N

Active-passive: the standby takes over the virtual IP when the primary fails (seconds with VRRP heartbeat; 30-60s if failover relies on DNS). Active-active: both LBs handle traffic simultaneously (no failover gap, but requires session sync).
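The active-passive handoff reduces to a heartbeat timeout check. This is a deliberately simplified sketch (the timeout value is arbitrary); real deployments use keepalived/VRRP at the network layer, not application code.

```python
import time

class StandbyMonitor:
    """Standby LB promotes itself to active if the primary's
    heartbeat is older than `timeout` seconds."""
    def __init__(self, timeout=3.0):
        self.timeout = timeout
        self.last_heartbeat = time.monotonic()
        self.is_active = False

    def on_heartbeat(self):
        # Called whenever a heartbeat arrives from the primary.
        self.last_heartbeat = time.monotonic()

    def check(self, now=None):
        # `now` parameter allows deterministic testing.
        now = time.monotonic() if now is None else now
        if now - self.last_heartbeat > self.timeout:
            self.is_active = True  # take over the virtual IP
        return self.is_active

m = StandbyMonitor(timeout=3.0)
m.on_heartbeat()  # primary is alive; standby stays passive
```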

Global Load Balancing

  • GeoDNS — routes clients to nearest data center based on IP geolocation
  • Anycast — same IP advertised from multiple PoPs; BGP routes to nearest (used by Cloudflare, AWS Global Accelerator)
  • Latency-based routing — measure actual latency to each region, route to lowest
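Latency-based routing boils down to comparing recent probe measurements per region and picking the minimum. A minimal sketch (region names and sample values are illustrative):

```python
import statistics

def pick_region(latency_samples):
    """Choose the region with the lowest median probe latency.
    Median resists outliers better than a single measurement."""
    medians = {region: statistics.median(samples)
               for region, samples in latency_samples.items()}
    return min(medians, key=medians.get)

best = pick_region({
    "us-east-1": [42, 45, 44],      # recent probe RTTs in ms
    "eu-west-1": [118, 120, 115],
    "ap-south-1": [210, 205, 208],
})
```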

Interview Design Questions

  • “Design a load balancer that handles 1M RPS” — focus on horizontal scaling, consistent hashing, health checks
  • “How do you handle sticky sessions without a centralized session store?” — consistent hashing by session ID
  • “How does AWS ALB differ from NLB?” — Layer 7 vs Layer 4, use cases
  • “What happens when a backend goes down mid-request?” — connection draining (graceful shutdown), circuit breaker
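Connection draining, mentioned in the last question, can be sketched as a two-phase shutdown: refuse new requests, then wait (up to a deadline) for in-flight ones to finish. The class and parameter names here are hypothetical.

```python
import threading
import time

class DrainableBackend:
    """Backend being removed from the pool: stops accepting new
    requests but lets in-flight requests complete."""
    def __init__(self):
        self.draining = False
        self.in_flight = 0
        self._lock = threading.Lock()

    def try_accept(self):
        with self._lock:
            if self.draining:
                return False        # LB should route elsewhere
            self.in_flight += 1
            return True

    def finish(self):
        with self._lock:
            self.in_flight -= 1    # a request completed

    def drain(self, deadline=30.0, poll=0.01):
        self.draining = True
        start = time.monotonic()
        while self.in_flight > 0 and time.monotonic() - start < deadline:
            time.sleep(poll)        # wait for in-flight requests
        return self.in_flight == 0  # True = clean shutdown

b = DrainableBackend()
b.try_accept()     # request arrives before draining starts
b.finish()         # ...and completes
ok = b.drain(deadline=0.1)   # nothing in flight: drains immediately
```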

Key Metrics to Monitor

  • Requests per second (RPS) per backend
  • Active connections per backend
  • P50/P95/P99 latency
  • Error rate (5xx responses)
  • Health check failure rate
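Percentile latency (P50/P95/P99) can be computed from a window of samples with the nearest-rank convention, sketched below. Production monitoring usually relies on streaming sketches (HDRHistogram, t-digest) rather than sorting raw samples.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the value at 1-based rank
    ceil(p/100 * n) in the sorted samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

latencies_ms = [12, 15, 14, 100, 13, 16, 14, 15, 13, 250]
p50 = percentile(latencies_ms, 50)  # typical request
p99 = percentile(latencies_ms, 99)  # tail latency dominated by outliers
```

Note how a handful of slow outliers barely move P50 but dominate P99, which is why tail percentiles are monitored separately.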
