Design a Content Delivery Network like Cloudflare, Akamai, or AWS CloudFront. A CDN is infrastructure that most large-scale systems depend on — understanding how it works internally is expected knowledge at staff-level system design interviews, and it directly impacts the architecture decisions you make for every service you build.
Requirements Clarification
- Content types: Static assets (images, JS, CSS, fonts), video (streaming), software downloads, API responses (dynamic caching).
- Scale: Cloudflare serves ~50 million HTTP requests/sec globally. Akamai has ~340,000 servers in 4,000+ PoPs.
- Goals: Reduce latency (serve from edge close to user), reduce origin load (cache hits don’t hit origin), improve availability (serve cached content even if origin is down).
- Consistency: Eventual — cached content may be stale by up to TTL seconds. This is acceptable for static assets; dynamic content requires shorter TTLs or cache invalidation.
Why CDNs Work: Physics
Speed of light in fiber: ~200,000 km/sec. New York to London (5,500km): minimum round-trip time ≈ 55ms, just from physics. With TCP handshake (1 RTT) + TLS (1–2 RTTs) + request + response: 150–300ms for the first byte.
A CDN PoP in Frankfurt serving European users: <5ms RTT. The CDN absorbs 90%+ of traffic at the edge so origin servers in US data centers only handle cache misses.
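These numbers fall straight out of the arithmetic. A quick sanity check, assuming the ~200,000 km/s figure above and an ideal straight-line fiber path (real routes are longer):

```python
# Minimum network latency from physics alone.
FIBER_SPEED_KM_PER_MS = 200  # ~200,000 km/s = 200 km/ms

def min_rtt_ms(distance_km: float) -> float:
    """One round trip: there and back, at the speed of light in fiber."""
    return 2 * distance_km / FIBER_SPEED_KM_PER_MS

def first_byte_estimate_ms(distance_km: float, setup_rtts: int = 3) -> float:
    """TCP handshake (1 RTT) + TLS (~2 RTTs) + request/response (1 RTT)."""
    return (setup_rtts + 1) * min_rtt_ms(distance_km)

print(min_rtt_ms(5500))              # NY <-> London: 55.0 ms
print(first_byte_estimate_ms(5500))  # ~220 ms before the first byte arrives
print(min_rtt_ms(500))               # nearby PoP (~500 km): 5.0 ms
```

The 150–300ms range quoted above corresponds to 3–5 round trips of connection setup before any payload moves; a nearby PoP cuts every one of those round trips.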
Architecture: Points of Presence (PoPs)
User (London)
↓ DNS resolves cdn.example.com → nearest PoP IP
London PoP (Edge Server)
↓ Cache miss
Regional Cache (Frankfurt)
↓ Cache miss
Origin Server (US-East)
Each PoP has dozens to hundreds of servers with large SSD/NVMe caches. The tiered hierarchy means a cache miss at the edge hits the regional cache first — reducing origin load further.
Request Routing: How Users Reach the Right PoP
DNS-based routing: The CDN’s authoritative DNS server returns different IP addresses based on the resolver’s location (or the client’s subnet via EDNS Client Subnet, since public resolvers may sit far from the user). A user in Tokyo gets a Tokyo PoP IP; a user in Berlin gets a Frankfurt PoP IP. Latency-based routing uses real-time RTT measurements between PoPs and resolver clusters to pick the closest PoP dynamically.
Anycast routing: The CDN advertises the same IP prefix from every PoP using BGP. The internet’s routing protocol naturally sends traffic to the closest (by BGP path) PoP. No DNS manipulation needed. All Cloudflare PoPs use the same 1.1.1.1 / 104.x.x.x IP ranges — BGP handles routing. Anycast also provides automatic failover: if a PoP goes down, BGP re-routes to the next closest.
Edge Server: Cache Miss Flow
- Check in-memory cache (LRU, ~100GB RAM) → hit: return immediately, <1ms
- Check SSD cache (~10TB) → hit: return in <5ms
- Forward request to regional cache → hit: return, cache locally
- Forward to origin → receive response, cache at all levels, return to user
Cache key: URL + relevant request headers (Accept-Encoding, Accept-Language for localized content). Vary header from origin controls which headers are part of the cache key.
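The lookup path above can be sketched in Python. Tier sizes, the helper names (fetch_regional, fetch_origin), and the dict-based tiers are assumptions for illustration, not a real CDN’s API:

```python
import hashlib

def cache_key(url: str, request_headers: dict, vary: list) -> str:
    """Cache key = URL + the request headers named by the origin's Vary header."""
    parts = [url] + [f"{h.lower()}={request_headers.get(h, '')}" for h in sorted(vary)]
    return hashlib.sha256("|".join(parts).encode()).hexdigest()

def serve(url, request_headers, vary, memory_cache, ssd_cache,
          fetch_regional, fetch_origin):
    key = cache_key(url, request_headers, vary)
    if key in memory_cache:                 # RAM tier: <1ms
        return memory_cache[key]
    if key in ssd_cache:                    # SSD tier: <5ms
        memory_cache[key] = ssd_cache[key]  # promote to RAM
        return ssd_cache[key]
    # Miss at the edge: regional cache first, then origin.
    body = fetch_regional(url) or fetch_origin(url)
    memory_cache[key] = ssd_cache[key] = body  # populate every tier on the way back
    return body
```

Note how the same URL with different Accept-Encoding values yields different keys, so gzip and brotli responses are cached separately.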
Pull CDN vs Push CDN
Pull CDN (most common): Edge nodes fetch content from origin on cache miss. You don’t do anything — just set cache headers on your origin responses. Cache warms up automatically as users request content. Good for long-tail content (many URLs, each requested rarely).
Push CDN: You proactively upload content to CDN storage. Edge serves from CDN storage, never hits origin. Better for large, predictable assets (software releases, game patches, video files). Content is always warm — no cold-start latency spike on first request. Requires explicit CDN upload workflow.
Most production systems use pull CDN for the web tier and push CDN (or origin shield) for large media files.
Cache Headers: Controlling CDN Behavior
# Cache for 1 year (immutable — content-addressed URL with hash)
Cache-Control: public, max-age=31536000, immutable
# Use for: JS/CSS bundles with content hash in filename (/app.3f9a2b.js)
# Cache for 5 minutes, allow stale for 1 hour while revalidating
Cache-Control: public, max-age=300, stale-while-revalidate=3600
# Use for: homepage HTML, product listings
# Don't cache (user-specific or real-time data)
Cache-Control: private, no-store
# Use for: account pages, payment flows, API endpoints with auth
# CDN-specific header (Fastly uses Surrogate-Control; Cloudflare uses CDN-Cache-Control)
Surrogate-Control: max-age=86400 # CDN cache TTL independent of browser TTL
Cache Invalidation
Two strategies:
Version URLs (best): Embed a content hash or version in the URL. When content changes, the URL changes. Old URL stays cached indefinitely; new URL starts fresh. Zero invalidation required. Example: /static/app.a3f9b2c1.js. Webpack, Vite, and Rails asset pipeline do this automatically.
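A sketch of the content-hash naming those build tools perform (the function name is made up; real tools also rewrite references to the asset in HTML and other bundles):

```python
import hashlib
from pathlib import Path

def hashed_name(path: str, content: bytes, digits: int = 8) -> str:
    """app.js + its bytes -> app.3f9a2b1c.js. New content => new URL."""
    digest = hashlib.sha256(content).hexdigest()[:digits]
    p = Path(path)
    return f"{p.stem}.{digest}{p.suffix}"

print(hashed_name("app.js", b"console.log('v1')"))
print(hashed_name("app.js", b"console.log('v2')"))  # different content, different URL
```

Because the old URL never changes meaning, it can carry max-age=31536000, immutable safely; deploys simply start serving HTML that points at the new hashed URL.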
Explicit purge: Call the CDN API to invalidate specific URLs or cache tags. Necessary for content without versioned URLs (HTML pages, non-versioned assets). Cloudflare’s Cache-Tag purge lets you tag all cache entries for a product page and purge them atomically when the product updates. Most CDN purge APIs propagate globally within seconds.
# Cloudflare Cache-Tag purge (tag-based invalidation)
curl -X POST "https://api.cloudflare.com/client/v4/zones/{zone_id}/purge_cache" \
  -H "Authorization: Bearer {token}" \
  -H "Content-Type: application/json" \
  -d '{"tags":["product-123","category-shoes"]}'
Video Streaming via CDN
Video files (gigabytes) are delivered using HTTP range requests: the client requests byte ranges as it buffers. The CDN serves each range from its cached copy; on a miss, many CDNs fetch only the requested range (or fixed-size slices) from origin rather than the whole file.
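A simplified sketch of how an edge server satisfies a single-range request from a cached object (standard Range/Content-Range semantics; suffix ranges, multi-range requests, and validation are omitted):

```python
def serve_range(cached: bytes, range_header: str):
    """Serve one byte range, e.g. range_header = 'bytes=0-1023'."""
    unit, _, spec = range_header.partition("=")
    assert unit == "bytes"
    start_s, _, end_s = spec.partition("-")
    start = int(start_s)
    end = int(end_s) if end_s else len(cached) - 1  # open-ended: to end of file
    body = cached[start:end + 1]
    headers = {
        "Content-Range": f"bytes {start}-{end}/{len(cached)}",
        "Content-Length": str(len(body)),
    }
    return 206, headers, body  # 206 Partial Content

status, headers, chunk = serve_range(b"0123456789", "bytes=2-5")
print(status, headers["Content-Range"], chunk)  # 206 bytes 2-5/10 b'2345'
```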
HLS (HTTP Live Streaming) splits video into 6–10 second segments (.ts files) referenced by a manifest (.m3u8). The client fetches the manifest, then fetches segments on demand. Each segment is a small, independently cacheable file — CDN caches them like any other asset. Adaptive bitrate (ABR) works by having multiple quality-level manifests; the client switches between them based on available bandwidth.
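To make the HLS flow concrete, here is the shape of a master playlist with two ABR variants (paths, bandwidths, and durations are illustrative):

```
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
360p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/playlist.m3u8
```

Each variant playlist then lists the segments, every one of which is an ordinary cacheable HTTP object:

```
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXTINF:6.0,
segment00001.ts
#EXTINF:6.0,
segment00002.ts
```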
Security at the Edge
TLS termination: HTTPS connections terminate at the CDN edge. The CDN re-encrypts to origin (or uses an internal private network). This offloads TLS handshake computation from origin servers and enables the CDN to inspect and filter HTTP content.
DDoS protection: CDN edge absorbs volumetric attacks (hundreds of Gbps) by rate-limiting, IP reputation filtering, and challenge pages (CAPTCHAs, proof-of-work). Anycast naturally spreads the attack: a 100Gbps DDoS distributed over 4,000 PoPs averages out to roughly 25Mbps per PoP, easily handled.
WAF (Web Application Firewall): CDN edge inspects HTTP requests for OWASP Top 10 patterns (SQL injection, XSS). Cloudflare, Akamai, and AWS WAF run at edge, filtering malicious requests before they reach origin.
CDN for APIs (Dynamic Caching)
CDNs aren’t just for static content. API responses can be cached at the edge with short TTLs:
- Product catalog: 60-second TTL. Prices update infrequently; serve slightly stale data.
- Search results: 30-second TTL per query string.
- User-specific data: Cache-Control: private (bypasses CDN, goes to origin).
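A minimal sketch of such a per-route policy, assuming hypothetical route names and the TTLs listed above:

```python
# Hypothetical per-route edge caching policy (routes and TTLs are illustrative).
CACHE_POLICY = {
    "/api/products": "public, max-age=60",   # catalog: tolerate 60s staleness
    "/api/search":   "public, max-age=30",   # cached per full URL incl. query string
    "/api/account":  "private, no-store",    # user-specific: bypass the CDN
}

def cache_control_for(path: str) -> str:
    for prefix, policy in CACHE_POLICY.items():
        if path.startswith(prefix):
            return policy
    return "private, no-store"  # default to uncached: safe but slow

print(cache_control_for("/api/products?page=2"))  # public, max-age=60
```

The origin sets these headers; the CDN simply obeys them, so the caching policy stays in application code rather than CDN configuration.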
Edge computing (Cloudflare Workers, Lambda@Edge): run code at CDN edge nodes. Personalize responses (inject user-specific data into cached HTML), A/B test at edge, authenticate JWT tokens without origin round-trip.
Interview Follow-ups
- How does the CDN handle cache stampede — a highly popular cache entry expiring and thousands of simultaneous cache misses flooding origin?
- How would you design a CDN for a live streaming event with 10 million concurrent viewers?
- How does CDN handle HTTPS with custom domains? What’s involved in provisioning TLS certificates at 340,000 edge servers?
- How do you build a CDN that can serve content within 10ms for 99% of global users?
- How would you implement request coalescing — collapsing many simultaneous cache misses for the same URL into a single origin request?
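The last follow-up (request coalescing, which also mitigates cache stampede) can be sketched as a single-flight guard. This is an assumed in-process threading model with a made-up Coalescer class; real CDNs coalesce per edge server or per PoP, and a real cache would expire entries rather than keep them forever:

```python
import threading

class Coalescer:
    """Many concurrent misses for one key -> exactly one origin fetch."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> Event set when the leader's fetch completes
        self._cache = {}     # no TTL/eviction in this sketch

    def get(self, key, fetch_origin):
        with self._lock:
            if key in self._cache:
                return self._cache[key]
            event = self._inflight.get(key)
            leader = event is None
            if leader:                       # first requester becomes the leader
                event = threading.Event()
                self._inflight[key] = event
        if leader:
            value = fetch_origin(key)        # exactly one origin request
            with self._lock:
                self._cache[key] = value
                del self._inflight[key]
            event.set()                      # wake the waiting followers
            return value
        event.wait()                         # follower: block until leader is done
        with self._lock:
            return self._cache[key]
```

On a stampede, only the first request pays the origin round-trip; every concurrent follower blocks briefly and reuses the same bytes, so origin sees one request instead of thousands.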
Related System Design Topics
- Caching Strategies — CDN edge caches are the outermost tier; the same TTL, invalidation, and stale-while-revalidate patterns apply
- Load Balancing — anycast BGP routing is a form of network-layer load balancing; CDN PoPs also use L7 load balancers internally
- Design YouTube / Video Streaming Platform — CDN is the delivery layer for video segments; HLS/DASH over CDN is the standard architecture
- Design Google Maps / Navigation System — map tile serving at scale is a CDN problem: immutable tiles with long TTLs, served from edge PoPs