A Content Delivery Network (CDN) caches content at edge servers distributed globally — close to users — to reduce latency and origin server load. When a user in Tokyo requests a file hosted on a US server, the CDN serves it from a Tokyo edge node in ~5ms instead of ~150ms from the US. Cloudflare, Akamai, and AWS CloudFront are major CDNs. Understanding how a CDN works internally — routing, caching, cache invalidation, and origin shielding — is essential for system design interviews.
Anycast Routing
CDNs use anycast IP routing: the same IP address is announced from multiple geographic locations simultaneously. The internet’s BGP routing protocol naturally routes each user’s request to the topologically nearest announcement of that IP — usually the nearest CDN edge data center. No DNS-based routing is needed — the routing happens at the network layer. The user’s browser sends a TCP SYN to the CDN’s IP; BGP routes it to the nearest edge. Advantages over DNS-based routing: (1) No DNS TTL delay — failover happens at network speed, not DNS propagation speed. (2) No separate routing step: the first packet of the TCP handshake already travels to the nearest edge, with no resolver lookup adding latency. (3) DDoS mitigation — large volumetric DDoS attacks are absorbed across all edge locations rather than concentrating at one. Cloudflare’s anycast network covers 300+ cities — most users are within 50ms of an edge node. DNS-based routing (used by Route 53 geolocation routing) assigns different DNS responses by user location, directing users to the closest CDN cluster. It is less efficient than anycast but works without BGP announcements.
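Anycast itself happens in BGP, not in application code, but the DNS-based alternative can be sketched: geolocate the client, then answer with the address of the closest edge cluster. A minimal sketch, assuming a hypothetical table of PoP coordinates (the city list and coordinates are illustrative, not real CDN data):

```python
import math

# Hypothetical edge PoP coordinates (lat, lon) -- illustrative only.
EDGES = {
    "tokyo": (35.68, 139.69),
    "frankfurt": (50.11, 8.68),
    "virginia": (38.95, -77.45),
}

def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points in km."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))

def nearest_edge(client_loc):
    """Pick the edge PoP closest to the client's geolocated position,
    as a geolocation-routing DNS resolver would."""
    return min(EDGES, key=lambda name: haversine_km(client_loc, EDGES[name]))

# A client geolocated to Osaka resolves to the Tokyo PoP.
print(nearest_edge((34.69, 135.50)))  # → tokyo
```

Geographic distance is only a proxy for network distance; real deployments also weigh measured latency and PoP load, which is one reason anycast (which follows actual network topology) tends to do better.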
Edge Caching and Cache Keys
When a request reaches an edge node: (1) Compute the cache key: typically hash(URL + Vary headers). The Vary HTTP response header specifies which request headers affect the response (Vary: Accept-Encoding means the compressed and uncompressed versions are cached separately). (2) Check the local cache (in-memory LRU + disk). Cache hit: return the cached response immediately. Cache miss: fetch from origin (or parent cache), store locally, return to client. Cache TTL is determined by Cache-Control: max-age=3600 (cache for 1 hour) or Cache-Control: s-maxage=86400 (cache at the CDN for 24 hours; browsers fall back to max-age). Because s-maxage applies only to shared caches, CDNs such as Cloudflare use it to set the CDN TTL independently of the browser TTL. Tiered caching (two-level cache): edge nodes (PoPs in each city) → regional parent caches (one per continent) → origin. On a cache miss at the edge, check the regional parent before hitting the origin. This reduces origin load by 90%+ for assets with moderate traffic.
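The edge lookup path above can be sketched in a few lines. This is a minimal illustration, not any CDN's actual implementation: the `EdgeCache` class, its in-memory dict store, and the `fetch_origin` callback are all assumptions made for the sketch.

```python
import hashlib
import re
import time

class EdgeCache:
    """Sketch of an edge node's lookup path: cache key from URL plus the
    request headers named in Vary, TTL parsed from Cache-Control, and a
    fall-through to the origin (or parent cache) on a miss."""

    def __init__(self, fetch_origin):
        self.store = {}              # cache_key -> (expires_at, body)
        self.fetch_origin = fetch_origin

    @staticmethod
    def cache_key(url, req_headers, vary):
        # Only the headers named in Vary participate in the key, so
        # gzip and brotli variants of one URL are cached separately.
        parts = [url] + [f"{h}={req_headers.get(h, '')}" for h in sorted(vary)]
        return hashlib.sha256("|".join(parts).encode()).hexdigest()

    @staticmethod
    def ttl_from(resp_headers):
        # s-maxage (shared-cache TTL) takes precedence over max-age.
        cc = resp_headers.get("Cache-Control", "")
        for directive in ("s-maxage", "max-age"):
            m = re.search(rf"{directive}=(\d+)", cc)
            if m:
                return int(m.group(1))
        return 0

    def get(self, url, req_headers, vary=("Accept-Encoding",)):
        key = self.cache_key(url, req_headers, vary)
        hit = self.store.get(key)
        if hit and hit[0] > time.time():
            return hit[1], "HIT"
        body, resp_headers = self.fetch_origin(url, req_headers)
        ttl = self.ttl_from(resp_headers)
        if ttl > 0:
            self.store[key] = (time.time() + ttl, body)
        return body, "MISS"
```

In a tiered setup, `fetch_origin` at the edge would point at the regional parent cache, which runs the same logic with the real origin behind it.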
Cache Invalidation
Cached content must be invalidated when origin content changes. Strategies: (1) URL versioning (cache-busting): change the URL when content changes (style.css?v=abc123 or style.abc123.css). The CDN treats the new URL as a brand-new resource — immediate cache miss on the new URL. Simple and reliable; no purge API calls needed. Use content-hashed URLs for static assets. (2) Purge API: CDNs provide a purge endpoint to immediately evict specific URLs or URL patterns from all edge caches. Cloudflare Cache Purge: POST /client/v4/zones/{zone_id}/purge_cache with {"files": ["https://example.com/image.jpg"]} or {"tags": ["product-images"]}. Propagates globally within 150ms on Cloudflare. Use it after content updates when URL versioning isn't feasible (HTML pages, API responses). (3) Surrogate keys (cache tags): tag cached responses with logical keys (Cache-Tag: product-123). Purge by tag: evict all responses tagged product-123 across all edge nodes simultaneously. Varnish and Fastly use this for granular invalidation without knowing all cached URLs. (4) Short TTL: for content that changes frequently, use short TTLs (60-300 seconds). Accept minor staleness rather than complex invalidation logic.
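The surrogate-key mechanism in (3) comes down to a reverse index from tag to cached URLs, maintained at write time. A minimal sketch of that bookkeeping (the `TaggedCache` class and its dict-based store are assumptions for illustration, not Fastly's or Varnish's actual data structures):

```python
from collections import defaultdict

class TaggedCache:
    """Sketch of surrogate-key (cache-tag) invalidation: each cached
    response records its tags in a reverse index, so a purge-by-tag can
    evict every matching URL without enumerating URLs up front."""

    def __init__(self):
        self.responses = {}                  # url -> cached body
        self.tag_index = defaultdict(set)    # tag -> {url, ...}

    def put(self, url, body, tags=()):
        # Tags come from a response header, e.g. Cache-Tag: product-123.
        self.responses[url] = body
        for tag in tags:
            self.tag_index[tag].add(url)

    def purge_tag(self, tag):
        # One purge call evicts every response carrying the tag.
        for url in self.tag_index.pop(tag, set()):
            self.responses.pop(url, None)

cache = TaggedCache()
cache.put("/products/123", "<html>v1</html>", tags=["product-123"])
cache.put("/products/123/reviews", "<html>r1</html>", tags=["product-123"])
cache.purge_tag("product-123")   # both entries evicted in one call
```

This is why tags suit pages assembled from many resources: when product 123 changes, the origin does not need to know which of its detail, listing, and review pages are currently cached at which edges.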
Origin Shielding
Without origin shielding, a cache miss at any of the CDN's 300+ edge nodes results in a request to the origin. For a popular file with a 1-hour TTL: on cache expiry, hundreds of edge nodes simultaneously send cache-fill requests to the origin (thundering herd). Origin shielding: designate a single "shield" data center (typically geographically close to the origin) as the only location that talks to the origin. All other edge nodes go to the shield on a cache miss — the shield checks its own cache, and only the shield fetches from the origin. This reduces origin requests by 99%+ for globally popular assets. Request coalescing (also called request collapsing): if 1000 concurrent requests arrive at the shield for the same uncached URL, only one goes to the origin — the other 999 wait for the first to complete and then receive the cached response. This prevents the thundering herd even at the shield level. Implementation: hold the waiting requests in a queue keyed by the cache key; on origin response, fulfill all queued requests and populate the cache.
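The coalescing logic described above can be sketched with a per-key in-flight marker: the first miss becomes the leader and fetches from the origin; concurrent misses for the same key wait on the leader instead of issuing their own fetch. A thread-based sketch under assumed names (`CoalescingShield`, `fetch_origin` are illustrative):

```python
import threading

class CoalescingShield:
    """Sketch of request coalescing at a shield cache: concurrent misses
    for one cache key share a single origin fetch."""

    def __init__(self, fetch_origin):
        self.fetch_origin = fetch_origin
        self.cache = {}          # cache_key -> value
        self.inflight = {}       # cache_key -> Event the leader will set
        self.lock = threading.Lock()

    def get(self, key):
        while True:
            with self.lock:
                if key in self.cache:
                    return self.cache[key]          # served from shield cache
                event = self.inflight.get(key)
                if event is None:
                    # No fetch in flight: become the leader for this key.
                    event = self.inflight[key] = threading.Event()
                    leader = True
                else:
                    leader = False
            if leader:
                try:
                    value = self.fetch_origin(key)  # the ONE origin request
                    with self.lock:
                        self.cache[key] = value
                finally:
                    with self.lock:
                        del self.inflight[key]
                    event.set()                     # wake all waiters
                return value
            event.wait()   # follower: wait for the leader, then re-check
```

If the leader's origin fetch fails, the waiters loop back, find neither a cached value nor an in-flight marker, and one of them retries as the new leader.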
Dynamic Content and Edge Computing
CDNs traditionally cache static content; dynamic responses (personalized pages, API responses) can’t be cached by URL alone. Approaches: (1) Micro-caching: cache dynamic responses for 1-5 seconds. Even a 1-second cache dramatically reduces origin load for high-traffic APIs (1000 requests/second → 1 origin request/second per edge). Acceptable for near-real-time data that’s not user-specific. (2) Edge computing (Cloudflare Workers, Lambda@Edge): run JavaScript or WebAssembly code at the edge node. The code can: fetch user data from a KV store at the edge (Cloudflare KV), generate personalized responses without hitting the origin, perform A/B testing and feature flag evaluation at the edge, handle authentication (validate JWT) before proxying to origin. Edge computing moves compute closer to users — personalized responses with <10ms latency. (3) Stale-while-revalidate: serve the stale cached response immediately while asynchronously refreshing it from the origin in the background. The next request gets the fresh content. Reduces user-perceived latency at the cost of occasionally serving slightly stale content.
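Stale-while-revalidate in (3) hinges on two windows: a freshness window and a revalidation window during which stale content may still be served while a background refresh runs. A minimal sketch, assuming illustrative names (`SwrCache`, `fetch_origin`, `max_age`, `swr` are not any CDN's API):

```python
import threading
import time

class SwrCache:
    """Sketch of stale-while-revalidate: a fresh entry is served directly;
    within the SWR window a stale entry is served immediately while a
    background thread refreshes it from the origin."""

    def __init__(self, fetch_origin, max_age=1.0, swr=30.0):
        self.fetch_origin = fetch_origin
        self.max_age, self.swr = max_age, swr   # seconds
        self.entries = {}                       # url -> (stored_at, body)
        self.lock = threading.Lock()

    def _refresh(self, url):
        body = self.fetch_origin(url)
        with self.lock:
            self.entries[url] = (time.time(), body)

    def get(self, url):
        with self.lock:
            entry = self.entries.get(url)
        now = time.time()
        if entry:
            age = now - entry[0]
            if age <= self.max_age:
                return entry[1]                 # fresh hit
            if age <= self.max_age + self.swr:
                # Stale but inside the SWR window: serve stale now,
                # refresh asynchronously so the NEXT request is fresh.
                threading.Thread(target=self._refresh, args=(url,)).start()
                return entry[1]
        self._refresh(url)                      # miss, or stale past the window
        with self.lock:
            return self.entries[url][1]
```

The same windows map onto the standard header directives, e.g. Cache-Control: max-age=1, stale-while-revalidate=30; micro-caching from (1) is simply this pattern with a very small max_age.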