Content Delivery Network (CDN) System Design

What a CDN Does

A Content Delivery Network serves content from geographically close edge nodes, dramatically reducing latency. A request from New York to a London origin server might take 200ms round-trip. The same request served from a local edge node in New York takes 20ms. Beyond latency, CDNs offload the origin server – a popular asset cached at edge nodes serves millions of requests without touching the origin.

Edge Node Architecture

CDNs deploy Points of Presence (PoPs) in major cities worldwide. Each PoP contains dozens of servers with large SSD caches, typically 10-100TB of storage per location. PoPs connect to local Internet Service Providers via peering agreements, allowing traffic to stay on fast local networks rather than traversing the public internet backbone. Cloudflare operates 310+ PoPs; Akamai operates 4,000+.

Cache Hierarchy

CDNs use a three-tier lookup hierarchy (two cache layers in front of the origin):

  • L1 – Edge cache: The PoP closest to the user. First stop for every request.
  • L2 – Regional cache / mid-tier: A larger cache serving a region of multiple L1 PoPs. When an L1 misses, it checks L2 before going to origin.
  • L3 – Origin server: The actual application server. Only hit when both L1 and L2 miss.

The L2 mid-tier is critical at scale. Without it, every L1 miss hits the origin directly. With hundreds of PoPs, a cache miss for a moderately popular asset could generate thousands of concurrent origin requests (a “thundering herd”). The L2 absorbs most of these misses, reducing origin connections from thousands to tens.
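The L1 → L2 → origin lookup can be sketched as a toy in-memory model in Python (the `CacheTier` class, TTL values, and tier names below are illustrative, not any vendor's API):

```python
import time

class CacheTier:
    """One cache tier, e.g. an L1 edge PoP or an L2 regional cache."""
    def __init__(self, name, ttl_seconds):
        self.name = name
        self.ttl = ttl_seconds
        self.store = {}  # cache_key -> (value, expires_at)

    def get(self, key):
        entry = self.store.get(key)
        if entry and entry[1] > time.time():
            return entry[0]   # fresh hit
        return None           # miss or expired

    def put(self, key, value):
        self.store[key] = (value, time.time() + self.ttl)

def fetch(key, l1, l2, origin):
    """Walk the hierarchy; each miss populates the tier on the way back."""
    value = l1.get(key)
    if value is not None:
        return value, "L1 hit"
    value = l2.get(key)
    if value is not None:
        l1.put(key, value)
        return value, "L2 hit"
    value = origin(key)       # the only path that touches origin
    l2.put(key, value)
    l1.put(key, value)
    return value, "origin"
```

Note how a second edge PoP sharing the same L2 never reaches the origin: its first miss is absorbed by the regional tier, which is exactly how the mid-tier suppresses the thundering herd.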

Cache Key Design

The cache key determines whether a request hits an existing cached entry. A naive cache key is just the URL. In practice:

  • URL normalization: example.com/image.jpg and example.com/image.jpg? should map to the same key. Normalize before hashing.
  • Vary header: If the origin returns Vary: Accept-Encoding, cache separate copies for gzip, Brotli, and uncompressed responses. The request header values named in Vary become part of the cache key.
  • Query parameters: Include relevant query params, exclude tracking params like utm_source. Most CDNs let you configure which params affect the cache key.
  • Host header: Always included – different domains on the same CDN should have separate caches.
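A minimal Python sketch of cache key construction under these rules (the tracking-parameter list and the key format are illustrative assumptions; real CDNs make both configurable):

```python
from urllib.parse import urlsplit, parse_qsl, urlencode

# Params that never affect the response and should not fragment the cache.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid"}

def cache_key(host, url, request_headers, vary=("accept-encoding",)):
    """Build a normalized cache key from host, path, filtered query
    params, and the request header values named by Vary."""
    parts = urlsplit(url)
    # Drop tracking params; sort the rest so param order can't fragment the cache.
    params = sorted((k, v) for k, v in parse_qsl(parts.query)
                    if k.lower() not in TRACKING_PARAMS)
    query = urlencode(params)
    # Fold in the request header values the origin said responses vary on.
    vary_values = "|".join(
        f"{h}={request_headers.get(h, '').lower()}" for h in vary)
    return f"{host.lower()}|{parts.path}|{query}|{vary_values}"
```

With this key, `/image.jpg?utm_source=news` and `/image.jpg` collapse to one cache entry, while a `gzip` and a `br` Accept-Encoding produce two.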

Cache Invalidation

Three mechanisms to control cache freshness:

  • TTL-based expiration: Origin sets Cache-Control: max-age=3600. CDN caches the response for 1 hour, then re-fetches. Simple but stale content can persist until TTL expires.
  • Purge API: CDN exposes an API to immediately invalidate specific URLs or URL patterns. Use this when you publish updated content and cannot wait for TTL. Purge propagates across all PoPs, typically within seconds.
  • Surrogate keys / cache tags: Tag cached responses with logical group names (e.g., Surrogate-Key: product-123 category-shoes). Purge all entries with a given tag in one API call. Useful for CMS deployments where one content update affects hundreds of URLs.
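Surrogate-key purging amounts to keeping a reverse index from tag to URLs. A toy in-memory sketch (class and method names are invented for illustration):

```python
from collections import defaultdict

class TaggedCache:
    """Cache with surrogate-key (cache-tag) support: entries may carry
    tags, and purging a tag evicts every URL tagged with it."""
    def __init__(self):
        self.entries = {}                    # url -> response body
        self.urls_by_tag = defaultdict(set)  # tag -> set of urls

    def put(self, url, body, tags=()):
        self.entries[url] = body
        for tag in tags:
            self.urls_by_tag[tag].add(url)

    def get(self, url):
        return self.entries.get(url)

    def purge_tag(self, tag):
        """One call invalidates every URL in the tag's group."""
        for url in self.urls_by_tag.pop(tag, set()):
            self.entries.pop(url, None)
```

A CMS updating product 123 purges `product-123` once, instead of enumerating every page that rendered that product.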

Content Routing

Getting users to the nearest PoP requires intelligent routing:

  • Anycast DNS: The CDN’s authoritative nameservers are themselves reachable over anycast, so a DNS query lands on a nearby nameserver instance, which returns the IP of a nearby PoP. A user’s resolver (usually geographically close to the user) therefore gets a different answer than a resolver in another country.
  • BGP anycast: Multiple PoPs advertise the same IP prefix via BGP. Internet routers automatically send traffic to the topologically nearest PoP. Used by Cloudflare for their anycast network.
  • GeoDNS: DNS responses vary by the geographic region of the requesting resolver. More explicit than BGP anycast – specific IP ranges map to specific PoPs.
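GeoDNS can be sketched as a region-to-PoP lookup consulted per query. The region names and IPs below are placeholders; a real implementation maps the resolver's source IP to a region through a geo-IP database rather than taking the region as an argument:

```python
# Illustrative region -> PoP A-record table (203.0.113.0/24 is a
# documentation-only IP range).
POP_BY_REGION = {
    "us-east": "203.0.113.10",   # New York PoP
    "eu-west": "203.0.113.20",   # London PoP
    "ap-east": "203.0.113.30",   # Singapore PoP
}
DEFAULT_POP = "203.0.113.10"     # fallback when the region is unknown

def resolve(hostname, resolver_region):
    """Return the A record for `hostname` based on where the DNS
    query came from."""
    return POP_BY_REGION.get(resolver_region, DEFAULT_POP)
```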

Dynamic Content

Dynamic content (personalized pages, API responses) is typically non-cacheable. CDNs still add value for dynamic content:

  • TCP connection optimization: Edge nodes maintain persistent connections (connection pools) to the origin. User-to-edge TCP handshake is fast (nearby); edge-to-origin connection is already established.
  • TLS termination at edge: TLS handshake happens between user and edge node (low latency). The edge-to-origin connection may use a pre-established TLS tunnel. Eliminates multiple round-trips for TLS negotiation.
  • Protocol optimization: CDNs use HTTP/2 or HTTP/3 between user and edge even if origin only supports HTTP/1.1.
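Back-of-the-envelope arithmetic shows why edge termination helps even with a 0% hit rate. The model below assumes TLS 1.3 (a one-round-trip handshake), charges one RTT for the TCP handshake, and uses illustrative RTT values:

```python
def request_latency_ms(user_rtt, origin_rtt, edge_terminated, pooled):
    """Approximate latency for one HTTPS request for non-cacheable
    content. Assumes TLS 1.3 (1-RTT handshake) and 1 RTT for TCP."""
    if not edge_terminated:
        # Direct to origin: TCP (1 RTT) + TLS (1 RTT) + request/response (1 RTT)
        return 3 * origin_rtt
    setup = 2 * user_rtt   # TCP + TLS against the nearby edge
    # Warm pooled connection: only the request/response RTT to origin;
    # otherwise the edge pays the full handshake toward origin too.
    backhaul = origin_rtt if pooled else 3 * origin_rtt
    return setup + backhaul

# New York user (5 ms RTT to edge) hitting a London origin (80 ms RTT):
direct = request_latency_ms(5, 80, edge_terminated=False, pooled=False)  # 240 ms
via_edge = request_latency_ms(5, 80, edge_terminated=True, pooled=True)  # 90 ms
```

Even for a fully dynamic response, terminating TCP and TLS at the edge and reusing a warm origin connection cuts the modeled latency from 240 ms to 90 ms.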

Origin Shield

An origin shield is a designated mid-tier node that acts as the single point of contact with the origin server. Configuration:

  • All edge cache misses route to the origin shield rather than directly to origin.
  • Origin shield has its own large cache. Many requests that miss at the edge hit the shield cache.
  • Only shield-cache misses reach the actual origin server.

Result: Instead of 300 PoPs each making independent requests to origin on a cache miss, only the single origin shield node contacts origin. Origin connections drop from hundreds to single digits for most traffic patterns. Trade-off: adds one extra network hop for cache misses routed through the shield.
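The shield's origin-protection behavior is essentially request coalescing ("single flight"): concurrent misses for the same key share one origin fetch. A minimal Python sketch, with invented names and no TTL handling:

```python
import threading

class OriginShield:
    """Shield-side request coalescing: concurrent misses for the same
    key trigger exactly one origin fetch; the rest wait for its result."""
    def __init__(self, fetch_from_origin):
        self.fetch_from_origin = fetch_from_origin
        self.cache = {}
        self.in_flight = {}   # key -> Event signalling fetch completion
        self.lock = threading.Lock()

    def get(self, key):
        with self.lock:
            if key in self.cache:
                return self.cache[key]
            event = self.in_flight.get(key)
            if event is None:
                # First requester claims the fetch; everyone else waits.
                event = threading.Event()
                self.in_flight[key] = event
                leader = True
            else:
                leader = False
        if leader:
            value = self.fetch_from_origin(key)   # the single origin request
            with self.lock:
                self.cache[key] = value
                del self.in_flight[key]
            event.set()
            return value
        event.wait()
        with self.lock:
            return self.cache[key]
```

Ten edge PoPs missing simultaneously produce one origin request; the other nine block briefly on the in-flight fetch instead of piling onto the origin.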

Cache Hit Rate

Cache hit rate is the primary performance metric for CDN efficiency:

  • Static assets (images, CSS, JS): Target 90%+ hit rate. These rarely change and have long TTLs.
  • Semi-dynamic content (product pages, articles): Target 40-60%. Shorter TTLs or surrogate-key invalidation keeps content fresh.
  • Fully dynamic (user-specific, real-time): 0% cache – served from origin through CDN for network benefits only.

Factors that increase hit rate: longer TTL, broader cache key scope (fewer variations), popular content (power-law distribution means top 1% of content gets 90% of requests), larger cache size at edge.
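The impact of hit rate on user-perceived latency is simple expected-value arithmetic. The 5 ms hit and 120 ms miss figures below are illustrative, taken from the latency ranges above:

```python
def effective_latency_ms(hit_rate, hit_ms=5.0, miss_ms=120.0):
    """Expected response time given a cache hit rate."""
    return hit_rate * hit_ms + (1 - hit_rate) * miss_ms

# Static assets at 90% hit rate vs semi-dynamic content at 40%:
static = effective_latency_ms(0.9)   # ~16.5 ms average
semi = effective_latency_ms(0.4)     # ~74 ms average
```

This is also why moving from a 90% to a 99% hit rate matters so much: origin traffic (the miss rate) drops 10x, from 10% of requests to 1%.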

Scale Numbers

Reference numbers for system design interviews:

  • Cloudflare: Handles ~10% of all internet requests, operates 310+ PoPs globally, network capacity of 140 Tbps.
  • Akamai: 4,000+ PoPs, serves 15-30% of all web traffic.
  • Typical edge server: 10 Gbps NIC, 100TB SSD cache, serves 50,000+ requests/second.
  • Cache miss penalty: 10-200ms depending on origin distance; cache hit: 1-10ms from nearby edge node.

