System Design Interview: Content Delivery Network (CDN)

What Is a CDN?

A Content Delivery Network is a globally distributed network of servers (Points of Presence, or PoPs) that cache and serve content from locations geographically close to users. Instead of every user fetching a video or image from an origin server in Virginia, users in Tokyo get it from a Tokyo PoP — reducing latency from 200ms to under 5ms. Cloudflare, Akamai, Fastly, and AWS CloudFront are major CDNs. This question appears in senior backend, infrastructure, and platform engineering interviews.

Architecture: PoPs and Cache Hierarchy

CDNs use a two-tier cache hierarchy. Edge PoPs (tier 1) are deployed in 100-200 cities worldwide, physically close to users. Regional clusters (tier 2) sit between edge PoPs and the origin, serving as a second cache layer. When a user requests content: (1) the edge PoP checks its local cache — on a hit it serves immediately; (2) on a miss, the edge PoP queries its regional parent cluster; (3) on a regional miss, the cluster fetches from the origin server. The two-tier hierarchy dramatically reduces origin load — only cache misses at tier 2 reach the origin, and popular content tends to stay resident in the tier-2 cache as long as requests keep refreshing it before eviction.
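The lookup flow above can be sketched in a few lines of Python. This is a minimal illustration, not a real CDN implementation — the `Cache` class, `fetch_from_origin`, and `serve` names are all hypothetical:

```python
import time

class Cache:
    """Tiny in-memory stand-in for a PoP cache (illustrative only)."""
    def __init__(self):
        self.store = {}  # url -> (body, expires_at)

    def get(self, url):
        entry = self.store.get(url)
        if entry and entry[1] > time.time():
            return entry[0]          # valid cached copy
        return None                  # absent or TTL expired

    def put(self, url, body, ttl):
        self.store[url] = (body, time.time() + ttl)

def fetch_from_origin(url):
    return f"content-of-{url}"       # stand-in for a real origin fetch

def serve(url, edge, regional, ttl=3600):
    body = edge.get(url)             # tier 1: edge PoP
    if body is not None:
        return body, "edge-hit"
    body = regional.get(url)         # tier 2: regional cluster
    if body is None:
        body = fetch_from_origin(url)  # only tier-2 misses reach origin
        regional.put(url, body, ttl)
    edge.put(url, body, ttl)         # fill the edge on the way back
    return body, "edge-miss"
```

Note how the origin is contacted only when both tiers miss: a second user hitting the same edge PoP, or even a different edge PoP under the same regional cluster, is served from cache.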

Anycast Routing

CDNs use BGP anycast: the same IP address is announced from all PoPs simultaneously. When a user sends a DNS query or opens a TCP connection to the CDN's IP, BGP — the internet's inter-domain routing protocol — automatically delivers the packets to the topologically nearest PoP announcing that address. This requires no DNS-based geo-routing — the network itself routes the user optimally. Cloudflare uses anycast for all its services, which also provides inherent DDoS resilience: attack traffic is absorbed across all PoPs rather than overwhelming a single IP.

Cache Control and TTLs

Content is cached based on HTTP Cache-Control headers set by the origin. Key directives: max-age=3600 means cache for one hour; s-maxage overrides max-age specifically for shared caches such as CDN edges; no-store means never cache; private means browser-only, not CDN. Correct cache headers are critical: set max-age too long and users see stale content; set it too short and the cache hit rate collapses, putting load back on the origin. Static assets (JS, CSS, images) use immutable content-addressed URLs (hash in the filename) with very long TTLs (one year). HTML pages use short TTLs (60 seconds) or no-cache so they stay fresh.
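The policy above can be captured as a simple header-selection function. A hedged sketch — the `cache_headers` function and the path-based rules are illustrative assumptions, not any framework's API:

```python
def cache_headers(path: str) -> dict:
    """Pick a Cache-Control policy per asset class (illustrative sketch)."""
    if path.endswith((".js", ".css", ".png", ".jpg", ".woff2")):
        # content-addressed asset (hash in filename): safe to cache for a year
        return {"Cache-Control": "public, max-age=31536000, immutable"}
    if path.endswith(".html") or path == "/":
        # short edge TTL keeps pages fresh; s-maxage applies to the CDN only
        return {"Cache-Control": "public, max-age=0, s-maxage=60"}
    # conservative default for anything unclassified: never cache
    return {"Cache-Control": "no-store"}
```

In practice these policies usually live in CDN configuration rather than application code, but the decision logic is the same.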

Origin Pull vs Push

Two models for populating CDN caches: (1) Origin pull (lazy loading): content is fetched from origin only when first requested at a PoP. The PoP caches the response for subsequent requests. Simple to operate but causes a cold start on cache miss — the first user from each PoP pays full origin latency. (2) Push CDN: content is explicitly uploaded to all edge nodes before any user requests it. Used for large static assets (software downloads, video files) where you want zero-miss delivery from day one. Most CDNs use pull for dynamic web content and push for large media files.
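The two population models can be contrasted in a short sketch. Everything here is hypothetical scaffolding (`PoPCache`, `pull_serve`, `push_deploy`) standing in for real CDN machinery:

```python
class PoPCache:
    """Tiny in-memory stand-in for one edge PoP cache (illustrative only)."""
    def __init__(self):
        self.store = {}

    def get(self, url):
        return self.store.get(url)

    def put(self, url, body):
        self.store[url] = body

def pull_serve(url, pop, origin_fetch):
    """Origin pull: fill the cache lazily on first request (cold start)."""
    body = pop.get(url)
    if body is None:
        body = origin_fetch(url)  # first user at this PoP pays origin latency
        pop.put(url, body)
    return body

def push_deploy(url, body, pops):
    """Push: proactively seed every PoP before any user asks for the asset."""
    for pop in pops:
        pop.put(url, body)
```

Pull needs no coordination but makes the first user per PoP wait; push guarantees a warm cache everywhere at the cost of pre-distributing content that some PoPs may never serve.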

Cache Invalidation

Cache invalidation is one of the hardest problems in CDNs. When content changes before its TTL expires, you need to purge stale copies from all PoPs. Approaches: (1) URL-based purge: issue a purge API call for specific URLs. Cloudflare propagates purges to all PoPs in under 150ms. (2) Tag-based invalidation (Fastly, Cloudflare): attach surrogate keys (cache tags) to responses. Purge by tag to invalidate all content with that tag simultaneously — useful for invalidating all images for a specific product. (3) Versioned URLs: change the URL when content changes (add ?v=hash or hash in path). No purge needed — old URL stays cached, new URL starts fresh. Best practice for static assets.
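Tag-based invalidation is worth internalizing, since it is the differentiator interviewers probe. A minimal sketch of the idea — the `TaggedCache` class is an assumption for illustration, not Fastly's or Cloudflare's actual surrogate-key implementation:

```python
from collections import defaultdict

class TaggedCache:
    """Sketch of surrogate-key (cache-tag) invalidation."""
    def __init__(self):
        self.store = {}                  # url -> body
        self.by_tag = defaultdict(set)   # tag -> set of urls carrying it

    def put(self, url, body, tags=()):
        self.store[url] = body
        for tag in tags:
            self.by_tag[tag].add(url)

    def get(self, url):
        return self.store.get(url)

    def purge_tag(self, tag):
        # one call evicts every URL that was stored with this tag
        for url in self.by_tag.pop(tag, set()):
            self.store.pop(url, None)

cache = TaggedCache()
cache.put("/product/42", "page", tags=["product-42"])
cache.put("/product/42/thumb.jpg", "img", tags=["product-42"])
cache.purge_tag("product-42")  # both entries evicted together
```

The tag-to-URL index is what makes the purge O(entries with that tag) instead of requiring the origin to enumerate every URL that might mention the product.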

TLS Termination and Performance

CDNs terminate TLS at the edge PoP, close to the user. TLS handshake round trips are the biggest HTTPS overhead — terminating at the nearest PoP instead of the origin reduces handshake latency from 200ms to 5ms. The CDN maintains a persistent, pre-warmed HTTPS connection back to the origin (TLS session resumption, HTTP/2 multiplexing), so origin connections are amortized across many user requests. CDNs also handle HTTP/2 and HTTP/3 (QUIC) protocol negotiation with users while using HTTP/1.1 to older origin servers, bridging the protocol gap without requiring origin upgrades.

Estimating Cache Hit Rate

Cache hit rate depends on content popularity distribution and TTL. For a news site, assume the top 1,000 articles receive 80% of traffic (a power-law distribution). With TTL=3600 and 100 edge PoPs, each article must be requested at least once per hour per PoP to stay cached. At 10M requests/day, the popular set receives about 8M requests/day, or roughly 333k/hour — a few requests per hour per article per PoP on average (and far more for the hottest articles), enough to keep every popular article warm, so the hit rate on that traffic approaches 100%. Long-tail articles (visited once a week) never cache effectively — these are served from origin anyway and contribute a tiny fraction of traffic. Overall CDN hit rate for a large content site: typically 85-95%.
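The back-of-envelope arithmetic above is worth checking explicitly (variable names are illustrative; the figures are the assumptions stated in the text):

```python
# Assumptions from the estimate above.
requests_per_day = 10_000_000   # total site traffic
popular_share = 0.80            # fraction going to the top articles (power law)
popular_articles = 1_000        # size of the "hot" set
pops = 100                      # number of edge PoPs

# Requests/hour landing on the popular set, then per article, per PoP.
popular_reqs_per_hour = requests_per_day * popular_share / 24
per_article_per_pop = popular_reqs_per_hour / popular_articles / pops

# ~3.3 requests/hour per article per PoP on average — comfortably above
# the 1 request/hour needed to keep a TTL=3600 entry warm at every PoP.
```

If the hot set were 10x larger or the PoP count much higher, this average would drop below 1/hour and popular content would start expiring between requests at smaller PoPs — which is exactly why the regional tier matters.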

Interview Tips

  • Start by clarifying: static vs dynamic content, video streaming, API acceleration, or all three?
  • Explain the two-tier hierarchy — edge + regional reduces origin load multiplicatively
  • Cache invalidation strategy is a strong differentiator — know tag-based invalidation
  • Anycast routing is the right answer for global load balancing, not just DNS geo-routing
  • Mention TLS termination at edge — interviewers expect awareness of HTTPS overhead

Frequently Asked Questions

How does a CDN reduce latency?

A CDN reduces latency by serving content from a Point of Presence (PoP) physically close to the user, rather than from a central origin server. A user in Tokyo fetching a cached image from a Tokyo PoP experiences under 5ms latency; the same request to an origin server in Virginia would take 150-200ms due to transoceanic routing. The CDN also terminates TLS at the edge, eliminating multiple round trips for the handshake. For dynamic content that cannot be cached, CDNs still help by maintaining pre-warmed persistent connections to the origin over optimized routes, reducing connection setup overhead from each user request.

What is the difference between a CDN cache hit and cache miss?

A cache hit occurs when the requested content is present in the CDN edge PoP and has not expired (its TTL is still valid). The PoP serves the cached copy without contacting the origin — this is fast (sub-5ms) and free in terms of origin load. A cache miss occurs when the content is absent or expired: the PoP must fetch it from the origin (or a parent tier-2 cluster), cache the response for future requests, and serve it to the user. Cache hit rate is the key efficiency metric — a 95% hit rate means only 5% of requests reach the origin. Hit rate depends on content popularity (a power-law distribution means popular content caches very well), TTL (longer TTL means fewer misses), and the number of unique URLs (URL proliferation creates a long tail that never caches).

How does CDN cache invalidation work?

Cache invalidation removes or updates stale cached content before its TTL expires. Three approaches: (1) URL purge — send a purge API request for specific URLs to the CDN. Major CDNs (Cloudflare, Fastly) propagate purges globally in under 150ms. (2) Tag-based purge — attach cache tags (surrogate keys) to responses, then purge by tag. A single API call can invalidate thousands of URLs that share a tag. Useful for invalidating all cached pages that reference a product that just changed. (3) Versioned URLs — embed a content hash or version in the URL itself. When content changes, the URL changes, so the old cached version is naturally bypassed. This is the most reliable approach for static assets and requires no active purge management.

