A Content Delivery Network (CDN) caches content on servers distributed across the globe and serves each user from a nearby location. A CDN can cut latency from 200ms+ (cross-continent origin fetch) to under 20ms (local edge) while offloading 80-95% of traffic from origin servers. This guide covers CDN architecture internals: how edge caching works, cache invalidation strategies, and production CDN design. These topics come up in system design interviews involving any web-scale application.
CDN Architecture: PoPs and Edge Servers
A CDN consists of Points of Presence (PoPs) distributed globally. Each PoP contains edge servers that cache and serve content. Cloudflare has 300+ PoPs; Akamai has 4000+ edge locations. When a user requests a resource (image, CSS, API response):
(1) The request is routed to a nearby PoP. With GeoDNS, the CDN's DNS server returns the IP of an edge server chosen by the user's geographic location (in practice, often the location of the user's DNS resolver). With Anycast, the network itself picks the PoP, as described below.
(2) The edge server checks its local cache. On a cache hit (the resource is cached and not expired), it returns the resource immediately. Latency: 1-20ms (within the same city). On a cache miss, it fetches the resource from the origin server, caches the response, and returns it to the user.
(3) Subsequent requests from any user near the same PoP are served from cache.
Anycast routing: all edge servers advertise the same IP address, and BGP routing directs packets to the nearest server by network distance, which is not always the geographically closest one. This is simpler than GeoDNS (DNS returns the same address to everyone, so no per-user location lookup is needed) and provides automatic failover (if a PoP goes down, traffic routes to the next nearest). Cloudflare and Fastly use Anycast. Akamai uses a combination of DNS-based and Anycast routing.
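The hit/miss flow in step (2) can be sketched in a few lines of Python. This is a toy model of a single edge server, not any provider's implementation: the class name `EdgeCache`, the `fetch_origin` callback, and the fixed per-cache TTL are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Toy model of one edge server's cache (hypothetical, not a real CDN API)."""

    def __init__(
        self,
        fetch_origin: Callable[[str], bytes],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_origin = fetch_origin  # stands in for the HTTP call to the origin
        self._ttl = ttl_seconds            # one TTL for everything, for simplicity
        self._clock = clock                # injectable so tests can control time
        self._store: Dict[str, Tuple[bytes, float]] = {}  # url -> (body, expires_at)

    def get(self, url: str) -> Tuple[bytes, str]:
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and now < entry[1]:
            # Cache hit: the resource is cached and not expired.
            return entry[0], "HIT"
        # Cache miss (absent or expired): fetch, store, then return.
        body = self._fetch_origin(url)
        self._store[url] = (body, now + self._ttl)
        return body, "MISS"
```

The first request for a URL returns `"MISS"` and calls the origin; repeated requests within the TTL return `"HIT"` without touching it. A real edge server would also honor the origin's caching headers and evict entries under memory pressure, which this sketch leaves out.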
Cache Control and TTL
The origin server controls caching behavior via HTTP headers:
Cache-Control: public, max-age=86400 means the resource can be cached by the CDN for 86400 seconds (24 hours).
Cache-Control: private, no-cache means the CDN must not store the response (private), and the browser must revalidate with the origin before reusing its own copy (no-cache). Use it for user-specific content and real-time data; use no-store when the response should not be stored anywhere.
Cache-Control: public, s-maxage=3600, max-age=86400 means the CDN caches for 1 hour (s-maxage) and the browser caches for 24 hours (max-age). s-maxage overrides max-age for shared caches (CDNs).
ETag and Last-Modified enable conditional requests. After a cached entry expires, the CDN sends If-None-Match (with the ETag) or If-Modified-Since to the origin. If the content has not changed, the origin returns 304 Not Modified (no body), and the CDN marks its cached copy as fresh again. This revalidation avoids re-downloading unchanged content.
The Vary header tells the CDN to cache different versions based on request headers. Vary: Accept-Encoding caches gzip and brotli versions separately. Vary: Accept caches different image formats (WebP, AVIF, JPEG) separately.
Caching strategy by content type: static assets (images, CSS, JS) get a long TTL (1 year) with a content hash in the filename (app.abc123.js), so every change produces a new URL and the old cached copy is simply never requested again. API responses get a short TTL (1-60 seconds), or no caching for user-specific data. HTML pages get a moderate TTL (5-60 minutes) with stale-while-revalidate for freshness.
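As a rough illustration of how a shared cache might interpret these headers, the sketch below computes the TTL a CDN would apply and builds a cache key that respects Vary. The function names are hypothetical, and the rules are deliberately simplified: it ignores heuristic freshness, the Expires header, and the fact that no-cache technically permits storing a response for revalidation.

```python
from typing import Dict, Mapping, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: argument-or-None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def shared_cache_ttl(cache_control: str) -> int:
    """Seconds a shared cache (CDN) may reuse the response without revalidating.

    Simplification: returns 0 for private / no-store / no-cache and when no
    explicit lifetime is given.
    """
    d = parse_cache_control(cache_control)
    if "private" in d or "no-store" in d or "no-cache" in d:
        return 0
    # s-maxage takes precedence over max-age for shared caches.
    for name in ("s-maxage", "max-age"):
        raw = d.get(name)
        if raw is not None:
            try:
                return max(0, int(raw))
            except ValueError:
                return 0
    return 0


def cache_key(url: str, vary: str, request_headers: Mapping[str, str]) -> str:
    """Build a cache key from the URL plus the request headers named in Vary."""
    lowered = {k.lower(): v for k, v in request_headers.items()}
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    return "|".join([url] + [f"{n}={lowered.get(n, '')}" for n in names])
```

With `Vary: Accept-Encoding`, requests carrying `gzip` and `br` produce different keys, so the two encodings are stored separately. Real CDNs usually normalize header values first (otherwise every distinct Accept-Encoding string would fragment the cache), which this sketch does not attempt.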
Cache Invalidation
Cache invalidation, the removal of stale content from CDN caches, is notoriously difficult. Strategies:
(1) TTL-based expiration: the simplest approach. Content expires after the max-age/s-maxage period, and no active invalidation is needed. Downside: stale content is served until the TTL expires, which may be too slow for breaking changes.
(2) Purge API: CDN providers expose an API to purge cached content on demand. Cloudflare: purge by URL, tag, or everything. Fastly: purge by URL, surrogate key, or all. After a content update, call the purge API. The CDN removes the cached version, and the next request fetches fresh content from the origin. Latency: purge propagation typically completes within a few seconds across all PoPs, with the exact figure depending on the provider.
(3) Cache busting with versioned URLs: put a version or hash in the URL, such as /images/logo.v3.png or /app.abc123.js. When the content changes, the URL changes. The old URL remains cached (harmless), and the new URL triggers a cache miss and origin fetch. This is the most reliable invalidation method for static assets, provided the HTML that references them is itself refreshed.
(4) Stale-while-revalidate: serve the cached (potentially stale) content immediately while asynchronously fetching a fresh version from the origin. The user gets a fast response, and the cache is updated for the next request. Example: Cache-Control: public, max-age=60, stale-while-revalidate=300 keeps the response fresh for 60 seconds, then allows it to be served stale for up to 300 more seconds while a background refresh runs.
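Strategies (3) and (4) are both easy to express in code. The sketch below shows one way a build step might derive a content-hashed filename, and how a cache might classify an entry under stale-while-revalidate. The helper names and the 8-character SHA-256 prefix are assumptions for illustration; build tools pick their own hashing schemes.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_name(path: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a short content hash before the file extension.

    Different content yields a different URL, so a changed file is always
    a cache miss and an unchanged file keeps its cached copy.
    """
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


def swr_state(age: float, max_age: float, stale_while_revalidate: float) -> str:
    """Classify a cached entry by its age in seconds.

    "fresh"            -> serve from cache, no origin contact
    "stale-revalidate" -> serve from cache now, refresh in the background
    "expired"          -> must fetch from the origin before responding
    """
    if age < max_age:
        return "fresh"
    if age < max_age + stale_while_revalidate:
        return "stale-revalidate"
    return "expired"
```

For the header `max-age=60, stale-while-revalidate=300`, an entry that is 30 seconds old is fresh, one that is 120 seconds old is served stale while it is refreshed, and one that is 360 seconds old or more has to wait for the origin.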
Origin Shield
Without origin shield: each PoP independently fetches from the origin on a cache miss. With 300 PoPs and a popular resource whose TTL has just expired, the origin can receive up to 300 near-simultaneous requests for the same resource (thundering herd).
Origin shield: a designated intermediate cache layer between the edge PoPs and the origin. When an edge PoP has a cache miss, it requests the resource from the origin shield instead of from the origin directly. If the shield has the content cached, it serves the edge PoP (no origin hit). If the shield also misses, only one request goes to the origin.
Benefits:
(1) Reduces origin load, often by 10-50x depending on traffic mix, because the shield absorbs cache misses from all PoPs.
(2) Coalesces concurrent requests: if 100 PoPs request the same resource simultaneously, the shield sends one request to the origin and fans out the response.
(3) Improves cache hit ratio: the shield aggregates demand from all PoPs, so it sees more traffic for each resource and keeps it cached longer than a lightly used edge would.
Cloudflare Tiered Cache, Fastly Shield, and AWS CloudFront Origin Shield implement this pattern. For high-traffic origins, origin shield is essential to keep the origin from being overwhelmed during cache expiration events.
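Request coalescing, the second benefit above, is sometimes called "single-flight". The following is a minimal thread-based sketch of the idea, not how any particular CDN implements it: `CoalescingShield` and `fetch_origin` are hypothetical names, and the cache never expires entries, to keep the example short.

```python
import threading
from typing import Callable, Dict


class CoalescingShield:
    """Toy shield: concurrent misses for one URL share a single origin fetch."""

    def __init__(self, fetch_origin: Callable[[str], bytes]) -> None:
        self._fetch_origin = fetch_origin
        self._lock = threading.Lock()
        self._cache: Dict[str, bytes] = {}                 # no TTL in this sketch
        self._inflight: Dict[str, threading.Event] = {}    # url -> "fetch done" signal

    def get(self, url: str) -> bytes:
        with self._lock:
            if url in self._cache:
                return self._cache[url]
            event = self._inflight.get(url)
            leader = event is None
            if leader:
                # First requester becomes the leader and performs the fetch.
                event = threading.Event()
                self._inflight[url] = event

        if leader:
            try:
                body = self._fetch_origin(url)
                with self._lock:
                    self._cache[url] = body
            finally:
                # Wake followers even if the fetch raised.
                with self._lock:
                    del self._inflight[url]
                event.set()
            return body

        # Followers wait for the leader instead of hitting the origin.
        event.wait()
        with self._lock:
            if url in self._cache:
                return self._cache[url]
        # The leader's fetch failed; retry, and one follower becomes the new leader.
        return self.get(url)
```

If 20 threads (standing in for 20 edge PoPs) call `get` for the same URL at once, the origin callback runs once and every caller receives the same body.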
CDN for Dynamic Content and APIs
CDNs are not just for static files. Modern CDNs accelerate dynamic content in several ways:
(1) TCP/TLS optimization: the CDN maintains persistent, optimized connections between edge PoPs and the origin (connection pooling, TLS session resumption). The user's TLS handshake is with the nearby edge (fast), and the edge-to-origin connection is already established (no per-request handshake).
(2) Edge compute: Cloudflare Workers, Fastly Compute, and AWS Lambda@Edge run custom code at the edge. Use cases: A/B testing (route users to different origins), geolocation-based content (serve different content based on country), authentication (validate a JWT at the edge before forwarding to the origin), and response transformation (add headers, modify HTML).
(3) Short-TTL API caching: cache API responses for 1-5 seconds. A cache with a 1-second TTL sends the origin about one request per second regardless of incoming volume, so for an endpoint receiving 10 or more requests per second at that cache, it removes 90%+ of origin hits during traffic spikes. Stale-while-revalidate keeps responses fast while the cache refreshes.
(4) WebSocket and streaming: modern CDNs support WebSocket proxying and live video streaming (HLS/DASH) at the edge, reducing latency for real-time applications.
In system design interviews, include a CDN layer for any user-facing application. Mention specific caching strategies (long TTL + cache busting for static assets, short TTL + stale-while-revalidate for APIs) to demonstrate depth.
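The short-TTL claim in item (3) comes down to simple arithmetic, sketched below under idealized assumptions: traffic is steady, and each cache refetches a given response exactly once per TTL window. The function name and the `caches` parameter (the number of independent caches in front of the origin) are introduced here for illustration only.

```python
def origin_offload(requests_per_second: float, ttl_seconds: float, caches: int = 1) -> float:
    """Estimated fraction of requests that never reach the origin.

    Model: each of `caches` independent caches refetches once per TTL window,
    so the origin sees at most caches / ttl_seconds requests per second.
    `requests_per_second` is the total across all caches and must be positive.
    """
    origin_rate = min(requests_per_second, caches / ttl_seconds)
    return 1.0 - origin_rate / requests_per_second
```

Under this model, a single cache with a 1-second TTL absorbs about 99.9% of 1,000 requests per second and 90% of 10 requests per second, but nothing for an endpoint that is hit less than once per second. Spreading the same 1,000 requests per second across 300 unshielded PoPs drops the offload to about 70%, which is one way to see why an origin shield helps.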