Question 1

How do you choose the right TTL for different content types?

Accepted Answer

TTL is a trade-off: longer TTL = lower origin load + faster responses; shorter TTL = fresher content + less reliance on explicit purges. Practical guidelines by content type: (1) Static assets with a content hash in the URL (app.a3f4b2.js, logo.c1d9e.png): cache forever (max-age=31536000, immutable); the URL changes whenever the content changes, so serving stale content is impossible; (2) API responses for public, slowly changing data (product catalog, pricing): 60–300 seconds, short enough that a price change propagates within minutes; (3) User-personalized content (shopping cart, user profile): Cache-Control: private, no-store, so the CDN never stores it at all; (4) HTML pages that reference versioned assets: max-age=60 with stale-while-revalidate=60, so users get fast loads and no served copy is more than about 2 minutes old; (5) Images, fonts, CSS: 1 year with a content hash, 7 days without.
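The guidelines above can be encoded as a small policy table. A minimal Python sketch, assuming hypothetical content categories and a helper named `cache_control_for` (neither comes from any particular CDN); the header values are the ones the answer suggests.

```python
from enum import Enum

class ContentType(Enum):
    HASHED_ASSET = "hashed_asset"      # app.a3f4b2.js, logo.c1d9e.png
    UNHASHED_ASSET = "unhashed_asset"  # images, fonts, CSS without a content hash
    PUBLIC_API = "public_api"          # product catalog, pricing
    PERSONALIZED = "personalized"      # shopping cart, user profile
    HTML = "html"                      # pages referencing versioned assets

ONE_YEAR = 31_536_000
SEVEN_DAYS = 7 * 24 * 3600

def cache_control_for(kind: ContentType, api_ttl: int = 300) -> str:
    """Return a Cache-Control value following the TTL guidelines (hypothetical helper)."""
    if kind is ContentType.HASHED_ASSET:
        # The URL changes whenever the content changes, so it can live "forever".
        return f"public, max-age={ONE_YEAR}, immutable"
    if kind is ContentType.UNHASHED_ASSET:
        return f"public, max-age={SEVEN_DAYS}"
    if kind is ContentType.PUBLIC_API:
        # Stay inside the suggested 60-300 s window.
        if not 60 <= api_ttl <= 300:
            raise ValueError("api_ttl should be between 60 and 300 seconds")
        return f"public, max-age={api_ttl}"
    if kind is ContentType.PERSONALIZED:
        # Never stored by the CDN (or by any other cache).
        return "private, no-store"
    if kind is ContentType.HTML:
        # A served copy is at most max-age + stale-while-revalidate = 120 s old.
        return "public, max-age=60, stale-while-revalidate=60"
    raise ValueError(f"unknown content type: {kind}")

print(cache_control_for(ContentType.HASHED_ASSET))
# public, max-age=31536000, immutable
```

Keeping the policy in one function means a content type can't silently fall back to a CDN default TTL: an unknown category raises instead.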

Question 2

How does a CDN decide which edge node to serve a request from?

Accepted Answer

Anycast routing: the CDN operates edge servers in many locations worldwide, all announcing the same IP address prefixes via BGP. DNS resolves the CDN's hostname to one of these anycast addresses; BGP routing (not DNS) then carries the user's packets to the edge location that is closest in network terms, i.e. the best BGP path from the user's ISP, which is usually but not always the geographically nearest one. This works at the network layer, not the application layer, so the selection happens before the first TCP packet reaches the CDN. Alternative: GeoDNS, where the authoritative DNS server returns different IP addresses based on the geolocation of the querying resolver (or of the client subnet, if the resolver sends EDNS Client Subnet). Anycast is generally preferred for failover because it doesn't depend on DNS TTL expiry: when an edge location withdraws its route, BGP reroutes traffic to the next-best location automatically, whereas GeoDNS clients keep using the cached answer until its TTL expires. Cloudflare, for example, uses Anycast; other CDNs, especially older designs, rely on GeoDNS-style DNS mapping.
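To see why Anycast failover needs no DNS change, here is a deliberately simplified Python model of route selection. It assumes hypothetical PoP names and AS-path lengths and applies only the shortest-AS-path rule; real BGP best-path selection evaluates several other attributes first (local preference among them).

```python
from dataclasses import dataclass
from typing import AbstractSet, List

@dataclass(frozen=True)
class Route:
    pop: str          # hypothetical edge location announcing the anycast prefix
    as_path_len: int  # AS-path length as seen from the user's ISP

def select_pop(routes: List[Route], withdrawn: AbstractSet[str] = frozenset()) -> str:
    """Toy best-path choice: shortest AS path among routes still announced.

    The destination IP never changes; only the path to it does, which is why
    a withdrawn PoP is bypassed without waiting for any DNS TTL.
    """
    candidates = [r for r in routes if r.pop not in withdrawn]
    if not candidates:
        raise RuntimeError("no PoP is announcing the prefix")
    # min() returns the first route on ties, standing in for BGP's tie-breakers.
    return min(candidates, key=lambda r: r.as_path_len).pop

routes = [Route("fra", 2), Route("ams", 3), Route("sin", 5)]
print(select_pop(routes))           # fra
print(select_pop(routes, {"fra"}))  # ams: "fra" withdrew its route
```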

Question 3

How do you handle cache poisoning attacks on a CDN?

Accepted Answer

Cache poisoning: an attacker crafts a request that causes the CDN to cache a malicious response, which is then served to every subsequent user whose request maps to the same cache key. Common vectors: (1) unkeyed headers: if a header influences the response but is not part of the cache key (for example, an X-Forwarded-Host header the origin uses to build absolute URLs), an attacker can inject a malicious value through it and have the result cached under the normal key; (2) unkeyed query parameters: parameters dropped from the key during normalization that still reach the origin and affect the response body. Mitigations: (1) strict cache key construction: include in the key ALL inputs that affect the response, and strip everything else before forwarding to the origin; (2) normalize and validate inputs before caching, rejecting requests with unusual header values; (3) Content-Security-Policy headers on cached responses limit the damage from injected content; (4) use the Vary header explicitly: a response with Vary: Accept-Language tells the CDN to keep a separate cache entry per Accept-Language value, so a response generated for one language can't be served to users of another; (5) never cache 4xx responses except 404 (brief TTL) and 410 (long TTL for gone resources), so an attacker can't get an error page cached in place of valid content.
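A sketch of mitigations (1), (2) and (4) together: the edge derives both the cache key and the request it forwards from the same normalized inputs, so nothing that reaches the origin is left out of the key. The allow-listed parameters, the supported languages and the `normalize_request` helper are hypothetical, and forwarding only one header is a simplification.

```python
import hashlib
from typing import Dict, Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical per-route policy: the ONLY inputs the origin may use.
KEYED_QUERY_PARAMS = {"page", "sort"}
SUPPORTED_LANGS = ("en", "de", "fr")  # origin responds with "Vary: Accept-Language"
DEFAULT_LANG = "en"

def pick_language(header: str) -> str:
    """Collapse a raw Accept-Language value to one supported language.

    Simplification: ignores q-values and assumes tags are listed in
    preference order; anything unrecognized falls back to DEFAULT_LANG.
    """
    for item in header.split(","):
        primary = item.split(";")[0].strip().lower().split("-")[0]
        if primary in SUPPORTED_LANGS:
            return primary
    return DEFAULT_LANG

def normalize_request(method: str, url: str,
                      headers: Dict[str, str]) -> Tuple[str, str, Dict[str, str]]:
    """Return (cache_key, forwarded_url, forwarded_headers) for an edge request."""
    parts = urlsplit(url)
    # Allow-listed params only, sorted so parameter order can't split the cache.
    query = sorted(
        (k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k in KEYED_QUERY_PARAMS
    )
    forwarded_url = urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(query), "")
    )
    lowered = {k.lower(): v for k, v in headers.items()}
    lang = pick_language(lowered.get("accept-language", ""))
    # In this sketch the origin receives exactly what the key contains: the
    # normalized URL plus the normalized language, nothing unkeyed.
    forwarded_headers = {"Accept-Language": lang}
    raw = f"{method.upper()}|{forwarded_url}|lang={lang}"
    return hashlib.sha256(raw.encode()).hexdigest(), forwarded_url, forwarded_headers

key, fwd_url, fwd_headers = normalize_request(
    "GET", "https://Shop.example/p?sort=asc&utm_source=x&page=2",
    {"Accept-Language": "de-DE,de;q=0.9"},
)
print(fwd_url)      # https://shop.example/p?page=2&sort=asc
print(fwd_headers)  # {'Accept-Language': 'de'}
```

Normalizing Accept-Language to a short list of supported values also keeps Vary from fragmenting the cache into one entry per raw header string.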

Question 4

How does stale-while-revalidate improve perceived performance without sacrificing freshness?

Accepted Answer

Standard TTL-based caching has a stampede problem: when a popular object expires, every request that arrives before it is re-cached misses and hits the origin at once. stale-while-revalidate splits the object's lifetime into two windows: the fresh window (serve from cache) and the stale window (serve from cache while fetching a fresh copy in the background). Example: Cache-Control: max-age=60, stale-while-revalidate=300. For the first 60 seconds: serve cached (fresh). For seconds 61–360: serve cached (stale) AND send one background revalidation request to the origin. After 360 seconds: full cache miss, synchronous origin fetch. Result: users almost never wait for an origin round-trip; only the very first request and the rare full-expiry miss (an object not requested during its whole stale window) pay that cost. The background revalidation keeps cache near-fresh (usually
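The fresh / stale / miss windows can be sketched as a small in-memory edge cache. This is a minimal Python illustration, assuming a hypothetical `SWRCache` class, an injectable clock for testing, and a plain list standing in for the background worker queue; it is not any CDN's actual implementation.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

@dataclass
class Entry:
    value: str
    stored_at: float

class SWRCache:
    """Edge-cache sketch honouring max-age and stale-while-revalidate."""

    def __init__(self, fetch: Callable[[str], str], max_age: float, swr: float,
                 clock: Callable[[], float] = time.monotonic):
        self.fetch, self.max_age, self.swr, self.clock = fetch, max_age, swr, clock
        self.entries: Dict[str, Entry] = {}
        self.in_flight: Set[str] = set()  # keys with a revalidation already queued
        self.jobs: List[str] = []         # stand-in for a background worker queue

    def get(self, key: str) -> Tuple[str, str]:
        entry = self.entries.get(key)
        if entry is not None:
            age = self.clock() - entry.stored_at
            if age <= self.max_age:
                return entry.value, "fresh"
            if age <= self.max_age + self.swr:
                # Serve stale immediately; queue at most ONE revalidation per key.
                if key not in self.in_flight:
                    self.in_flight.add(key)
                    self.jobs.append(key)
                return entry.value, "stale"
        # Never cached, or past the stale window: synchronous origin fetch.
        value = self.fetch(key)
        self.entries[key] = Entry(value, self.clock())
        return value, "miss"

    def run_background_jobs(self) -> None:
        """Drain queued revalidations (a real edge would run these asynchronously)."""
        while self.jobs:
            key = self.jobs.pop(0)
            self.entries[key] = Entry(self.fetch(key), self.clock())
            self.in_flight.discard(key)

now = [0.0]
calls: List[str] = []

def origin(key: str) -> str:
    calls.append(key)
    return f"v{len(calls)}"

cache = SWRCache(origin, max_age=60, swr=300, clock=lambda: now[0])
print(cache.get("/p"))  # ('v1', 'miss')
now[0] = 100
print(cache.get("/p"))  # ('v1', 'stale'), one revalidation queued
cache.run_background_jobs()
print(cache.get("/p"))  # ('v2', 'fresh')
```

Full misses are not coalesced in this sketch; a production edge would typically also collapse concurrent misses for the same key into a single origin request.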

Question 5

How do you purge a CDN cache during a hotfix deployment without a full cache wipe?

Accepted Answer

A full cache wipe (flush all) is nuclear: it invalidates every cached object at once, causing a thundering herd to the origin. For a hotfix, use a surgical purge: (1) URL purge: if you know exactly which URLs changed, purge those specific URLs; most CDNs offer an API endpoint for this (Cloudflare: POST /zones/{zone_id}/purge_cache with {"files": [url1, url2]}). (2) Tag purge: if the hotfix affects a category of objects tagged at cache time ("component:header", "product:SKU-123"), purge by tag. Most enterprise CDNs (Fastly, Akamai, Cloudflare Enterprise) support this via a response header such as Surrogate-Key (Fastly) or Cache-Tag (Cloudflare). (3) Versioned asset URLs: put a deploy hash in the asset filename, so a new deploy produces new URLs that are fetched fresh without any purge; serve the HTML that references them with Cache-Control: no-cache (or a short TTL) so clients pick up the new URLs quickly. For emergencies where none of these work: temporarily set a short TTL (30s), wait for natural expiry, then restore the original TTL. Note that on many CDNs the shorter TTL only applies to objects cached after the change, so entries already in cache may still live out their original TTL.
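URL purge and tag purge both reduce to index maintenance on each edge node. A minimal in-memory Python sketch, assuming tags arrive with each response (for example from a space-separated Surrogate-Key header) and a hypothetical `TaggedCache` class; a real CDN must also fan the purge out to every edge node.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Set

class TaggedCache:
    """Edge cache that records Surrogate-Key / Cache-Tag style tags per object."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}
        self.keys_by_tag: DefaultDict[str, Set[str]] = defaultdict(set)
        self.tags_by_key: Dict[str, Set[str]] = {}

    def store(self, url: str, body: bytes, tags: Iterable[str] = ()) -> None:
        self.purge_url(url)  # drop old tag links if the URL is re-cached
        self.objects[url] = body
        self.tags_by_key[url] = set(tags)
        for tag in self.tags_by_key[url]:
            self.keys_by_tag[tag].add(url)

    def purge_url(self, url: str) -> bool:
        """URL purge: remove one object. Returns True if it was cached."""
        existed = self.objects.pop(url, None) is not None
        for tag in self.tags_by_key.pop(url, set()):
            self.keys_by_tag[tag].discard(url)
            if not self.keys_by_tag[tag]:
                del self.keys_by_tag[tag]  # keep the index from growing forever
        return existed

    def purge_tag(self, tag: str) -> int:
        """Tag purge: remove every object carrying the tag; returns the count."""
        urls = list(self.keys_by_tag.get(tag, ()))  # copy: purge_url mutates the set
        for url in urls:
            self.purge_url(url)
        return len(urls)

cache = TaggedCache()
cache.store("/header.html", b"<header>", "component:header".split())
cache.store("/p/123", b"...", "product:SKU-123 component:header".split())
cache.store("/p/456", b"...", "product:SKU-456".split())
print(cache.purge_tag("component:header"))  # 2
print(sorted(cache.objects))                # ['/p/456']
```

The reverse index (`tags_by_key`) is what lets a single-URL purge clean up its tag entries without scanning every tag.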

CDN Cache System Low-Level Design: Cache Keys, TTL Management, Tag Purge, and Stampede Protection


Core Data Model (Origin-Side Cache Manifest)

Cache Key Construction

Cache Storage Layer (Edge Node)

Purge API

Key Design Decisions