CDN Edge Cache: What It Does
A content delivery network (CDN) accelerates content delivery by caching responses at Points of Presence (PoPs) geographically close to users. Instead of every user request traveling to the origin server, the CDN edge node responds from cache. This reduces latency (from hundreds of milliseconds to single-digit milliseconds), reduces origin load, and improves availability (cached content survives origin downtime).
CDN Architecture
- Edge PoPs (L1): Hundreds of locations globally. Handle user-facing requests. Small-to-medium cache storage per PoP. Cache miss rate: 5–30% depending on content popularity and TTL.
- Origin shield / mid-tier cache (L2): One or a few regional aggregation points between edge PoPs and the origin. On L1 cache miss, the PoP fetches from the shield rather than the origin directly. The shield has a much higher cache hit rate than individual PoPs because it aggregates cache misses from many edge PoPs for the same region.
- Origin server (L3): The authoritative source. Receives only the cache misses that pass through both L1 and L2. With a well-configured shield, the origin receives 5–10x less traffic than it would without one.
Request Routing via Anycast DNS
When a user resolves cdn.example.com, the DNS response contains an IP address that routes via BGP anycast to the nearest PoP. Multiple PoPs announce the same IP prefix; routers deliver packets to the topologically closest one. No application-layer routing logic required — the network does it. Latency to the edge PoP is typically 5–30ms for most of the world with a well-distributed PoP network.
Cache Key Design
The cache key determines whether two requests share a cached response. By default: scheme + host + path + query string. Customize for your content:
- Strip tracking params: utm_source, fbclid, and gclid do not affect content but create unique cache entries. Strip them from the cache key (while preserving them in analytics logs).
- Vary on headers: If the origin serves different content based on Accept-Encoding (gzip vs brotli) or Accept-Language, include the relevant header in the cache key. The Vary response header signals which request headers to include.
- Normalize: Case-fold the path, sort query parameters alphabetically, strip default ports — ensures /Page, /page, and /page?b=2&a=1 resolve to the same cache key when appropriate.
Cache Hierarchy and L1 Miss Handling
On an L1 edge cache miss:
- Check if the shield (L2) has the response. If yes, return from shield and populate L1 cache.
- If shield also misses, the shield fetches from origin (L3), caches the response, returns to L1 edge, which caches and returns to user.
The shield coalesces concurrent cache misses for the same object: if 1000 edge PoPs all miss on the same object simultaneously, only one request reaches the origin (request collapsing / cache stampede prevention). The origin shield is what makes CDNs viable for large-scale traffic absorption — without it, a cache miss storm during viral content propagation would DDoS your origin.
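The request-collapsing behavior described above can be sketched as a minimal singleflight-style coalescer, assuming a generic fetch_fn standing in for the origin call (this is an illustration of the technique, not a production CDN component):

```python
import threading

class RequestCoalescer:
    """Collapse concurrent cache misses for the same key into a single
    upstream fetch; followers block until the leader's fetch completes."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}   # key -> Event set when the fetch finishes
        self._results = {}    # key -> fetched response

    def get(self, key, fetch_fn):
        with self._lock:
            event = self._inflight.get(key)
            if event is None:
                # First requester becomes the leader and fetches.
                event = threading.Event()
                self._inflight[key] = event
                leader = True
            else:
                leader = False
        if leader:
            try:
                self._results[key] = fetch_fn(key)
            finally:
                with self._lock:
                    del self._inflight[key]
                event.set()
        else:
            event.wait()  # Followers wait instead of hitting the origin.
        return self._results[key]
```

If 1000 edge requests miss on the same key while one fetch is in flight, fetch_fn runs once and all 1000 callers receive that single result.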
TTL-Based Expiry and Stale-While-Revalidate
The origin controls TTL via Cache-Control: max-age=3600. After the TTL expires, the cached object is stale. Handling stale objects:
- Standard: Expired object is evicted; next request triggers a synchronous origin fetch — adds latency for the unlucky first user after TTL.
- Stale-while-revalidate: Cache-Control: max-age=3600, stale-while-revalidate=86400. Serve the stale cached response immediately (zero added latency), and asynchronously fetch a fresh copy in the background. The next request after the background refresh gets the updated content. Ideal for content that changes infrequently but must eventually be fresh.
- Stale-if-error: Serve stale content if the origin returns a 5xx error. Provides resilience against origin outages at the cost of serving potentially outdated content.
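The freshness states above reduce to a small decision function over the object's age and the Cache-Control directives. A minimal sketch (parameter names are assumptions, not a specific CDN's configuration surface):

```python
def cache_decision(entry_age: float, max_age: float,
                   swr: float = 0.0, sie: float = 0.0,
                   origin_down: bool = False) -> str:
    """Decide how to serve a cached object given directives like
    Cache-Control: max-age=3600, stale-while-revalidate=86400, stale-if-error=600.
    entry_age/max_age/swr/sie are all in seconds."""
    if entry_age <= max_age:
        return "fresh"             # within TTL: serve from cache
    if entry_age <= max_age + swr:
        return "stale+revalidate"  # serve stale now, refresh in background
    if origin_down and entry_age <= max_age + sie:
        return "stale(error)"      # origin 5xx: serve stale as a fallback
    return "fetch"                 # synchronous, blocking origin fetch
```

For an object cached with max-age=3600 and stale-while-revalidate=86400, a request at age 5000 seconds is answered from cache instantly while the refresh happens off the request path.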
Cache Invalidation
TTL-based expiry is eventually consistent. For immediate invalidation:
- URL purge: API call to the CDN specifying one or more URLs to invalidate immediately across all PoPs. Propagation takes 1–10 seconds globally. Rate-limited by CDN providers (typically 1000 purges/second).
- Surrogate keys / cache tags: Tag cached responses with logical identifiers at serve time (response header Surrogate-Key: product-42 category-shoes). On purge, invalidate by tag: all responses tagged product-42 are purged simultaneously regardless of URL. Enables purging all product pages for a changed product with a single API call. Fastly and Cloudflare support this natively.
- Versioned URLs: Include a content hash or version in the URL (/app.a8f3b2.js). Never purge — the old URL is simply abandoned; the new URL is always a cache miss initially, then cached with a long TTL (1 year). Best for immutable static assets.
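The surrogate-key mechanism above amounts to a tag-to-URL index maintained alongside the cache. A minimal in-memory sketch (real CDNs distribute this index across PoPs; class and method names here are illustrative):

```python
from collections import defaultdict

class TagIndex:
    """Map each tag from a Surrogate-Key response header to the cache
    entries carrying it, so a tag purge can evict them all at once."""

    def __init__(self):
        self.cache = {}                      # url -> cached body
        self.tag_to_urls = defaultdict(set)  # tag -> set of urls

    def store(self, url: str, body: bytes, surrogate_key_header: str):
        """Cache a response and index its space-separated tags."""
        self.cache[url] = body
        for tag in surrogate_key_header.split():
            self.tag_to_urls[tag].add(url)

    def purge_tag(self, tag: str) -> int:
        """Evict every cached URL carrying this tag; return how many."""
        urls = self.tag_to_urls.pop(tag, set())
        for url in urls:
            self.cache.pop(url, None)
        return len(urls)
```

Purging category-shoes then evicts both the product page and the category listing in one call, without enumerating their URLs.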
Negative Caching
Cache 404 and other error responses briefly (30–300 seconds). Without negative caching, a request for a non-existent resource hammers the origin on every request. With it, the CDN absorbs the repeated requests. Do not cache 404s for too long — legitimate resources may be created after a miss.
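A status-aware TTL policy captures the trade-off above. The specific values here are illustrative defaults, not recommendations from any CDN vendor:

```python
def ttl_for_status(status: int) -> int:
    """Pick a cache TTL in seconds based on the response status."""
    if 200 <= status < 300:
        return 3600       # normal content: honor a long TTL
    if status == 404:
        return 60         # negative cache: short, so newly created
                          # resources become visible quickly
    if status in (301, 308):
        return 86400      # permanent redirects are safe to cache longer
    return 0              # 5xx and everything else: do not cache
```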
TLS Termination and Protocol Optimization
TLS handshake at the edge (20–50ms) is amortized across many requests via session resumption and TLS 1.3 0-RTT. HTTP/2 multiplexes multiple requests over a single TCP connection, eliminating head-of-line blocking at the HTTP layer. HTTP/3 (QUIC) eliminates TCP head-of-line blocking entirely and provides faster connection establishment over lossy mobile networks.
Edge Compute
Modern CDNs support running lightweight logic at the PoP — Cloudflare Workers, Fastly Compute@Edge, Lambda@Edge. Use cases: A/B testing (modify response before delivery), authentication token validation (reject unauthorized requests at edge before they reach origin), geo-based redirects, request/response header manipulation, dynamic HTML stitching. Constraints: limited CPU time (1–50ms), limited memory (128MB), no arbitrary I/O — these are not general-purpose compute environments.
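The authentication-at-edge use case can be sketched generically. Real platforms (Workers, Compute@Edge, Lambda@Edge) expose this as JavaScript or WASM handlers; the Python below only illustrates the logic, and the secret and header names are assumptions:

```python
import base64
import hashlib
import hmac

SECRET = b"example-shared-secret"  # hypothetical; real deployments use a KMS

def validate_at_edge(headers: dict) -> int:
    """Verify a signed token of the form '<payload>.<signature>' at the PoP
    and return an HTTP status: only 200 lets the request travel upstream."""
    token = headers.get("authorization", "")
    try:
        payload, sig = token.rsplit(".", 1)
    except ValueError:
        return 401  # malformed or missing token: rejected at the edge
    expected = base64.urlsafe_b64encode(
        hmac.new(SECRET, payload.encode(), hashlib.sha256).digest()
    ).decode().rstrip("=")
    if not hmac.compare_digest(sig, expected):
        return 403  # bad signature: never reaches the origin
    return 200      # valid: pass through to cache or origin
```

Rejecting invalid tokens at the PoP keeps unauthorized traffic off the origin entirely, which is exactly the kind of cheap, stateless check that fits edge compute's CPU and memory limits.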
Trade-offs and Failure Modes
- Cache poisoning: If an attacker can cause the CDN to cache a malicious response (e.g., by injecting headers that cause the CDN to treat a user-specific URL as cacheable), all users receive the poisoned response. Strict cache key normalization and avoiding caching responses that vary by user-controlled headers mitigate this.
- Cold start after purge: A global purge of popular content causes a cache miss storm — millions of users simultaneously trigger origin fetches. Use stale-while-revalidate instead of hard purge where possible, or warm the cache (pre-fetch) before purging the old version.
- Long TTL and stale data: Setting max-age=86400 for content that changes frequently means users see stale data for up to 24 hours. Use surrogate keys + event-driven purge on content change rather than relying on TTL alone for mutable content.
- PoP unavailability: If a PoP fails, anycast routing automatically shifts traffic to the next-closest PoP. The CDN provides resilience that origin servers alone cannot match.