PoP Selection and GeoDNS
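When a client issues a DNS query, GeoDNS maps the source IP of the query to a geographic region and returns the IP of the nearest Point of Presence (PoP). The authoritative server usually sees the recursive resolver's address rather than the client's, unless the resolver forwards an EDNS Client Subnet (ECS) prefix, so accuracy depends on how close the resolver is to the user. The mapping uses a GeoIP database (MaxMind or proprietary). DNS TTL is kept short (30-60 s) so that clients re-resolve quickly and failover takes effect within roughly one TTL.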
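The selection step can be sketched as a nearest-neighbor lookup over PoP coordinates. This is a minimal illustration, not how any particular GeoDNS product works: the PoP catalog, the IPs (from the documentation range 192.0.2.0/24), and the `nearest_pop` helper are all hypothetical, and the GeoIP lookup that turns an IP into a latitude/longitude is assumed to have happened already.

```python
import math

# Hypothetical PoP catalog: name -> (latitude, longitude, IP to return).
POPS = {
    "fra": (50.11, 8.68, "192.0.2.10"),
    "iad": (38.95, -77.45, "192.0.2.20"),
    "sin": (1.35, 103.99, "192.0.2.30"),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_pop(lat, lon, healthy=None):
    """Return (name, ip) of the closest PoP, optionally limited to healthy ones."""
    candidates = {n: p for n, p in POPS.items() if healthy is None or n in healthy}
    if not candidates:
        raise LookupError("no healthy PoP available")
    name = min(
        candidates,
        key=lambda n: haversine_km(lat, lon, candidates[n][0], candidates[n][1]),
    )
    return name, candidates[name][2]
```

Passing a `healthy` set models failover: when the closest PoP is excluded, the next-closest one is returned once cached answers expire.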
Anycast Routing
The same anycast IP address is announced via BGP from every PoP. BGP path selection delivers each packet to the topologically closest PoP, which is usually but not always the geographically nearest one. No application-level routing is required for initial connection establishment.
Cache Hierarchy
Client
└─ L1: Edge Cache (PoP, city-level)
   └─ L2: Regional Cache / Origin Shield (region-level)
      └─ L3: Origin Server
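The cache key is composed of the URL plus normalized request headers (for example Accept-Encoding and Accept). The origin's Vary response header names the request headers that distinguish representations, so each variant is cached separately. Normalizing header values before they enter the key keeps the number of variants small.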
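A minimal sketch of key construction, under stated assumptions: the `cache_key` and `normalize_accept_encoding` helpers are hypothetical, query parameters are sorted (only safe if the origin ignores parameter order), and Accept-Encoding is collapsed to one of `br`, `gzip`, or `identity` while ignoring q-values.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit


def normalize_accept_encoding(value):
    """Collapse Accept-Encoding to one encoding (simplification: q-values ignored)."""
    tokens = {t.split(";")[0].strip().lower() for t in value.split(",")}
    for encoding in ("br", "gzip"):
        if encoding in tokens:
            return encoding
    return "identity"


def cache_key(url, headers, vary=("accept-encoding",)):
    """Build a key from the normalized URL plus the headers named in Vary."""
    parts = urlsplit(url)
    # Assumption: parameter order is not significant to the origin.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    base = f"{parts.scheme.lower()}://{parts.netloc.lower()}{parts.path or '/'}"
    if query:
        base += "?" + query

    lowered = {k.lower(): v for k, v in headers.items()}
    variant = []
    for name in sorted(h.lower() for h in vary):
        value = lowered.get(name, "")
        if name == "accept-encoding":
            value = normalize_accept_encoding(value)
        else:
            value = value.strip().lower()
        variant.append(f"{name}={value}")

    raw = base + "|" + "&".join(variant)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()
```

Two requests that differ only in host case, parameter order, or the exact list of encodings a browser advertises then share one cache entry, while a gzip-only client gets its own variant.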
Cache-Control Propagation
Edge nodes respect Cache-Control directives from origin:
Cache-Control: public, max-age=86400, s-maxage=3600, stale-while-revalidate=60
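s-maxage overrides max-age for shared caches. stale-while-revalidate allows serving stale content while fetching a fresh copy in the background, avoiding latency spikes on expiry. With the header above, browsers may cache the response for 24 hours, while edge caches treat it as fresh for 1 hour and may serve it stale for up to 60 more seconds while revalidating.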
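The freshness decision at a shared cache can be sketched as follows. The function names are hypothetical and the parser is deliberately simplified: it handles only the directives discussed here and treats a response with no explicit lifetime as uncacheable, which real caches may not do.

```python
def parse_cache_control(value):
    """Parse a Cache-Control header into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def shared_cache_state(header, age_seconds):
    """Classify a cached response as seen by a shared (edge) cache."""
    d = parse_cache_control(header)
    if "no-store" in d or "private" in d:
        return "uncacheable"
    # s-maxage takes precedence over max-age for shared caches.
    ttl = d.get("s-maxage") or d.get("max-age")
    if ttl is None:
        return "uncacheable"  # simplification: no heuristic freshness
    ttl = int(ttl)
    swr = int(d.get("stale-while-revalidate") or 0)
    if age_seconds < ttl:
        return "fresh"
    if age_seconds < ttl + swr:
        return "stale-while-revalidate"  # serve stale, refresh in background
    return "expired"
```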
Cache Miss Flow
On an L1 miss, the edge requests the object from the regional origin shield. On a shield miss, the shield fetches from origin. The fetched response is cached at both L2 and L1 on the way back. When many requests miss on the same object at once, the shield coalesces them into a single origin request and serves every waiter from that one response.
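Request coalescing is the piece that turns many concurrent misses into one origin fetch. A minimal thread-based sketch, assuming a hypothetical `CoalescingCache` class and a caller-supplied `fetch_origin` function; a production cache would also handle TTLs, errors per waiter, and eviction.

```python
import threading


class CoalescingCache:
    """Cache in which concurrent misses for one key share a single upstream fetch."""

    def __init__(self, fetch_origin):
        self._fetch_origin = fetch_origin  # callable: key -> response body
        self._lock = threading.Lock()
        self._store = {}
        self._inflight = {}  # key -> Event set when the fetch finishes

    def get(self, key):
        with self._lock:
            if key in self._store:
                return self._store[key]
            event = self._inflight.get(key)
            leader = event is None
            if leader:
                event = threading.Event()
                self._inflight[key] = event

        if leader:
            # Only the first requester goes upstream.
            try:
                value = self._fetch_origin(key)
                with self._lock:
                    self._store[key] = value
            finally:
                with self._lock:
                    del self._inflight[key]
                event.set()
            return value

        # Followers wait for the leader, then read the cached result.
        event.wait()
        with self._lock:
            if key in self._store:
                return self._store[key]
        return self.get(key)  # leader failed; retry (one follower becomes leader)
```

With 20 threads requesting the same uncached key, `fetch_origin` runs once and all 20 callers receive the same body.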
Origin Shield
One PoP per region is designated as the origin shield, and all other PoPs in the region treat it as their upstream. Origin then sees at most one fill request per object per region instead of one per PoP, which sharply reduces origin load during cache cold-start or invalidation waves.
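The effect on origin load can be illustrated with a toy simulation. The `Tier`, `Origin`, and `cold_start_fetches` names are hypothetical, caches never expire or evict, and requests are sequential, so the numbers show only the structural saving, not real traffic.

```python
class Origin:
    """Counts how many requests reach the origin."""

    def __init__(self):
        self.fetches = 0

    def get(self, key):
        self.fetches += 1
        return f"body:{key}"


class Tier:
    """A cache tier that fills from its upstream on a miss."""

    def __init__(self, name, upstream):
        self.name = name
        self.upstream = upstream
        self.store = {}

    def get(self, key):
        if key not in self.store:
            self.store[key] = self.upstream.get(key)
        return self.store[key]


def cold_start_fetches(num_edges, num_objects, use_shield):
    """Origin fetches needed for every edge in one region to warm every object."""
    origin = Origin()
    upstream = Tier("shield", origin) if use_shield else origin
    edges = [Tier(f"edge-{i}", upstream) for i in range(num_edges)]
    for edge in edges:
        for n in range(num_objects):
            edge.get(f"/obj/{n}")
    return origin.fetches
```

For 8 edges warming 100 objects, the unshielded layout sends 800 requests to origin and the shielded one sends 100.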
Cache Invalidation
An invalidation API accepts URL patterns (exact, prefix, or regex). The control plane publishes invalidation tokens to all PoPs via pub/sub (e.g., Kafka or a proprietary push channel). Each PoP marks matching cache entries as stale on receipt.
POST /invalidate
{"patterns": ["/assets/app.js", "/images/*"]}
Propagation is eventually consistent: an invalidation typically reaches all PoPs within 30-60 seconds, and until then some PoPs may still serve the old object.
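What a PoP does on receipt of an invalidation message can be sketched as below. The pattern syntax is an assumption made for illustration: a trailing `*` means prefix match, a hypothetical `re:` prefix marks a regular expression, and anything else is an exact path. The `Entry` type and function names are likewise hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Entry:
    body: bytes
    stale: bool = False


def compile_pattern(pattern):
    """Turn one invalidation pattern into a predicate over request paths."""
    if pattern.startswith("re:"):  # hypothetical marker for regex patterns
        rx = re.compile(pattern[3:])
        return lambda path: rx.fullmatch(path) is not None
    if pattern.endswith("*"):  # prefix match
        prefix = pattern[:-1]
        return lambda path: path.startswith(prefix)
    return lambda path: path == pattern  # exact match


def apply_invalidation(cache, patterns):
    """Mark matching entries stale and return the affected paths."""
    matchers = [compile_pattern(p) for p in patterns]
    matched = [path for path in cache if any(m(path) for m in matchers)]
    for path in matched:
        cache[path].stale = True
    return matched
```

Marking entries stale rather than deleting them lets the PoP keep serving them under stale-while-revalidate while a fresh copy is fetched.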
Cache Hit Ratio Metrics
Each PoP reports per-URL-class hit/miss counts to a central metrics store. Cache hit ratio is monitored per PoP and per content type. Low hit ratios trigger alerts and investigation of cache key configuration or TTL settings.
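Hit ratio here means hits / (hits + misses). A minimal aggregation sketch, assuming hypothetical sample tuples of `(pop, content_type, hits, misses)` and an illustrative alert threshold of 0.85 that is not taken from any real system:

```python
from collections import defaultdict


def hit_ratios(samples):
    """Aggregate (pop, content_type, hits, misses) samples into hit ratios."""
    totals = defaultdict(lambda: [0, 0])
    for pop, content_type, hits, misses in samples:
        bucket = totals[(pop, content_type)]
        bucket[0] += hits
        bucket[1] += misses
    return {
        key: hits / (hits + misses)
        for key, (hits, misses) in totals.items()
        if hits + misses > 0  # skip buckets with no traffic
    }


def low_hit_ratio_alerts(samples, threshold=0.85):
    """Return the (pop, content_type) pairs whose hit ratio is below threshold."""
    return sorted(key for key, ratio in hit_ratios(samples).items() if ratio < threshold)
```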
Health Checks and Failover
Each PoP probes the origin (and shield) every 30 seconds via HTTP health check. On failure, the PoP fails over to a secondary origin or an alternate shield. DNS failover is coordinated via the GeoDNS control plane to route around unhealthy PoPs.
GET /health HTTP/1.1
Host: origin.example.com
200 OK → origin healthy
5xx/timeout → mark unhealthy, failover to secondary
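The probe-and-failover logic can be sketched with the standard library alone. The helper names and the secondary hostname in the usage note are hypothetical, and this probe is stricter than the rule above: any response other than 200, not only 5xx or a timeout, counts as unhealthy.

```python
import urllib.request


def http_probe(base_url, timeout=2.0):
    """Return True if GET <base_url>/health answers 200 within the timeout."""
    try:
        url = base_url.rstrip("/") + "/health"
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers URLError, HTTPError (4xx/5xx), connection errors, and timeouts.
        return False


def pick_upstream(upstreams, probe=http_probe):
    """Return the first healthy upstream in priority order, or None."""
    for url in upstreams:
        if probe(url):
            return url
    return None
```

A PoP would call `pick_upstream(["https://origin.example.com", "https://origin-b.example.com"])` on its 30-second probe interval and route misses to whichever upstream is returned. The `probe` parameter exists so the selection logic can be exercised without network access.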