What a Response Cache Does
A response cache sits in front of upstream services and stores HTTP responses so that repeated equivalent requests can be served without hitting the origin. The hard parts are defining “equivalent request” correctly, handling content negotiation, invalidating stale entries on data change, and deciding what not to cache.
Cache Key Construction
The cache key must be stable across equivalent requests and differentiated for non-equivalent ones. Steps:
- Normalize the URL: sort query parameters alphabetically and strip tracking parameters (`utm_source`, `fbclid`, `gclid`) that do not affect the response.
- Include Vary header values: if the upstream response includes `Vary: Accept-Encoding, Accept-Language`, append the normalized values of those request headers to the key.
- Hash the result: scheme + host + path + sorted-params + vary-values → SHA-256 → hex key.
Example: `GET /search?z=1&a=2&utm_source=email` with `Accept-Encoding: gzip` normalizes to a key over `/search?a=2&z=1` + `gzip`.
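The steps above can be condensed into a small key builder. This is a minimal sketch: the function name, the `|` join separator, and the exact tracking-parameter list are illustrative assumptions, not a fixed spec.

```python
import hashlib
from urllib.parse import urlsplit, parse_qsl, urlencode

# Illustrative tracking parameters to strip; tune per deployment.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid", "gclid"}

def cache_key(method: str, url: str, vary_values: list) -> str:
    """Build a stable SHA-256 hex key from a normalized URL plus Vary values."""
    parts = urlsplit(url)
    # Drop tracking parameters and sort the remainder alphabetically.
    params = sorted(
        (k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS
    )
    normalized = (
        f"{parts.scheme.lower()}://{parts.netloc.lower()}"
        f"{parts.path}?{urlencode(params)}"
    )
    material = "|".join([method.upper(), normalized, *vary_values])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()
```

With this, the example request and its normalized twin hash to the same key, while a different `Accept-Encoding` value yields a different one.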
Vary Header Semantics
The Vary response header tells the cache which request headers affect the response content:
- `Vary: Accept-Encoding` — store separate entries for gzip, br, and identity
- `Vary: Accept-Language` — separate entries per language
- `Vary: Cookie` — effectively disables shared caching; every cookie variation gets its own entry, which explodes cache space and should be avoided
The cache must parse the Vary value from the first upstream response and use it to build the key for all subsequent requests to that resource.
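The Vary lookup step can be sketched as follows, assuming request headers arrive as a plain dict. The function name and the empty-string convention for absent headers are assumptions of this sketch.

```python
def vary_values(vary_header: str, request_headers: dict) -> list:
    """Return normalized request-header values for each name listed in Vary."""
    req = {k.lower(): v for k, v in request_headers.items()}
    names = sorted(h.strip().lower() for h in vary_header.split(",") if h.strip())
    # A missing header normalizes to "", so its absence is itself a variant.
    return [req.get(name, "").strip().lower() for name in names]
```

The returned list is what gets appended to the cache key material described above.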
What Is Cacheable
- Cacheable: GET and HEAD with status 200, 301, 404, 410
- Not cacheable by default: POST, PUT, DELETE (mutating); responses with `Cache-Control: no-store`; responses with `Set-Cookie` headers targeting shared caches
- Authenticated responses: cacheable only if upstream explicitly sends `Cache-Control: public`; otherwise treat as private
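These rules condense into one predicate. This is a hedged sketch of the defaults above; a production shared cache also honors `private`, the `Authorization` request header, and more directives than shown.

```python
CACHEABLE_METHODS = {"GET", "HEAD"}
CACHEABLE_STATUSES = {200, 301, 404, 410}

def is_cacheable(method: str, status: int, response_headers: dict) -> bool:
    """Default cacheability decision for a shared cache."""
    headers = {k.lower(): v for k, v in response_headers.items()}
    cc = headers.get("cache-control", "").lower()
    if method.upper() not in CACHEABLE_METHODS or status not in CACHEABLE_STATUSES:
        return False
    if "no-store" in cc:
        return False
    # Set-Cookie responses stay out of the shared cache unless explicitly public.
    if "set-cookie" in headers and "public" not in cc:
        return False
    return True
```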
TTL Sources
The cache determines TTL in this priority order:
1. `s-maxage` in `Cache-Control` — authoritative for shared caches
2. `max-age` in `Cache-Control`
3. `Expires` header (legacy)
4. Configured default TTL per route (e.g., 60 seconds for API, 3600 for static assets)
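A sketch of that priority chain, assuming headers arrive as a dict; the function name and the 60-second fallback are illustrative, and real `Cache-Control` parsing handles more directive forms than this regex.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def resolve_ttl(response_headers: dict, default_ttl: int = 60) -> int:
    """Resolve TTL in priority order: s-maxage, max-age, Expires, route default."""
    headers = {k.lower(): v for k, v in response_headers.items()}
    cc = headers.get("cache-control", "").lower()
    # s-maxage is checked first: it is authoritative for shared caches.
    for directive in ("s-maxage", "max-age"):
        m = re.search(rf"\b{directive}=(\d+)", cc)
        if m:
            return int(m.group(1))
    expires = headers.get("expires")
    if expires:
        try:
            delta = parsedate_to_datetime(expires) - datetime.now(timezone.utc)
            return max(0, int(delta.total_seconds()))
        except (TypeError, ValueError):
            pass  # malformed legacy date: fall through to the route default
    return default_ttl
```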
Stale-While-Revalidate
When a cached entry expires, the cache can serve the stale response immediately and trigger an async background revalidation request to the upstream. The stale response goes to the current requester with zero added latency; when the upstream response arrives, it updates the cache entry, and the next requester gets a fresh one. Provided concurrent revalidations for the same key are deduplicated, this prevents a thundering herd on popular expiring entries.
The `stale-while-revalidate=N` directive defines the window in seconds during which stale serving is permitted.
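The three-way decision (fresh, stale-but-servable, too stale) can be sketched like this; the `Entry` shape, function names, and the injected background scheduler are assumptions of the sketch, not a prescribed design.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Entry:
    body: bytes
    stored_at: float   # epoch seconds when the entry was stored
    ttl: int           # freshness lifetime in seconds
    swr: int           # stale-while-revalidate window in seconds

def serve(entry: Entry, fetch: Callable[[], bytes],
          schedule_revalidation: Callable[[Callable[[], bytes]], None],
          now: float) -> bytes:
    """Serve fresh, serve stale while revalidating, or fetch synchronously."""
    age = now - entry.stored_at
    if age <= entry.ttl:
        return entry.body                    # fresh hit
    if age <= entry.ttl + entry.swr:
        schedule_revalidation(fetch)         # async refresh via assumed scheduler
        return entry.body                    # stale, served immediately
    return fetch()                           # past the window: synchronous fetch
```

Injecting `now` and the scheduler keeps the decision logic deterministic and easy to test.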
Conditional Requests and 304 Handling
The cache stores ETag and Last-Modified from upstream responses. On revalidation, it sends:
- `If-None-Match: "etag-value"`
- `If-Modified-Since: Mon, 01 Jan 2024 00:00:00 GMT`
If the upstream returns 304 Not Modified, the cache serves the stored body without any data transfer from upstream. The entry TTL is reset. This makes revalidation nearly free for unchanged content.
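Two small helpers sketch this flow; the function names and the dict-shaped entry are illustrative assumptions.

```python
from typing import Optional

def revalidation_headers(etag: Optional[str], last_modified: Optional[str]) -> dict:
    """Build conditional request headers from the stored validators."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers

def apply_revalidation(entry: dict, status: int,
                       new_body: Optional[bytes], now: float) -> dict:
    """On 304, keep the stored body and reset its clock; on 200, replace it."""
    if status == 304:
        return {**entry, "stored_at": now}
    return {**entry, "body": new_body, "stored_at": now}
```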
Tag-Based Invalidation
Upstreams tag responses with entity identifiers using the Surrogate-Key or Cache-Tag header:
Cache-Tag: product-42 category-7 homepage
The cache maintains a tag → set of cache keys index. When the product-42 record is updated, a purge call deletes all cache entries tagged product-42 instantly. This is more precise than TTL-based expiry and enables real-time invalidation without short TTLs.
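The tag → keys index can be sketched as follows; the class name and in-memory dict store are assumptions (a real cache would back this with its storage engine), and stale key references left behind by earlier purges are simply skipped.

```python
from collections import defaultdict

class TagIndex:
    """Reverse index from cache tag to the set of cache keys carrying it."""

    def __init__(self):
        self.by_tag = defaultdict(set)   # tag -> {cache keys}
        self.store = {}                  # cache key -> stored response body

    def put(self, key: str, body: bytes, cache_tag_header: str) -> None:
        self.store[key] = body
        # Cache-Tag lists space-separated tags, e.g. "product-42 homepage".
        for tag in cache_tag_header.split():
            self.by_tag[tag].add(key)

    def purge(self, tag: str) -> int:
        """Delete every live entry carrying the tag; return how many were removed."""
        removed = 0
        for key in self.by_tag.pop(tag, set()):
            if self.store.pop(key, None) is not None:
                removed += 1
        return removed
```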
Bypass Rules
- `Cache-Control: no-store` — never store, always fetch from upstream
- Authenticated requests with session cookies — bypass unless upstream sends explicit `public`
- Streaming responses (`Transfer-Encoding: chunked` with no `Content-Length`) — bypass; cannot buffer for caching
- Request has `Cache-Control: no-cache` — revalidate even if a fresh entry exists
- Admin or internal routes — explicitly excluded by config pattern
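The request-side rules can be sketched as a three-way classifier. The cookie-auth and streaming rules need the upstream response and are omitted here; the route patterns and function name are illustrative assumptions.

```python
import re

# Hypothetical route-exclusion config; patterns are illustrative.
EXCLUDED_ROUTES = [re.compile(r"^/admin/"), re.compile(r"^/internal/")]

def lookup_decision(path: str, request_headers: dict) -> str:
    """Classify an incoming request as 'bypass', 'revalidate', or 'lookup'."""
    req = {k.lower(): v.lower() for k, v in request_headers.items()}
    cc = req.get("cache-control", "")
    if any(p.match(path) for p in EXCLUDED_ROUTES):
        return "bypass"        # excluded admin/internal route
    if "no-store" in cc:
        return "bypass"        # client forbids any cache involvement
    if "no-cache" in cc:
        return "revalidate"    # must revalidate even if a fresh entry exists
    return "lookup"
```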
Cache Hit Rate Metrics
Emit labeled counters for every cache decision:
- `cache_requests_total{result="hit"}`
- `cache_requests_total{result="miss"}`
- `cache_requests_total{result="stale"}`
- `cache_requests_total{result="bypass"}`
- `cache_revalidations_total{result="304|200"}`
Hit rate = hits / (hits + misses). Track per route to identify endpoints worth caching vs those that are uncacheable by design.
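A minimal in-process version of these counters, assuming a metrics library is swapped in for production; the class name is an illustration.

```python
from collections import Counter

class CacheMetrics:
    """Labeled counters mirroring cache_requests_total{result=...}."""

    def __init__(self):
        self.requests = Counter()

    def record(self, result: str) -> None:
        # result is one of: hit, miss, stale, bypass
        self.requests[result] += 1

    def hit_rate(self) -> float:
        """hits / (hits + misses); stale and bypass do not count toward either."""
        hits, misses = self.requests["hit"], self.requests["miss"]
        return hits / (hits + misses) if (hits + misses) else 0.0
```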