Response Cache Low-Level Design: Cache Key Design, Vary Header Handling, and Invalidation Strategies

What a Response Cache Does

A response cache sits in front of upstream services and stores HTTP responses so that repeated equivalent requests can be served without hitting the origin. The hard parts are defining “equivalent request” correctly, handling content negotiation, invalidating stale entries on data change, and deciding what not to cache.

Cache Key Construction

The cache key must be stable across equivalent requests and differentiated for non-equivalent ones. Steps:

  • Normalize the URL: Sort query parameters alphabetically. Strip tracking parameters (utm_source, fbclid, gclid) that do not affect the response.
  • Include Vary header values: If the upstream response includes Vary: Accept-Encoding, Accept-Language, append the normalized values of those request headers to the key.
  • Scheme + host + path + sorted-params + vary-values → SHA-256 → hex key

Example: GET /search?z=1&a=2&utm_source=email with Accept-Encoding: gzip normalizes to key over /search?a=2&z=1 + gzip.
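The normalization and hashing steps above can be sketched as follows. This is a minimal illustration, not a production implementation; the tracking-parameter list is a hypothetical starting set you would extend to match your analytics stack.

```python
import hashlib
from urllib.parse import urlsplit, parse_qsl, urlencode

# Hypothetical set of tracking parameters to strip; extend as needed.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid", "gclid"}

def cache_key(url, vary_values=()):
    """Normalize the URL, append Vary-selected header values, hash to a hex key."""
    parts = urlsplit(url)
    # Drop tracking parameters, then sort the remaining query params.
    params = sorted(
        (k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS
    )
    normalized = f"{parts.scheme.lower()}://{parts.netloc.lower()}{parts.path}"
    if params:
        normalized += "?" + urlencode(params)
    material = normalized + "|" + "|".join(vary_values)
    return hashlib.sha256(material.encode()).hexdigest()
```

With this sketch, `/search?z=1&a=2&utm_source=email` and `/search?a=2&z=1` hash to the same key for the same Accept-Encoding value, and to different keys for gzip vs br.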

Vary Header Semantics

The Vary response header tells the cache which request headers affect the response content:

  • Vary: Accept-Encoding — store separate entries for gzip, br, identity
  • Vary: Accept-Language — separate entries per language
  • Vary: Cookie — effectively disables shared caching; every cookie variation gets its own entry, which explodes cache space and should be avoided

Because Vary is only known after a response has been fetched, the cache must record the Vary header list from the first upstream response for each resource and use those headers to build variant keys for all subsequent requests to that resource.
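Extracting the variant-selecting values from a Vary header might look like the sketch below. It assumes request header keys have already been lowercased; `Vary: *` is treated as uncacheable per the HTTP caching semantics.

```python
def vary_key_values(vary, request_headers):
    """Values of the request headers named by Vary, normalized for the cache key.

    request_headers is assumed to use lowercase keys. Returns None for
    'Vary: *', which makes the response effectively uncacheable."""
    names = sorted(h.strip().lower() for h in vary.split(",") if h.strip())
    if "*" in names:
        return None
    # Sorting the header names makes the key stable regardless of the
    # order the upstream listed them in.
    return [request_headers.get(name, "").strip().lower() for name in names]
```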

What Is Cacheable

  • Cacheable: GET and HEAD with status 200, 301, 404, 410
  • Not cacheable by default: POST, PUT, DELETE (mutating); responses with Cache-Control: no-store; responses carrying Set-Cookie headers (a shared cache storing them would serve one user's cookie to other users)
  • Authenticated responses: cacheable only if upstream explicitly sends Cache-Control: public; otherwise treat as private
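The rules above can be collapsed into a single predicate. This sketch uses naive substring checks on Cache-Control for brevity; a production implementation should tokenize the directive list.

```python
CACHEABLE_METHODS = {"GET", "HEAD"}
CACHEABLE_STATUSES = {200, 301, 404, 410}

def is_cacheable(method, status, response_headers, authenticated=False):
    """Response-side cacheability check; header keys are assumed lowercase."""
    if method not in CACHEABLE_METHODS or status not in CACHEABLE_STATUSES:
        return False
    cc = response_headers.get("cache-control", "").lower()
    if "no-store" in cc:
        return False
    if "set-cookie" in response_headers:
        return False
    # Authenticated responses need an explicit opt-in from the upstream.
    if authenticated and "public" not in cc:
        return False
    return True
```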

TTL Sources

The cache determines TTL in this priority order:

  1. s-maxage in Cache-Control — authoritative for shared caches
  2. max-age in Cache-Control
  3. Expires header (legacy)
  4. Configured default TTL per route (e.g., 60 seconds for API, 3600 for static assets)
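The priority order can be expressed as a small resolver. A sketch, assuming lowercase header keys and a per-route `default_ttl` supplied from config; the regex-based directive parsing is a simplification.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def resolve_ttl(response_headers, default_ttl=60):
    """TTL in priority order: s-maxage, then max-age, then Expires, then default."""
    cc = response_headers.get("cache-control", "")
    for directive in ("s-maxage", "max-age"):
        m = re.search(rf"{directive}=(\d+)", cc)
        if m:
            return int(m.group(1))
    if "expires" in response_headers:
        # Legacy absolute expiry; convert to a relative TTL, floored at zero.
        expires = parsedate_to_datetime(response_headers["expires"])
        delta = (expires - datetime.now(timezone.utc)).total_seconds()
        return max(0, int(delta))
    return default_ttl
```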

Stale-While-Revalidate

When a cached entry expires, the cache can serve the stale response immediately and trigger an async background revalidation request to the upstream. The stale response goes to the current requester with zero added latency; when the upstream response arrives it updates the cache entry, so the next requester gets a fresh response. This avoids a thundering herd on popular entries at the moment they expire.

The stale-while-revalidate=N directive defines the window in seconds during which stale serving is permitted.

Conditional Requests and 304 Handling

The cache stores ETag and Last-Modified from upstream responses. On revalidation, it sends:

  • If-None-Match: "etag-value"
  • If-Modified-Since: Mon, 01 Jan 2024 00:00:00 GMT

If the upstream returns 304 Not Modified, the cache serves the stored body without any data transfer from upstream. The entry TTL is reset. This makes revalidation nearly free for unchanged content.
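A sketch of both halves of the exchange, using a plain dict as the cache entry for illustration:

```python
import time

def revalidation_headers(entry):
    """Build conditional request headers from the stored validators."""
    headers = {}
    if entry.get("etag"):
        headers["If-None-Match"] = entry["etag"]
    if entry.get("last_modified"):
        headers["If-Modified-Since"] = entry["last_modified"]
    return headers

def apply_revalidation(entry, status, body, ttl):
    """304: keep the stored body and just reset freshness. 200: replace it."""
    if status == 304:
        entry["expires_at"] = time.time() + ttl    # no body transferred
    elif status == 200:
        entry["body"] = body
        entry["expires_at"] = time.time() + ttl
```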

Tag-Based Invalidation

Upstreams tag responses with entity identifiers using the Surrogate-Key or Cache-Tag header:

Cache-Tag: product-42 category-7 homepage

The cache maintains a tag → set of cache keys index. When the product-42 record is updated, a purge call deletes all cache entries tagged product-42 instantly. This is more precise than TTL-based expiry and enables real-time invalidation without short TTLs.
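The tag index can be sketched as a store plus a reverse mapping. For simplicity this version leaves stale key references in the other tag sets after a purge; a production index would also clean those up (or tolerate dangling keys on lookup).

```python
from collections import defaultdict

class TaggedCache:
    """Cache store plus a reverse index from tag to the keys carrying it."""

    def __init__(self):
        self.store = {}                 # cache key -> cached response
        self.by_tag = defaultdict(set)  # tag -> set of cache keys

    def put(self, key, response, tags):
        self.store[key] = response
        for tag in tags:
            self.by_tag[tag].add(key)

    def purge_tag(self, tag):
        """Delete every entry tagged with `tag`; returns the purge count."""
        keys = self.by_tag.pop(tag, set())
        for key in keys:
            self.store.pop(key, None)
        return len(keys)
```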

Bypass Rules

  • Cache-Control: no-store — never store, always fetch from upstream
  • Authenticated requests with session cookies — bypass unless upstream sends explicit public
  • Streaming responses (Transfer-Encoding: chunked with no Content-Length) — bypass; cannot buffer for caching
  • Request has Cache-Control: no-cache — revalidate even if fresh entry exists
  • Admin or internal routes — explicitly excluded by config pattern
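The request-side portion of these rules can be folded into one dispatcher. This is a sketch: the route patterns are hypothetical config, header keys are assumed lowercase, and response-side rules (no-store on the response, chunked bodies without Content-Length) are applied later when the upstream reply arrives.

```python
import re

# Hypothetical route exclusions; in practice these come from config.
EXCLUDED_ROUTES = [re.compile(r"^/admin(/|$)"), re.compile(r"^/internal(/|$)")]

def request_disposition(path, request_headers, has_session_cookie):
    """Returns 'bypass', 'revalidate', or 'cache' for an incoming request."""
    cc = request_headers.get("cache-control", "").lower()
    if "no-store" in cc:
        return "bypass"          # never store, always fetch from upstream
    if any(p.match(path) for p in EXCLUDED_ROUTES):
        return "bypass"          # admin/internal routes excluded by config
    if has_session_cookie:
        return "bypass"          # unless the response later says public
    if "no-cache" in cc:
        return "revalidate"      # must revalidate even if the entry is fresh
    return "cache"
```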

Cache Hit Rate Metrics

Emit labeled counters for every cache decision:

  • cache_requests_total{result="hit"}
  • cache_requests_total{result="miss"}
  • cache_requests_total{result="stale"}
  • cache_requests_total{result="bypass"}
  • cache_revalidations_total{result="304|200"}

Hit rate = hits / (hits + misses). Track per route to identify endpoints worth caching vs those that are uncacheable by design.
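A minimal per-route counter sketch (a real deployment would emit these through its metrics library rather than an in-process Counter); note that stale and bypass results are recorded but deliberately excluded from the hit-rate denominator.

```python
from collections import Counter

cache_events = Counter()

def record(route, result):
    """result is one of: hit, miss, stale, bypass."""
    cache_events[(route, result)] += 1

def hit_rate(route):
    hits = cache_events[(route, "hit")]
    misses = cache_events[(route, "miss")]
    total = hits + misses
    return hits / total if total else 0.0
```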

