Response Cache Low-Level Design: Cache Key Design, Vary Header Handling, and Invalidation Strategies

What a Response Cache Does

A response cache sits in front of upstream services and stores HTTP responses so that repeated equivalent requests can be served without hitting the origin. The hard parts are defining “equivalent request” correctly, handling content negotiation, invalidating stale entries on data change, and deciding what not to cache.

Cache Key Construction

The cache key must be stable across equivalent requests and differentiated for non-equivalent ones. Steps:

  • Normalize the URL: Sort query parameters alphabetically. Strip tracking parameters (utm_source, fbclid, gclid) that do not affect the response.
  • Include Vary header values: If the upstream response includes Vary: Accept-Encoding, Accept-Language, append the normalized values of those request headers to the key.
  • Hash the result: scheme + host + path + sorted params + Vary values → SHA-256 → hex key

Example: GET /search?z=1&a=2&utm_source=email with Accept-Encoding: gzip produces a key computed over /search?a=2&z=1 plus gzip (scheme and host omitted here for brevity).
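The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not a reference implementation: the function names, the separator used between key parts, and the exact normalization of header values are assumptions, and the tracking-parameter set contains only the three names mentioned above.

```python
import hashlib
from urllib.parse import urlsplit, parse_qsl, urlencode

# Tracking parameters named in the text; a real deployment would likely
# make this list configurable (assumption).
TRACKING_PARAMS = {"utm_source", "fbclid", "gclid"}


def normalize_url(url: str) -> str:
    """Lowercase scheme/host, drop tracking params, sort the remaining ones."""
    parts = urlsplit(url)
    params = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in TRACKING_PARAMS
    ]
    params.sort()
    query = urlencode(params)
    base = f"{parts.scheme.lower()}://{parts.netloc.lower()}{parts.path or '/'}"
    return f"{base}?{query}" if query else base


def cache_key(url: str, request_headers: dict, vary_names=()) -> str:
    """Hash the normalized URL plus the request header values named by Vary."""
    headers = {k.lower(): v for k, v in request_headers.items()}
    vary_parts = []
    for name in sorted(n.lower() for n in vary_names):
        raw = headers.get(name, "")
        # Normalize comma-separated tokens: trim whitespace, lowercase.
        tokens = ",".join(t.strip().lower() for t in raw.split(",") if t.strip())
        vary_parts.append(f"{name}={tokens}")
    material = normalize_url(url) + "\n" + "\n".join(vary_parts)
    return hashlib.sha256(material.encode("utf-8")).hexdigest()
```

With this sketch, /search?z=1&a=2&utm_source=email and /search?a=2&z=1 on the same host produce the same key when both carry Accept-Encoding: gzip, and a different key for Accept-Encoding: br.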

Vary Header Semantics

The Vary response header tells the cache which request headers affect the response content:

  • Vary: Accept-Encoding — store separate entries for gzip, br, identity
  • Vary: Accept-Language — separate entries per language
  • Vary: Cookie — effectively disables shared caching; every cookie variation gets its own entry, which explodes cache space and should be avoided

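The cache learns the Vary value from an upstream response, so it cannot know the full key on the very first request for a resource. A common approach is to record the Vary header names per resource when the first response is stored, use them to build the key for subsequent requests, and update the record if a later response carries a different Vary value.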
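One way to picture this is a two-level store: the base key (the normalized URL) maps to the recorded Vary header names, and the base key plus the matching request header values map to the stored variant. The class and method names below are hypothetical, and dropping old variants when Vary changes is one possible policy among several.

```python
class VaryAwareStore:
    """In-memory sketch: remembers Vary names per resource, stores variants."""

    def __init__(self):
        self._vary = {}      # base key -> tuple of lowercased header names
        self._entries = {}   # (base key, variant tuple) -> stored body

    @staticmethod
    def parse_vary(value: str) -> tuple:
        return tuple(sorted({n.strip().lower() for n in value.split(",") if n.strip()}))

    def _variant(self, base: str, request_headers: dict) -> tuple:
        names = self._vary.get(base, ())
        headers = {k.lower(): v.strip().lower() for k, v in request_headers.items()}
        return tuple((name, headers.get(name, "")) for name in names)

    def get(self, base: str, request_headers: dict):
        if base not in self._vary:
            return None  # nothing stored yet, so Vary is still unknown
        return self._entries.get((base, self._variant(base, request_headers)))

    def put(self, base: str, request_headers: dict, vary_value: str, body) -> bool:
        names = self.parse_vary(vary_value)
        if "*" in names:
            return False  # Vary: * can never be matched by a later request
        if self._vary.get(base) != names:
            # Vary changed (or first store): old variants were keyed differently.
            for key in [k for k in self._entries if k[0] == base]:
                del self._entries[key]
            self._vary[base] = names
        self._entries[(base, self._variant(base, request_headers))] = body
        return True
```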

What Is Cacheable

  • Cacheable: GET and HEAD with status 200, 301, 404, 410
  • Not cacheable by default: POST, PUT, DELETE (mutating); responses with Cache-Control: no-store; responses with Set-Cookie headers, which are usually user-specific and unsafe to store in a shared cache
  • Authenticated responses: cacheable only if upstream explicitly sends Cache-Control: public; otherwise treat as private
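The rules in this list can be expressed as a single predicate. The sketch below assumes headers arrive as plain dicts with lowercase names, treats a request as authenticated when it carries an Authorization or Cookie header, and also rejects Cache-Control: private; the function names are hypothetical.

```python
CACHEABLE_METHODS = {"GET", "HEAD"}
CACHEABLE_STATUSES = {200, 301, 404, 410}


def parse_cache_control(value):
    """Parse 'public, max-age=60' into {'public': '', 'max-age': '60'}."""
    directives = {}
    for part in (value or "").split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.strip().lower()] = arg.strip().strip('"')
    return directives


def is_cacheable(method, status, request_headers, response_headers):
    """Return True if a shared cache may store this response (sketch)."""
    if method.upper() not in CACHEABLE_METHODS:
        return False
    if status not in CACHEABLE_STATUSES:
        return False
    cc = parse_cache_control(response_headers.get("cache-control"))
    if "no-store" in cc or "private" in cc:
        return False
    if "set-cookie" in response_headers:
        return False
    # Assumption: Authorization or Cookie on the request marks it authenticated.
    authenticated = "authorization" in request_headers or "cookie" in request_headers
    if authenticated and "public" not in cc:
        return False
    return True
```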

TTL Sources

The cache determines TTL in this priority order:

  1. s-maxage in Cache-Control — authoritative for shared caches
  2. max-age in Cache-Control
  3. Expires header (legacy)
  4. Configured default TTL per route (e.g., 60 seconds for API, 3600 for static assets)
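The priority order above might be implemented along these lines. This is a sketch under stated assumptions: headers are a dict with lowercase names, the per-route default is passed in by the caller, an unparseable Expires value is treated as already expired, and the function name resolve_ttl is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def resolve_ttl(headers: dict, default_ttl: int, now=None) -> int:
    """Return the TTL in seconds: s-maxage, then max-age, then Expires, then default."""
    now = now or datetime.now(timezone.utc)

    directives = {}
    for part in headers.get("cache-control", "").split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip()

    for name in ("s-maxage", "max-age"):
        if name in directives:
            try:
                return max(0, int(directives[name]))
            except ValueError:
                pass  # malformed value: fall through to the next source

    expires = headers.get("expires")
    if expires:
        try:
            when = parsedate_to_datetime(expires)
        except (TypeError, ValueError):
            return 0  # assumption: an invalid Expires means "already expired"
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)
        return max(0, int((when - now).total_seconds()))

    return default_ttl
```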

Stale-While-Revalidate

When a cached entry expires, the cache can serve the stale response immediately and trigger an asynchronous background revalidation request to the upstream. The current requester gets the stale response with no added latency. When the upstream response arrives, it replaces the cache entry, so later requesters get a fresh response. Provided the cache issues only one revalidation per entry at a time, this prevents a thundering herd on popular expiring entries.

The stale-while-revalidate=N directive defines the window in seconds during which stale serving is permitted.
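The decision can be reduced to comparing the entry's age against the TTL and the stale window, plus a guard so that only one background revalidation runs per key. The sketch below uses hypothetical names (classify, RevalidationGate) and leaves the actual background fetch to the caller.

```python
import threading


def classify(age: float, ttl: float, swr_window: float) -> str:
    """Classify an entry by age in seconds.

    'fresh'  -> serve from cache
    'stale'  -> serve from cache and revalidate in the background
    'expired'-> fetch from upstream before responding
    """
    if age < ttl:
        return "fresh"
    if age < ttl + swr_window:
        return "stale"
    return "expired"


class RevalidationGate:
    """Allows at most one in-flight background revalidation per cache key."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = set()

    def try_acquire(self, key: str) -> bool:
        with self._lock:
            if key in self._in_flight:
                return False  # someone else is already revalidating this key
            self._in_flight.add(key)
            return True

    def release(self, key: str) -> None:
        with self._lock:
            self._in_flight.discard(key)
```

A request that finds a stale entry would serve it, call try_acquire, and start the background fetch only if that call returns True, releasing the key once the entry has been updated.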

Conditional Requests and 304 Handling

The cache stores ETag and Last-Modified from upstream responses. On revalidation, it sends:

  • If-None-Match: "etag-value"
  • If-Modified-Since: Mon, 01 Jan 2024 00:00:00 GMT

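If the upstream returns 304 Not Modified, the cache serves the stored body; the 304 carries headers only, so no body is transferred from upstream. The entry's freshness lifetime is recomputed from the 304 response headers (or reset to the configured TTL). This makes revalidation cheap for unchanged content.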
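A revalidation round trip could look roughly like this. The Entry dataclass and both function names are hypothetical, response headers are assumed to be a dict with lowercase names, and the caller is assumed to have already computed the new TTL; statuses other than 200 and 304 are left to the caller.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    body: bytes
    etag: Optional[str] = None
    last_modified: Optional[str] = None
    expires_at: float = 0.0


def conditional_headers(entry: Entry) -> dict:
    """Build the validators to send upstream for this stored entry."""
    headers = {}
    if entry.etag:
        headers["If-None-Match"] = entry.etag
    if entry.last_modified:
        headers["If-Modified-Since"] = entry.last_modified
    return headers


def apply_revalidation(entry, status, headers, body, now, ttl):
    """Update or replace the entry based on the upstream's answer."""
    if status == 304:
        # Body is unchanged: keep it, extend freshness, pick up new validators.
        entry.expires_at = now + ttl
        entry.etag = headers.get("etag", entry.etag)
        entry.last_modified = headers.get("last-modified", entry.last_modified)
        return entry
    if status == 200:
        return Entry(
            body=body,
            etag=headers.get("etag"),
            last_modified=headers.get("last-modified"),
            expires_at=now + ttl,
        )
    return None  # other statuses: caller decides (out of scope for this sketch)
```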

Tag-Based Invalidation

Upstreams tag responses with entity identifiers using the Surrogate-Key or Cache-Tag header:

Cache-Tag: product-42 category-7 homepage

The cache maintains a tag → set of cache keys index. When the product-42 record is updated, a purge call looks up product-42 in the index and deletes every cache entry carrying that tag. This is more precise than TTL-based expiry and enables near-real-time invalidation without short TTLs.
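A minimal in-memory version of the tag index might look like the following. The class name TagIndex is hypothetical, and tags are assumed to be space-separated as in the example header above. A reverse map from key to tags is kept so that removing an entry also cleans up the index.

```python
from collections import defaultdict


class TagIndex:
    """Cache entries plus a tag -> keys index for purge-by-tag (sketch)."""

    def __init__(self):
        self.entries = {}                      # cache key -> stored response
        self._keys_by_tag = defaultdict(set)   # tag -> set of cache keys
        self._tags_by_key = {}                 # cache key -> set of tags

    def put(self, key, response, cache_tag_header: str) -> None:
        self.remove(key)  # drop stale tag associations if the key is re-stored
        tags = set(cache_tag_header.split())
        self.entries[key] = response
        self._tags_by_key[key] = tags
        for tag in tags:
            self._keys_by_tag[tag].add(key)

    def remove(self, key) -> None:
        self.entries.pop(key, None)
        for tag in self._tags_by_key.pop(key, set()):
            keys = self._keys_by_tag.get(tag)
            if keys is not None:
                keys.discard(key)
                if not keys:
                    del self._keys_by_tag[tag]

    def purge_tag(self, tag: str) -> int:
        """Delete every entry carrying the tag; return how many were removed."""
        keys = list(self._keys_by_tag.get(tag, ()))
        for key in keys:
            self.remove(key)
        return len(keys)
```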

Bypass Rules

  • Cache-Control: no-store — never store, always fetch from upstream
  • Authenticated requests with session cookies — bypass unless upstream sends explicit public
  • Streaming responses (Transfer-Encoding: chunked with no Content-Length) — bypass; cannot buffer for caching
  • Request has Cache-Control: no-cache — revalidate even if fresh entry exists
  • Admin or internal routes — explicitly excluded by config pattern
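The request-side rules in this list (excluded routes, request no-cache) can be checked before any cache lookup; the response-side rules (no-store, missing public on authenticated traffic, streaming bodies) can only be applied once the upstream has answered. The sketch below covers the request side only. The pattern list, the function name, and the three return values are assumptions made for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical config: glob patterns for routes that must never be cached.
EXCLUDED_PATTERNS = ["/admin/*", "/internal/*"]


def request_policy(method: str, path: str, headers: dict,
                   excluded=EXCLUDED_PATTERNS) -> str:
    """Return 'bypass', 'revalidate', or 'lookup' for an incoming request.

    headers is assumed to be a dict with lowercase names.
    """
    if method.upper() not in ("GET", "HEAD"):
        return "bypass"
    if any(fnmatchcase(path, pattern) for pattern in excluded):
        return "bypass"
    directives = {
        d.strip().lower()
        for d in headers.get("cache-control", "").split(",")
        if d.strip()
    }
    if "no-cache" in directives:
        return "revalidate"  # entry may exist, but must be checked with upstream
    return "lookup"
```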

Cache Hit Rate Metrics

Emit labeled counters for every cache decision:

  • cache_requests_total{result="hit"}
  • cache_requests_total{result="miss"}
  • cache_requests_total{result="stale"}
  • cache_requests_total{result="bypass"}
  • cache_revalidations_total{result="304|200"}

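Hit rate = hits / (hits + misses). As written, this formula excludes stale and bypass results, so decide explicitly whether stale serves count as hits and apply that choice consistently. Track the rate per route to identify endpoints worth caching versus those that are uncacheable by design.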
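To make the per-route calculation concrete, here is a small in-process sketch built on collections.Counter. It stands in for whatever metrics library is actually in use; the class name and the route label are assumptions, and the hit rate follows the formula above, ignoring stale and bypass results.

```python
from collections import Counter

RESULTS = {"hit", "miss", "stale", "bypass"}


class CacheMetrics:
    """Counts cache decisions per (route, result) and derives hit rate."""

    def __init__(self):
        self.requests = Counter()  # (route, result) -> count

    def record(self, route: str, result: str) -> None:
        if result not in RESULTS:
            raise ValueError(f"unknown cache result: {result!r}")
        self.requests[(route, result)] += 1

    def hit_rate(self, route: str):
        """hits / (hits + misses), or None if the route has no hits or misses."""
        hits = self.requests[(route, "hit")]
        misses = self.requests[(route, "miss")]
        total = hits + misses
        return hits / total if total else None
```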
