What a Response Cache Does
A response cache sits in front of upstream services and stores HTTP responses so that repeated equivalent requests can be served without hitting the origin. The hard parts are defining “equivalent request” correctly, handling content negotiation, invalidating stale entries on data change, and deciding what not to cache.
Cache Key Construction
The cache key must be stable across equivalent requests and differentiated for non-equivalent ones. Steps:
- Normalize the URL: sort query parameters alphabetically and strip tracking parameters (utm_source, fbclid, gclid) that do not affect the response.
- Include Vary header values: if the upstream response includes Vary: Accept-Encoding, Accept-Language, append the normalized values of those request headers to the key.
- Hash the result: scheme + host + path + sorted params + Vary values → SHA-256 → hex key.
Example: GET /search?z=1&a=2&utm_source=email with Accept-Encoding: gzip produces a key computed over /search?a=2&z=1 plus the value gzip (together with the scheme and host).
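A minimal Python sketch of these steps follows. The function name cache_key, the example host, and the newline separator between key parts are assumptions. The tracking-parameter set contains only the three names given above. Sorting by (name, value) also reorders repeated parameters, so it is only safe when the upstream treats their order as insignificant.

```python
import hashlib
from typing import List
from urllib.parse import parse_qsl, urlencode, urlsplit

# Tracking parameters named in the text; extend per deployment.
TRACKING_PARAMS = {"utm_source", "fbclid", "gclid"}


def cache_key(url: str, vary_values: List[str]) -> str:
    """Build a hex SHA-256 cache key from a URL plus normalized Vary'd header values."""
    parts = urlsplit(url)  # urlsplit already lowercases the scheme
    params = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in TRACKING_PARAMS
    ]
    params.sort()  # alphabetical by name; repeated names are ordered by value
    normalized = f"{parts.scheme}://{parts.netloc.lower()}{parts.path}"
    if params:
        normalized += "?" + urlencode(params)
    # A newline separator keeps "a" + "bc" distinct from "ab" + "c".
    material = "\n".join([normalized, *vary_values])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()
```

With this sketch, cache_key("https://example.com/search?z=1&a=2&utm_source=email", ["gzip"]) and cache_key("https://example.com/search?a=2&z=1", ["gzip"]) return the same 64-character hex key. Passing ["br"] in place of ["gzip"] returns a different key.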
Vary Header Semantics
The Vary response header tells the cache which request headers affect the response content:
- Vary: Accept-Encoding → store separate entries for gzip, br, and identity
- Vary: Accept-Language → store separate entries per language
- Vary: Cookie → effectively disables shared caching; every cookie variation gets its own entry, which explodes cache space and should be avoided
Vary arrives on the response, but the key is needed before the lookup. The cache must therefore record the Vary value from the first upstream response for a resource and use it to build the key for all subsequent requests to that resource.
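One way to keep that per-resource record is sketched below. VaryRegistry and its methods are hypothetical names, and the resource string is assumed to be scheme + host + path. The header-value normalization here (trim and lowercase) is deliberately simplistic. A return value of None means the request cannot be answered from cache and must go to the origin.

```python
from typing import Dict, List, Mapping, Optional


class VaryRegistry:
    """Hypothetical per-resource record of the header names listed in Vary."""

    def __init__(self) -> None:
        self._vary: Dict[str, List[str]] = {}

    def record(self, resource: str, vary_header: Optional[str]) -> None:
        # Header names are case-insensitive, so store them lowercased and sorted.
        names = {h.strip().lower() for h in (vary_header or "").split(",") if h.strip()}
        self._vary[resource] = sorted(names)

    def key_values(self, resource: str, request_headers: Mapping[str, str]) -> Optional[List[str]]:
        """Normalized Vary'd request header values, or None if no lookup is possible."""
        names = self._vary.get(resource)
        if names is None:
            return None  # Vary not yet known: forward to the origin and record it
        if "*" in names:
            return None  # Vary: * never matches a stored response
        lowered = {k.lower(): v for k, v in request_headers.items()}
        # Simplistic normalization (trim + lowercase); real caches may also
        # canonicalize list order, e.g. "br, gzip" vs "gzip, br".
        return [f"{name}={lowered.get(name, '').strip().lower()}" for name in names]
```

The list returned by key_values is what the key-construction step appends to the normalized URL.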
What Is Cacheable
- Cacheable: GET and HEAD with status 200, 301, 404, 410
- Not cacheable by default: POST, PUT, DELETE (mutating methods); responses with Cache-Control: no-store; responses with Set-Cookie headers, when the cache is shared
- Authenticated responses: cacheable only if the upstream explicitly sends Cache-Control: public; otherwise treat them as private
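The rules above can be written as a single predicate. The sketch below is illustrative only. parse_cache_control and is_cacheable are hypothetical helpers. Headers are modeled as a plain dict, so repeated Set-Cookie headers collapse into one entry. The parser also ignores quoted arguments that contain commas.

```python
from typing import Dict, Mapping, Optional

CACHEABLE_METHODS = {"GET", "HEAD"}
CACHEABLE_STATUSES = {200, 301, 404, 410}


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into lowercase directive names and optional arguments.

    Simplification: quoted arguments containing commas are not handled.
    """
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        name, sep, arg = part.strip().partition("=")
        if name:
            directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def is_cacheable(method: str, status: int, headers: Mapping[str, str], authenticated: bool) -> bool:
    h = {k.lower(): v for k, v in headers.items()}
    if method.upper() not in CACHEABLE_METHODS or status not in CACHEABLE_STATUSES:
        return False
    cc = parse_cache_control(h.get("cache-control", ""))
    if "no-store" in cc:
        return False
    if "set-cookie" in h:  # storing it would replay one client's cookie to others
        return False
    if authenticated and "public" not in cc:
        return False  # treat as private unless the upstream opts in
    return True
```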
TTL Sources
The cache determines TTL in this priority order:
- s-maxage in Cache-Control (authoritative for shared caches)
- max-age in Cache-Control
- Expires header (legacy)
- Configured default TTL per route (e.g., 60 seconds for API responses, 3600 seconds for static assets)
Stale-While-Revalidate
When a cached entry expires, the cache can serve the stale response immediately and trigger an asynchronous background revalidation request to the upstream. The current requester gets the stale response with no added latency. When the upstream response arrives, it replaces the cache entry, and later requesters get the fresh response. If only one revalidation per key is in flight at a time, the expiry of a popular entry does not cause a thundering herd of simultaneous upstream requests.
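The stale-while-revalidate=N directive in Cache-Control sets the window, in seconds after the entry becomes stale, during which the cache may serve it while revalidating. For example, max-age=60, stale-while-revalidate=30 gives 60 seconds of freshness, then up to 30 seconds of stale serving.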
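Below is a minimal asyncio sketch of stale-while-revalidate that adds single-flight coalescing. Entry, SwrCache, and the fetch callback are hypothetical names, and one TTL and window apply to every key. Coalescing is the assumption that makes the thundering-herd claim hold: concurrent callers share one upstream task per key. Errors raised during a background refresh are not handled.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict


@dataclass
class Entry:
    body: bytes
    stored_at: float  # time.monotonic() when stored
    ttl: float        # freshness lifetime in seconds
    swr: float        # stale-while-revalidate window in seconds

    def state(self, now: float) -> str:
        age = now - self.stored_at
        if age < self.ttl:
            return "fresh"
        if age < self.ttl + self.swr:
            return "stale"
        return "expired"


class SwrCache:
    """Sketch: serve stale entries while a single background fetch refreshes them."""

    def __init__(self, fetch: Callable[[str], Awaitable[bytes]], ttl: float, swr: float) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._swr = swr
        self._entries: Dict[str, Entry] = {}
        self._inflight: Dict[str, "asyncio.Task[bytes]"] = {}

    async def _refresh(self, key: str) -> bytes:
        try:
            body = await self._fetch(key)
            self._entries[key] = Entry(body, time.monotonic(), self._ttl, self._swr)
            return body
        finally:
            self._inflight.pop(key, None)  # allow the next refresh once this one ends

    def _refresh_once(self, key: str) -> "asyncio.Task[bytes]":
        # Coalesce: concurrent callers share one upstream request per key.
        task = self._inflight.get(key)
        if task is None:
            task = asyncio.create_task(self._refresh(key))
            self._inflight[key] = task
        return task

    async def get(self, key: str) -> bytes:
        entry = self._entries.get(key)
        state = entry.state(time.monotonic()) if entry is not None else "expired"
        if entry is not None and state == "fresh":
            return entry.body
        if entry is not None and state == "stale":
            self._refresh_once(key)  # fire and forget; the caller is not delayed
            return entry.body
        return await self._refresh_once(key)  # miss, or past the stale window
```

Two concurrent misses on the same key produce exactly one upstream fetch. A request that arrives inside the stale window returns the old body at once and starts a background fetch. A request that arrives after that fetch completes gets the new body.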
Conditional Requests and 304 Handling
The cache stores ETag and Last-Modified from upstream responses. On revalidation, it sends:
- If-None-Match: "etag-value" (from the stored ETag)
- If-Modified-Since: Mon, 01 Jan 2024 00:00:00 GMT (from the stored Last-Modified)
If the upstream returns 304 Not Modified, the cache serves the stored body. Only headers cross the wire; no body is transferred from upstream. The entry's freshness is reset from the headers on the 304. This makes revalidation cheap for unchanged content: it costs one round trip but no payload.
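The sketch below covers both halves of a revalidation: building the conditional headers and folding the upstream answer back into the entry. StoredResponse, conditional_headers, and apply_revalidation are hypothetical names, and the sketch assumes stored header names are lowercased. On a 304, headers from the upstream overwrite stored ones, with one exception: Content-Length, which RFC 9111 excludes from this update.

```python
from dataclasses import dataclass
from typing import Dict, Mapping


@dataclass
class StoredResponse:
    status: int
    headers: Dict[str, str]  # header names lowercased
    body: bytes
    stored_at: float


def conditional_headers(stored: StoredResponse) -> Dict[str, str]:
    """Validators to attach to the upstream revalidation request."""
    out: Dict[str, str] = {}
    if "etag" in stored.headers:
        out["If-None-Match"] = stored.headers["etag"]
    if "last-modified" in stored.headers:
        out["If-Modified-Since"] = stored.headers["last-modified"]
    return out


def apply_revalidation(stored: StoredResponse, status: int, headers: Mapping[str, str],
                       body: bytes, now: float) -> StoredResponse:
    """Fold the upstream's answer to a conditional request into the cache entry."""
    fresh = {k.lower(): v for k, v in headers.items()}
    if status == 304:
        # Keep the stored body and status; take updated metadata (Cache-Control,
        # ETag, ...) from the 304, except Content-Length, which is not updated.
        fresh.pop("content-length", None)
        return StoredResponse(stored.status, {**stored.headers, **fresh}, stored.body, now)
    # Any other status replaces the entry with the new full response.
    return StoredResponse(status, fresh, body, now)
```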
Tag-Based Invalidation
Upstreams tag responses with entity identifiers using the Surrogate-Key or Cache-Tag header:
Cache-Tag: product-42 category-7 homepage
The cache maintains a tag → set of cache keys index. When the product-42 record is updated, a purge call deletes all cache entries tagged product-42 instantly. This is more precise than TTL-based expiry and enables real-time invalidation without short TTLs.
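Here is a minimal in-memory sketch of the tag → keys index, using a reverse key → tags map so that purges and re-stores leave no dangling references. TagIndex and its methods are hypothetical. Tags are split on spaces or commas, because vendors differ on the separator they use in these headers.

```python
from typing import Dict, Set


class TagIndex:
    """Hypothetical in-memory cache with a tag -> keys index for purging."""

    def __init__(self) -> None:
        self.entries: Dict[str, bytes] = {}
        self._keys_by_tag: Dict[str, Set[str]] = {}
        self._tags_by_key: Dict[str, Set[str]] = {}

    def store(self, key: str, body: bytes, tag_header: str) -> None:
        self._forget(key)  # a re-stored entry may carry a different tag set
        # Accept space- or comma-separated tags, since vendors differ on the separator.
        tags = set(tag_header.replace(",", " ").split())
        self.entries[key] = body
        self._tags_by_key[key] = tags
        for tag in tags:
            self._keys_by_tag.setdefault(tag, set()).add(key)

    def purge(self, tag: str) -> Set[str]:
        """Delete every entry carrying `tag`; return the purged keys."""
        keys = set(self._keys_by_tag.get(tag, ()))  # copy: _forget mutates the index
        for key in keys:
            self._forget(key)
        return keys

    def _forget(self, key: str) -> None:
        self.entries.pop(key, None)
        for tag in self._tags_by_key.pop(key, set()):
            keys = self._keys_by_tag.get(tag)
            if keys is not None:
                keys.discard(key)
                if not keys:
                    del self._keys_by_tag[tag]  # drop empty tags to keep the index small
```

Suppose one entry is stored under the header value product-42 category-7 homepage. A later purge("product-42") removes that entry and also unlinks it from category-7 and homepage.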
Bypass Rules
- Cache-Control: no-store → never store, always fetch from upstream
- Authenticated requests with session cookies → bypass unless the upstream sends an explicit Cache-Control: public
- Streaming responses (Transfer-Encoding: chunked with no Content-Length) → bypass; they cannot be buffered for caching
- Request has Cache-Control: no-cache → revalidate with the upstream even if a fresh entry exists
- Admin or internal routes → explicitly excluded by a config pattern
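The request-side rules and the streaming check might look like the sketch below. The route patterns, the session cookie name, and the entry_is_public flag are all assumptions. The flag stands for "the stored response for this key carried Cache-Control: public". The no-store rule is a property of the response and belongs in the storage decision, so it is not repeated here.

```python
from fnmatch import fnmatchcase
from typing import Mapping, Sequence

# Hypothetical config: route patterns that are never served from cache.
EXCLUDED_ROUTES = ("/admin/*", "/internal/*")
SESSION_COOKIE = "session"  # hypothetical session cookie name


def request_action(path: str, headers: Mapping[str, str], entry_is_public: bool = False,
                   excluded: Sequence[str] = EXCLUDED_ROUTES) -> str:
    """Classify an incoming request as 'bypass', 'revalidate', or 'lookup'."""
    h = {k.lower(): v for k, v in headers.items()}
    if any(fnmatchcase(path, pattern) for pattern in excluded):
        return "bypass"
    cookies = [c.strip() for c in h.get("cookie", "").split(";")]
    has_session = any(c.startswith(SESSION_COOKIE + "=") for c in cookies)
    # entry_is_public: the stored response for this key carried Cache-Control: public.
    if has_session and not entry_is_public:
        return "bypass"
    directives = {d.strip().split("=")[0].lower() for d in h.get("cache-control", "").split(",")}
    if "no-cache" in directives:
        return "revalidate"
    return "lookup"


def is_streaming(response_headers: Mapping[str, str]) -> bool:
    """Chunked responses with no Content-Length are passed through, not buffered."""
    h = {k.lower(): v.lower() for k, v in response_headers.items()}
    return "chunked" in h.get("transfer-encoding", "") and "content-length" not in h
```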
Cache Hit Rate Metrics
Emit labeled counters for every cache decision:
- cache_requests_total{result="hit"}
- cache_requests_total{result="miss"}
- cache_requests_total{result="stale"}
- cache_requests_total{result="bypass"}
- cache_revalidations_total{result="304|200"}
Hit rate = hits / (hits + misses). Track per route to identify endpoints worth caching vs those that are uncacheable by design.
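In production these counters would usually live in a metrics library such as a Prometheus client. The plain-Python stand-in below only shows the per-route labeling and the hit-rate formula. CacheMetrics is a hypothetical name. Following the formula above, stale serves and bypasses are counted but do not enter the hit rate.

```python
from collections import Counter
from typing import Optional


class CacheMetrics:
    """In-process stand-in for the labeled counters listed above."""

    RESULTS = ("hit", "miss", "stale", "bypass")

    def __init__(self) -> None:
        self.requests: Counter = Counter()       # (route, result) -> count
        self.revalidations: Counter = Counter()  # "304" or "200" -> count

    def record(self, route: str, result: str) -> None:
        if result not in self.RESULTS:
            raise ValueError(f"unknown result label: {result}")
        self.requests[(route, result)] += 1

    def record_revalidation(self, status: int) -> None:
        self.revalidations[str(status)] += 1

    def hit_rate(self, route: str) -> Optional[float]:
        """hits / (hits + misses) for one route; None before any hit or miss."""
        hits = self.requests[(route, "hit")]
        misses = self.requests[(route, "miss")]
        total = hits + misses
        return hits / total if total else None
```

For example, a route that records three hits, one miss, one stale serve, and one bypass reports a hit rate of 0.75.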