Query Cache Low-Level Design
A query cache stores database query results in fast storage (typically Redis) so repeated identical queries can be served without hitting the database. Done correctly it reduces database load significantly for read-heavy workloads. Done incorrectly it serves stale data or makes invalidation impossible to reason about.
Cache Key Generation
The cache key must uniquely identify a query and its parameters. The generation process:
- Normalize the query string: lowercase, collapse whitespace, sort parameter names alphabetically
- Serialize bound parameter values deterministically
- If results are user-scoped or tenant-scoped: include `user_id` or `tenant_id` in the key material
- SHA-256 hash the combined string → use as the Redis key
Normalization is critical: two queries that differ only in whitespace or parameter ordering should produce the same cache key and share a cache entry.
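The steps above can be sketched in Python. The function name, `qc:` prefix, and `tenant_id` handling are illustrative assumptions; `sort_keys=True` provides the deterministic parameter serialization:

```python
import hashlib
import json
import re

def make_cache_key(sql, params, tenant_id=None):
    """Build a deterministic cache key for a query plus its bound parameters."""
    # Normalize: lowercase, collapse runs of whitespace to a single space
    normalized = re.sub(r"\s+", " ", sql.strip()).lower()
    # Serialize parameters deterministically: sorted keys, compact separators
    serialized = json.dumps(params, sort_keys=True, separators=(",", ":"))
    material = normalized + "|" + serialized
    # Scope the key when results are per-tenant or per-user
    if tenant_id is not None:
        material += "|tenant:" + str(tenant_id)
    return "qc:" + hashlib.sha256(material.encode("utf-8")).hexdigest()
```

Two formattings of the same query hash to the same key, so they share one cache entry.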
Cache Storage
Store results in Redis. The value is the serialized query result (list of rows). Use `SET key value EX ttl_seconds`. Choose serialization format based on access patterns: MessagePack or Protocol Buffers are more compact than JSON and faster to deserialize. For large result sets, compress before storing — gzip typically reduces JSON result size by 60–80%.
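A sketch of the encode/decode pair, using JSON + gzip because both are in the standard library (MessagePack or Protocol Buffers would slot in the same way with a different serializer; the function names are illustrative):

```python
import gzip
import json

def encode_result(rows):
    # Serialize rows (list of dicts), then compress before storing
    return gzip.compress(json.dumps(rows).encode("utf-8"))

def decode_result(blob):
    # Inverse of encode_result: decompress, then deserialize
    return json.loads(gzip.decompress(blob).decode("utf-8"))
```

The compressed blob is what goes into `SET key value EX ttl_seconds`.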
TTL Selection
TTL should match data freshness requirements, not be a uniform global setting:
- Reference/config data (countries, categories, settings): 1 hour
- Product catalog: 5 minutes
- Real-time inventory or pricing: 30 seconds
- Per-second metrics or live counts: no cache
Per-query TTL configuration is stored alongside the query definition or set by the calling code via a cache hint.
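One way to express that configuration is a lookup table keyed by query class; the class names and the fallback value here are illustrative assumptions:

```python
# Per-class TTLs in seconds; None means "do not cache".
TTL_BY_QUERY_CLASS = {
    "reference": 3600,     # countries, categories, settings
    "catalog": 300,        # product catalog
    "inventory": 30,       # real-time inventory or pricing
    "live_metrics": None,  # per-second metrics or live counts
}

def ttl_for(query_class):
    # Unknown classes fall back to a conservative short TTL
    return TTL_BY_QUERY_CLASS.get(query_class, 30)
```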
Cache Lookup Flow
- Compute cache key from query + parameters
- Redis `GET key`
- Hit: decompress and deserialize → return result; record cache hit metric
- Miss: execute query against database → serialize and compress result → Redis `SET key value EX ttl NX` (`NX` prevents overwriting a value another thread just set) → return result; record cache miss metric
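The lookup flow above, sketched with a dict-backed stand-in for the two Redis commands used (`FakeRedis`, `cached_query`, and the metrics dict are illustrative, not a fixed API; compression is elided for brevity):

```python
import json

class FakeRedis:
    """Dict-backed stand-in for the Redis commands used below;
    redis-py's get/set(ex=, nx=) have the same shape."""
    def __init__(self):
        self.kv = {}
    def get(self, key):
        return self.kv.get(key)
    def set(self, key, value, ex=None, nx=False):
        if nx and key in self.kv:
            return None          # NX: refuse to overwrite an existing value
        self.kv[key] = value
        return True

def cached_query(r, key, run_query, ttl, metrics):
    blob = r.get(key)
    if blob is not None:
        metrics["hit"] += 1       # record cache hit metric
        return json.loads(blob)
    rows = run_query()            # miss: go to the database
    r.set(key, json.dumps(rows), ex=ttl, nx=True)  # NX keeps a racing writer's value
    metrics["miss"] += 1
    return rows
```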
Event-Based Invalidation
TTL-based expiry alone causes stale reads for the TTL duration. Event-based invalidation removes cache entries immediately on write:
- Maintain a mapping from table name to the set of cache keys that include data from that table
- On any write to table `orders`: delete all cache keys in the `table:orders` set
- Implement with Redis sets: `SADD table:orders {key1}` on cache write; `SMEMBERS table:orders` + `DEL` on table write
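A dict/set sketch of the table-to-keys reverse index (class and method names are illustrative; in production the `keys_by_table` sets live in Redis via `SADD`/`SMEMBERS`/`DEL`):

```python
from collections import defaultdict

class TableInvalidator:
    """Reverse index from table name to the cache keys derived from it."""
    def __init__(self, cache):
        self.cache = cache                      # cache key -> cached blob
        self.keys_by_table = defaultdict(set)   # "orders" -> {cache keys}

    def register(self, cache_key, tables):
        for t in tables:                        # SADD table:{t} {cache_key}
            self.keys_by_table[t].add(cache_key)

    def on_table_write(self, table):
        # SMEMBERS table:{table}, DEL each member, then drop the set
        for k in self.keys_by_table.pop(table, set()):
            self.cache.pop(k, None)
```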
Tag-Based Invalidation
For entity-level granularity, tag cache entries with entity identifiers. When order #123 is updated, invalidate only entries tagged order:123, not all order queries:
- On cache write: `SADD tag:order:123 {cache_key}`
- On update of order 123: `SMEMBERS tag:order:123` → `DEL` each key → `DEL tag:order:123`
More targeted than table-level invalidation; preserves cache entries for unrelated orders.
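A minimal sketch of the tag index, with plain dicts and sets standing in for the Redis sets (function names are illustrative):

```python
from collections import defaultdict

def tag_entry(tags, tag, cache_key):
    tags[tag].add(cache_key)          # SADD tag:{tag} {cache_key}

def invalidate_tag(cache, tags, tag):
    # SMEMBERS tag:{tag} → DEL each cache key → DEL the tag set itself
    for k in tags.pop(tag, set()):
        cache.pop(k, None)
```

Invalidating `order:123` removes only the entries tagged with it; queries over other orders stay cached.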
Cache Bypass Cases
Never cache:
- Queries that contain non-deterministic functions: `NOW()`, `CURRENT_TIMESTAMP`, `RAND()`, `RANDOM()` — different result on each call
- Queries that are user-specific where the `user_id` was not included in the cache key
- Queries in a write transaction — the transaction may not be committed yet
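These bypass rules can be expressed as a predicate checked before the cache lookup. The regex is an illustrative approximation, not a full SQL parser, and the parameter names are assumptions:

```python
import re

# Functions whose output changes on every call
NON_DETERMINISTIC = re.compile(
    r"\b(now|random|rand)\s*\(|\bcurrent_timestamp\b", re.IGNORECASE
)

def is_cacheable(sql, in_write_transaction=False,
                 user_scoped=False, key_has_user_id=True):
    if in_write_transaction:          # uncommitted data must not be cached
        return False
    if user_scoped and not key_has_user_id:
        return False                  # would leak one user's rows to another
    return NON_DETERMINISTIC.search(sql) is None
```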
Negative Caching
An empty result set is still a valid result. Cache [] (empty result) with a short TTL (30 seconds). Without negative caching, repeated requests for a non-existent entity hit the database on every request. A short TTL ensures freshness if the entity is later created.
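A tiny helper capturing this rule; the 30-second constant mirrors the suggestion above and the names are illustrative:

```python
NEGATIVE_TTL = 30  # seconds: short, so a later-created entity appears quickly

def ttl_for_result(rows, base_ttl):
    # Empty results are cached too, but only briefly
    return NEGATIVE_TTL if not rows else base_ttl
```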
Thundering Herd Prevention
When a popular cache key expires, many concurrent requests will all miss and query the database simultaneously. Prevent this with a distributed lock:
- On cache miss: attempt to acquire a Redis lock for the cache key (`SET lock:{key} 1 EX 5 NX`)
- If lock acquired: execute query, populate cache, release lock
- If lock not acquired: wait briefly and retry the cache GET — the first holder will have populated it
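The lock flow above, sketched with a dict-backed client exposing redis-py-style `get`/`set(ex=, nx=)`/`delete` (all names here are illustrative; a real deployment would pass a redis-py client instead):

```python
import json
import time

class LockingCache:
    """Dict-backed stand-in mirroring the redis-py calls used below."""
    def __init__(self):
        self.kv = {}
    def get(self, k):
        return self.kv.get(k)
    def set(self, k, v, ex=None, nx=False):
        if nx and k in self.kv:
            return None
        self.kv[k] = v
        return True
    def delete(self, k):
        self.kv.pop(k, None)

def get_with_lock(r, key, run_query, ttl, max_retries=50):
    for _ in range(max_retries):
        blob = r.get(key)
        if blob is not None:
            return json.loads(blob)                   # hit, first try or after a wait
        if r.set("lock:" + key, "1", ex=5, nx=True):  # SET lock:{key} 1 EX 5 NX
            try:
                rows = run_query()                    # only the lock holder queries
                r.set(key, json.dumps(rows), ex=ttl)
                return rows
            finally:
                r.delete("lock:" + key)               # release lock
        time.sleep(0.01)  # another worker is computing; retry the GET shortly
    return run_query()    # safety valve: query directly rather than spin forever
```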
Cache Warming and Metrics
- Cache warming: on deploy, pre-populate cache with results for the most frequently requested queries (identified from query analytics). Avoids cold-start latency where the first minutes after deploy have low hit rates.
- Metrics: hit rate, miss rate, eviction rate (Redis memory pressure), cache latency by key pattern, invalidation count per table/tag — alert if hit rate drops significantly (may indicate an invalidation bug or query pattern change)
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "How do you generate a stable cache key for a database query result?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Normalize the query before hashing to avoid key misses from semantically identical but textually different queries. Normalization steps: parse the SQL into an AST, canonicalize whitespace and keyword casing, sort commutative clauses (e.g., IN list elements, AND predicates with no order dependency), then serialize back to a canonical string. Append the parameter values in order to the canonical string, then SHA-256 hash the result. Prefix the hash with the database name and schema version to namespace keys across environments and invalidate all keys on DDL changes by bumping the schema version. For ORM-generated queries, implement normalization at the ORM layer rather than the cache layer to intercept queries before they reach the wire."
      }
    },
    {
      "@type": "Question",
      "name": "Compare tag-based invalidation and TTL-based expiry for query cache invalidation — when do you use each?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "TTL-based expiry is simple: every cached result expires after N seconds regardless of whether the underlying data changed. Use it when the result is naturally time-bounded (e.g., a leaderboard refreshed every 60 seconds) or when approximate freshness is acceptable and the invalidation cost outweighs consistency benefit. Tag-based invalidation associates each cached result with one or more entity tags (e.g., 'user:42', 'product:99'). When a write touches a tagged entity, invalidate all cache entries sharing that tag. This gives strong consistency but requires tracking which queries depend on which entities — either via a reverse index in Redis (tag → set of cache keys) or by having the application declare dependencies at cache-store time. Use tag-based invalidation for OLTP workloads where data changes frequently and stale reads cause visible correctness bugs. Combine both: short TTL as a safety net, tag invalidation for immediate consistency on writes."
      }
    },
    {
      "@type": "Question",
      "name": "How do you handle stale reads in a query cache during high write throughput without making every read synchronously wait for invalidation?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Use a stale-while-revalidate pattern: serve the cached (potentially stale) result immediately, then asynchronously dispatch a background job to re-execute the query and update the cache. The background job uses a per-key lock (Redis SETNX) to ensure only one revalidation runs at a time per key, preventing a thundering herd of revalidation goroutines. Set two TTLs per entry: a 'fresh' TTL (e.g., 5 seconds) and a 'stale' TTL (e.g., 60 seconds). Within the fresh window, serve from cache directly. Between fresh and stale TTL expiry, serve stale while revalidating in background. After the stale TTL, block and recompute synchronously. For write-heavy workloads, implement write-through caching: on each write, update both the database and the cache in the same transaction (using a two-phase approach or best-effort with short TTL fallback) so reads always find a warm cache entry."
      }
    },
    {
      "@type": "Question",
      "name": "How do you prevent a cache stampede when a popular query's TTL expires simultaneously for many callers?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Use three complementary techniques: (1) Probabilistic early expiration (XFetch) — each cache reader, when checking a key, computes a 'virtual expiry' time that randomly triggers recomputation before the actual TTL expires, with probability inversely proportional to remaining TTL. This spreads recomputation over a window rather than letting all readers hit the cache miss at exactly the same time. (2) Request coalescing — when a cache miss is detected, one designated leader (selected via Redis SETNX on a 'recomputing' key) executes the database query, while other concurrent readers either wait on a channel/condition variable or serve the stale result if available. (3) Jittered TTLs — when populating the cache, add a random jitter (e.g., ±10% of the base TTL) to the expiry so that entries loaded together do not all expire at the same wall-clock second."
      }
    }
  ]
}