Query Cache Low-Level Design: Result Caching, Cache Invalidation, and Stale Read Handling

Query Cache Low-Level Design

A query cache stores database query results in fast storage (typically Redis) so repeated identical queries can be served without hitting the database. Done correctly it reduces database load significantly for read-heavy workloads. Done incorrectly it serves stale data or makes invalidation impossible to reason about.

Cache Key Generation

The cache key must uniquely identify a query and its parameters. The generation process:

  1. Normalize the query string: lowercase, collapse whitespace, sort parameter names alphabetically
  2. Serialize bound parameter values deterministically
  3. If results are user-scoped or tenant-scoped: include user_id or tenant_id in the key material
  4. SHA-256 hash the combined string → use as the Redis key

Normalization is critical: two queries that differ only in whitespace or parameter ordering should produce the same cache key and share a cache entry.

Cache Storage

Store results in Redis. The value is the serialized query result (list of rows). Use SET key value EX ttl_seconds. Choose serialization format based on access patterns: MessagePack or Protocol Buffers are more compact than JSON and faster to deserialize. For large result sets, compress before storing — gzip typically reduces JSON result size by 60–80%.

TTL Selection

TTL should match data freshness requirements, not be a uniform global setting:

  • Reference/config data (countries, categories, settings): 1 hour
  • Product catalog: 5 minutes
  • Real-time inventory or pricing: 30 seconds
  • Per-second metrics or live counts: no cache

Per-query TTL configuration is stored alongside the query definition or set by the calling code via a cache hint.

Cache Lookup Flow

  1. Compute cache key from query + parameters
  2. Redis GET key
  3. Hit: decompress and deserialize → return result; record cache hit metric
  4. Miss: execute query against database → serialize and compress result → Redis SET key value EX ttl NX (NX prevents overwriting a value another thread just set) → return result; record cache miss metric

Event-Based Invalidation

TTL-based expiry alone causes stale reads for the TTL duration. Event-based invalidation removes cache entries immediately on write:

  • Maintain a mapping from table name to the set of cache keys that include data from that table
  • On any write to table orders: delete all cache keys in the table:orders set
  • Implement with Redis sets: SADD table:orders {key1} on cache write; SMEMBERS table:orders + DEL on table write

Tag-Based Invalidation

For entity-level granularity, tag cache entries with entity identifiers. When order #123 is updated, invalidate only entries tagged order:123, not all order queries:

  • On cache write: SADD tag:order:123 {cache_key}
  • On update of order 123: SMEMBERS tag:order:123DEL each key → DEL tag:order:123

More targeted than table-level invalidation; preserves cache entries for unrelated orders.

Cache Bypass Cases

Never cache queries that contain non-deterministic functions:

  • NOW(), CURRENT_TIMESTAMP, RAND(), RANDOM() — different result on each call
  • Queries that are user-specific but the user_id was not included in the cache key
  • Queries in a write transaction — the transaction may not be committed yet

Negative Caching

An empty result set is still a valid result. Cache [] (empty result) with a short TTL (30 seconds). Without negative caching, repeated requests for a non-existent entity hit the database on every request. A short TTL ensures freshness if the entity is later created.

Thundering Herd Prevention

When a popular cache key expires, many concurrent requests will all miss and query the database simultaneously. Prevent this with a distributed lock:

  1. On cache miss: attempt to acquire a Redis lock for the cache key (SET lock:{key} 1 EX 5 NX)
  2. If lock acquired: execute query, populate cache, release lock
  3. If lock not acquired: wait briefly and retry the cache GET — the first holder will have populated it

Cache Warming and Metrics

  • Cache warming: on deploy, pre-populate cache with results for the most frequently requested queries (identified from query analytics). Avoids cold-start latency where the first minutes after deploy have low hit rates.
  • Metrics: hit rate, miss rate, eviction rate (Redis memory pressure), cache latency by key pattern, invalidation count per table/tag — alert if hit rate drops significantly (may indicate an invalidation bug or query pattern change)

See also: Netflix Interview Guide 2026: Streaming Architecture, Recommendation Systems, and Engineering Excellence

See also: Scale AI Interview Guide 2026: Data Infrastructure, RLHF Pipelines, and ML Engineering

See also: Anthropic Interview Guide 2026: Process, Questions, and AI Safety

Scroll to Top