Cache invalidation — ensuring cached data reflects the current state of the source of truth — is famously difficult. Phil Karlton’s quip (“There are only two hard things in computer science: cache invalidation and naming things”) captures a real engineering challenge. A stale cache causes incorrect behavior; overly aggressive invalidation forfeits the performance benefits of caching. Understanding write-through, write-around, write-back, and invalidation strategies is essential for system design interviews.
Cache Write Strategies
Write-through: every write updates both the cache and the database synchronously before returning success. Reads always hit a warm cache, and the cache is always in sync with the database. Downside: write latency equals database write latency (no write performance gain), and every write hits both cache and database even if the data is never read (write amplification). Use when data is frequently read after write and consistency is critical.

Write-around (write to database, invalidate cache): the write goes directly to the database and the cache entry is deleted (not updated). The next read fetches from the database and repopulates the cache. Advantage: avoids caching data that won’t be read again (cache pollution). Disadvantage: the first read after a write is a cache miss (higher latency). Use for write-heavy workloads with infrequent reads of the same data.

Write-back (write-behind): the write updates the cache only; the database is updated asynchronously in the background. Lowest write latency (cache writes are fast in-memory operations). Risk: if the cache node fails before the async write to the database, data is lost. Use when write throughput is critical and some data loss is acceptable (analytics counters, view counts).
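The three strategies can be contrasted in a minimal sketch, using plain dicts as stand-ins for Redis and the database (the function names are illustrative, not a real API):

```python
cache = {}
database = {}
dirty = set()  # keys written to cache but not yet flushed (write-back)

def write_through(key, value):
    """Update cache and database synchronously; cache stays in sync."""
    database[key] = value
    cache[key] = value

def write_around(key, value):
    """Write the database and invalidate (delete) the cache entry."""
    database[key] = value
    cache.pop(key, None)  # next read repopulates from the database

def write_back(key, value):
    """Update the cache only; mark the key dirty for an async flush."""
    cache[key] = value
    dirty.add(key)

def flush_dirty():
    """Background flush: persist dirty cache entries to the database."""
    for key in list(dirty):
        database[key] = cache[key]
        dirty.discard(key)

write_through("a", 1)   # both stores hold a=1
write_around("a", 2)    # database holds a=2; cache entry deleted
write_back("b", 3)      # cache holds b=3; database doesn't, until...
flush_dirty()           # ...the background flush runs
```

Note the write-back risk in the sketch: if the process died between `write_back` and `flush_dirty`, the value of `b` would be lost.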
Cache Invalidation Patterns
Time-based expiry (TTL): set a TTL on every cached entry. After the TTL expires, the next read fetches fresh data from the source. This is the simplest approach — no invalidation logic needed. Trade-off: data can be stale for up to the TTL duration, and cache misses at expiry time may spike (thundering herd). Choose the TTL based on acceptable staleness: user profiles (5 minutes), product prices (1 minute), stock levels (10 seconds).

Event-based invalidation: when data changes in the database, explicitly delete or update the cache entry. Implementation: the service layer that writes to the database also calls Redis DEL or SET after the database write succeeds. More complex, but consistency is immediate. Risk: if the cache deletion fails (Redis unavailable), the cache stays stale until the TTL expires — use short TTLs as a safety net.

Change Data Capture (CDC) invalidation: a CDC pipeline (e.g., Debezium) reads database changes and publishes them to Kafka. A cache invalidation consumer subscribes and deletes the corresponding cache keys. This decouples invalidation from the write path — the application doesn’t need to know which cache keys to invalidate.
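TTL expiry and event-based invalidation combine naturally: the delete gives immediate consistency, and the TTL bounds staleness if a delete is ever lost. A minimal sketch, with a dict standing in for the database and a `(value, expires_at)` dict standing in for the cache (the 50 ms TTL is just for demonstration):

```python
import time

store = {"user:1": {"name": "Ada"}}   # stand-in for the database
cache = {}                            # key -> (value, expires_at)
TTL = 0.05                            # demo value; pick from your staleness budget

def read(key):
    entry = cache.get(key)
    if entry is not None and entry[1] > time.monotonic():
        return entry[0]                               # cache hit, not expired
    value = store[key]                                # miss or expired: source of truth
    cache[key] = (value, time.monotonic() + TTL)      # repopulate with fresh TTL
    return value

def write(key, value):
    store[key] = value        # database write first
    cache.pop(key, None)      # event-based invalidation after it succeeds

read("user:1")                          # populates the cache
write("user:1", {"name": "Grace"})      # DB updated, cache entry deleted
result = read("user:1")                 # miss -> fresh value from the store
```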
Cache Key Design
Cache key design determines granularity and invalidation complexity.

Entity-level cache: cache entire objects (user:{user_id} → full user JSON). Invalidation: DEL user:{user_id} on any user update. Simple but coarse — a comment count change invalidates the entire user object.

Field-level cache: cache individual fields (user:{user_id}:name, user:{user_id}:email). Invalidation is granular — only the specific field that changed is invalidated — but there are more Redis keys to manage.

Aggregation cache: cache computed results (top_products_by_sales_{category}_{date}). Invalidation is complex — any order change could affect the aggregation. Use short TTLs (5-10 minutes) and accept slight staleness rather than attempting immediate invalidation.

Cache key versioning: prefix all cache keys with a version number (v1:user:123). To invalidate all cached data, increment the global version number (v2:). All v1 keys become unreachable — effectively a cache flush without FLUSHDB. Old keys expire via TTL.

Namespace invalidation: tag cache entries with a namespace (user:*, product:*). To invalidate all user data: SCAN for keys matching user:* and DEL them. Use Lua scripts for atomic scan-and-delete (SCAN + DEL is not atomic in Redis).
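Key versioning is simple enough to sketch in a few lines. A dict stands in for Redis, and the helper names are illustrative; the point is that bumping the version prefix makes every old key unreachable without touching it:

```python
cache = {}
version = 1   # in production this would itself live in Redis or config

def versioned(key):
    """Prepend the current global version to a logical key."""
    return f"v{version}:{key}"

def cache_set(key, value):
    cache[versioned(key)] = value

def cache_get(key):
    return cache.get(versioned(key))

cache_set("user:123", {"name": "Ada"})
hit = cache_get("user:123")       # found under v1:user:123

version += 1                      # global invalidation: bump the version
miss = cache_get("user:123")      # lookup now targets v2:user:123 -> None
```

The old `v1:user:123` entry still occupies memory until its TTL expires, which is the trade-off this pattern accepts in exchange for avoiding FLUSHDB.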
Thundering Herd and Cache Stampede
When a popular cache entry expires, all requests see a miss simultaneously and all go to the database, causing a spike that can overwhelm it. Thundering herd solutions:

(1) Mutex/lock: only the first request to see the miss acquires a lock and fetches from the database. Other requests wait for the lock, then read the freshly populated cache. Implementation: SET lock:{key} 1 NX PX 5000 — only the winner (NX = set if not exists) fetches from the database; waiters sleep and retry reading the cache.

(2) Probabilistic early expiration: before the TTL expires, some requests probabilistically begin refreshing the cache. As expiry approaches, more requests trigger a refresh, spreading the database load rather than producing a simultaneous stampede. One common formulation: refresh early when current_time - delta * beta * log(rand()) >= expiry_time, where delta is the duration of the last recomputation, beta > 0 tunes eagerness, and rand() is uniform in (0, 1]; log(rand()) is negative, so the subtracted term pushes the effective time forward, making an early refresh increasingly likely as expiry nears.

(3) Background refresh: cache entries never actually expire — a background job proactively refreshes entries before they would. The application always reads from cache; the background job keeps it warm. Requires knowing in advance which cache entries to refresh (not always feasible).
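The early-refresh test from (2) fits in one function. This is a sketch of that inequality only, not a full cache client; `should_refresh` is a hypothetical helper name:

```python
import math
import random

def should_refresh(now, expiry_time, delta, beta=1.0):
    """Return True if this request should refresh the entry early.

    now         -- current time (seconds)
    expiry_time -- when the cached entry's TTL runs out
    delta       -- how long the last recomputation took
    beta        -- > 0; larger values refresh more eagerly
    """
    u = 1.0 - random.random()          # uniform in (0, 1], avoids log(0)
    return now - delta * beta * math.log(u) >= expiry_time

# Long before expiry with delta=0 the subtracted term vanishes, so the
# refresh never fires; past expiry the inequality always holds.
early = should_refresh(now=0, expiry_time=100, delta=0)      # always False
late = should_refresh(now=101, expiry_time=100, delta=1)     # always True
```

Because `-log(u)` is exponentially distributed, most requests refresh only very close to expiry, and a costly recomputation (large `delta`) starts earlier — exactly the load-spreading behavior described above.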
Distributed Cache Consistency
With multiple cache nodes (Redis Cluster, a Memcached cluster), maintaining consistency across shards adds complexity.

Look-aside cache with write-invalidate: (1) Read: check the cache first; on a miss, read from the database and populate the cache. (2) Write: write to the database, then DELETE the cache key (not update it). The next read repopulates from the fresh database value. Why delete instead of update: updating the cache after a database write has a race condition — two concurrent writers may store different values, and the last writer wins, potentially overwriting the correct value with a stale one. Deleting is safe — the next reader always fetches from the authoritative database.

Read-replica race condition: reader A reads from a replica (which may be behind) and writes that value to the cache, while writer B updates the primary database; reader A’s cache entry is now stale. Mitigations: on write, invalidate all cache entries for the affected keys; use a short TTL as a safety net for any invalidation failures; and read from the primary for consistency-sensitive reads (e.g., immediately after a write), at the cost of higher database load.
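The delete-vs-update race can be shown concretely. Below, one unlucky interleaving of two writers is played out by hand with dicts (a simulation of the schedule, not concurrent code): with cache-update the stale value wins, with cache-delete it cannot.

```python
database = {}
cache = {}

# Interleaving with cache-UPDATE after the DB write. Writer A and writer B
# both write the DB; B's value is newer, but A's cache update is delayed
# and lands last.
database["k"] = 1      # writer A writes the DB
database["k"] = 2      # writer B writes the DB (newer value)
cache["k"] = 2         # writer B updates the cache
cache["k"] = 1         # writer A's delayed update overwrites it: stale!
update_race = (cache["k"] != database["k"])   # cache disagrees with the DB

# Same interleaving with cache-DELETE. A delayed delete is harmless:
# the key is simply gone, and the next read falls through to the DB.
cache.clear()
database["k"] = 1
database["k"] = 2
cache.pop("k", None)   # writer B deletes
cache.pop("k", None)   # writer A's delayed delete: no-op
next_read = cache.get("k", database["k"])     # look-aside read path
```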