Cache Invalidation System: Low-Level Design
Cache invalidation is famously the hardest problem in computer science. A well-designed invalidation system must handle entity-level purges without requiring the caller to enumerate every affected key, survive cache stampedes under high concurrency, and support write strategies suited to different consistency requirements. This post covers tag-based invalidation, write-through, write-behind, stampede prevention, and dependency tracking.
Cache Key Taxonomy
A consistent key naming convention is the foundation of any manageable cache layer. Two categories of keys cover most patterns:
- Entity keys:
user:123:profile,product:456:detail— a single object identified by its ID. - Query keys:
user:123:orders:page:1,search:laptop:sort:price:page:2— a result set derived from one or more entities.
Query keys are the hard part. When user:123 is updated, every query key that included data from user 123 must be invalidated. Enumerating them individually is error-prone. Tag-based invalidation solves this.
Tag-Based Invalidation
Each cache entry is associated with one or more tags at write time. A tag represents an entity whose change should invalidate all entries carrying that tag. For example, a page showing user 123's orders carries both tag user:123 and tag order:list:user:123.
In Redis, a tag is implemented as a Set whose members are the cache keys carrying that tag. Invalidating a tag means fetching all member keys and deleting them in a pipeline.
import redis
r = redis.Redis()
def set_with_tags(key: str, value: str, tags: list[str], ttl: int) -> None:
"""Store a cache value and register it under each tag."""
pipe = r.pipeline()
pipe.setex(key, ttl, value)
for tag in tags:
tag_set_key = f"tag:{tag}"
pipe.sadd(tag_set_key, key)
# Tag set TTL is extended to at least 2x the entry TTL
pipe.expire(tag_set_key, ttl * 2)
pipe.execute()
def invalidate_by_tag(tag: str) -> int:
"""Delete all cache keys associated with a tag. Returns count deleted."""
tag_set_key = f"tag:{tag}"
members = r.smembers(tag_set_key)
if not members:
return 0
pipe = r.pipeline()
for key in members:
pipe.delete(key)
pipe.delete(tag_set_key)
results = pipe.execute()
# results[-1] is the delete count for the tag set itself
return len(members)
The pipeline batches all DEL commands into a single round-trip. For very large tag sets (thousands of members), use SSCAN to iterate in chunks and delete in batches of 500 to avoid blocking the Redis event loop.
Write-Through Caching
On a write, the cache is updated synchronously before returning to the caller. The DB and cache are always in sync from the caller's perspective. The tradeoff is added latency on every write and the need to handle the case where the cache write succeeds but the DB write fails (or vice versa).
def update_user_profile(user_id: int, data: dict, db) -> None:
"""Write-through: update DB, then update cache."""
db.execute(
"UPDATE users SET name=%s, email=%s WHERE id=%s",
(data["name"], data["email"], user_id)
)
key = f"user:{user_id}:profile"
r.setex(key, 300, json.dumps(data))
# Invalidate any query keys tagged with this user
invalidate_by_tag(f"user:{user_id}")
Write-Behind (Write-Back) Caching
On a write, the cache is updated immediately and the DB write is deferred to an async worker. This reduces write latency to near-zero but risks data loss if the cache is evicted or the worker crashes before the DB write completes. Suitable for metrics, counters, and non-critical user preference data.
import queue
_write_queue: queue.Queue = queue.Queue()
def update_user_last_seen(user_id: int, ts: str) -> None:
"""Write-behind: cache immediately, DB write enqueued."""
key = f"user:{user_id}:last_seen"
r.setex(key, 3600, ts)
_write_queue.put({"user_id": user_id, "last_seen": ts})
def flush_write_queue(db) -> int:
"""Worker: drains the queue and persists to DB."""
count = 0
while not _write_queue.empty():
item = _write_queue.get_nowait()
db.execute(
"UPDATE users SET last_seen=%s WHERE id=%s",
(item["last_seen"], item["user_id"])
)
count += 1
return count
Cache Stampede Prevention: XFetch Algorithm
A stampede occurs when a cached value expires and many concurrent requests all find a miss, all fetch from the DB simultaneously, and all try to repopulate the cache. The XFetch algorithm prevents this by probabilistically recomputing the value slightly before its TTL expires, so only one request triggers a refresh while others still get the cached value.
import time, math, random
def xfetch_recompute(key: str, ttl: int, beta: float, fetch_fn) -> str:
"""
XFetch probabilistic early expiration.
beta: controls aggressiveness of early recomputation (default 1.0).
fetch_fn: callable that computes the fresh value.
"""
data = r.get(key)
remaining_ttl = r.ttl(key)
if data is not None and remaining_ttl > 0:
# delta: time it took to compute the value last time (stored alongside)
delta_key = f"{key}:delta"
delta = float(r.get(delta_key) or 1)
# Probabilistic check: recompute early if the condition holds
if remaining_ttl > delta * beta * math.log(random.random() * -1):
return data.decode()
# Cache miss or early recomputation triggered
start = time.monotonic()
value = fetch_fn()
delta = time.monotonic() - start
pipe = r.pipeline()
pipe.setex(key, ttl, value)
pipe.setex(f"{key}:delta", ttl, str(delta))
pipe.execute()
return value
For simpler stampede prevention, a distributed lock (Redis SET NX PX) ensures only one worker refreshes the key; other workers receive a slightly stale value or wait briefly.
Dependency Tracking and Cascading Invalidation
Some cache entries depend on other cache entries. Invalidating a parent should cascade to all children. A CacheDependency table (or Redis Set) tracks these relationships. When a parent key is invalidated, its children are queued for invalidation recursively.
Database Schema
-- Optional relational backing for tag and dependency metadata
-- (Redis is the primary store; these tables serve audit/debug purposes)
CREATE TABLE cache_tag (
cache_key VARCHAR(512) NOT NULL,
tag VARCHAR(255) NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
PRIMARY KEY (cache_key, tag)
);
CREATE INDEX idx_ct_tag ON cache_tag(tag);
CREATE TABLE cache_dependency (
parent_key VARCHAR(512) NOT NULL,
child_key VARCHAR(512) NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
PRIMARY KEY (parent_key, child_key)
);
CREATE INDEX idx_cd_parent ON cache_dependency(parent_key);
Write Strategy Comparison
| Strategy | Write Latency | Consistency | Data Loss Risk | Best For |
|---|---|---|---|---|
| Write-through | Higher | Strong | None | User profiles, orders |
| Write-behind | Low | Eventual | Low (on crash) | Counters, last-seen |
| Cache-aside | No overhead | Eventual | None | Read-heavy, rare writes |
See also: Anthropic Interview Guide 2026: Process, Questions, and AI Safety
See also: Uber Interview Guide 2026: Dispatch Systems, Geospatial Algorithms, and Marketplace Engineering
See also: Scale AI Interview Guide 2026: Data Infrastructure, RLHF Pipelines, and ML Engineering
See also: Meta Interview Guide 2026: Facebook, Instagram, WhatsApp Engineering