# Cache Invalidation System: Low-Level Design
Cache invalidation is famously one of the two hard problems in computer science. A well-designed invalidation system must handle entity-level purges without requiring the caller to enumerate every affected key, survive cache stampedes under high concurrency, and support write strategies suited to different consistency requirements. This post covers tag-based invalidation, write-through, write-behind, stampede prevention, and dependency tracking.
## Cache Key Taxonomy
A consistent key naming convention is the foundation of any manageable cache layer. Two categories of keys cover most patterns:
- Entity keys: `user:123:profile`, `product:456:detail` — a single object identified by its ID.
- Query keys: `user:123:orders:page:1`, `search:laptop:sort:price:page:2` — a result set derived from one or more entities.
Query keys are the hard part. When user:123 is updated, every query key that included data from user 123 must be invalidated. Enumerating them individually is error-prone. Tag-based invalidation solves this.
## Tag-Based Invalidation
Each cache entry is associated with one or more tags at write time. A tag represents an entity whose change should invalidate all entries carrying that tag. For example, a page showing user 123's orders carries both tag user:123 and tag order:list:user:123.
In Redis, a tag is implemented as a Set whose members are the cache keys carrying that tag. Invalidating a tag means fetching all member keys and deleting them in a pipeline.
```python
import redis

r = redis.Redis()

def set_with_tags(key: str, value: str, tags: list[str], ttl: int) -> None:
    """Store a cache value and register it under each tag."""
    pipe = r.pipeline()
    pipe.setex(key, ttl, value)
    for tag in tags:
        tag_set_key = f"tag:{tag}"
        pipe.sadd(tag_set_key, key)
        # Extend the tag set's TTL to at least 2x the entry TTL so it
        # outlives every entry it indexes
        pipe.expire(tag_set_key, ttl * 2)
    pipe.execute()
```
```python
def invalidate_by_tag(tag: str) -> int:
    """Delete all cache keys associated with a tag. Returns count deleted."""
    tag_set_key = f"tag:{tag}"
    members = r.smembers(tag_set_key)
    if not members:
        return 0
    pipe = r.pipeline()
    for key in members:
        pipe.delete(key)
    pipe.delete(tag_set_key)  # drop the tag set itself in the same round-trip
    pipe.execute()
    return len(members)
```
The pipeline batches all DEL commands into a single round-trip. For very large tag sets (thousands of members), use SSCAN to iterate in chunks and delete in batches of 500 to avoid blocking the Redis event loop.
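The chunked variant mentioned above can be sketched as follows. This is a minimal illustration assuming a redis-py-style client (`sscan` returning `(next_cursor, members)`, `pipeline`, `delete`); the function name `invalidate_by_tag_chunked` and the client-as-parameter style are our own, chosen so the logic can be exercised against any compatible client.

```python
def invalidate_by_tag_chunked(r, tag: str, batch_size: int = 500) -> int:
    """Invalidate a large tag set incrementally: SSCAN in chunks, DEL in batches."""
    tag_set_key = f"tag:{tag}"
    deleted = 0
    cursor = 0
    while True:
        # SSCAN returns (next_cursor, members); a cursor of 0 means the scan is done
        cursor, members = r.sscan(tag_set_key, cursor=cursor, count=batch_size)
        if members:
            pipe = r.pipeline()
            for key in members:
                pipe.delete(key)
            pipe.execute()
            deleted += len(members)
        if cursor == 0:
            break
    r.delete(tag_set_key)  # finally drop the tag set itself
    return deleted
```

Note that SCAN-family commands guarantee each element is returned at least once, not exactly once, so `deleted` is an upper bound when the set is mutated mid-scan; deletes themselves are idempotent.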
## Write-Through Caching
On a write, the cache is updated synchronously before returning to the caller. The DB and cache are always in sync from the caller's perspective. The tradeoff is added latency on every write and the need to handle the case where the cache write succeeds but the DB write fails (or vice versa).
```python
import json

def update_user_profile(user_id: int, data: dict, db) -> None:
    """Write-through: update DB first, then update the cache."""
    db.execute(
        "UPDATE users SET name=%s, email=%s WHERE id=%s",
        (data["name"], data["email"], user_id),
    )
    key = f"user:{user_id}:profile"
    r.setex(key, 300, json.dumps(data))
    # Invalidate any query keys tagged with this user
    invalidate_by_tag(f"user:{user_id}")
```
## Write-Behind (Write-Back) Caching
On a write, the cache is updated immediately and the DB write is deferred to an async worker. This reduces write latency to near-zero but risks data loss if the cache is evicted or the worker crashes before the DB write completes. Suitable for metrics, counters, and non-critical user preference data.
```python
import queue

_write_queue: queue.Queue = queue.Queue()

def update_user_last_seen(user_id: int, ts: str) -> None:
    """Write-behind: cache immediately, DB write enqueued."""
    key = f"user:{user_id}:last_seen"
    r.setex(key, 3600, ts)
    _write_queue.put({"user_id": user_id, "last_seen": ts})

def flush_write_queue(db) -> int:
    """Worker: drains the queue and persists to DB."""
    count = 0
    while True:
        try:
            item = _write_queue.get_nowait()
        except queue.Empty:
            # checking empty() before get_nowait() races with producers,
            # so catch Empty instead
            break
        db.execute(
            "UPDATE users SET last_seen=%s WHERE id=%s",
            (item["last_seen"], item["user_id"]),
        )
        count += 1
    return count
```
## Cache Stampede Prevention: XFetch Algorithm
A stampede occurs when a cached value expires and many concurrent requests all find a miss, all fetch from the DB simultaneously, and all try to repopulate the cache. The XFetch algorithm prevents this by probabilistically recomputing the value slightly before its TTL expires, so only one request triggers a refresh while others still get the cached value.
```python
import time, math, random

def xfetch_recompute(key: str, ttl: int, beta: float, fetch_fn) -> str:
    """
    XFetch probabilistic early expiration.
    beta: controls aggressiveness of early recomputation (1.0 is a sane default).
    fetch_fn: callable that computes the fresh value.
    """
    data = r.get(key)
    remaining_ttl = r.ttl(key)
    if data is not None and remaining_ttl > 0:
        # delta: time it took to compute the value last time (stored alongside)
        delta_key = f"{key}:delta"
        delta = float(r.get(delta_key) or 1)
        # Serve the cached value unless the probabilistic check triggers an early
        # recompute. -log(rand) is positive and occasionally large, so recomputation
        # becomes more likely as remaining_ttl shrinks toward delta * beta.
        # (The `or 1e-10` guards against log(0) when random() returns exactly 0.)
        if remaining_ttl > -delta * beta * math.log(random.random() or 1e-10):
            return data.decode()
    # Cache miss or early recomputation triggered
    start = time.monotonic()
    value = fetch_fn()
    delta = time.monotonic() - start
    pipe = r.pipeline()
    pipe.setex(key, ttl, value)
    pipe.setex(f"{key}:delta", ttl, str(delta))
    pipe.execute()
    return value
```
For simpler stampede prevention, a distributed lock (Redis SET NX PX) ensures only one worker refreshes the key; other workers receive a slightly stale value or wait briefly.
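As a sketch of that lock-based alternative: the function name, retry parameters, and `lock:` key prefix below are illustrative, and the client is passed in rather than taken from a module global so the logic is testable. It assumes a redis-py-style `set(..., nx=True, px=...)`, which returns a truthy value only if the key did not already exist.

```python
import time

def get_with_lock(r, key: str, ttl: int, fetch_fn,
                  lock_ttl_ms: int = 5000, wait: float = 0.05,
                  retries: int = 20) -> str:
    """Read-through with a per-key mutex: only one caller refreshes a miss."""
    for _ in range(retries):
        value = r.get(key)
        if value is not None:
            return value.decode()
        # Try to become the single refresher (SET NX PX); the PX expiry
        # bounds how long a crashed holder can block others
        if r.set(f"lock:{key}", "1", nx=True, px=lock_ttl_ms):
            try:
                fresh = fetch_fn()
                r.setex(key, ttl, fresh)
                return fresh
            finally:
                r.delete(f"lock:{key}")
        time.sleep(wait)  # another worker holds the lock; retry shortly
    return fetch_fn()  # fallback: give up waiting and fetch directly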
## Dependency Tracking and Cascading Invalidation
Some cache entries depend on other cache entries. Invalidating a parent should cascade to all children. A CacheDependency table (or Redis Set) tracks these relationships. When a parent key is invalidated, its children are queued for invalidation recursively.
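The cycle-safe traversal can be sketched as follows, assuming child keys are registered at write time in Redis sets under a `deps:<parent>` prefix (that prefix and the function name are our own, and the client is passed as a parameter). A visited set prevents cycles in the dependency graph from causing infinite recursion.

```python
from collections import deque

def invalidate_cascading(r, root_key: str) -> int:
    """Invalidate root_key and every transitive child; cycle-safe via visited set."""
    visited = set()
    pending = deque([root_key])
    while pending:
        key = pending.popleft()
        if key in visited:
            continue  # already handled; this is what breaks dependency cycles
        visited.add(key)
        r.delete(key)
        # children registered at write time: SADD deps:<parent> <child>
        for child in r.smembers(f"deps:{key}"):
            pending.append(child.decode() if isinstance(child, bytes) else child)
        r.delete(f"deps:{key}")  # drop the dependency set once traversed
    return len(visited)
```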
## Database Schema
```sql
-- Optional relational backing for tag and dependency metadata
-- (Redis is the primary store; these tables serve audit/debug purposes)
CREATE TABLE cache_tag (
    cache_key  VARCHAR(512) NOT NULL,
    tag        VARCHAR(255) NOT NULL,
    created_at TIMESTAMPTZ  NOT NULL DEFAULT NOW(),
    PRIMARY KEY (cache_key, tag)
);
CREATE INDEX idx_ct_tag ON cache_tag(tag);

CREATE TABLE cache_dependency (
    parent_key VARCHAR(512) NOT NULL,
    child_key  VARCHAR(512) NOT NULL,
    created_at TIMESTAMPTZ  NOT NULL DEFAULT NOW(),
    PRIMARY KEY (parent_key, child_key)
);
CREATE INDEX idx_cd_parent ON cache_dependency(parent_key);
```
## Write Strategy Comparison
| Strategy | Write Latency | Consistency | Data Loss Risk | Best For |
|---|---|---|---|---|
| Write-through | Higher | Strong | None | User profiles, orders |
| Write-behind | Low | Eventual | Low (on crash) | Counters, last-seen |
| Cache-aside | No overhead | Eventual | None | Read-heavy, rare writes |
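Cache-aside appears in the table but not in code, so here is a minimal read-path sketch. The function name and the `db.fetch_one` helper are hypothetical stand-ins for whatever data-access layer is in use; the pattern itself is just "check cache, fall back to DB, populate on miss".

```python
import json

def get_user_profile(r, user_id: int, db, ttl: int = 300) -> dict:
    """Cache-aside: read the cache first; on a miss, load from DB and populate."""
    key = f"user:{user_id}:profile"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: no DB round-trip
    # Cache miss: load from the source of truth
    # (db.fetch_one is a hypothetical helper returning one row)
    row = db.fetch_one("SELECT name, email FROM users WHERE id=%s", (user_id,))
    profile = {"name": row[0], "email": row[1]}
    r.setex(key, ttl, json.dumps(profile))  # populate for subsequent readers
    return profile
```

Writes under cache-aside go straight to the DB and simply delete (or tag-invalidate) the key, which is why the table lists its write latency as no overhead.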
```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "How does tag-based invalidation differ from key-based invalidation?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Key-based invalidation requires the writer to know every cache key affected by a change and delete them individually. Tag-based invalidation requires only that the writer specifies which entity changed (e.g., user:123); the cache layer looks up all keys tagged with that entity and deletes them. Tags decouple the write path from needing to enumerate downstream keys."
      }
    },
    {
      "@type": "Question",
      "name": "How does the XFetch algorithm prevent cache stampedes?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "XFetch stores the time it took to compute the cached value (delta). As expiry approaches, it runs a probabilistic check that becomes more likely to trigger recomputation the closer the TTL gets to delta * beta. This means a single request recomputes the value slightly early while all others still receive the cached result, preventing a synchronized miss under high concurrency."
      }
    },
    {
      "@type": "Question",
      "name": "When should write-behind caching be used instead of write-through?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Write-behind is appropriate when write latency matters more than strict consistency and the data can tolerate brief staleness or minor loss (e.g., view counters, last-seen timestamps, click tracking). Write-through is preferred for data where losing an update is unacceptable, such as financial balances, orders, or user credentials."
      }
    },
    {
      "@type": "Question",
      "name": "How are cascading invalidations handled without infinite loops?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "The invalidation worker tracks visited keys in a set during a single invalidation traversal. Before enqueuing a child key for invalidation, it checks whether the key is already in the visited set and skips it if so. This prevents cycles in the dependency graph from causing infinite recursion."
      }
    }
  ]
}
```