Cache Invalidation System Low-Level Design: Tag-Based Invalidation, Write-Through, and Stampede Prevention

Cache Invalidation System: Low-Level Design

Cache invalidation is famously the hardest problem in computer science. A well-designed invalidation system must handle entity-level purges without requiring the caller to enumerate every affected key, survive cache stampedes under high concurrency, and support write strategies suited to different consistency requirements. This post covers tag-based invalidation, write-through, write-behind, stampede prevention, and dependency tracking.

Cache Key Taxonomy

A consistent key naming convention is the foundation of any manageable cache layer. Two categories of keys cover most patterns:

  • Entity keys: user:123:profile, product:456:detail — a single object identified by its ID.
  • Query keys: user:123:orders:page:1, search:laptop:sort:price:page:2 — a result set derived from one or more entities.

Query keys are the hard part. When user:123 is updated, every query key that included data from user 123 must be invalidated. Enumerating them individually is error-prone. Tag-based invalidation solves this.

Tag-Based Invalidation

Each cache entry is associated with one or more tags at write time. A tag represents an entity whose change should invalidate all entries carrying that tag. For example, a page showing user 123's orders carries both tag user:123 and tag order:list:user:123.

In Redis, a tag is implemented as a Set whose members are the cache keys carrying that tag. Invalidating a tag means fetching all member keys and deleting them in a pipeline.


import redis

r = redis.Redis()

def set_with_tags(key: str, value: str, tags: list[str], ttl: int) -> None:
    """Store a cache value and register it under each tag."""
    pipe = r.pipeline()
    pipe.setex(key, ttl, value)
    for tag in tags:
        tag_set_key = f"tag:{tag}"
        pipe.sadd(tag_set_key, key)
        # Tag set TTL is extended to at least 2x the entry TTL
        pipe.expire(tag_set_key, ttl * 2)
    pipe.execute()

def invalidate_by_tag(tag: str) -> int:
    """Delete all cache keys associated with a tag. Returns count deleted."""
    tag_set_key = f"tag:{tag}"
    members = r.smembers(tag_set_key)
    if not members:
        return 0
    pipe = r.pipeline()
    for key in members:
        pipe.delete(key)
    pipe.delete(tag_set_key)
    results = pipe.execute()
    # results[-1] is the delete count for the tag set itself
    return len(members)

The pipeline batches all DEL commands into a single round-trip. For very large tag sets (thousands of members), use SSCAN to iterate in chunks and delete in batches of 500 to avoid blocking the Redis event loop.

Write-Through Caching

On a write, the cache is updated synchronously before returning to the caller. The DB and cache are always in sync from the caller's perspective. The tradeoff is added latency on every write and the need to handle the case where the cache write succeeds but the DB write fails (or vice versa).


def update_user_profile(user_id: int, data: dict, db) -> None:
    """Write-through: update DB, then update cache."""
    db.execute(
        "UPDATE users SET name=%s, email=%s WHERE id=%s",
        (data["name"], data["email"], user_id)
    )
    key = f"user:{user_id}:profile"
    r.setex(key, 300, json.dumps(data))
    # Invalidate any query keys tagged with this user
    invalidate_by_tag(f"user:{user_id}")

Write-Behind (Write-Back) Caching

On a write, the cache is updated immediately and the DB write is deferred to an async worker. This reduces write latency to near-zero but risks data loss if the cache is evicted or the worker crashes before the DB write completes. Suitable for metrics, counters, and non-critical user preference data.


import queue

_write_queue: queue.Queue = queue.Queue()

def update_user_last_seen(user_id: int, ts: str) -> None:
    """Write-behind: cache immediately, DB write enqueued."""
    key = f"user:{user_id}:last_seen"
    r.setex(key, 3600, ts)
    _write_queue.put({"user_id": user_id, "last_seen": ts})

def flush_write_queue(db) -> int:
    """Worker: drains the queue and persists to DB."""
    count = 0
    while not _write_queue.empty():
        item = _write_queue.get_nowait()
        db.execute(
            "UPDATE users SET last_seen=%s WHERE id=%s",
            (item["last_seen"], item["user_id"])
        )
        count += 1
    return count

Cache Stampede Prevention: XFetch Algorithm

A stampede occurs when a cached value expires and many concurrent requests all find a miss, all fetch from the DB simultaneously, and all try to repopulate the cache. The XFetch algorithm prevents this by probabilistically recomputing the value slightly before its TTL expires, so only one request triggers a refresh while others still get the cached value.


import time, math, random

def xfetch_recompute(key: str, ttl: int, beta: float, fetch_fn) -> str:
    """
    XFetch probabilistic early expiration.
    beta: controls aggressiveness of early recomputation (default 1.0).
    fetch_fn: callable that computes the fresh value.
    """
    data = r.get(key)
    remaining_ttl = r.ttl(key)

    if data is not None and remaining_ttl > 0:
        # delta: time it took to compute the value last time (stored alongside)
        delta_key = f"{key}:delta"
        delta = float(r.get(delta_key) or 1)
        # Probabilistic check: recompute early if the condition holds
        if remaining_ttl > delta * beta * math.log(random.random() * -1):
            return data.decode()

    # Cache miss or early recomputation triggered
    start = time.monotonic()
    value = fetch_fn()
    delta = time.monotonic() - start

    pipe = r.pipeline()
    pipe.setex(key, ttl, value)
    pipe.setex(f"{key}:delta", ttl, str(delta))
    pipe.execute()
    return value

For simpler stampede prevention, a distributed lock (Redis SET NX PX) ensures only one worker refreshes the key; other workers receive a slightly stale value or wait briefly.

Dependency Tracking and Cascading Invalidation

Some cache entries depend on other cache entries. Invalidating a parent should cascade to all children. A CacheDependency table (or Redis Set) tracks these relationships. When a parent key is invalidated, its children are queued for invalidation recursively.

Database Schema


-- Optional relational backing for tag and dependency metadata
-- (Redis is the primary store; these tables serve audit/debug purposes)

CREATE TABLE cache_tag (
    cache_key VARCHAR(512) NOT NULL,
    tag       VARCHAR(255) NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    PRIMARY KEY (cache_key, tag)
);
CREATE INDEX idx_ct_tag ON cache_tag(tag);

CREATE TABLE cache_dependency (
    parent_key VARCHAR(512) NOT NULL,
    child_key  VARCHAR(512) NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    PRIMARY KEY (parent_key, child_key)
);
CREATE INDEX idx_cd_parent ON cache_dependency(parent_key);

Write Strategy Comparison

Strategy Write Latency Consistency Data Loss Risk Best For
Write-through Higher Strong None User profiles, orders
Write-behind Low Eventual Low (on crash) Counters, last-seen
Cache-aside No overhead Eventual None Read-heavy, rare writes

See also: Netflix Interview Guide 2026: Streaming Architecture, Recommendation Systems, and Engineering Excellence

See also: Anthropic Interview Guide 2026: Process, Questions, and AI Safety

See also: Uber Interview Guide 2026: Dispatch Systems, Geospatial Algorithms, and Marketplace Engineering

See also: Scale AI Interview Guide 2026: Data Infrastructure, RLHF Pipelines, and ML Engineering

See also: Meta Interview Guide 2026: Facebook, Instagram, WhatsApp Engineering

Scroll to Top