Read-Through Cache Low-Level Design: Cache Population, Stale-While-Revalidate, and Consistency Patterns

What Is a Read-Through Cache?

In a read-through cache, the cache layer itself is responsible for loading data from the database on a miss. The application calls cache.get(key) and always receives a value — the cache either returns a cached entry or transparently fetches it from the DB, populates the cache, and returns it. The application has no explicit cache-miss handling logic.

This is in contrast to a cache-aside (lazy loading) pattern, where the application checks the cache, handles a miss itself by querying the DB, and manually populates the cache. Read-through centralizes that logic in the cache layer.
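The contrast can be sketched in a few lines of Python, using a plain dict as a stand-in for the database (all names here are illustrative):

```python
# Cache-aside vs. read-through, side by side.
db = {"user:1": {"name": "Ada"}}  # stand-in for the database

# --- Cache-aside: miss handling lives in application code ---
aside_cache: dict = {}

def get_user_aside(key: str):
    value = aside_cache.get(key)
    if value is None:                 # application detects the miss
        value = db.get(key)           # application queries the DB itself
        if value is not None:
            aside_cache[key] = value  # application populates the cache
    return value

# --- Read-through: miss handling lives in the cache layer ---
class ReadThroughDict:
    def __init__(self, loader):
        self._store: dict = {}
        self._loader = loader         # called transparently on a miss

    def get(self, key: str):
        if key not in self._store:
            value = self._loader(key)
            if value is not None:
                self._store[key] = value
        return self._store.get(key)

rt_cache = ReadThroughDict(loader=db.get)

assert get_user_aside("user:1") == {"name": "Ada"}
assert rt_cache.get("user:1") == {"name": "Ada"}  # caller never sees the miss
```

In the read-through version, the calling code contains no miss-handling branch at all; the loader is the only place that knows about the database.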

Cache Loader: Pluggable Population

A cache loader is a function registered per key pattern that the cache calls on a miss. For example:

  • Key pattern user:* → loader queries the users table by user ID.
  • Key pattern product:* → loader queries the products table.

Loaders are registered at startup. The cache calls loader(key) when the pattern matches a missed key, populates the cache with the result, and returns the value to the caller — all transparently.
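Pattern matching for loader lookup can be sketched with glob-style patterns; the loader functions and returned shapes below are illustrative stand-ins for real DB queries:

```python
import fnmatch

# Hypothetical loaders, one per key pattern.
def load_user(key: str) -> dict:
    user_id = key.split(":", 1)[1]
    return {"table": "users", "id": user_id}       # stand-in for a users query

def load_product(key: str) -> dict:
    product_id = key.split(":", 1)[1]
    return {"table": "products", "id": product_id}  # stand-in for a products query

loaders = {"user:*": load_user, "product:*": load_product}

def resolve_loader(key: str):
    # First registered pattern that glob-matches the key wins.
    for pattern, fn in loaders.items():
        if fnmatch.fnmatch(key, pattern):
            return fn
    return None

assert resolve_loader("user:42") is load_user
assert resolve_loader("order:9") is None  # no loader registered -> unhandled miss
```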

Stale-While-Revalidate

Stale-while-revalidate (SWR) is a technique that hides refresh latency on expired entries at the cost of serving briefly stale data. Instead of blocking the caller while the cache refreshes an expired entry:

  1. Serve the stale value immediately.
  2. Trigger an async background refresh to reload from DB.
  3. The next request (after refresh completes) gets the fresh value.

SWR requires two TTL values per entry: a fresh TTL (serve without refresh) and a stale TTL (serve stale while refreshing). Once the stale TTL expires, the entry is considered fully expired and the next request blocks on a synchronous load.
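The two-TTL state machine above can be sketched as follows; the TTL values are illustrative:

```python
import time

FRESH_TTL = 60   # seconds: serve without refresh
STALE_TTL = 300  # seconds: additional window in which to serve stale + refresh async

def classify(entry: dict, now: float) -> str:
    age = now - entry["loaded_at"]
    if age < FRESH_TTL:
        return "fresh"    # serve directly, no refresh
    if age < FRESH_TTL + STALE_TTL:
        return "stale"    # serve stale value, trigger async background refresh
    return "expired"      # block on a synchronous load

entry = {"loaded_at": time.time()}
t0 = entry["loaded_at"]
assert classify(entry, t0 + 30) == "fresh"
assert classify(entry, t0 + 120) == "stale"
assert classify(entry, t0 + 400) == "expired"
```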

Versioned Cache Entries

For stronger consistency, entries carry a version_id sourced from the DB row (e.g., a row-level version counter or updated_at timestamp). On a write to the DB:

  1. The DB increments the version on the row.
  2. The write path sends an invalidation message (or updates the cache directly) with the new version.
  3. On next read, the cache compares its stored version against the expected version. A version mismatch triggers a synchronous reload.

This gives fine-grained invalidation without a blanket TTL-based expiry.
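A minimal sketch of the read-side version check, assuming the write path delivers new versions through an invalidation callback (all names here are illustrative):

```python
cache: dict[str, dict] = {}
expected_versions: dict[str, int] = {}  # latest version seen via invalidation messages

def on_write_invalidation(key: str, new_version: int):
    # Step 2 from the list above: the write path propagates the new version.
    expected_versions[key] = new_version

def read(key: str, load):
    # Step 3: compare stored version against the expected version on read.
    entry = cache.get(key)
    expected = expected_versions.get(key)
    if entry is None or (expected is not None and entry["version_id"] < expected):
        entry = load(key)   # version mismatch (or miss): synchronous reload
        cache[key] = entry
    return entry["value"]

def load_v2(key: str) -> dict:
    return {"value": "new", "version_id": 2}  # stand-in for fetching the current row

cache["user:1"] = {"value": "old", "version_id": 1}
on_write_invalidation("user:1", 2)        # DB write bumped the row to version 2
assert read("user:1", load_v2) == "new"   # mismatch triggered a reload
```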

Consistent Hashing Across Cache Nodes

In a distributed read-through cache cluster, consistent hashing ensures that a given key always maps to the same cache node. This is critical for read-through correctness: if different nodes could serve the same key, they would each maintain separate loader state and potentially issue redundant DB queries. With consistent hashing, a cache miss is always handled by the same authoritative node for that key.
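A minimal hash ring can be sketched as follows, assuming MD5-derived ring positions and a configurable number of virtual nodes per physical node (both are common choices, not requirements):

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, vnodes: int = 64):
        # Place each node at `vnodes` pseudo-random positions on the ring
        # so keys distribute evenly and rebalancing stays incremental.
        self._ring = sorted(
            (self._pos(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._positions = [pos for pos, _ in self._ring]

    @staticmethod
    def _pos(s: str) -> int:
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        # First ring position clockwise of the key's hash, wrapping at the end.
        i = bisect.bisect(self._positions, self._pos(key)) % len(self._ring)
        return self._ring[i][1]

ring = HashRing(["cache-a", "cache-b", "cache-c"])
# The same key always routes to the same authoritative node:
assert ring.node_for("user:42") == ring.node_for("user:42")
```

Because each key hashes to exactly one position, there is always a single node responsible for handling its misses and owning its loader state.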

Background Pre-Fetch

A pre-fetch job scans the cache for entries approaching their TTL expiry and proactively refreshes them before they expire. This avoids the latency and DB-load spike that occurs when many entries expire at once (a thundering herd of reload queries). Pre-fetch candidates are identified by access frequency — only frequently accessed entries are worth the pre-fetch cost.
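One selection pass for such a job might look like this, assuming each entry tracks a hit counter; the window and threshold are illustrative tuning knobs:

```python
import time

REFRESH_WINDOW = 30  # seconds: refresh entries expiring within this window
MIN_HITS = 100       # only pre-fetch frequently accessed entries

def prefetch_candidates(store: dict, now: float) -> list[str]:
    # Hot entries that are still valid but close to expiry.
    return [
        key
        for key, e in store.items()
        if now < e["expires_at"] <= now + REFRESH_WINDOW and e["hits"] >= MIN_HITS
    ]

now = time.time()
store = {
    "user:1": {"expires_at": now + 10,  "hits": 500},  # hot, expiring soon -> refresh
    "user:2": {"expires_at": now + 10,  "hits": 3},    # cold -> not worth the cost
    "user:3": {"expires_at": now + 600, "hits": 900},  # hot but not near expiry -> skip
}
assert prefetch_candidates(store, now) == ["user:1"]
```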

SQL Schema

CREATE TABLE CacheEntry (
    cache_key    TEXT PRIMARY KEY,
    value        JSONB        NOT NULL,
    version_id   BIGINT,
    loaded_at    TIMESTAMPTZ  NOT NULL DEFAULT now(),
    expires_at   TIMESTAMPTZ  NOT NULL
);

CREATE INDEX idx_cache_expiry ON CacheEntry (expires_at);

CREATE TABLE CacheLoader (
    key_pattern    TEXT PRIMARY KEY,
    loader_type    TEXT         NOT NULL,
    ttl_seconds    INT          NOT NULL,
    stale_ttl_seconds INT       NOT NULL
);

Python Implementation Sketch

import fnmatch
import threading
import time
from typing import Callable, Optional

class ReadThroughCache:
    def __init__(self, db):
        self.db = db
        self.store: dict[str, dict] = {}
        self.loaders: dict[str, dict] = {}
        self.lock = threading.Lock()

    def register_loader(self, key_pattern: str, loader_fn: Callable, ttl: int, stale_ttl: int):
        self.loaders[key_pattern] = {'fn': loader_fn, 'ttl': ttl, 'stale_ttl': stale_ttl}

    def get(self, key: str) -> Optional[dict]:
        with self.lock:
            entry = self.store.get(key)
        now = time.time()
        if entry:
            if now < entry['expires_at']:
                # Fresh: serve directly.
                return entry['value']
            elif now < entry['stale_expires_at']:
                # Stale window: serve the stale value, refresh in the background.
                threading.Thread(
                    target=self.stale_while_revalidate, args=(key,), daemon=True
                ).start()
                return entry['value']
        # Cold miss or fully expired: block on a synchronous load.
        return self.load_from_db(key)

    def load_from_db(self, key: str) -> Optional[dict]:
        loader_cfg = self._find_loader(key)
        if not loader_cfg:
            return None
        value = loader_cfg['fn'](key)
        if value is None:
            return None
        now = time.time()
        entry = {
            'value': value,
            'expires_at': now + loader_cfg['ttl'],
            'stale_expires_at': now + loader_cfg['ttl'] + loader_cfg['stale_ttl'],
        }
        with self.lock:
            self.store[key] = entry
        return value

    def stale_while_revalidate(self, key: str):
        self.load_from_db(key)

    def invalidate_version(self, key: str, new_version: int):
        with self.lock:
            entry = self.store.get(key)
            if entry and entry.get('version_id', -1) < new_version:
                del self.store[key]

    def _find_loader(self, key: str):
        # First registered pattern that glob-matches the key wins.
        for pattern, cfg in self.loaders.items():
            if fnmatch.fnmatch(key, pattern):
                return cfg
        return None

Consistency with Write Operations

Read-through caches must be paired with a consistent write strategy. Options:

  • Write-through: update cache and DB synchronously on every write — cache always current.
  • Write-invalidate: on write, delete the cache entry; next read reloads from DB via the loader.
  • Version-based invalidation: as described above, propagate new version to trigger stale detection.
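The first two options can be sketched side by side, with plain dicts standing in for the cache and the DB:

```python
db: dict[str, dict] = {}
cache: dict[str, dict] = {}

def write_through(key: str, value: dict):
    db[key] = value        # durable write first
    cache[key] = value     # then keep the cache current

def write_invalidate(key: str, value: dict):
    db[key] = value
    cache.pop(key, None)   # next read repopulates via the read-through loader

write_through("user:1", {"name": "Ada"})
assert cache["user:1"] == {"name": "Ada"}   # cache always current

write_invalidate("user:1", {"name": "Grace"})
assert "user:1" not in cache                # entry gone until the next read
assert db["user:1"] == {"name": "Grace"}
```

Write-through pays a cache write on every DB write; write-invalidate pays a reload on the next read. Which trade-off is better depends on the read/write ratio of the workload.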

FAQ

What is the staleness window in stale-while-revalidate?

The staleness window is the period between the expiry of an entry's fresh TTL and the expiry of its stale TTL; its length equals the stale TTL, the extra window granted after the fresh TTL. During this window the stale value is served immediately and an async refresh is triggered. After both TTLs expire, the next request blocks on a synchronous DB load.

What happens if the cache loader fails on a miss?

If the loader throws an exception or returns nothing, the cache should not populate an entry and should propagate the error to the caller. Optionally, a negative cache entry (a null value with a short TTL) can be stored to prevent a thundering herd of repeated DB queries for a missing key. Retries should use exponential backoff.

How does version-based invalidation work in a read-through cache?

Each cache entry stores the version_id of the DB row at load time. When the row is updated, the new version is broadcast (via pub/sub or a direct cache call). The cache evicts the entry if its stored version is stale, and the next read triggers the loader to fetch the current row. This avoids TTL-based staleness while maintaining consistency.

Why is consistent hashing important for a read-through cache cluster?

Consistent hashing ensures a given key always routes to the same cache node. Without it, multiple nodes might each handle a miss for the same key independently, causing N parallel DB queries and N separate cached copies that can diverge. With it, each key has exactly one authoritative node that owns its loader state, preventing redundant loads and keeping a single cached copy.

How are thundering herds prevented on a cache miss?

A per-key mutex or probabilistic early expiration ensures that only the first request triggers a backing-store fetch, while concurrent requests for the same key block or receive the stale value. Distributed deployments can store a lock token atomically (e.g., via Redis SET NX) so only one node performs the fetch across the entire fleet.

How does read-through handle TTL expiry?

When an entry's TTL expires, the next read reloads from the backing store through the same read-through path as a cold miss. To avoid latency spikes at expiry boundaries, TTL values are often jittered (randomized within a range) so entries for different keys expire at staggered times rather than all at once.

