What Is a Write-Behind Cache?
A write-behind cache (also called write-back cache) acknowledges writes to the client immediately after updating the cache, then flushes data to the database asynchronously. The primary goal is removing synchronous DB writes from the hot request path, dramatically reducing write latency for high-throughput workloads.
The tradeoff is durability: if the cache crashes before a flush completes, unsynced writes are lost. Mitigating this risk is the central design challenge of a write-behind system.
Core Write Flow
On every write:
- The application calls cache.write(key, value).
- The cache stores the value in memory and marks the entry as dirty.
- The cache appends a record to a Write-Ahead Log (WAL) on durable storage before acknowledging the client.
- The client receives a success response. The DB has not yet been updated.
The WAL entry is written synchronously to disk before the ack. This ensures that even on a crash, the write is recoverable. The cache stays fast because sequential WAL appends are far cheaper than random DB writes.
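The durable-append-before-ack step can be sketched as follows; this is a minimal illustration (the function name is ours, not part of the implementation later in this article), showing why both the buffer flush and the fsync are needed before acknowledging:

```python
import json
import os
import time

def wal_append(wal_file, key: str, value: dict) -> None:
    """Durably append one WAL record before the client is acknowledged.

    wal_file is a file object opened in append mode. flush() alone only
    moves data from Python's buffer to the OS page cache; fsync() forces
    it onto durable storage so the record survives a crash.
    """
    record = {"key": key, "value": value, "ts": time.time()}
    wal_file.write(json.dumps(record) + "\n")
    wal_file.flush()              # drain Python's userspace buffer
    os.fsync(wal_file.fileno())   # force the OS to commit to disk
    # Only now is it safe to ack the client.
```

Sequential appends like this stay cheap even with fsync, because they avoid the random I/O pattern of direct DB writes.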
Flush Strategy
A background flush worker periodically drains dirty entries to the database. Two common strategies:
- Interval-based: flush all dirty entries every N seconds (e.g., every 5 seconds). Simple to implement; staleness bounded by interval.
- Count-based: flush when dirty entry count exceeds threshold (e.g., 10,000 dirty keys). Bounds memory pressure. Often combined with interval-based for a belt-and-suspenders approach.
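The combined belt-and-suspenders trigger reduces to a single check: flush when either condition fires. A minimal sketch (class and parameter names are illustrative):

```python
import time

class FlushPolicy:
    """Flush when EITHER the interval elapses OR dirty count exceeds a cap."""

    def __init__(self, interval_s: float = 5.0, max_dirty: int = 10_000):
        self.interval_s = interval_s
        self.max_dirty = max_dirty
        self.last_flush = time.monotonic()

    def should_flush(self, dirty_count: int) -> bool:
        # Count bound limits memory pressure; interval bound limits staleness.
        elapsed = time.monotonic() - self.last_flush
        return dirty_count >= self.max_dirty or elapsed >= self.interval_s

    def mark_flushed(self) -> None:
        self.last_flush = time.monotonic()
```

The flush worker would call should_flush() on each tick and mark_flushed() after a successful drain.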
Write Coalescing
If the same key is written multiple times before a flush cycle, only the latest value needs to be flushed. The cache maintains a dirty map keyed by cache key; each new write simply overwrites the pending value in that map. This last-write-wins coalescing collapses N updates to one DB write, which is especially valuable for counter increments, leaderboard scores, and analytics events that update at high frequency.
Coalescing semantics must be documented to callers: intermediate values are never persisted. If intermediate states matter (e.g., audit logs), write-behind is not appropriate.
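Last-write-wins coalescing is, at its core, just a dictionary overwrite. A toy illustration (the helper is ours, for demonstration only):

```python
def coalesce(writes: list[tuple[str, int]]) -> dict[str, int]:
    """Collapse a stream of (key, value) writes to one pending value per key.

    Later writes to the same key overwrite earlier ones, so only the
    final value per key would reach the database.
    """
    dirty: dict[str, int] = {}
    for key, value in writes:
        dirty[key] = value   # last write wins; intermediates are dropped
    return dirty

# Five writes collapse to two DB flushes:
pending = coalesce([("views:home", 1), ("views:home", 2),
                    ("score:alice", 90), ("views:home", 3),
                    ("score:alice", 95)])
# pending == {"views:home": 3, "score:alice": 95}
```

Note how the intermediate values 1, 2, and 90 vanish entirely, which is exactly the semantics callers must be warned about.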
Failure Recovery via WAL Replay
On cache restart after a crash:
- Open the WAL file and read all records with no corresponding flushed_at confirmation.
- Re-apply each pending write to the in-memory cache.
- Trigger a flush of all recovered dirty entries.
WAL records are marked as flushed only after the DB write succeeds. This gives at-least-once flush semantics — the DB write may be retried and must be idempotent (upsert, not blind insert).
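At-least-once delivery is safe only because the flush is an idempotent upsert: replaying the same WAL record twice leaves the same final state. A sketch using SQLite's ON CONFLICT clause (table and column names are illustrative, standing in for the JSONB schema below):

```python
import sqlite3

def idempotent_flush(conn: sqlite3.Connection, key: str, value: str) -> None:
    """Upsert so that replaying the same WAL record is harmless.

    A blind INSERT would raise a uniqueness error on replay; the
    ON CONFLICT clause converts the retry into a no-op overwrite.
    """
    conn.execute(
        "INSERT INTO target_table (key, value) VALUES (?, ?) "
        "ON CONFLICT (key) DO UPDATE SET value = excluded.value",
        (key, value),
    )
    conn.commit()
```

Running idempotent_flush twice with the same record leaves exactly one row, which is what makes retry-on-failure safe.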
Conflict Detection on Flush
A concurrent writer (another process, or a direct DB write) may have updated the DB row between when the cache entry was written and when the flush happens. To handle this:
- Store a version or updated_at alongside the cached value at write time.
- On flush, use an optimistic-locking UPDATE with a WHERE clause checking the expected version.
- If zero rows updated, the row was concurrently modified — log the conflict, and choose a resolution strategy (last-writer-wins, alert, or discard).
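A version-checked flush can be sketched against SQLite (table and column names are illustrative; a production system would target the schema below):

```python
import sqlite3

def flush_with_version_check(conn: sqlite3.Connection, key: str, value: str,
                             expected_version: int) -> bool:
    """Optimistic-locking flush: update only if the version is unchanged.

    Returns True if the flush won, False if a concurrent writer got
    there first (zero rows matched the WHERE clause).
    """
    cur = conn.execute(
        "UPDATE target_table SET value = ?, version = version + 1 "
        "WHERE key = ? AND version = ?",
        (value, key, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1
```

A stale flush (one carrying an outdated expected_version) matches zero rows, signaling the conflict without corrupting the newer DB state.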
SQL Schema
-- Dirty entries waiting to be flushed
CREATE TABLE WriteBehindEntry (
cache_key TEXT PRIMARY KEY,
value JSONB NOT NULL,
dirty_since TIMESTAMPTZ NOT NULL DEFAULT now(),
flush_attempts INT NOT NULL DEFAULT 0,
flushed_at TIMESTAMPTZ
);
-- Write-ahead log for crash recovery
CREATE TABLE WALRecord (
seq BIGSERIAL PRIMARY KEY,
cache_key TEXT NOT NULL,
value JSONB NOT NULL,
wal_at TIMESTAMPTZ NOT NULL DEFAULT now(),
flushed_at TIMESTAMPTZ
);
CREATE INDEX idx_wal_unflushed ON WALRecord (wal_at) WHERE flushed_at IS NULL;
Python Implementation Sketch
import json, os, threading, time

class WriteBehindCache:
    def __init__(self, db, wal_path, flush_interval=5):
        self.db = db
        self.wal_path = wal_path
        self.flush_interval = flush_interval
        self.dirty: dict[str, dict] = {}
        self.lock = threading.Lock()
        self._open_wal()
        self._start_flush_thread()

    def _open_wal(self):
        self.wal = open(self.wal_path, 'a')

    def write(self, key: str, value: dict):
        entry = {'key': key, 'value': value, 'ts': time.time()}
        self.wal.write(json.dumps(entry) + '\n')
        self.wal.flush()
        os.fsync(self.wal.fileno())  # durable on disk before the client is acked
        with self.lock:
            self.dirty[key] = value

    def flush_dirty_entries(self):
        with self.lock:
            snapshot = dict(self.dirty)
        for key, value in snapshot.items():
            self._flush_one(key, value)
        with self.lock:
            for key in snapshot:
                # Only clear entries that were not overwritten mid-flush.
                if self.dirty.get(key) == snapshot[key]:
                    del self.dirty[key]

    def _flush_one(self, key: str, value: dict):
        # Idempotent upsert: safe to retry under at-least-once semantics.
        self.db.execute(
            "INSERT INTO target_table (key, value, updated_at) VALUES (%s, %s, now()) "
            "ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value, updated_at = EXCLUDED.updated_at",
            (key, json.dumps(value))
        )
        self.db.execute(
            "UPDATE WALRecord SET flushed_at = now() WHERE cache_key = %s AND flushed_at IS NULL",
            (key,)
        )

    def replay_wal(self):
        # Already-flushed records may be replayed here; the idempotent
        # upsert in _flush_one makes that harmless (at-least-once semantics).
        with open(self.wal_path, 'r') as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                entry = json.loads(line)
                with self.lock:
                    if entry['key'] not in self.dirty:
                        self.dirty[entry['key']] = entry['value']
        self.flush_dirty_entries()

    def coalesce_writes(self, key: str):
        # Returns the latest coalesced value for a key without flushing
        with self.lock:
            return self.dirty.get(key)

    def _start_flush_thread(self):
        def loop():
            while True:
                time.sleep(self.flush_interval)
                self.flush_dirty_entries()
        t = threading.Thread(target=loop, daemon=True)
        t.start()
Use Cases
- High-write-rate counters: page view counts, API call tallies — intermediate values irrelevant, only final count matters.
- Analytics events: buffering events before bulk insert into a data warehouse.
- Leaderboard scores: score updates are frequent; only the latest score matters per flush cycle.
When Not to Use Write-Behind
Avoid write-behind when intermediate state must be durable (financial transactions, inventory decrements), when data loss on cache failure is unacceptable, or when the system lacks a reliable WAL infrastructure. For those cases, write-through or write-around are safer choices.