Question 1

What is the durability vs latency tradeoff in a write-behind cache?

Accepted Answer

Write-behind caches reduce write latency by acknowledging a write as soon as the cache is updated, without waiting for the DB flush. The cost is durability risk: if the cache crashes before the asynchronous flush runs, acknowledged but unflushed writes are lost. A WAL mitigates this by persisting each write to durable storage before acknowledging the client, which enables replay on restart at the cost of a log append on the write path.

Question 2

How does WAL-based recovery work in a write-behind cache?

Accepted Answer

On every write, an entry is appended to a write-ahead log (WAL) on durable disk before the client is acknowledged. On restart after a crash, the cache reads all WAL entries that have no flushed_at timestamp, repopulates the dirty map, and triggers a flush to the DB. WAL records are marked flushed only after the DB write succeeds, which gives at-least-once semantics: a crash between the DB write and the marking causes the entry to be replayed, so DB writes should be idempotent.
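
The recovery path described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes the WAL is kept in a SQLite table, and the names wal_entries, open_wal, append_entry, recover, and write_to_db are hypothetical. Only the flushed_at column comes from the answer itself.

```python
import sqlite3
import time


def open_wal(path=":memory:"):
    """Open (or create) a hypothetical WAL table; flushed_at stays NULL until flushed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS wal_entries ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " key TEXT NOT NULL,"
        " value TEXT NOT NULL,"
        " flushed_at REAL)"
    )
    conn.commit()
    return conn


def append_entry(wal, key, value):
    """Persist the write; the caller acknowledges the client only after this returns."""
    with wal:  # commits on success, rolls back on error
        wal.execute(
            "INSERT INTO wal_entries (key, value) VALUES (?, ?)", (key, value)
        )


def recover(wal, write_to_db):
    """Replay unflushed entries: rebuild the dirty map, flush it, then mark flushed."""
    rows = wal.execute(
        "SELECT id, key, value FROM wal_entries"
        " WHERE flushed_at IS NULL ORDER BY id"
    ).fetchall()
    dirty = {}
    for _entry_id, key, value in rows:
        dirty[key] = value  # later entries overwrite earlier ones
    for key, value in dirty.items():
        # Assumed idempotent: a replay may repeat a write that already reached the DB.
        write_to_db(key, value)
    if rows:
        last_id = rows[-1][0]
        with wal:
            # Marking happens only after every DB write above has succeeded.
            wal.execute(
                "UPDATE wal_entries SET flushed_at = ?"
                " WHERE id <= ? AND flushed_at IS NULL",
                (time.time(), last_id),
            )
    return dirty
```

If write_to_db raises, the UPDATE is never reached, so the same entries are replayed on the next recovery attempt. That is where the at-least-once behavior comes from.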

Question 3

What are the write coalescing semantics of a write-behind cache?

Accepted Answer

Write coalescing means multiple updates to the same key between flush cycles are collapsed into one DB write. The dirty map stores only the latest value per key. This last-write-wins approach reduces the number of DB writes for high-frequency update patterns such as counters and scores, because any number of updates to one key within a flush interval costs a single DB write. Intermediate values are never written to the DB.
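
A small sketch of the dirty map may make the last-write-wins behavior concrete. The class and method names (CoalescingBuffer, put, flush) and the write_to_db callback are assumptions made for illustration.

```python
class CoalescingBuffer:
    """Hypothetical dirty map: keeps only the latest value per key between flushes."""

    def __init__(self):
        self._dirty = {}
        self.writes_received = 0

    def put(self, key, value):
        self.writes_received += 1
        self._dirty[key] = value  # overwrites any earlier value for this key

    def flush(self, write_to_db):
        """Write one value per dirty key and return the number of DB writes issued."""
        batch, self._dirty = self._dirty, {}
        for key, value in batch.items():
            write_to_db(key, value)
        return len(batch)
```

With this sketch, 100 calls to put for the same key followed by one flush issue a single DB write carrying the 100th value.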

Question 4

How is a conflict handled when flushing a write-behind entry to the DB?

Accepted Answer

If a concurrent process modified the DB row after the cache entry was written but before the flush, an optimistic locking check fails. The flush uses a conditional UPDATE with a WHERE clause on the expected version. If zero rows are updated, a conflict is detected. Resolution options include last-writer-wins (proceed anyway with upsert), logging and alerting, or discarding the stale cache value.
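
The conditional UPDATE can be sketched with Python's built-in sqlite3 module. The table name items, its columns, and the function flush_entry are hypothetical; the answer only specifies that the WHERE clause checks the expected version and that zero updated rows signals a conflict.

```python
import sqlite3


def create_items_table(conn):
    """Create a hypothetical backing table with a version column for optimistic locking."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        " key TEXT PRIMARY KEY,"
        " value TEXT NOT NULL,"
        " version INTEGER NOT NULL)"
    )
    conn.commit()


def flush_entry(conn, key, value, expected_version):
    """Return True if the row was updated, False if a version conflict was detected."""
    with conn:
        cur = conn.execute(
            "UPDATE items SET value = ?, version = version + 1"
            " WHERE key = ? AND version = ?",
            (value, key, expected_version),
        )
    # Zero rows updated means another writer changed the row (or it no longer exists).
    return cur.rowcount == 1
```

When flush_entry returns False, the caller picks one of the resolution options listed above; the sketch deliberately leaves that policy out.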

Question 5

How does a write-behind cache differ from a write-through cache?

Accepted Answer

Write-through synchronously writes to both cache and backing store on every write, ensuring immediate durability at the cost of higher write latency. Write-behind (write-back) acknowledges the write after updating only the cache and asynchronously flushes dirty entries to the backing store in batches, trading durability for lower write latency and higher throughput.
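
The difference in the write path can be shown side by side. In this sketch a plain dict stands in for the backing store, and both class names are hypothetical; real caches would add locking, batching, and error handling.

```python
class WriteThroughCache:
    """Every put reaches the backing store before it returns."""

    def __init__(self, store):
        self.cache = {}
        self.store = store

    def put(self, key, value):
        self.store[key] = value  # synchronous backing-store write
        self.cache[key] = value


class WriteBehindCache:
    """A put touches only the cache; dirty keys are flushed later in a batch."""

    def __init__(self, store):
        self.cache = {}
        self.dirty = set()
        self.store = store

    def put(self, key, value):
        self.cache[key] = value
        self.dirty.add(key)  # acknowledged here, before the store sees it

    def flush(self):
        """Write all dirty entries to the store and return how many were written."""
        keys = list(self.dirty)
        for key in keys:
            self.store[key] = self.cache[key]
        self.dirty.clear()
        return len(keys)
```

Between put and flush, the write-behind store does not yet contain the value, which is the window in which a crash loses data.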

Question 6

How is data loss prevented in a write-behind cache?

Accepted Answer

Dirty cache entries are persisted to a durable write-ahead log (WAL) or append-only journal before the write is acknowledged, so a node crash can be recovered by replaying the log. Additionally, replication of dirty data across cache replicas ensures that a single node failure does not result in unflushed writes being permanently lost.
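
As an illustration of the log-before-acknowledge ordering, the sketch below uses an append-only journal file with one JSON record per line. The Journal class, read_journal, put, and the record layout are assumptions; replication across replicas is not shown.

```python
import json
import os


class Journal:
    """Hypothetical append-only journal: one JSON record per line."""

    def __init__(self, path):
        self._f = open(path, "a", encoding="utf-8")

    def append(self, key, value):
        self._f.write(json.dumps({"key": key, "value": value}) + "\n")
        self._f.flush()             # push Python's buffer to the OS
        os.fsync(self._f.fileno())  # ask the OS to push it to the storage device

    def close(self):
        self._f.close()


def read_journal(path):
    """Return all journal records in append order, for replay after a crash."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def put(cache, journal, key, value):
    """Journal first, then update the cache, then acknowledge."""
    journal.append(key, value)
    cache[key] = value
    return "ACK"
```

The fsync call is what makes the acknowledgment meaningful: without it, the record may still sit in an OS buffer when the node loses power.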

Question 7

How are write batches coalesced?

Accepted Answer

The cache tracks dirty keys in a queue or sorted structure; multiple writes to the same key within a flush window are collapsed so only the latest value is written to the backing store, reducing I/O amplification. A background flusher thread drains the dirty queue on a configurable schedule or when queue depth exceeds a threshold.
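
One way to sketch the background flusher is a thread that wakes either on a timer or when the number of dirty keys crosses a threshold. The class name BackgroundFlusher and the parameters interval_s, max_dirty, and write_batch are hypothetical, and a dict is used for the dirty structure instead of a queue so that coalescing falls out naturally.

```python
import threading


class BackgroundFlusher:
    """Drains coalesced dirty entries on a schedule or when a depth threshold is hit."""

    def __init__(self, write_batch, interval_s=1.0, max_dirty=100):
        self._write_batch = write_batch
        self._interval_s = interval_s
        self._max_dirty = max_dirty
        self._dirty = {}
        self._lock = threading.Lock()
        self._wake = threading.Event()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def put(self, key, value):
        with self._lock:
            self._dirty[key] = value  # same key within a window is collapsed
            over_threshold = len(self._dirty) >= self._max_dirty
        if over_threshold:
            self._wake.set()  # flush early instead of waiting for the timer

    def _drain(self):
        with self._lock:
            batch, self._dirty = self._dirty, {}
        if batch:
            self._write_batch(batch)

    def _run(self):
        while not self._stop.is_set():
            self._wake.wait(timeout=self._interval_s)
            self._wake.clear()
            self._drain()

    def stop(self):
        """Stop the thread and flush whatever is still dirty."""
        self._stop.set()
        self._wake.set()
        self._thread.join()
        self._drain()
```

Swapping the dirty dict under the lock keeps put calls from blocking on the backing-store write, at the cost of holding one batch in memory while it is written.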

Question 8

How does write-behind handle cache eviction before flush?

Accepted Answer

A dirty-bit or dirty-list prevents the eviction policy from evicting entries that have not yet been flushed to the backing store; if eviction pressure is high, the cache synchronously flushes the dirty entry before evicting it. Alternatively, dirty entries can be promoted to a protected segment to shield them from the normal LRU eviction path until the flush completes.
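
The first strategy (flush synchronously before evicting a dirty entry) can be sketched on top of an LRU built from collections.OrderedDict. The class WriteBehindLRU and its methods are hypothetical, and the protected-segment alternative is not shown.

```python
from collections import OrderedDict


class WriteBehindLRU:
    """LRU cache that never drops a dirty entry without writing it first."""

    def __init__(self, capacity, write_to_db):
        self._capacity = capacity
        self._write_to_db = write_to_db
        self._entries = OrderedDict()  # least recently used first
        self._dirty = set()

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        self._dirty.add(key)
        while len(self._entries) > self._capacity:
            self._evict_one()

    def _evict_one(self):
        # Prefer the least recently used clean entry: it can be dropped for free.
        victim = next((k for k in self._entries if k not in self._dirty), None)
        if victim is None:
            # Everything is dirty: flush the LRU entry synchronously, then evict it.
            victim = next(iter(self._entries))
            self._write_to_db(victim, self._entries[victim])
            self._dirty.discard(victim)
        del self._entries[victim]

    def flush(self):
        """Write all dirty entries to the backing store and mark them clean."""
        for key in list(self._dirty):
            self._write_to_db(key, self._entries[key])
        self._dirty.clear()

    def keys(self):
        return list(self._entries)
```

The synchronous flush on eviction puts a backing-store write back on the write path, so under sustained eviction pressure latency approaches that of write-through.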

Write-Behind Cache Low-Level Design: Async Persistence, Durability Guarantees, and Failure Recovery

What Is a Write-Behind Cache?

Core Write Flow

Flush Strategy

Write Coalescing

Failure Recovery via WAL Replay

Conflict Detection on Flush

SQL Schema

Python Implementation Sketch

Use Cases

When Not to Use Write-Behind