What Are Monotonic Reads?
Monotonic reads is a consistency guarantee that prevents a client from observing time going backward. Once a client reads a value at version V, all subsequent reads by that same client must return a value at version V or later. A client should never read an older version of data after having seen a newer one, even if it switches to a different replica.
This guarantee is one component of session consistency and is often paired with read-your-writes to form a complete per-client consistency model.
Per-Client Version Tracking
The foundation of monotonic reads is tracking the highest version observed by each client. This can be implemented as:
- Global LSN: A single log sequence number representing the most advanced replication position the client has observed across all data. Simple but conservative — it prevents the client from reading any data older than its highest observed LSN even on unrelated tables or keys.
- Per-key LSN: A separate LSN tracked per data key or table. More precise — only enforces monotonicity on the specific data the client actually read. Requires more client state (one entry per key accessed).
The client stores last_seen_version and includes it in every read request. The server uses it to validate whether its current state is sufficient to serve the read.
Replica Selection
Given a required minimum LSN (last_seen_version), the system must route the read to a replica whose apply_lsn is at least that value. The routing logic:
- Query each available replica for its current apply_lsn (or maintain a cached apply_lsn map updated via heartbeat).
- Filter to replicas where apply_lsn >= last_seen_version.
- Among eligible replicas, select by load, latency, or round-robin.
- If no replica is eligible, wait for the fastest-replicating replica to catch up, or fall back to the primary.
This approach eliminates the need for sticky sessions at the load balancer while still enforcing monotonicity.
Version Watermark Updates
After each read, the server includes its current apply_lsn in the response. The client updates its last_seen_version:
last_seen_version = max(last_seen_version, returned_lsn)
This means the client's watermark only ever moves forward, never backward. The watermark naturally advances as the client reads more recent data, making subsequent replica eligibility broader as replicas catch up over time.
Sticky Sessions: Simple but Fragile
The simplest implementation of monotonic reads is sticky sessions: always route the same client to the same replica. Because the replica's state only advances, the client never sees older data on the same replica.
Drawbacks:
- Replica failure requires session reassignment, which may result in a version regression if the replacement replica is less advanced.
- Uneven load distribution if clients are skewed to certain replicas.
- Does not survive client reconnection to a different server.
Version-based routing solves all three problems but adds complexity to the routing layer.
Version-Based Routing: Flexible and Fault-Tolerant
Version-based routing selects any replica whose apply_lsn satisfies the client's last_seen_version. This allows:
- Failover to any up-to-date replica without session stickiness.
- Load balancing across multiple eligible replicas.
- Geographic routing (prefer nearby replicas if they are sufficiently advanced).
The routing layer must maintain a low-latency view of each replica's apply_lsn, typically via a heartbeat mechanism where replicas report their current position every few hundred milliseconds.
Global vs. Per-Key Watermark
Global tracking uses a single LSN across all data. If the client reads a user profile at LSN 1000 and then reads a product listing, it requires the product listing replica to be at LSN >= 1000 even though the product data may be entirely independent. This is conservative and may cause unnecessary wait or primary fallback.
Per-key tracking maintains a separate LSN per key or table. The product listing read only requires the replica to be at the LSN of the last product listing read, not the user profile read. This is more precise and causes fewer unnecessary waits, but requires the client to maintain a map of key → last_seen_lsn, which grows with the number of distinct keys accessed.
SQL Schema
-- Per-client monotonic read session state
CREATE TABLE MonotonicReadSession (
client_id VARCHAR(128) PRIMARY KEY,
last_seen_lsn BIGINT NOT NULL DEFAULT 0,
last_read_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- Current apply state of each replica
CREATE TABLE ReplicaApplyState (
replica_id VARCHAR(64) PRIMARY KEY,
apply_lsn BIGINT NOT NULL DEFAULT 0,
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX idx_replicaapply_lsn ON ReplicaApplyState(apply_lsn DESC);
Python Implementation
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple
@dataclass
class Replica:
host: str
apply_lsn: int
latency_ms: float
class MonotonicReadClient:
def __init__(self, replicas: List[Replica], global_watermark: bool = True):
self.replicas = replicas
self.global_watermark = global_watermark
self.last_seen_lsn: int = 0
self.per_key_lsn: Dict[str, int] = {}
def select_replica(self, required_lsn: int) -> Optional[Replica]:
"""Select the lowest-latency replica whose apply_lsn >= required_lsn."""
eligible = [r for r in self.replicas if r.apply_lsn >= required_lsn]
if not eligible:
return None
return min(eligible, key=lambda r: r.latency_ms)
def update_watermark(self, key: str, returned_lsn: int) -> None:
"""Advance the client's watermark after a successful read."""
self.last_seen_lsn = max(self.last_seen_lsn, returned_lsn)
if not self.global_watermark:
self.per_key_lsn[key] = max(
self.per_key_lsn.get(key, 0), returned_lsn
)
def required_lsn_for(self, key: str) -> int:
"""Determine the minimum LSN required to serve a monotonic read."""
if self.global_watermark:
return self.last_seen_lsn
return self.per_key_lsn.get(key, 0)
def read(self, key: str, fallback_primary: Optional[Replica] = None) -> Optional[str]:
"""Perform a monotonic read for key."""
required = self.required_lsn_for(key)
replica = self.select_replica(required)
if replica is None:
# Refresh replica apply_lsn cache and retry once
self._refresh_replica_state()
replica = self.select_replica(required)
if replica is None and fallback_primary:
replica = fallback_primary
if replica is None:
raise RuntimeError(f"No eligible replica for LSN {required}")
value, returned_lsn = self._do_read(replica, key)
self.update_watermark(key, returned_lsn)
return value
def _refresh_replica_state(self) -> None:
"""Fetch current apply_lsn from each replica via heartbeat endpoint."""
for replica in self.replicas:
# Placeholder: GET http://{replica.host}/apply_lsn -> int
pass
def _do_read(self, replica: Replica, key: str) -> Tuple[Optional[str], int]:
"""Perform the actual read from the replica. Returns (value, apply_lsn)."""
return None, 0 # Placeholder
FAQ
- Sticky session vs version routing: Sticky is simpler; version-based routing handles failover and load balancing correctly.
- Per-key vs global watermark: Global is simpler but conservative; per-key is precise but grows with key cardinality.
- Replica failure with version routing: Route to any other replica that meets the required LSN; fall back to primary if none qualify.
- Interaction with read-your-writes: Combined LSN floor is max(last_write_lsn, last_read_lsn); both guarantees share the same enforcement mechanism.
See also: Scale AI Interview Guide 2026: Data Infrastructure, RLHF Pipelines, and ML Engineering
See also: Stripe Interview Guide 2026: Process, Bug Bash Round, and Payment Systems