When should I use sticky sessions vs version-based routing for monotonic reads?

Use sticky sessions when simplicity is the priority and the replica fleet is stable. Use version-based routing when you need fault tolerance (replica failure should not break session monotonicity), load balancing across replicas, or support for clients that reconnect to different servers.

What is the tradeoff between per-key and global watermark tracking?

Global watermark is simpler (one LSN per client) but conservative — it may require a more advanced replica even for data unrelated to previous reads. Per-key watermark is precise but requires the client to maintain a map of key-to-LSN that grows with the number of distinct keys accessed.

What happens if the required replica for a monotonic read fails?

With sticky sessions, the client must be redirected to another replica, which may have a lower apply_lsn and could violate monotonicity. With version-based routing, the system selects a different eligible replica. If no replica meets the required LSN, the system falls back to the primary or waits for replication to advance.

How does monotonic reads interact with read-your-writes?

Read-your-writes requires the client to see its own writes. Monotonic reads prevents the client from observing older data. Both are enforced by tracking an LSN floor: required_lsn = max(last_write_lsn, last_read_lsn). A replica must satisfy this combined floor to serve any read in the session.

How is monotonic read guarantee enforced with multiple replicas?

The system issues each read a version token equal to the highest timestamp seen so far in the session; every subsequent read is sent to a replica whose applied version is at least that token, ensuring the client never reads a replica that is behind a previously observed state. The token is advanced on each response to reflect the latest observed version.

How does a read version token prevent stale reads?

The token encodes a lower bound on acceptable replica state; a replica compares the token against its own replication offset and either serves the read immediately if it qualifies or waits/rejects until it catches up. This prevents the client from ever observing a value that is older than one it has already seen.

How is the version token passed through load balancers?

The token is embedded in a request header (e.g., X-Read-Version) that load balancers are configured to forward transparently; the application layer or a smart proxy reads the header and selects an eligible backend. Some systems encode the token in a signed cookie so stateless load balancers do not need to parse it at all.

What happens when the pinned replica becomes unavailable?

The system falls back to another replica whose replication offset satisfies the version token, or escalates to the primary if no replica qualifies. If the primary is also unavailable the read must either block (sacrificing availability) or return an error, since serving a stale version would violate the monotonic guarantee.

Monotonic Reads Guarantee Low-Level Design: Read Version Tracking, Replica Selection, and Consistency Enforcement

⏱ 8 min read

What Are Monotonic Reads?

Monotonic reads is a consistency guarantee that prevents a client from observing time going backward. Once a client reads a value at version V, all subsequent reads by that same client must return a value at version V or later. A client should never read an older version of data after having seen a newer one, even if it switches to a different replica.

This guarantee is one component of session consistency and is often paired with read-your-writes to form a complete per-client consistency model.

Per-Client Version Tracking

The foundation of monotonic reads is tracking the highest version observed by each client. This can be implemented as:

Global LSN: A single log sequence number representing the most advanced replication position the client has observed across all data. Simple but conservative — it prevents the client from reading any data older than its highest observed LSN even on unrelated tables or keys.
Per-key LSN: A separate LSN tracked per data key or table. More precise — only enforces monotonicity on the specific data the client actually read. Requires more client state (one entry per key accessed).

The client stores last_seen_version and includes it in every read request. The server uses it to validate whether its current state is sufficient to serve the read.

Replica Selection

Given a required minimum LSN (last_seen_version), the system must route the read to a replica whose apply_lsn is at least that value. The routing logic:

Query each available replica for its current apply_lsn (or maintain a cached apply_lsn map updated via heartbeat).
Filter to replicas where apply_lsn >= last_seen_version.
Among eligible replicas, select by load, latency, or round-robin.
If no replica is eligible, wait for the fastest-replicating replica to catch up, or fall back to the primary.

This approach eliminates the need for sticky sessions at the load balancer while still enforcing monotonicity.

Version Watermark Updates

After each read, the server includes its current apply_lsn in the response. The client updates its last_seen_version:

last_seen_version = max(last_seen_version, returned_lsn)

This means the client's watermark only ever moves forward, never backward. The watermark naturally advances as the client reads more recent data, making subsequent replica eligibility broader as replicas catch up over time.

Sticky Sessions: Simple but Fragile

The simplest implementation of monotonic reads is sticky sessions: always route the same client to the same replica. Because the replica's state only advances, the client never sees older data on the same replica.

Drawbacks:

Replica failure requires session reassignment, which may result in a version regression if the replacement replica is less advanced.
Uneven load distribution if clients are skewed to certain replicas.
Does not survive client reconnection to a different server.

Version-based routing solves all three problems but adds complexity to the routing layer.

Version-Based Routing: Flexible and Fault-Tolerant

Version-based routing selects any replica whose apply_lsn satisfies the client's last_seen_version. This allows:

Failover to any up-to-date replica without session stickiness.
Load balancing across multiple eligible replicas.
Geographic routing (prefer nearby replicas if they are sufficiently advanced).

The routing layer must maintain a low-latency view of each replica's apply_lsn, typically via a heartbeat mechanism where replicas report their current position every few hundred milliseconds.

Global vs. Per-Key Watermark

Global tracking uses a single LSN across all data. If the client reads a user profile at LSN 1000 and then reads a product listing, it requires the product listing replica to be at LSN >= 1000 even though the product data may be entirely independent. This is conservative and may cause unnecessary wait or primary fallback.

Per-key tracking maintains a separate LSN per key or table. The product listing read only requires the replica to be at the LSN of the last product listing read, not the user profile read. This is more precise and causes fewer unnecessary waits, but requires the client to maintain a map of key → last_seen_lsn, which grows with the number of distinct keys accessed.

SQL Schema

-- Per-client monotonic read session state
CREATE TABLE MonotonicReadSession (
    client_id       VARCHAR(128) PRIMARY KEY,
    last_seen_lsn   BIGINT NOT NULL DEFAULT 0,
    last_read_at    TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Current apply state of each replica
CREATE TABLE ReplicaApplyState (
    replica_id  VARCHAR(64) PRIMARY KEY,
    apply_lsn   BIGINT NOT NULL DEFAULT 0,
    updated_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX idx_replicaapply_lsn ON ReplicaApplyState(apply_lsn DESC);

Python Implementation

import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class Replica:
    host: str
    apply_lsn: int
    latency_ms: float

class MonotonicReadClient:
    def __init__(self, replicas: List[Replica], global_watermark: bool = True):
        self.replicas = replicas
        self.global_watermark = global_watermark
        self.last_seen_lsn: int = 0
        self.per_key_lsn: Dict[str, int] = {}

    def select_replica(self, required_lsn: int) -> Optional[Replica]:
        """Select the lowest-latency replica whose apply_lsn >= required_lsn."""
        eligible = [r for r in self.replicas if r.apply_lsn >= required_lsn]
        if not eligible:
            return None
        return min(eligible, key=lambda r: r.latency_ms)

    def update_watermark(self, key: str, returned_lsn: int) -> None:
        """Advance the client's watermark after a successful read."""
        self.last_seen_lsn = max(self.last_seen_lsn, returned_lsn)
        if not self.global_watermark:
            self.per_key_lsn[key] = max(
                self.per_key_lsn.get(key, 0), returned_lsn
            )

    def required_lsn_for(self, key: str) -> int:
        """Determine the minimum LSN required to serve a monotonic read."""
        if self.global_watermark:
            return self.last_seen_lsn
        return self.per_key_lsn.get(key, 0)

    def read(self, key: str, fallback_primary: Optional[Replica] = None) -> Optional[str]:
        """Perform a monotonic read for key."""
        required = self.required_lsn_for(key)
        replica = self.select_replica(required)
        if replica is None:
            # Refresh replica apply_lsn cache and retry once
            self._refresh_replica_state()
            replica = self.select_replica(required)
        if replica is None and fallback_primary:
            replica = fallback_primary
        if replica is None:
            raise RuntimeError(f"No eligible replica for LSN {required}")

        value, returned_lsn = self._do_read(replica, key)
        self.update_watermark(key, returned_lsn)
        return value

    def _refresh_replica_state(self) -> None:
        """Fetch current apply_lsn from each replica via heartbeat endpoint."""
        for replica in self.replicas:
            # Placeholder: GET http://{replica.host}/apply_lsn -> int
            pass

    def _do_read(self, replica: Replica, key: str) -> Tuple[Optional[str], int]:
        """Perform the actual read from the replica. Returns (value, apply_lsn)."""
        return None, 0  # Placeholder

FAQ

Sticky session vs version routing: Sticky is simpler; version-based routing handles failover and load balancing correctly.
Per-key vs global watermark: Global is simpler but conservative; per-key is precise but grows with key cardinality.
Replica failure with version routing: Route to any other replica that meets the required LSN; fall back to primary if none qualify.
Interaction with read-your-writes: Combined LSN floor is max(last_write_lsn, last_read_lsn); both guarantees share the same enforcement mechanism.