Session Consistency Low-Level Design: Read-Your-Writes, Monotonic Reads, and Session Token Tracking

What Is Session Consistency?

Session consistency is a client-centric consistency model that provides guarantees within the scope of a single client session. It is weaker than linearizability but stronger than eventual consistency. The two core guarantees are:

Read-your-writes: Within a session, reads always reflect all prior writes made in that same session.
Monotonic reads: Within a session, once a client has read a particular version of the data, no later read returns an older version.

Session consistency allows reads to be served from replicas, enabling horizontal read scaling, while still providing meaningful per-client guarantees.

Session Token Design

The session token is an opaque value issued to the client that encodes the consistency state of the session. It carries two LSN fields, along with an expiry timestamp (see Session Expiry):

last_write_lsn: The log sequence number (LSN) of the most recent write the client made. This is used to enforce read-your-writes.
last_read_lsn: The highest LSN at which a read has been served to the client. This is used to enforce monotonic reads.

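The token is typically base64-encoded or carried as a JWT, so clients treat it as an opaque value while servers can parse it. Plain base64 is only an encoding, so a token that must resist tampering needs a signature, which a JWT provides. The token travels in an HTTP header (e.g., X-Session-Token) on every request.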
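As a rough illustration of a token that servers can parse and clients cannot silently alter, the sketch below signs a base64 JSON payload with HMAC-SHA256. The function names, the SECRET key, and the "body.signature" layout are assumptions made for this example. It is not a JWT implementation and not necessarily how any particular system encodes its tokens.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret"  # hypothetical shared server key, for illustration only


def encode_token(last_write_lsn: int, last_read_lsn: int, expires_at: float,
                 key: bytes = SECRET) -> str:
    """Serialize the session state and append an HMAC-SHA256 signature."""
    payload = {
        "last_write_lsn": last_write_lsn,
        "last_read_lsn": last_read_lsn,
        "expires_at": expires_at,
    }
    body = base64.urlsafe_b64encode(json.dumps(payload, sort_keys=True).encode())
    signature = hmac.new(key, body, hashlib.sha256).hexdigest()
    return f"{body.decode()}.{signature}"


def decode_token(token: str, key: bytes = SECRET) -> dict:
    """Verify the signature, then return the decoded payload."""
    body, _, signature = token.rpartition(".")
    expected = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("invalid session token signature")
    return json.loads(base64.urlsafe_b64decode(body))
```

A server would call decode_token on the X-Session-Token header value and reject the request, or fall back to a fresh session, when verification fails.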

Read-Your-Writes Enforcement

When the client sends a read request with a session token, the server extracts last_write_lsn from the token and compares it to its own apply_lsn (the LSN up to which the replica has applied changes from the write-ahead log).

If apply_lsn >= last_write_lsn, the replica can serve the read immediately — the client's write is visible.
If apply_lsn < last_write_lsn, the server either waits (with a configurable timeout) for replication to catch up, or routes the request to the primary or a more up-to-date replica.

On completion, the server returns the current apply_lsn in the response, allowing the client to update its session token.
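The decision described above can be sketched as a small server-side helper. The function name, the "serve"/"redirect" return values, and the get_apply_lsn callback are hypothetical. A real server would read its replication position from the storage engine instead of a callback.

```python
import time
from typing import Callable


def serve_or_redirect(
    last_write_lsn: int,
    get_apply_lsn: Callable[[], int],
    timeout_s: float = 0.5,
    poll_interval_s: float = 0.02,
) -> str:
    """Return "serve" once the replica has applied last_write_lsn.

    Returns "redirect" if the replica is still behind when the timeout
    elapses, meaning the caller should route to the primary or to a more
    up-to-date replica.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if get_apply_lsn() >= last_write_lsn:
            return "serve"
        if time.monotonic() >= deadline:
            return "redirect"
        time.sleep(poll_interval_s)
```

Polling keeps the sketch short. A production replica would more likely block on a notification from its replication apply loop.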

Monotonic Read Enforcement

The session token also encodes last_read_lsn. When the client reads, the server must serve the read at an LSN that is at least last_read_lsn. This prevents the client from observing a version that is older than what it already read.

The required LSN for a read is therefore: max(last_write_lsn, last_read_lsn). After serving the read, the server reports its current apply_lsn, and the client updates last_read_lsn to max(last_read_lsn, returned_lsn).
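The two formulas above can be checked with a short worked example. SessionState, required_lsn, and after_read are illustrative names, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionState:
    last_write_lsn: int = 0
    last_read_lsn: int = 0


def required_lsn(state: SessionState) -> int:
    """Minimum LSN a replica must have applied before serving this session."""
    return max(state.last_write_lsn, state.last_read_lsn)


def after_read(state: SessionState, returned_lsn: int) -> SessionState:
    """Advance last_read_lsn; it never moves backwards."""
    return SessionState(
        last_write_lsn=state.last_write_lsn,
        last_read_lsn=max(state.last_read_lsn, returned_lsn),
    )


# The client wrote at LSN 120 and last read at LSN 95.
state = SessionState(last_write_lsn=120, last_read_lsn=95)
first_requirement = required_lsn(state)    # 120: the write dominates
state = after_read(state, 130)             # a replica served the read at LSN 130
second_requirement = required_lsn(state)   # 130: the read now dominates
state = after_read(state, 125)             # a lower reported LSN does not lower it
```

After the second read the session still requires LSN 130, so a replica at LSN 125 has to wait or hand the request off.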

Session Migration

When a client connects to a different backend server (due to load balancing, failover, or reconnection), it presents its session token. The new server must do one of the following:

Verify that its apply_lsn satisfies the token's required LSN and serve the request directly.
Wait for replication to catch up before serving.
Route the request to the primary if the required LSN is too far ahead.

This means session tokens enable consistent behavior across server migrations without requiring sticky sessions at the load balancer level.
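The three options above can be sketched as a routing function. The name pick_server and the max_wait_gap threshold, which stands in for "too far ahead", are assumptions for this example. The source does not specify how far behind a replica may be before the request goes to the primary.

```python
from typing import Dict, Tuple


def pick_server(
    required_lsn: int,
    replica_lsns: Dict[str, int],
    primary: str,
    max_wait_gap: int = 100,
) -> Tuple[str, bool]:
    """Return (server, must_wait) for a request that needs required_lsn.

    replica_lsns maps replica name to its current apply_lsn.
    """
    # Option 1: a replica that already satisfies the token serves directly.
    caught_up = [name for name, lsn in replica_lsns.items() if lsn >= required_lsn]
    if caught_up:
        return caught_up[0], False

    # Option 2: the most advanced replica is close enough to wait on.
    if replica_lsns:
        best = max(replica_lsns, key=replica_lsns.get)
        if required_lsn - replica_lsns[best] <= max_wait_gap:
            return best, True

    # Option 3: every replica is too far behind, so use the primary.
    return primary, False
```

Because the function depends only on the token's required LSN and the replicas' positions, any server or load balancer can evaluate it without knowing which server handled the session before.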

Session Expiry

Session tokens carry a TTL (time-to-live). When a session expires:

The server no longer honors LSN requirements from the expired token.
The client is issued a fresh token with no LSN constraints.
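The client effectively degrades to eventual consistency relative to its old session: the next read may miss earlier writes or return older data than the client saw before the expiry. Guarantees resume from the first write or read made under the new token.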

Session expiry prevents unbounded accumulation of session state and allows the system to reclaim resources for inactive sessions.
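A minimal sketch of the expiry rule, assuming the token carries an absolute expires_at timestamp in seconds. The Token class and effective_token function are hypothetical names used only for illustration.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Token:
    last_write_lsn: int
    last_read_lsn: int
    expires_at: float  # Unix timestamp in seconds


def effective_token(token: Token, ttl_s: float = 3600.0,
                    now: Optional[float] = None) -> Token:
    """Return the token to honor for this request.

    An expired token is replaced by a fresh one with no LSN constraints;
    an unexpired token is returned unchanged.
    """
    now = time.time() if now is None else now
    if now >= token.expires_at:
        return Token(last_write_lsn=0, last_read_lsn=0, expires_at=now + ttl_s)
    return token
```

With both LSNs reset to 0, the required LSN for the next read is 0, so any replica may serve it regardless of lag.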

Tradeoffs vs. Linearizability

Session consistency is significantly less strict than linearizability. Under linearizability, every operation appears to take effect at a precise real-time point, requiring coordination (quorum reads, leader reads) on every operation. Session consistency lets any replica serve a read as long as it has applied the session's required LSN, and needs coordination (waiting or rerouting) only when the replica lags behind that LSN. This makes session consistency well-suited for read-heavy workloads with geographic distribution.

SQL Schema

-- Tracks per-client session state
CREATE TABLE ClientSession (
    session_id      UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    last_write_lsn  BIGINT NOT NULL DEFAULT 0,
    last_read_lsn   BIGINT NOT NULL DEFAULT 0,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    expires_at      TIMESTAMPTZ NOT NULL
);

CREATE INDEX idx_clientsession_expires ON ClientSession(expires_at);

-- Audit log of reads served under session consistency
CREATE TABLE SessionRead (
    id            BIGSERIAL PRIMARY KEY,
    session_id    UUID NOT NULL REFERENCES ClientSession(session_id),
    requested_lsn BIGINT NOT NULL,
    served_lsn    BIGINT NOT NULL,
    served_by     VARCHAR(64) NOT NULL,   -- replica host
    latency_ms    INTEGER NOT NULL,
    read_at       TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX idx_sessionread_session ON SessionRead(session_id, read_at);

Python Implementation

import base64
import json
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class SessionToken:
    write_lsn: int
    read_lsn: int
    expires_at: float

class SessionConsistencyClient:
    def __init__(self, servers: list, session_ttl_s: int = 3600):
        self.servers = servers
        self.session_ttl_s = session_ttl_s
        self.write_lsn: int = 0
        self.read_lsn: int = 0

    def encode_token(self, write_lsn: int, read_lsn: int) -> str:
        payload = {
            "write_lsn": write_lsn,
            "read_lsn": read_lsn,
            "expires_at": time.time() + self.session_ttl_s
        }
        return base64.b64encode(json.dumps(payload).encode()).decode()

    def decode_token(self, token: str) -> SessionToken:
        payload = json.loads(base64.b64decode(token).decode())
        return SessionToken(
            write_lsn=payload["write_lsn"],
            read_lsn=payload["read_lsn"],
            expires_at=payload["expires_at"]
        )

    def wait_for_lsn(self, server: str, required_lsn: int, timeout_ms: int = 500) -> bool:
        """Poll server until its apply_lsn >= required_lsn or timeout."""
        deadline = time.time() + timeout_ms / 1000
        while time.time() < deadline:
            if self._get_apply_lsn(server) >= required_lsn:
                return True
            time.sleep(0.02)
        return False

    def _get_apply_lsn(self, server: str) -> int:
        # Placeholder: HTTP call to /internal/apply_lsn
        # Returns the server's current replication apply position
        return 0

    def write(self, key: str, value: str) -> int:
        """Write to primary and update session write LSN."""
        primary = self.servers[0]
        # Placeholder: POST /write {key, value} -> returns {"lsn": N}
        returned_lsn = self._do_write(primary, key, value)
        self.write_lsn = max(self.write_lsn, returned_lsn)
        return returned_lsn

    def read(self, key: str, timeout_ms: int = 500) -> Optional[str]:
        """Read from any server that satisfies session LSN requirements."""
        required_lsn = max(self.write_lsn, self.read_lsn)
        # servers[0] is the primary (see write()); try the replicas first.
        for server in self.servers[1:]:
            if self.wait_for_lsn(server, required_lsn, timeout_ms):
                value, served_lsn = self._do_read(server, key)
                self.read_lsn = max(self.read_lsn, served_lsn)
                return value
        # Fall back to primary
        value, served_lsn = self._do_read(self.servers[0], key)
        self.read_lsn = max(self.read_lsn, served_lsn)
        return value

    def _do_write(self, server: str, key: str, value: str) -> int:
        return 0  # Placeholder

    def _do_read(self, server: str, key: str):
        return None, 0  # Placeholder

FAQ

How does read-your-writes differ from linearizability?

Linearizability requires every operation to appear to take effect at a single real-time point, requiring coordination on every read and write. Read-your-writes only requires that a client's own writes are visible in its own subsequent reads. Other clients may temporarily see stale data. This allows reads from replicas with much lower latency.

What does the session token actually contain?

The session token encodes last_write_lsn (the LSN of the client's most recent write) and last_read_lsn (the LSN of the most recent read served to the client), along with a TTL expiry timestamp. It is base64 or JWT encoded so it is opaque to the client but parseable by servers.

What happens when the LSN wait times out?

If a replica cannot catch up to the required LSN within the configured timeout (e.g., 500ms), the server either routes the request to the primary or returns an error. The client may retry against a different replica or accept degraded consistency depending on application requirements.

What happens when a session token expires?

The server stops honoring the LSN requirements of the expired token. The client receives a fresh session token with no LSN constraints and effectively operates at eventual consistency until it performs a new write that anchors the session to a new LSN.

Read-your-writes vs linearizability: Session consistency only constrains the client's own reads; much cheaper than full linearizability.
Session token content: last_write_lsn, last_read_lsn, and expiry timestamp, base64 or JWT encoded.
LSN wait timeout: Configurable per request; on timeout, route to primary or return error.
Session expiry behavior: Token invalidated; client degrades to eventual consistency until next write.

How does read-your-writes guarantee work at the database level?

After a client writes, the database records the write's LSN (log sequence number) or timestamp in the session token; subsequent reads are routed only to replicas whose replication offset is at least that value, blocking or retrying until the replica catches up. This ensures the client always observes its own committed writes even when reads are served from followers.

How is session state routed to the correct replica?

A session token carrying the last-write timestamp or LSN is stored in a cookie or JWT and sent with every request; the load balancer or application layer inspects the token and pins the request to a replica that has replicated at least up to that point. If no such replica is available the request can be promoted to the primary or held in a retry queue.

How does session consistency handle replica failover?

When the pinned replica fails, the session token's LSN is used to select the next best replica whose applied log position satisfies the dependency, or the request falls back to the primary. Because the dependency is encoded in a portable token rather than a sticky TCP connection, failover does not break the read-your-writes guarantee.

What is the difference between session consistency and causal consistency?

Session consistency scopes guarantees to a single client session: that client sees its own writes in order, but two different clients may observe conflicting orderings of each other's operations. Causal consistency is system-wide: if any client observes that write A happened before write B, every client is guaranteed to see A before B, regardless of session boundaries.