Snapshot Isolation Low-Level Design: Read View Construction, Write-Write Conflict Detection, and MVCC Integration

Snapshot isolation (SI) is the transaction isolation level used by PostgreSQL (as its default), Oracle, SQL Server, and many distributed databases. It prevents dirty reads and non-repeatable reads while allowing higher concurrency than serializable isolation. Understanding its mechanics — read view construction, write-write conflict detection, and MVCC integration — is essential for database system design interviews.

Snapshot Isolation Guarantees

Under snapshot isolation:

No dirty reads: a transaction never sees uncommitted data from other transactions
No non-repeatable reads: re-reading the same row within a transaction always returns the same version
No phantom reads: the snapshot is fixed at transaction start, so new rows inserted by concurrent transactions are invisible
Write skew allowed: two concurrent transactions can each read a consistent snapshot and make updates that are individually valid but collectively inconsistent (more on this below)

Snapshot isolation is weaker than serializable (which prevents write skew). PostgreSQL's Serializable Snapshot Isolation (SSI) adds anti-dependency tracking to detect and prevent write skew, achieving full serializability.

Read View Construction

When a transaction begins, the database captures a read view (snapshot):

ReadView = {
    min_active_xid:  smallest transaction ID currently active,
    active_xids_set: set of all currently active (uncommitted) transaction IDs,
    snapshot_xid:    the next transaction ID that will be assigned (exclusive upper bound)
}

A row version created by transaction x is visible to this snapshot if:

x < min_active_xid — committed before any active transaction started, OR
x < snapshot_xid AND x is NOT in active_xids_set — committed before snapshot was taken and not one of the currently active transactions

In other words: the transaction sees all committed state as of its start time, and nothing that was not yet committed.

Write-Write Conflict Detection

Snapshot isolation uses the first-writer-wins rule for concurrent writes to the same row:

Transaction A writes row R (creates a new version with xmin = A.xid)
Transaction B tries to write row R while A is still active
B detects that R's current version has xmin of an active transaction
B is aborted with a serialization failure

This prevents lost updates — the scenario where B overwrites A's change without seeing it.

MVCC Integration

Snapshot isolation is implemented on top of MVCC (Multi-Version Concurrency Control). Each row has:

xmin: transaction ID that created this version
xmax: transaction ID that deleted or superseded this version (0 = still current)

Old row versions are retained in the heap. The visibility rule determines which version a given transaction sees. When xmax is set (by UPDATE or DELETE), the old version remains visible to transactions whose snapshot predates the deletion.

Write Skew: The Classic Example

Two doctors share on-call coverage. Hospital policy: at least one doctor must always be on call. Both check the rule at the same snapshot: Doctor A sees Doctor B is on call (true), Doctor B sees Doctor A is on call (true). Both decide to take off. Both commits succeed — there is no write-write conflict because they modified different rows. Result: no doctors on call. This is write skew.

Preventing write skew requires either:

Serializable Snapshot Isolation (SSI): track read-write anti-dependencies; abort if a dangerous cycle is detected
Explicit locking: SELECT FOR UPDATE on the rows being checked forces serialization

Vacuum and Garbage Collection

Old row versions accumulate as updates and deletes happen. Vacuum removes row versions where xmax < oldest_active_xid — no running transaction can possibly need to see that old version. Without vacuum, tables bloat with dead tuples and index entries pointing to them.

The oldest_active_xid is the minimum transaction ID still running. Long-running transactions prevent vacuum from reclaiming space — a major operational concern in high-write PostgreSQL deployments.

SQL Schema

-- Transaction lifecycle tracking
CREATE TABLE Transaction (
    id           BIGSERIAL PRIMARY KEY,
    status       TEXT NOT NULL DEFAULT 'active',  -- active, committed, aborted
    started_at   TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    committed_at TIMESTAMPTZ,
    CONSTRAINT chk_tx_status CHECK (status IN ('active','committed','aborted'))
);

-- Row versions (heap representation)
CREATE TABLE RowVersion (
    id         BIGSERIAL PRIMARY KEY,
    table_name TEXT NOT NULL,
    row_id     BIGINT NOT NULL,
    xmin       BIGINT NOT NULL,   -- creating transaction ID
    xmax       BIGINT,            -- deleting transaction ID (NULL = current version)
    data       JSONB NOT NULL
);

CREATE INDEX idx_rv_row ON RowVersion(table_name, row_id, xmin);
CREATE INDEX idx_rv_xmax ON RowVersion(xmax) WHERE xmax IS NOT NULL;

-- Snapshot captures at transaction start
CREATE TABLE ActiveSnapshot (
    xid            BIGINT PRIMARY KEY,  -- transaction ID
    min_active_xid BIGINT NOT NULL,
    active_xids    JSONB NOT NULL,      -- array of active XIDs at snapshot time
    snapshot_at    TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

Python Implementation

from dataclasses import dataclass
from typing import Optional
import threading


@dataclass
class RowVersion:
    row_id: int
    xmin: int           # creating transaction
    xmax: Optional[int] # deleting/updating transaction; None = current
    data: dict


@dataclass
class Snapshot:
    xid: int
    min_active_xid: int
    active_xids: frozenset[int]
    snapshot_xid: int   # next XID at snapshot time


class MVCCStorage:
    """Simplified in-memory MVCC storage for snapshot isolation demo."""
    def __init__(self):
        self._versions: dict[int, list[RowVersion]] = {}
        self._next_xid = 1
        self._active_xids: set[int] = set()
        self._committed_xids: set[int] = set()
        self._lock = threading.Lock()

    def begin_transaction(self) -> Snapshot:
        with self._lock:
            xid = self._next_xid
            self._next_xid += 1
            snapshot = Snapshot(
                xid=xid,
                min_active_xid=min(self._active_xids) if self._active_xids else xid,
                active_xids=frozenset(self._active_xids),
                snapshot_xid=xid
            )
            self._active_xids.add(xid)
        return snapshot

    def commit(self, snapshot: Snapshot) -> None:
        with self._lock:
            self._active_xids.discard(snapshot.xid)
            self._committed_xids.add(snapshot.xid)

    def abort(self, snapshot: Snapshot) -> None:
        with self._lock:
            self._active_xids.discard(snapshot.xid)
            # Roll back versions created by this transaction
            for versions in self._versions.values():
                to_remove = [v for v in versions if v.xmin == snapshot.xid]
                for v in to_remove:
                    versions.remove(v)
                # Restore xmax nulled by this transaction
                for v in versions:
                    if v.xmax == snapshot.xid:
                        v.xmax = None

    def get_visible_version(self, row_id: int, snapshot: Snapshot) -> Optional[RowVersion]:
        """Return the row version visible to the given snapshot."""
        versions = self._versions.get(row_id, [])
        for v in reversed(versions):
            if self._is_visible(v, snapshot):
                return v
        return None

    def _is_visible(self, v: RowVersion, snap: Snapshot) -> bool:
        # xmin must be committed and predate snapshot
        if not self._was_committed_before(v.xmin, snap):
            return False
        # xmax must not be committed before snapshot (row not yet deleted)
        if v.xmax is not None and self._was_committed_before(v.xmax, snap):
            return False
        return True

    def _was_committed_before(self, xid: int, snap: Snapshot) -> bool:
        if xid == snap.xid:
            return True  # own writes are visible
        if xid in snap.active_xids:
            return False  # was active at snapshot time
        if xid >= snap.snapshot_xid:
            return False  # started after snapshot
        return xid in self._committed_xids

    def detect_write_conflict(self, row_id: int, writing_xid: int) -> bool:
        """Return True if another active transaction already modified this row."""
        versions = self._versions.get(row_id, [])
        for v in versions:
            if v.xmin in self._active_xids and v.xmin != writing_xid:
                return True  # First-writer-wins: abort writing_xid
        return False

    def write_version(self, row_id: int, data: dict, xid: int) -> None:
        """Create a new row version. Mark old current version with xmax."""
        versions = self._versions.setdefault(row_id, [])
        for v in versions:
            if v.xmax is None:
                v.xmax = xid
        versions.append(RowVersion(row_id=row_id, xmin=xid, xmax=None, data=data))

    def vacuum_old_versions(self, oldest_active_xid: int) -> int:
        """Remove row versions that are no longer visible to any active transaction."""
        removed = 0
        for versions in self._versions.values():
            dead = [v for v in versions if v.xmax is not None and v.xmax < oldest_active_xid]
            for v in dead:
                versions.remove(v)
                removed += 1
        return removed

Snapshot Isolation vs Serializable

Snapshot isolation prevents the most common anomalies with low overhead — no lock contention on reads. Serializable isolation adds tracking of read-write conflicts (anti-dependencies) and aborts transactions when a dangerous cycle is detected. The abort rate under SSI is low for most workloads, making SSI practical. If your application uses SELECT FOR UPDATE to explicitly lock rows before making decisions, snapshot isolation is effectively equivalent to serializable for those operations.

{
“@context”: “https://schema.org”,
“@type”: “FAQPage”,
“mainEntity”: [
{
“@type”: “Question”,
“name”: “What is the difference between snapshot isolation and serializable isolation?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Snapshot isolation prevents dirty reads, non-repeatable reads, and phantom reads by giving each transaction a consistent snapshot at start time. However, it allows write skew — two concurrent transactions can each read a consistent snapshot and make updates that are individually valid but collectively inconsistent. Serializable isolation prevents write skew by tracking read-write anti-dependencies (Serializable Snapshot Isolation) and aborting transactions that would create inconsistent outcomes.”
}
},
{
“@type”: “Question”,
“name”: “Can you give an example of write skew under snapshot isolation?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “The classic example: two doctors share on-call coverage with a rule requiring at least one on call. Transaction A reads that Doctor B is on call, so A takes off. Transaction B reads that Doctor A is on call, so B takes off. Both read from consistent snapshots, both modifications touch different rows (no write-write conflict), both commits succeed. Result: no doctors on call — a violation of the invariant. This anomaly cannot occur under serializable isolation.”
}
},
{
“@type”: “Question”,
“name”: “How is a read view constructed for snapshot isolation?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “When a transaction starts, the database records three values: min_active_xid (smallest currently active transaction ID), active_xids_set (all currently active transaction IDs), and snapshot_xid (the next XID to be assigned). A row version is visible if its creating transaction committed before any of these active transactions started, or committed after the oldest active but before the snapshot was taken and is not in the active set. This allows the transaction to see a consistent point-in-time snapshot.”
}
},
{
“@type”: “Question”,
“name”: “How does vacuum timing affect snapshot isolation in PostgreSQL?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Vacuum can only reclaim dead row versions (those with xmax set) when no running transaction has a snapshot that predates the deletion. The oldest active transaction's snapshot determines the vacuum horizon. Long-running transactions block vacuum from reclaiming dead tuples, causing table bloat and index bloat. In high-write PostgreSQL deployments, preventing long-running transactions (or using idle-in-transaction timeouts) is critical for controlling table bloat.”
}
}
]
}

{
“@context”: “https://schema.org”,
“@type”: “FAQPage”,
“mainEntity”: [
{
“@type”: “Question”,
“name”: “How is a transaction's read view constructed at snapshot isolation?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “At transaction start, the system records (min_active_xid, active_xids_set, current_max_xid); a row version is visible if its creating XID is committed and either less than min_active_xid or not in active_xids_set, ensuring the transaction sees a consistent point-in-time view.”
}
},
{
“@type”: “Question”,
“name”: “What is write skew and how does it occur under snapshot isolation?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Write skew: two concurrent transactions each read a shared condition, both decide to write based on the condition being true, but their combined writes violate a constraint that neither write alone would violate; snapshot isolation allows this because each transaction sees a snapshot before the other's write.”
}
},
{
“@type”: “Question”,
“name”: “How does first-writer-wins conflict detection prevent write-write conflicts?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “If transaction B tries to update a row that transaction A has already updated (committed or in-progress), B is aborted; this ensures two concurrent transactions cannot both modify the same row, preventing update conflicts at the cost of some aborts.”
}
},
{
“@type”: “Question”,
“name”: “How does vacuum determine which row versions to remove?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Vacuum collects the oldest active transaction XID (the horizon); row versions with xmax less than the horizon are dead (no active transaction can see them) and can be physically removed, reclaiming storage.”
}
}
]
}