Optimistic Locking Low-Level Design: Version Columns, CAS Updates, and Retry Strategies

What Is Optimistic Locking?

Optimistic locking is a concurrency control strategy based on the assumption that conflicts between concurrent transactions are rare. Instead of acquiring locks before reading or modifying data, the transaction reads data freely and only checks for conflicts at write time. If a conflict is detected (another transaction modified the same data between the read and write), the operation is retried.

Optimistic locking shifts cost from blocking (waiting for locks) to retrying (redoing work on conflict). It is most effective in low-contention workloads where conflicts are genuinely rare.

Version Column Pattern

The most common implementation of optimistic locking in relational databases uses a version column on each row:

Each row has an integer version column (or a updated_at timestamp).
When reading a row, the application reads the current version alongside the data.
When writing back, the UPDATE statement includes a WHERE clause that checks the version: WHERE id = ? AND version = read_version.
If the update affects 1 row, the write succeeded — no conflict.
If the update affects 0 rows, a conflict occurred — another transaction incremented the version. The application must retry.

The version column is incremented on every successful write: SET version = version + 1. This ensures the version strictly increases and stale readers will always detect the conflict.

Compare-and-Swap (CAS) Update

The optimistic locking pattern is a database-level compare-and-swap (CAS) operation:

UPDATE versioned_row
SET data = :new_data, version = version + 1, updated_at = now()
WHERE id = :row_id AND version = :expected_version;
-- Check affected_rows == 1

This is atomic at the database level — the check and the update happen in the same statement, under row-level locking for the duration of the update itself. The key insight is that no lock is held during the read phase, only during the microseconds of the write statement execution.

Retry Strategy: Exponential Backoff with Jitter

When a conflict is detected (affected_rows == 0), the application must retry the read-modify-write cycle. A naive fixed-delay retry can cause retry storms under contention (many threads retrying simultaneously). Best practice:

Exponential backoff: 1st retry after ~50ms, 2nd after ~100ms, 3rd after ~200ms, etc.
Jitter: Add a random component to each delay (e.g., uniform random in [0, delay_ms]) to desynchronize retries from multiple threads.
Max retries: After N retries (e.g., 5), abort with an error rather than retrying indefinitely.
Full jitter formula: sleep = random(0, min(cap, base * 2^attempt))

HTTP ETag Analogy

The same optimistic locking pattern appears in HTTP APIs via ETags:

Client performs GET /resource — response includes ETag: “abc123” (a hash of the resource state).
Client modifies the resource locally and sends PUT /resource with header If-Match: “abc123”.
Server computes the current ETag. If it matches “abc123”, the resource has not changed — server applies the update and returns the new ETag.
If the current ETag differs, server returns 412 Precondition Failed — client must re-fetch and retry.

The ETag functions identically to the version column: it represents the known-good state at read time and validates that no concurrent modification occurred.

When to Use Optimistic vs. Pessimistic Locking

Use optimistic: Read-heavy workloads; low conflict probability; short transactions; reads vastly outnumber writes; geographic distribution where lock acquisition is expensive.
Use pessimistic: High-contention rows (e.g., inventory counts, account balances with many concurrent updaters); long transactions where retrying is expensive; when retry cost exceeds the cost of blocking.

A useful heuristic: if the probability of conflict is above ~10-15%, pessimistic locking typically outperforms optimistic due to retry overhead.

Application Responsibility

Unlike database-level MVCC or 2PL, optimistic locking is not transparent — the application is responsible for the read-modify-write loop and conflict detection. This means:

The retry logic must be implemented in application code.
The business logic inside the retry loop must be idempotent (safe to run multiple times).
Side effects (emails, external API calls) must not be triggered inside the retry loop — only after a successful commit.

SQL Schema

-- Row with version column for optimistic locking
CREATE TABLE VersionedRow (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    version     INTEGER NOT NULL DEFAULT 1,
    data        JSONB NOT NULL,
    updated_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX idx_versionedrow_version ON VersionedRow(id, version);

-- Audit log of detected optimistic locking conflicts
CREATE TABLE OptimisticConflict (
    id               BIGSERIAL PRIMARY KEY,
    row_id           UUID NOT NULL,
    expected_version INTEGER NOT NULL,
    actual_version   INTEGER NOT NULL,
    conflicted_at    TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX idx_optimisticconflict_row ON OptimisticConflict(row_id, conflicted_at);

Python Implementation

import random
import time
import uuid
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple

@dataclass
class VersionedRow:
    row_id: str
    version: int
    data: dict

class OptimisticLockingClient:
    def __init__(self, db):
        self.db = db

    def read_for_update(self, row_id: str) -> Optional[VersionedRow]:
        """Read a row with its current version for use in an optimistic update."""
        row = self.db.query_one(
            "SELECT id, version, data FROM VersionedRow WHERE id = %s",
            (row_id,)
        )
        if row is None:
            return None
        return VersionedRow(row_id=row["id"], version=row["version"], data=row["data"])

    def optimistic_update(self, row_id: str, expected_version: int,
                          new_data: dict) -> bool:
        """
        Perform a CAS update. Returns True on success, False on conflict.
        """
        affected = self.db.execute(
            """UPDATE VersionedRow
               SET data = %s, version = version + 1, updated_at = now()
               WHERE id = %s AND version = %s""",
            (new_data, row_id, expected_version)
        )
        if affected == 0:
            # Log conflict for observability
            actual_version = self._get_current_version(row_id)
            self.db.execute(
                """INSERT INTO OptimisticConflict
                   (row_id, expected_version, actual_version)
                   VALUES (%s, %s, %s)""",
                (row_id, expected_version, actual_version or -1)
            )
        return affected == 1

    def _get_current_version(self, row_id: str) -> Optional[int]:
        row = self.db.query_one(
            "SELECT version FROM VersionedRow WHERE id = %s", (row_id,)
        )
        return row["version"] if row else None

    def retry_with_backoff(self, update_fn: Callable[[], bool],
                           max_retries: int = 5,
                           base_delay_ms: int = 50,
                           cap_ms: int = 2000) -> bool:
        """
        Retry update_fn with exponential backoff and full jitter.
        update_fn returns True on success, False on conflict.
        """
        for attempt in range(max_retries):
            if update_fn():
                return True
            if attempt  bool:
        """
        Full optimistic read-modify-write cycle with retry.
        modify_fn receives current data dict and returns new data dict.
        """
        def attempt() -> bool:
            row = self.read_for_update(row_id)
            if row is None:
                return False
            new_data = modify_fn(row.data)
            return self.optimistic_update(row_id, row.version, new_data)

        return self.retry_with_backoff(attempt, max_retries)

FAQ

{
“@context”: “https://schema.org”,
“@type”: “FAQPage”,
“mainEntity”: [
{
“@type”: “Question”,
“name”: “Should I use a version column or updated_at timestamp for optimistic locking?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Prefer a version integer column. Timestamps can collide if two writes happen within the same timestamp resolution (e.g., within 1ms for millisecond-precision columns), causing a conflict to go undetected. Version integers increment strictly and cannot collide.”
}
},
{
“@type”: “Question”,
“name”: “What is the correct retry backoff strategy for optimistic locking conflicts?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Use exponential backoff with full jitter: delay = random(0, min(cap, base * 2^attempt)). The jitter desynchronizes concurrent retries to prevent thundering herds. A typical configuration is base=50ms, cap=2000ms, max 5 retries before giving up.”
}
},
{
“@type”: “Question”,
“name”: “When should I switch from optimistic to pessimistic locking?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Switch to pessimistic locking when the conflict rate exceeds roughly 10-15%, making retries more expensive than blocking. Hot rows (inventory counts, shared counters, account balances with many concurrent writers) are the classic case where pessimistic locking outperforms optimistic.”
}
},
{
“@type”: “Question”,
“name”: “How does the HTTP ETag pattern relate to optimistic locking?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “HTTP ETags implement optimistic locking at the API layer. The client reads a resource and receives an ETag (hash of state). On update, it sends If-Match: ETag. The server rejects with 412 Precondition Failed if the ETag no longer matches (conflict), requiring the client to re-fetch and retry — identical to the version column CAS pattern.”
}
}
]
}

Version column vs timestamp: Use integer version — timestamps can collide within their resolution precision.
Retry backoff strategy: Exponential backoff with full jitter; base=50ms, cap=2000ms, max 5 retries.
When to use pessimistic instead: Conflict rate above ~10-15%; hot rows with many concurrent writers.
ETag HTTP analogy: GET returns ETag; PUT sends If-Match: ETag; 412 on mismatch — identical to version CAS pattern.

{
“@context”: “https://schema.org”,
“@type”: “FAQPage”,
“mainEntity”: [
{
“@type”: “Question”,
“name”: “How does a version column prevent lost updates?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Each row carries an integer version counter; when a transaction reads a row it records the version, and its UPDATE statement includes a WHERE clause matching both the primary key and the recorded version. If another transaction committed a change in the meantime the version will have incremented and the UPDATE affects zero rows, signaling a conflict that the application must resolve by retrying.”
}
},
{
“@type”: “Question”,
“name”: “How is optimistic locking implemented with a CAS UPDATE?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “A compare-and-swap UPDATE takes the form: UPDATE table SET value = $new, version = version + 1 WHERE id = $id AND version = $expected_version. The application checks the affected row count; zero rows updated means the version did not match and a concurrent writer won, so the transaction must reload the row and retry its business logic.”
}
},
{
“@type”: “Question”,
“name”: “When should optimistic locking be preferred over pessimistic?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Optimistic locking is preferable when read-to-write ratios are high and actual conflicts are rare, since it avoids the overhead of acquiring locks on every read and allows maximum read concurrency. Pessimistic locking is better when conflicts are frequent or the cost of retrying is high (e.g., multi-step workflows), because optimistic retries under heavy contention can degrade into a retry storm.”
}
},
{
“@type”: “Question”,
“name”: “How are retry storms prevented under high contention?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Retry storms are mitigated with exponential backoff plus random jitter so competing transactions do not all retry simultaneously, combined with a maximum retry limit after which the operation fails fast and surfaces an error to the caller. At the architecture level, high-contention rows can be partitioned or use a queue-based serialization pattern to serialize writes without repeated optimistic conflicts.”
}
}
]
}