Compare-and-Swap Primitives Low-Level Design: Atomic Operations, CAS Loops, and Distributed CAS with Redis

CAS Semantics

Compare-and-swap (CAS) is a single atomic operation: compare the current value at a memory location with an expected value; if they match, replace it with a new value. The operation returns a boolean (success or failure) or the observed old value, depending on the implementation. The entire read-compare-write sequence is indivisible — no other thread can intervene between the comparison and the swap.

Hardware Implementation

CAS is implemented directly in the CPU instruction set:

x86/x86-64: CMPXCHG (32/64-bit), CMPXCHG16B (128-bit for tagged pointers). The LOCK prefix makes the operation bus-atomic across all cores.
ARM: Load-Linked / Store-Conditional (LDREX / STREX); LL marks the cache line, SC succeeds only if no other store has touched it since the LL. ARMv8.1 adds CAS and CASAL for direct compare-and-swap semantics.

Both implementations operate at the cache-coherence protocol level (MESI). When a thread attempts a CAS on a cache line held in shared state by another core, the cache coherence protocol triggers an invalidation round-trip, which is the dominant latency cost under contention.

CAS Loop Pattern

The CAS loop is the fundamental building block of all lock-free algorithms:

Read the current value at the location (load).
Compute the desired new value based on the read.
CAS(location, read_value, new_value). If the CAS fails (another thread changed the location), go back to step 1 and retry.

Under low contention, the CAS succeeds on the first attempt. Under high contention, multiple threads spin-retry; throughput degrades as CAS failures increase. Exponential backoff or queue-based CAS (CLH lock) mitigates this.

ABA Problem and Mitigations

A CAS-loop implementation is vulnerable to the ABA problem: a value changes from A to B and back to A between a thread's read and its CAS attempt, causing the CAS to succeed despite an intermediate state change. Mitigations include:

Version tags: combine value and a monotonic counter into one atomic word. Java's AtomicStampedReference encodes a stamp alongside the reference. Each successful CAS increments the stamp, making A' distinct from A.
Hazard pointers: protect nodes being accessed from reclamation; prevents address reuse while a thread holds a reference.
Epoch-based reclamation: defer freeing until all concurrent threads have advanced past the reclamation epoch.

Database CAS: Optimistic Concurrency

Relational databases implement CAS semantics at the row level via optimistic concurrency control. The pattern is:

UPDATE orders
SET    status  = 'shipped',
       version = version + 1
WHERE  id      = :row_id
AND    version = :expected_version;

If rows_affected = 1, the CAS succeeded — no other transaction modified the row between the read and the write. If rows_affected = 0, the version has changed and the application must retry. This eliminates explicit row locks for read-heavy workloads with infrequent conflicts.

Redis Distributed CAS: WATCH/MULTI/EXEC

Redis implements distributed CAS via optimistic locking:

WATCH key — mark the key for observation.
MULTI — begin a transaction block.
SET key new_value — queue the write command.
EXEC — execute. Returns nil if the key changed since WATCH; the client retries.

For higher-throughput use, Redis Lua scripts provide atomic CAS without client-side retry. Because Redis executes Lua scripts atomically (single-threaded event loop), no other command can interleave. The script checks the current value, writes if it matches, and returns a success flag — all in one atomic step.

Fetch-and-Add as a CAS Variant

Fetch-and-add (FAA) is a specialized CAS for counters: it atomically adds a delta and returns the old value. Most CPU instruction sets support FAA natively (XADD on x86, LDADD on ARMv8.1). In Redis, INCR and INCRBY implement distributed fetch-and-add. FAA avoids the retry loop entirely when the operation is commutative — two increments from different threads produce the same result regardless of order.

SQL Schema

CREATE TABLE CASOperation (
    id             BIGSERIAL PRIMARY KEY,
    resource_id    VARCHAR(128) NOT NULL,
    expected_value JSONB        NOT NULL,
    new_value      JSONB        NOT NULL,
    succeeded      BOOLEAN      NOT NULL,
    attempted_at   TIMESTAMPTZ  NOT NULL DEFAULT now()
);

CREATE TABLE CASRetryLog (
    id             BIGSERIAL PRIMARY KEY,
    resource_id    VARCHAR(128) NOT NULL,
    attempt_number INT          NOT NULL,
    failure_reason VARCHAR(256),
    retried_at     TIMESTAMPTZ  NOT NULL DEFAULT now()
);

CREATE INDEX idx_cas_resource       ON CASOperation (resource_id, attempted_at DESC);
CREATE INDEX idx_cas_retry_resource ON CASRetryLog   (resource_id, attempt_number);

Python Implementation

import threading
import time

# ── In-memory CAS (simulated with GIL, mirrors real atomic semantics) ─────────

_store      = {}
_store_lock = threading.Lock()

def mem_cas(key, expected, new_value):
    with _store_lock:
        if _store.get(key) == expected:
            _store[key] = new_value
            return True
        return False


# ── Database CAS via optimistic versioning ────────────────────────────────────

def db_cas(conn, row_id, expected_version, new_data):
    """
    Returns True if the CAS succeeded (exactly one row updated).
    Caller must retry on False.
    """
    with conn.cursor() as cur:
        cur.execute(
            """
            UPDATE resources
            SET    data    = %s,
                   version = version + 1
            WHERE  id      = %s
            AND    version = %s
            """,
            (new_data, row_id, expected_version)
        )
        conn.commit()
        return cur.rowcount == 1


# ── Redis CAS via Lua (atomic, no retry at protocol level) ───────────────────

def redis_cas(r, key, expected, new_value):
    """
    Atomically compare-and-swap a Redis key using a Lua script.
    Returns True on success, False if the current value did not match.
    """
    script = (
        "if redis.call('GET', KEYS[1]) == ARGV[1] then "
        "  redis.call('SET', KEYS[1], ARGV[2]) "
        "  return 1 "
        "else "
        "  return 0 "
        "end"
    )
    result = r.eval(script, 1, key, expected, new_value)
    return result == 1


# ── Generic CAS loop ──────────────────────────────────────────────────────────

def cas_loop(read_fn, compute_fn, cas_fn, max_retries=10):
    """
    Generic CAS loop:
      read_fn()            -> current value
      compute_fn(current)  -> desired new value
      cas_fn(current, new) -> bool (success)
    """
    for attempt in range(max_retries):
        current   = read_fn()
        new_value = compute_fn(current)
        if cas_fn(current, new_value):
            return new_value
        backoff = min(0.001 * (2 ** attempt), 0.1)
        time.sleep(backoff)
    raise RuntimeError(f"CAS failed after {max_retries} attempts")


# ── Example: atomic counter increment via CAS loop ───────────────────────────

def increment_counter(r, key):
    def read():
        return int(r.get(key) or 0)
    def compute(v):
        return str(v + 1)
    def cas(expected, new_value):
        return redis_cas(r, key, str(expected), new_value)
    return cas_loop(read, compute, cas)

CAS Contention Patterns

Low contention: CAS succeeds on first attempt; overhead is negligible compared to a mutex acquire.
High contention (hot key): many threads retry; CPU cycles wasted in spin-retry; cache line bounces across cores. Solutions: sharding the counter across N keys and summing for reads, or using a dedicated Fetch-and-Add primitive.
Distributed contention: Redis WATCH/MULTI/EXEC contention causes transaction aborts; Lua script eliminates aborts but serializes at the Redis single thread.

Frequently Asked Questions

How does CAS differ from a mutex?

A mutex blocks a thread when it cannot acquire the lock, putting it to sleep until the lock is released. CAS never blocks — it either succeeds immediately or fails immediately, leaving the caller to decide whether to retry or give up. CAS is better for short operations under low contention; mutexes are better when a critical section is long or contention is consistently high, because sleeping threads consume no CPU while waiting.

How do version tags mitigate ABA?

A version tag (stamp) is incremented on every successful CAS. Even if a pointer or value returns to its original state, the tag will have a higher version. A CAS that checks both the value and the tag will correctly fail because the tag no longer matches the originally observed value, preventing the incorrect success that causes ABA corruption.

When should you use database CAS vs Redis CAS?

Use database CAS (optimistic versioning) when the data must be durably persisted and the conflict rate is low — row locking is avoided for the common path. Use Redis CAS when you need sub-millisecond distributed coordination and can tolerate the durability trade-offs of an in-memory store. Redis Lua CAS is faster but loses data on crash unless AOF or RDB persistence is configured appropriately.

What happens to throughput when a CAS loop has high contention?

Under high contention, CAS failure rates approach 100% for all but one thread per round. Throughput collapses because every retry re-reads the cache line, triggers a cache coherence invalidation, and the cycle repeats. Techniques to reduce CAS contention include per-thread counters with periodic aggregation, fetch-and-add primitives (which are inherently conflict-tolerant), and splitting a single hot location into a striped array of locations.

{
“@context”: “https://schema.org”,
“@type”: “FAQPage”,
“mainEntity”: [
{
“@type”: “Question”,
“name”: “How does CAS differ from a mutex?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “A mutex blocks a thread when it cannot acquire the lock. CAS never blocks — it either succeeds or fails immediately, leaving the caller to retry. CAS is better for short operations under low contention; mutexes are better when critical sections are long or contention is consistently high.”
}
},
{
“@type”: “Question”,
“name”: “How do version tags mitigate the ABA problem?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “A version tag (stamp) is incremented on every successful CAS. Even if a value returns to its original state, the tag will be higher. A CAS checking both value and tag fails because the tag no longer matches, preventing the incorrect success that causes ABA corruption.”
}
},
{
“@type”: “Question”,
“name”: “When should you use database CAS vs Redis CAS?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Use database CAS (optimistic versioning) when data must be durably persisted and conflict rate is low. Use Redis CAS when you need sub-millisecond distributed coordination and can tolerate in-memory durability trade-offs. Redis Lua CAS is faster but requires AOF or RDB for durability.”
}
},
{
“@type”: “Question”,
“name”: “What happens to throughput when a CAS loop has high contention?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Under high contention, CAS failure rates approach 100% for all but one thread. Throughput collapses because every retry triggers cache coherence invalidation cycles. Solutions include per-thread counters with aggregation, fetch-and-add primitives, and striped arrays of locations.”
}
}
]
}

{
“@context”: “https://schema.org”,
“@type”: “FAQPage”,
“mainEntity”: [
{
“@type”: “Question”,
“name”: “How does compare-and-swap work at the hardware level?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “CAS is a single atomic CPU instruction (e.g., CMPXCHG on x86) that reads a memory location, compares it to an expected value, and writes a new value only if they match — all without interruption from other cores. The CPU's cache coherence protocol ensures mutual exclusion at the cache-line level, so no OS-level lock is needed.”
}
},
{
“@type”: “Question”,
“name”: “How is CAS used to implement a distributed counter in Redis?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “In Redis, a distributed counter can use the WATCH/MULTI/EXEC optimistic locking pattern: WATCH the key, read the current value, compute the increment, then execute a transaction that only commits if the key hasn't changed — semantically equivalent to CAS. For higher throughput, Redis's single-threaded INCR command provides atomic increment without needing explicit CAS.”
}
},
{
“@type”: “Question”,
“name”: “What causes a CAS loop to spin and how is it bounded?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “A CAS loop spins when another thread modifies the target memory location between the load and the CAS, causing the CAS to fail and the loop to retry. Spinning can be bounded by introducing exponential backoff between retries or by falling back to a mutex after a fixed number of failed attempts.”
}
},
{
“@type”: “Question”,
“name”: “How does CAS differ from fetch-and-add?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Fetch-and-add atomically increments a value and returns the old value unconditionally, making it simpler and more efficient for counters because it never fails and needs no retry loop. CAS is more general — it can implement any conditional update — but requires the caller to re-read and retry when the comparison fails.”
}
}
]
}