Low Level Design: CRDT (Conflict-Free Replicated Data Types)

CRDT: Conflict-Free Replicated Data Types

CRDTs are data structures that guarantee strong eventual consistency: any two replicas that have received the same set of updates will converge to the same state, regardless of the order in which updates were applied.

Two CRDT Families

State-based CRDTs (CvRDT — Convergent)

Replicas periodically exchange their full state. The merge function must be:

Commutative: merge(A, B) = merge(B, A)
Associative: merge(merge(A, B), C) = merge(A, merge(B, C))
Idempotent: merge(A, A) = A

Network requirement: at-least-once delivery of state is sufficient.

Operation-based CRDTs (CmRDT — Commutative)

Replicas exchange operations instead of full state (more bandwidth-efficient). Requirements:

Operations must be delivered exactly once to each replica.
Concurrent operations must be commutative (order doesn't matter).

G-Counter (Grow-Only Counter)

State:  counters[node_id] = int  // per-node counter array
Increment(node_id): counters[node_id]++
Value(): sum(counters)
Merge(A, B): for each i, result[i] = max(A[i], B[i])

Guarantees: only increments; no decrement possible. Merge is idempotent.

PN-Counter (Positive-Negative Counter)

State:  P = G-Counter  // increments
        N = G-Counter  // decrements
Increment(): P.increment(node_id)
Decrement(): N.increment(node_id)
Value(): P.value() - N.value()
Merge(A, B): merge both P and N G-Counters separately

OR-Set (Observed-Remove Set)

Supports add and remove of elements. Resolves add/remove conflicts in favor of add.

State:  adds    = set of (element, unique_tag)
        removes = set of unique_tag  // tombstones
Add(e): generate unique_tag; adds.insert((e, unique_tag))
Remove(e): for each (e, tag) in adds, removes.insert(tag)
Contains(e): exists (e, tag) in adds where tag not in removes
Merge(A, B): adds = A.adds ∪ B.adds
             removes = A.removes ∪ B.removes

LWW-Register (Last-Write-Wins Register)

State:  value     = any
        timestamp = int  // logical or physical timestamp
Write(v, ts): if ts > timestamp then value=v, timestamp=ts
Read(): value
Merge(A, B): keep the one with higher timestamp

Simple and efficient; risks data loss if timestamps are not monotonic or two writes have the same timestamp.

RGA (Replicated Growable Array)

Used for collaborative text editing. Each character has a globally unique ID and a predecessor ID:

Character {
    id:          (node_id, sequence_number)
    predecessor: character_id  // insert after this character
    value:       char
    tombstone:   bool          // true if deleted
}

Insert and delete operations are commutative. The document is reconstructed by ordering characters by their predecessor relationships and using unique IDs to break ties deterministically.

Persistence and Replication

Storage: serialize CRDT state as JSONB in a relational DB or document store.
Gossip protocol: nodes periodically exchange state with random peers; each merge converges toward the global state.
Delta-CRDTs: instead of shipping full state, ship only the "delta" — the minimal state that represents recent changes — reducing bandwidth.

Choosing the Right CRDT

Counters: G-Counter or PN-Counter (metrics, likes, inventory adjustments).
Sets: OR-Set (tags, memberships, shopping cart items).
Registers: LWW-Register (user profile fields, configuration).
Sequences: RGA or LSEQ (collaborative document editing).
Maps: OR-Map (recursive CRDT — each key maps to another CRDT).

Key Design Decisions

State-based vs op-based: prefer state-based for simplicity; op-based for bandwidth-constrained environments.
Tombstone accumulation: OR-Set tombstones grow unboundedly — implement periodic garbage collection once all replicas have seen the removal.
Unique tag generation: use (node_id, Lamport timestamp) or UUID to guarantee global uniqueness without coordination.
Causal consistency: pair CRDTs with vector clocks to ensure operations are only applied after their causal dependencies.

{
“@context”: “https://schema.org”,
“@type”: “FAQPage”,
“mainEntity”: [
{
“@type”: “Question”,
“name”: “What is a CRDT and what consistency guarantee does it provide?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “A CRDT (Conflict-Free Replicated Data Type) is a data structure designed so that any two replicas that have received the same set of updates will converge to the same state, regardless of the order updates were applied. This guarantees strong eventual consistency without requiring coordination or consensus between replicas.”
}
},
{
“@type”: “Question”,
“name”: “What is the difference between state-based and operation-based CRDTs?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “State-based CRDTs (CvRDTs) exchange full replica state; the merge function must be commutative, associative, and idempotent, so at-least-once delivery suffices. Operation-based CRDTs (CmRDTs) exchange only operations, which must be commutative and delivered exactly once. CmRDTs use less bandwidth but require more reliable messaging infrastructure.”
}
},
{
“@type”: “Question”,
“name”: “How does an OR-Set handle concurrent add and remove of the same element?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Each add operation tags the element with a unique identifier. A remove only tombstones the specific tags it observed at remove time. If a concurrent add uses a new unique tag, that tag is not tombstoned, so the element remains present after merge. This means concurrent add wins over concurrent remove, which is the intended semantics for most collaborative applications.”
}
},
{
“@type”: “Question”,
“name”: “When should you use a CRDT instead of a consensus-based approach like Raft?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Use CRDTs when you need high availability and can tolerate eventual consistency — for example, counters, shopping carts, collaborative text editing, or presence indicators. Use consensus (Raft, Paxos) when you need strong consistency and linearizability, such as for financial transactions, distributed locks, or leader election. CRDTs require no coordination and work well under network partitions; consensus requires a quorum and may block during partitions.”
}
}
]
}