Vector Clock Service
Vector clocks track causality in distributed systems without requiring synchronized physical clocks. Each node maintains a vector of logical timestamps — one counter per node in the system.
Vector Clock Mechanics
VC = [VC[0], VC[1], ..., VC[n-1]]
// VC[i] = number of events node i has caused
Rules
- Increment: Before a send event or internal event, node i increments
VC[i]. - Send: Attach the current VC to the message.
- Receive / Merge: For each component j, set
VC[j] = max(local_VC[j], received_VC[j]), then increment own counterVC[i]++.
Causality Detection
- Happened-before (A → B):
VC(A) <= VC(B)— all components of A are less than or equal to the corresponding components of B, with at least one strictly less. - Concurrent (A ∥ B): neither
VC(A) <= VC(B)norVC(B) <= VC(A). - Identical: same event or causally equivalent.
Data Model
VersionedObject {
key: string PRIMARY KEY
value: bytes
vector_clock: JSONB // {"node_id": counter, ...}
updated_by: string // node_id of last writer
updated_at: timestamp
}
Conflict Detection on Write
- Client sends write with its current VC.
- Server reads stored
VC_stored. - Compare: if
VC_client >= VC_stored(client has seen everything stored), write is safe — update and merge clocks. - If concurrent (neither dominates), a conflict exists: store both versions as sibling values.
Conflict Resolution Strategies
- Last-Write-Wins (LWW): use wall clock timestamp as tiebreaker. Simple but loses updates.
- Application-level merge: e.g., shopping cart union — take union of all sibling item sets.
- CRDT: use a data structure that merges automatically (see LLD: CRDT).
- Client-side reconciliation: return all siblings to client; client resolves and writes back.
Dotted Version Vectors
Standard VCs can't distinguish between concurrent writes and causally-related ones when siblings are present. Dotted Version Vectors (DVVs) attach a "dot" (node, counter) to each value, enabling precise tracking of which write introduced each sibling.
DVV = { dot: (node, counter), context: VC }
Practical Implementation: Amazon Dynamo-style KV Store
- Each replica maintains its own VC component.
- On read, coordinator gathers responses from R replicas; returns all siblings if VCs are concurrent.
- On write, coordinator increments its own VC component and writes to W replicas.
- Background anti-entropy: replicas gossip and merge state using VC comparison to detect and repair divergence.
Key Design Decisions
- VC size: grows with number of nodes. Prune stale entries using a garbage collection epoch when nodes leave.
- Storage: store VC as JSONB for flexibility; index on key for fast reads.
- Clock skew: VCs are immune to clock skew — they use logical, not physical, time.
- Read repair: on read, if coordinator detects concurrent siblings, trigger a reconciliation write.
{
“@context”: “https://schema.org”,
“@type”: “FAQPage”,
“mainEntity”: [
{
“@type”: “Question”,
“name”: “What is a vector clock and what problem does it solve?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “A vector clock is an array of logical counters, one per node, that tracks causality in distributed systems without requiring synchronized physical clocks. It solves the problem of determining whether one event happened before another (causality) or whether two events occurred concurrently, which is essential for detecting and resolving write conflicts in distributed databases.”
}
},
{
“@type”: “Question”,
“name”: “How do you detect concurrent events using vector clocks?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Event A happened-before event B if every component of VC(A) is less than or equal to the corresponding component of VC(B), with at least one strictly less. Two events are concurrent if neither dominates the other — A's VC has some components greater than B's and some less, meaning they have no causal relationship.”
}
},
{
“@type”: “Question”,
“name”: “How are vector clock conflicts resolved in a distributed key-value store?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “When a write arrives with a vector clock that is concurrent with the stored version, both versions are kept as siblings. Resolution strategies include last-write-wins using a wall clock tiebreaker, application-level merge (e.g., union of shopping cart items), CRDTs that merge automatically, or returning all siblings to the client for manual reconciliation.”
}
},
{
“@type”: “Question”,
“name”: “What are Dotted Version Vectors and why are they better than plain vector clocks?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Dotted Version Vectors (DVVs) attach a dot — a (node, counter) pair — to each stored value in addition to the causal context vector clock. This allows the system to precisely track which write introduced each sibling value, preventing the false concurrency problem that arises with plain vector clocks when multiple siblings accumulate over time.”
}
}
]
}
See also: Scale AI Interview Guide 2026: Data Infrastructure, RLHF Pipelines, and ML Engineering
See also: Databricks Interview Guide 2026: Spark Internals, Delta Lake, and Lakehouse Architecture