System Design Interview: Design a Multi-Region Database System

Why Multi-Region Databases?

A single-region database has two critical weaknesses: latency (users in Europe querying a US database experience 150ms+ RTT) and availability (a regional outage takes down the entire service). Multi-region databases address both by replicating data across geographically distributed data centers.

The fundamental challenge is the CAP theorem: in the presence of a network partition between regions, you must choose between consistency (all regions see the same data) and availability (all regions continue to serve requests). Most production systems choose eventual consistency with careful conflict resolution.

Replication Topologies

    Single-Leader (Primary-Replica)

    One region is the primary (accepts writes); other regions are replicas (serve reads). Replication lag is the delay between a write on the primary and its appearance on replicas.

    • Pros: simple, no write conflicts, strong consistency possible with synchronous replication
    • Cons: all writes must go to the primary (adds latency for distant writers), primary failure requires failover (RPO = replication lag, RTO = failover time)
    • Best for: read-heavy workloads where writes can tolerate primary-region latency

    Multi-Leader (Active-Active)

    Multiple regions accept writes. Each leader replicates to others asynchronously.

    • Pros: low write latency from any region, continues operating if any region goes down
    • Cons: write conflicts are inevitable and must be resolved. Common strategies: last-write-wins (LWW, by timestamp, at the risk of silently losing data), application-level merging (CRDT-style), or routing all writes for a given entity to the same region (avoiding conflicts entirely via consistent hashing on the entity ID)
    • Best for: globally distributed write-heavy workloads (Google Docs, collaborative tools)
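The routing strategy above can be sketched with a minimal consistent-hash ring: every write for a given entity ID lands in the same "home" region, so two leaders never accept conflicting writes for that entity. Region names and the virtual-node count are illustrative assumptions.

```python
# Conflict avoidance by routing: all writes for a given entity ID go to the
# same "home" region, chosen via a consistent hash ring.
import bisect
import hashlib

class RegionRing:
    def __init__(self, regions, vnodes=64):
        # Place `vnodes` virtual nodes per region on the ring so keys
        # redistribute evenly when a region is added or removed.
        self._ring = sorted(
            (self._hash(f"{region}:{i}"), region)
            for region in regions
            for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def home_region(self, entity_id: str) -> str:
        # Walk clockwise to the first virtual node at or after the key's hash.
        idx = bisect.bisect_left(self._keys, self._hash(entity_id)) % len(self._ring)
        return self._ring[idx][1]

ring = RegionRing(["us-east", "eu-west", "ap-south"])
# Every write for user 42 routes to the same region, so it never conflicts
# with another leader's concurrent write for that user.
assert ring.home_region("user:42") == ring.home_region("user:42")
```

The virtual nodes matter: with only one point per region on the ring, adding or removing a region would reshuffle a large, uneven fraction of entities.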

    Leaderless (Dynamo-style)

    No designated leader: any node accepts reads and writes. Uses quorum reads and writes: write to W nodes and read from R nodes, where W + R > N (the total number of replicas), so every read quorum overlaps every write quorum. Sloppy quorums plus hinted handoff handle node failures; anti-entropy repair (comparing Merkle trees) reconciles divergent replicas.

    • Pros: highest availability, no single point of failure
    • Cons: complex conflict resolution, eventual consistency by default
    • Examples: Cassandra (tunable quorums), DynamoDB, Riak
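The quorum condition W + R > N guarantees that any read quorum shares at least one node with any write quorum, so a read always contacts at least one node holding the latest write. A brute-force sketch that checks the overlap for every possible pair of quorums:

```python
# Verify the quorum overlap property by enumerating all W-sized write sets
# and R-sized read sets over N nodes and checking they always intersect.
from itertools import combinations

def quorums_overlap(n: int, w: int, r: int) -> bool:
    nodes = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )

# Typical Dynamo-style setting: N=3, W=2, R=2 satisfies W + R > N.
assert quorums_overlap(3, 2, 2)
# W=1, R=1 with N=3 violates it: a read can miss the one written node.
assert not quorums_overlap(3, 1, 1)
```

Tuning W and R trades latency for consistency: W=N, R=1 gives fast reads and slow writes; W=1, R=N the reverse.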

    Conflict Resolution Strategies

    • Last-Write-Wins (LWW): each write carries a timestamp; higher timestamp wins. Simple but risks data loss when clocks are skewed. Used by Cassandra (with caveats).
    • Version vectors: track a vector of (node_id → version) per object. Detects causality — if version A causally precedes B, B wins. If concurrent (neither precedes the other), surface the conflict to the application.
    • CRDTs: Conflict-free Replicated Data Types. Data structures with merge operations that are commutative, associative, and idempotent. Example: G-Counter (grow-only counter), OR-Set (observed-remove set). Operations always converge regardless of order. Best for: counters, sets, shopping carts.
    • Application-level resolution: surface conflicts to the application; it decides (e.g., merge shopping cart items rather than overwrite).
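To make the CRDT idea concrete, here is a minimal G-Counter sketch. Each replica increments only its own slot, and merge takes the per-replica maximum; because max is commutative, associative, and idempotent, replicas converge to the same value no matter the order (or repetition) of merges.

```python
# Minimal G-Counter (grow-only counter) CRDT sketch.
class GCounter:
    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.counts: dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        # A replica only ever advances its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Per-replica max: commutative, associative, idempotent.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

# Two regions increment concurrently, then exchange state in either order.
us, eu = GCounter("us-east"), GCounter("eu-west")
us.increment(3)
eu.increment(2)
us.merge(eu)
eu.merge(us)
assert us.value() == eu.value() == 5  # converged regardless of merge order
```

Note that a G-Counter can only grow; supporting decrements requires a PN-Counter (a pair of G-Counters, one for increments and one for decrements).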

    Read-Your-Writes Consistency

    After a user submits a form, they expect to see their changes immediately — even with async replication. Strategies:

    1. Route the user's reads to the primary for a short window (e.g., one second) after each write
    2. Track the timestamp of the user's last write in a client cookie; serve reads only from replicas that have applied writes up to that timestamp
    3. Route the user to the same region (sticky sessions): their write goes to that region's leader, and subsequent reads from that region see it once the local replica has applied it (or immediately, if reads also hit the leader)
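The timestamp-based strategy can be sketched as follows: the client remembers the timestamp of its last write (for instance in a cookie), and the router serves the read only from a replica that has applied at least that timestamp, falling back to the primary otherwise. The names and the flat timestamp model are illustrative assumptions.

```python
# Timestamp-based read routing for read-your-writes consistency.
def pick_replica(replicas: dict[str, int], last_write_ts: int, primary: str) -> str:
    """replicas maps replica name -> highest write timestamp it has applied."""
    fresh = [name for name, applied in replicas.items() if applied >= last_write_ts]
    # Any fresh replica satisfies read-your-writes; otherwise read the primary.
    return fresh[0] if fresh else primary

replicas = {"eu-replica": 95, "ap-replica": 102}
# The user's last write had timestamp 100: only ap-replica has caught up.
assert pick_replica(replicas, last_write_ts=100, primary="us-primary") == "ap-replica"
# No replica has applied timestamp 110 yet, so fall back to the primary.
assert pick_replica(replicas, last_write_ts=110, primary="us-primary") == "us-primary"
```

This gives per-user read-your-writes without forcing all reads to the primary; other users' reads can still go to any replica.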

    Global Databases in Practice

    CockroachDB / Google Spanner

    Distributed SQL with strong consistency (serializable isolation) across regions. Spanner uses TrueTime (GPS and atomic clocks), while CockroachDB uses Hybrid Logical Clocks (HLC), for global transaction ordering. Trade-off: cross-region transactions pay the latency of a consensus round (Paxos in Spanner, Raft in CockroachDB) across regions, typically 100–200ms for writes spanning two distant regions. Best for: global financial systems where consistency is non-negotiable.
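To illustrate the clock side of this design, here is a simplified Hybrid Logical Clock sketch. An HLC combines physical time with a logical counter so that timestamps respect causality even when physical clocks are skewed; the update rules below are a condensed version of the published HLC algorithm, and `physical_now` is an injected clock for testing.

```python
# Simplified Hybrid Logical Clock (HLC) sketch.
import time

class HLC:
    def __init__(self, physical_now=lambda: int(time.time() * 1000)):
        self.physical_now = physical_now
        self.l = 0  # highest physical timestamp observed so far
        self.c = 0  # logical counter breaking ties within one tick

    def now(self):
        """Timestamp a local or send event."""
        pt = self.physical_now()
        if pt > self.l:
            self.l, self.c = pt, 0
        else:
            self.c += 1  # physical clock hasn't advanced: use the counter
        return (self.l, self.c)

    def update(self, remote):
        """Merge a timestamp received from another node, then tick."""
        rl, rc = remote
        pt = self.physical_now()
        if pt > self.l and pt > rl:
            self.l, self.c = pt, 0
        elif rl > self.l:
            self.l, self.c = rl, rc + 1  # remote is ahead: adopt and step past it
        elif self.l > rl:
            self.c += 1
        else:  # self.l == rl: take the larger counter and step past it
            self.c = max(self.c, rc) + 1
        return (self.l, self.c)

# A node with a lagging physical clock still orders after a received event.
lagging = HLC(physical_now=lambda: 1000)
ts = lagging.update((2000, 0))   # remote is "ahead" in physical time
assert ts == (2000, 1)
assert lagging.now() > ts        # subsequent local events keep advancing
```

The payoff: HLC timestamps are comparable as plain tuples, stay close to wall-clock time, and never order an effect before its cause, all without specialized hardware.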

    Cassandra (Multi-DC)

    Tunable consistency: LOCAL_QUORUM reads and writes require a quorum only in the local datacenter (fast, but inter-DC replication is async). EACH_QUORUM (for writes) requires a quorum in every datacenter (stronger, but pays cross-DC latency). In practice: use LOCAL_QUORUM for most operations and accept eventual consistency between regions.
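The cost difference between these levels comes down to simple quorum arithmetic, where quorum(n) = n//2 + 1. A sketch (the level and datacenter names mirror Cassandra's terminology; the function itself is illustrative):

```python
# How many replica acknowledgements each consistency level needs, given a
# replication factor (RF) per datacenter.
def quorum(n: int) -> int:
    return n // 2 + 1

def nodes_required(level: str, rf_per_dc: dict[str, int], local_dc: str) -> int:
    if level == "LOCAL_QUORUM":
        return quorum(rf_per_dc[local_dc])                   # local DC only: fast
    if level == "EACH_QUORUM":
        return sum(quorum(rf) for rf in rf_per_dc.values())  # every DC: slow
    raise ValueError(f"unknown level: {level}")

# RF=3 in each of two datacenters:
rf = {"us-east": 3, "eu-west": 3}
assert nodes_required("LOCAL_QUORUM", rf, "us-east") == 2  # 2 local acks
assert nodes_required("EACH_QUORUM", rf, "us-east") == 4   # 2 acks per DC
```

With LOCAL_QUORUM, the write path never leaves the local datacenter, which is exactly why it is fast and why cross-region consistency is only eventual.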

    DynamoDB Global Tables

    Multi-active replication across AWS regions. Uses last-write-wins at the item level, with DynamoDB Streams propagating changes. Writes to any region replicate to all others within seconds. Best for: AWS-native applications needing global low-latency reads and writes.

    Schema and Migration Strategy

    Multi-region schema changes are dangerous — a migration that runs region-by-region creates a window where different regions have different schemas. Follow expand-contract pattern:

    1. Expand: add the new column (nullable, backward-compatible). Deploy everywhere.
    2. Migrate: backfill existing rows; write to both the old and new columns.
    3. Switch reads: once all regions read from the new column, stop writing the old one.
    4. Contract: drop the old column once all regions are fully migrated.
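The dual-write phase (steps 2 and 3) can be sketched in application code: while regions are mid-migration, the application writes both columns and a per-region flag controls which one reads use. The column and flag names are illustrative assumptions, and the "row" is modeled as a plain dict.

```python
# Dual-write phase of an expand-contract migration.
def write_user_name(row: dict, full_name: str) -> None:
    # Keep populating the legacy column so not-yet-migrated readers still work...
    row["name"] = full_name
    # ...while also writing the new, backward-compatible column.
    row["full_name"] = full_name

def read_user_name(row: dict, read_from_new_column: bool) -> str:
    # Flip this flag per region only after the backfill has completed there.
    return row["full_name"] if read_from_new_column else row["name"]

row: dict = {}
write_user_name(row, "Ada Lovelace")
# Both the old and new read paths see the same value during the transition.
assert read_user_name(row, read_from_new_column=False) == "Ada Lovelace"
assert read_user_name(row, read_from_new_column=True) == "Ada Lovelace"
```

Because every intermediate state is valid for both old and new code, the migration can pause, resume, or roll back at any step without breaking any region.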

    Interview Framework

    When asked to design a multi-region database system, address these in order:

    1. Consistency requirement: strong (financial) vs. eventual (social feed). This determines your topology.
    2. Write pattern: write-heavy and global (multi-leader), or write-primary with global reads (single-leader)?
    3. Conflict resolution: LWW for simple cases, CRDTs for counters/sets, application-level for complex merges.
    4. Read-your-writes: sticky sessions or timestamp-based routing.
    5. Failure modes: region partition — what does the system do? Degrade to single-region? Accept stale reads? Reject writes?
    6. Schema migrations: expand-contract pattern, never breaking changes across region boundaries.

    Frequently Asked Questions

    What is the CAP theorem and how does it apply to multi-region databases?

    The CAP theorem says a distributed system can guarantee only two of three properties: Consistency (every read sees the latest write), Availability (every request gets a non-error response), and Partition tolerance (the system continues operating despite network partitions between nodes). In practice, partition tolerance is mandatory for any distributed system, because network partitions happen. So the real choice is CP (consistency + partition tolerance, sacrificing availability: reject requests during partitions) vs. AP (availability + partition tolerance, sacrificing strong consistency: return potentially stale data during partitions). CockroachDB/Spanner: CP. Cassandra/DynamoDB: AP (tunable). Most multi-region systems choose AP with eventual consistency for reads, accepting that users may briefly see stale data, because availability is more valuable than perfect consistency for most use cases.

    How do you handle write conflicts in a multi-leader database setup?

    In a multi-leader (active-active) setup, two users in different regions can simultaneously write to the same record, creating a conflict. Resolution strategies: (1) Last-Write-Wins (LWW): each write has a timestamp; the higher timestamp wins. Simple, but risks data loss when concurrent writes occur with similar timestamps. Used by Cassandra. (2) Version vectors: track causal history. If write A causally precedes B, B wins; concurrent writes are detected and surfaced to the application to resolve. (3) CRDTs: use data structures whose operations commute and converge (counters, sets). No conflicts by construction, since merges always produce the same result regardless of order. (4) Avoid conflicts by routing writes for the same entity to the same region (consistent hashing on user_id). A good default for user-facing data: combine routing (the same user always writes to the same region) with LWW as a fallback.

    How does CockroachDB achieve strong consistency across multiple regions?

    CockroachDB uses the Raft consensus protocol within each range (a 64MB chunk of key-value data). Each range has a leader and a set of replicas. Writes go to the leader, which replicates to a quorum (majority) of replicas before acknowledging. For cross-region strong consistency, a write must reach a quorum of replicas; if replicas span regions, this requires at least one cross-region round trip (~100–200ms of latency). CockroachDB uses Hybrid Logical Clocks (HLC) to maintain causal ordering without GPS clocks. For read-heavy global workloads, it supports "follower reads" with bounded staleness (reads from a nearby, slightly stale replica), sacrificing strict consistency for local read latency, similar to Google Spanner's stale reads. Use CockroachDB when you need SQL, ACID transactions, and strong consistency at the cost of write latency.
