System Design Interview: Design a Multi-Region Database System

Why Multi-Region Databases?

A single-region database has two critical weaknesses: latency (users in Europe querying a US database experience 150ms+ RTT) and availability (a regional outage takes down the entire service). Multi-region databases address both by replicating data across geographically distributed data centers.

The fundamental challenge is the CAP theorem: in the presence of a network partition between regions, you must choose between consistency (all regions see the same data) and availability (all regions continue to serve requests). Most production systems choose eventual consistency with careful conflict resolution.

Replication Topologies

    Single-Leader (Primary-Replica)

    One region is the primary (accepts writes); other regions are replicas (serve reads). Replication lag is the delay between a write on the primary and its appearance on replicas.

    • Pros: simple, no write conflicts, strong consistency possible with synchronous replication
    • Cons: all writes must go to the primary (adds latency for distant writers), primary failure requires failover (RPO = replication lag, RTO = failover time)
    • Best for: read-heavy workloads where writes can tolerate primary-region latency

    Multi-Leader (Active-Active)

    Multiple regions accept writes. Each leader replicates to others asynchronously.

    • Pros: low write latency from any region, continues operating if any region goes down
    • Cons: write conflicts are inevitable and must be resolved. Common strategies: last-write-wins (LWW, by timestamp, at the risk of silently losing data), application-level merging (CRDT-style), or routing all writes for a given entity to the same region (avoiding conflicts entirely via consistent hashing on the entity ID)
    • Best for: globally distributed write-heavy workloads (Google Docs, collaborative tools)
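The routing strategy above can be sketched with a minimal consistent-hash ring: every write for a given entity ID lands in the same "home" region, so two leaders never accept conflicting writes for that entity. Region names and the virtual-node count are illustrative assumptions.

```python
# Conflict avoidance by routing: all writes for a given entity ID go to the
# same "home" region, chosen via a consistent hash ring.
import bisect
import hashlib

class RegionRing:
    def __init__(self, regions, vnodes=64):
        # Place `vnodes` virtual nodes per region on the ring so keys
        # redistribute evenly when a region is added or removed.
        self._ring = sorted(
            (self._hash(f"{region}:{i}"), region)
            for region in regions
            for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def home_region(self, entity_id: str) -> str:
        # Walk clockwise to the first virtual node at or after the key's hash.
        idx = bisect.bisect_left(self._keys, self._hash(entity_id)) % len(self._ring)
        return self._ring[idx][1]

ring = RegionRing(["us-east", "eu-west", "ap-south"])
# Every write for user 42 routes to the same region, so it never conflicts
# with another leader's concurrent write for that user.
assert ring.home_region("user:42") == ring.home_region("user:42")
```

The virtual nodes matter: with only one point per region on the ring, adding or removing a region would reshuffle a large, uneven fraction of entities.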

    Leaderless (Dynamo-style)

    No designated leader: any node accepts reads and writes. Uses quorum reads and writes: write to W nodes and read from R nodes, where W + R > N (the total number of replicas), so every read quorum overlaps every write quorum. Sloppy quorums plus hinted handoff handle node failures; anti-entropy repair (comparing Merkle trees) reconciles divergent replicas.

    • Pros: highest availability, no single point of failure
    • Cons: complex conflict resolution, eventual consistency by default
    • Examples: Cassandra (tunable quorums), DynamoDB, Riak
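The quorum condition W + R > N guarantees that any read quorum shares at least one node with any write quorum, so a read always contacts at least one node holding the latest write. A brute-force sketch that checks the overlap for every possible pair of quorums:

```python
# Verify the quorum overlap property by enumerating all W-sized write sets
# and R-sized read sets over N nodes and checking they always intersect.
from itertools import combinations

def quorums_overlap(n: int, w: int, r: int) -> bool:
    nodes = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )

# Typical Dynamo-style setting: N=3, W=2, R=2 satisfies W + R > N.
assert quorums_overlap(3, 2, 2)
# W=1, R=1 with N=3 violates it: a read can miss the one written node.
assert not quorums_overlap(3, 1, 1)
```

Tuning W and R trades latency for consistency: W=N, R=1 gives fast reads and slow writes; W=1, R=N the reverse.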

    Conflict Resolution Strategies

    • Last-Write-Wins (LWW): each write carries a timestamp; higher timestamp wins. Simple but risks data loss when clocks are skewed. Used by Cassandra (with caveats).
    • Version vectors: track a vector of (node_id → version) per object. Detects causality — if version A causally precedes B, B wins. If concurrent (neither precedes the other), surface the conflict to the application.
    • CRDTs: Conflict-free Replicated Data Types. Data structures with merge operations that are commutative, associative, and idempotent. Example: G-Counter (grow-only counter), OR-Set (observed-remove set). Operations always converge regardless of order. Best for: counters, sets, shopping carts.
    • Application-level resolution: surface conflicts to the application; it decides (e.g., merge shopping cart items rather than overwrite).
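To make the CRDT idea concrete, here is a minimal G-Counter sketch. Each replica increments only its own slot, and merge takes the per-replica maximum; because max is commutative, associative, and idempotent, replicas converge to the same value no matter the order (or repetition) of merges.

```python
# Minimal G-Counter (grow-only counter) CRDT sketch.
class GCounter:
    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.counts: dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        # A replica only ever advances its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Per-replica max: commutative, associative, idempotent.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

# Two regions increment concurrently, then exchange state in either order.
us, eu = GCounter("us-east"), GCounter("eu-west")
us.increment(3)
eu.increment(2)
us.merge(eu)
eu.merge(us)
assert us.value() == eu.value() == 5  # converged regardless of merge order
```

Note that a G-Counter can only grow; supporting decrements requires a PN-Counter (a pair of G-Counters, one for increments and one for decrements).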

    Read-Your-Writes Consistency

    After a user submits a form, they expect to see their changes immediately — even with async replication. Strategies:

    1. Route the user's reads to the primary for a short window (e.g., one second) after each write
    2. Track the timestamp of the user's last write in a client cookie; serve reads only from replicas that have applied writes up to that timestamp
    3. Route the user to the same region (sticky sessions): their write goes to that region's leader, and subsequent reads from that region see it once the local replica has applied it (or immediately, if reads also hit the leader)
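The timestamp-based strategy can be sketched as follows: the client remembers the timestamp of its last write (for instance in a cookie), and the router serves the read only from a replica that has applied at least that timestamp, falling back to the primary otherwise. The names and the flat timestamp model are illustrative assumptions.

```python
# Timestamp-based read routing for read-your-writes consistency.
def pick_replica(replicas: dict[str, int], last_write_ts: int, primary: str) -> str:
    """replicas maps replica name -> highest write timestamp it has applied."""
    fresh = [name for name, applied in replicas.items() if applied >= last_write_ts]
    # Any fresh replica satisfies read-your-writes; otherwise read the primary.
    return fresh[0] if fresh else primary

replicas = {"eu-replica": 95, "ap-replica": 102}
# The user's last write had timestamp 100: only ap-replica has caught up.
assert pick_replica(replicas, last_write_ts=100, primary="us-primary") == "ap-replica"
# No replica has applied timestamp 110 yet, so fall back to the primary.
assert pick_replica(replicas, last_write_ts=110, primary="us-primary") == "us-primary"
```

This gives per-user read-your-writes without forcing all reads to the primary; other users' reads can still go to any replica.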

    Global Databases in Practice

    CockroachDB / Google Spanner

    Distributed SQL with strong consistency (serializable isolation) across regions. Spanner uses TrueTime (GPS and atomic clocks), while CockroachDB uses Hybrid Logical Clocks (HLC), for global transaction ordering. Trade-off: cross-region transactions pay the latency of a consensus round (Paxos in Spanner, Raft in CockroachDB) across regions, typically 100–200ms for writes spanning two distant regions. Best for: global financial systems where consistency is non-negotiable.
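To illustrate the clock side of this design, here is a simplified Hybrid Logical Clock sketch. An HLC combines physical time with a logical counter so that timestamps respect causality even when physical clocks are skewed; the update rules below are a condensed version of the published HLC algorithm, and `physical_now` is an injected clock for testing.

```python
# Simplified Hybrid Logical Clock (HLC) sketch.
import time

class HLC:
    def __init__(self, physical_now=lambda: int(time.time() * 1000)):
        self.physical_now = physical_now
        self.l = 0  # highest physical timestamp observed so far
        self.c = 0  # logical counter breaking ties within one tick

    def now(self):
        """Timestamp a local or send event."""
        pt = self.physical_now()
        if pt > self.l:
            self.l, self.c = pt, 0
        else:
            self.c += 1  # physical clock hasn't advanced: use the counter
        return (self.l, self.c)

    def update(self, remote):
        """Merge a timestamp received from another node, then tick."""
        rl, rc = remote
        pt = self.physical_now()
        if pt > self.l and pt > rl:
            self.l, self.c = pt, 0
        elif rl > self.l:
            self.l, self.c = rl, rc + 1  # remote is ahead: adopt and step past it
        elif self.l > rl:
            self.c += 1
        else:  # self.l == rl: take the larger counter and step past it
            self.c = max(self.c, rc) + 1
        return (self.l, self.c)

# A node with a lagging physical clock still orders after a received event.
lagging = HLC(physical_now=lambda: 1000)
ts = lagging.update((2000, 0))   # remote is "ahead" in physical time
assert ts == (2000, 1)
assert lagging.now() > ts        # subsequent local events keep advancing
```

The payoff: HLC timestamps are comparable as plain tuples, stay close to wall-clock time, and never order an effect before its cause, all without specialized hardware.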

    Cassandra (Multi-DC)

    Tunable consistency: LOCAL_QUORUM reads and writes require a quorum only in the local datacenter (fast, but inter-DC replication is async). EACH_QUORUM (for writes) requires a quorum in every datacenter (stronger, but pays cross-DC latency). In practice: use LOCAL_QUORUM for most operations and accept eventual consistency between regions.
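The cost difference between these levels comes down to simple quorum arithmetic, where quorum(n) = n//2 + 1. A sketch (the level and datacenter names mirror Cassandra's terminology; the function itself is illustrative):

```python
# How many replica acknowledgements each consistency level needs, given a
# replication factor (RF) per datacenter.
def quorum(n: int) -> int:
    return n // 2 + 1

def nodes_required(level: str, rf_per_dc: dict[str, int], local_dc: str) -> int:
    if level == "LOCAL_QUORUM":
        return quorum(rf_per_dc[local_dc])                   # local DC only: fast
    if level == "EACH_QUORUM":
        return sum(quorum(rf) for rf in rf_per_dc.values())  # every DC: slow
    raise ValueError(f"unknown level: {level}")

# RF=3 in each of two datacenters:
rf = {"us-east": 3, "eu-west": 3}
assert nodes_required("LOCAL_QUORUM", rf, "us-east") == 2  # 2 local acks
assert nodes_required("EACH_QUORUM", rf, "us-east") == 4   # 2 acks per DC
```

With LOCAL_QUORUM, the write path never leaves the local datacenter, which is exactly why it is fast and why cross-region consistency is only eventual.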

    DynamoDB Global Tables

    Multi-active replication across AWS regions. Uses last-write-wins at the item level, with DynamoDB Streams propagating changes. Writes to any region replicate to all others within seconds. Best for: AWS-native applications needing global low-latency reads and writes.

    Schema and Migration Strategy

    Multi-region schema changes are dangerous — a migration that runs region-by-region creates a window where different regions have different schemas. Follow expand-contract pattern:

    1. Expand: add the new column (nullable, backward-compatible). Deploy everywhere.
    2. Migrate: backfill existing rows; write to both the old and new columns.
    3. Switch reads: once all regions read from the new column, stop writing the old one.
    4. Contract: drop the old column once all regions are fully migrated.
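The dual-write phase (steps 2 and 3) can be sketched in application code: while regions are mid-migration, the application writes both columns and a per-region flag controls which one reads use. The column and flag names are illustrative assumptions, and the "row" is modeled as a plain dict.

```python
# Dual-write phase of an expand-contract migration.
def write_user_name(row: dict, full_name: str) -> None:
    # Keep populating the legacy column so not-yet-migrated readers still work...
    row["name"] = full_name
    # ...while also writing the new, backward-compatible column.
    row["full_name"] = full_name

def read_user_name(row: dict, read_from_new_column: bool) -> str:
    # Flip this flag per region only after the backfill has completed there.
    return row["full_name"] if read_from_new_column else row["name"]

row: dict = {}
write_user_name(row, "Ada Lovelace")
# Both the old and new read paths see the same value during the transition.
assert read_user_name(row, read_from_new_column=False) == "Ada Lovelace"
assert read_user_name(row, read_from_new_column=True) == "Ada Lovelace"
```

Because every intermediate state is valid for both old and new code, the migration can pause, resume, or roll back at any step without breaking any region.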

    Interview Framework

    When asked to design a multi-region database system, address these in order:

    1. Consistency requirement: strong (financial) vs. eventual (social feed). This determines your topology.
    2. Write pattern: write-heavy and global (multi-leader), or write-primary with global reads (single-leader)?
    3. Conflict resolution: LWW for simple cases, CRDTs for counters/sets, application-level for complex merges.
    4. Read-your-writes: sticky sessions or timestamp-based routing.
    5. Failure modes: region partition — what does the system do? Degrade to single-region? Accept stale reads? Reject writes?
    6. Schema migrations: expand-contract pattern, never breaking changes across region boundaries.

    Frequently Asked Questions

    What is the CAP theorem and how does it apply to multi-region databases?

    The CAP theorem says a distributed system can guarantee only two of three properties: Consistency (every read sees the latest write), Availability (every request gets a non-error response), and Partition tolerance (the system continues operating despite network partitions between nodes). In practice, partition tolerance is mandatory for any distributed system, because network partitions happen. So the real choice is CP (consistency + partition tolerance, sacrificing availability: reject requests during partitions) vs. AP (availability + partition tolerance, sacrificing strong consistency: return potentially stale data during partitions). CockroachDB/Spanner: CP. Cassandra/DynamoDB: AP (tunable). Most multi-region systems choose AP with eventual consistency for reads, accepting that users may briefly see stale data, because availability is more valuable than perfect consistency for most use cases.

    How do you handle write conflicts in a multi-leader database setup?

    In a multi-leader (active-active) setup, two users in different regions can simultaneously write to the same record, creating a conflict. Resolution strategies: (1) Last-Write-Wins (LWW): each write has a timestamp; the higher timestamp wins. Simple, but risks data loss when concurrent writes occur with similar timestamps. Used by Cassandra. (2) Version vectors: track causal history. If write A causally precedes B, B wins; concurrent writes are detected and surfaced to the application to resolve. (3) CRDTs: use data structures whose operations commute and converge (counters, sets). No conflicts by construction, since merges always produce the same result regardless of order. (4) Avoid conflicts by routing writes for the same entity to the same region (consistent hashing on user_id). A good default for user-facing data: combine routing (the same user always writes to the same region) with LWW as a fallback.

    How does CockroachDB achieve strong consistency across multiple regions?

    CockroachDB uses the Raft consensus protocol within each range (a 64MB chunk of key-value data). Each range has a leader and a set of replicas. Writes go to the leader, which replicates to a quorum (majority) of replicas before acknowledging. For cross-region strong consistency, a write must reach a quorum of replicas; if replicas span regions, this requires at least one cross-region round trip (~100–200ms of latency). CockroachDB uses Hybrid Logical Clocks (HLC) to maintain causal ordering without GPS clocks. For read-heavy global workloads, it supports "follower reads" with bounded staleness (reads from a nearby, slightly stale replica), sacrificing strict consistency for local read latency, similar to Google Spanner's stale reads. Use CockroachDB when you need SQL, ACID transactions, and strong consistency at the cost of write latency.
