Why Multi-Region?
Single-region deployments have two problems: (1) Latency — a user in Tokyo making a request to a US-East server experiences 150ms round-trip time before any processing begins. (2) Availability — if the US-East region goes down, all users worldwide lose service. Multi-region architecture addresses both: serve users from the closest region (< 20ms RTT) and survive complete region failure. The challenge: keeping data consistent across geographically distributed databases separated by hundreds of milliseconds of network latency.
The CAP Theorem Applied
CAP Theorem: in the presence of a network partition, a distributed system can provide either Consistency (all nodes see the same data) or Availability (all requests receive a response), but not both simultaneously. For multi-region systems: the network between regions is effectively always “partitioned” — a message from us-east to eu-west takes 80ms one way. Choosing consistency: writes block until all regions acknowledge → every write has 80ms+ additional latency. Choosing availability: serve reads locally with potentially stale data → users in different regions may see different states. Most global systems choose availability + eventual consistency for reads, with strong consistency only where business-critical (payment balances, inventory counts).
Active-Passive (Primary-Secondary) Replication
One region is the primary (handles all writes); other regions are replicas (handle reads, replicate from primary). Write flow: write goes to primary → primary commits → asynchronously replicates to replicas → replicas apply the change (eventually consistent). Read flow from a replica: may return stale data if replication lag is non-zero. Use cases: read-heavy systems where slight staleness is acceptable (product catalog, blog posts, user profiles). Failover: if the primary fails, promote a replica to primary. Replica may be slightly behind — possible data loss (RPO > 0). Synchronous replication (write ack after all replicas confirm) eliminates data loss but adds cross-region write latency.
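The write/read flow above can be sketched as a toy primary-replica pair. This is a minimal illustration, assuming a simple pull-based change log; the `Primary`/`Replica` names are illustrative, not any real database API:

```python
class Primary:
    """Accepts all writes and appends them to an ordered change log."""
    def __init__(self):
        self.data = {}
        self.log = []  # shipped to replicas asynchronously

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Replica:
    """Replays the primary's log; reads may lag behind (eventual consistency)."""
    def __init__(self, primary):
        self.primary = primary
        self.data = {}
        self.applied = 0  # position in the primary's log

    def replicate(self, batch=1):
        # Real systems stream changes continuously (e.g. WAL shipping);
        # pulling a bounded batch here simulates replication lag.
        for key, value in self.primary.log[self.applied:self.applied + batch]:
            self.data[key] = value
            self.applied += 1

    def read(self, key):
        return self.data.get(key)  # possibly stale if replication lags


primary = Primary()
replica = Replica(primary)
primary.write("price", 100)
primary.write("price", 120)
replica.replicate(batch=1)
print(replica.read("price"))  # 100 (stale: one log entry behind)
replica.replicate(batch=1)
print(replica.read("price"))  # 120 (caught up)
```

The gap between the primary's log position and the replica's `applied` counter is exactly the replication lag that determines RPO on failover.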
Active-Active (Multi-Primary) Replication
Multiple regions accept writes simultaneously. Writes from different regions are replicated to all other regions and merged. Challenge: conflict resolution when the same record is updated in two regions simultaneously. Conflict resolution strategies:
- Last-write-wins (LWW): accept the write with the latest timestamp. Simple but can lose data — a write with a slightly earlier timestamp is silently discarded. Suitable for user profile updates where overwriting is acceptable.
- Vector clocks: each write carries a version vector (one counter per region). Concurrent writes (neither dominates the other’s vector) are flagged as conflicts for application-level resolution. Used by Amazon’s original Dynamo system (the design behind DynamoDB); DynamoDB Global Tables itself resolves conflicts with last-write-wins.
- CRDTs: design the data structure to support automatic conflict-free merging — counters, sets, registers with specific semantics.
- Application-level: detect conflicts and present to the user (Dropbox conflict copies, Google Docs version history).
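As a concrete example of the CRDT approach, here is a minimal G-Counter sketch: one counter slot per region, with merge defined as the per-slot maximum. Merge is commutative, associative, and idempotent, so replicas converge regardless of delivery order. The region names are illustrative:

```python
class GCounter:
    """Grow-only counter CRDT: one slot per region, merge = per-slot max."""
    def __init__(self, regions):
        self.counts = {r: 0 for r in regions}

    def increment(self, region, n=1):
        # Each region only ever increments its own slot.
        self.counts[region] += n

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Taking the max per slot is commutative, associative, and
        # idempotent, so any replication order converges to the same state.
        for region, count in other.counts.items():
            self.counts[region] = max(self.counts.get(region, 0), count)


REGIONS = ["us-east", "eu-west"]
a = GCounter(REGIONS)  # replica in us-east
b = GCounter(REGIONS)  # replica in eu-west
a.increment("us-east", 3)
b.increment("eu-west", 2)
a.merge(b)
b.merge(a)
print(a.value(), b.value())  # 5 5 (both replicas converge)
```

Note the restriction that makes this conflict-free: the counter can only grow. Supporting decrement requires a PN-Counter (two G-Counters, one for increments and one for decrements).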
Global Load Balancing
Route users to the nearest healthy region via: (1) GeoDNS — DNS server returns different IP addresses based on the client’s geographic origin (resolved via IP geolocation). Client queries DNS → receives IP of the nearest region → connects to that region. Propagation delay: DNS TTL must be short (e.g., 60 seconds) to quickly redirect traffic during failover, but a short TTL increases DNS query volume. (2) Anycast — same IP address is advertised from multiple regions. BGP routing automatically directs packets to the nearest advertisement. Used by Cloudflare for its network (single IP, routed to nearest PoP). (3) Application-layer redirect — the application itself detects the user’s location and redirects to the nearest region’s endpoint. Adds one redirect hop but is more flexible.
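The GeoDNS routing decision can be sketched as a nearest-region picker. This toy version assumes we already have the client's coordinates; real GeoDNS resolvers derive location from an IP geolocation database, and the region coordinates below are approximate and purely illustrative:

```python
import math

# Approximate (lat, lon) per region; illustrative values, not exact
# datacenter locations.
REGIONS = {
    "us-east-1": (38.9, -77.0),       # N. Virginia area
    "eu-west-1": (53.3, -6.2),        # Dublin area
    "ap-northeast-1": (35.7, 139.7),  # Tokyo area
}


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2)
         * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))


def nearest_region(client, healthy=None):
    """Pick the closest healthy region, like a GeoDNS answer would."""
    candidates = healthy if healthy is not None else set(REGIONS)
    return min(candidates, key=lambda r: haversine_km(client, REGIONS[r]))


tokyo_user = (35.6, 139.6)
print(nearest_region(tokyo_user))  # ap-northeast-1
# Failover: with Tokyo marked unhealthy, traffic shifts to the next-closest
# healthy region.
print(nearest_region(tokyo_user, healthy={"us-east-1", "eu-west-1"}))
```

The `healthy` parameter is where the short-TTL trade-off shows up: the resolver can only shift traffic this way once clients' cached DNS answers expire.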
Database Global Patterns
Google Spanner: globally distributed SQL database with external consistency (stronger than serializable). Uses TrueTime (GPS + atomic clocks) to assign globally ordered timestamps with bounded uncertainty. Achieves strong consistency across regions at the cost of write latency (commits must wait out the TrueTime uncertainty interval, typically a few milliseconds, plus cross-region Paxos). Used internally by F1, the database behind Google’s advertising systems. Available as Cloud Spanner on GCP.
CockroachDB: open-source distributed SQL database with serializable isolation across regions. Uses Raft consensus per range (contiguous key spans, 512 MiB by default). Write latency in multi-region mode: Raft quorum requires acknowledgment from a majority of replicas (e.g., 2 of 3 regions) → latency ≈ RTT to the nearest non-local region. Configurable home regions per table or row — data locality reduces latency for region-specific data.
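The quorum-latency point can be made concrete with a back-of-the-envelope calculation. The RTT numbers below are illustrative, not measured:

```python
# Illustrative round-trip times (ms) between three hypothetical regions.
RTT_MS = {
    ("us-east", "us-west"): 60,
    ("us-east", "eu-west"): 80,
    ("us-west", "eu-west"): 140,
}


def rtt(a, b):
    """Symmetric RTT lookup."""
    if a == b:
        return 0
    return RTT_MS.get((a, b), RTT_MS.get((b, a)))


def quorum_write_latency(leader, regions, quorum):
    # The leader's own vote is free; the write commits once the
    # (quorum - 1)-th fastest follower acknowledges.
    follower_rtts = sorted(rtt(leader, r) for r in regions if r != leader)
    return follower_rtts[quorum - 2]


regions = ["us-east", "us-west", "eu-west"]
# 3 replicas, quorum of 2: latency is the RTT to the nearest peer region.
print(quorum_write_latency("us-east", regions, quorum=2))  # 60
# Requiring all 3 replicas: latency is the RTT to the farthest region.
print(quorum_write_latency("us-east", regions, quorum=3))  # 80
```

This is why placing two of three replicas on the same continent (quorum reachable locally) is a common latency optimization, at the cost of weaker survivability if that continent is lost.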
DynamoDB Global Tables: fully managed multi-active replication across multiple AWS regions. Last-write-wins conflict resolution. Typically sub-second replication between regions. Cross-region reads are eventually consistent; strongly consistent reads only reflect writes made in the local region.
Data Residency and Compliance
GDPR restricts transfers of EU users’ personal data outside the EU/EEA unless safeguards apply (an adequacy decision or standard contractual clauses); many organizations therefore keep EU personal data in EU regions. Multi-region architecture must respect data residency: route EU users to EU regions, store PII in EU-only databases, apply geo-fencing to replication (EU data replicated only within the EU). Operational challenge: support queries that join EU and US data (analytics, fraud detection) without violating residency — anonymize/aggregate before cross-region transfer.
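Geo-fenced replication can be expressed as a simple policy check, assuming a region-to-jurisdiction mapping maintained in configuration. The mapping below is illustrative:

```python
# Illustrative region -> jurisdiction mapping; in practice this lives in
# replication topology configuration.
JURISDICTION = {
    "eu-west-1": "EU",
    "eu-central-1": "EU",
    "us-east-1": "US",
    "us-west-2": "US",
}


def replication_targets(source_region, all_regions):
    """Geo-fencing policy: data may only replicate within its jurisdiction."""
    fence = JURISDICTION[source_region]
    return [r for r in all_regions
            if r != source_region and JURISDICTION[r] == fence]


regions = list(JURISDICTION)
print(replication_targets("eu-west-1", regions))  # ['eu-central-1']
print(replication_targets("us-east-1", regions))  # ['us-west-2']
```

In a real system this check would gate the replication stream itself (and ideally be enforced again at the receiving side), so a misconfigured topology fails closed rather than leaking PII across the fence.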
Interview Checklist
- Latency requirement → multi-region; single region for simpler deployments
- Read-heavy + tolerate staleness → active-passive (primary-secondary)
- High availability for writes globally → active-active (multi-primary)
- Conflict resolution: LWW (simple, can lose writes), vector clocks (precise conflict detection), CRDTs (merge-friendly)
- Global routing: GeoDNS (flexibility) or Anycast (low overhead)
- Strong consistency globally: Spanner / CockroachDB (at cost of write latency)
- Compliance: data residency constraints drive region assignments
{
"@context": "https://schema.org",
"@type": "FAQPage",
"mainEntity": [
{
"@type": "Question",
"name": "What does the CAP theorem mean in practice for multi-region systems?",
"acceptedAnswer": {
"@type": "Answer",
"text": "The CAP theorem states that a distributed system can only guarantee two of three properties: Consistency (all nodes see the same data simultaneously), Availability (every request receives a response), and Partition Tolerance (the system continues operating despite network partitions). In practice, network partitions are unavoidable in distributed systems — the network between two datacenters in different countries will occasionally be slow or unreliable. So the real choice is between Consistency and Availability during a partition. CP systems (choose consistency over availability): if a network partition prevents a region from confirming the latest data, it stops serving requests rather than risk returning stale data. Example: a bank's balance query waits until it can confirm the current balance, even if this means a timeout for the user. AP systems (choose availability over consistency): continue serving requests even during partitions, accepting that different regions may temporarily see different data. Example: a product catalog shows slightly stale prices during a partition — better than a 503 error. Most web-scale systems choose AP for user-facing features (show stale data rather than errors) and CP for financial operations (balance, inventory count). The PACELC extension adds: even in the absence of partitions, there is a trade-off between Latency (respond fast) and Consistency (wait for replication)."
}
},
{
"@type": "Question",
"name": "How do active-active multi-region systems handle write conflicts?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Write conflicts occur when the same data is modified in two regions before the changes replicate to each other. Resolution strategies: (1) Last-Write-Wins (LWW): each write includes a timestamp. When conflicting writes are detected, the one with the later timestamp wins. Problem: clock skew between regions means 'later' is unreliable — NTP typically synchronizes clocks only to within a few milliseconds, so a write in us-east 500 microseconds after a write in eu-west might carry an earlier clock time. Solution: use logical clocks (Lamport timestamps or hybrid logical clocks) instead of wall clocks. (2) Vector clocks: each write carries a vector of counters (one per region). Two writes with non-comparable vectors (neither dominates) are flagged as concurrent conflicts — the application decides how to merge. Amazon's original Dynamo design used this approach. (3) CRDTs (Conflict-free Replicated Data Types): design the data structure so any two concurrent operations can be automatically merged without conflict. Examples: G-Counter (increment only, merge = max per region), LWW-Element-Set (set with timestamps), OR-Set (supports add and remove). CRDTs eliminate conflicts entirely for their supported operations. (4) Operational Transformation (the Google Docs approach): serialize all operations through a single coordinator, transforming concurrent operations so they commute. Requires a central authority — limits availability during a partition."
}
},
{
"@type": "Question",
"name": "How does Google Cloud Spanner achieve global strong consistency?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Google Spanner provides external consistency — a stronger guarantee than serializable isolation — across globally distributed databases. The key innovation is TrueTime: Google's globally synchronized clock system using GPS receivers and atomic clocks in every datacenter. TrueTime provides timestamps with a known error bound: TT.now() returns (earliest, latest) with the guarantee that the true current time is within this interval. Typically the interval is 1-7ms wide. Spanner uses this to assign globally consistent commit timestamps: when a transaction commits, it picks a timestamp T such that T is after all previously committed transactions. Spanner waits for the TrueTime uncertainty interval to pass before returning to the client ('commit wait'), ensuring that any subsequent transaction will have a later TrueTime reading. This wait (1-7ms) is the price of external consistency. Why it works: because TrueTime gives real-time guarantees (not just logical ordering), Spanner can assign globally monotonic timestamps without central coordination. A read at timestamp T is guaranteed to see all transactions committed before T, regardless of which replica serves the read. Practical implications: Spanner multi-region writes take ~50-100ms (cross-region Paxos round trip) versus ~10ms for single-region. Use Spanner when data correctness across regions is non-negotiable (financial systems, global inventory); use eventually consistent systems when performance matters more than perfect consistency."
}
}
]
}