Why Multi-Region?
Single-region deployments fail entirely during a cloud provider region outage (multi-AZ protects against AZ failure, but not against losing the whole region). Multi-region provides: (1) High availability — survive a full region outage without downtime. (2) Latency — serve users from geographically close regions (50ms vs 200ms). (3) Compliance — data residency requirements (GDPR: EU data must stay in EU). (4) Disaster recovery — RTO (Recovery Time Objective) of minutes, not hours.
Architecture Patterns
Active-Passive: one primary region handles all writes; standby regions serve reads and can be promoted on failure. Simpler consistency (single write source). Failover takes 1-5 minutes (health-check detection + DNS propagation). Writes complete at single-region latency (no cross-region consensus), but distant users pay the round trip to the primary.
Active-Active: all regions accept reads and writes simultaneously. Lowest latency for users globally. Requires distributed consensus or eventual consistency — concurrent writes to different regions for the same record must be resolved (last-write-wins, vector clocks, or CRDT). Used by DynamoDB Global Tables, Cassandra, CockroachDB.
Active-Active with region affinity: users are pinned to a primary region by their user_id hash. Writes from user always go to their home region. Cross-region writes happen only for shared resources. Avoids most conflict scenarios while still serving reads globally.
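The region-affinity pinning above can be sketched as a stable hash of the user ID. This is a minimal illustration, assuming a hypothetical fixed region list; the region names are examples, not a real deployment.

```python
import hashlib

# Illustrative region list; any stable ordering works, but changing it
# re-homes users, so treat it as append-only in practice.
REGIONS = ["us-east-1", "eu-west-1", "ap-southeast-1"]

def home_region(user_id: str) -> str:
    """Pin a user to one home region via a stable hash of user_id.

    All of the user's writes go to this region, so concurrent
    cross-region writes to their records never happen.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(REGIONS)
    return REGIONS[index]
```

Because the hash is deterministic, every service instance in every region computes the same home region for a given user without coordination.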
Data Replication
Primary Region (us-east-1):
  PostgreSQL Primary → Kafka (CDC via Debezium)
    → Kafka MirrorMaker 2 → Secondary Region Kafka
    → applied to PostgreSQL Read Replica (eu-west-1)
Secondary Region (eu-west-1):
  Read Replica: serves reads with ~100ms replication lag
  Promoted to primary on failover
CDC (Change Data Capture) via Debezium captures every DB change as a Kafka event. Replication lag is typically under 1 second at normal write volume. Monitoring: track the replication_lag metric; alert if it exceeds 30 seconds.
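For PostgreSQL, replication lag can be measured by comparing WAL positions (LSNs) between primary and replica. A sketch of the LSN arithmetic, assuming LSN strings in PostgreSQL's standard `hi/lo` hex format (the functions here are illustrative helpers, not a client library):

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a PostgreSQL LSN like '16/B374D848' to an absolute byte offset.

    The part before the slash is the high 32 bits; the part after is
    the low 32 bits, both hexadecimal.
    """
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) + int(lo, 16)

def replication_lag_bytes(primary_lsn: str, replica_lsn: str) -> int:
    """Bytes of WAL the replica has not yet received (0 means caught up)."""
    return lsn_to_bytes(primary_lsn) - lsn_to_bytes(replica_lsn)
```

In production you would feed this from `pg_current_wal_lsn()` on the primary and `pg_last_wal_receive_lsn()` on the replica, and alert when the byte lag (or the equivalent time lag) crosses the 30-second threshold above.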
Global Load Balancing
Route traffic to the nearest healthy region: (1) GeoDNS: DNS returns different IPs based on the client’s geographic location. TTL=60s — failover takes up to 60s. (2) Anycast routing: announce the same IP from multiple regions; BGP routes to the nearest. Used by Cloudflare. (3) Global load balancer (AWS Global Accelerator, GCP Global LB): layer-4/7 routing with health checks; failover in <30 seconds. Production recommendation: use a global LB with health checks for API traffic; GeoDNS for static assets via CDN.
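The health-check ejection logic a global load balancer applies can be sketched in a few lines. This is a simplified model, assuming the "3 consecutive failures" policy described in the failover procedure; a real LB also handles half-open recovery and per-endpoint probes.

```python
class RegionHealth:
    """Track consecutive health-check failures per region.

    A region is ejected from routing after `failure_threshold`
    consecutive failures; a single success resets the counter.
    """

    def __init__(self, failure_threshold: int = 3):
        self.failure_threshold = failure_threshold
        self.consecutive_failures: dict[str, int] = {}

    def record(self, region: str, healthy: bool) -> None:
        if healthy:
            self.consecutive_failures[region] = 0
        else:
            self.consecutive_failures[region] = (
                self.consecutive_failures.get(region, 0) + 1
            )

    def is_healthy(self, region: str) -> bool:
        return self.consecutive_failures.get(region, 0) < self.failure_threshold
```

With checks every 10 seconds, three consecutive failures gives the ~30-second detection window quoted above.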
Failover Procedure
- Health checks detect primary region failure (3 consecutive failures over 30s)
- Global LB stops routing new traffic to the failed region
- Verify replication lag on the standby is within the acceptable threshold before promotion (compare pg_last_wal_receive_lsn() on the standby against the primary's last known LSN)
- Promote the read replica in the secondary region to primary (pg_promote() in PostgreSQL)
- Update application config to point writes at the new primary
- Alert the on-call team; begin RCA
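The runbook above can be expressed as a guarded, ordered procedure. This is a hedged sketch: the returned step strings are illustrative descriptions, not real commands, and `max_lag` mirrors the 30-second alerting threshold from the monitoring section.

```python
def failover(failed_primary: str, standby: str, lag_seconds: float,
             max_lag: float = 30.0) -> list[str]:
    """Return the ordered failover actions, refusing if the standby is stale.

    Aborting when lag exceeds the RPO budget prevents promoting a
    replica that would silently lose recent writes.
    """
    if lag_seconds > max_lag:
        raise RuntimeError(
            f"standby lag {lag_seconds}s exceeds {max_lag}s RPO budget; abort"
        )
    return [
        f"eject {failed_primary} from the global load balancer",
        f"run pg_promote() on the {standby} read replica",
        f"point application write config at {standby}",
        "page the on-call team and open an RCA",
    ]
```

The lag guard encodes the key design point: a failover that violates the RPO budget should be a deliberate human decision, not an automatic one.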
RTO: 1-5 minutes. RPO (Recovery Point Objective): data up to the replication lag at the moment of failure — typically <1s, worst case ~30s if lag had just reached the alerting threshold.
Conflict Resolution for Active-Active
When two regions accept writes for the same record concurrently: (1) Last-Write-Wins (LWW): compare timestamps; most recent update wins. Risk: clock skew between regions can cause earlier writes to incorrectly win. Use hybrid logical clocks (HLC) instead of wall clocks. (2) Application-level merging: for counters, use CRDTs (Conflict-free Replicated Data Types) — a CRDT counter merges by taking the max of each node’s value. (3) Conflict detection + manual resolution: detect conflicts (same record modified in two regions during the same window), store both versions, application resolves on next read. Used by Amazon’s shopping cart (Dynamo paper).
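The CRDT counter described in (2) can be shown concretely. A minimal G-Counter sketch: each region increments only its own slot, and merge takes the per-region max, so merges are commutative, associative, and idempotent (replayed or reordered replication events cannot corrupt the count).

```python
class GCounter:
    """Grow-only counter CRDT: one slot per region, merge by per-slot max."""

    def __init__(self):
        self.counts: dict[str, int] = {}  # region -> that region's count

    def increment(self, region: str, amount: int = 1) -> None:
        """Each region only ever increments its own slot."""
        self.counts[region] = self.counts.get(region, 0) + amount

    def value(self) -> int:
        """The logical counter value is the sum over all regions."""
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        """Take the element-wise max; safe to apply repeatedly or out of order."""
        for region, count in other.counts.items():
            self.counts[region] = max(self.counts.get(region, 0), count)
```

Because merge is idempotent, replicating the same state twice (a common occurrence with at-least-once delivery over Kafka) leaves the counter unchanged — exactly the property that makes CRDTs suitable for active-active replication.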
Key Design Decisions
- Active-passive for most services — simpler consistency, acceptable 1-5min failover
- Active-active only for user-specific data where region affinity eliminates most conflicts
- CDC via Kafka for cross-region replication — replayable, auditable, decoupled
- Global LB with health checks — faster failover than GeoDNS alone
- Monitor replication lag continuously — stale replica is worse than no replica