System Design Interview: Multi-Region Active-Active Architecture

Why Multi-Region Active-Active?

A single-region active-passive setup (one primary region serves traffic, one standby region for failover) has two problems: (1) users in distant regions experience high latency — a user in Tokyo reading from us-east-1 adds 150ms of network RTT to every request; (2) the primary region is a potential single point of failure. Active-active deployment runs live traffic in multiple regions simultaneously, each handling local users. This reduces latency by routing users to the nearest region and provides true HA — a region failure affects only local users, not the global service.

Key Design Challenges

  1. Write routing: where do writes go when you have multiple active primaries?
  2. Conflict resolution: what happens when two regions modify the same data concurrently?
  3. Data consistency: how stale can reads be when reading from a local replica?
  4. Regional isolation: failures in one region must not cascade to others
  5. Cross-region latency: network RTT between regions (50-200ms) limits synchronous operations

Traffic Routing: GeoDNS and Anycast

GeoDNS resolves the same domain to different IP addresses based on the user’s geographic location. AWS Route 53 latency-based routing measures actual latency from each AWS region to the user’s DNS resolver and routes to the lowest-latency region. Anycast: the same IP address is advertised from multiple data centers; BGP routing automatically sends packets to the nearest one. Cloudflare uses anycast for all its edge nodes.


# Route 53 latency-based routing:
us-east-1:  api.example.com → 34.x.x.x
eu-west-1:  api.example.com → 52.x.x.x
ap-east-1:  api.example.com → 13.x.x.x
# Route 53 measures latency from user to each region and resolves accordingly
# Health checks: if a region becomes unhealthy, DNS stops routing to it
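The routing decision above can be sketched in a few lines. This is a toy model, not how Route 53 works internally: the region names, IPs, and latency figures are illustrative, and real resolvers work from continuously updated measurements to the user's DNS resolver.

```python
# Toy GeoDNS-style resolver: pick the lowest-latency healthy region.
# Region names, IPs, and latencies below are illustrative assumptions.

REGION_IPS = {
    "us-east-1": "34.0.0.1",
    "eu-west-1": "52.0.0.1",
    "ap-east-1": "13.0.0.1",
}

def resolve(latency_ms: dict, healthy: set) -> str:
    """Return the IP of the lowest-latency healthy region."""
    candidates = {r: l for r, l in latency_ms.items() if r in healthy}
    if not candidates:
        raise RuntimeError("no healthy region available")
    best = min(candidates, key=candidates.get)
    return REGION_IPS[best]

# A Tokyo resolver sees ap-east-1 as closest:
ip = resolve({"us-east-1": 160, "eu-west-1": 220, "ap-east-1": 30},
             healthy={"us-east-1", "eu-west-1", "ap-east-1"})
# When ap-east-1's health check fails, DNS fails over to the next-best region:
ip_failover = resolve({"us-east-1": 160, "eu-west-1": 220, "ap-east-1": 30},
                      healthy={"us-east-1", "eu-west-1"})
```

The health-check set models the Route 53 behavior described above: an unhealthy region simply drops out of the candidate pool.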

Write Routing Strategies

Single Primary (Active-Passive with Regional Read Replicas)

All writes go to one region (the primary). Other regions have read replicas. Reads serve from the local replica; writes cross-region to the primary. Effective for read-heavy workloads (content, product catalogs). Write latency from distant regions is high (150ms+ cross-region RTT). Simplest to implement — no conflict resolution needed.
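The read/write split can be made concrete with a small routing sketch. The endpoint naming scheme here is hypothetical; the point is only that writes always target the primary region while reads stay local.

```python
# Sketch of single-primary routing: reads hit the local replica,
# writes always go to the primary region. Endpoint names are made up.

PRIMARY_REGION = "us-east-1"

def endpoint(operation: str, local_region: str) -> str:
    """Pick a database endpoint for a query originating in local_region."""
    if operation == "write":
        # Cross-region hop (150ms+) whenever local_region != PRIMARY_REGION
        return f"db.{PRIMARY_REGION}.internal"
    # Reads stay in-region against the local read replica
    return f"db-replica.{local_region}.internal"
```

For a read-heavy workload most traffic takes the cheap local path; only the minority of writes pay the cross-region RTT.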

Home Region Routing (Sharded by User)

Each user has a “home region” determined at signup (based on geography or load balancing). All of that user’s writes go to their home region. Other regions store a read replica for cross-region reads. The key insight: if user Alice always writes to us-east-1, there is no conflict — concurrent writes to the same user from different regions are impossible by design. Shopify uses a similar model (tenant-level region assignment).
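A minimal sketch of home-region routing follows. The assignment table is illustrative (a real system would store it in a user directory service); the invariant to notice is that writes for one user always land in one region, regardless of where the request arrived.

```python
# Sketch of home-region write routing: each user is pinned to one region
# at signup, so two regions can never accept concurrent writes for the
# same user. The assignment table below is an illustrative stand-in for
# a real user-directory lookup.

HOME_REGION = {"alice": "us-east-1", "bob": "eu-west-1"}

def route_write(user: str, local_region: str) -> str:
    """Writes go to the user's home region, even if the request arrived
    elsewhere (e.g. Alice travelling in Europe)."""
    return HOME_REGION[user]

def route_read(user: str, local_region: str) -> str:
    """Reads can be served from the local replica (possibly stale)."""
    return local_region
```

Alice's write from eu-west-1 still routes to us-east-1, which is exactly why concurrent conflicting writes cannot occur by design.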

Multi-Primary with Conflict Resolution

Any region can accept any write. Conflicts occur when two regions modify the same record in the same time window (before replication). Conflict resolution strategies:

  • Last Write Wins (LWW): the write with the higher timestamp wins. Requires synchronized clocks (Google TrueTime, NTP with bounded drift). Loses concurrent writes — one is discarded.
  • Application-defined resolution: custom merge functions per entity type. “Higher balance wins” for account credits; “union of sets” for tags.
  • CRDTs (Conflict-free Replicated Data Types): data structures whose merge operation is commutative and associative — applying updates in any order produces the same result. G-Counter (increment only), OR-Set (add/remove set with tombstones), LWW-Register. No conflicts possible — every update can be merged. Used in Riak, Cassandra counters, some Redis use cases.
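The G-Counter mentioned above is small enough to sketch in full. This is a simplified in-memory model (no networking, illustrative region names), but the merge rule, element-wise max per slot, is the real CRDT mechanism: commutative, associative, and idempotent.

```python
# Minimal G-Counter sketch: one slot per region; increments touch only
# the local slot; merge takes the element-wise max of slots.

class GCounter:
    def __init__(self, region: str, regions: list):
        self.region = region
        self.slots = {r: 0 for r in regions}

    def increment(self, n: int = 1) -> None:
        self.slots[self.region] += n  # only the local replica's slot

    def value(self) -> int:
        return sum(self.slots.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative, and idempotent,
        # so replicas converge regardless of merge order.
        for r, n in other.slots.items():
            self.slots[r] = max(self.slots[r], n)

regions = ["us-east-1", "eu-west-1"]
a = GCounter("us-east-1", regions)
b = GCounter("eu-west-1", regions)
a.increment(3)   # concurrent increments in both regions
b.increment(2)
a.merge(b)
b.merge(a)
# Both replicas converge to the same value; neither increment is lost.
```

Contrast this with LWW, where one of the two concurrent updates would have been discarded.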

Replication Topology

Data must flow between regions for cross-region reads and conflict resolution. Options:

  • Full mesh: every region replicates to every other. Highest availability; all regions have all data. Network cost grows as N². Used for small N (2-5 regions).
  • Hub-and-spoke: primary region replicates to all others; secondaries do not replicate to each other. Simpler but primary region is a bottleneck for replication.
  • CockroachDB multi-region: configures per-table region affinity. Tables with a “home region” designation replicate leaseholders to that region. Global tables (config, reference data) are fully replicated to all regions with fast local reads.

-- CockroachDB multi-region table configuration:
ALTER DATABASE app SET PRIMARY REGION "us-east1";
ALTER DATABASE app ADD REGION "europe-west1";
ALTER DATABASE app ADD REGION "asia-east1";

-- User-partitioned table (writes stay in user's home region):
ALTER TABLE user_profiles SET LOCALITY REGIONAL BY ROW;
-- CockroachDB automatically routes writes to the row's home region

-- Global reference data (config, feature flags):
ALTER TABLE feature_flags SET LOCALITY GLOBAL;
-- Reads served locally from all regions with no cross-region latency

Regional Isolation: Bulkheads and Cell Architecture

A region failure must not cascade. Design principles:

  • Regional database instances: each region has its own database cluster. Cross-region database queries are eliminated from the critical path — a region operates independently even if cross-region replication is paused.
  • Cell architecture: divide each region into independent cells (AWS availability zones, or logical shards within a region). Failures are contained within a cell. Amazon uses cell-based architecture internally — each cell handles a subset of users and has its own database, caches, and compute. A cell failure affects roughly 1/N of users (not all users in the region).
  • Circuit breakers on cross-region calls: if cross-region requests are timing out (network partition), open the circuit breaker and serve from local data (possibly stale). Degrade gracefully rather than timing out on every request.
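The circuit-breaker bullet above can be sketched as follows. This is a simplified model under assumed thresholds (3 consecutive failures to open, 30 seconds before retrying); production implementations add half-open probing, per-endpoint state, and metrics.

```python
# Sketch of a circuit breaker on cross-region calls: after a run of
# failures the circuit opens and requests fall back to local (possibly
# stale) data instead of timing out one by one. Thresholds are
# illustrative assumptions.
import time

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.failures = 0
        self.opened_at = None  # monotonic timestamp when circuit opened

    def call(self, remote, fallback):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after_s:
                return fallback()      # circuit open: degrade gracefully
            self.opened_at = None      # cool-down elapsed: retry the remote
            self.failures = 0
        try:
            result = remote()
            self.failures = 0          # success resets the failure count
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
            return fallback()
```

Usage would look like `breaker.call(lambda: fetch_cross_region(key), lambda: local_cache.get(key))`: every caller degrades to local data while the circuit is open, rather than waiting out a timeout per request.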

Disaster Recovery: RTO and RPO

Active-active is both DR and HA: a region failure triggers automatic traffic failover (Route 53 health checks stop routing to the failed region within 60 seconds). Recovery objectives:

  • RTO (Recovery Time Objective): the maximum acceptable time to restore service after a failure. With active-active, RTO is the DNS failover time — ~60 seconds.
  • RPO (Recovery Point Objective): how much data loss is acceptable. With asynchronous replication, RPO = replication lag (typically seconds). With synchronous replication across regions, RPO = 0 but write latency increases by cross-region RTT.
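These two objectives lend themselves to a back-of-envelope calculation. The figures below are illustrative assumptions, not vendor guarantees: an assumed health-check interval, failure count to trip failover, DNS TTL, and some sample replication-lag measurements.

```python
# Back-of-envelope RTO/RPO estimates for active-active failover.
# All inputs are illustrative assumptions.

def estimated_rto_s(health_check_interval_s: float,
                    failures_to_trip: int,
                    dns_ttl_s: float) -> float:
    """Worst case: detect the failure, then wait for cached DNS to expire."""
    return health_check_interval_s * failures_to_trip + dns_ttl_s

def estimated_rpo_s(replication_lag_samples_s: list) -> float:
    """With async replication, writes not yet replicated are lost on
    failover, so RPO is bounded by the observed replication lag."""
    return max(replication_lag_samples_s)

rto = estimated_rto_s(health_check_interval_s=10, failures_to_trip=3,
                      dns_ttl_s=60)              # 10*3 + 60 = 90.0 seconds
rpo = estimated_rpo_s([0.8, 1.5, 3.2, 2.1])      # worst observed lag: 3.2 s
```

Monitoring replication lag directly is what turns "RPO = seconds" from a hope into a measured number.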

Key Interview Points

  • GeoDNS routes users to the nearest region; anycast is an IP-level alternative
  • Avoid multi-primary writes by routing each user to a home region — eliminates conflicts by design
  • If multi-primary is needed: use CRDTs for conflict-free data structures, or LWW with synchronized clocks
  • Replicate with full mesh (small N) or per-table regional affinity (CockroachDB)
  • Regional isolation: independent database instances per region + circuit breakers on cross-region calls
  • Active-active RTO: 60s (DNS failover); RPO: seconds (async replication lag)

Frequently Asked Questions

How do you route users to the nearest region in a multi-region architecture?

Two primary techniques: GeoDNS and anycast. GeoDNS resolves the same domain to different IP addresses based on the geographic location of the user's DNS resolver. AWS Route 53 latency-based routing measures actual network latency from Route 53 resolvers to each AWS region and routes each DNS query to the lowest-latency region. The routing table is continuously updated based on real latency measurements. DNS TTL (typically 60 seconds) determines how quickly clients pick up routing changes — if a region fails and Route 53 stops routing to it, clients with cached DNS responses continue hitting the failed region until TTL expires. Anycast assigns the same IP prefix to multiple data centers and uses BGP routing to deliver packets to the nearest advertising node. No DNS lookup involved — routing is at the IP layer, so failover happens in BGP convergence time (seconds, not minutes). Cloudflare uses anycast for all 300+ PoPs. Combined approach: anycast at the network layer for DDoS resilience and instant failover, with application-level GeoDNS for routing specific services to specific regions.

How do CRDTs solve the conflict resolution problem in multi-region systems?

CRDTs (Conflict-free Replicated Data Types) are data structures designed so that concurrent updates from any region can always be merged into a consistent result, regardless of the order updates are applied. They achieve this by constraining operations to be commutative and associative — the merge of any two states produces the same result no matter which order they are merged. Example: a G-Counter (Grow-only Counter) assigns one counter slot per replica. Increment operations only affect the local replica's slot. The value is the sum of all slots. Two regions can increment simultaneously (both increment their local slot) — when they sync, the merge is simply summing all slots. No conflict possible. More complex types: OR-Set (observed-remove set) allows add and remove operations without conflict by tagging each element with a unique ID — removes only remove the specific tagged version, not concurrent adds. LWW-Register (last-write-wins register) uses timestamps to resolve concurrent writes — the highest timestamp wins. CRDT tradeoff: they constrain your data model. Not all business operations can be expressed as CRDTs — a "set account balance to X" is not CRDT-safe, but "add X to balance" is (a G-Counter handles increments only; a PN-Counter pairs two G-Counters to support debits as well). Design data models around CRDT-friendly operations from the start.
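The OR-Set mechanism described above — tag each add with a unique ID, and let removes delete only the tagged versions they have observed — can be sketched as follows. This is a simplified in-memory model for illustration (real implementations garbage-collect tombstones).

```python
# Minimal OR-Set sketch: adds carry unique tags; a remove covers only the
# tags this replica has observed, so a concurrent add survives the merge.
import uuid

class ORSet:
    def __init__(self):
        self.adds = set()     # set of (element, unique_tag)
        self.removes = set()  # observed-removed (element, unique_tag) pairs

    def add(self, element: str) -> None:
        self.adds.add((element, uuid.uuid4().hex))

    def remove(self, element: str) -> None:
        # Tombstone only the tagged versions this replica has seen.
        self.removes |= {(e, t) for (e, t) in self.adds if e == element}

    def contains(self, element: str) -> bool:
        return any(e == element for (e, t) in self.adds - self.removes)

    def merge(self, other: "ORSet") -> None:
        # Set union is commutative, associative, and idempotent.
        self.adds |= other.adds
        self.removes |= other.removes

a, b = ORSet(), ORSet()
a.add("urgent")          # region A adds the tag
b.merge(a)
b.remove("urgent")       # region B removes the version it observed...
a.add("urgent")          # ...while region A concurrently re-adds it
a.merge(b)
b.merge(a)
# Concurrent add wins: both replicas still contain "urgent".
```

Note the bias this encodes: when an add and a remove race, the add wins, which is usually the less surprising outcome for users.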

How do you design for regional isolation to prevent cascading failures?

Regional isolation ensures a failure in one region does not take down other regions. Key design principles:

  1. Regional database autonomy: each region has its own database cluster that can operate independently. Cross-region replication is asynchronous — if the replication link fails, each region continues serving its local users from its local database. No synchronous cross-region database calls in the critical path.
  2. Circuit breakers on cross-region dependencies: if service A in us-east-1 calls service B in eu-west-1 (anti-pattern — avoid this), a circuit breaker opens when cross-region calls start timing out, falling back to cached or default data.
  3. Cell architecture within regions: divide each region into independent cells (typically availability zones, or logical shards). A cell failure is contained — the load balancer stops routing to that cell, and other cells absorb the traffic. With 3 cells per region, each cell handles ~50% more than its normal share during a cell failure — size cells with this headroom.
  4. Stateless application servers: application servers hold no local state; all state is in the database. Replacing a failed AZ means just launching new EC2 instances — no data recovery needed.
  5. Separate DNS health checks per region: Route 53 removes a region from rotation when health checks fail consistently (e.g., 3 consecutive failures over 30 seconds). This triggers failover to healthy regions automatically.

