Question 1

What is the difference between active-active and active-passive multi-region setups?

Accepted Answer

Active-passive (single primary): one region accepts all writes; the other regions are read-only replicas kept in sync by asynchronous replication. Writes from non-primary regions must be proxied to the primary, which adds cross-region latency (roughly 50-200 ms RTT). Conflict resolution is simple because there is only one writer. Failover requires promoting a replica to primary, which typically takes 30-120 seconds.

Active-active (multi-primary): every region accepts writes locally and replicates them to the others asynchronously. Write latency is always local, so it is fast for all users. The cost is conflict resolution: when two regions write to the same row concurrently, you need last-writer-wins, CRDTs, or an application-level merge. Failover is close to instant because the other regions already accept writes; only traffic has to be rerouted away from the failed region. Operational complexity is much higher.

Choose active-passive for most applications; use active-active only when low write latency for global users is a hard requirement.
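The routing difference between the two topologies can be sketched in a few lines. This is a minimal illustration, not a production router: the Topology class, the route_write function, and the region names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Topology:
    mode: str                 # "active-passive" or "active-active"
    regions: Tuple[str, ...]  # all regions that serve traffic
    primary: str = ""         # only meaningful in active-passive mode


def route_write(topology: Topology, client_region: str) -> str:
    """Return the region that should execute a write issued from client_region."""
    if client_region not in topology.regions:
        raise ValueError(f"unknown region: {client_region}")
    if topology.mode == "active-active":
        # Every region accepts writes, so the write stays local.
        return client_region
    if topology.mode == "active-passive":
        # Single writer: proxy to the primary (a cross-region hop unless local).
        return topology.primary
    raise ValueError(f"unknown mode: {topology.mode}")


# Hypothetical region names, for illustration only.
REGIONS = ("us-east", "eu-west", "ap-south")
active_passive = Topology("active-passive", REGIONS, primary="us-east")
active_active = Topology("active-active", REGIONS)
```

With these definitions, a write from eu-west is sent to us-east under active_passive and stays in eu-west under active_active, which is exactly where the latency difference comes from.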

Question 2

How do you implement read-your-writes consistency across regions?

Accepted Answer

After a user's write goes to the primary region, their subsequent reads in a different region may hit a replica that has not yet applied that write, so they see stale data despite having just written. Two solutions:

(1) Session pinning (LSN tracking): at write time, store the primary's WAL position (LSN) in the user's session, for example from SELECT pg_current_wal_lsn() on the primary. Before serving a read from a replica, check that the replica has replayed at least that LSN (SELECT pg_last_wal_replay_lsn()). If not, proxy the read to the primary.

(2) Sticky routing: for 5-10 seconds after a write, route all reads from that user to the primary region. Store the flag in a Redis key with a TTL (SETEX wrote:{user_id} 10 1) and check it before each read. The window must exceed the typical replication lag, and the key must be visible in whichever region serves the user's next read.

Both approaches add slight overhead only for the small share of reads (typically 1-5%) that follow a recent write; the remaining 95%+ are served from local replicas.
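The LSN check in option (1) comes down to comparing two WAL positions. PostgreSQL prints an LSN as two hexadecimal numbers separated by a slash (for example 16/B374D848), so the values should be compared numerically rather than as strings. The sketch below assumes the session LSN and the replica's replay LSN have already been fetched as text; parse_lsn and choose_read_target are hypothetical helper names.

```python
from typing import Optional


def parse_lsn(text: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def choose_read_target(session_lsn: Optional[str],
                       replica_replay_lsn: Optional[str]) -> str:
    """Decide whether a read may be served by the local replica.

    session_lsn: LSN recorded in the user's session at their last write,
        or None if the session has no recent write.
    replica_replay_lsn: result of pg_last_wal_replay_lsn() on the replica,
        or None if it could not be determined.
    """
    if session_lsn is None:
        return "replica"   # nothing to wait for
    if replica_replay_lsn is None:
        return "primary"   # unknown replica state: be conservative
    if parse_lsn(replica_replay_lsn) >= parse_lsn(session_lsn):
        return "replica"   # the replica has already applied the user's write
    return "primary"       # the replica is behind: proxy the read
```

Note that a plain string comparison would rank "10/0" below "2/0", which is why the values are parsed first.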

Question 3

How do you resolve write conflicts in an active-active setup?

Accepted Answer

Last-writer-wins (LWW): keep the write with the higher timestamp. This requires timestamps that never go backward, so use Hybrid Logical Clocks (HLC) rather than raw wall clocks, which can jump backward on NTP sync. In PostgreSQL, implement it as INSERT ... ON CONFLICT (...) DO UPDATE SET ... WHERE excluded.hlc_ts > t.hlc_ts, where t is the name or alias of the target table. LWW works well for profile updates, where the latest value is the correct one, but it silently discards the losing write.

CRDTs (Conflict-free Replicated Data Types): design data structures that merge automatically without conflicts. An add-only shopping cart can be a G-Set (grow-only set): adding an item in two regions simultaneously is not a conflict, both additions are kept, and merge is set union. A G-Set cannot express removals; a cart that supports removing items needs something like an OR-Set (observed-remove set). Counters use PN-counters, which track increments and decrements separately per replica.

Application-level merge: for complex objects (documents, settings), store an edit log per region and merge on read, for example with operational transforms.

CRDTs are the most principled approach; LWW is the simplest.
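Two of these strategies are small enough to sketch in memory. The code below is an illustrative assumption, not a reference implementation: HybridLogicalClock follows the standard HLC update rules with a (physical_ms, counter, node_id) timestamp, lww_merge keeps the value with the higher timestamp, and PNCounter merges by taking the per-replica maximum. All class and function names are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple

Timestamp = Tuple[int, int, str]  # (physical_ms, logical_counter, node_id)


class HybridLogicalClock:
    def __init__(self, node_id: str,
                 now_ms: Callable[[], int] = lambda: int(time.time() * 1000)):
        self.node_id = node_id
        self._now_ms = now_ms
        self._l = 0  # highest physical time seen so far
        self._c = 0  # logical counter that breaks ties within self._l

    def tick(self) -> Timestamp:
        """Timestamp a local write; never goes backward even if the wall clock does."""
        pt = self._now_ms()
        if pt > self._l:
            self._l, self._c = pt, 0
        else:
            self._c += 1
        return (self._l, self._c, self.node_id)

    def observe(self, remote: Timestamp) -> Timestamp:
        """Advance the clock past a timestamp received from another region."""
        pt = self._now_ms()
        rl, rc, _ = remote
        new_l = max(self._l, rl, pt)
        if new_l == self._l and new_l == rl:
            self._c = max(self._c, rc) + 1
        elif new_l == self._l:
            self._c += 1
        elif new_l == rl:
            self._c = rc + 1
        else:
            self._c = 0
        self._l = new_l
        return (self._l, self._c, self.node_id)


def lww_merge(a: Tuple[object, Timestamp], b: Tuple[object, Timestamp]):
    """Each argument is (value, timestamp); the higher timestamp wins."""
    return a if a[1] >= b[1] else b


class PNCounter:
    def __init__(self) -> None:
        self.p: Dict[str, int] = {}  # increments per replica
        self.n: Dict[str, int] = {}  # decrements per replica

    def increment(self, node: str, amount: int = 1) -> None:
        self.p[node] = self.p.get(node, 0) + amount

    def decrement(self, node: str, amount: int = 1) -> None:
        self.n[node] = self.n.get(node, 0) + amount

    def value(self) -> int:
        return sum(self.p.values()) - sum(self.n.values())

    def merge(self, other: "PNCounter") -> "PNCounter":
        """Per-replica max: commutative, associative, and idempotent."""
        merged = PNCounter()
        for key in self.p.keys() | other.p.keys():
            merged.p[key] = max(self.p.get(key, 0), other.p.get(key, 0))
        for key in self.n.keys() | other.n.keys():
            merged.n[key] = max(self.n.get(key, 0), other.n.get(key, 0))
        return merged
```

Including the node id as the last timestamp component gives a deterministic winner when two regions produce the same physical time and counter. Because the PN-counter merge is idempotent, replaying the same replication message twice does not double-count.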

Question 4

What is RPO and RTO and how do they shape multi-region design?

Accepted Answer

RPO (Recovery Point Objective): the maximum acceptable data loss if a region fails, measured in time. RPO=0 means zero data loss, which requires synchronous replication to at least one other region and adds latency to every write. RPO=30s means up to 30 seconds of writes may be lost, so asynchronous replication is acceptable as long as lag stays under 30 seconds.

RTO (Recovery Time Objective): the maximum acceptable downtime after a failure. RTO=5min means the system must be serving traffic again within 5 minutes of a regional failure.

These objectives directly determine the architecture: RPO=0 requires synchronous cross-region writes (slower, more expensive). RPO=60s allows async replication (fast writes, some risk of loss). RTO=1min requires automated failover with pre-warmed replicas. RTO=15min allows manual failover. Define both objectives explicitly before designing replication, because they set the cost, complexity, and latency trade-offs.
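A quick way to sanity-check a design against these objectives is to add up the failover stages and compare observed replication lag with the RPO. The sketch below is a rough budgeting aid under stated assumptions: the stage names in FailoverBudget are hypothetical, and it assumes promotion can begin only once the failure is detected and any primary lease has expired, whichever comes later.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class FailoverBudget:
    detection_s: float      # time for health checks to declare the region down
    lease_ttl_s: float      # worst-case wait for the old primary's lease to expire
    promotion_s: float      # time to promote a replica to primary
    traffic_shift_s: float  # time for DNS or load balancer changes to take effect

    def worst_case_rto_s(self) -> float:
        # Assumption: detection and lease expiry overlap, so the slower one gates promotion.
        gate = max(self.detection_s, self.lease_ttl_s)
        return gate + self.promotion_s + self.traffic_shift_s


def meets_objectives(rpo_s: float, rto_s: float,
                     worst_replication_lag_s: float,
                     budget: FailoverBudget) -> Dict[str, bool]:
    """With async replication, the writes at risk are those still in flight (the lag)."""
    return {
        "rpo_ok": worst_replication_lag_s <= rpo_s,
        "rto_ok": budget.worst_case_rto_s() <= rto_s,
    }


# Illustrative numbers only, not measurements.
budget = FailoverBudget(detection_s=10, lease_ttl_s=30, promotion_s=45, traffic_shift_s=20)
report = meets_objectives(rpo_s=30, rto_s=60, worst_replication_lag_s=4, budget=budget)
```

With these illustrative numbers the worst-case recovery time is 95 seconds, so an RPO of 30 seconds is met but an RTO of 60 seconds is not; shortening the lease TTL or the promotion step would be the levers to pull.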

Question 5

What is split-brain and how do you prevent it during regional failover?

Accepted Answer

Split-brain occurs when two regions both believe they are the primary and accept writes at the same time. For example, region A fails and region B is promoted to primary; region A then comes back online still believing it is primary. Both accept writes, and the data diverges, often beyond automatic reconciliation.

Prevention: fencing (STONITH, Shoot The Other Node In The Head). Before promoting region B to primary:

(1) Revoke region A's ability to accept writes by repointing its DNS record, revoking its database credentials, or cutting its network access.

(2) Use a distributed lease: the primary holds a lease from a consensus-backed coordination service (for example etcd, which uses Raft, or ZooKeeper, which uses Zab). The lease has a TTL; region B accepts promotion only after region A's lease has expired and B has acquired a new one.

The lease TTL sets a lower bound on RTO: you cannot fail over faster than the lease expires.
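The lease rule in (2) can be illustrated with an in-memory stand-in for the coordination service. This is a single-process sketch, not a client for etcd or ZooKeeper: LeaseStore and Storage are hypothetical classes, and the fencing token (a number that increases with each new lease and is checked by the storage layer) is an additional safeguard assumed here, not something described in the answer above.

```python
import time
from typing import Callable, Dict, Optional


class LeaseStore:
    """In-memory stand-in for a consensus-backed lease service."""

    def __init__(self, ttl_s: float, now: Callable[[], float] = time.monotonic):
        self._ttl_s = ttl_s
        self._now = now
        self._holder: Optional[str] = None
        self._expires_at = 0.0
        self._token = 0

    def acquire(self, region: str) -> Optional[int]:
        """Grant the lease only if nobody holds an unexpired one; return a fencing token."""
        now = self._now()
        if self._holder is not None and now < self._expires_at:
            return None  # current lease still valid: promotion must wait
        self._token += 1
        self._holder = region
        self._expires_at = now + self._ttl_s
        return self._token

    def renew(self, region: str, token: int) -> bool:
        """Extend the lease; fails once it has expired or been taken over."""
        now = self._now()
        if self._holder == region and token == self._token and now < self._expires_at:
            self._expires_at = now + self._ttl_s
            return True
        return False


class Storage:
    """Rejects writes that carry a fencing token older than the newest one seen."""

    def __init__(self) -> None:
        self._highest_token = 0
        self.rows: Dict[str, str] = {}

    def write(self, token: int, key: str, value: str) -> bool:
        if token < self._highest_token:
            return False  # stale primary: fenced off
        self._highest_token = token
        self.rows[key] = value
        return True
```

In this model region B's acquire call keeps returning None until region A's lease has run out, which is the lower bound on failover time mentioned above. If region A later wakes up and tries to write with its old token, the storage layer rejects it.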

Multi-Region Replication Low-Level Design: Global Data Distribution and Failover

Replication Topologies

Data Model for Region Routing

Write Routing and Read-Your-Writes Consistency

Conflict Resolution for Active-Active

Replication Lag Monitoring and Failover

Key Interview Points