System Design Interview: Two-Phase Commit and Distributed Transactions

The Distributed Transaction Problem

A transaction that spans multiple databases or services must either fully commit on all participants or fully rollback — partial commits leave the system in an inconsistent state. Transferring money between two bank accounts in different databases, or placing an order that deducts inventory and charges payment in separate services, are classic examples. Single-database ACID transactions cannot span multiple systems. Two-Phase Commit (2PC) is the classical protocol for achieving atomicity across distributed participants.

Two-Phase Commit Protocol

2PC uses a coordinator (the transaction manager) and multiple participants (databases or services). The protocol has two phases:

Phase 1: Prepare (Voting)

  1. The coordinator sends a PREPARE message to all participants
  2. Each participant executes the transaction locally (writes to its log), acquires all necessary locks, and votes YES or NO
  3. A YES vote means “I can commit; I have durably recorded my intent and will not forget”
  4. A NO vote (or timeout) means “I cannot commit; abort”

Phase 2: Commit or Abort

  1. If all participants voted YES: coordinator sends COMMIT to all participants. Each releases locks and makes changes permanent.
  2. If any participant voted NO (or timed out): coordinator sends ABORT to all participants. Each rolls back.

# Pseudocode for coordinator:
def two_phase_commit(participants, transaction):
    # Phase 1
    votes = []
    for p in participants:
        try:
            vote = p.prepare(transaction)
            votes.append(vote)
        except Timeout:
            votes.append("NO")

    # Phase 2
    if all(v == "YES" for v in votes):
        for p in participants:
            p.commit(transaction)   # must succeed; retry until it does
        return "COMMITTED"
    else:
        for p in participants:
            p.abort(transaction)
        return "ABORTED"

The Coordinator Failure Problem

2PC has a critical vulnerability: if the coordinator fails after Phase 1 (participants voted YES) but before Phase 2 (sending COMMIT), participants are stuck. They have acquired locks and cannot proceed or rollback — they must wait for the coordinator to recover. This is called the blocking problem: a coordinator failure blocks all participants for an indefinite period.

Mitigation: the coordinator writes its decision to a durable WAL before sending Phase 2 messages. On recovery, it re-reads the WAL and re-sends the Phase 2 decision to all participants (idempotent commit/abort). This ensures eventual resolution, but the recovery time (coordinator restart) may be minutes. During this window, participants hold locks — causing cascading latency issues in high-throughput systems.

Three-Phase Commit (3PC)

3PC adds a pre-commit phase to address 2PC’s blocking problem. After all YES votes, the coordinator sends a PRE-COMMIT before the final COMMIT. Participants who receive PRE-COMMIT know a commit decision has been made and can proceed without the coordinator in some failure scenarios. However, 3PC is not widely used in practice because: it still blocks under network partitions (split-brain); it adds latency (3 round-trips instead of 2); the Paxos and Raft consensus algorithms solve the underlying problem more robustly.

XA Transactions

XA is the X/Open standard for distributed transaction management. Most enterprise databases (PostgreSQL, MySQL, Oracle) implement XA. Application servers use a Transaction Manager (Atomikos, Bitronix) that plays the coordinator role. The application code uses JTA (Java Transaction API) which looks like a normal local transaction to the developer:


// Java/Spring XA transaction across two databases:
@Transactional
public void transferFunds(Account from, Account to, BigDecimal amount) {
    // These are XA-aware DataSources; the TX manager coordinates 2PC
    fromAccountRepo.debit(from, amount);    // DB1
    toAccountRepo.credit(to, amount);       // DB2
    auditLogRepo.record(from, to, amount);  // DB3
    // If any throws, all rollback automatically
}

XA works well for enterprise systems (Java EE, Spring with JTA) but has significant limitations at web scale: coordinator is a single point of failure; tight coupling between services; poor performance (2 round-trips + lock hold time); cannot span services with heterogeneous protocols (HTTP APIs, Kafka). Most modern microservices systems avoid XA.

The Saga Pattern: Alternative to 2PC

Instead of a distributed lock-based transaction, a Saga is a sequence of local transactions, each publishing an event that triggers the next step. If a step fails, compensating transactions undo previous steps. Sagas trade strict ACID atomicity for eventual consistency — there is a window where the system is in an intermediate state.


# Order Saga (Choreography style):
# 1. OrderService: create order (status=PENDING)
#    → publish OrderCreated event
# 2. InventoryService: reserve inventory
#    → on success: publish InventoryReserved
#    → on failure: publish InventoryReservationFailed
# 3. PaymentService: charge payment
#    → on success: publish PaymentProcessed → OrderService marks CONFIRMED
#    → on failure: publish PaymentFailed
#       → InventoryService receives PaymentFailed → release reservation (compensate)
#       → OrderService marks CANCELLED

Choreography sagas: services react to each other’s events — no central coordinator. Simple but hard to track overall progress. Orchestration sagas: a central Saga Orchestrator sends commands to each service and handles compensations — easier to monitor and debug, but the orchestrator is a coordination point. Frameworks: Conductor (Netflix), Temporal, AWS Step Functions.

When to Use Each Approach

Approach When to use Tradeoff
Single-DB transaction All data in one database Full ACID, no distributed complexity
2PC / XA Small number of DBs, enterprise Java, strict atomicity needed Coordinator SPOF, blocking on failure, poor scalability
Saga (choreography) Microservices with event-driven architecture, eventual consistency acceptable Intermediate states visible, complex compensation logic
Saga (orchestration) Complex multi-step workflows needing visibility and error handling Central orchestrator; well-suited to long-running workflows
Idempotent operations + retry Simple at-least-once delivery acceptable Simplest; requires idempotency at every step

Practical Advice for Interviews

Most interviewers at FAANG/tech companies expect you to avoid 2PC for microservices and instead recommend Saga or idempotent operations with compensating transactions. Reasons: 2PC requires all participants to be available simultaneously; in a microservices architecture with independent deployability, this tight coupling defeats the purpose. Saga aligns with the event-driven, loosely-coupled architecture that scales well. Use 2PC only when: you are using relational databases that support XA, the number of participants is small (2-3), and you need strict atomicity that cannot be approximated by eventual consistency.

Key Interview Points

  • 2PC Phase 1: all participants vote YES/NO after acquiring locks. Phase 2: coordinator sends COMMIT or ABORT based on votes.
  • Blocking problem: coordinator failure after Phase 1 leaves participants holding locks indefinitely
  • XA implements 2PC for Java/enterprise; not suitable for high-scale microservices
  • Saga provides eventual consistency without distributed locks: local transactions + compensating transactions
  • For interviews: recommend Saga over 2PC for microservices; use 2PC only for 2-3 relational DBs with XA support
  • Temporal and AWS Step Functions are production-grade Saga orchestrators

Frequently Asked Questions

What is the blocking problem in two-phase commit?

The blocking problem is 2PC's fundamental weakness. After Phase 1 (all participants vote YES), each participant has acquired locks on the affected data and is waiting for the coordinator's Phase 2 decision. If the coordinator crashes at this exact moment — after receiving all YES votes but before sending COMMIT — participants are stuck. They cannot unilaterally commit (maybe another participant voted NO) and cannot unilaterally abort (maybe the coordinator decided to commit). They must hold their locks and wait for the coordinator to recover. This is called a blocking protocol: participant progress is blocked by coordinator availability. During the coordinator's downtime (seconds to minutes depending on recovery), all locked rows are inaccessible — any concurrent transaction touching those rows also blocks. In high-throughput databases, this cascading blockage can bring the system to a halt. The coordinator writes its Phase 2 decision to a durable write-ahead log before sending messages, so on recovery it re-sends the decision. But the window between Phase 1 completion and coordinator recovery is a hard blocking period. Three-Phase Commit attempts to address this by adding a pre-commit phase, but still blocks under network partitions. Most modern systems avoid 2PC entirely by using the Saga pattern or single-database transactions.

When should you use Saga instead of two-phase commit?

Use Saga over 2PC in virtually all microservices architectures. 2PC requires: (1) all participants to be available simultaneously — a single service being down blocks the transaction; (2) participants to hold database locks during the multi-second coordination window — causing contention under load; (3) a distributed coordinator with potential to be a single point of failure; (4) all participants to implement the XA protocol — many modern services (HTTP APIs, Kafka consumers, NoSQL databases) do not. The Saga pattern trades strict ACID atomicity for eventual consistency: each step performs a local transaction, and failures trigger compensating transactions on previous steps. This is appropriate when: (a) intermediate states are acceptable to external users for a brief period (an order shows "processing" before transitioning to "confirmed" or "cancelled"); (b) compensation logic is well-defined (cancel the inventory reservation, refund the charge); (c) services are independently deployable and heterogeneous. Use 2PC only when: you control a small number of relational databases that support XA, all in the same data center with reliable network, and strict atomicity is a hard requirement (no intermediate state is acceptable). Financial ledger entries are the clearest case for 2PC — you cannot have a situation where one account is debited but the other is not credited.

How does the Saga pattern handle failures and ensure data consistency?

In a Saga, each step is a local transaction on a single service's database. If a step fails, compensating transactions undo the work of all previously completed steps. Consistency is eventual — there is a window where the system is in an intermediate state. Key design rules: (1) Compensating transactions must be idempotent — they may be retried on failure. (2) All steps and compensations must be logged durably so the Saga can be resumed after a crash. (3) The Saga must be designed for "at-least-once" execution — each step may run more than once if the service crashes between the action and the acknowledgment. Two implementation styles: Choreography — services emit events and react to each other's events. No central orchestrator; harder to track overall Saga state. Orchestration — a central Saga Orchestrator (Temporal, AWS Step Functions, or a custom service) sends commands to each participant and handles compensation on failure. The orchestrator persists its state durably, so it resumes correctly after any failure. Production-ready Saga orchestration: Temporal provides persistent workflow execution — the workflow code looks sequential but automatically persists progress and resumes after crashes. AWS Step Functions provides a managed state machine for Sagas with built-in retry logic, compensation handling, and dead-letter queues.

{
“@context”: “https://schema.org”,
“@type”: “FAQPage”,
“mainEntity”: [
{
“@type”: “Question”,
“name”: “What is the blocking problem in two-phase commit?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “The blocking problem is 2PC’s fundamental weakness. After Phase 1 (all participants vote YES), each participant has acquired locks on the affected data and is waiting for the coordinator’s Phase 2 decision. If the coordinator crashes at this exact moment — after receiving all YES votes but before sending COMMIT — participants are stuck. They cannot unilaterally commit (maybe another participant voted NO) and cannot unilaterally abort (maybe the coordinator decided to commit). They must hold their locks and wait for the coordinator to recover. This is called a blocking protocol: participant progress is blocked by coordinator availability. During the coordinator’s downtime (seconds to minutes depending on recovery), all locked rows are inaccessible — any concurrent transaction touching those rows also blocks. In high-throughput databases, this cascading blockage can bring the system to a halt. The coordinator writes its Phase 2 decision to a durable write-ahead log before sending messages, so on recovery it re-sends the decision. But the window between Phase 1 completion and coordinator recovery is a hard blocking period. Three-Phase Commit attempts to address this by adding a pre-commit phase, but still blocks under network partitions. Most modern systems avoid 2PC entirely by using the Saga pattern or single-database transactions.”
}
},
{
“@type”: “Question”,
“name”: “When should you use Saga instead of two-phase commit?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Use Saga over 2PC in virtually all microservices architectures. 2PC requires: (1) all participants to be available simultaneously — a single service being down blocks the transaction; (2) participants to hold database locks during the multi-second coordination window — causing contention under load; (3) a distributed coordinator with potential to be a single point of failure; (4) all participants to implement the XA protocol — many modern services (HTTP APIs, Kafka consumers, NoSQL databases) do not. The Saga pattern trades strict ACID atomicity for eventual consistency: each step performs a local transaction, and failures trigger compensating transactions on previous steps. This is appropriate when: (a) intermediate states are acceptable to external users for a brief period (an order shows “processing” before transitioning to “confirmed” or “cancelled”); (b) compensation logic is well-defined (cancel the inventory reservation, refund the charge); (c) services are independently deployable and heterogeneous. Use 2PC only when: you control a small number of relational databases that support XA, all in the same data center with reliable network, and strict atomicity is a hard requirement (no intermediate state is acceptable). Financial ledger entries are the clearest case for 2PC — you cannot have a situation where one account is debited but the other is not credited.”
}
},
{
“@type”: “Question”,
“name”: “How does the Saga pattern handle failures and ensure data consistency?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “In a Saga, each step is a local transaction on a single service’s database. If a step fails, compensating transactions undo the work of all previously completed steps. Consistency is eventual — there is a window where the system is in an intermediate state. Key design rules: (1) Compensating transactions must be idempotent — they may be retried on failure. (2) All steps and compensations must be logged durably so the Saga can be resumed after a crash. (3) The Saga must be designed for “at-least-once” execution — each step may run more than once if the service crashes between the action and the acknowledgment. Two implementation styles: Choreography — services emit events and react to each other’s events. No central orchestrator; harder to track overall Saga state. Orchestration — a central Saga Orchestrator (Temporal, AWS Step Functions, or a custom service) sends commands to each participant and handles compensation on failure. The orchestrator persists its state durably, so it resumes correctly after any failure. Production-ready Saga orchestration: Temporal provides persistent workflow execution — the workflow code looks sequential but automatically persists progress and resumes after crashes. AWS Step Functions provides a managed state machine for Sagas with built-in retry logic, compensation handling, and dead-letter queues.”
}
}
]
}

Companies That Ask This Question

Scroll to Top