Two-Phase Commit (2PC) Low-Level Design: Distributed Transaction Protocol

Two-phase commit (2PC) is a distributed transaction protocol that ensures all participants in a multi-node operation either all commit or all abort. It solves the atomic commit problem: how do you guarantee that a write to database A and a write to database B either both succeed or both fail, without leaving the system in a partial state? 2PC is the classical answer, available in databases via XA transactions. Understanding its costs and failure modes is essential — most modern distributed systems deliberately avoid it in favor of sagas and idempotent retries.

The Two Phases

Phase 1 — Prepare: The coordinator sends a PREPARE message to all participants. Each participant performs the transaction locally, writes to its WAL, acquires locks, and responds with VOTE_COMMIT (ready) or VOTE_ABORT (cannot commit). The participant does not commit yet — it just promises it can commit if asked.

Phase 2 — Commit or Abort: If all participants voted COMMIT, the coordinator writes COMMIT to its log and sends COMMIT to all participants. Each participant commits, releases locks, and ACKs. If any participant voted ABORT, the coordinator sends ABORT to all, and each rolls back.
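
The participant side of the two phases can be sketched as a small state machine. This is a minimal in-memory illustration, not a real database: the class name InMemoryParticipant, the (key, value) operation format, and the per-key lock set are assumptions made for the example. It shows that prepare only stages the write and takes the lock, while commit and abort are safe to receive more than once.

```python
from enum import Enum


class State(Enum):
    PREPARED = "PREPARED"
    COMMITTED = "COMMITTED"
    ABORTED = "ABORTED"


class InMemoryParticipant:
    """Hypothetical participant; plain dicts stand in for the database and its WAL."""

    def __init__(self):
        self.data = {}       # committed key/value pairs
        self.staged = {}     # txn_id -> (key, value) staged during prepare
        self.states = {}     # txn_id -> State
        self.locked = set()  # keys locked by prepared transactions

    def prepare(self, txn_id, operation):
        state = self.states.get(txn_id)
        if state is not None:
            # Duplicate PREPARE: repeat the vote that was already given.
            return "VOTE_ABORT" if state is State.ABORTED else "VOTE_COMMIT"
        key, value = operation
        if key in self.locked:
            self.states[txn_id] = State.ABORTED
            return "VOTE_ABORT"
        # A real participant would flush the staged write to its WAL before voting.
        self.locked.add(key)
        self.staged[txn_id] = (key, value)
        self.states[txn_id] = State.PREPARED
        return "VOTE_COMMIT"

    def commit(self, txn_id):
        state = self.states.get(txn_id)
        if state is State.COMMITTED:
            return  # re-delivered COMMIT: already applied
        if state is not State.PREPARED:
            raise RuntimeError(f"cannot commit {txn_id} from state {state}")
        key, value = self.staged.pop(txn_id)
        self.data[key] = value
        self.locked.discard(key)
        self.states[txn_id] = State.COMMITTED

    def abort(self, txn_id):
        if self.states.get(txn_id) is State.COMMITTED:
            raise RuntimeError(f"cannot abort committed transaction {txn_id}")
        staged = self.staged.pop(txn_id, None)
        if staged is not None:
            self.locked.discard(staged[0])  # release the lock taken in prepare
        self.states[txn_id] = State.ABORTED
```

Nothing becomes visible in data until commit runs, which is the promise a VOTE_COMMIT makes: the work is done and durable, but not yet applied.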

Protocol Implementation

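import time


class TwoPhaseCoordinator:
    def __init__(self, participants: list):
        self.participants = participants
        self.log = []  # stand-in for a durable log

    def _log(self, transaction_id: str, state: str) -> None:
        # A real coordinator must flush this record to stable storage before returning.
        self.log.append((transaction_id, state))

    def execute(self, transaction_id: str, operations: list) -> bool:
        if len(operations) != len(self.participants):
            raise ValueError("need exactly one operation per participant")
        # Write intent to coordinator log (durable)
        self._log(transaction_id, 'PREPARING')

        # Phase 1: Prepare
        votes = []
        for participant, operation in zip(self.participants, operations):
            try:
                response = participant.prepare(transaction_id, operation)
                votes.append(response == 'VOTE_COMMIT')
            except Exception:
                votes.append(False)  # an error or timeout counts as VOTE_ABORT

        if all(votes):
            # All voted commit: write decision durably BEFORE sending commits
            self._log(transaction_id, 'COMMIT_DECIDED')
            # Phase 2: Commit
            acked = [self._commit_with_retry(p, transaction_id) for p in self.participants]
            if all(acked):
                self._log(transaction_id, 'COMPLETED')
            # Otherwise the log stays at COMMIT_DECIDED so recovery keeps re-sending COMMIT
            return True
        else:
            self._log(transaction_id, 'ABORT_DECIDED')
            # Phase 2: Abort
            all_aborted = True
            for participant in self.participants:
                try:
                    participant.abort(transaction_id)
                except Exception:
                    # A participant that voted COMMIT cannot roll back on its own;
                    # recovery must re-send ABORT until it is acknowledged.
                    all_aborted = False
            if all_aborted:
                self._log(transaction_id, 'COMPLETED')
            return False

    def _commit_with_retry(self, participant, txn_id: str, max_attempts: int = 10) -> bool:
        """Once COMMIT is decided, retry until participant ACKs. Returns False if it never does."""
        for attempt in range(max_attempts):
            try:
                participant.commit(txn_id)
                return True
            except Exception:
                if attempt < max_attempts - 1:
                    time.sleep(min(2 ** attempt, 30))  # capped exponential backoff
        # Escalate: coordinator must keep retrying or require manual intervention
        return False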

PostgreSQL XA Transactions (Real Implementation)

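import psycopg2
from uuid import uuid4

def transfer_between_databases(amount: float, from_db_url: str, to_db_url: str):
    # Both servers need max_prepared_transactions > 0 (the PostgreSQL default is 0).
    txn_id = f"transfer-{uuid4().hex}"

    conn_a = psycopg2.connect(from_db_url)
    conn_b = psycopg2.connect(to_db_url)
    prepared = []  # (cursor, gid) pairs that reached the prepared state

    try:
        # Autocommit, so psycopg2 does not wrap our explicit BEGIN / PREPARE statements
        conn_a.autocommit = True
        conn_b.autocommit = True

        cur_a = conn_a.cursor()
        cur_b = conn_b.cursor()

        try:
            # Phase 1: Prepare both
            cur_a.execute("BEGIN")
            cur_a.execute("UPDATE accounts SET balance = balance - %s WHERE id = 1", [amount])
            cur_a.execute(f"PREPARE TRANSACTION '{txn_id}-a'")  # phase 1
            prepared.append((cur_a, f"{txn_id}-a"))

            cur_b.execute("BEGIN")
            cur_b.execute("UPDATE accounts SET balance = balance + %s WHERE id = 2", [amount])
            cur_b.execute(f"PREPARE TRANSACTION '{txn_id}-b'")  # phase 1
            prepared.append((cur_b, f"{txn_id}-b"))
        except Exception:
            # Phase 1 failed: roll back whatever was already prepared. A transaction
            # that never reached PREPARE is discarded when its connection closes.
            for cur, gid in prepared:
                try:
                    cur.execute(f"ROLLBACK PREPARED '{gid}'")
                except Exception:
                    pass  # still listed in pg_prepared_xacts; needs manual cleanup
            raise

        # Phase 2: Both prepared -- commit. The decision is now COMMIT, so a failure
        # here must never trigger a rollback (one side may already be committed).
        # The failed gid stays prepared until COMMIT PREPARED is retried. This sketch
        # has no durable coordinator log, so that retry is left to the caller.
        for cur, gid in prepared:
            cur.execute(f"COMMIT PREPARED '{gid}'")
    finally:
        conn_a.close()
        conn_b.close()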
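
Because a prepared transaction survives disconnects and restarts, anything left behind by a failed coordinator has to be found by monitoring. The following is a rough sketch of such a check. The helper name find_stale_prepared and the 60-second default are assumptions for illustration; the query reads the gid and prepared columns of the pg_prepared_xacts system view.

```python
from datetime import datetime, timedelta, timezone

# pg_prepared_xacts lists every transaction currently in the prepared state.
PREPARED_XACTS_SQL = "SELECT gid, prepared FROM pg_prepared_xacts"


def find_stale_prepared(rows, now, max_age=timedelta(seconds=60)):
    """Return the gids of prepared transactions older than max_age.

    rows: iterable of (gid, prepared_at) tuples, for example the result of
          cursor.fetchall() after executing PREPARED_XACTS_SQL.
    now:  timezone-aware datetime to compare against.
    """
    return [gid for gid, prepared_at in rows if now - prepared_at > max_age]


# Usage with a live psycopg2 cursor (not executed here):
#   cur.execute(PREPARED_XACTS_SQL)
#   stale = find_stale_prepared(cur.fetchall(), datetime.now(timezone.utc))
```

Keeping the age check separate from the query makes it easy to unit test without a database and to reuse from an alerting job.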

The Blocking Problem and Coordinator Failure

2PC has one critical vulnerability: once a participant has voted COMMIT, it can no longer decide the outcome on its own. If the coordinator crashes after collecting votes (for example, after writing COMMIT_DECIDED but before sending COMMIT to participants), the prepared participants are stuck. They hold their locks and cannot unilaterally commit or abort without risking inconsistency, because the coordinator may have decided either way. They must wait for the coordinator to recover. This is the blocking problem: participant locks are held for as long as the coordinator stays down if it fails at the wrong moment.
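
The asymmetry behind the blocking problem can be written down as a small decision function. This is an illustrative sketch only; the state names and return values are assumptions for the example, not part of any standard API.

```python
def on_coordinator_timeout(state: str) -> str:
    """What a participant may safely do when it stops hearing from the coordinator."""
    if state == "INIT":
        # Not voted yet: the coordinator cannot have decided COMMIT without
        # this participant's vote, so aborting locally is safe.
        return "ABORT_LOCALLY"
    if state == "PREPARED":
        # Voted COMMIT: the decision may be COMMIT or ABORT, and guessing either
        # way risks diverging from other participants. Keep the locks and wait.
        return "BLOCK_UNTIL_DECISION"
    if state in ("COMMITTED", "ABORTED"):
        # Outcome already known and applied.
        return "NOTHING_TO_DO"
    raise ValueError(f"unknown participant state: {state}")
```

Only the PREPARED branch blocks, which is why the window between voting and learning the decision should be kept as short as possible.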

Three-phase commit (3PC) attempts to fix this by adding a pre-commit phase, but it is vulnerable to network partitions and is almost never used in practice. The real solutions are: (1) make coordinator failures rare (use Raft/Paxos for coordinator HA), (2) avoid 2PC entirely using the saga pattern, or (3) use a database with built-in distributed transaction support (CockroachDB, Spanner) that handles coordinator recovery internally.

When 2PC Is Acceptable vs When to Use Sagas

Use 2PC when: the operation spans 2-3 databases within the same datacenter, latency is not critical, and you need strict atomicity (financial ledger entries). Use sagas when: the operation spans many services, involves external APIs (payment processors, email), requires high availability, or executes over long durations. The saga pattern accepts eventual consistency in exchange for availability and partition tolerance — the right trade-off for most microservice applications.
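
For contrast, a saga replaces the prepare phase with compensating actions. The sketch below is a minimal orchestrated saga, assuming each step is a pair of plain callables (action, compensation); the function name run_saga is hypothetical, and a production version would also persist progress and retry compensations until they succeed.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order.

    If an action raises, run the compensations of the steps that already
    succeeded, in reverse order, and report failure.
    """
    completed = []  # compensations for steps that have succeeded so far
    for action, compensation in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensation)
    return True
```

Unlike 2PC, each step commits immediately and is visible to other readers, so intermediate states can be observed until the compensations run. That is the eventual consistency the saga pattern accepts.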

Key Interview Points

The coordinator must write COMMIT_DECIDED to durable storage before sending commits. If it sends COMMIT without having persisted the decision and then crashes, it may decide ABORT on recovery, which would conflict with participants that already committed.
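Prepared transactions hold locks in PostgreSQL. A participant stuck in PREPARED state blocks writes and locking reads (such as SELECT ... FOR UPDATE) on the affected rows indefinitely; plain reads are not blocked under MVCC, but they keep seeing the old row versions. Monitor pg_prepared_xacts and alert on any transaction older than 60 seconds.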
2PC adds two sequential network round trips (prepare, then commit) plus forced log writes to the critical path, so latency is at least double that of a single-database transaction. At a p99 of 5 ms per round trip, that adds 10 ms or more to every 2PC operation.
Recovery protocol: on coordinator restart, re-read the log and continue from the last recorded state. Re-send COMMIT to any participants that haven’t ACKed, re-send ABORT to any participants that haven’t rolled back, and abort any transaction that only reached PREPARING, since no commit decision was ever recorded for it.
In interviews, explain 2PC as a stepping stone to motivate why modern systems prefer the saga pattern for distributed transactions.
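
The recovery rule above can be sketched in a few lines. This is an illustrative outline under stated assumptions: the log is modelled as a dict from transaction id to the last durably recorded state, participants expose idempotent commit and abort methods, and the function name recover is hypothetical.

```python
def recover(log: dict, participants: list) -> dict:
    """Resume unfinished transactions after a coordinator restart.

    log maps txn_id -> last recorded state: 'PREPARING', 'COMMIT_DECIDED',
    'ABORT_DECIDED', or 'COMPLETED'. Returns txn_id -> outcome that was re-sent.
    """
    outcomes = {}
    for txn_id, state in log.items():
        if state == "COMPLETED":
            continue  # every participant already acknowledged
        if state == "COMMIT_DECIDED":
            # The decision is final: re-send COMMIT (participants must tolerate duplicates).
            for participant in participants:
                participant.commit(txn_id)
            outcomes[txn_id] = "COMMITTED"
        elif state in ("PREPARING", "ABORT_DECIDED"):
            # No commit decision was ever logged, so no participant can have
            # been told to commit. Aborting is safe.
            for participant in participants:
                participant.abort(txn_id)
            outcomes[txn_id] = "ABORTED"
        else:
            raise ValueError(f"unknown log state {state!r} for {txn_id}")
    return outcomes
```

Treating PREPARING as abort is only safe because the decision is always logged before any COMMIT message is sent, which is the ordering invariant from the first point in this list.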

