Two-Phase Commit (2PC) Low-Level Design: Distributed Transaction Protocol

Two-phase commit (2PC) is a distributed transaction protocol that ensures all participants in a multi-node operation either all commit or all abort. It solves the atomic commit problem: how do you guarantee that a write to database A and a write to database B either both succeed or both fail, without leaving the system in a partial state? 2PC is the classical answer, available in databases via XA transactions. Understanding its costs and failure modes is essential — most modern distributed systems deliberately avoid it in favor of sagas and idempotent retries.

The Two Phases

Phase 1 — Prepare: The coordinator sends a PREPARE message to all participants. Each participant performs the transaction locally, writes to its WAL, acquires locks, and responds with VOTE_COMMIT (ready) or VOTE_ABORT (cannot commit). The participant does not commit yet — it just promises it can commit if asked.

Phase 2 — Commit or Abort: If all participants voted COMMIT, the coordinator writes COMMIT to its log and sends COMMIT to all participants. Each participant commits, releases locks, and ACKs. If any participant voted ABORT, the coordinator sends ABORT to all, and each rolls back.
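The participant's side of the two phases can be sketched as a small state machine. This is a minimal illustration, not a real storage engine: `InMemoryParticipant`, its `can_commit` flag, and the in-memory `wal` list are hypothetical names, with a dict standing in for the database and a list standing in for a durable log.

```python
from enum import Enum


class State(Enum):
    PREPARED = "PREPARED"
    COMMITTED = "COMMITTED"
    ABORTED = "ABORTED"


class InMemoryParticipant:
    """Hypothetical participant: a dict plays the database, a list plays the WAL."""

    def __init__(self, can_commit: bool = True):
        self.can_commit = can_commit  # assumption: lets a caller force a VOTE_ABORT
        self.data = {}      # committed, visible state
        self.pending = {}   # txn_id -> staged writes, not yet visible
        self.state = {}     # txn_id -> State
        self.wal = []       # stand-in for a durable log

    def prepare(self, txn_id: str, writes: dict) -> str:
        if not self.can_commit:
            self.state[txn_id] = State.ABORTED
            return "VOTE_ABORT"
        self.pending[txn_id] = dict(writes)
        self.wal.append((txn_id, "PREPARED"))  # record the promise before voting
        self.state[txn_id] = State.PREPARED
        return "VOTE_COMMIT"

    def commit(self, txn_id: str) -> None:
        if self.state.get(txn_id) == State.COMMITTED:
            return  # idempotent, so the coordinator can safely retry
        if self.state.get(txn_id) != State.PREPARED:
            raise RuntimeError(f"{txn_id} is not prepared")
        self.data.update(self.pending.pop(txn_id))
        self.wal.append((txn_id, "COMMITTED"))
        self.state[txn_id] = State.COMMITTED

    def abort(self, txn_id: str) -> None:
        if self.state.get(txn_id) == State.COMMITTED:
            raise RuntimeError(f"{txn_id} is already committed")
        self.pending.pop(txn_id, None)  # discard staged writes
        self.wal.append((txn_id, "ABORTED"))
        self.state[txn_id] = State.ABORTED
```

Staged writes stay in `pending` until `commit` is called, which mirrors the promise made in phase 1: nothing becomes visible until the coordinator announces the outcome.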

Protocol Implementation

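import time

class TwoPhaseCoordinator:
    def __init__(self, participants: list):
        self.participants = participants
        self.log = []  # in-memory stand-in; a real coordinator needs a durable, fsynced log

    def _log(self, transaction_id: str, state: str) -> None:
        self.log.append((transaction_id, state))
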
    def execute(self, transaction_id: str, operations: list) -> bool:
        # Write intent to coordinator log (durable)
        self._log(transaction_id, 'PREPARING')

        # Phase 1: Prepare
        votes = []
        for participant, operation in zip(self.participants, operations):
            try:
                response = participant.prepare(transaction_id, operation)
                votes.append(response == 'VOTE_COMMIT')
            except Exception:
                votes.append(False)

        if all(votes):
            # All voted commit — write decision durably BEFORE sending commits
            self._log(transaction_id, 'COMMIT_DECIDED')
            # Phase 2: Commit
            for participant in self.participants:
                self._commit_with_retry(participant, transaction_id)
            self._log(transaction_id, 'COMPLETED')
            return True
        else:
            self._log(transaction_id, 'ABORT_DECIDED')
            # Phase 2: Abort
            for participant in self.participants:
                try:
                    participant.abort(transaction_id)
                except Exception:
                    # A participant that voted COMMIT cannot roll back on its own;
                    # it learns the outcome when recovery re-sends ABORT.
                    pass
            return False

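    def _commit_with_retry(self, participant, txn_id: str, max_attempts: int = 10):
        """Once COMMIT is decided, retry until participant ACKs."""
        for attempt in range(max_attempts):
            try:
                participant.commit(txn_id)
                return
            except Exception:
                time.sleep(min(2 ** attempt, 30))  # capped exponential backoff
        # The decision is already COMMIT, so giving up silently is not an option:
        # surface the failure so recovery or an operator keeps retrying.
        raise RuntimeError(f"participant did not ACK commit for {txn_id}")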

PostgreSQL Prepared Transactions, the Basis for XA (Real Implementation)

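import psycopg2
from uuid import uuid4

def transfer_between_databases(amount: float, from_db_url: str, to_db_url: str):
    # Both servers need max_prepared_transactions > 0 (the PostgreSQL default is 0).
    txn_id = f"transfer-{uuid4().hex}"

    conn_a = psycopg2.connect(from_db_url)
    conn_b = psycopg2.connect(to_db_url)
    prepared = []    # (cursor, gid) for every branch that reached PREPARE TRANSACTION
    decided = False  # becomes True once we decide to commit

    try:
        # Autocommit so psycopg2 does not open its own transaction around our BEGIN
        conn_a.autocommit = True
        conn_b.autocommit = True

        cur_a = conn_a.cursor()
        cur_b = conn_b.cursor()

        # Phase 1: Prepare both
        cur_a.execute("BEGIN")
        cur_a.execute("UPDATE accounts SET balance = balance - %s WHERE id = 1", [amount])
        cur_a.execute(f"PREPARE TRANSACTION '{txn_id}-a'")
        prepared.append((cur_a, f"{txn_id}-a"))

        cur_b.execute("BEGIN")
        cur_b.execute("UPDATE accounts SET balance = balance + %s WHERE id = 2", [amount])
        cur_b.execute(f"PREPARE TRANSACTION '{txn_id}-b'")
        prepared.append((cur_b, f"{txn_id}-b"))

        # Phase 2: Both prepared -- commit.
        # A production coordinator would durably log this decision first.
        decided = True
        for cur, gid in prepared:
            cur.execute(f"COMMIT PREPARED '{gid}'")

    except Exception:
        if not decided:
            # No commit decision yet: roll back whatever was prepared
            for cur, gid in prepared:
                try:
                    cur.execute(f"ROLLBACK PREPARED '{gid}'")
                except Exception:
                    pass  # stays in pg_prepared_xacts until resolved manually
        # After the decision, never roll back: one branch may already be committed,
        # so the remaining COMMIT PREPARED must be retried instead.
        raise
    finally:
        # Closing also discards any transaction that failed before being prepared
        conn_a.close()
        conn_b.close()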
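
If the process running a transfer like this dies between the two phases, the prepared transactions stay on the servers and keep their locks. A minimal sketch for finding them follows. It assumes a DB-API cursor (for example from psycopg2) connected to the server being checked. `find_stale_prepared` and `STALE_PREPARED_SQL` are hypothetical names; `pg_prepared_xacts` and its `gid` and `prepared` columns are part of PostgreSQL.

```python
# Prepared transactions older than a threshold, oldest first.
STALE_PREPARED_SQL = """
    SELECT gid, prepared
    FROM pg_prepared_xacts
    WHERE prepared < now() - %s * interval '1 second'
    ORDER BY prepared
"""


def find_stale_prepared(cursor, max_age_seconds: float = 60.0) -> list:
    """Return the gids of prepared transactions older than max_age_seconds.

    `cursor` is any DB-API cursor that uses %s placeholders (psycopg2 does).
    """
    cursor.execute(STALE_PREPARED_SQL, (max_age_seconds,))
    return [gid for gid, _prepared in cursor.fetchall()]
```

Each returned gid still has to be resolved with COMMIT PREPARED or ROLLBACK PREPARED according to the coordinator's recorded decision. The query only finds the candidates.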

The Blocking Problem and Coordinator Failure

2PC has one critical vulnerability: if the coordinator crashes after participants have voted COMMIT but before they learn the outcome (for example, after writing COMMIT_DECIDED but before sending COMMIT), the participants are stuck. They have voted COMMIT and hold their locks, so they cannot unilaterally commit or abort without risking inconsistency. They must wait for the coordinator to recover. This is the blocking problem: participant locks are held for as long as the coordinator stays down, which may be indefinitely.
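
The asymmetry behind the blocking problem can be written down as a small decision function. This is only an illustration: `on_coordinator_timeout` and the state and action strings are hypothetical names, not part of any library.

```python
def on_coordinator_timeout(state: str) -> str:
    """What a participant may safely do when the coordinator goes silent."""
    if state == "INIT":
        # Has not voted yet, so the coordinator cannot have decided COMMIT.
        return "ABORT"
    if state == "PREPARED":
        # Voted COMMIT: the outcome may be either way, so it must keep
        # its locks and wait for the coordinator.
        return "BLOCK"
    if state in ("COMMITTED", "ABORTED"):
        return "NONE"  # outcome already known
    raise ValueError(f"unknown state: {state}")
```

Only the `PREPARED` branch blocks, and that is the state every participant is in between the two phases.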

Three-phase commit (3PC) attempts to fix this by adding a pre-commit phase, but it is vulnerable to network partitions and is almost never used in practice. The real solutions are: (1) make coordinator failures rare (use Raft/Paxos for coordinator HA), (2) avoid 2PC entirely using the saga pattern, or (3) use a database with built-in distributed transaction support (CockroachDB, Spanner) that handles coordinator recovery internally.

When 2PC Is Acceptable vs When to Use Sagas

Use 2PC when: the operation spans 2-3 databases within the same datacenter, latency is not critical, and you need strict atomicity (financial ledger entries). Use sagas when: the operation spans many services, involves external APIs (payment processors, email), requires high availability, or executes over long durations. The saga pattern accepts eventual consistency in exchange for availability and partition tolerance — the right trade-off for most microservice applications.
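
For contrast, a saga replaces the prepare phase with compensating actions. The sketch below is a deliberately minimal, in-process version. `run_saga` and the `Step` alias are hypothetical names, and a production saga would also persist its progress and retry compensations.

```python
from typing import Callable, List, Tuple

# Each step pairs an action with the compensation that undoes it.
Step = Tuple[Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> bool:
    """Run steps in order; on failure, undo completed steps in reverse order."""
    completed: List[Callable[[], None]] = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()  # assumption: compensations are idempotent and succeed
            return False
        completed.append(compensate)
    return True
```

Unlike 2PC, every step commits locally as soon as it runs, so other readers can see intermediate states until the compensations finish. That is the eventual consistency the saga pattern accepts.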

Key Interview Points

  • The coordinator must write COMMIT_DECIDED to durable storage before sending commits. If it sends COMMIT before the decision is persisted and then crashes, it may decide ABORT on recovery, which would conflict with participants that already committed.
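  • Prepared transactions hold locks in PostgreSQL: a participant stuck in PREPARED state blocks writes (and locking reads such as SELECT ... FOR UPDATE) on the affected rows indefinitely, and also prevents VACUUM from cleaning up old row versions. Plain reads are not blocked because of MVCC. Monitor pg_prepared_xacts and alert on any transaction older than 60 seconds.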
  • 2PC adds two sequential network round-trips (prepare, then commit) to the critical path, plus forced log writes on the coordinator and on every participant. At a p99 latency of 5ms per round-trip, that adds at least 10ms to every 2PC operation.
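  • Recovery protocol: on coordinator restart, re-read the log and continue from the last recorded state. Re-send COMMIT to any participants that haven’t ACKed a COMMIT_DECIDED transaction, re-send ABORT for any ABORT_DECIDED transaction, and abort any transaction whose last record is PREPARING, since no decision was logged and therefore no participant can have committed.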
  • In interviews, explain 2PC as a stepping stone to motivate why modern systems prefer the saga pattern for distributed transactions.
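
The recovery bullet above can be sketched as a pure function over the coordinator log. It assumes the log is a sequence of `(txn_id, state)` records using the state names from the coordinator code earlier. `recovery_actions` and the action strings are hypothetical. Treating a transaction stuck in `PREPARING` as aborted is an assumption, safe only because commits are never sent before `COMMIT_DECIDED` is logged.

```python
def recovery_actions(log_entries) -> dict:
    """Map each unfinished transaction to the action recovery should take.

    log_entries: iterable of (txn_id, state) tuples in the order written.
    """
    last_state = {}
    for txn_id, state in log_entries:
        last_state[txn_id] = state  # later records overwrite earlier ones

    actions = {}
    for txn_id, state in last_state.items():
        if state == "COMMIT_DECIDED":
            actions[txn_id] = "RESEND_COMMIT"
        elif state in ("PREPARING", "ABORT_DECIDED"):
            # No commit decision was logged, so aborting is safe.
            actions[txn_id] = "RESEND_ABORT"
        # COMPLETED transactions need no action.
    return actions
```

Because participants may receive the same message twice during recovery, their commit and abort handlers need to be idempotent.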
