Low Level Design: Distributed Transaction Manager

What Is a Distributed Transaction Manager?

A Distributed Transaction Manager (DTM) coordinates operations that span multiple databases, services, or nodes so that all participants either commit or roll back together. Unlike a local ACID transaction confined to one database, a distributed transaction must guarantee atomicity across heterogeneous resources — each of which may fail independently. The DTM acts as the single source of truth for transaction state, orchestrating prepare and commit phases and persisting enough metadata to recover after crashes.

Data Model

-- Central coordinator store
CREATE TABLE distributed_txn (
    txn_id        UUID PRIMARY KEY,
    status        VARCHAR(20) CHECK (status IN ('INITIATED','PREPARING','PREPARED','COMMITTING','COMMITTED','ABORTING','ABORTED')),
    initiated_at  TIMESTAMP NOT NULL DEFAULT NOW(),
    updated_at    TIMESTAMP NOT NULL DEFAULT NOW(),
    timeout_ms    INT NOT NULL DEFAULT 30000,
    coordinator   VARCHAR(128) NOT NULL
);

CREATE TABLE txn_participant (
    txn_id        UUID REFERENCES distributed_txn(txn_id),
    participant   VARCHAR(128) NOT NULL,
    resource_type VARCHAR(64),   -- e.g. postgres, kafka, redis
    vote          VARCHAR(10),   -- YES | NO | NULL
    ack           BOOLEAN DEFAULT FALSE,
    PRIMARY KEY (txn_id, participant)
);

CREATE TABLE txn_log (
    log_id        BIGSERIAL PRIMARY KEY,
    txn_id        UUID NOT NULL,
    event         VARCHAR(64) NOT NULL,
    detail        TEXT,
    logged_at     TIMESTAMP NOT NULL DEFAULT NOW()
);

Core Algorithm

The DTM implements Two-Phase Commit (2PC) as its base protocol, extended with a write-ahead log for durability:

  1. Begin: Client calls beginTransaction(). DTM inserts a row in distributed_txn with status INITIATED and returns a txn_id.
  2. Enlist: Each participating service registers itself via enlistParticipant(txn_id, resourceDSN). DTM inserts a row in txn_participant.
  3. Phase 1 — Prepare: DTM sets status to PREPARING, then sends a PREPARE message to every participant in parallel. Each participant durably writes its prepared state and replies YES or NO.
  4. Decision: If all votes are YES, DTM writes PREPARED then COMMITTING to the log. Any NO vote triggers ABORTING.
  5. Phase 2 — Commit/Abort: DTM broadcasts COMMIT or ROLLBACK to all participants, collecting ACKs. Once all ACKs arrive, status advances to COMMITTED or ABORTED.

Failure Handling and Recovery

The hardest problem in distributed transactions is surviving coordinator or participant crashes mid-protocol. The DTM addresses this with:

  • Coordinator crash after PREPARED: On restart, a recovery thread scans for transactions stuck in PREPARING or COMMITTING and re-drives Phase 2. Because the decision is already durable, re-sending COMMIT is safe and idempotent.
  • Participant crash after voting YES: The participant must replay its prepared log on restart and await the coordinator’s Phase 2 message. It must not unilaterally abort once it has voted YES.
  • Timeout: Transactions stuck in PREPARING beyond timeout_ms are aborted by a background sweeper. Participants receive an explicit ROLLBACK.
  • Heuristic decisions: If a participant cannot reach the coordinator for an extended period, it may make a heuristic commit or abort and record the fact for later reconciliation.

Scalability Considerations

  • Coordinator bottleneck: The coordinator is stateful, so scale it as an active-passive pair with shared durable storage (e.g., PostgreSQL or etcd) rather than active-active.
  • Latency: 2PC adds at least two network round-trips. Keep participants co-located in the same region when possible; use async commit ACK collection in Phase 2 to reduce tail latency.
  • Throughput: Shard the coordinator by txn_id prefix or business domain so multiple coordinator instances handle disjoint transaction sets.
  • Observability: Expose transaction state via a metrics endpoint; alert on any transaction older than 2× timeout_ms that has not reached a terminal state.

Summary

A Distributed Transaction Manager provides ACID guarantees across service boundaries by persisting coordinator state, running a two-phase protocol, and recovering deterministically from failures. The key design trade-off is latency and availability: 2PC blocks participants during coordinator downtime. Evaluate whether saga-based eventual consistency is acceptable before committing to a DTM.

See also: Netflix Interview Guide 2026: Streaming Architecture, Recommendation Systems, and Engineering Excellence

See also: Scale AI Interview Guide 2026: Data Infrastructure, RLHF Pipelines, and ML Engineering

See also: Stripe Interview Guide 2026: Process, Bug Bash Round, and Payment Systems

Scroll to Top