What is a distributed transaction and why is it hard to implement?

A distributed transaction is an operation that spans multiple independent services or databases and must either fully commit or fully roll back across all participants. It is hard to implement because network failures, partial failures, and the absence of a shared clock make it impossible to guarantee atomicity without coordination protocols. Engineers must choose between correctness guarantees and availability when designing these systems.

How do you ensure atomicity in a distributed transaction without a global lock?

Atomicity can be achieved without a global lock by using protocols like Two-Phase Commit (2PC), which coordinates a prepare phase and a commit phase across all participants, or by adopting the Saga pattern, which breaks the transaction into a sequence of local transactions, each paired with a compensating transaction that undoes it if a later step fails. A Saga gives up isolation: other requests can observe intermediate state until the saga completes or is compensated. Each approach trades off latency, availability, or implementation complexity differently. In practice, many teams prefer Sagas for long-lived transactions and 2PC only for tightly coupled services where locking windows are acceptable.
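
To make the Saga half of that comparison concrete, here is a minimal in-memory sketch. The `run_saga` helper and the order-flow step names are hypothetical, chosen only for illustration; a real saga would persist its progress and retry compensations until they succeed.

```python
from typing import Callable, List, Tuple

# Each saga step pairs a local action with the compensation that undoes it.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        _name, action, _compensate = step
        try:
            action()
            completed.append(step)
        except Exception:
            # Undo only the steps that actually finished, newest first.
            for _done_name, _done_action, undo in reversed(completed):
                undo()
            return False
    return True


# Hypothetical order flow: stock is reserved, then the card charge fails.
events: List[str] = []


def reserve_stock() -> None:
    events.append("stock reserved")


def release_stock() -> None:
    events.append("stock released")


def charge_card() -> None:
    raise RuntimeError("card declined")


def refund_card() -> None:
    events.append("card refunded")


ok = run_saga([
    ("reserve_stock", reserve_stock, release_stock),
    ("charge_card", charge_card, refund_card),
])
```

Here `ok` is False and only the stock reservation is compensated, because the card charge never completed. Between the reservation and its release, other requests could have seen the reserved stock, which is the isolation gap a Saga accepts.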

What role does idempotency play in distributed transaction design?

Idempotency is critical because network retries and partial failures mean the same operation may be delivered more than once to a participant. Designing each step to be idempotent — producing the same result whether applied once or multiple times — allows the coordinator to safely retry without corrupting state. Common techniques include idempotency keys, conditional writes, and deduplication tables.
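
As a rough illustration of the idempotency-key technique, the sketch below keeps an in-memory deduplication table inside a hypothetical `Account` participant. The class, method, and key format are assumptions made for this example; in a real service the deduplication record and the state change would have to commit in the same local transaction.

```python
from typing import Dict


class Account:
    """Participant that applies each credit at most once per idempotency key."""

    def __init__(self) -> None:
        self.balance = 0
        # Deduplication table: idempotency key -> balance returned the first time.
        self._seen: Dict[str, int] = {}

    def credit(self, idempotency_key: str, amount: int) -> int:
        if idempotency_key in self._seen:
            # Duplicate delivery: return the recorded result, change nothing.
            return self._seen[idempotency_key]
        self.balance += amount
        self._seen[idempotency_key] = self.balance
        return self.balance


account = Account()
first = account.credit("txn-42/step-1", 100)
retry = account.credit("txn-42/step-1", 100)  # coordinator retry after a lost reply
```

Both calls return the same balance and the account is credited only once, so the coordinator can retry freely when it does not know whether the first attempt arrived.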

When would you choose eventual consistency over strong consistency for a distributed transaction?

Eventual consistency is preferred when high availability and low latency matter more than every reader seeing the latest state immediately, such as in user-facing features like shopping carts, recommendation feeds, or notification delivery. Strong consistency is reserved for cases where incorrect intermediate state is unacceptable, such as financial transfers or inventory deductions. The decision depends on the business tolerance for anomalies and the cost of compensating for them after the fact.

Low Level Design: Distributed Transaction Manager

What Is a Distributed Transaction Manager?

A Distributed Transaction Manager (DTM) coordinates operations that span multiple databases, services, or nodes so that all participants either commit or roll back together. Unlike a local ACID transaction confined to one database, a distributed transaction must guarantee atomicity across heterogeneous resources — each of which may fail independently. The DTM acts as the single source of truth for transaction state, orchestrating prepare and commit phases and persisting enough metadata to recover after crashes.

Data Model

-- Central coordinator store
CREATE TABLE distributed_txn (
    txn_id        UUID PRIMARY KEY,
    status        VARCHAR(20) NOT NULL CHECK (status IN ('INITIATED','PREPARING','PREPARED','COMMITTING','COMMITTED','ABORTING','ABORTED')),  -- NOT NULL: a CHECK alone accepts NULL
    initiated_at  TIMESTAMP NOT NULL DEFAULT NOW(),
    updated_at    TIMESTAMP NOT NULL DEFAULT NOW(),
    timeout_ms    INT NOT NULL DEFAULT 30000,
    coordinator   VARCHAR(128) NOT NULL
);

CREATE TABLE txn_participant (
    txn_id        UUID REFERENCES distributed_txn(txn_id),
    participant   VARCHAR(128) NOT NULL,
    resource_type VARCHAR(64),   -- e.g. postgres, kafka, redis
    vote          VARCHAR(10) CHECK (vote IN ('YES','NO')),   -- NULL until the participant votes
    ack           BOOLEAN DEFAULT FALSE,
    PRIMARY KEY (txn_id, participant)
);

CREATE TABLE txn_log (
    log_id        BIGSERIAL PRIMARY KEY,
    txn_id        UUID NOT NULL,
    event         VARCHAR(64) NOT NULL,
    detail        TEXT,
    logged_at     TIMESTAMP NOT NULL DEFAULT NOW()
);
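
The status column above implies a state machine, although the schema itself does not enforce the order of transitions. The sketch below shows one way application code might guard updates to distributed_txn.status. The transition table is an assumption inferred from the protocol description, not something the schema defines, and `can_transition` and `is_terminal` are hypothetical helper names.

```python
from typing import Dict, FrozenSet

# Assumed legal transitions between the status values in the CHECK constraint.
ALLOWED: Dict[str, FrozenSet[str]] = {
    "INITIATED": frozenset({"PREPARING", "ABORTING"}),
    "PREPARING": frozenset({"PREPARED", "ABORTING"}),
    "PREPARED": frozenset({"COMMITTING"}),
    "COMMITTING": frozenset({"COMMITTED"}),
    "ABORTING": frozenset({"ABORTED"}),
    "COMMITTED": frozenset(),
    "ABORTED": frozenset(),
}


def can_transition(current: str, new: str) -> bool:
    """Return True if moving from `current` to `new` is a legal status change."""
    return new in ALLOWED.get(current, frozenset())


def is_terminal(status: str) -> bool:
    """Terminal statuses are known statuses with no outgoing transitions."""
    return status in ALLOWED and not ALLOWED[status]
```

A coordinator could call `can_transition` before each UPDATE and refuse, for example, to move a COMMITTED transaction to ABORTING.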

Core Algorithm

The DTM implements Two-Phase Commit (2PC) as its base protocol, extended with a write-ahead log for durability:

Begin: Client calls beginTransaction(). DTM inserts a row in distributed_txn with status INITIATED and returns a txn_id.
Enlist: Each participating service registers itself via enlistParticipant(txn_id, resourceDSN). DTM inserts a row in txn_participant.
Phase 1 — Prepare: DTM sets status to PREPARING, then sends a PREPARE message to every participant in parallel. Each participant durably writes its prepared state and replies YES or NO.
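Decision: If every participant votes YES, DTM writes PREPARED and then COMMITTING to the log. Any NO vote, or a vote that does not arrive before timeout_ms, triggers ABORTING. The decision must be durable before any Phase 2 message is sent.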
Phase 2 — Commit/Abort: DTM broadcasts COMMIT or ROLLBACK to all participants, collecting ACKs. Once all ACKs arrive, status advances to COMMITTED or ABORTED.
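
The steps above can be sketched as a small in-memory simulation. This is an illustrative sketch, not the DTM's actual implementation: the `Participant` class and `run_two_phase_commit` function are hypothetical, the log is a plain list rather than the txn_log table, votes are collected sequentially instead of in parallel, and no network or timeout handling is modelled.

```python
from typing import Dict, List


class Participant:
    """In-memory stand-in for a resource manager (hypothetical interface)."""

    def __init__(self, name: str, will_vote_yes: bool = True) -> None:
        self.name = name
        self.will_vote_yes = will_vote_yes
        self.state = "IDLE"

    def prepare(self) -> str:
        # A real participant would durably record its prepared state here.
        self.state = "PREPARED" if self.will_vote_yes else "ABORTED"
        return "YES" if self.will_vote_yes else "NO"

    def commit(self) -> bool:
        self.state = "COMMITTED"
        return True  # ACK

    def rollback(self) -> bool:
        self.state = "ABORTED"
        return True  # ACK


def run_two_phase_commit(participants: List[Participant], log: List[str]) -> str:
    """Drive one transaction through both phases and return its last status."""
    log.append("PREPARING")
    # Phase 1: collect one vote per participant.
    votes: Dict[str, str] = {p.name: p.prepare() for p in participants}

    if all(vote == "YES" for vote in votes.values()):
        # The decision is logged before any Phase 2 message goes out.
        log.extend(["PREPARED", "COMMITTING"])
        acks = [p.commit() for p in participants]
        final = "COMMITTED"
    else:
        log.append("ABORTING")
        acks = [p.rollback() for p in participants]
        final = "ABORTED"

    if all(acks):
        log.append(final)
    return log[-1]  # stays COMMITTING/ABORTING until every ACK has arrived


happy_log: List[str] = []
happy = run_two_phase_commit(
    [Participant("orders"), Participant("payments")], happy_log
)

sad_log: List[str] = []
sad = run_two_phase_commit(
    [Participant("orders"), Participant("payments", will_vote_yes=False)], sad_log
)
```

In the first run both participants vote YES and the log walks through PREPARING, PREPARED, COMMITTING, COMMITTED. In the second, a single NO vote sends the transaction through ABORTING to ABORTED and every participant is rolled back.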

Failure Handling and Recovery

The hardest problem in distributed transactions is surviving coordinator or participant crashes mid-protocol. The DTM addresses this with:

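Coordinator crash after PREPARED: On restart, a recovery thread scans for transactions stuck in PREPARED, COMMITTING, or ABORTING and re-drives Phase 2 with the recorded outcome (commit for PREPARED and COMMITTING, rollback for ABORTING). Because the decision is already durable, re-sending COMMIT or ROLLBACK is safe as long as participants handle it idempotently. Transactions still in PREPARING have no durable decision, so they are aborted rather than committed.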
Participant crash after voting YES: The participant must replay its prepared log on restart and await the coordinator’s Phase 2 message. It must not unilaterally abort once it has voted YES.
Timeout: Transactions stuck in PREPARING beyond timeout_ms are aborted by a background sweeper. Participants receive an explicit ROLLBACK.
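Heuristic decisions: If a participant cannot reach the coordinator for an extended period, it may make a heuristic commit or abort and record the fact for later reconciliation. A heuristic decision can contradict the coordinator's eventual outcome and break atomicity, so treat it as a last resort that requires operator follow-up.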
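
A recovery pass and timeout sweeper along these lines might look like the sketch below. It assumes that PREPARED and COMMITTING both imply a durable commit decision, that ABORTING implies a durable abort decision, and that PREPARING transactions are only aborted once they exceed the timeout. The `plan_recovery` function, its action names, and the dict-based input are hypothetical stand-ins for a scan over distributed_txn.

```python
from typing import Dict, List, Tuple


def plan_recovery(
    txns: Dict[str, Tuple[str, int]],
    now_ms: int,
    timeout_ms: int = 30000,
) -> List[Tuple[str, str]]:
    """Map txn_id -> (status, initiated_at_ms) to a list of (txn_id, action)."""
    actions: List[Tuple[str, str]] = []
    for txn_id, (status, initiated_at_ms) in sorted(txns.items()):
        if status in ("PREPARED", "COMMITTING"):
            # Commit decision is durable: re-send COMMIT until all ACKs arrive.
            actions.append((txn_id, "RESEND_COMMIT"))
        elif status == "ABORTING":
            # Abort decision is durable: re-send ROLLBACK.
            actions.append((txn_id, "RESEND_ROLLBACK"))
        elif status == "PREPARING" and now_ms - initiated_at_ms > timeout_ms:
            # No decision yet and the deadline has passed: abort.
            actions.append((txn_id, "ABORT"))
        # Terminal or still-fresh transactions need no action.
    return actions


txns = {
    "t1": ("COMMITTING", 0),
    "t2": ("PREPARING", 0),
    "t3": ("PREPARING", 90_000),
    "t4": ("COMMITTED", 0),
    "t5": ("ABORTING", 0),
}
plan = plan_recovery(txns, now_ms=100_000)
```

With these inputs, t1 gets its COMMIT re-sent, t2 is aborted because it has been preparing for longer than the timeout, t5 gets its ROLLBACK re-sent, and t3 and t4 are left alone.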

Scalability Considerations

Coordinator bottleneck: The coordinator is stateful, so run each coordinator as an active-passive pair with shared durable storage (e.g., PostgreSQL or etcd) rather than active-active.
Latency: 2PC adds at least two network round-trips. Keep participants co-located in the same region when possible. Once the commit decision is durable, Phase 2 ACKs can be collected asynchronously to reduce tail latency.
Throughput: Shard the coordinator by txn_id prefix or business domain so multiple coordinator instances handle disjoint transaction sets.
Observability: Expose transaction state via a metrics endpoint; alert on any transaction older than 2× timeout_ms that has not reached a terminal state.
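
The throughput and observability points can be sketched in a few lines. Note one substitution: instead of the raw txn_id prefix, this example hashes the full UUID, which spreads transactions evenly even when IDs are not random. The function names `coordinator_shard` and `is_stuck` are hypothetical.

```python
import hashlib
import uuid


def coordinator_shard(txn_id: uuid.UUID, shard_count: int) -> int:
    """Deterministically map a transaction to one of `shard_count` coordinators."""
    digest = hashlib.sha256(txn_id.bytes).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def is_stuck(status: str, age_ms: int, timeout_ms: int) -> bool:
    """Alert condition: non-terminal and older than twice the timeout."""
    return status not in ("COMMITTED", "ABORTED") and age_ms > 2 * timeout_ms


txn = uuid.UUID("12345678-1234-5678-1234-567812345678")
shard = coordinator_shard(txn, shard_count=4)
```

Because the mapping depends only on txn_id and the shard count, every client and recovery process routes a given transaction to the same coordinator. Changing the shard count remaps existing transactions, so in-flight transactions would need to finish or be migrated first.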

Summary

A Distributed Transaction Manager provides atomic commit across service boundaries by persisting coordinator state, running a two-phase protocol, and recovering deterministically from failures. It does not by itself provide isolation across participants; that depends on the locks each participant holds while prepared. The key design trade-off is that atomicity is paid for in latency and availability: 2PC blocks prepared participants during coordinator downtime. Evaluate whether saga-based eventual consistency is acceptable before committing to a DTM.