System Design Interview: Distributed Transactions, 2PC, and the Saga Pattern
When a business operation spans multiple microservices or databases, maintaining consistency is hard. ACID transactions do not cross service boundaries. This guide covers the two main patterns for distributed transactions — Two-Phase Commit (2PC) and the Saga Pattern — along with the trade-offs that make one or the other the right choice.
Why Distributed Transactions Are Hard
Consider an e-commerce order: deduct inventory, charge the credit card, create the order record. If these live in separate services, any step can fail after the previous ones succeeded. Without coordination:
- Inventory deducted, payment fails → customer cannot pay, item held indefinitely
- Payment charged, order creation crashes → money taken, no order created
We need atomicity across services: either all steps succeed or all are rolled back.
Two-Phase Commit (2PC)
2PC uses a coordinator to achieve distributed atomicity:
Phase 1 — Prepare:
Coordinator → "Can you commit?" → Service A, Service B, Service C
Each service: acquire locks, write to WAL, respond "Yes" or "No"
Phase 2 — Commit/Abort:
If all Yes: Coordinator → "Commit!" → all services apply the changes, release locks
If any No: Coordinator → "Abort!" → all services roll back
Strengths: strong consistency, atomic commit across all participants.
Weaknesses:
- Blocking: if the coordinator crashes after Phase 1 but before Phase 2, participants hold locks indefinitely — the system is stuck
- Availability: all participants must be available. One unavailable service blocks the entire transaction
- Latency: two round trips across all participants adds significant overhead
- Not microservice-friendly: requires all services to implement the 2PC protocol and share a coordinator
Use 2PC only when you control all participants and can tolerate reduced availability. Used by: distributed databases (Spanner, CockroachDB) internally, XA transactions in Java EE.
The Saga Pattern
A Saga breaks a distributed transaction into a sequence of local transactions, each with a compensating transaction that undoes it if something later fails.
Order Flow:
Step 1: Reserve inventory → compensate: release inventory
Step 2: Charge payment → compensate: issue refund
Step 3: Create order → compensate: cancel order
Success path: 1 → 2 → 3 ✓
Failure at Step 3: execute compensations in reverse: refund (2) → release inventory (1)
Each step is a local ACID transaction on a single service. No distributed lock. No coordinator blocking.
Choreography-Based Saga
Services communicate via events. Each service listens for events, performs its step, and publishes the next event:
Order Service publishes: OrderPlaced
→ Inventory Service listens, reserves stock, publishes: InventoryReserved
→ Payment Service listens, charges card, publishes: PaymentCharged
→ Order Service listens, creates order record, publishes: OrderCreated
Failure: Payment Service publishes: PaymentFailed
→ Inventory Service listens, releases stock, publishes: InventoryReleased
→ Order Service listens, marks order as cancelled
Pros: loose coupling, no central coordinator, each service only knows about its own step.
Cons: hard to track overall workflow state, difficult to debug, event chains can become complex.
Orchestration-Based Saga
A central Saga Orchestrator explicitly tells each service what to do:
OrderSagaOrchestrator:
1. Send "ReserveInventory" to Inventory Service → await result
2. Send "ChargePayment" to Payment Service → await result
3. Send "CreateOrder" to Order Service → await result
On failure at step N:
For i from N-1 down to 1: send compensation command to service i
Pros: workflow is visible and testable in one place, easier to debug.
Cons: orchestrator becomes a central point of coupling (but not failure, since it is stateless).
Saga vs. 2PC: When to Use Which
| Criterion | 2PC | Saga |
|---|---|---|
| Consistency | Strong (ACID) | Eventual (BASE) |
| Availability | Lower (all must be up) | Higher (partial failures handled) |
| Complexity | Protocol complexity | Compensation logic complexity |
| Cross-service | Difficult | Natural fit |
| Best for | Same database cluster | Microservices, long-running workflows |
Idempotency in Sagas
Saga steps must be idempotent because they may be retried on failure. Use idempotency keys:
# Charge payment step:
INSERT INTO payments (saga_id, step, status)
VALUES ('saga_123', 'charge', 'processing')
ON CONFLICT (saga_id, step) DO NOTHING;
-- If this saga_id + step already exists, it was already processed → skip
-- Return the previously stored result
Interview Tips
- Always start with why standard transactions don't work across services — sets the context
- Explain the coordinator failure scenario in 2PC — this is the blocking problem interviewers probe
- For Sagas: define “compensating transaction” clearly — it is the undo of a step, not a rollback
- Recommend orchestration over choreography for complex workflows — it is easier to debug
- Emphasize idempotency: “Saga steps must be designed so that running them twice produces the same result”
{
“@context”: “https://schema.org”,
“@type”: “FAQPage”,
“mainEntity”: [
{
“@type”: “Question”,
“name”: “What is the difference between the Saga pattern and Two-Phase Commit?”,
“acceptedAnswer”: { “@type”: “Answer”, “text”: “Two-Phase Commit (2PC) achieves strong ACID consistency across distributed participants using a coordinator. Phase 1: coordinator asks all services "can you commit?" and waits for Yes/No. Phase 2: if all Yes, commit; if any No, abort. Problem: if the coordinator crashes after Phase 1, participants hold locks indefinitely (blocking). Also, all participants must be available simultaneously. Saga pattern: breaks the transaction into a sequence of local transactions, each with a compensating transaction (undo). No distributed locks. If step 3 fails, run compensations for steps 2 and 1 in reverse. Saga gives eventual consistency – there is a window where the system is partially applied. Use 2PC when you need strong consistency and control all participants. Use Saga for microservices where availability matters more than immediate consistency.” }
},
{
“@type”: “Question”,
“name”: “What is a compensating transaction in the Saga pattern?”,
“acceptedAnswer”: { “@type”: “Answer”, “text”: “A compensating transaction is the semantic undo of a completed local transaction. It is NOT a database rollback – the original transaction is already committed. Example: in an e-commerce order saga, Step 2 is "charge payment card" (local transaction on Payment Service). Its compensating transaction is "issue refund" (another local transaction on Payment Service). If Step 3 (create order record) fails, the saga executes: refund payment (compensates Step 2), then release inventory (compensates Step 1). Compensating transactions must be idempotent and eventually successful – they cannot fail permanently. If a compensation fails, the saga is stuck in an inconsistent state, requiring manual intervention or a dead letter queue.” }
},
{
“@type”: “Question”,
“name”: “When should you choose choreography vs. orchestration for saga implementation?”,
“acceptedAnswer”: { “@type”: “Answer”, “text”: “Choreography: services communicate by publishing and subscribing to events. No central coordinator. Order Service publishes OrderPlaced, Inventory Service listens and publishes InventoryReserved, Payment Service listens and publishes PaymentCharged. Pros: loose coupling, no single point of failure, services are independently deployable. Cons: hard to track overall workflow state, debugging requires tracing events across multiple services, complex failure scenarios are hard to reason about. Orchestration: a Saga Orchestrator explicitly sends commands to each service and handles the response. The orchestrator holds the workflow state machine. Pros: workflow is visible in one place, easier to debug and test, explicit error handling. Cons: orchestrator is a coupling point (though not a failure point if stateless). Recommendation: use orchestration for complex workflows with many steps or frequent failure paths. Use choreography for simple 2-3 step workflows.” }
}
]
}