Question 1

Why is Two-Phase Commit (2PC) avoided in microservices architectures?

Accepted Answer

2PC has three major problems for microservices. First, blocking: if the coordinator crashes after the prepare phase, every participant that voted "ready" must keep holding its locks, because it can neither commit nor abort without the coordinator's decision. Those records stay blocked for other operations until the coordinator recovers. Second, performance: each transaction needs two network round-trips (prepare, then commit) while locks are held across services, which means high latency and low throughput. Third, availability: 2PC requires all participants to be reachable. If any service is down during phase 1, the entire transaction must abort, and in a microservices environment with many services and independent deployments this happens frequently. In CAP terms, 2PC favors consistency over availability, whereas a saga favors availability and accepts eventual consistency.
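
The blocking and availability problems can be seen in a toy in-memory model. This is a minimal sketch, not a real 2PC implementation: the Participant class, the two_phase_commit function, and the crash_after_prepare flag are hypothetical names introduced only for illustration, and "holding locks" is represented by the "prepared" state.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Toy participant; a real one would hold database locks while prepared."""
    name: str
    reachable: bool = True
    state: str = "idle"  # idle -> prepared -> committed | aborted

    def prepare(self) -> bool:
        if not self.reachable:
            return False         # an unreachable service counts as a "no" vote
        self.state = "prepared"  # locks are held from this point on
        return True

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants, crash_after_prepare=False) -> str:
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    if crash_after_prepare:
        # The coordinator dies here, so nobody ever receives a decision.
        return "coordinator crashed"
    # Phase 2: commit only on a unanimous yes.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    # Otherwise release every participant that had already prepared.
    for p in participants:
        if p.state == "prepared":
            p.abort()
    return "aborted"
```

Calling two_phase_commit with crash_after_prepare=True leaves every participant in the "prepared" state, which is the blocked situation described above. Marking one participant as unreachable forces the whole transaction to abort.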

Question 2

What is a compensating transaction and when is it not possible?

Accepted Answer

A compensating transaction reverses the effect of a completed step. Examples: inventory reservation compensated by releasing it; payment charge compensated by a refund. Compensation is not always a true undo -- it is a semantic reversal that is logically correct but may not be identical to a rollback. Situations where compensation is impossible: sending an email or SMS (cannot un-send), printing a label (cannot un-print), publishing a post publicly (cannot guarantee all readers forget it). In these cases: (1) defer the irreversible action to the last step of the saga so compensation is rarely needed; (2) use a best-effort compensation (send a "we made a mistake" follow-up email); (3) design the system to handle partial completion gracefully (order shows "pending" until all steps commit).
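
A saga runner that applies compensations in reverse order might look like the following. This is a sketch under assumptions: run_saga, the Step tuple layout, and the log strings are hypothetical, and a production runner would also persist progress and retry failed compensations.

```python
from typing import Callable, List, Tuple

# (step name, action, compensation) -- a hypothetical layout for illustration
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> List[str]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    log: List[str] = []
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception:
            log.append(f"{name}: failed")
            # Undo only what actually completed, most recent first.
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"{done_name}: compensated")
            return log
        completed.append(step)
        log.append(f"{name}: done")
    return log
```

Placing the irreversible step (for example, sending the confirmation email) last in the list means it only runs after every reversible step has succeeded, so it never needs to be compensated by this runner.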

Question 3

How does the Outbox pattern guarantee exactly-once event delivery?

Accepted Answer

Strictly speaking, the outbox pattern does not guarantee exactly-once delivery on its own. What it guarantees is atomicity between the business write and the event: the event is written to an outbox table inside the same database transaction as the business record, so the two commit or roll back together and there is no dual-write race condition. The outbox processor (Debezium CDC or a polling worker) reads undelivered outbox rows and publishes them to Kafka. If the publish succeeds, the row is marked as delivered (or deleted); if the publish fails, it is retried. The consumer therefore receives the event at-least-once: if the processor crashes after publishing but before marking the row, the same event is published again on the next run. To achieve exactly-once end-to-end, the consumer must be idempotent, so that processing the same event twice gives the same result as processing it once. Together: outbox (the event is always delivered, at-least-once) + idempotent consumer = effectively exactly-once.
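
The write side and a polling relay can be sketched with SQLite from the Python standard library. The table layout, the function names (init_outbox_db, create_order, relay_once), and the publish callback are assumptions made for illustration; in a real system publish would wrap a Kafka producer, which is not shown here.

```python
import json
import sqlite3


def init_outbox_db(conn: sqlite3.Connection) -> None:
    conn.executescript(
        """
        CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL);
        CREATE TABLE outbox (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            event_type TEXT NOT NULL,
            payload TEXT NOT NULL,
            delivered INTEGER NOT NULL DEFAULT 0
        );
        """
    )


def create_order(conn: sqlite3.Connection, order_id: str) -> None:
    # One transaction: the business row and the event row commit together,
    # or neither is written.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, status) VALUES (?, 'created')", (order_id,)
        )
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("OrderCreated", json.dumps({"order_id": order_id})),
        )


def relay_once(conn: sqlite3.Connection, publish) -> int:
    """Publish undelivered rows in insertion order; return how many were sent."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox WHERE delivered = 0 ORDER BY id"
    ).fetchall()
    sent = 0
    for row_id, event_type, payload in rows:
        publish(event_type, payload)  # may raise; the row then stays undelivered
        # A crash between publish and this UPDATE causes a duplicate on the
        # next run, which is why delivery is at-least-once.
        with conn:
            conn.execute("UPDATE outbox SET delivered = 1 WHERE id = ?", (row_id,))
        sent += 1
    return sent
```

If publish raises, the exception propagates and the row stays undelivered, so the next call to relay_once retries it.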

Question 4

How do you choose between choreography and orchestration for a saga?

Accepted Answer

Choreography: each service reacts to events from the previous step. No central coordinator. Better for: simple flows with few steps, teams that want maximum service independence, event-driven architectures already using Kafka. Downsides: saga state is implicit and distributed -- hard to query "what step is order X on?", hard to debug failures, risk of cyclic event chains. Orchestration: a saga orchestrator service maintains explicit state and sends commands. Better for: complex flows with many steps, business processes that need monitoring and dashboards, teams that want clear ownership of business logic. Downsides: orchestrator is a new service to maintain. In practice: choreography for simple 2-3 step flows; orchestration for anything with 4+ steps, complex error handling, or audit requirements.
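
The "explicit state" advantage of orchestration can be illustrated with a small state machine. This sketch is an assumption-heavy simplification: the SagaState values, the event names, and the OrderSagaOrchestrator class are hypothetical, state is kept in memory rather than persisted, and the sending of commands to services is omitted.

```python
from enum import Enum


class SagaState(Enum):
    STARTED = "started"
    INVENTORY_RESERVED = "inventory_reserved"
    PAYMENT_CHARGED = "payment_charged"
    COMPLETED = "completed"


# Hypothetical reply events mapped to the state they lead to.
TRANSITIONS = {
    (SagaState.STARTED, "InventoryReserved"): SagaState.INVENTORY_RESERVED,
    (SagaState.INVENTORY_RESERVED, "PaymentCharged"): SagaState.PAYMENT_CHARGED,
    (SagaState.PAYMENT_CHARGED, "OrderShipped"): SagaState.COMPLETED,
}


class OrderSagaOrchestrator:
    def __init__(self) -> None:
        # order_id -> SagaState; a real orchestrator would persist this.
        self._state = {}

    def start(self, order_id: str) -> None:
        self._state[order_id] = SagaState.STARTED

    def on_event(self, order_id: str, event: str) -> SagaState:
        current = self._state[order_id]
        # Unknown or out-of-order events leave the state unchanged.
        self._state[order_id] = TRANSITIONS.get((current, event), current)
        return self._state[order_id]

    def current_step(self, order_id: str) -> SagaState:
        # Answers "what step is order X on?" with a single lookup.
        return self._state[order_id]
```

With choreography, the same question would require correlating events across every participating service, since no single component holds this table.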

Question 5

How do you implement saga idempotency to handle retries safely?

Accepted Answer

Each saga step receives a command with a saga_id and step_name. The receiving service derives an idempotency_key = hash(saga_id + step_name), with a delimiter between the two fields so that different pairs cannot concatenate to the same string. Before processing: check the idempotency_keys table for this key. If found: return the previously stored result (skip processing). If not found: process the command, then insert the key and result atomically in the same transaction as the business change. A unique constraint on the key covers the case where two retries race: the second insert fails, so only one of them commits. This guarantees that retrying the same command produces the same result. For payment steps: the idempotency key maps to the payment provider's idempotency key (Stripe uses Idempotency-Key headers). If the saga retries a payment step, Stripe returns the original payment result rather than charging again. Idempotency keys have a TTL (24-48 hours), which must be longer than the saga's maximum retry window, after which they can be cleaned up.
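
The check-then-record flow can be sketched with SQLite. The function names (init_idempotency_table, idempotency_key, handle_command) and the JSON-encoded result column are assumptions for illustration. The key is hashed from a JSON-encoded pair rather than from plain concatenation, which is one way to keep the two fields unambiguous.

```python
import hashlib
import json
import sqlite3


def init_idempotency_table(conn: sqlite3.Connection) -> None:
    # PRIMARY KEY gives the unique constraint that rejects a racing duplicate.
    conn.execute(
        "CREATE TABLE idempotency_keys (key TEXT PRIMARY KEY, result TEXT NOT NULL)"
    )


def idempotency_key(saga_id: str, step_name: str) -> str:
    # Encoding the pair as a JSON list avoids collisions such as
    # ("ab", "c") versus ("a", "bc").
    raw = json.dumps([saga_id, step_name]).encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def handle_command(conn: sqlite3.Connection, saga_id: str, step_name: str, process):
    key = idempotency_key(saga_id, step_name)
    row = conn.execute(
        "SELECT result FROM idempotency_keys WHERE key = ?", (key,)
    ).fetchone()
    if row is not None:
        return json.loads(row[0])  # retry: return the stored result, skip work
    # Business work and key insert share one transaction. For that to hold,
    # any database writes inside process() must use this same connection.
    with conn:
        result = process()
        conn.execute(
            "INSERT INTO idempotency_keys (key, result) VALUES (?, ?)",
            (key, json.dumps(result)),
        )
    return result
```

Calling handle_command twice with the same saga_id and step_name runs process only once; the second call returns the stored result. TTL cleanup is not shown and would need an extra timestamp column.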

System Design: Distributed Transactions — Two-Phase Commit, Saga Pattern, and the Outbox Pattern

The Problem with Distributed Transactions

Two-Phase Commit (2PC)

Saga Pattern

The Outbox Pattern

Idempotency Across Services

Interview Tips