Saga Pattern: Distributed Transactions Low-Level Design

The Saga pattern is a way to manage data consistency across multiple services in a microservices architecture without using distributed transactions. Instead of a two-phase commit that locks resources across services, a saga is a sequence of local transactions, each with a compensating transaction that undoes its effects if a later step fails. Sagas let long-running business processes that span multiple services reach a consistent end state (every step completed, or every completed step compensated) without distributed locking.

Why Not Two-Phase Commit

Two-phase commit (2PC) coordinates a distributed transaction across multiple databases. In phase 1 (prepare), the coordinator asks every participant to lock the affected resources and vote on whether it can commit; in phase 2 (commit or abort), it tells all participants to commit if every vote was yes, or to roll back otherwise. Problems: the coordinator is a single point of failure, because if it crashes between phases, participants that voted yes stay blocked, still holding their locks, until it recovers; all participants must be online at the same time; every participant must support the prepare protocol (for example, XA), which many NoSQL stores and message brokers do not; and locks held across network round trips create contention that degrades throughput. Sagas trade ACID isolation for eventual consistency while avoiding these distributed locking problems.
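To make the blocking problem concrete, here is a minimal in-memory sketch of the two phases. The `Participant` class, its `locked` flag, and the `crash_after_prepare` switch are hypothetical illustrations, not a real XA implementation; the point is only that participants that voted yes keep their locks when the coordinator disappears between phases.

```python
from enum import Enum
from typing import List


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Participant:
    """Hypothetical in-memory 2PC participant that locks its resource on prepare."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.locked = False
        self.state = "idle"

    def prepare(self) -> Vote:
        if not self.can_commit:
            self.state = "aborted"
            return Vote.NO
        self.locked = True  # held until a phase-2 decision arrives
        self.state = "prepared"
        return Vote.YES

    def commit(self) -> None:
        self.locked = False
        self.state = "committed"

    def abort(self) -> None:
        self.locked = False
        self.state = "aborted"


def two_phase_commit(participants: List[Participant], crash_after_prepare: bool = False) -> str:
    # Phase 1 (prepare): every YES voter now holds a lock.
    votes = [p.prepare() for p in participants]
    if crash_after_prepare:
        # Simulated coordinator crash: no phase-2 message is ever sent.
        return "coordinator crashed"
    # Phase 2 (commit or abort): commit only if every vote was YES.
    if all(v is Vote.YES for v in votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        if p.state == "prepared":
            p.abort()
    return "aborted"
```

Calling `two_phase_commit(ps, crash_after_prepare=True)` leaves every participant in `ps` with `locked == True` and nothing in the protocol to release it, which is the blocked state described above.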

Choreography-Based Sagas

Each service reacts to events published by the others and publishes events of its own; there is no central coordinator. Order saga example: OrderService creates an order and emits OrderCreated. PaymentService listens for OrderCreated, charges the card, and emits PaymentCompleted or PaymentFailed. InventoryService listens for PaymentCompleted, reserves items, and emits InventoryReserved or InventoryFailed. ShippingService listens for InventoryReserved and schedules delivery.

On failure: PaymentService emits PaymentFailed → OrderService listens and cancels the order. InventoryService emits InventoryFailed → PaymentService listens, issues a refund, and emits an event of its own (for example, PaymentRefunded) → OrderService listens and cancels the order. Compensating transactions are triggered by failure events, flowing backward through the chain.
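The event flow above, including the backward-flowing compensation, can be sketched with a synchronous in-process bus standing in for a real message broker. This is a minimal illustration under assumptions: `EventBus`, `place_order`, and the event names PaymentRefunded and DeliveryScheduled are hypothetical, and the `card_ok`/`stock_ok` flags simply decide which outcome each service emits.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Event = Dict[str, str]


class EventBus:
    """Hypothetical synchronous in-process bus standing in for a message broker."""

    def __init__(self) -> None:
        self.handlers: Dict[str, List[Callable[[Event], None]]] = defaultdict(list)
        self.log: List[str] = []  # every published event type, in order

    def subscribe(self, event_type: str, handler: Callable[[Event], None]) -> None:
        self.handlers[event_type].append(handler)

    def publish(self, event_type: str, event: Event) -> None:
        self.log.append(event_type)
        for handler in self.handlers[event_type]:
            handler(event)


def wire_services(bus: EventBus, orders: Dict[str, str], card_ok: bool, stock_ok: bool) -> None:
    # OrderService: approves on DeliveryScheduled, cancels on terminal failures.
    def approve(e: Event) -> None:
        orders[e["order_id"]] = "APPROVED"

    def cancel(e: Event) -> None:
        orders[e["order_id"]] = "CANCELLED"

    bus.subscribe("DeliveryScheduled", approve)
    bus.subscribe("PaymentFailed", cancel)
    bus.subscribe("PaymentRefunded", cancel)

    # PaymentService: charges on OrderCreated, refunds on InventoryFailed.
    def charge(e: Event) -> None:
        bus.publish("PaymentCompleted" if card_ok else "PaymentFailed", e)

    def refund(e: Event) -> None:
        bus.publish("PaymentRefunded", e)

    bus.subscribe("OrderCreated", charge)
    bus.subscribe("InventoryFailed", refund)

    # InventoryService: reserves stock once payment has completed.
    def reserve(e: Event) -> None:
        bus.publish("InventoryReserved" if stock_ok else "InventoryFailed", e)

    bus.subscribe("PaymentCompleted", reserve)

    # ShippingService: schedules delivery once stock is reserved.
    def schedule(e: Event) -> None:
        bus.publish("DeliveryScheduled", e)

    bus.subscribe("InventoryReserved", schedule)


def place_order(card_ok: bool, stock_ok: bool) -> Tuple[str, List[str]]:
    bus: EventBus = EventBus()
    orders: Dict[str, str] = {}
    wire_services(bus, orders, card_ok, stock_ok)
    orders["o1"] = "PENDING"  # OrderService's local transaction
    bus.publish("OrderCreated", {"order_id": "o1"})
    return orders["o1"], bus.log
```

With `place_order(card_ok=True, stock_ok=False)` the log reads OrderCreated, PaymentCompleted, InventoryFailed, PaymentRefunded and the order ends up CANCELLED. Note that no single component holds the saga's state: it exists only as the sequence of events, which is the observability drawback discussed next.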

Advantages: no central coordinator, loosely coupled services, and each service only needs to know the events it consumes and emits. Disadvantages: failure analysis is complex (tracing a failure requires correlating events across several services' logs), there is no single place to see the saga’s current state, and services that subscribe to each other's events risk cyclic dependencies.

Orchestration-Based Sagas

A central saga orchestrator tells each service what to do and handles failures. OrderSagaOrchestrator: step 1, call PaymentService.charge(); if it succeeds, step 2, call InventoryService.reserve(); if that succeeds, step 3, call ShippingService.schedule(). If any step fails, issue compensating commands only for the steps that already completed, in reverse order: if ShippingService.schedule() fails, call InventoryService.release(), then PaymentService.refund(); if InventoryService.reserve() fails, call only PaymentService.refund().
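A minimal orchestrator can be sketched as a list of (action, compensation) pairs that records which steps completed and, on failure, undoes only those, newest first. `SagaStep`, the `history` list, and the lambdas standing in for service calls are hypothetical; a production orchestrator would also persist its state after every step so it can resume after a crash.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SagaStep:
    name: str
    action: Callable[[], None]        # local transaction; raises on failure
    compensation: Callable[[], None]  # semantic undo for a completed action


@dataclass
class OrderSagaOrchestrator:
    steps: List[SagaStep]
    history: List[str] = field(default_factory=list)

    def run(self) -> bool:
        completed: List[SagaStep] = []
        for step in self.steps:
            try:
                step.action()
            except Exception as exc:
                self.history.append(f"{step.name} failed: {exc}")
                # Compensate only the steps that already succeeded, newest first.
                for done in reversed(completed):
                    done.compensation()
                    self.history.append(f"compensated {done.name}")
                return False
            completed.append(step)
            self.history.append(f"{step.name} ok")
        return True


# Usage: stand-in service calls record what the orchestrator invoked.
calls: List[str] = []


def fail_shipping() -> None:
    raise RuntimeError("no courier available")


saga = OrderSagaOrchestrator([
    SagaStep("charge", lambda: calls.append("charge"), lambda: calls.append("refund")),
    SagaStep("reserve", lambda: calls.append("reserve"), lambda: calls.append("release")),
    SagaStep("schedule", fail_shipping, lambda: calls.append("cancel_shipment")),
])
succeeded = saga.run()
# succeeded is False; calls == ["charge", "reserve", "release", "refund"]
```

Because compensation walks `completed` rather than the full step list, the failed step itself is never compensated: its local transaction did not commit, so there is nothing to undo.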

Advantages: the saga state is visible in one place, failure handling is explicit and easy to reason about, and steps are easier to add. Disadvantages: the orchestrator is a central dependency, and it is more tightly coupled to the services it calls. Orchestration suits complex sagas with many steps; choreography suits simple sagas with only a few participants.

Compensating Transactions

A compensating transaction semantically reverses a completed local transaction. It must be idempotent, because it may be retried, and a repeated run must not, for example, refund a payment twice. It must eventually succeed: it is retried until it completes, since a compensation that can fail permanently would itself need a compensation. And it must be designed upfront, because not every operation can be reversed literally. An order that has already shipped cannot be compensated by un-shipping it; the compensation is to issue a return label instead. Design compensating transactions as part of the business logic, not as an afterthought.
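The idempotency and eventual-success requirements can be sketched together: a refund keyed by payment ID that turns a repeated call into a no-op, plus a retry loop with capped exponential backoff. `PaymentService`, `retry_until_success`, and the backoff parameters are hypothetical; in a real database the "already refunded?" check would be enforced atomically, for example with a unique constraint on the payment ID.

```python
import time
from typing import Callable, Dict


class PaymentService:
    """Hypothetical payment store whose refund is idempotent per payment_id."""

    def __init__(self) -> None:
        self.charges: Dict[str, int] = {}   # payment_id -> amount in cents
        self.refunded: Dict[str, int] = {}  # payment_id -> amount refunded

    def charge(self, payment_id: str, amount_cents: int) -> None:
        self.charges[payment_id] = amount_cents

    def refund(self, payment_id: str) -> None:
        # Idempotency check: a repeated refund for the same payment is a no-op.
        if payment_id in self.refunded:
            return
        self.refunded[payment_id] = self.charges[payment_id]


def retry_until_success(op: Callable[[], None], base_delay: float = 0.01, max_delay: float = 1.0) -> int:
    """Retry a compensation with capped exponential backoff; return attempts used.

    Deliberately unbounded, mirroring "must eventually succeed"; a real system
    would also alert an operator after repeated failures.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            op()
            return attempt
        except Exception:
            time.sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))


# Usage: a duplicated refund message does not refund twice.
svc = PaymentService()
svc.charge("pay-1", 4999)
retry_until_success(lambda: svc.refund("pay-1"))
retry_until_success(lambda: svc.refund("pay-1"))
# svc.refunded == {"pay-1": 4999}
```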

Isolation and the Lost Update Problem

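Sagas lack ACID isolation: while a saga is in progress, other transactions can read and write data that it has committed locally but may still compensate. Two anomalies follow. A dirty read: another process reads an account balance that includes a charge the saga is about to refund, or shows an order as “payment pending” that will end up cancelled. A lost update: one saga overwrites a change made by another without having read it, for example when a cancellation saga and a revision saga update the same order concurrently. Mitigate with: semantic locking (mark records as “processing” so other transactions retry, wait, or fail fast), rereading the record or checking its version number before updating so that stale writes are rejected, pessimistic locking within each local transaction (which protects only that step, since its locks are released when the local transaction commits), or designing the UX to tolerate temporary inconsistency. Accept that sagas trade isolation for availability, an application-level counterpart to the consistency-versus-availability trade-off described by the CAP theorem.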
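One way to guard an order against lost updates is to combine a semantic lock (a `locked_by` marker plus a visible pending status) with a version check on the final write. The sketch below is an in-memory illustration under assumptions: `OrderRepository`, `begin`/`finish`, and the status names `CANCEL_PENDING` and `REVISION_PENDING` are hypothetical, and a real store would perform each check-and-write atomically (for example, a conditional `UPDATE ... WHERE version = ?`).

```python
from dataclasses import dataclass
from typing import Dict, Optional


class ConcurrentUpdateError(Exception):
    """Raised when another saga holds the record or the caller's copy is stale."""


@dataclass
class OrderRow:
    status: str
    version: int = 0
    locked_by: Optional[str] = None  # semantic lock: id of the saga in progress


class OrderRepository:
    def __init__(self) -> None:
        self.rows: Dict[str, OrderRow] = {}

    def begin(self, order_id: str, saga_id: str, pending_status: str) -> int:
        """Mark the row as in progress for saga_id and return the new version."""
        row = self.rows[order_id]
        if row.locked_by is not None and row.locked_by != saga_id:
            raise ConcurrentUpdateError(f"{order_id} is locked by {row.locked_by}")
        row.locked_by = saga_id
        row.status = pending_status  # visible to readers, e.g. "CANCEL_PENDING"
        row.version += 1
        return row.version

    def finish(self, order_id: str, saga_id: str, expected_version: int, final_status: str) -> None:
        """Write the final status only if nobody changed the row since begin()."""
        row = self.rows[order_id]
        if row.locked_by != saga_id or row.version != expected_version:
            raise ConcurrentUpdateError(f"stale write to {order_id}")
        row.status = final_status
        row.version += 1
        row.locked_by = None  # release the semantic lock
```

A cancellation saga calls `begin("o1", "saga-A", "CANCEL_PENDING")`; a concurrent revision saga calling `begin` on the same order gets `ConcurrentUpdateError` and can retry later instead of silently overwriting the cancellation.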

See also: Meta Interview Guide 2026: Facebook, Instagram, WhatsApp Engineering

See also: Uber Interview Guide 2026: Dispatch Systems, Geospatial Algorithms, and Marketplace Engineering

See also: Netflix Interview Guide 2026: Streaming Architecture, Recommendation Systems, and Engineering Excellence

See also: Scale AI Interview Guide 2026: Data Infrastructure, RLHF Pipelines, and ML Engineering

See also: LinkedIn Interview Guide 2026: Social Graph Engineering, Feed Ranking, and Professional Network Scale

See also: Airbnb Interview Guide 2026: Search Systems, Trust and Safety, and Full-Stack Engineering

See also: Databricks Interview Guide 2026: Spark Internals, Delta Lake, and Lakehouse Architecture

See also: Anthropic Interview Guide 2026: Process, Questions, and AI Safety

See also: Atlassian Interview Guide

See also: Coinbase Interview Guide

See also: Shopify Interview Guide

See also: Snap Interview Guide

See also: Lyft Interview Guide 2026: Rideshare Engineering, Real-Time Dispatch, and Safety Systems

See also: Stripe Interview Guide 2026: Process, Bug Bash Round, and Payment Systems

