System Design: Microservices Data Patterns — Saga, Outbox Pattern, Dual Write Problem, Transactional Messaging, CQRS

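In a microservices architecture, maintaining data consistency across services is one of the hardest challenges. A single business operation, such as placing an order, may span multiple services (order, payment, inventory), each with its own database. Traditional distributed transactions using two-phase commit (2PC) are a poor fit. They hold locks across services while waiting on a coordinator, a coordinator failure can leave participants blocked, and many message brokers and NoSQL stores cannot take part in them at all. This guide covers the patterns that production microservices use instead: the dual write problem and the outbox pattern (transactional messaging) that solves it, sagas with compensating transactions, and event-driven CQRS. These topics come up frequently in senior system design interviews.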

The Dual Write Problem

The dual write problem occurs when a service needs to update its database AND publish an event to a message broker such as Kafka. Whichever order the two writes happen in, one can succeed while the other fails. (1) If the service writes to the database first and then publishes, the database write can succeed while the publish fails. The data is saved, but downstream services are never notified, and the system is inconsistent. (2) If the service publishes first and then writes, the publish can succeed while the database write fails. Downstream services are notified of a change that never happened, which is even worse. You cannot make a database write and a Kafka publish atomic, because they are separate systems with no shared transaction. Retrying does not help either: a retried publish may succeed, but if the service crashes between the database commit and the publish, there is nothing left to retry and the event is lost for good. The problem is fundamental: any time you need to update two separate systems atomically, you face it. The standard solution is the transactional outbox pattern, in which the event is relayed to the broker either by polling an outbox table or by change data capture (CDC) on the database's transaction log.
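A minimal sketch of failure mode (1), assuming an in-memory SQLite database in place of the service's real database and a hypothetical `FlakyBroker` class in place of a Kafka producer. The order commit and the publish are two independent operations, so when the publish fails the order row stays committed and no event is ever sent:

```python
import sqlite3


class BrokerUnavailable(Exception):
    """Hypothetical stand-in for a publish error from a message broker."""


class FlakyBroker:
    """Hypothetical in-memory broker that can be told to fail the next publish."""

    def __init__(self) -> None:
        self.messages: list[tuple[str, str]] = []
        self.fail_next = False

    def publish(self, topic: str, message: str) -> None:
        if self.fail_next:
            self.fail_next = False
            raise BrokerUnavailable(topic)
        self.messages.append((topic, message))


def place_order_dual_write(db: sqlite3.Connection, broker: FlakyBroker, order_id: str) -> None:
    # Write 1: commit the business row.
    with db:
        db.execute("INSERT INTO orders (id, status) VALUES (?, 'PENDING')", (order_id,))
    # Write 2: publish the event. Nothing ties this to the commit above, so a
    # failure (or a process crash) here cannot roll the order back.
    broker.publish("orders", f"OrderCreated:{order_id}")


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
broker = FlakyBroker()
broker.fail_next = True
try:
    place_order_dual_write(db, broker, "o-1")
except BrokerUnavailable:
    pass  # the caller sees an error, but the order is already committed

orders = db.execute("SELECT id FROM orders").fetchall()
print(orders, broker.messages)  # [('o-1',)] []
```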

The Outbox Pattern

The outbox pattern solves the dual write problem by writing the event to the same database as the business data, in the same transaction. Process: (1) The service writes the business data (INSERT INTO orders) and the event (INSERT INTO outbox_events) in a single database transaction. Both succeed or both fail; atomicity is guaranteed by the database. (2) A separate relay process reads events from the outbox table and publishes them to Kafka. After a successful publish, the event is marked as published (or deleted). If the relay crashes after publishing but before marking the row, it publishes the same event again when it restarts, so the outbox gives at-least-once delivery and consumers must be idempotent (for example, by deduplicating on event_id).

Outbox table schema: event_id (UUID), aggregate_type (“Order”), aggregate_id (the order_id), event_type (“OrderCreated”), payload (JSON), created_at, published (boolean).

The outbox reader can be: (1) Polling: a background job runs a query such as SELECT * FROM outbox_events WHERE published = false ORDER BY created_at once per second. This is simple, but it adds constant query load and up to about a second of latency. (2) CDC (Change Data Capture): Debezium reads the database transaction log (the PostgreSQL WAL) and publishes new outbox rows to Kafka as they are committed. This gives near-real-time delivery (typically sub-second) with no polling queries, and because the relay tracks its position in the log rather than a per-row flag, the published column is not needed. Debezium-based CDC is the commonly recommended approach for production outbox implementations.
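The outbox write and a single polling pass of the relay can be sketched as below. This is an illustration under assumptions, not a production relay: SQLite stands in for PostgreSQL (so `published` is an INTEGER 0/1 rather than a boolean), `publish` is any callable standing in for a Kafka producer, and the `seq` column is an addition assumed here to give the relay a stable publish order.

```python
import json
import sqlite3
import uuid

# Assumed schema: SQLite stands in for PostgreSQL; `seq` is an added column
# that gives the relay a stable publish order.
SCHEMA = """
CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL);
CREATE TABLE outbox_events (
    seq            INTEGER PRIMARY KEY AUTOINCREMENT,
    event_id       TEXT NOT NULL UNIQUE,
    aggregate_type TEXT NOT NULL,
    aggregate_id   TEXT NOT NULL,
    event_type     TEXT NOT NULL,
    payload        TEXT NOT NULL,
    created_at     TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    published      INTEGER NOT NULL DEFAULT 0
);
"""


def place_order(db: sqlite3.Connection, order_id: str) -> None:
    """Write the order row and its outbox event in ONE local transaction."""
    with db:  # commits both inserts together, or rolls both back on an exception
        db.execute("INSERT INTO orders (id, status) VALUES (?, 'PENDING')", (order_id,))
        db.execute(
            "INSERT INTO outbox_events"
            " (event_id, aggregate_type, aggregate_id, event_type, payload)"
            " VALUES (?, 'Order', ?, 'OrderCreated', ?)",
            (str(uuid.uuid4()), order_id, json.dumps({"order_id": order_id})),
        )


def relay_once(db: sqlite3.Connection, publish, batch_size: int = 100) -> int:
    """One polling pass: publish unpublished events in insertion order, then mark each."""
    rows = db.execute(
        "SELECT seq, event_id, aggregate_id, event_type, payload FROM outbox_events"
        " WHERE published = 0 ORDER BY seq LIMIT ?",
        (batch_size,),
    ).fetchall()
    for seq, event_id, key, event_type, payload in rows:
        # `publish` stands in for a Kafka producer; the aggregate id is the message key.
        publish(key, {"event_id": event_id, "type": event_type, "payload": json.loads(payload)})
        # A crash between publish() and this UPDATE re-sends the event on the next
        # pass, so delivery is at-least-once and consumers must dedupe on event_id.
        with db:
            db.execute("UPDATE outbox_events SET published = 1 WHERE seq = ?", (seq,))
    return len(rows)


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
place_order(db, "o-1")
place_order(db, "o-2")
sent = []
published_count = relay_once(db, lambda key, event: sent.append((key, event["type"])))
print(published_count, sent)  # 2 [('o-1', 'OrderCreated'), ('o-2', 'OrderCreated')]
```

A real relay would call `relay_once` in a loop (for example, once per second). With several relay instances running, it would also need a lock or a row-claiming query, such as PostgreSQL's `SELECT ... FOR UPDATE SKIP LOCKED`, so that two instances do not publish the same batch.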

Saga Pattern Revisited

A saga coordinates a multi-service business operation as a sequence of local transactions, each of which publishes an event that triggers the next step. If a step fails, compensating transactions undo the steps that already completed. Example, an order placement saga:

Step 1: Order Service creates the order (status: PENDING) and publishes OrderCreated.
Step 2: Payment Service charges the card and publishes PaymentProcessed or PaymentFailed.
Step 3: Inventory Service reserves the items and publishes ItemsReserved or ReservationFailed.
Step 4: Order Service updates the order (status: CONFIRMED).

If payment fails, the Order Service cancels the order (compensating transaction). If inventory reservation fails after payment succeeded, the Payment Service issues a refund and the Order Service cancels the order (both compensating transactions). Each service uses the outbox pattern to publish its events reliably.

There are two saga coordination styles. (1) Choreography: each service listens for events and decides what to do next; there is no central coordinator. This is simple for 3-4 steps, but the overall flow becomes hard to follow as it grows. (2) Orchestration: a saga orchestrator service sends commands to each service and handles the responses. Because the orchestrator holds the full workflow and tracks the saga's current state, complex sagas are easier to manage, debug, and modify.
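An in-process sketch of orchestration for the order saga above, assuming each step's action and compensation are plain Python functions over a shared context dict. In a real system the orchestrator would send commands and receive replies over the broker and persist the saga state between steps; the `SagaStep` and `SagaOrchestrator` names are hypothetical. A step that fails is assumed to have rolled back its own local transaction, so only the steps that completed before it are compensated, in reverse order:

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SagaStep:
    name: str
    action: Callable[[dict], None]        # local transaction in one service
    compensation: Callable[[dict], None]  # semantic undo for that transaction


@dataclass
class SagaOrchestrator:
    steps: list[SagaStep]
    log: list[str] = field(default_factory=list)

    def run(self, ctx: dict) -> bool:
        completed: list[SagaStep] = []
        for step in self.steps:
            try:
                step.action(ctx)
            except Exception as exc:
                self.log.append(f"{step.name} failed: {exc}")
                # Undo only the steps that completed, newest first.
                for done in reversed(completed):
                    done.compensation(ctx)
                    self.log.append(f"compensated {done.name}")
                return False
            completed.append(step)
            self.log.append(f"{step.name} ok")
        return True


def reserve_items(ctx: dict) -> None:
    if ctx["stock"] < ctx["qty"]:
        raise RuntimeError("ReservationFailed")
    ctx["stock"] -= ctx["qty"]


order_saga = SagaOrchestrator([
    SagaStep("create_order", lambda c: c.update(status="PENDING"), lambda c: c.update(status="CANCELLED")),
    SagaStep("charge_card", lambda c: c.update(charged=True), lambda c: c.update(refunded=True)),
    SagaStep("reserve_items", reserve_items, lambda c: c.update(stock=c["stock"] + c["qty"])),
    SagaStep("confirm_order", lambda c: c.update(status="CONFIRMED"), lambda c: None),
])

ctx = {"qty": 3, "stock": 1}  # not enough stock, so step 3 fails
ok = order_saga.run(ctx)
print(ok, ctx["status"], ctx.get("refunded"))  # False CANCELLED True
print(order_saga.log)
```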

Compensating Transactions

Compensating transactions are the “undo” for completed saga steps. They are not true rollbacks but semantic reversals: a payment refund is not the same as the payment never happening, since the customer sees both a charge and a refund on their statement. Designing compensations: (1) Idempotent: a compensation may be retried (for example, after a network failure during the compensation), so processing the same refund twice must not refund twice. Use an idempotency key. (2) Tolerant of partial state: the original operation may have only partially completed, or not run at all, and the compensation must handle that. (3) Order-independent: an orchestrator typically runs compensations in reverse order, but in a choreographed saga the triggering events may be processed in any order, so design compensations not to depend on ordering. Irreversible actions: some operations cannot be compensated. An email cannot be unsent, and a shipped package cannot be unshipped (only a return can be initiated). Strategy: move irreversible actions to the end of the saga, after all other steps have succeeded. If an earlier step fails, the irreversible action has not happened yet, so nothing needs compensating; if the irreversible step itself fails, it is usually retried rather than compensated.
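A sketch of a refund compensation that is idempotent and tolerant of partial state, assuming a hypothetical in-memory `PaymentService` with amounts in integer cents. The idempotency key is derived from the order id so that every retry of the same compensation carries the same key; in a real service the key and the refund would be stored in one database transaction:

```python
from dataclasses import dataclass, field


@dataclass
class PaymentService:
    """Hypothetical payment service; amounts are integer cents."""
    captured: dict[str, int] = field(default_factory=dict)  # order_id -> amount charged
    refunds: dict[str, int] = field(default_factory=dict)   # idempotency key -> amount refunded
    refunded_total: int = 0

    def charge(self, order_id: str, amount: int) -> None:
        self.captured[order_id] = amount

    def refund(self, order_id: str, idempotency_key: str) -> int:
        # Idempotent: a retried refund with the same key returns the earlier result.
        if idempotency_key in self.refunds:
            return self.refunds[idempotency_key]
        # Partial state: refund only what was actually captured (possibly nothing,
        # if the charge never went through), not what the saga intended to charge.
        amount = self.captured.pop(order_id, 0)
        self.refunds[idempotency_key] = amount
        self.refunded_total += amount
        return amount


payments = PaymentService()
payments.charge("o-1", 4999)
key = "refund:o-1"  # derived from the order id, so every retry reuses it
first = payments.refund("o-1", key)
retry = payments.refund("o-1", key)              # e.g. redelivered after a network timeout
nothing = payments.refund("o-2", "refund:o-2")   # the charge for o-2 never happened
print(first, retry, nothing, payments.refunded_total)  # 4999 4999 0 4999
```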

CQRS with Event-Driven Microservices

CQRS (Command Query Responsibility Segregation) in a microservices context: each service publishes events when its state changes (using the outbox pattern), and other services consume these events to build their own read-optimized views. Example: the Order Service publishes OrderCreated with the order details. The Search Service consumes it and indexes the order in Elasticsearch for order search. The Analytics Service consumes it and updates aggregated metrics in ClickHouse. The Notification Service consumes it and sends a confirmation email. The services that maintain read models each build a projection tailored to their use case: the Search Service denormalizes order, customer, and product data for fast search (which means it also consumes customer and product events), and the Analytics Service pre-aggregates for dashboard queries.

Benefits: (1) Services are decoupled; the Order Service does not know about search, analytics, or notifications. (2) Each read model is optimized for its access pattern. (3) Adding a new consumer (e.g., a fraud detection service) requires no changes to the Order Service; it simply consumes the existing events. Challenges: eventual consistency (the search index lags behind the Order Service database, typically by milliseconds to seconds), event schema evolution (consumers must handle old event versions), duplicate delivery (consumers must be idempotent, since outbox relays deliver at least once), and debugging complexity (tracing a request across multiple services requires distributed tracing, with a trace_id propagated through the events).
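A sketch of one consumer-side projection, assuming events arrive as Python dicts carrying an `event_id`, a `type`, and a `payload` (this envelope, the `OrderCancelled` event, and the `currency` field are hypothetical). It deduplicates redelivered events by `event_id` and tolerates older OrderCreated payloads that lack a `currency` field by applying an assumed default:

```python
from collections import defaultdict


class OrderSearchProjection:
    """Hypothetical read model: a customer's orders, built only from consumed events."""

    def __init__(self) -> None:
        self.by_customer: defaultdict = defaultdict(dict)  # customer_id -> {order_id: doc}
        self.seen_event_ids: set = set()

    def apply(self, event: dict) -> None:
        # Relays deliver at-least-once, so skip events that were already applied.
        if event["event_id"] in self.seen_event_ids:
            return
        self.seen_event_ids.add(event["event_id"])

        data = event["payload"]
        if event["type"] == "OrderCreated":
            self.by_customer[data["customer_id"]][data["order_id"]] = {
                "total": data["total"],
                # Older payloads have no currency field; "USD" is an assumed default.
                "currency": data.get("currency", "USD"),
                "status": "CREATED",
            }
        elif event["type"] == "OrderCancelled":
            order = self.by_customer[data["customer_id"]].get(data["order_id"])
            if order is not None:
                order["status"] = "CANCELLED"
        # Any other event type is ignored, so new event types don't break this consumer.

    def orders_for(self, customer_id: str) -> dict:
        return dict(self.by_customer.get(customer_id, {}))


proj = OrderSearchProjection()
events = [
    {"event_id": "e1", "type": "OrderCreated",
     "payload": {"order_id": "o-1", "customer_id": "c-9", "total": 1200}},  # older payload
    {"event_id": "e2", "type": "OrderCreated",
     "payload": {"order_id": "o-2", "customer_id": "c-9", "total": 800, "currency": "EUR"}},
    {"event_id": "e3", "type": "OrderCancelled",
     "payload": {"order_id": "o-1", "customer_id": "c-9"}},
    {"event_id": "e1", "type": "OrderCreated",  # redelivered duplicate of e1
     "payload": {"order_id": "o-1", "customer_id": "c-9", "total": 1200}},
]
for event in events:
    proj.apply(event)
print(proj.orders_for("c-9"))
```

Without the `event_id` check, the redelivered OrderCreated at the end of the list would reset the cancelled order o-1 back to CREATED. In a durable projection, the seen IDs and the view update would need to be committed together.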

Choosing the Right Pattern

Decision guide: (1) A single service updates its own database and publishes an event: use the outbox pattern. It solves the dual write problem with database-level atomicity; use it whenever an event must be published reliably alongside a database change. (2) A multi-service business operation needs rollback: use a saga (orchestration for complex flows, choreography for simple 3-4 step flows), with each service using the outbox pattern internally. (3) Multiple services need different views of the same data: use CQRS with event-driven projections, where each consumer builds its own read model from events. (4) You need a complete audit trail and temporal queries: use event sourcing, which stores events as the source of truth and derives current state by replaying them.

Anti-patterns: (1) Distributed transactions (2PC) across microservices: slow, fragile under coordinator or participant failure, and they couple services together. Use sagas instead. (2) Synchronous call chains (service A calls B, which calls C): these create tight coupling and cascading failures. Where the caller does not need an immediate answer, use asynchronous events via Kafka. (3) Shared databases between services: these defeat the purpose of microservices (independent deployability and independent scaling). Each service owns its data.

In interviews, mention the outbox pattern whenever you design a microservice that needs to publish events. It shows awareness of the dual write problem, which many candidates miss.
