What Is Event Sourcing?
Traditional systems store the current state of data (the latest balance in a bank account, the current status of an order). Event sourcing stores the sequence of events that led to the current state. Instead of updating a row, you append an event: OrderPlaced, ItemAdded, PaymentProcessed, OrderShipped. The current state is derived by replaying all events from the beginning. This is analogous to how a bank ledger works — never erasing transactions, only adding entries.
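As a minimal sketch (with hypothetical event shapes), "derived by replaying all events" means current state is just a left fold over the event log:

```python
# Current state as a fold (left reduce) over the event log.
from functools import reduce

events = [
    {"type": "OrderPlaced", "items": ["book"]},
    {"type": "ItemAdded", "item": "pen"},
    {"type": "OrderShipped"},
]

def apply(state, event):
    # Each event type produces the next state from the previous one.
    state = dict(state)
    if event["type"] == "OrderPlaced":
        state.update(status="placed", items=list(event["items"]))
    elif event["type"] == "ItemAdded":
        state["items"] = state["items"] + [event["item"]]
    elif event["type"] == "OrderShipped":
        state["status"] = "shipped"
    return state

state = reduce(apply, events, {"status": None, "items": []})
print(state)  # {'status': 'shipped', 'items': ['book', 'pen']}
```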
CQRS: Command Query Responsibility Segregation
CQRS separates the write model (commands that mutate state) from the read model (queries that return data). These two models can use different schemas, databases, and services. Commands go through a command handler that validates business rules and emits events. Queries read from one or more materialized read models (projections) optimized for specific views — often denormalized for fast reads.
CQRS does not require event sourcing, but they pair naturally: events from the write side are consumed by projectors that build read models, creating an event-driven pipeline.
Event Store Design
An event store is an append-only log of domain events, organized by aggregate (the entity whose state changes). An order aggregate has events: OrderCreated → ItemAdded → ItemAdded → PaymentProcessed → Shipped. Each event has: aggregate_id, sequence_number (position within the aggregate), event_type, event_data (JSON payload), occurred_at, metadata (user_id, correlation_id).
-- Event store table (PostgreSQL)
CREATE TABLE events (
    id BIGSERIAL PRIMARY KEY,
    aggregate_id UUID NOT NULL,
    aggregate_type TEXT NOT NULL,   -- 'Order', 'BankAccount'
    sequence_number INT NOT NULL,   -- monotonic within aggregate
    event_type TEXT NOT NULL,       -- 'OrderPlaced', 'ItemAdded'
    event_data JSONB NOT NULL,
    metadata JSONB,
    occurred_at TIMESTAMPTZ DEFAULT now(),
    UNIQUE (aggregate_id, sequence_number)  -- optimistic concurrency
);
-- No separate index on (aggregate_id, sequence_number) is needed:
-- the UNIQUE constraint already creates one.
CREATE INDEX ON events (occurred_at);  -- for projector catch-up
The UNIQUE constraint on (aggregate_id, sequence_number) provides optimistic concurrency control: if two processes try to write sequence 5 for the same aggregate simultaneously, only one succeeds. The loser must re-read the aggregate state and retry.
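The re-read-and-retry loop can be sketched with a toy in-memory store standing in for the database; the ConcurrencyError here stands in for the unique-violation error the database would raise:

```python
# Toy event store illustrating optimistic concurrency: the caller re-reads
# the aggregate's current length and retries when its sequence slot is taken.
class ConcurrencyError(Exception):
    pass

class EventStore:
    def __init__(self):
        self._events = {}  # (aggregate_id, seq) -> event

    def append(self, aggregate_id, seq, event):
        key = (aggregate_id, seq)
        if key in self._events:  # stands in for a unique-violation error
            raise ConcurrencyError(f"sequence {seq} already written")
        self._events[key] = event

    def load(self, aggregate_id):
        return [e for (aid, _), e in sorted(self._events.items())
                if aid == aggregate_id]

def append_with_retry(store, aggregate_id, build_event, max_attempts=3):
    for _ in range(max_attempts):
        next_seq = len(store.load(aggregate_id)) + 1
        try:
            store.append(aggregate_id, next_seq, build_event())
            return next_seq
        except ConcurrencyError:
            continue  # another writer won; re-read and retry
    raise ConcurrencyError("gave up after retries")
```

In a real system the rebuild step would also re-validate business rules against the re-read state, since the conflicting event may have changed what is allowed.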
Loading and Saving Aggregates
# Minimal event classes (assumed shapes for this sketch)
from dataclasses import dataclass

@dataclass
class OrderPlaced:
    order_id: str
    customer_id: str
    items: list

@dataclass
class OrderShipped:
    order_id: str

class OrderAggregate:
    def __init__(self, order_id):
        self.order_id = order_id
        self.items = []
        self.status = None
        self.sequence = 0          # number of events applied so far
        self._pending_events = []  # new events not yet persisted

    def place(self, customer_id, items):
        if self.status is not None:
            raise ValueError("Order already placed")
        self._apply(OrderPlaced(self.order_id, customer_id, items))

    def _apply(self, event):
        # Update in-memory state
        if isinstance(event, OrderPlaced):
            self.status = 'placed'
            self.items = event.items
        elif isinstance(event, OrderShipped):
            self.status = 'shipped'
        self.sequence += 1
        self._pending_events.append(event)

    @classmethod
    def load(cls, aggregate_id, events):
        order = cls(aggregate_id)
        for event in events:
            order._apply(event)
        order._pending_events.clear()  # loaded events are not new
        return order

# Repository:
def save(order, event_store):
    # Pending events occupy the last len(_pending_events) sequence slots.
    first_new = order.sequence - len(order._pending_events)
    for i, event in enumerate(order._pending_events):
        event_store.append(
            aggregate_id=order.order_id,
            sequence_number=first_new + i,
            event=event,
        )
    order._pending_events.clear()  # persisted; no longer pending
Projections (Read Models)
A projection consumes the event stream and builds a read-optimized view. A customer order list projection maintains a denormalized table: (customer_id, order_id, status, item_count, total_amount). When OrderPlaced is received, insert a row. When OrderShipped is received, update the status column. This read model is queried by the “My Orders” page with a simple SELECT WHERE customer_id = :id — no joins, no aggregation.
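A projector for this view might look like the following sketch, with assumed event payloads and an in-memory dict standing in for the denormalized table:

```python
# Projection sketch: fold OrderPlaced / OrderShipped into a read-optimized
# row per order, keyed by order_id.
order_list = {}  # order_id -> denormalized row for the "My Orders" view

def project(event):
    kind, data = event["type"], event["data"]
    if kind == "OrderPlaced":
        order_list[data["order_id"]] = {
            "customer_id": data["customer_id"],
            "status": "placed",
            "item_count": len(data["items"]),
            "total_amount": sum(i["price"] for i in data["items"]),
        }
    elif kind == "OrderShipped":
        order_list[data["order_id"]]["status"] = "shipped"

def my_orders(customer_id):
    # The read query: no joins, no aggregation, just a filter.
    return [row for row in order_list.values()
            if row["customer_id"] == customer_id]
```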
Multiple projections can consume the same event stream for different purposes: one for the customer’s order list, one for the warehouse fulfillment queue, one for fraud analysis. Projections can be rebuilt from scratch by replaying all events — powerful for bug fixes and new features.
Snapshotting for Performance
Loading an aggregate by replaying 10,000 events on every command is slow. Snapshots periodically persist the current aggregate state. After applying every 100 events (configurable), serialize the aggregate state and store it as a snapshot. On load: fetch the latest snapshot, then replay only events after the snapshot’s sequence number.
CREATE TABLE snapshots (
    aggregate_id UUID NOT NULL,
    sequence_number INT NOT NULL,
    snapshot_data JSONB NOT NULL,
    created_at TIMESTAMPTZ DEFAULT now(),
    PRIMARY KEY (aggregate_id, sequence_number)
);

-- Load: get the latest snapshot, then only the events after it
WITH latest_snap AS (
    SELECT sequence_number FROM snapshots
    WHERE aggregate_id = $1
    ORDER BY sequence_number DESC
    LIMIT 1
)
SELECT * FROM events
WHERE aggregate_id = $1
  -- COALESCE handles aggregates with no snapshot yet (replay from the start)
  AND sequence_number > COALESCE((SELECT sequence_number FROM latest_snap), 0)
ORDER BY sequence_number;
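The same load path in application code, as a sketch with assumed in-memory stores and a caller-supplied apply function:

```python
# Snapshot policy sketch: persist state every SNAPSHOT_EVERY events; on load,
# start from the latest snapshot and replay only the tail of the event stream.
SNAPSHOT_EVERY = 100

def maybe_snapshot(snapshots, aggregate_id, sequence, state):
    if sequence % SNAPSHOT_EVERY == 0:
        snapshots[aggregate_id] = {"sequence": sequence, "state": dict(state)}

def load_state(snapshots, events, aggregate_id, apply):
    # events: iterable of (sequence_number, event) for this aggregate, in order
    snap = snapshots.get(aggregate_id, {"sequence": 0, "state": {}})
    state, since = dict(snap["state"]), snap["sequence"]
    for seq, event in events:
        if seq > since:  # replay only events newer than the snapshot
            state = apply(state, event)
    return state
```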
Dealing With Eventual Consistency
The read model is updated asynchronously after the write. A user places an order; the command succeeds; the projection updates 50-200ms later. During this window, the user’s order list may not show the new order. Solutions:
- Optimistic UI: immediately show the new order in the UI using client-side state, before the projection confirms it. Once the projection catches up, the next poll returns the same data and the UI stays consistent.
- Read from write side: for the confirmation page, return the just-placed order data from the command response directly, bypassing the read model.
- Causality tokens: the command returns a sequence number (the event’s position). The read request includes this token; the query service waits until the projection has processed up to that sequence number before returning. Similar to GTID-based read consistency in databases.
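The causality-token approach can be sketched as follows (assumed interfaces; a real implementation would read the projector's persisted checkpoint rather than an in-memory counter):

```python
# Causality-token sketch: the query side blocks until its projection has
# processed at least the sequence number the command returned.
import time

class Projection:
    def __init__(self):
        self.applied_seq = 0  # highest event sequence folded into the read model

    def wait_for(self, token, timeout=1.0, poll=0.01):
        deadline = time.monotonic() + timeout
        while self.applied_seq < token:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"projection still behind token {token}")
            time.sleep(poll)
        return True
```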
When NOT to Use Event Sourcing
Event sourcing adds significant complexity: two models to maintain, projection rebuilds to manage, eventual consistency to handle. It is valuable when:
- The audit log of every state change is a hard requirement (financial transactions, medical records, regulatory compliance)
- You need to rebuild historical state at any point in time (“what was this order’s status on March 15?”)
- Multiple read models with different shapes are needed from the same data
Avoid event sourcing for: simple CRUD applications, teams new to distributed systems (learning curve is steep), or when the business domain has no meaningful event history. Many systems use event sourcing only for their core domain (orders, payments) while using traditional CRUD for peripheral features (user preferences, notification settings).
Key Interview Points
- Event store is append-only — events are immutable; current state is derived by replay
- UNIQUE(aggregate_id, sequence_number) provides optimistic concurrency without explicit locks
- Projections are read-optimized views built from the event stream; multiple projections can coexist
- Snapshot every N events to avoid replaying the full history on every load
- Eventual consistency between command and query sides requires UI or API strategies to handle
- Event sourcing is not always appropriate — complexity is high; apply where audit trail and temporal queries are business requirements
Frequently Asked Questions
What is the difference between event sourcing and event-driven architecture?
Event-driven architecture (EDA) is a communication pattern where services publish events to a broker (Kafka, RabbitMQ) and other services subscribe to react. Services still store current state in a traditional database — events are used for inter-service communication. Event sourcing is a storage pattern where the primary store for an entity's state is a sequence of domain events — the database contains the event log, not the current state rows. Current state is derived by replaying events. These concepts are complementary but distinct. A system can use event-driven architecture without event sourcing (microservices communicating via Kafka, each with their own relational database). A system can use event sourcing without a message broker (write events to the event store, poll for projector updates). Systems like financial trading platforms often use both: event sourcing for the order aggregate (immutable audit trail of all order state changes) combined with event-driven architecture (trade events published to Kafka for downstream settlement, reporting, and risk systems).
How do you handle schema evolution in an event store — changing event formats over time?
Events in an event store are immutable and may be replayed decades into the future. If the event schema changes (new fields added, fields renamed), replaying old events with new code breaks. Strategies: (1) Weak schema / additive changes only: only add new optional fields — never remove or rename fields. Old events are missing the new field; code handles this with defaults. (2) Upcasting: when loading old events, an upcast function transforms them to the latest schema before passing to the aggregate. The upcast chain is: v1 → v2 → v3. Old events are stored as-is but normalized on read. (3) Event versioning with discriminators: store an event_version field. Code switches on version to handle each format. (4) Copy-transform-publish: when a significant schema change is needed, read all old events, transform them to new format, and write to a new event stream. The old stream is kept as an archive. Version your event types explicitly (OrderPlaced_v1, OrderPlaced_v2) and maintain upcasters for each transition.
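An upcaster chain might be sketched like this, with hypothetical v1 → v3 changes to a CustomerRegistered event:

```python
# Upcaster chain sketch: old stored events are normalized to the latest
# schema on read, one version step at a time (v1 -> v2 -> v3).
def v1_to_v2(event):
    # Hypothetical change: v2 split "name" into first_name / last_name.
    first, _, last = event["data"]["name"].partition(" ")
    data = {**event["data"], "first_name": first, "last_name": last}
    data.pop("name")
    return {**event, "version": 2, "data": data}

def v2_to_v3(event):
    # Hypothetical change: v3 added an optional loyalty_tier with a default.
    return {**event, "version": 3,
            "data": {**event["data"],
                     "loyalty_tier": event["data"].get("loyalty_tier", "none")}}

UPCASTERS = {1: v1_to_v2, 2: v2_to_v3}

def upcast(event, latest=3):
    while event.get("version", 1) < latest:
        event = UPCASTERS[event.get("version", 1)](event)
    return event
```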
When should you NOT use event sourcing?
Event sourcing is powerful but adds substantial complexity that is only justified in specific scenarios. Avoid it when: (1) Simple CRUD suffices: if the application is a content management system, user profile editor, or simple form-based app with no meaningful history, event sourcing adds complexity with no benefit. (2) Team is inexperienced with the pattern: the learning curve is steep — projections, eventual consistency, snapshot management, and schema evolution all require careful design. Introducing it to a team unfamiliar with distributed systems is high-risk. (3) Reporting is the primary use case: if you mostly need analytical queries (not audit trails or temporal queries), a data warehouse with slowly changing dimensions is simpler. (4) Low event volume: event sourcing is most efficient with high write rates; for rarely-updated entities, a traditional database with an audit log table achieves the same audit goal with far less complexity. Use event sourcing when: hard regulatory audit requirements demand immutable history (financial transactions, medical records), when you need to reconstruct past state at any point in time, or when multiple bounded contexts need to react to state changes in the same aggregate.