What Is Event Sourcing?
In traditional systems, the database stores the current state of an entity. When an order changes from PENDING to SHIPPED, you overwrite the old value. Event Sourcing inverts this: the system stores the sequence of events that caused state changes, and the current state is derived by replaying those events. You never update or delete events — the log is append-only and immutable.
This means an Order aggregate is not a row with a status column. It is the result of applying OrderPlaced, PaymentReceived, ItemShipped in sequence. The event log is the source of truth.
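The replay idea fits in a few lines. A minimal Python sketch (event shapes and field names are illustrative, not taken from any particular library):

```python
# Deriving current state by folding events, never by overwriting a row.
# Event dicts and their fields are illustrative assumptions.

def apply(state: dict, event: dict) -> dict:
    """Fold one event into the current state."""
    etype = event["type"]
    if etype == "OrderPlaced":
        return {"status": "PENDING", "items": event["items"]}
    if etype == "PaymentReceived":
        return {**state, "status": "PAID"}
    if etype == "ItemShipped":
        return {**state, "status": "SHIPPED"}
    return state  # unknown events are ignored

def replay(events: list[dict]) -> dict:
    """Current state = fold of the full event sequence over a blank state."""
    state: dict = {}
    for event in events:
        state = apply(state, event)
    return state

log = [
    {"type": "OrderPlaced", "items": ["book"]},
    {"type": "PaymentReceived"},
    {"type": "ItemShipped"},
]
print(replay(log)["status"])  # SHIPPED
```

Because the log is append-only, `replay(log[:n])` gives the state after the first n events, which is exactly the time-travel property discussed later.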
Event Store Schema
The core persistence layer in an event-sourced system is an event store — a table (or log partition) designed for append-only writes and sequential reads. A minimal relational schema looks like this:
CREATE TABLE events (
event_id UUID PRIMARY KEY,
aggregate_id UUID NOT NULL,
aggregate_type VARCHAR(64) NOT NULL,
sequence_no BIGINT NOT NULL,
event_type VARCHAR(128) NOT NULL,
payload JSONB NOT NULL,
occurred_at TIMESTAMPTZ NOT NULL DEFAULT now(),
UNIQUE (aggregate_id, sequence_no)
);
sequence_no is a per-aggregate counter starting at 1. The UNIQUE constraint on (aggregate_id, sequence_no) is your optimistic concurrency guard: if two concurrent commands try to append event #7 for the same aggregate, one will fail with a unique-violation and must retry. payload stores the event data as JSON; event_type is the discriminator used during deserialization.
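The concurrency guard can be exercised directly. A hedged sketch with SQLite standing in for the relational store (the `append_event` helper and its signature are assumptions for illustration):

```python
# Demonstrating the unique-violation optimistic concurrency guard,
# with SQLite standing in for the production event store.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        event_id     TEXT PRIMARY KEY,
        aggregate_id TEXT NOT NULL,
        sequence_no  INTEGER NOT NULL,
        event_type   TEXT NOT NULL,
        payload      TEXT NOT NULL,
        UNIQUE (aggregate_id, sequence_no)
    )
""")

def append_event(event_id, aggregate_id, expected_seq, event_type, payload):
    """Append at expected_seq + 1; raises IntegrityError if another
    command already claimed that sequence number (lost race -> retry)."""
    conn.execute(
        "INSERT INTO events VALUES (?, ?, ?, ?, ?)",
        (event_id, aggregate_id, expected_seq + 1, event_type, payload),
    )

append_event("e1", "order-1", 0, "OrderPlaced", "{}")
try:
    # A concurrent command that also read sequence_no 0 loses the race:
    append_event("e2", "order-1", 0, "OrderCancelled", "{}")
except sqlite3.IntegrityError:
    print("conflict: reload aggregate and retry")
```

The losing command reloads the aggregate (now including event #1), revalidates its business rules against the fresh state, and retries the append.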
Aggregate Reconstruction
To load an aggregate, the repository reads all events for a given aggregate_id ordered by sequence_no, then applies each event to a blank aggregate object:
SELECT * FROM events
WHERE aggregate_id = $1
ORDER BY sequence_no ASC;
Each event type maps to an apply() method on the aggregate. The aggregate accumulates state transitions without ever touching the database write path. Reconstruction is purely functional: given the same event sequence, you always get the same state. That also makes unit testing trivial, since no mocks are needed; you just feed in events and assert on the resulting state.
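One way to structure the apply() dispatch, sketched in Python (the Order fields and the handler-naming convention are assumptions, not any specific framework's API):

```python
# Illustrative Order aggregate rebuilt from its event stream.
class Order:
    def __init__(self):
        self.status = None
        self.paid_amount = 0

    def apply(self, event: dict):
        """Dispatch to apply_<EventType>, ignoring unknown event types."""
        handler = getattr(self, f"apply_{event['type']}", None)
        if handler:
            handler(event)

    def apply_OrderPlaced(self, event):
        self.status = "PENDING"

    def apply_PaymentReceived(self, event):
        self.status = "PAID"
        self.paid_amount += event["amount"]

    def apply_ItemShipped(self, event):
        self.status = "SHIPPED"

def load(events: list[dict]) -> Order:
    """Rebuild a blank aggregate by applying events in sequence order."""
    order = Order()
    for event in sorted(events, key=lambda e: e["sequence_no"]):
        order.apply(event)
    return order

order = load([
    {"type": "PaymentReceived", "sequence_no": 2, "amount": 25},
    {"type": "OrderPlaced", "sequence_no": 1},
])
print(order.status)  # PAID
```

A unit test is just `load(events)` plus assertions on the fields, with no database or mock in sight.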
Snapshots
Full replay is fine for short-lived aggregates, but a bank account with 10 years of transactions would require replaying millions of events on every load. Snapshots solve this by periodically serializing the aggregate’s current state and storing it alongside the event stream.
A snapshot record contains the aggregate state at a given sequence_no. On load, the repository fetches the latest snapshot, deserializes the aggregate, then replays only events with sequence_no > snapshot.sequence_no. A common heuristic: snapshot every 50–200 events, or whenever the event count exceeds a threshold on load. Snapshots are an optimization — the system must remain correct with or without them.
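The snapshot-aware load path might look like this sketch, assuming hypothetical store accessors `latest_snapshot()` and `events_after()`:

```python
# Sketch of snapshot-aware loading. latest_snapshot and events_after are
# hypothetical store accessors, not a real library's API.

SNAPSHOT_EVERY = 100  # heuristic threshold, per the 50-200 range above

def load_with_snapshot(aggregate_id, latest_snapshot, events_after, apply):
    """Return (state, last_seq, need_snapshot). Falls back to full replay
    from sequence_no 0 when no snapshot exists, so correctness never
    depends on snapshots being present."""
    snap = latest_snapshot(aggregate_id)
    state, seq = snap if snap else ({}, 0)
    events = events_after(aggregate_id, seq)   # sequence_no > seq only
    for event in events:
        state = apply(state, event)
        seq = event["sequence_no"]
    need_snapshot = len(events) >= SNAPSHOT_EVERY
    return state, seq, need_snapshot

# Fake accessors for demonstration:
def latest_snapshot(agg_id):
    return ({"status": "PAID"}, 7)             # state as of sequence_no 7

def events_after(agg_id, seq):
    return [{"sequence_no": 8, "type": "ItemShipped"}]

def apply(state, event):
    if event["type"] == "ItemShipped":
        return {**state, "status": "SHIPPED"}
    return state

state, seq, need_snap = load_with_snapshot(
    "order-1", latest_snapshot, events_after, apply)
print(state["status"], seq)  # SHIPPED 8
```

Note that the replay loop is identical with or without a snapshot; the snapshot only changes the starting point.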
Event Schema Versioning and Upcasting
Events are immutable once written, but business requirements evolve. An OrderPlaced event from 2021 may lack a currency field added in 2023. The solution is upcasting: a chain of transformers that convert old event payloads to the current schema before the aggregate sees them.
Upcasters are versioned functions: OrderPlaced_v1 → OrderPlaced_v2 → OrderPlaced_v3. Each upcaster handles exactly one schema migration. The aggregate always receives the latest version. An alternative is weak schema (store everything as JSON and let the aggregate handle missing fields defensively), but explicit upcasting is safer for complex migrations. Store a schema_version field in the event payload or as a separate column.
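An upcaster chain can be as simple as a map from version number to a single-step migration function. A sketch, where the currency default follows the example above and the v2 to v3 rename is a hypothetical second migration:

```python
# Sketch of an upcaster chain: each function handles exactly one
# schema migration, and the chain runs until the current version.

def upcast_v1_to_v2(payload: dict) -> dict:
    # 2023 migration: older OrderPlaced events lack a currency field.
    return {**payload, "currency": "USD", "schema_version": 2}

def upcast_v2_to_v3(payload: dict) -> dict:
    # Hypothetical later migration: rename 'total' to 'amount'.
    p = dict(payload)
    p["amount"] = p.pop("total")
    p["schema_version"] = 3
    return p

UPCASTERS = {1: upcast_v1_to_v2, 2: upcast_v2_to_v3}
CURRENT_VERSION = 3

def upcast(payload: dict) -> dict:
    """Run migrations until the payload reaches the current schema.
    Events with no schema_version are treated as v1."""
    while payload.get("schema_version", 1) < CURRENT_VERSION:
        payload = UPCASTERS[payload.get("schema_version", 1)](payload)
    return payload

old = {"total": 40, "schema_version": 1}  # an event written in 2021
migrated = upcast(old)
```

The stored event is never rewritten; upcasting happens on the read path, so the aggregate only ever sees v3 payloads.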
CQRS: Separating Commands from Queries
Command Query Responsibility Segregation (CQRS) splits the application into two models. The write model handles commands (mutations): it loads the aggregate, validates business rules, and appends events. The read model handles queries: it reads from a separate, denormalized store optimized for the query pattern.
This separation is powerful because the write model optimizes for consistency (small aggregates, event validation) while the read model optimizes for performance (wide tables, materialized joins, full-text indexes). You can have multiple read models for the same data — one for the customer dashboard, another for the analytics pipeline — without polluting the write model.
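The write-model half of this split can be sketched as a command handler that loads, validates, then appends (the in-memory store and the `handle_ship_order` command are illustrative assumptions):

```python
# Minimal write-model sketch: a command handler loads the aggregate's
# events, validates a business rule, and appends a new event.
# The in-memory store stands in for the events table.

store: dict[str, list[dict]] = {}  # aggregate_id -> ordered events

def handle_ship_order(aggregate_id: str):
    events = store.get(aggregate_id, [])
    status = None
    for e in events:                      # rebuild just enough state
        if e["type"] == "OrderPlaced":
            status = "PENDING"
        elif e["type"] == "PaymentReceived":
            status = "PAID"
    if status != "PAID":                  # business rule validation
        raise ValueError("cannot ship an unpaid order")
    store.setdefault(aggregate_id, []).append(
        {"type": "ItemShipped", "sequence_no": len(events) + 1}
    )

store["order-1"] = [
    {"type": "OrderPlaced", "sequence_no": 1},
    {"type": "PaymentReceived", "sequence_no": 2},
]
handle_ship_order("order-1")
```

Note that the handler never serves queries; reading order history for a dashboard is the read model's job.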
Read Model Projections
Projections are the consumers that build read models from the event stream. An event handler subscribes to the event store (or a message bus), receives events in order, and upserts rows into a query-optimized table. For example, an OrderSummaryProjection might maintain a flat order_summaries table with columns for customer name, total amount, and status — a join that would be expensive to compute on every query from the normalized write store.
Projections are rebuildable: if you need a new read model or fix a projection bug, you reset the consumer offset to the beginning of the event stream and replay. This is a key operational advantage over traditional systems where migrations are destructive.
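A projection in this style can be sketched with a dict standing in for the query-optimized table (the event shapes and the OrderSummaryProjection-like logic are assumptions):

```python
# Sketch of a read-model projection: consume events in order and upsert
# denormalized rows. A dict stands in for the order_summaries table.

order_summaries: dict[str, dict] = {}  # order_id -> flat summary row

def project(event: dict):
    oid = event["order_id"]
    row = order_summaries.setdefault(oid, {"status": None, "total": 0})
    if event["type"] == "OrderPlaced":
        row["status"] = "PENDING"
        row["customer_name"] = event["customer_name"]
    elif event["type"] == "PaymentReceived":
        row["status"] = "PAID"
        row["total"] += event["amount"]
    elif event["type"] == "ItemShipped":
        row["status"] = "SHIPPED"

def rebuild(stream: list[dict]):
    """Rebuildable: wipe the read model and replay from offset 0."""
    order_summaries.clear()
    for event in stream:
        project(event)

rebuild([
    {"type": "OrderPlaced", "order_id": "o1", "customer_name": "Ada"},
    {"type": "PaymentReceived", "order_id": "o1", "amount": 30},
])
```

Fixing a projection bug is just editing `project()` and calling `rebuild()` with the full stream again; the write model is untouched.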
Eventual Consistency and Out-of-Order Events
The write and read sides are eventually consistent. After a command appends an event, the projection consumer processes it asynchronously — there is a lag, typically milliseconds to seconds. The UI must handle this: either show optimistic UI (assume the command succeeded), poll for confirmation, or use WebSockets to push the updated read model once the projection catches up.
Out-of-order delivery is a real concern when using distributed message brokers. Events from the same aggregate should be partitioned by aggregate_id to guarantee ordering within a partition. Cross-aggregate ordering is harder — projections may need to track sequence_no and buffer out-of-order events until gaps are filled, or use idempotent upserts that tolerate reprocessing.
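The buffer-until-gap-filled approach can be sketched per aggregate (the `receive` function and its bookkeeping structures are illustrative, not a broker API):

```python
# Sketch of per-aggregate gap handling: buffer events that arrive ahead
# of the expected sequence_no, apply them once the gap is filled, and
# skip duplicates idempotently.
from collections import defaultdict

applied = defaultdict(int)   # aggregate_id -> last applied sequence_no
pending = defaultdict(dict)  # aggregate_id -> {sequence_no: event}
log = []                     # (aggregate_id, sequence_no) applied, in order

def receive(event: dict):
    agg, seq = event["aggregate_id"], event["sequence_no"]
    if seq <= applied[agg]:
        return  # already applied: idempotent duplicate skip
    pending[agg][seq] = event
    # Drain the buffer while the next expected event is present:
    while applied[agg] + 1 in pending[agg]:
        nxt = pending[agg].pop(applied[agg] + 1)
        applied[agg] += 1
        log.append((agg, nxt["sequence_no"]))

receive({"aggregate_id": "o1", "sequence_no": 2})  # early: buffered
receive({"aggregate_id": "o1", "sequence_no": 1})  # fills the gap
print(log)  # [('o1', 1), ('o1', 2)]
```

Partitioning by aggregate_id makes this buffer rarely needed in practice, but it also makes redelivery after a consumer restart harmless.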
Event Sourcing with Kafka
Kafka is a natural fit as an event store backbone. Each aggregate type maps to a Kafka topic, partitioned by aggregate_id. Producers append events; consumers (projections, downstream services) read at their own pace. Kafka’s log compaction can be used on snapshot topics to retain only the latest snapshot per aggregate.
The challenge: Kafka does not support the optimistic concurrency check (UNIQUE (aggregate_id, sequence_no)) natively. A common pattern is to use a relational event store as the source of truth for writes and mirror events to Kafka via the transactional outbox pattern. Alternatively, Kafka Streams or ksqlDB can maintain aggregate state as materialized tables.
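The outbox pattern can be sketched with SQLite standing in for the relational event store and a callback standing in for the Kafka producer (table layout and function names are assumptions):

```python
# Sketch of the transactional outbox: the event append and the outbox row
# commit atomically; a relay later publishes unpublished rows to Kafka.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (aggregate_id TEXT, sequence_no INTEGER,
                         payload TEXT,
                         UNIQUE (aggregate_id, sequence_no));
    CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,
                         topic TEXT, payload TEXT,
                         published INTEGER DEFAULT 0);
""")

def append_with_outbox(aggregate_id, sequence_no, payload, topic):
    with conn:  # single transaction: both rows commit, or neither does
        conn.execute("INSERT INTO events VALUES (?, ?, ?)",
                     (aggregate_id, sequence_no, payload))
        conn.execute("INSERT INTO outbox (topic, payload) VALUES (?, ?)",
                     (topic, payload))

def relay(publish):
    """Poll unpublished outbox rows, publish, then mark them published."""
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0"
    ).fetchall()
    with conn:
        for row_id, topic, payload in rows:
            publish(topic, payload)  # e.g. a Kafka producer send
            conn.execute(
                "UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))

append_with_outbox("order-1", 1, '{"type": "OrderPlaced"}', "orders")
sent = []
relay(lambda topic, payload: sent.append((topic, payload)))
```

Because the relay marks rows published only after the send, a crash between send and mark causes redelivery, so downstream consumers should be idempotent.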
Axon Framework
The Axon Framework (Java) is a purpose-built CQRS/ES framework. It provides: aggregate annotations (@AggregateRoot, @CommandHandler, @EventSourcingHandler), an embedded or server-based event store (Axon Server), snapshot support, saga orchestration, and projection infrastructure. Axon handles sequence number management, snapshotting triggers, and event routing — letting developers focus on domain logic.
Axon Server acts as both event store and message router, supporting clustering and multi-context deployments. For teams not on the JVM, similar patterns are available in EventStoreDB (a purpose-built event store created by Greg Young, who coined the term CQRS, with official client libraries for Go, Python, and other languages) and Marten (.NET, built on PostgreSQL).
Pros, Cons, and When to Use Event Sourcing
Pros: Complete audit log with no extra effort. Time-travel debugging — replay the event stream to any point in time. Event-driven integration — other services subscribe to your events without polling. Decoupled read models. Supports temporal queries ("what was the order state on Tuesday?").
Cons: Storage grows without bound (events are never deleted). Query patterns are more complex — you can’t do ad-hoc SQL against the write store. Event schema migration requires upcasting infrastructure. Eventual consistency is unfamiliar to many teams. Debugging projection lag adds operational burden.
When to use it: Financial systems (every cent must be accounted for), compliance-heavy domains, systems requiring audit trails, or anywhere where the history of state changes is as valuable as the current state. Avoid it for simple CRUD domains where the overhead outweighs the benefits.