What is the difference between at-least-once and exactly-once delivery?

At-least-once: a message is guaranteed to be delivered but may be delivered more than once. Consumer commits offset after processing. If consumer crashes after processing but before committing, it re-processes on restart. Make the consumer idempotent (use a message ID + processed-messages set to skip duplicates). Easier to implement and the standard for most systems. Exactly-once: a message is delivered and processed exactly once. Kafka achieves this with transactional producers (atomic publish + offset commit) and idempotent consumers. More complex and has a performance cost. Required for financial transactions, billing, inventory updates where duplicate processing causes real harm. Most event-driven systems accept at-least-once with idempotent consumers rather than paying the exactly-once overhead.

How does Kafka achieve high throughput?

Four design decisions: (1) Sequential disk I/O - messages are appended to a log file. Sequential writes on spinning disks are 100x faster than random writes; on SSDs the gap is smaller but still significant. (2) Zero-copy transfer - Kafka uses sendfile() (Linux) to transfer data from page cache to network socket without copying to user space. (3) Batching - producers batch multiple messages before sending; consumers fetch in large batches. Batching amortizes network round-trip overhead. (4) Page cache usage - the OS page cache acts as an L2 cache. Kafka does not maintain its own cache, instead relying on the OS to cache recently written data. Consumers reading recent messages (typical case) get served from page cache, not disk.

How do Kafka consumer groups enable parallel processing?

A consumer group subscribes to one or more topics. Each partition of a topic is assigned to exactly one consumer in the group at a time. This prevents duplicate processing within the group. If a consumer fails, its partitions are reassigned to remaining consumers (rebalancing). Maximum parallelism = number of partitions. If you have 6 partitions and 6 consumers in a group, each consumer handles 1 partition. Adding a 7th consumer gives no benefit - it sits idle. Multiple consumer groups can read the same topic independently, each getting all messages (fan-out). Use case: one consumer group for real-time processing, another for batch analytics, both reading the same event stream.

What is log compaction in Kafka and when do you use it?

Log compaction retains only the most recent value for each message key. For a topic with keys like user_id, compaction keeps only the latest update per user - older versions are garbage collected. The compacted topic acts like a key-value store changelog that can be replayed to reconstruct the current state. Use cases: database CDC (Change Data Capture), materialized view synchronization, event sourcing for configuration. Contrast with time-based retention (keep all messages for 7 days then delete): retention is for event streams where every event matters; compaction is for state streams where only the current value matters. A tombstone (null value for a key) signals that the key was deleted and should be removed during compaction.

How does a dead letter queue (DLQ) work?

When a consumer fails to process a message after N retries (e.g., 3 attempts with exponential backoff), it routes the message to a separate DLQ topic instead of blocking the main consumer. The DLQ captures messages that cannot be processed due to: malformed payloads, transient downstream failures that exceed retry budget, business logic errors (e.g., referencing a non-existent entity). The main consumer continues processing without being blocked. A separate DLQ consumer (often human-monitored or lower-SLA) processes DLQ messages: investigate the root cause, fix the issue, and replay the message back to the main topic. DLQs are critical for any system where blocking on failed messages would cause cascading delays.

Message Queue System Low-Level Design

⏱ 10 min read

Message queues like Kafka, RabbitMQ, and SQS show up in both system design and low-level design interviews. Understanding the internals helps you answer follow-up questions about ordering, durability, and exactly-once delivery.

Why Message Queues

Message queues solve four problems:

Async decoupling: The order service does not wait for the email service to respond. It publishes an event and moves on. If the email service is down, the message waits in the queue.
Traffic spike absorption: Black Friday sends 100x normal order volume. The queue buffers the spike; downstream services process at their own pace without falling over.
Fan-out to multiple consumers: One order event triggers fulfillment, email notification, analytics update, and fraud check simultaneously – each consumer reads independently.
Durability: Messages are persisted to disk. If the consumer crashes mid-processing, the message is not lost.

Core Concepts

Producer: Any service that publishes messages. Producers send to a topic, not directly to a consumer. They do not know who consumes the message.
Topic: A named category or feed for messages. Think of it as a table in a database or a channel in Slack. Topics are logical; partitions are physical.
Partition: A topic is divided into N partitions. Each partition is an ordered, immutable sequence of records. Parallelism within a topic is bounded by partition count.
Consumer Group: A set of consumers that jointly consume a topic. Each partition is assigned to exactly one consumer in the group at a time – this is how Kafka achieves parallel consumption without duplicate processing within a group.
Offset: A monotonically increasing integer identifying a message’s position within a partition. Consumers track their offset to know where to resume after restart.

Partition Strategy

How you assign messages to partitions determines ordering guarantees and parallelism:

Key-based partitioning: hash(message_key) % num_partitions. All messages with the same key go to the same partition, guaranteeing order within that key. Use when order matters per entity – e.g., all events for user_id=123 must be processed in order.
Round-robin: Messages distributed evenly across partitions. Maximum parallelism, but no ordering guarantee across messages. Use for independent events like log lines or metrics.
Custom partition function: Producer-side logic to route to a specific partition. Use when you have business logic that determines co-location – e.g., VIP customers get their own partition for priority processing.

Design rule: if a consumer needs to see all events for a given entity in order, make sure all those events share the same partition key.

Storage Layer

Kafka’s storage design is why it is so fast:

Append-only log: Each partition is stored as a sequence of segment files on disk. New messages are always appended to the active segment – no random writes, no indexing overhead. Sequential disk writes saturate disk bandwidth (500+ MB/s on spinning disk, far above random write throughput).
Segment rotation: When the active segment reaches a size limit (default 1GB) or time limit (default 7 days), it is closed and a new segment opens. Closed segments are immutable.
Sparse index: Each segment has an accompanying .index file mapping offsets to byte positions. Kafka does not index every message – it uses binary search within a segment and then scans forward. This keeps index size small.
Zero-copy transfers: When a consumer reads, Kafka uses sendfile() to copy data from disk directly to the network socket – bypassing user space entirely. This is why Kafka can serve millions of reads/sec without saturating CPU.

Consumer Groups

Consumer groups are how Kafka enables parallel consumption:

Topic: orders (4 partitions)
         P0  P1  P2  P3
Group A: C1  C2  C1  C2   (2 consumers, each handles 2 partitions)
Group B: C3  C3  C4  C4   (independent read of same topic)

Rules:

Each partition is assigned to at most one consumer per group at a time.
If you have more consumers than partitions, some consumers sit idle. Adding partitions later allows more parallelism.
Different consumer groups are completely independent – they each maintain their own offsets and can be at different positions in the log.
Offsets are committed to a special internal topic called __consumer_offsets. This makes offset tracking durable and removes the need for ZooKeeper in newer Kafka versions.

Rebalancing happens when a consumer joins or leaves a group. During rebalance, partition assignments are redistributed and consumption pauses briefly. Sticky assignor and cooperative rebalancing reduce this disruption.

At-Least-Once vs Exactly-Once Delivery

At-most-once: Commit offset before processing. If consumer crashes after commit but before processing, message is lost. Acceptable for metrics and analytics where some data loss is tolerable.

At-least-once: Commit offset after processing. If consumer crashes after processing but before committing, it re-processes the message on restart. This is the default and most common mode. Your consumer must be idempotent – processing the same message twice should be safe.

Exactly-once: The hard problem. Kafka achieves this via:

Idempotent producer: Each message gets a sequence number. The broker deduplicates retries within a session. Enabled with enable.idempotence=true.
Transactional producer: Atomic write spanning multiple partitions. Either all messages in the transaction are visible, or none are. Enables read-process-write patterns where you atomically consume from one topic and produce to another.
Idempotent consumer: Even with transactional producers, consumers should track processed message IDs (e.g., in a database) to handle edge cases.

In practice: use at-least-once with idempotent consumers for 90% of use cases. Reserve exactly-once for financial transactions and audit logs where duplicates cause real harm.

Dead Letter Queue

Some messages cannot be processed – malformed data, upstream dependency unavailable, or a bug in consumer code. Without a DLQ, failed messages block processing or get silently dropped.

DLQ pattern:

Consumer tries to process message. On failure, increments retry count (stored in message header or a Redis key).
After N retries (typically 3-5), consumer publishes the message to a dead-letter topic: orders-dlq.
Consumer commits the original offset and continues processing.
An ops team or separate process monitors the DLQ, inspects failures, fixes the bug or data issue, and republishes messages to the original topic.

Important: include the original topic name, partition, offset, timestamp, and error message in the DLQ record. Without this context, debugging is difficult.

Replication

Kafka replicates each partition across multiple brokers for fault tolerance:

Leader: One replica is elected leader. All produce and consume requests go to the leader. The leader tracks which followers are in-sync.
In-Sync Replicas (ISR): The set of replicas that are fully caught up with the leader. A follower is removed from ISR if it falls behind by more than replica.lag.time.max.ms (default 10 seconds).
Acknowledgment modes (acks):
- acks=0: Fire and forget. Producer does not wait for broker acknowledgment. Fastest, but no durability guarantee.
- acks=1: Leader writes to disk, acknowledges. Follower lag means data loss if leader fails before replication.
- acks=all (acks=-1): Leader waits for all ISR to acknowledge. Strongest durability. Use min.insync.replicas=2 to require at least 2 replicas in ISR.
Leader election: When the leader fails, the controller (another broker) elects a new leader from the ISR. Unclean leader election (electing an out-of-sync replica) risks data loss and is disabled by default.

Push vs Pull

Pull (Kafka, Kinesis): Consumer requests messages from broker at its own pace. Advantages: consumer controls batch size and fetch rate; no backpressure needed; consumer can replay from any offset. Disadvantage: consumer must poll even when no messages are available (mitigated by long polling with fetch.max.wait.ms).
Push (SQS, RabbitMQ): Broker pushes messages to consumer. Advantages: lower latency for single-message processing; simpler consumer code. Disadvantages: broker must track per-consumer rate limits; slow consumers can be overwhelmed.

SQS uses a visibility timeout model: consumer receives message, message becomes invisible to other consumers for N seconds. Consumer must delete the message before timeout expires, or it becomes visible again for retry. This enables at-least-once delivery without explicit consumer groups.

Retention and Compaction

How long do you keep messages? Kafka supports two policies:

Time/size-based retention (log.retention.hours, log.retention.bytes): Delete segments older than N hours or when the topic exceeds M bytes. Use for event streams where historical data beyond a window is not needed – application logs, metrics, clickstreams.

Log compaction (log.cleanup.policy=compact): For each unique key, retain only the most recent message. Older messages for the same key are garbage collected. The compacted log is like a snapshot of the latest state for every key. Use cases:

Database change data capture (CDC) – latest row state per primary key
User profile updates – latest profile version per user ID
Configuration changes – latest config value per config key

Log compaction allows a new consumer to bootstrap current state by reading from offset 0 without needing infinite retention of all historical events.

Scale Numbers

Kafka’s published benchmarks (LinkedIn, 2014 era hardware):

Producer throughput: 786,980 messages/second for a single producer (200MB/s)
Consumer throughput: 940,521 messages/second (over 400MB/s, benefits from zero-copy)
LinkedIn production (2023): 7 trillion messages per day, 2.8 million messages per second peak
Typical latency: sub-10ms end-to-end at p99 for acks=1, sub-100ms for acks=all with replication

These numbers come from sequential I/O, batching (producers buffer messages before sending), and zero-copy network transfers. The architecture is designed around disk, not around RAM – Kafka intentionally writes everything to disk and relies on the OS page cache for read performance.

LinkedIn created Kafka and deeply tests message queue design. See system design questions for LinkedIn interview: Kafka and message queue system design.

Netflix uses Kafka for event streaming at massive scale. See system design patterns for Netflix interview: message queue and event streaming design.

Datadog ingests metrics via message queues. See system design patterns for Datadog interview: event pipeline and message queue design.