System Design: Message Queues — RabbitMQ vs SQS vs Kafka, Dead Letter Queues, Exactly-Once, Ordering

Message queues are the backbone of asynchronous communication in distributed systems. Choosing the right message queue and understanding its guarantees (ordering, delivery, durability) is critical for system design. This guide compares RabbitMQ, Amazon SQS, and Apache Kafka, and covers production patterns like dead letter queues, exactly-once processing, and idempotent consumers — essential knowledge for system design interviews.

RabbitMQ: Traditional Message Broker

RabbitMQ is an AMQP-based message broker designed for traditional message queuing.
Architecture: producers send messages to exchanges, which route messages to queues based on routing rules (direct, topic, fanout, headers). Consumers subscribe to queues and receive messages.
Delivery model: push-based — RabbitMQ pushes messages to consumers.
Consumer acknowledgment: after processing a message, the consumer sends an ACK. If no ACK is received (e.g., the consumer crashes), RabbitMQ redelivers the message to another consumer.
Ordering: messages within a single queue are delivered in FIFO order to a single consumer. With multiple consumers on one queue, ordering is not guaranteed (messages are distributed round-robin).
Durability: messages can be persisted to disk (durable queues plus persistent messages).
Performance: RabbitMQ handles roughly 10,000-50,000 messages per second per node; clustering adds capacity horizontally.
Use cases: task queues (background job processing), RPC (request-reply pattern), and complex routing workflows (exchange routing rules).
Not ideal for: event streaming (messages are deleted after consumption) and high-throughput event processing (Kafka's append-only log is faster).
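To make the exchange routing model concrete, here is a minimal sketch of topic-exchange binding-key matching in pure Python (not against a live broker): `*` matches exactly one dot-separated word, `#` matches zero or more. The function name and examples are illustrative, not part of any RabbitMQ API.

```python
def topic_match(binding_key: str, routing_key: str) -> bool:
    """Match an AMQP topic binding key against a message routing key.
    '*' matches exactly one word; '#' matches zero or more words."""
    def match(pattern, words):
        if not pattern:
            return not words
        head, rest = pattern[0], pattern[1:]
        if head == "#":
            # '#' may consume zero or more of the remaining words
            return any(match(rest, words[i:]) for i in range(len(words) + 1))
        if words and (head == "*" or head == words[0]):
            return match(rest, words[1:])
        return False
    return match(binding_key.split("."), routing_key.split("."))

# A direct exchange is the special case of an exact key match;
# a fanout exchange ignores the routing key entirely.
print(topic_match("orders.*.created", "orders.eu.created"))   # True
print(topic_match("orders.#", "orders.eu.created.v2"))        # True
print(topic_match("orders.*", "orders.eu.created"))           # False
```

A queue bound with `orders.#` receives every order event regardless of sub-topic, which is how fan-in routing patterns are built on a topic exchange.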

Amazon SQS: Managed Queue Service

Amazon SQS is a fully managed message queue that requires no infrastructure management. Two queue types:
(1) Standard queue — nearly unlimited throughput, at-least-once delivery (messages may be delivered more than once), best-effort ordering (messages may arrive out of order).
(2) FIFO queue — exactly-once processing (deduplication within a 5-minute window), strict ordering within a message group (messages with the same MessageGroupId are ordered), throughput limited to 300 messages per second without batching, or 3,000 with batching.
Key features: automatic scaling (SQS scales transparently with message volume), message visibility timeout (after a consumer receives a message, it is invisible to other consumers for a configurable period; if not deleted in time, it becomes visible again for reprocessing), dead letter queue (messages that fail processing N times are moved to a DLQ for investigation), and long polling (the consumer waits up to 20 seconds for a message, reducing empty responses and API costs).
Use cases: decoupling microservices (fire-and-forget), task queues (Lambda triggered by SQS), and any workload where managed infrastructure is preferred over operating RabbitMQ or Kafka clusters. SQS is the default choice on AWS when you need a simple queue without event streaming.
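The visibility-timeout mechanics can be sketched with a small in-memory model. `MiniQueue` is a made-up illustration of the semantics, not the AWS SDK: a received message is hidden for the timeout window, and an undeleted message reappears, which is exactly why Standard queues are at-least-once.

```python
import time
import itertools

class MiniQueue:
    """In-memory sketch of SQS-style receive semantics: a received message
    becomes invisible for `visibility_timeout` seconds; if it is not
    deleted before the timeout expires, it becomes receivable again."""
    def __init__(self, visibility_timeout=30.0):
        self.visibility_timeout = visibility_timeout
        self._messages = {}            # id -> [body, visible_at, receive_count]
        self._ids = itertools.count()

    def send(self, body):
        msg_id = next(self._ids)
        self._messages[msg_id] = [body, 0.0, 0]
        return msg_id

    def receive(self, now=None):
        now = time.monotonic() if now is None else now
        for msg_id, rec in self._messages.items():
            if rec[1] <= now:                          # message is visible
                rec[1] = now + self.visibility_timeout  # hide it again
                rec[2] += 1
                return msg_id, rec[0], rec[2]
        return None  # no visible messages

    def delete(self, msg_id):
        self._messages.pop(msg_id, None)

q = MiniQueue(visibility_timeout=30.0)
q.send("resize-image-42")
msg_id, body, count = q.receive(now=0.0)   # first delivery
assert q.receive(now=10.0) is None         # invisible during the timeout
msg_id, body, count = q.receive(now=31.0)  # timeout expired: redelivered
print(count)  # 2: at-least-once delivery in action
q.delete(msg_id)                           # processing done, remove for good
```

The consumer must call delete within the visibility window; a consumer that crashes mid-processing simply lets the timeout expire, and another worker picks the message up.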

Apache Kafka: Event Streaming Platform

Kafka is fundamentally different from RabbitMQ and SQS: it is a distributed, durable, append-only log. Key differences:
(1) Messages are not deleted after consumption — they are retained for a configurable period (7 days by default) or indefinitely. Multiple consumer groups can read the same topic independently, each at its own offset.
(2) Pull-based consumption — consumers pull messages at their own pace, enabling backpressure (a slow consumer does not affect others).
(3) Partitioned topics — a topic is split into partitions, each an ordered log. Producers assign messages to partitions by key (hash) or round-robin. Each partition is consumed by exactly one consumer per consumer group, which enables parallel consumption: 12 partitions with 4 consumers = 3 partitions per consumer.
(4) Throughput — Kafka handles millions of messages per second on a modest cluster; the append-only, sequential I/O design uses disk bandwidth efficiently.
Use cases: event streaming (event-driven architecture), log aggregation, change data capture (CDC), and real-time data pipelines. Kafka is the default for event-driven systems where message replay, multiple consumers, and high throughput are required.
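The partitioning arithmetic can be sketched in a few lines. Kafka's default partitioner actually hashes keys with murmur2; `md5` stands in here purely for illustration, and `assign` is a simplified round-robin-style assignment, not Kafka's real rebalance protocol.

```python
import hashlib

NUM_PARTITIONS = 12

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Hash the message key to a partition, so all messages with the
    same key land on the same partition and therefore stay ordered.
    (Kafka's default partitioner uses murmur2; md5 stands in here.)"""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Same key -> same partition, every time: per-key ordering is preserved.
assert partition_for("user-123") == partition_for("user-123")

def assign(partitions: int, consumers: list) -> dict:
    """Spread partitions across the consumers of one group:
    each partition goes to exactly one consumer."""
    return {c: [p for p in range(partitions) if p % len(consumers) == i]
            for i, c in enumerate(consumers)}

assignment = assign(12, ["c0", "c1", "c2", "c3"])
print({c: len(ps) for c, ps in assignment.items()})
# {'c0': 3, 'c1': 3, 'c2': 3, 'c3': 3}: 12 partitions / 4 consumers = 3 each
```

Adding a fifth consumer would trigger a rebalance and shrink each consumer's share; adding a thirteenth would leave one consumer idle, which is why partition count caps consumer-group parallelism.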

Dead Letter Queues and Error Handling

A dead letter queue (DLQ) captures messages that cannot be processed after multiple attempts. Without a DLQ, a poison message (one that always fails processing) blocks the queue: the consumer receives it, fails, the message is redelivered, the consumer fails again — an infinite loop.
DLQ configuration: after N failed processing attempts (maxReceiveCount in SQS; in RabbitMQ, a delivery limit on quorum queues or application logic reading the x-death header), the message is moved to the DLQ, a separate queue where failed messages are stored for inspection.
Operational workflow: (1) Monitor the DLQ — alert when messages appear. (2) Inspect failed messages — examine the message content and the processing error. (3) Fix the consumer bug or data issue. (4) Replay messages from the DLQ back to the original queue for reprocessing.
SQS: configure a redrive policy with maxReceiveCount and deadLetterTargetArn. RabbitMQ: use the x-dead-letter-exchange and x-dead-letter-routing-key queue arguments. Kafka has no built-in DLQ support; implement it at the application level: when a consumer fails to process a message after retries, publish it to a dead-letter topic.
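The application-level pattern for Kafka can be sketched as follows. This is a hypothetical helper, not a library API: `process` is the business handler, and `publish_dlq` stands in for a producer writing to a dead-letter topic.

```python
def consume_with_dlq(message, process, publish_dlq, max_attempts=3):
    """Application-level dead-lettering for a log-based system like Kafka:
    retry processing up to max_attempts, then route the poison message
    to a dead-letter topic instead of blocking the partition."""
    for attempt in range(1, max_attempts + 1):
        try:
            process(message)
            return "processed"
        except Exception as exc:
            last_error = exc  # remember why the last attempt failed
    # All attempts exhausted: capture payload + error context for inspection.
    publish_dlq({"payload": message,
                 "error": str(last_error),
                 "attempts": max_attempts})
    return "dead-lettered"

dead_letters = []  # stands in for a producer to the dead-letter topic

def always_fails(msg):
    raise ValueError("cannot parse payload")

status = consume_with_dlq("corrupt-event", always_fails, dead_letters.append)
print(status, len(dead_letters))  # dead-lettered 1
```

Recording the error message and attempt count alongside the payload makes step (2) of the operational workflow (inspecting failed messages) much easier.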

Exactly-Once Processing

Message delivery guarantees:
(1) At-most-once — the message is delivered zero or one times. No retries. If the consumer crashes before processing, the message is lost.
(2) At-least-once — the message is delivered one or more times. The broker retries delivery until acknowledgment. If the consumer processes a message but crashes before ACKing, the message is redelivered — potentially processing it twice.
(3) Exactly-once — the message is processed exactly one time. This is the hardest guarantee to achieve in distributed systems: true exactly-once delivery is impossible in the general case (the Two Generals Problem). Practical exactly-once processing combines at-least-once delivery with idempotent consumers.
Idempotent consumer pattern: assign each message a unique ID (message_id or idempotency_key). Before processing, check whether this ID exists in a processed_messages table. If it does, skip the message (already processed). If not, process it and insert the ID in the same transaction as the business operation. This ensures that redelivered messages are safely ignored.
Kafka 0.11+ supports exactly-once semantics within Kafka (producer idempotency + transactional consumers), but end-to-end exactly-once (Kafka to an external database) still requires idempotent consumers.
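The idempotent consumer pattern above can be sketched with SQLite, where the dedup insert and the business update commit in one transaction. Table and message names are illustrative.

```python
import sqlite3

# One connection doubles as the "business" DB and the dedup store,
# so the idempotency check commits atomically with the side effect.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processed_messages (message_id TEXT PRIMARY KEY)")
db.execute("CREATE TABLE account_balance (amount INTEGER)")
db.execute("INSERT INTO account_balance VALUES (0)")

def handle(message_id: str, amount: int) -> bool:
    """Process an at-least-once-delivered payment message; returns False
    if this message_id was already processed (duplicate delivery)."""
    try:
        with db:  # single transaction: dedup insert + business update
            db.execute("INSERT INTO processed_messages VALUES (?)",
                       (message_id,))
            db.execute("UPDATE account_balance SET amount = amount + ?",
                       (amount,))
        return True
    except sqlite3.IntegrityError:
        return False  # PRIMARY KEY violation: already processed, skip

assert handle("msg-1", 100) is True    # first delivery: applied
assert handle("msg-1", 100) is False   # redelivery: safely ignored
balance = db.execute("SELECT amount FROM account_balance").fetchone()[0]
print(balance)  # 100, not 200
```

Because the ID insert and the balance update share one transaction, a crash between them rolls both back, and the redelivered message is processed cleanly from scratch.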

Choosing the Right Message Queue

Decision framework:
(1) Simple task queue with managed infrastructure — Amazon SQS (Standard for most cases, FIFO when ordering matters). No operational overhead.
(2) Complex routing requirements (topic-based routing, fanout, RPC) — RabbitMQ. Its flexible exchange-queue routing model handles complex message patterns.
(3) Event streaming, multiple consumers, event replay — Apache Kafka. The append-only log model supports event sourcing, CDC, and high-throughput streaming.
(4) Real-time messaging with low latency — Redis Streams or NATS. Lighter weight than Kafka for simpler streaming use cases.
(5) Multi-cloud or on-premise deployment with Kafka compatibility — Apache Pulsar. It combines Kafka-like streaming with traditional queue features and multi-tenancy.
In system design interviews, name the queue choice and justify it: “I chose Kafka because multiple services need to independently consume the same events, and we need event replay for rebuilding read models.” This shows deeper understanding than “I chose Kafka because it is popular.”
