Message Queues: Kafka, RabbitMQ, and Async Architecture

Message queues decouple the components of a distributed system so that producers and consumers can operate independently. They appear in system design interviews for almost any large-scale system: notification services, order processing pipelines, event-driven microservices, and data ingestion systems all rely on them.

The interview question usually looks like: “How would you design a system that processes 100,000 orders per second without losing any?” or “How do you send email notifications without slowing down the checkout flow?”

Strategy

Frame the problem before reaching for a specific tool. Message queues solve three fundamental problems:

  1. Decoupling: The service that creates an event doesn’t need to know who processes it.
  2. Load leveling: A queue absorbs traffic spikes so downstream services process at their own pace.
  3. Durability: If a consumer is down, messages wait in the queue instead of being lost.

Core Concepts

Point-to-Point (Queue)

One producer, one consumer. A message is delivered to exactly one consumer and then deleted. Classic job queue: a web server enqueues “send welcome email” tasks, a pool of workers each picks up one task.

# Producer
queue.send("send_email", {"to": "user@example.com", "template": "welcome"})

# Consumer (one of many workers)
task = queue.receive()  # only one worker gets this message
process(task)
queue.delete(task)      # acknowledge: remove from queue
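This competing-consumers pattern can be sketched with Python's standard-library queue (the names here are illustrative, not any real broker's API):

```python
# Sketch of point-to-point delivery: one shared queue, a pool of workers,
# each message handled by exactly one worker.
import queue
import threading

task_queue = queue.Queue()
processed = []                        # (worker_id, task) pairs
lock = threading.Lock()

def worker(worker_id):
    while True:
        task = task_queue.get()       # blocks until a message is available
        if task is None:              # sentinel: shut down
            task_queue.task_done()
            return
        with lock:
            processed.append((worker_id, task))
        task_queue.task_done()        # acknowledge: remove from queue

# Producer enqueues tasks; three workers compete for them.
for i in range(6):
    task_queue.put({"task": "send_email", "id": i})

workers = [threading.Thread(target=worker, args=(w,)) for w in range(3)]
for t in workers:
    t.start()
task_queue.join()                     # wait until every task is acknowledged
for _ in workers:
    task_queue.put(None)              # one shutdown sentinel per worker
for t in workers:
    t.join()

# Every task was delivered to exactly one worker, none twice, none lost.
assert sorted(task["id"] for _, task in processed) == list(range(6))
```

Real brokers add durability and network delivery on top, but the contract is the same: receive, process, acknowledge.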

Publish-Subscribe (Topic)

One producer, many consumers. A message published to a topic is delivered to all subscribers independently. An “order placed” event might be consumed simultaneously by the inventory service, the notification service, and the analytics service.

# Publisher
topic.publish("order_placed", {"order_id": 123, "user_id": 456})

# Subscribers (all receive the same message independently)
# Inventory service, notification service, analytics service
# each consume their own copy
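A minimal in-memory sketch of the fan-out (the Topic class is illustrative, not a real broker API):

```python
# Pub-sub fan-out: every subscriber receives its own independent copy
# of each published message.
class Topic:
    def __init__(self):
        self.subscribers = {}   # subscriber name -> list of received messages

    def subscribe(self, name):
        self.subscribers[name] = []

    def publish(self, message):
        for inbox in self.subscribers.values():
            inbox.append(dict(message))   # independent copy per subscriber

orders = Topic()
for service in ("inventory", "notifications", "analytics"):
    orders.subscribe(service)

orders.publish({"order_id": 123, "user_id": 456})

# All three services got the event; none of them blocked the others.
assert all(len(inbox) == 1 for inbox in orders.subscribers.values())
```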

Kafka vs. RabbitMQ

The two most common names in interviews. Know the difference:

| | Apache Kafka | RabbitMQ |
|---|---|---|
| Model | Distributed log / pub-sub | Message broker (queue + pub-sub) |
| Message retention | Configurable (days/forever) — consumers read by offset | Deleted after acknowledgment |
| Consumer model | Consumer pulls at its own pace | Broker pushes to consumers |
| Throughput | Very high (millions/sec) | Moderate (tens of thousands/sec) |
| Ordering | Guaranteed within a partition | Not guaranteed across queues |
| Replay | Yes — rewind to any offset | No — once consumed, gone |
| Routing | Topic/partition-based | Flexible exchange routing (direct, topic, fanout, headers) |
| Best for | Event streaming, audit logs, real-time pipelines | Task queues, RPC, complex routing |

Choose Kafka when: you need high throughput, event replay (multiple consumers processing the same events independently), audit logs, or event sourcing. Kafka is a durable log, not just a queue.

Choose RabbitMQ when: you need complex routing (route messages by content, headers, or patterns), lower latency for short-lived tasks, or classic job queue semantics where messages are consumed once and discarded.
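The replay difference is worth internalizing: Kafka's model is an append-only log that consumers read by offset, so rewinding the offset replays history. A sketch with illustrative classes (not the real Kafka client API):

```python
# A durable log vs. a queue: reads never delete records, and each consumer
# tracks its own position (offset), so it can rewind at any time.
class Log:
    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)

    def read(self, offset):
        return self.records[offset:]   # reading does NOT remove records

class Consumer:
    def __init__(self, log):
        self.log, self.offset = log, 0

    def poll(self):
        batch = self.log.read(self.offset)
        self.offset += len(batch)
        return batch

log = Log()
for i in range(5):
    log.append({"event": i})

c = Consumer(log)
first = c.poll()          # reads events 0..4
c.offset = 0              # rewind: replay from the beginning
replayed = c.poll()       # the same five events again

assert first == replayed  # a classic queue can't do this: consumed means gone
```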

Delivery Guarantees

Every queue system makes a promise about delivery. Know all three:

At-most-once: Messages are delivered zero or one time. Fire and forget — fast but lossy. Acceptable for non-critical metrics (click tracking, analytics where occasional loss is fine).

At-least-once: Messages are delivered one or more times. The consumer may receive duplicates if it crashes after processing but before acknowledging. The producer retries on timeout. This is the default for Kafka and most production systems.

Exactly-once: Every message is processed exactly one time, even with retries and failures. Very hard to achieve. Kafka supports it with idempotent producers + transactions since version 0.11. Usually achieved in practice by making consumers idempotent (safe to process the same message twice) rather than engineering true exactly-once delivery.

Interview tip: When an interviewer asks how you ensure exactly-once processing, the right answer is usually “idempotent consumers” — design the consumer so processing the same message twice has the same effect as processing it once. Deduplicate using a message ID stored in Redis or a DB unique constraint.

def process_order(message):
    order_id = message["order_id"]

    # Idempotency check: skip if this message was already handled
    if db.exists("processed_orders", order_id):
        return  # already handled, skip

    # Process the order and record its ID in one transaction; a unique
    # constraint on processed_orders.id guards against the race where two
    # workers both pass the exists() check before either commits
    with db.transaction():
        db.insert("orders", message)
        db.insert("processed_orders", {"id": order_id, "processed_at": now()})

Dead Letter Queues (DLQ)

A DLQ receives messages that couldn’t be processed successfully after N retries. Instead of losing the message or blocking the queue, failed messages move to the DLQ for inspection and replay.

Queue:    [msg1] [msg2] [msg3_FAILING] [msg4]
                          ↓ after 3 retries
DLQ:              [msg3_FAILING]  ← alert, inspect, fix, replay

Always mention DLQs when designing pipelines that can’t afford data loss. Production Kafka, SQS, RabbitMQ, and Azure Service Bus all support them.
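A sketch of the retry-then-DLQ flow (the `process` callback and retry count are illustrative, not a specific broker's API):

```python
# Retry each message up to MAX_RETRIES; park permanent failures in a DLQ
# instead of losing them or blocking the rest of the queue.
MAX_RETRIES = 3

def consume(messages, process):
    dead_letter_queue = []
    for msg in messages:
        for _attempt in range(MAX_RETRIES):
            try:
                process(msg)
                break
            except Exception:
                continue
        else:                               # all retries exhausted
            dead_letter_queue.append(msg)   # park for inspection and replay
    return dead_letter_queue

def process(msg):
    if msg["id"] == 3:
        raise ValueError("poison message")  # always fails

dlq = consume([{"id": i} for i in range(1, 5)], process)
assert dlq == [{"id": 3}]   # failing message parked; messages 1, 2, 4 succeeded
```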

Partitioning and Ordering (Kafka)

Kafka topics are divided into partitions. Messages within a partition are ordered. Messages across partitions are not.

Topic: "order_placed"
  Partition 0: [order#1, order#3, order#7]  ← user_id % 3 == 0
  Partition 1: [order#2, order#5]            ← user_id % 3 == 1
  Partition 2: [order#4, order#6]            ← user_id % 3 == 2

If ordering matters (all events for a given user must be processed in order), use a consistent partition key (e.g., user_id). All events for that user land on the same partition and are processed in order by one consumer in the consumer group.

This is how Kafka achieves both parallelism and per-entity ordering simultaneously.
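Key-based partitioning can be sketched like this (Kafka's default partitioner uses a murmur2 hash of the key; md5 here is just a stand-in that is stable across runs, unlike Python's built-in `hash()`):

```python
# Route every event to a partition derived from its key, so all events
# for one user land on the same partition and keep their relative order.
import hashlib

NUM_PARTITIONS = 3

def partition_for(key):
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

partitions = [[] for _ in range(NUM_PARTITIONS)]
events = [{"user_id": uid, "seq": seq} for uid in (1, 2, 3) for seq in range(3)]
for event in events:
    partitions[partition_for(event["user_id"])].append(event)

# Within each partition, every user's events remain in order.
for p in partitions:
    for uid in (1, 2, 3):
        seqs = [e["seq"] for e in p if e["user_id"] == uid]
        assert seqs == sorted(seqs)
```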

Common Interview Scenarios

Notification system: User action → publish “user_signup” event to Kafka → email service, push notification service, analytics service each subscribe and process independently. Decoupled, retryable, no synchronous coupling between checkout and email send.

Order processing: Order service → SQS queue → fulfillment workers. Workers pull tasks at their own rate. Queue buffers traffic spikes (flash sales). DLQ catches failed orders for manual review.

Event sourcing: Every state change is published as an immutable event to Kafka. Any service can replay the event log to reconstruct state or build a new read model. The queue is the source of truth.

Rate limiting requests to a downstream API: Instead of calling a rate-limited third-party API directly, enqueue calls and have a single-threaded worker drain the queue at the allowed rate.
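A rough sketch of that drain worker, assuming a hypothetical `call_api` and rate limit:

```python
# Drain a queue of pending requests no faster than the allowed rate,
# sleeping between calls to stay under the third-party API's limit.
import time
from collections import deque

def drain(requests, max_per_second, call_api):
    interval = 1.0 / max_per_second
    pending = deque(requests)
    results = []
    while pending:
        start = time.monotonic()
        results.append(call_api(pending.popleft()))
        elapsed = time.monotonic() - start
        if pending and elapsed < interval:
            time.sleep(interval - elapsed)   # pace: never exceed the allowed rate
    return results

# Stand-in for the rate-limited API call.
results = drain(range(5), max_per_second=100, call_api=lambda r: r * 2)
assert results == [0, 2, 4, 6, 8]
```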

Backpressure

When consumers can’t keep up with producers, the queue grows. Strategies:

  • Horizontal scaling: Add more consumer instances (Kafka: add to consumer group, each takes more partitions).
  • Drop low-priority messages: Set a max queue depth; drop or sample messages beyond it (acceptable for metrics, not for orders).
  • Slow the producer: If the queue depth exceeds a threshold, apply backpressure upstream (return 429 Too Many Requests, block the producer).
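The producer-slowing strategy can be sketched as a bounded queue that signals backpressure (BoundedQueue and the 429 mapping are illustrative):

```python
# A queue with a max depth: past that, enqueue() is rejected and the
# producer surfaces the rejection as a 429-style response upstream.
from collections import deque

class BoundedQueue:
    def __init__(self, max_depth):
        self.items = deque()
        self.max_depth = max_depth

    def enqueue(self, item):
        if len(self.items) >= self.max_depth:
            return False        # signal backpressure instead of growing unboundedly
        self.items.append(item)
        return True

q = BoundedQueue(max_depth=3)
responses = [200 if q.enqueue(i) else 429 for i in range(5)]

assert responses == [200, 200, 200, 429, 429]   # requests 4 and 5 were pushed back
```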

Summary

Message queues decouple producers from consumers, level load spikes, and provide durability for async workloads. Use Kafka for high-throughput event streaming and replay; use RabbitMQ for complex routing and classic task queues. At-least-once delivery with idempotent consumers is the practical standard — don’t promise exactly-once unless you explain what it costs. Always design a DLQ for messages that fail processing. In a system design interview, introducing a queue is often the move that turns a tightly coupled, fragile architecture into a scalable one.

Related System Design Topics

Message queues are part of a broader async architecture:

  • CAP Theorem — most queue deployments favor availability: producers should keep writing even when part of the broker cluster is unreachable. That trade-off is why at-least-once delivery, not exactly-once, is the practical standard.
  • Caching Strategies — write-behind caching uses an internal queue to async-flush cache writes to the database — the same pattern as a message queue.
  • Load Balancing — consumer groups in Kafka are a form of load balancing: partitions are distributed across consumers automatically.
  • Database Sharding — high-throughput event pipelines often write to sharded databases; the queue provides the buffer that makes this practical.

Also see: API Design (REST vs GraphQL vs gRPC) and SQL vs NoSQL — the remaining two system design foundations.

See also:

  • Design a Notification System — full Kafka fan-out architecture for push/email/SMS at scale
  • Design Search Autocomplete — async index update pipeline via message queue
  • Design a Web Crawler — message queues decouple the fetcher, parser, and storage stages
  • Design Dropbox / Google Drive — Kafka change feed fans out sync events to connected devices
  • Design a Payment System — Saga pattern for distributed payment transactions using message queues
  • Design a News Feed — Kafka fan-out for feed cache population
  • Design a Monitoring & Alerting System — Kafka buffering log pipelines from Fluent Bit to Elasticsearch
  • Design an Ad Click Aggregation System — Kafka as the ingest layer for high-throughput click events
  • Design a Recommendation Engine — Kafka streams interaction events to the feature store for real-time ranking signals
  • Design an LLM Inference API — request queue enabling continuous batching across concurrent inference requests
