Low Level Design: Real-Time Messaging Service

What Is a Real-Time Messaging Service?

A real-time messaging service is the transport layer that moves data from one endpoint to another with millisecond-level latency. Unlike a generic chat service that also handles UI concerns like threads and reactions, the real-time messaging layer focuses purely on connection management, message routing, and delivery guarantees. It underpins chat apps, collaborative editing tools, live feeds, and multiplayer games.

Data Model / Schema

The messaging service maintains minimal state. Most persistent data lives in upstream services; the messaging layer tracks sessions and queued frames:

-- Active sessions (in-memory, e.g., Redis Hash)
sessions:{user_id} = {
  server_node  : STRING,   -- which Chat Server holds the socket
  connected_at : TIMESTAMP,
  last_ping    : TIMESTAMP
}

-- Outbound queue (per user, when socket is temporarily unavailable)
CREATE TABLE outbox (
  id          BIGINT PRIMARY KEY AUTO_INCREMENT,
  user_id     BIGINT NOT NULL,
  payload     JSON NOT NULL,
  created_at  TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  expires_at  TIMESTAMP,
  INDEX (user_id, id)
);

JSON payloads in the outbox store the full message envelope so delivery can be retried without querying upstream services again.
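The enqueue-and-replay pattern above can be sketched as follows. This is a minimal illustration using an in-memory SQLite database as a stand-in for the MySQL table (AUTOINCREMENT and JSON-as-TEXT are SQLite adaptations); the function names are illustrative, not part of any real API.

```python
import json
import sqlite3

# In-memory stand-in for the outbox table described above.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE outbox (
        id       INTEGER PRIMARY KEY AUTOINCREMENT,
        user_id  INTEGER NOT NULL,
        payload  TEXT NOT NULL
    )
""")

def enqueue(user_id: int, envelope: dict) -> int:
    """Store the full message envelope so delivery can be retried
    without querying upstream services again."""
    cur = conn.execute(
        "INSERT INTO outbox (user_id, payload) VALUES (?, ?)",
        (user_id, json.dumps(envelope)),
    )
    return cur.lastrowid

def replay_after(user_id: int, last_confirmed_id: int) -> list[dict]:
    """Fetch frames newer than the client's last confirmed id, in order."""
    rows = conn.execute(
        "SELECT payload FROM outbox WHERE user_id = ? AND id > ? ORDER BY id",
        (user_id, last_confirmed_id),
    )
    return [json.loads(r[0]) for r in rows]

first = enqueue(42, {"type": "msg", "text": "hello"})
enqueue(42, {"type": "msg", "text": "world"})
```

Because the outbox `id` is monotonically increasing per user, a reconnecting client only needs to present its last confirmed id to receive everything it missed.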

Core Algorithm / Workflow

The real-time path has two legs: the ingress path (client to server) and the egress path (server to client).

Ingress Path

  1. Client sends a frame over the WebSocket. Frame format: { type, msg_id, payload }, where msg_id is a client-assigned identifier used to correlate the ACK.
  2. Server validates the frame, assigns a server-side timestamp, and publishes to the appropriate Kafka topic.
  3. Server immediately ACKs the frame back to the sender, echoing the client's msg_id.

Egress Path

  1. A Router Service consumes Kafka and resolves the target user_id list from the conversation service.
  2. For each target, the Router looks up the session map to find the correct Chat Server node.
  3. The Router publishes the frame to a per-node Redis channel. The Chat Server node receives it and writes to the open socket.
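The Router's lookup-and-publish step can be sketched with in-memory stand-ins for the Redis session map and per-node channels (the names `sessions` and `node_channels` are illustrative, not a real API):

```python
from collections import defaultdict

# Stand-in for the sessions:{user_id} map: user_id -> Chat Server node.
sessions = {
    101: "chat-node-a",
    102: "chat-node-a",
    103: "chat-node-b",
}
# Stand-in for per-node Redis channels.
node_channels: dict[str, list] = defaultdict(list)

def route_frame(frame: dict, target_user_ids: list[int]) -> dict[str, list[int]]:
    """Resolve each target's Chat Server node, then publish one copy per node."""
    by_node: dict[str, list[int]] = defaultdict(list)
    for uid in target_user_ids:
        node = sessions.get(uid)
        if node is None:
            continue  # no live socket: the frame would go to the outbox instead
        by_node[node].append(uid)
    for node, uids in by_node.items():
        node_channels[node].append({"frame": frame, "targets": uids})
    return dict(by_node)
```

Grouping targets by node before publishing means a node hosting many recipients of the same message receives the frame once, not once per recipient.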

Failure Handling

Connection drops: Clients use an exponential back-off reconnect loop (starting at 100 ms, capped at 30 s). On reconnect, the client sends the last confirmed msg_id in the session handshake. The server replays any frames in the outbox with a higher ID.
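The back-off schedule above (100 ms start, 30 s cap) can be sketched as a small helper. The jitter is an assumption: the text does not specify a jitter strategy, but full jitter is a common way to avoid reconnect stampedes when a node dies.

```python
import random

def reconnect_delay(attempt: int, base: float = 0.1, cap: float = 30.0) -> float:
    """Exponential back-off: starts at `base` seconds, doubles per attempt,
    capped at `cap`, with full jitter (assumed, not from the text)."""
    delay = min(cap, base * (2 ** attempt))
    return random.uniform(0, delay)
```

Without the cap, attempt 9 alone would already wait up to 51 s; the cap keeps worst-case reconnect latency bounded during long outages.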

Server node crash: The load balancer detects the dead node via health checks and reroutes new connections. Existing sessions are lost; clients reconnect and replay from the outbox. Session TTLs in Redis expire automatically, preventing stale routing.

Kafka consumer lag: If the Router Service falls behind, messages are buffered in Kafka (configured retention of at least 24 hours). This acts as a natural buffer during traffic spikes without dropping messages.

Scalability Considerations

Connection scaling: Each server node handles roughly 50 k WebSocket connections using an async I/O event loop (epoll/kqueue). Adding nodes behind a Layer 4 load balancer scales capacity near-linearly into the thousands of nodes.
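A back-of-the-envelope node count follows directly from the 50 k figure. The 30% headroom for failover and connection churn is an assumption, not from the text:

```python
import math

def nodes_needed(concurrent_users: int, conns_per_node: int = 50_000,
                 headroom: float = 0.3) -> int:
    """Rough node count, reserving `headroom` spare capacity per node
    (headroom value is an illustrative assumption)."""
    effective = conns_per_node * (1 - headroom)
    return math.ceil(concurrent_users / effective)

nodes_needed(10_000_000)  # 10 M concurrent users
```

At 10 M concurrent users this lands in the hundreds of nodes, well within a single Layer 4 load balancer tier.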

Hot conversations: A very active group chat generates fan-out to hundreds of nodes simultaneously. Batch the per-node publish calls and pipeline Redis writes to minimize round trips.
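The pipelining idea can be sketched with a fake pipeline that counts round trips (the `FakePipeline` class is a stand-in for a real Redis client pipeline, which batches queued commands into a single round trip on execute):

```python
class FakePipeline:
    """Stand-in for a Redis pipeline: buffers commands, then flushes
    them all in one round trip on execute()."""
    def __init__(self, counter: dict):
        self.commands: list = []
        self.counter = counter

    def publish(self, channel: str, message: str) -> None:
        self.commands.append((channel, message))  # buffered, not sent yet

    def execute(self) -> int:
        self.counter["round_trips"] += 1  # one network round trip total
        return len(self.commands)

def fan_out(frame: str, targets_by_node: dict[str, list[int]], counter: dict) -> int:
    """Publish one copy of the frame per node, all in a single pipelined batch,
    instead of one round trip per recipient."""
    pipe = FakePipeline(counter)
    for node, _uids in targets_by_node.items():
        pipe.publish(f"node:{node}", frame)
    return pipe.execute()
```

For a hot group chat spanning 500 nodes, this turns 500 sequential round trips into one batched flush, which is where most of the fan-out latency win comes from.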

Geo-distribution: Deploy server nodes in multiple regions. Route users to the nearest region via anycast or GeoDNS. Cross-region messages travel the Kafka backbone between regional clusters.

Summary

A real-time messaging service achieves low latency by keeping hot state (sessions, queues) in memory, using async I/O for connection management, and relying on Kafka as a durable, ordered backbone. The separation of the transport layer from business logic (conversations, threads) makes it easy to scale and operate independently.

Frequently Asked Questions

What transport protocol is best for real-time messaging systems?

WebSockets are the standard choice for real-time messaging because they provide a full-duplex, persistent connection over a single TCP socket, eliminating the overhead of repeated HTTP handshakes. Where WebSockets are unavailable, Server-Sent Events (SSE) or long-polling serve as fallbacks. MQTT is preferred for IoT or low-bandwidth scenarios due to its lightweight publish-subscribe model.

How do you design a real-time messaging system to support millions of concurrent users?

Scaling to millions of concurrent users requires a horizontally scalable WebSocket gateway layer where each server maintains persistent connections. A pub/sub broker like Apache Kafka or Redis Pub/Sub routes messages between gateway nodes, and stateless message processing workers behind the gateway handle business logic. Consistent hashing or a session registry (stored in Redis) maps users to their gateway server, allowing any producer to route messages to the correct node.

How do you guarantee message delivery in a real-time messaging system?

Reliable delivery is achieved through an acknowledgment (ACK) protocol. The sender assigns each message a unique client-side sequence number; the server persists the message and returns an ACK. If no ACK is received within a timeout, the client retransmits. For offline recipients, messages are queued in a persistent store and delivered via push notifications. At-least-once delivery combined with idempotent message processing (deduplication by message ID) ensures correctness.

What is the role of a message fan-out service in real-time messaging?

A fan-out service distributes a single incoming message to all intended recipients. In a group chat, one message may need to be delivered to hundreds or thousands of users. The fan-out service reads the recipient list, looks up each user's active connection server in the session registry, and publishes the message to each relevant server's queue. For large groups, fan-out can be done asynchronously via a message queue to avoid blocking the sender's request.
