Low Level Design: Real-Time Messaging Service

What Is a Real-Time Messaging Service?

A real-time messaging service is the transport layer that moves data from one endpoint to another with millisecond-level latency. Unlike a generic chat service that also handles UI concerns like threads and reactions, the real-time messaging layer focuses purely on connection management, message routing, and delivery guarantees. It underpins chat apps, collaborative editing tools, live feeds, and multiplayer games.

Data Model / Schema

The messaging service maintains minimal state. Most persistent data lives in upstream services; the messaging layer tracks sessions and queued frames:

-- Active sessions (in-memory, e.g., Redis Hash)
sessions:{user_id} = {
  server_node  : STRING,   -- which Chat Server holds the socket
  connected_at : TIMESTAMP,
  last_ping    : TIMESTAMP
}

-- Outbound queue (per user, when socket is temporarily unavailable)
CREATE TABLE outbox (
  id          BIGINT PRIMARY KEY AUTO_INCREMENT,
  user_id     BIGINT NOT NULL,
  payload     JSON NOT NULL,
  created_at  TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  expires_at  TIMESTAMP,
  INDEX (user_id, id)
);

JSON payloads in the outbox store the full message envelope so delivery can be retried without querying upstream services again.
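The enqueue-and-replay pattern above can be sketched as follows. This is a minimal illustration using an in-memory SQLite database as a stand-in for the MySQL table (AUTOINCREMENT and JSON-as-TEXT are SQLite adaptations); the function names are illustrative, not part of any real API.

```python
import json
import sqlite3

# In-memory stand-in for the outbox table described above.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE outbox (
        id       INTEGER PRIMARY KEY AUTOINCREMENT,
        user_id  INTEGER NOT NULL,
        payload  TEXT NOT NULL
    )
""")

def enqueue(user_id: int, envelope: dict) -> int:
    """Store the full message envelope so delivery can be retried
    without querying upstream services again."""
    cur = conn.execute(
        "INSERT INTO outbox (user_id, payload) VALUES (?, ?)",
        (user_id, json.dumps(envelope)),
    )
    return cur.lastrowid

def replay_after(user_id: int, last_confirmed_id: int) -> list[dict]:
    """Fetch frames newer than the client's last confirmed id, in order."""
    rows = conn.execute(
        "SELECT payload FROM outbox WHERE user_id = ? AND id > ? ORDER BY id",
        (user_id, last_confirmed_id),
    )
    return [json.loads(r[0]) for r in rows]

first = enqueue(42, {"type": "msg", "text": "hello"})
enqueue(42, {"type": "msg", "text": "world"})
```

Because the outbox `id` is monotonically increasing per user, a reconnecting client only needs to present its last confirmed id to receive everything it missed.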

Core Algorithm / Workflow

The real-time path has two legs: the ingress path (client to server) and the egress path (server to client).

Ingress Path

  1. Client sends a frame over the WebSocket. Frame format: { type, msg_id, payload }, where msg_id is a client-assigned identifier used to correlate the ACK.
  2. Server validates the frame, assigns a server-side timestamp, and publishes to the appropriate Kafka topic.
  3. Server immediately ACKs the frame back to the sender, echoing the client's msg_id.

Egress Path

  1. A Router Service consumes Kafka and resolves the target user_id list from the conversation service.
  2. For each target, the Router looks up the session map to find the correct Chat Server node.
  3. The Router publishes the frame to a per-node Redis channel. The Chat Server node receives it and writes to the open socket.
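The Router's lookup-and-publish step can be sketched with in-memory stand-ins for the Redis session map and per-node channels (the names `sessions` and `node_channels` are illustrative, not a real API):

```python
from collections import defaultdict

# Stand-in for the sessions:{user_id} map: user_id -> Chat Server node.
sessions = {
    101: "chat-node-a",
    102: "chat-node-a",
    103: "chat-node-b",
}
# Stand-in for per-node Redis channels.
node_channels: dict[str, list] = defaultdict(list)

def route_frame(frame: dict, target_user_ids: list[int]) -> dict[str, list[int]]:
    """Resolve each target's Chat Server node, then publish one copy per node."""
    by_node: dict[str, list[int]] = defaultdict(list)
    for uid in target_user_ids:
        node = sessions.get(uid)
        if node is None:
            continue  # no live socket: the frame would go to the outbox instead
        by_node[node].append(uid)
    for node, uids in by_node.items():
        node_channels[node].append({"frame": frame, "targets": uids})
    return dict(by_node)
```

Grouping targets by node before publishing means a node hosting many recipients of the same message receives the frame once, not once per recipient.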

Failure Handling

Connection drops: Clients use an exponential back-off reconnect loop (starting at 100 ms, capped at 30 s). On reconnect, the client sends the last confirmed msg_id in the session handshake. The server replays any frames in the outbox with a higher ID.
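The back-off schedule above (100 ms start, 30 s cap) can be sketched as a small helper. The jitter is an assumption: the text does not specify a jitter strategy, but full jitter is a common way to avoid reconnect stampedes when a node dies.

```python
import random

def reconnect_delay(attempt: int, base: float = 0.1, cap: float = 30.0) -> float:
    """Exponential back-off: starts at `base` seconds, doubles per attempt,
    capped at `cap`, with full jitter (assumed, not from the text)."""
    delay = min(cap, base * (2 ** attempt))
    return random.uniform(0, delay)
```

Without the cap, attempt 9 alone would already wait up to 51 s; the cap keeps worst-case reconnect latency bounded during long outages.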

Server node crash: The load balancer detects the dead node via health checks and reroutes new connections. Existing sessions are lost; clients reconnect and replay from the outbox. Session TTLs in Redis expire automatically, preventing stale routing.

Kafka consumer lag: If the Router Service falls behind, messages are buffered in Kafka (configured retention of at least 24 hours). This acts as a natural buffer during traffic spikes without dropping messages.

Scalability Considerations

Connection scaling: Each server node handles roughly 50 k WebSocket connections using an async I/O event loop (epoll/kqueue). Adding nodes behind a Layer 4 load balancer scales capacity near-linearly into the thousands of nodes.
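A back-of-the-envelope node count follows directly from the 50 k figure. The 30% headroom for failover and connection churn is an assumption, not from the text:

```python
import math

def nodes_needed(concurrent_users: int, conns_per_node: int = 50_000,
                 headroom: float = 0.3) -> int:
    """Rough node count, reserving `headroom` spare capacity per node
    (headroom value is an illustrative assumption)."""
    effective = conns_per_node * (1 - headroom)
    return math.ceil(concurrent_users / effective)

nodes_needed(10_000_000)  # 10 M concurrent users
```

At 10 M concurrent users this lands in the hundreds of nodes, well within a single Layer 4 load balancer tier.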

Hot conversations: A very active group chat generates fan-out to hundreds of nodes simultaneously. Batch the per-node publish calls and pipeline Redis writes to minimize round trips.
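The pipelining idea can be sketched with a fake pipeline that counts round trips (the `FakePipeline` class is a stand-in for a real Redis client pipeline, which batches queued commands into a single round trip on execute):

```python
class FakePipeline:
    """Stand-in for a Redis pipeline: buffers commands, then flushes
    them all in one round trip on execute()."""
    def __init__(self, counter: dict):
        self.commands: list = []
        self.counter = counter

    def publish(self, channel: str, message: str) -> None:
        self.commands.append((channel, message))  # buffered, not sent yet

    def execute(self) -> int:
        self.counter["round_trips"] += 1  # one network round trip total
        return len(self.commands)

def fan_out(frame: str, targets_by_node: dict[str, list[int]], counter: dict) -> int:
    """Publish one copy of the frame per node, all in a single pipelined batch,
    instead of one round trip per recipient."""
    pipe = FakePipeline(counter)
    for node, _uids in targets_by_node.items():
        pipe.publish(f"node:{node}", frame)
    return pipe.execute()
```

For a hot group chat spanning 500 nodes, this turns 500 sequential round trips into one batched flush, which is where most of the fan-out latency win comes from.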

Geo-distribution: Deploy server nodes in multiple regions. Route users to the nearest region via anycast or GeoDNS. Cross-region messages travel the Kafka backbone between regional clusters.

Summary

A real-time messaging service achieves low latency by keeping hot state (sessions, queues) in memory, using async I/O for connection management, and relying on Kafka as a durable, ordered backbone. The separation of the transport layer from business logic (conversations, threads) makes it easy to scale and operate independently.

Frequently Asked Questions

What transport protocol is best for real-time messaging systems?

WebSockets are the standard choice for real-time messaging because they provide a full-duplex, persistent connection over a single TCP socket, eliminating the overhead of repeated HTTP handshakes. Where WebSockets are unavailable, Server-Sent Events (SSE) or long-polling serve as fallbacks. MQTT is preferred for IoT or low-bandwidth scenarios due to its lightweight publish-subscribe model.

How do you design a real-time messaging system to support millions of concurrent users?

Scaling to millions of concurrent users requires a horizontally scalable WebSocket gateway layer where each server maintains persistent connections. A pub/sub broker like Apache Kafka or Redis Pub/Sub routes messages between gateway nodes, and stateless message processing workers behind the gateway handle business logic. Consistent hashing or a session registry (stored in Redis) maps users to their gateway server, allowing any producer to route messages to the correct node.

How do you guarantee message delivery in a real-time messaging system?

Reliable delivery is achieved through an acknowledgment (ACK) protocol. The sender assigns each message a unique client-side sequence number; the server persists the message and returns an ACK. If no ACK is received within a timeout, the client retransmits. For offline recipients, messages are queued in a persistent store and delivered via push notifications. At-least-once delivery combined with idempotent message processing (deduplication by message ID) ensures correctness.

What is the role of a message fan-out service in real-time messaging?

A fan-out service distributes a single incoming message to all intended recipients. In a group chat, one message may need to be delivered to hundreds or thousands of users. The fan-out service reads the recipient list, looks up each user's active connection server in the session registry, and publishes the message to each relevant server's queue. For large groups, fan-out can be done asynchronously via a message queue to avoid blocking the sender's request.
