Low Level Design: Real-Time Messaging Service

What Is a Real-Time Messaging Service?

A real-time messaging service is the transport layer that moves data from one endpoint to another with millisecond-level latency. Unlike a generic chat service that also handles UI concerns like threads and reactions, the real-time messaging layer focuses purely on connection management, message routing, and delivery guarantees. It underpins chat apps, collaborative editing tools, live feeds, and multiplayer games.

Data Model / Schema

The messaging service maintains minimal state. Most persistent data lives in upstream services; the messaging layer tracks sessions and queued frames:

-- Active sessions (in-memory, e.g., Redis Hash)
sessions:{user_id} = {
  server_node  : STRING,   -- which Chat Server holds the socket
  connected_at : TIMESTAMP,
  last_ping    : TIMESTAMP
}

-- Outbound queue (per user, when socket is temporarily unavailable)
CREATE TABLE outbox (
  id          BIGINT PRIMARY KEY AUTO_INCREMENT,
  user_id     BIGINT NOT NULL,
  payload     JSON NOT NULL,
  created_at  TIMESTAMP DEFAULT NOW(),
  expires_at  TIMESTAMP,
  INDEX (user_id, id)
);

JSON payloads in the outbox store the full message envelope so delivery can be retried without querying upstream services again.

Core Algorithm / Workflow

The real-time path has two legs: the ingress path (client to server) and the egress path (server to client).

Ingress Path

  1. Client sends a frame over WebSocket. Frame format: { type, msg_id, payload }.
  2. Server validates the frame, assigns a server-side timestamp, and publishes to the appropriate Kafka topic.
  3. Server immediately ACKs the frame back to the sender with the server-assigned msg_id.

Egress Path

  1. A Router Service consumes Kafka and resolves the target user_id list from the conversation service.
  2. For each target, the Router looks up the session map to find the correct Chat Server node.
  3. The Router publishes the frame to a per-node Redis channel. The Chat Server node receives it and writes to the open socket.

Failure Handling

Connection drops: Clients use an exponential back-off reconnect loop (starting at 100 ms, capped at 30 s). On reconnect, the client sends the last confirmed msg_id in the session handshake. The server replays any frames in the outbox with a higher ID.

Server node crash: The load balancer detects the dead node via health checks and reroutes new connections. Existing sessions are lost; clients reconnect and replay from the outbox. Session TTLs in Redis expire automatically, preventing stale routing.

Kafka consumer lag: If the Router Service falls behind, messages are buffered in Kafka (configured retention of at least 24 hours). This acts as a natural buffer during traffic spikes without dropping messages.

Scalability Considerations

Connection scaling: Each server node handles ~50 k WebSocket connections using an async I/O event loop (epoll/kqueue). Thousands of nodes behind a Layer 4 load balancer give virtually unlimited horizontal capacity.

Hot conversations: A very active group chat generates fan-out to hundreds of nodes simultaneously. Batch the per-node publish calls and pipeline Redis writes to minimize round trips.

Geo-distribution: Deploy server nodes in multiple regions. Route users to the nearest region via anycast or GeoDNS. Cross-region messages travel the Kafka backbone between regional clusters.

Summary

A real-time messaging service achieves low latency by keeping hot state (sessions, queues) in memory, using async I/O for connection management, and relying on Kafka as a durable, ordered backbone. The separation of the transport layer from business logic (conversations, threads) makes it easy to scale and operate independently.

See also: Meta Interview Guide 2026: Facebook, Instagram, WhatsApp Engineering

See also: Atlassian Interview Guide

See also: Snap Interview Guide

Scroll to Top