What Is a Real-Time Messaging Service?
A real-time messaging service is the transport layer that moves data from one endpoint to another with millisecond-level latency. Unlike a generic chat service that also handles UI concerns like threads and reactions, the real-time messaging layer focuses purely on connection management, message routing, and delivery guarantees. It underpins chat apps, collaborative editing tools, live feeds, and multiplayer games.
Data Model / Schema
The messaging service maintains minimal state. Most persistent data lives in upstream services; the messaging layer tracks sessions and queued frames:
-- Active sessions (in-memory, e.g., Redis Hash)
sessions:{user_id} = {
    server_node  : STRING,    -- which Chat Server holds the socket
    connected_at : TIMESTAMP,
    last_ping    : TIMESTAMP
}

-- Outbound queue (per user, when socket is temporarily unavailable)
CREATE TABLE outbox (
    id         BIGINT PRIMARY KEY AUTO_INCREMENT,
    user_id    BIGINT NOT NULL,
    payload    JSON NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP NULL,
    INDEX (user_id, id)
);
JSON payloads in the outbox store the full message envelope so delivery can be retried without querying upstream services again.
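As an illustration of the session map above, here is a minimal in-memory stand-in (in production this would be a Redis hash with a TTL); the names `register_session`, `lookup_node`, and `SESSION_TTL` are illustrative, not part of any real API:

```python
import time

SESSION_TTL = 60  # seconds without a ping before an entry is considered stale

sessions = {}  # user_id -> {"server_node", "connected_at", "last_ping"}

def register_session(user_id, server_node, now=None):
    """Record which Chat Server node holds this user's socket."""
    now = now if now is not None else time.time()
    sessions[user_id] = {"server_node": server_node,
                         "connected_at": now,
                         "last_ping": now}

def lookup_node(user_id, now=None):
    """Return the Chat Server node for a user, or None if stale/absent."""
    now = now if now is not None else time.time()
    entry = sessions.get(user_id)
    if entry is None or now - entry["last_ping"] > SESSION_TTL:
        return None
    return entry["server_node"]

register_session(42, "chat-07", now=1000.0)
assert lookup_node(42, now=1030.0) == "chat-07"  # fresh session
assert lookup_node(42, now=1100.0) is None       # past the TTL
```

A real deployment would let Redis expire the key itself (EXPIRE on each ping) rather than checking timestamps on read.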
Core Algorithm / Workflow
The real-time path has two legs: the ingress path (client to server) and the egress path (server to client).
Ingress Path
- Client sends a frame over WebSocket. Frame format: { type, msg_id, payload }.
- Server validates the frame, assigns a server-side timestamp, and publishes to the appropriate Kafka topic.
- Server immediately ACKs the frame back to the sender with the server-assigned msg_id.
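The three ingress steps can be sketched as a single handler; a list stands in for the Kafka producer, and the validation rule and field names beyond { type, msg_id, payload } are assumptions:

```python
import itertools
import time

_seq = itertools.count(1)  # server-side msg_id sequence (assumed scheme)
topic = []                 # in-memory stand-in for the Kafka topic

def handle_frame(frame, now=None):
    """Validate a client frame, stamp it, publish it, and build the ACK."""
    if frame.get("type") != "message" or "payload" not in frame:
        return {"type": "error", "reason": "malformed frame"}
    server_msg_id = next(_seq)
    topic.append({
        "msg_id": server_msg_id,
        "payload": frame["payload"],
        "server_ts": now if now is not None else time.time(),
    })  # publish to the topic before ACKing the sender
    return {"type": "ack", "msg_id": server_msg_id}

ack = handle_frame({"type": "message", "payload": "hi"}, now=1.0)
assert ack == {"type": "ack", "msg_id": 1}
```

Note the ordering: the frame is published before the ACK is sent, so an ACKed message is guaranteed to be on the durable backbone.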
Egress Path
- A Router Service consumes Kafka and resolves the target user_id list from the conversation service.
- For each target, the Router looks up the session map to find the correct Chat Server node.
- The Router publishes the frame to a per-node Redis channel. The Chat Server node receives it and writes to the open socket.
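A rough sketch of the Router's per-message work, with in-memory dicts standing in for the session map and the per-node Redis channels (the user names and `route` function are illustrative):

```python
from collections import defaultdict

session_map = {"alice": "node-1", "bob": "node-2", "carol": "node-1"}
channels = defaultdict(list)  # node -> frames published to its channel

def route(frame, targets):
    """Group targets by Chat Server node, then publish once per node."""
    by_node = defaultdict(list)
    for user in targets:
        node = session_map.get(user)
        if node is None:
            continue  # offline: the frame would go to the outbox instead
        by_node[node].append(user)
    for node, users in by_node.items():
        channels[node].append({"frame": frame, "users": users})

route({"msg_id": 7, "payload": "hello"}, ["alice", "bob", "carol", "dave"])
assert channels["node-1"][0]["users"] == ["alice", "carol"]  # one publish
```

Grouping by node before publishing means a node hosting many recipients of the same message receives the frame once, not once per recipient.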
Failure Handling
Connection drops: Clients use an exponential back-off reconnect loop (starting at 100 ms, capped at 30 s). On reconnect, the client sends the last confirmed msg_id in the session handshake. The server replays any frames in the outbox with a higher ID.
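Both halves of this recovery path are simple enough to sketch directly; the back-off constants come from the text, while the function names and row shape are assumptions (jitter, which production clients usually add, is omitted):

```python
def backoff_delays(attempts, base=0.1, cap=30.0):
    """Reconnect schedule: start at 100 ms, double each try, cap at 30 s."""
    delay, out = base, []
    for _ in range(attempts):
        out.append(delay)
        delay = min(delay * 2, cap)
    return out

assert backoff_delays(4) == [0.1, 0.2, 0.4, 0.8]
assert backoff_delays(12)[-1] == 30.0  # capped

def replay(outbox_rows, last_confirmed_id):
    """On reconnect, resend every outbox frame newer than the client's
    last confirmed msg_id (from the session handshake)."""
    return [row for row in outbox_rows if row["id"] > last_confirmed_id]

rows = [{"id": 3}, {"id": 4}, {"id": 5}]
assert replay(rows, 3) == [{"id": 4}, {"id": 5}]
```

The monotonically increasing outbox id is what makes replay a simple range query rather than a per-message delivery ledger.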
Server node crash: The load balancer detects the dead node via health checks and reroutes new connections. Existing sessions are lost; clients reconnect and replay from the outbox. Session TTLs in Redis expire automatically, preventing stale routing.
Kafka consumer lag: If the Router Service falls behind, messages are buffered in Kafka (configured retention of at least 24 hours). This acts as a natural buffer during traffic spikes without dropping messages.
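As an illustrative sketch, the 24-hour retention could be set with the standard `kafka-configs.sh` tool; the topic name `messages` and the broker address are assumptions:

```shell
# Raise retention on the (hypothetical) "messages" topic to 24 hours.
# 86400000 ms = 24 h; bootstrap address is an assumption.
bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name messages \
  --alter --add-config retention.ms=86400000
```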
Scalability Considerations
Connection scaling: Each server node handles ~50k WebSocket connections using an async I/O event loop (epoll/kqueue). Thousands of such nodes behind a Layer 4 load balancer scale this to tens of millions of concurrent connections.
Hot conversations: A very active group chat generates fan-out to hundreds of nodes simultaneously. Batch the per-node publish calls and pipeline Redis writes to minimize round trips.
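The batching-plus-pipelining idea can be sketched as follows; `FakePipeline` stands in for a Redis pipeline (which batches commands into one network round trip), and all names are illustrative:

```python
from collections import defaultdict

class FakePipeline:
    """Stand-in for a Redis pipeline: queue commands, flush in one round trip."""
    def __init__(self):
        self.commands = []
    def publish(self, channel, message):
        self.commands.append((channel, message))
    def execute(self):
        return len(self.commands)  # all queued publishes, one round trip

def fan_out(frame, recipients, session_map):
    """Collapse per-recipient publishes into one batched publish per node."""
    by_node = defaultdict(list)
    for user in recipients:
        node = session_map.get(user)
        if node:
            by_node[node].append(user)
    pipe = FakePipeline()
    for node, users in by_node.items():
        pipe.publish(f"node:{node}", {"frame": frame, "users": users})
    return pipe.execute()

sessions = {f"u{i}": f"node-{i % 3}" for i in range(300)}
sent = fan_out({"msg_id": 1}, list(sessions), session_map=sessions)
assert sent == 3  # 300 recipients collapse to 3 per-node publishes
```

For a 300-member group spread over 3 nodes, this is 3 pipelined publishes in a single round trip instead of 300 individual ones.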
Geo-distribution: Deploy server nodes in multiple regions. Route users to the nearest region via anycast or GeoDNS. Cross-region messages travel the Kafka backbone between regional clusters.
Summary
A real-time messaging service achieves low latency by keeping hot state (sessions, queues) in memory, using async I/O for connection management, and relying on Kafka as a durable, ordered backbone. The separation of the transport layer from business logic (conversations, threads) makes it easy to scale and operate independently.
FAQ

What transport protocol is best for real-time messaging systems?
WebSockets are the standard choice for real-time messaging because they provide a full-duplex, persistent connection over a single TCP socket, eliminating the overhead of repeated HTTP handshakes. Where WebSockets are unavailable, Server-Sent Events (SSE) or long-polling serve as fallbacks. MQTT is preferred for IoT or low-bandwidth scenarios due to its lightweight publish-subscribe model.

How do you design a real-time messaging system to support millions of concurrent users?
Scaling to millions of concurrent users requires a horizontally scalable WebSocket gateway layer where each server maintains persistent connections. A pub/sub broker like Apache Kafka or Redis Pub/Sub routes messages between gateway nodes. Stateless message-processing workers behind the gateway handle business logic. Consistent hashing or a session registry (stored in Redis) maps users to their gateway server, allowing any producer to route messages to the correct node.

How do you guarantee message delivery in a real-time messaging system?
Reliable delivery is achieved through an acknowledgment (ACK) protocol. The sender assigns each message a unique client-side sequence number. The server persists the message and returns an ACK. If no ACK arrives within a timeout, the client retransmits. For offline recipients, messages are queued in a persistent store and delivered via push notifications. At-least-once delivery combined with idempotent message processing (deduplication by message ID) ensures correctness.

What is the role of a message fan-out service in real-time messaging?
A fan-out service distributes a single incoming message to all intended recipients. In a group chat, one message may need to be delivered to hundreds or thousands of users. The fan-out service reads the recipient list, looks up each user's active connection server in the session registry, and publishes the message to each relevant server's queue. For large groups, fan-out can run asynchronously via a message queue to avoid blocking the sender's request.