Low Level Design: Chat System Internals

What Makes Chat Systems Hard?

Chat systems look simple on the surface — send a message, receive a message. The complexity is in the guarantees: messages must be delivered reliably, ordered correctly, and shown as read. All of this must work at scale, across unreliable mobile connections, while users come online and go offline constantly. This post covers the low-level design decisions that make those guarantees possible.

Message Delivery Guarantees

There are three delivery semantics to choose from:

  • At-most-once: Fire and forget. Messages may be lost. Unacceptable for chat.
  • At-least-once: Retry until ACK received. Messages may be delivered more than once — handle with deduplication.
  • Exactly-once: Ideal, but expensive to implement end-to-end. Approximate it with at-least-once delivery + idempotent processing.

The standard chat pattern is at-least-once with deduplication. The client generates a client_message_id (UUID) before sending. The server stores this ID alongside the message. On retry, the server checks for an existing message with the same client_message_id from the same sender — if found, returns the existing message_id without creating a duplicate. The client considers delivery confirmed only after receiving an explicit ACK containing the server-assigned message_id.
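A minimal in-memory sketch of this dedup flow (class and method names are illustrative; a production server would instead rely on a UNIQUE (sender_id, client_msg_id) constraint in the database):

```python
import uuid

# Hypothetical in-memory sketch of at-least-once delivery with server-side
# deduplication. A real server backs this with a database UNIQUE constraint.
class MessageStore:
    def __init__(self):
        self._next_id = 0
        self._by_dedup_key = {}   # (sender_id, client_msg_id) -> message_id
        self.messages = {}        # message_id -> message dict

    def accept(self, sender_id, client_msg_id, content):
        """Idempotent insert: a retry with the same client_msg_id
        returns the original server-assigned message_id."""
        key = (sender_id, client_msg_id)
        if key in self._by_dedup_key:
            return self._by_dedup_key[key]   # duplicate: ACK the original
        self._next_id += 1
        mid = self._next_id
        self.messages[mid] = {"sender": sender_id, "content": content}
        self._by_dedup_key[key] = mid
        return mid   # ACK carries the server-assigned message_id

store = MessageStore()
cmid = str(uuid.uuid4())                  # client generates this before sending
first = store.accept(1, cmid, "hello")
retry = store.accept(1, cmid, "hello")    # network retry of the same send
assert first == retry and len(store.messages) == 1
```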

Message Ordering

Messages within a conversation must appear in a consistent order to all participants. Global ordering across all conversations is unnecessary and prohibitively expensive at scale.

Two common approaches:

  • Per-conversation sequence number: Each conversation has a monotonically increasing sequence counter. When a message is inserted, it gets the next sequence number. Requires a distributed counter (Redis INCR, or a database sequence) per conversation_id. Simple and effective.
  • Lamport timestamp: Each message gets a logical clock value — max(sender_clock, received_clock) + 1. Gives causal ordering without a centralized counter. More complex but avoids the bottleneck of a per-conversation lock.

Avoid relying on created_at timestamps for ordering — clocks on different servers and clients are not perfectly synchronized, and two messages created at the same millisecond would have non-deterministic order.
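The Lamport-timestamp approach above can be sketched in a few lines (names are illustrative, not from a specific library):

```python
# Sketch of Lamport logical clocks for causal message ordering.
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):                  # local event, e.g. sending a message
        self.time += 1
        return self.time

    def receive(self, remote_time):  # merge rule: max(local, remote) + 1
        self.time = max(self.time, remote_time) + 1
        return self.time

alice, bob = LamportClock(), LamportClock()
t1 = alice.tick()        # Alice sends: clock = 1
t2 = bob.receive(t1)     # Bob receives: max(0, 1) + 1 = 2
t3 = bob.tick()          # Bob replies: clock = 3
# Causally related messages are ordered: t1 < t2 < t3. Ties between
# concurrent messages are typically broken by (timestamp, sender_id).
assert t1 < t2 < t3
```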

WebSocket for Real-Time Delivery

HTTP is request-response — the server cannot push messages to the client without the client polling. WebSocket provides a persistent, bidirectional connection over a single TCP connection. Once established, the server can push messages to the client instantly without the overhead of repeated HTTP handshakes.

The connection lifecycle: client sends HTTP Upgrade request → server responds with 101 Switching Protocols → both sides communicate over the WebSocket protocol. The connection stays open until explicitly closed by either side or a network interruption occurs.
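The server's half of that handshake is a fixed computation defined by RFC 6455: hash the client's Sec-WebSocket-Key together with a well-known GUID and echo the result in the Sec-WebSocket-Accept header of the 101 response. The key/accept pair below is the example from the RFC itself:

```python
import base64
import hashlib

# RFC 6455 handshake: the server proves it understood the Upgrade request
# by deriving Sec-WebSocket-Accept from the client's Sec-WebSocket-Key.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def websocket_accept(sec_websocket_key: str) -> str:
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode()).digest()
    return base64.b64encode(digest).decode()

# Key/accept pair from the RFC 6455 example handshake:
assert websocket_accept("dGhlIHNhbXBsZSBub25jZQ==") == "s3pPLMBiTxaQ9kYGzzhZRbK+xOo="
```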

Fallback strategies for environments that block WebSocket (corporate proxies, older browsers): Server-Sent Events (SSE) for server-to-client push, with HTTP POST for client-to-server. Long polling as a last resort.

Connection Management at Scale

A chat service runs many WebSocket servers. When user A sends a message to user B, the message arrives at A’s server — which is likely not the same server holding B’s WebSocket connection. The system needs a way to route the message to the right server.

The solution: maintain a mapping in Redis: user_id → {server_id, connection_id}. When a user connects, the WebSocket server writes this mapping with a TTL. When a message needs to be delivered, look up B’s server_id in Redis, then route the message to that server via an internal pub/sub channel (Redis Pub/Sub, Kafka, or direct gRPC call). That server pushes the message over B’s WebSocket connection.

When a user disconnects, remove the mapping immediately (or let the TTL expire). If delivery fails because the connection mapping is stale, fall back to storing the message for offline delivery.
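A sketch of this routing logic, with a plain dict standing in for the Redis mapping and callbacks standing in for pub/sub and offline storage (all names are illustrative):

```python
import time

# In-memory stand-in for the Redis mapping
# user_id -> (server_id, connection_id) with a TTL.
class ConnectionRegistry:
    def __init__(self, ttl=60):
        self.ttl = ttl
        self._map = {}   # user_id -> (server_id, connection_id, expires_at)

    def register(self, user_id, server_id, connection_id):
        self._map[user_id] = (server_id, connection_id, time.time() + self.ttl)

    def lookup(self, user_id):
        entry = self._map.get(user_id)
        if entry is None or entry[2] < time.time():
            return None                    # offline, or TTL expired
        return entry[0], entry[1]

    def unregister(self, user_id):
        self._map.pop(user_id, None)

def deliver(registry, recipient_id, message, route, store_offline):
    """Route to the recipient's server, or fall back to offline storage."""
    target = registry.lookup(recipient_id)
    if target is None:
        store_offline(recipient_id, message)
        return "stored"
    server_id, connection_id = target
    route(server_id, connection_id, message)   # e.g. publish to pub/sub
    return "pushed"

registry = ConnectionRegistry(ttl=60)
registry.register(42, "ws-7", "conn-abc")
pushed, offline = [], []
route = lambda server, conn, msg: pushed.append((server, conn, msg))
store_offline = lambda uid, msg: offline.append((uid, msg))
assert deliver(registry, 42, "hi", route, store_offline) == "pushed"
registry.unregister(42)                        # user disconnects
assert deliver(registry, 42, "bye", route, store_offline) == "stored"
```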

Message Fanout for Group Chat

Sending a message to a group of N members requires delivering to N connections potentially spread across many servers. Two approaches:

  • Push fanout: When a message arrives at the server, look up all group members, look up each member’s server_id in Redis, and push the message to each server. Simple but expensive for large groups — O(N) Redis lookups and N inter-server messages per chat message.
  • Pull model (inbox): Each user has an inbox — a list of message references. Delivering a message writes one record to the messages table and one small pointer record to each member’s inbox (N in total). On reconnect, the client fetches new messages from its inbox since last_seen. Scales better for large groups; trades write amplification for simpler delivery logic.

Hybrid: push for online users (low latency matters), inbox for offline users (they’ll pull on reconnect anyway).
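The hybrid split can be sketched as a single fanout loop (callbacks are hypothetical stand-ins for the push path and the inbox write):

```python
# Hybrid fanout sketch: push to members who are currently online, write an
# inbox pointer for offline members so they pull on reconnect.
def fanout(message_id, member_ids, online, push, append_inbox):
    """online(user_id) -> bool; push/append_inbox are delivery callbacks."""
    for uid in member_ids:
        if online(uid):
            push(uid, message_id)           # low-latency path
        else:
            append_inbox(uid, message_id)   # one pointer row, pulled later

inboxes = {1: [], 2: [], 3: []}
pushed = []
fanout(100, [1, 2, 3],
       online=lambda u: u != 3,                       # user 3 is offline
       push=lambda u, m: pushed.append(u),
       append_inbox=lambda u, m: inboxes[u].append(m))
assert pushed == [1, 2] and inboxes[3] == [100]
```

A production variant often writes the inbox row for everyone (durability) and pushes additionally to online users, but the routing decision is the same.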

Message Storage Schema

Core tables:

-- Message store (append-only, never update)
CREATE TABLE messages (
  message_id      BIGSERIAL PRIMARY KEY,
  conversation_id BIGINT NOT NULL,
  sender_id       BIGINT NOT NULL,
  content         TEXT,
  type            VARCHAR(20) NOT NULL,  -- text, image, video, file
  client_msg_id   UUID NOT NULL,         -- for dedup
  created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  UNIQUE (sender_id, client_msg_id)
);

-- Per-user inbox (reference to messages)
CREATE TABLE user_inbox (
  user_id    BIGINT NOT NULL,
  message_id BIGINT NOT NULL,
  seq        BIGINT NOT NULL,            -- per-user sequence
  read_at    TIMESTAMPTZ,
  PRIMARY KEY (user_id, message_id)
);

-- Supports the reconnect query: WHERE user_id = ? AND seq > ? ORDER BY seq
CREATE INDEX idx_user_inbox_seq ON user_inbox (user_id, seq);

For group chats, the message is stored once in messages. Each member gets one row in user_inbox pointing to the same message_id. This avoids duplicating message content N times.
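The store-once property can be demonstrated with SQLite stand-ins for the two tables (column types simplified, since SQLite has no BIGSERIAL/UUID/TIMESTAMPTZ):

```python
import sqlite3

# One message row, N pointer rows: content is never duplicated per member.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE messages (
  message_id      INTEGER PRIMARY KEY AUTOINCREMENT,
  conversation_id INTEGER NOT NULL,
  sender_id       INTEGER NOT NULL,
  content         TEXT,
  client_msg_id   TEXT NOT NULL,
  UNIQUE (sender_id, client_msg_id)
);
CREATE TABLE user_inbox (
  user_id    INTEGER NOT NULL,
  message_id INTEGER NOT NULL,
  seq        INTEGER NOT NULL,
  read_at    TEXT,
  PRIMARY KEY (user_id, message_id)
);
""")

cur = db.execute(
    "INSERT INTO messages (conversation_id, sender_id, content, client_msg_id) "
    "VALUES (1, 10, 'hello group', 'uuid-1')")
mid = cur.lastrowid

members = [11, 12, 13]   # each member gets a pointer, not a copy
db.executemany(
    "INSERT INTO user_inbox (user_id, message_id, seq) VALUES (?, ?, ?)",
    [(uid, mid, 1) for uid in members])

(stored,) = db.execute("SELECT COUNT(*) FROM messages").fetchone()
(pointers,) = db.execute("SELECT COUNT(*) FROM user_inbox").fetchone()
assert (stored, pointers) == (1, 3)
```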

Read Receipts

Read receipts require tracking two events: message delivered to device, and message seen by user.

  • Delivered: ACK sent by the recipient’s WebSocket client when it receives the message. The server updates a delivered_at timestamp.
  • Read: When the user opens the conversation and the message is visible on screen, the client sends a read event. The server updates read_at in user_inbox.

For efficiency, clients send a single "read up to message_id X" event rather than one event per message. The server marks all messages with seq <= X in that user’s inbox as read in a single UPDATE.
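A sketch of that batched update against a SQLite stand-in for user_inbox — one statement instead of one round trip per message:

```python
import sqlite3

# One UPDATE marks everything up to the acknowledged seq as read.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE user_inbox (
  user_id INTEGER, message_id INTEGER, seq INTEGER, read_at TEXT,
  PRIMARY KEY (user_id, message_id))""")
db.executemany("INSERT INTO user_inbox VALUES (?, ?, ?, NULL)",
               [(7, m, s) for s, m in enumerate([101, 102, 103, 104], start=1)])

def mark_read_up_to(db, user_id, max_seq, ts):
    cur = db.execute(
        "UPDATE user_inbox SET read_at = ? "
        "WHERE user_id = ? AND seq <= ? AND read_at IS NULL",
        (ts, user_id, max_seq))
    return cur.rowcount   # number of receipts to fan back to senders

assert mark_read_up_to(db, 7, 3, "2024-01-01T00:00:00Z") == 3
assert mark_read_up_to(db, 7, 3, "2024-01-01T00:00:01Z") == 0  # idempotent
```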

The sender’s client subscribes to read receipt events for their own messages via WebSocket — when receipts arrive, the UI updates the checkmark indicators in real time.

Presence System

Presence (online/offline/away) is a high-write, high-read, eventually consistent problem. The naive approach — querying the database on every message send — doesn’t scale.

The standard pattern: clients send a heartbeat ping every 30 seconds over the WebSocket connection. The server updates a last_seen timestamp in Redis (a simple key-value: presence:{user_id} → timestamp) with a TTL of 60-90 seconds. If the TTL expires without a new heartbeat, the key disappears and the user is considered offline.

Reading presence: look up presence:{user_id} in Redis. If the key exists and timestamp is within the last 60 seconds → online. Recent but older → away. Key missing → offline. This is O(1) per user lookup, and Redis can handle hundreds of thousands of heartbeat writes per second.
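The online/away/offline decision reduces to comparing the age of the last heartbeat against the two windows. A sketch with a plain dict and an injectable clock standing in for Redis (which would handle expiry itself via SETEX/EXPIRE):

```python
import time

# TTL-based presence, simulated in memory. Window values match the
# heartbeat/TTL numbers described above.
ONLINE_WINDOW, AWAY_WINDOW = 60, 90   # seconds

class Presence:
    def __init__(self, clock=time.time):
        self.clock = clock
        self.last_seen = {}   # user_id -> last heartbeat timestamp

    def heartbeat(self, user_id):
        self.last_seen[user_id] = self.clock()

    def status(self, user_id):
        ts = self.last_seen.get(user_id)
        if ts is None:
            return "offline"
        age = self.clock() - ts
        if age <= ONLINE_WINDOW:
            return "online"
        if age <= AWAY_WINDOW:
            return "away"
        return "offline"      # in Redis, the key would simply have expired

now = [1000.0]
p = Presence(clock=lambda: now[0])
p.heartbeat(42)
assert p.status(42) == "online"
now[0] += 75                   # a heartbeat was missed
assert p.status(42) == "away"
now[0] += 30
assert p.status(42) == "offline"
```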

Offline Message Delivery

When a recipient is offline, the message is stored in the database (in their inbox). On reconnect, the client sends its last known last_seq to the server. The server queries user_inbox WHERE user_id = ? AND seq > last_seq ORDER BY seq and streams all missed messages to the client.

For very active users who have been offline for a long time, this could mean thousands of messages. Paginate the response — the client fetches in batches and updates last_seq after each batch. Show the most recent messages first and let the user scroll back for history.
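The paginated sync loop can be sketched as follows — last_seq is only advanced after a whole batch is handed off, so a dropped connection resumes without losing messages (fetch_page and deliver are hypothetical callbacks):

```python
# Reconnect sync sketch: pull missed inbox entries in pages, checkpointing
# last_seq after each confirmed batch.
def sync_missed(fetch_page, deliver, last_seq, page_size=100):
    """fetch_page(after_seq, limit) -> entries sorted by seq ascending."""
    while True:
        page = fetch_page(last_seq, page_size)
        if not page:
            return last_seq
        for entry in page:
            deliver(entry)               # e.g. hand off to the UI layer
        last_seq = page[-1]["seq"]       # checkpoint after the whole batch

inbox = [{"seq": s, "msg": f"m{s}"} for s in range(1, 8)]

def fetch_page(after_seq, limit):
    rows = [e for e in inbox if e["seq"] > after_seq]
    return rows[:limit]

got = []
final = sync_missed(fetch_page, got.append, last_seq=2, page_size=3)
assert final == 7 and [e["seq"] for e in got] == [3, 4, 5, 6, 7]
```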

For mobile users who may not reconnect via WebSocket, send a push notification (APNs for iOS, FCM for Android) as a fallback. The push notification wakes the app, which then establishes a WebSocket connection and pulls missed messages. Push notifications contain only a "you have new messages" signal — not the message content — to avoid storing plaintext in Apple/Google infrastructure (especially important for E2EE).

End-to-End Encryption

End-to-end encryption (E2EE) means only the sender and recipient can read messages — the server handles only ciphertext. The gold standard implementation is the Signal Protocol, used by Signal, WhatsApp, and others.

The core concept is the Double Ratchet Algorithm: a combination of a Diffie-Hellman ratchet (for forward secrecy — compromise of a key doesn’t expose past messages) and a symmetric-key ratchet (each message uses a fresh encryption key derived from the previous one). Key components:

  • X3DH (Extended Triple Diffie-Hellman): Initial key exchange. Each user publishes a set of public keys to the server (identity key, signed prekey, one-time prekeys). Sender uses these to derive a shared secret without both parties being online simultaneously.
  • Message keys: Each message is encrypted with a unique ephemeral key derived from the ratchet state. Deleting the message key after decryption provides forward secrecy.
  • Server role: Stores and delivers ciphertext and public keys. Never has access to plaintext or private keys.
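The symmetric-key half of the ratchet can be illustrated with an HMAC-based KDF chain. This is a deliberate simplification, not the real Signal KDF (which also mixes in Diffie-Hellman ratchet output), but it shows the core property — each step yields a fresh message key plus the next chain key, and the chain only moves forward:

```python
import hashlib
import hmac

# Simplified symmetric-key ratchet: distinct HMAC labels derive the
# message key and the next chain key from the current chain key.
def ratchet_step(chain_key: bytes):
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return message_key, next_chain

chain = b"\x00" * 32                 # placeholder for the X3DH shared secret
k1, chain = ratchet_step(chain)
k2, chain = ratchet_step(chain)
# Each message gets a unique key; deleting k1 after decrypting message 1
# means a later compromise of the chain cannot recover it (forward secrecy).
assert k1 != k2 and len(k1) == 32
```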

For group E2EE, the sender distributes the group message encrypted individually for each member (using their public keys), or uses a shared group key distributed via pairwise encrypted channels — the Sender Keys protocol used by Signal for groups.

Frequently Asked Questions

How does a chat system guarantee message ordering?

Messages within a conversation are ordered by a monotonically increasing sequence number assigned by the server. The server atomically increments a per-conversation counter (in Redis or the DB) and assigns the sequence number when accepting the message. Clients display messages sorted by sequence number, not by arrival time. This ensures a consistent ordering even if network reordering or retries occur. Cross-conversation ordering is not required — each conversation has its own independent sequence.

How are messages delivered to users connected to different servers?

Each server maintains WebSocket connections to its local users. When a message arrives for a user connected to a different server, the sending server must route to the correct server. Approach: store user_id → server_id mapping in Redis. The sending server publishes the message to a Redis pub/sub channel keyed by the recipient’s server_id. The receiving server has a subscriber for its own channel and pushes the message to the appropriate WebSocket connection.

How does the presence system work in a chat application?

Each connected client sends a heartbeat every 30 seconds. The server updates a Redis key (presence:{user_id}) with TTL of 60 seconds on each heartbeat. If the key expires (no heartbeat for 60+ seconds), the user is considered offline. To display presence to others: on connection, notify contacts via pub/sub. On disconnection or TTL expiry, trigger an offline event. For scale, batch presence updates — don’t fan out to all contacts in real time, but fetch on demand when a chat is opened.

What is the message inbox model and how does it save storage?

The inbox model separates message storage from message delivery. Messages are stored once in a messages table (message_id, conversation_id, sender_id, content). Each user has an inbox table of pointer rows (user_id, message_id, seq, read_at). For a group chat with 1,000 members, one message object is stored — not 1,000 copies. The fan-out writes only message_id pointers into each member’s inbox, which is much cheaper than duplicating full message content.

How does end-to-end encryption work in the Signal Protocol?

The Signal Protocol uses the Double Ratchet Algorithm. Each conversation has an initial shared secret established via X3DH (Extended Triple Diffie-Hellman) key exchange. The double ratchet combines a Diffie-Hellman ratchet (generates new key pairs on each message exchange) with a KDF chain ratchet (derives message keys). Each message is encrypted with a unique ephemeral key, so compromising one message key does not reveal past or future messages (forward secrecy and break-in recovery). The server stores only encrypted ciphertext.
