What Is a Real-Time Chat System?
A real-time chat system enables persistent messaging between users with delivery guarantees and online presence. Examples: WhatsApp (2B users), Slack (20M DAU), Discord. Core challenges: message ordering and deduplication, online presence at scale, end-to-end encryption, and efficient message storage for billions of conversations.
System Requirements
Functional
- One-on-one and group messages (up to 256 members)
- Message delivery receipts: sent, delivered, read
- Online/offline presence indicators
- Message history: searchable, persistent
- Media: images, videos, voice messages
Non-Functional
- 2B users, 100B messages/day
- Message delivery latency: <500ms
- Message ordering guaranteed within a conversation
- 7-year message retention
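The non-functional numbers translate into rough throughput and storage targets. The sketch below is plain back-of-envelope arithmetic. The 100-byte average message size and the replication factor of 3 are illustrative assumptions, not figures from the requirements, and media is excluded because it lives in S3.

```python
# Back-of-envelope sizing from the stated requirements.
# AVG_MESSAGE_BYTES and REPLICATION_FACTOR are illustrative assumptions.
SECONDS_PER_DAY = 86_400
MESSAGES_PER_DAY = 100e9        # requirement: 100B messages/day
RETENTION_YEARS = 7             # requirement: 7-year retention
AVG_MESSAGE_BYTES = 100         # assumption: text body plus metadata
REPLICATION_FACTOR = 3          # assumption: replicas per message in Cassandra

avg_messages_per_sec = MESSAGES_PER_DAY / SECONDS_PER_DAY
bytes_per_day = MESSAGES_PER_DAY * AVG_MESSAGE_BYTES
raw_retained_bytes = bytes_per_day * 365 * RETENTION_YEARS
replicated_bytes = raw_retained_bytes * REPLICATION_FACTOR

print(f"average messages/sec: {avg_messages_per_sec:,.0f}")
print(f"new data per day:     {bytes_per_day / 1e12:.1f} TB")
print(f"7-year raw storage:   {raw_retained_bytes / 1e15:.2f} PB")
print(f"with replication:     {replicated_bytes / 1e15:.2f} PB")
```

The average works out to about 1.16 million messages per second, and under the 100-byte assumption seven years of text messages come to roughly 25.5 PB before replication. Peak traffic will be some multiple of the average.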
Connection Architecture
Client ──WebSocket──► Gateway Server (connection layer)
│
Message Service ──► Kafka ──► Delivery Workers
│
Cassandra (message store)
│
Presence Service ──► Redis (online status)
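Each client keeps a persistent WebSocket connection to one gateway server. Gateways hold those connections but keep no routing state of their own: a Redis mapping (user_id → gateway_server_id) records which gateway holds each user's connection, so the service delivering a message looks up the recipient's gateway there and forwards the message to it. Sizing for all 2B users connected at once, at 50K connections per gateway server, gives 2B / 50K = 40,000 gateway servers; sizing for the roughly 500M concurrently online users assumed in the presence section gives about 10,000.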
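The routing lookup can be modeled without Redis. In this minimal sketch, assuming SETEX/GET-style semantics (a value plus an expiry that each heartbeat resets), an in-memory dict stands in for the Redis mapping. ConnectionRegistry, route, and gateways_needed are hypothetical names, and the 300-second TTL is an assumption.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass
class ConnectionRegistry:
    """Hypothetical in-memory stand-in for the Redis map user_id -> gateway_server_id."""
    ttl_seconds: float = 300.0
    clock: Callable[[], float] = time.monotonic
    # user_id -> (gateway_id, expires_at)
    _entries: Dict[str, Tuple[str, float]] = field(default_factory=dict)

    def register(self, user_id: str, gateway_id: str) -> None:
        # Called on connect and on every heartbeat; like SETEX, it resets the TTL.
        self._entries[user_id] = (gateway_id, self.clock() + self.ttl_seconds)

    def lookup(self, user_id: str) -> Optional[str]:
        # Like GET on an expired key: a stale entry reads as "not connected".
        entry = self._entries.get(user_id)
        if entry is None or entry[1] <= self.clock():
            self._entries.pop(user_id, None)
            return None
        return entry[0]


def route(registry: ConnectionRegistry, recipient_id: str) -> str:
    """Return the gateway to forward to, or 'push' if the recipient is offline."""
    gateway = registry.lookup(recipient_id)
    return gateway if gateway is not None else "push"


def gateways_needed(connections: int, per_gateway: int = 50_000) -> int:
    # Ceiling division: a partially filled gateway still counts as a server.
    return -(-connections // per_gateway)
```

Returning "push" for a missing entry mirrors the offline path: the recipient gets a push notification instead of a WebSocket delivery.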
Message Flow
Sender sends message
→ Gateway accepts the message, assigns a time-ordered message_id (UUIDv1 or Snowflake), and timestamps it
→ Writes to Kafka (topic: messages, partitioned by conversation_id)
→ Returns acknowledgment to sender (message sent)
→ Delivery worker reads from Kafka
→ Writes message to Cassandra (durable storage)
→ Looks up recipient's gateway server in Redis
→ Pushes message to recipient's WebSocket
→ Recipient's client sends a delivery receipt, then a read receipt once the message is viewed
→ Both receipts are propagated back to the sender
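Receipts move a message through sent, delivered, and read, and a receipt can arrive late or be retried, so a status update should never move backward. A minimal sketch, assuming statuses are stored as ordered integers. ReceiptStatus, apply_receipt, and group_status are hypothetical names, and the group rule (a stage is shown only once every member has reached it) is an assumption rather than something the requirements specify.

```python
from enum import IntEnum
from typing import Dict


class ReceiptStatus(IntEnum):
    # Ordered so that a later stage compares greater than an earlier one.
    SENT = 1       # server acknowledged after the Kafka write
    DELIVERED = 2  # recipient's client confirmed receipt
    READ = 3       # recipient viewed the message


def apply_receipt(current: ReceiptStatus, incoming: ReceiptStatus) -> ReceiptStatus:
    """Advance a message's status; a late or retried receipt never moves it backward."""
    return max(current, incoming)


def group_status(per_member: Dict[str, ReceiptStatus]) -> ReceiptStatus:
    # Assumption: a group message shows a stage only once every member reached it.
    return min(per_member.values())
```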
Message Storage: Cassandra
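-- message_id is a timeuuid (UUIDv1); a Snowflake ID would be stored as bigint
CREATE TABLE messages (
  conversation_id uuid, message_id timeuuid, sender_id bigint,
  content text, type text, status text, created_at timestamp,
  PRIMARY KEY (conversation_id, message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);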
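Cassandra fits chat storage well: its wide-row model keeps all messages for a conversation in one partition, write throughput scales roughly linearly as nodes are added, and rows within a partition are stored sorted by the clustering key, so loading message history is a fast range read. Message IDs are time-ordered (UUIDv1, stored as Cassandra's timeuuid type, or a Snowflake ID), so sorting by message_id gives chronological order within a conversation.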
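For the Snowflake option, a generator packs a millisecond timestamp, a worker ID, and a per-millisecond sequence number into one 64-bit integer, so IDs from one worker sort in creation order. This sketch assumes the commonly used 41/10/12-bit split and an arbitrary custom epoch of 2020-01-01 UTC; SnowflakeGenerator and timestamp_ms are hypothetical names, not a specific library's API.

```python
import threading
import time
from typing import Callable

# Assumed layout: 41-bit ms timestamp | 10-bit worker id | 12-bit per-ms sequence.
EPOCH_MS = 1_577_836_800_000  # assumption: custom epoch 2020-01-01T00:00:00Z
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


def _now_ms() -> int:
    return time.time_ns() // 1_000_000


class SnowflakeGenerator:
    """Hypothetical per-server ID generator; IDs increase strictly on one instance."""

    def __init__(self, worker_id: int, clock: Callable[[], int] = _now_ms):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id must fit in WORKER_BITS bits")
        self.worker_id = worker_id
        self.clock = clock
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            # If the wall clock steps backward, keep using the last timestamp.
            now = max(self.clock(), self.last_ms)
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # All 4096 sequence values used this millisecond: wait for the next.
                    while now <= self.last_ms:
                        now = self.clock()
            else:
                self.sequence = 0
            self.last_ms = now
            return (
                ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self.sequence
            )


def timestamp_ms(snowflake_id: int) -> int:
    """Recover the Unix-millisecond creation time embedded in an ID."""
    return (snowflake_id >> (WORKER_BITS + SEQUENCE_BITS)) + EPOCH_MS
```

IDs from different workers interleave by timestamp, so their relative order is only as accurate as clock synchronization between servers.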
Message Ordering and Deduplication
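Within a conversation, ordering comes from partitioning the Kafka topic by conversation_id: all of a conversation's messages land in one partition, and each partition is consumed by exactly one delivery worker in the consumer group, so those messages are processed in order. Kafka delivers at least once, so a worker can see the same message twice, for example after it crashes before committing its offset. Each message carries a client-generated UUID, and the worker deduplicates on it with a Cassandra lightweight transaction (INSERT ... IF NOT EXISTS), which writes the row only if no row with the same primary key exists. Cassandra has no unique constraints beyond the primary key, so the client UUID must be part of that key (or be checked in a separate table keyed by it), and each lightweight transaction runs a Paxos round, making it noticeably slower than a plain write.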
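The deduplication step can be sketched with an in-memory table whose key includes the client-generated UUID, standing in for a Cassandra INSERT ... IF NOT EXISTS. This is a minimal sketch, assuming the key (conversation_id, client_msg_id); Message, MessageStore, and consume are hypothetical names, and insert_if_not_exists only imitates the boolean [applied] result Cassandra returns for a lightweight transaction. It is not a driver call.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Message:
    conversation_id: str
    client_msg_id: str  # client-generated UUID, identical on every redelivery
    body: str


class MessageStore:
    """Stand-in for a table keyed by (conversation_id, client_msg_id)."""

    def __init__(self) -> None:
        self.rows: Dict[Tuple[str, str], Message] = {}

    def insert_if_not_exists(self, msg: Message) -> bool:
        # Mimics the [applied] flag: True only if the row did not exist yet.
        key = (msg.conversation_id, msg.client_msg_id)
        if key in self.rows:
            return False
        self.rows[key] = msg
        return True


def consume(partition: List[Message], store: MessageStore) -> List[Message]:
    """Process one Kafka partition in offset order; return messages to push.

    A redelivered duplicate fails the conditional insert and is not pushed again.
    """
    to_push = []
    for msg in partition:
        if store.insert_if_not_exists(msg):
            to_push.append(msg)
    return to_push
```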
Presence Service
Online status is stored in Redis with SETEX presence:{user_id} 30 "online". Clients send a heartbeat every 15 seconds to renew the TTL; if the key expires, the user is treated as offline. When a client opens a conversation, it requests presence for all members. With 500M users online, each sending a heartbeat every 15 seconds, Redis absorbs 500M / 15 ≈ 33M writes/second, so presence keys are sharded across Redis nodes by user_id.
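The heartbeat-and-TTL pattern can be modeled with one expiry timestamp per user. In this minimal sketch, assuming SETEX semantics (each heartbeat overwrites the value and resets the TTL), a dict stands in for Redis and the clock is injectable for testing; PresenceStore and heartbeat_writes_per_sec are hypothetical names.

```python
import time
from typing import Callable, Dict, Iterable

PRESENCE_TTL = 30        # seconds, as in SETEX presence:{user_id} 30 "online"
HEARTBEAT_INTERVAL = 15  # seconds between client heartbeats


class PresenceStore:
    """Hypothetical in-memory model of Redis keys with a TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self.clock = clock
        self.expires_at: Dict[str, float] = {}

    def heartbeat(self, user_id: str) -> None:
        # Equivalent of SETEX: set the value and reset the TTL.
        self.expires_at[user_id] = self.clock() + PRESENCE_TTL

    def is_online(self, user_id: str) -> bool:
        deadline = self.expires_at.get(user_id)
        return deadline is not None and self.clock() < deadline

    def presence_for(self, members: Iterable[str]) -> Dict[str, bool]:
        # Batch lookup issued when a client opens a conversation.
        return {m: self.is_online(m) for m in members}


def heartbeat_writes_per_sec(online_users: int, interval_s: int = HEARTBEAT_INTERVAL) -> float:
    return online_users / interval_s
```

Setting the TTL to twice the heartbeat interval means a single delayed heartbeat does not flip a user offline.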
Offline Message Delivery
If the recipient is offline, the message is already durable in Cassandra (the delivery worker writes it before looking up the recipient), so the worker sends a push notification through APNs/FCM instead. When the recipient comes back online, their client reconnects over WebSocket and fetches unread messages from Cassandra, using last_seen_message_id as a cursor.
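The cursor-based catch-up can be sketched as paging forward from last_seen_message_id until a page comes back empty. This sketch assumes integer, time-ordered message IDs and models one conversation's Cassandra partition as a sorted list; ConversationPartition and sync_missed are hypothetical names.

```python
import bisect
from typing import List, Tuple


class ConversationPartition:
    """Hypothetical in-memory model of one partition: rows sorted by message_id."""

    def __init__(self) -> None:
        self.ids: List[int] = []
        self.bodies: List[str] = []

    def append(self, message_id: int, body: str) -> None:
        i = bisect.bisect_left(self.ids, message_id)
        self.ids.insert(i, message_id)
        self.bodies.insert(i, body)

    def after(self, cursor: int, limit: int = 50) -> List[Tuple[int, str]]:
        # Range read on message_id > cursor, returned oldest first.
        start = bisect.bisect_right(self.ids, cursor)
        end = start + limit
        return list(zip(self.ids[start:end], self.bodies[start:end]))


def sync_missed(
    partition: ConversationPartition, last_seen_message_id: int, page_size: int = 50
) -> Tuple[List[Tuple[int, str]], int]:
    """Fetch every message after the cursor, page by page; return them and the new cursor."""
    cursor = last_seen_message_id
    missed: List[Tuple[int, str]] = []
    while True:
        page = partition.after(cursor, page_size)
        if not page:
            return missed, cursor
        missed.extend(page)
        cursor = page[-1][0]  # becomes the client's new last_seen_message_id
```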
Group Messaging
A group message must be delivered to N members. Options:
- Fan-out on write: create N delivery tasks in Kafka. Simple, but expensive for large groups.
- Fan-out on read: store one message, each member’s client fetches on connect. Cheaper writes, but requires per-member read cursors.
WhatsApp uses fan-out on write for groups of up to 256 members (bounded fan-out). Slack uses fan-out on read for large channels, since a channel with 10K members would need 10K delivery tasks per message.
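A hybrid of the two strategies can be sketched with a size threshold. This is a minimal sketch, assuming the 256-member group limit from the requirements as the switch point; Channel, post, and fetch_unread are hypothetical names, and a list index serves as each member's read cursor.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

FANOUT_ON_WRITE_LIMIT = 256  # assumption: groups up to this size get per-member tasks


@dataclass
class Channel:
    members: List[str]
    log: List[str] = field(default_factory=list)               # the single stored copy
    read_cursor: Dict[str, int] = field(default_factory=dict)  # member -> messages read


def post(channel: Channel, sender: str, body: str) -> List[Tuple[str, str]]:
    """Store one copy; return per-recipient delivery tasks only for small groups."""
    channel.log.append(body)
    if len(channel.members) <= FANOUT_ON_WRITE_LIMIT:
        # Fan-out on write: one delivery task per recipient.
        return [(member, body) for member in channel.members if member != sender]
    # Fan-out on read: no tasks; members pull when they connect.
    return []


def fetch_unread(channel: Channel, member: str) -> List[str]:
    """Fan-out on read: return messages past this member's cursor and advance it."""
    start = channel.read_cursor.get(member, 0)
    unread = channel.log[start:]
    channel.read_cursor[member] = len(channel.log)
    return unread
```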
Media Storage
Images/videos uploaded to S3. Message stores only the S3 URL. On delivery: CDN serves media directly to recipients. Deduplication: hash media content (SHA-256); if already stored, reuse the URL (saves storage for viral memes forwarded millions of times).
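Content-addressed storage makes the SHA-256 deduplication automatic: the hash of the bytes becomes the object key, so a second upload of the same file resolves to the existing URL. A minimal sketch, assuming a dict in place of S3 and a placeholder CDN base URL (https://cdn.example.com/media/); MediaStore is a hypothetical name.

```python
import hashlib
from typing import Dict


class MediaStore:
    """Hypothetical content-addressed media store; a dict stands in for S3."""

    def __init__(self, cdn_base: str = "https://cdn.example.com/media/"):
        self.cdn_base = cdn_base
        self.objects: Dict[str, bytes] = {}
        self.uploads_skipped = 0

    def put(self, data: bytes) -> str:
        # The SHA-256 of the bytes is the object key, so identical files share one key.
        key = hashlib.sha256(data).hexdigest()
        if key in self.objects:
            self.uploads_skipped += 1
        else:
            self.objects[key] = data  # a real system would upload to S3 here
        return self.cdn_base + key    # the message stores only this URL
```

This only catches byte-identical files. If media is end-to-end encrypted with a fresh key per upload, the server sees different ciphertexts for the same file and cannot deduplicate them by hash.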
Interview Tips
- WebSocket + gateway servers for persistent connections is the foundation.
- Cassandra (conversation_id, message_id) is the canonical chat storage schema.
- Presence via Redis SETEX with heartbeat renewal is the standard pattern.
- Distinguish group fan-out on write (small groups) vs read (large channels).
Frequently Asked Questions
How do you route a message to the correct gateway server holding the recipient's WebSocket connection?
With thousands of gateway servers each holding different client connections, the message router needs to know which server the recipient is connected to. Solution: keep a routing table in Redis. When a client connects, its gateway runs SETEX conn:{user_id} 300 "gateway-server-42", and the TTL is refreshed on each heartbeat. When a message arrives for user B, the message service runs GET conn:B and gets "gateway-server-42". It then makes an internal gRPC call to gateway-server-42, DeliverMessage(user_id=B, message=…), and gateway-server-42 looks up B's WebSocket connection in its local connection map and sends the message. If the GET returns nil (B is offline), the message is stored in Cassandra and a push notification is sent through APNs/FCM instead. All routing state lives in Redis rather than in individual gateway servers, so adding gateway servers only adds connection capacity; existing connections do not need to be re-routed.
How does Cassandra's data model support efficient message history retrieval for chat?
Chat history has two access patterns: (1) "load the 50 most recent messages for conversation X", which happens every time a conversation is opened, and (2) "load the next 50 older messages" as the user scrolls back. The data model is PRIMARY KEY (conversation_id, message_id) WITH CLUSTERING ORDER BY (message_id DESC), which stores all messages for a conversation in a single partition, sorted newest first. Query (1) is SELECT * FROM messages WHERE conversation_id = X LIMIT 50: a single-partition read of the first 50 rows, with no scatter-gather across nodes. Query (2) is SELECT * FROM messages WHERE conversation_id = X AND message_id < last_cursor LIMIT 50, which gives cursor-based pagination. Both are efficient because rows within a partition are stored sorted by the clustering key, so each query reads a contiguous slice. The message_id must be time-ordered (Snowflake ID or UUIDv1) so that descending message_id order is reverse chronological order. Partitions can grow large (a busy group chat can accumulate millions of messages). Cassandra tolerates wide partitions, but for extremely active conversations consider time-bucketing with a composite partition key, PRIMARY KEY ((conversation_id, year_month), message_id), so that each partition holds one month of messages.
How does message ordering work across devices in a distributed chat system?
Message ordering has two parts: sender-to-server ordering (messages from one sender arrive in the order they were sent) and conversation ordering (all participants see messages in the same order). Sender-to-server: each message carries a client_sequence_number incremented per conversation per device. The server detects gaps: if sequence 5 arrives before 4, it holds 5 until 4 arrives or a timeout expires. TCP already delivers bytes in order within one connection, so this guards against reordering across connections, for example when a client reconnects (possibly to a different gateway) and retries unacknowledged messages. Server-to-database: partitioning Kafka by conversation_id means a single consumer processes all messages for a conversation, preserving their order. Message IDs use Snowflake (timestamp + server_id + sequence), which increase monotonically on each generating server, so insertion order matches chronological order in Cassandra. IDs generated on different servers are ordered only as well as their clocks agree, so for a strict per-conversation order the ID can be assigned by the partition's single consumer. Global ordering: all clients read messages from Cassandra sorted by message_id, and because every message in a conversation passes through one Kafka partition, there is a total order. The "last seen message_id" cursor lets each client fetch exactly the messages it missed, in order, with no duplicates or gaps.
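The sender-to-server gap handling in the message-ordering answer (hold sequence 5 until 4 arrives, or until a timeout) can be sketched as a per-sender reorder buffer. This is a minimal sketch, assuming sequence numbers start at 1 and that the caller decides when a timeout has fired; SequenceReorderBuffer and its methods are hypothetical names.

```python
from typing import Dict, List


class SequenceReorderBuffer:
    """Hypothetical per (device, conversation) buffer releasing messages in sequence order."""

    def __init__(self, first_expected: int = 1):
        self.next_expected = first_expected
        self.pending: Dict[int, str] = {}

    def _drain(self) -> List[str]:
        # Release the longest run of consecutive sequence numbers now available.
        released = []
        while self.next_expected in self.pending:
            released.append(self.pending.pop(self.next_expected))
            self.next_expected += 1
        return released

    def receive(self, seq: int, message: str) -> List[str]:
        if seq < self.next_expected or seq in self.pending:
            return []  # duplicate of a message already released or buffered
        self.pending[seq] = message
        return self._drain()

    def on_timeout(self) -> List[str]:
        # Give up on the gap: jump to the lowest buffered sequence and release from there.
        if not self.pending:
            return []
        self.next_expected = min(self.pending)
        return self._drain()
```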