Requirements
- One-on-one and group messaging (up to 500 members)
- Message delivery with read receipts (sent, delivered, read)
- Message history persistence and search
- Online/offline presence indicators
- 500M users, 100M DAU, 10B messages/day
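These figures translate into a sustained write rate that drives the rest of the design. The sketch below is a quick back-of-envelope calculation. The average message size is a hypothetical assumption that is not stated in the requirements, so the storage line only illustrates the method.

```python
SECONDS_PER_DAY = 86_400

DAU = 100_000_000
MESSAGES_PER_DAY = 10_000_000_000
ASSUMED_AVG_MESSAGE_BYTES = 100  # hypothetical: text payload plus metadata, before replication

avg_messages_per_second = MESSAGES_PER_DAY / SECONDS_PER_DAY
messages_per_dau = MESSAGES_PER_DAY // DAU
raw_bytes_per_day = MESSAGES_PER_DAY * ASSUMED_AVG_MESSAGE_BYTES

print(f"average write rate: {avg_messages_per_second:,.0f} messages/s")
print(f"messages per daily active user: {messages_per_dau}")
print(f"raw message storage: {raw_bytes_per_day / 10**12:.1f} TB/day (assumed size)")
```

Peak traffic is usually a multiple of the average. WebSocket, Kafka, and Cassandra capacity should therefore be sized against the expected peak, not the roughly 116K messages/s mean.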
Architecture
Client → WebSocket Server (stateful, per-connection)
→ Chat API (REST for history, group management)
→ Message Store (Cassandra — write-heavy, time-series)
→ Presence Service (Redis — online/offline status)
→ Push Notification Service (APNs/FCM — offline users)
→ Kafka (fan-out, offline delivery, analytics)
WebSocket Connection Management
Chat requires persistent bidirectional connections. WebSocket servers are stateful — each connection lives on a specific server. Challenge: if User A (on WS server 1) sends to User B (on WS server 2), server 1 must forward to server 2.
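Solution: Redis Pub/Sub fan-out. Each WS server subscribes to a Redis channel for every user connected to it. When a message arrives for User B, publish it to the channel user:{B}. Redis delivers the publish only to subscribers of that channel. That means the WS server holding B's connection receives it (or several servers, if B is connected from multiple devices), and that server writes it to B's socket.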
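import json
import redis
r = redis.Redis()
pubsub = r.pubsub()  # in redis-py, SUBSCRIBE is issued through a PubSub object
# On connection: record which WS server holds this user's socket, then listen on the user's channel
r.hset('user_server', user_id, server_id)
pubsub.subscribe(f'user:{user_id}')  # this server now receives messages addressed to user_id
# On message send from A to recipient B:
r.publish(f'user:{recipient_id}', json.dumps(message))  # reaches only servers subscribed to user:{recipient_id}
# The WS server holding B's connection reads it from pubsub.listen() and writes it to B's socket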
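The routing property that matters is that a publish reaches only the subscribers of its channel. The sketch below models this in memory so it runs without a Redis server. InMemoryBroker and WSServer are hypothetical stand-ins. A real server would also unsubscribe when a socket closes, and would write to an actual WebSocket rather than a list.

```python
from collections import defaultdict


class InMemoryBroker:
    """Stand-in for Redis Pub/Sub: a publish reaches only that channel's subscribers."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # channel -> list of delivery callbacks

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, payload):
        for callback in self.subscribers[channel]:
            callback(payload)
        return len(self.subscribers[channel])  # like Redis PUBLISH: number of receivers


class WSServer:
    def __init__(self, name, broker):
        self.name = name
        self.broker = broker
        self.sockets = {}  # user_id -> list standing in for that user's open WebSocket

    def connect(self, user_id):
        self.sockets[user_id] = []
        # Subscribe to user:{id}; the default arg binds this user_id into the callback
        self.broker.subscribe(f'user:{user_id}',
                              lambda msg, uid=user_id: self.sockets[uid].append(msg))

    def send(self, recipient_id, message):
        # The sender's server does not need to know where the recipient is connected
        return self.broker.publish(f'user:{recipient_id}', message)


broker = InMemoryBroker()
ws1, ws2 = WSServer('ws1', broker), WSServer('ws2', broker)
ws1.connect('A')                      # User A is on server 1
ws2.connect('B')                      # User B is on server 2
receivers = ws1.send('B', 'hello')    # routed to server 2 through channel user:B
```

A publish to a user with no subscribers returns 0. That is the signal that the user has no live connection anywhere and should be reached by push notification instead.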
Data Model
Chat(chat_id UUID, type ENUM(DIRECT,GROUP), name, created_at)
ChatMember(chat_id, user_id, role ENUM(ADMIN,MEMBER), joined_at, last_read_message_id)
Message(message_id UUID, chat_id UUID, sender_id UUID, content TEXT,
type ENUM(TEXT,IMAGE,FILE,SYSTEM), created_at, edited_at)
-- Cassandra schema:
-- Partition key: chat_id
-- Clustering key: created_at DESC, message_id DESC (message_id breaks ties between messages with the same timestamp)
-- Enables: fetch latest N messages for a chat, paginate backwards
MessageReceipt(message_id, user_id, status ENUM(DELIVERED,READ), updated_at)
Message Delivery Flow
- Client sends message via WebSocket to WS server
- WS server persists the message to Cassandra and acknowledges the sender (the "sent" receipt) once the write succeeds, so an acknowledged message is never lost
- WS server publishes to Kafka topic: messages
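- Fan-out service consumes from Kafka:
  - For each chat member except the sender: publish to Redis user:{member_id}
  - For offline members (no presence key): enqueue a push notification (APNs/FCM)
  - Update delivery receipts (DELIVERED is recorded when the recipient's device acknowledges the message)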
- Online recipients receive via WebSocket; offline via push
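The fan-out step can be sketched as a single routing function. This is a minimal sketch. fan_out and its injected callables (is_online, publish, enqueue_push) are hypothetical hooks that stand in for a presence lookup, Redis PUBLISH, and a push-notification queue. Injecting them keeps the routing logic testable without Redis or APNs/FCM.

```python
from typing import Callable, Dict, Iterable, List


def fan_out(message: dict,
            members: Iterable[str],
            is_online: Callable[[str], bool],
            publish: Callable[[str, dict], None],
            enqueue_push: Callable[[str, dict], None]) -> Dict[str, List[str]]:
    """Route one message to every chat member except the sender.

    Online members get a publish on their per-user channel (user:{id});
    offline members get a push-notification job instead.
    """
    routed: Dict[str, List[str]] = {'realtime': [], 'push': []}
    for member_id in members:
        if member_id == message['sender_id']:
            continue  # the sender already has the message
        if is_online(member_id):
            publish(f'user:{member_id}', message)
            routed['realtime'].append(member_id)
        else:
            enqueue_push(member_id, message)
            routed['push'].append(member_id)
    return routed


online = {'bob'}
published, pushed = [], []
msg = {'message_id': 'm1', 'sender_id': 'alice', 'content': 'hi'}
routed = fan_out(msg, ['alice', 'bob', 'carol'],
                 is_online=lambda u: u in online,
                 publish=lambda channel, m: published.append(channel),
                 enqueue_push=lambda u, m: pushed.append(u))
```

The online check can race with a disconnect. Because every message is also stored in Cassandra, a member who is misclassified as online still picks the message up from history on reconnect.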
Read Receipts
Track delivery and read status per message per user. A delivery receipt is recorded when the message reaches the recipient's device, either received over WebSocket or delivered by push. A read receipt is recorded when the user opens and views the message. The client sends an ACK to the server when the message is displayed, and the server updates MessageReceipt and ChatMember.last_read_message_id. For group chats, show a read count (N of M members have read), aggregated with SELECT COUNT(*) FROM MessageReceipt WHERE message_id = X AND status = 'READ'.
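In a group chat, the "N of M have read" count can also be derived from each member's last_read_message_id watermark instead of counting individual READ rows. This sketch depends on one assumption. Messages must carry a per-chat, monotonically increasing sequence number so that watermarks can be compared. That number is hypothetical here, because the data model uses UUIDs; a time-ordered ID would serve the same purpose. mark_read and read_count are illustrative helpers.

```python
from typing import Dict


def mark_read(last_read_seq: Dict[str, int], user_id: str, highest_seen_seq: int) -> None:
    """Apply a batched READ ack: keep the highest sequence number seen, never move backwards."""
    last_read_seq[user_id] = max(last_read_seq.get(user_id, 0), highest_seen_seq)


def read_count(last_read_seq: Dict[str, int], message_seq: int, sender_id: str) -> int:
    """Number of members (excluding the sender) whose read watermark covers message_seq."""
    return sum(1 for uid, seq in last_read_seq.items()
               if uid != sender_id and seq >= message_seq)


watermarks = {'alice': 0, 'bob': 0, 'carol': 0}
mark_read(watermarks, 'bob', 7)
mark_read(watermarks, 'bob', 5)    # a late, older ack must not move the watermark back
mark_read(watermarks, 'carol', 3)
```

Because one watermark covers every earlier message, a client can send a single READ ack for the newest message it displayed instead of one ack per message.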
Presence Service
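import json
import redis
r = redis.Redis()
# On connect: set online (TTL 30s) and announce it
r.setex(f'presence:{user_id}', 30, 'online')
r.publish('presence_events', json.dumps({'user_id': user_id, 'status': 'online'}))
# Heartbeat every 20s from client: reset the TTL (SETEX also recreates the key if it already expired)
r.setex(f'presence:{user_id}', 30, 'online')
# On clean disconnect: delete the key and announce offline
r.delete(f'presence:{user_id}')
r.publish('presence_events', json.dumps({'user_id': user_id, 'status': 'offline'}))
# On TTL expiry Redis publishes nothing to presence_events: enable keyspace notifications
# (notify-keyspace-events Ex) and have the presence service turn expired-key events into offline events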
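Presence changes are broadcast via Redis Pub/Sub. WS servers subscribe to presence_events and forward each event only to the connected clients who have that user as a contact. A single channel delivers every status change to every subscribing server, so at scale this should move to a dedicated presence service with sharded Redis.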
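The TTL rule is what turns an unclean disconnect into "offline" without any explicit event. The in-memory sketch below mimics the presence:{user_id} key and uses an injectable clock so the timing can be checked. PresenceTracker is a hypothetical class and is not part of any Redis client.

```python
import time
from typing import Callable, Dict


class PresenceTracker:
    """In-memory stand-in for presence:{user_id} keys with a TTL."""

    def __init__(self, ttl_seconds: float = 30.0, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.expires_at: Dict[str, float] = {}  # user_id -> deadline

    def heartbeat(self, user_id: str) -> None:
        # Same effect as SETEX presence:{user_id} <ttl> online (used on connect and on each heartbeat)
        self.expires_at[user_id] = self.clock() + self.ttl

    def disconnect(self, user_id: str) -> None:
        # Clean disconnect: same effect as DEL presence:{user_id}
        self.expires_at.pop(user_id, None)

    def is_online(self, user_id: str) -> bool:
        deadline = self.expires_at.get(user_id)
        return deadline is not None and self.clock() < deadline


now = [0.0]
tracker = PresenceTracker(ttl_seconds=30, clock=lambda: now[0])
tracker.heartbeat('u1')   # connect at t=0
now[0] = 20.0
tracker.heartbeat('u1')   # heartbeat at t=20 moves the deadline to t=50
```

With a 20s heartbeat and a 30s TTL, one delayed heartbeat does not flip the user offline. A dropped connection still shows as offline no more than 30s after the last heartbeat.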
Message Search
Cassandra is not suited for full-text search. For message search: dual-write to Elasticsearch (async via Kafka). Elasticsearch index: message_id, chat_id, sender_id, content (text), created_at. Search: GET /search?q=keyword&chat_id=X. Search is a secondary use case — latency of seconds is acceptable. Restrict search to chats the user is a member of (filter by chat_id).
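The membership restriction can be enforced in the search request itself. This sketch builds the request body with the standard Elasticsearch bool query, using a match clause on content and a terms filter on chat_id. build_search_query is a hypothetical helper. It assumes chat_id is mapped as a keyword field and that the caller has already loaded the user's chat IDs from ChatMember.

```python
from typing import List, Optional


def build_search_query(keyword: str, member_chat_ids: List[str],
                       chat_id: Optional[str] = None, size: int = 20) -> dict:
    """Build an Elasticsearch request body limited to chats the user belongs to."""
    if chat_id is not None:
        if chat_id not in member_chat_ids:
            # Never search a chat the user is not a member of
            raise PermissionError(f'user is not a member of chat {chat_id}')
        allowed = [chat_id]
    else:
        allowed = list(member_chat_ids)
    return {
        'size': size,
        'query': {
            'bool': {
                'must': [{'match': {'content': keyword}}],       # full-text, scored
                'filter': [{'terms': {'chat_id': allowed}}],     # exact, not scored
            }
        },
    }


body = build_search_query('dinner', ['c1', 'c2'])
```

The chat restriction goes in filter context, so it does not affect relevance scoring and Elasticsearch can cache it.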
Key Design Decisions
- WebSocket for real-time; fallback to long-polling for restricted networks
- Redis Pub/Sub for cross-server message routing — decouples WS servers
- Cassandra for message storage — high write throughput, time-series access pattern
- Kafka fan-out — decouples message receipt from delivery to multiple channels
- Heartbeat-based presence with TTL — handles disconnects without explicit logout
Frequently Asked Questions
How does a chat system deliver messages in real time using WebSockets?
WebSocket provides a persistent, full-duplex TCP connection between client and server. Unlike HTTP (request/response), WebSocket lets the server push messages to the client at any time. On connect, the client performs an HTTP Upgrade to WebSocket, and the connection stays open for the session lifetime. On message send, the client sends the message payload over the WebSocket connection. The server receives it, persists it to the message store, and routes it to the recipient. On receive, the WebSocket server that holds the recipient's connection writes the message to the socket, so the client receives it without polling. Scale challenge: WebSocket servers are stateful, because each connection is pinned to one server instance. To route messages across instances (User A on Server 1, User B on Server 2), use Redis Pub/Sub. Server 1 publishes the message, and Server 2 (subscribed to User B's channel) delivers it.
How do you store and retrieve chat messages efficiently?
Chat messages are write-heavy (billions per day) and read with a time-series access pattern: "give me the last 50 messages in this chat, then load more on scroll." Cassandra fits this well. The partition key is chat_id, so all messages for a chat are on the same node. The clustering key is created_at DESC, so recent messages come first. Read pattern: SELECT * FROM messages WHERE chat_id = X ORDER BY created_at DESC LIMIT 50. Pagination: keep a cursor = (last_created_at, last_message_id) and fetch rows where created_at < cursor_ts OR (created_at = cursor_ts AND message_id < cursor_mid). CQL has no OR, so in Cassandra this is written as the tuple comparison (created_at, message_id) < (cursor_ts, cursor_mid) on the clustering columns. Write throughput: Cassandra handles hundreds of thousands of writes/second with horizontal scaling. Retention: keep messages indefinitely for users who need history, or delete them after N months for GDPR compliance (soft delete: set status=DELETED, purge content). Media (images, files): store in S3 and keep only the S3 URL in the message.
How do read receipts work in a chat system at scale?
Read receipts track two states per message per recipient: delivered (the message reached the device) and read (the user opened the conversation). Delivered: when the WebSocket server pushes the message to the client, the client sends a DELIVERED ACK, and the server updates MessageReceipt(message_id, user_id, status=DELIVERED). Read: when the user opens the chat and the message is visible on screen, the client sends a READ ACK. This is batched as one READ event for the highest message_id seen. The server updates last_read_message_id in ChatMember, and all messages up to that ID are implicitly read. Scale concern: in a 500-person group chat, every message generates up to 499 delivery receipts, one per recipient. At 10B messages/day and an average chat size of 10 members, that is on the order of 100B receipt events/day. Several measures keep this manageable. The client sends one READ event per chat session open (not per message). Kafka buffers receipt events so the DB is updated in batches. last_read_message_id is cached in Redis for fast unread-count queries.
How does presence detection work in a chat system?
Presence shows whether a user is currently online. Implementation: on WebSocket connect, set the Redis key presence:{user_id} = online with TTL = 30s. The client sends a heartbeat every 20s that extends the TTL. On clean disconnect, delete the key. On unclean disconnect (network drop), the key expires naturally, so the user appears offline within 30s of disconnecting. At 100M connected users, one Redis key per user is about 100M * ~100 bytes = ~10GB, which is manageable with Redis Cluster.
Presence broadcasting: when a user's status changes (online → offline), broadcast the change to their contacts. Fan-out challenge: if a user has 5,000 contacts, one status change means 5,000 notifications. Optimization: lazy presence, which only sends online/offline updates to contacts who are currently viewing a chat with that user rather than to all contacts. Group presence: in large groups, show only an aggregated count ("5 of 10 members online").
How do you handle message delivery to offline users in a chat system?
Offline users cannot receive WebSocket messages, so delivery to them relies on three mechanisms. (1) Push notifications: when a message is sent and the recipient is offline (no active WebSocket connection, presence key expired), send a push notification via APNs (iOS) or FCM (Android) with the sender name and a message preview. Tapping it opens the app to that chat. (2) Message inbox: messages are persisted in Cassandra regardless of online status. When the user reconnects, the client fetches missed messages for each of its chats with SELECT * FROM messages WHERE chat_id = ? AND created_at > last_seen_at. These are issued as one query per chat, run in parallel, because Cassandra restricts combining ORDER BY with an IN on the partition key. (3) Unread count: maintain an unread count per (user, chat) in Redis with HINCRBY unread:{user_id} {chat_id} 1 on each new message, and reset it to 0 when the chat is opened. The app-icon badge is the sum of all unread counts. Offline delivery guarantee: messages are durably stored, so users never lose messages even if offline for weeks.
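The unread-count scheme for offline users is small enough to sketch directly. UnreadCounts is a hypothetical in-memory model of the unread:{user_id} hash, so it runs without Redis. The comments name the Redis command each method corresponds to.

```python
from collections import defaultdict
from typing import Dict


class UnreadCounts:
    """In-memory model of the unread:{user_id} Redis hash (field = chat_id, value = count).

    redis-py equivalents: r.hincrby(f'unread:{user_id}', chat_id, 1),
    r.hdel(f'unread:{user_id}', chat_id), and sum(int(v) for v in r.hvals(f'unread:{user_id}')).
    """

    def __init__(self):
        self._hashes: Dict[str, Dict[str, int]] = defaultdict(dict)  # user_id -> {chat_id: count}

    def on_new_message(self, recipient_id: str, chat_id: str) -> None:
        # HINCRBY unread:{recipient_id} {chat_id} 1
        counts = self._hashes[recipient_id]
        counts[chat_id] = counts.get(chat_id, 0) + 1

    def on_chat_open(self, user_id: str, chat_id: str) -> None:
        # Reset to 0; deleting the field (HDEL) has the same effect and keeps the hash small
        self._hashes[user_id].pop(chat_id, None)

    def badge(self, user_id: str) -> int:
        # App-icon badge: sum of all per-chat unread counts (missing user -> 0)
        return sum(self._hashes.get(user_id, {}).values())


counts = UnreadCounts()
counts.on_new_message('u1', 'c1')
counts.on_new_message('u1', 'c1')
counts.on_new_message('u1', 'c2')
```

Keeping one hash per user means the badge needs a single hash read (HVALS), rather than a lookup for each of the user's chats.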