Designing a chat application like WhatsApp or Facebook Messenger is a classic system design question that tests real-time communication, message delivery guarantees, presence tracking, and encryption. WhatsApp handles over 100 billion messages per day across 2 billion users. This guide covers the architecture for a production-scale chat system — from message send to message delivery — with the depth expected at senior engineering interviews.
High-Level Architecture
Core services: (1) Connection Service: maintains persistent WebSocket connections with all online users. When a user opens the app, the client establishes a WebSocket connection to the nearest connection server. (2) Message Service: handles message creation, validation, storage, and routing. (3) Presence Service: tracks which users are online and their last seen time. (4) Group Service: manages group membership, group messages, and fanout. (5) Notification Service: sends push notifications to offline users. (6) Media Service: handles upload, storage, and delivery of images, videos, and documents. Message flow for 1-to-1 chat: User A sends a message -> over A's WebSocket to A's connection server -> the message service validates and stores it -> checks whether User B is online -> if online, routes it to B's connection server -> delivers it over B's WebSocket -> B's client sends a delivery acknowledgment. If B is offline, the message is stored for later delivery and a push notification is sent.
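The 1-to-1 flow can be sketched as one in-memory object standing in for the message, connection, and notification services. This is a minimal sketch under stated assumptions: `ChatBackend`, `Message`, and their fields are hypothetical names, the database is a Python list, the ID generator is a plain counter, and "delivery over WebSocket" is modeled as appending to the recipient's inbox. Delivery ACKs and retries are left out here.

```python
import itertools
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Message:
    message_id: int
    sender_id: str
    recipient_id: str
    content: str


class ChatBackend:
    """Hypothetical in-memory stand-in for the message, connection, and notification services."""

    def __init__(self):
        self.store = []                    # durable message storage (a list instead of a database)
        self.online = {}                   # user_id -> connection server holding their WebSocket
        self.inboxes = defaultdict(list)   # messages "pushed over WebSocket", per recipient
        self.pending = defaultdict(deque)  # per-user queue of messages awaiting delivery
        self.push_log = []                 # recipients who would get an APNs/FCM notification
        self._ids = itertools.count(1)     # placeholder for a real time-sorted ID generator

    def connect(self, user_id, server_id):
        """User opens the app: register the connection, then drain queued messages in order."""
        self.online[user_id] = server_id
        queue = self.pending[user_id]
        while queue:
            self.inboxes[user_id].append(queue.popleft())

    def disconnect(self, user_id):
        self.online.pop(user_id, None)

    def send(self, sender_id, recipient_id, content):
        if not content:
            raise ValueError("empty message")          # validation step
        msg = Message(next(self._ids), sender_id, recipient_id, content)
        self.store.append(msg)                         # store before routing
        if recipient_id in self.online:
            # Route to the recipient's connection server, which writes to their WebSocket.
            self.inboxes[recipient_id].append(msg)
        else:
            self.pending[recipient_id].append(msg)     # keep for later delivery
            self.push_log.append(recipient_id)         # and send a push notification
        return msg.message_id


backend = ChatBackend()
backend.connect("alice", "conn-1")
backend.send("alice", "bob", "hi")            # bob is offline: queued and push-notified
backend.connect("bob", "conn-7")              # bob opens the app and receives "hi"
backend.send("alice", "bob", "still there?")  # bob is online: delivered directly
print([m.content for m in backend.inboxes["bob"]])  # ['hi', 'still there?']
```

Storing the message before the online check is the ordering choice that matters: if B is offline or routing fails, the message already exists durably and can be delivered from the pending queue when B reconnects.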
WebSocket Connection Management
WebSocket provides full-duplex communication between client and server. Each user maintains one persistent WebSocket connection. With 500 million concurrent users, the connection layer must handle 500M persistent connections. Each connection server handles 50,000-500,000 concurrent WebSocket connections, depending on memory (each connection uses roughly 10-50 KB). At 500M connections, that means approximately 1,000-10,000 connection servers (500M / 500,000 = 1,000; 500M / 50,000 = 10,000). Connection routing: when User A sends a message to User B, the message service needs to know which connection server holds B's WebSocket. Use a distributed mapping: a Redis hash map of user_id -> connection_server_id. When a user connects, register the mapping; on disconnect, remove it. Connection heartbeats: clients send a ping every 30 seconds. If no ping arrives for 60 seconds, the connection is considered dead and cleaned up. Mobile optimization: on mobile, the OS may kill background WebSocket connections to save battery, so use push notifications (APNs/FCM) as a fallback for offline delivery. When the user opens the app, re-establish the WebSocket and pull any pending messages.
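The capacity arithmetic and the user-to-server mapping with heartbeat expiry can be sketched as follows. This is a minimal sketch, assuming a plain dict in place of the Redis hash and explicit `now` arguments (in seconds) instead of a real clock so the behavior is deterministic; `servers_needed` and `ConnectionRegistry` are hypothetical names.

```python
import math


def servers_needed(concurrent_connections, connections_per_server):
    """Back-of-the-envelope connection-server count, rounded up."""
    return math.ceil(concurrent_connections / connections_per_server)


class ConnectionRegistry:
    """Hypothetical in-memory stand-in for the Redis map user_id -> connection_server_id."""

    def __init__(self, timeout_s=60):
        self.timeout_s = timeout_s
        self.server_of = {}   # user_id -> connection_server_id
        self.last_ping = {}   # user_id -> time (seconds) of the last heartbeat

    def register(self, user_id, server_id, now):
        """Called when a WebSocket is established."""
        self.server_of[user_id] = server_id
        self.last_ping[user_id] = now

    def ping(self, user_id, now):
        """Client heartbeat, expected every 30 seconds."""
        if user_id in self.server_of:
            self.last_ping[user_id] = now

    def unregister(self, user_id):
        """Called on a clean disconnect, or by sweep() for dead connections."""
        self.server_of.pop(user_id, None)
        self.last_ping.pop(user_id, None)

    def sweep(self, now):
        """Remove connections with no heartbeat for longer than the timeout."""
        dead = [u for u, t in self.last_ping.items() if now - t > self.timeout_s]
        for user_id in dead:
            self.unregister(user_id)
        return dead

    def lookup(self, user_id):
        """Which connection server holds this user's WebSocket (None if offline)."""
        return self.server_of.get(user_id)


print(servers_needed(500_000_000, 500_000), servers_needed(500_000_000, 50_000))  # 1000 10000
```

With Redis, `register` and `unregister` would become `HSET` and `HDEL` on the hash described above. The sweep would normally run on each connection server, since that server is the one that notices a silent socket.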
Message Storage and Delivery Guarantees
Message storage: each message is stored in a database with message_id (a Snowflake-style, time-sorted ID), conversation_id, sender_id, content (encrypted), created_at, and delivery_status (sent, delivered, read). Database choice: Cassandra or HBase, for high write throughput with time-series access patterns. Partition key: conversation_id. Clustering key: message_id (time-sorted). This makes it efficient to retrieve a conversation's messages in chronological order. Delivery guarantee: at-least-once. The server sends the message to the recipient and waits for a delivery ACK; if no ACK arrives within 5 seconds, it retries. The client deduplicates by message_id: if it receives the same message_id twice, it ignores the duplicate but still acknowledges it, so the server stops retrying. Message statuses: sent (the server received the message from the sender; single checkmark), delivered (the recipient's device received it; double checkmark), and read (the recipient opened the conversation; blue checkmarks). Each status change is a separate small message: the recipient's client sends it to the server, which relays it to the sender. Offline message delivery: messages for offline users are stored in a per-user pending messages queue. When the user comes online, the connection server pulls the pending messages and delivers them in order.
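At-least-once delivery with client-side deduplication can be sketched with a scripted network. This is a minimal sketch under stated assumptions: `Recipient`, `deliver_at_least_once`, `advance_status`, and `STATUS_ORDER` are hypothetical names, the `outcomes` list replaces a real network and real 5-second timers, and the rule that a late "delivered" update must not overwrite "read" is an assumption about how out-of-order status messages should be handled.

```python
STATUS_ORDER = {"sent": 0, "delivered": 1, "read": 2}


class Recipient:
    """Client side: keeps the first copy of each message_id and ACKs every copy."""

    def __init__(self):
        self.seen = set()
        self.messages = []

    def receive(self, message_id, content):
        is_new = message_id not in self.seen
        if is_new:
            self.seen.add(message_id)
            self.messages.append(content)
        # The ACK is sent whether or not this copy is a duplicate; otherwise a lost
        # ACK would make the server retry forever.
        return is_new


def deliver_at_least_once(recipient, message_id, content, outcomes):
    """Server side: resend until an ACK comes back.

    `outcomes` is a hypothetical script of what the network does on each attempt:
    "drop_msg" (message lost), "drop_ack" (message arrives, ACK lost), or "ok".
    Each failed attempt stands in for one 5-second ACK timeout.
    """
    for attempt, outcome in enumerate(outcomes, start=1):
        if outcome == "drop_msg":
            continue                              # timeout fires, retry
        recipient.receive(message_id, content)    # message reached the device
        if outcome == "ok":
            return attempt                        # ACK reached the server
    raise TimeoutError("no ACK: keep the message in the pending queue")


def advance_status(current, update):
    """Status only moves forward (sent -> delivered -> read), even if updates arrive out of order."""
    return update if STATUS_ORDER[update] > STATUS_ORDER[current] else current


bob = Recipient()
print(deliver_at_least_once(bob, 42, "hi", ["drop_msg", "drop_ack", "ok"]))  # 3
print(bob.messages)                                                        # ['hi']
print(advance_status("read", "delivered"))                                 # read
```

The second and third attempts both reach Bob's device, but he stores the message once; this is what makes at-least-once delivery look like exactly-once to the user.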
Group Chat Architecture
Group messages require fanout: one message from the sender must be delivered to all group members. Small groups (a few hundred members; WhatsApp originally capped groups at 256 and has since raised the limit): the sender sends the message to the server once, and the server fans out by sending it to each group member individually. A 100-member group therefore produces 99 deliveries, one per member other than the sender. Because the fanout happens on the server side, the sender does not need to send 99 copies. For each member, the server checks whether they are online (deliver via WebSocket) or offline (store for later delivery and send a push notification). Large groups and channels (thousands or millions of members, like Telegram channels): server-side fanout is too expensive at that scale. Use a pull model instead: the message is stored once, and members pull new messages when they open the channel, while a push notification tells them there are new messages. Hybrid approach: push to the first N active (online) members, and let the rest pull. Group metadata (group_id, name, avatar, member list, admins) is stored in a relational database such as PostgreSQL. A per-group member limit keeps the fanout manageable. Message ordering in groups: use the message_id (a time-sorted Snowflake-style ID) as the ordering key. Messages arrive at the server in roughly chronological order; the server assigns each message_id, and because every member sorts by the same server-assigned ID, all members see the messages in the same order.
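Server-assigned ordering IDs and the small-group versus large-group fanout decision can be sketched together. This is a minimal sketch, assuming the Twitter Snowflake bit layout (41-bit millisecond timestamp, 10-bit worker ID, 12-bit sequence); `SnowflakeIds`, its epoch, `fan_out`, `small_group_limit`, and `push_limit` are hypothetical names and values chosen for illustration.

```python
import threading
import time


class SnowflakeIds:
    """Snowflake-style 64-bit IDs: 41-bit ms timestamp | 10-bit worker id | 12-bit sequence."""

    EPOCH_MS = 1_600_000_000_000  # hypothetical custom epoch

    def __init__(self, worker_id, clock=None):
        if not 0 <= worker_id < 1024:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self.clock = clock or (lambda: int(time.time() * 1000))
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = self.clock()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards; refusing to issue IDs")
            if now == self.last_ms:
                self.seq = (self.seq + 1) & 0xFFF
                if self.seq == 0:               # 4,096 IDs issued this ms: wait for the next ms
                    while now <= self.last_ms:
                        now = self.clock()
            else:
                self.seq = 0
            self.last_ms = now
            return ((now - self.EPOCH_MS) << 22) | (self.worker_id << 12) | self.seq


def fan_out(sender_id, members, online, small_group_limit=256, push_limit=1000):
    """Plan delivery of one stored group message.

    Small groups: every other member gets a copy (WebSocket if online, otherwise
    pending queue + push notification). Large groups/channels: push to at most
    `push_limit` online members; everyone else gets a notification and pulls.
    """
    recipients = [m for m in members if m != sender_id]
    if len(members) <= small_group_limit:
        return {
            "deliver": [m for m in recipients if m in online],
            "queue": [m for m in recipients if m not in online],
            "pull": [],
        }
    deliver = [m for m in recipients if m in online][:push_limit]
    pushed = set(deliver)
    return {"deliver": deliver, "queue": [], "pull": [m for m in recipients if m not in pushed]}


ids = SnowflakeIds(worker_id=3)
msg_id = ids.next_id()                      # one ID, so every member sorts this message the same way
members = [f"u{i}" for i in range(100)]
plan = fan_out("u0", members, online={"u1", "u2", "u50"})
print(len(plan["deliver"]), len(plan["queue"]))  # 3 96
```

IDs from different workers interleave only approximately (clocks drift), which matches the text's "roughly chronological" caveat: the order is consistent for all members, not a perfect record of send times.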
End-to-End Encryption
End-to-end encryption (E2E) ensures that only the sender and recipient can read a message; the server sees only ciphertext. WhatsApp uses the Signal Protocol. Key exchange (simplified): each user generates a key pair (public and private) and uploads the public key to the server. When User A wants to message User B, A downloads B's public key, combines it with A's own private key in a Diffie-Hellman key exchange to derive a shared secret, and encrypts the message with a key derived from that secret. The server relays the ciphertext to B. B derives the same shared secret from B's private key and A's public key, then decrypts. The server never has access to the plaintext. (The full Signal Protocol adds X3DH key agreement and the Double Ratchet on top of this idea, so message keys change continuously and a leaked key does not expose past messages.) For group E2E encryption: the group creator generates a group encryption key and shares it with each member individually (encrypted with each member's public key). Messages are encrypted with the group key. When a member is removed, a new group key is generated and distributed, so the removed member cannot read future messages. (WhatsApp's variant, Sender Keys, has each member generate its own sender key and distribute it to the others the same way.) Tradeoff: E2E encryption prevents server-side features like search (the server cannot index encrypted content), content moderation (it cannot scan for prohibited content), and server-side backup (backups must be encrypted on the client side). These are deliberate tradeoffs for privacy.
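The Diffie-Hellman step can be illustrated with the Python standard library alone. This is a toy sketch, NOT secure and NOT the Signal Protocol: the modulus (the Mersenne prime 2^521 - 1), base 3, and the SHA-256 keystream cipher with an HMAC tag are assumptions chosen only so the example runs without third-party packages. Real systems use a vetted curve such as X25519 and a standard authenticated cipher through an audited library. `keypair`, `shared_key`, `encrypt`, and `decrypt` are hypothetical names.

```python
import hashlib
import hmac
import secrets

# Toy parameters for illustration only: the Mersenne prime 2**521 - 1 and base 3.
# This is NOT a vetted Diffie-Hellman group, and the cipher below is NOT secure.
P = 2**521 - 1
G = 3


def keypair():
    private = secrets.randbelow(P - 3) + 2          # random exponent in [2, P - 2]
    return private, pow(G, private, P)              # (private key, public key)


def shared_key(my_private, their_public):
    """Both sides compute G^(a*b) mod P, then hash it into a 32-byte key."""
    secret = pow(their_public, my_private, P)
    return hashlib.sha256(secret.to_bytes(66, "big")).digest()


def _keystream(key, nonce, length):
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(key, plaintext):
    nonce = secrets.token_bytes(16)
    body = bytes(p ^ k for p, k in zip(plaintext, _keystream(key, nonce, len(plaintext))))
    tag = hmac.new(key, nonce + body, hashlib.sha256).digest()
    return nonce + body + tag                       # this is all the server ever sees


def decrypt(key, blob):
    nonce, body, tag = blob[:16], blob[16:-32], blob[-32:]
    if not hmac.compare_digest(tag, hmac.new(key, nonce + body, hashlib.sha256).digest()):
        raise ValueError("authentication failed")
    return bytes(c ^ k for c, k in zip(body, _keystream(key, nonce, len(body))))


a_priv, a_pub = keypair()                                 # A uploads a_pub to the server
b_priv, b_pub = keypair()                                 # B uploads b_pub to the server
blob = encrypt(shared_key(a_priv, b_pub), b"meet at 6")   # A uses B's public key
print(decrypt(shared_key(b_priv, a_pub), blob))           # b'meet at 6' (B uses A's public key)
```

The group scheme in the text reuses the same pieces: in this toy setting, A would generate `group_key = secrets.token_bytes(32)` and send `encrypt(shared_key(a_priv, member_pub), group_key)` to each member, then encrypt group messages under `group_key`.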
Presence and Typing Indicators
Presence: tracking whether a user is online or offline, and their “last seen” time. Implementation: when a user's WebSocket connection is active and they interact with the app, they are “online.” Store presence in Redis: SET presence:{user_id} online EX 60. The client sends a heartbeat every 30 seconds that refreshes the TTL. If the TTL expires (no heartbeat for 60 seconds), the user is “offline.” Last seen: record the timestamp when the user goes offline. Because a TTL expiry produces no event by default, a simple approach is to store the time of each heartbeat and show the most recent one. Other users see “last seen at 3:45 PM.” Privacy settings: allow users to hide their presence and last seen (WhatsApp supports this). Typing indicators: when User A starts typing in a conversation with User B, A's client sends a “typing” event to the server. The server forwards it to B via WebSocket. B's app shows “typing…” for a few seconds, with a TTL: if no new typing event arrives, the indicator disappears. Typing indicators are fire-and-forget; no delivery guarantee is needed. If the event is lost, the indicator simply does not show. Do not store typing events; they are transient.
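Presence with a TTL, last seen, and short-lived typing indicators can be sketched in memory. This is a minimal sketch, assuming dict-based expiry times in place of Redis `SET ... EX 60` and explicit `now` arguments (in seconds) for determinism; `PresenceService`, its fields, and the 5-second typing timeout are hypothetical. For brevity the typing timeout lives in the same class, although in the design above it runs in B's app.

```python
class PresenceService:
    """Hypothetical in-memory imitation of SET presence:{user_id} online EX 60, plus last seen."""

    def __init__(self, ttl_s=60, typing_ttl_s=5):
        self.ttl_s = ttl_s
        self.typing_ttl_s = typing_ttl_s
        self.expires_at = {}   # user_id -> time the presence key would expire
        self.last_seen = {}    # user_id -> time of the most recent heartbeat
        self.hidden = set()    # users who chose to hide presence and last seen
        self.typing = {}       # (conversation_id, user_id) -> indicator expiry; memory only

    def heartbeat(self, user_id, now):
        """Every 30 s from the client: re-running SET ... EX 60 refreshes the TTL."""
        self.expires_at[user_id] = now + self.ttl_s
        self.last_seen[user_id] = now

    def status(self, user_id, now):
        """What other users are shown."""
        if user_id in self.hidden:
            return "hidden"
        if self.expires_at.get(user_id, float("-inf")) > now:
            return "online"
        if user_id in self.last_seen:
            return f"last seen at t={self.last_seen[user_id]}"
        return "unknown"

    def typing_event(self, conversation_id, user_id, now):
        """Fire-and-forget: no ACK, no retry, nothing written to a database."""
        self.typing[(conversation_id, user_id)] = now + self.typing_ttl_s

    def is_typing(self, conversation_id, user_id, now):
        """The indicator disappears unless a new typing event arrives before it expires."""
        return self.typing.get((conversation_id, user_id), float("-inf")) > now


p = PresenceService()
p.heartbeat("bob", now=0)
p.heartbeat("bob", now=30)
print(p.status("bob", now=80), "|", p.status("bob", now=95))  # online | last seen at t=30
```

Recording last seen on every heartbeat avoids depending on an expiry event: when the key lapses, the most recent heartbeat time is already stored.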