Designing a chat application like WhatsApp or Facebook Messenger is a classic system design question that tests real-time communication, message delivery guarantees, presence tracking, and encryption. WhatsApp handles over 100 billion messages per day across 2 billion users. This guide covers the architecture for a production-scale chat system — from message send to message delivery — with the depth expected at senior engineering interviews.
High-Level Architecture
Core services: (1) Connection Service — maintains persistent WebSocket connections with all online users. When a user opens the app, it establishes a WebSocket connection to the nearest connection server. (2) Message Service — handles message creation, validation, storage, and routing. (3) Presence Service — tracks which users are online and their last-seen time. (4) Group Service — manages group membership, group messages, and fanout. (5) Notification Service — sends push notifications to offline users. (6) Media Service — handles image, video, and document upload, storage, and delivery. Message flow for 1-to-1 chat: User A sends a message -> WebSocket to connection server -> message service validates and stores it -> checks whether User B is online -> if online, routes it to B's connection server -> delivers via WebSocket -> B receives the message -> sends a delivery acknowledgment. If B is offline: store the message for later delivery and send a push notification.
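The 1-to-1 flow above can be sketched as a few lines of Python. This is a minimal in-memory model — the dicts stand in for the real message store, Redis mapping, and push gateway, and all names here are illustrative, not from any specific codebase.

```python
# In-memory sketch of the 1-to-1 send path: store, then deliver or queue.
import itertools

message_store = {}   # message_id -> message record
online_users = {}    # user_id -> connection_server_id (stands in for Redis)
pending = {}         # user_id -> list of undelivered messages
_ids = itertools.count(1)

def send_message(sender_id, recipient_id, content):
    msg = {"message_id": next(_ids), "sender": sender_id,
           "recipient": recipient_id, "content": content, "status": "sent"}
    message_store[msg["message_id"]] = msg       # store before delivery
    server = online_users.get(recipient_id)
    if server is not None:
        deliver_via_websocket(server, msg)       # recipient is online
    else:
        pending.setdefault(recipient_id, []).append(msg)
        send_push_notification(recipient_id)     # offline fallback
    return msg["message_id"]

def deliver_via_websocket(server, msg):
    # In a real system, "delivered" is set only after the client ACKs.
    msg["status"] = "delivered"

def send_push_notification(user_id):
    pass  # would call APNs/FCM in a real system
```

Note the ordering: the message is persisted before any delivery attempt, so a crash between store and delivery loses nothing — the message is still in the pending path when the recipient reconnects.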
WebSocket Connection Management
WebSocket provides full-duplex communication between client and server. Each user maintains one persistent WebSocket connection. With 500 million concurrent users, the connection layer must handle 500M persistent connections. Each connection server handles 50,000-500,000 concurrent WebSocket connections (depending on memory — each connection uses 10-50KB), so 500M users require roughly 1,000-10,000 connection servers. Connection routing: when User A sends a message to User B, the message service needs to know which connection server holds B's WebSocket. Use a distributed mapping: a Redis hash mapping user_id -> connection_server_id. When a user connects, register the mapping; on disconnect, remove it. Connection heartbeats: clients send a ping every 30 seconds. If no ping arrives for 60 seconds, the connection is considered dead and cleaned up. Mobile optimization: on mobile, the OS may kill background WebSocket connections to save battery. Use push notifications (APNs/FCM) as a fallback for offline delivery. When the user opens the app, re-establish the WebSocket and pull any pending messages.
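The registry-plus-heartbeat logic can be sketched as follows. A plain dict stands in for the Redis mapping, and the `now` parameters let the expiry logic be exercised deterministically; the class and method names are illustrative.

```python
# Sketch of the connection registry with heartbeat-based expiry.
import time

HEARTBEAT_TIMEOUT = 60  # seconds without a ping before a connection is reaped

class ConnectionRegistry:
    def __init__(self):
        self._conns = {}  # user_id -> (connection_server_id, last_ping_time)

    def register(self, user_id, server_id, now=None):
        now = now if now is not None else time.monotonic()
        self._conns[user_id] = (server_id, now)

    def heartbeat(self, user_id, now=None):
        now = now if now is not None else time.monotonic()
        if user_id in self._conns:
            server_id, _ = self._conns[user_id]
            self._conns[user_id] = (server_id, now)

    def lookup(self, user_id, now=None):
        """Return the server holding this user's WebSocket, or None."""
        now = now if now is not None else time.monotonic()
        entry = self._conns.get(user_id)
        if entry is None:
            return None
        server_id, last_ping = entry
        if now - last_ping > HEARTBEAT_TIMEOUT:
            del self._conns[user_id]  # stale connection: lazy cleanup
            return None
        return server_id

    def disconnect(self, user_id):
        self._conns.pop(user_id, None)
```

In Redis this pattern is typically expressed with a key TTL refreshed on each ping, so expiry happens server-side rather than via the lazy check shown here.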
Message Storage and Delivery Guarantees
Message storage: each message is stored in a database with message_id (Snowflake-style time-sorted ID), conversation_id, sender_id, content (encrypted), created_at, and delivery_status (sent, delivered, read). Database choice: Cassandra or HBase for high write throughput with time-series access patterns. Partition key: conversation_id. Clustering key: message_id (time-sorted). This enables efficient retrieval of a conversation's messages in chronological order. Delivery guarantee: at-least-once. The server sends the message to the recipient and waits for a delivery ACK. If no ACK arrives within 5 seconds, it retries. The client deduplicates by message_id (if it receives the same message_id twice, it ignores the duplicate). Message statuses: sent (server received from sender — single checkmark), delivered (recipient device received — double checkmark), read (recipient opened the conversation — blue checkmarks). Delivery and read status changes are reported by the recipient's client to the server, which relays them to the sender. Offline message delivery: messages for offline users are stored in a per-user pending-messages queue. When the user comes online, the connection server pulls the pending messages and delivers them in order.
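The client side of the at-least-once guarantee is the deduplication step: because the server retries when an ACK is lost, the client must tolerate receiving the same message twice. A minimal sketch (class and field names are illustrative):

```python
# Client-side dedup for at-least-once delivery: apply each message_id once,
# but always ACK, so a retrying server eventually stops.
class ChatClient:
    def __init__(self):
        self.seen_ids = set()
        self.conversation = []  # messages in display order

    def on_message(self, msg):
        """Handle a message pushed over the WebSocket; return an ACK."""
        if msg["message_id"] in self.seen_ids:
            # Duplicate from a server retry: re-ACK but do not re-apply.
            return {"type": "ack", "message_id": msg["message_id"]}
        self.seen_ids.add(msg["message_id"])
        self.conversation.append(msg)
        return {"type": "ack", "message_id": msg["message_id"]}
```

The important detail is that duplicates are still ACKed: if the client ignored them silently, the server (whose retry was triggered by a lost ACK, not a lost message) would retry forever.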
Group Chat Architecture
Group messages require fanout: one message from the sender must be delivered to all group members. Small groups (up to 256 members, like WhatsApp): the sender sends the message once to the server. The server fans out by sending the message to each group member individually; with a 100-member group, this creates 99 deliveries. The fanout happens on the server side — the sender does not need to send 99 copies. For each member, the server checks whether they are online (deliver via WebSocket) or offline (store for later + push notification). Large groups / channels (thousands or millions of members, like Telegram channels): server-side fanout is too expensive for millions of members. Use a pull model: the message is stored once, and members pull new messages when they open the channel. A push notification tells them there are new messages. Hybrid approach: push to the N most recently active online members, pull for the rest. Group metadata: group_id, name, avatar, member list, admins. Store it in a relational database (PostgreSQL). A member limit per group keeps the fanout manageable. Message ordering in groups: use the message_id (time-sorted Snowflake ID) as the ordering key. Messages arrive at the server in roughly chronological order; the server assigns the message_id, ensuring a consistent order for all members.
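The push-vs-pull decision above can be sketched as a single routing function. The threshold constant, helper structure, and return values here are illustrative stand-ins, not any real implementation:

```python
# Sketch of the small-group fanout decision: push fanout up to a member
# limit, pull model beyond it.
FANOUT_LIMIT = 256  # e.g. WhatsApp-style small-group cap

def fan_out(group_members, sender_id, msg, online_users, pending):
    """Return ("push", n_recipients) or ("pull", 0) for a group message."""
    if len(group_members) > FANOUT_LIMIT:
        # Too large for per-member push: store once, members pull on open,
        # and a push notification nudges them to check.
        return ("pull", 0)
    recipients = [m for m in group_members if m != sender_id]
    for member in recipients:
        if member in online_users:
            pass  # would deliver immediately via member's WebSocket
        else:
            pending.setdefault(member, []).append(msg)
    return ("push", len(recipients))
```

A hybrid variant would, in the large-group branch, still push to the most recently active online members before falling back to pull for the rest.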
End-to-End Encryption
End-to-end encryption (E2E) ensures that only the sender and recipient can read the message — the server sees only ciphertext. WhatsApp uses the Signal Protocol. Key exchange: each user generates a key pair (public and private). Public keys are uploaded to the server. When User A wants to message User B, A downloads B's public key, derives a shared secret via Diffie-Hellman key exchange, and encrypts the message with the shared secret. The server relays the ciphertext to B. B derives the same shared secret from their own private key and A's public key, then decrypts. The server never has access to the plaintext. For group E2E encryption: the group creator generates a group encryption key and shares it with each member individually (encrypted with each member's public key). Messages are encrypted with the group key. When a member is removed, a new group key is generated and distributed. Tradeoff: E2E encryption prevents server-side features like search (the server cannot index encrypted content), content moderation (it cannot scan for prohibited content), and server-side backup (backups must be encrypted on the client side). These are deliberate tradeoffs for privacy.
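The key-exchange step can be illustrated with a toy Diffie-Hellman over modular arithmetic. This is for intuition only: the parameters below are NOT secure, and the Signal Protocol actually uses X25519 elliptic-curve DH plus a ratchet — what carries over is only the shape of the exchange (each side combines its own private key with the other's public key and arrives at the same secret, which the server never sees).

```python
# Toy Diffie-Hellman key agreement -- illustrative only, NOT secure.
import hashlib
import secrets

P = 2**127 - 1  # a Mersenne prime; far too small for real use
G = 5           # toy generator

def keypair():
    priv = secrets.randbelow(P - 2) + 1  # private key stays on the device
    pub = pow(G, priv, P)                # public key is uploaded to the server
    return priv, pub

def shared_secret(my_priv, their_pub):
    # (G^a)^b mod P == (G^b)^a mod P, so both sides derive the same value.
    secret = pow(their_pub, my_priv, P)
    # Hash the shared value down to a symmetric key for message encryption.
    return hashlib.sha256(str(secret).encode()).digest()

# Alice and Bob derive the same key; the server only ever saw a_pub and b_pub.
a_priv, a_pub = keypair()
b_priv, b_pub = keypair()
assert shared_secret(a_priv, b_pub) == shared_secret(b_priv, a_pub)
```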
Presence and Typing Indicators
Presence: tracking whether a user is online or offline, and their “last seen” time. Implementation: when a user's WebSocket connection is active and they interact with the app, they are “online.” Store presence in Redis: SET presence:{user_id} online EX 60. The client sends a heartbeat every 30 seconds that refreshes the TTL. If the TTL expires (no heartbeat for 60 seconds), the user is “offline.” Last seen: when the user goes offline, record the timestamp; other users see “last seen at 3:45 PM.” Privacy settings: allow users to hide their presence and last seen (WhatsApp supports this). Typing indicators: when User A starts typing in a conversation with User B, A's client sends a “typing” event to the server. The server forwards it to B via WebSocket. B's app shows “typing…” for a few seconds (with a TTL — if no new typing event arrives, the indicator disappears). Typing indicators are fire-and-forget — no delivery guarantee is needed. If the event is lost, the indicator simply does not show. Do not store typing events — they are transient.
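The `SET presence:{user_id} online EX 60` pattern boils down to a key with an expiry that each heartbeat pushes forward. A small sketch with an in-memory dict in place of Redis (class and field names are illustrative; the `now` parameters exist only to make expiry testable):

```python
# Sketch of TTL-based presence: a heartbeat refreshes the expiry;
# "online" just means the expiry has not passed yet.
import time

PRESENCE_TTL = 60  # seconds; clients heartbeat every 30s, well inside it

class PresenceStore:
    def __init__(self):
        self._expires = {}    # user_id -> expiry timestamp
        self.last_seen = {}   # user_id -> time of most recent heartbeat

    def heartbeat(self, user_id, now=None):
        now = now if now is not None else time.monotonic()
        self._expires[user_id] = now + PRESENCE_TTL  # SET ... EX 60
        self.last_seen[user_id] = now

    def is_online(self, user_id, now=None):
        now = now if now is not None else time.monotonic()
        return self._expires.get(user_id, 0) > now
```

Because every heartbeat also records `last_seen`, the “last seen at 3:45 PM” display falls out for free: once the TTL lapses, the stored timestamp is the last moment the user was online.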