Chat System Low-Level Design (WhatsApp / Messenger)

Requirements

  • One-on-one and group messaging (up to 500 members)
  • Message delivery with read receipts (sent, delivered, read)
  • Message history persistence and search
  • Online/offline presence indicators
  • 500M users, 100M DAU, 10B messages/day
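
The scale figures above translate into a rough average write rate. The sketch below is back-of-envelope arithmetic only; the per-message size is an assumption chosen for illustration, not a number from the requirements.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

daily_messages = 10_000_000_000   # 10B messages/day (from the requirements)
daily_active_users = 100_000_000  # 100M DAU (from the requirements)

# Average write rate the message path must sustain.
avg_messages_per_second = daily_messages / SECONDS_PER_DAY

# Average number of messages per active user per day.
messages_per_active_user = daily_messages / daily_active_users

# ASSUMPTION: average stored size per message, metadata included.
assumed_bytes_per_message = 200
daily_storage_tb = daily_messages * assumed_bytes_per_message / 1e12

print(f"{avg_messages_per_second:,.0f} messages/s on average")
print(f"{messages_per_active_user:.0f} messages per active user per day")
print(f"{daily_storage_tb:.1f} TB/day at {assumed_bytes_per_message} B/message")
```

Peak traffic is typically several times the average, so the average rate is a floor for capacity planning rather than a target.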

Architecture

Client → WebSocket Server (stateful, per-connection)
       → Chat API (REST for history, group management)
       → Message Store (Cassandra — write-heavy, time-series)
       → Presence Service (Redis — online/offline status)
       → Push Notification Service (APNs/FCM — offline users)
       → Kafka (fan-out, offline delivery, analytics)

WebSocket Connection Management

Chat requires persistent bidirectional connections. WebSocket servers are stateful — each connection lives on a specific server. Challenge: if User A (on WS server 1) sends to User B (on WS server 2), server 1 must forward to server 2.

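Solution: Redis Pub/Sub routing. Each WS server subscribes to one Redis channel per connected user. When a message arrives for User B, the sending server publishes to channel user:{B}. Redis forwards the publish only to subscribers of that channel, so only the WS server holding B's connection receives and delivers it. Redis Pub/Sub is fire-and-forget: if no server is subscribed (B is offline), the publish is dropped, so offline delivery must rely on the persisted message and push notifications.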

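import json
from redis import Redis

redis = Redis()              # connection settings omitted
pubsub = redis.pubsub()      # subscriptions go through a PubSub object

# On connection: register the user's server and listen on the user's channel
redis.hset('user_server', user_id, server_id)
pubsub.subscribe(f'user:{user_id}')  # only this server receives this user's messages

# On message send from A to B:
redis.publish(f'user:{B}', json.dumps(message))  # reaches only subscribers of user:{B}
# The WS server subscribed to user:{B} holds B's connection and delivers it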
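
The per-user channel routing can be exercised without a Redis server. The following is a minimal sketch in which `InMemoryBroker` and `WSServer` are hypothetical stand-ins (not part of redis-py) that mimic subscribe/publish semantics in a single process.

```python
from collections import defaultdict


class InMemoryBroker:
    """Hypothetical stand-in for Redis Pub/Sub: channel -> subscriber callbacks."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, payload):
        # Like Redis PUBLISH, return how many subscribers received the payload.
        for callback in self.subscribers[channel]:
            callback(payload)
        return len(self.subscribers[channel])


class WSServer:
    def __init__(self, server_id, broker):
        self.server_id = server_id
        self.broker = broker
        self.delivered = []  # (user_id, payload) pairs pushed down local sockets

    def on_connect(self, user_id):
        # Subscribe only to the channel of a user connected to this server.
        self.broker.subscribe(
            f"user:{user_id}",
            lambda payload, uid=user_id: self.delivered.append((uid, payload)),
        )

    def send(self, recipient_id, payload):
        return self.broker.publish(f"user:{recipient_id}", payload)


broker = InMemoryBroker()
server_1, server_2 = WSServer("ws-1", broker), WSServer("ws-2", broker)
server_1.on_connect("A")
server_2.on_connect("B")

receivers = server_1.send("B", "hello")   # routed to server_2 only
no_receivers = server_1.send("C", "hi")   # nobody subscribed: C is offline
```

A receiver count of zero is a useful signal that the recipient has no live connection, so the message should go down the offline path instead.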

Data Model

Chat(chat_id UUID, type ENUM(DIRECT,GROUP), name, created_at)
ChatMember(chat_id, user_id, role ENUM(ADMIN,MEMBER), joined_at, last_read_message_id)
Message(message_id UUID, chat_id UUID, sender_id UUID, content TEXT,
        type ENUM(TEXT,IMAGE,FILE,SYSTEM), created_at, edited_at)
-- Cassandra schema:
-- Partition key: chat_id
-- Clustering key: created_at DESC, message_id
-- Enables: fetch latest N messages for a chat, paginate backwards

MessageReceipt(message_id, user_id, status ENUM(DELIVERED,READ), updated_at)
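
To illustrate why the clustering order supports "latest N, then paginate backwards", here is a small in-memory sketch. It models one chat_id partition as a Python list and uses a cursor equal to the clustering key of the last row returned; `fetch_page` and `clustering_key` are illustrative names, not Cassandra APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    chat_id: str
    created_at: int   # epoch milliseconds
    message_id: str
    content: str


def clustering_key(message):
    # Mirrors "created_at DESC, message_id": newest first, message_id breaks ties.
    return (-message.created_at, message.message_id)


def fetch_page(partition, limit, after=None):
    """Return up to `limit` messages that sort after the cursor `after`.

    `partition` holds the messages of one chat_id; `after` is the clustering
    key of the last message on the previous page (None for the first page).
    """
    ordered = sorted(partition, key=clustering_key)
    if after is not None:
        ordered = [m for m in ordered if clustering_key(m) > after]
    page = ordered[:limit]
    next_cursor = clustering_key(page[-1]) if len(page) == limit else None
    return page, next_cursor


chat = [Message("c1", ts, f"m{ts}", f"msg {ts}") for ts in (1, 2, 3, 4, 5)]
first_page, cursor = fetch_page(chat, limit=2)
second_page, cursor = fetch_page(chat, limit=2, after=cursor)
third_page, cursor = fetch_page(chat, limit=2, after=cursor)
```

In a real store the rows are already kept in clustering order, so each page is a bounded range read rather than a sort.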

Message Delivery Flow

  1. Client sends message via WebSocket to WS server
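  2. WS server writes to Cassandra asynchronously and acknowledges the sender immediately (low send latency, but a failed write must be retried or an acknowledged message can be lost)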
  3. WS server publishes to Kafka topic: messages
  4. Fan-out service consumes from Kafka:
    • For each group member: publish to Redis user:{member_id}
    • For offline members: enqueue push notification (APNs/FCM)
    • Update message delivery receipts
  5. Online recipients receive via WebSocket; offline via push
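
Step 4 of the flow above can be sketched as a pure function that splits recipients into WebSocket and push deliveries. The function name, the message dict shape, and the choice to skip the sender are assumptions made for illustration.

```python
def fan_out(message, member_ids, online_user_ids):
    """Split a message's recipients into WebSocket and push deliveries.

    Returns (channels, push_queue): Redis-style channel names to publish to
    for online members, and user IDs that need an APNs/FCM notification.
    """
    channels, push_queue = [], []
    for member_id in member_ids:
        if member_id == message["sender_id"]:
            continue  # ASSUMPTION: the sender is not a fan-out target
        if member_id in online_user_ids:
            channels.append(f"user:{member_id}")
        else:
            push_queue.append(member_id)
    return channels, push_queue


message = {"message_id": "m1", "chat_id": "c1", "sender_id": "alice", "content": "hi"}
channels, push_queue = fan_out(
    message,
    member_ids=["alice", "bob", "carol"],
    online_user_ids={"alice", "bob"},
)
```

With a 500-member cap per group, one message produces at most 499 deliveries, which keeps per-message fan-out work bounded.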

Read Receipts

Track delivery and read status per message per user.

Delivery receipt: recorded when the message reaches the client's device (received over WebSocket, or push delivered). Read receipt: recorded when the user opens and views the message.

Implementation: the client sends an ACK to the server when the message is displayed. The server updates MessageReceipt and ChatMember.last_read_message_id.

For group chats, show a read count (N of M members have read). Aggregate: SELECT COUNT(*) FROM MessageReceipt WHERE message_id=X AND status=READ. In Cassandra this is only efficient if message_id is the partition key and status is part of the clustering key; otherwise maintain a per-message read counter.
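
The read-receipt bookkeeping can be sketched in memory as follows. `ReceiptTracker` is a hypothetical stand-in for the MessageReceipt table and ChatMember.last_read_message_id, and it assumes read ACKs for a chat arrive in message order.

```python
from collections import defaultdict


class ReceiptTracker:
    """In-memory stand-in for MessageReceipt and last_read_message_id."""

    def __init__(self):
        self.read_by = defaultdict(set)   # message_id -> user_ids who read it
        self.last_read = {}               # (chat_id, user_id) -> message_id

    def mark_read(self, chat_id, message_id, user_id):
        # Called when the client ACKs that the message was displayed.
        self.read_by[message_id].add(user_id)   # a repeated ACK is a no-op
        # ASSUMPTION: ACKs arrive in order, so the latest ACK is the newest read.
        self.last_read[(chat_id, user_id)] = message_id

    def read_count(self, message_id, member_count):
        # Renders the "N of M members have read" summary.
        return f"{len(self.read_by[message_id])} of {member_count}"


tracker = ReceiptTracker()
tracker.mark_read("c1", "m1", "bob")
tracker.mark_read("c1", "m1", "bob")     # duplicate ACK, counted once
tracker.mark_read("c1", "m1", "carol")
summary = tracker.read_count("m1", member_count=4)
```

Storing readers as a set makes the ACK idempotent, which matters because clients may resend ACKs after a reconnect.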

Presence Service

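# On connect: set online with a 30s TTL
redis.setex(f'presence:{user_id}', 30, 'online')
redis.publish('presence_events', json.dumps({'user_id': user_id, 'status': 'online'}))

# Heartbeat every 20s from client: reset the key and its TTL
# (SETEX rather than EXPIRE, so a late heartbeat recreates an expired key)
redis.setex(f'presence:{user_id}', 30, 'online')

# On disconnect: delete the key and publish an offline event
# On TTL expiry: Redis publishes nothing to presence_events by itself; detect it
# (e.g. via keyspace notifications) and publish the offline event explicitly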

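Presence changes are broadcast to a user's contacts via Redis Pub/Sub. Contacts (in practice, the WS servers holding their connections) subscribe to presence_events and filter for the users they care about. With a single global channel every subscriber sees every event, so at scale use a dedicated presence service with sharded Redis.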
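
The interaction between the 20-second heartbeat and the 30-second TTL can be checked with a small in-memory model. `PresenceStore` is a hypothetical class, and the current time is passed in explicitly so the behaviour is deterministic.

```python
class PresenceStore:
    """In-memory sketch of TTL-based presence; `now` is supplied by the caller."""

    def __init__(self, ttl_seconds=30):
        self.ttl_seconds = ttl_seconds
        self.expires_at = {}  # user_id -> time at which the key would expire

    def heartbeat(self, user_id, now):
        # Same effect as setting the presence key again with a fresh TTL.
        self.expires_at[user_id] = now + self.ttl_seconds

    def is_online(self, user_id, now):
        return self.expires_at.get(user_id, float("-inf")) > now


presence = PresenceStore(ttl_seconds=30)
presence.heartbeat("bob", now=0)      # connect
presence.heartbeat("bob", now=20)     # heartbeat moves expiry to t=50

online_at_45 = presence.is_online("bob", now=45)
online_at_50 = presence.is_online("bob", now=50)   # no heartbeat since t=20
```

The 10-second gap between heartbeat interval and TTL tolerates one late heartbeat; after a missed one, the user is shown offline within 30 seconds of the last successful heartbeat.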

Message Search

Cassandra is not suited for full-text search. For message search, dual-write to Elasticsearch asynchronously via Kafka. The Elasticsearch index holds message_id, chat_id, sender_id, content (text), and created_at. Search: GET /search?q=keyword&chat_id=X. Search is a secondary use case, so latency of seconds is acceptable. Restrict results to chats the user is a member of by filtering on chat_id.
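
The membership restriction can be enforced when building the query body. The sketch below constructs an Elasticsearch bool query as a plain dict; `build_search_query` and its parameters are illustrative names, and sending the request is left out.

```python
def build_search_query(keyword, member_chat_ids, chat_id=None, size=20):
    """Build an Elasticsearch query body for message search.

    `member_chat_ids` are the chats the caller belongs to; an optional
    `chat_id` narrows the search to one of them.
    """
    if chat_id is not None:
        if chat_id not in member_chat_ids:
            raise PermissionError(f"not a member of chat {chat_id}")
        allowed = [chat_id]
    else:
        allowed = sorted(member_chat_ids)
    return {
        "size": size,
        "query": {
            "bool": {
                # Full-text match on the analyzed content field.
                "must": [{"match": {"content": keyword}}],
                # Non-scoring filter: only chats the user may see.
                "filter": [{"terms": {"chat_id": allowed}}],
            }
        },
    }


body = build_search_query("lunch", member_chat_ids={"c2", "c1"}, chat_id="c1")
all_chats_body = build_search_query("lunch", member_chat_ids={"c2", "c1"})
```

Deriving the allowed chat IDs on the server, rather than trusting the chat_id query parameter, is what keeps one user from searching another user's chats.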

Key Design Decisions

  • WebSocket for real-time; fallback to long-polling for restricted networks
  • Redis Pub/Sub for cross-server message routing — decouples WS servers
  • Cassandra for message storage — high write throughput, time-series access pattern
  • Kafka fan-out — decouples message receipt from delivery to multiple channels
  • Heartbeat-based presence with TTL — handles disconnects without explicit logout

