Message Schema
Every message is a structured record stored durably before any delivery is attempted:
- message_id: Snowflake ID — 64-bit integer encoding timestamp, datacenter ID, and sequence number. Time-sortable without a secondary sort key.
- conversation_id: identifies the 1:1 or group conversation thread
- sender_id: authenticated user who created the message
- content: text payload or reference to media object in blob storage
- type: text | image | file | voice | system
- status: SENT | DELIVERED | READ (updated as delivery progresses)
- created_at, deleted_at: creation time and soft-delete marker; a message is soft-deleted by setting the deleted_at timestamp
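As a rough illustration, the fields above could be modelled as the record below. The class and enum names are assumptions made for this sketch, and content is shown as a plain string that holds either text or a blob-storage reference.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class MessageType(Enum):
    TEXT = "text"
    IMAGE = "image"
    FILE = "file"
    VOICE = "voice"
    SYSTEM = "system"


class MessageStatus(Enum):
    SENT = "SENT"
    DELIVERED = "DELIVERED"
    READ = "READ"


@dataclass
class Message:
    message_id: int          # 64-bit Snowflake ID
    conversation_id: str
    sender_id: str
    content: str             # text payload or a reference to a media object
    type: MessageType = MessageType.TEXT
    status: MessageStatus = MessageStatus.SENT
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    deleted_at: Optional[datetime] = None  # set on soft delete, otherwise None

    @property
    def is_deleted(self) -> bool:
        return self.deleted_at is not None
```

A new message starts in the SENT state with deleted_at unset; only status and deleted_at change afterwards.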
Message Storage
Cassandra is well-suited for chat message storage:
- Partition key: conversation_id. All messages for a conversation land on the same partition.
- Clustering key: message_id DESC. Messages are stored in reverse chronological order within the partition.
- Append-only writes: message content is never updated in place (edits create new versions), which suits Cassandra's write-optimized SSTable structure.
- Time-range queries: fetch the last N messages with WHERE conversation_id = ? AND message_id < ? LIMIT 50, which the clustering key makes efficient.
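The paging query can be mimicked in memory to show why the descending clustering order helps. This is only a sketch: the dictionary stands in for Cassandra partitions, and fetch_page is a hypothetical helper, not a driver call.

```python
from typing import Dict, List, Optional

# conversation_id -> message IDs sorted newest first (mirrors message_id DESC)
Store = Dict[str, List[int]]


def fetch_page(store: Store, conversation_id: str,
               before_id: Optional[int] = None, limit: int = 50) -> List[int]:
    """Return up to `limit` message IDs older than `before_id`, newest first."""
    partition = store.get(conversation_id, [])
    if before_id is None:
        return partition[:limit]  # first page: the newest messages
    # Binary search for the first entry with message_id < before_id.
    lo, hi = 0, len(partition)
    while lo < hi:
        mid = (lo + hi) // 2
        if partition[mid] >= before_id:
            lo = mid + 1
        else:
            hi = mid
    # The page is one contiguous slice, like a clustering-key range scan.
    return partition[lo:lo + limit]
```

A client pages backwards through history by passing the smallest message_id from the previous page as before_id.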
Delivery Flow
The sequence from send to delivery:
- Sender submits message to chat server over WebSocket or HTTP
- Chat server persists message to Cassandra with status SENT
- ACK sent back to sender — sender's UI updates message to SENT state
- Chat server looks up each recipient in the connection routing table to find the WebSocket server holding that recipient's connection
- Message forwarded to recipient's WebSocket server → pushed to recipient's connected client
- Recipient's client ACKs delivery → server updates status to DELIVERED
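The sequence above can be sketched with in-memory stand-ins. Everything here is an assumption for illustration: ChatServer, handle_send, and the push callback are hypothetical names, the dictionaries replace Cassandra and the routing table, and the single status field is a simplification that suits 1:1 chats.

```python
from typing import Callable, Dict, List

# Hypothetical push callback: (user_id, message) -> True if the client ACKed.
PushFn = Callable[[str, dict], bool]


class ChatServer:
    def __init__(self, routing: Dict[str, str], ws_servers: Dict[str, PushFn]):
        self.store: Dict[int, dict] = {}  # stands in for Cassandra
        self.routing = routing            # user_id -> WebSocket server address
        self.ws_servers = ws_servers      # address -> push callback

    def handle_send(self, message: dict, recipients: List[str]) -> dict:
        # Persist first, with status SENT, before any delivery attempt.
        record = dict(message, status="SENT")
        self.store[record["message_id"]] = record
        # The ACK for the sender reflects the persisted state only.
        ack = {"message_id": record["message_id"], "status": "SENT"}
        for user_id in recipients:
            address = self.routing.get(user_id)
            if address is None:
                continue  # recipient offline: message stays SENT
            # Forward to the recipient's WebSocket server and wait for the ACK.
            if self.ws_servers[address](user_id, record):
                record["status"] = "DELIVERED"
        return ack
```

In a real deployment the ACK would go back to the sender before forwarding begins; this synchronous sketch simply builds it first and returns it at the end.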
Online Delivery via WebSocket
WebSocket provides a persistent, full-duplex connection between client and chat server. When the recipient is connected:
- The chat server pushes the message over the open WebSocket connection immediately
- No polling required — sub-100ms delivery latency is achievable
- Connection routing: a shared key-value store (e.g., Redis) maps user_id → WebSocket server address, allowing any chat server to forward a message to the server holding the recipient's connection
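A minimal sketch of the routing table follows, with a plain dictionary standing in for the shared store. ConnectionRegistry and its methods are hypothetical names; the one design point it illustrates is that a disconnect should only clear the entry if it still points at the server reporting the disconnect, so a stale close event cannot erase a newer connection.

```python
from typing import Dict, Optional


class ConnectionRegistry:
    def __init__(self) -> None:
        self._routes: Dict[str, str] = {}  # user_id -> WebSocket server address

    def register(self, user_id: str, server_address: str) -> None:
        # Called when a client opens a WebSocket; the latest connection wins.
        self._routes[user_id] = server_address

    def unregister(self, user_id: str, server_address: str) -> bool:
        # Remove the route only if it still belongs to the reporting server.
        if self._routes.get(user_id) == server_address:
            del self._routes[user_id]
            return True
        return False

    def lookup(self, user_id: str) -> Optional[str]:
        # None means the user has no live connection (treat as offline).
        return self._routes.get(user_id)
```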
Offline Queuing
When the recipient is disconnected, the message is persisted with status SENT and delivery deferred:
- On reconnect, the client sends its last_seen_message_id
- The server queries Cassandra for all messages in the conversation with message_id > last_seen_message_id
- Missed messages are replayed in order over the newly established WebSocket connection
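The replay step can be sketched as a filter plus a re-sort, since storage order is newest first but replay order is oldest first. The function name and the list-of-dicts partition are assumptions for this example.

```python
from typing import List


def missed_messages(partition: List[dict], last_seen_message_id: int) -> List[dict]:
    """Return messages newer than the client's cursor, oldest first.

    `partition` holds one conversation's messages in storage order
    (message_id descending).
    """
    missed = [m for m in partition if m["message_id"] > last_seen_message_id]
    # Replay in chronological order so the client appends them naturally.
    return sorted(missed, key=lambda m: m["message_id"])
```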
Message Ordering
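Snowflake IDs provide a time-based ordering without distributed coordination:
- The timestamp component (41 bits, millisecond resolution) orders messages from different senders by wall clock time, so ordering across generator nodes is only as accurate as their clock synchronization
- The sequence number component resolves ties within the same millisecond on the same generator node
- Clients sort received messages by message_id; because the timestamp occupies the high bits, a numeric sort is a chronological sort (a lexicographic sort on decimal strings is not, unless the IDs are zero-padded to equal length)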
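A minimal generator sketch makes the bit layout concrete. The 41-bit timestamp comes from the text above; the 10-bit node field, 12-bit sequence, and the custom epoch are assumptions chosen for this example, and the injectable clock exists only to make the behaviour easy to test.

```python
import threading
import time

EPOCH_MS = 1_700_000_000_000  # arbitrary custom epoch (assumption)
NODE_BITS = 10                # datacenter/worker identifier (assumption)
SEQUENCE_BITS = 12            # per-millisecond counter (assumption)
MAX_NODE_ID = (1 << NODE_BITS) - 1
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeGenerator:
    def __init__(self, node_id: int, clock=lambda: int(time.time() * 1000)):
        if not 0 <= node_id <= MAX_NODE_ID:
            raise ValueError("node_id out of range")
        self.node_id = node_id
        self._clock = clock
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted: wait for the next millisecond.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            return (((now - EPOCH_MS) << (NODE_BITS + SEQUENCE_BITS))
                    | (self.node_id << SEQUENCE_BITS)
                    | self._sequence)


def decode(snowflake: int):
    """Split an ID back into (timestamp_ms, node_id, sequence)."""
    sequence = snowflake & MAX_SEQUENCE
    node_id = (snowflake >> SEQUENCE_BITS) & MAX_NODE_ID
    timestamp_ms = (snowflake >> (NODE_BITS + SEQUENCE_BITS)) + EPOCH_MS
    return timestamp_ms, node_id, sequence
```

Because the timestamp sits in the highest bits, comparing two IDs as integers compares their timestamps first, then node, then sequence. Comparing them as strings does not: "9" sorts after "10".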
At-Least-Once Delivery
The sender retries if it does not receive an ACK within a timeout (e.g., 5 seconds):
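- Each message carries a unique message_id (client-generated or server-assigned). For a retry to be recognized as a duplicate, it must carry the same identifier as the original submit, so a server-assigned message_id needs a client-generated deduplication key alongside it
- The server enforces uniqueness on that identifier at the storage layer (in Cassandra, which has no unique constraints, a conditional INSERT ... IF NOT EXISTS can serve this purpose), so duplicate submits are idempotent: the existing record is returned rather than a new one created
- This gives at-least-once delivery without duplicate messages appearing in the conversation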
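The retry-plus-deduplication behaviour can be sketched as follows. MessageStore and send_with_retry are hypothetical names, the dictionary stands in for the storage-layer uniqueness check, and a TimeoutError stands in for a missing ACK.

```python
from typing import Callable, Dict, Tuple


class MessageStore:
    def __init__(self) -> None:
        self._by_id: Dict[str, dict] = {}

    def submit(self, message_id: str, payload: str) -> Tuple[dict, bool]:
        """Insert if absent. Returns (record, created)."""
        existing = self._by_id.get(message_id)
        if existing is not None:
            return existing, False  # duplicate submit: return the original
        record = {"message_id": message_id, "payload": payload, "status": "SENT"}
        self._by_id[message_id] = record
        return record, True

    def count(self) -> int:
        return len(self._by_id)


def send_with_retry(submit: Callable[[str, str], Tuple[dict, bool]],
                    message_id: str, payload: str,
                    max_attempts: int = 3) -> Tuple[dict, bool]:
    # The same message_id is reused on every attempt; that is what makes
    # the retries safe.
    for _ in range(max_attempts):
        try:
            return submit(message_id, payload)
        except TimeoutError:
            continue  # no ACK within the timeout: retry
    raise TimeoutError("no ACK after retries")
```

If the first write succeeds but its ACK is lost, the retry finds the existing record and the conversation still contains exactly one copy.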
Read Receipts
Read receipts allow senders to know when their message has been seen:
- When the recipient's client renders a message, it sends a read event: { conversation_id, last_read_message_id }
- The server updates the recipient's last_read_message_id in a conversation_members table
- The server broadcasts the read event to other conversation members so their UIs can display read receipts
- Batch reads: clients batch read events into a single update rather than sending one per message to reduce write amplification
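A sketch of the cursor approach is shown below. ReadCursors and batch_read_event are hypothetical names, and the dictionary stands in for the conversation_members table. The cursor only moves forward, so late or duplicate events are ignored.

```python
from typing import Dict, Iterable, Tuple


def batch_read_event(conversation_id: str, rendered_ids: Iterable[int]) -> dict:
    """Client side: collapse many rendered messages into one read event."""
    return {"conversation_id": conversation_id,
            "last_read_message_id": max(rendered_ids)}


class ReadCursors:
    def __init__(self) -> None:
        # (conversation_id, user_id) -> last_read_message_id
        self._cursors: Dict[Tuple[str, str], int] = {}

    def mark_read(self, conversation_id: str, user_id: str,
                  last_read_message_id: int) -> bool:
        """Advance the cursor. Returns True if a broadcast is warranted."""
        key = (conversation_id, user_id)
        if last_read_message_id <= self._cursors.get(key, 0):
            return False  # stale or duplicate event: nothing to write
        self._cursors[key] = last_read_message_id
        return True

    def is_read_by(self, conversation_id: str, user_id: str,
                   message_id: int) -> bool:
        return self._cursors.get((conversation_id, user_id), 0) >= message_id
```

Storing one cursor per member keeps receipt state proportional to the number of members rather than the number of messages.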
Group Chat Fan-Out
In 1:1 chat, delivery is simple. Group chat requires delivering one message to N members:
- Fan-out-on-write (small groups, <500 members): write one copy of the message to each member's delivery queue immediately. Simple, low read latency.
- Fan-out-on-read (large groups, >500 members): store one copy of the message, members fetch it when they open the conversation. Avoids write amplification for very large groups (e.g., broadcast channels).
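The choice between the two strategies can be sketched as a size check at write time. The function name, the in-memory inboxes and conversation log, and the treatment of a group of exactly 500 as "large" are assumptions for this example.

```python
from typing import Dict, List

FANOUT_WRITE_LIMIT = 500  # threshold from the text; exactly 500 is treated as large here


def deliver(message_id: int, conversation_id: str, sender_id: str,
            members: List[str],
            inboxes: Dict[str, List[int]],
            conversation_log: Dict[str, List[int]]) -> str:
    """Store the message once, then fan out on write only for small groups."""
    # The canonical copy is always stored once per conversation.
    conversation_log.setdefault(conversation_id, []).append(message_id)
    if len(members) < FANOUT_WRITE_LIMIT:
        # Fan-out-on-write: one entry per member's delivery queue.
        for user_id in members:
            if user_id != sender_id:
                inboxes.setdefault(user_id, []).append(message_id)
        return "write"
    # Fan-out-on-read: members pull from conversation_log when they open it.
    return "read"
```

With three members the message lands in two inboxes immediately; with 600 members nothing is written per member and readers fetch from the shared log instead.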
Message Editing, Deletion, Push Notifications, and E2E Encryption
Editing: store a new version of the message content with an edited_at timestamp. Keep edit history for audit purposes. Clients display the latest version with an “edited” indicator.
Soft deletion: set deleted_at on the message record. Clients display “This message was deleted.” No content is transmitted after deletion.
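Editing and soft deletion can be sketched together as a version list plus a deleted_at marker. StoredMessage, to_wire, and render are hypothetical names; the point is that history is retained server-side while the wire format carries no content once the message is deleted.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional, Tuple

DELETED_PLACEHOLDER = "This message was deleted."


@dataclass
class StoredMessage:
    message_id: int
    versions: List[Tuple[datetime, str]] = field(default_factory=list)
    deleted_at: Optional[datetime] = None

    @classmethod
    def create(cls, message_id: int, content: str) -> "StoredMessage":
        return cls(message_id, [(datetime.now(timezone.utc), content)])

    def edit(self, new_content: str) -> None:
        # Append a new version; earlier versions stay for audit purposes.
        self.versions.append((datetime.now(timezone.utc), new_content))

    def soft_delete(self) -> None:
        self.deleted_at = datetime.now(timezone.utc)

    def to_wire(self) -> dict:
        """What the server sends to clients."""
        if self.deleted_at is not None:
            # No content leaves the server after deletion.
            return {"message_id": self.message_id, "deleted": True, "content": None}
        _, content = self.versions[-1]
        return {"message_id": self.message_id, "deleted": False,
                "content": content, "edited": len(self.versions) > 1}


def render(wire: dict) -> str:
    """Client-side display text."""
    if wire["deleted"]:
        return DELETED_PLACEHOLDER
    return wire["content"] + (" (edited)" if wire["edited"] else "")
```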
Push notifications: for offline users, send an APNs (iOS) or FCM (Android) push notification with a truncated message preview. The notification wakes the app, which then fetches the full message history via the offline queue mechanism.
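End-to-end encryption: the sender encrypts message content on the device, using keys that only the recipient can derive, before transmitting to the server. The server stores only ciphertext and cannot read message content. The Signal protocol is a common choice: X3DH key agreement establishes the initial shared secret from the recipient's public keys, and the Double Ratchet algorithm derives a fresh key for each message, which provides forward secrecy.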