Designing an email system is a favorite interview question at Google (which built Gmail) and at companies that rely heavily on transactional email. It combines protocol knowledge, distributed storage, full-text search, and deliverability challenges at billion-user scale.
Email Protocols
| Protocol | Purpose | Port |
|---|---|---|
| SMTP | Sending and relaying email between servers | 25 (server-to-server), 587 (submission with auth) |
| IMAP | Client retrieves and manages mail on server (server stores master copy) | 993 (TLS) |
| POP3 | Client downloads and deletes mail from server (local copy) | 995 (TLS) |
Modern email clients (Gmail web, Outlook, Apple Mail) use IMAP or proprietary APIs (Gmail API, Microsoft Graph). SMTP is still the universal protocol for server-to-server delivery.
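To make the submission path (port 587 with authentication) concrete, here is a minimal sketch using Python's standard smtplib. The host name, credentials, and the helper names build_message and submit are placeholders for illustration, not part of any particular provider's setup.

```python
import smtplib
import ssl
from email.message import EmailMessage


def build_message(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Build a plain-text message with the standard headers set."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def submit(msg: EmailMessage, host: str, username: str, password: str, port: int = 587) -> None:
    """Submit a message over SMTP: connect, upgrade to TLS, authenticate, send."""
    context = ssl.create_default_context()
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        smtp.starttls(context=context)   # upgrade the plaintext connection to TLS
        smtp.login(username, password)   # submission requires authentication
        smtp.send_message(msg)
```

Calling submit(build_message(...), "smtp.example.com", user, password) would hand the message to the provider's outbound gateway, which then relays it server-to-server on port 25.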
High-Level Architecture
Sending:
[Client] → SMTP Submission (port 587, auth) → [Outbound SMTP Gateway]
↓
[DNS MX lookup → target server]
↓
[Target SMTP Server (inbound)]
Receiving:
[Inbound SMTP Gateway] → [Spam filter] → [Message Queue (Kafka)]
↓
[Storage Service] → [Object Store]
↓
[Search Indexer → Elasticsearch]
↓
[Client (IMAP / Gmail API)]
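The "DNS MX lookup" step in the sending path returns a set of mail exchangers with preference values, and the sender tries the lowest preference first. The sketch below shows only the ordering logic, under the assumption that the MX records were already fetched by a resolver; MXRecord and order_mx_hosts are illustrative names.

```python
import random
from typing import NamedTuple, Optional


class MXRecord(NamedTuple):
    preference: int  # lower value = tried first
    host: str


def order_mx_hosts(records: list[MXRecord], rng: Optional[random.Random] = None) -> list[str]:
    """Return hosts in delivery order: ascending preference, ties shuffled."""
    rng = rng or random.Random()
    by_preference: dict[int, list[str]] = {}
    for record in records:
        by_preference.setdefault(record.preference, []).append(record.host)

    ordered: list[str] = []
    for preference in sorted(by_preference):
        hosts = by_preference[preference]
        rng.shuffle(hosts)  # spread load across equally preferred hosts
        ordered.extend(hosts)
    return ordered
```

The outbound gateway would walk this list and fall back to the next host when a connection attempt fails.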
Message Storage Design
Database Schema
-- Mailboxes (one per user)
CREATE TABLE mailboxes (
    id          BIGINT PRIMARY KEY,           -- user_id
    email       VARCHAR(255) UNIQUE NOT NULL,
    quota_bytes BIGINT DEFAULT 16106127360,   -- 15 GB (15 * 1024^3); a literal avoids 32-bit overflow in the expression
    used_bytes  BIGINT DEFAULT 0
);
-- Threads (email conversations)
CREATE TABLE threads (
    id          BIGINT PRIMARY KEY,
    subject     TEXT NOT NULL,
    last_msg_at TIMESTAMP NOT NULL
);
-- Messages (metadata only; body stored in object store)
CREATE TABLE messages (
    id          BIGINT PRIMARY KEY,
    thread_id   BIGINT REFERENCES threads(id),
    mailbox_id  BIGINT REFERENCES mailboxes(id),
    message_id  VARCHAR(255),                 -- Message-ID header, used for thread lookup
    from_addr   VARCHAR(255) NOT NULL,
    to_addrs    TEXT NOT NULL,                -- JSON array
    subject     TEXT,
    body_key    VARCHAR(255),                 -- S3 object key
    size_bytes  INT NOT NULL,
    received_at TIMESTAMP NOT NULL,
    labels      TEXT DEFAULT '[]',            -- JSON array: [INBOX, UNREAD, STARRED]
    is_read     BOOLEAN DEFAULT FALSE,
    spam_score  FLOAT
);
CREATE INDEX idx_messages_mailbox_thread ON messages(mailbox_id, thread_id);
CREATE INDEX idx_messages_mailbox_time ON messages(mailbox_id, received_at DESC);
CREATE INDEX idx_messages_message_id ON messages(message_id);
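The (mailbox_id, received_at DESC) index exists to serve the most common query: "show the newest messages in this mailbox". A small sketch of that query with keyset pagination follows. It uses an in-memory SQLite database and a trimmed-down messages table purely so the example runs anywhere; open_demo_db and list_inbox are illustrative names, and timestamps are assumed to be ISO-8601 strings so they sort lexicographically.

```python
import sqlite3
from typing import Optional


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a simplified messages table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE messages (
               id INTEGER PRIMARY KEY,
               mailbox_id INTEGER NOT NULL,
               thread_id INTEGER NOT NULL,
               subject TEXT,
               received_at TEXT NOT NULL)"""
    )
    conn.execute(
        "CREATE INDEX idx_messages_mailbox_time ON messages(mailbox_id, received_at DESC)"
    )
    return conn


def list_inbox(conn: sqlite3.Connection, mailbox_id: int,
               before: Optional[str] = None, limit: int = 50) -> list[tuple]:
    """Return the newest messages for one mailbox, optionally older than `before`."""
    sql = "SELECT id, subject, received_at FROM messages WHERE mailbox_id = ?"
    params: list = [mailbox_id]
    if before is not None:
        # Keyset pagination: continue from the last timestamp already shown.
        sql += " AND received_at < ?"
        params.append(before)
    sql += " ORDER BY received_at DESC LIMIT ?"
    params.append(limit)
    return conn.execute(sql, params).fetchall()
```

Paginating on received_at alone can skip messages that share a timestamp; a production query would typically break ties on (received_at, id).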
Body Storage
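Email bodies range from about 1 KB to 25 MB (with attachments). Store the raw MIME in object storage (S3/GCS) and keep the object key in the messages table. Use content-addressable storage (the SHA-256 hash of the content as the key) to deduplicate identical content sent to multiple recipients in the same domain. Note that hashing the whole raw message only deduplicates byte-identical copies; each recipient's copy usually carries different headers, so deduplicating attachments requires hashing each attachment part separately.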
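import hashlib

# `s3` is assumed to be a thin object-store client exposing object_exists() and put().
def store_message_body(raw_mime: bytes) -> str:
    content_hash = hashlib.sha256(raw_mime).hexdigest()
    # Prefix with the first two hex characters to spread keys across partitions
    key = f"messages/{content_hash[:2]}/{content_hash}"
    if not s3.object_exists(key):
        s3.put(key, raw_mime, content_type="message/rfc822")
    return key  # store this key in the messages table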
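Deduplicating attachments, as opposed to whole messages, means hashing each attachment part on its own. The sketch below uses Python's standard email package to walk the attachments and a plain dict as a stand-in for the object store; store_attachments and the key layout are assumptions for illustration.

```python
import hashlib
from email import message_from_bytes, policy


def store_attachments(raw_mime: bytes, blob_store: dict[str, bytes]) -> list[str]:
    """Store each attachment under its SHA-256 hash and return the keys."""
    msg = message_from_bytes(raw_mime, policy=policy.default)
    keys: list[str] = []
    for part in msg.iter_attachments():
        payload = part.get_payload(decode=True) or b""  # decoded attachment bytes
        digest = hashlib.sha256(payload).hexdigest()
        key = f"attachments/{digest[:2]}/{digest}"
        blob_store.setdefault(key, payload)  # write only if the key is new
        keys.append(key)
    return keys
```

Two messages with different recipients but the same attached file produce the same key, so the attachment bytes are stored once while each message row keeps its own reference.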
Inbox Zero: Thread Grouping
Gmail groups messages into conversations (threads) using the References and In-Reply-To message headers. Two messages belong to the same thread if one refers to the other's Message-ID, either directly or through a chain of replies:
def assign_thread(msg) -> int:
    # Check if this is a reply to an existing thread
    in_reply_to = msg.headers.get("In-Reply-To")
    references = msg.headers.get("References", "").split()
    # Look for any referenced message in our DB
    for msg_id in ([in_reply_to] + references):
        if msg_id:
            parent = db.query(
                "SELECT thread_id FROM messages WHERE message_id = %s", msg_id
            )
            if parent:
                return parent.thread_id
    # No parent found: create a new thread
    return db.insert_thread(subject=msg.subject)
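The same lookup can be exercised without a database. The sketch below keeps the Message-ID to thread mapping in a dict, which is enough to see how a reply chain collapses into one thread; ThreadIndex is an illustrative class, and in practice there would be one such mapping per mailbox.

```python
from typing import Optional


class ThreadIndex:
    """In-memory stand-in for the message_id -> thread_id lookup."""

    def __init__(self) -> None:
        self._thread_of: dict[str, int] = {}
        self._next_thread_id = 1

    def assign(self, message_id: str, in_reply_to: Optional[str] = None,
               references: str = "") -> int:
        # Candidate parents: In-Reply-To first, then every ID in References.
        candidates = ([in_reply_to] if in_reply_to else []) + references.split()
        thread_id = None
        for parent_id in candidates:
            if parent_id in self._thread_of:
                thread_id = self._thread_of[parent_id]
                break
        if thread_id is None:
            # No known parent: start a new thread.
            thread_id = self._next_thread_id
            self._next_thread_id += 1
        self._thread_of[message_id] = thread_id
        return thread_id
```

A reply whose direct parent was never delivered to this mailbox can still join the thread as long as any earlier message in its References header is known.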
Full-Text Search
Gmail users search billions of messages. Traditional B-tree indexes are insufficient for full-text search across subjects, bodies, and senders.
# Elasticsearch index for email search
PUT /emails
{
  "mappings": {
    "properties": {
      "mailbox_id":  { "type": "keyword" },
      "from_addr":   { "type": "keyword" },
      "to_addrs":    { "type": "keyword" },
      "subject":     { "type": "text", "analyzer": "english" },
      "body":        { "type": "text", "analyzer": "english" },
      "received_at": { "type": "date" },
      "labels":      { "type": "keyword" },
      "is_read":     { "type": "boolean" }
    }
  }
}
# Search: all unread emails from Alice about Q4 report
GET /emails/_search
{
  "query": {
    "bool": {
      "must": [
        { "term": { "mailbox_id": "user_123" } },
        { "term": { "is_read": false } },
        { "term": { "from_addr": "alice@company.com" } },
        { "match": { "subject": "Q4 report" } }
      ]
    }
  },
  "sort": [{ "received_at": "desc" }]
}
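Underneath a search engine like Elasticsearch sits an inverted index: a map from each term to the set of documents containing it. The toy version below shows why this beats scanning every message. It is a sketch only: there is no stemming, ranking, or persistence, multi-word queries are treated as AND (Elasticsearch's match query defaults to OR), and InvertedIndex is an illustrative name.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Maps each lowercase token to the set of message IDs that contain it."""

    def __init__(self) -> None:
        self._postings: dict[str, set[int]] = defaultdict(set)

    @staticmethod
    def _tokens(text: str) -> list[str]:
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, doc_id: int, text: str) -> None:
        for token in self._tokens(text):
            self._postings[token].add(doc_id)

    def search(self, query: str) -> set[int]:
        """Return IDs of documents containing every term in the query."""
        terms = self._tokens(query)
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result
```

Lookup cost depends on the size of the posting lists for the query terms rather than on the total number of messages in the mailbox.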
Spam Filtering Pipeline
Spam filtering happens before the message is stored:
def spam_pipeline(message) -> tuple[bool, float]:
    score = 0.0
    # Layer 1: IP reputation (fast, external blocklist)
    if ip_blocklist.contains(message.sender_ip):
        return True, 1.0  # immediate block
    # Layer 2: Authentication checks (SPF and DKIM; DMARC policy builds on these results)
    if not verify_spf(message):
        score += 0.3
    if not verify_dkim(message):
        score += 0.3
    # Layer 3: Content analysis (ML model)
    features = extract_features(message)
    ml_score = spam_classifier.predict(features)  # LightGBM or BERT
    score = max(score, ml_score)
    # Layer 4: User-level feedback
    user_spam_rate = get_user_feedback_rate(message.from_addr, message.to_addrs)
    score = score * 0.7 + user_spam_rate * 0.3
    return score > 0.8, score

# SPF: DNS lookup to verify sending IP is authorized for sender domain
# DKIM: cryptographic signature in email header verified against DNS public key
# DMARC: policy for what to do when SPF/DKIM fail
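DMARC adds one more condition on top of SPF and DKIM: the domain that passed must align with the domain in the visible From header. The sketch below shows that decision, assuming the SPF and DKIM results were computed elsewhere. AuthResult and dmarc_passes are illustrative names, and the relaxed-alignment check is a deliberate simplification: real implementations compare organizational domains using the Public Suffix List.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthResult:
    spf_pass: bool
    spf_domain: str   # domain of the envelope sender (MAIL FROM)
    dkim_pass: bool
    dkim_domain: str  # d= tag of the DKIM signature


def _aligned(auth_domain: str, from_domain: str, strict: bool = False) -> bool:
    a = auth_domain.lower().rstrip(".")
    f = from_domain.lower().rstrip(".")
    if a == f:
        return True
    if strict:
        return False
    # Simplified relaxed alignment: one domain is a subdomain of the other.
    return a.endswith("." + f) or f.endswith("." + a)


def dmarc_passes(from_domain: str, auth: AuthResult, strict: bool = False) -> bool:
    """DMARC passes if SPF or DKIM both passes and aligns with the From domain."""
    spf_ok = auth.spf_pass and _aligned(auth.spf_domain, from_domain, strict)
    dkim_ok = auth.dkim_pass and _aligned(auth.dkim_domain, from_domain, strict)
    return spf_ok or dkim_ok
```

This is why a message can pass SPF for the sending service's own bounce domain and still fail DMARC for the brand shown in the From header.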
Scaling to Billions of Mailboxes
Storage Sharding
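Shard the messages table by mailbox_id. All messages for a user land on the same shard, enabling efficient single-user queries without cross-shard joins. Gmail runs on Google's internal infrastructure, commonly described as Colossus (distributed file system) for raw storage with a distributed database such as Bigtable for metadata; the exact internals are not fully public.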
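Routing by mailbox_id can be as simple as hashing the ID and taking it modulo the shard count. The function below is a minimal sketch of that idea; shard_for_mailbox is an illustrative name and the fixed shard count is an assumption.

```python
import hashlib


def shard_for_mailbox(mailbox_id: int, num_shards: int) -> int:
    """Map a mailbox to a shard deterministically using a stable hash."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(str(mailbox_id).encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce to a shard index.
    return int.from_bytes(digest[:8], "big") % num_shards
```

Plain modulo hashing moves most mailboxes when the shard count changes, so large deployments tend to use consistent hashing or a directory service that maps mailboxes to shards explicitly.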
Outbound Email Deliverability
- IP warming: gradually increase sending volume from new IPs to build reputation
- Dedicated IP pools per customer type (transactional vs marketing) — marketing spam does not hurt transactional reputation
- Bounce handling: remove hard-bounced addresses from send lists immediately; too many bounces harm sender score
- Unsubscribe: one-click unsubscribe required by Gmail and Yahoo (2024) for bulk senders
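One-click unsubscribe is signaled with two headers defined in RFC 8058: List-Unsubscribe carrying an HTTPS URL, and List-Unsubscribe-Post with the fixed value List-Unsubscribe=One-Click. A minimal sketch of adding them with Python's standard email package follows; the URL and the helper name add_one_click_unsubscribe are placeholders.

```python
from email.message import EmailMessage
from typing import Optional


def add_one_click_unsubscribe(msg: EmailMessage, unsubscribe_url: str,
                              mailto: Optional[str] = None) -> EmailMessage:
    """Attach the RFC 8058 one-click unsubscribe headers to a message."""
    targets = [f"<{unsubscribe_url}>"]
    if mailto:
        targets.append(f"<mailto:{mailto}>")  # optional mailto fallback
    msg["List-Unsubscribe"] = ", ".join(targets)
    msg["List-Unsubscribe-Post"] = "List-Unsubscribe=One-Click"
    return msg
```

The receiving mailbox provider sends an HTTP POST to the URL when the user clicks unsubscribe, so the endpoint must act on that request without requiring a login or a confirmation page. The headers should also be covered by the message's DKIM signature.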
Interview Discussion Points
- How do you handle attachment deduplication? Content-addressable storage (SHA-256 hash as key) — identical PDFs sent to a million users stored once
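- How does Gmail provide fast search over 15 years of email? A pre-built inverted index (Elasticsearch in this design; Gmail uses Google's own search infrastructure) answers queries without scanning messages, and the index is segmented per user so each query only touches one user's data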
- How do you handle the “Mark as spam” training loop? User spam feedback is aggregated per sender, per domain, and fed back as features into the spam classifier training pipeline
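The "Mark as spam" loop can be sketched as a pair of counters per sender and per domain that produce a smoothed spam rate for the classifier to use as a feature. SenderFeedback and the prior values below are assumptions chosen for illustration; the smoothing keeps a sender with a handful of reports from jumping straight to an extreme rate.

```python
from collections import Counter


class SenderFeedback:
    """Aggregates user spam / not-spam reports per sender and per domain."""

    def __init__(self, prior_spam: float = 1.0, prior_total: float = 20.0) -> None:
        self._spam: Counter = Counter()
        self._total: Counter = Counter()
        self._prior_spam = prior_spam
        self._prior_total = prior_total

    def record(self, sender: str, marked_spam: bool) -> None:
        sender = sender.lower()
        domain = sender.rsplit("@", 1)[-1]
        for key in (sender, domain):
            self._total[key] += 1
            if marked_spam:
                self._spam[key] += 1

    def spam_rate(self, key: str) -> float:
        """Smoothed spam rate for a sender address or a domain."""
        key = key.lower()
        return (self._spam[key] + self._prior_spam) / (self._total[key] + self._prior_total)
```

These rates would be exported periodically as features for retraining, alongside the content features the classifier already uses.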
Frequently Asked Questions
What protocols does an email system use?
Three protocols handle email: SMTP (Simple Mail Transfer Protocol) sends email between servers (port 25) and handles client submission (port 587 with authentication). IMAP (Internet Message Access Protocol, port 993 with TLS) allows clients to retrieve and manage email while keeping the master copy on the server — supporting multiple devices synced to the same mailbox. POP3 (Post Office Protocol 3, port 995) downloads email to the local client and typically deletes it from the server — one device only. Modern email clients use IMAP or proprietary APIs (Gmail API). SMTP is the universal inter-server protocol.
How does spam filtering work in an email system?
Production spam filtering uses multiple layers. First, IP reputation checks against blocklists reject mail from known spam sources before any content analysis. Second, authentication checks verify that the sending IP is authorized to send for the sender domain (SPF — Sender Policy Framework), that the message has a valid cryptographic signature (DKIM — DomainKeys Identified Mail), and that the domain has a policy for handling authentication failures (DMARC). Third, ML-based content analysis extracts features from the subject, body, links, and HTML structure and scores the message with a trained classifier (LightGBM or BERT-based). Fourth, user-level feedback (mark as spam/not spam) is aggregated per sender and incorporated into personalized filtering.
How does Gmail scale to billions of mailboxes?
A Gmail-scale system uses a tiered storage architecture: message metadata (sender, recipient, timestamps, labels, thread ID) in a horizontally sharded database (sharded by mailbox/user ID so all of one user's data lives on the same shard), message bodies and attachments in a distributed blob store (Colossus at Google, S3 or GCS elsewhere) referenced by content-hash keys for deduplication, and full-text search in a dedicated search index. User data is isolated per shard, so a search query only touches the relevant shard. Content-addressable storage deduplicates identical attachments across recipients. In the design above, Elasticsearch provides near-real-time indexing, so new mail becomes searchable within about a second; Gmail itself uses Google's internal search infrastructure rather than Elasticsearch.