System Design Interview: Email System at Gmail Scale


Designing an email system is a favorite interview question at Google (which built Gmail) and at companies that rely heavily on transactional email. It combines protocol knowledge, distributed storage, full-text search, and deliverability challenges at billion-user scale.

Email Protocols

Protocol	Purpose	Port
SMTP	Sending and relaying email between servers	25 (server-to-server), 587 (submission with auth)
IMAP	Client retrieves and manages mail on server (server stores master copy)	993 (TLS)
POP3	Client downloads and deletes mail from server (local copy)	995 (TLS)

Modern email clients (Gmail web, Outlook, Apple Mail) use IMAP or proprietary APIs (Gmail API, Microsoft Graph). SMTP is still the universal protocol for server-to-server delivery.
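Client submission on port 587 can be sketched with Python's standard library. This is a minimal sketch, assuming a hypothetical submission host and credentials: the client opens a plain connection, upgrades it with STARTTLS, authenticates, and hands the message over. The receiving submission server then relays it onward over port 25.

```python
import smtplib
import ssl
from email.message import EmailMessage


def build_message(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Build a plain-text RFC 5322 message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def submit(msg: EmailMessage, host: str, username: str, password: str) -> None:
    """Submit a message on port 587: upgrade with STARTTLS, then authenticate."""
    context = ssl.create_default_context()  # verifies the server certificate
    with smtplib.SMTP(host, 587, timeout=30) as smtp:
        smtp.starttls(context=context)
        smtp.login(username, password)   # AUTH only after the channel is encrypted
        smtp.send_message(msg)           # envelope sender/recipients taken from headers
```

A call would look like `submit(build_message("me@example.com", "you@example.org", "Hi", "Hello"), "smtp.example.com", "me@example.com", "app-password")`, where every host and credential is a placeholder.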

High-Level Architecture

Sending:
  [Client] → SMTP Submission (port 587, auth) → [Outbound SMTP Gateway]
                                                          ↓
                                              [DNS MX lookup → target server]
                                                          ↓
                                              [Target SMTP Server (inbound)]

Receiving:
  [Inbound SMTP Gateway] → [Spam filter] → [Message Queue (Kafka)]
                                                    ↓
                                          [Storage Service] → [Object Store]
                                                    ↓
                                          [Search Indexer → Elasticsearch]
                                                    ↓
                                          [Client (IMAP / Gmail API)]
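The DNS MX lookup step in the sending path has ordering rules that are easy to get wrong. A minimal sketch, assuming the MX records were already resolved (for example by a DNS library) into `(preference, host)` pairs: lower preference values are tried first, equal-preference hosts are shuffled to spread load (RFC 5321), a domain with no MX records falls back to its own address record (the RFC 5321 implicit MX), and a lone `0 .` record is a null MX meaning the domain accepts no mail (RFC 7505). The function name `delivery_order` is hypothetical.

```python
import random
from typing import Optional


def delivery_order(mx_records: list[tuple[int, str]], domain: str,
                   rng: Optional[random.Random] = None) -> list[str]:
    """Return hosts to try, in order, given (preference, host) MX pairs."""
    if mx_records == [(0, ".")]:
        return []          # RFC 7505 null MX: the domain accepts no mail
    if not mx_records:
        return [domain]    # RFC 5321 implicit MX: use the domain's A/AAAA record
    rng = rng or random.Random()
    by_pref: dict[int, list[str]] = {}
    for pref, host in mx_records:
        by_pref.setdefault(pref, []).append(host.rstrip("."))
    ordered: list[str] = []
    for pref in sorted(by_pref):       # lower preference value = tried first
        hosts = by_pref[pref]
        rng.shuffle(hosts)             # randomize equal-preference hosts
        ordered.extend(hosts)
    return ordered
```

The outbound gateway would walk this list, moving to the next host on connection failures or 4xx replies and queueing the message for retry if every host fails.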

Message Storage Design

Database Schema

-- Mailboxes (one per user; folders are modeled as labels on messages)
CREATE TABLE mailboxes (
    id          BIGINT PRIMARY KEY,  -- user_id
    email       VARCHAR(255) UNIQUE NOT NULL,
    quota_bytes BIGINT DEFAULT 16106127360,  -- 15 GiB = 15 * 1024^3; a literal avoids 32-bit integer overflow
    used_bytes  BIGINT DEFAULT 0
);

-- Threads (email conversations), each owned by one mailbox
CREATE TABLE threads (
    id          BIGINT PRIMARY KEY,
    mailbox_id  BIGINT REFERENCES mailboxes(id),
    subject     TEXT NOT NULL,
    last_msg_at TIMESTAMP NOT NULL
);

-- Messages (metadata only; the body is stored in the object store)
CREATE TABLE messages (
    id           BIGINT PRIMARY KEY,
    thread_id    BIGINT REFERENCES threads(id),
    mailbox_id   BIGINT REFERENCES mailboxes(id),
    message_id   TEXT NOT NULL,        -- RFC 5322 Message-ID header, used for threading
    from_addr    VARCHAR(255) NOT NULL,
    to_addrs     TEXT NOT NULL,        -- JSON array
    subject      TEXT,
    body_key     VARCHAR(255),         -- object store (e.g. S3) key
    size_bytes   INT NOT NULL,
    received_at  TIMESTAMP NOT NULL,
    labels       TEXT DEFAULT '[]',    -- JSON array, e.g. ["INBOX", "STARRED"]
    is_read      BOOLEAN DEFAULT FALSE,
    spam_score   FLOAT
);

CREATE INDEX idx_messages_mailbox_thread ON messages(mailbox_id, thread_id);
CREATE INDEX idx_messages_mailbox_time ON messages(mailbox_id, received_at DESC);
CREATE INDEX idx_messages_mailbox_msgid ON messages(mailbox_id, message_id);
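The `quota_bytes` and `used_bytes` columns need an atomic check before a message is accepted. A minimal sketch, using in-memory SQLite as a stand-in for the production database (the helper names `reserve_quota` and `demo_db` are hypothetical): one conditional `UPDATE` both checks and increments, so two concurrent deliveries cannot both read "enough space" and together overshoot the limit.

```python
import sqlite3


def reserve_quota(conn: sqlite3.Connection, mailbox_id: int, size_bytes: int) -> bool:
    """Charge size_bytes to a mailbox only if it stays within quota."""
    cur = conn.execute(
        "UPDATE mailboxes SET used_bytes = used_bytes + ? "
        "WHERE id = ? AND used_bytes + ? <= quota_bytes",
        (size_bytes, mailbox_id, size_bytes),
    )
    conn.commit()
    return cur.rowcount == 1   # 0 rows updated: over quota (or unknown mailbox)


def demo_db() -> sqlite3.Connection:
    """In-memory stand-in for the mailboxes table (SQLite types, same columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mailboxes (id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL, "
        "quota_bytes INTEGER DEFAULT 16106127360, used_bytes INTEGER DEFAULT 0)"
    )
    return conn


if __name__ == "__main__":
    conn = demo_db()
    conn.execute("INSERT INTO mailboxes (id, email, quota_bytes) VALUES (1, 'a@example.com', 100)")
    print(reserve_quota(conn, 1, 60))   # first 60 bytes fit
    print(reserve_quota(conn, 1, 60))   # 60 + 60 exceeds the 100-byte quota
```

A rejected reservation would translate into an SMTP 552 (exceeded storage allocation) reply to the sending server.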

Body Storage

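Email bodies range from about 1KB to 25MB (with attachments). Store the raw MIME in object storage (S3/GCS) and keep its key in the messages table. Content-addressable storage (the SHA-256 hash of the content as the key) deduplicates identical content, but hashing the whole message only deduplicates byte-identical copies. Because per-recipient trace headers (Received, for example) often differ, deduplicating an attachment sent to many recipients on the same system requires hashing each attachment part separately.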

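import hashlib

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
BUCKET = "mail-bodies"  # illustrative bucket name

def store_message_body(raw_mime: bytes) -> str:
    content_hash = hashlib.sha256(raw_mime).hexdigest()
    key = f"messages/{content_hash[:2]}/{content_hash}"  # 2-char prefix spreads keys out

    try:
        s3.head_object(Bucket=BUCKET, Key=key)   # already stored: dedup hit, skip upload
    except ClientError as err:
        if err.response["Error"]["Code"] != "404":
            raise
        s3.put_object(Bucket=BUCKET, Key=key, Body=raw_mime, ContentType="message/rfc822")

    return key   # store this key in messages.body_key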
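Per-attachment deduplication can be sketched with the standard `email` package. This is a minimal sketch, assuming attachments are the MIME parts marked `Content-Disposition: attachment` (the function names and the `attachments/` key prefix are hypothetical): each part is decoded from base64 or quoted-printable and hashed on its own, so the same PDF sent to two recipients maps to one key even though the full messages differ.

```python
import hashlib
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser


def split_attachments(raw_mime: bytes) -> list[tuple[str, bytes]]:
    """Return (content-addressed key, decoded bytes) for every attachment part."""
    msg = BytesParser(policy=policy.default).parsebytes(raw_mime)
    found = []
    for part in msg.walk():                       # also walks nested multiparts
        if part.get_content_disposition() != "attachment":
            continue
        data = part.get_payload(decode=True)      # undo base64 / quoted-printable
        digest = hashlib.sha256(data).hexdigest()
        found.append((f"attachments/{digest[:2]}/{digest}", data))
    return found


def demo_message(to_addr: str) -> bytes:
    """Same PDF, different recipient: the raw MIME bytes differ."""
    msg = EmailMessage()
    msg["From"] = "cfo@example.com"
    msg["To"] = to_addr
    msg["Subject"] = "Q4 report"
    msg.set_content("Report attached.")
    msg.add_attachment(b"%PDF-1.7 demo bytes", maintype="application",
                       subtype="pdf", filename="q4.pdf")
    return msg.as_bytes()
```

The stored message would then keep the MIME skeleton plus references to these keys, and each attachment is uploaded only when its key is not already present.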

Inbox Zero: Thread Grouping

Gmail groups messages into conversations (threads) using the In-Reply-To and References headers defined in RFC 5322. Two messages belong to the same thread if one appears in the other's Message-ID chain:

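def assign_thread(db, mailbox_id: int, msg) -> int:
    """Return the thread ID for an incoming message in one mailbox.

    `msg` is an email.message.EmailMessage; `db` is a hypothetical data-access
    wrapper exposing query_one() and insert_thread().
    """
    # In-Reply-To names the direct parent; References lists earlier ancestors
    candidates = msg.get("In-Reply-To", "").split() + msg.get("References", "").split()

    # Look for any referenced message already stored in THIS mailbox; without the
    # mailbox filter, another user's copy of the same Message-ID would merge threads
    for msg_id in candidates:
        parent = db.query_one(
            "SELECT thread_id FROM messages WHERE mailbox_id = %s AND message_id = %s",
            (mailbox_id, msg_id),
        )
        if parent:
            return parent.thread_id

    # No parent found: create a new thread owned by this mailbox
    return db.insert_thread(mailbox_id=mailbox_id, subject=msg.get("Subject", ""))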
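The same chain rule can be exercised without a database. A minimal in-memory sketch, assuming messages arrive as dicts with hypothetical `message_id`, `in_reply_to`, and `references` keys, processed in arrival order for a single mailbox: the dict `thread_of` plays the role of the `(mailbox_id, message_id)` index.

```python
def thread_messages(messages: list[dict]) -> dict[str, int]:
    """Assign thread IDs within one mailbox from In-Reply-To/References chains."""
    thread_of: dict[str, int] = {}   # Message-ID -> thread ID (per-mailbox index)
    next_thread = 1
    for msg in messages:
        parents = [msg.get("in_reply_to") or ""] + (msg.get("references") or "").split()
        # Join the thread of the first ancestor we have already seen
        thread_id = next((thread_of[p] for p in parents if p in thread_of), None)
        if thread_id is None:
            thread_id = next_thread    # no known ancestor: start a new thread
            next_thread += 1
        thread_of[msg["message_id"]] = thread_id
    return thread_of
```

One limitation carries over to the database version: a reply that arrives before its parent starts its own thread, so production systems typically also merge threads when a late-arriving parent links them.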

Full-Text Search

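Gmail users search billions of messages. A B-tree index supports exact-match and prefix lookups on a column, but it cannot find a word in the middle of a subject or body. Full-text search across subjects, bodies, and senders needs an inverted index, which maps each term to the set of messages that contain it.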
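The inverted-index idea fits in a few lines. This toy sketch (the class `MailboxIndex` is hypothetical, and it does no stemming, unlike the `english` analyzer used below) keys postings by `(mailbox_id, term)` so one user's query never touches another user's postings, and answers multi-word queries by intersecting posting sets.

```python
import re
from collections import defaultdict


class MailboxIndex:
    """Toy per-mailbox inverted index: (mailbox_id, term) -> message IDs."""

    def __init__(self) -> None:
        self.postings: dict[tuple[int, str], set[int]] = defaultdict(set)

    @staticmethod
    def tokenize(text: str) -> list[str]:
        # Lowercase alphanumeric runs; real analyzers also stem and drop stopwords
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, mailbox_id: int, message_id: int, text: str) -> None:
        for term in self.tokenize(text):
            self.postings[(mailbox_id, term)].add(message_id)

    def search(self, mailbox_id: int, query: str) -> set[int]:
        """Return messages containing every query term (AND semantics)."""
        terms = self.tokenize(query)
        if not terms:
            return set()
        sets = [self.postings.get((mailbox_id, t), set()) for t in terms]
        return set.intersection(*sets)   # returns a new set, index stays untouched
```

Lookups cost time proportional to the posting-list sizes rather than the number of messages in the mailbox, which is what makes search over years of mail practical.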

# Elasticsearch index for email search
PUT /emails
{
  "mappings": {
    "properties": {
      "mailbox_id": { "type": "keyword" },
      "from_addr":  { "type": "keyword" },
      "to_addrs":   { "type": "keyword" },
      "subject":    { "type": "text", "analyzer": "english" },
      "body":       { "type": "text", "analyzer": "english" },
      "received_at":{ "type": "date" },
      "labels":     { "type": "keyword" },
      "is_read":    { "type": "boolean" }
    }
  }
}

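# Search: all unread emails from Alice about the Q4 report.
# Exact-match conditions go in "filter" (not scored, cacheable); only the full-text
# clause goes in "must". Routing by mailbox keeps each user's documents on one shard
# (documents must be indexed with the same routing value).
GET /emails/_search?routing=123
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "mailbox_id": "123" } },
        { "term": { "is_read": false } },
        { "term": { "from_addr": "alice@company.com" } }
      ],
      "must": [
        { "match": { "subject": "Q4 report" } }
      ]
    }
  },
  "sort": [{ "received_at": "desc" }]
}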
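A search box usually accepts Gmail-style operators rather than raw JSON. A minimal sketch, assuming a small illustrative operator subset (`from:`, `to:`, `label:`, `is:unread`, `is:read`) and assuming addresses are indexed lowercase and labels uppercase: operators become `filter` clauses and the remaining words become a full-text `multi_match` over subject and body. The function name `build_es_query` is hypothetical.

```python
def build_es_query(mailbox_id: str, search: str) -> dict:
    """Translate a Gmail-style search string into an Elasticsearch bool query."""
    filters: list[dict] = [{"term": {"mailbox_id": mailbox_id}}]
    words: list[str] = []
    for token in search.split():
        key, sep, value = token.partition(":")
        if sep and key == "from":
            filters.append({"term": {"from_addr": value.lower()}})
        elif sep and key == "to":
            filters.append({"term": {"to_addrs": value.lower()}})
        elif sep and key == "label":
            filters.append({"term": {"labels": value.upper()}})
        elif token == "is:unread":
            filters.append({"term": {"is_read": False}})
        elif token == "is:read":
            filters.append({"term": {"is_read": True}})
        else:
            words.append(token)            # free text: scored full-text match
    query: dict = {"bool": {"filter": filters}}
    if words:
        query["bool"]["must"] = [
            {"multi_match": {"query": " ".join(words), "fields": ["subject", "body"]}}
        ]
    return {"query": query, "sort": [{"received_at": "desc"}]}
```

For example, `build_es_query("123", "from:alice@company.com is:unread Q4 report")` produces the same filters as the hand-written query above, while also matching "Q4 report" in the body.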

Spam Filtering Pipeline

Spam filtering happens before the message is stored:

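def spam_pipeline(message) -> tuple[bool, float]:
    """Return (is_spam, score). ip_blocklist, verify_*, extract_features,
    spam_classifier and get_user_feedback_rate stand in for real services."""
    score = 0.0

    # Layer 1: IP reputation (fast, external blocklist)
    if ip_blocklist.contains(message.sender_ip):
        return True, 1.0   # immediate block

    # Layer 2: Authentication checks (SPF, DKIM, DMARC); weights are illustrative
    if not verify_spf(message):
        score += 0.3
    if not verify_dkim(message):
        score += 0.3
    if not verify_dmarc(message):
        score += 0.2

    # Layer 3: Content analysis (ML model): P(spam) from a classifier with a
    # scikit-learn-style API (e.g. LightGBM's LGBMClassifier) or a BERT model
    features = extract_features(message)
    ml_score = spam_classifier.predict_proba([features])[0][1]
    score = max(score, ml_score)

    # Layer 4: User-level feedback
    user_spam_rate = get_user_feedback_rate(message.from_addr, message.to_addrs)
    score = score * 0.7 + user_spam_rate * 0.3

    # With no user feedback the blended score tops out at 0.7, so the threshold
    # must sit below 0.7 or mail from a brand-new spam sender could never be flagged
    return score > 0.6, score

# SPF: DNS TXT lookup to verify the sending IP is authorized for the envelope sender's domain
# DKIM: cryptographic signature in a message header, verified against a public key published in DNS
# DMARC: the From: domain's published policy (none, quarantine, reject), applied when
#        neither SPF nor DKIM passes for a domain aligned with the From: address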
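The core of the SPF check is matching the connecting IP against the networks a domain publishes. A deliberately simplified sketch, assuming the TXT record has already been fetched: it evaluates only `ip4`, `ip6`, and `all` with their qualifiers, and ignores `include`, `a`, `mx`, `redirect`, and macros, so it is not a full RFC 7208 evaluator. The function name `spf_ip_result` is hypothetical.

```python
import ipaddress


def spf_ip_result(spf_record: str, sender_ip: str) -> str:
    """Evaluate ip4/ip6/all mechanisms of an SPF record for one sender IP."""
    ip = ipaddress.ip_address(sender_ip)
    qualifiers = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}
    terms = spf_record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"                          # not an SPF record
    for term in terms[1:]:
        has_qualifier = term[0] in qualifiers
        qualifier = term[0] if has_qualifier else "+"   # default qualifier is pass
        mechanism = term[1:] if has_qualifier else term
        name, _, value = mechanism.partition(":")
        name = name.lower()
        if name in ("ip4", "ip6"):
            network = ipaddress.ip_network(value, strict=False)
            if ip.version == network.version and ip in network:
                return qualifiers[qualifier]
        elif name == "all":
            return qualifiers[qualifier]       # catch-all ends evaluation
    return "neutral"                           # nothing matched (RFC 7208 default)
```

A record such as `v=spf1 ip4:192.0.2.0/24 -all` then yields `pass` for 192.0.2.55 and `fail` for any other IPv4 sender.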

Scaling to Billions of Mailboxes

Storage Sharding

Shard the messages table by mailbox_id. All messages for a user land on the same shard, enabling efficient single-user queries without cross-shard joins. Gmail uses Colossus (distributed file system) for raw storage and Bigtable for metadata.
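Routing a `mailbox_id` to a shard with `mailbox_id % N` works until a shard is added, at which point almost every mailbox changes shards. A common alternative is consistent hashing; the sketch below is a generic illustration (not a description of Gmail's placement) with hypothetical shard names and an assumed 100 virtual nodes per shard to even out load.

```python
import bisect
import hashlib


class ShardRing:
    """Consistent-hash ring mapping mailbox IDs to shards, with virtual nodes."""

    def __init__(self, shards: list[str], vnodes: int = 100) -> None:
        self.ring: list[tuple[int, str]] = sorted(
            (self._hash(f"{shard}#{i}"), shard) for shard in shards for i in range(vnodes)
        )
        self.points = [point for point, _ in self.ring]

    @staticmethod
    def _hash(key: str) -> int:
        # First 8 bytes of SHA-256 as an unsigned integer position on the ring
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def shard_for(self, mailbox_id: int) -> str:
        # First ring point clockwise from the key's hash, wrapping at the end
        idx = bisect.bisect(self.points, self._hash(str(mailbox_id))) % len(self.points)
        return self.ring[idx][1]
```

When a fourth shard joins, only the mailboxes whose ring segment it takes over move (roughly a quarter), and every one of them moves to the new shard, so rebalancing copies a bounded amount of data.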

Outbound Email Deliverability

IP warming: gradually increase sending volume from new IPs to build reputation
Dedicated IP pools per mail type (transactional vs marketing), so complaints about marketing mail do not damage the reputation of the IPs that send transactional mail
Bounce handling: remove hard-bounced addresses from send lists immediately; too many bounces harm sender score
Unsubscribe: one-click unsubscribe (RFC 8058, via the List-Unsubscribe-Post header) has been required by Gmail and Yahoo for bulk senders since 2024
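Two of these practices map directly to code. A minimal sketch, with hypothetical function names and an in-memory suppression set standing in for a real store: SMTP 5xx replies are permanent failures (hard bounces) and 4xx replies are transient (soft bounces), and one-click unsubscribe is signalled with the RFC 8058 header pair. RFC 8058 also requires both headers to be covered by the message's DKIM signature, which this sketch does not do.

```python
from email.message import EmailMessage


def classify_bounce(smtp_reply_code: int) -> str:
    """5xx replies are permanent failures (hard); 4xx are transient (soft)."""
    if 500 <= smtp_reply_code <= 599:
        return "hard"
    if 400 <= smtp_reply_code <= 499:
        return "soft"
    return "none"


def handle_bounce(suppression: set[str], recipient: str, smtp_reply_code: int) -> None:
    # Hard bounces are suppressed immediately; soft bounces are left to retry logic
    if classify_bounce(smtp_reply_code) == "hard":
        suppression.add(recipient.lower())


def add_one_click_unsubscribe(msg: EmailMessage, https_url: str, mailto: str) -> None:
    """Add the RFC 8058 one-click unsubscribe header pair to a bulk message."""
    msg["List-Unsubscribe"] = f"<{https_url}>, <mailto:{mailto}>"
    msg["List-Unsubscribe-Post"] = "List-Unsubscribe=One-Click"
```

The HTTPS URL should identify the recipient and list (for example with an opaque token), because the mailbox provider sends the unsubscribe POST without any further user interaction.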

Interview Discussion Points

How do you handle attachment deduplication? Content-addressable storage: hash each attachment part (SHA-256) and use the hash as its key, so an identical PDF sent to a million users is stored once
How does Gmail deliver fast search over 15 years of email? A pre-built inverted index (Elasticsearch in this design; Gmail runs custom search infrastructure), partitioned per user so each query touches only one user's index data
How do you handle the “Mark as spam” training loop? User spam feedback is aggregated per sender, per domain, and fed back as features into the spam classifier training pipeline
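The aggregation step of that training loop could look like the sketch below, which could also back the hypothetical `get_user_feedback_rate` helper in the spam pipeline. It assumes in-memory counters and illustrative pseudo-counts (1 spam, 9 not-spam) so a sender with no history starts at a 10% prior instead of an unstable 0/0; the class `SenderFeedback` is hypothetical.

```python
from collections import defaultdict


class SenderFeedback:
    """Aggregate 'Mark as spam' / 'Not spam' clicks per sender and per domain."""

    def __init__(self, prior_spam: float = 1.0, prior_ham: float = 9.0) -> None:
        # Pseudo-counts smooth the rate for senders with little or no feedback
        self.prior_spam, self.prior_ham = prior_spam, prior_ham
        self.counts = defaultdict(lambda: [0, 0])   # (kind, key) -> [spam, not_spam]

    def record(self, from_addr: str, is_spam: bool) -> None:
        addr = from_addr.lower()
        domain = addr.rsplit("@", 1)[-1]
        for key in (("sender", addr), ("domain", domain)):
            self.counts[key][0 if is_spam else 1] += 1

    def spam_rate(self, kind: str, key: str) -> float:
        spam, ham = self.counts.get((kind, key.lower()), [0, 0])
        return (spam + self.prior_spam) / (spam + ham + self.prior_spam + self.prior_ham)

    def features(self, from_addr: str) -> dict[str, float]:
        """Feature values to join into the classifier's training rows."""
        addr = from_addr.lower()
        return {"sender_spam_rate": self.spam_rate("sender", addr),
                "domain_spam_rate": self.spam_rate("domain", addr.rsplit("@", 1)[-1])}
```

Tracking the domain as well as the address lets feedback on one sender raise the score of a sibling address that has no history yet.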

Frequently Asked Questions

What protocols does an email system use?

Three protocols handle email. SMTP (Simple Mail Transfer Protocol) relays email between servers on port 25 and accepts authenticated client submission on port 587. IMAP (Internet Message Access Protocol, port 993 with TLS) lets clients retrieve and manage email while the master copy stays on the server, so multiple devices stay in sync with the same mailbox. POP3 (Post Office Protocol 3, port 995 with TLS) downloads email to the local client and typically deletes it from the server, which suits a single device. Modern email clients use IMAP or proprietary APIs (Gmail API). SMTP is the universal inter-server protocol.

How does spam filtering work in an email system?

Production spam filtering uses multiple layers. First, IP reputation checks against blocklists reject mail from known spam sources before any content analysis. Second, authentication checks verify that the sending IP is authorized to send for the sender domain (SPF, Sender Policy Framework), that the message carries a valid cryptographic signature (DKIM, DomainKeys Identified Mail), and, through DMARC (Domain-based Message Authentication, Reporting and Conformance), that at least one of those checks passes for a domain aligned with the visible From: address; if neither does, the receiver applies the policy the domain publishes (none, quarantine, or reject). Third, ML-based content analysis extracts features from the subject, body, links, and HTML structure and scores the message with a trained classifier (LightGBM or BERT-based). Fourth, user-level feedback (mark as spam/not spam) is aggregated per sender and incorporated into personalized filtering.

How does Gmail scale to billions of mailboxes?

Gmail uses a tiered storage architecture. Message metadata (sender, recipients, timestamps, labels, thread ID) lives in a horizontally sharded store (Bigtable at Google; a relational database sharded by mailbox ID in the design above), so all of one user's data sits on the same shard. Message bodies and attachments live in Google Colossus (a distributed file system), referenced by content-hash keys, so identical attachments sent to many recipients are stored once. Full-text search runs on custom search infrastructure backed by Bigtable, and because user data is isolated per shard, a search query touches only the relevant shard. In the design above, Elasticsearch fills the search role with near-real-time indexing: new documents become searchable after the next index refresh, about one second by default.
