“Design a spam classifier” is one of the most common ML system design questions at Google, Meta, and Microsoft. Unlike pure algorithm questions, this tests your ability to scope a complete ML system: data collection, feature engineering, model selection, serving architecture, feedback loops, and adversarial robustness.
Step 1: Clarify Requirements
Before jumping to models, ask:
- What is spam? Email spam, SMS, social media posts, comments, reviews, ads? The definition shapes features and labels entirely.
- What’s the precision/recall trade-off? False positives (legitimate email marked spam) are worse than false negatives for most users. What’s acceptable?
- What’s the latency requirement? Email classification can tolerate 500ms; SMS must be real-time (<50ms).
- Volume? Global email traffic is on the order of 300 billion messages/day, and a provider at Gmail's scale handles billions daily. This shapes serving infrastructure.
- Languages and domains? English-only vs multilingual determines tokenization and embedding choices.
Reasonable assumptions: email spam classifier, 100M emails/day, 50ms P99 latency, English+Spanish, false positive rate must be <0.1%.
Step 2: Data Collection and Labeling
Sources of labeled data:
- User feedback: “Mark as spam” / “Not spam” buttons — high quality but sparse, biased toward visible spam
- Honeypot accounts: email addresses published online to attract spam; all received mail is labeled spam
- Manual review queue: internal team labels borderline cases
- Third-party datasets: SpamAssassin, Enron corpus (use with care — 2000s spam patterns)
Label quality issues:
- User disagreement: marketing email labeled spam by some, legitimate by others
- Temporal staleness: spam patterns evolve; labels from 6 months ago may be misleading
- Selection bias: users rarely mark missed spam; you only see what they report
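The label-quality issues above suggest turning raw user reports into soft labels rather than trusting any single click. Here is a minimal sketch; the `UserReport` fields, per-reporter trust scores, and 90-day half-life are illustrative assumptions, not a known production scheme:

```python
from dataclasses import dataclass
import math
import time

@dataclass
class UserReport:
    is_spam: bool          # True = "Mark as spam", False = "Not spam"
    timestamp: float       # unix seconds
    reporter_trust: float  # 0..1, e.g. the reporter's historical agreement rate

def aggregate_label(reports, now=None, half_life_days=90.0):
    """Combine conflicting user reports into a soft label in [0, 1].

    Each vote is weighted by reporter trust and decayed exponentially
    with age, so labels from 6 months ago count less than fresh ones.
    """
    now = now if now is not None else time.time()
    spam_w = ham_w = 0.0
    for r in reports:
        age_days = (now - r.timestamp) / 86400.0
        w = r.reporter_trust * math.exp(-math.log(2) * age_days / half_life_days)
        if r.is_spam:
            spam_w += w
        else:
            ham_w += w
    total = spam_w + ham_w
    return spam_w / total if total > 0 else None  # None = no usable signal
```

A soft label near 0.5 (heavy disagreement, e.g. marketing email) can be routed to the manual review queue instead of being used for training as-is.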
Step 3: Feature Engineering
Heuristic signals (fast, interpretable, high precision for obvious spam):
- Sender domain reputation score (queried from reputation DB)
- SPF/DKIM/DMARC authentication pass/fail
- Sender’s historical spam rate from this account
- Reply-to domain != From domain
- Number of recipients (bulk sending pattern)
- URL count; presence of known malicious domains
- HTML-to-text ratio (spam often has excessive HTML)
Content features (for ML model):
- TF-IDF bag-of-words on subject + body
- Character n-grams (catch obfuscation: “V1agra”, “fr3e”)
- Subject line features: ALL CAPS ratio, excessive punctuation, urgency words
- Embedding from pre-trained model (BERT, or domain-fine-tuned model)
Behavioral signals:
- Open rate for this sender across all recipients
- Reply rate, unsubscribe rate
- Graph features: has the sender interacted with this recipient before?
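One way to make the feature lists above concrete is a toy extractor that emits a flat dict for a downstream model. The header parsing, urgency word list, and feature names here are illustrative choices, not a fixed schema:

```python
import re

# Illustrative urgency lexicon; real systems maintain curated, per-language lists
URGENCY_WORDS = {"urgent", "act now", "winner", "free", "limited"}

def extract_features(subject, body, headers):
    """Compute a handful of the heuristic and content signals listed above."""
    text = subject + " " + body
    urls = re.findall(r"https?://\S+", text)
    letters = [c for c in subject if c.isalpha()]
    caps_ratio = sum(c.isupper() for c in letters) / len(letters) if letters else 0.0
    from_domain = headers.get("From", "").split("@")[-1]
    reply_domain = headers.get("Reply-To", headers.get("From", "")).split("@")[-1]
    return {
        "reply_to_mismatch": int(reply_domain != from_domain),
        "num_recipients": len(headers.get("To", "").split(",")),
        "url_count": len(urls),
        "caps_ratio": caps_ratio,
        "exclam_count": subject.count("!"),
        "urgency_hits": sum(w in text.lower() for w in URGENCY_WORDS),
    }
```

In production each of these lookups would hit a dedicated service (reputation DB, Redis behavioral cache) rather than parse headers inline.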
Step 4: Model Selection
Two-stage architecture (industry standard):
Stage 1 — Rule-based pre-filter (blocks ~60-70% of obvious spam):
- Known spam IP blocklist
- DNS-based blocklist (DNSBL) lookup
- SpamAssassin-style scoring rules
- Handles this ~60-70% of volume with no ML cost; very low latency
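A minimal sketch of the stage-1 gate, with hard-coded sets standing in for real blocklist/DNSBL feeds; the blanket rejection of senders failing both SPF and DKIM is a deliberately strict illustrative policy, not a universal recommendation:

```python
BLOCKED_IPS = {"203.0.113.7"}       # stand-in for a live IP blocklist feed
BLOCKED_DOMAINS = {"spam.example"}  # stand-in for a DNSBL lookup result

def prefilter(sender_ip, sender_domain, spf_pass, dkim_pass):
    """Stage-1 decision: 'block' outright, or 'pass' to the ML stage.

    Cheap set lookups only -- no feature extraction, no model inference.
    """
    if sender_ip in BLOCKED_IPS or sender_domain in BLOCKED_DOMAINS:
        return "block"
    if not spf_pass and not dkim_pass:
        return "block"  # fully unauthenticated sender (strict example policy)
    return "pass"
```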
Stage 2 — ML classifier (for borderline cases):
| Model | Pros | Cons |
|---|---|---|
| Naive Bayes | Extremely fast, interpretable, handles high-dimensional text well | Independence assumption violated in practice |
| Logistic Regression + TF-IDF | Fast, sparse, good baseline, interpretable coefficients | Misses semantic meaning, no cross-feature interactions |
| Gradient Boosted Trees (LightGBM) | Handles mixed features (text + behavioral + metadata), fast serving | No direct text sequence modeling |
| BERT fine-tuned | Best accuracy, understands context and obfuscation | High latency (100-400ms), expensive to serve |
Recommended stack for 50ms P99:
LightGBM on TF-IDF + heuristic features for 95% of traffic. Route low-confidence predictions to a distilled BERT model. Cache sender reputation scores.
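A toy version of combining TF-IDF text features with heuristic columns for the stage-2 model. scikit-learn's `LogisticRegression` stands in for LightGBM here (which may not be installed everywhere), and the four training examples plus the `url_count`/`reply_to_mismatch` columns are made up for illustration:

```python
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression  # stand-in for LightGBM

texts = ["win free money now", "meeting at 3pm tomorrow",
         "claim your free prize", "lunch on friday?"]
heuristics = np.array([[3, 1], [0, 0], [2, 1], [0, 0]])  # url_count, reply_to_mismatch
labels = np.array([1, 0, 1, 0])

# Character n-grams rather than words, to resist obfuscation like "fr3e"
vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5))
X = hstack([vec.fit_transform(texts), csr_matrix(heuristics)])
clf = LogisticRegression().fit(X, labels)

def score(text, heur):
    """Return P(spam) for one email's text plus its heuristic features."""
    x = hstack([vec.transform([text]), csr_matrix([heur])])
    return clf.predict_proba(x)[0, 1]
```

At serving time the TF-IDF vocabulary and model would be versioned together, and low-confidence scores routed onward to the distilled BERT re-scorer.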
Step 5: Evaluation Framework
```python
from sklearn.metrics import roc_curve, roc_auc_score, classification_report
import numpy as np

def evaluate_spam_classifier(y_true, y_scores, false_positive_budget=0.001):
    """
    For spam classification, we typically operate at a fixed false positive rate.
    Find the threshold that maximizes recall while keeping FPR <= budget.
    """
    fpr, tpr, thresholds = roc_curve(y_true, y_scores)
    # fpr is non-decreasing along the ROC curve, so the last point within
    # budget has the highest recall (and the lowest admissible threshold)
    valid = np.where(fpr <= false_positive_budget)[0]
    best_idx = valid[-1]
    optimal_threshold = thresholds[best_idx]
    y_pred = (y_scores >= optimal_threshold).astype(int)
    print(f"Operating threshold: {optimal_threshold:.4f}")
    print(f"False Positive Rate: {fpr[best_idx]:.4f}")
    print(f"True Positive Rate (Recall): {tpr[best_idx]:.4f}")
    print(f"AUC-ROC: {roc_auc_score(y_true, y_scores):.4f}")
    print(classification_report(y_true, y_pred, target_names=['Ham', 'Spam']))
    return optimal_threshold
```
Step 6: Serving Architecture
User sends email
↓
DNS/IP Blocklist Check (< 1ms) ──→ Block immediately if in blocklist
↓
Reputation Service (async lookup, ~2ms)
↓
Feature Extraction Service
- Parse email headers/body
- TF-IDF vectorization
- Behavioral feature lookup (Redis cache)
↓
LightGBM Inference (< 5ms)
score < 0.2: → Inbox
score > 0.8: → Spam folder
0.2-0.8: → BERT re-score (< 40ms)
↓
Feedback collector logs decision
↓
Async: update sender reputation, log for retraining
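The routing logic in the diagram can be sketched as one function. The score bands mirror the diagram; the callables are placeholders for the blocklist check, the LightGBM scorer, and the BERT re-scorer, and the 0.5 cut on the re-scorer is an assumed default:

```python
def route_email(email, blocklist_check, fast_score, slow_score,
                low=0.2, high=0.8):
    """Two-stage routing: cheap gate, fast model, heavy model only when needed."""
    if blocklist_check(email):
        return "blocked"          # stage 1: no ML cost at all
    s = fast_score(email)         # LightGBM-style scorer, < 5ms
    if s < low:
        return "inbox"
    if s > high:
        return "spam_folder"
    # Borderline band: spend the extra ~40ms on the heavier re-scorer
    return "spam_folder" if slow_score(email) > 0.5 else "inbox"
```

Because only the 0.2-0.8 band reaches the expensive model, P99 latency is dominated by that minority of traffic, which is what makes the 50ms budget feasible.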
Step 7: Adversarial Robustness
Spammers adapt. Your defense:
- Text obfuscation: “V!agra” → character n-grams catch this better than word-level features
- Image spam: embed spam text in images — requires OCR pipeline or image classification
- Adversarial examples: adding innocuous words to fool classifiers — monitor for distribution shift in features that change without changing semantics
- Account hijacking: use trusted accounts to send spam — behavioral signals (sudden change in volume/recipients) are key
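To see why character n-grams resist obfuscation better than word-level features, compare how many vocabulary features still fire on a substituted token. The one-phrase vocabularies below are a toy setup for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Fit both vocabularies on the same clean phrase
word_vec = TfidfVectorizer(analyzer="word").fit(["cheap viagra offer"])
char_vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 4)).fit(["cheap viagra offer"])

def overlap(vec, text):
    """Number of vocabulary features that fire on the given text."""
    return vec.transform([text]).nnz

obfuscated = "v1agra"
# The word vocabulary has no token "v1agra", so it matches nothing;
# the char n-gram vocabulary still matches the untouched substrings
# around the substituted character ("agr", "gra", "agra", ...).
```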
Step 8: Continuous Learning
- Retrain weekly on sliding window of recent labeled data + fixed sample of historical data
- A/B test new model vs. current champion on 5% of traffic before full rollout
- Shadow mode: run new model alongside current; compare decisions before switching
- Monitor for false positive regression — new model must not increase legitimate email blocked rate
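The sliding-window selection in the first bullet can be sketched as follows; the 30-day window, 5% historical sample rate, and `(features, label, timestamp)` tuple layout are assumptions for illustration:

```python
import random

def build_training_set(labeled, now, window_days=30,
                       historical_sample=0.05, seed=0):
    """Select all examples from the last `window_days`, plus a fixed random
    sample of older ones so the model doesn't forget long-lived spam patterns.

    `labeled` is a list of (features, label, timestamp) tuples.
    """
    rng = random.Random(seed)  # fixed seed for reproducible retraining sets
    cutoff = now - window_days * 86400
    recent = [ex for ex in labeled if ex[2] >= cutoff]
    old = [ex for ex in labeled if ex[2] < cutoff]
    k = max(1, int(len(old) * historical_sample)) if old else 0
    return recent + rng.sample(old, k)
```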
Depth Levels
Junior: Describe features you’d use and choose a model. Discuss precision/recall trade-off.
Senior: Design two-stage pipeline, discuss serving latency, describe retraining loop.
Staff: Handle adversarial robustness, multilingual spam, user-level personalization (different spam thresholds per user), and regulatory constraints (GDPR for behavioral signal storage).
Related ML Topics
- NLP Interview Questions — TF-IDF, BERT fine-tuning, and tokenization trade-offs all appear in spam classifier design; BPE handles obfuscation better than word-level tokenization
- Handling Imbalanced Datasets — after the rule-based pre-filter removes obvious spam, the spam class reaching the ML stage is typically a small minority of traffic; scale_pos_weight, SMOTE, and focal loss are all applicable
- Classification Metrics — spam classifiers operate at fixed false positive rate; precision/recall at threshold and AUC-ROC are the primary evaluation metrics
- How to Detect Model Drift in Production — spammer adaptation is a form of concept drift; prediction score distribution monitoring catches it early
- ML System Design: Build a Fraud Detection System — spam and fraud detection share the same architecture patterns: rule-based pre-filter, ML scorer, adversarial adaptation challenges