Email Delivery System Low-Level Design

Requirements

  • Send transactional emails (password reset, order confirmation, notifications) and marketing emails (campaigns, newsletters)
  • 100M emails/day, peak 10K/second during campaigns
  • Delivery within 30 seconds for transactional, best-effort for marketing
  • Track delivery status: sent, delivered, bounced, spam-complained, opened, clicked
  • Respect unsubscribes; maintain sender reputation
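
The throughput figures above imply a large gap between average and peak load. A quick back-of-envelope check makes that concrete. This sketch uses only the two numbers stated in the requirements; the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

daily_volume = 100_000_000   # emails/day, from the requirements
peak_rate = 10_000           # emails/second during campaigns

# Average rate if the daily volume were spread evenly over 24 hours.
average_rate = daily_volume / SECONDS_PER_DAY

# How much headroom the worker pool needs relative to the average.
peak_to_average = peak_rate / average_rate

# Time needed to drain a full day's volume at the sustained peak rate.
hours_at_peak = daily_volume / peak_rate / 3600

print(f"average: {average_rate:.0f}/s")
print(f"peak/average: {peak_to_average:.1f}x")
print(f"hours at peak: {hours_at_peak:.2f}")
```

The worker pool and relay quotas therefore need to be sized for roughly nine times the average rate, not for the average itself.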

Architecture

Service → Email API → Kafka (email_jobs topic)
                    → Email Worker Pool
                      → SMTP Relay (SendGrid / SES / Mailgun)
                      → Delivery Status Webhook from SMTP relay
                      → Status DB (PostgreSQL)
                      → Event Kafka (email_events) → Analytics

Data Model

EmailJob(job_id UUID, type ENUM(TRANSACTIONAL,MARKETING), recipient_id,
         to_address, from_address, subject, template_id, template_vars JSONB,
         scheduled_at, status ENUM(PENDING,SENT,DELIVERED,BOUNCED,FAILED),
         provider_message_id, created_at, sent_at)

EmailTemplate(template_id, name, subject_template, html_template, text_template,
              version INT, created_at)

EmailSuppression(email_address, reason ENUM(BOUNCE,SPAM,UNSUBSCRIBE),
                 added_at, source)
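
For readers who prefer code to record notation, the same model can be sketched as Python dataclasses. The field names and enum values follow the definitions above. The Python types (for example `recipient_id` as a string) and the defaults are assumptions, not part of the original schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Any, Dict, Optional
from uuid import UUID, uuid4


class EmailType(Enum):
    TRANSACTIONAL = "TRANSACTIONAL"
    MARKETING = "MARKETING"


class JobStatus(Enum):
    PENDING = "PENDING"
    SENT = "SENT"
    DELIVERED = "DELIVERED"
    BOUNCED = "BOUNCED"
    FAILED = "FAILED"


class SuppressionReason(Enum):
    BOUNCE = "BOUNCE"
    SPAM = "SPAM"
    UNSUBSCRIBE = "UNSUBSCRIBE"


def _now() -> datetime:
    """Timezone-aware UTC timestamp used for default values."""
    return datetime.now(timezone.utc)


@dataclass
class EmailJob:
    type: EmailType
    recipient_id: str                      # assumed type; the source does not specify it
    to_address: str
    from_address: str
    subject: str
    template_id: str
    template_vars: Dict[str, Any] = field(default_factory=dict)  # JSONB column
    scheduled_at: Optional[datetime] = None
    status: JobStatus = JobStatus.PENDING
    provider_message_id: Optional[str] = None  # assumed to be filled in after the relay accepts the message
    job_id: UUID = field(default_factory=uuid4)
    created_at: datetime = field(default_factory=_now)
    sent_at: Optional[datetime] = None


@dataclass
class EmailSuppression:
    email_address: str
    reason: SuppressionReason
    source: str
    added_at: datetime = field(default_factory=_now)
```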

Transactional vs Marketing Email

Transactional emails are triggered by a user action (password reset, order shipped, 2FA code). They get the highest priority, are sent immediately, and are not subject to unsubscribe (a legal exception in most jurisdictions). Marketing emails are promotional: they get lower priority, must respect unsubscribes, are rate-limited to avoid spam classification, and are sent in batches during business hours. Keep the two types in separate Kafka topics, each with its own consumer group, so that a marketing backlog cannot delay transactional sends. Use separate sending domains and IPs as well, so that marketing complaints cannot damage transactional reputation.
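
One way to keep this separation explicit is a small routing table keyed by email type. The sketch below is only an illustration: the topic names, sending domains, and the `Route` and `route_for` names are hypothetical and do not come from the design above.

```python
from dataclasses import dataclass
from enum import Enum


class EmailType(Enum):
    TRANSACTIONAL = "TRANSACTIONAL"
    MARKETING = "MARKETING"


@dataclass(frozen=True)
class Route:
    topic: str                # Kafka topic the job is published to
    sending_domain: str       # domain (and IP pool) used for the From address
    honors_unsubscribe: bool  # whether UNSUBSCRIBE suppressions block the send


# Hypothetical topic and domain names, chosen for illustration only.
ROUTES = {
    EmailType.TRANSACTIONAL: Route("email_jobs.transactional", "notify.example.com", False),
    EmailType.MARKETING: Route("email_jobs.marketing", "news.example.com", True),
}


def route_for(email_type: EmailType) -> Route:
    """Look up the topic and sending identity for a job type."""
    return ROUTES[email_type]
```

Because each type maps to its own topic and domain, a consumer group for marketing can be throttled or paused without touching the transactional path.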

Template Rendering

Render email templates at send time, not at job creation time, so that template updates take effect on campaigns that are already scheduled. Use a template engine such as Jinja2 or Handlebars to render the HTML from template_vars. Always generate both HTML and plaintext versions (some clients prefer plaintext, and including both improves spam score). Run a CSS inliner before sending, because many email clients strip <style> blocks. For click tracking, replace every URL in the body with a tracking URL (https://track.example.com/c/{job_id}/{link_hash}) that records the click and then redirects to the original link.
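
The render-then-rewrite step can be sketched with the standard library alone. Several parts are assumptions: `string.Template` stands in for Jinja2/Handlebars, the link hash is a truncated SHA-256 of the original URL, and the function names are made up for this example. A regex is used for brevity; real HTML would need a proper parser.

```python
import hashlib
import re
from string import Template
from typing import Dict, Tuple

TRACK_BASE = "https://track.example.com"
HREF_RE = re.compile(r'href="(https?://[^"]+)"')


def render(template: str, template_vars: Dict[str, str]) -> str:
    """Render at send time. string.Template is a stand-in for a real engine."""
    return Template(template).substitute(template_vars)


def add_click_tracking(html: str, job_id: str) -> Tuple[str, Dict[str, str]]:
    """Rewrite every absolute link to a tracking URL.

    Returns the rewritten HTML plus a link_hash -> original URL map, which
    the redirect service would need in order to resolve clicks.
    """
    links: Dict[str, str] = {}

    def _swap(match: "re.Match[str]") -> str:
        original = match.group(1)
        # Assumed hashing scheme: first 12 hex chars of SHA-256 of the URL.
        link_hash = hashlib.sha256(original.encode("utf-8")).hexdigest()[:12]
        links[link_hash] = original
        return f'href="{TRACK_BASE}/c/{job_id}/{link_hash}"'

    return HREF_RE.sub(_swap, html), links


html = render(
    '<p>Hi $name, <a href="https://shop.example.com/orders/42">view order</a></p>',
    {"name": "Ada"},
)
tracked_html, link_map = add_click_tracking(html, "job-123")
```

Storing the hash-to-URL map per job (or per template version) keeps the tracking URL short and avoids embedding the destination in the link itself.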

Sending and Reputation Management

Email reputation is fragile — a high bounce or spam complaint rate causes ISPs to block your domain. Key practices:

  • Bounce handling: hard bounces (permanent: address doesn’t exist) → add to suppression list immediately. Soft bounces (temporary: mailbox full) → retry 3 times over 24h, then suppress.
  • Spam complaint handling: mailbox providers send complaints via FBL (Feedback Loop). Add complainers to suppression list; never email them again.
  • Suppression list check: before sending, check if the recipient is in EmailSuppression. Reject the job without sending.
  • Rate limiting and IP warm-up: throttle sends per receiving domain, and warm up new IPs gradually. Start at about 1K/hour for a new IP and increase the limit over several weeks.
  • SPF, DKIM, DMARC: authenticate outbound email to prevent spoofing and improve deliverability.
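
The bounce, complaint, and suppression rules above can be sketched as a small in-memory class. This is a minimal illustration, not a production design: a real system would back it with the EmailSuppression table. The retry delays of 1, 6, and 17 hours are an assumed schedule that adds up to the 24-hour window, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Assumed retry schedule: three retries spread over 24 hours in total.
RETRY_DELAYS_HOURS = (1, 6, 17)


@dataclass
class SuppressionList:
    entries: Dict[str, str] = field(default_factory=dict)      # address -> reason
    soft_bounces: Dict[str, int] = field(default_factory=dict)  # address -> retries scheduled

    def suppress(self, address: str, reason: str) -> None:
        # Keep the first recorded reason if the address is already suppressed.
        self.entries.setdefault(address.lower(), reason)

    def is_suppressed(self, address: str) -> bool:
        """Check this before every send; skip the job if it returns True."""
        return address.lower() in self.entries

    def record_complaint(self, address: str) -> None:
        """FBL spam complaint: suppress permanently."""
        self.suppress(address, "SPAM")

    def record_bounce(self, address: str, hard: bool) -> Optional[int]:
        """Return hours to wait before the next retry, or None if suppressed."""
        key = address.lower()
        if hard:
            self.suppress(key, "BOUNCE")
            return None
        retries_used = self.soft_bounces.get(key, 0)
        if retries_used >= len(RETRY_DELAYS_HOURS):
            # All three retries also soft-bounced: give up and suppress.
            self.suppress(key, "BOUNCE")
            return None
        self.soft_bounces[key] = retries_used + 1
        return RETRY_DELAYS_HOURS[retries_used]
```

Addresses are lowercased before lookup so that `User@Example.com` and `user@example.com` share one suppression entry.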

Delivery Status Tracking

SMTP relays (SendGrid, SES) report delivery events asynchronously via webhooks: delivered, bounced, spam_report, open, click. Ingest these via a webhook endpoint → Kafka → status processor, which updates EmailJob.status for delivery outcomes and stores every event in an EmailEvent table for analytics. Open tracking: embed a 1×1 pixel image with a unique URL (https://track.example.com/o/{job_id}); when the email client loads the image, the open is recorded. Note: Apple Mail Privacy Protection pre-fetches tracking pixels, which inflates open rates, and clients that block images never load the pixel, so treat opens as an unreliable metric.
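
A status processor for these events might look like the sketch below. It assumes that webhook deliveries can be duplicated or arrive out of order, so it deduplicates by event ID and never moves a job backwards in status. The event fields (`event_id`, `job_id`, `type`) and the ranking table are hypothetical, and in-memory containers stand in for the EmailJob and EmailEvent tables.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Only delivery outcomes change EmailJob.status; opens and clicks are
# stored as events but leave the status untouched.
EVENT_TO_STATUS = {"delivered": "DELIVERED", "bounced": "BOUNCED"}

# Assumed ordering: a status is only replaced by one with a higher rank.
STATUS_RANK = {"PENDING": 0, "SENT": 1, "DELIVERED": 2, "BOUNCED": 2, "FAILED": 2}


@dataclass
class StatusProcessor:
    status: Dict[str, str] = field(default_factory=dict)   # job_id -> status
    events: List[dict] = field(default_factory=list)        # stand-in for EmailEvent
    seen: Set[str] = field(default_factory=set)             # processed event IDs

    def handle(self, event: dict) -> bool:
        """Apply one webhook event. Returns False for a duplicate."""
        if event["event_id"] in self.seen:
            return False
        self.seen.add(event["event_id"])
        self.events.append(event)  # every event is kept for analytics

        new_status = EVENT_TO_STATUS.get(event["type"])
        job_id = event["job_id"]
        # A job that reaches the relay is assumed to be at least SENT.
        current = self.status.get(job_id, "SENT")
        if new_status and STATUS_RANK[new_status] > STATUS_RANK[current]:
            self.status[job_id] = new_status
        return True
```

Deduplicating at the processor keeps webhook retries from double-counting events in analytics.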

Campaign Sending

Marketing campaigns send to millions of recipients. Never send all at once, because ISPs rate-limit bulk senders. Campaign scheduler: throttle to a fixed rate, for example 10K/minute, which spreads 300K-600K recipients across 30-60 minutes; larger campaigns need a higher rate or a longer window. Use time zone targeting: deliver at 10am local time for each recipient. Batch job: SELECT recipient_id FROM campaign_recipients WHERE campaign_id=X AND sent_at IS NULL LIMIT 1000, repeated until no rows remain. Set sent_at as each recipient is processed so that a restarted job skips recipients that were already sent (idempotent job).
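
The batch loop can be sketched against SQLite from the standard library, standing in for PostgreSQL. The table layout follows the query above. The `send_campaign` name, the `send` callback, and the injectable `sleep` are assumptions made so the example stays self-contained.

```python
import sqlite3
import time
from typing import Callable

BATCH_SIZE = 1000


def send_campaign(
    conn: sqlite3.Connection,
    campaign_id: str,
    send: Callable[[int], None],
    per_minute: int = 10_000,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send to every unsent recipient in paced batches; return the count sent."""
    # Seconds to wait between batches so the overall rate stays at per_minute.
    interval = 60.0 * BATCH_SIZE / per_minute
    total = 0
    while True:
        rows = conn.execute(
            "SELECT recipient_id FROM campaign_recipients "
            "WHERE campaign_id = ? AND sent_at IS NULL LIMIT ?",
            (campaign_id, BATCH_SIZE),
        ).fetchall()
        if not rows:
            return total
        for (recipient_id,) in rows:
            send(recipient_id)
            # Mark and commit immediately so a restart skips this recipient.
            conn.execute(
                "UPDATE campaign_recipients SET sent_at = CURRENT_TIMESTAMP "
                "WHERE campaign_id = ? AND recipient_id = ?",
                (campaign_id, recipient_id),
            )
            conn.commit()
        total += len(rows)
        sleep(interval)
```

A crash between `send` and the commit can still resend to one recipient, so this is at-least-once delivery per recipient. With several workers on PostgreSQL, the SELECT would also need row locking (for example `FOR UPDATE SKIP LOCKED`) so that two workers do not pick the same batch.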

Key Design Decisions

  • Separate sending infrastructure for transactional vs marketing — reputation isolation
  • Suppression list check before every send — CAN-SPAM/GDPR compliance
  • Webhook status callbacks from relay — async delivery confirmation, no polling
  • Render templates at send time, not creation time — enables template updates for scheduled campaigns
  • Gradual campaign sending — protects sender reputation, respects ISP rate limits



