Email Delivery System: Low-Level Design


A transactional email delivery system sends emails triggered by application events — welcome emails on registration, password reset links, order confirmations, and invoice notifications. At scale, the system must handle millions of emails per day, maintain high deliverability (avoiding spam filters), track delivery status, and manage bounce and unsubscribe lists. SendGrid, AWS SES, and Mailgun are managed solutions; understanding their internals is valuable for system design interviews.

Email Pipeline Architecture

Email sending is a multi-stage pipeline. (1) Trigger: an application event (e.g., user registered) is published to a message queue (Kafka/SQS). (2) Email renderer: a consumer reads the event, fetches the email template, and renders it with the event data (user name, activation link). Templates are stored in a template service as version-controlled HTML. (3) Deliverability service: performs pre-send checks. Is the recipient on the suppression list (previously bounced or unsubscribed)? Is the sending rate within provider limits? (4) SMTP sending: the email is handed to an SMTP relay (SendGrid, SES) or sent over SMTP directly to the recipient’s mail server. (5) Status tracking: delivery webhooks from the email provider update the email_events table (delivered, bounced, opened, clicked, spam-reported).

Async processing: rendering and delivery are asynchronous. The application enqueues the email event and returns immediately; the user typically receives the email within seconds, but the application never blocks waiting for SMTP delivery confirmation.
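
The five stages above can be sketched in a single process. This is a minimal illustration, assuming in-memory stand-ins: a queue.Queue in place of Kafka/SQS, a dict of string.Template objects in place of the template service, a set in place of the suppression table, and a list in place of the SMTP relay. All names (Pipeline, EmailEvent, TEMPLATES, publish, process_one, on_webhook) are hypothetical, and the provider rate-limit check from stage 3 is left out.

```python
import queue
import string
from dataclasses import dataclass, field

# Hypothetical stand-in for the template service (stage 2).
TEMPLATES = {
    "welcome": string.Template("Hi $name, activate your account: $link"),
}


@dataclass
class EmailEvent:
    event_type: str
    recipient: str
    data: dict


@dataclass
class Pipeline:
    events: "queue.Queue[EmailEvent]" = field(default_factory=queue.Queue)  # Kafka/SQS stand-in
    suppressed: set = field(default_factory=set)      # suppression-list stand-in
    sent: list = field(default_factory=list)          # SMTP relay stand-in
    email_events: list = field(default_factory=list)  # status-tracking table stand-in

    def publish(self, event: EmailEvent) -> None:
        """Stage 1: the application enqueues the event and returns immediately."""
        self.events.put(event)

    def process_one(self) -> str:
        """Stages 2-4 for one queued event; returns the outcome."""
        event = self.events.get()
        body = TEMPLATES[event.event_type].substitute(event.data)  # stage 2: render
        if event.recipient in self.suppressed:                    # stage 3: pre-send check
            self.email_events.append((event.recipient, "suppressed"))
            return "suppressed"
        self.sent.append((event.recipient, body))                  # stage 4: hand off to relay
        self.email_events.append((event.recipient, "sent"))
        return "sent"

    def on_webhook(self, recipient: str, status: str) -> None:
        """Stage 5: a provider webhook reports delivered/bounced/opened/etc."""
        self.email_events.append((recipient, status))
```

In production each stage would run as its own consumer or service, but the ordering is the same: render, check suppression, hand off to the relay, then record status when the provider's webhook arrives.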

SMTP and DNS Records for Deliverability

Email deliverability (whether emails land in the inbox or the spam folder) depends on correct DNS configuration and sender reputation. Required DNS records: (1) SPF (Sender Policy Framework): a TXT record listing the hosts and IP addresses authorized to send email for your domain. Receiving mail servers check that the message came from an IP authorized for the envelope sender (Return-Path) domain: v=spf1 include:sendgrid.net ~all (here ~all marks mail from any other IP as a soft fail). (2) DKIM (DomainKeys Identified Mail): your sending server signs each email with a private key, and the public key is published as a DNS TXT record. Receiving servers verify the signature, which proves the signed headers and body were not altered in transit and ties the message to the signing domain. (3) DMARC (Domain-based Message Authentication, Reporting & Conformance): a TXT record specifying what receivers should do with mail that fails DMARC, meaning neither SPF nor DKIM passes for a domain aligned with the visible From address (none, quarantine, or reject), and where to send aggregate reports: v=DMARC1; p=quarantine; rua=mailto:dmarc@example.com. Without these records, email from your domain is likely to be marked as spam. IP warming: when sending from a new IP, increase volume gradually over several weeks, because ISPs treat sudden high-volume sends from an unknown IP as likely spam and may throttle or reject them.
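
The DMARC decision described above can be sketched as follows. This is a simplified illustration, not a conforming DMARC implementation: it assumes SPF and DKIM verification already happened elsewhere, and it approximates the organizational domain as the last two labels of a domain instead of consulting the Public Suffix List. The function names are hypothetical.

```python
def parse_dmarc(record: str) -> dict[str, str]:
    """Split a DMARC TXT record such as 'v=DMARC1; p=quarantine' into tags."""
    tags: dict[str, str] = {}
    for part in record.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep:  # skip empty segments, e.g. after a trailing semicolon
            tags[key.strip()] = value.strip()
    if tags.get("v") != "DMARC1":
        raise ValueError("not a DMARC record")
    return tags


def org_domain(domain: str) -> str:
    # Simplification: real checks use the Public Suffix List. Taking the last
    # two labels is wrong for suffixes such as co.uk.
    return ".".join(domain.lower().rstrip(".").split(".")[-2:])


def aligned(auth_domain: str, from_domain: str) -> bool:
    """Relaxed alignment: both domains share an organizational domain."""
    return org_domain(auth_domain) == org_domain(from_domain)


def dmarc_disposition(record: str, from_domain: str,
                      spf_pass: bool, spf_domain: str,
                      dkim_pass: bool, dkim_domain: str) -> str:
    """Return 'pass', or the published policy ('none', 'quarantine', 'reject').

    spf_domain is the envelope sender (Return-Path) domain SPF was checked
    against; dkim_domain is the d= domain of a verified DKIM signature.
    """
    policy = parse_dmarc(record).get("p", "none")
    spf_ok = spf_pass and aligned(spf_domain, from_domain)
    dkim_ok = dkim_pass and aligned(dkim_domain, from_domain)
    return "pass" if spf_ok or dkim_ok else policy
```

For example, with the record from the text, a message whose From domain is example.com, whose SPF check passed only for sendgrid.net, and which carries no valid DKIM signature gets "quarantine". Passing SPF alone is not enough; the passing domain must also align with the From address.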

Bounce and Unsubscribe Management

Sending email to invalid or unsubscribed addresses harms your sender reputation and may violate CAN-SPAM and GDPR. Bounce types: (1) Hard bounce: permanent delivery failure (the address or the domain doesn’t exist). Add the address to the suppression list immediately and never send to it again. (2) Soft bounce: temporary failure (mailbox full, server temporarily unavailable). Retry with exponential backoff for 24-72 hours; if delivery still fails, treat it as a hard bounce. Suppression list: a table (email_suppressions) storing addresses that must not receive email: (email, reason, suppressed_at). Check it before every send. Unsubscribes: include an unsubscribe link in every marketing email (CAN-SPAM requires a working opt-out mechanism). The link points to a one-click unsubscribe endpoint that adds the address to the suppression list. The List-Unsubscribe header (RFC 2369), combined with List-Unsubscribe-Post for one-click unsubscription (RFC 8058), lets Gmail and other clients offer an unsubscribe button directly in their UI. Complaint handling: many ISPs send spam complaint notifications through a Feedback Loop (FBL). When a user marks your email as spam, add the address to the suppression list immediately. A complaint rate above roughly 0.1% is a common warning threshold, and sustained rates above it can lead ISPs to throttle or block your sending IP.
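
A sketch of the suppression rules above, kept in memory for illustration. The SuppressionList class, the 5-minute base retry delay, and the use of the 72-hour upper bound as the cut-off are assumptions; a real system would persist entries in the email_suppressions table and schedule retries through the queue.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional

SOFT_BOUNCE_WINDOW = timedelta(hours=72)  # assumption: upper end of the 24-72 h range
BASE_RETRY_DELAY = timedelta(minutes=5)   # assumption: delay before the first retry


@dataclass
class SuppressionList:
    # email -> (reason, suppressed_at), mirroring the email_suppressions table
    entries: dict = field(default_factory=dict)
    # email -> time of the first soft bounce in the current failure streak
    soft_bounce_since: dict = field(default_factory=dict)

    def is_suppressed(self, email: str) -> bool:
        """Check before every send."""
        return email.lower() in self.entries

    def suppress(self, email: str, reason: str, now: datetime) -> None:
        key = email.lower()
        self.entries.setdefault(key, (reason, now))  # keep the first reason recorded
        self.soft_bounce_since.pop(key, None)

    def on_hard_bounce(self, email: str, now: datetime) -> None:
        self.suppress(email, "hard_bounce", now)

    def on_complaint(self, email: str, now: datetime) -> None:
        self.suppress(email, "complaint", now)  # FBL spam report

    def on_unsubscribe(self, email: str, now: datetime) -> None:
        self.suppress(email, "unsubscribe", now)

    def on_delivered(self, email: str) -> None:
        self.soft_bounce_since.pop(email.lower(), None)  # failure streak is over

    def on_soft_bounce(self, email: str, attempt: int, now: datetime) -> Optional[datetime]:
        """Return the next retry time, or None once the address is suppressed."""
        first = self.soft_bounce_since.setdefault(email.lower(), now)
        if now - first >= SOFT_BOUNCE_WINDOW:
            # Still failing after the window: treat it as a hard bounce.
            self.suppress(email, "hard_bounce", now)
            return None
        return now + BASE_RETRY_DELAY * (2 ** attempt)  # exponential backoff
```

Addresses are lower-cased before lookup so that "A@example.com" and "a@example.com" share one suppression entry; whether that normalization is appropriate for a given system is itself a design choice.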

Email Tracking and Analytics

Track email engagement to measure campaign effectiveness and identify deliverability issues. Open tracking: embed a 1×1 pixel image in the email body with a unique URL (tracking.example.com/open/{email_id}). When the email client loads the image, the tracking server records an open event. Limitations: many email clients block images by default, so some opens are never recorded, and Gmail fetches images through Google’s image proxy, so the pixel request comes from Google’s servers rather than the recipient’s client. Open rates are therefore directional signals, not exact counts. Click tracking: wrap every link in the email with a redirect URL (tracking.example.com/click/{email_id}/{link_id}). The redirect server records the click and redirects to the original URL. This is more reliable than open tracking because a link must actually be clicked, not merely loaded. Analytics dashboard: per-campaign metrics include sent count, delivered count, bounce rate, open rate, click-through rate, unsubscribe rate, and spam complaint rate. Alert if the bounce rate exceeds 5% or the complaint rate exceeds 0.1%; either indicates a deliverability problem requiring immediate action.
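
The following sketch rewrites links and adds the open pixel using the URL shapes from the paragraph above, then applies the two alert thresholds. It assumes HTML with double-quoted href attributes and uses a regular expression for brevity; a production system would use an HTML parser and persist the link map. The names add_tracking and deliverability_alerts, and the choice of delivered count as the complaint-rate denominator, are illustrative assumptions.

```python
import re
from urllib.parse import quote

TRACKING_HOST = "https://tracking.example.com"  # placeholder host from the text
# Only http(s) links are wrapped; mailto: and other schemes are left alone.
HREF_RE = re.compile(r'href="(https?://[^"]+)"')


def add_tracking(html: str, email_id: str) -> tuple[str, dict[str, str]]:
    """Wrap links in click-redirect URLs and add a 1x1 open-tracking pixel.

    Returns the rewritten HTML and a link_id -> original URL map that the
    redirect server would need to look up (storage is out of scope here).
    """
    links: dict[str, str] = {}
    eid = quote(email_id, safe="")

    def wrap(match: re.Match) -> str:
        link_id = str(len(links))
        links[link_id] = match.group(1)
        return f'href="{TRACKING_HOST}/click/{eid}/{link_id}"'

    tracked = HREF_RE.sub(wrap, html)
    pixel = f'<img src="{TRACKING_HOST}/open/{eid}" width="1" height="1" alt="">'
    if "</body>" in tracked:
        # Place the pixel just before </body> so the markup stays well-formed.
        return tracked.replace("</body>", pixel + "</body>", 1), links
    return tracked + pixel, links


def deliverability_alerts(sent: int, delivered: int, bounced: int, complaints: int) -> list[str]:
    """Flag the thresholds from the text: bounce rate > 5%, complaint rate > 0.1%."""
    alerts = []
    if sent and bounced / sent > 0.05:
        alerts.append("bounce_rate")
    # Assumption: complaint rate is measured against delivered messages.
    if delivered and complaints / delivered > 0.001:
        alerts.append("complaint_rate")
    return alerts
```

The redirect endpoint would look up link_id in the stored map, record the click, and answer with an HTTP redirect to the original URL.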

Rate Limiting and Priority Queues

SMTP providers enforce sending rate limits (typically a maximum number of emails per second per account, with higher limits available on dedicated IPs; the exact figures depend on the provider and plan). Application-level rate limiting keeps sends under provider limits and enforces priority. Priority queues: high-priority emails (password resets, security alerts) should be delivered immediately even while a batch campaign is in progress. Implementation: one queue per priority level, namely critical (password_reset), transactional (order_confirmation), and marketing (newsletter). Workers drain the critical queue first, and marketing emails are rate-limited so they cannot crowd out transactional sends. Dedicated IP pools: use separate sending IP pools for transactional and marketing email. Marketing email tends to earn a weaker reputation (promotional content draws more spam complaints), and its spam-folder placement should not drag down transactional deliverability. SendGrid and SES support configuring multiple IP pools per account. Scheduling: batch newsletters are scheduled for off-peak hours (e.g., early morning, outside prime business hours) to avoid rate-limit conflicts with real-time transactional sends.
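
A sketch of priority dispatch with token-bucket rate limiting: one bucket models the provider's overall send rate, and a second, tighter bucket caps marketing mail. The queue names follow the paragraph above; TokenBucket, PriorityDispatcher, and any rate values passed to them are hypothetical, and the injectable clock exists only to make the behavior easy to test.

```python
import time
from collections import deque
from typing import Callable, Optional


class TokenBucket:
    """Permits `rate` sends per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.tokens = capacity
        self.updated = clock()

    def has_token(self) -> bool:
        # Refill in proportion to elapsed time, never above capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        return self.tokens >= 1

    def take(self) -> None:
        self.tokens -= 1  # call only right after has_token() returned True


PRIORITIES = ("critical", "transactional", "marketing")  # drained in this order


class PriorityDispatcher:
    def __init__(self, provider: TokenBucket, marketing: TokenBucket):
        self.queues = {p: deque() for p in PRIORITIES}
        self.provider = provider    # overall provider send-rate limit
        self.marketing = marketing  # tighter cap so newsletters can't crowd out other mail

    def enqueue(self, priority: str, email: str) -> None:
        self.queues[priority].append(email)

    def next_to_send(self) -> Optional[tuple[str, str]]:
        """Return (priority, email) to send now, or None if nothing may be sent yet."""
        if not self.provider.has_token():
            return None
        for priority in PRIORITIES:
            q = self.queues[priority]
            if not q:
                continue
            if priority == "marketing":
                if not self.marketing.has_token():
                    return None
                self.marketing.take()
            self.provider.take()
            return priority, q.popleft()
        return None
```

Because the dispatcher always scans critical before transactional before marketing, a password reset queued behind thousands of newsletters is still sent on the next available provider token.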