Email Queue System — Low-Level Design
An email queue system buffers outgoing emails from application code, delivers them reliably through an external provider (SendGrid, SES), handles retries on failure, and prevents duplicate sends. The same pattern serves transactional emails, marketing campaigns, and system alerts.
Core Data Model
EmailMessage
    id                   BIGSERIAL PK
    to_address           TEXT NOT NULL
    from_address         TEXT NOT NULL
    subject              TEXT NOT NULL
    body_html            TEXT
    body_text            TEXT
    template_id          TEXT                    -- optional: use a template
    template_vars        JSONB
    idempotency_key      TEXT UNIQUE NOT NULL    -- prevents duplicate sends
    status               TEXT DEFAULT 'queued'   -- queued, sending, sent, failed, cancelled
    attempt_count        INT DEFAULT 0
    next_attempt_at      TIMESTAMPTZ DEFAULT NOW()
    provider_message_id  TEXT                    -- ID returned by SendGrid/SES
    sent_at              TIMESTAMPTZ
    created_at           TIMESTAMPTZ

EmailEvent
    id           BIGSERIAL PK
    message_id   BIGINT FK NOT NULL
    event_type   TEXT NOT NULL    -- delivered, opened, clicked, bounced, spam
    occurred_at  TIMESTAMPTZ
    metadata     JSONB
Enqueuing an Email
import json

def enqueue_email(to, subject, template_id, template_vars, idempotency_key):
    """Idempotent: re-using a key leaves the existing row untouched and queues nothing new."""
    db.execute("""
        INSERT INTO EmailMessage
            (to_address, from_address, subject, template_id, template_vars,
             idempotency_key, status, created_at, next_attempt_at)
        VALUES
            (%(to)s, 'noreply@example.com', %(subject)s, %(tmpl)s, %(vars)s,
             %(key)s, 'queued', NOW(), NOW())
        ON CONFLICT (idempotency_key) DO NOTHING
    """, {'to': to, 'subject': subject, 'tmpl': template_id,
          'vars': json.dumps(template_vars), 'key': idempotency_key})
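The effect of the unique key can be checked locally. The following is a minimal sketch that uses an in-memory SQLite database as a stand-in for PostgreSQL (SQLite 3.24+ accepts the same ON CONFLICT ... DO NOTHING clause). The table is trimmed to a few columns, and enqueue_once is a hypothetical helper name, not part of the design above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE EmailMessage (
        id INTEGER PRIMARY KEY,
        to_address TEXT NOT NULL,
        subject TEXT NOT NULL,
        idempotency_key TEXT UNIQUE NOT NULL,
        status TEXT DEFAULT 'queued'
    )
""")

def enqueue_once(conn, to, subject, key):
    """Insert the message unless the key already exists; return the row id either way."""
    conn.execute(
        "INSERT INTO EmailMessage (to_address, subject, idempotency_key) "
        "VALUES (?, ?, ?) ON CONFLICT (idempotency_key) DO NOTHING",
        (to, subject, key),
    )
    row = conn.execute(
        "SELECT id FROM EmailMessage WHERE idempotency_key = ?", (key,)
    ).fetchone()
    return row[0]

# Simulate a job that runs twice because of a retry.
first_id = enqueue_once(conn, "a@example.com", "Your order", "order-12345-confirmation")
second_id = enqueue_once(conn, "a@example.com", "Your order", "order-12345-confirmation")
row_count = conn.execute("SELECT COUNT(*) FROM EmailMessage").fetchone()[0]
```

Both calls resolve to the same row, so the caller can treat a repeated enqueue as a success without knowing whether it was the first attempt.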
Delivery Worker
def process_email_batch(batch_size=100):
    # Claim a batch atomically. Only 'queued' rows are eligible: 'failed' is
    # terminal, so permanently failed messages are never picked up again.
    messages = db.execute("""
        UPDATE EmailMessage
        SET status = 'sending', attempt_count = attempt_count + 1
        WHERE id IN (
            SELECT id FROM EmailMessage
            WHERE status = 'queued'
              AND next_attempt_at <= NOW()
              AND attempt_count < 5
            ORDER BY next_attempt_at
            LIMIT %(batch_size)s
            FOR UPDATE SKIP LOCKED
        )
        RETURNING *
    """, {'batch_size': batch_size})
    for msg in messages:
        deliver_message(msg)

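MAX_ATTEMPTS = 5  # matches the attempt_count < 5 filter in the claim query

def deliver_message(msg):
    try:
        rendered = render_template(msg.template_id, msg.template_vars)
        response = sendgrid.send(
            to=msg.to_address,
            subject=msg.subject,
            html=rendered.html,
            text=rendered.text,
        )
        db.execute("""
            UPDATE EmailMessage
            SET status = 'sent', sent_at = NOW(),
                provider_message_id = %(pid)s
            WHERE id = %(id)s
        """, {'pid': response.message_id, 'id': msg.id})
    except (SendGridRateLimitError, TemporaryError):
        # Exponential backoff: 1min, 5min, 30min, 2h, 8h.
        # attempt_count was already incremented when the row was claimed,
        # so the first failure (attempt_count == 1) maps to delays[0].
        delays = [60, 300, 1800, 7200, 28800]
        delay = delays[min(msg.attempt_count - 1, len(delays) - 1)]
        # Mark the message failed once the attempt budget is spent; otherwise requeue it.
        status = 'failed' if msg.attempt_count >= MAX_ATTEMPTS else 'queued'
        # Multiply a bound integer by a fixed interval: a placeholder inside a
        # quoted SQL literal (INTERVAL '%(delay)s seconds') is not substituted safely.
        db.execute("""
            UPDATE EmailMessage
            SET status = %(status)s,
                next_attempt_at = NOW() + %(delay)s * INTERVAL '1 second'
            WHERE id = %(id)s
        """, {'status': status, 'delay': delay, 'id': msg.id})
    except PermanentError:  # invalid address, etc.
        db.execute("UPDATE EmailMessage SET status='failed' WHERE id=%(id)s",
                   {'id': msg.id})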
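The retry schedule is easier to test when it lives in a small pure function. The sketch below assumes attempt_count is the number of attempts already made (1 after the first failure) and a budget of five attempts, as in the claim query. The names next_retry_delay, RETRY_DELAYS_SECONDS, and MAX_ATTEMPTS are hypothetical.

```python
RETRY_DELAYS_SECONDS = [60, 300, 1800, 7200, 28800]  # 1min, 5min, 30min, 2h, 8h
MAX_ATTEMPTS = 5  # assumed budget, mirroring the attempt_count < 5 filter

def next_retry_delay(attempt_count, max_attempts=MAX_ATTEMPTS):
    """Return seconds to wait before the next attempt, or None when retries are exhausted."""
    if attempt_count < 1:
        raise ValueError("attempt_count must be at least 1")
    if attempt_count >= max_attempts:
        return None  # caller should mark the message as failed
    # Clamp to the last entry so a larger budget keeps retrying at the longest delay.
    index = min(attempt_count - 1, len(RETRY_DELAYS_SECONDS) - 1)
    return RETRY_DELAYS_SECONDS[index]

schedule = [next_retry_delay(n) for n in range(1, 6)]
```

With a budget of five attempts the 8-hour delay is never reached, because the fifth failure is terminal. Raising max_attempts to 6 would bring it into play.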
Handling Provider Webhooks (Delivery Events)
# SendGrid/SES POSTs events to your webhook endpoint
def handle_provider_webhook(events):
    for event in events:
        message = db.get_by(EmailMessage, provider_message_id=event['sg_message_id'])
        if not message:
            continue  # Unknown message: skip it, but keep processing the rest of the batch
        # ON CONFLICT only deduplicates if EmailEvent has a unique constraint
        # (for example on a provider event ID); the schema above defines none.
        # event['timestamp'] is assumed to be Unix epoch seconds, hence to_timestamp().
        db.execute("""
            INSERT INTO EmailEvent (message_id, event_type, occurred_at, metadata)
            VALUES (%(mid)s, %(type)s, to_timestamp(%(ts)s), %(meta)s)
            ON CONFLICT DO NOTHING  -- idempotent: provider may send duplicates
        """, {'mid': message.id, 'type': event['event'],
              'ts': event['timestamp'], 'meta': json.dumps(event)})
        if event['event'] == 'bounce':
            # Suppress future emails to this address
            mark_address_undeliverable(event['email'])
        elif event['event'] == 'unsubscribe':
            update_marketing_consent(event['email'], granted=False)
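The decision about which events should suppress an address can be separated from the database writes. The following is a sketch under stated assumptions: the event names ("bounce", "unsubscribe", "spamreport") and the sg_event_id field follow SendGrid-style payloads and should be checked against your provider's documentation. The function plan_suppressions is a hypothetical helper.

```python
from typing import Iterable, List, Tuple

# Assumed mapping from provider event names to suppression reasons.
SUPPRESSION_REASONS = {
    "bounce": "hard_bounce",
    "unsubscribe": "unsubscribe",
    "spamreport": "spam_report",
}

def plan_suppressions(events: Iterable[dict]) -> List[Tuple[str, str]]:
    """Return (email, reason) pairs for events that should suppress an address."""
    seen_event_ids = set()
    planned = []
    for event in events:
        event_id = event.get("sg_event_id")
        if event_id is not None:
            if event_id in seen_event_ids:
                continue  # the provider delivered the same event twice
            seen_event_ids.add(event_id)
        reason = SUPPRESSION_REASONS.get(event.get("event"))
        if reason is not None:
            planned.append((event["email"].strip().lower(), reason))
    return planned

sample_events = [
    {"sg_event_id": "e1", "event": "delivered", "email": "ok@example.com"},
    {"sg_event_id": "e2", "event": "bounce", "email": "Gone@Example.com"},
    {"sg_event_id": "e2", "event": "bounce", "email": "Gone@Example.com"},
    {"sg_event_id": "e3", "event": "spamreport", "email": "angry@example.com"},
]
planned = plan_suppressions(sample_events)
```

Keeping this step free of I/O makes the duplicate handling and the event-to-reason mapping straightforward to unit-test.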
Unsubscribe and Suppression List
EmailSuppression
    email       TEXT PK
    reason      TEXT    -- 'unsubscribe', 'hard_bounce', 'spam_report'
    created_at  TIMESTAMPTZ

def enqueue_email(to, subject, template_id, template_vars, idempotency_key):
    # Always check suppression before queuing
    if db.exists(EmailSuppression, email=to):
        log.info(f'Suppressed email to {to}')
        return None
    # proceed with enqueue
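Because the suppression lookup is an exact match on a primary key, addresses that differ only in case or surrounding whitespace would slip through. The sketch below normalizes addresses before both writing and reading, using an in-memory SQLite table as a stand-in for PostgreSQL. Lowercasing the whole address is a common simplification rather than a guarantee of mailbox equivalence, and normalize_email, suppress, and is_suppressed are hypothetical helper names.

```python
import sqlite3

def normalize_email(address: str) -> str:
    """Trim whitespace and lowercase so lookups are insensitive to case."""
    return address.strip().lower()

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE EmailSuppression (
        email TEXT PRIMARY KEY,
        reason TEXT,
        created_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")

def suppress(conn, address: str, reason: str) -> None:
    """Record a suppression; an existing entry keeps its original reason."""
    conn.execute(
        "INSERT OR IGNORE INTO EmailSuppression (email, reason) VALUES (?, ?)",
        (normalize_email(address), reason),
    )

def is_suppressed(conn, address: str) -> bool:
    row = conn.execute(
        "SELECT 1 FROM EmailSuppression WHERE email = ?",
        (normalize_email(address),),
    ).fetchone()
    return row is not None

suppress(conn, "User@Example.com", "hard_bounce")
suppress(conn, "user@example.com", "unsubscribe")  # no-op: already suppressed
```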
Key Interview Points
- Idempotency key prevents duplicates: Application code passes a stable key (e.g., order-12345-confirmation) so retries and re-runs never double-send the same email.
- FOR UPDATE SKIP LOCKED: Lets multiple parallel workers claim disjoint batches without blocking on each other's row locks. Adding workers increases throughput until the database or the provider's rate limit becomes the bottleneck.
- Separate transactional from marketing: Use different sending domains and IP pools. Transactional emails (receipts, password resets) should never share IP reputation with marketing blasts.
- Suppression before queuing: Check the suppression list at enqueue time, not at delivery time, to avoid wasting a queue slot and a provider API call on a known-bad address.
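The separation of transactional and marketing traffic can be made explicit in code by resolving a sending stream from the message category. This is a sketch only: the domains, pool names, and the SendingStream and stream_for names are all hypothetical, and how a pool is actually selected depends on the provider.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SendingStream:
    from_domain: str  # domain used in the From address
    ip_pool: str      # provider-side IP pool name (hypothetical values below)

STREAMS = {
    "transactional": SendingStream("mail.example.com", "transactional-pool"),
    "marketing": SendingStream("news.example.com", "marketing-pool"),
}

def stream_for(category: str) -> SendingStream:
    """Look up the stream for a category; unknown categories are rejected, not defaulted."""
    try:
        return STREAMS[category]
    except KeyError:
        raise ValueError(f"unknown email category: {category!r}") from None

receipt_stream = stream_for("transactional")
campaign_stream = stream_for("marketing")
```

Rejecting unknown categories avoids silently routing a new bulk campaign through the transactional pool.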
Frequently Asked Questions
How do you prevent the same email from being sent twice?
Use an idempotency key on the EmailMessage record with a UNIQUE constraint and INSERT … ON CONFLICT DO NOTHING. The application generates a stable key before inserting: for a password reset, use reset-{user_id}-{token}; for an order confirmation, use order-{order_id}-confirmation. If the job that enqueues the email runs twice (due to a retry), the second INSERT is a no-op, so the message is enqueued only once. Without idempotency keys, any retry logic (task queue retries, cronjob failures) can cause duplicate sends.
How do you retry failed email deliveries with exponential backoff?
Store next_attempt_at on the EmailMessage record. After a failed delivery attempt, compute the delay based on attempt_count: [60, 300, 1800, 7200, 28800] seconds for attempts 1-5. After 5 attempts, set status='failed' and alert the team. The delivery worker queries WHERE status='queued' AND next_attempt_at <= NOW(), so it naturally picks up retries when they come due. Use FOR UPDATE SKIP LOCKED so multiple workers process independent messages in parallel without double-delivering.
How do you handle bounced or unsubscribed email addresses?
Receive delivery events from your provider (SendGrid, SES) via webhook. On a hard bounce (permanent delivery failure), add the address to an EmailSuppression table and never send to it again. On a spam report, apply the same suppression. On an unsubscribe, update the user's marketing consent preferences in addition to suppressing. Check the suppression list at enqueue time (before inserting into the queue) rather than at delivery time; this saves a wasted queue slot and a provider API call. Respect suppressions for transactional emails too if the bounce indicates an invalid address.
How do you scale email delivery to millions per day?
Run multiple delivery worker processes; FOR UPDATE SKIP LOCKED allows horizontal scaling without double-delivery. Batch API calls to the provider where supported: some providers accept up to 1,000 recipients per API call, while others allow far fewer, so check the limit for yours. Separate transactional from marketing sends: use different IP pools (warm IPs for transactional, a separate pool for bulk) and separate sending domains to protect deliverability. For very high volume (100M+/day), use dedicated IPs, implement feedback loops with ISPs, monitor bounce rates per IP, and throttle per-IP send rates to avoid blacklisting.
How do you track email open and click rates?
For opens, embed a 1×1 tracking pixel (GET /track/open?mid={message_id}). When it loads, record an EmailEvent with event_type=opened. Note that privacy-conscious email clients (Apple Mail Privacy Protection) pre-fetch tracking pixels, so open rates are inflated. For clicks, replace links with redirect URLs (/track/click?mid={message_id}&url={encoded_url}). On click, record an EmailEvent with event_type=clicked and the target URL, then redirect. For aggregate reporting, join EmailMessage with EmailEvent to compute per-campaign open and click rates.
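The tracking URLs described for opens and clicks can be built and parsed with the standard library. This is a minimal sketch: TRACKING_BASE is a hypothetical host, and the function names are illustrative. A production redirect endpoint would also need to validate or sign the target URL so that it cannot be abused as an open redirect.

```python
from urllib.parse import parse_qs, urlencode, urlparse

TRACKING_BASE = "https://mail.example.com"  # hypothetical tracking host

def open_pixel_url(message_id: int) -> str:
    """URL of the 1x1 pixel embedded in the HTML body."""
    return f"{TRACKING_BASE}/track/open?{urlencode({'mid': message_id})}"

def click_tracking_url(message_id: int, target_url: str) -> str:
    """Redirect URL that replaces a link in the email body."""
    query = urlencode({"mid": message_id, "url": target_url})
    return f"{TRACKING_BASE}/track/click?{query}"

def parse_click(tracking_url: str):
    """Recover (message_id, target_url) on the server before redirecting."""
    params = parse_qs(urlparse(tracking_url).query)
    return int(params["mid"][0]), params["url"][0]

target = "https://shop.example.com/cart?item=1&ref=email"
tracked = click_tracking_url(42, target)
message_id, recovered_target = parse_click(tracked)
```

urlencode percent-encodes the target, so query parameters inside the original link survive the round trip instead of being mistaken for parameters of the tracking URL.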