Email Queue System — Low-Level Design
An email queue system buffers outgoing emails from application code, delivers them reliably through an external provider (SendGrid, SES), handles retries on failure, and prevents duplicate sends. This pattern is used everywhere: transactional emails, marketing campaigns, and system alerts.
Core Data Model
EmailMessage
id BIGSERIAL PK
to_address TEXT NOT NULL
from_address TEXT NOT NULL
subject TEXT NOT NULL
body_html TEXT
body_text TEXT
template_id TEXT -- optional: use a template
template_vars JSONB
idempotency_key TEXT UNIQUE NOT NULL -- prevents duplicate sends
status TEXT DEFAULT 'queued' -- queued, sending, sent, failed, cancelled
attempt_count INT DEFAULT 0
next_attempt_at TIMESTAMPTZ DEFAULT NOW()
provider_message_id TEXT -- ID returned by SendGrid/SES
sent_at TIMESTAMPTZ
created_at TIMESTAMPTZ
EmailEvent
id BIGSERIAL PK
message_id BIGINT FK NOT NULL
event_type TEXT NOT NULL -- delivered, opened, clicked, bounced, spam
occurred_at TIMESTAMPTZ
metadata JSONB
Enqueuing an Email
def enqueue_email(to, subject, template_id, template_vars, idempotency_key):
"""Idempotent: same key → returns existing record without re-queuing."""
db.execute("""
INSERT INTO EmailMessage
(to_address, from_address, subject, template_id, template_vars,
idempotency_key, status, created_at, next_attempt_at)
VALUES
(%(to)s, 'noreply@example.com', %(subject)s, %(tmpl)s, %(vars)s,
%(key)s, 'queued', NOW(), NOW())
ON CONFLICT (idempotency_key) DO NOTHING
""", {'to': to, 'subject': subject, 'tmpl': template_id,
'vars': json.dumps(template_vars), 'key': idempotency_key})
Delivery Worker
def process_email_batch(batch_size=100):
# Claim a batch atomically
messages = db.execute("""
UPDATE EmailMessage
SET status='sending', attempt_count=attempt_count+1
WHERE id IN (
SELECT id FROM EmailMessage
WHERE status IN ('queued', 'failed')
AND next_attempt_at <= NOW()
AND attempt_count < 5
ORDER BY next_attempt_at
LIMIT %(batch_size)s
FOR UPDATE SKIP LOCKED
)
RETURNING *
""", {'batch_size': batch_size})
for msg in messages:
deliver_message(msg)
def deliver_message(msg):
try:
rendered = render_template(msg.template_id, msg.template_vars)
response = sendgrid.send(
to=msg.to_address,
subject=msg.subject,
html=rendered.html,
text=rendered.text,
)
db.execute("""
UPDATE EmailMessage
SET status='sent', sent_at=NOW(),
provider_message_id=%(pid)s
WHERE id=%(id)s
""", {'pid': response.message_id, 'id': msg.id})
except (SendGridRateLimitError, TemporaryError):
# Exponential backoff: 1min, 5min, 30min, 2h, 8h
delays = [60, 300, 1800, 7200, 28800]
delay = delays[min(msg.attempt_count, len(delays)-1)]
db.execute("""
UPDATE EmailMessage
SET status='queued',
next_attempt_at=NOW() + INTERVAL '%(delay)s seconds'
WHERE id=%(id)s
""", {'delay': delay, 'id': msg.id})
except PermanentError: # invalid address, etc.
db.execute("UPDATE EmailMessage SET status='failed' WHERE id=%(id)s",
{'id': msg.id})
Handling Provider Webhooks (Delivery Events)
-- SendGrid/SES POSTs events to your webhook endpoint
def handle_provider_webhook(events):
for event in events:
message = db.get_by(EmailMessage, provider_message_id=event['sg_message_id'])
if not message:
return # Unknown message, ignore
db.execute("""
INSERT INTO EmailEvent (message_id, event_type, occurred_at, metadata)
VALUES (%(mid)s, %(type)s, %(ts)s, %(meta)s)
ON CONFLICT DO NOTHING -- idempotent: provider may send duplicates
""", {'mid': message.id, 'type': event['event'],
'ts': event['timestamp'], 'meta': json.dumps(event)})
if event['event'] == 'bounce':
# Suppress future emails to this address
mark_address_undeliverable(event['email'])
elif event['event'] == 'unsubscribe':
update_marketing_consent(event['email'], granted=False)
Unsubscribe and Suppression List
EmailSuppression
email TEXT PK
reason TEXT -- 'unsubscribe', 'hard_bounce', 'spam_report'
created_at TIMESTAMPTZ
def enqueue_email(to, ...):
# Always check suppression before queuing
if db.exists(EmailSuppression, email=to):
log.info(f'Suppressed email to {to}')
return None
# proceed with enqueue
Key Interview Points
- Idempotency key prevents duplicates: Application code passes a stable key (e.g., order-12345-confirmation) so retries and re-runs never double-send the same email.
- FOR UPDATE SKIP LOCKED: Enables multiple parallel workers to claim disjoint batches without contention. Add workers freely to increase throughput.
- Separate transactional from marketing: Use different sending domains and IP pools. Transactional emails (receipts, password resets) should never share IP reputation with marketing blasts.
- Suppression before queuing: Check the suppression list at enqueue time, not at delivery time, to avoid wasting a queue slot and a provider API call on a known-bad address.
Email queue and transactional messaging design is discussed in Amazon system design interview questions.
Email queue and notification delivery design is covered in Shopify system design interview preparation.
Email queue and transactional email system design is discussed in Stripe system design interview guide.