Email Queue System Low-Level Design

Email Queue System — Low-Level Design

An email queue system buffers outgoing emails from application code, delivers them reliably through an external provider (SendGrid, SES), handles retries on failure, and prevents duplicate sends. This pattern is used everywhere: transactional emails, marketing campaigns, and system alerts.

Core Data Model

EmailMessage
  id              BIGSERIAL PK
  to_address      TEXT NOT NULL
  from_address    TEXT NOT NULL
  subject         TEXT NOT NULL
  body_html       TEXT
  body_text       TEXT
  template_id     TEXT              -- optional: use a template
  template_vars   JSONB
  idempotency_key TEXT UNIQUE NOT NULL  -- prevents duplicate sends
  status          TEXT DEFAULT 'queued'  -- queued, sending, sent, failed, cancelled
  attempt_count   INT DEFAULT 0
  next_attempt_at TIMESTAMPTZ DEFAULT NOW()
  provider_message_id TEXT          -- ID returned by SendGrid/SES
  sent_at         TIMESTAMPTZ
  created_at      TIMESTAMPTZ

EmailEvent
  id              BIGSERIAL PK
  message_id      BIGINT FK NOT NULL
  event_type      TEXT NOT NULL     -- delivered, opened, clicked, bounced, spam
  occurred_at     TIMESTAMPTZ
  metadata        JSONB

Enqueuing an Email

def enqueue_email(to, subject, template_id, template_vars, idempotency_key):
    """Idempotent: same key → returns existing record without re-queuing."""
    db.execute("""
        INSERT INTO EmailMessage
          (to_address, from_address, subject, template_id, template_vars,
           idempotency_key, status, created_at, next_attempt_at)
        VALUES
          (%(to)s, 'noreply@example.com', %(subject)s, %(tmpl)s, %(vars)s,
           %(key)s, 'queued', NOW(), NOW())
        ON CONFLICT (idempotency_key) DO NOTHING
    """, {'to': to, 'subject': subject, 'tmpl': template_id,
          'vars': json.dumps(template_vars), 'key': idempotency_key})

Delivery Worker

def process_email_batch(batch_size=100):
    # Claim a batch atomically
    messages = db.execute("""
        UPDATE EmailMessage
        SET status='sending', attempt_count=attempt_count+1
        WHERE id IN (
            SELECT id FROM EmailMessage
            WHERE status IN ('queued', 'failed')
              AND next_attempt_at <= NOW()
              AND attempt_count < 5
            ORDER BY next_attempt_at
            LIMIT %(batch_size)s
            FOR UPDATE SKIP LOCKED
        )
        RETURNING *
    """, {'batch_size': batch_size})

    for msg in messages:
        deliver_message(msg)

def deliver_message(msg):
    try:
        rendered = render_template(msg.template_id, msg.template_vars)
        response = sendgrid.send(
            to=msg.to_address,
            subject=msg.subject,
            html=rendered.html,
            text=rendered.text,
        )
        db.execute("""
            UPDATE EmailMessage
            SET status='sent', sent_at=NOW(),
                provider_message_id=%(pid)s
            WHERE id=%(id)s
        """, {'pid': response.message_id, 'id': msg.id})
    except (SendGridRateLimitError, TemporaryError):
        # Exponential backoff: 1min, 5min, 30min, 2h, 8h
        delays = [60, 300, 1800, 7200, 28800]
        delay = delays[min(msg.attempt_count, len(delays)-1)]
        db.execute("""
            UPDATE EmailMessage
            SET status='queued',
                next_attempt_at=NOW() + INTERVAL '%(delay)s seconds'
            WHERE id=%(id)s
        """, {'delay': delay, 'id': msg.id})
    except PermanentError:  # invalid address, etc.
        db.execute("UPDATE EmailMessage SET status='failed' WHERE id=%(id)s",
                   {'id': msg.id})

Handling Provider Webhooks (Delivery Events)

-- SendGrid/SES POSTs events to your webhook endpoint
def handle_provider_webhook(events):
    for event in events:
        message = db.get_by(EmailMessage, provider_message_id=event['sg_message_id'])
        if not message:
            return  # Unknown message, ignore

        db.execute("""
            INSERT INTO EmailEvent (message_id, event_type, occurred_at, metadata)
            VALUES (%(mid)s, %(type)s, %(ts)s, %(meta)s)
            ON CONFLICT DO NOTHING  -- idempotent: provider may send duplicates
        """, {'mid': message.id, 'type': event['event'],
              'ts': event['timestamp'], 'meta': json.dumps(event)})

        if event['event'] == 'bounce':
            # Suppress future emails to this address
            mark_address_undeliverable(event['email'])
        elif event['event'] == 'unsubscribe':
            update_marketing_consent(event['email'], granted=False)

Unsubscribe and Suppression List

EmailSuppression
  email           TEXT PK
  reason          TEXT    -- 'unsubscribe', 'hard_bounce', 'spam_report'
  created_at      TIMESTAMPTZ

def enqueue_email(to, ...):
    # Always check suppression before queuing
    if db.exists(EmailSuppression, email=to):
        log.info(f'Suppressed email to {to}')
        return None
    # proceed with enqueue

Key Interview Points

  • Idempotency key prevents duplicates: Application code passes a stable key (e.g., order-12345-confirmation) so retries and re-runs never double-send the same email.
  • FOR UPDATE SKIP LOCKED: Enables multiple parallel workers to claim disjoint batches without contention. Add workers freely to increase throughput.
  • Separate transactional from marketing: Use different sending domains and IP pools. Transactional emails (receipts, password resets) should never share IP reputation with marketing blasts.
  • Suppression before queuing: Check the suppression list at enqueue time, not at delivery time, to avoid wasting a queue slot and a provider API call on a known-bad address.

Email queue and transactional messaging design is discussed in Amazon system design interview questions.

Email queue and notification delivery design is covered in Shopify system design interview preparation.

Email queue and transactional email system design is discussed in Stripe system design interview guide.

Scroll to Top