Email Queue System Low-Level Design

Email Queue System — Low-Level Design

An email queue system buffers outgoing emails from application code, delivers them reliably through an external provider (SendGrid, SES), handles retries on failure, and prevents duplicate sends. This pattern is used everywhere: transactional emails, marketing campaigns, and system alerts.

Core Data Model

EmailMessage
  id              BIGSERIAL PK
  to_address      TEXT NOT NULL
  from_address    TEXT NOT NULL
  subject         TEXT NOT NULL
  body_html       TEXT
  body_text       TEXT
  template_id     TEXT              -- optional: use a template
  template_vars   JSONB
  idempotency_key TEXT UNIQUE NOT NULL  -- prevents duplicate sends
  status          TEXT DEFAULT 'queued'  -- queued, sending, sent, failed, cancelled
  attempt_count   INT DEFAULT 0
  next_attempt_at TIMESTAMPTZ DEFAULT NOW()
  provider_message_id TEXT          -- ID returned by SendGrid/SES
  sent_at         TIMESTAMPTZ
  created_at      TIMESTAMPTZ

EmailEvent
  id              BIGSERIAL PK
  message_id      BIGINT FK NOT NULL
  event_type      TEXT NOT NULL     -- delivered, opened, clicked, bounced, spam
  occurred_at     TIMESTAMPTZ
  metadata        JSONB

Enqueuing an Email

def enqueue_email(to, subject, template_id, template_vars, idempotency_key):
    """Idempotent: same key → returns existing record without re-queuing."""
    db.execute("""
        INSERT INTO EmailMessage
          (to_address, from_address, subject, template_id, template_vars,
           idempotency_key, status, created_at, next_attempt_at)
        VALUES
          (%(to)s, 'noreply@example.com', %(subject)s, %(tmpl)s, %(vars)s,
           %(key)s, 'queued', NOW(), NOW())
        ON CONFLICT (idempotency_key) DO NOTHING
    """, {'to': to, 'subject': subject, 'tmpl': template_id,
          'vars': json.dumps(template_vars), 'key': idempotency_key})

Delivery Worker

def process_email_batch(batch_size=100):
    # Claim a batch atomically
    messages = db.execute("""
        UPDATE EmailMessage
        SET status='sending', attempt_count=attempt_count+1
        WHERE id IN (
            SELECT id FROM EmailMessage
            WHERE status IN ('queued', 'failed')
              AND next_attempt_at <= NOW()
              AND attempt_count < 5
            ORDER BY next_attempt_at
            LIMIT %(batch_size)s
            FOR UPDATE SKIP LOCKED
        )
        RETURNING *
    """, {'batch_size': batch_size})

    for msg in messages:
        deliver_message(msg)

def deliver_message(msg):
    try:
        rendered = render_template(msg.template_id, msg.template_vars)
        response = sendgrid.send(
            to=msg.to_address,
            subject=msg.subject,
            html=rendered.html,
            text=rendered.text,
        )
        db.execute("""
            UPDATE EmailMessage
            SET status='sent', sent_at=NOW(),
                provider_message_id=%(pid)s
            WHERE id=%(id)s
        """, {'pid': response.message_id, 'id': msg.id})
    except (SendGridRateLimitError, TemporaryError):
        # Exponential backoff: 1min, 5min, 30min, 2h, 8h
        delays = [60, 300, 1800, 7200, 28800]
        delay = delays[min(msg.attempt_count, len(delays)-1)]
        db.execute("""
            UPDATE EmailMessage
            SET status='queued',
                next_attempt_at=NOW() + INTERVAL '%(delay)s seconds'
            WHERE id=%(id)s
        """, {'delay': delay, 'id': msg.id})
    except PermanentError:  # invalid address, etc.
        db.execute("UPDATE EmailMessage SET status='failed' WHERE id=%(id)s",
                   {'id': msg.id})

Handling Provider Webhooks (Delivery Events)

-- SendGrid/SES POSTs events to your webhook endpoint
def handle_provider_webhook(events):
    for event in events:
        message = db.get_by(EmailMessage, provider_message_id=event['sg_message_id'])
        if not message:
            return  # Unknown message, ignore

        db.execute("""
            INSERT INTO EmailEvent (message_id, event_type, occurred_at, metadata)
            VALUES (%(mid)s, %(type)s, %(ts)s, %(meta)s)
            ON CONFLICT DO NOTHING  -- idempotent: provider may send duplicates
        """, {'mid': message.id, 'type': event['event'],
              'ts': event['timestamp'], 'meta': json.dumps(event)})

        if event['event'] == 'bounce':
            # Suppress future emails to this address
            mark_address_undeliverable(event['email'])
        elif event['event'] == 'unsubscribe':
            update_marketing_consent(event['email'], granted=False)

Unsubscribe and Suppression List

EmailSuppression
  email           TEXT PK
  reason          TEXT    -- 'unsubscribe', 'hard_bounce', 'spam_report'
  created_at      TIMESTAMPTZ

def enqueue_email(to, ...):
    # Always check suppression before queuing
    if db.exists(EmailSuppression, email=to):
        log.info(f'Suppressed email to {to}')
        return None
    # proceed with enqueue

Key Interview Points

  • Idempotency key prevents duplicates: Application code passes a stable key (e.g., order-12345-confirmation) so retries and re-runs never double-send the same email.
  • FOR UPDATE SKIP LOCKED: Enables multiple parallel workers to claim disjoint batches without contention. Add workers freely to increase throughput.
  • Separate transactional from marketing: Use different sending domains and IP pools. Transactional emails (receipts, password resets) should never share IP reputation with marketing blasts.
  • Suppression before queuing: Check the suppression list at enqueue time, not at delivery time, to avoid wasting a queue slot and a provider API call on a known-bad address.

{“@context”:”https://schema.org”,”@type”:”FAQPage”,”mainEntity”:[{“@type”:”Question”,”name”:”How do you prevent the same email from being sent twice?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Use an idempotency key on the EmailMessage record with a UNIQUE constraint and INSERT … ON CONFLICT DO NOTHING. The application generates a stable key before inserting: for a password reset, use reset-{user_id}-{token}; for an order confirmation, use order-{order_id}-confirmation. If the job that enqueues the email runs twice (due to a retry), the second INSERT is a no-op. The email is delivered exactly once. Without idempotency keys, any retry logic (task queue retries, cronjob failures) can cause duplicate sends.”}},{“@type”:”Question”,”name”:”How do you retry failed email deliveries with exponential backoff?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Store next_attempt_at on the EmailMessage record. After a failed delivery attempt, compute the delay based on attempt_count: [60, 300, 1800, 7200, 28800] seconds for attempts 1-5. After 5 attempts: set status=failed and alert the team. The delivery worker queries WHERE status=queued AND next_attempt_at<=NOW() — it naturally picks up retries when they come due. Use FOR UPDATE SKIP LOCKED so multiple workers process independent messages in parallel without double-delivering.”}},{“@type”:”Question”,”name”:”How do you handle bounced or unsubscribed email addresses?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Receive delivery events from your provider (SendGrid, SES) via webhook. On hard bounce (permanent delivery failure): add the address to an EmailSuppression table and never send to it again. On spam report: same suppression. On unsubscribe: update the user’s marketing consent preferences in addition to suppressing. Check the suppression list at enqueue time (before inserting to the queue) rather than at delivery time — this saves a wasted queue slot and a provider API call. Respect suppressions for transactional emails too if the bounce indicates an invalid address.”}},{“@type”:”Question”,”name”:”How do you scale email delivery to millions per day?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Run multiple delivery worker processes — FOR UPDATE SKIP LOCKED allows horizontal scaling without double-delivery. Batch API calls to SendGrid/SES: most providers accept 1,000 recipients per API call. Separate transactional from marketing sends: use different IP pools (warm IPs for transactional, separate pool for bulk) and separate sending domains to protect deliverability. For very high volume (100M+/day): use dedicated IPs, implement feedback loops with ISPs, monitor bounce rates per IP, and throttle per-IP send rates to avoid blacklisting.”}},{“@type”:”Question”,”name”:”How do you track email open and click rates?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”For opens: embed a 1×1 tracking pixel (GET /track/open?mid={message_id}). When loaded, record an EmailEvent with event_type=opened. Note: privacy-conscious email clients (Apple Mail Privacy Protection) pre-fetch tracking pixels, so open rates are inflated. For clicks: replace links with redirect URLs (/track/click?mid={message_id}&url={encoded_url}). On click: record an EmailEvent with event_type=clicked and the target URL, then redirect. Store events in EmailEvent table. For aggregate reporting: join EmailMessage with EmailEvent to compute per-campaign open and click rates.”}}]}

Email queue and transactional messaging design is discussed in Amazon system design interview questions.

Email queue and notification delivery design is covered in Shopify system design interview preparation.

Email queue and transactional email system design is discussed in Stripe system design interview guide.

Scroll to Top