Notification Delivery Service Low-Level Design: Multi-Channel Dispatch, Priority Queues, and Delivery Tracking

Notification Schema

Every notification is a structured record created before any delivery is attempted:

  • notification_id: UUID or Snowflake — unique identifier, used for deduplication
  • user_id: target recipient
  • type: marketing | transactional | security — determines default channel policy and override permissions
  • title, body: rendered content for display channels
  • channels_requested[]: which channels the sender wants to use (push, email, SMS, in-app)
  • data{}: arbitrary key-value payload for deep-link routing in the receiving app
  • priority: low | normal | high | critical — controls queue routing and retry urgency
  • idempotency_key: caller-supplied key for deduplication — prevents duplicate sends on retry
  • created_at: used for TTL enforcement and analytics
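The schema maps naturally onto a typed record. Below is a minimal Python sketch that uses the field names from the list. It assumes uuid4 as the ID generator (a Snowflake generator would slot in the same way), and the enum class names are illustrative.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Any


class NotificationType(Enum):
    MARKETING = "marketing"
    TRANSACTIONAL = "transactional"
    SECURITY = "security"


class Priority(Enum):
    LOW = "low"
    NORMAL = "normal"
    HIGH = "high"
    CRITICAL = "critical"


class Channel(Enum):
    PUSH = "push"
    EMAIL = "email"
    SMS = "sms"
    IN_APP = "in_app"


@dataclass(frozen=True)
class Notification:
    user_id: str
    type: NotificationType             # drives default channel policy
    title: str
    body: str
    channels_requested: tuple[Channel, ...]
    idempotency_key: str               # caller-supplied, used for dedup
    priority: Priority = Priority.NORMAL
    data: dict[str, Any] = field(default_factory=dict)  # deep-link payload
    # uuid4 stands in for "UUID or Snowflake" (assumption).
    notification_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
```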

Channel Dispatch Flow

When a notification is created, the dispatch pipeline executes before any message leaves the system:

  1. Preference check: load the user's per-channel opt-in/opt-out settings from the preferences store
  2. Channel filtering: intersect channels_requested with the user's opted-in channels; security notifications bypass marketing opt-outs
  3. Priority resolution: determine the effective priority; security notifications always escalate to high/critical
  4. Queue routing: publish one task per approved channel to the appropriate priority queue
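Steps 2 to 4 can be sketched as a pure function over the result of the preference lookup. The sketch makes three assumptions. Preferences arrive as a set of opted-in channel names for the notification's type. "Bypass" means security notifications ignore that set entirely. Escalation means raising anything below high to high. ChannelTask and plan_dispatch are hypothetical names.

```python
from dataclasses import dataclass

PRIORITY_ORDER = ["low", "normal", "high", "critical"]


@dataclass
class ChannelTask:
    notification_id: str
    channel: str
    queue: str  # one of PRIORITY_ORDER


def plan_dispatch(notification_id, ntype, priority, channels_requested, opted_in):
    """Return one ChannelTask per approved channel (steps 2-4).

    opted_in is the set of channels the user allows for this type (step 1).
    """
    # Step 2: channel filtering. Security notifications skip the
    # opt-in intersection (assumed meaning of "bypass").
    if ntype == "security":
        approved = list(channels_requested)
    else:
        approved = [c for c in channels_requested if c in opted_in]

    # Step 3: priority resolution. Security is raised to at least "high".
    if ntype == "security" and PRIORITY_ORDER.index(priority) < PRIORITY_ORDER.index("high"):
        priority = "high"

    # Step 4: queue routing, one task per approved channel.
    return [ChannelTask(notification_id, c, priority) for c in approved]
```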

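User preferences are cached in Redis (TTL 5 minutes) to avoid a database read on every notification. Invalidate the cached entry whenever a user updates their preferences; otherwise an opt-out can take up to 5 minutes to take effect.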
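Here is a cache-aside sketch of the preference lookup with the 5-minute TTL. An in-process dict stands in for Redis so the example runs on its own. PreferenceCache and its injectable clock are hypothetical. The invalidate hook is a design choice: without it, an opt-out could take up to the full TTL to apply.

```python
import time
from typing import Callable


class PreferenceCache:
    """Cache-aside lookup of a user's opted-in channels with a TTL."""

    def __init__(self, load_from_db: Callable[[str], set], ttl_seconds: float = 300,
                 clock: Callable[[], float] = time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, set]] = {}  # user_id -> (expiry, prefs)

    def get(self, user_id: str) -> set:
        now = self._clock()
        hit = self._entries.get(user_id)
        if hit is not None and hit[0] > now:
            return hit[1]                      # fresh cache hit
        prefs = self._load(user_id)            # miss or expired: read the store
        self._entries[user_id] = (now + self._ttl, prefs)
        return prefs

    def invalidate(self, user_id: str) -> None:
        # Call on every preference write so opt-outs apply immediately.
        self._entries.pop(user_id, None)
```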

Priority Queues

Separate queues per priority level ensure critical notifications are never starved by marketing volume:

  • Critical queue: polled continuously; workers assigned exclusively — used for security alerts, 2FA codes, payment failures
  • High queue: polled every few seconds; shared workers with critical fallback
  • Normal queue: polled every 30 seconds; standard transactional notifications
  • Low queue: polled infrequently (minutes); marketing, digest emails, weekly reports

During traffic spikes, low-priority queues grow while critical queues drain immediately. This is intentional — a promotional email arriving 10 minutes late is acceptable; a password reset code delayed by 10 minutes is not.
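The different poll cadences can be sketched as a scheduler that reports which queues are due at a given moment, always listing critical first. The concrete intervals for "every few seconds" and "minutes" (2 s and 300 s) are assumptions, and PollScheduler is a hypothetical name.

```python
# Poll intervals in seconds; the high and low values are illustrative assumptions.
POLL_INTERVALS = {"critical": 0.0, "high": 2.0, "normal": 30.0, "low": 300.0}
PRIORITY = ["critical", "high", "normal", "low"]


class PollScheduler:
    def __init__(self, intervals: dict[str, float]):
        self.intervals = intervals
        self.next_poll = {q: 0.0 for q in intervals}  # all queues due at t=0

    def due(self, now: float) -> list[str]:
        """Return the queues to poll at time `now`, highest priority first."""
        ready = [q for q in PRIORITY if q in self.intervals and self.next_poll[q] <= now]
        for q in ready:
            self.next_poll[q] = now + self.intervals[q]
        return ready
```

Because the critical interval is zero, critical is due on every tick no matter how large the low queue grows. In practice each queue would more likely have its own consumer pool than share one scheduler.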

Push Notification Worker

Push workers interface with mobile platform providers:

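  • APNs (Apple Push Notification service): HTTP/2 API with token-based (JWT) authentication. Regenerate the token at least every 60 minutes, because APNs rejects tokens older than one hour, but no more than once every 20 minutes. Connection pooling is critical because APNs limits connections per app; send up to 1000 notifications per second per connection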
  • FCM (Firebase Cloud Messaging): HTTP v1 API with OAuth 2.0 service account; supports topic messaging for broadcast use cases
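  • Token lifecycle: APNs returns status 410 (Gone, reason Unregistered) for tokens that are no longer active; remove them from the DB immediately to avoid repeated failed sends. The FCM HTTP v1 API returns UNREGISTERED (HTTP 404) for stale tokens, which should be removed the same way. Refreshed FCM tokens come from the client app, which must re-register the new token with the backend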
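A push worker ultimately has to turn each provider response into one of a few actions. The sketch below assumes the status and error codes documented for the APNs HTTP/2 API and the FCM HTTP v1 API: APNs 410, and the FCM codes UNREGISTERED, QUOTA_EXCEEDED, UNAVAILABLE, and INTERNAL. It treats 429 and 5xx responses as transient. The names Outcome, classify_apns, and classify_fcm are hypothetical. Check the mappings against current provider documentation before relying on them.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    DISPATCHED = "dispatched"      # accepted by the provider
    RETRY = "retry"                # transient: back off and retry
    REMOVE_TOKEN = "remove_token"  # token is dead: delete it, do not retry
    FAIL = "fail"                  # permanent error for this request


def classify_apns(status: int) -> Outcome:
    if status == 200:
        return Outcome.DISPATCHED
    if status == 410:                    # Unregistered device token
        return Outcome.REMOVE_TOKEN
    if status == 429 or status >= 500:   # throttled or server-side error
        return Outcome.RETRY
    return Outcome.FAIL                  # 400/403/413: bad request or credentials


def classify_fcm(http_status: int, error_code: Optional[str] = None) -> Outcome:
    if http_status == 200:
        return Outcome.DISPATCHED
    if error_code == "UNREGISTERED":     # HTTP 404 in the v1 API
        return Outcome.REMOVE_TOKEN
    if error_code in ("QUOTA_EXCEEDED", "UNAVAILABLE", "INTERNAL") or http_status >= 500:
        return Outcome.RETRY
    return Outcome.FAIL                  # e.g. INVALID_ARGUMENT
```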

Email and SMS Workers

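Email worker: sends via the SES or SendGrid transactional API. Rate limits apply per sending account rather than per domain (SES enforces its sending quota per AWS account and Region, e.g. 14 sends/second). Handle bounce notifications via webhook: SES publishes them to an SNS topic, and SendGrid uses its Event Webhook. Hard bounces must be removed from the active address list immediately to protect domain reputation.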
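Per-account sending quotas such as the 14 sends/second example are commonly enforced client-side with a token bucket. The worker then slows down before the provider starts rejecting requests. This is a minimal sketch. TokenBucket is a hypothetical name, and setting the burst size equal to the per-second rate is an assumption.

```python
import time
from typing import Callable


class TokenBucket:
    """Token-bucket limiter for a provider's per-second send quota."""

    def __init__(self, rate_per_sec: float, burst: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = burst          # start full
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the bucket size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should requeue or sleep instead of calling the API
```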

SMS worker: sends via Twilio or Amazon SNS. Phone numbers must be in E.164 format (+14155551234). Delivery receipts arrive asynchronously via webhook; update the notification status when each receipt arrives. Maintain an opt-out registry: numbers that replied STOP must not be messaged again unless they opt back in, for example by replying START (a regulatory requirement in most jurisdictions).
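Two SMS rules must hold before every send: the number is in E.164 format, and it is not in the STOP registry. A small sketch covers both. The keyword sets are an assumption modelled on common carrier opt-out and opt-in keywords; the exact list depends on the provider and jurisdiction. SmsOptOutRegistry is a hypothetical name. A production registry would live in a durable store, not in memory.

```python
import re

# '+', then 2 to 15 digits with a non-zero first digit.
E164 = re.compile(r"\+[1-9][0-9]{1,14}")
STOP_WORDS = {"STOP", "STOPALL", "UNSUBSCRIBE", "CANCEL", "END", "QUIT"}  # assumed list
START_WORDS = {"START", "UNSTOP"}                                         # assumed list


class SmsOptOutRegistry:
    def __init__(self):
        self._opted_out: set[str] = set()

    def handle_inbound(self, from_number: str, body: str) -> None:
        """Process an inbound SMS webhook; keywords match case-insensitively."""
        keyword = body.strip().upper()
        if keyword in STOP_WORDS:
            self._opted_out.add(from_number)
        elif keyword in START_WORDS:
            self._opted_out.discard(from_number)

    def can_send(self, to_number: str) -> bool:
        # fullmatch rejects trailing characters, including a stray newline.
        return E164.fullmatch(to_number) is not None and to_number not in self._opted_out
```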

In-App Notification Worker

In-app notifications are stored in a notifications table and delivered over WebSocket if the user is connected:

  • Write the notification record with status PENDING
  • Check the presence service; if the user has an active WebSocket connection, push immediately and mark DELIVERED once the client acknowledges receipt
  • If offline, leave as PENDING; client fetches unread notifications on next app open
  • In-app notifications do not require provider integration and have effectively zero delivery cost
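The in-app flow can be sketched with in-memory stand-ins for the notifications table, the presence service, and the WebSocket send. Here DELIVERED is set only when the client acknowledges, which matches the in-app ACK in the delivery status definitions. InAppWorker, on_ack, and fetch_unread are hypothetical names.

```python
import uuid
from dataclasses import dataclass


@dataclass
class InAppRecord:
    notification_id: str
    user_id: str
    body: str
    status: str = "PENDING"


class InAppWorker:
    def __init__(self, presence: set[str]):
        self.table: dict[str, InAppRecord] = {}  # stands in for the notifications table
        self.presence = presence                  # user_ids with a live WebSocket
        self.outbox: list[tuple[str, str]] = []   # (user_id, notification_id) pushed over WS

    def deliver(self, user_id: str, body: str) -> InAppRecord:
        rec = InAppRecord(str(uuid.uuid4()), user_id, body)
        self.table[rec.notification_id] = rec      # persist first, status PENDING
        if user_id in self.presence:               # online: push over the socket
            self.outbox.append((user_id, rec.notification_id))
        return rec                                 # offline: stays PENDING

    def on_ack(self, notification_id: str) -> None:
        self.table[notification_id].status = "DELIVERED"

    def fetch_unread(self, user_id: str) -> list[InAppRecord]:
        """Called on app open; returns records not yet acknowledged."""
        return [r for r in self.table.values()
                if r.user_id == user_id and r.status == "PENDING"]
```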

Delivery Status Tracking

Each (notification, channel) pair follows its own state machine:

PENDING → DISPATCHED → DELIVERED
                    ↘ FAILED
  • PENDING: queued, not yet sent to provider
  • DISPATCHED: submitted to provider API successfully; awaiting delivery confirmation
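  • DELIVERED: delivery was confirmed (SMS delivery receipt, email provider delivery event, in-app client ACK); email open events measure engagement, not delivery. APNs and FCM do not return per-message delivery receipts: a successful response only means the provider accepted the message, so push typically stays DISPATCHED unless the app reports receipt back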
  • FAILED: non-retryable error or max retries exhausted

Status transitions are written to a notification_events log table for auditability and analytics.
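The state machine and its audit log can be enforced in one place, so any transition the diagram does not allow is rejected. This sketch assumes FAILED is also reachable directly from PENDING, for non-retryable errors caught before the provider call. DeliveryTracker is a hypothetical name, and the list stands in for the notification_events table.

```python
from datetime import datetime, timezone

# Allowed transitions; PENDING -> FAILED is an assumption beyond the diagram.
TRANSITIONS = {
    "PENDING": {"DISPATCHED", "FAILED"},
    "DISPATCHED": {"DELIVERED", "FAILED"},
    "DELIVERED": set(),
    "FAILED": set(),
}


class DeliveryTracker:
    def __init__(self):
        self.status: dict[tuple[str, str], str] = {}
        self.events: list[dict] = []  # stands in for notification_events

    def transition(self, notification_id: str, channel: str, new: str) -> None:
        key = (notification_id, channel)
        old = self.status.get(key, "PENDING")
        if new not in TRANSITIONS[old]:
            raise ValueError(f"illegal transition {old} -> {new} for {key}")
        self.status[key] = new
        # Append-only log entry for auditability and analytics.
        self.events.append({
            "notification_id": notification_id, "channel": channel,
            "from": old, "to": new, "at": datetime.now(timezone.utc),
        })
```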

Retry Policy with Exponential Backoff

Transient failures (provider rate limit, network timeout) are retried with exponential backoff:

  • Retry delays: 1s, 2s, 4s, 8s, 16s, i.e. at most 5 retries after the initial attempt before marking FAILED
  • Add jitter (±20%) to prevent thundering herd when many notifications retry simultaneously after a provider outage

Non-retryable errors terminate immediately without retry:

  • Invalid device token (push) — token is stale, remove from DB
  • User opted out (email/SMS) — add to suppression list
  • Invalid phone number format
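The retry policy combines two rules. Transient errors get exponential delays with ±20% jitter, and non-retryable errors end the loop immediately. The sketch below assumes the five delays are five retries after the first attempt. It also assumes callers raise one of two hypothetical exception types to signal which kind of failure occurred.

```python
import random
import time
from typing import Callable, Optional


class NonRetryableError(Exception):
    """Stale token, opted-out recipient, invalid number: stop immediately."""


class RetryableError(Exception):
    """Rate limit, timeout, provider 5xx: back off and try again."""


def backoff_delays(retries: int = 5, base: float = 1.0, jitter: float = 0.2,
                   rng: Optional[random.Random] = None) -> list[float]:
    if rng is None:
        rng = random.Random()
    # base * 2^n gives 1, 2, 4, 8, 16 s; each is scaled by a factor in [0.8, 1.2].
    return [base * 2 ** n * rng.uniform(1 - jitter, 1 + jitter) for n in range(retries)]


def send_with_retry(send: Callable[[], None], retries: int = 5,
                    sleep: Callable[[float], None] = time.sleep,
                    rng: Optional[random.Random] = None) -> str:
    """Return "DISPATCHED" on success, "FAILED" otherwise."""
    delays = backoff_delays(retries, rng=rng)
    for attempt in range(retries + 1):  # initial attempt + up to `retries` retries
        try:
            send()
            return "DISPATCHED"
        except NonRetryableError:
            return "FAILED"             # no retry for permanent errors
        except RetryableError:
            if attempt == retries:
                return "FAILED"         # retries exhausted
            sleep(delays[attempt])
    return "FAILED"  # not reached; keeps the return type explicit
```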

Deduplication and Analytics

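Deduplication: before dispatching a channel task, claim it in Redis with SET notification:{idempotency_key}:{channel} 1 NX EX 86400. If the key already exists, that channel task was already claimed within the last 24 hours; skip it silently. Scoping the key per channel stops the first channel's task from suppressing the other tasks fanned out from the same notification. This prevents duplicate sends caused by upstream retries or at-least-once queue semantics.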
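SET NX EX acts as an atomic claim. The first caller creates the key and proceeds, and every later caller within the TTL finds it already present. The sketch below emulates those semantics in memory so it runs without a Redis server. With redis-py, the equivalent call would be r.set(key, 1, nx=True, ex=86400), which returns None when the key already exists. The sketch also assumes the key is scoped per channel, since one task is published per approved channel. FakeRedis and should_dispatch are hypothetical names.

```python
import time
from typing import Callable


class FakeRedis:
    """In-memory stand-in for Redis SET key value NX EX seconds."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._expiry: dict[str, float] = {}  # key -> expiry time

    def set_nx_ex(self, key: str, ex: int) -> bool:
        now = self._clock()
        exp = self._expiry.get(key)
        if exp is not None and exp > now:
            return False          # key exists and has not expired
        self._expiry[key] = now + ex
        return True


def should_dispatch(r: FakeRedis, idempotency_key: str, channel: str) -> bool:
    # Per-channel key so sibling channel tasks do not suppress each other.
    return r.set_nx_ex(f"notification:{idempotency_key}:{channel}", ex=86400)
```

The key is claimed before the send. A task that later fails with a retryable error must therefore be retried through the backoff path rather than re-entering this check, or its key must be deleted first. Otherwise the retry would be skipped as a duplicate.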

Analytics dashboard: aggregate delivery events to compute per-channel delivery rate, per-notification-type open rate, failure breakdown by error code, and latency percentiles from created_at to DELIVERED. Sudden delivery rate drops indicate provider outages or certificate expiry.
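The per-channel delivery rate and latency percentiles reduce to a simple aggregation over final per-channel outcomes. This sketch assumes each input record carries the channel, the final status, and a precomputed created_at-to-DELIVERED latency in seconds. It uses the nearest-rank percentile method, and delivery_stats is a hypothetical name.

```python
import math
from collections import defaultdict


def delivery_stats(events):
    """events: dicts with channel, status, latency_s (None unless DELIVERED).

    Returns {channel: {"delivery_rate", "p50", "p95"}}.
    """
    totals = defaultdict(int)
    delivered = defaultdict(list)
    for e in events:
        totals[e["channel"]] += 1
        if e["status"] == "DELIVERED":
            delivered[e["channel"]].append(e["latency_s"])

    def pct(values, p):
        # Nearest-rank percentile on sorted latencies.
        s = sorted(values)
        return s[max(0, math.ceil(p / 100 * len(s)) - 1)]

    out = {}
    for ch, n in totals.items():
        lat = delivered[ch]
        out[ch] = {
            "delivery_rate": len(lat) / n,
            "p50": pct(lat, 50) if lat else None,
            "p95": pct(lat, 95) if lat else None,
        }
    return out
```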
