Notification Schema
Every notification is a structured record created before any delivery is attempted:
- notification_id: UUID or Snowflake — unique identifier, used for deduplication
- user_id: target recipient
- type: marketing | transactional | security — determines default channel policy and override permissions
- title, body: rendered content for display channels
- channels_requested[]: which channels the sender wants to use (push, email, SMS, in-app)
- data{}: arbitrary key-value payload for deep-link routing in the receiving app
- priority: low | normal | high | critical — controls queue routing and retry urgency
- idempotency_key: caller-supplied key for deduplication — prevents duplicate sends on retry
- created_at: used for TTL enforcement and analytics
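The schema above maps naturally onto a typed record. The sketch below is a minimal Python version. The class and enum names are hypothetical, it uses a UUID4 for `notification_id` (a Snowflake generator would fit the schema equally well), and the enum values mirror the field list.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class NotificationType(Enum):
    MARKETING = "marketing"
    TRANSACTIONAL = "transactional"
    SECURITY = "security"


class Priority(Enum):
    LOW = "low"
    NORMAL = "normal"
    HIGH = "high"
    CRITICAL = "critical"


class Channel(Enum):
    PUSH = "push"
    EMAIL = "email"
    SMS = "sms"
    IN_APP = "in_app"


@dataclass
class Notification:
    user_id: str
    type: NotificationType                 # drives channel policy and overrides
    title: str
    body: str
    channels_requested: tuple[Channel, ...]
    idempotency_key: str                   # supplied by the caller
    priority: Priority = Priority.NORMAL
    data: dict[str, str] = field(default_factory=dict)  # deep-link payload
    # UUID4 here; a Snowflake ID would also satisfy the schema.
    notification_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
```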
Channel Dispatch Flow
When a notification is created, the dispatch pipeline executes before any message leaves the system:
- Preference check: load the user's per-channel opt-in/opt-out settings from the preferences store
- Channel filtering: intersect channels_requested with user's opted-in channels; security notifications bypass marketing opt-outs
- Priority resolution: determine the effective priority; security notifications always escalate to high/critical
- Queue routing: publish one task per approved channel to the appropriate priority queue
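User preferences are cached in Redis (TTL 5 minutes) to avoid a database read on every notification. Invalidate the cached entry whenever the user changes a preference; otherwise an opt-out could take up to 5 minutes to apply.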
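The filtering, priority resolution, and routing steps can be sketched as pure functions over already-loaded preferences. This is a minimal illustration. The `suppressed` set (hard blocks such as an SMS STOP reply, which even security notifications must respect) and the `notifications.<priority>` queue naming are assumptions, not part of the design above.

```python
from enum import IntEnum


class Priority(IntEnum):
    LOW = 0
    NORMAL = 1
    HIGH = 2
    CRITICAL = 3


def filter_channels(requested, opted_in, suppressed, notification_type):
    """Intersect requested channels with the user's opted-in channels.

    Security notifications ignore opt-outs, but no type may use a channel in
    `suppressed` (assumed to hold hard blocks such as an SMS STOP reply).
    """
    approved = []
    for channel in requested:
        if channel in suppressed:
            continue
        if notification_type == "security" or channel in opted_in:
            approved.append(channel)
    return approved


def resolve_priority(requested, notification_type):
    """Security notifications are escalated to at least HIGH."""
    if notification_type == "security":
        return max(requested, Priority.HIGH)
    return requested


def route(notification_id, requested_channels, opted_in, suppressed,
          notification_type, priority):
    """Return one (queue_name, task) pair per approved channel."""
    effective = resolve_priority(priority, notification_type)
    queue = f"notifications.{effective.name.lower()}"  # hypothetical naming
    return [
        (queue, {"notification_id": notification_id, "channel": channel})
        for channel in filter_channels(requested_channels, opted_in,
                                       suppressed, notification_type)
    ]
```

For example, a security alert requesting push, email, and SMS for a user who opted into email only and replied STOP to SMS yields two tasks on the high queue: push and email.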
Priority Queues
Separate queues per priority level ensure critical notifications are never starved by marketing volume:
- Critical queue: polled continuously; workers assigned exclusively — used for security alerts, 2FA codes, payment failures
- High queue: polled every few seconds; served by a shared worker pool that drains the critical queue first whenever critical work is waiting
- Normal queue: polled every 30 seconds; standard transactional notifications
- Low queue: polled infrequently (minutes); marketing, digest emails, weekly reports
During traffic spikes, low-priority queues grow while critical queues drain immediately. This is intentional — a promotional email arriving 10 minutes late is acceptable; a password reset code delayed by 10 minutes is not.
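The no-starvation property comes from giving each worker pool a fixed, strictly ordered list of queues. The sketch below uses in-memory deques in place of a real broker and omits poll intervals. The `WORKER_POOLS` layout is hypothetical and encodes one reading of "critical fallback": high-queue workers check the critical queue before their own.

```python
from collections import deque

PRIORITIES = ("critical", "high", "normal", "low")

# Hypothetical pool layout: each pool lists the queues it serves, in strict
# priority order. Critical workers serve only the critical queue.
WORKER_POOLS = {
    "critical": ("critical",),
    "high": ("critical", "high"),
    "normal": ("normal",),
    "low": ("low",),
}


class PriorityQueues:
    def __init__(self):
        self.queues = {p: deque() for p in PRIORITIES}

    def publish(self, priority, task):
        self.queues[priority].append(task)

    def next_task(self, pool):
        """Pop from the first non-empty queue this pool serves, else None."""
        for priority in WORKER_POOLS[pool]:
            if self.queues[priority]:
                return priority, self.queues[priority].popleft()
        return None
```

However many marketing tasks pile up in the low queue, they never sit in front of a 2FA code, because critical work is only ever consumed by pools that check the critical queue first.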
Push Notification Worker
Push workers interface with mobile platform providers:
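- APNs (Apple Push Notification service): HTTP/2 API with token-based (JWT) authentication; Apple asks providers to refresh the token no more than once every 20 minutes and at least once every 60 minutes. Connection reuse is critical: keep connections open across many notifications, because APNs treats rapid connect/disconnect cycles as a denial-of-service attack. Each HTTP/2 connection multiplexes concurrent requests up to the stream limit the server advertises, so throughput scales with the number of pooled connections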
- FCM (Firebase Cloud Messaging): HTTP v1 API with OAuth 2.0 service account; supports topic messaging for broadcast use cases
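- Token lifecycle: APNs returns status 410 (Gone, reason Unregistered) for tokens that are no longer valid; remove them from the DB immediately to avoid repeated failed sends. FCM HTTP v1 returns an UNREGISTERED error (HTTP 404) for stale tokens, which should be removed the same way. Canonical registration IDs were a feature of the legacy FCM API; with HTTP v1 the client app receives refreshed tokens and must upload them to the backend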
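In practice the token lifecycle rules come down to classifying each provider response. This sketch assumes the worker has already parsed the HTTP status and, on errors, the APNs `reason` string or the FCM `errorCode` from the error details. The `Outcome` enum and function names are hypothetical, and treating only 429, 500, and 503 as retryable is a simplification.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    RETRY = "retry"                # transient: back off and try again
    REMOVE_TOKEN = "remove_token"  # stale token: delete it, never retry
    FAIL = "fail"                  # other non-retryable error


RETRYABLE_STATUSES = {429, 500, 503}


def classify_apns(status, reason=None):
    """Map an APNs HTTP/2 status code and JSON `reason` to an Outcome."""
    if status == 200:
        return Outcome.SUCCESS
    if status == 410 or (status == 400 and reason == "BadDeviceToken"):
        return Outcome.REMOVE_TOKEN
    if status == 403 and reason == "ExpiredProviderToken":
        return Outcome.RETRY  # mint a fresh JWT before retrying
    if status in RETRYABLE_STATUSES:
        return Outcome.RETRY
    return Outcome.FAIL


def classify_fcm(status, error_code=None):
    """Map an FCM HTTP v1 status code and FcmError `errorCode` to an Outcome."""
    if status == 200:
        return Outcome.SUCCESS
    if status == 404 and error_code == "UNREGISTERED":
        return Outcome.REMOVE_TOKEN
    if status in RETRYABLE_STATUSES:
        return Outcome.RETRY
    return Outcome.FAIL
```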
Email and SMS Workers
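Email worker: sends via the SES or SendGrid transactional API. Rate limits apply per account, not per sending domain: SES enforces a maximum send rate per account in each AWS Region (e.g., 14 messages/second is a common default quota). Handle bounce notifications via webhook (SES publishes them to an SNS topic; SendGrid uses its Event Webhook). Hard bounces must be added to the suppression list immediately to protect sender reputation.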
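A bounce webhook for SES mostly parses JSON. This sketch assumes SES feedback notifications (which carry `notificationType`; configuration-set event publishing uses `eventType` instead) delivered over SNS HTTP(S) with raw message delivery disabled, so the SES event arrives as a JSON string in the envelope's `Message` field. SNS signature verification and subscription confirmation are omitted, and both are required in production. The function names are hypothetical.

```python
import json


def hard_bounced_addresses(sns_body: str) -> list[str]:
    """Return permanently bounced recipients from an SNS-delivered SES
    bounce notification, or [] for any other message."""
    envelope = json.loads(sns_body)
    if envelope.get("Type") != "Notification":
        return []  # e.g. SubscriptionConfirmation is handled elsewhere
    event = json.loads(envelope["Message"])
    if event.get("notificationType") != "Bounce":
        return []
    bounce = event["bounce"]
    if bounce.get("bounceType") != "Permanent":
        return []  # transient bounces may succeed on a later send
    return [r["emailAddress"] for r in bounce.get("bouncedRecipients", [])]


def handle_bounce_webhook(sns_body: str, suppression_list: set[str]) -> None:
    """Suppress hard-bounced addresses so they are never mailed again."""
    suppression_list.update(hard_bounced_addresses(sns_body))
```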
SMS worker: sends via Twilio or Amazon SNS. Phone numbers must be in E.164 format (+14155551234). Delivery receipts arrive asynchronously via webhook — update notification status on receipt. Maintain an opt-out registry: numbers that replied STOP must never be messaged again (regulatory requirement in most jurisdictions).
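The two SMS preconditions (E.164 format and the opt-out registry) are cheap local checks that can run before any provider call. This is a minimal sketch. The keyword set is an assumption (carriers and providers such as Twilio maintain their own lists), and the in-memory `set` stands in for a durable registry.

```python
import re

# E.164: "+", a country code starting 1-9, at most 15 digits in total.
E164 = re.compile(r"\+[1-9]\d{1,14}")

# Assumed opt-out keywords; real providers define their own lists.
OPT_OUT_KEYWORDS = {"STOP", "STOPALL", "UNSUBSCRIBE", "CANCEL", "END", "QUIT"}


def is_e164(number: str) -> bool:
    return E164.fullmatch(number) is not None


def handle_inbound_sms(from_number: str, body: str,
                       opt_out_registry: set[str]) -> None:
    """Record an opt-out when an inbound reply is an opt-out keyword."""
    if body.strip().upper() in OPT_OUT_KEYWORDS:
        opt_out_registry.add(from_number)


def can_send_sms(to_number: str, opt_out_registry: set[str]) -> bool:
    """Badly formatted or opted-out numbers are never sent to."""
    return is_e164(to_number) and to_number not in opt_out_registry
```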
In-App Notification Worker
In-app notifications are stored in a notifications table and delivered over WebSocket if the user is connected:
- Write the notification record with status PENDING
- Check the presence service: if the user has an active WebSocket connection, push immediately and mark DELIVERED once the client acknowledges receipt
- If offline, leave as PENDING; client fetches unread notifications on next app open
- In-app notifications do not require provider integration and have effectively zero delivery cost
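The in-app steps can be sketched with in-memory stand-ins for the notifications table and the presence service. This sketch assumes DELIVERED is set only on the client ACK listed under Delivery Status Tracking, so a socket push moves the record to DISPATCHED first. The class, the `online` set, and `send_over_socket` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class InAppRecord:
    notification_id: str
    user_id: str
    payload: dict
    status: str = "PENDING"


class InAppDelivery:
    """In-memory stand-ins for the notifications table and presence service."""

    def __init__(self, send_over_socket: Callable[[str, dict], None]):
        self.table: dict[str, InAppRecord] = {}
        self.online: set[str] = set()  # users with an open WebSocket
        self.send_over_socket = send_over_socket

    def deliver(self, notification_id, user_id, payload):
        record = InAppRecord(notification_id, user_id, payload)
        self.table[notification_id] = record       # write as PENDING
        if user_id in self.online:                  # presence check
            self.send_over_socket(user_id, payload)
            record.status = "DISPATCHED"            # await the client ACK
        return record                               # offline: stays PENDING

    def ack(self, notification_id):
        self.table[notification_id].status = "DELIVERED"

    def fetch_undelivered(self, user_id):
        """Called on app open: records the client has not yet acknowledged."""
        return [r for r in self.table.values()
                if r.user_id == user_id and r.status != "DELIVERED"]
```

Refetching anything not yet acknowledged also covers a socket that drops between the push and the ACK.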
Delivery Status Tracking
Each notification per channel follows a state machine:
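PENDING → DISPATCHED → DELIVERED
PENDING → FAILED (non-retryable error, or retries exhausted before the provider accepts the message)
DISPATCHED → FAILED (provider reports a permanent failure after accepting, e.g. an email hard bounce)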
- PENDING: queued, not yet sent to provider
- DISPATCHED: submitted to provider API successfully; awaiting delivery confirmation
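- DELIVERED: delivery confirmed (SMS delivery receipt, email provider delivery event, in-app or push client ACK). A successful APNs or FCM response only means the provider accepted the message, and email opens are unreliable because images are often blocked or prefetched, so neither should be treated as delivery confirmation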
- FAILED: non-retryable error or max retries exhausted
Status transitions are written to a notification_events log table for auditability and analytics.
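The state machine and its audit log can be enforced in one place. This minimal sketch assumes FAILED is reachable from both PENDING and DISPATCHED. Plain dicts and lists stand in for the status column and the notification_events table, and the function and exception names are hypothetical.

```python
from datetime import datetime, timezone

# Assumed transition table: FAILED is reachable before dispatch (bad input,
# retries exhausted) and after it (e.g. an email hard bounce).
TRANSITIONS = {
    "PENDING": {"DISPATCHED", "FAILED"},
    "DISPATCHED": {"DELIVERED", "FAILED"},
    "DELIVERED": set(),
    "FAILED": set(),
}


class InvalidTransition(Exception):
    pass


def transition(statuses, events, notification_id, channel, new_status,
               detail=None):
    """Move one (notification, channel) pair to new_status and append an
    audit event. Terminal states accept no further transitions."""
    key = (notification_id, channel)
    current = statuses.get(key, "PENDING")
    if new_status not in TRANSITIONS[current]:
        raise InvalidTransition(f"{current} -> {new_status} for {key}")
    statuses[key] = new_status
    events.append({
        "notification_id": notification_id,
        "channel": channel,
        "from": current,
        "to": new_status,
        "detail": detail,
        "at": datetime.now(timezone.utc).isoformat(),
    })
```

Rejecting illegal transitions also surfaces out-of-order provider webhooks, which can then be logged instead of silently overwriting a terminal status.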
Retry Policy with Exponential Backoff
Transient failures (provider rate limit, network timeout) are retried with exponential backoff:
- Retry delays: 1s, 2s, 4s, 8s, 16s; up to 5 retries (6 attempts in total) before marking FAILED
- Add jitter (±20%) to prevent thundering herd when many notifications retry simultaneously after a provider outage
Non-retryable errors terminate immediately without retry:
- Invalid device token (push) — token is stale, remove from DB
- User opted out (email/SMS) — add to suppression list
- Invalid phone number format
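The retry policy and its non-retryable exits fit in one loop. This sketch treats the five delays as waits before five retries (six attempts in total) and applies ±20% jitter by scaling each delay with a uniform factor in [0.8, 1.2]. The `TransientError`/`PermanentError` exceptions are hypothetical, and the worker's `send` callable is assumed to raise one of them on failure.

```python
import random
import time


class TransientError(Exception):
    """Provider rate limit, timeout, 5xx: worth retrying."""


class PermanentError(Exception):
    """Invalid token, opted-out user, malformed number: never retry."""


BASE_DELAYS = [1, 2, 4, 8, 16]  # seconds to wait before retries 1..5
JITTER = 0.20                    # +/-20%


def backoff_delays(rng=random):
    """Yield each base delay scaled by a random factor in [0.8, 1.2]."""
    for base in BASE_DELAYS:
        yield base * rng.uniform(1 - JITTER, 1 + JITTER)


def send_with_retry(send, sleep=time.sleep, rng=random):
    """Call send() until it succeeds, raises PermanentError, or retries run
    out. Returns "DISPATCHED" or "FAILED"."""
    delays = backoff_delays(rng)
    while True:
        try:
            send()
            return "DISPATCHED"
        except PermanentError:
            return "FAILED"  # terminate immediately, no retry
        except TransientError:
            delay = next(delays, None)
            if delay is None:
                return "FAILED"  # retries exhausted
            sleep(delay)
```

Blocking in `sleep` keeps the sketch short. A queue-based worker would more likely re-publish the task with a delay so the worker stays free for other messages.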
Deduplication and Analytics
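Deduplication: before fanning out to channel tasks, run SET notification:{idempotency_key} 1 NX EX 86400 in Redis. The command sets the key only if it does not already exist, so it succeeds once per key within the 24-hour TTL; if it fails, the notification was already accepted and the request is skipped silently. This catches duplicates caused by upstream retries. Because at-least-once queues can also redeliver an individual channel task, workers apply the same check with a per-channel key (for example notification:{idempotency_key}:{channel}); one shared key checked before every channel task would wrongly suppress every channel after the first.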
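A worker-side claim can be sketched with redis-py's `set(name, value, nx=True, ex=...)`, which returns `True` when the key was set and `None` when it already existed. To keep the example self-contained, a tiny in-memory `FakeRedis` (hypothetical, TTL not enforced) mimics just those two calls; a real `redis.Redis` client can be passed instead. Appending the channel to the key and releasing it when a send raises are design assumptions of this sketch.

```python
class FakeRedis:
    """In-memory stand-in for redis-py's set(..., nx=, ex=) and delete().
    The TTL is accepted but not enforced."""

    def __init__(self):
        self.store = {}

    def set(self, name, value, ex=None, nx=False):
        if nx and name in self.store:
            return None
        self.store[name] = value
        return True

    def delete(self, *names):
        return sum(self.store.pop(n, None) is not None for n in names)


DEDUP_TTL_SECONDS = 86400


def dedup_key(idempotency_key, channel):
    return f"notification:{idempotency_key}:{channel}"


def dispatch_once(r, idempotency_key, channel, send):
    """Atomically claim one channel task, then send. Duplicates are skipped."""
    key = dedup_key(idempotency_key, channel)
    if not r.set(key, 1, nx=True, ex=DEDUP_TTL_SECONDS):
        return "SKIPPED"  # already claimed by an earlier attempt
    try:
        send()
    except Exception:
        # Release the claim so a legitimate retry is not treated as a duplicate.
        r.delete(key)
        raise
    return "SENT"
```

Releasing the key on failure favors delivery over strict once-only semantics: if a provider accepted a request but the response timed out, the retry can still produce a duplicate.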
Analytics dashboard: aggregate delivery events to compute per-channel delivery rate, per-notification-type open rate, failure breakdown by error code, and latency percentiles from created_at to DELIVERED. Sudden delivery rate drops indicate provider outages or certificate expiry.
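Delivery rate and latency percentiles reduce to a per-channel aggregation. This sketch assumes each (notification, channel) pair's final status and its created_at to DELIVERED latency have already been derived from notification_events, and it uses the nearest-rank percentile method. The function names and record shape are hypothetical.

```python
import math
from collections import defaultdict


def percentile(sorted_values, p):
    """Nearest-rank percentile (p in 0..100) of an already sorted list."""
    if not sorted_values:
        return None
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def channel_stats(records):
    """records: iterable of (channel, final_status, latency_seconds or None).
    Returns per-channel delivery rate and p50/p95/p99 delivered latency."""
    acc = defaultdict(lambda: {"total": 0, "delivered": 0, "latencies": []})
    for channel, status, latency in records:
        s = acc[channel]
        s["total"] += 1
        if status == "DELIVERED":
            s["delivered"] += 1
            s["latencies"].append(latency)
    report = {}
    for channel, s in acc.items():
        lat = sorted(s["latencies"])
        report[channel] = {
            "delivery_rate": s["delivered"] / s["total"],
            "p50": percentile(lat, 50),
            "p95": percentile(lat, 95),
            "p99": percentile(lat, 99),
        }
    return report
```

Computed over a sliding time window, a sharp drop in `delivery_rate` for one channel is the outage or expired-credential signal described above.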