Overview
A push notification service delivers real-time messages to mobile and web clients via platform gateways: Apple Push Notification service (APNs) for iOS and Firebase Cloud Messaging (FCM) for Android and web. The design must handle millions of registered devices, fan out a single notification to large user segments, track delivery, and retry failed sends without spamming users.
Data Model
CREATE TABLE device_tokens (
  id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  user_id BIGINT UNSIGNED NOT NULL,
  platform ENUM('apns','fcm','webpush') NOT NULL,
  token VARCHAR(512) NOT NULL,
  app_version VARCHAR(32),
  locale VARCHAR(16),
  is_active TINYINT(1) NOT NULL DEFAULT 1,
  created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  UNIQUE KEY uq_token (platform, token),
  INDEX idx_user (user_id, platform, is_active)
) ENGINE=InnoDB;
CREATE TABLE notifications (
  id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  campaign_id BIGINT UNSIGNED,
  title VARCHAR(256) NOT NULL,
  body TEXT NOT NULL,
  payload JSON,
  target_type ENUM('user','segment','broadcast') NOT NULL,
  target_ref VARCHAR(256),
  scheduled_at DATETIME,
  created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
  INDEX idx_campaign (campaign_id),
  INDEX idx_scheduled (scheduled_at)
) ENGINE=InnoDB;
CREATE TABLE notification_sends (
  id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  notification_id BIGINT UNSIGNED NOT NULL,
  device_token_id BIGINT UNSIGNED NOT NULL,
  status ENUM('pending','sent','delivered','failed','invalid_token') NOT NULL DEFAULT 'pending',
  provider_msg_id VARCHAR(256),
  attempt_count TINYINT UNSIGNED NOT NULL DEFAULT 0,
  last_error VARCHAR(512),
  sent_at DATETIME,
  delivered_at DATETIME,
  next_retry_at DATETIME,
  created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
  UNIQUE KEY uq_send (notification_id, device_token_id),
  INDEX idx_notification (notification_id, status),
  INDEX idx_retry (status, next_retry_at),
  INDEX idx_device (device_token_id)
) ENGINE=InnoDB;
CREATE TABLE dead_letter (
  id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  notification_id BIGINT UNSIGNED NOT NULL,
  device_token_id BIGINT UNSIGNED NOT NULL,
  last_error TEXT,
  failed_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB;
The device_tokens table is the registry. A user may have multiple tokens (multiple devices, reinstalls). The unique key on (platform, token) prevents duplicate registrations. notification_sends is the per-device delivery ledger and drives the retry loop. dead_letter captures permanently failed sends for alerting and audit.
Core Workflow
1. Token Registration
When a mobile app starts, it requests a push token from the OS (APNs or FCM). The app sends the token to your backend registration endpoint. The backend upserts into device_tokens: if the token already exists for that platform, update updated_at and set is_active = 1. If the user signs out, mark the token is_active = 0 rather than deleting — you need the row to handle provider feedback about stale tokens.
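The registration upsert can be sketched as follows. This is a minimal sketch using SQLite's ON CONFLICT clause as a stand-in for MySQL's INSERT ... ON DUPLICATE KEY UPDATE, against a pared-down device_tokens table; in production the statement would run against the schema above.

```python
import sqlite3

# Pared-down registry: only the columns the upsert touches.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE device_tokens (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        user_id INTEGER NOT NULL,
        platform TEXT NOT NULL,
        token TEXT NOT NULL,
        is_active INTEGER NOT NULL DEFAULT 1,
        updated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        UNIQUE (platform, token)
    )
""")

def register_token(user_id: int, platform: str, token: str) -> None:
    """Upsert: a re-registration reactivates the token and refreshes updated_at
    instead of creating a duplicate row."""
    db.execute(
        """
        INSERT INTO device_tokens (user_id, platform, token)
        VALUES (?, ?, ?)
        ON CONFLICT (platform, token) DO UPDATE SET
            user_id = excluded.user_id,
            is_active = 1,
            updated_at = CURRENT_TIMESTAMP
        """,
        (user_id, platform, token),
    )

register_token(42, "apns", "abc123")
register_token(42, "apns", "abc123")  # reinstall: same token, no duplicate row
rows = db.execute("SELECT COUNT(*) FROM device_tokens").fetchone()[0]
```

Sign-out would be a plain `UPDATE ... SET is_active = 0` on the same row, per the lifecycle described above.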
2. Notification Ingestion
A producer (marketing tool, application event, scheduled job) writes a row to notifications and publishes a message to a fanout queue (e.g., Kafka topic notification.fanout) containing the notification ID and targeting criteria. This decouples ingestion latency from delivery throughput.
3. Fanout
A fanout worker consumes from notification.fanout. For target_type = 'user', it queries device_tokens for that user. For 'segment', it queries a segmentation service or pre-built audience table. For 'broadcast', it streams all active tokens in batches of 10,000. For each token, it inserts a row into notification_sends (status = 'pending') and publishes a send task to a send queue (Kafka topic notification.send, partitioned by platform so APNs and FCM workers are independent). Fanout for a broadcast to 50M devices is inherently slow — batch inserts and async queue publishing keep it tractable; target 500K rows/second on a well-tuned MySQL with batch INSERT.
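The fanout loop above reduces to paging an arbitrarily large token stream in fixed-size batches. In this sketch, `insert_pending` and `publish_send_tasks` are hypothetical stand-ins for the batch INSERT and the Kafka publish; they are shown as comments because their real signatures depend on your DB and queue clients.

```python
from itertools import islice
from typing import Iterable, Iterator, List

BATCH_SIZE = 10_000  # matches the broadcast batch size above

def batched(tokens: Iterable[int], size: int = BATCH_SIZE) -> Iterator[List[int]]:
    """Yield fixed-size batches so the worker never materializes
    all tokens of a 50M-device broadcast at once."""
    it = iter(tokens)
    while batch := list(islice(it, size)):
        yield batch

def fanout(notification_id: int, tokens: Iterable[int]) -> int:
    """One batch INSERT plus one queue publish per batch of device tokens."""
    published = 0
    for batch in batched(tokens):
        # insert_pending(notification_id, batch)      # rows with status='pending'
        # publish_send_tasks(notification_id, batch)  # -> notification.send topic
        published += len(batch)
    return published
```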
4. Send Workers
Platform-specific workers consume from notification.send:
- APNs worker: uses persistent HTTP/2 connections to api.push.apple.com. APNs does not support batching; each notification is an individual HTTP/2 request, but multiplexing many streams over a single connection allows high throughput (~1,000 req/s per connection). Authenticates with a provider token (JWT) or a certificate.
- FCM worker: uses the FCM HTTP v1 API, which sends to a single registration token per request. The legacy batch endpoint accepted up to 500 tokens per call but has been deprecated; prefer the v1 API for richer error codes.
On success: update notification_sends status to 'sent', store the provider message ID, record sent_at.
On failure: inspect the error code. If the provider reports an invalid or unregistered token (APNs: BadDeviceToken, Unregistered; FCM: UNREGISTERED), set status = 'invalid_token' and mark the device token is_active = 0. Treat FCM's INVALID_ARGUMENT with care: it can indicate a malformed token, but also a malformed payload, so inspect the error details before deactivating the token. For transient errors (rate limit, server error): increment attempt_count, compute next_retry_at with exponential backoff, and leave status = 'pending'.
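The success/failure handling above amounts to a small classifier. The error-code sets below follow the APNs and FCM codes named in this section; treating unknown codes as permanent failures is a policy assumption of this sketch, not documented provider behavior.

```python
from enum import Enum
from typing import Optional

class Action(Enum):
    MARK_SENT = "sent"
    INVALIDATE_TOKEN = "invalid_token"  # deactivate the device token, no retry
    RETRY = "retry"                     # backoff and re-queue
    FAIL = "failed"                     # permanent, non-token error

# Token-invalid codes from APNs and FCM as cited in the text above.
INVALID_TOKEN_ERRORS = {"BadDeviceToken", "Unregistered", "UNREGISTERED"}
# Transient codes worth retrying; this set is an illustrative assumption.
TRANSIENT_ERRORS = {"TooManyRequests", "InternalServerError", "ServiceUnavailable",
                    "UNAVAILABLE", "INTERNAL", "QUOTA_EXCEEDED"}

def classify(error_code: Optional[str]) -> Action:
    """Map a provider error code to the ledger transition described above.
    INVALID_ARGUMENT is deliberately NOT in the token set: it needs
    payload-vs-token disambiguation, so it falls through to FAIL here."""
    if error_code is None:
        return Action.MARK_SENT
    if error_code in INVALID_TOKEN_ERRORS:
        return Action.INVALIDATE_TOKEN
    if error_code in TRANSIENT_ERRORS:
        return Action.RETRY
    return Action.FAIL
```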
5. Retry Loop
A retry scheduler runs every 30 seconds and queries:
SELECT id, notification_id, device_token_id
FROM notification_sends
WHERE status = 'pending'
AND next_retry_at <= NOW()
AND attempt_count < 5
ORDER BY next_retry_at
LIMIT 5000;
It re-publishes each row to the send queue. After 5 attempts, move the row to dead_letter and set status = 'failed'. Exponential backoff formula: delay = min(base * 2^attempt + jitter, max_delay) where base = 30s, max_delay = 3600s, jitter = random(0, 30s). Jitter prevents thundering herd when a provider outage resolves.
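The backoff formula translates directly to code; the constants follow the values given above.

```python
import random

BASE = 30.0        # seconds
MAX_DELAY = 3600.0
MAX_JITTER = 30.0

def retry_delay(attempt: int) -> float:
    """delay = min(base * 2^attempt + jitter, max_delay), as defined above.
    Random jitter spreads retries so a resolved provider outage does not
    produce a synchronized thundering herd."""
    jitter = random.uniform(0.0, MAX_JITTER)
    return min(BASE * 2 ** attempt + jitter, MAX_DELAY)
```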
6. Delivery Receipts
APNs does not provide delivery receipts; it only confirms that it accepted the request. For FCM, delivery data is available through its BigQuery data export, or the app itself can acknowledge receipt by calling your receipt endpoint when a message arrives. When a receipt arrives, update notification_sends.status = 'delivered' and set delivered_at. Delivery rate = delivered / sent is a key business metric.
Key Design Decisions and Trade-offs
- Kafka vs. SQS/RabbitMQ for fanout: Kafka’s log-based model allows replaying fanout for debugging and supports multiple consumer groups without message deletion. The trade-off is operational complexity. SQS is simpler but messages are deleted on consume — no replay.
- Per-device rows in notification_sends vs. aggregated tracking: Per-device rows give precise retry and audit capability but create write amplification for large broadcasts. For 50M-device broadcasts, sharding notification_sends across multiple MySQL instances or using a columnar store (Redshift, BigQuery) for analytics while keeping only active-retry rows in MySQL is a practical compromise.
- Token refresh vs. invalidation: Marking tokens inactive rather than deleting them avoids race conditions where a token is deleted between the fanout read and the send attempt. A nightly cleanup job deletes tokens that have been inactive for 90 days.
- Idempotency: Each (notification_id, device_token_id) pair has a unique row in notification_sends. The send worker checks status before sending — if status is already 'sent' or 'delivered', it skips. This prevents double-sends from queue redelivery.
- Priority lanes: Use separate Kafka topics or queue priorities for transactional notifications (OTP, order confirmation) vs. marketing. Transactional sends should not be blocked by a large broadcast fanout.
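The idempotency check can be implemented as a compare-and-set claim on the ledger row, sketched here against SQLite. The intermediate 'sending' status is an assumption of this sketch and is not part of the status enum in the schema above.

```python
import sqlite3

# The worker atomically flips 'pending' -> 'sending' before calling the
# provider, so a redelivered queue message finds no pending row and skips.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE notification_sends (
        id INTEGER PRIMARY KEY,
        notification_id INTEGER NOT NULL,
        device_token_id INTEGER NOT NULL,
        status TEXT NOT NULL DEFAULT 'pending',
        UNIQUE (notification_id, device_token_id)
    )
""")
db.execute(
    "INSERT INTO notification_sends (id, notification_id, device_token_id) "
    "VALUES (1, 7, 99)"
)

def claim_for_send(send_id: int) -> bool:
    """Return True only for the first worker to claim the row."""
    cur = db.execute(
        "UPDATE notification_sends SET status = 'sending' "
        "WHERE id = ? AND status = 'pending'",
        (send_id,),
    )
    return cur.rowcount == 1

first = claim_for_send(1)   # row was pending: claimed
second = claim_for_send(1)  # duplicate delivery: skipped
```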
Failure Handling and Edge Cases
- APNs connection drop: Maintain a pool of HTTP/2 connections with health checks. On a connection error, remove from pool and reconnect with exponential backoff. Unsent tasks return to the queue.
- FCM quota exceeded: FCM enforces per-project QPS limits. Implement token bucket rate limiting in the FCM worker. On a 429 response, respect the Retry-After header.
- Fanout worker crash mid-broadcast: Kafka consumer commits offsets only after the entire batch of notification_sends rows is inserted and tasks published. If the worker crashes, it reprocesses from the last committed offset. The UNIQUE KEY on (notification_id, device_token_id) prevents duplicate sends rows — inserts become no-ops via INSERT IGNORE.
- Clock skew on next_retry_at: Use UTC everywhere. The retry scheduler should use UTC_TIMESTAMP() in MySQL, not application-side NOW().
- Silent notifications and background fetch: APNs silent notifications have a lower priority and may be coalesced or dropped by iOS. Do not use them for time-sensitive delivery; use alert notifications with apns-priority: 10.
- Opt-out and GDPR: Before fanout, check a suppression list (a Redis SET keyed by user_id). Users who have opted out or requested deletion must not receive notifications. The suppression check must happen in the fanout worker, not in the send worker, to avoid unnecessary queue messages.
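The token-bucket limiter mentioned for the FCM worker can be sketched as follows; the rate and capacity values would be tuned to your project's FCM quota, and the optional `now` parameter exists only so the clock can be injected in tests.

```python
import time
from typing import Optional

class TokenBucket:
    """Token-bucket rate limiter: refills at `rate` tokens/s up to
    `capacity`; each provider request consumes one token."""

    def __init__(self, rate: float, capacity: float,
                 now: Optional[float] = None) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic() if now is None else now

    def try_acquire(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # caller backs off; on an FCM 429, also honor Retry-After
```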
Scalability Considerations
- Fanout throughput: For 50M devices, inserting notification_sends rows at 500K rows/s takes 100 seconds. Parallelize fanout workers across partitions. Use LOAD DATA INFILE or batch INSERT (1000 rows per statement) for maximum MySQL throughput.
- Send throughput: APNs supports ~1000 req/s per HTTP/2 connection. Run 50 connections per APNs worker pod, 20 pods = 1M sends/s capacity. Scale pods based on queue lag metric.
- notification_sends table size: A broadcast to 50M users produces 50M rows. Partition the table by created_at (monthly range partitions) and archive old partitions to cold storage after 30 days.
- Token registry size: At 3 tokens per user and 100M users, the device_tokens table has 300M rows. Shard by user_id if a single MySQL instance cannot sustain the fanout read QPS. Alternatively, pre-compute audience token lists into a distributed cache (Redis Cluster) at campaign creation time.
- Multi-region: Deploy send workers in the same AWS region as the APNs/FCM endpoints (us-east-1 for APNs, us-central1 for FCM) to minimize TLS handshake and network latency. Fanout workers and the notification_sends database can be in your primary region with cross-region replication for disaster recovery.
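A minimal sketch of the 1,000-rows-per-statement batch INSERT builder follows; it only builds the statement text, and executing it against MySQL is left out. INSERT IGNORE plus the unique (notification_id, device_token_id) key is what makes reprocessing after a fanout-worker crash a no-op.

```python
from typing import List, Sequence, Tuple

def build_batch_insert(rows: Sequence[Tuple[int, int]],
                       chunk: int = 1000) -> List[str]:
    """Group (notification_id, device_token_id) pairs into multi-row
    INSERT IGNORE statements, `chunk` rows per statement."""
    stmts = []
    for i in range(0, len(rows), chunk):
        values = ",".join(f"({n},{d},'pending')" for n, d in rows[i:i + chunk])
        stmts.append(
            "INSERT IGNORE INTO notification_sends "
            f"(notification_id, device_token_id, status) VALUES {values}"
        )
    return stmts
```

Values are inlined here only because both columns are integers generated by the fanout worker itself; anything user-supplied would go through parameterized execution instead.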
Summary
A production push notification service centers on three concerns: a reliable device token registry with active/inactive lifecycle management, a scalable fanout pipeline that decouples broadcast scope from send latency, and a per-device delivery ledger that drives idempotent retry with exponential backoff. Platform gateways (APNs and FCM) each have distinct error codes, QPS limits, and connection models that must be handled independently. The key architectural choice is Kafka-backed fanout for replayability, combined with MySQL partitioning and priority send lanes to keep transactional notifications fast even during large marketing broadcasts. Delivery tracking closes the loop by feeding receipt data back into the ledger for both retry logic and business analytics.