Push Notification Gateway Low-Level Design: Token Management, Provider Abstraction, and Fanout at Scale

Device Token Storage

A device token is a provider-issued opaque string that identifies a specific app install on a specific device. The token storage schema:

  • user_id: the authenticated user owning this device
  • device_id: stable client-generated UUID persisted across app restarts (not the hardware UDID)
  • platform: ios | android | huawei | web
  • token: provider-issued push token; up to 256 bytes
  • app_version: used to segment delivery by version for staged rollouts
  • last_seen: timestamp of last app open — used to skip tokens for inactive users
  • is_active: boolean; set to false when the provider reports the token as invalid

Index on (user_id, is_active) for efficient per-user token lookups. Partition by user_id to keep a user's tokens co-located.

Token Lifecycle Management

Tokens are not permanent — they can be invalidated at any time:

  • New install: app registers with the provider on first launch and receives a token. The app sends this token to the backend, which upserts it by (user_id, device_id) — handles reinstalls cleanly.
  • Token rotation: FCM may rotate tokens periodically. The app detects the new token via the onNewToken callback (formerly onTokenRefresh) and registers it with the backend. Always use upsert — never insert — to avoid duplicate records.
  • Token expiry: APNs returns HTTP 410 Gone when a token is permanently invalid (app uninstalled). Mark is_active = false immediately. FCM returns UNREGISTERED error code for the same case.
  • Inactive cleanup: tokens whose last_seen is more than 90 days old are candidates for deactivation — these users are unlikely to see pushes even if their tokens remain technically valid.
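The lifecycle rules above can be sketched as follows. This is a minimal illustration, not production code: the in-memory Map stands in for the device_tokens table, and the record shape mirrors the schema fields listed earlier.

```typescript
type TokenRecord = {
  userId: string;
  deviceId: string;
  platform: "ios" | "android" | "huawei" | "web";
  token: string;
  lastSeen: Date;
  isActive: boolean;
};

const NINETY_DAYS_MS = 90 * 24 * 60 * 60 * 1000;

// New install / rotation: always upsert by (userId, deviceId) so a
// reinstall or an FCM token rotation overwrites the stale row instead
// of creating a duplicate.
function upsertToken(store: Map<string, TokenRecord>, rec: TokenRecord): void {
  store.set(`${rec.userId}:${rec.deviceId}`, { ...rec, isActive: true });
}

// Inactive cleanup: a token is a deactivation candidate when the app
// has not been opened for 90+ days, even if the provider still accepts it.
function isCleanupCandidate(rec: TokenRecord, now: Date): boolean {
  return now.getTime() - rec.lastSeen.getTime() > NINETY_DAYS_MS;
}
```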

Provider Abstraction Layer

The gateway exposes a single internal send interface and routes to the correct provider based on the token's platform field:

interface PushProvider {
    send(token: string, payload: Payload): SendResult
    sendBatch(tokens: string[], payload: Payload): BatchResult[]
}

Implementations: APNSProvider, FCMProvider, HMSProvider (Huawei), WebPushProvider. Each implementation handles provider-specific authentication, request format, and error code mapping. The calling code is provider-agnostic — it passes a token and payload and receives a normalized result.
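A minimal sketch of that dispatch layer is shown below. The Payload and SendResult shapes are illustrative (the real interface also has sendBatch and would likely be async), and the class and field names are assumptions, not a real SDK surface.

```typescript
type Platform = "ios" | "android" | "huawei" | "web";
type Payload = { title: string; body: string; data?: Record<string, string> };
type SendResult = { ok: boolean; error?: string };

interface PushProvider {
  send(token: string, payload: Payload): SendResult;
}

class Gateway {
  // One adapter per platform: APNSProvider for ios, FCMProvider for
  // android, HMSProvider for huawei, WebPushProvider for web.
  constructor(private providers: Map<Platform, PushProvider>) {}

  send(platform: Platform, token: string, payload: Payload): SendResult {
    const provider = this.providers.get(platform);
    if (!provider) return { ok: false, error: `no provider for ${platform}` };
    // Caller never references APNs or FCM directly.
    return provider.send(token, payload);
  }
}
```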

Fanout at Scale

Sending a push notification to 10 million devices requires a distributed fanout job:

  1. Query the device_tokens table for all active tokens matching the target segment (e.g., all users, or users matching a cohort filter)
  2. Chunk the token list into batches of 1,000
  3. Enqueue each chunk as a task in a distributed queue (SQS, Kafka)
  4. A pool of push workers consumes chunks and sends via provider batch APIs
  5. Each worker aggregates success/failure counts and writes them to a results store
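Steps 1-3 can be sketched as below. The `enqueue` callback stands in for an SQS or Kafka producer; in a real job the token list would be streamed in pages rather than held in memory.

```typescript
// Split a list into fixed-size chunks (step 2).
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Enqueue each chunk as one worker task (step 3); returns the task count.
function fanout(tokens: string[], enqueue: (batch: string[]) => void): number {
  const batches = chunk(tokens, 1000); // batch size of 1,000 from step 2
  for (const b of batches) enqueue(b);
  return batches.length;
}
```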

Provider batch API limits:

  • FCM: up to 500 tokens per request via the batch/multicast APIs, with responses mapping 1:1 to input tokens (note that Google has deprecated the HTTP /batch endpoint; with the v1 API, equivalent throughput comes from many concurrent single-message requests)
  • APNs: no native batch API — use HTTP/2 multiplexing to run many requests concurrently over one connection; APNs advertises its concurrent-stream limit via HTTP/2 SETTINGS (historically on the order of 1,000 streams per connection)

A fanout to 10M devices requires 10,000,000 / 500 = 20,000 batch requests. With 100 workers each sending 10 requests/second — 1,000 requests/second aggregate — the send phase completes in roughly 20,000 / 1,000 = 20 seconds in theory; retries, provider rate limits, and token-query pagination typically stretch a real job to a few minutes.

Priority and Collapse Keys

APNs priority:

  • apns-priority: 10 — deliver immediately; use for user-visible, time-sensitive notifications
  • apns-priority: 5 — conserve power; APNs may delay or batch delivery based on device conditions (background pushes must use priority 5)

Collapse keys (APNs: apns-collapse-id, FCM: collapse_key): if multiple notifications with the same collapse key are queued for an offline device, only the most recent is delivered when the device comes online. This is ideal for chat unread count badges — only the latest count matters, not every individual increment.
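Collapse semantics can be illustrated as the provider-side queue for a single offline device: a message with the same collapse key replaces the queued one, while keyless messages all survive. This is a behavioral sketch of the providers' documented semantics, not their implementation.

```typescript
type Queued = { collapseKey?: string; body: string };

// Returns the device's pending queue after enqueuing `msg`.
function enqueueForOfflineDevice(queue: Queued[], msg: Queued): Queued[] {
  if (msg.collapseKey !== undefined) {
    // Drop any older message carrying the same collapse key...
    queue = queue.filter(q => q.collapseKey !== msg.collapseKey);
  }
  queue.push(msg); // ...so only the most recent one is delivered.
  return queue;
}
```

For the unread-badge case, each new count enqueued under collapse key "badge" replaces the previous one, so the device receives only the latest count on reconnect.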

Payload Size Limits and Silent Push

Payload size limits:

  • APNs: 4KB total payload
  • FCM: 4,096-byte limit on the message payload (applies to both notification and data messages)

If content exceeds limits, truncate the body and include a message_id in the data payload so the app can fetch the full content from the API after receiving the push.
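A sketch of that truncation strategy, assuming a 4,096-byte budget. Byte accounting here is simplified (JavaScript string length rather than true serialized UTF-8 bytes), and the payload shape is illustrative.

```typescript
const MAX_PAYLOAD_BYTES = 4096;

// Build a payload that fits the limit; carry message_id in the data
// section so the client can fetch the full content after the push.
function buildPayload(messageId: string, title: string, body: string) {
  // Size of everything except the body text itself.
  const overhead = JSON.stringify({ title, body: "", data: { message_id: messageId } }).length;
  const budget = MAX_PAYLOAD_BYTES - overhead;
  const truncated = body.length > budget ? body.slice(0, budget - 1) + "…" : body;
  return { title, body: truncated, data: { message_id: messageId } };
}
```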

Silent push (background fetch): apns-push-type: background with content-available: 1 wakes the app in the background to fetch new data without showing a visible notification. iOS limits this to approximately 3 background wakes per hour per app — do not use for time-sensitive delivery.
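The shape of such a background push, as the gateway would assemble it (header and aps field names per APNs; the bundle id and the sync_hint custom key are hypothetical):

```typescript
const headers = {
  "apns-push-type": "background",
  "apns-priority": "5",            // background pushes must use priority 5
  "apns-topic": "com.example.app", // hypothetical bundle id
};

const body = {
  aps: { "content-available": 1 }, // no alert, sound, or badge keys
  sync_hint: "messages",           // custom key telling the app what to fetch
};
```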

Bulk Push Job Architecture

A dedicated bulk push service handles broadcast campaigns:

  • Job created with targeting criteria (all users, users in segment X, users with app version Y)
  • Job processor queries tokens in pages of 10,000, chunked into worker tasks
  • Workers run in parallel across a pool, each maintaining persistent HTTP/2 connections to provider endpoints
  • Job progress tracked: total_tokens, sent, delivered, failed — polled by the campaign dashboard
  • Rate limiting: respect per-provider rate limits; implement token bucket on each worker to avoid 429 responses
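The per-worker token bucket from the last bullet can be sketched as below; capacity and refill rate are illustrative and should be tuned to each provider's published limits.

```typescript
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(private capacity: number, private refillPerSec: number, now = Date.now()) {
    this.tokens = capacity; // start full
    this.lastRefill = now;
  }

  // Returns true if a request may be sent now; false means back off
  // (e.g., requeue the task or sleep) instead of risking a 429.
  tryAcquire(now = Date.now()): boolean {
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```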

Delivery Rate Monitoring and Certificate Rotation

Delivery monitoring: track success rate per provider, per platform, per app version. Emit a PagerDuty alert when:

  • Success rate drops below 95% (indicates provider issue or widespread token expiry)
  • Error rate for a specific error code spikes (e.g., sudden increase in UNREGISTERED suggests a bad token import)
  • APNs connection errors spike (certificate near expiry)

Certificate and key rotation:

  • APNs authentication keys (.p8) do not expire, but rotate annually as a security practice. Update the key in the secrets manager and redeploy without downtime.
  • APNs TLS certificates (legacy method) expire after 1 year — set a calendar reminder 30 days before expiry; expired certificates cause all APNs sends to fail immediately.
  • FCM service account keys: rotate via Google Cloud IAM; update in secrets manager without service restart if the provider implementation reads credentials dynamically.


