Device Token Storage
A device token is a provider-issued opaque string that identifies a specific app install on a specific device. The token storage schema:
- user_id: the authenticated user owning this device
- device_id: stable client-generated UUID persisted across app restarts (not the hardware UDID)
- platform: ios | android | huawei | web
- token: provider-issued push token; up to 256 bytes
- app_version: used to segment delivery by version for staged rollouts
- last_seen: timestamp of last app open — used to skip tokens for inactive users
- is_active: boolean; set to false when the provider reports the token as invalid
Index on (user_id, is_active) for efficient per-user token lookups. Partition by user_id to keep a user's tokens co-located.
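As a rough illustration, the schema above could be modeled in TypeScript as shown here. The DeviceToken type mirrors the listed fields; activeTokensForUser is a hypothetical helper that stands in for the (user_id, is_active) index lookup and filters an in-memory array rather than querying a real database.

```typescript
type Platform = "ios" | "android" | "huawei" | "web";

interface DeviceToken {
  user_id: string;
  device_id: string; // client-generated UUID, not the hardware UDID
  platform: Platform;
  token: string; // provider-issued push token
  app_version: string;
  last_seen: Date;
  is_active: boolean;
}

// Hypothetical stand-in for a query served by the (user_id, is_active) index.
function activeTokensForUser(rows: DeviceToken[], userId: string): DeviceToken[] {
  return rows.filter((r) => r.user_id === userId && r.is_active);
}
```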
Token Lifecycle Management
Tokens are not permanent — they can be invalidated at any time:
- New install: app registers with the provider on first launch and receives a token. The app sends this token to the backend, which upserts it by (user_id, device_id) — handles reinstalls cleanly.
- Token rotation: FCM may rotate tokens periodically. The app detects a new token via the onNewToken callback (onTokenRefresh in older SDK versions) and registers it. Always use upsert, never insert, to avoid duplicate records.
- Token expiry: APNs returns HTTP 410 Gone when a token is permanently invalid (for example, the app was uninstalled). Mark is_active = false immediately. FCM returns the UNREGISTERED error code for the same case.
- Inactive cleanup: tokens whose last_seen is more than 90 days old are candidates for deactivation. These users are unlikely to receive pushes even if the tokens remain technically valid.
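The lifecycle rules can be sketched with a small in-memory store. TokenStore and its method names are hypothetical and only illustrate the upsert-by-(user_id, device_id), deactivate-on-invalid, and stale-token rules; a real backend would run the equivalent statements against its database.

```typescript
interface TokenRow {
  user_id: string;
  device_id: string;
  token: string;
  last_seen: Date;
  is_active: boolean;
}

// In-memory stand-in for the device_tokens table, keyed by (user_id, device_id).
class TokenStore {
  private rows = new Map<string, TokenRow>();

  private key(userId: string, deviceId: string): string {
    return JSON.stringify([userId, deviceId]);
  }

  // Upsert: a reinstall or token rotation overwrites the existing row.
  upsert(userId: string, deviceId: string, token: string, now: Date): void {
    this.rows.set(this.key(userId, deviceId), {
      user_id: userId,
      device_id: deviceId,
      token,
      last_seen: now,
      is_active: true,
    });
  }

  // Call when APNs answers 410 or FCM answers UNREGISTERED for this token.
  deactivateByToken(token: string): void {
    this.rows.forEach((row) => {
      if (row.token === token) row.is_active = false;
    });
  }

  // Active tokens whose last_seen is more than maxAgeDays old.
  staleTokens(now: Date, maxAgeDays = 90): TokenRow[] {
    const cutoff = now.getTime() - maxAgeDays * 24 * 60 * 60 * 1000;
    return Array.from(this.rows.values()).filter(
      (r) => r.is_active && r.last_seen.getTime() < cutoff,
    );
  }

  count(): number {
    return this.rows.size;
  }
}
```

Passing the clock in as a parameter keeps the sketch deterministic; production code would more likely use the database's own timestamp.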
Provider Abstraction Layer
The gateway exposes a single internal send interface and routes to the correct provider based on the token's platform field:
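interface PushProvider {
  // Send one payload to a single device token.
  send(token: string, payload: Payload): SendResult;
  // Send one payload to many tokens; one result per input token.
  sendBatch(tokens: string[], payload: Payload): BatchResult[];
}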
Implementations: APNSProvider, FCMProvider, HMSProvider (Huawei), WebPushProvider. Each implementation handles provider-specific authentication, request format, and error code mapping. The calling code is provider-agnostic — it passes a token and payload and receives a normalized result.
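A minimal sketch of the routing step, assuming a registry keyed by platform. PushGateway, fakeProvider, and the simplified Payload and SendResult shapes are hypothetical; real implementations would wrap the APNs, FCM, HMS, and Web Push clients and would normally be asynchronous.

```typescript
type Platform = "ios" | "android" | "huawei" | "web";

interface Payload {
  title: string;
  body: string;
}

interface SendResult {
  ok: boolean;
  errorCode?: string; // normalized, provider-independent code
}

interface PushProvider {
  send(token: string, payload: Payload): SendResult;
}

// Routes each send to the provider registered for the token's platform.
class PushGateway {
  private providers: Record<Platform, PushProvider>;

  constructor(providers: Record<Platform, PushProvider>) {
    this.providers = providers;
  }

  send(platform: Platform, token: string, payload: Payload): SendResult {
    return this.providers[platform].send(token, payload);
  }
}

// Fake provider for illustration; it records the call instead of sending.
function fakeProvider(name: string, log: string[]): PushProvider {
  return {
    send(token) {
      log.push(`${name}:${token}`);
      return { ok: true };
    },
  };
}
```

Because the registry is typed as Record<Platform, PushProvider>, the compiler rejects a gateway that forgets to register one of the four platforms.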
Fanout at Scale
Sending a push notification to 10 million devices requires a distributed fanout job:
- Query the device_tokens table for all active tokens matching the target segment (e.g., all users, or users matching a cohort filter)
- Chunk the token list into batches of 1,000
- Enqueue each chunk as a task in a distributed queue (SQS, Kafka)
- A pool of push workers consumes chunks and sends via provider batch APIs
- Each worker aggregates success/failure counts and writes them to a results store
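The chunk-and-enqueue steps can be sketched as follows. The enqueue callback is a hypothetical stand-in for an SQS or Kafka producer call, and FanoutTask is an assumed task shape.

```typescript
// Split a list into consecutive chunks of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error("size must be a positive integer");
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

interface FanoutTask {
  jobId: string;
  tokens: string[];
}

// Builds one queue task per chunk and returns the number of tasks enqueued.
function planFanout(
  jobId: string,
  tokens: string[],
  enqueue: (task: FanoutTask) => void,
  chunkSize = 1000,
): number {
  const chunks = chunk(tokens, chunkSize);
  chunks.forEach((c) => enqueue({ jobId, tokens: c }));
  return chunks.length;
}
```

For a 10M-token job the full list would not be held in memory at once; the same chunking would be applied page by page as rows stream from the database.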
Provider batch API limits:
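- FCM: up to 500 tokens per multicast call; responses map 1:1 to input tokens. Google has deprecated the legacy batch HTTP endpoint, so confirm how the current SDK issues these calls before assuming one HTTP request per batch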
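- APNs: no native batch API. Use HTTP/2 multiplexing to keep many requests in flight on each connection; the server advertises its concurrent-stream limit in the HTTP/2 SETTINGS frame, so read the limit from the connection rather than hard-coding a number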
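A fanout to 10M devices in 500-token FCM batches needs 10,000,000 / 500 = 20,000 requests. With 100 workers each sending 10 requests/second, aggregate throughput is 1,000 requests/second, so the fanout completes in approximately 20,000 / 1,000 = 20 seconds, before accounting for retries and provider throttling.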
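The estimate can be checked with a few lines of arithmetic. estimateFanoutSeconds is a hypothetical helper that assumes every worker sustains its request rate and that no request is retried or throttled.

```typescript
// Ideal fanout duration: total requests divided by aggregate request rate.
function estimateFanoutSeconds(
  devices: number,
  batchSize: number,
  workers: number,
  requestsPerSecondPerWorker: number,
): number {
  const totalRequests = Math.ceil(devices / batchSize);
  const aggregateRate = workers * requestsPerSecondPerWorker;
  return totalRequests / aggregateRate;
}

// 10M devices, 500 tokens per request, 100 workers at 10 requests/second each:
// 20,000 requests at 1,000 requests/second.
const fanoutSeconds = estimateFanoutSeconds(10000000, 500, 100, 10); // 20
```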
Priority and Collapse Keys
APNs priority:
- apns-priority: 10 delivers immediately and wakes the device screen
- apns-priority: 5 is the low-power option, delivered opportunistically when the device is not in low-power mode
Collapse keys (APNs: apns-collapse-id, FCM: collapse_key): if multiple notifications with the same collapse key are queued for an offline device, only the most recent is delivered when the device comes online. This is ideal for chat unread count badges — only the latest count matters, not every individual increment.
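The collapse behavior can be modeled with a small simulation of the queue held for one offline device. This is only a model of the semantics described above, not how APNs or FCM store messages internally; enqueueWithCollapse and QueuedNotification are hypothetical names.

```typescript
interface QueuedNotification {
  collapseKey?: string;
  body: string;
}

// A notification with a collapse key replaces any queued notification
// that carries the same key; notifications without a key always accumulate.
function enqueueWithCollapse(
  queue: QueuedNotification[],
  n: QueuedNotification,
): QueuedNotification[] {
  if (n.collapseKey === undefined) return queue.concat([n]);
  const kept = queue.filter((q) => q.collapseKey !== n.collapseKey);
  return kept.concat([n]);
}
```

Queuing three unread-count updates under the same key while the device is offline leaves only the last one to deliver, alongside any uncollapsed messages.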
Payload Size Limits and Silent Push
Payload size limits:
- APNs: 4KB total payload
- FCM: 4KB (4096 bytes) maximum message payload; the notification title and body count toward this limit along with the data fields
If content exceeds limits, truncate the body and include a message_id in the data payload so the app can fetch the full content from the API after receiving the push.
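A minimal sketch of the truncate-and-reference approach, assuming the payload is serialized as JSON and measured in UTF-8 bytes. PushMessage, fitPayload, and utf8Length are hypothetical names, and the size check here covers only this simplified message shape, not the provider's full request envelope.

```typescript
const MAX_PAYLOAD_BYTES = 4096;

interface PushMessage {
  title: string;
  body: string;
  data: { message_id: string }; // lets the app fetch the full content later
}

// UTF-8 byte length of a well-formed string (surrogate pairs count as 4 bytes).
function utf8Length(s: string): number {
  let bytes = 0;
  for (let i = 0; i < s.length; i++) {
    const code = s.charCodeAt(i);
    if (code < 0x80) bytes += 1;
    else if (code < 0x800) bytes += 2;
    else if (code >= 0xd800 && code <= 0xdbff) {
      bytes += 4;
      i++; // skip the low surrogate of the pair
    } else bytes += 3;
  }
  return bytes;
}

// Shortens the body until the serialized message fits within maxBytes.
// If title and data alone exceed the limit, the result may still be too large.
function fitPayload(msg: PushMessage, maxBytes = MAX_PAYLOAD_BYTES): PushMessage {
  const size = (m: PushMessage): number => utf8Length(JSON.stringify(m));
  if (size(msg) <= maxBytes) return msg;

  const chars = Array.from(msg.body); // split by code point, not code unit
  const withBody = (n: number): PushMessage => ({
    ...msg,
    body: chars.slice(0, n).join("") + "…",
  });

  // Binary search for the largest number of characters that still fits.
  let lo = 0;
  let hi = chars.length;
  while (lo < hi) {
    const mid = Math.ceil((lo + hi) / 2);
    if (size(withBody(mid)) <= maxBytes) lo = mid;
    else hi = mid - 1;
  }
  return withBody(lo);
}
```

The message_id travels untouched in the data section, so the app can request the full body from the API once the push arrives.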
Silent push (background fetch): apns-push-type: background with content-available: 1 (and apns-priority: 5, which APNs requires for background pushes) wakes the app in the background to fetch new data without showing a visible notification. iOS throttles these to roughly two or three background wakes per hour per app and does not guarantee delivery, so do not use silent push for time-sensitive delivery.
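As an illustration, the headers and body of a silent push request could be assembled as shown here. buildSilentPush and ApnsRequest are hypothetical names; the header and key names (apns-push-type, apns-priority, apns-topic, content-available) are the APNs ones, and the sketch assumes apns-topic is the app's bundle ID.

```typescript
interface ApnsRequest {
  headers: Record<string, string>;
  body: string;
}

// Builds a background (silent) push: no alert, sound, or badge in `aps`.
function buildSilentPush(bundleId: string, data: Record<string, string>): ApnsRequest {
  return {
    headers: {
      "apns-push-type": "background",
      "apns-priority": "5", // background pushes must not use priority 10
      "apns-topic": bundleId,
    },
    // Custom keys sit beside `aps`; `aps` is written last so data cannot override it.
    body: JSON.stringify({ ...data, aps: { "content-available": 1 } }),
  };
}
```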
Bulk Push Job Architecture
A dedicated bulk push service handles broadcast campaigns:
- Job created with targeting criteria (all users, users in segment X, users with app version Y)
- Job processor queries tokens in pages of 10,000, chunked into worker tasks
- Workers run in parallel across a pool, each maintaining persistent HTTP/2 connections to provider endpoints
- Job progress tracked: total_tokens, sent, delivered, failed — polled by the campaign dashboard
- Rate limiting: respect per-provider rate limits; implement token bucket on each worker to avoid 429 responses
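The per-worker token bucket mentioned in the last step could look like this minimal sketch. The class and parameter names are hypothetical, and the clock is passed in explicitly so the behavior is deterministic; a worker would pass the current time in milliseconds and wait or requeue when tryRemove returns false.

```typescript
// Token bucket: holds up to `capacity` tokens, refilled at `refillPerSecond`.
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;
  private readonly refillPerSecond: number;

  constructor(capacity: number, refillPerSecond: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start full so an initial burst is allowed
    this.lastRefillMs = nowMs;
  }

  // Returns true and consumes `cost` tokens if enough are available.
  tryRemove(nowMs: number, cost = 1): boolean {
    const elapsedSeconds = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond,
    );
    this.lastRefillMs = Math.max(this.lastRefillMs, nowMs);
    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }
}
```

Setting capacity equal to the per-second rate allows at most one second's worth of burst, which keeps a worker close to the provider's limit without long idle gaps.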
Delivery Rate Monitoring and Certificate Rotation
Delivery monitoring: track success rate per provider, per platform, per app version. Emit a PagerDuty alert when:
- Success rate drops below 95% (indicates provider issue or widespread token expiry)
- Error rate for a specific error code spikes (e.g., sudden increase in UNREGISTERED suggests a bad token import)
- APNs connection errors spike (certificate near expiry)
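The first two alert conditions can be sketched as simple checks over a metrics window. ProviderWindow, lowSuccessRate, and spikingErrorCodes are hypothetical names, and spikeFactor and minRate are assumed tuning knobs rather than values taken from the text; the actual paging call is left out.

```typescript
interface ProviderWindow {
  provider: string;
  attempted: number;
  succeeded: number;
  errorCounts: Record<string, number>; // error code -> count in this window
}

// True when the success rate in the window is below the threshold.
function lowSuccessRate(w: ProviderWindow, threshold = 0.95): boolean {
  if (w.attempted === 0) return false; // no traffic, nothing to judge
  return w.succeeded / w.attempted < threshold;
}

// Error codes whose rate is at least minRate and more than spikeFactor
// times the rate observed in the baseline window.
function spikingErrorCodes(
  current: ProviderWindow,
  baseline: ProviderWindow,
  spikeFactor = 3,
  minRate = 0.01,
): string[] {
  return Object.keys(current.errorCounts).filter((code) => {
    const now = (current.errorCounts[code] ?? 0) / Math.max(current.attempted, 1);
    const before = (baseline.errorCounts[code] ?? 0) / Math.max(baseline.attempted, 1);
    return now >= minRate && now > before * spikeFactor;
  });
}
```

Comparing rates rather than raw counts keeps the check stable when a large campaign temporarily multiplies send volume.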
Certificate and key rotation:
- APNs authentication keys (.p8) do not expire, but rotate them annually as a security practice. Update the key in the secrets manager and redeploy without downtime.
- APNs TLS certificates (legacy method) expire after 1 year. Set a calendar reminder 30 days before expiry; expired certificates cause all APNs sends to fail immediately.
- FCM service account keys: rotate via Google Cloud IAM; update in secrets manager without service restart if the provider implementation reads credentials dynamically.