Push notifications look simple from the user side — a banner appears, you tap it, the app opens. From an engineering perspective, designing a push system at scale (millions of users, billions of notifications per day) is a classic mobile system design question because it touches three platforms (iOS, Android, web), several backend services, and serious correctness concerns around dedup, ordering, and delivery guarantees.
Functional requirements
- Send notifications to specific users on iOS and Android
- Support broadcast (e.g., “new episode of your podcast”)
- Personalized content (badge counts, custom images)
- Silent / data-only pushes that wake the app for background sync
- Delivery receipt where possible
Architecture
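Three layers: application (your service, which produces notification events), fanout (resolves each event to the target devices), and delivery (sends to APNs, FCM, and web push services).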
Device registration
On launch, the app requests a push token from the OS (APNs token on iOS, FCM token on Android). The app uploads the token + user ID + device metadata to your backend.
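Tokens rotate, for example after a reinstall or a restore onto a new device. Re-upload the token on every app launch and whenever the OS hands the app a new one. The server should treat the upload as an upsert keyed by token: a repeated upload is a no-op, and a token that now belongs to a different user is reassigned rather than duplicated.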
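A minimal sketch of that upsert, using an in-memory dictionary in place of a real table. The `DeviceRegistry` class, its method names, and the optional `old_token` parameter are all hypothetical and only illustrate the de-dupe rule.

```python
from dataclasses import dataclass


@dataclass
class Device:
    token: str
    user_id: str
    platform: str  # "ios" or "android"
    last_seen: float


class DeviceRegistry:
    """In-memory stand-in for the token-to-user table (hypothetical)."""

    def __init__(self) -> None:
        self._by_token: dict[str, Device] = {}

    def register(self, token, user_id, platform, now, old_token=None):
        # If the client reports the token it is replacing, drop the old row
        # so the user does not end up with two entries for one device.
        if old_token and old_token != token:
            self._by_token.pop(old_token, None)
        # Keyed by token: a repeated upload only refreshes last_seen, and a
        # token that moves to another user is reassigned, not duplicated.
        self._by_token[token] = Device(token, user_id, platform, now)

    def devices_for(self, user_id):
        return [d for d in self._by_token.values() if d.user_id == user_id]

    def remove(self, token):
        # Called when the push provider reports the token as invalid.
        self._by_token.pop(token, None)
```

In a real store the same rule is usually a unique constraint on the token column plus an upsert, with a secondary index on user ID for the fanout lookup.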
Fanout pipeline
When a notification needs to be sent (e.g., user A gets a new follower), the producer publishes to Kafka. A fanout worker resolves user A → list of devices → APNs/FCM payloads → delivery service.
Fanout is the bottleneck for broadcasts. Pre-compute target user lists where possible.
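The per-event fanout step can be sketched as a pure function, assuming the device lookup has already been loaded into a dictionary. The function names, the event fields, and the custom `nid` key are assumptions for illustration. The payload shapes follow the APNs `aps` dictionary and the FCM HTTP v1 `message` object.

```python
def build_payload(platform, token, title, body, notification_id):
    """Build the request body for one device."""
    if platform == "ios":
        # APNs takes the device token in the URL path, so it is not in the body.
        return {
            "aps": {"alert": {"title": title, "body": body}},
            "nid": notification_id,  # custom key, used later for dedupe
        }
    if platform == "android":
        return {
            "message": {
                "token": token,
                "notification": {"title": title, "body": body},
                "data": {"nid": notification_id},  # FCM data values are strings
            }
        }
    raise ValueError(f"unknown platform: {platform}")


def fan_out(event, devices_by_user):
    """Resolve one event into one delivery job per registered device."""
    jobs = []
    for platform, token in devices_by_user.get(event["user_id"], []):
        payload = build_payload(
            platform, token, event["title"], event["body"], event["notification_id"]
        )
        jobs.append({"platform": platform, "token": token, "payload": payload})
    return jobs
```

A worker would call `fan_out` for each event consumed from the queue and hand the resulting jobs to the delivery service. For broadcasts the same function runs over a pre-computed user list, typically split into batches.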
Delivery service
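Maintains long-lived HTTP/2 connections to APNs and FCM. Throttles per token to respect platform limits. Handles failures by response type: tokens the provider reports as no longer valid (APNs 410 Unregistered, FCM UNREGISTERED) are removed from the registry, rate-limit responses (HTTP 429) are retried with exponential backoff, and server errors (5xx) are re-queued for a later attempt.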
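The failure handling above might be sketched as a status classifier plus a backoff helper. The action names are hypothetical, and grouping by HTTP status alone is a simplification: APNs, for instance, also reports some bad tokens under status 400 with a reason string in the response body, which a real client would need to inspect.

```python
import random


def classify(status: int) -> str:
    """Map a provider HTTP status to an action (simplified grouping)."""
    if status == 200:
        return "ok"
    if status in (404, 410):
        # FCM reports unregistered tokens as 404, APNs as 410.
        return "drop_token"
    if status == 429:
        return "retry_backoff"
    if status >= 500:
        return "requeue"
    # Remaining 4xx responses usually mean a malformed request; retrying
    # the same request will not help.
    return "fail"


def backoff_delay(attempt, base=1.0, cap=60.0, rng=random.random):
    """Full-jitter exponential backoff, in seconds.

    Returns rng() * min(cap, base * 2**attempt), so retries from many
    workers spread out instead of arriving together.
    """
    return rng() * min(cap, base * (2 ** attempt))
```

The `rng` parameter exists only so the delay can be made deterministic in tests.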
Deduplication
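Each notification has a server-generated ID. To coalesce on the platform side, APNs supports the apns-collapse-id header and FCM supports collapse_key, so a newer notification replaces an older one that carries the same key. The client also dedupes on receive by checking the ID against a cache of recently seen IDs.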
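The client-side recent-IDs cache can be as small as a bounded, insertion-ordered set. Below is a sketch of the logic in Python. On a device it would be written in Swift or Kotlin and persisted across launches, and the class name and capacity are assumptions.

```python
from collections import OrderedDict


class RecentIds:
    """Bounded cache of notification IDs that were already handled."""

    def __init__(self, capacity=256):
        self._capacity = capacity
        self._seen = OrderedDict()

    def first_time(self, notification_id) -> bool:
        """Return True if the ID is new, False if it is a duplicate."""
        if notification_id in self._seen:
            # Refresh recency so an ID that keeps arriving is not evicted.
            self._seen.move_to_end(notification_id)
            return False
        self._seen[notification_id] = None
        if len(self._seen) > self._capacity:
            self._seen.popitem(last=False)  # evict the least recently seen ID
        return True
```

The receive handler would call `first_time(id)` before showing or processing a push and skip the work when it returns False.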
Silent push for background sync
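iOS: set content-available: 1 in the aps payload and send the request with apns-push-type: background and apns-priority: 5. The app needs the Remote notifications background mode enabled. It receives the push via application(_:didReceiveRemoteNotification:fetchCompletionHandler:) and has ~30 seconds to do background work and call the completion handler.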
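Android: data-only FCM messages are delivered to onMessageReceived in your FirebaseMessagingService. The handler should finish quickly; Firebase's guidance is roughly 20 seconds, and less on some older Android versions, so longer work should be handed off to a scheduled job. Message priority does not extend this budget. It controls delivery timing: high-priority messages can wake a device in Doze, while normal-priority messages may be delayed.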
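As a rough illustration of what the server sends for the two silent-push cases above, the helpers below build the APNs headers and payload and the FCM HTTP v1 message body. The function names and the `sync` key are hypothetical. The header and field names are the ones the two platforms document, but check the current references before relying on them.

```python
def apns_silent_request(bundle_id, custom):
    """Headers and JSON payload for an APNs background notification."""
    headers = {
        "apns-push-type": "background",
        "apns-priority": "5",  # background pushes are sent at priority 5
        "apns-topic": bundle_id,
    }
    # No alert, badge, or sound: the payload only asks the OS to wake the app.
    payload = {"aps": {"content-available": 1}, **custom}
    return headers, payload


def fcm_data_message(token, data, priority="NORMAL"):
    """FCM HTTP v1 body for a data-only message (no `notification` block)."""
    return {
        "message": {
            "token": token,
            # FCM requires data values to be strings.
            "data": {key: str(value) for key, value in data.items()},
            "android": {"priority": priority},
        }
    }
```

For example, `apns_silent_request("com.example.app", {"sync": "inbox"})` yields a payload of `{"aps": {"content-available": 1}, "sync": "inbox"}`, where the bundle ID is a placeholder.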
Correctness concerns
- Order: APNs and FCM do not guarantee ordering. If order matters, embed a sequence number.
- Dedupe: client and server both must dedupe. Network retries can deliver the same push twice.
- Battery drain: excessive silent pushes will get your app throttled by the OS.
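The sequence-number idea from the ordering bullet can be sketched as a "latest wins" filter on the receiving side. This assumes the server stamps each push with a `seq` that increases monotonically per stream (for example, per conversation). The class name and the notion of a stream are assumptions for illustration.

```python
class LatestWins:
    """Apply an update only if it is newer than the last one applied."""

    def __init__(self):
        self._last_applied = {}

    def should_apply(self, stream, seq) -> bool:
        # Anything at or below the last applied sequence number is either a
        # duplicate or a late arrival that a newer update already superseded.
        if seq <= self._last_applied.get(stream, -1):
            return False
        self._last_applied[stream] = seq
        return True
```

This fits state-style updates such as badge counts, where only the newest value matters. If every message must be processed in order, the client would instead buffer out-of-order arrivals or fetch the gap from the server.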
Quiet hours and rate limits
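Server-side per-user policy: hold non-urgent notifications between 10pm and 7am in the user's local time, which means storing a time zone per user or device. Server-side rate limit: at most N notifications per user per hour to avoid spamming, with lower-priority notifications dropped or batched once the limit is reached.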
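A sketch of both checks, assuming the server knows each user's time zone and keeps send timestamps in memory. The 10pm to 7am window comes from the text above. The class and function names are hypothetical, and a production limiter would more likely live in a shared store.

```python
from collections import defaultdict, deque
from datetime import datetime, time, timedelta, timezone, tzinfo

QUIET_START = time(22, 0)  # 10pm local
QUIET_END = time(7, 0)     # 7am local
WINDOW_SECONDS = timedelta(hours=1).total_seconds()


def in_quiet_hours(now_utc: datetime, user_tz: tzinfo) -> bool:
    """True if the user's local time falls in the overnight quiet window."""
    local = now_utc.astimezone(user_tz).time()
    # The window wraps past midnight, so it is an OR of two ranges.
    return local >= QUIET_START or local < QUIET_END


class HourlyLimiter:
    """Sliding-window limit: at most max_per_hour sends per user."""

    def __init__(self, max_per_hour: int):
        self._max = max_per_hour
        self._sent = defaultdict(deque)

    def allow(self, user_id, now_ts: float) -> bool:
        window = self._sent[user_id]
        # Forget sends that are an hour old or more.
        while window and window[0] <= now_ts - WINDOW_SECONDS:
            window.popleft()
        if len(window) >= self._max:
            return False
        window.append(now_ts)
        return True
```

Any `tzinfo` works for `user_tz`, including `zoneinfo.ZoneInfo("America/New_York")`, which also handles daylight saving time. Checking quiet hours before the limiter keeps deferred notifications from consuming the hourly budget.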
Frequently Asked Questions
What happens if the user is offline?
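APNs and FCM store pending notifications and deliver them when the device reconnects, but storage is limited. When notifications share a collapse ID, only the most recent is kept and anything older with the same collapse ID is dropped. How long a notification is held is controlled by its expiry (apns-expiration on APNs, ttl on FCM); with an expiry of 0, APNs attempts delivery once and does not store the notification.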
How do I get a delivery receipt?
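APNs reports invalid tokens in the response to a send request (for example, status 410 for an unregistered device) but offers no per-message delivery receipt. FCM has limited support, mostly as aggregated delivery data rather than per-message confirmation. For real receipts, you need the app to confirm receipt back to your server.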
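App-confirmed receipts need only a small amount of server state: what was sent, and what the app has acknowledged. The sketch below is hypothetical throughout (class, method names, and the in-memory storage) and assumes the app calls back with the notification ID and its token after handling a push.

```python
class ReceiptTracker:
    """Tracks which (notification, device) pairs the app has acknowledged."""

    def __init__(self):
        self._sent_at = {}
        self._acked_at = {}

    def mark_sent(self, notification_id, token, ts):
        self._sent_at[(notification_id, token)] = ts

    def mark_acked(self, notification_id, token, ts) -> bool:
        key = (notification_id, token)
        if key not in self._sent_at:
            return False  # ack for something this server never sent
        # setdefault keeps the first ack time, so repeated acks are harmless.
        self._acked_at.setdefault(key, ts)
        return True

    def unacked(self, now, older_than):
        """Pairs sent more than `older_than` seconds ago with no ack yet."""
        return sorted(
            key
            for key, sent in self._sent_at.items()
            if key not in self._acked_at and now - sent > older_than
        )
```

A missing ack does not prove the push was lost: the app may have been force-quit, or its confirmation request may have failed, so these numbers are best read as a lower bound on delivery.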
Can I send a notification to a user without their consent?
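No, not a visible one. iOS requires the user to grant notification permission, and Android requires the POST_NOTIFICATIONS runtime permission from Android 13 onward (on earlier versions notifications are on by default and the user can turn them off). Silent pushes do not require notification permission but are subject to OS budgeting, and users can disable background activity for the app.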