System Design: Notification System — Push, Email, SMS, In-App, Fanout, Delivery Tracking, User Preferences

A notification system delivers timely information to users through multiple channels — push notifications, email, SMS, and in-app messages. Companies like Facebook, Uber, and Amazon send billions of notifications daily. Designing a notification system that is reliable, scalable, and respectful of user preferences is a classic system design interview question. This guide covers the end-to-end architecture from event generation to delivery tracking.

High-Level Architecture

Components:

(1) Event producers: services that generate notification triggers. The order service emits OrderShipped, the payment service emits PaymentFailed, and the social service emits NewFollower. Events are published to Kafka topics.

(2) Notification service: consumes events, determines which users to notify, checks user preferences, renders the notification content, and routes it to the appropriate delivery channel.

(3) User preference service: stores each user's notification preferences, including which event types they want to receive, through which channels (push, email, SMS), quiet hours (for example, do not send between 10 PM and 8 AM), and frequency caps (for example, at most 5 marketing emails per week).

(4) Template service: stores notification templates with placeholders, such as “Hi {user_name}, your order {order_id} has shipped!” Templates are versioned and support localization (English, Spanish, Japanese).

(5) Delivery services: channel-specific services that handle the actual sending. These are the push notification service (APNs for iOS, FCM for Android), the email service (SES, SendGrid), the SMS service (Twilio), and the in-app notification service (WebSocket or polling).

(6) Delivery tracking: records the status of each notification (queued, sent, delivered, opened, clicked, bounced, failed).
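As a rough illustration of how the notification service might combine the preference check and template rendering described above, here is a minimal Python sketch. The in-memory TEMPLATES dictionary, the Preferences class, and the render and route functions are hypothetical names invented for this example. A real system would call the template and preference services instead, and the Spanish string is only an illustrative translation.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory template store keyed by (event_type, locale).
TEMPLATES = {
    ("OrderShipped", "en"): "Hi {user_name}, your order {order_id} has shipped!",
    ("OrderShipped", "es"): "Hola {user_name}, tu pedido {order_id} ha sido enviado.",
}


@dataclass
class Preferences:
    locale: str = "en"
    # Channels the user has opted into, per event type,
    # e.g. {"OrderShipped": {"push", "email"}}.
    channels: dict = field(default_factory=dict)


def render(event_type, locale, params):
    """Fill the template placeholders, falling back to English (an assumption)."""
    template = TEMPLATES.get((event_type, locale)) or TEMPLATES[(event_type, "en")]
    return template.format(**params)


def route(event_type, params, prefs):
    """Return (channel, body) pairs for every channel the user allows."""
    allowed = prefs.channels.get(event_type, set())
    if not allowed:
        return []  # user has not opted in to this event type
    body = render(event_type, prefs.locale, params)
    return [(channel, body) for channel in sorted(allowed)]
```

Calling route("OrderShipped", {"user_name": "Ana", "order_id": "A1"}, prefs) for a user who allows push and email yields one rendered message per channel, which the delivery services would then send.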

Notification Fanout

Fanout is the process of expanding a single event into individual notifications for each affected user. There are three types:

(1) Single-user notification: OrderShipped affects one user. The notification service looks up the user, checks preferences, and sends one notification. This is the simple case.

(2) Group notification: a message in a group chat notifies all group members. The notification service fetches the group membership list and creates one notification per member (minus the sender). Moderate fanout: 10-100 users.

(3) Broadcast notification: a popular user posts a new photo, notifying all 10 million followers. This is massive fanout and cannot be done synchronously, so the notification service must queue the work. Implementation: read the follower list from the database (or a pre-computed follower cache), batch followers into chunks of 1000, and enqueue each chunk as a separate notification job. Workers process chunks in parallel. For celebrity users with millions of followers, the fanout can take minutes, so prioritize: deliver to active users first (users who opened the app in the last 24 hours), then to inactive users. This ensures the notification reaches the most engaged users quickly.
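The chunking and active-first ordering described above can be sketched as follows. This is a minimal example under stated assumptions: plan_fanout, chunked, and the last_active mapping (user ID to last-seen Unix timestamp) are hypothetical names, and the returned list stands in for jobs that would be enqueued to a real queue.

```python
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def plan_fanout(follower_ids, last_active, now, chunk_size=1000,
                active_window_s=24 * 3600):
    """Split followers into jobs, with recently active users first.

    Priority 0 = active within the window, priority 1 = everyone else.
    """
    active, inactive = [], []
    for uid in follower_ids:
        seen = last_active.get(uid)
        if seen is not None and now - seen <= active_window_s:
            active.append(uid)
        else:
            inactive.append(uid)

    jobs = []
    for priority, group in ((0, active), (1, inactive)):
        for batch in chunked(group, chunk_size):
            jobs.append({"priority": priority, "user_ids": batch})
    return jobs
```

Each returned job is small enough for one worker to process independently, so workers can consume the priority 0 jobs in parallel before moving on to the inactive users.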

Delivery Channels

Push notifications (mobile): send via Apple Push Notification Service (APNs) for iOS and Firebase Cloud Messaging (FCM) for Android. The app obtains a device token when it registers for push notifications and sends it to the backend. The push service sends the notification payload to APNs/FCM with the device token, and APNs/FCM delivers it to the device. Challenges: device tokens change (for example, when the user reinstalls the app), tokens can become invalid (the user uninstalled the app), and delivery is not guaranteed (the device may be offline).

Email: send via Amazon SES, SendGrid, or Mailgun. Challenges: deliverability (avoiding spam filters), bounce handling (remove invalid addresses), and unsubscribe management. CAN-SPAM requires a clear opt-out mechanism in every commercial email, and large mailbox providers such as Gmail and Yahoo additionally require one-click unsubscribe from bulk senders.

SMS: send via Twilio or AWS SNS. This is the most expensive channel (roughly $0.01-0.05 per message, depending on the destination country). Reserve it for critical notifications: security alerts (2FA codes), delivery updates, payment confirmations.

In-app notifications: stored in a notifications table and displayed when the user opens the app. Delivery: the app polls the notification API on load, or receives real-time updates via WebSocket/SSE. In-app notifications tend to have high engagement because the user is already in the app.
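The device token lifecycle for push notifications (tokens replaced on reinstall, tokens removed once the provider reports them invalid) might look like the sketch below. Everything here is an assumption made for illustration: DeviceTokenRegistry and send_push are hypothetical, and provider_send is a stand-in callable that returns "ok", "invalid_token", or "unavailable" rather than a real APNs or FCM client.

```python
class DeviceTokenRegistry:
    """In-memory stand-in for a device token table."""

    def __init__(self):
        self._tokens = {}  # user_id -> {device_id: token}

    def register(self, user_id, device_id, token):
        # A new registration for the same device replaces the old token.
        self._tokens.setdefault(user_id, {})[device_id] = token

    def tokens_for(self, user_id):
        return sorted(self._tokens.get(user_id, {}).values())

    def invalidate(self, token):
        # Drop a token the provider reported as no longer valid.
        for devices in self._tokens.values():
            for device_id, stored in list(devices.items()):
                if stored == token:
                    del devices[device_id]


def send_push(registry, user_id, payload, provider_send):
    """Send to every device of a user; return how many sends were accepted."""
    accepted = 0
    for token in registry.tokens_for(user_id):
        result = provider_send(token, payload)  # hypothetical provider call
        if result == "ok":
            accepted += 1
        elif result == "invalid_token":
            registry.invalidate(token)  # permanent error: stop using it
        # Any other result ("unavailable") is treated as transient here.
    return accepted
```

Pruning invalid tokens as soon as the provider reports them keeps later sends from wasting requests on uninstalled apps.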

Rate Limiting and User Experience

Notification fatigue is a major risk. Too many notifications cause users to disable notifications entirely or uninstall the app. Rate limiting strategies:

(1) Per-user frequency caps: at most 5 push notifications per day, 2 marketing emails per week. Track notification counts per user in Redis (INCR on a per-user key with a TTL).

(2) Quiet hours: do not send non-critical notifications between 10 PM and 8 AM in the user's local timezone. Queue them and send at the next available window.

(3) Aggregation: instead of sending a notification for each new follower, aggregate them into a single notification such as “3 people followed you today”. Implement this by buffering events for a time window (1 hour) and sending one aggregated notification.

(4) Priority levels: critical notifications (payment failed, security alert) bypass rate limits and quiet hours. High-priority notifications (order updates) respect quiet hours but not frequency caps. Low-priority notifications (marketing, social) respect all limits.

(5) Smart delivery: an ML model predicts the optimal time to deliver a notification based on the user's historical engagement patterns, so the notification is sent when the user is most likely to open it.
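To make the interaction between priority levels, quiet hours, and frequency caps concrete, here is a minimal sketch. The names decide, next_send_time, and DailyCap are hypothetical. DailyCap is an in-memory stand-in for the Redis counter mentioned above, and the function expects a datetime already converted to the user's local timezone.

```python
from datetime import time, timedelta

QUIET_START = time(22, 0)  # 10 PM local time
QUIET_END = time(8, 0)     # 8 AM local time


def in_quiet_hours(local_dt):
    t = local_dt.time()
    return t >= QUIET_START or t < QUIET_END


def next_send_time(local_dt):
    """Return local_dt itself, or the next 8 AM if it falls in quiet hours."""
    if not in_quiet_hours(local_dt):
        return local_dt
    window_open = local_dt.replace(hour=8, minute=0, second=0, microsecond=0)
    if local_dt.time() >= QUIET_START:
        window_open += timedelta(days=1)  # late evening: wait until tomorrow
    return window_open


class DailyCap:
    """In-memory stand-in for a per-user, per-day counter."""

    def __init__(self, limit):
        self.limit = limit
        self._counts = {}

    def try_acquire(self, user_id, day):
        key = (user_id, day)
        if self._counts.get(key, 0) >= self.limit:
            return False
        self._counts[key] = self._counts.get(key, 0) + 1
        return True


def decide(priority, user_id, local_dt, cap):
    """Return ("send" | "defer" | "drop", when) for one notification."""
    if priority == "critical":
        return ("send", local_dt)  # bypasses quiet hours and caps
    send_at = next_send_time(local_dt)
    # Only low priority is subject to the frequency cap.
    if priority == "low" and not cap.try_acquire(user_id, send_at.date()):
        return ("drop", None)
    if send_at != local_dt:
        return ("defer", send_at)
    return ("send", local_dt)
```

Dropping over-cap notifications is one possible policy; aggregating them into a digest, as in strategy (3), would be another.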

Reliability and Exactly-Once Delivery

Notification delivery must be reliable: a missed payment failure notification can cost the user money.

At-least-once delivery: the notification service publishes to Kafka with acks=all. The delivery worker consumes the message, sends the notification, and commits the offset. If the worker crashes after sending but before committing, the notification is resent on restart, so the user may receive the same notification twice.

Deduplication: assign each notification a unique notification_id. Before sending, check whether this ID has already been sent (lookup in Redis or the delivery tracking database) and skip it if so. Record the ID as soon as the send succeeds; this narrows the duplicate window to a crash between sending and recording.

Retry strategy: if the delivery channel returns a transient error (APNs timeout, SES throttling), retry with exponential backoff (1s, 2s, 4s). After 3 failed retries, move the notification to the dead letter queue for investigation. If the error is permanent (invalid device token, bounced email address), do not retry. Mark the delivery as failed and flag or remove the invalid token or address.

Delivery tracking: record the status of every notification: queued, sent, delivered (provider delivery receipt or client acknowledgment, where available), opened (email open tracking pixel, push notification open callback), clicked (link tracking). This data feeds analytics dashboards and ML models for notification optimization.
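The deduplication check and retry policy can be sketched together as below. This is an illustration under assumptions, not a definitive implementation: deliver, TransientError, and PermanentError are hypothetical names, sent_ids is a plain set standing in for Redis or the tracking database, dead_letters is a list standing in for the dead letter queue, and the policy assumed is one initial attempt plus up to three retries with delays of 1s, 2s, and 4s.

```python
import time


class TransientError(Exception):
    """Retryable failure, e.g. a provider timeout or throttling."""


class PermanentError(Exception):
    """Non-retryable failure, e.g. an invalid token or bounced address."""


def deliver(notification_id, send, sent_ids, dead_letters,
            max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Send one notification with dedup, backoff, and dead-lettering."""
    if notification_id in sent_ids:
        return "duplicate"  # already sent: skip

    for attempt in range(max_retries + 1):  # initial attempt + retries
        try:
            send()
        except PermanentError:
            return "failed"  # do not retry permanent errors
        except TransientError:
            if attempt == max_retries:
                dead_letters.append(notification_id)
                return "dead_lettered"
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s
        else:
            sent_ids.add(notification_id)  # record only after success
            return "sent"
```

Passing sleep as a parameter keeps the function testable without real delays. In production a worker would more likely re-enqueue the message with a delay than block on time.sleep.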
