How is real-time notification delivery implemented with WebSocket?

Each connected client opens a long-lived WebSocket connection to a notification gateway. When a new notification is written to the database, a message is published to a broker (e.g., Kafka or Redis Pub/Sub) keyed by user ID. The gateway server that holds the matching connection consumes the message and pushes it to the client instantly, avoiding polling.

How does bulk mark-as-read work efficiently?

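Rather than issuing one UPDATE per notification, bulk mark-as-read sends a single set-based query such as UPDATE notifications SET is_read = true WHERE recipient_id = ? AND is_read = false, so the database updates every affected row in one statement. For very large backlogs, a per-user read watermark avoids even that row-level write.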

How are unread counts maintained accurately?

A reliable pattern is to keep a Redis counter (INCR on a new notification, DECR when an unread notification is marked read) rather than running a COUNT(*) query on every page load. On write, the counter is incremented right after the notification insert succeeds; because Redis and the database do not share a transaction, the two can briefly disagree. On bulk read, the counter is reset to zero or decremented by the exact batch size. Periodic reconciliation against the database corrects any drift.
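A minimal sketch of that counter lifecycle, using a Python dict as a stand-in for Redis (a real deployment would issue INCR, DECR, and SET against keys such as notif:unread:{user_id}). The `UnreadCounter` class and its method names are hypothetical; `reconcile` assumes the caller has already run the authoritative COUNT against the database.

```python
from __future__ import annotations

from collections import defaultdict


class UnreadCounter:
    """In-memory stand-in for a Redis unread counter (hypothetical helper)."""

    def __init__(self) -> None:
        self._counts: dict[int, int] = defaultdict(int)

    def on_new_notification(self, user_id: int) -> None:
        # Redis equivalent: INCR notif:unread:{user_id}
        self._counts[user_id] += 1

    def on_read(self, user_id: int, was_unread: bool) -> None:
        # Decrement only when the row actually flipped from unread to read;
        # repeated clicks on an already-read item must not lower the count.
        if was_unread:
            self._counts[user_id] = max(0, self._counts[user_id] - 1)

    def on_bulk_read(self, user_id: int, batch_size: int | None = None) -> None:
        # Mark-all-read resets to zero; a partial bulk read subtracts its exact size.
        if batch_size is None:
            self._counts[user_id] = 0
        else:
            self._counts[user_id] = max(0, self._counts[user_id] - batch_size)

    def get(self, user_id: int) -> int:
        return self._counts[user_id]

    def reconcile(self, user_id: int, db_unread_count: int) -> int:
        """Overwrite the cached value with the DB count; return the drift fixed."""
        drift = self._counts[user_id] - db_unread_count
        self._counts[user_id] = db_unread_count
        return drift
```

Reconciliation can run on a schedule or lazily when a user opens the notification center; either way the database stays the source of truth.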

Low Level Design: In-App Notification Center

Q: What is an in-app notification center?

An in-app notification center is a persistent inbox inside a product that stores and displays notifications—such as mentions, alerts, and system messages—for a user. Unlike push notifications, it's always accessible within the app, retains history, and lets users manage read/unread state at their own pace.


What Is an In-App Notification Center?

An in-app notification center is the bell-icon inbox found in products like GitHub, LinkedIn, and Slack. Users open it to see a paginated list of recent notifications — likes, mentions, replies, system alerts — and can mark them read individually or in bulk. At scale this system must handle millions of users, each with hundreds of unread notifications, while supporting real-time delivery and fast reads.

Requirements

Functional

Generate notifications for events (mention, like, follow, comment, system alert).
Store per-user notification lists with read/unread state.
Display notifications in reverse-chronological order with infinite scroll / cursor pagination.
Mark a single notification as read.
Bulk mark-all-as-read.
Real-time delivery to online users.
Unread badge count visible in the nav bar.

Non-Functional

Read latency < 100 ms for the first page.
Delivery latency < 2 s for online users.
Support 50 M DAU, each receiving up to 100 notifications per day.
Notifications retained for 90 days.

Core Entities

notifications
  id            BIGINT PK
  recipient_id  BIGINT   -- the user who sees it
  actor_id      BIGINT   -- user who triggered the event (nullable for system)
  type          VARCHAR  -- like | comment | mention | follow | system
  entity_type   VARCHAR  -- post | comment | user
  entity_id     BIGINT
  is_read       BOOLEAN  DEFAULT false
  created_at    TIMESTAMP

notification_counts          -- denormalized unread count cache
  user_id       BIGINT PK
  unread_count  INT      DEFAULT 0
  updated_at    TIMESTAMP

API Design

GET  /v1/notifications?cursor=&limit=20
     Response: { items: [...], next_cursor, unread_count }

POST /v1/notifications/{id}/read
     Response: 204

POST /v1/notifications/read-all
     Response: 204

WebSocket ws://api/notifications/stream
     Server pushes: { type: "new_notification", payload: {...} }

Notification Storage Strategy

Option A: Fan-Out on Write (Push Model)

When an event fires, the producer immediately writes one row per recipient into the notifications table. Reads are cheap: a simple indexed scan on (recipient_id, created_at DESC). This is the right default for most products.

Tradeoff: A user with 10 M followers (the celebrity problem) triggers 10 M writes from a single event. Mitigate by moving fan-out to async workers, capping it (e.g., skipping offline users), or switching celebrities to fan-out on read.
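A sketch of the write path, assuming the producer splits the recipient list into fixed-size batches that async workers insert independently. `NotificationRow`, `fan_out_batches`, and the batch size of 1,000 are illustrative names and values, not part of the original design.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone
from itertools import islice
from typing import Iterable, Iterator


@dataclass(frozen=True)
class NotificationRow:
    recipient_id: int
    actor_id: int | None   # None for system notifications
    type: str              # like | comment | mention | follow | system
    entity_type: str       # post | comment | user
    entity_id: int
    created_at: datetime
    is_read: bool = False


def fan_out_batches(
    actor_id: int | None,
    type_: str,
    entity_type: str,
    entity_id: int,
    recipient_ids: Iterable[int],
    batch_size: int = 1000,  # hypothetical job size
) -> Iterator[list[NotificationRow]]:
    """Yield one list of rows per async-worker job, one row per recipient."""
    now = datetime.now(timezone.utc)
    it = iter(recipient_ids)
    while batch := list(islice(it, batch_size)):
        yield [
            NotificationRow(rid, actor_id, type_, entity_type, entity_id, now)
            for rid in batch
        ]
```

Each batch maps to one multi-row INSERT, so a 10 M-recipient event becomes 10,000 independent jobs that can be retried and rate-limited separately instead of one huge transaction.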

Option B: Fan-Out on Read (Pull Model)

Store only the event. At read time, look up who the user follows, fetch recent events from those actors, and merge them. Reads are expensive, so this model is rarely used alone; it is mainly reserved for very high-follower accounts.

Hybrid (Recommended at Scale)

Fan-out on write for regular users. Fan-out on read for users with follower count above a threshold (e.g., 1 M). At read time, merge the pre-written notifications with on-demand celebrity events.
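The read-side merge in the hybrid model can be sketched as a k-way merge of already-sorted streams. This assumes each source (the user's pre-written rows, plus one recent-events list per followed celebrity) is already sorted newest-first by (created_at, id); the `Item` tuple and `merge_feed` function are hypothetical.

```python
from __future__ import annotations

import heapq
from itertools import islice
from typing import Iterable, NamedTuple


class Item(NamedTuple):
    created_at: str  # ISO-8601 UTC timestamp in one fixed format, so it sorts lexically
    id: int
    source: str      # "fanout" for pre-written rows, otherwise the celebrity actor


def merge_feed(
    prewritten: Iterable[Item],
    celebrity_streams: Iterable[Iterable[Item]],
    limit: int = 20,
) -> list[Item]:
    """Merge newest-first streams into a single newest-first page of `limit` items."""
    merged = heapq.merge(
        prewritten,
        *celebrity_streams,
        key=lambda it: (it.created_at, it.id),
        reverse=True,  # inputs are sorted descending, so merge descending
    )
    return list(islice(merged, limit))
```

`heapq.merge` consumes its inputs lazily, and because both sides share the (created_at, id) ordering, the same cursor scheme used for pagination works on the merged feed.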

Database Indexing

-- Primary read path
CREATE INDEX idx_notif_recipient_time
  ON notifications (recipient_id, created_at DESC, id DESC);  -- id matches the pagination tiebreaker

-- For unread-only filter
CREATE INDEX idx_notif_recipient_unread
  ON notifications (recipient_id, is_read, created_at DESC);

With 90-day retention and 50 M DAU at up to 100 notifications per day, the table holds up to ~450 B rows (50 M × 100 × 90). Partition by recipient_id (hash) or by created_at range to keep index sizes manageable. Consider moving rows older than 30 days to cold storage (S3 + Athena) and serving only the hot partition from PostgreSQL or Cassandra.
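The row estimate and a hash-partition router can be checked in a few lines. The partition count of 256 and the use of CRC32 are illustrative assumptions for application-level sharding; if PostgreSQL's declarative PARTITION BY HASH is used instead, the database performs this routing itself with its own hash function.

```python
import zlib

DAU = 50_000_000
NOTIFS_PER_USER_PER_DAY = 100  # upper bound from the requirements
RETENTION_DAYS = 90

# 50 M * 100 * 90 = 450,000,000,000 rows at the upper bound
max_rows = DAU * NOTIFS_PER_USER_PER_DAY * RETENTION_DAYS

NUM_PARTITIONS = 256  # hypothetical partition count
rows_per_partition = max_rows // NUM_PARTITIONS


def partition_for(recipient_id: int, num_partitions: int = NUM_PARTITIONS) -> int:
    """Route every row of one recipient to the same partition via a stable hash."""
    # CRC32 over a fixed byte encoding gives the same answer in every process.
    return zlib.crc32(recipient_id.to_bytes(8, "big", signed=True)) % num_partitions
```

Even split 256 ways, the upper bound leaves about 1.76 B rows per partition, which is why the cold-storage tier matters as much as the partitioning scheme.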

Unread Count

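Never run SELECT COUNT(*) FROM notifications WHERE recipient_id = :uid AND is_read = false on every page load: even with the unread index, its cost grows with the user's unread backlog, and it runs on every page view. Instead, maintain a denormalized counter in the notification_counts table.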

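-- On insert (upsert, so a user's first notification creates the counter row)
INSERT INTO notification_counts (user_id, unread_count, updated_at)
  VALUES (:recipient_id, 1, NOW())
  ON CONFLICT (user_id) DO UPDATE
  SET unread_count = notification_counts.unread_count + 1,
      updated_at   = NOW();

-- On mark-read (single): decrement only if this call actually flipped the row
WITH flipped AS (
  UPDATE notifications SET is_read = true
  WHERE id = :id AND recipient_id = :user_id AND is_read = false
  RETURNING id
)
UPDATE notification_counts
  SET unread_count = GREATEST(0, unread_count - (SELECT COUNT(*) FROM flipped))
  WHERE user_id = :user_id;

-- On mark-all-read (both statements in one transaction)
BEGIN;
UPDATE notifications SET is_read = true WHERE recipient_id = :user_id AND is_read = false;
UPDATE notification_counts SET unread_count = 0, updated_at = NOW() WHERE user_id = :user_id;
COMMIT;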

Cache the count in Redis with a short TTL (10 s) keyed as notif:unread:{user_id} so the nav bar badge reads from cache, not the DB.
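A cache-aside sketch of the badge read path. `InMemoryTTLStore` is a hypothetical stand-in for Redis GET and SET with an expiry (with redis-py this would be `r.get(key)` and `r.set(key, value, ex=10)`), and `load_from_db` is a hypothetical callback that reads notification_counts. An injectable clock makes expiry testable.

```python
from __future__ import annotations

import time
from typing import Callable


class InMemoryTTLStore:
    """Stand-in for Redis SET key value EX ttl / GET key."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: dict[str, tuple[int, float]] = {}

    def get(self, key: str) -> int | None:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: behave as if the key were gone
            return None
        return value

    def set(self, key: str, value: int, ex: float) -> None:
        self._data[key] = (value, self._clock() + ex)


def get_unread_badge(
    cache: InMemoryTTLStore,
    user_id: int,
    load_from_db: Callable[[int], int],
    ttl_s: float = 10.0,
) -> int:
    key = f"notif:unread:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached                 # hit: no database query
    count = load_from_db(user_id)     # miss: read notification_counts
    cache.set(key, count, ex=ttl_s)
    return count
```

With a 10 s TTL the badge can lag the database by up to 10 s; deleting the key (Redis DEL) after each counter update shortens that window at the cost of more misses.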

Cursor Pagination

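Offset pagination slows down at large offsets (the database still walks every skipped row) and returns duplicates or gaps when new rows are inserted between page loads. Use a keyset cursor built from the (created_at, id) pair of the last notification returned.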

-- First page
SELECT * FROM notifications
WHERE recipient_id = :uid
ORDER BY created_at DESC, id DESC
LIMIT 21;  -- fetch 21 to know if a next page exists

-- Subsequent pages (cursor = created_at and id of the last row on the previous page)
SELECT * FROM notifications
WHERE recipient_id = :uid
  AND (created_at, id) < (:cursor_ts, :cursor_id)
ORDER BY created_at DESC, id DESC
LIMIT 21;

Encode the cursor as a base64 JSON blob so it is opaque to clients: {"ts": "2024-01-15T10:00:00Z", "id": 98765}.
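The keyset queries and the opaque cursor can be exercised end to end with Python's built-in sqlite3 module as a stand-in for the production database (SQLite has supported row-value comparisons since 3.15). `encode_cursor`, `decode_cursor`, and `fetch_page` are hypothetical names; the returned dict mirrors the GET /v1/notifications response minus unread_count.

```python
from __future__ import annotations

import base64
import json
import sqlite3


def encode_cursor(created_at: str, notif_id: int) -> str:
    raw = json.dumps({"ts": created_at, "id": notif_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor: str) -> tuple[str, int]:
    data = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    return data["ts"], data["id"]


def fetch_page(conn: sqlite3.Connection, uid: int,
               cursor: str | None = None, limit: int = 20) -> dict:
    if cursor is None:
        rows = conn.execute(
            "SELECT id, created_at FROM notifications WHERE recipient_id = ? "
            "ORDER BY created_at DESC, id DESC LIMIT ?",
            (uid, limit + 1),
        ).fetchall()
    else:
        ts, last_id = decode_cursor(cursor)
        rows = conn.execute(
            "SELECT id, created_at FROM notifications WHERE recipient_id = ? "
            "AND (created_at, id) < (?, ?) "
            "ORDER BY created_at DESC, id DESC LIMIT ?",
            (uid, ts, last_id, limit + 1),
        ).fetchall()
    has_more = len(rows) > limit  # the extra row only signals that a next page exists
    items = rows[:limit]
    next_cursor = encode_cursor(items[-1][1], items[-1][0]) if has_more else None
    return {"items": items, "next_cursor": next_cursor}
```

Because the id breaks ties, two notifications with the same created_at are never skipped or repeated across a page boundary.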

Real-Time Delivery via WebSocket

Client  --WS connect-->  Gateway (stateful)
Gateway --subscribe-->   Redis Pub/Sub  channel: notif:{user_id}
Producer --publish-->    Redis Pub/Sub  channel: notif:{user_id}
Redis   --fan-out-->     all Gateway pods subscribed to that channel
Gateway --push frame-->  Client

Steps on event occurrence:

Event producer writes the notification row(s) to DB (async worker).
Producer publishes a lightweight message to Redis channel notif:{recipient_id}.
The Gateway pod holding the user WebSocket receives it and pushes a JSON frame to the client.
Client appends the notification to the top of the list and increments the badge count locally.

If the user is offline, no WebSocket exists and Redis discards the pub/sub message — the notification is still persisted in the DB and will appear on next load.
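The delivery flow, including the offline and multi-device cases, can be simulated with an in-memory hub standing in for Redis Pub/Sub. `PubSubHub`, `Gateway`, and the session model (a list of sent frames per WebSocket) are hypothetical. As with Redis PUBLISH, `publish` returns the number of subscribers that received the message, and a message on a channel with no subscribers is simply dropped.

```python
from __future__ import annotations

import json
from collections import defaultdict
from typing import Callable


class PubSubHub:
    """In-memory stand-in for Redis Pub/Sub: fire-and-forget, nothing is stored."""

    def __init__(self) -> None:
        self._subs: dict[str, list[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callable[[str], None]) -> None:
        self._subs[channel].append(callback)

    def publish(self, channel: str, message: str) -> int:
        # Return how many subscribers received the message; with zero
        # subscribers (user offline everywhere) the message is lost.
        callbacks = list(self._subs.get(channel, []))
        for cb in callbacks:
            cb(message)
        return len(callbacks)


class Gateway:
    """One stateful gateway pod; each WebSocket session is a list of sent frames."""

    def __init__(self, hub: PubSubHub) -> None:
        self._hub = hub
        self._sessions: dict[int, list[list[str]]] = defaultdict(list)

    def connect(self, user_id: int) -> list[str]:
        outbox: list[str] = []
        if not self._sessions[user_id]:
            # First session for this user on this pod: subscribe once per pod.
            self._hub.subscribe(f"notif:{user_id}", self._make_handler(user_id))
        self._sessions[user_id].append(outbox)
        return outbox

    def _make_handler(self, user_id: int) -> Callable[[str], None]:
        def handler(message: str) -> None:
            frame = json.dumps({"type": "new_notification", "payload": json.loads(message)})
            for outbox in self._sessions[user_id]:  # every device connected to this pod
                outbox.append(frame)
        return handler
```

Subscribing once per user per pod keeps the subscription count proportional to online users rather than open sockets. A production gateway would also unsubscribe when the user's last socket on that pod closes, which this sketch omits.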

Bulk Mark-All-As-Read

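Synchronously updating every unread row in the request path gets slow for users with large backlogs (up to 9,000 rows per user at 100 notifications/day over 90 days), and the cost multiplies across millions of users. Use a watermark pattern: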

user_read_watermarks
  user_id       BIGINT PK
  read_before   TIMESTAMP  -- all notifications before this are treated as read
  updated_at    TIMESTAMP

On mark-all-read, set read_before = NOW(). At read time, apply: is_read OR created_at < read_before. This makes the DB write O(1) instead of O(n). New notifications arriving after the watermark still track individual is_read state.
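The read-time rule can be sketched as a pure function plus an unread count that respects the watermark. The dataclass and function names are hypothetical, and a user with no watermark row is treated as having no watermark (in SQL this would be a LEFT JOIN against user_read_watermarks).

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass
class Notification:
    id: int
    created_at: datetime
    is_read: bool = False


def effectively_read(n: Notification, read_before: datetime | None) -> bool:
    # is_read OR created_at < read_before; with no watermark only is_read counts
    return n.is_read or (read_before is not None and n.created_at < read_before)


def unread_count(notifs: list[Notification], read_before: datetime | None) -> int:
    return sum(1 for n in notifs if not effectively_read(n, read_before))


def mark_all_read(watermarks: dict[int, datetime], user_id: int, now: datetime) -> None:
    # O(1) write: one watermark upsert instead of one UPDATE per notification
    watermarks[user_id] = now
```

The denormalized unread counter still has to be reset to zero in the same operation, exactly as in the mark-all-read SQL of the Unread Count section.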

Notification Grouping / Aggregation

Instead of showing 47 separate like notifications, group them into one entry: "Alice, Bob, and 45 others liked your post." Implement this by storing a group_key on each notification row:

group_key  VARCHAR  -- e.g., like:post:12345

At read time, GROUP BY group_key and aggregate actor names. Alternatively, maintain a separate aggregated_notifications table updated by a background job every 60 s. The latter trades freshness for read simplicity.
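A sketch of the read-time grouping step, assuming rows arrive newest-first and the label names the two most recent distinct actors. `group_notifications`, `format_actors`, and the `VERBS` table are hypothetical; only the like and comment types are covered here.

```python
from __future__ import annotations

# Hypothetical verb table for rendering; other types would need their own phrasing.
VERBS = {"like": "liked", "comment": "commented on"}


def format_actors(actors: list[str]) -> str:
    """'Alice', 'Alice and Bob', or 'Alice, Bob, and N others'."""
    if len(actors) <= 2:
        return " and ".join(actors)
    rest = len(actors) - 2
    return f"{actors[0]}, {actors[1]}, and {rest} other{'s' if rest > 1 else ''}"


def group_notifications(rows: list[dict]) -> list[str]:
    """Collapse rows sharing a group_key; rows must be sorted newest-first."""
    groups: dict[str, list[str]] = {}  # insertion order = newest group first
    for row in rows:
        actors = groups.setdefault(row["group_key"], [])
        if row["actor"] not in actors:  # count each actor once per group
            actors.append(row["actor"])
    lines = []
    for key, actors in groups.items():
        verb, entity_type, _entity_id = key.split(":")  # e.g. like:post:12345
        lines.append(f"{format_actors(actors)} {VERBS[verb]} your {entity_type}")
    return lines
```

Doing this in application code keeps the SQL a plain indexed scan; the background-job alternative would precompute the same strings into aggregated_notifications.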

Scalability Summary

Concern	Solution
Write throughput	Async workers, partitioned DB
Read latency	Composite index, Redis count cache
Celebrity fan-out	Hybrid push/pull model
Real-time delivery	WebSocket + Redis Pub/Sub
Unread count	Denormalized counter + Redis cache
Bulk read	Watermark pattern
Storage	90-day TTL, cold-tier archival

Common Interview Follow-Ups

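How do you guarantee exactly-once delivery? Strict exactly-once delivery over a network is not achievable, so make the write idempotent instead: add a unique idempotency key on the notification row over (actor_id, type, entity_id, recipient_id, day) and insert with ON CONFLICT DO NOTHING, so retried events never create duplicates.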
Push notifications (mobile)? After DB write, enqueue a job to call APNs / FCM. Separate from the in-app center but uses the same notification record.
Multi-device? WebSocket gateway tracks all sessions per user. Publish once to Redis; all devices receive the frame.
Privacy? When the source entity (post, comment) is deleted, soft-delete or tombstone the notification (so it no longer renders the deleted content) rather than hard-deleting it, which keeps references to the row intact.
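The idempotent insert from the first follow-up can be demonstrated with sqlite3 as a stand-in for PostgreSQL: a UNIQUE index over the key columns plus ON CONFLICT DO NOTHING (SQLite 3.24+ accepts the same clause) turns a retried event into a no-op. The trimmed schema, the derived `day` column (UTC calendar date), and the `record_event` helper are assumptions for illustration.

```python
from __future__ import annotations

import sqlite3
from datetime import datetime, timezone

# Columns trimmed to what the idempotency key needs; `day` is a hypothetical
# derived column holding the UTC calendar date of the event.
SCHEMA = """
CREATE TABLE notifications (
    id           INTEGER PRIMARY KEY,
    recipient_id INTEGER NOT NULL,
    actor_id     INTEGER,
    type         TEXT    NOT NULL,
    entity_id    INTEGER NOT NULL,
    day          TEXT    NOT NULL,
    created_at   TEXT    NOT NULL
);
-- NULL actor_id (system alerts) never conflicts: NULLs are distinct in unique
-- indexes (PostgreSQL 15+ can opt out with NULLS NOT DISTINCT).
CREATE UNIQUE INDEX uq_notif_idempotency
    ON notifications (actor_id, type, entity_id, recipient_id, day);
"""


def record_event(conn: sqlite3.Connection, actor_id: int | None, type_: str,
                 entity_id: int, recipient_id: int, at: datetime) -> bool:
    """Insert at most one row per idempotency key; True only if a row was written."""
    at_utc = at.astimezone(timezone.utc)
    cur = conn.execute(
        "INSERT INTO notifications "
        "(recipient_id, actor_id, type, entity_id, day, created_at) "
        "VALUES (?, ?, ?, ?, ?, ?) ON CONFLICT DO NOTHING",
        (recipient_id, actor_id, type_, entity_id,
         at_utc.date().isoformat(), at_utc.isoformat()),
    )
    return cur.rowcount == 1
```

Because the key includes the day, a second like of the same post by the same actor on a later day produces a new notification, while retries within the day are absorbed.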