What Is a Content Syndication Service?
A content syndication service allows content producers — publishers, data vendors, or platforms — to distribute articles, media, or structured data to a network of subscribers. Subscribers may pull content via Atom/RSS feeds on a schedule, or receive it via webhook push the moment new content is published. Designing this service at the low level involves feed generation, subscriber lifecycle management, push delivery with retry, and delivery tracking at scale.
Requirements
Functional Requirements
- Publishers create channels and push content items; the service generates Atom and RSS 2.0 feeds per channel.
- Subscribers register callback URLs (webhooks) or poll feed endpoints.
- New content triggers push notifications to all active webhook subscribers, targeting delivery within 30 seconds of publication.
- Track delivery status per subscriber per content item: pending, delivered, failed, retried.
- Support subscriber-level filtering by content category or custom tag.
Non-Functional Requirements
- Support 100,000 active channels and 10 million subscriber endpoints.
- Deliver push notifications to 95% of healthy subscribers within 60 seconds of publication.
- Feed XML generation p99 latency under 200 ms.
- Retry failed deliveries with exponential backoff up to 24 hours.
Data Model
The data model has four tables:
- Channel: channel ID, publisher ID, title, description, base URL, feed format preferences, and the signing secret used for payload HMAC.
- ContentItem: item ID, channel ID, title, body or media URL, GUID (used as the Atom entry ID), publication timestamp, and tag array.
- Subscription: subscription ID, channel ID, subscriber endpoint URL, filter expression (JSONB), status (active/paused/failed), last-delivery timestamp, and failure streak count.
- DeliveryAttempt: an append-only log of every push attempt with outcome, HTTP response code, latency, and retry count, partitioned by date for efficient archival.
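As a rough illustration, the four tables could be mirrored in application code along the following lines. This is only a sketch: the class and field names are illustrative, and the `subscription_id`, `item_id`, and `attempted_at` fields on `DeliveryAttempt` are assumptions added so that an attempt can be tied to a subscriber, an item, and a date partition.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Optional


class SubscriptionStatus(Enum):
    ACTIVE = "active"
    PAUSED = "paused"
    FAILED = "failed"


@dataclass
class Channel:
    channel_id: str
    publisher_id: str
    title: str
    description: str
    base_url: str
    feed_formats: list[str]   # feed format preferences, e.g. ["atom", "rss"]
    signing_secret: bytes     # key for the payload HMAC


@dataclass
class ContentItem:
    item_id: str
    channel_id: str
    title: str
    body_or_media_url: str
    guid: str                 # reused as the Atom entry ID
    published_at: datetime
    tags: list[str] = field(default_factory=list)


@dataclass
class Subscription:
    subscription_id: str
    channel_id: str
    endpoint_url: str
    filter_expr: dict = field(default_factory=dict)  # JSONB in the database
    status: SubscriptionStatus = SubscriptionStatus.ACTIVE
    last_delivery_at: Optional[datetime] = None
    failure_streak: int = 0


@dataclass(frozen=True)       # frozen: rows in an append-only log are never updated
class DeliveryAttempt:
    subscription_id: str      # assumed foreign key, not listed in the text
    item_id: str              # assumed foreign key, not listed in the text
    attempted_at: datetime    # assumed partitioning column
    outcome: str              # e.g. "delivered" or "failed"
    http_status: Optional[int]
    latency_ms: int
    retry_count: int
```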
Core Algorithms
Feed Generation
Feed XML is built from a materialized view of the 100 most recent ContentItems per channel, ordered by publication timestamp descending. On each new item insertion, a database trigger refreshes the view and the serialized XML blob cached in Redis is rewritten. Requests to the feed endpoint check the Redis cache first (TTL 60 seconds) and fall back to real-time generation from the database on a miss. Conditional GET headers (an ETag derived from the last-item timestamp, plus Last-Modified) reduce bandwidth for polling subscribers: the service returns 304 Not Modified when nothing has changed.
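The conditional GET logic might look roughly like the sketch below. It assumes the last-item timestamp is a UTC-aware `datetime`. The function names and the choice of a truncated SHA-256 digest for the ETag are illustrative and are not prescribed by the design above.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def make_etag(channel_id: str, last_item_ts: datetime) -> str:
    """Derive a quoted ETag from the channel and its newest item's timestamp."""
    raw = f"{channel_id}:{last_item_ts.isoformat()}".encode("utf-8")
    return '"' + hashlib.sha256(raw).hexdigest()[:16] + '"'


def etag_matches(if_none_match: str, etag: str) -> bool:
    """Weak comparison against a comma-separated If-None-Match header."""
    if if_none_match.strip() == "*":
        return True
    candidates = [c.strip() for c in if_none_match.split(",")]
    return etag in [c[2:] if c.startswith("W/") else c for c in candidates]


def feed_response(
    channel_id: str,
    last_item_ts: datetime,            # assumed to carry tzinfo=timezone.utc
    feed_xml: bytes,
    if_none_match: Optional[str] = None,
    if_modified_since: Optional[str] = None,
):
    """Return (status, headers, body) for a feed request."""
    etag = make_etag(channel_id, last_item_ts)
    headers = {
        "ETag": etag,
        "Last-Modified": format_datetime(last_item_ts, usegmt=True),
    }
    if if_none_match is not None:
        # If-None-Match takes precedence over If-Modified-Since.
        not_modified = etag_matches(if_none_match, etag)
    elif if_modified_since is not None:
        # HTTP dates have one-second resolution, so drop microseconds.
        since = parsedate_to_datetime(if_modified_since)
        not_modified = last_item_ts.replace(microsecond=0) <= since
    else:
        not_modified = False
    if not_modified:
        return 304, headers, b""
    return 200, headers, feed_xml
```

A polling subscriber that echoes back the `ETag` from its previous response receives a 304 with an empty body until a newer item changes the timestamp.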
Subscriber Fanout on Publish
When a publisher submits a new ContentItem, the API service writes it to the database and publishes a content.published event to a Kafka topic partitioned by channel ID. A fanout consumer reads the event, queries all active subscriptions for the channel that match the item tags, and enqueues one delivery task per matching subscription into a priority queue backed by Redis Streams. Delivery workers pull tasks, POST the serialized item payload to the subscriber endpoint with an HMAC-SHA256 signature in the X-Hub-Signature-256 header, and record the outcome in the DeliveryAttempt table.
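The tag matching and payload signing steps can be sketched as follows. The filter schema (a `tags_any` key), the JSON payload encoding, and the `sha256=` prefix on the header value are assumptions chosen for illustration. The text above specifies only HMAC-SHA256 and the header name.

```python
import hashlib
import hmac
import json


def matches_filter(item_tags: list, filter_expr: dict) -> bool:
    """Return True if the item passes a subscription's filter.

    Hypothetical schema: {"tags_any": [...]} matches when the item has at
    least one listed tag. An empty or missing filter matches everything.
    """
    wanted = filter_expr.get("tags_any")
    if not wanted:
        return True
    return bool(set(item_tags) & set(wanted))


def sign_payload(secret: bytes, body: bytes) -> str:
    """Compute the HMAC-SHA256 signature of the exact bytes being sent."""
    return "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()


def build_delivery(secret: bytes, item: dict):
    """Serialize an item and attach the signature header."""
    body = json.dumps(item, sort_keys=True, separators=(",", ":")).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "X-Hub-Signature-256": sign_payload(secret, body),
    }
    return body, headers


def verify_signature(secret: bytes, body: bytes, header_value: str) -> bool:
    """Subscriber-side check, using a constant-time comparison."""
    return hmac.compare_digest(sign_payload(secret, body), header_value)
```

Signing the serialized bytes rather than the parsed object means the subscriber can verify the signature against the raw request body before decoding it.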
Retry and Backoff
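On delivery failure (a non-2xx response or a connection timeout), the worker schedules a retry after delay = min(base * 2^attempt, max delay) seconds, with base = 30 seconds and max delay = 3600 seconds. Retries are held in a delayed queue implemented with Redis sorted sets scored by next-attempt Unix timestamp. After 10 consecutive failures the subscription is automatically paused and the subscriber is notified by email. Note that with these parameters 10 retries span only about four hours (30 + 60 + ... + 1920 seconds, then 3600 seconds per retry), so honoring the 24-hour retry window from the requirements needs roughly 30 attempts or a longer maximum delay. A separate reactivation endpoint allows subscribers to resume after fixing their endpoint.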
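A minimal sketch of the backoff schedule and the delayed queue follows. It assumes `attempt` counts from 0. To stay runnable without a Redis server, it stands in for the sorted set with an in-memory heap ordered by due time. The class and method names are illustrative.

```python
import heapq

BASE_SECONDS = 30
MAX_DELAY_SECONDS = 3600


def retry_delay(attempt: int) -> int:
    """Capped exponential backoff: base * 2^attempt, at most the max delay."""
    return min(BASE_SECONDS * 2 ** attempt, MAX_DELAY_SECONDS)


class DelayedQueue:
    """In-memory stand-in for a sorted set scored by next-attempt timestamp.

    With Redis, schedule() would correspond to ZADD and pop_due() to a
    range query by score followed by removal of the returned members.
    """

    def __init__(self):
        self._heap = []  # (due_timestamp, task_id) pairs

    def schedule(self, task_id: str, now: int, attempt: int) -> int:
        """Queue a retry and return the Unix timestamp at which it is due."""
        due = now + retry_delay(attempt)
        heapq.heappush(self._heap, (due, task_id))
        return due

    def pop_due(self, now: int) -> list:
        """Remove and return every task whose due time has passed."""
        due_tasks = []
        while self._heap and self._heap[0][0] <= now:
            due_tasks.append(heapq.heappop(self._heap)[1])
        return due_tasks
```

A scheduler loop would call `pop_due` with the current time every second or so and hand the returned task IDs back to the delivery workers.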
API Design
Publishers call POST /v1/channels/{id}/items to publish content; pull subscribers fetch GET /v1/channels/{id}/feed.atom or GET /v1/channels/{id}/feed.rss. Push subscribers register via POST /v1/subscriptions with a channel ID, callback URL, and optional filter. A GET /v1/subscriptions/{id}/deliveries endpoint returns paginated delivery history. A POST /v1/subscriptions/{id}/test endpoint sends a test payload to verify endpoint reachability before going live. All write endpoints require OAuth 2.0 bearer tokens; feed endpoints are public but rate-limited by IP.
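To make the registration call concrete, the sketch below validates a hypothetical request body for POST /v1/subscriptions. The JSON field names (`channel_id`, `callback_url`, `filter`) and the https-only rule for callbacks are assumptions. The design above names the inputs but not their wire format.

```python
from urllib.parse import urlparse


def validate_subscription_request(payload: dict) -> list:
    """Return a list of validation errors; an empty list means acceptable."""
    errors = []

    channel_id = payload.get("channel_id")
    if not isinstance(channel_id, str) or not channel_id:
        errors.append("channel_id is required")

    # Assumed policy: callbacks must be absolute https URLs.
    parsed = urlparse(str(payload.get("callback_url", "")))
    if parsed.scheme != "https" or not parsed.netloc:
        errors.append("callback_url must be an absolute https URL")

    # The filter is optional, but must be a JSON object when present.
    flt = payload.get("filter")
    if flt is not None and not isinstance(flt, dict):
        errors.append("filter must be a JSON object")

    return errors
```

For example, a body of `{"channel_id": "ch_1", "callback_url": "https://example.com/hook"}` passes, while a body with an `http://` callback is rejected with a single error.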
Scalability and Infrastructure
The fanout layer is the primary scaling challenge. Channels with millions of subscribers (mega-publishers) use a two-tier fanout: the first tier writes bulk delivery batches to object storage (S3), and worker pools stream from those batches rather than querying the database per notification. This keeps the database out of the hot path for large fanouts. Delivery workers autoscale horizontally based on queue depth. Each worker is stateless and claims tasks with a distributed lock (Redis SET NX with TTL) to prevent duplicate delivery. The DeliveryAttempt table is partitioned monthly and old partitions are archived to cold storage after 90 days, keeping the hot table small and fast for recent-delivery queries.
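The worker claim step could be sketched as below. The `try_claim` helper and the `claim:` key prefix are hypothetical. `FakeRedis` is a tiny in-memory stand-in so the example runs without a server. Its `set(name, value, nx=..., ex=...)` signature mirrors the redis-py client, which returns `True` on success and `None` when the NX condition fails.

```python
import time


class FakeRedis:
    """In-memory stand-in that implements only SET with NX and EX."""

    def __init__(self):
        self._data = {}  # name -> (value, expiry on the monotonic clock or None)

    def set(self, name, value, nx=False, ex=None):
        now = time.monotonic()
        current = self._data.get(name)
        if current is not None and current[1] is not None and current[1] <= now:
            current = None  # treat an expired key as absent
        if nx and current is not None:
            return None     # key already held: NX refuses to overwrite
        expires = now + ex if ex is not None else None
        self._data[name] = (value, expires)
        return True


def try_claim(client, task_id: str, worker_id: str, ttl_seconds: int = 60) -> bool:
    """Attempt to claim a delivery task; only one worker succeeds per TTL window."""
    return client.set(f"claim:{task_id}", worker_id, nx=True, ex=ttl_seconds) is True


if __name__ == "__main__":
    client = FakeRedis()
    print(try_claim(client, "task-1", "worker-a"))  # first claim succeeds
    print(try_claim(client, "task-1", "worker-b"))  # second claim is refused
```

A lock with a TTL narrows the window for duplicates but does not eliminate it: if a worker stalls past the TTL, another worker can claim the same task. Subscribers should therefore treat deliveries as at-least-once and deduplicate on the item GUID.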