What Is a Content Syndication Service?
A content syndication service lets content producers — publishers, data vendors, or platforms — distribute articles, media, or structured data to a network of subscribers. Subscribers may pull content via Atom/RSS feeds on a schedule, or receive it via webhook push the moment new content is published. A low-level design of this service covers feed generation, subscriber lifecycle management, push delivery with retries, and delivery tracking at scale.
Requirements
Functional Requirements
- Publishers create channels and push content items; the service generates Atom and RSS 2.0 feeds per channel.
- Subscribers register callback URLs (webhooks) or poll feed endpoints.
- New content triggers push notifications to all active webhook subscribers within 30 seconds.
- Track delivery status per subscriber per content item: pending, delivered, failed, retried.
- Support subscriber-level filtering by content category or custom tag.
Non-Functional Requirements
- Support 100,000 active channels and 10 million subscriber endpoints.
- Deliver push notifications to 95% of healthy subscribers within 60 seconds of publication.
- Feed XML generation p99 latency under 200 ms.
- Retry failed deliveries with exponential backoff up to 24 hours.
Data Model
The Channel table stores channel ID, publisher ID, title, description, base URL, feed format preferences, and signing secret for payload HMAC. The ContentItem table stores item ID, channel ID, title, body or media URL, GUID (used as Atom entry ID), publication timestamp, and tag array. The Subscription table stores subscription ID, channel ID, subscriber endpoint URL, filter expression (JSONB), status (active/paused/failed), last-delivery timestamp, and failure streak count. The DeliveryAttempt table is an append-only log of every push attempt with outcome, HTTP response code, latency, and retry count — partitioned by date for efficient archival.
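The four tables above can be sketched as Python dataclasses. Field names here are illustrative and chosen to match the description, not a prescribed schema:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class Channel:
    channel_id: str
    publisher_id: str
    title: str
    description: str
    base_url: str
    feed_formats: list[str]          # e.g. ["atom", "rss"]
    signing_secret: str              # used for payload HMAC

@dataclass
class ContentItem:
    item_id: str
    channel_id: str
    title: str
    body_or_media_url: str
    guid: str                        # doubles as the Atom entry ID
    published_at: datetime
    tags: list[str] = field(default_factory=list)

@dataclass
class Subscription:
    subscription_id: str
    channel_id: str
    endpoint_url: str
    filter_expr: dict                # the JSONB filter expression
    status: str = "active"           # active | paused | failed
    last_delivery_at: Optional[datetime] = None
    failure_streak: int = 0

@dataclass
class DeliveryAttempt:                # append-only; partitioned by date in the real store
    attempt_id: str
    subscription_id: str
    item_id: str
    outcome: str                     # delivered | failed | retried
    http_status: Optional[int]
    latency_ms: float
    retry_count: int
    attempted_at: datetime
```

In a relational deployment these map one-to-one onto tables, with (channel_id, published_at) and (subscription_id, attempted_at) indexes for the feed and delivery-history queries.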
Core Algorithms
Feed Generation
Feed XML is generated on demand from a materialized view of the 100 most recent ContentItems per channel, ordered by publication timestamp descending. On each new item insertion, an application-level post-commit hook refreshes the view, re-serializes the feed, and writes the XML blob to Redis (a database trigger alone cannot reach Redis). Requests for the feed endpoint check the Redis cache first (TTL 60 seconds), falling back to real-time generation from the database. Conditional GET headers (an ETag derived from the last-item timestamp, plus Last-Modified) reduce bandwidth for polling subscribers by returning 304 Not Modified when nothing has changed.
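The conditional GET path can be sketched as follows; `feed_etag` and `serve_feed` are hypothetical helper names, and the ETag derivation (hash of channel ID plus newest item timestamp) is one reasonable choice, not the only one:

```python
import hashlib
from typing import Callable, Optional, Tuple

def feed_etag(channel_id: str, last_item_ts: float) -> str:
    """Strong ETag derived from the channel and its newest item's timestamp."""
    raw = f"{channel_id}:{last_item_ts}".encode()
    return '"' + hashlib.sha256(raw).hexdigest()[:16] + '"'

def serve_feed(if_none_match: Optional[str], channel_id: str,
               last_item_ts: float,
               render: Callable[[], str]) -> Tuple[int, Optional[str], str]:
    """Return (status, body, etag); 304 with no body when the client is current."""
    etag = feed_etag(channel_id, last_item_ts)
    if if_none_match == etag:
        return 304, None, etag       # nothing changed since the client last polled
    return 200, render(), etag       # render() would hit the Redis cache, then the DB
```

A polling subscriber stores the ETag from each 200 response and echoes it in If-None-Match on the next poll; publishing a new item changes the last-item timestamp, so the ETag changes and the full feed is served again.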
Subscriber Fanout on Publish
When a publisher submits a new ContentItem, the API service writes it to the database and publishes a content.published event to a Kafka topic partitioned by channel ID. A fanout consumer reads the event, queries all active subscriptions for the channel that match the item tags, and enqueues one delivery task per matching subscription into a priority queue backed by Redis Streams. Delivery workers pull tasks, POST the serialized item payload to the subscriber endpoint with an HMAC-SHA256 signature in the X-Hub-Signature-256 header, and record the outcome in the DeliveryAttempt table.
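Two pieces of that pipeline fit in a few lines: the tag filter check and the payload signature. The filter shown here is a deliberately minimal stand-in for the JSONB filter expression (match on any shared tag); the signature format follows the sha256=hex convention used with the X-Hub-Signature-256 header:

```python
import hashlib
import hmac

def matches_filter(item_tags: list[str], filter_expr: dict) -> bool:
    """Minimal filter semantics: an empty filter matches everything,
    otherwise deliver when the item shares at least one wanted tag."""
    wanted = set(filter_expr.get("tags", []))
    return not wanted or bool(wanted & set(item_tags))

def sign_payload(secret: str, payload: bytes) -> str:
    """Value the worker puts in the X-Hub-Signature-256 request header."""
    digest = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    return f"sha256={digest}"

def verify_signature(secret: str, payload: bytes, header_value: str) -> bool:
    """Subscriber-side check; constant-time compare avoids timing leaks."""
    return hmac.compare_digest(sign_payload(secret, payload), header_value)
```

The secret is the per-channel signing secret from the Channel table, so a subscriber can reject any POST whose signature does not verify against the payload bytes it actually received.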
Retry and Backoff
On delivery failure (non-2xx response or connection timeout), the worker schedules a retry at delay = base * 2^attempt seconds (base = 30 seconds, max delay = 3600 seconds) using a delayed queue implemented with Redis sorted sets scored by next-attempt Unix timestamp. After 10 consecutive failures spanning 24 hours the subscription is automatically paused and the subscriber is notified by email. A separate reactivation endpoint allows subscribers to resume after fixing their endpoint.
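The backoff schedule and the sorted-set enqueue are small enough to sketch directly. `schedule_retry` takes a `zadd` callable standing in for redis-py's `zadd(key, {member: score})`, so the scheduling logic stays testable without a Redis instance:

```python
BASE_DELAY_S = 30          # first retry after 30 s
MAX_DELAY_S = 3600         # delays cap at one hour
MAX_FAILURE_STREAK = 10    # then the subscription is paused

def next_retry_delay(attempt: int) -> int:
    """delay = base * 2^attempt, capped at MAX_DELAY_S."""
    return min(BASE_DELAY_S * (2 ** attempt), MAX_DELAY_S)

def schedule_retry(zadd, task_id: str, attempt: int, now: float) -> float:
    """Score the task in the delayed queue by its next-attempt Unix timestamp.
    A poller later pops members whose score is <= the current time."""
    score = now + next_retry_delay(attempt)
    zadd("delivery:retries", {task_id: score})
    return score
```

With these constants the schedule runs 30 s, 60 s, 120 s, ... and hits the one-hour cap on the eighth attempt (30 * 2^7 = 3840 > 3600), after which retries repeat hourly until the 24-hour window or the failure streak limit ends them.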
API Design
Publishers use POST /v1/channels/{id}/items to publish content and GET /v1/channels/{id}/feed.atom or feed.rss for pull subscribers. Subscribers register via POST /v1/subscriptions with their channel ID, callback URL, and optional filter. A GET /v1/subscriptions/{id}/deliveries endpoint returns paginated delivery history. A POST /v1/subscriptions/{id}/test endpoint triggers a test payload to verify endpoint reachability before going live. All write endpoints require OAuth 2.0 bearer tokens; feed endpoints are public but rate-limited by IP.
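Server-side validation for POST /v1/subscriptions can be sketched as below; the specific rules (HTTPS-only callbacks, filter must be a JSON object) are reasonable assumptions rather than requirements stated above:

```python
from urllib.parse import urlparse

def validate_subscription_request(body: dict) -> list[str]:
    """Return a list of validation errors for POST /v1/subscriptions; empty means valid."""
    errors = []
    if not body.get("channel_id"):
        errors.append("channel_id is required")
    url = urlparse(body.get("callback_url", ""))
    if url.scheme != "https" or not url.netloc:
        errors.append("callback_url must be an absolute https URL")
    if "filter" in body and not isinstance(body["filter"], dict):
        errors.append("filter must be a JSON object")
    return errors
```

Rejecting non-HTTPS callbacks at registration time keeps signed payloads off plaintext transports, and validating early is cheaper than discovering a bad endpoint via the test-payload endpoint.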
Scalability and Infrastructure
The fanout layer is the primary scaling challenge. Channels with millions of subscribers (mega-publishers) use a two-tier fanout: the first tier writes bulk delivery batches to object storage (S3), and worker pools stream from those batches rather than querying the database per notification. This keeps the database out of the hot path for large fanouts. Delivery workers autoscale horizontally based on queue depth. Each worker is stateless and claims tasks with a distributed lock (Redis SET NX with TTL) to prevent duplicate delivery. The DeliveryAttempt table is partitioned monthly and old partitions are archived to cold storage after 90 days, keeping the hot table small and fast for recent-delivery queries.
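The SET NX claim semantics can be modeled with an in-memory stand-in; against real Redis this would be a single `set(key, worker_id, nx=True, ex=ttl)` call, but the dict version below keeps the expiry logic visible and testable:

```python
import time

class InMemoryTaskLock:
    """Stand-in for Redis `SET key value NX EX ttl` used to claim delivery tasks."""

    def __init__(self):
        self._store = {}  # task_id -> (worker_id, expires_at)

    def claim(self, task_id: str, worker_id: str, ttl_s: float, now: float = None) -> bool:
        """True if this worker claimed the task; False if an unexpired claim exists."""
        now = time.time() if now is None else now
        held = self._store.get(task_id)
        if held is not None and held[1] > now:
            return False                      # NX semantics: key exists, claim fails
        self._store[task_id] = (worker_id, now + ttl_s)
        return True
```

The TTL matters: if a worker crashes mid-delivery its claim expires and another worker retries the task, so deliveries are at-least-once and the subscriber-side deduplication by delivery ID remains necessary.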
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "How do you generate Atom and RSS feeds in a content syndication system?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "The system maintains a feed metadata table (title, description, last-build-date) and an items table keyed by content ID. A templating layer serializes the latest N items into Atom 1.0 or RSS 2.0 XML on demand, or caches the rendered XML in an object store and invalidates it on publish events. ETags and Last-Modified headers enable conditional GET, reducing bandwidth for subscribers that poll frequently."
      }
    },
    {
      "@type": "Question",
      "name": "How is subscriber management handled in a content syndication platform?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Subscribers are stored in a relational table with columns for endpoint URL, preferred format (RSS/Atom/JSON Feed), subscription status, and last-successful-delivery timestamp. A subscription lifecycle state machine handles confirmation (double opt-in for WebSub), pausing on repeated failures, and automatic unsubscription after a configurable number of consecutive 4xx responses from the subscriber's endpoint."
      }
    },
    {
      "@type": "Question",
      "name": "How does webhook push delivery work in a content syndication system?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "On new content publication, an event is enqueued into a durable message queue (e.g., Kafka or SQS). A delivery worker dequeues the event, looks up all active subscribers, and POSTs a signed payload (HMAC-SHA256 signature in the X-Hub-Signature-256 header) to each subscriber's registered callback URL. Failed deliveries are retried with exponential backoff up to a configured maximum, after which the subscriber is marked degraded."
      }
    },
    {
      "@type": "Question",
      "name": "How do you ensure idempotency in delivery tracking for content syndication?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Each delivery attempt is assigned a unique delivery ID that is stored in a delivery-log table with status (pending, success, failed). Before dispatching, the worker checks whether a successful delivery record already exists for (content_id, subscriber_id). Including the delivery ID in the webhook payload lets the receiving system deduplicate re-deliveries. Idempotency keys in the queue message prevent duplicate enqueue on producer retries."
      }
    }
  ]
}