Activity Feed Aggregator: Low-Level Design
An activity feed aggregator collects events from multiple services — post created, user followed, payment completed, comment added — and assembles them into a personalized, time-ordered stream for each user. The central design challenge is fan-out: when a user with 1 million followers posts, should you write to 1 million feeds immediately (fan-out on write) or compute the feed at read time (fan-out on read)? This design covers the hybrid approach used by large social platforms, feed storage in Redis sorted sets, and pagination.
Core Data Model
CREATE TABLE ActivityEvent (
event_id BIGSERIAL PRIMARY KEY,
actor_id BIGINT NOT NULL, -- user who performed the action
verb VARCHAR(50) NOT NULL, -- 'posted', 'followed', 'liked', 'commented'
object_type VARCHAR(50) NOT NULL, -- 'post', 'user', 'comment', 'payment'
object_id BIGINT NOT NULL,
target_id BIGINT, -- e.g. the post being commented on
metadata JSONB NOT NULL DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
) PARTITION BY RANGE (created_at);
-- Monthly partitions
CREATE TABLE ActivityEvent_2026_04 PARTITION OF ActivityEvent
FOR VALUES FROM ('2026-04-01') TO ('2026-05-01');
CREATE INDEX ON ActivityEvent(actor_id, created_at DESC);
CREATE INDEX ON ActivityEvent(object_type, object_id, created_at DESC);
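To make the schema concrete, here is a hypothetical event as it might flow through the system — a comment on a post. Field names mirror the ActivityEvent columns; the dict shape itself is an assumption about the internal wire format, not something mandated by the schema.

```python
import time

# Hypothetical wire format for one event, mirroring the ActivityEvent columns.
# created_at travels as a Unix timestamp and becomes TIMESTAMPTZ on insert.
event = {
    "actor_id": 42,                      # user who performed the action
    "verb": "commented",
    "object_type": "comment",
    "object_id": 9001,                   # the new comment
    "target_id": 777,                    # the post being commented on
    "metadata": {"preview": "Nice post!"},
    "created_at": time.time(),
}
```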
Fan-Out Strategy: Hybrid Push/Pull
import redis, json

r = redis.Redis(decode_responses=True)
# `db` (used below) is assumed to be a module-level database helper,
# e.g. a thin psycopg2 wrapper exposing execute() and fetchall().

CELEBRITY_THRESHOLD = 10_000   # followers; above this, use pull instead of push
FEED_MAX_SIZE = 1_000          # max events per user feed in Redis
FEED_TTL_SECONDS = 86400 * 7   # 7-day feed retention
def publish_event(event: dict):
    """
    Called by any service when a notable action occurs.
    Decides the fan-out strategy based on the actor's follower count.
    """
    actor_id = event['actor_id']
    follower_count = _get_follower_count(actor_id)
    if follower_count <= CELEBRITY_THRESHOLD:
        # Push to all followers' feeds immediately (fan-out on write)
        _fan_out_to_followers(event, actor_id)
    else:
        # Celebrity: only write to own timeline (fan-out on read)
        # Followers' feeds will merge this at read time
        _write_to_actor_timeline(event, actor_id)
    # Always write to the durable event store
    _persist_event(event)
def _fan_out_to_followers(event: dict, actor_id: int):
    """Push the event onto every follower's feed sorted set."""
    follower_ids = db.fetchall(
        "SELECT follower_id FROM Follow WHERE followee_id=%s", (actor_id,)
    )
    event_json = json.dumps(event)
    score = event['created_at']  # Unix timestamp as sorted set score
    pipe = r.pipeline(transaction=False)
    for row in follower_ids:
        feed_key = f"feed:{row['follower_id']}"
        pipe.zadd(feed_key, {event_json: score})
        pipe.zremrangebyrank(feed_key, 0, -(FEED_MAX_SIZE + 1))  # trim to max size
        pipe.expire(feed_key, FEED_TTL_SECONDS)
    pipe.execute()
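For actors near the threshold, a single pipeline of 10,000 commands is a large network payload. A common refinement (a sketch, not part of the code above) is to execute the pipeline in fixed-size chunks. The `chunked` helper below is pure Python; the commented usage shows where it would slot into `_fan_out_to_followers`.

```python
def chunked(items: list, size: int):
    """Yield successive fixed-size slices of a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Hypothetical usage inside _fan_out_to_followers:
#   for batch in chunked(follower_ids, 1000):
#       pipe = r.pipeline(transaction=False)
#       for row in batch:
#           ...same zadd / zremrangebyrank / expire calls as above...
#       pipe.execute()   # one bounded round-trip per 1,000 followers

batches = list(chunked(list(range(2500)), 1000))  # 2 full batches + 1 partial
```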
def _write_to_actor_timeline(event: dict, actor_id: int):
    """For celebrity accounts, write only to the actor's own sorted set."""
    tl_key = f"timeline:{actor_id}"
    r.zadd(tl_key, {json.dumps(event): event['created_at']})
    r.zremrangebyrank(tl_key, 0, -(FEED_MAX_SIZE + 1))
    r.expire(tl_key, FEED_TTL_SECONDS)
def _persist_event(event: dict):
    db.execute("""
        INSERT INTO ActivityEvent
            (actor_id, verb, object_type, object_id, target_id, metadata, created_at)
        VALUES (%s, %s, %s, %s, %s, %s, to_timestamp(%s))
    """, (event['actor_id'], event['verb'], event['object_type'],
          event['object_id'], event.get('target_id'),
          json.dumps(event.get('metadata', {})), event['created_at']))
Feed Read: Merge Push Feed with Celebrity Timelines
def get_feed(user_id: int, limit: int = 20, before_score: float = None) -> list:
    """
    Retrieve a paginated feed for a user.
    Merges: (1) pre-computed push feed + (2) celebrity timelines pulled on read.
    """
    # Exclusive upper bound ("(score" syntax) so the cursor item itself is not
    # returned again on the next page. `before_score` may legitimately be 0,
    # so test for None rather than truthiness.
    max_score = f"({before_score}" if before_score is not None else '+inf'
    min_score = '-inf'
    # 1. Fetch from the user's push feed
    feed_key = f"feed:{user_id}"
    push_items = r.zrevrangebyscore(
        feed_key, max_score, min_score,
        start=0, num=limit * 2,  # fetch extra to allow merging
        withscores=True
    )
    # 2. Fetch from celebrity timelines followed by this user
    # (In production, a denormalized follower_count column on the user table
    # would replace the COUNT subquery.)
    celebrity_ids = db.fetchall("""
        SELECT followee_id FROM Follow f
        WHERE f.follower_id = %s
          AND (SELECT COUNT(*) FROM Follow WHERE followee_id = f.followee_id) > %s
    """, (user_id, CELEBRITY_THRESHOLD))
    celebrity_items = []
    for row in celebrity_ids[:20]:  # cap celebrity timelines merged per request
        tl_key = f"timeline:{row['followee_id']}"
        items = r.zrevrangebyscore(tl_key, max_score, min_score,
                                   start=0, num=limit, withscores=True)
        celebrity_items.extend(items)
    # 3. Merge all items, sort by score (timestamp) descending, deduplicate
    all_items = push_items + celebrity_items
    all_items.sort(key=lambda x: x[1], reverse=True)
    seen_ids = set()
    results = []
    for event_json, score in all_items:
        event = json.loads(event_json)
        event_key = f"{event['actor_id']}:{event['verb']}:{event['object_id']}"
        if event_key not in seen_ids:
            seen_ids.add(event_key)
            event['_score'] = score
            results.append(event)
            if len(results) >= limit:
                break
    return results
def get_feed_cursor(results: list) -> float:
    """Return the score of the last item for next-page pagination."""
    return results[-1]['_score'] if results else None
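The merge-and-dedup step of get_feed is easiest to reason about in isolation. The sketch below applies the same logic to two in-memory lists of (event_json, score) pairs — the shape zrevrangebyscore returns with withscores=True; the sample data is illustrative.

```python
import json

def merge_feed_items(push_items, celebrity_items, limit):
    """Sort (event_json, score) pairs newest-first and dedupe, as in step 3 of get_feed."""
    all_items = push_items + celebrity_items
    all_items.sort(key=lambda x: x[1], reverse=True)
    seen, results = set(), []
    for event_json, score in all_items:
        event = json.loads(event_json)
        key = f"{event['actor_id']}:{event['verb']}:{event['object_id']}"
        if key not in seen:
            seen.add(key)
            results.append(event)
            if len(results) >= limit:
                break
    return results

push = [(json.dumps({"actor_id": 1, "verb": "posted", "object_id": 10}), 100.0)]
celeb = [
    (json.dumps({"actor_id": 2, "verb": "posted", "object_id": 20}), 200.0),
    (json.dumps({"actor_id": 1, "verb": "posted", "object_id": 10}), 100.0),  # duplicate
]
merged = merge_feed_items(push, celeb, limit=10)  # 2 items, newest first
```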
Key Design Decisions
- Hybrid fan-out sidesteps the celebrity problem: pure fan-out-on-write for a user with 50M followers means 50M Redis writes per post — minutes of work even with aggressive pipelining. Pure fan-out-on-read means merging 1,000 followed users’ timelines on every feed load — 1,000 Redis reads. The hybrid uses fan-out-on-write for normal users (<10K followers) and fan-out-on-read for celebrities. The threshold (10K) balances write amplification against read-time merge cost.
- Redis sorted set with timestamp score: ZREVRANGEBYSCORE provides cursor-based pagination over time — pass the last seen score as before_score for the next page. This is O(log N + M) where M is items returned, regardless of total feed size. Sorted sets are more efficient than lists for time-ordered feeds because they support range queries by score.
- Feed TTL of 7 days: inactive users’ feeds expire and are rebuilt from the database (ActivityEvent table) on their next login. The cold-load rebuild queries the actor_id index on ActivityEvent for the user’s followed accounts and re-populates the Redis feed. This keeps Redis memory usage bounded.
- Monthly partitions on ActivityEvent: the durable store accumulates events indefinitely. Monthly partitions allow old data to be detached and archived to cold storage without impacting the current month’s partition, which is the hot read path for cold-feed rebuilds.
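The cold-load rebuild mentioned in the TTL design decision can be sketched as: query recent events from followed accounts, then repopulate the sorted set with one ZADD. The SQL and the wiring comments are assumptions consistent with the schema above; the row-to-member conversion is pure Python and shown with sample rows.

```python
import json

# Hypothetical rebuild query (7-day window, matching FEED_TTL_SECONDS):
#   SELECT actor_id, verb, object_type, object_id, target_id, metadata,
#          EXTRACT(EPOCH FROM created_at) AS created_at
#   FROM ActivityEvent
#   WHERE actor_id IN (SELECT followee_id FROM Follow WHERE follower_id = %s)
#     AND created_at > NOW() - INTERVAL '7 days'
#   ORDER BY created_at DESC LIMIT 1000;   -- FEED_MAX_SIZE

def rows_to_feed_mapping(rows):
    """Convert DB rows into the {member_json: score} mapping that ZADD expects."""
    return {json.dumps(row): row["created_at"] for row in rows}

# Then, assuming the same key scheme as above:
#   r.zadd(f"feed:{user_id}", rows_to_feed_mapping(rows))
#   r.expire(f"feed:{user_id}", FEED_TTL_SECONDS)

rows = [
    {"actor_id": 7, "verb": "posted", "object_id": 1, "created_at": 1700000000.0},
    {"actor_id": 8, "verb": "liked", "object_id": 2, "created_at": 1700000100.0},
]
mapping = rows_to_feed_mapping(rows)
```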
Frequently Asked Questions

How do you decide the celebrity threshold for switching from fan-out-on-write to fan-out-on-read?
The threshold balances write amplification against read-time merge cost. Fan-out-on-write: cost = follower_count × (Redis ZADD cost) per post. At 10,000 followers, one post writes to 10,000 feeds — taking roughly 500ms with pipelining. At 1M followers, it takes about 50 seconds — unacceptable. Fan-out-on-read: cost = N celebrity timelines merged at read time. If a user follows 20 celebrities, their feed load merges 20 sorted sets — adding ~20ms of Redis reads. Below ~50 celebrities followed, this is fast. The break-even: when fan-out-on-write takes longer than acceptable (e.g., >1 second), switch to pull. Common thresholds: Twitter reportedly uses ~100K–500K followers; Instagram reportedly ~1M. For a smaller platform, 10K is a safe starting point. Tune based on profiling: measure actual fan-out latency at P99 and set the threshold where it consistently exceeds your write SLA.

How do you handle a user who follows 5,000 accounts in the activity feed?
A user following 5,000 accounts receives a feed entry for every event from those accounts — their feed fills up and older entries are evicted quickly. More importantly, at read time they may need to merge many timelines (if many followed accounts are celebrities). Cap the merge to the top N most recently active followed accounts (e.g., 500 by last-posted timestamp) and paginate from those. For the rare power follower, pre-compute their feed via a daily batch job that re-populates their Redis feed key from the database, rather than relying on real-time fan-out. Another approach: for users following more than 1,000 accounts, switch them to fan-out-on-read entirely (no push feed) — at that scale, the merged read is more efficient than maintaining a pre-computed feed that would require 1,000+ writes per popular event.

How do you implement "mute" and "unfollow" in the activity feed without rewriting the entire feed?
Mute and unfollow should take effect immediately — the user should not see new content from the muted account. Two approaches: (1) Filter at read time: maintain a muted-accounts set per user in Redis (a SET). When assembling the feed, filter out events whose actor_id is in the muted set — an O(1) membership check per event. Existing pushed events are never removed from the feed key; they are simply skipped during reads. (2) Retroactive cleanup: when a user unfollows, delete the unfollowed account's events from the user's push feed. This requires scanning the feed sorted set for events where actor_id = unfollowed_id, which is O(N) and inefficient for large feeds. Approach 1 (filter at read) is strongly preferred — it doesn't require modifying the stored feed and handles both mute and unfollow identically.

How do you rank feed items by relevance rather than pure recency?
Pure chronological ordering (newest first) can bury high-engagement content. A relevance score incorporates: (1) recency (time decay): score = base_score / (1 + age_hours^1.8) — a Facebook-style decay where items lose relevance rapidly with age; (2) engagement signals: items with many likes, comments, or reshares get a base_score boost: base_score = 1 + 0.5 * log(1 + like_count) + 1.0 * log(1 + comment_count); (3) relationship strength: content from users you interact with frequently gets a multiplier (fetched from an interaction-frequency score kept in Redis); (4) content type: video posts may be boosted over text-only posts if video engagement is higher. Implementation: store the computed relevance score (not the raw timestamp) as the sorted-set score, and recompute scores for items posted in the last 24 hours every 15 minutes via a background job.

How do you build a "seen" state so users don't see the same feed items on every load?
Without a "seen" cursor, paginating with offset=0&limit=20 always returns the same newest 20 items — if the user reads 20 items and refreshes, they see the same 20 again. Cursor-based pagination: return the score (timestamp or relevance score) of the last item as a cursor; the next page is zrevrangebyscore(feed_key, cursor - epsilon, '-inf', num=20), which skips already-seen items. Persistent "last read" position: store the user's last_read_score in Redis (HSET user_feed_cursors {user_id} {score}). On each feed load, items newer than last_read_score get an "unread" badge; update last_read_score to the newest item's score when the user opens the feed. This enables unread-count badges (ZCOUNT feed_key last_read_score +inf) and a "jump to new" feature that returns the user to their last position.
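The relevance formula described in the ranking answer above can be written out directly. The weights (0.5 and 1.0) and the 1.8 decay exponent are the illustrative values from the text, not tuned constants.

```python
import math

def relevance_score(age_hours: float, like_count: int, comment_count: int) -> float:
    """Engagement-boosted base score decayed by age, per the formula above."""
    base_score = 1 + 0.5 * math.log(1 + like_count) + 1.0 * math.log(1 + comment_count)
    return base_score / (1 + age_hours ** 1.8)

# A fresh post outranks the same post two days later:
fresh = relevance_score(age_hours=1, like_count=100, comment_count=10)
stale = relevance_score(age_hours=48, like_count=100, comment_count=10)
```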