Activity Feed Aggregator Low-Level Design: Fan-Out Strategy, Celebrity Problem, Redis Sorted Sets, and Pagination

Activity Feed Aggregator: Low-Level Design

An activity feed aggregator collects events from multiple services — post created, user followed, payment completed, comment added — and assembles them into a personalized, time-ordered stream for each user. The central design challenge is fan-out: when a user with 1 million followers posts, should you write to 1 million feeds immediately (fan-out on write) or compute the feed at read time (fan-out on read)? This design covers the hybrid approach used by large social platforms, feed storage in Redis sorted sets, and pagination.

Core Data Model

CREATE TABLE ActivityEvent (
    event_id       BIGSERIAL PRIMARY KEY,
    actor_id       BIGINT NOT NULL,        -- user who performed the action
    verb           VARCHAR(50) NOT NULL,   -- 'posted', 'followed', 'liked', 'commented'
    object_type    VARCHAR(50) NOT NULL,   -- 'post', 'user', 'comment', 'payment'
    object_id      BIGINT NOT NULL,
    target_id      BIGINT,                 -- e.g. the post being commented on
    metadata       JSONB NOT NULL DEFAULT '{}',
    created_at     TIMESTAMPTZ NOT NULL DEFAULT NOW()
) PARTITION BY RANGE (created_at);

-- Monthly partitions
CREATE TABLE ActivityEvent_2026_04 PARTITION OF ActivityEvent
    FOR VALUES FROM ('2026-04-01') TO ('2026-05-01');

CREATE INDEX ON ActivityEvent(actor_id, created_at DESC);
CREATE INDEX ON ActivityEvent(object_type, object_id, created_at DESC);

Fan-Out Strategy: Hybrid Push/Pull

import redis, json, time

r = redis.Redis(decode_responses=True)

CELEBRITY_THRESHOLD = 10_000   # followers; above this, use pull instead of push
FEED_MAX_SIZE       = 1_000    # max events per user feed in Redis
FEED_TTL_SECONDS    = 86400 * 7  # 7-day feed retention

def publish_event(event: dict):
    """
    Called by any service when a notable action occurs.
    Decides fan-out strategy based on actor's follower count.
    """
    actor_id = event['actor_id']
    follower_count = _get_follower_count(actor_id)

    if follower_count <= CELEBRITY_THRESHOLD:
        # Push to all followers' feeds immediately (fan-out on write)
        _fan_out_to_followers(event, actor_id)
    else:
        # Celebrity: only write to own timeline (fan-out on read)
        # Followers' feeds will merge this at read time
        _write_to_actor_timeline(event, actor_id)

    # Always write to the durable event store
    _persist_event(event)

def _fan_out_to_followers(event: dict, actor_id: int):
    """Push event to every follower's feed sorted set."""
    follower_ids = db.fetchall(
        "SELECT follower_id FROM Follow WHERE followee_id=%s", (actor_id,)
    )
    event_json = json.dumps(event)
    score = event['created_at']  # Unix timestamp as sorted set score

    pipe = r.pipeline(transaction=False)
    for row in follower_ids:
        feed_key = f"feed:{row['follower_id']}"
        pipe.zadd(feed_key, {event_json: score})
        pipe.zremrangebyrank(feed_key, 0, -(FEED_MAX_SIZE + 1))  # trim to max size
        pipe.expire(feed_key, FEED_TTL_SECONDS)
    pipe.execute()

def _write_to_actor_timeline(event: dict, actor_id: int):
    """For celebrity accounts, write only to the actor's own sorted set."""
    tl_key = f"timeline:{actor_id}"
    r.zadd(tl_key, {json.dumps(event): event['created_at']})
    r.zremrangebyrank(tl_key, 0, -(FEED_MAX_SIZE + 1))
    r.expire(tl_key, FEED_TTL_SECONDS)

def _persist_event(event: dict):
    db.execute("""
        INSERT INTO ActivityEvent
            (actor_id, verb, object_type, object_id, target_id, metadata, created_at)
        VALUES (%s,%s,%s,%s,%s,%s, to_timestamp(%s))
    """, (event['actor_id'], event['verb'], event['object_type'],
          event['object_id'], event.get('target_id'),
          json.dumps(event.get('metadata', {})), event['created_at']))

Feed Read: Merge Push Feed with Celebrity Timelines

def get_feed(user_id: int, limit: int = 20, before_score: float = None) -> list:
    """
    Retrieve paginated feed for a user.
    Merges: (1) pre-computed push feed + (2) celebrity timelines pulled on read.
    """
    max_score = before_score or '+inf'
    min_score = '-inf'

    # 1. Fetch from user's push feed
    feed_key = f"feed:{user_id}"
    push_items = r.zrevrangebyscore(
        feed_key, max_score, min_score,
        start=0, num=limit * 2,  # fetch extra to allow merging
        withscores=True
    )

    # 2. Fetch from celebrity timelines followed by this user
    celebrity_ids = db.fetchall("""
        SELECT followee_id FROM Follow f
        JOIN User u ON u.user_id=f.followee_id
        WHERE f.follower_id=%s
          AND (SELECT COUNT(*) FROM Follow WHERE followee_id=f.followee_id) > %s
    """, (user_id, CELEBRITY_THRESHOLD))

    celebrity_items = []
    for row in celebrity_ids[:20]:  # cap celebrity timelines merged per request
        tl_key = f"timeline:{row['followee_id']}"
        items = r.zrevrangebyscore(tl_key, max_score, min_score,
                                   start=0, num=limit, withscores=True)
        celebrity_items.extend(items)

    # 3. Merge all items, sort by score (timestamp) descending, deduplicate
    all_items = push_items + celebrity_items
    all_items.sort(key=lambda x: x[1], reverse=True)

    seen_ids = set()
    results = []
    for event_json, score in all_items:
        event = json.loads(event_json)
        event_key = f"{event['actor_id']}:{event['verb']}:{event['object_id']}"
        if event_key not in seen_ids:
            seen_ids.add(event_key)
            event['_score'] = score
            results.append(event)
        if len(results) >= limit:
            break

    return results

def get_feed_cursor(results: list) -> float:
    """Returns score of last item for next-page pagination."""
    return results[-1]['_score'] if results else None

Key Design Decisions

  • Hybrid fan-out eliminates the celebrity problem: pure fan-out-on-write for a user with 50M followers means 50M Redis writes per post — a 10-second operation. Pure fan-out-on-read means merging 1,000 followed users’ timelines on every feed load — 1,000 Redis reads. The hybrid uses fan-out-on-write for normal users (<10K followers) and fan-out-on-read for celebrities. The threshold (10K) balances write amplification vs. read-time merge cost.
  • Redis sorted set with timestamp score: ZREVRANGEBYSCORE provides cursor-based pagination over time — pass the last seen score as before_score for the next page. This is O(log N + M) where M is items returned, regardless of total feed size. Sorted sets are more efficient than lists for time-ordered feeds because they support range queries by score.
  • Feed TTL of 7 days: inactive users’ feeds expire and are rebuilt from the database (ActivityEvent table) on their next login. The cold-load rebuild queries the actor_id index on ActivityEvent for the user’s followed accounts and re-populates the Redis feed. This keeps Redis memory usage bounded.
  • Monthly partitions on ActivityEvent: the durable store accumulates events indefinitely. Monthly partitions allow old data to be detached and archived to cold storage without impacting the current month’s partition, which is the hot read path for cold-feed rebuilds.

Activity feed aggregator and social timeline system design is discussed in Meta system design interview questions.

Activity feed aggregator and social media feed design is covered in Twitter system design interview preparation.

Activity feed aggregator and professional activity stream design is discussed in LinkedIn system design interview guide.

Scroll to Top