System Design: Feed Ranking and Personalization — Candidate Generation, Scoring, and Real-Time Updates (2025)

Feed Architecture Overview

A social feed shows users content from the people and pages they follow, ranked by predicted engagement. The pipeline has three stages:

Candidate generation: retrieve the pool of candidate posts (from follows, interest graphs, trending topics).
Ranking: score each candidate with an ML model that predicts user engagement (click, like, share, comment, dwell time).
Filtering and serving: apply business rules (diversity, safety, deduplication), truncate to the top N, and serve.

At Instagram/TikTok scale this means billions of posts, hundreds of millions of users, and a feed-load SLA under 200 ms. The key trade-off is between coverage (seeing all relevant content), latency, and freshness.
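The three stages can be wired together as one function. This is a minimal sketch, not a production design: build_feed and its callback parameters (generate, score, passes_rules) are hypothetical names introduced here for illustration.

```python
from typing import Callable, Hashable, Iterable, List


def build_feed(
    user_id: int,
    generate: Callable[[int], Iterable[Hashable]],
    score: Callable[[int, Hashable], float],
    passes_rules: Callable[[int, Hashable], bool],
    top_n: int = 20,
) -> List[Hashable]:
    """Run candidate generation, ranking, then filtering and truncation."""
    # Stage 1: candidate generation (follows, interests, trending, ...).
    candidates = list(generate(user_id))

    # Stage 2: ranking by predicted engagement, highest score first.
    ranked = sorted(candidates, key=lambda post: score(user_id, post), reverse=True)

    # Stage 3: apply business rules first, then truncate to the top N,
    # so that filtered-out posts do not consume feed slots.
    return [post for post in ranked if passes_rules(user_id, post)][:top_n]
```

Filtering before truncation matters: truncating first could return fewer than top_n posts even when enough eligible candidates exist.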

Fan-Out Strategies: Push vs. Pull

Fan-out on write (push model): when a user posts, push the post ID to the feed inbox of every follower. Reads are cheap: just read the precomputed inbox. Writes cost O(followers), which is expensive for celebrities (Cristiano Ronaldo has 600M followers).

Fan-out on read (pull model): on feed load, fetch posts from all accounts the user follows, then merge and rank. Reads cost O(accounts_followed * posts_per_account), which is expensive for users who follow thousands of accounts.

Hybrid (Instagram's approach): for regular users (< 10K followers), fan-out on write; for celebrities (> 10K followers), fan-out on read, injecting celebrity posts at feed serving time. This caps write amplification while keeping reads fast for most users. The celebrity threshold is a tunable parameter.
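The hybrid routing can be sketched in memory as follows. This is an illustration under stated assumptions: plain dicts stand in for Redis and the post store, HybridFanout and its methods are hypothetical names, and an author is treated as a celebrity once the follower count reaches the threshold.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 10_000  # tunable; 10K is the figure used in the text


class HybridFanout:
    """In-memory stand-in for hybrid push/pull feed delivery."""

    def __init__(self, threshold: int = CELEBRITY_THRESHOLD):
        self.threshold = threshold
        self.followers = defaultdict(set)         # author_id -> follower ids
        self.following = defaultdict(set)         # user_id -> followed author ids
        self.inbox = defaultdict(list)            # user_id -> [(ts, post_id)], push path
        self.posts_by_author = defaultdict(list)  # author_id -> [(ts, post_id)]

    def follow(self, user_id, author_id):
        self.followers[author_id].add(user_id)
        self.following[user_id].add(author_id)

    def is_celebrity(self, author_id) -> bool:
        return len(self.followers[author_id]) >= self.threshold

    def publish(self, author_id, post_id, ts):
        # Every post is stored once, regardless of delivery path.
        self.posts_by_author[author_id].append((ts, post_id))
        # Push path: O(followers) writes, skipped for celebrities.
        if not self.is_celebrity(author_id):
            for follower_id in self.followers[author_id]:
                self.inbox[follower_id].append((ts, post_id))

    def load_feed(self, user_id, limit: int = 50):
        # Start from the precomputed inbox. A set removes duplicates that
        # arise when an author crossed the threshold after earlier pushes.
        merged = set(self.inbox[user_id])
        # Pull path: fetch celebrity posts at read time.
        for author_id in self.following[user_id]:
            if self.is_celebrity(author_id):
                merged.update(self.posts_by_author[author_id])
        newest_first = sorted(merged, reverse=True)
        return [post_id for _, post_id in newest_first[:limit]]
```

Publishing a celebrity post costs a single write here, while each follower pays a small merge cost at read time. The sketch orders by timestamp only; in the full pipeline the ranking stage would reorder the merged list.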

Candidate Generation

class CandidateGenerator:
    def get_candidates(self, user_id: int, limit: int = 500) -> list[Post]:
        candidates = []

        # 1. Follow graph: posts from accounts the user follows
        follow_posts = self.follow_feed_store.get_recent(
            user_id, limit=200
        )  # Redis sorted set: ZREVRANGE feed:{user_id} 0 199

        # 2. Interest graph: posts from topics the user engages with
        interest_posts = self.interest_store.get_recommended(
            user_id, limit=150
        )  # Embedding similarity in vector DB

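        # 3. Trending: globally trending content for the user's region
        # (assumes a user_store that returns a profile with a region field)
        user = self.user_store.get(user_id)
        trending = self.trending_store.get_trending(
            region=user.region, limit=50
        )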

        # 4. Re-engagement: posts the user liked/shared that have new replies
        reengagement = self.reengagement_store.get(user_id, limit=50)

        # Deduplicate by post_id
        seen = set()
        for post in follow_posts + interest_posts + trending + reengagement:
            if post.id not in seen:
                candidates.append(post)
                seen.add(post.id)

        return candidates[:limit]

Ranking Model

Features for the ranking model:

Post features: content type (video/photo/text), post age, creator follower count, post engagement velocity (likes per hour), content embeddings.
User features: historical engagement with this creator, content type preference, session context (time of day, device), recent interaction history.
Cross features: user-creator affinity score, user-content-type affinity.

Model: a two-tower neural network (user tower + post tower → dot product for the relevance score), or gradient-boosted trees that predict several objectives separately (P(click), P(like), P(share), P(dwell > 10s)) and combine them into a single score. Predicting multiple objectives keeps the ranker from optimizing only for clicks (clickbait); weight the combination according to business value.
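One simple way to combine the per-objective predictions is a weighted sum. The sketch below assumes the four probabilities have already been produced by the models; the weight values are hypothetical placeholders, not recommended settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EngagementPredictions:
    p_click: float
    p_like: float
    p_share: float
    p_dwell: float  # P(dwell > 10s)


# Hypothetical business weights: deeper engagement counts more than a click.
DEFAULT_WEIGHTS = {"p_click": 1.0, "p_like": 2.0, "p_share": 4.0, "p_dwell": 3.0}


def combined_score(pred: EngagementPredictions, weights=DEFAULT_WEIGHTS) -> float:
    """Collapse the per-objective predictions into one ranking score."""
    return sum(weight * getattr(pred, name) for name, weight in weights.items())
```

With these weights, a post that is very likely to be clicked but unlikely to be liked, shared, or read scores lower than a post with moderate click probability and strong downstream engagement, which is the anti-clickbait effect described above.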

Real-Time Feed Updates and Freshness

New posts should appear in feeds within seconds of publishing.

Real-time injection: when a post is created, publish an event to a Kafka topic (new_posts). Feed service consumers process new_posts events and inject the post ID into each follower's Redis sorted set (fan-out on write). For authors served by the pull model (celebrities), injection is skipped; the post is picked up on each follower's next feed load.

Freshness signals: recently posted content gets a "freshness boost" in the ranking score that decays over time (exponential decay: score * exp(-lambda * age_hours)). This ensures new posts from followed accounts surface even against older high-engagement posts.

Seen-post filtering: track posts already shown to the user in a Bloom filter (per user, per session). Filter candidates against the Bloom filter before ranking to avoid showing the same post twice in consecutive feed loads.
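The decay formula and the seen-post filter can be sketched as follows. Both pieces are illustrative: the decay rate of 0.1 per hour is an example value, and SeenPostFilter is a hypothetical, minimal Bloom filter built on hashlib rather than a production library.

```python
import hashlib
import math


def freshness_adjusted(score: float, age_hours: float, decay_rate: float = 0.1) -> float:
    """Apply exponential decay: score * exp(-decay_rate * age_hours)."""
    return score * math.exp(-decay_rate * age_hours)


class SeenPostFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 8192, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, post_id):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{post_id}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, post_id) -> None:
        for pos in self._positions(post_id):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, post_id) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(post_id))


def unseen(candidate_ids, seen: SeenPostFilter):
    """Drop candidates already shown; run this before ranking."""
    return [post_id for post_id in candidate_ids if post_id not in seen]
```

A false positive only means an unseen post is occasionally skipped, which is usually an acceptable price for the memory savings over storing every seen post ID.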


Frequently Asked Questions

What is the difference between fan-out on write and fan-out on read for social feeds?

Fan-out on write (push): when a user posts, immediately write the post ID to each follower's feed inbox (a Redis sorted set). Reads are fast (just read your inbox), but writes are expensive for accounts with many followers (fan-out to 1M followers = 1M Redis writes). Fan-out on read (pull): when a user loads their feed, query the recent posts from all accounts they follow, merge, and rank. Reads are expensive (query N followed accounts, merge up to N*k posts), but writes are cheap (just store the post once). Hybrid: use push for regular users (< 10K followers) and pull for celebrities (> 10K followers). At feed load time, inject celebrity posts via pull and merge them with the precomputed push feed. This balances write amplification against read latency.

How does a two-tower neural network rank feed posts?

A user tower encodes user features (demographics, historical engagement, session context) into a user embedding vector. A post tower encodes post features (content type, creator, engagement signals, content embeddings) into a post embedding vector. Relevance score = dot product (or cosine similarity) of the two embeddings. Training: positive samples are (user, post) pairs the user engaged with; negative samples are posts the user did not engage with. Loss: contrastive loss or binary cross-entropy. Serving: precompute post embeddings offline and index them for approximate nearest neighbor (ANN) search (e.g., Faiss or Pinecone). At ranking time, compute the user embedding (fast, < 5 ms), retrieve the top-K candidates via ANN search, then optionally re-rank the top-K with a heavier pointwise model for the final ordering.

How do you prevent filter bubbles while still personalizing a feed?

Filter bubbles occur when the ranking model only shows content similar to what the user has already engaged with, creating an echo chamber. Mitigation strategies: (1) Diversity injection: reserve 10-15% of feed slots for content outside the user's top interest clusters (exploration slots). (2) Topic diversity constraint: cap the number of posts from the same topic or creator (e.g., max 3 posts from creator X in the top 20). (3) Serendipity signals: include a "discovery" feature in the ranking model that rewards content the user's social network has engaged with but the user has not yet seen. (4) Multi-objective optimization: rank on a combination of predicted engagement + diversity score + serendipity score. Business trade-off: pure engagement optimization tends toward filter bubbles and extreme content; diversity constraints trade a small engagement drop for healthier long-term user behavior.

How do you handle real-time feed freshness for posts from followed accounts?

For the push model: new posts are fanned out to follower inboxes immediately via Kafka consumers. The Redis sorted set uses the post timestamp as the score, so new posts appear at the top on the next feed load. Freshness boost in ranking: new posts get a score multiplier that decays exponentially (score *= exp(-0.1 * age_hours)). This allows a new post from a followed account to surface above older high-engagement posts for the first few hours. Seen-post deduplication: a per-user Redis sorted set or Bloom filter tracks seen post IDs; feed requests check it and skip posts already shown. For the pull model: posts are fetched fresh on each load, but the ranking model must still apply freshness signals to avoid surfacing 3-day-old posts above 1-hour-old posts from the same creator.

How do you handle the cold start problem for new users and new content?

New user cold start: there is no engagement history, so personalization features are empty. Solutions: (1) Onboarding signals: ask new users to select interest topics and follow initial accounts, and use these explicit signals for the first feed loads. (2) Demographic-based priors: users similar in age, location, and signup context tend to have similar initial interests, so use a cluster-based default ranking for the user's demographic group. (3) Collaborative filtering bootstrap: find users with similar onboarding choices and use their early engagement patterns. New content cold start: a new post has no engagement data yet. Solutions: (1) Creator-quality prior: use the creator's historical engagement rate as a proxy for expected post quality. (2) Content-based ranking: use content embeddings (text/image/video analysis) to estimate relevance independently of engagement. (3) Early engagement signals: update post features rapidly as the first likes and views arrive through a streaming pipeline.
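The per-creator cap from the filter-bubble answer (for example, at most 3 posts from one creator in the top 20) can be sketched as a re-ranking pass over the scored list. The function name and the (post_id, creator_id) tuple layout are assumptions made for this illustration.

```python
from collections import Counter
from typing import Hashable, List, Tuple

RankedPost = Tuple[Hashable, Hashable]  # (post_id, creator_id)


def cap_per_creator(
    ranked: List[RankedPost], max_per_creator: int = 3, window: int = 20
) -> List[RankedPost]:
    """Demote posts that exceed the per-creator cap within the top window.

    `ranked` must already be sorted by descending score. Over-cap posts are
    moved behind the window rather than dropped, so the cap is best-effort
    when there are too few candidates from other creators.
    """
    counts: Counter = Counter()
    head: List[RankedPost] = []
    deferred: List[RankedPost] = []
    for post_id, creator_id in ranked:
        if len(head) < window and counts[creator_id] < max_per_creator:
            head.append((post_id, creator_id))
            counts[creator_id] += 1
        else:
            deferred.append((post_id, creator_id))
    # Both lists preserve the original score order internally.
    return head + deferred
```

Demoting instead of deleting keeps the feed full when a user follows only a few prolific creators.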

