Question 1

What is the difference between fan-out on write and fan-out on read for social feeds?

Accepted Answer

Fan-out on write (push): when a user posts, immediately write the post ID to each follower's feed inbox (a Redis sorted set). Reads are fast (just read your own inbox), but writes are expensive for accounts with many followers (fanning out to 1M followers means 1M Redis writes). Fan-out on read (pull): when a user loads their feed, fetch the recent posts from every account they follow, then merge and rank them. Reads are expensive (query N followed accounts for their k most recent posts each, then merge up to N*k posts), but writes are cheap (the post is stored once). Hybrid: use push for regular users and pull for celebrities, split at a follower threshold (e.g., 10K followers). At feed load time, fetch celebrity posts via pull and merge them with the precomputed push feed. This balances write amplification against read latency.
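To make the hybrid concrete, here is a minimal in-memory sketch in Python. Plain dicts stand in for the Redis sorted sets, and the class name `HybridFeedStore`, its methods, and the default threshold of 10,000 followers are hypothetical illustrations, not a real API. Non-celebrity posts are pushed to every follower's inbox at write time; celebrity posts are stored once and pulled at read time, and both sources are then merged by timestamp.

```python
import heapq
from collections import defaultdict


class HybridFeedStore:
    """Hypothetical in-memory stand-in for Redis-backed push/pull feed storage."""

    def __init__(self, celebrity_threshold: int = 10_000):
        self.celebrity_threshold = celebrity_threshold
        self.followers = defaultdict(set)      # author -> users who follow them
        self.following = defaultdict(set)      # user -> authors they follow
        self.inboxes = defaultdict(dict)       # user -> {post_id: timestamp}, the push feed
        self.author_posts = defaultdict(list)  # author -> [(timestamp, post_id)], appended in time order

    def follow(self, user: str, author: str) -> None:
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author: str) -> bool:
        return len(self.followers[author]) > self.celebrity_threshold

    def publish(self, author: str, post_id: str, ts: int) -> None:
        # Every post is stored exactly once under its author (the pull source).
        self.author_posts[author].append((ts, post_id))
        if not self.is_celebrity(author):
            # Fan-out on write: one inbox write per follower.
            for follower in self.followers[author]:
                self.inboxes[follower][post_id] = ts

    def read_feed(self, user: str, limit: int = 20) -> list:
        # Start from the precomputed push inbox.
        merged = dict(self.inboxes[user])
        # Fan-out on read: pull recent posts from followed celebrities.
        for author in self.following[user]:
            if self.is_celebrity(author):
                for ts, post_id in self.author_posts[author][-limit:]:
                    merged[post_id] = ts  # keyed by post ID, so no duplicates
        newest = heapq.nlargest(limit, merged.items(), key=lambda kv: kv[1])
        return [post_id for post_id, _ in newest]
```

For example, with `celebrity_threshold=2`, an account with three followers is treated as a celebrity: its post never touches any inbox, yet it still appears at the top of each follower's feed. Because the merge is keyed by post ID, an author who crosses the threshold after some posts were already pushed does not produce duplicates.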

Question 2

How does a two-tower neural network rank feed posts?

Accepted Answer

Two-tower architecture: a user tower encodes user features (demographics, historical engagement, session context) into a user embedding vector. A post tower encodes post features (content type, creator, engagement signals, content embeddings) into a post embedding vector in the same space. Relevance score = dot product (or cosine similarity) of the two embeddings. Training: positive samples are (user, post) pairs the user engaged with; negative samples are posts the user did not engage with. Loss: contrastive loss or binary cross-entropy. Serving: because the post tower does not depend on the user, post embeddings can be precomputed offline and indexed for approximate nearest neighbor (ANN) search, e.g., with the Faiss library or a managed vector database such as Pinecone. At request time: compute the user embedding (fast, < 5 ms), retrieve the top-K candidates via ANN search, then optionally re-rank those K candidates with a heavier pointwise model for the final ordering.
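The scoring and retrieval steps can be sketched in plain Python. This is a toy under stated assumptions: each "tower" is a single hypothetical linear layer followed by L2 normalization (real towers are trained MLPs), the weights are made up, and exact brute-force top-K by dot product stands in for the approximate search that Faiss or a vector database would perform at scale.

```python
import heapq
import math
from typing import Dict, List, Sequence

Vector = List[float]


def tower(features: Sequence[float], weights: Sequence[Sequence[float]]) -> Vector:
    """Toy tower: one linear layer, then L2-normalize so dot product = cosine similarity."""
    out = [sum(w * x for w, x in zip(row, features)) for row in weights]
    norm = math.sqrt(sum(v * v for v in out)) or 1.0
    return [v / norm for v in out]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def retrieve_top_k(user_emb: Vector, post_index: Dict[str, Vector], k: int) -> List[str]:
    """Exact top-K by dot product; an ANN index approximates this for millions of posts."""
    return heapq.nlargest(k, post_index, key=lambda pid: dot(user_emb, post_index[pid]))


def bce_loss(user_emb: Vector, post_emb: Vector, label: int) -> float:
    """Binary cross-entropy on sigmoid(dot product); label 1 = engaged, 0 = not engaged."""
    p = 1.0 / (1.0 + math.exp(-dot(user_emb, post_emb)))
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p + eps) + (1 - label) * math.log(1.0 - p + eps))


# Hypothetical weights: 3 user features -> 2-dim embedding, 2 post features -> 2-dim embedding.
USER_WEIGHTS = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
POST_WEIGHTS = [[1.0, 0.0], [0.0, 1.0]]

# Offline: precompute and index post embeddings.
POST_INDEX = {
    pid: tower(feats, POST_WEIGHTS)
    for pid, feats in {"sports": [1.0, 0.0], "cooking": [0.0, 1.0], "mixed": [1.0, 1.0]}.items()
}

# Online: embed the user, then retrieve candidates.
user_emb = tower([0.9, 0.1, 0.0], USER_WEIGHTS)
candidates = retrieve_top_k(user_emb, POST_INDEX, k=2)  # -> ["sports", "mixed"]
```

The split is what makes serving cheap: only `tower(...)` for the user runs per request, while every post embedding in `POST_INDEX` was computed ahead of time. The heavier pointwise re-ranker would then see only the K retrieved candidates.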

Question 3

How do you prevent filter bubbles while still personalizing a feed?

Accepted Answer

Filter bubbles occur when the ranking model only shows content similar to what the user has already engaged with, creating an echo chamber. Mitigation strategies: (1) Diversity injection: reserve 10-15% of feed slots for content outside the user's top interest clusters (exploration slots). (2) Topic/creator diversity constraints: limit runs of consecutive posts on the same topic, and cap how many posts any single creator gets within a window (e.g., at most 3 posts from creator X in the top 20). (3) Serendipity signals: include a "discovery" feature in the ranking model that rewards content the user's social network has engaged with but the user has not seen before. (4) Multi-objective optimization: rank on a weighted combination of predicted engagement, a diversity score, and a serendipity score. Business trade-off: pure engagement optimization tends toward filter bubbles and extreme content; diversity constraints accept a small drop in short-term engagement in exchange for healthier long-term user behavior.
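A greedy re-ranking pass is one simple way to enforce strategies (1) and (2) after the engagement model has scored candidates. In this sketch the `Candidate` type, the `diversify` function, and every default (a 20-slot feed, at most 3 posts per creator, no more than 2 consecutive posts on one topic, one exploration slot in every 8, i.e. 12.5%) are hypothetical choices picked to match the numbers above, and the exploration pool is assumed to be already filtered to content outside the user's top interest clusters.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    post_id: str
    creator: str
    topic: str
    score: float  # predicted engagement from the ranking model


def diversify(ranked: List[Candidate], explore: List[Candidate], size: int = 20,
              max_per_creator: int = 3, max_topic_run: int = 2,
              explore_every: int = 8) -> List[Candidate]:
    """Fill feed slots greedily in score order while enforcing diversity rules."""
    main = sorted(ranked, key=lambda c: c.score, reverse=True)
    pool = sorted(explore, key=lambda c: c.score, reverse=True)
    feed: List[Candidate] = []
    per_creator: Counter = Counter()

    def allowed(c: Candidate) -> bool:
        # Creator cap over the whole window (e.g., max 3 from one creator in the top 20).
        if per_creator[c.creator] >= max_per_creator:
            return False
        # Topic cap: reject if the last `max_topic_run` slots already share this topic.
        recent = feed[-max_topic_run:]
        return not (len(recent) == max_topic_run and all(x.topic == c.topic for x in recent))

    while len(feed) < size:
        # Every `explore_every`-th slot prefers the exploration pool.
        explore_slot = (len(feed) + 1) % explore_every == 0
        sources = (pool, main) if explore_slot else (main,)
        pick, source = None, None
        for candidates in sources:
            pick = next((c for c in candidates if allowed(c)), None)
            if pick is not None:
                source = candidates
                break
        if pick is None:
            break  # nothing left that satisfies the constraints
        source.remove(pick)
        feed.append(pick)
        per_creator[pick.creator] += 1
    return feed
```

Because the pass is greedy, the feed can come back shorter than `size` when the constraints exhaust the candidates; one option in that case is to widen the candidate set or relax a constraint rather than leave slots empty.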

Question 4

How do you handle real-time feed freshness for posts from followed accounts?

Accepted Answer

For the push model: new posts are fanned out to follower inboxes immediately via Kafka consumers. The Redis sorted set uses the post timestamp as the score, so reading it in descending score order returns the newest posts first on the next feed load. Freshness in ranking: each post's score is multiplied by a time-decay factor (score *= exp(-0.1 * age_hours)), which is 1.0 for a brand-new post and halves roughly every 7 hours (ln 2 / 0.1 ≈ 6.9 h). This lets a new post from a followed account surface above older high-engagement posts for the first few hours. Seen-post deduplication: a per-user Redis sorted set or Bloom filter tracks seen post IDs, and feed requests skip any post already recorded there. A Bloom filter saves memory at the cost of occasional false positives (an unseen post is wrongly skipped) but never produces false negatives. For the pull model: each pull reads the latest posts at request time, so the candidate set is always current, but the ranking model must still apply freshness signals to avoid surfacing 3-day-old posts above 1-hour-old posts from the same creator.
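Both mechanisms fit in a short Python sketch. The decay constant of 0.1 per hour comes from the text; the `BloomFilter` class, its sizing (8,192 bits, 5 hash functions), the SHA-256 double-hashing scheme, and the `rank_unseen` helper are illustrative assumptions. In a real deployment the per-user bit array could be stored in Redis rather than in process memory.

```python
import hashlib
import math
from typing import Iterable, List, Tuple

DECAY_RATE = 0.1  # per hour, as in the text; half-life = ln 2 / 0.1 ≈ 6.9 h


def freshness_adjusted(score: float, age_hours: float, rate: float = DECAY_RATE) -> float:
    """score * exp(-rate * age_hours): multiplier is 1.0 at age 0, halves every ln2/rate hours."""
    return score * math.exp(-rate * age_hours)


class BloomFilter:
    """Per-user seen-post set: may report false positives, never false negatives."""

    def __init__(self, num_bits: int = 8192, num_hashes: int = 5):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> List[int]:
        # Double hashing: derive k bit positions from two 64-bit slices of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def rank_unseen(candidates: Iterable[Tuple[str, float, float]], seen: BloomFilter) -> List[str]:
    """candidates are (post_id, model_score, age_hours); drop seen posts, sort by decayed score."""
    fresh = [(freshness_adjusted(s, age), pid) for pid, s, age in candidates if pid not in seen]
    return [pid for _, pid in sorted(fresh, reverse=True)]
```

As a worked example, a 1-hour-old post with model score 1.0 decays to about 0.90, while a 24-hour-old post with score 5.0 decays to 5.0 × e^(-2.4) ≈ 0.45, so the newer post ranks first even though its raw score is five times lower.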

Question 5

How do you handle the cold start problem for new users and new content?

Accepted Answer

New user cold start: there is no engagement history, so personalization features are empty and must fall back to defaults. Solutions: (1) Onboarding signals: ask new users to select interest topics and follow initial accounts; use these explicit signals for the first feed loads. (2) Demographic-based priors: users similar in age, location, and signup context tend to have similar initial interests, so use a cluster-based default ranking for the user's demographic group. (3) Collaborative filtering bootstrap: find users with similar onboarding choices and use their early engagement patterns. New content cold start: a new post has no engagement data yet. Solutions: (1) Creator-quality prior: use the creator's historical engagement rate as a proxy for expected post quality. (2) Content-based ranking: use content embeddings (text/image/video analysis) to estimate relevance independently of engagement. (3) Early engagement signals: update post features rapidly as the first likes and views arrive, via a real-time streaming pipeline.
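One common way to combine the creator-quality prior with early engagement signals is Beta-prior (additive) smoothing: treat the creator's historical engagement rate as if it had already been observed over a fixed number of pseudo-impressions, then let real impressions outweigh it as they stream in. This is a swapped-in illustration rather than a method stated above; `PostStats`, `record_event`, `smoothed_engagement_rate`, and the default prior weight of 50 pseudo-impressions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PostStats:
    impressions: int = 0
    engagements: int = 0


def record_event(stats: PostStats, engaged: bool) -> None:
    """Streaming update, called once per impression as events arrive."""
    stats.impressions += 1
    if engaged:
        stats.engagements += 1


def smoothed_engagement_rate(stats: PostStats, creator_rate: float,
                             prior_weight: float = 50.0) -> float:
    """Posterior mean under a Beta prior with mean `creator_rate` worth `prior_weight` impressions.

    With no data the estimate equals the creator prior; as real impressions
    accumulate, the observed engagement rate dominates.
    """
    alpha = creator_rate * prior_weight  # prior pseudo-engagements
    return (stats.engagements + alpha) / (stats.impressions + prior_weight)
```

With a 4% creator prior and 20 engagements in the first 100 impressions, the estimate is (20 + 2) / (100 + 50) ≈ 14.7%: well above the prior, but still below the raw 20% because 100 impressions are limited evidence. Raising `prior_weight` makes new posts lean on the creator's history for longer.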

System Design: Feed Ranking and Personalization — Candidate Generation, Scoring, and Real-Time Updates (2025)

Feed Architecture Overview

Fan-Out Strategies: Push vs. Pull

Candidate Generation

Ranking Model

Real-Time Feed Updates and Freshness