System Design: Design Reddit — Social News, Upvote/Downvote, Ranking, Subreddits, Comment Threading, Moderation

Reddit is the “front page of the internet” with 50+ million daily active users across 100,000+ active communities (subreddits). Designing Reddit tests your understanding of content ranking algorithms, nested comment systems, community moderation, and scaling a platform where both content creation and consumption are high-volume. This guide covers the core architectural components.

Post Ranking: Hot, Best, Top, New, Controversial

Reddit uses a different ranking algorithm for each view:

(1) Hot: balances recency with popularity. Reddit's original algorithm: score = sign(ups - downs) * log10(max(|ups - downs|, 1)) + (post_timestamp - epoch) / 45000, with timestamps in seconds. The sign multiplies the vote term, so net-negative posts sink. The logarithm dampens the effect of votes: going from 1 to 10 net upvotes adds as much to the score as going from 10 to 100. The timestamp term adds 1 point per 45,000 seconds (12.5 hours), which is worth as much as a tenfold increase in net votes. Newer posts therefore eventually overtake older ones regardless of votes, and a post needs exponentially more votes to stay on the front page as it ages.

(2) Best (Wilson score): used for comments. It estimates the “true” upvote ratio with a confidence interval and uses the lower bound of that interval as the ranking score. A comment with 1 upvote and 0 downvotes (100% positive) ranks lower than one with 100 upvotes and 10 downvotes (91% positive) because the larger sample gives more confidence in the ratio.

(3) Top: sort by net score (upvotes - downvotes) within a time period (today, this week, this month, all time).

(4) New: sort by timestamp descending.

(5) Controversial: surfaces posts with roughly equal upvotes and downvotes (high total votes but low net score).
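The Hot, Best, and Controversial rankings can be sketched in a few lines of Python. This is a minimal illustration, not Reddit's production code. The epoch constant is the one that appeared in Reddit's formerly open-sourced ranking code. The z value of 1.96 (a 95% interval) and the controversy formula (total votes raised to the power of the vote balance) are assumptions chosen for illustration.

```python
import math

# Reference timestamp (Unix seconds) from Reddit's formerly open-sourced code.
REDDIT_EPOCH = 1134028003


def hot_score(ups: int, downs: int, created_utc: float) -> float:
    """Hot rank: signed log of net votes plus an age bonus for newer posts."""
    net = ups - downs
    order = math.log10(max(abs(net), 1))
    sign = (net > 0) - (net < 0)  # +1, 0, or -1
    seconds = created_utc - REDDIT_EPOCH
    return round(sign * order + seconds / 45000, 7)


def wilson_lower_bound(ups: int, downs: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the upvote ratio.

    z = 1.96 is an assumed confidence level, not a value from the article.
    """
    n = ups + downs
    if n == 0:
        return 0.0
    p = ups / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt((p * (1 - p) + z * z / (4 * n)) / n)
    return (centre - margin) / (1 + z * z / n)


def controversy_score(ups: int, downs: int) -> float:
    """Assumed formula: many total votes with a near-even split scores highest."""
    if ups <= 0 or downs <= 0:
        return 0.0
    balance = min(ups, downs) / max(ups, downs)  # 1.0 means a perfect split
    return (ups + downs) ** balance
```

With these definitions, `wilson_lower_bound(1, 0)` is lower than `wilson_lower_bound(100, 10)`, matching the comment example above, and a post with 10 net upvotes scores the same as a post with 100 net upvotes that is 45,000 seconds older.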

Vote System

Each user can upvote, downvote, or unvote each post or comment.

Data model: vote: user_id, target_id (post or comment), target_type, direction (+1 or -1), created_at. A unique constraint on (user_id, target_id, target_type) enforces one vote per user per item.

Score tracking: each post/comment has a score field (upvotes minus downvotes). On vote: (1) Check whether the user already voted. If the direction changes: update the vote record and adjust the score by 2 in the new direction (removing the old vote and adding the new one). If the direction is the same: treat it as an unvote (delete the record and adjust the score by 1 in the opposite direction). If there is no existing vote: insert the record and adjust the score by 1 in the vote's direction. (2) Apply the score change atomically: UPDATE posts SET score = score + delta WHERE id = post_id.

Score caching: the score is read on every post display (millions of reads per second for hot posts), so cache it in Redis. On vote: update the database, then apply the same delta in Redis with INCRBY, which is atomic on the Redis side. The two stores cannot be updated in a single atomic step, so Redis serves the score for display while the database remains the source of truth, and the cached value is refreshed from the database if the two drift apart.

Vote fuzzing: Reddit intentionally fuzzes displayed vote counts (adding random noise) so that spammers and vote manipulators cannot tell whether their votes are being counted. The actual score is accurate; the displayed upvote/downvote counts are approximate.
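The three vote cases (new vote, unvote, direction change) reduce to computing one signed delta. The sketch below uses Python's built-in sqlite3 module as a stand-in for the production database. The table layout follows the data model above, while the function names `open_db` and `cast_vote` are hypothetical. It handles posts only and leaves out the Redis step.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the vote and post tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE posts (
            id    INTEGER PRIMARY KEY,
            score INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE votes (
            user_id     INTEGER NOT NULL,
            target_id   INTEGER NOT NULL,
            target_type TEXT    NOT NULL,
            direction   INTEGER NOT NULL CHECK (direction IN (-1, 1)),
            created_at  TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
            UNIQUE (user_id, target_id, target_type)
        );
        """
    )
    return conn


def cast_vote(conn: sqlite3.Connection, user_id: int, post_id: int, direction: int) -> int:
    """Apply one vote click on a post and return the score delta applied."""
    if direction not in (-1, 1):
        raise ValueError("direction must be +1 or -1")
    key = (user_id, post_id)
    with conn:  # commits on success, rolls back on error
        row = conn.execute(
            "SELECT direction FROM votes "
            "WHERE user_id = ? AND target_id = ? AND target_type = 'post'",
            key,
        ).fetchone()
        if row is None:  # new vote
            conn.execute(
                "INSERT INTO votes (user_id, target_id, target_type, direction) "
                "VALUES (?, ?, 'post', ?)",
                (*key, direction),
            )
            delta = direction
        elif row[0] == direction:  # same direction again: unvote
            conn.execute(
                "DELETE FROM votes "
                "WHERE user_id = ? AND target_id = ? AND target_type = 'post'",
                key,
            )
            delta = -direction
        else:  # direction change: remove the old vote and add the new one
            conn.execute(
                "UPDATE votes SET direction = ? "
                "WHERE user_id = ? AND target_id = ? AND target_type = 'post'",
                (direction, *key),
            )
            delta = 2 * direction
        conn.execute("UPDATE posts SET score = score + ? WHERE id = ?", (delta, post_id))
    return delta
```

The returned delta is the same value that would be passed to Redis INCRBY after the transaction commits, so the cache and the database move by the same amount.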

Comment Threading

Reddit comments form a tree: each comment can have replies, which can have replies, arbitrarily deep.

Data model: comment: comment_id, post_id, parent_comment_id (null for top-level), user_id, text, score, created_at, depth.

There are three common ways to store and query a comment tree:

(1) Adjacency list (parent_comment_id foreign key): the simplest storage. Loading the full tree requires a recursive query or multiple round-trips. PostgreSQL recursive CTE: WITH RECURSIVE thread AS (SELECT * FROM comments WHERE post_id = X AND parent_comment_id IS NULL UNION ALL SELECT c.* FROM comments c JOIN thread t ON c.parent_comment_id = t.comment_id) SELECT * FROM thread;

(2) Materialized path: store the full path from the root with each comment, for example /comment1/comment5/comment12. Query all descendants with LIKE '/comment1/%'. Reads are fast, but moving a comment means rewriting the path of every descendant.

(3) Pre-computed tree: for each post, pre-compute and cache the comment tree as a JSON structure. On a new comment, update the cache by inserting the comment at the correct position. Serve the cached tree on post load, and invalidate and rebuild it periodically.

Reddit uses approach (3) for hot posts: the comment tree is pre-built and cached. Loading a post with 10,000 comments reads one cached object rather than querying 10,000 rows.

Comment collapsing: deeply nested threads are collapsed behind a “load more replies” link to limit the initial payload size.
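To illustrate the pre-computed approach, the sketch below turns flat adjacency-list rows into a nested structure that can be serialized to JSON and cached. The function name `build_tree`, the `max_depth` cutoff, and the `more_replies` field are hypothetical. Siblings are sorted by raw score here for brevity, whereas the Best ranking would use the Wilson lower bound.

```python
import json
from typing import Any, Dict, List, Optional


def build_tree(rows: List[Dict[str, Any]], max_depth: int = 3) -> List[Dict[str, Any]]:
    """Nest flat comment rows under their parents, collapsing deep threads.

    Each row needs comment_id, parent_comment_id (None for top-level),
    score, and text. Levels at or beyond max_depth are not expanded;
    the last expanded node reports how many direct replies were left out.
    """
    # Group rows by parent so each lookup of a node's children is O(1).
    children: Dict[Optional[int], List[Dict[str, Any]]] = {}
    for row in rows:
        children.setdefault(row["parent_comment_id"], []).append(row)

    def render(parent_id: Optional[int], depth: int) -> List[Dict[str, Any]]:
        nodes = []
        siblings = sorted(children.get(parent_id, []), key=lambda r: r["score"], reverse=True)
        for row in siblings:
            node = {
                "comment_id": row["comment_id"],
                "text": row["text"],
                "score": row["score"],
                "replies": [],
            }
            if depth + 1 < max_depth:
                node["replies"] = render(row["comment_id"], depth + 1)
            else:
                # Collapsed: the client fetches these via "load more replies".
                node["more_replies"] = len(children.get(row["comment_id"], []))
            nodes.append(node)
        return nodes

    return render(None, 0)


def serialize_for_cache(rows: List[Dict[str, Any]], max_depth: int = 3) -> str:
    """Produce the JSON string that would be stored under the post's cache key."""
    return json.dumps(build_tree(rows, max_depth))
```

Building the tree costs one pass over the rows plus a sort per sibling group, and the recursion depth is bounded by `max_depth`, so very deep threads cannot exhaust the call stack.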

Subreddit Architecture

Each subreddit is an independent community with its own posts, rules, moderators, and settings.

Data model: subreddit: subreddit_id, name, description, rules, subscriber_count, created_at, settings (allow_images, require_flair, etc.). Subscription: user_id, subreddit_id, created_at. A user subscribes to subreddits; their home feed aggregates posts from all subscribed subreddits.

Home feed generation: similar to the Twitter feed (see our Twitter News Feed guide). Use fanout-on-write for small subreddits: when a post is created, push it to each subscriber's timeline. Use fanout-on-read for large subreddits (r/AskReddit with 40M+ subscribers): skip the push and merge the subreddit's recent posts into the feed when a subscriber loads it. This is the same hybrid approach used for the Twitter celebrity problem.

Subreddit-specific feed: posts within a subreddit are ranked by the selected algorithm (hot, new, top). This is a simpler query: SELECT * FROM posts WHERE subreddit_id = X ORDER BY hot_score DESC LIMIT 25. It is served by an index on (subreddit_id, hot_score). Hot scores are pre-computed and updated periodically (every 5-15 minutes for active subreddits).

Cross-posting: a post can appear in multiple subreddits. The post record has a list of subreddit_ids, and each subreddit displays the post independently with separate vote counts. Because the counts are separate, the score has to be stored per (post, subreddit) pair rather than once on the post record.
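The hybrid fanout decision can be sketched with in-memory dictionaries standing in for the timeline cache and the per-subreddit post index. The class name `FeedService` and the subscriber threshold are assumptions for illustration; a real system would also trim timelines and backfill posts for new subscribers, which this sketch leaves out.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Entry = Tuple[float, str]  # (hot_score, post_id)


class FeedService:
    """Hybrid feed: push posts for small subreddits, pull for large ones."""

    def __init__(self, fanout_threshold: int = 10_000) -> None:
        # Assumed cutoff: subreddits at or above it switch to fanout-on-read.
        self.fanout_threshold = fanout_threshold
        self.subscribers: Dict[str, Set[str]] = defaultdict(set)
        self.subscriptions: Dict[str, Set[str]] = defaultdict(set)
        self.timelines: Dict[str, List[Entry]] = defaultdict(list)
        self.subreddit_posts: Dict[str, List[Entry]] = defaultdict(list)

    def subscribe(self, user_id: str, subreddit: str) -> None:
        self.subscribers[subreddit].add(user_id)
        self.subscriptions[user_id].add(subreddit)

    def is_large(self, subreddit: str) -> bool:
        return len(self.subscribers[subreddit]) >= self.fanout_threshold

    def publish(self, subreddit: str, post_id: str, hot_score: float) -> None:
        entry = (hot_score, post_id)
        self.subreddit_posts[subreddit].append(entry)
        if not self.is_large(subreddit):
            # Fanout-on-write: copy the entry into every subscriber's timeline.
            for user_id in self.subscribers[subreddit]:
                self.timelines[user_id].append(entry)

    def home_feed(self, user_id: str, limit: int = 25) -> List[str]:
        # Start from pushed entries, then pull from large subreddits at read time.
        # A set removes duplicates if a subreddit crossed the threshold mid-life.
        candidates: Set[Entry] = set(self.timelines[user_id])
        for subreddit in self.subscriptions[user_id]:
            if self.is_large(subreddit):
                candidates.update(self.subreddit_posts[subreddit])
        return [post_id for _, post_id in heapq.nlargest(limit, candidates)]
```

The write cost for a post is proportional to the subscriber count only below the threshold. Above it, the cost moves to read time, where it is proportional to the number of large subreddits the reader follows.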

Moderation and Anti-Abuse

Reddit moderation is decentralized: each subreddit has volunteer moderators with tools to manage content.

Moderation tools:

(1) Remove/approve posts and comments. Removed content is hidden from the subreddit but remains visible to its author.

(2) AutoModerator: rule-based automation. Subreddit-specific rules are defined in YAML, for example: automatically remove posts from new accounts, require flair on posts, or filter posts containing certain keywords. Rules are evaluated on every new post and comment in the subreddit.

(3) Ban users from the subreddit (temporarily or permanently).

(4) Lock posts (prevent new comments).

(5) Mod queue: a queue of reported content and items caught by AutoModerator, held for manual review.

Site-wide anti-abuse:

(1) Spam detection: ML models identify spam posts (link farms, repetitive content, new accounts posting commercial links).

(2) Vote manipulation: detect coordinated voting (multiple accounts from the same IP, accounts that only ever vote on the same posts). Suspicious accounts are shadow-banned: they can still post, but no one else sees their content.

(3) Rate limiting: new accounts have posting limits (for example, 1 post per 10 minutes). Limits relax as the account builds karma.

(4) Content policy enforcement: ML models flag content that violates site-wide rules (harassment, violence, copyright). Flagged content is reviewed by paid admin staff.
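Rule-based automation of the kind AutoModerator performs can be sketched as a list of rules checked in order against each new post. The `Rule` fields and action names below are hypothetical and do not reproduce AutoModerator's actual YAML schema; they only mirror the three example rules mentioned above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Post:
    author_age_days: int
    title: str
    body: str
    flair: Optional[str] = None


@dataclass
class Rule:
    """One moderation rule. A rule matches if ANY of its conditions is met."""
    name: str
    action: str  # "remove" hides the post; "filter" sends it to the mod queue
    min_account_age_days: Optional[int] = None
    require_flair: bool = False
    banned_keywords: Tuple[str, ...] = ()

    def matches(self, post: Post) -> bool:
        if (
            self.min_account_age_days is not None
            and post.author_age_days < self.min_account_age_days
        ):
            return True
        if self.require_flair and not post.flair:
            return True
        text = f"{post.title} {post.body}".lower()
        return any(keyword.lower() in text for keyword in self.banned_keywords)


def evaluate(post: Post, rules: List[Rule]) -> Tuple[str, Optional[str]]:
    """Return (action, rule_name) for the first matching rule, else approve."""
    for rule in rules:
        if rule.matches(post):
            return rule.action, rule.name
    return "approve", None


# Example rule set for one subreddit (values are illustrative).
EXAMPLE_RULES = [
    Rule(name="new-account", action="remove", min_account_age_days=7),
    Rule(name="needs-flair", action="remove", require_flair=True),
    Rule(name="keywords", action="filter", banned_keywords=("buy now", "promo code")),
]
```

Because the first matching rule wins, rule order acts as priority: putting the outright removals before the keyword filter means a post from a brand-new account never reaches the mod queue.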
