How does Reddit's Hot ranking algorithm work?

Reddit Hot balances recency with popularity: score = sign(ups - downs) * log10(max(|ups - downs|, 1)) + (timestamp - epoch) / 45000, where timestamp is the submission time and both times are in seconds. The logarithm dampens vote impact: going from 1 to 10 net upvotes adds as much to the score as going from 10 to 100. The timestamp term grows linearly with submission time, so newer posts eventually overtake older ones regardless of votes; an older post needs exponentially more votes to stay on the front page. Because 45000 seconds is 12.5 hours, a post needs 10x more net votes to match the score of an otherwise identical post submitted 12.5 hours later. This creates natural content turnover while rewarding popular content. Other rankings: Best (Wilson score confidence interval for comments), Top (simple net score within time period), and Controversial (high total votes but low net score).
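The 12.5-hour figure follows directly from the constant. As a worked example, take two posts with positive net scores n_1 and n_2, where the first is submitted at time t (seconds since the epoch) and the second Δt seconds later. The symbols n_1, n_2, t and Δt are introduced here for illustration only. Setting the two Hot scores equal gives:

```latex
\begin{align*}
\log_{10} n_1 + \frac{t}{45000} &= \log_{10} n_2 + \frac{t + \Delta t}{45000} \\
\log_{10} \frac{n_1}{n_2} &= \frac{\Delta t}{45000} \\
\frac{n_1}{n_2} &= 10^{\Delta t / 45000}
\end{align*}
```

With Δt = 45000 s (12.5 hours) the ratio is 10. For a post one day older (Δt = 86400 s) the exponent is 1.92, so the older post needs roughly 83 times the net votes to tie.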

How does Reddit store and display nested comment threads?

Comments form a tree: each has a parent_comment_id (null for top-level). Three storage approaches: (1) Adjacency list: simplest. Loading a thread requires a recursive CTE: WITH RECURSIVE thread AS (base UNION ALL join). It works but is slow for deep threads. (2) Materialized path: store /root/child/grandchild on each comment and query descendants with a LIKE prefix match. Fast reads, complex moves. (3) Pre-computed cached tree (the Reddit approach for hot posts): build the full comment tree as a JSON object and cache it. Loading a post with 10,000 comments reads one cached object instead of 10,000 rows. On a new comment, update the cache incrementally, and rebuild it periodically. Comment ranking within threads uses the Wilson score (Best): the lower bound of a 95% confidence interval for the true upvote ratio. This ranks comments with many votes and a high ratio above those with few but perfect votes.

System Design: Design Reddit — Social News, Upvote/Downvote, Ranking, Subreddits, Comment Threading, Moderation


Reddit is the “front page of the internet” with 50+ million daily active users across 100,000+ active communities (subreddits). Designing Reddit tests your understanding of content ranking algorithms, nested comment systems, community moderation, and scaling a platform where both content creation and consumption are high-volume. This guide covers the core architectural components.

Post Ranking: Hot, Best, New, Top

Reddit uses different ranking algorithms for different views: (1) Hot: balances recency with popularity. Reddit's original algorithm: score = sign(ups - downs) * log10(max(|ups - downs|, 1)) + (post_timestamp - epoch) / 45000, with both times in seconds. The logarithm dampens the effect of votes (going from 1 to 10 net upvotes adds as much as going from 10 to 100). The timestamp term ensures newer posts eventually overtake older ones regardless of votes, so a post needs exponentially more votes to stay on the front page as it ages. (2) Best (Wilson score): for comments. Estimates the “true” upvote ratio with a confidence interval. A comment with 1 upvote and 0 downvotes (100% positive) ranks lower than one with 100 upvotes and 10 downvotes (91% positive) because we have more confidence in the latter. The lower bound of the confidence interval is the ranking score. (3) Top: sort by net score (upvotes - downvotes) within a time period (today, this week, this month, all time). (4) New: sort by timestamp descending. (5) Controversial: posts with roughly equal upvotes and downvotes (high total votes but low net score).
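The Hot and Best formulas are short enough to sketch directly. The following is a minimal illustration, not Reddit's production code: the function names are chosen here, the epoch constant is the value commonly cited from Reddit's open-sourced ranking code and should be treated as an assumption, and z = 1.96 assumes a 95% two-sided confidence interval.

```python
from math import log10, sqrt

# Assumed epoch in Unix seconds (2005-12-08 07:46:43 UTC); any fixed
# reference time gives the same relative ordering of posts.
REDDIT_EPOCH = 1134028003


def hot(ups: int, downs: int, post_timestamp: float,
        epoch: float = REDDIT_EPOCH) -> float:
    """Hot score: signed log of the net votes plus a linear time bonus."""
    net = ups - downs
    order = log10(max(abs(net), 1))
    sign = (net > 0) - (net < 0)  # +1, 0 or -1
    return sign * order + (post_timestamp - epoch) / 45000


def wilson_lower_bound(ups: int, downs: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the upvote ratio."""
    n = ups + downs
    if n == 0:
        return 0.0  # no votes yet, so no evidence either way
    p = ups / n
    centre = p + z * z / (2 * n)
    margin = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / (1 + z * z / n)
```

With these definitions, a comment at 1 up / 0 down scores about 0.21 while one at 100 up / 10 down scores about 0.84, which matches the ordering described above.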

Vote System

Each user can upvote, downvote, or remove their vote on each post or comment. Data model: vote: user_id, target_id (post or comment), target_type, direction (+1 or -1), created_at. Unique constraint on (user_id, target_id, target_type): one vote per user per item. Score tracking: each post/comment has a score field (net score: upvotes - downvotes). On vote: (1) Check if the user already voted. If changing direction: update the vote record and adjust score by 2 in the new direction (removing the old vote and adding the new one). If same direction: unvote (delete the record, adjust score by 1 in the opposite direction). If new vote: insert the record, adjust score by 1 in the vote direction. (2) Atomic score update: UPDATE posts SET score = score + delta WHERE id = post_id. Score caching: the score is read on every post display, which adds up to a very high read rate for hot posts. Cache the score in Redis. On vote: commit the change to the database, then apply the same delta to Redis (INCRBY). The two writes are not a single atomic transaction, so the cached value can drift and should be refreshed from the database periodically. Redis serves the score for display; the database is the source of truth. Vote fuzzing: Reddit intentionally fuzzes displayed vote counts (adding random noise) so that vote manipulators cannot tell whether their votes are being counted. The actual score is accurate; the displayed upvote/downvote counts are approximate.
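The three vote cases reduce to computing a single delta. The sketch below keeps everything in memory to show the logic only; the VoteStore class and its method names are hypothetical, and a real system would run the same steps inside a database transaction with the unique constraint described above.

```python
from typing import Dict, Tuple

VoteKey = Tuple[int, int, str]   # (user_id, target_id, target_type)
TargetKey = Tuple[int, str]      # (target_id, target_type)


class VoteStore:
    """In-memory stand-in for the vote table and the score column."""

    def __init__(self) -> None:
        self.votes: Dict[VoteKey, int] = {}
        self.scores: Dict[TargetKey, int] = {}

    def cast(self, user_id: int, target_id: int, target_type: str,
             direction: int) -> int:
        """Apply a vote and return the delta added to the target's score."""
        if direction not in (1, -1):
            raise ValueError("direction must be +1 or -1")
        key = (user_id, target_id, target_type)
        previous = self.votes.get(key)
        if previous is None:
            # New vote: insert the record.
            self.votes[key] = direction
            delta = direction
        elif previous == direction:
            # Same direction again: treat as unvote and delete the record.
            del self.votes[key]
            delta = -direction
        else:
            # Direction change: undo the old vote and apply the new one.
            self.votes[key] = direction
            delta = 2 * direction
        target = (target_id, target_type)
        self.scores[target] = self.scores.get(target, 0) + delta
        return delta
```

Returning the delta makes it easy to apply the same increment to a cache after the database write succeeds.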

Comment Threading

Reddit comments are a tree: each comment can have replies, which can have replies, arbitrarily deep. Data model: comment: comment_id, post_id, parent_comment_id (null for top-level), user_id, text, score, created_at, depth. Storing and querying a comment tree: (1) Adjacency list (parent_comment_id foreign key): simplest storage. Loading the full tree requires recursive queries or multiple round-trips. PostgreSQL recursive CTE: WITH RECURSIVE thread AS (SELECT * FROM comments WHERE post_id = X AND parent_comment_id IS NULL UNION ALL SELECT c.* FROM comments c JOIN thread t ON c.parent_comment_id = t.comment_id) SELECT * FROM thread. (2) Materialized path: store the full path from the root on each comment: /comment1/comment5/comment12. Query all descendants with LIKE '/comment1/%'. Fast reads, but updates are complex if a comment ever moves, because every descendant path must be rewritten. (3) Pre-computed: for each post, pre-compute and cache the comment tree as a JSON structure. On new comment: update the cache (insert the comment under its parent). Serve the cached tree on post load. Invalidate and rebuild periodically. Reddit uses approach (3) for hot posts: the comment tree is pre-built and cached. Loading a post with 10,000 comments reads one cached object rather than querying 10,000 rows. Comment collapsing: deeply nested threads are collapsed (“load more replies”) to limit the initial payload size.
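To make the pre-computed approach concrete, here is a minimal sketch of turning flat adjacency-list rows for one post into a nested structure that could be serialized to JSON and cached. The function name and the "replies" key are assumptions for illustration; the row fields follow the data model above, and sorting siblings by score is a simplification of Best ranking.

```python
from typing import Any, Dict, List


def build_comment_tree(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Nest flat comment rows under their parents, highest score first."""
    # Copy each row and give it an empty reply list.
    nodes = {row["comment_id"]: {**row, "replies": []} for row in rows}
    roots: List[Dict[str, Any]] = []
    for node in nodes.values():
        parent_id = node["parent_comment_id"]
        if parent_id is None:
            roots.append(node)
        elif parent_id in nodes:
            nodes[parent_id]["replies"].append(node)
        # Rows whose parent is missing from the input are skipped.

    # Sort every sibling list iteratively so deep threads cannot hit
    # Python's recursion limit.
    stack = [roots]
    while stack:
        siblings = stack.pop()
        siblings.sort(key=lambda c: c["score"], reverse=True)
        stack.extend(child["replies"] for child in siblings)
    return roots
```

The result is plain dicts and lists, so json.dumps(build_comment_tree(rows)) yields a single cacheable object as long as the row values are themselves JSON-serializable.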

Subreddit Architecture

Each subreddit is an independent community with its own posts, rules, moderators, and settings. Data model: subreddit: subreddit_id, name, description, rules, subscriber_count, created_at, settings (allow_images, require_flair, etc.). Subscription: user_id, subreddit_id, created_at. A user subscribes to subreddits; their home feed aggregates posts from all subscribed subreddits. Home feed generation: similar to a Twitter feed (see our Twitter News Feed guide). Fanout-on-write for small subreddits: when a post is created, push it to subscriber timelines. For large subreddits (r/AskReddit with 40M+ subscribers): fanout-on-read, pulling recent posts when the feed is requested. This is the same hybrid approach used for the Twitter celebrity problem. Subreddit-specific feed: posts within a subreddit are ranked by the selected algorithm (hot, new, top). This is a simpler query: SELECT * FROM posts WHERE subreddit_id = X ORDER BY hot_score DESC LIMIT 25. Index on (subreddit_id, hot_score). Hot scores are pre-computed and updated periodically (every 5-15 minutes for active subreddits). Cross-posting: a post can appear in multiple subreddits. Each crosspost is its own post record in the target subreddit with a reference to the original post, so each subreddit displays it independently with separate vote counts.
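The hybrid fanout idea can be sketched in a few lines. This is an in-memory illustration under stated assumptions: the FeedService class, its method names, and the subscriber threshold are all hypothetical, and the sketch ignores persistence, pagination, and back-filling timelines when a user subscribes.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Post = Tuple[float, int]  # (hot_score, post_id)


class FeedService:
    """Push posts for small subreddits, pull them for large ones."""

    def __init__(self, threshold: int = 10_000) -> None:
        self.threshold = threshold  # hypothetical cutoff for "large"
        self.subscribers: Dict[str, Set[int]] = defaultdict(set)
        self.subscriptions: Dict[int, Set[str]] = defaultdict(set)
        self.timelines: Dict[int, List[Post]] = defaultdict(list)
        self.subreddit_posts: Dict[str, List[Post]] = defaultdict(list)

    def subscribe(self, user_id: int, subreddit: str) -> None:
        self.subscribers[subreddit].add(user_id)
        self.subscriptions[user_id].add(subreddit)

    def is_large(self, subreddit: str) -> bool:
        return len(self.subscribers[subreddit]) >= self.threshold

    def publish(self, subreddit: str, post_id: int, hot_score: float) -> None:
        post = (hot_score, post_id)
        self.subreddit_posts[subreddit].append(post)
        if not self.is_large(subreddit):
            # Fanout-on-write: copy the post into each subscriber timeline.
            for user_id in self.subscribers[subreddit]:
                self.timelines[user_id].append(post)

    def home_feed(self, user_id: int, limit: int = 25) -> List[int]:
        # Start from pushed posts, then pull from large subreddits at read
        # time. A set removes duplicates if a subreddit crossed the
        # threshold after some of its posts were already pushed.
        candidates = set(self.timelines[user_id])
        for subreddit in self.subscriptions[user_id]:
            if self.is_large(subreddit):
                candidates.update(self.subreddit_posts[subreddit])
        ranked = sorted(candidates, reverse=True)
        return [post_id for _, post_id in ranked[:limit]]
```

The trade-off is the usual one: writes stay cheap for the few very large communities, while reads stay cheap for the many small ones.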

Moderation and Anti-Abuse

Reddit moderation is decentralized: each subreddit has volunteer moderators with tools to manage content. Moderation tools: (1) Remove/approve posts and comments (removed content is hidden from the subreddit but still visible to its author). (2) AutoModerator: rule-based automation. Subreddit-specific rules defined in YAML: automatically remove posts from new accounts, require flair on posts, filter posts containing certain keywords. Rules are evaluated on every new post/comment in the subreddit. (3) Ban users from the subreddit (temporary or permanent). (4) Lock posts (prevent new comments). (5) Mod queue: a queue of reported content and items caught by AutoModerator for manual review. Site-wide anti-abuse: (1) Spam detection: ML models identify spam posts (link farms, repetitive content, new accounts posting commercial links). (2) Vote manipulation: detect coordinated voting (multiple accounts from the same IP, accounts that only vote on the same posts). Shadow-ban suspicious accounts (they can still post but no one else sees their content). (3) Rate limiting: new accounts have posting limits (for example, 1 post per 10 minutes). Limits relax as the account builds karma. (4) Content policy enforcement: ML models flag content violating site-wide rules (harassment, violence, copyright). Flagged content is reviewed by paid admin staff.
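The karma-based posting limit can be illustrated with a small sketch. Everything beyond the 10-minute base interval is an assumption made for the example: the PostRateLimiter class, the rule that the interval halves per 100 karma, and the cap on halvings are hypothetical, not Reddit's actual policy. The current time is passed in explicitly to keep the example deterministic.

```python
from typing import Dict


class PostRateLimiter:
    """Enforce a minimum interval between posts that shrinks with karma."""

    def __init__(self, base_interval: float = 600.0,
                 karma_per_halving: int = 100, max_halvings: int = 10) -> None:
        self.base_interval = base_interval          # seconds; 10 minutes
        self.karma_per_halving = karma_per_halving  # hypothetical rule
        self.max_halvings = max_halvings            # floor on the relaxation
        self.last_post: Dict[int, float] = {}

    def min_interval(self, karma: int) -> float:
        """Required seconds between posts for an account with this karma."""
        halvings = min(max(karma, 0) // self.karma_per_halving,
                       self.max_halvings)
        return self.base_interval / (2 ** halvings)

    def allow(self, user_id: int, karma: int, now: float) -> bool:
        """Return True and record the post if the user may post at `now`."""
        last = self.last_post.get(user_id)
        if last is not None and now - last < self.min_interval(karma):
            return False
        self.last_post[user_id] = now
        return True
```

A rejected attempt does not reset the timer, so a user who retries early is not penalized beyond the original interval.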