Requirements
- Users can post comments on content items (posts, videos, products)
- Threaded replies (nested comments) up to N levels deep
- Voting (upvote/downvote) and sorting (top, new, controversial)
- Moderation: flag, hide, delete comments; auto-moderation for spam/hate speech
- 100M content items, 1B total comments, 10K comment writes/second during peaks
Data Model
Comment(comment_id UUID, content_item_id UUID, parent_comment_id UUID NULL,
author_id UUID, body TEXT, status ENUM(ACTIVE,DELETED,HIDDEN,FLAGGED),
upvotes INT, downvotes INT, score FLOAT,
depth INT, path LTREE, -- for efficient tree queries
created_at, updated_at, edit_count)
CommentVote(comment_id, user_id, vote ENUM(UP,DOWN), created_at)
-- UNIQUE constraint on (comment_id, user_id)
CommentFlag(flag_id, comment_id, reporter_id, reason, created_at)
Nested Comments: Closure Table vs Path Enumeration
Adjacency list (parent_comment_id): simplest, but fetching the entire subtree requires recursive queries. In PostgreSQL: WITH RECURSIVE cte AS (…). Efficient for simple trees.
Path enumeration (LTREE): store the full path from root to the comment as a string: “root.child1.grandchild2”. PostgreSQL’s LTREE extension supports efficient subtree queries and ordering. Path = concatenation of comment IDs: “a1b2.c3d4.e5f6”. To get all descendants: WHERE path ~ ‘a1b2.*’. Insert: path = parent.path + ‘.’ + new_comment_id.
Closure table: a separate table stores (ancestor_id, descendant_id, depth) for every ancestor-descendant pair. Efficient tree queries (fetch all descendants in one query), but writes insert O(depth) rows per comment. Best for deep trees with frequent traversal queries.
Comment Sorting
# Sort by "top" (Reddit-style Wilson score lower bound):
def wilson_score(upvotes, downvotes):
n = upvotes + downvotes
if n == 0: return 0
z = 1.96 # 95% confidence
p = upvotes / n
return (p + z*z/(2*n) - z*sqrt(p*(1-p)/n + z*z/(4*n*n))) / (1 + z*z/n)
# Sort by "hot" (time-decay):
score = (upvotes - downvotes) / (age_hours + 2) ** 1.5
# Sort by "new": ORDER BY created_at DESC
# Sort by "controversial": high total votes AND close ratio
score = (upvotes + downvotes) * min(upvotes, downvotes) / max(upvotes, downvotes)
Vote Handling and Race Conditions
Upvoting and downvoting require atomicity — two concurrent votes must not double-count. Use UPSERT to handle vote changes:
-- Cast or change a vote: INSERT INTO CommentVote (comment_id, user_id, vote) VALUES (:cid, :uid, 'UP') ON CONFLICT (comment_id, user_id) DO UPDATE SET vote = EXCLUDED.vote; -- Update comment score atomically (trigger or application logic): UPDATE Comment SET upvotes = upvotes + 1 WHERE comment_id = :cid;
For vote counts, use a counter cache (upvotes, downvotes columns) updated with each vote. Denormalized but avoids a COUNT(*) query on votes on every read. On vote change (up → down): upvotes -= 1, downvotes += 1 in a single transaction.
Comment Pagination (Cursor-Based)
Load top-level comments for a content item with cursor pagination. Cursor = (score, comment_id) for stable ordering. First page: ORDER BY score DESC, comment_id DESC LIMIT 20. Next page: WHERE (score, comment_id) < (cursor_score, cursor_id). This handles score ties without re-ordering. For lazy-loading replies: load top-level comments first, load replies on demand (click "N replies") using parent_comment_id = :parent_id ORDER BY score DESC LIMIT 10.
Auto-Moderation
Spam and hate speech detection: (1) Rule-based: block comments containing words from a banned word list, comments from recently-created accounts, duplicate comments (same text posted multiple times). (2) ML classifier: text classification model (BERT fine-tuned on moderation labels) scores each comment 0-1 for toxicity. If score > 0.9: auto-hide. If 0.7-0.9: flag for human review. (3) User reports: if a comment receives N flags from different users, auto-hide pending review. Rate limiting: limit users to K comments per hour to prevent spam floods.
Key Design Decisions
- LTREE path enumeration for nested comments — efficient subtree queries in PostgreSQL
- Wilson score for top sorting — statistically sound, handles low-vote comments correctly
- UPSERT for votes — atomic, handles vote changes without duplicates
- Cursor-based pagination — stable ordering even as new votes change scores
- Auto-moderation + human review — ML for speed, humans for edge cases
Twitter system design covers comment and reply threading. See common questions for Twitter interview: comment and threading system design.
Snap system design covers comments and user engagement. Review patterns for Snap interview: comment and engagement system design.
Amazon system design covers product reviews and comment systems. See design patterns for Amazon interview: product review and comment system design.
See also: Meta Interview Guide 2026: Facebook, Instagram, WhatsApp Engineering
See also: Atlassian Interview Guide