Requirements
- Users can post comments on content items (posts, videos, products)
- Threaded replies (nested comments) up to N levels deep
- Voting (upvote/downvote) and sorting (top, new, controversial)
- Moderation: flag, hide, delete comments; auto-moderation for spam/hate speech
- 100M content items, 1B total comments, 10K comment writes/second during peaks
Data Model
Comment(comment_id UUID, content_item_id UUID, parent_comment_id UUID NULL,
author_id UUID, body TEXT, status ENUM(ACTIVE,DELETED,HIDDEN,FLAGGED),
upvotes INT, downvotes INT, score FLOAT,
depth INT, path LTREE, -- for efficient tree queries
created_at, updated_at, edit_count)
CommentVote(comment_id, user_id, vote ENUM(UP,DOWN), created_at)
-- UNIQUE constraint on (comment_id, user_id)
CommentFlag(flag_id, comment_id, reporter_id, reason, created_at)
Nested Comments: Closure Table vs Path Enumeration
Adjacency list (parent_comment_id): simplest, but fetching the entire subtree requires recursive queries. In PostgreSQL: WITH RECURSIVE cte AS (…). Efficient for simple trees.
Path enumeration (LTREE): store the full path from root to the comment as a string: “root.child1.grandchild2”. PostgreSQL’s LTREE extension supports efficient subtree queries and ordering. Path = concatenation of comment IDs: “a1b2.c3d4.e5f6”. To get all descendants: WHERE path ~ ‘a1b2.*’. Insert: path = parent.path + ‘.’ + new_comment_id.
Closure table: a separate table stores (ancestor_id, descendant_id, depth) for every ancestor-descendant pair. Efficient tree queries (fetch all descendants in one query), but writes insert O(depth) rows per comment. Best for deep trees with frequent traversal queries.
Comment Sorting
# Sort by "top" (Reddit-style Wilson score lower bound):
def wilson_score(upvotes, downvotes):
n = upvotes + downvotes
if n == 0: return 0
z = 1.96 # 95% confidence
p = upvotes / n
return (p + z*z/(2*n) - z*sqrt(p*(1-p)/n + z*z/(4*n*n))) / (1 + z*z/n)
# Sort by "hot" (time-decay):
score = (upvotes - downvotes) / (age_hours + 2) ** 1.5
# Sort by "new": ORDER BY created_at DESC
# Sort by "controversial": high total votes AND close ratio
score = (upvotes + downvotes) * min(upvotes, downvotes) / max(upvotes, downvotes)
Vote Handling and Race Conditions
Upvoting and downvoting require atomicity — two concurrent votes must not double-count. Use UPSERT to handle vote changes:
-- Cast or change a vote: INSERT INTO CommentVote (comment_id, user_id, vote) VALUES (:cid, :uid, 'UP') ON CONFLICT (comment_id, user_id) DO UPDATE SET vote = EXCLUDED.vote; -- Update comment score atomically (trigger or application logic): UPDATE Comment SET upvotes = upvotes + 1 WHERE comment_id = :cid;
For vote counts, use a counter cache (upvotes, downvotes columns) updated with each vote. Denormalized but avoids a COUNT(*) query on votes on every read. On vote change (up → down): upvotes -= 1, downvotes += 1 in a single transaction.
Comment Pagination (Cursor-Based)
Load top-level comments for a content item with cursor pagination. Cursor = (score, comment_id) for stable ordering. First page: ORDER BY score DESC, comment_id DESC LIMIT 20. Next page: WHERE (score, comment_id) < (cursor_score, cursor_id). This handles score ties without re-ordering. For lazy-loading replies: load top-level comments first, load replies on demand (click "N replies") using parent_comment_id = :parent_id ORDER BY score DESC LIMIT 10.
Auto-Moderation
Spam and hate speech detection: (1) Rule-based: block comments containing words from a banned word list, comments from recently-created accounts, duplicate comments (same text posted multiple times). (2) ML classifier: text classification model (BERT fine-tuned on moderation labels) scores each comment 0-1 for toxicity. If score > 0.9: auto-hide. If 0.7-0.9: flag for human review. (3) User reports: if a comment receives N flags from different users, auto-hide pending review. Rate limiting: limit users to K comments per hour to prevent spam floods.
Key Design Decisions
- LTREE path enumeration for nested comments — efficient subtree queries in PostgreSQL
- Wilson score for top sorting — statistically sound, handles low-vote comments correctly
- UPSERT for votes — atomic, handles vote changes without duplicates
- Cursor-based pagination — stable ordering even as new votes change scores
- Auto-moderation + human review — ML for speed, humans for edge cases
{“@context”:”https://schema.org”,”@type”:”FAQPage”,”mainEntity”:[{“@type”:”Question”,”name”:”How do you store and query nested (threaded) comments efficiently?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Three data models for hierarchical comments: (1) Adjacency list (parent_comment_id column): simplest. Fetching the full thread requires recursive SQL (WITH RECURSIVE). Good for shallow trees (2-3 levels). O(depth) queries. (2) Path enumeration (LTREE in PostgreSQL): each comment stores its full path from root: "root_id.child_id.grandchild_id". Fetch all descendants with path ~ 'root_id.*'. Insert requires knowing parent's path. O(1) query for subtree. Path string grows with depth. (3) Closure table: a separate table with (ancestor_id, descendant_id, depth) rows for every ancestor-descendant pair. Fetching all descendants: SELECT descendant_id FROM closure WHERE ancestor_id = X. Inserting a new comment at depth D inserts D+1 rows. Best for arbitrary depth with frequent subtree queries. For most production comment systems (Reddit, YouTube), path enumeration or closure table is preferred. Limit maximum nesting depth (e.g., 8 levels) to prevent degenerate trees and simplify rendering.”}},{“@type”:”Question”,”name”:”How does Reddit-style sorting (hot, top, new, controversial) work for comments?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”New: ORDER BY created_at DESC. Simple but buries popular older comments. Top: ORDER BY (upvotes – downvotes) DESC. Problem: a comment with 100 upvotes and 0 downvotes ranks same as one with 200 upvotes and 100 downvotes. Wilson score lower bound: statistically sound ranking for items with positive/negative votes. Formula: (p + z²/(2n) – z*sqrt(p*(1-p)/n + z²/(4n²))) / (1 + z²/n), where p = upvotes/(upvotes+downvotes), z = 1.96 (95% CI), n = total votes. Low-vote comments rank conservatively. Hot (time-decay): score = (upvotes – downvotes) / (age_hours + 2)^1.5. Recent comments with votes rank higher than old comments with the same votes. Controversial: score = (total_votes) * min(ups, downs) / max(ups, downs). High total votes AND close up/down ratio signals controversy. Precompute scores and store them in the score column; recompute on every vote. Build a covering index on (content_item_id, score DESC, comment_id) for efficient pagination.”}},{“@type”:”Question”,”name”:”How do you handle vote race conditions in a comment system?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Two users voting on the same comment simultaneously. Without atomicity: both reads see upvotes=5, both increment to 6, one write is lost. Solutions: (1) UPSERT for the vote record: INSERT INTO CommentVote (comment_id, user_id, vote) VALUES … ON CONFLICT (comment_id, user_id) DO UPDATE SET vote = EXCLUDED.vote. The unique constraint prevents duplicate votes atomically. (2) Atomic counter update: UPDATE Comment SET upvotes = upvotes + 1 WHERE comment_id = X. This is an atomic increment in PostgreSQL — no read-modify-write race. (3) Changing vote (up → down): UPDATE Comment SET upvotes = upvotes – 1, downvotes = downvotes + 1 WHERE comment_id = X AND (SELECT vote FROM CommentVote WHERE …) = 'UP'. Wrap in transaction. (4) Redis atomic: HINCRBY comment:{id} upvotes 1 with Lua script for check-and-increment. Use the DB approach (options 1+2) for correctness; add Redis caching of vote counts for read performance.”}},{“@type”:”Question”,”name”:”How does cursor-based pagination work for comment loading?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Offset pagination (LIMIT 20 OFFSET 40) is unstable for comments: new comments or votes can shift the ordering, causing duplicates or missed items between pages. Cursor pagination: the cursor encodes the position in the sorted order. For comments sorted by (score DESC, comment_id DESC): cursor = (last_score, last_comment_id). Next page query: WHERE (score, comment_id) < (cursor_score, cursor_comment_id) ORDER BY score DESC, comment_id DESC LIMIT 20. This is a keyset pagination query — it skips exactly the items already seen without re-reading them. The composite cursor (score, comment_id) handles ties in score: if two comments have score=0.8, use comment_id as the tiebreaker to ensure a stable ordering. Encode the cursor as base64 for the client. Index required: CREATE INDEX ON Comment (content_item_id, score DESC, comment_id DESC) for efficient keyset pagination.”}},{“@type”:”Question”,”name”:”How does auto-moderation work for a comment system at scale?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Auto-moderation prevents spam and harmful content without requiring manual review of every comment. Multi-layer approach: (1) Rule-based pre-filters (synchronous, O(1)): blocked word list, URL pattern matching (detect spam links), rate limiting (max K comments per hour per user), duplicate detection (same comment posted multiple times in the last N minutes — hash the normalized body). Reject immediately if any rule triggers. (2) ML toxicity classifier (asynchronous): after the comment is posted (status=PENDING_REVIEW), send to a text classification model (Perspective API or fine-tuned BERT). If toxicity score > 0.9: auto-hide (status=HIDDEN). If 0.7-0.9: flag for human review (status=FLAGGED). If < 0.7: publish (status=ACTIVE). (3) Community reporting: if a comment receives N distinct user reports, auto-hide pending review. Escalate to human moderators. Human moderators review FLAGGED and HIDDEN comments, can override the ML decision. Track false positive rate — too many false positives erode trust.”}}]}
Twitter system design covers comment and reply threading. See common questions for Twitter interview: comment and threading system design.
Snap system design covers comments and user engagement. Review patterns for Snap interview: comment and engagement system design.
Amazon system design covers product reviews and comment systems. See design patterns for Amazon interview: product review and comment system design.
See also: Meta Interview Guide 2026: Facebook, Instagram, WhatsApp Engineering