Comment System Low-Level Design

Requirements

  • Users can post comments on content items (posts, videos, products)
  • Threaded replies (nested comments) up to N levels deep
  • Voting (upvote/downvote) and sorting (top, new, controversial)
  • Moderation: flag, hide, delete comments; auto-moderation for spam/hate speech
  • 100M content items, 1B total comments, 10K comment writes/second during peaks

Data Model

Comment(comment_id UUID, content_item_id UUID, parent_comment_id UUID NULL,
        author_id UUID, body TEXT, status ENUM(ACTIVE,DELETED,HIDDEN,FLAGGED),
        upvotes INT, downvotes INT, score FLOAT,
        depth INT, path LTREE,  -- for efficient tree queries
        created_at, updated_at, edit_count)

CommentVote(comment_id, user_id, vote ENUM(UP,DOWN), created_at)
-- UNIQUE constraint on (comment_id, user_id)

CommentFlag(flag_id, comment_id, reporter_id, reason, created_at)

Nested Comments: Closure Table vs Path Enumeration

Adjacency list (parent_comment_id): simplest, but fetching the entire subtree requires recursive queries. In PostgreSQL: WITH RECURSIVE cte AS (…). Efficient for simple trees.

Path enumeration (LTREE): store the full path from root to the comment as a string: “root.child1.grandchild2”. PostgreSQL’s LTREE extension supports efficient subtree queries and ordering. Path = concatenation of comment IDs: “a1b2.c3d4.e5f6”. To get all descendants: WHERE path ~ ‘a1b2.*’. Insert: path = parent.path + ‘.’ + new_comment_id.

Closure table: a separate table stores (ancestor_id, descendant_id, depth) for every ancestor-descendant pair. Efficient tree queries (fetch all descendants in one query), but writes insert O(depth) rows per comment. Best for deep trees with frequent traversal queries.

Comment Sorting

# Sort by "top" (Reddit-style Wilson score lower bound):
def wilson_score(upvotes, downvotes):
    n = upvotes + downvotes
    if n == 0: return 0
    z = 1.96  # 95% confidence
    p = upvotes / n
    return (p + z*z/(2*n) - z*sqrt(p*(1-p)/n + z*z/(4*n*n))) / (1 + z*z/n)

# Sort by "hot" (time-decay):
score = (upvotes - downvotes) / (age_hours + 2) ** 1.5

# Sort by "new": ORDER BY created_at DESC
# Sort by "controversial": high total votes AND close ratio
score = (upvotes + downvotes) * min(upvotes, downvotes) / max(upvotes, downvotes)

Vote Handling and Race Conditions

Upvoting and downvoting require atomicity — two concurrent votes must not double-count. Use UPSERT to handle vote changes:

-- Cast or change a vote:
INSERT INTO CommentVote (comment_id, user_id, vote)
VALUES (:cid, :uid, 'UP')
ON CONFLICT (comment_id, user_id)
DO UPDATE SET vote = EXCLUDED.vote;

-- Update comment score atomically (trigger or application logic):
UPDATE Comment SET upvotes = upvotes + 1 WHERE comment_id = :cid;

For vote counts, use a counter cache (upvotes, downvotes columns) updated with each vote. Denormalized but avoids a COUNT(*) query on votes on every read. On vote change (up → down): upvotes -= 1, downvotes += 1 in a single transaction.

Comment Pagination (Cursor-Based)

Load top-level comments for a content item with cursor pagination. Cursor = (score, comment_id) for stable ordering. First page: ORDER BY score DESC, comment_id DESC LIMIT 20. Next page: WHERE (score, comment_id) < (cursor_score, cursor_id). This handles score ties without re-ordering. For lazy-loading replies: load top-level comments first, load replies on demand (click "N replies") using parent_comment_id = :parent_id ORDER BY score DESC LIMIT 10.

Auto-Moderation

Spam and hate speech detection: (1) Rule-based: block comments containing words from a banned word list, comments from recently-created accounts, duplicate comments (same text posted multiple times). (2) ML classifier: text classification model (BERT fine-tuned on moderation labels) scores each comment 0-1 for toxicity. If score > 0.9: auto-hide. If 0.7-0.9: flag for human review. (3) User reports: if a comment receives N flags from different users, auto-hide pending review. Rate limiting: limit users to K comments per hour to prevent spam floods.

Key Design Decisions

  • LTREE path enumeration for nested comments — efficient subtree queries in PostgreSQL
  • Wilson score for top sorting — statistically sound, handles low-vote comments correctly
  • UPSERT for votes — atomic, handles vote changes without duplicates
  • Cursor-based pagination — stable ordering even as new votes change scores
  • Auto-moderation + human review — ML for speed, humans for edge cases

Twitter system design covers comment and reply threading. See common questions for Twitter interview: comment and threading system design.

Snap system design covers comments and user engagement. Review patterns for Snap interview: comment and engagement system design.

Amazon system design covers product reviews and comment systems. See design patterns for Amazon interview: product review and comment system design.

See also: Netflix Interview Guide 2026: Streaming Architecture, Recommendation Systems, and Engineering Excellence

See also: Meta Interview Guide 2026: Facebook, Instagram, WhatsApp Engineering

See also: Atlassian Interview Guide

Scroll to Top