Comments System Low-Level Design

Comments System — Low-Level Design

A comments system supports threaded discussion on content: flat comments, nested replies, sorting, pagination, and moderation. This appears in interviews at Reddit, YouTube, and any content platform where user discussion is a core feature.

Core Data Model

Comment
  id              BIGSERIAL PK
  content_id      BIGINT NOT NULL     -- post, video, article being commented on
  content_type    TEXT NOT NULL       -- 'post', 'video', 'product'
  author_id       BIGINT NOT NULL
  parent_id       BIGINT              -- null = top-level, non-null = reply
  body            TEXT NOT NULL
  status          TEXT DEFAULT 'visible'  -- visible, hidden, deleted, flagged
  upvote_count    INT DEFAULT 0       -- denormalized
  reply_count     INT DEFAULT 0       -- denormalized
  depth           INT DEFAULT 0       -- 0=top-level, 1=reply, 2=reply-to-reply
  path            TEXT                -- materialized path: '1/5/23/'
  created_at      TIMESTAMPTZ
  edited_at       TIMESTAMPTZ

-- Indexes
CREATE INDEX idx_comments_content ON Comment(content_id, content_type, status, created_at DESC);
CREATE INDEX idx_comments_parent  ON Comment(parent_id, created_at ASC) WHERE parent_id IS NOT NULL;
CREATE INDEX idx_comments_path    ON Comment(path text_pattern_ops);

Nested Comments: Materialized Path

-- Each comment stores the path from root to itself
-- path = '/' for top-level, '/1/', '/1/5/', '/1/5/23/' for nested

def create_comment(content_id, content_type, author_id, body, parent_id=None):
    if parent_id:
        parent = db.get(Comment, parent_id)
        if parent.depth >= 5:  # Limit nesting depth
            parent_id = parent.id  # Reply to this level but don't go deeper
        path = parent.path + str(parent_id) + '/'
        depth = parent.depth + 1
    else:
        path = '/'
        depth = 0

    comment = Comment.create(
        content_id=content_id,
        content_type=content_type,
        author_id=author_id,
        body=body,
        parent_id=parent_id,
        path=path,
        depth=depth,
    )

    if parent_id:
        # Increment parent's reply count
        db.execute("""
            UPDATE Comment SET reply_count=reply_count+1
            WHERE id=%(id)s
        """, {'id': parent_id})

    return comment

Fetching a Thread

-- Fetch all comments in a thread rooted at a top-level comment
-- Using materialized path: all descendants have path LIKE '/123/%'
SELECT * FROM Comment
WHERE (id = %(root_id)s OR path LIKE %(root_path)s)
  AND status = 'visible'
ORDER BY path ASC, created_at ASC;

-- Reconstruct tree in application code:
def build_tree(flat_comments):
    by_id = {c.id: {**c, 'replies': []} for c in flat_comments}
    roots = []
    for c in flat_comments:
        if c.parent_id and c.parent_id in by_id:
            by_id[c.parent_id]['replies'].append(by_id[c.id])
        else:
            roots.append(by_id[c.id])
    return roots

Top-Level Comment Pagination

-- Sort options: newest, oldest, top (by upvote_count)
-- Cursor pagination for stable results under concurrent new comments

def get_comments(content_id, content_type, sort='newest', cursor=None, limit=20):
    if sort == 'newest':
        order = 'created_at DESC'
        cursor_clause = 'AND created_at < %(cursor)s' if cursor else ''
    elif sort == 'top':
        order = 'upvote_count DESC, created_at DESC'
        cursor_clause = 'AND (upvote_count, created_at) < (%(cursor_votes)s, %(cursor_ts)s)' if cursor else ''

    return db.execute(f"""
        SELECT * FROM Comment
        WHERE content_id=%(cid)s AND content_type=%(ctype)s
          AND parent_id IS NULL
          AND status='visible'
          {cursor_clause}
        ORDER BY {order}
        LIMIT %(limit)s
    """, {'cid': content_id, 'ctype': content_type, 'limit': limit})

Upvoting

CommentVote
  comment_id      BIGINT
  user_id         BIGINT
  PRIMARY KEY (comment_id, user_id)

def upvote_comment(comment_id, user_id):
    # Atomic insert-or-delete toggle
    existing = db.execute("""
        SELECT 1 FROM CommentVote WHERE comment_id=%(cid)s AND user_id=%(uid)s
    """, {'cid': comment_id, 'uid': user_id}).first()

    if existing:
        db.execute("DELETE FROM CommentVote WHERE comment_id=%(cid)s AND user_id=%(uid)s",
                   {'cid': comment_id, 'uid': user_id})
        db.execute("UPDATE Comment SET upvote_count=upvote_count-1 WHERE id=%(id)s",
                   {'id': comment_id})
        return {'action': 'removed'}
    else:
        db.execute("INSERT INTO CommentVote VALUES (%(cid)s, %(uid)s)",
                   {'cid': comment_id, 'uid': user_id})
        db.execute("UPDATE Comment SET upvote_count=upvote_count+1 WHERE id=%(id)s",
                   {'id': comment_id})
        return {'action': 'added'}

Soft Delete

def delete_comment(comment_id, user_id):
    comment = db.get(Comment, comment_id)
    if comment.author_id != user_id:
        raise Forbidden()

    if comment.reply_count > 0:
        # Has replies: soft delete — show "[deleted]" but keep thread structure
        db.execute("""
            UPDATE Comment SET status='deleted', body='[deleted]', author_id=NULL
            WHERE id=%(id)s
        """, {'id': comment_id})
    else:
        # No replies: fully hide
        db.execute("UPDATE Comment SET status='hidden' WHERE id=%(id)s",
                   {'id': comment_id})
        # Decrement parent reply count
        if comment.parent_id:
            db.execute("UPDATE Comment SET reply_count=reply_count-1 WHERE id=%(id)s",
                       {'id': comment.parent_id})

Key Interview Points

  • Materialized path vs adjacency list: Adjacency list (just parent_id) requires recursive CTEs to fetch a thread. Materialized path allows fetching a whole subtree with a simple LIKE query and is much faster for read-heavy threads.
  • Limit nesting depth: Infinite nesting creates UX problems and recursive query risks. Cap at 3-5 levels; collapse deeper replies as flat continuations.
  • Denormalize counts: reply_count and upvote_count on the Comment row avoid COUNT(*) queries on hot paths. Accept occasional drift; reconcile nightly.
  • Sort “top” requires a compound cursor: Sorting by upvote_count is unstable (many comments may have the same count). Add created_at as a tiebreaker in both ORDER BY and cursor to produce stable pagination.

{“@context”:”https://schema.org”,”@type”:”FAQPage”,”mainEntity”:[{“@type”:”Question”,”name”:”Why use a materialized path over a recursive CTE for threaded comments?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”A recursive CTE (WITH RECURSIVE) to fetch a thread must traverse the adjacency list level by level — N rounds trips to the database for a thread N levels deep, or a single recursive query that PostgreSQL executes internally with a worktable scan. A materialized path (each comment stores its full ancestry path: ‘/1/5/23/’) allows fetching the entire subtree with a single non-recursive query: WHERE path LIKE ‘/1/%’. This is a B-tree prefix scan — fast and O(subtree size). Trade-off: updating paths when a comment is moved is expensive, but comments are rarely moved.”}},{“@type”:”Question”,”name”:”How do you implement cursor pagination for comments sorted by top votes?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Sort by (upvote_count DESC, created_at DESC) — two columns because many comments have the same vote count. The cursor encodes both values as a tuple. Next page query: WHERE (upvote_count, created_at) < (cursor_votes, cursor_ts). This keyset scan is O(log N) regardless of page depth. Without the two-column cursor, comments with equal vote counts produce non-deterministic ordering, causing items to appear on multiple pages or be skipped entirely. Encode the cursor as base64(JSON) in the API response so clients treat it opaquely.”}},{“@type”:”Question”,”name”:”How do you handle deleting a comment that has replies?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Never fully remove a comment that has child replies — this would leave orphaned replies with no parent in the tree (breaking thread structure). Instead: soft-delete by setting body="[deleted]", status=deleted, and author_id=NULL. The comment node remains in the tree as a placeholder, preserving the path for its children. Only fully hide a comment (and remove its path entry) if it has zero replies. Track reply_count on each comment as a denormalized count to make this check O(1) without a COUNT(*) query.”}},{“@type”:”Question”,”name”:”How do you prevent excessive nesting in threaded comments?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Cap nesting depth at a maximum (3-5 levels is typical for UX). When a user replies to a comment at the maximum depth, attach the new comment as a sibling of the deepest comment rather than a child. Example: max depth=3, user tries to reply to a depth-3 comment — set the new comment’s parent_id to the depth-3 comment’s parent (depth-2 comment) instead. Store the depth on each comment and check it at creation time. Display UI: "Replying to @username" to indicate context even though the nesting was flattened.”}},{“@type”:”Question”,”name”:”How do you denormalize comment counts on a post?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Maintain a comment_count column on the Post table, updated with UPDATE Post SET comment_count=comment_count+1 WHERE id=%(id)s every time a comment is inserted. For deletions: decrement by 1 if the comment was visible (don’t decrement for already-deleted comments). Accept occasional drift — if a transaction fails after inserting the comment but before updating the count, the count will be off by one. Run a reconciliation job nightly: UPDATE Post SET comment_count=(SELECT COUNT(*) FROM Comment WHERE content_id=Post.id AND status=’visible’).”}}]}

Comments system and threaded discussion design is discussed in Meta system design interview questions.

Comments system and reply threading design is covered in Twitter system design interview preparation.

Comments system and document discussion design is discussed in Atlassian system design interview guide.

Scroll to Top