Comments System — Low-Level Design
A comments system supports threaded discussion on content: flat comments, nested replies, sorting, pagination, and moderation. This appears in interviews at Reddit, YouTube, and any content platform where user discussion is a core feature.
Core Data Model
Comment
id BIGSERIAL PK
content_id BIGINT NOT NULL -- post, video, article being commented on
content_type TEXT NOT NULL -- 'post', 'video', 'product'
author_id BIGINT NOT NULL
parent_id BIGINT -- null = top-level, non-null = reply
body TEXT NOT NULL
status TEXT DEFAULT 'visible' -- visible, hidden, deleted, flagged
upvote_count INT DEFAULT 0 -- denormalized
reply_count INT DEFAULT 0 -- denormalized
depth INT DEFAULT 0 -- 0=top-level, 1=reply, 2=reply-to-reply
path TEXT -- materialized path: '1/5/23/'
created_at TIMESTAMPTZ
edited_at TIMESTAMPTZ
-- Indexes
CREATE INDEX idx_comments_content ON Comment(content_id, content_type, status, created_at DESC);
CREATE INDEX idx_comments_parent ON Comment(parent_id, created_at ASC) WHERE parent_id IS NOT NULL;
CREATE INDEX idx_comments_path ON Comment(path text_pattern_ops);
Nested Comments: Materialized Path
-- Each comment stores the path from root to itself
-- path = '/' for top-level, '/1/', '/1/5/', '/1/5/23/' for nested
def create_comment(content_id, content_type, author_id, body, parent_id=None):
if parent_id:
parent = db.get(Comment, parent_id)
if parent.depth >= 5: # Limit nesting depth
parent_id = parent.id # Reply to this level but don't go deeper
path = parent.path + str(parent_id) + '/'
depth = parent.depth + 1
else:
path = '/'
depth = 0
comment = Comment.create(
content_id=content_id,
content_type=content_type,
author_id=author_id,
body=body,
parent_id=parent_id,
path=path,
depth=depth,
)
if parent_id:
# Increment parent's reply count
db.execute("""
UPDATE Comment SET reply_count=reply_count+1
WHERE id=%(id)s
""", {'id': parent_id})
return comment
Fetching a Thread
-- Fetch all comments in a thread rooted at a top-level comment
-- Using materialized path: all descendants have path LIKE '/123/%'
SELECT * FROM Comment
WHERE (id = %(root_id)s OR path LIKE %(root_path)s)
AND status = 'visible'
ORDER BY path ASC, created_at ASC;
-- Reconstruct tree in application code:
def build_tree(flat_comments):
by_id = {c.id: {**c, 'replies': []} for c in flat_comments}
roots = []
for c in flat_comments:
if c.parent_id and c.parent_id in by_id:
by_id[c.parent_id]['replies'].append(by_id[c.id])
else:
roots.append(by_id[c.id])
return roots
Top-Level Comment Pagination
-- Sort options: newest, oldest, top (by upvote_count)
-- Cursor pagination for stable results under concurrent new comments
def get_comments(content_id, content_type, sort='newest', cursor=None, limit=20):
if sort == 'newest':
order = 'created_at DESC'
cursor_clause = 'AND created_at < %(cursor)s' if cursor else ''
elif sort == 'top':
order = 'upvote_count DESC, created_at DESC'
cursor_clause = 'AND (upvote_count, created_at) < (%(cursor_votes)s, %(cursor_ts)s)' if cursor else ''
return db.execute(f"""
SELECT * FROM Comment
WHERE content_id=%(cid)s AND content_type=%(ctype)s
AND parent_id IS NULL
AND status='visible'
{cursor_clause}
ORDER BY {order}
LIMIT %(limit)s
""", {'cid': content_id, 'ctype': content_type, 'limit': limit})
Upvoting
CommentVote
comment_id BIGINT
user_id BIGINT
PRIMARY KEY (comment_id, user_id)
def upvote_comment(comment_id, user_id):
# Atomic insert-or-delete toggle
existing = db.execute("""
SELECT 1 FROM CommentVote WHERE comment_id=%(cid)s AND user_id=%(uid)s
""", {'cid': comment_id, 'uid': user_id}).first()
if existing:
db.execute("DELETE FROM CommentVote WHERE comment_id=%(cid)s AND user_id=%(uid)s",
{'cid': comment_id, 'uid': user_id})
db.execute("UPDATE Comment SET upvote_count=upvote_count-1 WHERE id=%(id)s",
{'id': comment_id})
return {'action': 'removed'}
else:
db.execute("INSERT INTO CommentVote VALUES (%(cid)s, %(uid)s)",
{'cid': comment_id, 'uid': user_id})
db.execute("UPDATE Comment SET upvote_count=upvote_count+1 WHERE id=%(id)s",
{'id': comment_id})
return {'action': 'added'}
Soft Delete
def delete_comment(comment_id, user_id):
comment = db.get(Comment, comment_id)
if comment.author_id != user_id:
raise Forbidden()
if comment.reply_count > 0:
# Has replies: soft delete — show "[deleted]" but keep thread structure
db.execute("""
UPDATE Comment SET status='deleted', body='[deleted]', author_id=NULL
WHERE id=%(id)s
""", {'id': comment_id})
else:
# No replies: fully hide
db.execute("UPDATE Comment SET status='hidden' WHERE id=%(id)s",
{'id': comment_id})
# Decrement parent reply count
if comment.parent_id:
db.execute("UPDATE Comment SET reply_count=reply_count-1 WHERE id=%(id)s",
{'id': comment.parent_id})
Key Interview Points
- Materialized path vs adjacency list: Adjacency list (just parent_id) requires recursive CTEs to fetch a thread. Materialized path allows fetching a whole subtree with a simple LIKE query and is much faster for read-heavy threads.
- Limit nesting depth: Infinite nesting creates UX problems and recursive query risks. Cap at 3-5 levels; collapse deeper replies as flat continuations.
- Denormalize counts: reply_count and upvote_count on the Comment row avoid COUNT(*) queries on hot paths. Accept occasional drift; reconcile nightly.
- Sort “top” requires a compound cursor: Sorting by upvote_count is unstable (many comments may have the same count). Add created_at as a tiebreaker in both ORDER BY and cursor to produce stable pagination.
{“@context”:”https://schema.org”,”@type”:”FAQPage”,”mainEntity”:[{“@type”:”Question”,”name”:”Why use a materialized path over a recursive CTE for threaded comments?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”A recursive CTE (WITH RECURSIVE) to fetch a thread must traverse the adjacency list level by level — N rounds trips to the database for a thread N levels deep, or a single recursive query that PostgreSQL executes internally with a worktable scan. A materialized path (each comment stores its full ancestry path: ‘/1/5/23/’) allows fetching the entire subtree with a single non-recursive query: WHERE path LIKE ‘/1/%’. This is a B-tree prefix scan — fast and O(subtree size). Trade-off: updating paths when a comment is moved is expensive, but comments are rarely moved.”}},{“@type”:”Question”,”name”:”How do you implement cursor pagination for comments sorted by top votes?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Sort by (upvote_count DESC, created_at DESC) — two columns because many comments have the same vote count. The cursor encodes both values as a tuple. Next page query: WHERE (upvote_count, created_at) < (cursor_votes, cursor_ts). This keyset scan is O(log N) regardless of page depth. Without the two-column cursor, comments with equal vote counts produce non-deterministic ordering, causing items to appear on multiple pages or be skipped entirely. Encode the cursor as base64(JSON) in the API response so clients treat it opaquely.”}},{“@type”:”Question”,”name”:”How do you handle deleting a comment that has replies?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Never fully remove a comment that has child replies — this would leave orphaned replies with no parent in the tree (breaking thread structure). Instead: soft-delete by setting body="[deleted]", status=deleted, and author_id=NULL. The comment node remains in the tree as a placeholder, preserving the path for its children. Only fully hide a comment (and remove its path entry) if it has zero replies. Track reply_count on each comment as a denormalized count to make this check O(1) without a COUNT(*) query.”}},{“@type”:”Question”,”name”:”How do you prevent excessive nesting in threaded comments?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Cap nesting depth at a maximum (3-5 levels is typical for UX). When a user replies to a comment at the maximum depth, attach the new comment as a sibling of the deepest comment rather than a child. Example: max depth=3, user tries to reply to a depth-3 comment — set the new comment’s parent_id to the depth-3 comment’s parent (depth-2 comment) instead. Store the depth on each comment and check it at creation time. Display UI: "Replying to @username" to indicate context even though the nesting was flattened.”}},{“@type”:”Question”,”name”:”How do you denormalize comment counts on a post?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Maintain a comment_count column on the Post table, updated with UPDATE Post SET comment_count=comment_count+1 WHERE id=%(id)s every time a comment is inserted. For deletions: decrement by 1 if the comment was visible (don’t decrement for already-deleted comments). Accept occasional drift — if a transaction fails after inserting the comment but before updating the count, the count will be off by one. Run a reconciliation job nightly: UPDATE Post SET comment_count=(SELECT COUNT(*) FROM Comment WHERE content_id=Post.id AND status=’visible’).”}}]}
Comments system and threaded discussion design is discussed in Meta system design interview questions.
Comments system and reply threading design is covered in Twitter system design interview preparation.
Comments system and document discussion design is discussed in Atlassian system design interview guide.