Threaded Comment System: Low-Level Design
Threaded comment systems power discussion on news sites, forums, Reddit-style boards, and product pages. The core challenge is representing an arbitrarily deep tree in a relational database while keeping read queries fast and writes cheap. This article walks through the adjacency list with materialized path approach, vote aggregation that prevents double-voting, spam scoring with auto-hide, and soft-delete that preserves reply chains.
Data Model: Adjacency List + Materialized Path
A pure adjacency list stores only parent_id. Fetching a full subtree requires either recursive CTEs or multiple round-trips. Adding a materialized path column (e.g., /1/42/107/) lets you fetch any subtree with a single prefix query and order comments in tree order without post-processing.
SQL Schema
CREATE TABLE Comment (
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
post_id BIGINT UNSIGNED NOT NULL,
parent_id BIGINT UNSIGNED NULL, -- NULL = root comment
path VARCHAR(1000) NOT NULL, -- materialized path e.g. /1/42/107/
depth TINYINT UNSIGNED NOT NULL DEFAULT 0,
author_id BIGINT UNSIGNED NOT NULL,
body TEXT NOT NULL,
vote_score INT NOT NULL DEFAULT 0,
spam_score DECIMAL(5,4) NOT NULL DEFAULT 0.0000,
deleted_at DATETIME NULL,
created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (id),
INDEX idx_post_path (post_id, path),
INDEX idx_parent (parent_id),
INDEX idx_post_score (post_id, vote_score DESC),
INDEX idx_spam (spam_score, deleted_at)
) ENGINE=InnoDB;
CREATE TABLE CommentVote (
comment_id BIGINT UNSIGNED NOT NULL,
user_id BIGINT UNSIGNED NOT NULL,
value TINYINT NOT NULL, -- +1 or -1
created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (comment_id, user_id),
INDEX idx_user_votes (user_id)
) ENGINE=InnoDB;
The path column is computed at insert time and never changes. Depth is stored redundantly to avoid counting slashes at query time. The CommentVote composite primary key enforces exactly one vote per user per comment at the database level.
Depth Limit Enforcement
Most platforms cap nesting at 6–10 levels to prevent UI collapse and runaway path lengths. Depth is checked before insert; if exceeded, the reply is re-parented to the deepest allowed ancestor.
Python Implementation
import re
from datetime import datetime
from decimal import Decimal
import db # thin wrapper around mysql-connector-python
MAX_DEPTH = 6
SPAM_AUTO_HIDE_THRESHOLD = 0.75
def post_comment(post_id: int, author_id: int, parent_id: int | None, body: str) -> int:
"""Insert a comment and return its new ID."""
if parent_id is None:
parent_path = "/"
parent_depth = -1
else:
row = db.fetchone(
"SELECT path, depth, deleted_at FROM Comment WHERE id = %s AND post_id = %s",
(parent_id, post_id)
)
if not row:
raise ValueError("Parent comment not found")
if row["deleted_at"] is not None:
raise ValueError("Cannot reply to a deleted comment")
parent_path = row["path"]
parent_depth = row["depth"]
new_depth = parent_depth + 1
if new_depth > MAX_DEPTH:
# flatten: re-parent to the root of the branch at MAX_DEPTH
parts = parent_path.strip("/").split("/")
ancestor_id = int(parts[MAX_DEPTH - 1])
return post_comment(post_id, author_id, ancestor_id, body)
spam = _score_spam(body, author_id)
with db.transaction() as conn:
cursor = conn.cursor()
cursor.execute(
"""INSERT INTO Comment (post_id, parent_id, path, depth, author_id, body, spam_score)
VALUES (%s, %s, %s, %s, %s, %s, %s)""",
(post_id, parent_id, parent_path, new_depth, author_id, body, spam)
)
new_id = cursor.lastrowid
new_path = f"{parent_path}{new_id}/"
cursor.execute("UPDATE Comment SET path = %s WHERE id = %s", (new_path, new_id))
return new_id
def get_thread(post_id: int, sort: str = "best", page: int = 1, page_size: int = 50) -> list[dict]:
"""Return a flat list of comments for a post in tree order."""
order = "vote_score DESC, created_at ASC" if sort == "best" else "path ASC"
offset = (page - 1) * page_size
rows = db.fetchall(
f"""SELECT id, parent_id, depth, author_id, body, vote_score, spam_score,
deleted_at, created_at
FROM Comment
WHERE post_id = %s
ORDER BY {order}
LIMIT %s OFFSET %s""",
(post_id, page_size, offset)
)
result = []
for r in rows:
visible_body = r["body"]
if r["deleted_at"]:
visible_body = "[deleted]"
elif r["spam_score"] >= SPAM_AUTO_HIDE_THRESHOLD:
visible_body = "[flagged as spam — click to reveal]"
result.append({**r, "body": visible_body})
return result
def cast_vote(comment_id: int, user_id: int, value: int) -> dict:
"""Cast or change a vote. value must be +1 or -1."""
if value not in (1, -1):
raise ValueError("Vote value must be +1 or -1")
with db.transaction() as conn:
cursor = conn.cursor(dictionary=True)
cursor.execute(
"SELECT value FROM CommentVote WHERE comment_id = %s AND user_id = %s FOR UPDATE",
(comment_id, user_id)
)
existing = cursor.fetchone()
if existing:
if existing["value"] == value:
# undo vote
cursor.execute(
"DELETE FROM CommentVote WHERE comment_id = %s AND user_id = %s",
(comment_id, user_id)
)
delta = -value
else:
# flip vote
cursor.execute(
"UPDATE CommentVote SET value = %s WHERE comment_id = %s AND user_id = %s",
(value, comment_id, user_id)
)
delta = value * 2
else:
cursor.execute(
"INSERT INTO CommentVote (comment_id, user_id, value) VALUES (%s, %s, %s)",
(comment_id, user_id, value)
)
delta = value
cursor.execute(
"UPDATE Comment SET vote_score = vote_score + %s WHERE id = %s",
(delta, comment_id)
)
cursor.execute("SELECT vote_score FROM Comment WHERE id = %s", (comment_id,))
new_score = cursor.fetchone()["vote_score"]
return {"comment_id": comment_id, "new_score": new_score}
def soft_delete_comment(comment_id: int, actor_id: int) -> None:
"""Soft-delete a comment; replies remain visible."""
db.execute(
"UPDATE Comment SET deleted_at = %s WHERE id = %s",
(datetime.utcnow(), comment_id)
)
def _score_spam(body: str, author_id: int) -> float:
"""Placeholder: real implementation calls an ML service."""
score = 0.0
if re.search(r"(https?://){2,}", body):
score += 0.4
if len(body) < 10:
score += 0.2
return min(score, 1.0)
Soft-Delete Semantics
Setting deleted_at hides the body but preserves the row so that child comments keep a valid parent_id reference. The application substitutes [deleted] at read time. A hard-delete cascade would orphan the entire subtree.
Vote Aggregation and Double-Vote Prevention
The CommentVote primary key (comment_id, user_id) is the single source of truth. The vote_score on Comment is a denormalized counter updated transactionally with the vote row. Using SELECT ... FOR UPDATE prevents concurrent vote flips from racing. On high-traffic comments, a Redis sorted set can absorb vote bursts with periodic flush-back to MySQL.
Spam Scoring and Auto-Hide
Each comment receives a spam_score in [0, 1] from a lightweight classifier (URL count, account age, text entropy). Comments above a threshold (e.g., 0.75) are stored but rendered as collapsed/hidden. Moderators see the raw score in the admin panel and can override. A background job re-scores comments when the model is retrained and updates spam_score in bulk using a partition scan on the index.
Collapse/Expand and Rendering
The front end receives the flat list ordered by path. Each item’s depth tells the renderer how far to indent. Comments with vote_score below a per-depth threshold are collapsed by default. The path prefix query WHERE path LIKE '/1/42/%' fetches any subtree for lazy-load on expand.
{
“@context”: “https://schema.org”,
“@type”: “FAQPage”,
“mainEntity”: [
{
“@type”: “Question”,
“name”: “What is the difference between adjacency list and nested sets for threaded comments?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “An adjacency list stores only parent_id per row, making writes cheap but subtree reads require recursive CTEs. Nested sets store left/right boundary values enabling fast subtree reads with a range query but making inserts expensive because boundaries must be renumbered. A materialized path (storing the full ancestor chain as a string) is a common middle ground: inserts are O(1) and subtree reads are a single prefix query.”
}
},
{
“@type”: “Question”,
“name”: “How do you prevent double-voting on comments?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Store one row per (comment_id, user_id) pair in a CommentVote table with a composite primary key. The database enforces uniqueness. On a second vote for the same direction, delete the row (undo). On a flip, update the existing row and adjust the denormalized score by 2. Use SELECT FOR UPDATE inside a transaction to prevent race conditions on concurrent vote changes.”
}
},
{
“@type”: “Question”,
“name”: “How does soft-delete work in a threaded comment system?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Soft-delete sets a deleted_at timestamp instead of removing the row. Child comments keep valid parent_id references, so the tree structure is preserved. At read time the application substitutes the body with ‘[deleted]’. Hard deletes would orphan the entire subtree and break the path hierarchy.”
}
},
{
“@type”: “Question”,
“name”: “How do you handle spam in a comment system at scale?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Assign each comment a spam_score in [0,1] at write time using a lightweight classifier (URL density, account age, text entropy). Auto-hide comments above a threshold (e.g., 0.75) without deleting them. Store the score in the Comment table with an index so moderators can scan flagged comments efficiently. Re-score in bulk when the model is retrained using a partition scan.”
}
}
]
}
{“@context”:”https://schema.org”,”@type”:”FAQPage”,”mainEntity”:[{“@type”:”Question”,”name”:”Why use a materialized path alongside adjacency list for comments?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Adjacency list (parent_id) enables easy single-level queries; materialized path enables efficient subtree retrieval with a single LIKE ‘path/%’ query without recursion.”}},{“@type”:”Question”,”name”:”How is double-voting prevented?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”CommentVote has a unique constraint on (comment_id, user_id); an INSERT ON CONFLICT DO NOTHING prevents duplicate votes atomically.”}},{“@type”:”Question”,”name”:”How does soft-delete preserve reply context?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”The deleted_at timestamp is set but the row remains; the UI renders “[deleted]” for body/author while replies remain visible with their original content.”}},{“@type”:”Question”,”name”:”How are spam scores computed and thresholds applied?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”A background classifier updates spam_score on new comments; comments exceeding a threshold are auto-hidden (visible only to the author and moderators) pending review.”}}]}
See also: Twitter/X Interview Guide 2026: Timeline Algorithms, Real-Time Search, and Content at Scale
See also: Meta Interview Guide 2026: Facebook, Instagram, WhatsApp Engineering