Problem Statement
Design a content moderation system for a social platform. User-generated content (text, images, video) must be screened before or immediately after publishing. The system must handle millions of items per day with minimal false positives (blocking legitimate content) while catching genuine policy violations.
Core Entities
from enum import Enum
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

class ContentType(Enum):
    TEXT = "TEXT"
    IMAGE = "IMAGE"
    VIDEO = "VIDEO"

class ContentStatus(Enum):
    PENDING = "PENDING"    # awaiting automated review
    APPROVED = "APPROVED"  # passed all checks, visible
    REJECTED = "REJECTED"  # violated policy, hidden
    APPEALED = "APPEALED"  # user contested rejection

@dataclass
class ContentItem:
    content_id: str
    type: ContentType
    author_id: str
    raw_content: str  # text or storage URL for media
    status: ContentStatus
    created_at: datetime
    ml_score: Optional[float] = None  # 0.0 (clean) to 1.0 (toxic)
    reject_reason: Optional[str] = None

@dataclass
class ModerationDecision:
    content_id: str
    decision: ContentStatus  # APPROVED / REJECTED
    reason: str
    confidence: float
    reviewer_id: Optional[str] = None  # None = automated
Moderation Pipeline
Content flows through layers from fast/cheap to slow/expensive. Most content is resolved automatically; borderline cases escalate to humans.
Content Submitted
|
v
[1] Rule Engine (sync, <10ms)
|-- BLOCK -> REJECTED immediately
|-- ALLOW -> continue to ML
|-- FLAG -> bump priority in ML queue
|
v
[2] ML Scoring (async, 100-500ms)
|-- score < 0.3 -> APPROVED (auto-publish)
|-- score > 0.7 -> REJECTED (auto-reject)
|-- score 0.3-0.7 -> Human Review Queue
|
v
[3] Human Review Queue
|-- Reviewer claims item
|-- Decision: APPROVED / REJECTED + reason
|
v
[4] Appeals (for rejected content)
|-- User submits appeal
|-- Senior reviewer assigned
|-- Outcome: REINSTATE or UPHOLD
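Tying the layers together, a minimal synchronous orchestrator might look like the sketch below. `rule_engine`, `ml_service`, and `review_queue` are assumed interfaces standing in for the components detailed in the following sections, and rule actions are plain strings for brevity; in production stage [2] runs asynchronously behind a queue rather than inline.

```python
from enum import Enum

class Verdict(Enum):
    APPROVED = "APPROVED"
    REJECTED = "REJECTED"
    PENDING = "PENDING"

def moderate(item, ctx, rule_engine, ml_service, review_queue,
             t_approve=0.3, t_reject=0.7):
    """Route one item through the three layers (synchronous sketch)."""
    action = rule_engine.evaluate(item, ctx)       # [1] rule engine
    if action == "BLOCK":
        return Verdict.REJECTED                    # hard stop, never published
    flagged = (action == "FLAG")                   # FLAG forces human review

    score = ml_service.score(item)                 # [2] ML scoring
    if score < t_approve and not flagged:
        return Verdict.APPROVED                    # auto-publish
    if score > t_reject:
        return Verdict.REJECTED                    # auto-reject
    review_queue.append((score, item))             # [3] borderline/flagged -> humans
    return Verdict.PENDING
```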
Rule Engine
The rule engine is synchronous and must run before content is stored or visible. It handles clear-cut violations that do not require ML inference.
from dataclasses import dataclass
from enum import Enum
from typing import Callable
import re

class RuleAction(Enum):
    BLOCK = "BLOCK"  # immediate rejection
    FLAG = "FLAG"    # send to human review with high priority
    ALLOW = "ALLOW"  # pass to next stage

@dataclass
class Rule:
    rule_id: str
    description: str
    action: RuleAction
    evaluate: Callable  # function(content_item, user_context) -> bool

class RuleEngine:
    def __init__(self):
        self.rules = [
            Rule("keyword_blocklist", "Matches blocked keywords", RuleAction.BLOCK,
                 lambda item, ctx: any(kw in item.raw_content.lower()
                                       for kw in KEYWORD_BLOCKLIST)),
            Rule("url_pattern", "Matches known malicious URL patterns", RuleAction.BLOCK,
                 lambda item, ctx: bool(re.search(MALICIOUS_URL_PATTERN, item.raw_content))),
            Rule("new_account_auto_review",
                 "Accounts < 7 days old get human review",
                 RuleAction.FLAG,
                 lambda item, ctx: ctx.account_age_days < 7),
            Rule("high_report_count",
                 "Content reported > 5 times by users",
                 RuleAction.FLAG,
                 lambda item, ctx: ctx.report_count > 5),
            Rule("repeat_offender",
                 "Author had 3+ rejections in past 30 days",
                 RuleAction.FLAG,
                 lambda item, ctx: ctx.author_rejections_30d >= 3),
        ]

    def evaluate(self, content_item, user_context) -> RuleAction:
        for rule in self.rules:
            if rule.evaluate(content_item, user_context):
                return rule.action
        return RuleAction.ALLOW
Rules are stored in a database and loaded at startup (or hot-reloaded). Trust and Safety teams can add/modify rules without a code deploy – this is essential for responding quickly to new abuse patterns.
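One way to support that hot-reload safely is to store rules as declarative rows and compile them at load time, so the database never supplies executable code. The row shape and rule kinds below are illustrative assumptions, not the exact schema used elsewhere in this document.

```python
import json
import re
import threading

# Hypothetical row shape for a rules table:
#   {"rule_id": ..., "action": "BLOCK"/"FLAG", "kind": "keyword"/"regex", "pattern": ...}
def compile_rule(row):
    """Turn one declarative rule row into (rule_id, action, check_fn)."""
    if row["kind"] == "keyword":
        keywords = json.loads(row["pattern"])  # e.g. '["scam-phrase"]'
        check = lambda text: any(kw in text.lower() for kw in keywords)
    elif row["kind"] == "regex":
        rx = re.compile(row["pattern"])
        check = lambda text: bool(rx.search(text))
    else:
        raise ValueError(f"unknown rule kind: {row['kind']}")
    return (row["rule_id"], row["action"], check)

class HotReloadingRuleEngine:
    def __init__(self, fetch_rows):
        self._fetch = fetch_rows  # callable returning the current rule rows
        self._lock = threading.Lock()
        self._rules = []
        self.reload()

    def reload(self):
        """Recompile all rules, then swap the list atomically."""
        rules = [compile_rule(r) for r in self._fetch()]
        with self._lock:
            self._rules = rules

    def evaluate(self, text):
        with self._lock:
            rules = list(self._rules)
        for rule_id, action, check in rules:
            if check(text):
                return action
        return "ALLOW"
```

A background timer (or a pub/sub notification from the admin tool) would call `reload()` periodically; the atomic swap means in-flight evaluations always see a complete rule set.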
ML Scoring Service
APPROVE_THRESHOLD = 0.3  # below this: auto-approve
REJECT_THRESHOLD = 0.7   # above this: auto-reject
                         # between: human review queue

class MLScoringService:
    def __init__(self, model_client, threshold_approve=0.3, threshold_reject=0.7):
        self.model = model_client
        self.t_approve = threshold_approve
        self.t_reject = threshold_reject

    def score_and_route(self, content_item) -> ModerationDecision:
        score = self.model.predict_toxicity(
            content=content_item.raw_content,
            content_type=content_item.type.value
        )
        content_item.ml_score = score
        if score < self.t_approve:
            return ModerationDecision(
                content_id=content_item.content_id,
                decision=ContentStatus.APPROVED,
                reason="ML score below approval threshold",
                confidence=1.0 - score
            )
        elif score > self.t_reject:
            return ModerationDecision(
                content_id=content_item.content_id,
                decision=ContentStatus.REJECTED,
                reason="ML score above rejection threshold",
                confidence=score
            )
        else:
            # Route to human review - do not make a decision
            human_review_queue.enqueue(content_item, priority=score)
            return ModerationDecision(
                content_id=content_item.content_id,
                decision=ContentStatus.PENDING,
                reason="Borderline score - human review required",
                confidence=score
            )
Model Considerations
- Text: fine-tuned BERT/RoBERTa for toxicity classification
- Images: CNN-based classifier, perceptual hash matching against known CSAM/spam hash databases
- Video: sample frames at 1fps, run image classifier on each
- Threshold tuning: adjust based on precision/recall tradeoff for the platform – lower t_approve catches more borderline content, higher t_reject reduces false positives
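The video strategy above amounts to frame subsampling plus max-aggregation of per-frame scores, so a single bad frame flags the whole video. A minimal sketch, assuming `frames` is an already-decoded in-memory sequence and `image_scorer` is the image classifier (real pipelines decode with a tool like ffmpeg and batch frames through the model):

```python
def score_video(frames, native_fps, image_scorer, sample_fps=1.0):
    """Sample frames at ~sample_fps and return the max per-frame toxicity score."""
    if not frames:
        return 0.0
    # e.g. 30fps video sampled at 1fps -> score every 30th frame
    step = max(1, round(native_fps / sample_fps))
    sampled = frames[::step]
    return max(image_scorer(f) for f in sampled)
```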
Human Review Queue
Queue Design
@dataclass
class ReviewQueueItem:
    content_id: str
    priority: float  # higher = more urgent (based on ML score, report count)
    claimed_by: Optional[str]  # reviewer_id or None
    claimed_at: Optional[datetime]
    expires_at: Optional[datetime]  # lock TTL - released if reviewer abandons
    created_at: datetime

class HumanReviewQueue:
    LOCK_TTL_SECONDS = 300  # 5 minutes to complete review

    def claim_next(self, reviewer_id: str) -> Optional[ReviewQueueItem]:
        """Atomically claim the highest-priority unclaimed item."""
        with db.transaction():
            item = db.query_one(
                """SELECT * FROM review_queue
                   WHERE claimed_by IS NULL OR expires_at < NOW()
                   ORDER BY priority DESC, created_at ASC
                   LIMIT 1
                   FOR UPDATE SKIP LOCKED"""
            )
            if item is None:
                return None
            db.execute(
                "UPDATE review_queue SET claimed_by=%s, claimed_at=NOW(), "
                "expires_at=NOW() + INTERVAL '%s seconds' WHERE content_id=%s",
                (reviewer_id, self.LOCK_TTL_SECONDS, item.content_id)
            )
            return item

    def submit_decision(self, reviewer_id: str, content_id: str,
                        decision: ContentStatus, reason: str):
        item = db.get_queue_item(content_id)
        if item.claimed_by != reviewer_id:
            raise PermissionError("Item claimed by another reviewer")
        if datetime.utcnow() > item.expires_at:
            raise LockExpiredError("Lock expired - re-claim item")
        # Persist decision
        db.execute(
            "UPDATE content_items SET status=%s, reject_reason=%s WHERE content_id=%s",
            (decision.value, reason, content_id)
        )
        db.execute("DELETE FROM review_queue WHERE content_id=%s", (content_id,))
        notify_author(content_id, decision)
The FOR UPDATE SKIP LOCKED Pattern
This is the key to the queue. FOR UPDATE locks the selected row; SKIP LOCKED makes other concurrent claims skip that row instead of blocking. Multiple reviewers can work simultaneously without interfering with each other – each gets a different item.
Appeals Workflow
@dataclass
class Appeal:
    appeal_id: str
    content_id: str
    appellant_id: str  # user submitting appeal
    reason: str        # user's explanation
    status: str        # PENDING / UPHELD / OVERTURNED
    senior_reviewer_id: Optional[str]
    created_at: datetime
    resolved_at: Optional[datetime]

class AppealsService:
    MAX_APPEALS_PER_USER_PER_DAY = 3

    def submit_appeal(self, user_id: str, content_id: str, reason: str) -> Appeal:
        # Validate content is actually rejected and owned by user
        content = db.get_content(content_id)
        assert content.status == ContentStatus.REJECTED
        assert content.author_id == user_id
        # Rate limit appeals
        today_count = db.count_appeals_today(user_id)
        if today_count >= self.MAX_APPEALS_PER_USER_PER_DAY:
            raise RateLimitError("Daily appeal limit reached")
        # Set content to APPEALED (visible to senior reviewers only)
        db.execute(
            "UPDATE content_items SET status='APPEALED' WHERE content_id=%s", content_id
        )
        appeal = Appeal(
            appeal_id=generate_uuid(),
            content_id=content_id,
            appellant_id=user_id,
            reason=reason,
            status="PENDING",
            senior_reviewer_id=None,
            created_at=datetime.utcnow(),
            resolved_at=None
        )
        db.save(appeal)
        senior_review_queue.enqueue(appeal)
        return appeal

    def resolve_appeal(self, senior_reviewer_id: str, appeal_id: str, uphold_rejection: bool):
        appeal = db.get_appeal(appeal_id)
        new_status = "UPHELD" if uphold_rejection else "OVERTURNED"
        content_status = ContentStatus.REJECTED if uphold_rejection else ContentStatus.APPROVED
        db.execute(
            "UPDATE content_items SET status=%s WHERE content_id=%s",
            (content_status.value, appeal.content_id)
        )
        db.execute(
            "UPDATE appeals SET status=%s, senior_reviewer_id=%s, resolved_at=NOW() WHERE appeal_id=%s",
            (new_status, senior_reviewer_id, appeal_id)
        )
        notify_user(appeal.appellant_id, content_status)
Schema Design
CREATE TABLE content_items (
    content_id VARCHAR(36) PRIMARY KEY,
    type VARCHAR(10) NOT NULL,  -- TEXT / IMAGE / VIDEO
    author_id VARCHAR(36) NOT NULL,
    status VARCHAR(20) NOT NULL DEFAULT 'PENDING',
    ml_score DECIMAL(4,3),
    reject_reason TEXT,
    created_at TIMESTAMP NOT NULL,
    reviewed_at TIMESTAMP,
    INDEX idx_author_status (author_id, status),
    INDEX idx_status_created (status, created_at)
);
CREATE TABLE review_queue (
    content_id VARCHAR(36) PRIMARY KEY,
    priority DECIMAL(5,4) NOT NULL,
    claimed_by VARCHAR(36),
    claimed_at TIMESTAMP,
    expires_at TIMESTAMP,
    created_at TIMESTAMP NOT NULL
);

-- Partial index (PostgreSQL): covers only unclaimed rows, ordered for claim_next
CREATE INDEX idx_priority_unclaimed ON review_queue (priority DESC, created_at)
    WHERE claimed_by IS NULL;
CREATE TABLE appeals (
    appeal_id VARCHAR(36) PRIMARY KEY,
    content_id VARCHAR(36) NOT NULL,
    appellant_id VARCHAR(36) NOT NULL,
    reason TEXT NOT NULL,
    status VARCHAR(20) NOT NULL DEFAULT 'PENDING',
    senior_reviewer_id VARCHAR(36),
    created_at TIMESTAMP NOT NULL,
    resolved_at TIMESTAMP,
    INDEX idx_status (status, created_at)
);
Scale Considerations
- Volume: 10M posts/day = ~116/sec. Rule engine handles this synchronously; ML scoring is async via Kafka
- ML latency: batch inference for backlog, real-time inference only for high-priority content (verified accounts, trending)
- Human review capacity: tune thresholds so only 1-5% of content reaches human review. At 10M/day, 1% = 100k items – need significant reviewer headcount
- Queue SLA: high-severity content (CSAM detection, credible threats) gets instant escalation outside normal queue
- Audit trail: every status transition logged immutably for regulatory compliance
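The reviewer-headcount claim above can be sanity-checked with back-of-envelope arithmetic; the reviews-per-hour and shift-length figures below are illustrative assumptions, not platform data.

```python
import math

def reviewers_needed(items_per_day, review_rate, reviews_per_hour=60, shift_hours=8):
    """Back-of-envelope headcount: items escalated to humans per day
    divided by one reviewer's assumed daily throughput."""
    escalated = items_per_day * review_rate
    per_reviewer = reviews_per_hour * shift_hours
    return math.ceil(escalated / per_reviewer)

# 10M posts/day with 1% escalation, ~60 reviews/hour, 8h shifts
print(reviewers_needed(10_000_000, 0.01))  # -> 209
```

Small threshold changes move this number a lot: doubling the escalation rate to 2% doubles the headcount, which is why threshold tuning is as much a staffing decision as a model decision.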
Interview Checklist
- Define the entities and status state machine first
- Explain the three-layer pipeline: rules (sync) -> ML (async) -> human
- Cover the claim/lock pattern for review queue (FOR UPDATE SKIP LOCKED)
- Mention threshold tuning as a business decision, not just a technical one
- Address appeals as a requirement – gives users recourse, legally important
- Discuss false positive rate explicitly – over-moderation is also a product failure
Frequently Asked Questions
How should content flow through the moderation pipeline?
A three-layer pipeline: (1) Synchronous rule engine that runs in under 10ms – catches clear violations like keyword blocklists, known malicious URLs, and new account flags. (2) Async ML scoring that produces a toxicity probability (0-1) – content below 0.3 is auto-approved, above 0.7 is auto-rejected, and borderline 0.3-0.7 goes to human review. (3) Human review queue with claim/lock mechanics for borderline cases. Most platforms aim for 95-99% of content resolved automatically without human review.
How do you stop two reviewers from claiming the same queue item?
Use the SELECT … FOR UPDATE SKIP LOCKED pattern in PostgreSQL (or equivalent in other databases). FOR UPDATE locks the selected row; SKIP LOCKED causes other concurrent transactions to skip already-locked rows instead of blocking. This lets multiple reviewers query the queue simultaneously, each getting a different item instantly without waiting. Combine with a lock TTL (e.g., 5 minutes): if a reviewer abandons the item, the lock expires and another reviewer can claim it.
How do you tune the auto-approve and auto-reject thresholds?
Threshold tuning is a business decision, not just a technical one. Lowering the approval threshold (e.g., from 0.3 to 0.2) catches more borderline content but increases false positives – legitimate content gets flagged. Raising the rejection threshold reduces false positives but lets more harmful content through. Monitor precision (of rejections, how many were truly violations) and recall (of actual violations, how many did you catch). Also track appeal overturn rate – high overturn rate means your thresholds or model are too aggressive.
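Those monitoring metrics can be computed from raw confusion counts; a minimal sketch, with the count names chosen here for illustration:

```python
def moderation_metrics(true_pos, false_pos, false_neg,
                       appeals_overturned, appeals_resolved):
    """Rejection precision/recall plus appeal overturn rate.

    true_pos  - rejected items that truly violated policy
    false_pos - rejected items that were actually legitimate
    false_neg - violating items that slipped through
    """
    rejected = true_pos + false_pos
    violating = true_pos + false_neg
    return {
        "precision": true_pos / rejected if rejected else 0.0,
        "recall": true_pos / violating if violating else 0.0,
        "overturn_rate": appeals_overturned / appeals_resolved if appeals_resolved else 0.0,
    }
```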
What does an appeal record need to capture?
An appeal record needs: the content ID, the appellant user ID, the user's stated reason for appeal, timestamp, current status (PENDING/UPHELD/OVERTURNED), and the senior reviewer ID and resolution timestamp once resolved. Rate limit appeals per user per day (e.g., 3 per day) to prevent abuse. Route appeals to senior reviewers separate from the standard queue. Log the original rejection reason alongside the appeal reason so reviewers have full context. Notify the user of the outcome regardless of result.
How can Trust and Safety respond to new abuse patterns without a code deploy?
A configurable rule engine stores rules in a database rather than in code. Trust and Safety teams can add, modify, or disable rules through an admin interface without a code deploy. This is critical for responding to emerging abuse patterns – a new spam campaign or coordinated harassment attack can be blocked within minutes by adding a rule, not hours or days waiting for a deploy. Rules include the condition (keyword match, regex, account signal), the action (BLOCK, FLAG, ALLOW), and metadata like creator, creation date, and a disable toggle.
See also: Meta Software Engineer Interview Guide – News Feed and Trust & Safety
See also: Twitter/X Software Engineer Interview Guide – Trust & Safety
See also: Snap Software Engineer Interview Guide – Content Safety