Overview
An ML-powered content moderation service classifies user-submitted content (text, images, video) at scale, routes borderline cases to human reviewers, and supports an appeal workflow with a feedback loop to improve models over time.
Moderation Pipeline
Content submitted
--> ML Classifier (score 0.0 - 1.0)
--> Threshold decision: allow / human-review / block
--> [if review] Human reviewer: allow / block / escalate
--> Final action applied + audit log written
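The routing above can be sketched end-to-end in Python. This is a minimal sketch: the `decide` callable, the in-memory review queue, and the audit list are hypothetical stand-ins for the real classifier service, queue, and audit store.

```python
from __future__ import annotations
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class ModerationJob:
    content_id: str
    ml_scores: dict                # per-classifier scores
    ml_action: str | None = None   # allow / review / block
    human_action: str | None = None
    final_action: str | None = None

def run_pipeline(job: ModerationJob, decide, review_queue: list, audit_log: list) -> ModerationJob:
    """ML decision, optional human-review routing, audit record."""
    job.ml_action = decide(max(job.ml_scores.values()))
    if job.ml_action == "review":
        review_queue.append(job)           # a human reviewer sets human_action later
        job.final_action = None            # pending until reviewed
    else:
        job.final_action = job.ml_action   # allow/block applied immediately
    audit_log.append({
        "actor": "model-v1",               # model version or reviewer_id
        "action": job.final_action,
        "scores": job.ml_scores,
        "ts": datetime.now(timezone.utc).isoformat(),
    })
    return job
```

In production the final action and audit record would be written transactionally so no decision is applied without a corresponding log entry.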
Data Model
ModerationJob Table
ModerationJob (
  id UUID PRIMARY KEY,
  content_id TEXT NOT NULL,
  content_type TEXT NOT NULL CHECK (content_type IN ('text', 'image', 'video')),
  ml_scores JSONB,            -- per-classifier scores
  ml_action TEXT CHECK (ml_action IN ('allow', 'review', 'block')),
  human_action TEXT CHECK (human_action IN ('allow', 'block', 'escalate')),  -- NULL until reviewed
  final_action TEXT NOT NULL CHECK (final_action IN ('allow', 'block')),
  reviewer_id INT,            -- NULL for pure-ML decisions
  reviewed_at TIMESTAMP,
  appeal_status TEXT NOT NULL DEFAULT 'none' CHECK (appeal_status IN ('none', 'pending', 'resolved')),
  created_at TIMESTAMP NOT NULL DEFAULT NOW()
)
ML Classifiers
Text Classifiers
- Toxicity detector
- Spam classifier
- Hate speech detector
- Adult content classifier
Each is an independent binary classifier returning a score in [0, 1]. The highest score across classifiers drives the threshold decision.
Image Classifiers
- NSFW classifier (adult content)
- Violence detector
- Logo / trademark detector
Score Thresholds
score > 0.95          -- auto-block (high-confidence violation)
0.30 <= score <= 0.95 -- human review queue
score < 0.30          -- auto-allow (high-confidence clean)
Thresholds are tuned per classifier and content type to balance precision against recall.
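The three bands map directly onto a small decision function. This is a sketch; the default thresholds match the values above and would in practice be tuned per classifier.

```python
def threshold_decision(scores: dict,
                       block_above: float = 0.95,
                       allow_below: float = 0.30) -> str:
    """Map the highest per-classifier score to one of the three bands."""
    top = max(scores.values())
    if top > block_above:
        return "block"    # high-confidence violation: auto-block
    if top < allow_below:
        return "allow"    # high-confidence clean: auto-allow
    return "review"       # borderline: route to human review queue
```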
Human Review Queue
Items in the review band are inserted into the review queue ordered by score DESC so reviewers see the worst violations first. Each item shows the content, ML scores per classifier, and suggested action.
Reviewer options: allow, block, escalate (to senior reviewer or legal).
SLA: human review completed within 24 hours.
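The score-descending ordering can be sketched with a max-heap. This is a minimal in-memory sketch; a production queue would be backed by the ModerationJob table (ORDER BY score DESC) or a dedicated priority-queue service.

```python
import heapq

class ReviewQueue:
    """Score-ordered review queue: the highest ML score is popped first."""

    def __init__(self):
        self._heap = []
        self._counter = 0   # insertion counter breaks ties deterministically

    def push(self, job_id: str, score: float) -> None:
        # heapq is a min-heap, so negate the score to pop the highest first
        heapq.heappush(self._heap, (-score, self._counter, job_id))
        self._counter += 1

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]
```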
Appeal Workflow
User appeals blocked content
--> AppealCase created (links to ModerationJob)
--> Secondary reviewer assigned
--> Optional ML re-score with latest model
--> Decision: uphold block / reverse to allow
--> User notified of outcome
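The appeal steps above can be sketched as follows. This is a minimal sketch: the `AppealCase` fields and the `create_appeal`/`resolve_appeal` names are hypothetical, and the ML re-score is treated as advisory input to the reviewer rather than an automatic decision.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class AppealCase:
    moderation_job_id: str        # links back to the original ModerationJob
    status: str = "pending"       # pending -> resolved
    outcome: str | None = None    # "uphold" or "reverse"

def create_appeal(moderation_job_id: str) -> AppealCase:
    """A user appeal against a block opens a case linked to the original job."""
    return AppealCase(moderation_job_id=moderation_job_id)

def resolve_appeal(case: AppealCase, reviewer_decision: str) -> AppealCase:
    """The secondary reviewer decides: allow reverses the block, anything else upholds it."""
    case.outcome = "reverse" if reviewer_decision == "allow" else "uphold"
    case.status = "resolved"
    return case
```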
Feedback Loop
Reviewer decisions (allow/block overrides) are collected as labeled training examples. Weekly retraining runs incorporate reviewer corrections. Model performance is tracked via precision/recall on a held-out validation set before each deployment.
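Harvesting reviewer decisions as labels might look like the sketch below. The dict keys mirror the ModerationJob columns; treating reviewer decisions as ground truth is the design's stated assumption, and escalated or unreviewed jobs are excluded because they carry no settled label.

```python
def extract_training_labels(jobs: list) -> list:
    """Turn reviewed jobs into (content_id, label) pairs; 1 = violation, 0 = clean.

    Only human-reviewed allow/block decisions are used: they are treated as
    ground truth, including cases where the reviewer overrode the ML action.
    """
    labels = []
    for job in jobs:
        if job.get("human_action") in ("allow", "block"):
            label = 1 if job["human_action"] == "block" else 0
            labels.append((job["content_id"], label))
    return labels
```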
Audit Trail
Every decision — ML or human — is logged with:
- Actor (model version or reviewer_id)
- Action taken
- Rationale / scores at decision time
- Timestamp
Audit records are immutable and retained per legal/compliance requirements.
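One way to make the immutability checkable is to hash-chain the records, so any later edit breaks every subsequent entry. This is an illustrative sketch, not something the design above mandates; an append-only table with restricted write permissions achieves the same retention goal.

```python
import hashlib
import json
from datetime import datetime, timezone

def append_audit_record(log: list, actor: str, action: str, scores: dict) -> str:
    """Append a tamper-evident audit record: each entry stores the hash of the previous one."""
    record = {
        "actor": actor,        # model version or reviewer_id
        "action": action,
        "scores": scores,      # scores at decision time
        "ts": datetime.now(timezone.utc).isoformat(),
        "prev": hashlib.sha256(log[-1].encode()).hexdigest() if log else None,
    }
    entry = json.dumps(record, sort_keys=True)
    log.append(entry)
    return entry
```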
Scale and Performance
- Auto-classification throughput: 1000+ items/second via batched GPU inference
- Human review SLA: 24 hours for borderline content
- Async processing: content submitted to SQS → ML worker classifies → result written to ModerationJob → downstream action triggered
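The async worker step can be sketched against an SQS-style client. This is a sketch under assumptions: `classify` and `store` are hypothetical callables standing in for batched GPU inference and the ModerationJob write, and the client is injected so the example runs without AWS (the real worker would pass a `boto3` SQS client).

```python
import json

def process_batch(sqs_client, queue_url: str, classify, store) -> int:
    """Poll one batch from the queue, classify each item, persist the result."""
    resp = sqs_client.receive_message(
        QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=20)
    handled = 0
    for msg in resp.get("Messages", []):
        body = json.loads(msg["Body"])        # {"content_id": ..., "payload": ...}
        scores = classify(body["payload"])    # batched GPU inference in production
        store(body["content_id"], scores)     # write result to the ModerationJob row
        # delete only after the result is stored, so failures cause redelivery
        sqs_client.delete_message(QueueUrl=queue_url,
                                  ReceiptHandle=msg["ReceiptHandle"])
        handled += 1
    return handled
```

Deleting the message only after the result is persisted gives at-least-once processing; the ModerationJob write should therefore be idempotent on content_id.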
Key Design Decisions
- Separate binary classifiers per category: independent thresholds, easier to tune and retrain per category
- Score-ordered review queue: worst content reviewed first; minimizes harm during backlog
- Feedback loop: reviewer corrections continuously improve model quality without manual dataset curation
- Appeal workflow: second-level review reduces false positive harm to legitimate users