Overview
An ML-powered content moderation service classifies user-submitted content (text, images, video) at scale, routes borderline cases to human reviewers, and supports an appeal workflow with a feedback loop to improve models over time.
Moderation Pipeline
Content submitted
--> ML Classifier (score 0.0 - 1.0)
--> Threshold decision: allow / human-review / block
--> [if review] Human reviewer: allow / block / escalate
--> Final action applied + audit log written
Data Model
ModerationJob Table
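CREATE TABLE ModerationJob (
id UUID PRIMARY KEY,
content_id TEXT NOT NULL,
content_type TEXT NOT NULL CHECK (content_type IN ('text','image','video')),
ml_scores JSONB, -- per-classifier scores
ml_action TEXT CHECK (ml_action IN ('allow','review','block')),
human_action TEXT CHECK (human_action IN ('allow','block','escalate')), -- NULL until a reviewer acts
final_action TEXT CHECK (final_action IN ('allow','block')), -- NULL while the job waits for human review
reviewer_id INT, -- NULL for ML-only decisions
reviewed_at TIMESTAMP, -- NULL until a reviewer acts
appeal_status TEXT NOT NULL DEFAULT 'none' CHECK (appeal_status IN ('none','pending','resolved')),
created_at TIMESTAMP NOT NULL DEFAULT NOW()
);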
ML Classifiers
Text Classifiers
- Toxicity detector
- Spam classifier
- Hate speech detector
- Adult content classifier
Each is an independent binary classifier returning a score in [0, 1]. The highest score across classifiers drives the threshold decision.
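The "highest score wins" rule can be sketched in a few lines of Python. This is a minimal illustration, not the service's actual code: the function name driving_score and the classifier keys in the example are assumptions made for the sketch.

```python
from typing import Dict, Tuple


def driving_score(ml_scores: Dict[str, float]) -> Tuple[str, float]:
    """Return the (classifier name, score) pair with the highest score.

    ml_scores maps a classifier name to its score in [0, 1]; the key
    names are hypothetical and only used for illustration.
    """
    if not ml_scores:
        raise ValueError("ml_scores must contain at least one classifier score")
    name = max(ml_scores, key=ml_scores.get)
    return name, ml_scores[name]


if __name__ == "__main__":
    scores = {"toxicity": 0.12, "spam": 0.87, "hate_speech": 0.05, "adult": 0.02}
    print(driving_score(scores))  # ('spam', 0.87)
```

Returning the classifier name along with the score lets the review UI and the audit log say which category triggered the decision.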
Image Classifiers
- NSFW classifier (adult content)
- Violence detector
- Logo / trademark detector
Score Thresholds
score > 0.95 -- auto-block (high-confidence violation)
0.30 <= score <= 0.95 -- human review queue
score < 0.30 -- auto-allow (high-confidence clean)
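As a sketch, the three bands map onto a small function. The constant and function names are assumptions for this example, and scores exactly equal to 0.30 or 0.95 are sent to review, which is one reasonable reading of the bands above.

```python
BLOCK_ABOVE = 0.95  # scores strictly above this are auto-blocked
ALLOW_BELOW = 0.30  # scores strictly below this are auto-allowed


def ml_action(score: float) -> str:
    """Map a classifier score in [0, 1] to 'allow', 'review', or 'block'."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score > BLOCK_ABOVE:
        return "block"
    if score < ALLOW_BELOW:
        return "allow"
    # Everything else, including the two boundary values, goes to a human.
    return "review"


if __name__ == "__main__":
    for s in (0.10, 0.30, 0.60, 0.95, 0.99):
        print(s, ml_action(s))
```

Sending boundary values to review errs on the side of human judgment; widening or narrowing the band trades reviewer workload against the rate of automated mistakes.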
Human Review Queue
Items in the review band are inserted into the review queue ordered by score DESC, so reviewers see the most likely violations first. Each item shows the content, the ML score from each classifier, and the suggested action.
Reviewer options: allow, block, escalate (to senior reviewer or legal).
SLA: human review completed within 24 hours.
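A score-ordered queue with an SLA check could look like the following in-memory sketch. It uses Python's heapq with negated scores to get highest-first ordering; the ReviewQueue class and its method names are hypothetical, and a production queue would presumably live in a database or queueing service rather than in process memory.

```python
import heapq
import itertools
from datetime import datetime, timedelta
from typing import List, Tuple

REVIEW_SLA = timedelta(hours=24)


class ReviewQueue:
    """In-memory priority queue: highest score first, FIFO among equal scores."""

    def __init__(self) -> None:
        self._heap: list = []
        self._seq = itertools.count()  # tie-breaker that preserves insertion order

    def push(self, job_id: str, score: float, enqueued_at: datetime) -> None:
        # heapq is a min-heap, so the score is negated to pop the largest first.
        heapq.heappush(self._heap, (-score, next(self._seq), job_id, enqueued_at))

    def pop(self) -> Tuple[str, float]:
        neg_score, _, job_id, _ = heapq.heappop(self._heap)
        return job_id, -neg_score

    def overdue(self, now: datetime) -> List[str]:
        """Job ids that have waited longer than the 24-hour SLA."""
        return [job_id for _, _, job_id, t in self._heap if now - t > REVIEW_SLA]

    def __len__(self) -> int:
        return len(self._heap)
```

One consequence of strict score ordering is that low-scoring items can wait indefinitely during a backlog, so a check like overdue() is worth running alongside the queue to keep the 24-hour SLA honest.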
Appeal Workflow
User appeals blocked content
--> AppealCase created (links to ModerationJob)
--> Secondary reviewer assigned
--> Optional ML re-score with latest model
--> Decision: uphold block / reverse to allow
--> User notified of outcome
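The appeal flow can be modeled as a small state transition on the job plus an AppealCase record. The sketch below is illustrative only: the field names on AppealCase, the dict-shaped job, and the rule that the secondary reviewer must differ from the original reviewer are assumptions, not details given above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AppealCase:
    job_id: str                       # links back to the ModerationJob
    reviewer_id: Optional[int] = None
    rescore: Optional[float] = None   # optional ML re-score with the latest model
    decision: Optional[str] = None    # 'uphold' or 'reverse'
    status: str = "pending"


def open_appeal(job: dict) -> AppealCase:
    """Create an appeal for a blocked job that has not been appealed yet."""
    if job["final_action"] != "block":
        raise ValueError("only blocked content can be appealed")
    if job["appeal_status"] != "none":
        raise ValueError("an appeal already exists for this job")
    job["appeal_status"] = "pending"
    return AppealCase(job_id=job["id"])


def resolve_appeal(job: dict, case: AppealCase, reviewer_id: int,
                   decision: str, rescore: Optional[float] = None) -> str:
    """Record the secondary reviewer's decision and return the user notice."""
    if decision not in ("uphold", "reverse"):
        raise ValueError(f"unknown decision: {decision}")
    # Assumption: the appeal must be handled by someone other than the original reviewer.
    if reviewer_id == job.get("reviewer_id"):
        raise ValueError("secondary reviewer must differ from the original reviewer")
    case.reviewer_id, case.decision, case.rescore = reviewer_id, decision, rescore
    case.status = "resolved"
    job["appeal_status"] = "resolved"
    if decision == "reverse":
        job["final_action"] = "allow"
    return f"Appeal for {job['content_id']}: block {'reversed' if decision == 'reverse' else 'upheld'}"
```

The re-score is stored on the case but does not decide the outcome here; the human decision is final, which matches the flow above where the ML re-score is optional input.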
Feedback Loop
Reviewer decisions (allow/block overrides) are collected as labeled training examples. Weekly retraining runs incorporate reviewer corrections. Model performance is tracked via precision/recall on a held-out validation set before each deployment.
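To make the loop concrete, the sketch below turns reviewed jobs into labeled examples and computes precision and recall from binary labels. The dict-shaped jobs and the function names are assumptions for illustration; escalated items are skipped here because they carry no final allow/block label.

```python
from typing import Dict, Iterable, Iterator, Sequence, Tuple


def labeled_examples(jobs: Iterable[Dict]) -> Iterator[Tuple[str, int]]:
    """Yield (content_id, label) pairs; label 1 = violation, 0 = clean."""
    for job in jobs:
        action = job.get("human_action")
        if action in ("allow", "block"):  # escalations are not usable labels
            yield job["content_id"], 1 if action == "block" else 0


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Precision and recall for the positive (violation) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    jobs = [
        {"content_id": "c1", "human_action": "block"},
        {"content_id": "c2", "human_action": "allow"},
        {"content_id": "c3", "human_action": "escalate"},
    ]
    print(list(labeled_examples(jobs)))  # [('c1', 1), ('c2', 0)]
```

Because only items in the review band reach reviewers, labels gathered this way are concentrated on borderline scores; the held-out validation set should be sampled separately so the precision/recall figures reflect the full traffic mix.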
Audit Trail
Every decision — ML or human — is logged with:
- Actor (model version or reviewer_id)
- Action taken
- Rationale / scores at decision time
- Timestamp
Audit records are immutable and retained per legal/compliance requirements.
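An append-only log with frozen records illustrates the shape of an audit entry. This is an in-process sketch only: the AuditRecord and AuditLog names and the "model:" / "reviewer:" actor prefixes are assumptions, and real immutability would have to be enforced by the storage layer rather than by Python objects.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from types import MappingProxyType
from typing import List, Mapping, Optional, Tuple


@dataclass(frozen=True)
class AuditRecord:
    job_id: str
    actor: str                     # e.g. "model:<version>" or "reviewer:<id>" (hypothetical format)
    action: str                    # action taken
    rationale: Mapping[str, float] # scores at decision time, read-only view
    timestamp: datetime


class AuditLog:
    """Append-only: records can be added and read, never updated or deleted."""

    def __init__(self) -> None:
        self._records: List[AuditRecord] = []

    def append(self, job_id: str, actor: str, action: str,
               rationale: Mapping[str, float],
               timestamp: Optional[datetime] = None) -> AuditRecord:
        record = AuditRecord(
            job_id=job_id,
            actor=actor,
            action=action,
            # Copy the scores so later changes by the caller cannot alter the record.
            rationale=MappingProxyType(dict(rationale)),
            timestamp=timestamp or datetime.now(timezone.utc),
        )
        self._records.append(record)
        return record

    def for_job(self, job_id: str) -> Tuple[AuditRecord, ...]:
        """All records for one job, in the order they were written."""
        return tuple(r for r in self._records if r.job_id == job_id)
```

Storing the scores as they were at decision time, rather than a pointer to the current ModerationJob row, keeps the record meaningful after a later re-score or appeal.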
Scale and Performance
- Auto-classification throughput: 1000+ items/second via batched GPU inference
- Human review SLA: 24 hours for borderline content
- Async processing: content submitted to SQS → ML worker classifies → result written to ModerationJob → downstream action triggered
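The async path with batching can be sketched using Python's standard queue.Queue as a stand-in for SQS. Everything named here is an assumption for illustration: classify_batch represents the batched GPU inference call, store represents the ModerationJob table, and the message shape is invented for the example.

```python
import queue
from typing import Callable, Dict, List


def drain_batch(q: "queue.Queue[dict]", max_batch: int) -> List[dict]:
    """Pull up to max_batch messages without blocking."""
    batch: List[dict] = []
    while len(batch) < max_batch:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break
    return batch


def process_once(q: "queue.Queue[dict]",
                 classify_batch: Callable[[List[str]], List[Dict[str, float]]],
                 store: Dict[str, dict],
                 max_batch: int = 32) -> int:
    """Classify one batch and write results; returns the number of items handled."""
    batch = drain_batch(q, max_batch)
    if not batch:
        return 0
    # One model call per batch is what makes GPU inference efficient.
    all_scores = classify_batch([msg["body"] for msg in batch])
    for msg, scores in zip(batch, all_scores):
        store[msg["content_id"]] = {"ml_scores": scores}
    return len(batch)


if __name__ == "__main__":
    q: "queue.Queue[dict]" = queue.Queue()
    for i in range(5):
        q.put({"content_id": f"c{i}", "body": f"text {i}"})
    fake_model = lambda bodies: [{"spam": 0.5} for _ in bodies]
    results: Dict[str, dict] = {}
    print(process_once(q, fake_model, results, max_batch=3))  # 3
```

A real worker would also need to acknowledge or delete messages only after the write succeeds, so that a crash mid-batch leads to a retry rather than a lost item.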
Key Design Decisions
- Separate binary classifiers per category: each can be tuned and retrained independently, and the design leaves room for per-category thresholds even though the thresholds above are shared
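- Score-ordered review queue: the highest-scoring (most likely violating) content is reviewed first, which limits harm while a backlog drains
- Feedback loop: reviewer corrections become training labels as a by-product of moderation, so each weekly retraining run improves the models without a separate labeling effort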
- Appeal workflow: second-level review reduces false positive harm to legitimate users