Low Level Design: ML Content Moderation Service

Overview

An ML-powered content moderation service classifies user-submitted content (text, images, video) at scale, routes borderline cases to human reviewers, and supports an appeal workflow with a feedback loop to improve models over time.

Moderation Pipeline

Content submitted
  --> ML Classifier (score 0.0 - 1.0)
  --> Threshold decision: allow / human-review / block
  --> [if review] Human reviewer: allow / block / escalate
  --> Final action applied + audit log written

Data Model

ModerationJob Table

ModerationJob (
  id            UUID PRIMARY KEY,
  content_id    TEXT NOT NULL,
  content_type  ENUM('text','image','video'),
  ml_scores     JSONB,        -- per-classifier scores
  ml_action     ENUM('allow','review','block'),
  human_action  ENUM('allow','block','escalate') NULLABLE,
  final_action  ENUM('allow','block') NULLABLE,  -- NULL while awaiting human review
  reviewer_id   INT NULLABLE,
  reviewed_at   TIMESTAMP NULLABLE,
  appeal_status ENUM('none','pending','resolved') DEFAULT 'none',
  created_at    TIMESTAMP DEFAULT NOW()
)

ML Classifiers

Text Classifiers

  • Toxicity detector
  • Spam classifier
  • Hate speech detector
  • Adult content classifier

Each is an independent binary classifier returning a score in [0, 1]. The highest score across classifiers drives the threshold decision.
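The max-score aggregation can be sketched as follows; the classifier names and scores are illustrative, not outputs of a real model:

```python
def aggregate_scores(scores: dict[str, float]) -> tuple[str, float]:
    """Return (classifier_name, score) for the highest-scoring classifier.

    Each classifier is independent, so the most confident violation
    signal (the max) drives the downstream threshold decision.
    """
    return max(scores.items(), key=lambda kv: kv[1])

top_label, top_score = aggregate_scores(
    {"toxicity": 0.12, "spam": 0.81, "hate_speech": 0.05, "adult": 0.02}
)
```

Taking the max (rather than an average) means any single category crossing its threshold is enough to flag the content, which biases toward catching violations at the cost of more review-queue traffic.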

Image Classifiers

  • NSFW classifier (adult content)
  • Violence detector
  • Logo / trademark detector

Score Thresholds

score > 0.95           -- auto-block (high confidence violation)
0.30 <= score <= 0.95  -- human review queue
score < 0.30           -- auto-allow (high confidence clean)
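The three-band decision maps directly to a small function; the threshold constants come from the bands above, though in practice they would be tuned per classifier and content type:

```python
BLOCK_THRESHOLD = 0.95  # above this: auto-block
ALLOW_THRESHOLD = 0.30  # below this: auto-allow

def threshold_action(score: float) -> str:
    """Map a [0, 1] classifier score to allow / review / block."""
    if score > BLOCK_THRESHOLD:
        return "block"
    if score < ALLOW_THRESHOLD:
        return "allow"
    return "review"  # borderline band goes to the human review queue
```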

Human Review Queue

Items in the review band are inserted into the review queue ordered by score DESC so reviewers see the worst violations first. Each item shows the content, ML scores per classifier, and suggested action.

Reviewer options: allow, block, escalate (to senior reviewer or legal).

SLA: human review completed within 24 hours.
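The score-descending ordering can be sketched with an in-memory priority queue; a production system would use an indexed database query (`ORDER BY score DESC`) instead, so this is only an illustration of the ordering semantics:

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class ReviewItem:
    sort_key: float                         # negative score: heapq is a min-heap
    content_id: str = field(compare=False)  # payload, excluded from ordering

class ReviewQueue:
    """Worst-first review queue: highest ML score pops first."""

    def __init__(self) -> None:
        self._heap: list[ReviewItem] = []

    def push(self, content_id: str, score: float) -> None:
        heapq.heappush(self._heap, ReviewItem(-score, content_id))

    def pop(self) -> str:
        return heapq.heappop(self._heap).content_id
```

Negating the score turns Python's min-heap into a max-heap, so the most likely violation is always reviewed next, which is what minimizes harm during a backlog.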

Appeal Workflow

User appeals blocked content
  --> AppealCase created (links to ModerationJob)
  --> Secondary reviewer assigned
  --> Optional ML re-score with latest model
  --> Decision: uphold block / reverse to allow
  --> User notified of outcome

Feedback Loop

Reviewer decisions (allow/block overrides) are collected as labeled training examples. Weekly retraining runs incorporate reviewer corrections. Model performance is tracked via precision/recall on a held-out validation set before each deployment.
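The pre-deployment check can be sketched as a precision/recall gate; the minimum threshold values here are illustrative assumptions, not figures from this design:

```python
def passes_deployment_gate(tp: int, fp: int, fn: int,
                           min_precision: float = 0.90,
                           min_recall: float = 0.80) -> bool:
    """Gate a retrained model on held-out validation metrics.

    tp/fp/fn are counts from the held-out set; the model ships only
    if both precision and recall clear their (assumed) minimums.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision >= min_precision and recall >= min_recall
```

Gating on both metrics catches the common failure mode where retraining on reviewer corrections improves one metric (fewer false positives) while silently regressing the other (more missed violations).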

Audit Trail

Every decision — ML or human — is logged with:

  • Actor (model version or reviewer_id)
  • Action taken
  • Rationale / scores at decision time
  • Timestamp

Audit records are immutable and retained per legal/compliance requirements.
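Immutability can be enforced at the record level with a frozen dataclass; this is a minimal in-memory sketch, whereas a real deployment would back it with an append-only store:

```python
import time
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: fields cannot be mutated after creation
class AuditRecord:
    actor: str        # model version or reviewer_id
    action: str       # action taken
    rationale: dict   # scores / reasons at decision time
    timestamp: float

def append_audit(log: list, actor: str, action: str, rationale: dict) -> AuditRecord:
    """Append-only: records are added, never updated or deleted."""
    rec = AuditRecord(actor, action, rationale, time.time())
    log.append(rec)
    return rec
```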

Scale and Performance

  • Auto-classification throughput: 1000+ items/second via batched GPU inference
  • Human review SLA: 24 hours for borderline content
  • Async processing: content submitted to SQS → ML worker classifies → result written to ModerationJob → downstream action triggered
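The async worker step can be sketched with stand-ins: `queue.Queue` stands in for SQS, the `classify` callable for the batched GPU model, and a dict for the ModerationJob table. The names are illustrative, not a real SQS integration:

```python
import queue

BLOCK, ALLOW = 0.95, 0.30  # bands from the Score Thresholds section

def decide(score: float) -> str:
    return "block" if score > BLOCK else "allow" if score < ALLOW else "review"

def ml_worker(inbox: queue.Queue, classify, store: dict) -> None:
    """Drain the queue, classify each item, persist the result."""
    while True:
        try:
            content_id, payload = inbox.get_nowait()
        except queue.Empty:
            break  # a real worker would long-poll instead of exiting
        score = classify(payload)
        store[content_id] = {"score": score, "ml_action": decide(score)}
```

Decoupling submission from classification through the queue is what lets the GPU workers batch requests for throughput while the submit path stays fast.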

Key Design Decisions

  • Separate binary classifiers per category: independent thresholds, easier to tune and retrain per category
  • Score-ordered review queue: worst content reviewed first; minimizes harm during backlog
  • Feedback loop: reviewer corrections continuously improve model quality without manual dataset curation
  • Appeal workflow: second-level review reduces false positive harm to legitimate users

