Low-Level Design: ML Content Moderation Service

Overview

An ML-powered content moderation service classifies user-submitted content (text, images, video) at scale, routes borderline cases to human reviewers, and supports an appeal workflow with a feedback loop to improve models over time.

Moderation Pipeline

Content submitted
  --> ML Classifier (score 0.0 - 1.0)
  --> Threshold decision: allow / human-review / block
  --> [if review] Human reviewer: allow / block / escalate
  --> Final action applied + audit log written

Data Model

ModerationJob Table

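CREATE TABLE ModerationJob (
  id            UUID PRIMARY KEY,
  content_id    TEXT NOT NULL,
  content_type  TEXT NOT NULL CHECK (content_type IN ('text','image','video')),
  ml_scores     JSONB,        -- per-classifier scores
  ml_action     TEXT CHECK (ml_action IN ('allow','review','block')),
  human_action  TEXT CHECK (human_action IN ('allow','block','escalate')),  -- NULL until reviewed
  final_action  TEXT CHECK (final_action IN ('allow','block')),  -- NULL while human review is pending
  reviewer_id   INT,          -- NULL until reviewed
  reviewed_at   TIMESTAMP,    -- NULL until reviewed
  appeal_status TEXT NOT NULL DEFAULT 'none' CHECK (appeal_status IN ('none','pending','resolved')),
  created_at    TIMESTAMP NOT NULL DEFAULT NOW()
);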
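
The relationship between ml_action, human_action, and final_action can be sketched in Python. This is a minimal illustration, not part of the original design: the function name resolve_final_action is hypothetical, and it assumes that a human decision overrides the model and that an undecided job is represented as None in application code.

```python
from typing import Optional


def resolve_final_action(ml_action: str, human_action: Optional[str]) -> Optional[str]:
    """Return 'allow', 'block', or None while the job is still undecided.

    Assumption (not stated in the source): a human allow/block always
    overrides the model's action.
    """
    if human_action in ("allow", "block"):
        return human_action      # reviewer decision wins
    if human_action == "escalate":
        return None              # waiting on a senior reviewer or legal
    if ml_action in ("allow", "block"):
        return ml_action         # auto-decision, no human involved
    return None                  # ml_action == 'review', not yet reviewed
```

For example, a job with ml_action 'review' stays undecided until a reviewer records allow or block.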

ML Classifiers

Text Classifiers

  • Toxicity detector
  • Spam classifier
  • Hate speech detector
  • Adult content classifier

Each is an independent binary classifier returning a score in [0, 1]. The highest score across classifiers drives the threshold decision.
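
Taking the highest score across classifiers might look like the sketch below. The function name top_score and the classifier keys are hypothetical; the input is assumed to have the same shape as the ml_scores column (classifier name mapped to score).

```python
from typing import Dict, Tuple


def top_score(ml_scores: Dict[str, float]) -> Tuple[str, float]:
    """Return the (classifier_name, score) pair with the highest score."""
    if not ml_scores:
        raise ValueError("ml_scores must contain at least one classifier score")
    name = max(ml_scores, key=ml_scores.get)
    return name, ml_scores[name]


# Hypothetical scores for one piece of text content.
name, score = top_score({"toxicity": 0.12, "spam": 0.81, "hate_speech": 0.40})
```

Returning the classifier name alongside the score keeps a record of which category triggered the decision, which is useful to show reviewers and to store in the audit log.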

Image Classifiers

  • NSFW classifier (adult content)
  • Violence detector
  • Logo / trademark detector

Score Thresholds

score > 0.95           -- auto-block (high-confidence violation)
0.30 <= score <= 0.95  -- human review queue
score < 0.30           -- auto-allow (high-confidence clean)
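
A minimal sketch of the threshold decision follows. The constant and function names are hypothetical, and it assumes both band edges (0.30 and 0.95) fall into the review band, which is the reading consistent with the strict inequalities above.

```python
BLOCK_ABOVE = 0.95   # scores strictly above this are auto-blocked
ALLOW_BELOW = 0.30   # scores strictly below this are auto-allowed


def ml_action(score: float) -> str:
    """Map a classifier score in [0, 1] to 'allow', 'review', or 'block'."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score > BLOCK_ABOVE:
        return "block"
    if score < ALLOW_BELOW:
        return "allow"
    return "review"   # everything in [0.30, 0.95] goes to a human
```

Keeping the two cut-offs as named constants makes it straightforward to move them later, for example to widen the review band when a new model version is first deployed.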

Human Review Queue

Items in the review band are inserted into the review queue, ordered by score DESC so reviewers see the most likely violations first. Each item shows the content, the per-classifier ML scores, and a suggested action.
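
An in-memory version of score-descending ordering can be sketched with Python's heapq. This is an illustration only: the ReviewQueue class is hypothetical, and a production queue would more likely be a database query ordered by score or a managed priority queue.

```python
import heapq
import itertools
from typing import List, Tuple


class ReviewQueue:
    """Pops the highest-scoring job first; ties are served in insertion order."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._counter = itertools.count()   # tie-breaker for equal scores

    def push(self, job_id: str, score: float) -> None:
        # heapq is a min-heap, so store the negated score to pop the max first.
        heapq.heappush(self._heap, (-score, next(self._counter), job_id))

    def pop(self) -> str:
        _, _, job_id = heapq.heappop(self._heap)
        return job_id

    def __len__(self) -> int:
        return len(self._heap)
```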

Reviewer options: allow, block, escalate (to senior reviewer or legal).

SLA: human review completed within 24 hours.

Appeal Workflow

User appeals blocked content
  --> AppealCase created (links to ModerationJob)
  --> Secondary reviewer assigned
  --> Optional ML re-score with latest model
  --> Decision: uphold block / reverse to allow
  --> User notified of outcome
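
The appeal steps above can be sketched as a small state transition. The AppealCase fields and the resolve_appeal function are hypothetical; the sketch assumes that "secondary reviewer" means someone other than the reviewer who made the original decision, and it leaves out the optional ML re-score and user notification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AppealCase:
    job_id: str                        # links back to the ModerationJob
    status: str = "pending"            # mirrors appeal_status: pending -> resolved
    reviewer_id: Optional[int] = None
    decision: Optional[str] = None     # 'uphold' or 'reverse'


def resolve_appeal(case: AppealCase, original_reviewer_id: Optional[int],
                   reviewer_id: int, decision: str) -> str:
    """Record the secondary reviewer's decision and return the new final_action."""
    if case.status != "pending":
        raise ValueError("appeal is not pending")
    if reviewer_id == original_reviewer_id:
        raise ValueError("appeal needs a different reviewer than the original one")
    if decision not in ("uphold", "reverse"):
        raise ValueError(f"unknown decision: {decision}")
    case.reviewer_id = reviewer_id
    case.decision = decision
    case.status = "resolved"
    return "block" if decision == "uphold" else "allow"
```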

Feedback Loop

Reviewer decisions (allow/block overrides) are collected as labeled training examples. Weekly retraining runs incorporate reviewer corrections. Model performance is tracked via precision/recall on a held-out validation set before each deployment.
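
As a rough sketch of the two steps described above, the code below turns reviewer decisions into labeled examples and computes precision and recall on a labeled set. The function names and the dict-shaped job records are assumptions for illustration; label 1 means "violation", and escalated or unreviewed jobs are skipped because they carry no settled label.

```python
from typing import Dict, Iterable, List, Tuple


def labels_from_reviews(jobs: Iterable[Dict]) -> List[Tuple[str, int]]:
    """Turn reviewed jobs into (content_id, label) pairs; label 1 = violation."""
    examples = []
    for job in jobs:
        action = job.get("human_action")
        if action == "block":
            examples.append((job["content_id"], 1))
        elif action == "allow":
            examples.append((job["content_id"], 0))
        # 'escalate' and unreviewed jobs are skipped
    return examples


def precision_recall(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """Precision and recall for the positive (violation) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

A deployment gate could then compare the candidate model's precision and recall on the held-out set against the currently deployed model before promoting it.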

Audit Trail

Every decision — ML or human — is logged with:

  • Actor (model version or reviewer_id)
  • Action taken
  • Rationale / scores at decision time
  • Timestamp

Audit records are immutable and retained per legal/compliance requirements.
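
The shape of an audit record can be sketched with a frozen dataclass and an append-only log. This is an in-memory illustration with hypothetical names (AuditRecord, AuditLog, and the "model:v12" actor string in the usage lines); real immutability would have to be enforced by the storage layer, which the source does not specify.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AuditRecord:
    job_id: str
    actor: str                               # model version or reviewer_id
    action: str                              # action taken
    rationale: str
    scores: Tuple[Tuple[str, float], ...]    # scores at decision time
    timestamp: datetime


class AuditLog:
    """Append-only: records can be added and read, never edited or removed."""

    def __init__(self) -> None:
        self._records: List[AuditRecord] = []

    def append(self, job_id: str, actor: str, action: str,
               rationale: str, scores: Dict[str, float]) -> AuditRecord:
        record = AuditRecord(
            job_id=job_id,
            actor=actor,
            action=action,
            rationale=rationale,
            scores=tuple(sorted(scores.items())),   # copy into an immutable form
            timestamp=datetime.now(timezone.utc),
        )
        self._records.append(record)
        return record

    def records(self) -> Tuple[AuditRecord, ...]:
        return tuple(self._records)   # callers get a read-only snapshot


log = AuditLog()
rec = log.append("job-1", "model:v12", "review",
                 "top score in review band", {"spam": 0.81})
```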

Scale and Performance

  • Auto-classification throughput: 1000+ items/second via batched GPU inference
  • Human review SLA: 24 hours for borderline content
  • Async processing: content submitted to SQS → ML worker classifies → result written to ModerationJob → downstream action triggered
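
The batching step of the async worker might look like the sketch below. It deliberately uses a plain iterable in place of SQS and a caller-supplied classify_batch function in place of the GPU model, so every name here is a hypothetical stand-in; the batch size of 32 is an arbitrary default.

```python
from typing import Callable, Dict, Iterable, Iterator, List


def batched(items: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield lists of up to batch_size items, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch   # final partial batch


def run_worker(messages: Iterable[str],
               classify_batch: Callable[[List[str]], List[float]],
               batch_size: int = 32) -> Dict[str, float]:
    """Classify queued content IDs in batches and return content_id -> score."""
    results: Dict[str, float] = {}
    for batch in batched(messages, batch_size):
        scores = classify_batch(batch)   # one model call per batch
        if len(scores) != len(batch):
            raise ValueError("classifier returned a wrong number of scores")
        for content_id, score in zip(batch, scores):
            results[content_id] = score
    return results
```

Grouping items before each model call is what lets a GPU amortize per-call overhead; in a real worker the results would be written to ModerationJob rather than returned in a dict.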

Key Design Decisions

  • Separate binary classifiers per category: independent thresholds, easier to tune and retrain per category
  • Score-ordered review queue: the most likely violations are reviewed first, which limits harm while a backlog exists
  • Feedback loop: reviewer corrections continuously improve model quality without manual dataset curation
  • Appeal workflow: second-level review reduces false positive harm to legitimate users
