Low-Level Design: Content Moderation System – Rules Engine, ML Scoring, and Appeals (2025)

Problem Statement

Design a content moderation system for a social platform. User-generated content (text, images, video) must be screened before or immediately after publishing. The system must handle millions of items per day with minimal false positives (blocking legitimate content) while catching genuine policy violations.

Core Entities

from enum import Enum
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

class ContentType(Enum):
    TEXT  = "TEXT"
    IMAGE = "IMAGE"
    VIDEO = "VIDEO"

class ContentStatus(Enum):
    PENDING   = "PENDING"    # awaiting automated review
    APPROVED  = "APPROVED"   # passed all checks, visible
    REJECTED  = "REJECTED"   # violated policy, hidden
    APPEALED  = "APPEALED"   # user contested rejection

@dataclass
class ContentItem:
    content_id:   str
    type:         ContentType
    author_id:    str
    raw_content:  str           # text or storage URL for media
    status:       ContentStatus
    created_at:   datetime
    ml_score:     Optional[float] = None   # 0.0 (clean) to 1.0 (toxic)
    reject_reason: Optional[str] = None

@dataclass
class ModerationDecision:
    content_id: str
    decision:   ContentStatus    # APPROVED / REJECTED
    reason:     str
    confidence: float
    reviewer_id: Optional[str] = None  # None = automated

Moderation Pipeline

Content flows through layers from fast/cheap to slow/expensive. Most content is resolved automatically; borderline cases escalate to humans.

Content Submitted
      |
      v
[1] Rule Engine (sync, <10ms)
      |-- BLOCK -> REJECTED immediately
      |-- ALLOW -> continue to ML
      |-- FLAG  -> bump priority in ML queue
      |
      v
[2] ML Scoring (async, 100-500ms)
      |-- score < 0.3   -> APPROVED (auto-publish)
      |-- score > 0.7  -> REJECTED (auto-reject)
      |-- score 0.3-0.7 -> Human Review Queue
      |
      v
[3] Human Review Queue
      |-- Reviewer claims item
      |-- Decision: APPROVED / REJECTED + reason
      |
      v
[4] Appeals (for rejected content)
      |-- User submits appeal
      |-- Senior reviewer assigned
      |-- Outcome: REINSTATE or UPHOLD

Rule Engine

The rule engine is synchronous and must run before content is stored or visible. It handles clear-cut violations that do not require ML inference.

from dataclasses import dataclass
from enum import Enum
from typing import Callable
import re

class RuleAction(Enum):
    BLOCK = "BLOCK"    # immediate rejection
    FLAG  = "FLAG"     # send to human review with high priority
    ALLOW = "ALLOW"    # pass to next stage

@dataclass
class Rule:
    rule_id:     str
    description: str
    action:      RuleAction
    evaluate:    Callable  # function(content_item, user_context) -> bool

class RuleEngine:
    def __init__(self):
        self.rules = [
            Rule("keyword_blocklist", "Matches blocked keywords", RuleAction.BLOCK,
                 lambda item, ctx: any(kw in item.raw_content.lower()
                                       for kw in KEYWORD_BLOCKLIST)),

            Rule("url_pattern", "Matches known malicious URL patterns", RuleAction.BLOCK,
                 lambda item, ctx: bool(re.search(MALICIOUS_URL_PATTERN, item.raw_content))),

            Rule("new_account_auto_review",
                 "Accounts < 7 days old get human review",
                 RuleAction.FLAG,
                 lambda item, ctx: ctx.account_age_days < 7),

            Rule("mass_reported",
                 "Reported > 5 times by users",
                 RuleAction.FLAG,
                 lambda item, ctx: ctx.report_count > 5),

            Rule("repeat_offender",
                 "Author had 3+ rejections in past 30 days",
                 RuleAction.FLAG,
                 lambda item, ctx: ctx.author_rejections_30d >= 3),
        ]

    def evaluate(self, content_item, user_context) -> RuleAction:
        for rule in self.rules:
            if rule.evaluate(content_item, user_context):
                return rule.action
        return RuleAction.ALLOW

Rules are stored in a database and loaded at startup (or hot-reloaded). Trust and Safety teams can add/modify rules without a code deploy – this is essential for responding quickly to new abuse patterns.
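A database-backed rule might be materialized into an executable `Rule` along the following lines. This is a sketch: the row shape (`rule_id`, `description`, `action`, `condition_type`, `pattern`) and the `build_rule` helper are illustrative assumptions, not part of the original design.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Callable

class RuleAction(Enum):
    BLOCK = "BLOCK"
    FLAG  = "FLAG"
    ALLOW = "ALLOW"

@dataclass
class Rule:
    rule_id:     str
    description: str
    action:      RuleAction
    evaluate:    Callable  # function(content_item, user_context) -> bool

def build_rule(row: dict) -> Rule:
    """Turn one stored rule row into an executable Rule.

    Assumed row shape: {rule_id, description, action, condition_type, pattern}.
    """
    if row["condition_type"] == "keyword":
        # Pattern is a comma-separated keyword list
        keywords = [kw.strip().lower() for kw in row["pattern"].split(",")]
        check = lambda item, ctx: any(kw in item.raw_content.lower()
                                      for kw in keywords)
    elif row["condition_type"] == "regex":
        compiled = re.compile(row["pattern"])
        check = lambda item, ctx: bool(compiled.search(item.raw_content))
    else:
        raise ValueError(f"Unknown condition type: {row['condition_type']}")
    return Rule(row["rule_id"], row["description"],
                RuleAction(row["action"]), check)
```

With this shape, a Trust and Safety admin tool only needs to insert a row; the engine picks it up on its next reload.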

ML Scoring Service

APPROVE_THRESHOLD = 0.3   # below this: auto-approve
REJECT_THRESHOLD  = 0.7   # above this: auto-reject
# between: human review queue

class MLScoringService:
    def __init__(self, model_client, threshold_approve=0.3, threshold_reject=0.7):
        self.model       = model_client
        self.t_approve   = threshold_approve
        self.t_reject    = threshold_reject

    def score_and_route(self, content_item) -> ModerationDecision:
        score = self.model.predict_toxicity(
            content=content_item.raw_content,
            content_type=content_item.type.value
        )
        content_item.ml_score = score

        if score < self.t_approve:
            return ModerationDecision(
                content_id=content_item.content_id,
                decision=ContentStatus.APPROVED,
                reason="ML score below approval threshold",
                confidence=1.0 - score
            )
        elif score > self.t_reject:
            return ModerationDecision(
                content_id=content_item.content_id,
                decision=ContentStatus.REJECTED,
                reason="ML score above rejection threshold",
                confidence=score
            )
        else:
            # Route to human review - do not make a decision
            human_review_queue.enqueue(content_item, priority=score)
            return ModerationDecision(
                content_id=content_item.content_id,
                decision=ContentStatus.PENDING,
                reason="Borderline score - human review required",
                confidence=score
            )
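The routing logic reduces to a three-way threshold comparison. A minimal standalone version (function name and defaults are illustrative):

```python
def route(score: float, t_approve: float = 0.3, t_reject: float = 0.7) -> str:
    """Map a toxicity score to a pipeline outcome."""
    if score < t_approve:
        return "APPROVED"       # auto-publish
    if score > t_reject:
        return "REJECTED"       # auto-reject
    return "HUMAN_REVIEW"       # borderline: enqueue for a reviewer
```

Note the boundary behavior: a score exactly at either threshold falls to human review, which is the safe default when the model is least certain.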

Model Considerations

  • Text: fine-tuned BERT/RoBERTa for toxicity classification
  • Images: CNN-based classifier, perceptual hash matching against known CSAM/spam hash databases
  • Video: sample frames at 1fps, run image classifier on each
  • Threshold tuning: adjust based on precision/recall tradeoff for the platform – lower t_approve catches more borderline content, higher t_reject reduces false positives

Human Review Queue

Queue Design

@dataclass
class ReviewQueueItem:
    content_id:  str
    priority:    float       # higher = more urgent (based on ML score, report count)
    claimed_by:  Optional[str]   # reviewer_id or None
    claimed_at:  Optional[datetime]
    expires_at:  Optional[datetime]  # lock TTL - released if reviewer abandons
    created_at:  datetime

class HumanReviewQueue:
    LOCK_TTL_SECONDS = 300  # 5 minutes to complete review

    def claim_next(self, reviewer_id: str) -> Optional[ReviewQueueItem]:
        """Atomically claim the highest-priority unclaimed item."""
        with db.transaction():
            item = db.query_one(
                """SELECT * FROM review_queue
                   WHERE claimed_by IS NULL
                      OR expires_at < NOW()
                   ORDER BY priority DESC, created_at ASC
                   LIMIT 1
                   FOR UPDATE SKIP LOCKED"""
            )
            if item is None:
                return None
            db.execute(
                """UPDATE review_queue
                   SET claimed_by = %s, claimed_at = NOW(),
                       expires_at = NOW() + INTERVAL '300 seconds'
                   WHERE content_id = %s""",
                (reviewer_id, item.content_id)
            )
            return item

    def submit_decision(self, reviewer_id: str, content_id: str,
                        decision: ContentStatus, reason: str):
        item = db.get_queue_item(content_id)
        if item.claimed_by != reviewer_id:
            raise PermissionError("Item is claimed by another reviewer")
        if datetime.utcnow() > item.expires_at:
            raise LockExpiredError("Lock expired - re-claim item")

        # Persist decision
        db.execute(
            "UPDATE content_items SET status=%s, reject_reason=%s WHERE content_id=%s",
            (decision.value, reason, content_id)
        )
        db.execute("DELETE FROM review_queue WHERE content_id=%s", content_id)
        notify_author(content_id, decision)

The FOR UPDATE SKIP LOCKED Pattern

This is the key to the queue. FOR UPDATE locks the selected row; SKIP LOCKED makes other concurrent claims skip that row instead of blocking. Multiple reviewers can work simultaneously without interfering with each other – each gets a different item.
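The claim semantics can be modeled in-process to make them concrete. This is an illustrative single-process stand-in; the production version relies on the database's row locks for atomicity across reviewers.

```python
from datetime import datetime, timedelta
from typing import Optional

class InMemoryReviewQueue:
    """Toy model of claim/lock mechanics: priority ordering, one claimant
    per item, and TTL-based reclaim of abandoned items."""

    LOCK_TTL = timedelta(minutes=5)

    def __init__(self):
        # content_id -> {"priority": float, "claimed_by": str|None, "expires_at": datetime|None}
        self.items = {}

    def enqueue(self, content_id: str, priority: float):
        self.items[content_id] = {"priority": priority,
                                  "claimed_by": None, "expires_at": None}

    def claim_next(self, reviewer_id: str, now: datetime) -> Optional[str]:
        # An item is claimable if unclaimed, or if its previous lock expired
        claimable = [(cid, meta) for cid, meta in self.items.items()
                     if meta["claimed_by"] is None or meta["expires_at"] < now]
        if not claimable:
            return None
        cid, meta = max(claimable, key=lambda pair: pair[1]["priority"])
        meta["claimed_by"] = reviewer_id
        meta["expires_at"] = now + self.LOCK_TTL
        return cid
```

Two reviewers calling `claim_next` get different items, a third gets nothing, and once the 5-minute TTL passes an abandoned item becomes claimable again.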

Appeals Workflow

@dataclass
class Appeal:
    appeal_id:       str
    content_id:      str
    appellant_id:    str        # user submitting appeal
    reason:          str        # user's explanation
    status:          str        # PENDING / UPHELD / OVERTURNED
    senior_reviewer_id: Optional[str]
    created_at:      datetime
    resolved_at:     Optional[datetime]

class AppealsService:
    MAX_APPEALS_PER_USER_PER_DAY = 3

    def submit_appeal(self, user_id: str, content_id: str, reason: str) -> Appeal:
        # Validate content is actually rejected and owned by user
        content = db.get_content(content_id)
        assert content.status == ContentStatus.REJECTED
        assert content.author_id == user_id

        # Rate limit appeals
        today_count = db.count_appeals_today(user_id)
        if today_count >= self.MAX_APPEALS_PER_USER_PER_DAY:
            raise RateLimitError("Daily appeal limit reached")

        # Set content to APPEALED (visible to senior reviewers only)
        db.execute(
            "UPDATE content_items SET status='APPEALED' WHERE content_id=%s", content_id
        )
        appeal = Appeal(
            appeal_id=generate_uuid(),
            content_id=content_id,
            appellant_id=user_id,
            reason=reason,
            status="PENDING",
            senior_reviewer_id=None,
            created_at=datetime.utcnow(),
            resolved_at=None
        )
        db.save(appeal)
        senior_review_queue.enqueue(appeal)
        return appeal

    def resolve_appeal(self, senior_reviewer_id: str, appeal_id: str, uphold_rejection: bool):
        appeal = db.get_appeal(appeal_id)
        new_status = "UPHELD" if uphold_rejection else "OVERTURNED"
        content_status = ContentStatus.REJECTED if uphold_rejection else ContentStatus.APPROVED

        db.execute(
            "UPDATE content_items SET status=%s WHERE content_id=%s",
            (content_status.value, appeal.content_id)
        )
        db.execute(
            "UPDATE appeals SET status=%s, senior_reviewer_id=%s, resolved_at=NOW() WHERE appeal_id=%s",
            (new_status, senior_reviewer_id, appeal_id)
        )
        notify_user(appeal.appellant_id, content_status)
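The status transitions implied by the pipeline and appeals flow form a small state machine, and it is worth enforcing it explicitly so no code path can, say, move content straight from PENDING to APPEALED. A sketch (the APPROVED -> REJECTED edge, for retroactive takedowns after user reports, is an assumption beyond the flows shown above):

```python
ALLOWED_TRANSITIONS = {
    "PENDING":  {"APPROVED", "REJECTED"},    # automated or human decision
    "APPROVED": {"REJECTED"},                # assumed: retroactive takedown
    "REJECTED": {"APPEALED"},                # user contests the rejection
    "APPEALED": {"APPROVED", "REJECTED"},    # appeal overturned / upheld
}

def transition(current: str, target: str) -> str:
    """Validate a status change against the state machine."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"Illegal transition {current} -> {target}")
    return target
```

Running this check inside the same transaction as the UPDATE prevents races where two code paths move the same item at once.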

Schema Design

CREATE TABLE content_items (
    content_id      VARCHAR(36) PRIMARY KEY,
    type            VARCHAR(10)  NOT NULL,  -- TEXT / IMAGE / VIDEO
    author_id       VARCHAR(36)  NOT NULL,
    status          VARCHAR(20)  NOT NULL DEFAULT 'PENDING',
    ml_score        DECIMAL(4,3),
    reject_reason   TEXT,
    created_at      TIMESTAMP NOT NULL,
    reviewed_at     TIMESTAMP,
    INDEX idx_author_status (author_id, status),
    INDEX idx_status_created (status, created_at)
);

CREATE TABLE review_queue (
    content_id      VARCHAR(36) PRIMARY KEY,
    priority        DECIMAL(5,4) NOT NULL,
    claimed_by      VARCHAR(36),
    claimed_at      TIMESTAMP,
    expires_at      TIMESTAMP,
    created_at      TIMESTAMP NOT NULL
);

-- Partial index (PostgreSQL syntax): covers only unclaimed rows,
-- the hot set scanned by claim_next
CREATE INDEX idx_priority_unclaimed
    ON review_queue (priority DESC, created_at)
    WHERE claimed_by IS NULL;

CREATE TABLE appeals (
    appeal_id           VARCHAR(36) PRIMARY KEY,
    content_id          VARCHAR(36) NOT NULL,
    appellant_id        VARCHAR(36) NOT NULL,
    reason              TEXT NOT NULL,
    status              VARCHAR(20) NOT NULL DEFAULT 'PENDING',
    senior_reviewer_id  VARCHAR(36),
    created_at          TIMESTAMP NOT NULL,
    resolved_at         TIMESTAMP,
    INDEX idx_status (status, created_at)
);
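The audit trail called out under scale considerations could be backed by an append-only table along these lines (a sketch; the column choices are assumptions):

```sql
-- Append-only: rows are inserted, never updated or deleted
CREATE TABLE moderation_audit_log (
    log_id        BIGINT AUTO_INCREMENT PRIMARY KEY,
    content_id    VARCHAR(36) NOT NULL,
    old_status    VARCHAR(20),
    new_status    VARCHAR(20) NOT NULL,
    actor_id      VARCHAR(36),            -- reviewer_id, or NULL for automated
    reason        TEXT,
    created_at    TIMESTAMP NOT NULL,
    INDEX idx_content (content_id, created_at)
);
```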

Scale Considerations

  • Volume: 10M posts/day = ~116/sec. Rule engine handles this synchronously; ML scoring is async via Kafka
  • ML latency: batch inference for backlog, real-time inference only for high-priority content (verified accounts, trending)
  • Human review capacity: tune thresholds so only 1-5% of content reaches human review. At 10M/day, 1% = 100k items – need significant reviewer headcount
  • Queue SLA: high-severity content (CSAM detection, credible threats) gets instant escalation outside normal queue
  • Audit trail: every status transition logged immutably for regulatory compliance
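The capacity figures above follow from simple arithmetic. The per-reviewer throughput of 200 items/day is an assumed illustrative number (roughly 2-3 minutes per item over an 8-hour shift), not a figure from the design:

```python
POSTS_PER_DAY = 10_000_000
SECONDS_PER_DAY = 86_400

avg_qps = POSTS_PER_DAY / SECONDS_PER_DAY            # sustained ingest rate
human_review_share = 0.01                            # 1% reaches human review
items_for_review = POSTS_PER_DAY * human_review_share

REVIEWS_PER_REVIEWER_PER_DAY = 200   # assumption for illustration
reviewers_needed = items_for_review / REVIEWS_PER_REVIEWER_PER_DAY

print(f"{avg_qps:.0f} posts/sec, {items_for_review:.0f} reviews/day, "
      f"{reviewers_needed:.0f} reviewers")
```

Peak traffic is typically 2-3x the daily average, so real headcount and queue sizing should budget against peak, not the mean.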

Interview Checklist

  • Define the entities and status state machine first
  • Explain the three-layer pipeline: rules (sync) -> ML (async) -> human
  • Cover the claim/lock pattern for review queue (FOR UPDATE SKIP LOCKED)
  • Mention threshold tuning as a business decision, not just a technical one
  • Address appeals as a requirement – gives users recourse, legally important
  • Discuss false positive rate explicitly – over-moderation is also a product failure

Frequently Asked Questions

What is the typical pipeline for automated content moderation?

A three-layer pipeline: (1) Synchronous rule engine that runs in under 10ms – catches clear violations like keyword blocklists, known malicious URLs, and new account flags. (2) Async ML scoring that produces a toxicity probability (0-1) – content below 0.3 is auto-approved, above 0.7 is auto-rejected, and borderline 0.3-0.7 goes to human review. (3) Human review queue with claim/lock mechanics for borderline cases. Most platforms aim for 95-99% of content resolved automatically without human review.

How do you prevent multiple reviewers from claiming the same content item?

Use the SELECT … FOR UPDATE SKIP LOCKED pattern in PostgreSQL (or equivalent in other databases). FOR UPDATE locks the selected row; SKIP LOCKED causes other concurrent transactions to skip already-locked rows instead of blocking. This lets multiple reviewers query the queue simultaneously, each getting a different item instantly without waiting. Combine with a lock TTL (e.g., 5 minutes): if a reviewer abandons the item, the lock expires and another reviewer can claim it.

How do you tune ML toxicity thresholds for content moderation?

Threshold tuning is a business decision, not just a technical one. Lowering the approval threshold (e.g., from 0.3 to 0.2) catches more borderline content but increases false positives – legitimate content gets flagged. Raising the rejection threshold reduces false positives but lets more harmful content through. Monitor precision (of rejections, how many were truly violations) and recall (of actual violations, how many did you catch). Also track appeal overturn rate – high overturn rate means your thresholds or model are too aggressive.
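These metrics reduce to simple ratios over labeled decisions. A sketch (function and parameter names are illustrative):

```python
def moderation_metrics(true_pos: int, false_pos: int, false_neg: int,
                       appeals_overturned: int, appeals_total: int) -> dict:
    """Compute precision, recall, and appeal overturn rate from counts.

    true_pos:  rejections that were genuine violations
    false_pos: rejections of legitimate content
    false_neg: violations that slipped through
    """
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    overturn_rate = (appeals_overturned / appeals_total
                     if appeals_total else 0.0)
    return {"precision": precision, "recall": recall,
            "overturn_rate": overturn_rate}
```

For example, 90 correct rejections, 10 wrongful ones, and 30 missed violations give precision 0.90 but recall only 0.75: a signal to lower the rejection threshold, at the cost of more load on human review.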

What information should a content moderation appeals system capture?

An appeal record needs: the content ID, the appellant user ID, the user’s stated reason for appeal, timestamp, current status (PENDING/UPHELD/OVERTURNED), and the senior reviewer ID and resolution timestamp once resolved. Rate limit appeals per user per day (e.g., 3 per day) to prevent abuse. Route appeals to senior reviewers separate from the standard queue. Log the original rejection reason alongside the appeal reason so reviewers have full context. Notify the user of the outcome regardless of result.

How does a configurable rule engine differ from hardcoded moderation rules?

A configurable rule engine stores rules in a database rather than in code. Trust and Safety teams can add, modify, or disable rules through an admin interface without a code deploy. This is critical for responding to emerging abuse patterns – a new spam campaign or coordinated harassment attack can be blocked within minutes by adding a rule, not hours or days waiting for a deploy. Rules include the condition (keyword match, regex, account signal), the action (BLOCK, FLAG, ALLOW), and metadata like creator, creation date, and a disable toggle.

See also: Meta Software Engineer Interview Guide – News Feed and Trust & Safety

See also: Twitter/X Software Engineer Interview Guide – Trust & Safety

See also: Snap Software Engineer Interview Guide – Content Safety
