What Is a Review Moderation Service?
A review moderation service enforces content policy on user-generated reviews before and after publication. It combines automated ML-based filtering with a confidence-gated human review queue, tracks policy enforcement decisions, and provides an appeals process for wrongly rejected content.
Requirements
Functional Requirements
- Automatically classify submitted reviews against content policy (spam, hate speech, irrelevant content, fake reviews).
- Auto-approve reviews with high-confidence clean classification; route borderline cases to a human moderation queue.
- Allow moderators to approve, reject, or flag reviews for policy category tagging.
- Notify authors of rejection with a policy reason code.
- Accept appeals from authors; route appeals to a senior moderation tier.
Non-Functional Requirements
- Human queue items must be actioned within four hours during business hours (SLA).
- ML model inference must complete within 200 ms per review.
- False positive rate (clean reviews wrongly rejected) must stay below 0.5%.
Data Model
- moderation_case: case_id, review_id, source (AUTO, REPORT, APPEAL), status (OPEN, APPROVED, REJECTED, ESCALATED), assigned_to, ml_scores (JSON: {spam, hate, irrelevant, fake}), policy_categories (array), decision_reason, opened_at, closed_at.
- moderation_action: action_id, case_id, actor_type (ML, HUMAN), actor_id, action (APPROVE, REJECT, ESCALATE, REQUEST_EDIT), policy_category, notes, created_at.
- appeal: appeal_id, review_id, case_id, user_id, appeal_text, status (PENDING, UPHELD, DENIED), assigned_to, created_at, resolved_at.
- policy_rule: rule_id, category, description, auto_reject_threshold, human_review_threshold, active.
Core Algorithms
Confidence-Gated Routing
The ML pipeline returns a score between 0 and 1 for each policy category. For each category the service compares the score against two thresholds from policy_rule: auto_reject_threshold (e.g. 0.9) and human_review_threshold (e.g. 0.5). The checks run in order of severity. If any category score exceeds its auto_reject_threshold, the review is rejected immediately and the author is notified. Otherwise, if any score lies between the two thresholds, a moderation case is opened and queued for a human. If every score falls below its human_review_threshold, the review is auto-approved. Because the thresholds live in policy_rule, they are tunable per category without a code deploy.
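The routing decision can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the Route enum and route_review function are hypothetical names, and the handling of a score exactly equal to a threshold (treated here as "needs human review") is an assumption the text does not specify.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AUTO_REJECT = "AUTO_REJECT"
    HUMAN_REVIEW = "HUMAN_REVIEW"
    AUTO_APPROVE = "AUTO_APPROVE"


@dataclass(frozen=True)
class PolicyRule:
    # Mirrors the policy_rule fields used for routing.
    category: str
    auto_reject_threshold: float
    human_review_threshold: float
    active: bool = True


def route_review(scores: dict[str, float], rules: list[PolicyRule]) -> Route:
    """Apply the two-threshold gate across all active policy categories."""
    needs_human = False
    for rule in rules:
        if not rule.active:
            continue
        # Assumption: a category with no score is treated as clean (0.0).
        score = scores.get(rule.category, 0.0)
        if score > rule.auto_reject_threshold:
            # One category over its reject threshold is enough to reject.
            return Route.AUTO_REJECT
        if score >= rule.human_review_threshold:
            # Assumption: a score equal to either threshold goes to a human.
            needs_human = True
    return Route.HUMAN_REVIEW if needs_human else Route.AUTO_APPROVE
```

Checking every category before returning HUMAN_REVIEW matters: a borderline spam score must not mask a hate-speech score that is high enough to auto-reject.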
Human Queue Priority
The human moderation queue is ordered by a priority score that combines: time in queue (older cases rank higher), reporter credibility if the case was user-reported, and the max ML score across all categories (higher ML confidence in violation ranks higher). Moderators claim cases from the top of their assigned queue. Cases unclaimed for two hours are auto-escalated.
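One way to combine the three signals is a weighted sum, sketched below. The weights, the normalisation of queue age against the two-hour escalation window, and the names QueuedCase, priority, order_queue, and needs_escalation are all illustrative assumptions; the text does not specify a formula.

```python
from dataclasses import dataclass

# Hypothetical weights; a real system would tune these against SLA data.
W_AGE, W_ML, W_REPORTER = 0.5, 0.3, 0.2
ESCALATE_AFTER_MINUTES = 120  # "unclaimed for two hours"


@dataclass
class QueuedCase:
    case_id: str
    minutes_unclaimed: float
    max_ml_score: float                # highest score across categories, 0..1
    reporter_credibility: float = 0.0  # 0..1; 0 when not user-reported


def priority(case: QueuedCase) -> float:
    """Higher value means the case should be handled sooner."""
    # Normalise age to 0..1 so no single signal dominates by scale alone.
    age = min(case.minutes_unclaimed / ESCALATE_AFTER_MINUTES, 1.0)
    return (W_AGE * age
            + W_ML * case.max_ml_score
            + W_REPORTER * case.reporter_credibility)


def order_queue(cases: list[QueuedCase]) -> list[QueuedCase]:
    """Return cases with the highest priority first."""
    return sorted(cases, key=priority, reverse=True)


def needs_escalation(case: QueuedCase) -> bool:
    """True once a case has sat unclaimed for the escalation window."""
    return case.minutes_unclaimed >= ESCALATE_AFTER_MINUTES
```

With these weights, a case that has waited 100 minutes outranks a fresh case with the same ML score, and a credible user report lifts a fresh case above an unreported one.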
Appeal Routing
An appeal creates a new moderation case linked to the original case and the appeal text. It is routed to a senior moderator pool, not the original moderator who rejected the review. The senior moderator reviews the original case evidence, the appeal argument, and policy guidelines. Upheld appeals re-publish the review and flag the original rejection for moderator feedback. Denied appeals close the case with a final decision that cannot be appealed again.
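The assignment and resolution rules above can be sketched as follows. The least-loaded selection policy, the senior_open_cases mapping, and the function names are assumptions made for illustration; the text only requires that the appeal goes to a senior moderator other than the one who rejected the review.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Appeal:
    appeal_id: str
    case_id: str                      # the original moderation case
    status: str = "PENDING"           # PENDING, UPHELD, or DENIED
    assigned_to: Optional[str] = None


def assign_appeal(appeal: Appeal,
                  senior_open_cases: dict[str, int],
                  original_moderator_id: str,
                  previously_denied: bool) -> str:
    """Pick a senior moderator who did not make the original decision."""
    if previously_denied:
        # A denied appeal is final and cannot be appealed again.
        raise ValueError("a denied appeal is final")
    eligible = {mod: load for mod, load in senior_open_cases.items()
                if mod != original_moderator_id}
    if not eligible:
        raise LookupError("no eligible senior moderator available")
    # Assumption: choose the senior moderator with the fewest open cases.
    appeal.assigned_to = min(eligible, key=eligible.get)
    return appeal.assigned_to


def resolve_appeal(appeal: Appeal, upheld: bool) -> dict[str, bool]:
    """Record the outcome and return the follow-up actions it triggers."""
    appeal.status = "UPHELD" if upheld else "DENIED"
    return {
        "republish_review": upheld,         # upheld appeals re-publish
        "flag_original_rejection": upheld,  # feedback for the moderator
        "final": not upheld,                # denied appeals are final
    }
```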
ML Model Versioning
The classification pipeline supports multiple model versions via a feature flag. A newly deployed model runs in shadow mode alongside the production model for 48 hours: its decisions are logged but do not affect routing. A comparison dashboard shows false positive rates and coverage metrics for both models. Once the shadow model meets the promotion criteria on those metrics, it is promoted to primary.
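A promotion check over the shadow log might look like the sketch below. It assumes each logged record carries both models' decisions plus a ground-truth label (for example from human review); the record keys, the function names, and the promotion rule itself are hypothetical. The 0.5% ceiling is taken from the false positive requirement stated earlier.

```python
def false_positive_rate(records: list[dict], decision_key: str) -> float:
    """Share of clean reviews that the given model would have rejected."""
    clean = [r for r in records if not r["violates_policy"]]
    if not clean:
        return 0.0
    wrongly_rejected = sum(1 for r in clean if r[decision_key] == "REJECT")
    return wrongly_rejected / len(clean)


def ready_to_promote(records: list[dict], max_fpr: float = 0.005) -> bool:
    """Assumed rule: shadow stays under the FPR ceiling and is no worse
    than the production model on the same traffic."""
    shadow = false_positive_rate(records, "shadow_decision")
    prod = false_positive_rate(records, "prod_decision")
    return shadow <= max_fpr and shadow <= prod
```

A fuller check would also compare coverage (how many true violations each model catches), since a model can lower its false positive rate simply by rejecting less.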
API Design
- POST /moderation/cases: internal; creates a case from a review submission or user report. Called by the reviews service.
- GET /moderation/queue?category=spam&limit=20: moderator fetches their work queue.
- POST /moderation/cases/{case_id}/decision: moderator submits approve/reject/escalate with policy category and optional notes.
- POST /reviews/{review_id}/report: user reports a published review; creates a moderation case if none exists.
- POST /reviews/{review_id}/appeal: author submits an appeal with explanation text.
- GET /moderation/metrics: ops dashboard showing queue depth, SLA compliance, false positive rate, and moderator throughput.
Scalability and Reliability
Async ML Inference
Review text is placed on a Kafka topic consumed by the ML inference service. Results are written back to the moderation_case row and the review status is updated. This decouples review submission from inference latency. For time-sensitive cases (user reports of live reviews) a high-priority topic is used with dedicated consumer instances to achieve sub-second routing.
Queue Load Balancing
The human moderation queue is partitioned by content category. Moderators are assigned to one or more categories based on their training. This prevents a spike in one category (e.g. a bot campaign producing spam) from starving the queue for other categories. Each partition has an independent SLA monitor that triggers on-call alerts when depth exceeds a threshold.
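The per-partition depth monitor can be illustrated with a short sketch. It assumes each open case is assigned to the partition of its highest-scoring category and that alert thresholds are configured per category; both choices, and the function names, are hypothetical.

```python
from collections import Counter


def primary_category(ml_scores: dict[str, float]) -> str:
    """Assumption: a case belongs to the partition of its top-scoring category."""
    return max(ml_scores, key=ml_scores.get)


def partition_depths(open_case_scores: list[dict[str, float]]) -> Counter:
    """Count open cases per category partition."""
    return Counter(primary_category(scores) for scores in open_case_scores)


def breached_partitions(depths: Counter, alert_depth: dict[str, int]) -> list[str]:
    """Return the partitions whose depth exceeds their alert threshold."""
    return sorted(cat for cat, depth in depths.items()
                  if cat in alert_depth and depth > alert_depth[cat])
```

Because each partition is evaluated independently, a spam flood raises an alert for the spam queue without changing what moderators on the hate-speech queue see.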
Policy Rule Hot Reload
Threshold changes and new policy categories are applied without a service restart. The service polls the policy_rule table every 60 seconds and rebuilds its in-memory rule set. This allows trust and safety teams to tighten thresholds immediately during an attack without waiting for a deployment.
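The polling behaviour can be sketched with a small cache that reloads once the interval has elapsed. The RuleCache class, the injected load_rules callable (standing in for a query against policy_rule), and the injectable clock are assumptions made to keep the example self-contained and testable.

```python
import time
from typing import Callable


class RuleCache:
    """Holds an in-memory rule set and refreshes it on a fixed interval."""

    def __init__(self,
                 load_rules: Callable[[], dict],
                 poll_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._load_rules = load_rules   # hypothetical: reads policy_rule
        self._poll_seconds = poll_seconds
        self._clock = clock
        self._rules = load_rules()
        self._loaded_at = clock()

    def rules(self) -> dict:
        """Return the current rule set, reloading it if it is stale."""
        if self._clock() - self._loaded_at >= self._poll_seconds:
            # Build the new rule set fully, then swap the reference, so
            # callers never observe a half-updated set of thresholds.
            self._rules = self._load_rules()
            self._loaded_at = self._clock()
        return self._rules
```

This variant refreshes lazily on read rather than from a background thread; either approach bounds staleness at roughly one polling interval, which is the delay a trust and safety team should expect between editing a threshold and seeing it enforced.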
Trade-offs and Interview Discussion Points
- Optimistic versus pessimistic gating: auto-publishing and retrospectively moderating reduces submission friction but means some policy violations are briefly live. Pre-publication gating eliminates this at the cost of higher human queue volume.
- Single ML model versus ensemble: a single model is simpler to maintain; an ensemble (one model per policy category) allows independent tuning and retraining on category-specific data, which typically improves precision where it matters most.
- Appeals process design: offering appeals increases author trust and catches false positives but creates a workload for senior moderators. Limiting appeals to one per review and requiring a minimum account age reduces frivolous appeals.