What Is a Content Moderation System?
A content moderation system detects and removes harmful content (hate speech, spam, CSAM, misinformation, violence) on user-generated content platforms. Examples: Facebook's moderation pipeline (100B+ posts), YouTube's Content ID system, Twitter's spam filters. Core challenges: scale (millions of posts per hour), latency (pre-publish blocking vs. post-publish removal), accuracy (minimizing false positives that silence legitimate users), and adversarial content designed to evade detection.
System Requirements
Functional
- Classify text/images/video as: safe, borderline, violating
- Auto-remove high-confidence violations immediately
- Queue borderline content for human review
- Appeals: users can contest removal decisions
- Hash-based detection for known violating content (CSAM hashes)
Non-Functional
- 1M posts/hour, <500ms for pre-publish text classification
- Human reviewers handle 100K items/day
- False positive rate <0.1% (do not remove legitimate content)
Multi-Layer Moderation Pipeline
Content submission
│
▼
[Layer 1] Hash matching (PhotoDNA, MD5)
→ exact match: block immediately (O(1))
│
▼
[Layer 2] ML classifier (text: BERT-based, image: CNN)
→ score: high confidence bad → block
→ score: medium confidence → human review queue
→ score: low confidence → allow
│
▼
[Layer 3] Human review workers (for borderline content)
│
▼
[Layer 4] Appeals (for removed content)
Hash-Based Detection
For known illegal content (CSAM), use perceptual hashing (PhotoDNA). Unlike cryptographic hashes (MD5), perceptual hashes are similar for visually similar images — resizing, cropping, or color-adjusting a photo produces a nearly identical perceptual hash. Store known violating hashes in a Bloom filter for O(1) lookup at submission time. The Bloom filter can hold 1B hashes in ~1.2 GB with a 1% false positive rate. Any Bloom filter hit triggers exact hash verification before blocking.
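The lookup flow above can be sketched as follows. This is a minimal illustration, not production code: the `BloomFilter` here is a toy implementation built on SHA-256, and `known_bad` (a plain set standing in for the exact-hash database) plus its single entry are hypothetical.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash positions derived from SHA-256 over a bit array."""
    def __init__(self, size_bits: int = 1 << 20, num_hashes: int = 7):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: bytes):
        # Derive k independent positions by salting the hash with the index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(2, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        # False means definitely absent; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

# Hypothetical known-bad hash set, standing in for the exact-hash database.
known_bad = {b"hash_of_known_violating_image"}
bloom = BloomFilter()
for h in known_bad:
    bloom.add(h)

def check_submission(content_hash: bytes) -> bool:
    """Return True if the content should be blocked."""
    if not bloom.might_contain(content_hash):
        return False                    # definite miss: skip the database lookup
    return content_hash in known_bad    # verify exact match on any Bloom hit
```

The exact-match verification step is what makes the Bloom filter's ~1% false positive rate acceptable: a filter hit only triggers a database lookup, never a block on its own.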
ML Classification
Text: a fine-tuned BERT or RoBERTa model running on GPU inference servers. Pre-publish path: synchronous call with a 200ms timeout; on timeout, fail open and allow (accept false negatives rather than blocking legitimate content). Post-publish: re-run classification asynchronously with a more expensive model. Image: ResNet/EfficientNet CNN. Video: sample frames at 1 fps, classify each frame, and aggregate the frame scores. The classifier returns a confidence score in [0, 1] plus violation categories.
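The pre-publish path with its fail-open timeout can be sketched as below. `classify_text` is a hypothetical stand-in for the real model call, and the 0.8/0.3 thresholds are illustrative values matching the pipeline diagram.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def classify_text(text: str) -> float:
    """Stand-in for a BERT inference call; returns a violation confidence in [0, 1]."""
    return 0.9 if "buy now!!!" in text.lower() else 0.05

def moderate_pre_publish(text: str, timeout_s: float = 0.2) -> str:
    """Synchronous pre-publish decision: 'block', 'review', or 'allow'."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(classify_text, text)
        try:
            score = future.result(timeout=timeout_s)
        except TimeoutError:
            # Fail open: accept a false negative rather than block a legit post.
            # (Post-publish async classification catches it later.)
            return "allow"
    if score > 0.8:
        return "block"
    if score >= 0.3:
        return "review"
    return "allow"
```

In a real deployment the inference call would be an RPC with server-side deadlines rather than a thread pool, but the decision structure (timeout → allow, then threshold routing) is the same.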
Human Review Queue
Borderline content (confidence 0.3-0.8) goes to the review queue. Prioritization order: (1) content visibility (viral content reviewed first), (2) severity of the potential violation, (3) submission time. Each item is shown to a reviewer with context: user history, report count, and the relevant policy reference. Reviewer actions: remove, allow, or mark for policy update. Quality control: 5% of reviewed items are re-reviewed by a senior reviewer to measure inter-rater agreement; reviewers with low agreement undergo retraining.
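A minimal sketch of the weighted priority queue, using an in-process heap in place of the Redis Sorted Set mentioned later. The severity categories and weight constants are illustrative assumptions, not production values.

```python
import heapq

# Hypothetical severity tiers; real systems map these from policy categories.
SEVERITY = {"csam": 3, "violence": 3, "hate": 2, "spam": 1}

def priority_score(severity: str, views_per_hour: float, report_count: int) -> float:
    # Illustrative weights; in practice these are tuned against review SLAs.
    return 1000 * SEVERITY.get(severity, 0) + 0.1 * views_per_hour + 5 * report_count

queue = []  # min-heap of (-score, item_id)

def enqueue(item_id: str, severity: str, views_per_hour: float, reports: int) -> None:
    # heapq is a min-heap, so push the negated score to pop highest priority first.
    heapq.heappush(queue, (-priority_score(severity, views_per_hour, reports), item_id))

def next_item() -> str:
    """Pop the highest-priority item for the next available reviewer."""
    return heapq.heappop(queue)[1]
```

At scale this becomes a shared Redis Sorted Set (`ZADD` with the score, `ZPOPMAX` to pull work), but the scoring logic is identical.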
Signals for Classification
- Content features: text toxicity, image nudity score, audio transcription
- User signals: account age, prior violations, follower/following ratio
- Graph signals: how many of this user’s posts were reported, by whom
- Velocity signals: posting 100 identical messages in an hour = spam
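The velocity signal in the last bullet can be implemented as a sliding-window counter per (user, message) pair. A minimal sketch, with the threshold and window as assumed parameters:

```python
from collections import defaultdict, deque
from typing import Optional
import time

class VelocityDetector:
    """Flags a user when identical messages exceed a rate threshold in a window."""
    def __init__(self, max_identical: int = 100, window_s: float = 3600.0):
        self.max_identical = max_identical
        self.window_s = window_s
        self.events = defaultdict(deque)  # (user_id, message) -> timestamps

    def record(self, user_id: str, message: str,
               now: Optional[float] = None) -> bool:
        """Record a post; return True if it trips the spam threshold."""
        now = time.monotonic() if now is None else now
        q = self.events[(user_id, message)]
        q.append(now)
        # Evict timestamps that have aged out of the window.
        while q and now - q[0] > self.window_s:
            q.popleft()
        return len(q) >= self.max_identical
```

In production the counters would live in Redis with TTLs (and messages would be keyed by a normalized hash, so trivial variations still match), but the sliding-window logic is the same.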
Appeals and Feedback Loop
Users appeal removed content via a form; appeals go to a senior review queue. If a removal is overturned, the content is restored and the model prediction is logged as a false positive. These false positive examples are added to the training set with the corrected label. A continuous retraining pipeline ingests reviewer decisions weekly and updates the classifier, closing the feedback loop: the model improves from human decisions.
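The overturn-to-training-example step can be sketched as below. The `TrainingExample` schema, the in-memory `training_set` list (standing in for the labeled-data store), and the `"appeal_overturned"` provenance tag are all illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class TrainingExample:
    content_id: str
    text: str
    label: str    # corrected label after human review
    source: str   # provenance, e.g. "appeal_overturned"

training_set = []  # stands in for the labeled-data store

def resolve_appeal(content_id: str, text: str,
                   model_label: str, overturned: bool) -> None:
    """On an overturned appeal, log a corrected example for retraining."""
    if not overturned:
        return  # removal upheld: no correction to learn from
    # The model said `model_label` but humans disagreed: record the item
    # as a false positive with the corrected "safe" label for the weekly batch.
    training_set.append(
        TrainingExample(content_id, text, "safe", "appeal_overturned"))
```

Tracking provenance matters: appeal-sourced examples are biased toward the model's false positives, so retraining typically mixes them with regularly sampled reviewer decisions.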
Scaling Human Review
At 1M posts/hour with a 10% borderline rate, that is 100K items/hour. At 200 items/reviewer/hour, need 500 concurrent reviewers. Geographic distribution: native speakers for non-English content. Outsource to moderation vendors (Accenture, Teleperformance) for scale. Protect reviewer mental health: mandatory breaks, psychological support, session content diversity limits.
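The headcount arithmetic above generalizes to a one-line capacity formula; a sketch for checking staffing against changing borderline rates:

```python
import math

def reviewers_needed(posts_per_hour: int, borderline_rate: float,
                     items_per_reviewer_hour: int) -> int:
    """Concurrent reviewers required to keep pace with the borderline stream."""
    borderline_per_hour = posts_per_hour * borderline_rate
    return math.ceil(borderline_per_hour / items_per_reviewer_hour)

# 1M posts/hour, 10% borderline, 200 items/reviewer/hour -> 500 reviewers
```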
Interview Tips
- Multi-layer pipeline is the key insight: cheap checks first (hash), expensive last (ML).
- Perceptual hashing + Bloom filter for known-bad content = O(1) rejection.
- Pre-publish vs. post-publish trade-off: latency vs. accuracy.
- Human review and the feedback loop complete the system — don’t design without them.
{
"@context": "https://schema.org",
"@type": "FAQPage",
"mainEntity": [
{
"@type": "Question",
"name": "How do you build a multi-layer content moderation pipeline and why layer it?",
"acceptedAnswer": { "@type": "Answer", "text": "A multi-layer pipeline applies cheap checks first and expensive checks last, short-circuiting as soon as a decision is made. Layer 1 — hash matching (O(1)): check the content hash against a Bloom filter of known-violating hashes. PhotoDNA for images, MD5 for exact text matches. Any hit triggers immediate block with zero ML cost. Layer 2 — rule-based filters (O(ms)): regex patterns for spam URLs, keyword blocklists, velocity checks (user posting 100 times/minute). Cheap and fast, catches obvious violations. Layer 3 — ML classifier (O(100ms)): BERT for text toxicity, CNN for image nudity. GPU inference. Returns a confidence score and violation categories. Layer 4 — human review: only borderline confidence scores (0.3-0.8) go here. High-confidence violations (> 0.8) are auto-removed; low-confidence (< 0.3) are auto-allowed. This cascade means 95%+ of content is resolved by layers 1-2, and ML only runs on the remaining 5%, dramatically reducing compute cost and latency." }
},
{
"@type": "Question",
"name": "How does perceptual hashing enable detection of modified copies of violating images?",
"acceptedAnswer": { "@type": "Answer", "text": "Cryptographic hashes (MD5, SHA-256) change completely if even one pixel changes — resizing or adding a watermark produces a totally different hash. Perceptual hashes (PhotoDNA, pHash, dHash) hash the image's visual content. They are robust to minor modifications: resizing, cropping, color adjustments, and adding small watermarks produce similar perceptual hashes. Two perceptually similar images produce hashes with low Hamming distance (number of differing bits). Algorithm: dHash converts image to 8×9 grayscale, compares adjacent pixels to produce a 64-bit hash. Two images with Hamming distance <= 10 are considered similar. For CSAM detection: NCMEC maintains a hash database of known CSAM images. Platforms run PhotoDNA on all uploaded images and compare against this database. False positive risk: the low Hamming distance threshold may match legitimate images — human review handles any Bloom filter hits before taking action." }
},
{
"@type": "Question",
"name": "How do you prioritize the human review queue in a content moderation system?",
"acceptedAnswer": { "@type": "Answer", "text": "With 100K+ items/day in the review queue, reviewers cannot process everything immediately. Prioritization factors: (1) Content velocity — viral content reaching 1M views should be reviewed before content with 10 views; calculate views-per-hour and prioritize high-velocity items. (2) Violation severity — potential CSAM or credible violence threats are P0, hate speech is P1, spam is P2. (3) User report count — 100 users reporting the same content signals higher urgency than zero reports. (4) Account risk signals — content from accounts with prior violations ranks higher. Implementation: weighted priority score = (severity_weight * severity) + (velocity_weight * views_per_hour) + (report_weight * report_count) + (account_risk_weight * account_risk). Store in a priority queue (Redis Sorted Set keyed by score). Reviewers always pull from the top of the queue. SLA: P0 reviewed within 1 hour, P1 within 24 hours, P2 within 72 hours. Measure queue depth by priority tier and alert when P0 SLA is at risk." }
}
]
}