Low Level Design: Trust and Safety Platform

What Is a Trust and Safety Platform?

A Trust and Safety (T&S) platform is the umbrella system that governs user behavior, identity integrity, and policy enforcement across a product. It combines signals from content moderation, spam detection, fraud prevention, and account integrity into a unified risk engine. The output drives actions ranging from warnings and feature restrictions to full account termination and law enforcement reporting. Designing this platform well is critical because errors affect real people and carry legal and reputational risk.

Data Model


-- Unified user risk profile
CREATE TABLE user_risk_profiles (
    user_id           BIGINT PRIMARY KEY,
    overall_risk      ENUM('low', 'medium', 'high', 'critical') DEFAULT 'low',
    identity_verified BOOLEAN DEFAULT FALSE,
    account_age_days  INT,
    strike_count      INT DEFAULT 0,
    restricted_at     TIMESTAMP,
    suspended_at      TIMESTAMP,
    last_evaluated    TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Policy strikes log
CREATE TABLE strikes (
    id            BIGINT PRIMARY KEY AUTO_INCREMENT,
    user_id       BIGINT NOT NULL,
    policy_code   VARCHAR(64) NOT NULL,  -- e.g. HATE_SPEECH, CSAM, FRAUD
    severity      ENUM('minor', 'major', 'critical') NOT NULL,
    source        VARCHAR(64),            -- content_mod, spam, fraud, manual
    content_id    BIGINT,
    issued_at     TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    expires_at    TIMESTAMP,
    -- explicit constraint: MySQL ignores column-level REFERENCES clauses
    FOREIGN KEY (user_id) REFERENCES user_risk_profiles(user_id)
);

-- Enforcement actions
CREATE TABLE enforcement_actions (
    id            BIGINT PRIMARY KEY AUTO_INCREMENT,
    user_id       BIGINT NOT NULL,
    action_type   ENUM('warning', 'feature_restrict', 'shadowban', 'suspend', 'terminate', 'ler') NOT NULL,  -- 'ler' = law enforcement reporting
    triggered_by  VARCHAR(64),   -- automated or analyst ID
    reason_codes  JSON,
    reversal_of   BIGINT,        -- set on reversal rows; originals are never deleted
    actioned_at   TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    reviewed_at   TIMESTAMP
);

Core Algorithm and Workflow

The T&S platform aggregates signals from multiple upstream systems and applies a policy decision engine:

  1. Signal ingestion: Events from content moderation verdicts, spam detection scores, payment fraud flags, login anomaly detectors, and manual analyst reports are published to a central Kafka topic keyed by user_id.
  2. Risk aggregation service: A stateful stream processor (Flink or Kafka Streams) consumes per-user events and maintains a rolling risk score. It weights signals by severity and recency using a decay function: recent violations count more than old ones. The result is written to user_risk_profiles.
  3. Policy decision engine: A rules engine evaluates the current risk profile against a policy matrix (e.g., 3 minor strikes within 30 days = feature restriction; 1 critical strike = immediate suspension). Rules are stored in a config system (not hardcoded) so policy changes deploy without code releases.
  4. Enforcement executor: Actions are written to enforcement_actions and fanned out to downstream services: account service, notification service, and if required, a Law Enforcement Response (LER) queue that generates compliant legal reports.
  5. Analyst review layer: High-stakes actions (termination, LER) are gated by a mandatory senior analyst review before execution. Analysts have a case management UI with the full signal history for the user.
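Steps 2 and 3 above can be sketched in Python. The severity weights, the 30-day half-life, and the helper names (`rolling_risk_score`, `decide_action`) are illustrative assumptions; only the two policy-matrix rules come from step 3:

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed weights and decay half-life; a real deployment tunes these.
SEVERITY_WEIGHT = {"minor": 1.0, "major": 3.0, "critical": 10.0}
HALF_LIFE_DAYS = 30.0

@dataclass
class Strike:
    severity: str
    issued_at: datetime

def rolling_risk_score(strikes: list[Strike], now: datetime) -> float:
    """Weight each strike by severity, decayed exponentially with age,
    so recent violations count more than old ones."""
    score = 0.0
    for s in strikes:
        age_days = (now - s.issued_at).total_seconds() / 86400
        decay = math.exp(-math.log(2) * age_days / HALF_LIFE_DAYS)
        score += SEVERITY_WEIGHT[s.severity] * decay
    return score

def decide_action(strikes: list[Strike], now: datetime) -> str:
    """Policy matrix from step 3: one critical strike suspends;
    three minor strikes within 30 days trigger a feature restriction."""
    if any(s.severity == "critical" for s in strikes):
        return "suspend"
    window_start = now - timedelta(days=30)
    recent_minor = sum(1 for s in strikes
                       if s.severity == "minor" and s.issued_at >= window_start)
    if recent_minor >= 3:
        return "feature_restrict"
    return "none"
```

In the real system these rules live in the config-driven rules engine rather than in code; the sketch only shows the evaluation semantics.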

Failure Handling and False Positives

  • Strike expiry: Strikes have an expires_at field. A background job clears expired strikes and re-evaluates the user risk profile. This prevents permanent penalization for old, minor violations.
  • Appeals and reinstatement: Suspended users can file an appeal. Appeals route to a dedicated queue with an SLA. If overturned, the enforcement action is reversed, the strike is voided, and the signal that triggered it is flagged for model retraining.
  • Audit trail immutability: All strikes and enforcement actions are append-only. Reversals add a new row with a reversal_of reference rather than deleting the original, preserving the full audit trail for legal discovery.
  • Circuit breaker on automated enforcement: If the upstream content moderation system has a known classifier outage, the policy engine pauses automated enforcement and routes items to human review to avoid a wave of false positives.
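The circuit-breaker bullet above can be sketched as follows. The error threshold, window size, cooldown, and all class and function names are assumptions for illustration, not part of the design as specified:

```python
import time

class EnforcementCircuitBreaker:
    """Trips when the upstream classifier's observed failure rate crosses
    a threshold, pausing automated enforcement until a cooldown elapses."""

    def __init__(self, error_threshold: float = 0.5,
                 window: int = 100, cooldown_s: float = 300.0):
        self.error_threshold = error_threshold
        self.window = window            # number of health checks to keep
        self.cooldown_s = cooldown_s
        self.results: list[bool] = []   # rolling record of health checks
        self.opened_at: float | None = None  # set when the breaker trips

    def record(self, healthy: bool) -> None:
        """Record one upstream health observation and trip if needed."""
        self.results.append(healthy)
        self.results = self.results[-self.window:]
        failure_rate = self.results.count(False) / len(self.results)
        if failure_rate >= self.error_threshold:
            self.opened_at = time.monotonic()

    def allow_automated_enforcement(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown_s:
            self.opened_at = None       # half-open: resume and re-observe
            return True
        return False

def route(item, breaker: EnforcementCircuitBreaker) -> str:
    """Send items to human review while the breaker is open."""
    return "automated" if breaker.allow_automated_enforcement() else "human_review"
```

The key property is fail-safe routing: when the breaker is open, items are never dropped, only diverted to the analyst queue.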

Scalability Considerations

  • Read-path caching: user_risk_profiles is read on every user action (post, login, purchase). Cache the risk tier in Redis with a short TTL (60 seconds). A risk tier downgrade (e.g., suspension lifted) invalidates the cache immediately via a pub/sub invalidation message.
  • Policy rules as data: Store policy rules in a database or config management system. The rules engine loads them on startup and hot-reloads on change events, decoupling policy iteration from the release cycle.
  • Sharding by user segment: For platforms with hundreds of millions of users, shard the risk aggregation stream by user ID range. Each shard maintains its own stateful processor, and cross-shard lookups (e.g., coordinated inauthentic behavior detection) are handled by a separate graph-analysis batch job.
  • Tiered storage for audit logs: Recent strikes and actions are in the primary RDBMS. After 90 days, rows are archived to cold storage (S3 + Parquet) queryable via Athena for legal holds, reducing hot DB size.
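The read-path caching bullet can be sketched as a read-through cache with a short TTL. In production this would be Redis, with a pub/sub subscriber calling `invalidate` on risk-tier changes; an in-memory stand-in keeps the sketch self-contained, and all names and defaults here are illustrative:

```python
import time

class RiskTierCache:
    """Read-through cache for a user's risk tier with a short TTL.
    Stands in for Redis; in production, a pub/sub invalidation message
    triggers invalidate() so tier downgrades are seen immediately."""

    def __init__(self, loader, ttl_s: float = 60.0):
        self.loader = loader      # fallback read from user_risk_profiles
        self.ttl_s = ttl_s
        self._store = {}          # user_id -> (tier, cached_at)

    def get_tier(self, user_id: int) -> str:
        entry = self._store.get(user_id)
        if entry is not None:
            tier, cached_at = entry
            if time.monotonic() - cached_at < self.ttl_s:
                return tier       # fresh cache hit
        tier = self.loader(user_id)  # miss or expired: hit the DB
        self._store[user_id] = (tier, time.monotonic())
        return tier

    def invalidate(self, user_id: int) -> None:
        """Called on a tier change, e.g. when a suspension is lifted."""
        self._store.pop(user_id, None)
```

The short TTL bounds staleness for tier upgrades, while explicit invalidation makes downgrades (the safety-critical direction for the user) take effect immediately.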

Summary

A Trust and Safety platform is the policy backbone of a user-facing product. Its core design challenges are: aggregating diverse signals into a coherent risk score, expressing policy rules in a flexible and auditable way, and enforcing actions proportionally while preserving the ability to appeal and remediate errors. In interviews, emphasize the append-only audit trail for legal compliance, the config-driven policy engine for rapid iteration, and the circuit-breaker pattern to prevent classifier outages from causing automated enforcement mistakes at scale.

Frequently Asked Questions

What systems make up a trust and safety platform at a large tech company?

A trust and safety platform integrates identity verification, fraud detection, content moderation, account integrity, and policy enforcement subsystems. These systems share a common risk signal bus so that a fraud signal on a payment can trigger elevated review of the same account’s content. Companies like Airbnb, Meta, and Google layer these signals to produce a unified account trust score used across products.

How do you design an account integrity system to detect fake or compromised accounts?

Account integrity systems analyze registration signals (device fingerprint, IP, email domain), login behavior (velocity, location anomalies), and ongoing activity patterns. Graph-based detection identifies clusters of accounts sharing infrastructure. Supervised classifiers trained on confirmed fake accounts provide risk scores. Step-up authentication challenges such as SMS verification are triggered when risk exceeds a threshold, balancing security with user friction.

What are the main challenges when designing a trust and safety system for a marketplace?

Marketplace trust and safety must address both sides of a transaction: fraudulent buyers (chargebacks, payment fraud) and fraudulent sellers (counterfeit goods, scams, non-delivery). Systems need to evaluate listing authenticity, seller reputation history, and buyer payment risk in real time before a transaction commits. Airbnb’s system, for example, also models physical safety risk for in-person interactions, which requires different signals than purely digital fraud.

How should appeals and false positive handling be designed in a trust and safety system?

Appeals workflows require a separate review queue distinct from initial enforcement, clear communication to users about why action was taken and what evidence would support reinstatement, and an SLA to resolve appeals within a defined window. False positive rates must be monitored by enforcement action type and demographics to detect disparate impact. Automated reinstatement for high-confidence false positives reduces human review load while maintaining accuracy.

See also: Meta Interview Guide 2026: Facebook, Instagram, WhatsApp Engineering

See also: Netflix Interview Guide 2026: Streaming Architecture, Recommendation Systems, and Engineering Excellence

See also: Airbnb Interview Guide 2026: Search Systems, Trust and Safety, and Full-Stack Engineering
