Fraud Detection System Low-Level Design

Requirements

  • Detect fraudulent transactions in real time (<200ms per transaction decision)
  • Flag suspicious user behavior: account takeover, card testing, synthetic identities
  • False positive rate <0.5% (legitimate transactions must not be blocked)
  • 10K transactions/second

Architecture: Rule Engine + ML Scoring

Transaction → Rule Engine (fast, deterministic) → ML Scorer (probabilistic)
                ↓ block/allow               ↓ risk score 0-1
           Decisioning Service → ALLOW / REVIEW / BLOCK
                ↓
           Kafka (events for model training, analytics, manual review)

Rule Engine (First Layer)

Fast, deterministic rules that catch obvious fraud. Rules are evaluated in order, short-circuiting on the first match:

  • Velocity rules: INCR tx_count:{user_id}:{minute_bucket} in Redis. If >5 transactions in 60s: block. TTL=120s.
  • Amount rules: transaction amount >3x the user’s 30-day average → REVIEW
  • Geographic rules: billing address country != IP geolocation country → REVIEW
  • Device fingerprint: device_id associated with previous fraud → block
  • Card BIN check: card BIN (first 6 digits) on blocklist → block
  • Account age: account created <24h ago + high-value transaction → REVIEW
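The rule chain above can be sketched as an ordered list of predicates that short-circuits on the first decision. A minimal in-memory sketch, with a plain dict standing in for the Redis velocity counter (in production this is `INCR` plus a 120s TTL); function names like `evaluate_rules` are illustrative:

```python
import time

# In-memory stand-in for the Redis velocity counter (INCR + EXPIRE 120).
_velocity = {}

def velocity_rule(tx):
    """BLOCK if the user exceeds 5 transactions in a 60s bucket."""
    bucket = int(tx.get("ts", time.time()) // 60)
    key = f"tx_count:{tx['user_id']}:{bucket}"
    _velocity[key] = _velocity.get(key, 0) + 1  # Redis: INCR key; EXPIRE key 120
    return "BLOCK" if _velocity[key] > 5 else None

def amount_rule(tx):
    """REVIEW amounts more than 3x the user's 30-day average."""
    return "REVIEW" if tx["amount"] > 3 * tx["user_avg_30d"] else None

def geo_rule(tx):
    """REVIEW when billing country and IP geolocation country disagree."""
    return "REVIEW" if tx["billing_country"] != tx["ip_country"] else None

RULES = [velocity_rule, amount_rule, geo_rule]

def evaluate_rules(tx):
    """Run rules in order; short-circuit on the first match."""
    for rule in RULES:
        decision = rule(tx)
        if decision is not None:
            return decision
    return None  # no rule fired -> hand off to the ML scorer
```

Ordering matters: the cheapest and highest-precision rules (velocity, blocklists) go first so most fraud never reaches ML inference.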

ML Risk Scorer (Second Layer)

For transactions not immediately blocked by rules, compute a risk score. Features:

  • User behavior features: average transaction amount, transaction frequency, typical hours of activity
  • Device features: is_new_device, device_age_days, device_risk_score
  • Network features: IP reputation score, is_VPN/proxy, IP-to-billing-address distance
  • Merchant features: merchant_risk_category, transaction_count_at_merchant
  • Velocity features: transactions in last 1h/24h/7d, amount in last 24h

Model: gradient boosting (XGBoost/LightGBM) trained on labeled fraud/non-fraud transactions. Feature store: Redis for real-time features (velocity counts), Cassandra for historical features (user’s 30-day stats). Model served via ONNX runtime in the fraud service for sub-50ms inference.
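At inference time the service fetches real-time features (Redis) and historical features (Cassandra), then combines them into the fixed-order vector the model expects. A sketch with both stores stubbed as dicts; the feature names and ordering here are illustrative, not the production schema:

```python
# Stand-ins for the two feature stores: Redis holds real-time velocity
# counts; Cassandra holds batch-computed 30-day user statistics.
REALTIME = {("u1", "tx_count_1h"): 3, ("u1", "tx_count_24h"): 9}
HISTORICAL = {"u1": {"avg_amount_30d": 80.0, "stddev_amount_30d": 20.0}}

def build_feature_vector(tx):
    """Combine real-time and historical features into the fixed-order
    vector consumed by the model (e.g. via ONNX runtime)."""
    hist = HISTORICAL[tx["user_id"]]
    z = (tx["amount"] - hist["avg_amount_30d"]) / hist["stddev_amount_30d"]
    return [
        tx["amount"],
        z,  # amount z-score vs. user's 30-day history
        float(REALTIME.get((tx["user_id"], "tx_count_1h"), 0)),
        float(REALTIME.get((tx["user_id"], "tx_count_24h"), 0)),
        1.0 if tx.get("is_new_device") else 0.0,
    ]
```

Keeping the feature order fixed and versioned alongside the model avoids training/serving skew when the model is retrained.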

Risk Score → Decision

score > 0.7:         BLOCK  (decline transaction)
0.2 ≤ score ≤ 0.7:   REVIEW (challenge or manual review)
score < 0.2:         ALLOW

Thresholds tuned based on business risk tolerance. REVIEW actions: send 3D Secure challenge (user verifies via SMS/app), add to manual review queue (agent reviews within 24h), or step-up authentication.
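The score-to-decision mapping is a small, tunable function. A sketch, with the default thresholds taken from this section (both are parameters precisely so the business can retune them):

```python
def decide(score, block_threshold=0.7, review_threshold=0.2):
    """Map a model risk score in [0, 1] to ALLOW / REVIEW / BLOCK.
    Thresholds are tuned to the business's risk tolerance."""
    if score > block_threshold:
        return "BLOCK"
    if score >= review_threshold:
        return "REVIEW"
    return "ALLOW"
```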

Feature Engineering

Real-time features computed at transaction time:

  • Amount z-score: (amount − user_avg_30d) / user_stddev_30d
  • Time since last transaction for this user
  • Count of distinct merchants in last 24h
  • Is the shipping address new (never used before)?
  • Card testing signal: multiple small (<$1) transactions in last hour (testing if card is valid)
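The card-testing signal above can be sketched as a per-device counter over sub-$1 transactions, with a dict standing in for a Redis `INCR` on an hourly key; the more-than-3-per-hour trigger is one reasonable setting:

```python
from collections import defaultdict

SMALL_TX_LIMIT = 1.00          # dollars; "small" per the signal above
_small_tx = defaultdict(int)   # stand-in for Redis INCR small_tx:{device_id}:{hour}

def record_small_tx(device_id, amount, hour_bucket):
    """Return True once a device shows a card-testing pattern:
    more than 3 sub-$1 transactions within one hour bucket."""
    if amount >= SMALL_TX_LIMIT:
        return False
    _small_tx[(device_id, hour_bucket)] += 1
    return _small_tx[(device_id, hour_bucket)] > 3
```

Once a device trips the signal, its fingerprint (and associated cards) can be added to a blocklist rather than scoring each transaction individually.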

Feedback Loop

Labels come from: (1) chargebacks (confirmed fraud, typically 60-90 days after transaction), (2) user-reported fraud, (3) manual review decisions. Publish labeled examples to Kafka → training pipeline → retrain model weekly. Feature drift detection: if feature distributions shift significantly (e.g., new fraud pattern emerges), trigger retraining. A/B test new model versions: shadow mode (run new model but don’t use its decision) before full deployment.
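One common way to quantify "feature distributions shift significantly" is the population stability index (PSI); this design doesn't prescribe a specific test, so the sketch below is one option, and the PSI > 0.2 retraining trigger is a widely used rule of thumb rather than a requirement:

```python
import math

def psi(expected, actual, bins=10):
    """Population stability index between a baseline feature sample
    (e.g. the training distribution) and a recent production sample.
    PSI > 0.2 is a common trigger for retraining."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0

    def frac(sample):
        counts = [0] * bins
        for x in sample:
            counts[min(int((x - lo) / width), bins - 1)] += 1
        # Smooth so empty bins don't produce log(0).
        return [(c + 1e-6) / (len(sample) + bins * 1e-6) for c in counts]

    e, a = frac(expected), frac(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Running this per feature on a daily batch gives a cheap drift monitor that can page on-call or kick off the retraining pipeline.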

Data Model

Transaction(tx_id, user_id, merchant_id, amount, currency, device_id,
            ip_address, card_id, status ENUM(PENDING,ALLOWED,BLOCKED,REVIEW),
            risk_score, created_at)
FraudSignal(signal_id, tx_id, signal_type, signal_value, created_at)
UserRiskProfile(user_id, avg_amount_30d, tx_count_30d, last_tx_at, risk_tier)

Key Design Decisions

  • Rule engine runs first (fast) — blocks obvious fraud without ML inference cost
  • ML model for ambiguous cases — probabilistic risk score, not binary
  • Feature store separates real-time features (Redis) from historical features (Cassandra)
  • False positive minimization: REVIEW instead of BLOCK for medium-risk scores
  • Feedback loop: chargebacks and manual review labels retrain the model continuously



