Low Level Design: Fraud Detection System

Real-Time Scoring Pipeline

Every transaction enters the fraud scoring pipeline synchronously before a payment decision is returned. The flow: transaction arrives → feature extraction → rule engine evaluation → ML model scoring → final decision (approve / decline / review). The total latency budget for card-present transactions is under 100ms end-to-end; card-not-present transactions return an initial synchronous score and may receive additional async enrichment afterward.

The synchronous path keeps latency tight by caching all feature data and running the rule engine in memory. Async enrichment — device reputation lookups, network graph queries, third-party risk signals — runs in parallel and feeds into a secondary review queue for borderline scores. If enrichment finishes before the timeout, the score is updated; otherwise the conservative initial score stands.

Decisions fall into three buckets: approve (score below low threshold), decline (score above high threshold), and review (middle band, held for analyst or step-up auth). The thresholds are configurable per merchant category and transaction type without a code deploy.
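The three-bucket decision can be sketched as a small threshold function. The threshold values here are illustrative defaults; per the text, real thresholds are loaded per merchant category and transaction type from configuration.

```python
# Minimal sketch of the approve/decline/review decision, assuming
# illustrative thresholds; production values come from per-merchant config.
def decide(score: float, low: float = 0.2, high: float = 0.8) -> str:
    """Map a fraud score in [0, 1] to a payment decision."""
    if score < low:
        return "approve"
    if score > high:
        return "decline"
    return "review"  # middle band: analyst queue or step-up auth
```

Keeping this as pure data-driven thresholds is what allows retuning per merchant category without a code deploy.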

Feature Extraction

Features are pulled from multiple sources and assembled for each transaction within the latency budget. Transaction features include amount, merchant category code (MCC), merchant country, currency, and time-of-day. User features include account age, historical average transaction amount, count of past declines, and linked payment instruments.

Velocity features come from Redis windowed counters: transactions in the last 1 hour, 24 hours, and 168 hours, keyed by user_id, card_number, device_id, and IP address. Counters are updated with INCR, with EXPIRE set to the window size, giving O(1) reads and writes per dimension. (Strictly, this yields a tumbling window — the count resets when the key expires — an approximation that trades exactness for O(1) cost; a true sliding window would need a sorted set per key.)
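A hedged sketch of the counter update, written against any client exposing `incr`/`expire` (e.g. redis-py's `redis.Redis`). Key names and dimension labels follow the pattern described later in the Velocity Checks section; setting EXPIRE only on the first increment keeps the window anchored rather than sliding forward on every write.

```python
# Sketch of atomic velocity-counter updates. INCR creates the key at 1
# if absent; EXPIRE is applied only on that first increment so the
# window does not extend every time the key is touched.
WINDOWS = {"1h": 3600, "24h": 86400, "168h": 604800}

def bump_velocity(r, dims: dict) -> dict:
    """Increment every (dimension, window) counter for one transaction.

    `r` is any client exposing incr/expire (e.g. redis.Redis).
    `dims` maps dimension to value, e.g. {"user": "u1", "ip": "1.2.3.4"}.
    Returns the new counts keyed by Redis key name.
    """
    counts = {}
    for dim, value in dims.items():
        for label, ttl in WINDOWS.items():
            key = f"vel:{dim}:{value}:{label}"
            n = r.incr(key)      # O(1), atomic in Redis
            if n == 1:           # first hit: start the window
                r.expire(key, ttl)
            counts[key] = n
    return counts
```

In production the INCR and EXPIRE pair would typically be wrapped in a pipeline or Lua script so a crash between the two calls cannot leave an immortal key.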

Behavioral features — typing cadence, mouse movement entropy, scroll behavior, clipboard paste detection — are captured in the browser and submitted as a structured fingerprint alongside the transaction. Mobile apps send hardware attestation tokens and device-level anomaly signals. These features feed both the rule engine and the ML model, and are stored in the feature store for model training.

Rule Engine

The rule engine evaluates a set of if-then rules authored by fraud analysts in a domain-specific language (DSL). Example rule: amount > 5000 AND device_age_days < 1 AND country != user_home_country → route_to_review. Rules are compiled into an in-memory decision tree, so a transaction follows only the branches relevant to its attributes instead of scanning every rule linearly.

Hot rules — those that trigger most frequently — are laid out contiguously in memory so they stay resident in CPU cache, avoiding memory stalls. Rules are versioned and deployed to the engine via a configuration push without a full application deploy. A/B testing is supported: new rules run in shadow mode, logging their outcomes without affecting live decisions, before being promoted to active.

The rule engine outputs a risk modifier (additive to the ML score), a set of triggered rule IDs for explainability, and a hard-block flag for rules that should always decline regardless of ML score. Hard blocks are used for known compromised card ranges, sanctioned countries, and accounts under fraud hold.
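The engine's three outputs can be sketched as follows. This is an illustrative simplification — rules here are plain Python predicates, whereas the production DSL compiles to an in-memory tree — but the result shape (additive modifier, triggered IDs, hard-block flag) matches the text. The rule ID `R42` and the 0.3 weight are invented for the example.

```python
# Hedged sketch of rule evaluation output. A rule is modeled as a
# (rule_id, predicate, risk_delta, hard_block) tuple for illustration.
from dataclasses import dataclass, field

@dataclass
class RuleResult:
    risk_modifier: float = 0.0                      # additive to the ML score
    triggered: list = field(default_factory=list)   # rule IDs, for explainability
    hard_block: bool = False                        # decline regardless of ML score

def evaluate(rules, txn: dict) -> RuleResult:
    out = RuleResult()
    for rule_id, predicate, delta, block in rules:
        if predicate(txn):
            out.triggered.append(rule_id)
            out.risk_modifier += delta
            out.hard_block = out.hard_block or block
    return out

# The example rule from the text, with an invented ID and weight:
RULES = [
    ("R42",
     lambda t: t["amount"] > 5000 and t["device_age_days"] < 1
               and t["country"] != t["user_home_country"],
     0.3, False),
]
```

A hard block (the fourth field set to True) short-circuits nothing here on purpose: all triggered IDs are still collected so the case record stays fully explainable.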

ML Model Scoring

The primary model is a gradient boosted tree (XGBoost or LightGBM) trained on labeled transaction data with fraud/not-fraud labels. Features from the feature store — velocity counts, behavioral signals, transaction attributes — are assembled into a fixed-width vector and scored. The model outputs a probability between 0 and 1. Training runs daily on new labeled data; the model artifact is deployed to a low-latency REST endpoint or embedded directly in the scoring service to avoid a network hop.
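The fixed-width assembly step is worth making concrete, because it is where training/serving skew usually creeps in. A minimal sketch, assuming an agreed feature order shared by training and serving; the feature names are illustrative, not the system's actual schema.

```python
# Sketch of fixed-width feature-vector assembly. The order must match
# what the model was trained on exactly, or scoring is silently wrong.
FEATURE_ORDER = [
    "amount", "mcc", "account_age_days",
    "vel_user_1h", "vel_card_24h", "device_age_days",
]

def to_vector(features: dict) -> list:
    """Assemble a feature dict into the fixed-width vector the model
    expects; missing features default to 0.0."""
    return [float(features.get(k, 0.0)) for k in FEATURE_ORDER]

# Serving is then roughly: prob = model.predict_proba([to_vector(f)])[0][1]
# for a scikit-learn-style XGBoost/LightGBM wrapper.
```

Pinning `FEATURE_ORDER` in shared code (or the feature store's registry) rather than in two places is the usual defense against skew.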

For card-not-present e-commerce flows, a secondary deep neural network processes sequential transaction history as a time series, capturing behavioral patterns across sessions. Its score is blended with the tree model score via a weighted average calibrated on a held-out validation set.

Online learning updates model weights in near-real-time using confirmed fraud labels from case management. Feedback loops are carefully monitored to prevent concept drift from analyst bias. Model performance is tracked on precision, recall, and false positive rate by transaction type; degradation beyond a threshold triggers rollback to the previous artifact.

Velocity Checks

Velocity checks detect anomalous transaction frequency or volume across multiple dimensions. Each Redis key encodes a dimension and a time bucket: vel:user:{user_id}:1h, vel:card:{card_number}:24h, vel:ip:{ip}:1h, vel:merchant:{merchant_id}:device:{device_id}:24h. On each transaction, all relevant keys are incremented with INCR; EXPIRE is set to the window size when the key is first created (refreshing it on every write would keep a busy key alive indefinitely and overcount relative to the intended window).

Counters tracked per dimension include: count of transactions in last 1h/24h/168h, count of unique merchants in last 24h, count of unique IPs in last 24h, and sum of transaction amounts in last 7 days. Any counter exceeding a configured threshold adds a weighted increment to the overall risk score. Multiple counters firing simultaneously increase the score non-linearly to catch burst patterns.
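The non-linear compounding of simultaneous counter breaches can be sketched as below. The weights, caps, and the 1.5× multiplier per additional fired counter are invented for illustration; the point is only that N counters firing together scores higher than N counters firing separately.

```python
# Hedged sketch of superlinear velocity risk. Thresholds, the 0.1
# weight, the excess cap, and the 1.5 growth factor are illustrative.
def velocity_risk(counters: dict, thresholds: dict) -> float:
    """Sum weighted threshold exceedances, then scale superlinearly by
    how many counters fired, so coordinated bursts outscore the sum of
    their parts."""
    fired = [(k, counters[k] - t) for k, t in thresholds.items()
             if counters.get(k, 0) > t]
    base = sum(0.1 * min(excess, 10) for _, excess in fired)
    return base * (1.5 ** max(len(fired) - 1, 0))
```

With this shape, two counters each 3 over threshold score 0.9, versus 0.3 each when breached alone.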

Cross-account velocity is computed for device_id and IP: if a single device places 20 transactions across 10 different user accounts in an hour, each of those accounts’ risk scores is elevated. This catches account takeover rings that cycle through stolen credentials from a small set of devices.
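Counting distinct accounts per device maps naturally to a Redis set rather than a counter. A sketch under that assumption (the key pattern and the 10-account limit mirror the example above; the function name is illustrative):

```python
# Sketch of cross-account velocity: track distinct user accounts per
# device in a windowed Redis set. `r` is any client exposing
# sadd/scard/expire (e.g. redis.Redis).
def record_and_check(r, device_id: str, user_id: str,
                     max_accounts: int = 10, ttl: int = 3600) -> bool:
    """Add the account to the device's set; return True when the device
    has touched more than max_accounts distinct accounts in the window."""
    key = f"vel:device:{device_id}:accounts:1h"
    if r.sadd(key, user_id):   # SADD returns 1 when the member is new
        r.expire(key, ttl)
    return r.scard(key) > max_accounts
```

When this trips, every account in the set gets its risk score elevated, which is how a small pool of devices cycling stolen credentials surfaces as one signal.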

Device Fingerprinting

Browser fingerprints are assembled from: User-Agent string, screen resolution and color depth, timezone, installed fonts (via canvas measurement), WebGL renderer, audio context hash, and a canvas drawing hash. These attributes are hashed into a stable fingerprint ID that persists across sessions even in incognito mode. The fingerprint is linked to account history: a fingerprint associated with past fraud is itself a high-risk signal for new accounts using the same device.
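Deriving a stable ID from the collected attributes reduces to canonical serialization plus a hash; a minimal sketch, with illustrative attribute names (the real set is the list above):

```python
# Sketch of a stable fingerprint ID: hash a canonical serialization of
# the attributes so the same device yields the same ID regardless of
# the order in which attributes were submitted.
import hashlib
import json

def fingerprint_id(attrs: dict) -> str:
    canonical = json.dumps(attrs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]
```

Truncating the digest is a space/collision trade-off; the full SHA-256 hex can be stored if collisions across a large device population are a concern.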

Mobile apps use OS-provided device identifiers (Android ID, iOS IDFV) combined with hardware attestation (Play Integrity API, DeviceCheck) to produce a tamper-resistant device token. Emulator and rooted device signals are extracted and flagged. Device age — time since first seen in the system — is a key feature; a brand-new device on a high-value transaction triggers step-up authentication (OTP, biometric re-verification).

Device reputation data is shared across tenants in aggregate form: a device seen committing fraud on one merchant is flagged when it appears on another. Privacy is preserved by hashing device identifiers before sharing; only the hash and fraud verdict are exchanged, not raw device data.

Case Management

Transactions that score in the review band are routed to a case queue. Each case record contains: transaction details, rule triggers and ML score with feature attribution (SHAP values), device fingerprint, account history, and links to similar past cases based on shared attributes. Analysts work cases in order of risk score, with SLA timers ensuring high-risk cases are reviewed within minutes.

Analyst actions: approve (release transaction), decline (block transaction, optionally block card), block account (suspend for investigation), escalate (route to senior analyst or law enforcement liaison). Each action is recorded with analyst_id, timestamp, and a mandatory reason code. Reason codes feed directly into model training labels — a declined case becomes a fraud label; an approved case becomes a negative label for that feature set.

Case outcomes close the feedback loop: analyst verdicts update the training dataset for the next daily model run. Analyst agreement rate (cases where multiple analysts reach the same verdict) is tracked to measure label quality. Systematic disagreements surface ambiguous cases for policy review.

Network Analysis

Fraudsters rarely operate alone — they create clusters of accounts sharing a device, IP address, bank account routing number, email domain, or phone number prefix. A graph database (Neo4j or a custom adjacency list in Redis) models these shared-attribute relationships. On each new account creation and each transaction, edges are added linking the new entity to existing nodes that share attributes.

A connected component query finds the full fraud ring: if user A shares a device with user B, and user B shares a bank account with user C, all three are in the same component. Velocity limits are applied at the component level: total transaction volume across the entire ring, not just per account. When one node in the ring is confirmed fraud, risk scores for all connected nodes are elevated immediately.
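The A→B→C transitivity above is exactly a union-find (disjoint-set) computation. A production system would run this inside the graph store, but the grouping logic is the same:

```python
# Union-find sketch for shared-attribute fraud rings: entities sharing
# any attribute are unioned, and find() resolves ring membership.
class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # path halving keeps lookups near-constant amortized
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb

# A shares a device with B; B shares a bank account with C:
uf = UnionFind()
uf.union("A", "B")
uf.union("B", "C")
# all three now resolve to the same component root
```

Applying ring-level velocity limits then means keying the velocity counters by `find(account)` instead of by the account itself.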

Graph queries run asynchronously and feed into the review queue rather than the synchronous scoring path, since graph traversal latency is too high for sub-100ms decisions. A pre-computed risk propagation score — updated in batch every few minutes — is stored per account and read synchronously as a feature by the ML model.

