Bot Detection Service: Overview and Requirements
A bot detection service distinguishes automated traffic from legitimate human users in real time. It combines behavioral signal analysis, device fingerprinting, and challenge-response gating to assign a risk score and route suspicious sessions through friction or blocking. Getting this right matters for fraud prevention, scraping defense, and account takeover protection.
Functional Requirements
- Collect behavioral signals per session: mouse movement entropy, keystroke timing, scroll patterns, click velocity.
- Generate a device fingerprint from browser attributes: user agent, screen resolution, installed fonts, WebGL renderer, canvas hash, audio context fingerprint.
- Compute a risk score (0.0 low risk to 1.0 high risk) within 100 ms of a request arriving.
- Gate high-risk sessions through CAPTCHA or JS proof-of-work challenges.
- Allow operators to configure score thresholds per endpoint (login vs. checkout vs. API).
- Provide a review dashboard showing bot traffic breakdown by signal cluster.
Non-Functional Requirements
- Evaluate at least 10,000 requests per second per node.
- Keep the false positive rate under 0.5% to avoid blocking legitimate users.
- Signal collection must add under 5 ms overhead to page load via an async JS snippet.
Data Model
Session Record
- session_id — UUID generated at first page touch.
- fingerprint_hash — SHA-256 of the normalized fingerprint vector.
- ip_address, asn, datacenter_flag — network metadata enriched at ingest.
- risk_score — float computed by the scoring engine.
- challenge_status — NONE, ISSUED, PASSED, FAILED.
- signals_json — compressed raw signal payload for auditing.
- created_at, last_seen_at.
Fingerprint Reputation Table
- fingerprint_hash — primary key.
- seen_count, bot_verdict_count — counters.
- reputation_score — exponential moving average of risk scores.
- first_seen, last_seen.
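The reputation_score field above is described as an exponential moving average of risk scores. A minimal sketch of that update follows. The smoothing factor `alpha`, the choice to seed the average with the first observation, and the function name `update_reputation` are assumptions for illustration, not part of the design above.

```python
from dataclasses import dataclass


@dataclass
class FingerprintReputation:
    """In-memory stand-in for one row of the fingerprint reputation table."""
    fingerprint_hash: str
    seen_count: int = 0
    bot_verdict_count: int = 0
    reputation_score: float = 0.0


def update_reputation(rec: FingerprintReputation, risk_score: float,
                      alpha: float = 0.2) -> FingerprintReputation:
    """Fold one new risk score into the exponential moving average.

    alpha is a hypothetical smoothing factor: higher values react faster
    to recent scores, lower values remember history longer.
    """
    if rec.seen_count == 0:
        # Assumed behavior: seed the average with the first observation.
        rec.reputation_score = risk_score
    else:
        rec.reputation_score = alpha * risk_score + (1 - alpha) * rec.reputation_score
    rec.seen_count += 1
    return rec
```

A single noisy score moves the average only by a factor of `alpha`, so one bad session does not immediately condemn a fingerprint shared by many legitimate devices.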
Core Algorithms
Behavioral Signal Scoring
Each collected signal contributes a weighted sub-score. Key heuristics:
- Mouse movement entropy below a threshold (movement that is too linear) adds +0.3 to the risk score.
- Time-to-first-interaction under 80 ms (faster than human reaction time) adds +0.4.
- Uniform keystroke intervals (standard deviation below 10 ms) add +0.35.
- Missing touch or pointer events on a mobile user agent add +0.25.
The aggregated score is clamped to [0, 1]. Weights are tunable per deployment.
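The heuristics above can be sketched as a small additive scorer. The field names, the `MOUSE_ENTROPY_MIN` value, and the reading of the 10 ms keystroke figure as a standard deviation are assumptions made for this example; only the increments and the 80 ms cutoff come from the list above.

```python
from dataclasses import dataclass
from statistics import pstdev
from typing import Dict, Sequence


@dataclass
class BehaviorSignals:
    mouse_entropy: float                   # hypothetical entropy measure
    time_to_first_interaction_ms: float
    keystroke_intervals_ms: Sequence[float]
    is_mobile_ua: bool
    has_touch_or_pointer_events: bool


# Increments taken from the heuristics list; tunable per deployment.
DEFAULT_WEIGHTS: Dict[str, float] = {
    "linear_mouse": 0.30,
    "fast_first_interaction": 0.40,
    "uniform_keystrokes": 0.35,
    "missing_touch_events": 0.25,
}

MOUSE_ENTROPY_MIN = 1.5  # assumed threshold; the text does not specify one


def behavioral_score(s: BehaviorSignals,
                     weights: Dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Sum the increments of every heuristic that fires, clamped to [0, 1]."""
    score = 0.0
    if s.mouse_entropy < MOUSE_ENTROPY_MIN:
        score += weights["linear_mouse"]
    if s.time_to_first_interaction_ms < 80:
        score += weights["fast_first_interaction"]
    # Need at least two intervals before spread is meaningful.
    if len(s.keystroke_intervals_ms) >= 2 and pstdev(s.keystroke_intervals_ms) < 10:
        score += weights["uniform_keystrokes"]
    if s.is_mobile_ua and not s.has_touch_or_pointer_events:
        score += weights["missing_touch_events"]
    return max(0.0, min(1.0, score))
```

Because the increments sum to more than 1.0, the clamp matters: a session that trips every heuristic saturates at 1.0 rather than overflowing the score range.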
Device Fingerprinting
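Collect 40+ browser attributes and normalize them into a canonical vector. Hash the vector with SHA-256 to produce the exact-match fingerprint. Because a cryptographic hash changes completely when any single attribute changes, also compute a SimHash of the same vector for approximate matching; this catches bots that rotate minor attributes (e.g., random screen resolution offsets). Two fingerprints whose SimHash values differ by fewer than 3 bits (Hamming distance) map to the same fingerprint cluster.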
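As a rough illustration of the two hashes described above, the sketch below computes an exact SHA-256 fingerprint and a 64-bit SimHash over the same canonicalized attributes. The canonicalization rule (sorted, lowercased `key=value` strings), the 64-bit width, and the use of BLAKE2b as the per-feature hash are all assumptions for this example.

```python
import hashlib
from typing import List, Mapping


def canonicalize(attrs: Mapping[str, str]) -> List[str]:
    """Assumed normalization: sorted 'key=value' strings, trimmed, keys lowercased."""
    return [f"{k.strip().lower()}={str(v).strip()}" for k, v in sorted(attrs.items())]


def exact_fingerprint(attrs: Mapping[str, str]) -> str:
    """SHA-256 over the canonical vector; any attribute change alters the hash."""
    joined = "\n".join(canonicalize(attrs))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def simhash64(attrs: Mapping[str, str]) -> int:
    """64-bit SimHash: each feature votes +1 or -1 on every bit position."""
    counts = [0] * 64
    for feature in canonicalize(attrs):
        digest = hashlib.blake2b(feature.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        for bit in range(64):
            counts[bit] += 1 if (h >> bit) & 1 else -1
    return sum(1 << bit for bit in range(64) if counts[bit] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two SimHash values."""
    return bin(a ^ b).count("1")


def same_cluster(a: int, b: int, max_distance: int = 3) -> bool:
    """Distance under 3 bits maps to the same cluster, per the text."""
    return hamming(a, b) < max_distance
```

With only a handful of attributes a single change can flip many SimHash bits, so the 3-bit rule is only meaningful once the vector is reasonably large, such as the 40+ attributes mentioned above.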
Risk Score Fusion
Combine three signal sources using a weighted ensemble:
- Behavioral score (weight 0.4).
- Fingerprint reputation score (weight 0.35).
- Network signal score: datacenter IP, VPN flag, known bot ASN (weight 0.25).
The fused score drives the challenge decision. Scores between 0.5 and 0.7 may trigger a lightweight JS proof-of-work challenge. Scores above 0.7 and up to 0.9 trigger a CAPTCHA. Scores above 0.9 result in a hard block.
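The weighted ensemble and threshold bands can be sketched as two small functions. The `Decision` and `Thresholds` names are hypothetical, and the sketch always issues proof-of-work in the 0.5 to 0.7 band, whereas the text says that band only "may" trigger one. Passing a different `Thresholds` instance is one possible way to model per-endpoint configuration.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    PROOF_OF_WORK = "proof_of_work"
    CAPTCHA = "captcha"
    BLOCK = "block"


@dataclass(frozen=True)
class Thresholds:
    """Defaults mirror the bands in the text; operators may override per endpoint."""
    proof_of_work: float = 0.5
    captcha: float = 0.7
    block: float = 0.9


def fuse(behavioral: float, reputation: float, network: float) -> float:
    """Weighted ensemble: 0.4 behavioral, 0.35 reputation, 0.25 network."""
    fused = 0.4 * behavioral + 0.35 * reputation + 0.25 * network
    return max(0.0, min(1.0, fused))


def decide(score: float, t: Thresholds = Thresholds()) -> Decision:
    """Check the strictest band first so overlapping thresholds resolve cleanly."""
    if score > t.block:
        return Decision.BLOCK
    if score > t.captcha:
        return Decision.CAPTCHA
    if score >= t.proof_of_work:
        return Decision.PROOF_OF_WORK
    return Decision.ALLOW
```

For example, a login endpoint could pass a stricter `Thresholds(proof_of_work=0.3, captcha=0.5, block=0.8)`; those particular numbers are illustrative only.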
Challenge Flow
- The edge proxy (NGINX or CDN WAF) intercepts requests from flagged sessions and injects a challenge redirect.
- On challenge pass, issue a signed short-lived token (JWT with 15-minute TTL) that the client presents with subsequent requests to bypass re-evaluation.
- On challenge fail, increment the fingerprint bot_verdict_count and apply a backoff: block for 1 hour, then 24 hours on repeat failure.
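The pass and fail branches above can be illustrated with a hand-rolled HS256 token and a two-step backoff. This is a sketch using only the standard library; a production service would more likely use a maintained JWT library. The secret, the `sid` claim name, and the function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET = b"demo-secret-change-me"   # hypothetical key; load from a secret store
TOKEN_TTL_SECONDS = 15 * 60         # 15-minute TTL from the text


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str) -> str:
    return _b64url(hmac.new(SECRET, signing_input.encode("ascii"), hashlib.sha256).digest())


def issue_token(session_id: str, now: Optional[int] = None) -> str:
    """Issue a signed token after a passed challenge."""
    now = int(time.time()) if now is None else now
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode("utf-8"))
    claims = {"sid": session_id, "exp": now + TOKEN_TTL_SECONDS}
    payload = _b64url(json.dumps(claims).encode("utf-8"))
    return f"{header}.{payload}.{_sign(header + '.' + payload)}"


def verify_token(token: str, session_id: str, now: Optional[int] = None) -> bool:
    """Accept only unexpired tokens with a valid signature for this session."""
    now = int(time.time()) if now is None else now
    parts = token.split(".")
    if len(parts) != 3 or not token.isascii():
        return False
    header, payload, sig = parts
    # Constant-time comparison; checked before the payload is parsed.
    if not hmac.compare_digest(sig.encode("ascii"), _sign(header + "." + payload).encode("ascii")):
        return False
    claims = json.loads(_b64url_decode(payload))
    return claims.get("sid") == session_id and now < claims.get("exp", 0)


def block_seconds(prior_failures: int) -> int:
    """Backoff from the text: 1 hour on first failure, 24 hours on repeats."""
    return 3600 if prior_failures == 0 else 24 * 3600
```

Binding the token to the session ID means a token harvested from one solved challenge cannot be replayed across other sessions in this sketch.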
Scalability Design
- Deploy the scoring engine as a sidecar or inline service at the edge to minimize network round trips.
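- Store fingerprint reputation in a Redis cluster with TTL-based expiry (30-day inactivity window). Update seen_count and reputation_score atomically with a MULTI/EXEC transaction or a Lua script; pipeline batching on its own only reduces round trips and does not guarantee atomicity.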
- Publish scored events to a Kafka topic for offline model retraining and anomaly review without blocking the hot path.
- Use a feature store to share precomputed ASN and IP reputation lookups across scoring nodes without repeated external API calls.
API Design
- POST /v1/sessions — initialize a session; returns session_id for the JS client to attach to subsequent events.
- POST /v1/sessions/{session_id}/signals — ingest a behavioral signal batch from the JS snippet.
- GET /v1/sessions/{session_id}/score — return current risk score and challenge decision; called by the edge proxy per request.
- POST /v1/sessions/{session_id}/challenge-result — record challenge pass or fail outcome.
- GET /v1/fingerprints/{fingerprint_hash}/reputation — query fingerprint reputation for offline analysis.
Observability
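- Track challenge pass rate as a primary health metric; a sudden rise among high-risk sessions suggests that bot operators have solved the challenge type, while a sudden drop can point to a broken challenge or a spike in false positives.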
- Monitor false positive rate via user-reported friction tickets correlated to sessions that received challenges.
- Alert when bot traffic fraction exceeds 20% of total sessions on any protected endpoint.
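The 20% alert rule above reduces to a per-endpoint ratio check. A minimal sketch follows, assuming session counts are already aggregated per endpoint; the function name and input shape are hypothetical.

```python
from typing import Dict, List, Tuple


def bot_fraction_alerts(counts: Dict[str, Tuple[int, int]],
                        threshold: float = 0.20) -> List[str]:
    """Return endpoints whose bot share of sessions exceeds the threshold.

    counts maps endpoint -> (bot_sessions, total_sessions).
    Endpoints with no traffic are skipped to avoid division by zero.
    """
    return sorted(
        endpoint
        for endpoint, (bot, total) in counts.items()
        if total > 0 and bot / total > threshold
    )
```

In practice the counts would come from the scored-event stream over a sliding window, so a short burst does not page an operator on its own.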