DDoS Mitigation Service: Overview and Requirements
A DDoS mitigation service absorbs volumetric and application-layer attacks by combining connection-rate limiting at the network edge, IP reputation scoring, anycast traffic scrubbing, and clean traffic forwarding to the origin. The goal is to keep legitimate traffic flowing during an attack with minimal latency overhead in steady state.
Functional Requirements
- Enforce per-source connection-rate limits and per-endpoint request-rate limits.
- Maintain an IP reputation store: classify IPs as CLEAN, SUSPECT, or BLOCKED based on historical attack participation.
- Route traffic through scrubbing centers during attacks using anycast BGP advertisement.
- Forward only verified clean traffic to the origin over a GRE or IPIP tunnel.
- Support on-demand and automatic mitigation activation with configurable thresholds.
Non-Functional Requirements
- Absorb volumetric attacks exceeding 1 Tbps without origin impact.
- Add under 5 ms latency to clean traffic in steady state (no active attack).
- Activate mitigation within 30 seconds of attack detection.
- Keep the false block rate for clean IPs under 0.1%.
Data Model
IP Reputation Record
- ip_cidr — /32 for single IPs, wider prefix for block-listed ranges.
- reputation_class — CLEAN, SUSPECT, BLOCKED.
- score — float 0-100; higher means more malicious.
- attack_participations — count of confirmed attack events.
- last_seen_attack — timestamp of most recent malicious activity.
- source — INTERNAL (self-observed), THREAT_FEED (external), MANUAL.
- expires_at — TTL for auto-expiry of block entries.
Mitigation Session
- session_id — UUID.
- target_prefix — the protected IP range under attack.
- attack_type — VOLUMETRIC, SYN_FLOOD, HTTP_FLOOD, AMPLIFICATION.
- status — DETECTING, MITIGATING, CLEARING, ENDED.
- started_at, ended_at, peak_gbps.
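The two records above could be expressed as typed structures along the following lines. This is a minimal sketch, not a schema the service is known to use: the field names come from the lists above, while the Python types, the defaults, and the score range check are assumptions made for illustration.

```python
import ipaddress
import uuid
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Optional, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


class ReputationClass(Enum):
    CLEAN = "CLEAN"
    SUSPECT = "SUSPECT"
    BLOCKED = "BLOCKED"


class Source(Enum):
    INTERNAL = "INTERNAL"        # self-observed
    THREAT_FEED = "THREAT_FEED"  # external feed
    MANUAL = "MANUAL"


class AttackType(Enum):
    VOLUMETRIC = "VOLUMETRIC"
    SYN_FLOOD = "SYN_FLOOD"
    HTTP_FLOOD = "HTTP_FLOOD"
    AMPLIFICATION = "AMPLIFICATION"


class SessionStatus(Enum):
    DETECTING = "DETECTING"
    MITIGATING = "MITIGATING"
    CLEARING = "CLEARING"
    ENDED = "ENDED"


@dataclass
class IPReputationRecord:
    ip_cidr: Network                  # /32 for one IPv4 address, wider for ranges
    reputation_class: ReputationClass
    score: float                      # 0-100, higher means more malicious
    attack_participations: int = 0
    last_seen_attack: Optional[datetime] = None
    source: Source = Source.INTERNAL  # default is an assumption
    expires_at: Optional[datetime] = None

    def __post_init__(self) -> None:
        # Reject scores outside the documented 0-100 range.
        if not 0.0 <= self.score <= 100.0:
            raise ValueError("score must be between 0 and 100")


@dataclass
class MitigationSession:
    target_prefix: Network
    attack_type: AttackType
    session_id: uuid.UUID = field(default_factory=uuid.uuid4)
    status: SessionStatus = SessionStatus.DETECTING
    started_at: Optional[datetime] = None
    ended_at: Optional[datetime] = None
    peak_gbps: float = 0.0
```

Parsing ip_cidr with ipaddress.ip_network("203.0.113.7/32") gives validation of the prefix for free, which is one reason to prefer a network type over a raw string here.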
Core Algorithms
Connection-Rate Limiting
Apply a token bucket per source IP at the kernel level using eBPF XDP programs on edge routers for wire-speed enforcement. Configure bucket parameters: capacity (burst allowance) and refill rate (sustained limit). For SYN flood mitigation, use SYN cookies to validate TCP handshakes without allocating connection state for unverified sources.
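The token bucket logic can be illustrated in user space as follows. This is a sketch of the algorithm only, written in Python for readability; it is not an eBPF XDP program, and the class names and the explicit `now` argument (used to keep the example deterministic) are assumptions.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TokenBucket:
    capacity: float     # burst allowance, in tokens
    refill_rate: float  # sustained limit, in tokens per second
    tokens: float       # tokens currently available
    last_refill: float  # time of the last refill, in seconds

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        # Spend tokens if enough are available; otherwise reject.
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerSourceLimiter:
    """Keeps one bucket per source IP (hypothetical helper)."""

    def __init__(self, capacity: float, refill_rate: float) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, src_ip: str, now: float) -> bool:
        bucket = self.buckets.get(src_ip)
        if bucket is None:
            # New sources start with a full bucket.
            bucket = TokenBucket(self.capacity, self.refill_rate, self.capacity, now)
            self.buckets[src_ip] = bucket
        return bucket.allow(now)
```

With a capacity of 3 and a refill rate of 1 token per second, a source can open three connections in a burst, is rejected on the fourth, and regains two tokens after two seconds. A production data path would also need to bound and evict the per-source table, which this sketch omits.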
IP Reputation Scoring
Score is a weighted combination of:
- Attack participation frequency in the trailing 30 days (weight 0.5).
- Threat feed match: listed in known botnet or amplifier feeds (weight 0.3).
- Behavioral anomaly: traffic pattern deviation from baseline (weight 0.2).
Score decays exponentially with time since the last malicious event. An IP scoring above 75 is promoted to BLOCKED; once its score decays below 20, it reverts to CLEAN.
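One way to combine the weights, decay, and thresholds above is sketched below. The weights and the 75/20 thresholds come from the text; everything else is an assumption made for illustration: the 14-day half-life, the cap of 10 participations used to normalize frequency, the 0-1 anomaly input, and the choice to label scores between 20 and 75 as SUSPECT.

```python
HALF_LIFE_DAYS = 14.0   # assumption: score halves every 14 quiet days
PARTICIPATION_CAP = 10  # assumption: 10+ events in 30 days saturates the signal

W_PARTICIPATION = 0.5
W_THREAT_FEED = 0.3
W_ANOMALY = 0.2


def raw_score(participations_30d: int, in_threat_feed: bool, anomaly: float) -> float:
    """Weighted combination of the three signals, scaled to 0-100."""
    participation = min(participations_30d, PARTICIPATION_CAP) / PARTICIPATION_CAP
    feed = 1.0 if in_threat_feed else 0.0
    anomaly = min(max(anomaly, 0.0), 1.0)  # clamp to the 0-1 range
    combined = (
        W_PARTICIPATION * participation
        + W_THREAT_FEED * feed
        + W_ANOMALY * anomaly
    )
    return 100.0 * combined


def decayed_score(raw: float, days_since_last_event: float) -> float:
    """Exponential decay expressed through a half-life."""
    return raw * 0.5 ** (days_since_last_event / HALF_LIFE_DAYS)


def classify(score: float) -> str:
    if score > 75:
        return "BLOCKED"
    if score < 20:
        return "CLEAN"
    return "SUSPECT"  # assumption: the middle band maps to SUSPECT
```

Under these assumed parameters, an IP with a raw score of 100 drops to 50 after 14 quiet days and to 12.5 after 42 days, at which point it would be classified CLEAN again.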
Attack Detection
Use IPFIX/NetFlow data sampled at 1:1000 from edge routers, scaling sampled counts by the sampling rate to estimate actual volume. Compute a baseline traffic model (mean and standard deviation) per protected prefix using an EWMA over a 7-day rolling window. Trigger mitigation when traffic volume exceeds the baseline mean by more than 3 standard deviations for two consecutive 10-second measurement windows.
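The detection rule can be sketched with an exponentially weighted mean and variance. The 3-sigma threshold and the two-consecutive-window rule come from the text; the class name, the warm-up period, and the choice to freeze the baseline while a window is in breach (so attack traffic does not inflate it) are assumptions. An alpha of roughly 2 / (N + 1) with N = 60480 ten-second windows is one conventional way to approximate a 7-day span, also an assumption here.

```python
import math


class EwmaDetector:
    def __init__(self, alpha: float, k: float = 3.0,
                 consecutive: int = 2, warmup: int = 30) -> None:
        self.alpha = alpha              # smoothing factor, 0 < alpha <= 1
        self.k = k                      # threshold in standard deviations
        self.consecutive = consecutive  # breaching windows needed to trigger
        self.warmup = warmup            # samples required before triggering
        self.mean = 0.0
        self.var = 0.0
        self.samples = 0
        self.breaches = 0

    def observe(self, value: float) -> bool:
        """Feed one window's traffic volume; return True to trigger mitigation."""
        if self.samples == 0:
            self.mean = value
            self.samples = 1
            return False
        threshold = self.mean + self.k * math.sqrt(self.var)
        breach = self.samples >= self.warmup and value > threshold
        if breach:
            # Count the breach but leave the baseline untouched.
            self.breaches += 1
        else:
            self.breaches = 0
            # Standard incremental update of EW mean and variance.
            delta = value - self.mean
            self.mean += self.alpha * delta
            self.var = (1.0 - self.alpha) * (self.var + self.alpha * delta * delta)
            self.samples += 1
        return self.breaches >= self.consecutive
```

A single spike only increments the breach counter; the detector fires on the second consecutive breaching window and resets as soon as one window falls back under the threshold.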
Scrubbing Architecture
- Scrubbing centers are deployed in multiple anycast PoPs. Under attack, the scrubbing centers advertise the attacked prefix via BGP so that traffic destined for it is routed to the nearest scrubbing PoP instead of directly to the origin.
- Inside the scrubbing center, traffic passes through layered filters: network ACLs drop BLOCKED IPs, rate limiters throttle SUSPECT IPs, and DPI rules catch application-layer attack patterns.
- Clean traffic is encapsulated in a GRE tunnel and forwarded to the origin data center.
- The origin router decapsulates GRE and applies a whitelist ACL accepting only packets from the scrubbing center tunnel endpoints.
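The layered filtering inside a scrubbing center can be sketched as an ordered chain of checks. This is an illustrative model only: the function name, the dictionary-based reputation lookup, the `suspect_allow` callback standing in for the rate limiter, and the regex-based stand-in for DPI rules are all assumptions, and real scrubbing runs in the packet path rather than in Python.

```python
import re
from enum import Enum
from typing import Callable, Dict, Iterable, Pattern


class Verdict(Enum):
    DROP = "DROP"
    FORWARD = "FORWARD"


def scrub(
    src_ip: str,
    payload: bytes,
    reputation: Dict[str, str],
    suspect_allow: Callable[[str], bool],
    dpi_patterns: Iterable[Pattern[bytes]],
) -> Verdict:
    rep = reputation.get(src_ip, "CLEAN")
    # Layer 1: network ACL drops BLOCKED sources outright.
    if rep == "BLOCKED":
        return Verdict.DROP
    # Layer 2: SUSPECT sources pass only while under their rate limit.
    if rep == "SUSPECT" and not suspect_allow(src_ip):
        return Verdict.DROP
    # Layer 3: DPI rules catch application-layer attack patterns.
    if any(p.search(payload) for p in dpi_patterns):
        return Verdict.DROP
    # Anything that survives all layers is forwarded through the tunnel.
    return Verdict.FORWARD


# Hypothetical rule: drop requests from a known-bad user agent.
EXAMPLE_RULES = [re.compile(rb"user-agent:\s*evilbot", re.IGNORECASE)]
```

Ordering the cheapest check first (a reputation lookup) means most attack traffic is discarded before the more expensive payload inspection runs.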
Scalability Design
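- Store IP reputation in a distributed key-value store (Redis Cluster) keyed by IP prefix, giving O(1) average-time lookups per key. Redis Cluster distributes keys across nodes by hash slot, so load spreads across the cluster; per-packet decisions at line rate are served from the local structures on each edge node rather than from a network round trip.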
- Propagate reputation updates to all PoPs via a Kafka topic; edge nodes consume updates and reload their local in-memory block lists within 5 seconds.
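- Use Bloom filters at each edge PoP as a fast pre-check for BLOCKED IPs before hitting Redis. A Bloom filter has no false negatives, so a miss proves the IP is not blocked and the common case resolves in under 1 ms without a Redis round trip. A hit may be a false positive and must be confirmed against Redis.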
- Pre-provision scrubbing capacity at 200% of peak observed legitimate traffic for the protected prefix to ensure headroom during volumetric events.
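The Bloom filter pre-check described in this list can be sketched as follows. The sizing parameters, the SHA-256 based double hashing, and the `authoritative_lookup` callback (standing in for the Redis query) are assumptions chosen for a compact example, not details from the design.

```python
import hashlib
from typing import Callable, Iterator


class BloomFilter:
    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: derive k bit positions from two 64-bit values.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos >> 3] & (1 << (pos & 7))
                   for pos in self._positions(item))


def is_blocked(ip: str, bloom: BloomFilter,
               authoritative_lookup: Callable[[str], bool]) -> bool:
    # Fast path: a Bloom miss proves the IP is not on the block list.
    if not bloom.might_contain(ip):
        return False
    # Slow path: confirm possible hits against the authoritative store.
    return authoritative_lookup(ip)
```

Because every positive is re-checked against the authoritative store, a Bloom false positive costs one extra lookup but never causes a clean IP to be blocked.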
API Design
- POST /v1/mitigations — manually activate mitigation for a prefix; returns session_id.
- DELETE /v1/mitigations/{session_id} — deactivate mitigation and restore normal routing.
- GET /v1/mitigations/{session_id}/stats — return real-time attack volume, dropped packet count, and clean throughput.
- POST /v1/reputation/ips — bulk-upsert IP reputation records from a threat feed integration.
- GET /v1/reputation/ips/{ip} — query current reputation class and score for an IP.
- POST /v1/reputation/ips/{ip}/block — manually block an IP with an optional TTL and reason.
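A client might construct calls to two of these endpoints as shown below. Only the paths and HTTP methods come from the list above; the host name, the bearer-token authentication, and the `target_prefix` request field are hypothetical, since the request and response schemas are not specified here. The functions build requests without sending them.

```python
import json
import urllib.request

BASE_URL = "https://mitigation.example.com"  # hypothetical host


def build_activate_request(target_prefix: str, api_token: str) -> urllib.request.Request:
    """Build POST /v1/mitigations for a protected prefix."""
    # Assumption: the body carries the prefix as a JSON field.
    body = json.dumps({"target_prefix": target_prefix}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/mitigations",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",  # assumed auth scheme
        },
        method="POST",
    )


def build_stats_request(session_id: str, api_token: str) -> urllib.request.Request:
    """Build GET /v1/mitigations/{session_id}/stats."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/mitigations/{session_id}/stats",
        headers={"Authorization": f"Bearer {api_token}"},
        method="GET",
    )
```

Passing either request to urllib.request.urlopen would send it; the session_id returned by the activation call would then feed the stats and DELETE endpoints.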
Observability
- Emit mitigation activation and deactivation events to a time-series database for post-incident reporting.
- Track clean-to-dirty traffic ratio during mitigation sessions to measure scrubbing effectiveness.
- Alert the NOC when a mitigation session exceeds 30 minutes, indicating the attack is sustained and manual review is warranted.
- Monitor BGP convergence time after route advertisement changes to ensure traffic reroutes within the SLA window.