DDoS Mitigation Service Low-Level Design: Rate Limiting, IP Reputation, and Traffic Scrubbing

DDoS Mitigation Service: Overview and Requirements

A DDoS mitigation service absorbs volumetric and application-layer attacks by combining connection-rate limiting at the network edge, IP reputation scoring, anycast traffic scrubbing, and clean traffic forwarding to the origin. The goal is to keep legitimate traffic flowing during an attack with minimal latency overhead in steady state.

Functional Requirements

  • Enforce per-source connection-rate limits and per-endpoint request-rate limits.
  • Maintain an IP reputation store: classify IPs as CLEAN, SUSPECT, or BLOCKED based on historical attack participation.
  • Route traffic through scrubbing centers during attacks using anycast BGP advertisement.
  • Forward only verified clean traffic to the origin over a GRE or IPIP tunnel.
  • Support on-demand and automatic mitigation activation with configurable thresholds.

Non-Functional Requirements

  • Absorb volumetric attacks exceeding 1 Tbps without origin impact.
  • Add under 5 ms latency to clean traffic in steady state (no active attack).
  • Mitigation activation within 30 seconds of attack detection.
  • False block rate for clean IPs under 0.1%.

Data Model

IP Reputation Record

  • ip_cidr — /32 for single IPs, wider prefix for block-listed ranges.
  • reputation_class — CLEAN, SUSPECT, BLOCKED.
  • score — float 0-100; higher means more malicious.
  • attack_participations — count of confirmed attack events.
  • last_seen_attack — timestamp of most recent malicious activity.
  • source — INTERNAL (self-observed), THREAT_FEED (external), MANUAL.
  • expires_at — TTL for auto-expiry of block entries.
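
The record above can be modeled as a small typed structure. This is a minimal Python sketch that assumes the standard-library `ipaddress` types for `ip_cidr` and timezone-aware timestamps. The names `IpReputationRecord`, `ReputationClass`, and `Source` are illustrative and do not come from any published schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from ipaddress import IPv4Network, IPv6Network, ip_address, ip_network
from typing import Optional, Union


class ReputationClass(Enum):
    CLEAN = "CLEAN"
    SUSPECT = "SUSPECT"
    BLOCKED = "BLOCKED"


class Source(Enum):
    INTERNAL = "INTERNAL"        # self-observed
    THREAT_FEED = "THREAT_FEED"  # external feed
    MANUAL = "MANUAL"            # operator-entered


@dataclass
class IpReputationRecord:
    ip_cidr: Union[IPv4Network, IPv6Network]  # /32 for one IPv4 address, wider for ranges
    reputation_class: ReputationClass
    score: float                              # 0-100, higher means more malicious
    attack_participations: int
    last_seen_attack: Optional[datetime]
    source: Source
    expires_at: Optional[datetime] = None     # None: entry never auto-expires

    def __post_init__(self) -> None:
        if not 0.0 <= self.score <= 100.0:
            raise ValueError("score must be within 0-100")

    def covers(self, ip: str) -> bool:
        """True if a single address falls inside this record's prefix."""
        return ip_address(ip) in self.ip_cidr

    def is_expired(self, now: datetime) -> bool:
        """True once the TTL has passed and the entry should be dropped."""
        return self.expires_at is not None and now >= self.expires_at


# Example (hypothetical data): a block-listed /24 from a threat feed.
feed_entry = IpReputationRecord(
    ip_cidr=ip_network("198.51.100.0/24"),
    reputation_class=ReputationClass.BLOCKED,
    score=90.0,
    attack_participations=3,
    last_seen_attack=datetime(2025, 12, 1, tzinfo=timezone.utc),
    source=Source.THREAT_FEED,
    expires_at=datetime(2026, 1, 1, tzinfo=timezone.utc),
)
```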

Mitigation Session

  • session_id — UUID.
  • target_prefix — the protected IP range under attack.
  • attack_type — VOLUMETRIC, SYN_FLOOD, HTTP_FLOOD, AMPLIFICATION.
  • status — DETECTING, MITIGATING, CLEARING, ENDED.
  • started_at, ended_at, peak_gbps.
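
Here is a sketch of the session record with an explicit lifecycle. The field names follow the list above. The allowed status transitions are an assumption, because the design only names the four states. One assumed transition is CLEARING back to MITIGATING, which covers an attack that resumes while routing is being restored.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Dict, Optional, Set


class AttackType(Enum):
    VOLUMETRIC = "VOLUMETRIC"
    SYN_FLOOD = "SYN_FLOOD"
    HTTP_FLOOD = "HTTP_FLOOD"
    AMPLIFICATION = "AMPLIFICATION"


class Status(Enum):
    DETECTING = "DETECTING"
    MITIGATING = "MITIGATING"
    CLEARING = "CLEARING"
    ENDED = "ENDED"


# Assumed lifecycle: DETECTING may end without mitigation (false alarm),
# and CLEARING may re-escalate to MITIGATING if the attack resumes.
ALLOWED: Dict[Status, Set[Status]] = {
    Status.DETECTING: {Status.MITIGATING, Status.ENDED},
    Status.MITIGATING: {Status.CLEARING},
    Status.CLEARING: {Status.MITIGATING, Status.ENDED},
    Status.ENDED: set(),
}


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class MitigationSession:
    target_prefix: str
    attack_type: AttackType
    session_id: uuid.UUID = field(default_factory=uuid.uuid4)
    status: Status = Status.DETECTING
    started_at: datetime = field(default_factory=_now)
    ended_at: Optional[datetime] = None
    peak_gbps: float = 0.0

    def transition(self, new_status: Status) -> None:
        """Move to a new status, rejecting transitions outside ALLOWED."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"illegal transition {self.status.name} -> {new_status.name}")
        self.status = new_status
        if new_status is Status.ENDED:
            self.ended_at = _now()

    def record_sample(self, gbps: float) -> None:
        """Track the highest attack volume seen during the session."""
        self.peak_gbps = max(self.peak_gbps, gbps)
```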

Core Algorithms

Connection-Rate Limiting

Apply a token bucket per source IP in an eBPF XDP program on the Linux-based edge nodes. XDP runs in the NIC driver before the kernel allocates socket buffers, so enforcement runs close to wire speed. Configure two bucket parameters: capacity (burst allowance) and refill rate (sustained limit). For SYN flood mitigation, use SYN cookies to validate TCP handshakes without allocating connection state for unverified sources.
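
The bucket arithmetic itself is simple. The Python sketch below models only that logic. A production version would live in the XDP program, written in restricted C, with per-IP state held in a BPF hash map. An LRU hash map is the likely choice so that spoofed sources cannot exhaust the table. `PerSourceLimiter` and its injectable clock are hypothetical names used for illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class TokenBucket:
    capacity: float      # burst allowance (max tokens)
    refill_rate: float   # sustained limit, tokens per second
    tokens: float
    last_refill: float

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerSourceLimiter:
    """One bucket per source IP; a new source starts with a full bucket.

    The dict is unbounded here; an XDP version would use a fixed-size LRU map.
    """

    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow_connection(self, src_ip: str) -> bool:
        now = self.clock()
        bucket = self.buckets.get(src_ip)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.refill_rate, self.capacity, now)
            self.buckets[src_ip] = bucket
        return bucket.allow(now)
```

With a capacity of 3 and a refill rate of 1 per second, a source gets a burst of three new connections. After that it gets roughly one more per second, and other sources are unaffected.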

IP Reputation Scoring

Score is a weighted combination of:

  • Attack participation frequency in the trailing 30 days (weight 0.5).
  • Threat feed match: listed in known botnet or amplifier feeds (weight 0.3).
  • Behavioral anomaly: traffic pattern deviation from baseline (weight 0.2).

Score decays exponentially with time since the last malicious event. An IP scoring above 75 is promoted to BLOCKED. Once its decayed score falls below 20, it reverts to CLEAN.
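
Here is one way to implement the weighted score and its decay, as a sketch. The weights and the 75/20 thresholds come from the text. Several details are assumptions:

- participations saturate at 10 per 30 days;
- the anomaly signal is pre-scaled to [0, 1];
- the decay half-life is 7 days;
- scores between the two thresholds map to SUSPECT.

```python
import math

W_PARTICIPATION = 0.5
W_THREAT_FEED = 0.3
W_ANOMALY = 0.2

PARTICIPATION_SATURATION = 10   # assumed: 10+ attacks in 30 days counts as maximal
HALF_LIFE_DAYS = 7.0            # assumed decay half-life
BLOCK_THRESHOLD = 75.0
CLEAN_THRESHOLD = 20.0


def raw_score(participations_30d: int, on_threat_feed: bool, anomaly: float) -> float:
    """Weighted 0-100 score; anomaly is assumed pre-normalized to [0, 1]."""
    freq = min(participations_30d, PARTICIPATION_SATURATION) / PARTICIPATION_SATURATION
    anomaly = min(max(anomaly, 0.0), 1.0)
    combined = (W_PARTICIPATION * freq
                + W_THREAT_FEED * (1.0 if on_threat_feed else 0.0)
                + W_ANOMALY * anomaly)
    return 100.0 * combined


def decayed(score: float, days_since_last_attack: float) -> float:
    """Exponential decay: the score halves every HALF_LIFE_DAYS."""
    return score * math.exp(-math.log(2) * days_since_last_attack / HALF_LIFE_DAYS)


def classify(score: float) -> str:
    if score > BLOCK_THRESHOLD:
        return "BLOCKED"
    if score < CLEAN_THRESHOLD:
        return "CLEAN"
    return "SUSPECT"  # assumed: the band between the two thresholds
```

Under these assumptions, a heavy attacker with 10 participations and a feed match scores about 80 and is BLOCKED. After 7 quiet days the score drops to about 40 (SUSPECT). After 21 days it is about 10, and the IP is CLEAN again.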

Attack Detection

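Use IPFIX/NetFlow data sampled at 1:1000 from edge routers, and scale sampled counts by 1000 to estimate true volume. Maintain a baseline traffic model per protected prefix as an EWMA of traffic volume and its variance. Choose the smoothing factor so the baseline reflects roughly the last 7 days. Trigger mitigation when traffic volume exceeds the baseline mean by more than 3 standard deviations for two consecutive 10-second measurement windows.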
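
Below is a sketch of the per-prefix detector. It assumes sampled byte counts per 10-second window and uses the standard incremental exponentially weighted mean/variance update. Two choices are assumptions: the smoothing factor (one window out of a 7-day span), and not folding anomalous windows into the baseline.

```python
import math
from dataclasses import dataclass

SAMPLING_RATE = 1000          # 1:1000 IPFIX/NetFlow sampling
WINDOW_SECONDS = 10
SIGMA_THRESHOLD = 3.0
CONSECUTIVE_WINDOWS = 2
# Assumed smoothing factor: a 7-day span contains 60,480 ten-second windows.
ALPHA = 1.0 / (7 * 24 * 3600 / WINDOW_SECONDS)


@dataclass
class PrefixBaseline:
    mean: float               # baseline bits per second
    var: float                # baseline variance, (bits per second)^2
    alpha: float = ALPHA
    over_count: int = 0

    def observe(self, sampled_bytes: int) -> bool:
        """Feed one window of sampled bytes; return True to trigger mitigation."""
        bps = sampled_bytes * SAMPLING_RATE * 8 / WINDOW_SECONDS  # undo sampling
        threshold = self.mean + SIGMA_THRESHOLD * math.sqrt(self.var)
        if bps > threshold:
            # Assumed: anomalous windows do not update the baseline.
            self.over_count += 1
        else:
            self.over_count = 0
            diff = bps - self.mean
            incr = self.alpha * diff
            self.mean += incr
            self.var = (1 - self.alpha) * (self.var + diff * incr)
        return self.over_count >= CONSECUTIVE_WINDOWS
```

For a prefix with a 1 Gbps baseline and a 100 Mbps standard deviation, the threshold is about 1.3 Gbps. A single 5 Gbps window does not trigger. A second consecutive one does.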

Scrubbing Architecture

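  • Scrubbing centers are deployed in multiple anycast PoPs. During an attack, the scrubbing PoPs announce the attacked prefix over BGP. Internet traffic destined for that prefix is then routed to the nearest scrubbing center instead of directly to the origin.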
  • Inside the scrubbing center, traffic passes through layered filters: network ACLs drop BLOCKED IPs, rate limiters throttle SUSPECT IPs, and DPI rules catch application-layer attack patterns.
  • Clean traffic is encapsulated in a GRE tunnel and forwarded to the origin data center.
  • The origin router decapsulates GRE and applies a whitelist ACL accepting only packets from the scrubbing center tunnel endpoints.
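
The layered filter order and the GRE step can be sketched as follows. This is illustrative only:

- real scrubbers run these stages in hardware or kernel fast paths;
- the DPI stage here is reduced to a byte-signature match;
- `scrub` and `gre_encapsulate` are hypothetical names.

The GRE header layout, a 16-bit flags/version word followed by a 16-bit protocol type (0x0800 for IPv4), follows RFC 2784.

```python
import struct
from enum import Enum
from typing import Callable, Mapping, Sequence


class Verdict(Enum):
    DROP = "DROP"
    FORWARD = "FORWARD"   # encapsulate and send toward the origin


def scrub(src_ip: str, payload: bytes,
          reputation: Mapping[str, str],
          suspect_allows: Callable[[str], bool],
          dpi_signatures: Sequence[bytes]) -> Verdict:
    cls = reputation.get(src_ip, "CLEAN")
    # Layer 1: network ACL drops BLOCKED sources outright.
    if cls == "BLOCKED":
        return Verdict.DROP
    # Layer 2: SUSPECT sources must pass a rate limiter (e.g. a token bucket).
    if cls == "SUSPECT" and not suspect_allows(src_ip):
        return Verdict.DROP
    # Layer 3: simplified DPI, a byte-signature match on the payload.
    if any(sig in payload for sig in dpi_signatures):
        return Verdict.DROP
    return Verdict.FORWARD


GRE_PROTO_IPV4 = 0x0800  # EtherType carried in the GRE protocol-type field


def gre_encapsulate(inner_ipv4_packet: bytes) -> bytes:
    """Prepend a basic 4-byte GRE header (RFC 2784: no checksum, version 0).

    The outer IPv4 header (scrubbing center to origin tunnel endpoint) is
    typically added by the kernel tunnel device and is omitted here.
    """
    return struct.pack("!HH", 0, GRE_PROTO_IPV4) + inner_ipv4_packet
```

Ordering the cheapest checks first means BLOCKED traffic is discarded before any rate-limiter state or payload inspection is touched.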

Scalability Design

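  • Store IP reputation in a distributed key-value store (Redis Cluster), keyed by IP prefix so entries are sharded across hash slots, for O(1) lookups. Redis is the authoritative store rather than the per-packet path; line-rate filtering uses the in-memory copies held at each PoP.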
  • Propagate reputation updates to all PoPs via a Kafka topic; edge nodes consume updates and reload their local in-memory block lists within 5 seconds.
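  • Use a Bloom filter at each edge PoP as a fast pre-check for BLOCKED IPs before querying Redis. A negative answer is definitive, so the common case (an IP that is not blocked) never reaches Redis and is decided in under 1 ms. Only positive matches, which may be false positives, are confirmed against Redis.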
  • Pre-provision scrubbing capacity at 200% of peak observed legitimate traffic for the protected prefix to ensure headroom during volumetric events.
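
Here is a sketch of the Bloom filter pre-check. It assumes the filter is rebuilt from the BLOCKED set whenever a reputation update arrives. Sizing uses the standard formulas for m bits and k hash functions. Bit positions come from double hashing over a SHA-256 digest. `redis_lookup` is a hypothetical stand-in for whatever Redis client call the edge uses.

```python
import hashlib
import math
from typing import Callable, List, Optional


class BloomFilter:
    def __init__(self, expected_items: int, false_positive_rate: float) -> None:
        # Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        n, p = expected_items, false_positive_rate
        self.m = max(8, math.ceil(-n * math.log(p) / (math.log(2) ** 2)))
        self.k = max(1, round(self.m / n * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str) -> List[int]:
        # Double hashing: position_i = h1 + i * h2 (mod m), from one digest.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def is_blocked(ip: str, bloom: BloomFilter,
               redis_lookup: Callable[[str], Optional[str]]) -> bool:
    if not bloom.might_contain(ip):
        return False                       # definitive negative: skip Redis
    return redis_lookup(ip) == "BLOCKED"   # confirm a possible false positive
```

At 1,000 expected entries and a 1% false-positive target, this sizes to about 9,600 bits and 7 hash functions. That is small enough to keep in every PoP's memory and to rebuild on each update.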

API Design

  • POST /v1/mitigations — manually activate mitigation for a prefix; returns session_id.
  • DELETE /v1/mitigations/{session_id} — deactivate mitigation and restore normal routing.
  • GET /v1/mitigations/{session_id}/stats — return real-time attack volume, dropped packet count, and clean throughput.
  • POST /v1/reputation/ips — bulk-upsert IP reputation records from a threat feed integration.
  • GET /v1/reputation/ips/{ip} — query current reputation class and score for an IP.
  • POST /v1/reputation/ips/{ip}/block — manually block an IP with an optional TTL and reason.
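
A client can exercise these endpoints as plain HTTP requests. This sketch only builds `urllib.request.Request` objects and never sends them. The design does not specify request bodies, so the host `mitigation.example.com` and the JSON field names (`target_prefix`, `ttl_seconds`, `reason`) are all assumptions.

```python
import json
from typing import Optional
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "https://mitigation.example.com"   # hypothetical host
JSON_HEADERS = {"Content-Type": "application/json"}


def activate_mitigation_request(target_prefix: str) -> Request:
    """POST /v1/mitigations; the response would carry the new session_id."""
    body = {"target_prefix": target_prefix}    # hypothetical field name
    return Request(f"{BASE_URL}/v1/mitigations",
                   data=json.dumps(body).encode(), headers=JSON_HEADERS, method="POST")


def deactivate_mitigation_request(session_id: str) -> Request:
    """DELETE /v1/mitigations/{session_id} restores normal routing."""
    return Request(f"{BASE_URL}/v1/mitigations/{quote(session_id, safe='')}",
                   method="DELETE")


def block_ip_request(ip: str, ttl_seconds: Optional[int] = None,
                     reason: Optional[str] = None) -> Request:
    """POST /v1/reputation/ips/{ip}/block with optional TTL and reason."""
    body = {}
    if ttl_seconds is not None:
        body["ttl_seconds"] = ttl_seconds      # hypothetical field name
    if reason is not None:
        body["reason"] = reason                # hypothetical field name
    # quote() percent-encodes the colons in IPv6 addresses for the path segment.
    return Request(f"{BASE_URL}/v1/reputation/ips/{quote(ip, safe='')}/block",
                   data=json.dumps(body).encode(), headers=JSON_HEADERS, method="POST")
```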

Observability

  • Emit mitigation activation and deactivation events to a time-series database for post-incident reporting.
  • Track clean-to-dirty traffic ratio during mitigation sessions to measure scrubbing effectiveness.
  • Alert the NOC when a mitigation session exceeds 30 minutes, indicating the attack is sustained and manual review is warranted.
  • Monitor BGP convergence time after route advertisement changes to ensure traffic reroutes within the SLA window.
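
Here is a sketch of two of these signals for a single session. It assumes "dirty" means bytes dropped by the scrubbing filters and "clean" means bytes forwarded to the origin. `SessionCounters` is a hypothetical name. The 30-minute threshold is the one stated above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

SUSTAINED_ALERT_AFTER = timedelta(minutes=30)


@dataclass
class SessionCounters:
    started_at: datetime
    clean_bytes: int = 0     # forwarded to the origin over the tunnel
    dropped_bytes: int = 0   # discarded by the scrubbing filters

    def clean_to_dirty_ratio(self) -> float:
        """Clean bytes per dropped byte; infinite if nothing has been dropped."""
        if self.dropped_bytes == 0:
            return float("inf") if self.clean_bytes else 0.0
        return self.clean_bytes / self.dropped_bytes

    def needs_noc_alert(self, now: Optional[datetime] = None) -> bool:
        """True once the session has run longer than 30 minutes."""
        now = now or datetime.now(timezone.utc)
        return now - self.started_at > SUSTAINED_ALERT_AFTER
```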

Frequently Asked Questions

How does connection-rate limiting protect against DDoS attacks?

Connection-rate limiting enforces per-IP and per-subnet caps on new TCP connections or HTTP requests per second. It uses token-bucket or leaky-bucket counters stored in a shared cache (Redis). When a source exceeds its quota, new connections are dropped or TCP SYN-ACKs are withheld. This prevents resource exhaustion on backend servers, while legitimate traffic stays within its limits.

What is IP reputation scoring and how is it used in DDoS mitigation?

IP reputation scores aggregate signals such as historical abuse reports, BGP prefix ownership (datacenter vs residential ASN), presence on threat-intel blocklists, and observed attack participation. IPs with high risk scores are rate-limited more aggressively or blocked outright at the edge. Scores decay over time to allow rehabilitation of previously abused addresses.

How does anycast traffic scrubbing work during a DDoS attack?

Attack traffic is drawn to scrubbing centers via anycast routing: the same IP prefix is announced from multiple PoPs, so BGP routes traffic to the nearest scrubbing node. Scrubbers apply stateful inspection, packet filters, and behavioral analysis to strip malicious packets. Only clean traffic is forwarded through a GRE tunnel or MPLS circuit to the origin.

How is clean traffic forwarded to the origin after scrubbing?

After scrubbing, clean packets are encapsulated in a GRE tunnel or sent over a dedicated transit link from the scrubbing center to the origin data center. The origin's router decapsulates them and routes them normally. MTU sizing accounts for the tunneling overhead to avoid fragmentation penalties. For GRE over IPv4 the usual value is 1476 bytes: 1500 bytes minus a 20-byte outer IPv4 header and a 4-byte GRE header.
