Ad Click Tracker System Low-Level Design

Requirements

  • Record every ad click event (ad_id, user_id, timestamp, ip, device_type)
  • Query click counts per ad for any time range in real time
  • Detect and filter invalid/bot clicks (same IP clicking same ad >3 times in 60s)
  • 1B clicks/day (about 11.6K/second on average), query latency <100ms

Data Flow Architecture

Browser/App → Click API → Kafka (raw clicks) → Stream Processor → Redis (real-time counters)
                       ↓                                        → ClickSummary DB (hourly aggregates)
                       → Raw Click Storage (S3/Cassandra)       → Fraud Filter

Click Ingestion

Click API: stateless, horizontally scaled. On each click:

  1. Validate: required fields are present and ad_id exists (checked against a cache, so invalid requests are rejected without a database hit)
  2. Deduplicate: SET the Redis key click_dedup:{user_id}:{ad_id} with NX and TTL=60s. If the key already exists, discard the click as a duplicate.
  3. Publish to Kafka topic ad-clicks with key=ad_id (ensures ordering per ad)
  4. Return 200 immediately — do not wait for processing
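
The four steps above can be sketched as follows. This is a minimal illustration, not a production handler: InMemoryStore is a hypothetical stand-in for Redis that only mimics SET with NX and a TTL, publish is a hypothetical callable standing in for a Kafka producer, and the string outcomes ("accepted", "duplicate", "rejected") are assumptions, since the design above does not say what a discarded duplicate returns to the client.

```python
import json
import time
from typing import Callable, Dict, Optional, Set, Tuple

REQUIRED_FIELDS = ("ad_id", "user_id", "timestamp", "ip", "device_type")


class InMemoryStore:
    """Hypothetical stand-in for Redis: supports only SET with NX and a TTL."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[str, Optional[float]]] = {}

    def set(self, key: str, value: str, nx: bool = False,
            ex: Optional[int] = None) -> Optional[bool]:
        now = self._clock()
        current = self._data.get(key)
        if current is not None and current[1] is not None and current[1] <= now:
            current = None  # the previous value has expired
        if nx and current is not None:
            return None  # key already present: NX refuses to overwrite
        self._data[key] = (value, now + ex if ex is not None else None)
        return True


def handle_click(click: dict, store: InMemoryStore, known_ad_ids: Set[str],
                 publish: Callable[..., None]) -> str:
    # 1. Validate required fields and that the ad exists (cached set of ids).
    if any(click.get(field) is None for field in REQUIRED_FIELDS):
        return "rejected"
    if click["ad_id"] not in known_ad_ids:
        return "rejected"
    # 2. Deduplicate: only the first click per user+ad in 60s sets the key.
    dedup_key = f"click_dedup:{click['user_id']}:{click['ad_id']}"
    if store.set(dedup_key, "1", nx=True, ex=60) is None:
        return "duplicate"
    # 3. Publish keyed by ad_id so clicks for one ad stay ordered.
    publish("ad-clicks", key=click["ad_id"], value=json.dumps(click))
    # 4. Return immediately; downstream processing is asynchronous.
    return "accepted"
```

Injecting the clock into the store keeps the TTL behaviour testable without sleeping; a real deployment would rely on Redis key expiry instead.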

Fraud Detection

Stream processor checks each click against fraud rules:

  • IP rate limit: INCR click_ip:{ip}:{ad_id}:{minute_bucket}. If the count exceeds 3 within the 60s bucket, mark the click as INVALID. Set TTL=120s on the key so old buckets expire on their own.
  • User rate limit: INCR click_user:{user_id}:{ad_id}:{minute_bucket}. More than 3 clicks per ad per minute flags the user as suspicious.
  • Bot detection: headless browser fingerprints, missing user-agent, click timing analysis (too fast to be human).

Invalid clicks are written to a separate Kafka topic for analysis and not counted in billable metrics.
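
The IP rate-limit rule can be illustrated with a fixed-window counter. This is a sketch under stated assumptions: the IpRateLimiter class and its in-memory dictionary are hypothetical stand-ins for the Redis INCR call, and timestamps are assumed to be Unix seconds.

```python
from collections import defaultdict
from typing import DefaultDict

MAX_CLICKS_PER_WINDOW = 3  # from the rule: more than 3 clicks is invalid
WINDOW_SECONDS = 60


class IpRateLimiter:
    """Fixed-window counter keyed by ip, ad_id and minute bucket."""

    def __init__(self) -> None:
        # Stand-in for Redis; real keys would carry TTL=120s and expire.
        self._counts: DefaultDict[str, int] = defaultdict(int)

    def is_valid(self, ip: str, ad_id: str, timestamp: float) -> bool:
        minute_bucket = int(timestamp) // WINDOW_SECONDS
        key = f"click_ip:{ip}:{ad_id}:{minute_bucket}"
        self._counts[key] += 1  # equivalent of INCR
        return self._counts[key] <= MAX_CLICKS_PER_WINDOW


limiter = IpRateLimiter()
# Four clicks from one IP on one ad inside the same minute bucket.
results = [limiter.is_valid("1.2.3.4", "ad1", ts) for ts in (120, 121, 122, 123)]
```

A fixed window is cheap, but it resets at bucket boundaries, so a burst that straddles two buckets can pass up to 6 clicks within 60 seconds. A sliding window would close that gap at the cost of more state per key.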

Real-Time Click Counting

Stream processor (Flink or Kafka Streams) maintains windowed counts:

  • Per-minute bucket: INCR click_count:{ad_id}:{YYYYMMDD_HH_MM}. TTL=48h.
  • Running total: INCR click_total:{ad_id}. No TTL (lifetime counter).

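To query clicks for an ad_id over a time range [start, end], enumerate the minute buckets in the range, fetch their keys with MGET, and sum the values. A 24-hour query covers 1,440 keys, which is one MGET round trip (or a few pipelined batches) rather than 1,440 separate calls. Because the buckets expire after 48h, ranges that reach further back have to be served from the hourly ClickSummary aggregates.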
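
A small sketch of the key enumeration and summing step. The function names are hypothetical, and the range is treated as half-open, [start, end), which is an assumption made here so that a 24-hour range yields exactly 1,440 buckets.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Optional


def minute_bucket_keys(ad_id: str, start: datetime, end: datetime) -> List[str]:
    """Return the counter key for every minute bucket in [start, end)."""
    current = start.replace(second=0, microsecond=0)
    keys = []
    while current < end:
        keys.append(f"click_count:{ad_id}:{current.strftime('%Y%m%d_%H_%M')}")
        current += timedelta(minutes=1)
    return keys


def sum_counts(values: Iterable[Optional[str]]) -> int:
    """Sum MGET-style results; missing buckets come back as None."""
    return sum(int(value) for value in values if value is not None)


start = datetime(2024, 1, 1, 0, 0)
keys = minute_bucket_keys("ad1", start, start + timedelta(days=1))
```

The resulting key list would be passed to a single MGET, and the returned values to sum_counts. Minutes with no clicks have no key, so they come back as None and contribute zero.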

Data Model (Persistent Storage)

ClickEvent(click_id UUID, ad_id, user_id, ip, device_type, is_valid BOOL,
           created_at, campaign_id)  -- written to Cassandra, partitioned by ad_id+date

ClickSummary(ad_id, time_bucket TIMESTAMP, click_count INT, valid_count INT,
             unique_users INT)  -- hourly aggregates in PostgreSQL, kept indefinitely

Campaign(campaign_id, advertiser_id, budget_cents, spent_cents, start_date, end_date)

Click Aggregation Pipeline

Raw clicks (Cassandra) → hourly batch job → ClickSummary table. For billing, sum valid_count from the ClickSummary records per campaign per day. Advertiser dashboards query ClickSummary directly, since it is much smaller than the raw click data. Raw Cassandra records are retained for 90 days for fraud investigation; older data is archived to S3.
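
The hourly rollup can be sketched as a pure function over raw click events. The dataclasses below carry only the fields the rollup needs, summarize_hourly is a hypothetical name, and counting unique_users over all clicks (valid and invalid) is an assumption, since the data model does not specify it.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class ClickEvent:
    ad_id: str
    user_id: str
    is_valid: bool
    created_at: datetime


@dataclass
class ClickSummary:
    ad_id: str
    time_bucket: datetime
    click_count: int
    valid_count: int
    unique_users: int


def summarize_hourly(events: Iterable[ClickEvent]) -> List[ClickSummary]:
    """Group raw clicks by (ad_id, hour) and compute one summary per group."""
    groups: Dict[Tuple[str, datetime], List[ClickEvent]] = defaultdict(list)
    for event in events:
        bucket = event.created_at.replace(minute=0, second=0, microsecond=0)
        groups[(event.ad_id, bucket)].append(event)
    return [
        ClickSummary(
            ad_id=ad_id,
            time_bucket=bucket,
            click_count=len(group),
            valid_count=sum(1 for e in group if e.is_valid),
            unique_users=len({e.user_id for e in group}),
        )
        for (ad_id, bucket), group in sorted(groups.items())
    ]
```

Billing would then sum valid_count across a campaign's ads for the day, so invalid clicks never reach the invoice.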

Budget Cap and Click Throttling

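Advertisers set daily budgets. When the budget is exhausted, stop serving the ad. Implementation: on each billable click, increment campaign_spend:{campaign_id} by the click's cost in cents (INCR if every click costs 1 cent, i.e. $0.01; INCRBY otherwise). Since budgets are daily, the counter needs to be reset at the start of each day or carry the date in its key. In the ad serving layer, before serving an ad, check that campaign_spend:{campaign_id} < campaign.budget_cents, reading the value from a cache with TTL=5s. If the campaign is at or over budget, skip the ad. Because the cached value can be up to 5 seconds stale, slight over-spend is possible and is accepted here (eventual consistency). For strict budget enforcement, use a token bucket in Redis where tokens = remaining budget.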
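
The strict variant comes down to an atomic check-and-decrement on the remaining budget. The sketch below shows the idea in memory with a lock. BudgetTokens is a hypothetical class, and in Redis the same check-and-decrement would have to run atomically, for example inside a Lua script, which is an assumption rather than something the design above specifies.

```python
import threading


class BudgetTokens:
    """Remaining budget in cents; a spend succeeds only if fully covered."""

    def __init__(self, budget_cents: int) -> None:
        self._remaining = budget_cents
        self._lock = threading.Lock()

    def try_spend(self, cost_cents: int) -> bool:
        # Check and decrement under one lock so concurrent clicks
        # cannot both spend the last remaining cents.
        with self._lock:
            if cost_cents > self._remaining:
                return False
            self._remaining -= cost_cents
            return True

    @property
    def remaining_cents(self) -> int:
        with self._lock:
            return self._remaining


budget = BudgetTokens(budget_cents=3)
outcomes = [budget.try_spend(1) for _ in range(4)]
```

With a 3-cent budget and 1-cent clicks, the fourth spend is refused, so the campaign never goes over budget. The trade-off is an extra synchronous call on the billing path, compared with the cached read used in the eventually consistent version.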

Key Design Decisions

  • Kafka decouples ingestion from processing — click API is never blocked by downstream processing
  • Per-minute Redis counters enable flexible time-range queries without expensive aggregation
  • Deduplication key (user+ad+60s TTL) prevents refresh-spamming before fraud detection
  • Separate valid/invalid click paths: billable metrics never include fraudulent clicks

