Requirements
- Record every ad click event (ad_id, user_id, timestamp, ip, device_type)
- Query click counts per ad for any time range in real time
- Detect and filter invalid/bot clicks (same IP clicking same ad >3 times in 60s)
- 1B clicks/day (about 11.6K/second on average), query latency <100ms
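The throughput figure above follows from simple arithmetic. The sketch below works it through; the event size and peak factor are illustrative assumptions, not part of the requirements.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

clicks_per_day = 1_000_000_000
avg_clicks_per_sec = clicks_per_day / SECONDS_PER_DAY

# Assumed figures for illustration only (not stated in the requirements).
ASSUMED_EVENT_BYTES = 200   # rough size of one raw click record
ASSUMED_PEAK_FACTOR = 3     # peak traffic relative to the daily average

peak_clicks_per_sec = avg_clicks_per_sec * ASSUMED_PEAK_FACTOR
raw_bytes_per_day = clicks_per_day * ASSUMED_EVENT_BYTES

print(f"average: {avg_clicks_per_sec:,.0f} clicks/s")
print(f"peak:    {peak_clicks_per_sec:,.0f} clicks/s")
print(f"raw storage: {raw_bytes_per_day / 1e9:,.0f} GB/day")
```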
Data Flow Architecture
Browser/App → Click API → Kafka (raw clicks) → Stream Processor → Redis (real-time counters)
Stream Processor → ClickSummary DB (hourly aggregates)
Kafka (raw clicks) → Raw Click Storage (S3/Cassandra) → Fraud Filter
Click Ingestion
Click API: stateless, horizontally scaled. On each click:
- Validate: required fields present, ad_id exists (short-circuit from cache)
- Deduplicate: set the Redis key click_dedup:{user_id}:{ad_id} with SET NX and TTL=60s. If the key already exists, discard the click as a duplicate.
- Publish to Kafka topic ad-clicks with key=ad_id (ensures ordering per ad)
- Return 200 immediately — do not wait for processing
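The ingestion steps above can be sketched as a single handler. This is a minimal illustration, not a production implementation: InMemoryKV is a hypothetical stand-in for Redis (in Redis the equivalent call is SET key value NX EX 60), and publish is a hypothetical callback standing in for a Kafka producer.

```python
import time

REQUIRED_FIELDS = ("ad_id", "user_id", "timestamp", "ip", "device_type")


class InMemoryKV:
    """Hypothetical stand-in for Redis supporting SET NX with a TTL."""

    def __init__(self):
        self._expiry = {}  # key -> absolute expiry time in seconds

    def set_nx(self, key, ttl_s, now):
        expires_at = self._expiry.get(key)
        if expires_at is not None and expires_at > now:
            return False  # key is still live, so the set is refused
        self._expiry[key] = now + ttl_s
        return True


def handle_click(click, kv, known_ad_ids, publish, now=None):
    """Validate, deduplicate, and publish one click; returns a status string."""
    now = time.time() if now is None else now
    if any(field not in click for field in REQUIRED_FIELDS):
        return "rejected"
    if click["ad_id"] not in known_ad_ids:  # cache lookup in a real service
        return "rejected"
    dedup_key = f"click_dedup:{click['user_id']}:{click['ad_id']}"
    if not kv.set_nx(dedup_key, ttl_s=60, now=now):
        return "duplicate"
    # Keying by ad_id keeps all clicks for one ad in the same partition.
    publish("ad-clicks", key=click["ad_id"], value=click)
    return "accepted"


published = []
kv = InMemoryKV()
click = {"ad_id": "ad1", "user_id": "u1", "timestamp": 0,
         "ip": "10.0.0.1", "device_type": "mobile"}


def record(topic, key, value):
    published.append((topic, key, value))


print(handle_click(click, kv, {"ad1"}, record, now=0))   # accepted
print(handle_click(click, kv, {"ad1"}, record, now=30))  # duplicate
print(handle_click(click, kv, {"ad1"}, record, now=61))  # accepted
```

The handler returns as soon as the message is handed to the producer, which matches the "return 200 immediately" step.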
Fraud Detection
Stream processor checks each click against fraud rules:
- IP rate limit: INCR click_ip:{ip}:{ad_id}:{minute_bucket}, with TTL=120s on the key. If the count exceeds 3 within the 60s bucket, mark the click as INVALID.
- User rate limit: INCR click_user:{user_id}:{ad_id}:{minute_bucket}. More than 3 clicks per ad per minute is flagged as suspicious.
- Bot detection: headless browser fingerprints, missing user-agent, click timing analysis (too fast to be human).
Invalid clicks are written to a separate Kafka topic for analysis and not counted in billable metrics.
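The two rate-limit rules can be sketched with fixed one-minute buckets. FixedWindowCounter is a hypothetical in-memory stand-in for Redis INCR, and the "SUSPICIOUS" label and USER_LIMIT name are illustrative. Note that a fixed bucket only approximates a sliding 60s window: clicks that straddle a minute boundary are counted in two separate buckets.

```python
from collections import defaultdict

IP_LIMIT = 3    # more than 3 clicks per IP per ad per minute is invalid
USER_LIMIT = 3  # more than 3 clicks per user per ad per minute is suspicious


class FixedWindowCounter:
    """Hypothetical stand-in for Redis INCR (key expiry is omitted here)."""

    def __init__(self):
        self._counts = defaultdict(int)

    def incr(self, key):
        self._counts[key] += 1
        return self._counts[key]


def minute_bucket(ts_seconds):
    return int(ts_seconds) // 60


def classify(click, counters):
    """Return 'VALID', 'SUSPICIOUS', or 'INVALID' for one click."""
    bucket = minute_bucket(click["timestamp"])
    ip_key = f"click_ip:{click['ip']}:{click['ad_id']}:{bucket}"
    user_key = f"click_user:{click['user_id']}:{click['ad_id']}:{bucket}"
    ip_count = counters.incr(ip_key)
    user_count = counters.incr(user_key)
    if ip_count > IP_LIMIT:
        return "INVALID"
    if user_count > USER_LIMIT:
        return "SUSPICIOUS"
    return "VALID"


counters = FixedWindowCounter()
for i in range(4):
    click = {"ip": "1.2.3.4", "ad_id": "ad1", "user_id": f"u{i}", "timestamp": 10 * i}
    print(classify(click, counters))  # VALID three times, then INVALID
```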
Real-Time Click Counting
Stream processor (Flink or Kafka Streams) maintains windowed counts:
- Per-minute bucket: INCR click_count:{ad_id}:{YYYYMMDD_HH_MM}. TTL=48h.
- Running total: INCR click_total:{ad_id}. No TTL (lifetime counter).
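Query for ad_id clicks in time range [start, end]: enumerate all minute buckets in the range, fetch them with MGET, and sum the values. A 24-hour query touches 1440 keys, which fit in a single MGET or a few pipelined batches, so it stays well within the latency budget. Because minute buckets expire after 48h, ranges older than that must be answered from the hourly ClickSummary table.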
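A minimal sketch of the range query follows. The store is a plain dict standing in for Redis, and the list of lookups stands in for one MGET. The range is treated as half-open, [start, end), which is an assumption made here so that a 24-hour range yields exactly 1440 keys.

```python
from datetime import datetime, timedelta


def minute_keys(ad_id, start, end):
    """Keys for every minute bucket in [start, end)."""
    cur = start.replace(second=0, microsecond=0)
    keys = []
    while cur < end:
        keys.append(f"click_count:{ad_id}:{cur:%Y%m%d_%H_%M}")
        cur += timedelta(minutes=1)
    return keys


def count_clicks(store, ad_id, start, end):
    """Sum per-minute counters; missing buckets count as zero."""
    values = [store.get(k) for k in minute_keys(ad_id, start, end)]  # one MGET in Redis
    return sum(int(v) for v in values if v is not None)


store = {
    "click_count:ad1:20240101_00_00": "5",
    "click_count:ad1:20240101_00_01": "7",
}
start = datetime(2024, 1, 1, 0, 0)
print(len(minute_keys("ad1", start, start + timedelta(hours=24))))  # 1440
print(count_clicks(store, "ad1", start, start + timedelta(minutes=2)))  # 12
```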
Data Model (Persistent Storage)
ClickEvent(click_id UUID, ad_id, user_id, ip, device_type, is_valid BOOL,
created_at, campaign_id) -- written to Cassandra, partitioned by ad_id+date
ClickSummary(ad_id, time_bucket TIMESTAMP, click_count INT, valid_count INT,
unique_users INT) -- hourly aggregates in PostgreSQL, kept indefinitely
Campaign(campaign_id, advertiser_id, budget_cents, spent_cents, start_date, end_date)
Click Aggregation Pipeline
Raw clicks (Cassandra) → hourly batch job → ClickSummary table.
For billing: sum valid_count from ClickSummary per campaign per day.
For advertiser dashboards: query ClickSummary directly (much smaller than raw clicks).
Raw Cassandra records are retained for 90 days for fraud investigation; older data is archived to S3.
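The hourly rollup can be sketched as a pure function over raw events. Field names follow the ClickEvent and ClickSummary models above; counting unique_users over all clicks (valid or not) is an assumption, since the source does not say which clicks it covers.

```python
from collections import defaultdict
from datetime import datetime


def hourly_summaries(events):
    """Group ClickEvent-like dicts into ClickSummary-like rows per ad per hour."""
    buckets = defaultdict(lambda: {"click_count": 0, "valid_count": 0, "users": set()})
    for e in events:
        hour = e["created_at"].replace(minute=0, second=0, microsecond=0)
        b = buckets[(e["ad_id"], hour)]
        b["click_count"] += 1
        if e["is_valid"]:
            b["valid_count"] += 1
        b["users"].add(e["user_id"])  # assumption: all clicks count toward unique users
    rows = []
    for ad_id, hour in sorted(buckets):
        b = buckets[(ad_id, hour)]
        rows.append({
            "ad_id": ad_id,
            "time_bucket": hour,
            "click_count": b["click_count"],
            "valid_count": b["valid_count"],
            "unique_users": len(b["users"]),
        })
    return rows


events = [
    {"ad_id": "ad1", "user_id": "u1", "is_valid": True, "created_at": datetime(2024, 1, 1, 10, 5)},
    {"ad_id": "ad1", "user_id": "u1", "is_valid": False, "created_at": datetime(2024, 1, 1, 10, 40)},
    {"ad_id": "ad1", "user_id": "u2", "is_valid": True, "created_at": datetime(2024, 1, 1, 11, 0)},
]
for row in hourly_summaries(events):
    print(row)
```

Billing would then sum valid_count over the rows for a campaign's ads within one day.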
Budget Cap and Click Throttling
Advertisers set daily budgets. When the budget is exhausted, stop serving the ad.
Implementation: on each billable click, increase campaign_spend:{campaign_id} by the click's cost in cents using INCRBY (a plain INCR suffices when every click costs 1 cent, i.e. $0.01). In the ad serving layer, before serving an ad, check that campaign_spend:{campaign_id} < campaign.budget_cents (read from cache, TTL=5s). If the campaign is over budget, skip the ad.
Slight over-spend is acceptable because the check is eventually consistent: clicks that arrive while the cached value is stale are still billed. For strict budget enforcement, use a token bucket in Redis where tokens = remaining budget.
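The soft budget check can be sketched as follows. SpendStore is a hypothetical in-memory stand-in for Redis INCRBY and GET, and the 5s cache in front of the read is omitted for brevity.

```python
class SpendStore:
    """Hypothetical stand-in for Redis INCRBY / GET on spend counters."""

    def __init__(self):
        self._spend = {}

    def incr_by(self, key, amount):
        self._spend[key] = self._spend.get(key, 0) + amount
        return self._spend[key]

    def get(self, key):
        return self._spend.get(key, 0)


def record_billable_click(store, campaign_id, cost_cents):
    """Add the click's cost to the campaign's running spend."""
    return store.incr_by(f"campaign_spend:{campaign_id}", cost_cents)


def can_serve(store, campaign_id, budget_cents):
    """Ad server check: serve only while spend is below the budget."""
    return store.get(f"campaign_spend:{campaign_id}") < budget_cents


store = SpendStore()
budget_cents = 3
served = 0
for _ in range(10):
    if not can_serve(store, "c1", budget_cents):
        break
    served += 1
    record_billable_click(store, "c1", cost_cents=1)
print(served)  # 3
```

In this single-threaded sketch the spend stops exactly at the budget. With many ad servers reading a cached value, some clicks land after the budget is reached, which is the over-spend the text accepts.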
Key Design Decisions
- Kafka decouples ingestion from processing — click API is never blocked by downstream processing
- Per-minute Redis counters enable flexible time-range queries without expensive aggregation
- Deduplication key (user+ad+60s TTL) prevents refresh-spamming before fraud detection
- Separate valid/invalid click paths: billable metrics never include fraudulent clicks