Requirements
- Record every ad click event (ad_id, user_id, timestamp, ip, device_type)
- Query click counts per ad for any time range in real time
- Detect and filter invalid/bot clicks (same IP clicking same ad >3 times in 60s)
- 1B clicks/day (~11.6K/second average; peaks higher), query latency <100ms
Data Flow Architecture
Browser/App → Click API → Kafka (raw clicks) → Stream Processor ─→ Redis (real-time counters)
                                                                ├→ ClickSummary DB (hourly aggregates)
                                                                └→ Raw Click Storage (S3/Cassandra) → Fraud Filter
Click Ingestion
Click API: stateless, horizontally scaled. On each click:
- Validate: required fields present, ad_id exists (short-circuit from cache)
- Deduplicate: check Redis key click_dedup:{user_id}:{ad_id} (SET NX, TTL=60s). If exists, discard as duplicate.
- Publish to Kafka topic ad-clicks with key=ad_id (ensures ordering per ad)
- Return 200 immediately — do not wait for processing
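The ingestion steps above can be sketched as follows. This is an illustrative sketch, not the production implementation: Redis is stood in by an in-memory `FakeRedis` (emulating SET NX with TTL) and the Kafka topic by a plain list, and the function name `ingest_click` is hypothetical.

```python
import time

class FakeRedis:
    """In-memory stand-in for Redis SET NX with TTL (illustration only)."""
    def __init__(self):
        self.store = {}  # key -> expiry timestamp

    def set_nx_ex(self, key, ttl_seconds, now=None):
        now = time.time() if now is None else now
        expiry = self.store.get(key)
        if expiry is not None and expiry > now:
            return False  # key exists and not expired -> duplicate
        self.store[key] = now + ttl_seconds
        return True

def ingest_click(click, known_ads, redis, kafka_topic, now=None):
    """Validate, deduplicate, publish. Returns (status_code, reason)."""
    # 1. Validate: required fields present, ad_id exists (cache lookup).
    for field in ("ad_id", "user_id", "timestamp"):
        if field not in click:
            return 400, f"missing {field}"
    if click["ad_id"] not in known_ads:
        return 400, "unknown ad_id"
    # 2. Deduplicate: SET NX on click_dedup:{user_id}:{ad_id}, TTL=60s.
    dedup_key = f"click_dedup:{click['user_id']}:{click['ad_id']}"
    if not redis.set_nx_ex(dedup_key, ttl_seconds=60, now=now):
        return 200, "duplicate discarded"
    # 3. Publish to Kafka keyed by ad_id (stand-in: append to a list).
    kafka_topic.append((click["ad_id"], click))
    # 4. Return immediately; all downstream processing is asynchronous.
    return 200, "accepted"
```

Note that duplicates still return 200 to the client: the caller cannot distinguish a discarded refresh-spam click from an accepted one, which keeps the API fast and reveals nothing to bots.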
Fraud Detection
Stream processor checks each click against fraud rules:
- IP rate limit: INCR click_ip:{ip}:{ad_id}:{minute_bucket}. If count > 3 in 60s, mark click as INVALID. TTL=120s on the key.
- User rate limit: INCR click_user:{user_id}:{ad_id}:{minute_bucket}. More than 3 clicks by the same user on the same ad within a minute is flagged as suspicious.
- Bot detection: headless browser fingerprints, missing user-agent, click timing analysis (too fast to be human).
Invalid clicks are written to a separate Kafka topic for analysis and not counted in billable metrics.
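The IP rate-limit rule can be sketched with an in-memory counter standing in for the Redis INCR keys (the `FraudFilter` class name is illustrative). Note that fixed minute buckets only approximate "more than 3 in 60s": a burst straddling a bucket boundary can slip through, which is the usual trade-off of this keying scheme.

```python
from collections import defaultdict

def minute_bucket(ts):
    """Truncate a unix timestamp to its minute bucket."""
    return int(ts // 60)

class FraudFilter:
    """In-memory sketch of the Redis INCR-based IP rate limit (threshold 3/min)."""
    def __init__(self, threshold=3):
        self.threshold = threshold
        self.counts = defaultdict(int)  # (ip, ad_id, minute) -> count

    def check(self, click):
        # INCR click_ip:{ip}:{ad_id}:{minute_bucket}  (TTL handled by Redis)
        key = (click["ip"], click["ad_id"], minute_bucket(click["timestamp"]))
        self.counts[key] += 1
        # More than `threshold` clicks from one IP on one ad per minute -> invalid.
        return "INVALID" if self.counts[key] > self.threshold else "VALID"
```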
Real-Time Click Counting
Stream processor (Flink or Kafka Streams) maintains windowed counts:
- Per-minute bucket: INCR click_count:{ad_id}:{YYYYMMDD_HH_MM}. TTL=48h.
- Running total: INCR click_total:{ad_id}. No TTL (lifetime counter).
Query for ad_id clicks in time range [start, end]: enumerate all minute buckets in the range, fetch them with a single pipelined MGET, and sum. A 24-hour query reads 1,440 keys in one round trip — well under the 100ms latency target.
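The bucket scheme and range query above can be sketched as follows, with a dict standing in for the Redis counters (the `ClickCounter` class is illustrative; in production, `range_count` would issue one pipelined MGET instead of a local dict scan).

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

class ClickCounter:
    """In-memory sketch of per-minute Redis counters and the range query."""
    def __init__(self):
        self.counters = defaultdict(int)

    def bucket_key(self, ad_id, dt):
        # Mirrors click_count:{ad_id}:{YYYYMMDD_HH_MM}
        return f"click_count:{ad_id}:{dt.strftime('%Y%m%d_%H_%M')}"

    def record(self, ad_id, dt):
        self.counters[self.bucket_key(ad_id, dt)] += 1  # INCR

    def range_count(self, ad_id, start, end):
        """Enumerate minute buckets in [start, end) and sum their counts."""
        keys = []
        t = start.replace(second=0, microsecond=0)
        while t < end:
            keys.append(self.bucket_key(ad_id, t))
            t += timedelta(minutes=1)
        # In Redis this is a single pipelined MGET over all keys.
        return sum(self.counters.get(k, 0) for k in keys)
```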
Data Model (Persistent Storage)
ClickEvent(click_id UUID, ad_id, user_id, ip, device_type, is_valid BOOL,
created_at, campaign_id) -- written to Cassandra, partitioned by ad_id+date
ClickSummary(ad_id, time_bucket TIMESTAMP, click_count INT, valid_count INT,
unique_users INT) -- hourly aggregates in PostgreSQL, kept indefinitely
Campaign(campaign_id, advertiser_id, budget_cents, spent_cents, start_date, end_date)
Click Aggregation Pipeline
Raw clicks (Cassandra) → hourly batch job → ClickSummary table. For billing: sum valid ClickSummary records per campaign per day. For advertiser dashboards: query ClickSummary directly (much smaller than raw clicks). Raw Cassandra records retained 90 days for fraud investigation; older data archived to S3.
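The hourly roll-up can be sketched as a pure function from raw click events to ClickSummary rows. This is a minimal in-memory sketch (the `aggregate_hourly` name is hypothetical); the real job would read a Cassandra partition range and upsert into PostgreSQL.

```python
from collections import defaultdict

def aggregate_hourly(raw_clicks):
    """Roll raw click events up into hourly ClickSummary rows.
    raw_clicks: iterable of dicts with ad_id, user_id, timestamp (unix), is_valid."""
    buckets = defaultdict(lambda: {"click_count": 0, "valid_count": 0, "users": set()})
    for c in raw_clicks:
        hour = int(c["timestamp"] // 3600) * 3600  # truncate timestamp to the hour
        b = buckets[(c["ad_id"], hour)]
        b["click_count"] += 1
        if c["is_valid"]:
            b["valid_count"] += 1
        b["users"].add(c["user_id"])
    # Emit rows matching ClickSummary(ad_id, time_bucket, click_count, valid_count, unique_users)
    return [
        {"ad_id": ad, "time_bucket": hour, "click_count": b["click_count"],
         "valid_count": b["valid_count"], "unique_users": len(b["users"])}
        for (ad, hour), b in sorted(buckets.items())
    ]
```

Because the summary rows are tiny compared to the raw stream, dashboard and billing queries scan them cheaply even over long ranges.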
Budget Cap and Click Throttling
Advertisers set daily budgets; when a budget is exhausted, stop serving the ad. Implementation: on each billable click, INCR campaign_spend:{campaign_id} by the cost per click (e.g., $0.01 = 1 cent). In the ad serving layer, before serving an ad, check that campaign_spend:{campaign_id} < campaign.budget_cents (read from cache, TTL=5s). If over budget, skip the ad. Slight over-spend is acceptable (eventually consistent). For strict budget enforcement, use a token bucket in Redis where tokens = remaining budget, decremented before the ad is served.
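The strict token-bucket variant can be sketched as follows. A dict stands in for the Redis budget keys and the `BudgetEnforcer` name is illustrative; in Redis the check-and-decrement would need DECRBY plus a guard (or a small Lua script) to stay atomic under concurrency.

```python
class BudgetEnforcer:
    """In-memory sketch of strict budget enforcement via a token bucket.
    remaining[campaign_id] plays the role of campaign_budget:{campaign_id} in cents."""
    def __init__(self, budgets_cents):
        self.remaining = dict(budgets_cents)

    def try_bill_click(self, campaign_id, cost_cents=1):
        """Deduct the click cost before serving; reject if the budget is exhausted."""
        left = self.remaining.get(campaign_id, 0)
        if left < cost_cents:
            return False  # budget exhausted: skip this ad
        self.remaining[campaign_id] = left - cost_cents
        return True
```

The eventually-consistent variant in the text trades this strictness for lower serving-path latency: it reads a cached spend value and tolerates a few cents of over-spend while the cache catches up.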
Key Design Decisions
- Kafka decouples ingestion from processing — click API is never blocked by downstream processing
- Per-minute Redis counters enable flexible time-range queries without expensive aggregation
- Deduplication key (user+ad+60s TTL) prevents refresh-spamming before fraud detection
- Separate valid/invalid click paths: billable metrics never include fraudulent clicks