An ad click tracking system records every click on an advertisement, deduplicates fraudulent clicks, aggregates counts for billing, and provides real-time analytics to advertisers. At Google or Facebook scale, this system handles billions of click events per day and must be precise (incorrect click counts affect billing) and fraud-resistant.
Click Event Flow
When a user clicks an ad: (1) The browser makes a request to the tracking server (not the advertiser directly) via a redirect URL: track.example.com/click?ad_id=123&user_id=456&placement_id=789. (2) The tracking server records the click event and immediately redirects the user to the advertiser’s landing page (a 302 redirect is preferred over 301, since browsers cache 301 responses and would skip the tracking server on repeat clicks). (3) The click event is enqueued to Kafka. (4) Downstream consumers process the click: deduplication, fraud detection, billing aggregation, real-time analytics. The redirect approach ensures the click is captured even if the advertiser’s server is slow: the user reaches the landing page immediately while tracking happens asynchronously in steps 3 and 4.
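The endpoint logic above can be sketched as follows. This is a minimal illustration, not a production handler: the Kafka producer is simulated with an in-memory list, and the names enqueue_to_kafka, LANDING_PAGES, and handle_click are hypothetical.

```python
import json
import time
from urllib.parse import urlparse, parse_qs

KAFKA_TOPIC = []  # stand-in for the Kafka "clicks" topic (illustrative)

# Hypothetical ad_id -> landing-page mapping; real systems look this up
# from the ad-serving metadata store.
LANDING_PAGES = {"123": "https://advertiser.example.com/landing"}

def enqueue_to_kafka(event):
    """Simulated producer: in production this is a non-blocking Kafka send."""
    KAFKA_TOPIC.append(json.dumps(event))

def handle_click(url):
    """Record the click event, then return a 302 redirect to the landing page."""
    params = parse_qs(urlparse(url).query)
    event = {
        "ad_id": params["ad_id"][0],
        "user_id": params["user_id"][0],
        "placement_id": params["placement_id"][0],
        "ts": time.time(),
    }
    enqueue_to_kafka(event)  # steps 3-4 happen asynchronously downstream
    return 302, {"Location": LANDING_PAGES[event["ad_id"]]}

status, headers = handle_click(
    "https://track.example.com/click?ad_id=123&user_id=456&placement_id=789")
```

The handler does the minimum synchronous work (parse, enqueue, redirect); everything expensive is deferred to consumers.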
Click Deduplication
Fraudulent clicks (click farms, repeated user clicks) must be deduplicated before billing. A legitimate click is one click per user per ad per time window. Deduplication key: (user_id, ad_id, time_window). Implementation: when a click event arrives, compute the deduplication key and issue an atomic set-if-absent with a TTL in Redis: SET dedup:{user_id}:{ad_id}:{hour} 1 NX EX 3600 (note that SETNX alone cannot attach an expiry; the NX and EX options of SET do both atomically). If SET returns nil (the key already exists), this click is a duplicate — mark as invalid. If SET returns OK, this is the first click in this window — valid. The 1-hour window prevents the same user from clicking the same ad 100 times in an hour. More sophisticated fraud detection: IP rate limiting, bot behavior patterns (machine-speed clicks, missing browser fingerprint), network analysis (many users from the same IP).
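A sketch of the dedup check, modeling the Redis SET ... NX EX semantics with an in-memory dict so it runs standalone. With a real client (redis-py) the equivalent call is r.set(key, 1, nx=True, ex=3600); the class and function names here are illustrative.

```python
import time

class DedupStore:
    """In-memory stand-in for Redis keys with TTLs (key -> expiry time)."""

    def __init__(self):
        self._keys = {}

    def set_nx_ex(self, key, ttl, now=None):
        """Return True if the key was newly set (first click), False if it
        already exists and has not expired (duplicate)."""
        now = now if now is not None else time.time()
        expiry = self._keys.get(key)
        if expiry is not None and expiry > now:
            return False
        self._keys[key] = now + ttl
        return True

def is_valid_click(store, user_id, ad_id, ts):
    hour = int(ts // 3600)  # bucket the timestamp into an hourly window
    key = f"dedup:{user_id}:{ad_id}:{hour}"
    return store.set_nx_ex(key, 3600, now=ts)

store = DedupStore()
first = is_valid_click(store, 456, 123, ts=1000.0)   # first click in the hour
repeat = is_valid_click(store, 456, 123, ts=1500.0)  # same user/ad/hour
```

Note the two expiry mechanisms overlap deliberately: the hour bucket in the key bounds the window, and the TTL keeps Redis from accumulating stale keys.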
Aggregation for Billing
Advertisers are billed per valid click. Aggregation requirements: count of valid clicks per (ad_id, date) for daily billing. Two approaches: (1) Stream processing: Kafka → Flink/Spark Streaming → aggregate valid clicks per ad per minute → write to a time-series store; reconcile at end-of-day for billing. Provides near-real-time click counts for advertiser dashboards. (2) Batch processing: store all click events in object storage (S3 as Parquet); run daily aggregation jobs (Spark, BigQuery) for billing. Simpler, cheaper, but only provides next-day billing data. Production: combine both — stream processing for near-real-time dashboards, batch for authoritative billing (stream may have minor inaccuracies that batch corrects).
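The two roll-ups described above can be shown with plain counters. In production this logic runs inside Flink or a Spark job; the event shape and variable names here are illustrative assumptions.

```python
from collections import Counter
from datetime import datetime, timezone

# Sample valid/invalid click events (illustrative).
events = [
    {"ad_id": 123, "ts": 1735689600, "valid": True},   # 2025-01-01 00:00 UTC
    {"ad_id": 123, "ts": 1735689630, "valid": True},
    {"ad_id": 123, "ts": 1735689700, "valid": False},  # flagged as fraudulent
    {"ad_id": 456, "ts": 1735776000, "valid": True},   # 2025-01-02 00:00 UTC
]

per_minute = Counter()  # (ad_id, "YYYY-MM-DDTHH:MM") -> count, for dashboards
per_day = Counter()     # (ad_id, "YYYY-MM-DD") -> count, for billing

for e in events:
    if not e["valid"]:
        continue  # only valid (deduplicated, non-fraud) clicks are billable
    dt = datetime.fromtimestamp(e["ts"], tz=timezone.utc)
    per_minute[(e["ad_id"], dt.strftime("%Y-%m-%dT%H:%M"))] += 1
    per_day[(e["ad_id"], dt.strftime("%Y-%m-%d"))] += 1
```

The per-minute counter is what a streaming job would emit for dashboards; the per-day counter is what the batch job recomputes from raw events as the authoritative billing figure.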
Click Attribution
Attribution determines which ad “caused” a conversion (purchase, signup). Last-click attribution: the last ad the user clicked before converting gets credit. Multi-touch attribution: credit is distributed across all ads in the user’s click path (linear, time-decay, or data-driven models). Implementation: store a click_path per user (last N clicks with timestamps) in Redis or a user profile store. When a conversion event occurs (with user_id), look up the user’s click path and apply the attribution model to assign credit. Attribution windows (7 days, 30 days): only count clicks within the attribution window as attributable. Clicks older than the window don’t receive credit for the conversion.
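Last-click and linear attribution with a window can be sketched as below. The click path is a list of (ad_id, ts) pairs; in production it would be read from Redis keyed by user_id. Function names and the path shape are illustrative.

```python
SEVEN_DAYS = 7 * 24 * 3600  # attribution window in seconds

def attributable(click_path, conversion_ts, window=SEVEN_DAYS):
    """Clicks that occurred within the window before the conversion."""
    return [(ad, ts) for ad, ts in click_path
            if 0 <= conversion_ts - ts <= window]

def last_click(click_path, conversion_ts):
    """All credit to the most recent in-window click."""
    clicks = attributable(click_path, conversion_ts)
    if not clicks:
        return {}
    last_ad = max(clicks, key=lambda c: c[1])[0]
    return {last_ad: 1.0}

def linear(click_path, conversion_ts):
    """Credit split evenly across all in-window clicks."""
    clicks = attributable(click_path, conversion_ts)
    if not clicks:
        return {}
    share = 1.0 / len(clicks)
    credit = {}
    for ad, _ in clicks:
        credit[ad] = credit.get(ad, 0.0) + share
    return credit

path = [(111, 100_000), (222, 500_000), (333, 550_000)]
conv_ts = 600_000  # all three clicks fall inside the 7-day window
```

Time-decay and data-driven models slot into the same shape: only the credit-splitting function changes, while the window filter stays the same.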
Privacy and GDPR
Click tracking involves user tracking — a sensitive area under GDPR, CCPA, and iOS App Tracking Transparency (ATT). Compliance requirements: obtain user consent before tracking (consent banner, ATT prompt on iOS); support data deletion requests (remove user’s click history on erasure request); minimize data retention (delete detailed click records after the billing reconciliation period — typically 90 days, not forever); anonymize aggregated data before long-term retention. Cookieless tracking alternatives (privacy sandbox, aggregated measurement APIs) are increasingly required as third-party cookies are blocked by browsers. Design tracking identifiers that don’t rely on cross-site cookies for future-proofing.
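Erasure and retention can be sketched as two passes over the event store. The in-memory list, the 90-day constant, and the function names are illustrative assumptions; real systems delete from the event store, caches, and downstream copies.

```python
import time

RETENTION_SECONDS = 90 * 24 * 3600  # assumed billing reconciliation period

def handle_erasure_request(events, user_id):
    """Remove all click records for a user on a right-to-erasure request."""
    return [e for e in events if e["user_id"] != user_id]

def purge_expired(events, now=None):
    """Drop detailed click records older than the retention period."""
    now = now if now is not None else time.time()
    return [e for e in events if now - e["ts"] <= RETENTION_SECONDS]

DAY = 24 * 3600
events = [
    {"user_id": 456, "ad_id": 123, "ts": 0},
    {"user_id": 789, "ad_id": 123, "ts": 0},
    {"user_id": 789, "ad_id": 321, "ts": 100 * DAY},
]
events = handle_erasure_request(events, user_id=456)
events = purge_expired(events, now=100 * DAY)  # the ts=0 record is past retention
```

Aggregated per-ad counts survive these deletions because they contain no user identifier, which is why aggregation before long-term retention satisfies both billing and minimization.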