What Is a Click Tracking Service?
A click tracking service records user interactions — ad clicks, link taps, button presses — at high throughput, then attributes those events to sessions, campaigns, or conversion funnels. The design must handle write bursts without data loss, suppress duplicate events caused by browser retries or double-taps, and deliver clean attributed data to analytics consumers with low latency.
Requirements
Functional Requirements
- Accept click events from web and mobile clients via a lightweight HTTP endpoint.
- Deduplicate events with the same client-generated event_id within a configurable window (default 5 minutes).
- Attribute each click to a session, user, campaign, and page context.
- Expose aggregated click counts and funnel conversion rates via a query API.
- Support redirect tracking: resolve a tracked short URL, record the click, then redirect the browser.
Non-Functional Requirements
- Ingest endpoint p99 latency under 50 ms; redirect latency under 100 ms.
- Handle 100k events per second at peak with horizontal scaling.
- At-least-once event delivery to the storage layer.
Data Model
Raw Event
- event_id — client-generated UUID; used as the deduplication key.
- event_time — client timestamp (milliseconds); server records received_at separately for skew detection.
- user_id, session_id, anonymous_id — identity resolution chain.
- page_url, referrer, campaign_id, element_id — context fields.
- client_ip, user_agent — for geo enrichment and bot filtering.
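As an illustration, the raw event could be modeled as a Python dataclass. The field types, the optional defaults, and the skew_ms helper are assumptions made for this sketch; the data model above does not fix them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RawClickEvent:
    # Field names follow the data model above; the types are assumed.
    event_id: str                       # client-generated UUID, dedup key
    event_time: int                     # client timestamp, ms since epoch
    received_at: int                    # server timestamp, ms since epoch
    page_url: str
    user_id: Optional[str] = None
    session_id: Optional[str] = None
    anonymous_id: Optional[str] = None
    referrer: Optional[str] = None
    campaign_id: Optional[str] = None
    element_id: Optional[str] = None
    client_ip: Optional[str] = None
    user_agent: Optional[str] = None

    def skew_ms(self) -> int:
        """Server receive time minus client time (hypothetical helper)."""
        return self.received_at - self.event_time
```

A large positive or negative skew_ms value would flag a client clock that should not be trusted for ordering.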
Deduplication Store
A Redis key per event_id with a TTL equal to the deduplication window (5 minutes by default). Key format: dedup:click:{event_id}. SET NX atomically rejects duplicates without a read-before-write.
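A rough upper bound on the size of the deduplication store follows from the peak rate and the TTL. The sketch below uses the 100k events/s peak from the requirements and an assumed 100 bytes of Redis memory per key; the real per-key cost depends on key length and Redis version and should be measured.

```python
PEAK_EVENTS_PER_SEC = 100_000    # from the non-functional requirements
DEDUP_WINDOW_SEC = 5 * 60        # default 5-minute TTL
ASSUMED_BYTES_PER_KEY = 100      # assumption: key, value, and Redis overhead


def dedup_store_estimate(rate=PEAK_EVENTS_PER_SEC,
                         window=DEDUP_WINDOW_SEC,
                         bytes_per_key=ASSUMED_BYTES_PER_KEY):
    """Return (live_keys, total_bytes) if peak load lasts a full window."""
    live_keys = rate * window
    return live_keys, live_keys * bytes_per_key


live_keys, total_bytes = dedup_store_estimate()
print(f"{live_keys:,} live keys, about {total_bytes / 1e9:.1f} GB")
```

Under these assumptions the store holds 30 million live keys, roughly 3 GB, which is the figure to divide across the Redis cluster.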
Attribution Record
- session_id, user_id, campaign_id, converted (bool), conversion_value.
- Stored in a columnar store (BigQuery, Redshift, ClickHouse) for analytical queries.
Core Algorithm: Event Ingestion Pipeline
Step 1 — Receive and Validate
The ingest endpoint accepts a JSON body or query-string parameters (for pixel tracking). Validate required fields (event_id, event_time, page_url). Reject malformed requests with 400. Respond 200 immediately after enqueueing to minimize client-perceived latency.
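The validation step might look like the following minimal sketch. The function name, the error messages, and the rule that event_time must be a positive integer are assumptions; only the three required fields and the 400 response come from the text above.

```python
from typing import Any, Dict, List, Tuple

REQUIRED_FIELDS = ("event_id", "event_time", "page_url")


def validate_click(payload: Dict[str, Any]) -> Tuple[int, List[str]]:
    """Return (http_status, errors) for a decoded JSON body or query string."""
    errors = [f"missing {name}" for name in REQUIRED_FIELDS
              if payload.get(name) in (None, "")]
    if not errors:
        try:
            # Query-string values arrive as text, so coerce before checking.
            if int(payload["event_time"]) <= 0:
                errors.append("event_time must be positive")
        except (TypeError, ValueError):
            errors.append("event_time must be an integer (milliseconds)")
    return (400, errors) if errors else (200, [])
```

Returning the error list alongside the status keeps the handler free to log the details while sending the client only the status code.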
Step 2 — Deduplication
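Before enqueueing, attempt SET dedup:click:{event_id} 1 NX PX 300000. If the key already exists, drop the event and still return 200: the dedup is silent, so the client sees no error and has no reason to retry again. This prevents retry storms from inflating counts. If the subsequent enqueue fails, delete the key before returning an error; otherwise the client's retry would be discarded as a duplicate and the event would be lost, which breaks the at-least-once requirement.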
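The set-if-absent-with-expiry semantics can be sketched without a Redis server. The InMemoryDedup class below is a hypothetical stand-in written for illustration, with an injectable clock so the expiry is easy to exercise; with the redis-py client the equivalent call would be client.set(key, 1, nx=True, px=ttl_ms), which returns a truthy value when the key was set and None when it already existed.

```python
import time
from typing import Callable, Dict

DEDUP_TTL_MS = 300_000  # 5-minute default window


class InMemoryDedup:
    """Stand-in for Redis SET key 1 NX PX ttl (illustration only)."""

    def __init__(self, now_ms: Callable[[], int] = lambda: int(time.time() * 1000)):
        self._expiry: Dict[str, int] = {}
        self._now_ms = now_ms

    def first_seen(self, event_id: str, ttl_ms: int = DEDUP_TTL_MS) -> bool:
        """Return True if the event is new, False if it is a duplicate."""
        key = f"dedup:click:{event_id}"
        now = self._now_ms()
        expires_at = self._expiry.get(key)
        if expires_at is not None and expires_at > now:
            return False                  # key still live: duplicate
        self._expiry[key] = now + ttl_ms  # new or expired: claim the key
        return True
```

Unlike Redis, this sketch is neither shared across processes nor atomic under concurrency, so it only demonstrates the decision logic.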
Step 3 — Enqueue to Stream
Publish the validated event to a Kafka topic partitioned by session_id. Partitioning by session ensures ordered processing for session reconstruction downstream without global ordering overhead.
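Key-based partitioning can be illustrated with a stable hash. Kafka's default Java partitioner hashes the key bytes with murmur2; the sketch below substitutes CRC32 purely for illustration, and partition_for and route are hypothetical helper names.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List


def partition_for(session_id: str, num_partitions: int) -> int:
    # crc32 is stable across processes, unlike Python's built-in hash().
    return zlib.crc32(session_id.encode("utf-8")) % num_partitions


def route(events: Iterable[Dict], num_partitions: int) -> Dict[int, List[Dict]]:
    """Group events by partition, preserving arrival order within each."""
    partitions: Dict[int, List[Dict]] = defaultdict(list)
    for event in events:
        partitions[partition_for(event["session_id"], num_partitions)].append(event)
    return partitions
```

Because every event of a session hashes to the same partition, a single consumer sees that session's events in the order they were produced.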
Step 4 — Stream Processing and Attribution
A stream processor (Flink, Kafka Streams) consumes events, joins against a campaign lookup table (broadcast join), enriches with geo data, applies bot filtering rules (known data-center IP ranges, headless browser signatures), and emits enriched records to a columnar sink and a real-time aggregation store.
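The bot filtering rules could be expressed as a small predicate, sketched here with Python's standard ipaddress module. The CIDR blocks are documentation-only ranges standing in for a real data-center list, the signature strings are examples, and treating an unparseable IP as suspect is an assumption of this sketch.

```python
import ipaddress

# Hypothetical values: documentation-only CIDR blocks stand in for a real
# data-center range list, and the signatures are examples.
DATACENTER_RANGES = [ipaddress.ip_network(cidr)
                     for cidr in ("203.0.113.0/24", "198.51.100.0/24")]
HEADLESS_SIGNATURES = ("headlesschrome", "phantomjs")


def is_probable_bot(client_ip: str, user_agent: str) -> bool:
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        return True  # assumption: unparseable IPs are treated as suspect
    if any(addr in net for net in DATACENTER_RANGES):
        return True
    ua = user_agent.lower()
    return any(sig in ua for sig in HEADLESS_SIGNATURES)
```

In a stream processor this predicate would run per event after enrichment, with the range list refreshed from an external feed rather than hard-coded.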
Step 5 — Funnel Analysis
Maintain per-session state: ordered list of page events within a configurable window. When a session ends (timeout or explicit close), emit a session summary containing the funnel steps traversed and whether a conversion event was observed.
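A session summary of this kind can be sketched as follows. The funnel step names, the conversion event name, the 30-minute timeout, and the 'step' field on each event are all hypothetical; the sketch counts a funnel step only when it is reached in order.

```python
from typing import Dict, List, Sequence

FUNNEL_STEPS = ("landing", "product", "checkout")  # hypothetical funnel
CONVERSION_STEP = "purchase"                        # hypothetical event name
SESSION_TIMEOUT_MS = 30 * 60 * 1000                 # assumed 30-minute timeout


def session_expired(last_event_ms: int, now_ms: int,
                    timeout_ms: int = SESSION_TIMEOUT_MS) -> bool:
    return now_ms - last_event_ms >= timeout_ms


def summarize_session(session_id: str, events: Sequence[Dict]) -> Dict:
    """events: dicts with 'event_time' (ms) and 'step' for one session."""
    ordered = sorted(events, key=lambda e: e["event_time"])
    reached: List[str] = []
    next_index = 0
    converted = False
    for event in ordered:
        # Advance only when the next expected funnel step is observed.
        if next_index < len(FUNNEL_STEPS) and event["step"] == FUNNEL_STEPS[next_index]:
            reached.append(event["step"])
            next_index += 1
        if event["step"] == CONVERSION_STEP:
            converted = True
    return {"session_id": session_id, "steps": reached, "converted": converted}
```

A stream processor would keep the event list in keyed state and call summarize_session when session_expired fires or an explicit close event arrives.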
API Design
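- POST /events/click — ingest endpoint; returns 200 for accepted and silently deduplicated events, and 400 for malformed requests (see Step 1); downstream processing errors are logged server-side.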
- GET /1x1.gif?eid=&cid=&… — pixel endpoint for environments blocking JS; responds with a transparent GIF after recording.
- GET /r/{token} — redirect tracking; records click then issues 302 to destination URL.
- GET /stats/clicks?campaign_id=X&from=&to= — aggregated counts from the read store.
- GET /stats/funnel?funnel_id=X&from=&to= — conversion rates by funnel step.
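The redirect endpoint in the list above can be sketched as a framework-independent function that returns a status code and headers. The token table, the record_click callback, and the 404 for unknown tokens are assumptions of this sketch; a real service would resolve tokens from a key-value store and publish the click to the ingestion pipeline.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical token table for illustration.
SHORT_LINKS: Dict[str, str] = {"abc123": "https://example.com/landing?cid=42"}


def handle_redirect(token: str,
                    record_click: Callable[[Dict], None],
                    links: Optional[Dict[str, str]] = None) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for GET /r/{token}."""
    table = SHORT_LINKS if links is None else links
    destination = table.get(token)
    if destination is None:
        return 404, {}  # assumption: unknown tokens are not redirected
    # Record first, then redirect, matching the order described above.
    record_click({"token": token, "destination": destination})
    return 302, {"Location": destination}
```

To stay inside the 100 ms redirect budget, record_click would normally hand the event to an in-process buffer or producer rather than wait on storage.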
Scalability Considerations
Deploy the ingest tier as stateless pods behind a load balancer; autoscale on request rate. Redis deduplication shards across a cluster by event_id hash slot. Kafka partitions absorb burst writes without back-pressure on the ingest API. The stream processor scales by adding consumer group members up to the partition count. For the analytical store, pre-aggregate hourly rollup tables so dashboard queries avoid full scans. Use a CDN edge worker for pixel and redirect tracking to reduce origin load for geographically distributed traffic.
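The hourly pre-aggregation can be illustrated in a few lines. This sketch buckets by event_time and counts per campaign in memory; the hourly_rollup name and the event shape are assumptions, and in practice the rollup would be a scheduled query or materialized view in the columnar store.

```python
from collections import Counter
from typing import Dict, Iterable

MS_PER_HOUR = 3_600_000


def hourly_rollup(events: Iterable[Dict]) -> Counter:
    """Count clicks per (campaign_id, hour bucket start in ms)."""
    counts: Counter = Counter()
    for event in events:
        # Truncate the timestamp down to the start of its hour.
        hour_start = event["event_time"] - event["event_time"] % MS_PER_HOUR
        counts[(event["campaign_id"], hour_start)] += 1
    return counts
```

A dashboard query over a date range then sums a handful of rollup rows per campaign instead of scanning every raw event.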
Summary
A robust click tracking service combines a low-latency ingest path with Redis-based deduplication, Kafka-backed stream processing for enrichment and attribution, and a columnar store for analytical queries. Separating ingest from processing keeps burst traffic from degrading query performance, and client-generated event IDs make deduplication safe across retries within the deduplication window.