Click Tracking Service Low-Level Design: Event Collection, Deduplication, and Attribution

What Is a Click Tracking Service?

A click tracking service records user interactions — ad clicks, link taps, button presses — at high throughput, then attributes those events to sessions, campaigns, or conversion funnels. The design must handle write bursts without data loss, suppress duplicate events caused by browser retries or double-taps, and deliver clean attributed data to analytics consumers with low latency.

Requirements

Functional Requirements

  • Accept click events from web and mobile clients via a lightweight HTTP endpoint.
  • Deduplicate events with the same client-generated event_id within a configurable window (default 5 minutes).
  • Attribute each click to a session, user, campaign, and page context.
  • Expose aggregated click counts and funnel conversion rates via a query API.
  • Support redirect tracking: resolve a tracked short URL, record the click, then redirect the browser.

Non-Functional Requirements

  • Ingest endpoint p99 latency under 50 ms; redirect latency under 100 ms.
  • Handle 100k events per second at peak with horizontal scaling.
  • At-least-once event delivery to the storage layer.

Data Model

Raw Event

  • event_id — client-generated UUID; used as the deduplication key.
  • event_time — client timestamp (milliseconds); server records received_at separately for skew detection.
  • user_id, session_id, anonymous_id — identity resolution chain.
  • page_url, referrer, campaign_id, element_id — context fields.
  • client_ip, user_agent — for geo enrichment and bot filtering.

Deduplication Store

A Redis key per event_id with a 5-minute TTL. Key format: dedup:click:{event_id}. SET NX atomically rejects duplicates without a read-before-write.

Attribution Record

  • session_id, user_id, campaign_id, converted (bool), conversion_value.
  • Stored in a columnar store (BigQuery, Redshift, ClickHouse) for analytical queries.

Core Algorithm: Event Ingestion Pipeline

Step 1 — Receive and Validate

The ingest endpoint accepts a JSON body or query-string parameters (for pixel tracking). Validate required fields (event_id, event_time, page_url). Reject malformed requests with 400. Respond 200 immediately after enqueueing to minimize client-perceived latency.
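The validation step can be sketched as a pure function, independent of the web framework. This is an illustrative sketch: the function name, field checks, and return convention are assumptions, not part of the original design.

```python
# Sketch of the validation step (names are illustrative). A malformed payload
# maps to HTTP 400; anything that passes is acknowledged with 200 before
# downstream processing completes.
import uuid

REQUIRED_FIELDS = ("event_id", "event_time", "page_url")

def validate_event(payload: dict):
    """Return (True, None) if the payload is acceptable, else (False, reason)."""
    for field in REQUIRED_FIELDS:
        if field not in payload:
            return False, f"missing required field: {field}"
    try:
        uuid.UUID(str(payload["event_id"]))  # must be a well-formed UUID
    except ValueError:
        return False, "event_id is not a valid UUID"
    if not isinstance(payload["event_time"], int):
        return False, "event_time must be epoch milliseconds (int)"
    return True, None
```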

Step 2 — Deduplication

Before enqueueing, attempt SET dedup:click:{event_id} 1 NX PX 300000. If the key already exists, drop the event and return 200 (silent dedup, do not signal error to client). This prevents retry storms from inflating counts.
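A minimal sketch of the dedup decision. An in-memory dict with expiry timestamps stands in for Redis here so the logic is self-contained; with redis-py the same check is a single call, `r.set(key, 1, nx=True, px=300000)`, which returns `None` when the key already exists and a truthy value when it was newly set.

```python
# In-memory stand-in for Redis SET ... NX PX semantics (illustrative only).
import time

class DedupStore:
    def __init__(self, ttl_ms: int = 300_000):
        self.ttl_ms = ttl_ms
        self._keys = {}  # key -> expiry time (monotonic seconds)

    def set_nx(self, key: str) -> bool:
        """Mimic SET key 1 NX PX ttl: True if newly set, False if key exists."""
        now = time.monotonic()
        expiry = self._keys.get(key)
        if expiry is not None and expiry > now:
            return False  # key still alive within TTL -> duplicate
        self._keys[key] = now + self.ttl_ms / 1000.0
        return True

def accept_event(store: DedupStore, event_id: str) -> bool:
    """Return True if the event is novel; duplicates are silently dropped."""
    return store.set_nx(f"dedup:click:{event_id}")
```

Note that the check happens in one atomic operation, so two concurrent retries with the same event_id cannot both pass.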

Step 3 — Enqueue to Stream

Publish the validated event to a Kafka topic partitioned by session_id. Partitioning by session ensures ordered processing for session reconstruction downstream without global ordering overhead.
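The property that matters is a stable key-to-partition mapping. Real Kafka clients hash the key with murmur2; the sketch below uses a stdlib hash purely to illustrate that every event for a given session lands on the same partition.

```python
# Illustration of key-based partitioning (not Kafka's actual murmur2 hash).
import hashlib

def partition_for(session_id: str, num_partitions: int) -> int:
    digest = hashlib.md5(session_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# With a Kafka client library the same effect comes from sending the
# session_id as the record key, e.g. (kafka-python):
# producer.send("clicks", key=session_id.encode(), value=event_bytes)
```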

Step 4 — Stream Processing and Attribution

A stream processor (Flink, Kafka Streams) consumes events, joins against a campaign lookup table (broadcast join), enriches with geo data, applies bot filtering rules (known data-center IP ranges, headless browser signatures), and emits enriched records to a columnar sink and a real-time aggregation store.
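The enrichment stage can be sketched as a pure function over one event, which is roughly what the processor applies per record. The function name, the campaign-dict stand-in for the broadcast join, and the headless-browser signature list are all assumptions for illustration.

```python
# Enrichment sketch: broadcast join becomes a dict lookup; bot filtering
# checks a datacenter CIDR blocklist and headless-browser UA signatures.
import ipaddress

HEADLESS_SIGNATURES = ("HeadlessChrome", "PhantomJS", "Selenium")

def enrich(event: dict, campaigns: dict, bot_networks: list) -> dict:
    out = dict(event)
    out["campaign"] = campaigns.get(event.get("campaign_id"), {})
    ip = ipaddress.ip_address(event["client_ip"])
    in_datacenter = any(ip in net for net in bot_networks)
    headless = any(sig in event.get("user_agent", "") for sig in HEADLESS_SIGNATURES)
    out["is_bot"] = in_datacenter or headless
    return out
```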

Step 5 — Funnel Analysis

Maintain per-session state: ordered list of page events within a configurable window. When a session ends (timeout or explicit close), emit a session summary containing the funnel steps traversed and whether a conversion event was observed.
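The per-session state described above can be sketched as a small class; in a real stream processor this state lives in a managed keyed store and the close is driven by an inactivity timer. Class and method names are illustrative.

```python
# Minimal per-session funnel state machine (sketch). Events accumulate in
# arrival order; closing the session emits a summary with the ordered steps
# and whether the conversion step was observed.
class SessionTracker:
    def __init__(self, session_id: str, conversion_step: str = "purchase"):
        self.session_id = session_id
        self.conversion_step = conversion_step
        self.steps = []  # list of (event_time, step_name)

    def observe(self, event_time: int, step: str) -> None:
        self.steps.append((event_time, step))

    def close(self) -> dict:
        ordered = [step for _, step in sorted(self.steps)]  # order by time
        return {
            "session_id": self.session_id,
            "funnel_steps": ordered,
            "converted": self.conversion_step in ordered,
        }
```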

API Design

  • POST /events/click — ingest endpoint; always returns 200 (errors logged server-side).
  • GET /1x1.gif?eid=&cid=&… — pixel endpoint for environments blocking JS; responds with a transparent GIF after recording.
  • GET /r/{token} — redirect tracking; records click then issues 302 to destination URL.
  • GET /stats/clicks?campaign_id=X&from=&to= — aggregated counts from the read store.
  • GET /stats/funnel?funnel_id=X&from=&to= — conversion rates by funnel step.
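The redirect endpoint's control flow can be sketched framework-free. The token table and record sink are stand-ins (a real service resolves tokens from a persistent store and records through the ingest pipeline described above); the function name and return convention are assumptions.

```python
# Redirect-tracking sketch: resolve the token, record the click, return 302.
def handle_redirect(token: str, token_table: dict, sink: list):
    dest = token_table.get(token)
    if dest is None:
        return 404, None
    sink.append({"token": token, "destination": dest})  # record before redirecting
    return 302, dest  # browser follows the Location header to dest
```

Recording before redirecting keeps the click count accurate even if the destination is slow or unreachable, at the cost of a few milliseconds of added redirect latency.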

Scalability Considerations

Deploy the ingest tier as stateless pods behind a load balancer; autoscale on request rate. Redis deduplication shards across a cluster by event_id hash slot. Kafka partitions absorb burst writes without back-pressure on the ingest API. The stream processor scales by adding consumer group members up to the partition count. For the analytical store, pre-aggregate hourly rollup tables so dashboard queries avoid full scans. Use a CDN edge worker for pixel and redirect tracking to reduce origin load for geographically distributed traffic.
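The hourly rollup idea can be sketched as a simple aggregation: collapse raw events into (hour bucket, campaign) counts so dashboard queries read small pre-aggregated rows instead of scanning raw events. In practice this runs as a scheduled query or materialized view in the columnar store; the Python below only illustrates the bucketing.

```python
# Hourly pre-aggregation sketch: count events per (hour bucket, campaign_id).
from collections import Counter

def hourly_rollup(events: list) -> Counter:
    rollup = Counter()
    for e in events:
        # Truncate epoch-milliseconds timestamp down to the hour boundary.
        hour_bucket = e["event_time"] // 3_600_000 * 3_600_000
        rollup[(hour_bucket, e["campaign_id"])] += 1
    return rollup
```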

Summary

A robust click tracking service combines a low-latency ingest path with Redis-based deduplication, Kafka-backed stream processing for enrichment and attribution, and a columnar store for analytical queries. Separating ingest from processing ensures that burst traffic never degrades query performance, and client-side event IDs make deduplication safe across retries.

{
"@context": "https://schema.org",
"@type": "FAQPage",
"mainEntity": [
{
"@type": "Question",
"name": "How does Redis SET NX enable click deduplication?",
"acceptedAnswer": {
"@type": "Answer",
"text": "For each click event, construct a dedup key from the client-generated event_id (key format dedup:click:{event_id}). Issue SET dedup:click:{event_id} 1 NX PX 300000. If the command returns nil the click is a duplicate within the window and is dropped. If it returns OK the click is novel and forwarded to the ingest pipeline. The short TTL bounds memory usage while covering realistic double-click and retry windows."
}
},
{
"@type": "Question",
"name": "How should Kafka be used for partitioned click ingest?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Partition the click topic by session_id so all clicks for a given session land on the same partition, preserving order for session stitching. Producers send click events with session_id as the partition key. Consumers in a consumer group each own a subset of partitions and write to a columnar store (e.g., ClickHouse or BigQuery) in micro-batches. Set the partition count high enough to support future consumer parallelism without repartitioning."
}
},
{
"@type": "Question",
"name": "How do you filter bot traffic in a click tracking system?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Apply a layered approach: server-side, reject requests matching known bot User-Agent strings and datacenter IP ranges via a blocklist. Behaviorally, flag sessions with click rates exceeding a human-plausible threshold (e.g., >10 clicks/second) or with no mouse-move events preceding a click. Use a CAPTCHA challenge for borderline sessions. Store a bot_probability score per session for downstream filtering rather than hard-dropping at ingest."
}
},
{
"@type": "Question",
"name": "How does session attribution and funnel analysis work in a click tracking system?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Assign a session_id on first page load, persisted in a first-party cookie with a 30-minute inactivity timeout. Attribute each click to the session's acquisition source (UTM parameters captured at session start). For funnel analysis, join click events on session_id ordered by timestamp to identify step completion and drop-off. Use window functions in the analytics store to compute conversion rates between funnel steps per cohort."
}
}
]
}

