Click Tracking Service Low-Level Design: Event Collection, Deduplication, and Attribution

What Is a Click Tracking Service?

A click tracking service records user interactions — ad clicks, link taps, button presses — at high throughput, then attributes those events to sessions, campaigns, or conversion funnels. The design must handle write bursts without data loss, suppress duplicate events caused by browser retries or double-taps, and deliver clean attributed data to analytics consumers with low latency.

Requirements

Functional Requirements

  • Accept click events from web and mobile clients via a lightweight HTTP endpoint.
  • Deduplicate events with the same client-generated event_id within a configurable window (default 5 minutes).
  • Attribute each click to a session, user, campaign, and page context.
  • Expose aggregated click counts and funnel conversion rates via a query API.
  • Support redirect tracking: resolve a tracked short URL, record the click, then redirect the browser.

Non-Functional Requirements

  • Ingest endpoint p99 latency under 50 ms; redirect latency under 100 ms.
  • Handle 100k events per second at peak with horizontal scaling.
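  • At-least-once event delivery to the storage layer; because pipeline stages may replay events after a failure, storage writes should be idempotent on event_id.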

Data Model

Raw Event

  • event_id — client-generated UUID; used as the deduplication key.
  • event_time — client timestamp (milliseconds); server records received_at separately for skew detection.
  • user_id, session_id, anonymous_id — identity resolution chain.
  • page_url, referrer, campaign_id, element_id — context fields.
  • client_ip, user_agent — for geo enrichment and bot filtering.
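A minimal sketch of the raw event as a Python dataclass, assuming epoch-millisecond integers for both timestamps. The skew helpers show how received_at supports skew detection; the 60-second tolerance in `is_skewed` is a hypothetical default, not a value from the design.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class RawEvent:
    """Raw click event mirroring the fields listed above (illustrative sketch)."""

    event_id: str       # client-generated UUID; deduplication key
    event_time: int     # client timestamp, epoch milliseconds
    page_url: str
    # Server receive time in epoch milliseconds, stamped when the object is built.
    received_at: int = field(default_factory=lambda: int(time.time() * 1000))
    user_id: Optional[str] = None
    session_id: Optional[str] = None
    anonymous_id: Optional[str] = None
    referrer: Optional[str] = None
    campaign_id: Optional[str] = None
    element_id: Optional[str] = None
    client_ip: Optional[str] = None
    user_agent: Optional[str] = None

    def skew_ms(self) -> int:
        """received_at minus event_time: network delay plus any client clock offset."""
        return self.received_at - self.event_time

    def is_skewed(self, tolerance_ms: int = 60_000) -> bool:
        """Flag events whose clocks disagree beyond a tolerance (60 s is an assumed default)."""
        return abs(self.skew_ms()) > tolerance_ms
```

Because network delay is folded into `skew_ms`, the tolerance should sit well above typical request latency so ordinary slow requests are not flagged.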

Deduplication Store

A Redis key per event_id with a 5-minute TTL. Key format: dedup:click:{event_id}. SET NX atomically rejects duplicates without a read-before-write.
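To make the window semantics concrete, here is a single-process stand-in for `SET key 1 NX PX ttl`, with an injectable clock so expiry can be tested. It is illustrative only: Redis performs the check-and-set atomically on the server, whereas this class is safe only within one thread. For sizing, the 100k events/s peak and a 300 s TTL imply up to 100,000 × 300 = 30,000,000 live keys if every event is unique.

```python
import time
from typing import Callable, Dict


class InMemoryDedupStore:
    """Mimics Redis SET dedup:click:{event_id} 1 NX PX ttl for a single thread."""

    def __init__(self, ttl_ms: int = 300_000, clock: Callable[[], float] = time.monotonic):
        self.ttl_ms = ttl_ms
        self._clock = clock                  # returns seconds; injectable for tests
        self._expiry: Dict[str, float] = {}  # key -> absolute expiry time in seconds

    @staticmethod
    def key(event_id: str) -> str:
        return f"dedup:click:{event_id}"

    def set_nx_px(self, event_id: str) -> bool:
        """True if this call created the key (first sighting); False if a live key exists."""
        k = self.key(event_id)
        now = self._clock()
        expires_at = self._expiry.get(k)
        if expires_at is not None and expires_at > now:
            return False
        # Expired entries are simply overwritten; Redis would have evicted them itself.
        self._expiry[k] = now + self.ttl_ms / 1000.0
        return True
```

A duplicate arriving inside the window is rejected, and the same event_id is accepted again once the TTL has elapsed.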

Attribution Record

  • session_id, user_id, campaign_id, converted (bool), conversion_value.
  • Stored in a columnar store (BigQuery, Redshift, ClickHouse) for analytical queries.

Core Algorithm: Event Ingestion Pipeline

Step 1 — Receive and Validate

The ingest endpoint accepts a JSON body or query-string parameters (for pixel tracking). Validate required fields (event_id, event_time, page_url). Reject malformed requests with 400. Respond 200 immediately after enqueueing to minimize client-perceived latency.
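A sketch of the validation step, assuming the handler receives the raw body bytes and the raw query string. The mapping of the pixel parameters `eid` and `cid` to `event_id` and `campaign_id` is a hypothetical assumption; the design only shows those short names in the pixel URL.

```python
import json
from typing import Dict, Optional, Tuple
from urllib.parse import parse_qs

REQUIRED_FIELDS = ("event_id", "event_time", "page_url")
# Hypothetical expansion of short pixel parameter names.
PIXEL_ALIASES = {"eid": "event_id", "cid": "campaign_id"}


def parse_event(body: Optional[bytes], query: str = "") -> Tuple[int, Optional[Dict]]:
    """Return (200, event) for a valid request or (400, None) for a malformed one."""
    if body:
        try:
            event = json.loads(body)
        except ValueError:  # covers JSONDecodeError and invalid UTF-8
            return 400, None
        if not isinstance(event, dict):
            return 400, None
    else:
        # Pixel tracking: parse_qs returns a list per parameter; keep the first value.
        event = {PIXEL_ALIASES.get(k, k): v[0] for k, v in parse_qs(query).items()}

    if any(event.get(f) in (None, "") for f in REQUIRED_FIELDS):
        return 400, None
    try:
        event["event_time"] = int(event["event_time"])  # normalize to epoch ms integer
    except (TypeError, ValueError):
        return 400, None
    return 200, event
```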

Step 2 — Deduplication

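Before enqueueing, run SET dedup:click:{event_id} 1 NX PX 300000. If the key already exists, drop the event and return 200 (silent dedup; do not signal an error to the client). This prevents retry storms from inflating counts. If the Kafka publish in Step 3 then fails, delete the key before responding; otherwise the client's retry would be discarded as a duplicate and the event lost, violating the at-least-once requirement.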
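A sketch of dedup-then-enqueue with key release on publish failure. It assumes `client` exposes redis-py's `set(name, value, nx=..., px=...)` (truthy only when the key was created) and `delete(name)`; `publish` is a hypothetical callable that raises when the Kafka write fails. Returning 503 on that path is an assumption chosen so the client knows to retry.

```python
from typing import Any, Callable, Dict

DEDUP_TTL_MS = 300_000  # 5-minute window from the requirements


def ingest(client: Any, publish: Callable[[Dict], None], event: Dict) -> int:
    """Deduplicate, then enqueue; returns the HTTP status for the ingest endpoint."""
    key = f"dedup:click:{event['event_id']}"
    # SET key 1 NX PX 300000: a falsy result means a live key already exists.
    if not client.set(key, 1, nx=True, px=DEDUP_TTL_MS):
        return 200  # silent duplicate drop
    try:
        publish(event)
    except Exception:
        # Release the key so the client's retry is not mistaken for a duplicate.
        client.delete(key)
        return 503  # assumed status signalling "retry later"
    return 200
```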

Step 3 — Enqueue to Stream

Publish the validated event to a Kafka topic partitioned by session_id. Partitioning by session ensures ordered processing for session reconstruction downstream without global ordering overhead.
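The ordering guarantee comes from a stable key-to-partition mapping: every event for a session lands in the same partition log, and Kafka preserves order within a partition. The sketch below uses CRC32 purely for illustration; Kafka's Java producer hashes keys with murmur2, so these partition numbers will not match a real cluster.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List


def partition_for(session_id: str, num_partitions: int) -> int:
    """Stable session_id -> partition mapping (CRC32 here, not Kafka's murmur2)."""
    return zlib.crc32(session_id.encode("utf-8")) % num_partitions


def route(events: Iterable[Dict], num_partitions: int) -> Dict[int, List[Dict]]:
    """Append events, in publish order, to per-partition logs."""
    logs: Dict[int, List[Dict]] = defaultdict(list)
    for event in events:
        logs[partition_for(event["session_id"], num_partitions)].append(event)
    return logs
```

Because the mapping is hash modulo partition count, adding partitions later remaps many sessions, so per-session ordering only holds while the count stays fixed.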

Step 4 — Stream Processing and Attribution

A stream processor (Flink, Kafka Streams) consumes events, joins against a campaign lookup table (broadcast join), enriches with geo data, applies bot filtering rules (known data-center IP ranges, headless browser signatures), and emits enriched records to a columnar sink and a real-time aggregation store.
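A per-record sketch of the filtering and join logic, independent of any stream framework. The data-center ranges below are RFC 5737 documentation prefixes used as placeholders, and the headless markers are a hypothetical short list; geo enrichment is omitted. The `campaigns` dict stands in for the broadcast table replicated to every worker.

```python
import ipaddress
from typing import Dict, Optional

# Placeholder "data-center" ranges (RFC 5737 documentation prefixes).
DATACENTER_NETS = [ipaddress.ip_network("203.0.113.0/24"), ipaddress.ip_network("198.51.100.0/24")]
# Hypothetical user-agent substrings that indicate headless browsers.
HEADLESS_MARKERS = ("HeadlessChrome", "PhantomJS")


def is_bot(client_ip: str, user_agent: str) -> bool:
    try:
        ip = ipaddress.ip_address(client_ip)
    except ValueError:
        return True  # assumption: unparseable IPs are treated as suspicious
    # An IPv6 address is never "in" an IPv4 network, so mixed versions are safe.
    if any(ip in net for net in DATACENTER_NETS):
        return True
    return any(marker in (user_agent or "") for marker in HEADLESS_MARKERS)


def enrich(event: Dict, campaigns: Dict[str, Dict]) -> Optional[Dict]:
    """Drop bot traffic, then join the event against the broadcast campaign table."""
    if is_bot(event.get("client_ip", ""), event.get("user_agent", "")):
        return None
    enriched = dict(event)
    enriched["campaign"] = campaigns.get(event.get("campaign_id"))  # None if unknown
    return enriched
```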

Step 5 — Funnel Analysis

Maintain per-session state: ordered list of page events within a configurable window. When a session ends (timeout or explicit close), emit a session summary containing the funnel steps traversed and whether a conversion event was observed.
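A minimal sketch of the per-session state machine, assuming events carry `session_id`, `event_time`, and `page_url`, plus a hypothetical `type` field equal to `"conversion"` on conversion events. The 30-minute idle timeout is an assumed default; in Flink the same idea is usually expressed as a session window with an inactivity gap.

```python
from typing import Dict, List


class SessionFunnelTracker:
    """Buffers events per session; emits a summary on idle timeout or explicit close."""

    def __init__(self, idle_timeout_ms: int = 30 * 60 * 1000):
        self.idle_timeout_ms = idle_timeout_ms  # assumed default
        self._sessions: Dict[str, List[Dict]] = {}
        self._last_seen: Dict[str, int] = {}

    def on_event(self, event: Dict) -> None:
        sid = event["session_id"]
        self._sessions.setdefault(sid, []).append(event)
        self._last_seen[sid] = max(self._last_seen.get(sid, 0), event["event_time"])

    def close(self, session_id: str) -> Dict:
        """Explicit close: remove state and return the session summary."""
        events = sorted(self._sessions.pop(session_id), key=lambda e: e["event_time"])
        self._last_seen.pop(session_id)
        return {
            "session_id": session_id,
            "steps": [e["page_url"] for e in events],  # funnel steps in time order
            "converted": any(e.get("type") == "conversion" for e in events),
        }

    def expire(self, now_ms: int) -> List[Dict]:
        """Close every session idle for at least the timeout."""
        idle = [sid for sid, t in self._last_seen.items() if now_ms - t >= self.idle_timeout_ms]
        return [self.close(sid) for sid in idle]
```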

API Design

  • POST /events/click — ingest endpoint; returns 200 for accepted and silently deduplicated events and 400 for malformed requests; other errors are logged server-side.
  • GET /1x1.gif?eid=&cid=&… — pixel endpoint for environments where JavaScript is blocked or unavailable; responds with a transparent GIF after recording the click.
  • GET /r/{token} — redirect tracking; records click then issues 302 to destination URL.
  • GET /stats/clicks?campaign_id=X&from=&to= — aggregated counts from the read store.
  • GET /stats/funnel?funnel_id=X&from=&to= — conversion rates by funnel step.
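Framework-free sketches of the pixel and redirect handlers, each returning `(status, headers, body)`. The `record` and `resolve` callables are hypothetical hooks into the ingest path and the short-URL lookup. `Cache-Control: no-store` is an assumed choice to keep browsers and intermediaries from caching the response and skipping the tracking request on repeat visits.

```python
from typing import Callable, Dict, Optional, Tuple

Response = Tuple[int, Dict[str, str], bytes]

# 43-byte 1x1 GIF89a whose single pixel uses palette index 0, marked transparent.
TRANSPARENT_GIF = (
    b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\xff\xff\xff"
    b"!\xf9\x04\x01\x00\x00\x00\x00"
    b",\x00\x00\x00\x00\x01\x00\x01\x00\x00"
    b"\x02\x02D\x01\x00;"
)


def handle_pixel(params: Dict[str, str], record: Callable[[Dict], None]) -> Response:
    record(params)  # hypothetical hook: validation, dedup, enqueue
    return 200, {"Content-Type": "image/gif", "Cache-Control": "no-store"}, TRANSPARENT_GIF


def handle_redirect(
    token: str,
    resolve: Callable[[str], Optional[str]],
    record: Callable[[Dict], None],
) -> Response:
    destination = resolve(token)  # hypothetical token -> destination URL lookup
    if destination is None:
        return 404, {}, b""
    record({"token": token, "destination": destination})
    return 302, {"Location": destination, "Cache-Control": "no-store"}, b""
```

To stay inside the 100 ms redirect budget, a production handler would likely hand the click to a local buffer and record it asynchronously rather than blocking on the write.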

Scalability Considerations

  • Ingest tier: deploy stateless pods behind a load balancer and autoscale on request rate.
  • Deduplication: shard Redis across a cluster; each dedup key maps to a hash slot derived from its event_id, so keys spread across nodes.
  • Buffering: Kafka partitions absorb write bursts, so a lagging stream processor does not back-pressure the ingest API.
  • Stream processing: scale by adding consumer group members up to the partition count; members beyond that count sit idle.
  • Analytical store: pre-aggregate hourly rollup tables so dashboard queries avoid full scans.
  • Edge: run pixel and redirect tracking in a CDN edge worker to reduce origin load for geographically distributed traffic.
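For reference, Redis Cluster assigns a key to one of 16384 slots using CRC16 (the XMODEM variant) of the key, or of the substring inside the first non-empty `{...}` hash tag if one is present. The sketch below reimplements that rule to show how dedup keys distribute; it is not a client library, and real clients compute this internally.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM: poly 0x1021, init 0, no reflection, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: str) -> int:
    """Slot = CRC16(key) mod 16384, hashing only a non-empty {...} tag when present."""
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end > start + 1:  # non-empty tag between the first '{' and the next '}'
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384
```

Since a real key such as `dedup:click:6f1c...` contains no braces, the whole key is hashed and UUID-based keys scatter across slots; wrapping part of a key in braces would instead pin related keys to one slot.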

Summary

A robust click tracking service combines a low-latency ingest path with Redis-based deduplication, Kafka-backed stream processing for enrichment and attribution, and a columnar store for analytical queries. Separating ingest from processing ensures that burst traffic never degrades query performance, and client-side event IDs make deduplication safe across retries.
