Low Level Design: Link Analytics Service

What Is a Link Analytics Service?

A Link Analytics Service tracks every interaction with a shortened URL or QR code and surfaces aggregated statistics: click counts, geographic distribution, referrer breakdown, device types, and time-series trends. It is the reporting backbone for URL shorteners and QR platforms. The core challenge is ingesting high-velocity click events without impacting redirect latency, then making aggregated data queryable in near-real-time.

Data Model

Analytics data lives in two tiers: a raw event store for auditability and a pre-aggregated summary store for fast dashboard queries.

-- Raw click events (append-only, partitioned by date)
CREATE TABLE click_event (
    id            BIGINT UNSIGNED AUTO_INCREMENT,
    link_id       BIGINT UNSIGNED NOT NULL,
    clicked_at    DATETIME(3)  NOT NULL,
    country_code  CHAR(2),
    region        VARCHAR(64),
    city          VARCHAR(128),
    referrer      VARCHAR(2048),
    user_agent    VARCHAR(512),
    device_type   ENUM('desktop','mobile','tablet','bot'),
    browser       VARCHAR(64),
    os            VARCHAR(64),
    ip_hash       CHAR(64),     -- salted SHA-256 of IP for unique-visitor estimation
    PRIMARY KEY (id, clicked_at)
) PARTITION BY RANGE (YEAR(clicked_at) * 100 + MONTH(clicked_at)) (
    PARTITION p202501 VALUES LESS THAN (202502),
    PARTITION p202502 VALUES LESS THAN (202503),
    PARTITION pmax    VALUES LESS THAN MAXVALUE   -- split as new months arrive
);

-- Pre-aggregated hourly rollups
CREATE TABLE link_stats_hourly (
    link_id       BIGINT UNSIGNED NOT NULL,
    hour_bucket   DATETIME     NOT NULL,   -- truncated to the hour
    total_clicks  BIGINT       DEFAULT 0,
    unique_ips    BIGINT       DEFAULT 0,
    PRIMARY KEY (link_id, hour_bucket)
);

-- Dimension breakdown table
CREATE TABLE link_stats_dimension (
    link_id       BIGINT UNSIGNED NOT NULL,
    day_bucket    DATE         NOT NULL,
    dimension     VARCHAR(32)  NOT NULL,   -- e.g., 'country', 'device', 'referrer'
    value         VARCHAR(128) NOT NULL,
    click_count   BIGINT       DEFAULT 0,
    PRIMARY KEY (link_id, day_bucket, dimension, value)
);
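The dimension table's composite primary key serves "top N" dashboard queries directly. A minimal sketch using SQLite in place of MySQL (same table shape as above; the sample rows and link ID are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE link_stats_dimension (
        link_id     INTEGER NOT NULL,
        day_bucket  TEXT    NOT NULL,
        dimension   TEXT    NOT NULL,
        value       TEXT    NOT NULL,
        click_count INTEGER DEFAULT 0,
        PRIMARY KEY (link_id, day_bucket, dimension, value)
    )
""")
rows = [
    (42, "2025-01-15", "country", "US", 900),
    (42, "2025-01-15", "country", "DE", 300),
    (42, "2025-01-15", "country", "IN", 550),
    (42, "2025-01-15", "device",  "mobile", 1200),
]
conn.executemany("INSERT INTO link_stats_dimension VALUES (?,?,?,?,?)", rows)

# Top countries for one link on one day -- the kind of query a dashboard
# issues against the pre-aggregated table, never against raw events.
top_countries = conn.execute("""
    SELECT value, click_count
    FROM link_stats_dimension
    WHERE link_id = 42 AND day_bucket = '2025-01-15' AND dimension = 'country'
    ORDER BY click_count DESC
    LIMIT 3
""").fetchall()
# top_countries == [('US', 900), ('IN', 550), ('DE', 300)]
```

The prefix of the primary key (link_id, day_bucket, dimension) covers this query, so no secondary index is needed.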

Click Tracking Workflow

The redirect hot path must stay fast. Click tracking is therefore fully asynchronous.

  1. Redirect service issues the 302 and immediately emits a lightweight event to a Kafka topic click-events. The emit is fire-and-forget with a local buffer to survive brief Kafka unavailability.
  2. Stream processor (Flink or Kafka Streams) consumes the topic, enriches each event with GeoIP lookup and User-Agent parsing, and writes enriched records to the click_event table.
  3. Aggregation job runs on a micro-batch schedule (every 60 seconds) to upsert counts into link_stats_hourly and link_stats_dimension.
  4. Dashboard API reads only from the aggregated tables; raw events are queried only for exports or deep-dive forensics.

// Pseudo-code: stream enrichment step
on_event(raw):
    geo   = geoip_lookup(raw.ip)
    ua    = parse_user_agent(raw.user_agent)
    event = merge(raw, geo, ua)
    event.ip_hash = sha256(raw.ip + daily_salt)
    write_to_click_event(event)
    emit_to_aggregation_queue(event)
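The micro-batch upsert from step 3 can be sketched with SQLite's ON CONFLICT clause (MySQL would use INSERT ... ON DUPLICATE KEY UPDATE); the table follows link_stats_hourly above, with unique_ips omitted for brevity:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE link_stats_hourly (
        link_id      INTEGER NOT NULL,
        hour_bucket  TEXT    NOT NULL,
        total_clicks INTEGER DEFAULT 0,
        PRIMARY KEY (link_id, hour_bucket)
    )
""")

def upsert_hourly_counts(conn, batch):
    """batch: (link_id, hour_bucket, clicks_in_batch) triples produced
    by one 60-second micro-batch of the aggregation job."""
    conn.executemany("""
        INSERT INTO link_stats_hourly (link_id, hour_bucket, total_clicks)
        VALUES (?, ?, ?)
        ON CONFLICT (link_id, hour_bucket)
        DO UPDATE SET total_clicks = total_clicks + excluded.total_clicks
    """, batch)

# Two consecutive micro-batches touching the same hour accumulate.
upsert_hourly_counts(conn, [(42, "2025-01-15 10:00", 120)])
upsert_hourly_counts(conn, [(42, "2025-01-15 10:00", 80),
                            (42, "2025-01-15 11:00", 5)])
total = conn.execute(
    "SELECT total_clicks FROM link_stats_hourly "
    "WHERE link_id = 42 AND hour_bucket = '2025-01-15 10:00'"
).fetchone()[0]  # 200
```

Because the upsert is additive, the job stays correct even if two micro-batches for the same hour run back to back.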

Unique Visitor Estimation

Exact unique-IP counting at scale is expensive. Use a HyperLogLog structure (native in Redis) per link per day. On each click, call PFADD link:{id}:uv:{date} {ip_hash}. Query with PFCOUNT for an estimate with <1% error and O(1) memory regardless of cardinality. Persist the HLL to the summary table during the nightly rollup.
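Redis's PFADD/PFCOUNT implement this natively; the estimation principle can be illustrated with a small pure-Python HyperLogLog (register count, hashing, and correction constants here are textbook choices for illustration, not Redis's internals):

```python
import hashlib
import math

P = 12                 # 2^12 = 4096 registers -> ~1.6% standard error
M = 1 << P
registers = [0] * M

def hll_add(item: str) -> None:
    """Fold one item (e.g., an ip_hash) into the sketch."""
    h = int.from_bytes(hashlib.sha256(item.encode()).digest()[:8], "big")
    idx = h >> (64 - P)                  # top P bits pick a register
    rest = h & ((1 << (64 - P)) - 1)     # remaining 52 bits
    # rank = 1 + number of leading zero bits in the remaining 52 bits
    rank = (64 - P) - rest.bit_length() + 1
    registers[idx] = max(registers[idx], rank)

def hll_count() -> float:
    """Harmonic-mean estimate with the standard small-range correction."""
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * M and zeros:
        return M * math.log(M / zeros)   # linear counting for small n
    return raw

for i in range(10_000):
    hll_add(f"ip-hash-{i}")
estimate = hll_count()   # close to 10,000, using only 4096 registers
```

The memory cost is fixed by the register count (Redis uses about 12 KB per key), which is why one sketch per link per day is affordable even for viral links.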

Failure Handling

  • Kafka producer buffer full: Drop the event and increment a clicks_dropped counter. Accept minor under-counting rather than slow the redirect. Alert when the drop rate exceeds 0.1%.
  • GeoIP service down: Enrich with empty geo fields and flag the record. A backfill job re-enriches flagged records once the service recovers.
  • Aggregation job lag: Dashboard queries fall back to the last successful rollup and display a staleness indicator. Use a Kafka consumer group lag metric to trigger auto-scaling of the processor.
  • Unbounded partition growth: Archive partitions older than 90 days to cold object storage (Parquet on S3). Query via Athena or BigQuery for historical reports without keeping old partitions hot in MySQL.
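The drop-rather-than-block policy from the first bullet can be sketched with a bounded in-process queue standing in for the Kafka producer's buffer (class name, buffer size, and metric names are illustrative):

```python
import queue

class ClickEmitter:
    """Fire-and-forget emit that never blocks the redirect hot path.
    A background sender (not shown) would drain the buffer to Kafka."""

    def __init__(self, buffer_size: int = 10_000):
        self.buffer = queue.Queue(maxsize=buffer_size)
        self.clicks_emitted = 0
        self.clicks_dropped = 0   # alert when dropped/emitted exceeds 0.1%

    def emit(self, event: dict) -> bool:
        try:
            self.buffer.put_nowait(event)   # non-blocking enqueue
            self.clicks_emitted += 1
            return True
        except queue.Full:
            self.clicks_dropped += 1        # accept under-counting; never slow the 302
            return False

emitter = ClickEmitter(buffer_size=2)
results = [emitter.emit({"link_id": 42, "n": i}) for i in range(3)]
# results == [True, True, False]; the third event is dropped and counted
```

The key property is that emit() has a bounded worst case: it either enqueues or increments a counter, so redirect latency is unaffected by downstream health.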

Scalability Considerations

  • Event volume: At 100K redirects/sec, Kafka partitions should be sized at ~10K events/sec each. Use at least 10 partitions per topic; increase as volume grows.
  • Write amplification: Batch inserts into click_event in micro-batches of 500-1000 rows to reduce per-row overhead.
  • Query isolation: Never run dashboard aggregation queries against the raw event table in production. All user-facing queries hit the pre-aggregated tables or a read replica.
  • Time-zone handling: Store all timestamps in UTC. Apply time-zone offsets at the API layer when returning bucketed data to clients.
  • Multi-tenant isolation: Shard link_stats_hourly and link_stats_dimension by link_id range or tenant ID to prevent noisy-neighbor effects on large accounts.
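The write-amplification bullet's micro-batching amounts to a chunking step in front of an executemany-style bulk insert. A minimal sketch (batch size per the 500-1000 row guidance above):

```python
def chunked(events, batch_size=500):
    """Yield successive micro-batches of events for bulk insertion
    into click_event, amortizing per-statement overhead."""
    batch = []
    for ev in events:
        batch.append(ev)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch   # trailing partial batch

batches = list(chunked(range(1250), batch_size=500))
# 3 batches: 500 + 500 + 250 rows
```

In the stream processor this would sit between the enrichment step and the database writer, flushing on either batch size or a time bound so a quiet link still gets written promptly.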

Summary

A Link Analytics Service succeeds by decoupling the critical redirect path from all analytics work. Kafka absorbs the event burst; a stream processor enriches and stores raw data; micro-batch jobs maintain fast-query aggregation tables. HyperLogLog handles unique-visitor estimation at negligible memory cost. Partition pruning and cold-tier archival keep the hot database lean as data accumulates over months and years.

Frequently Asked Questions

What are the main components of a link analytics system design?

A link analytics system needs an event ingestion layer (e.g., Kafka or Kinesis) to capture click events at high throughput, a stream processing layer (e.g., Flink or Spark Streaming) to enrich and aggregate events in near real-time, a time-series or columnar data store (e.g., ClickHouse, Druid, or BigQuery) for analytical queries, and a dashboard API that serves aggregated metrics to users.

How do you accurately count unique visitors without storing every user ID?

Use probabilistic data structures such as HyperLogLog to estimate unique visitor counts with very low memory overhead and configurable error rates (typically under 1%). For exact counts on smaller datasets, a bitmap (Roaring Bitmap) per time window per link is efficient. Storing a fingerprint of IP + User-Agent in a Bloom filter can also deduplicate clicks in real time before counting.

How do you design the storage layer to support both real-time and historical queries?

Use a lambda or kappa architecture. The speed layer writes pre-aggregated counters to a fast store like Redis for the most recent time windows. The batch layer writes raw or aggregated events to a columnar store like ClickHouse or BigQuery for historical range queries. The query API merges results from both layers, giving low-latency access to recent data and full historical depth.

How do you prevent bot traffic from polluting link analytics data?

Filter events server-side by matching the User-Agent string against known bot lists (the IAB/ABC list is a standard reference). Apply IP reputation scoring and rate limiting: if a single IP generates hundreds of clicks per minute it is almost certainly automated. For higher accuracy, use a JavaScript challenge or a CAPTCHA on redirect pages, and cross-reference with browser fingerprinting signals before counting an event as a genuine human click.
