Low Level Design: Link Analytics Service

What Is a Link Analytics Service?

A Link Analytics Service tracks every interaction with a shortened URL or QR code and surfaces aggregated statistics: click counts, geographic distribution, referrer breakdown, device types, and time-series trends. It is the reporting backbone for URL shorteners and QR platforms. The core challenge is ingesting high-velocity click events without impacting redirect latency, then making aggregated data queryable in near-real-time.

Data Model

Analytics data lives in two tiers: a raw event store for auditability and a pre-aggregated summary store for fast dashboard queries.

-- Raw click events (append-only, partitioned by date)
CREATE TABLE click_event (
    id            BIGINT UNSIGNED AUTO_INCREMENT,
    link_id       BIGINT UNSIGNED NOT NULL,
    clicked_at    DATETIME(3)  NOT NULL,
    country_code  CHAR(2),
    region        VARCHAR(64),
    city          VARCHAR(128),
    referrer      VARCHAR(2048),
    user_agent    VARCHAR(512),
    device_type   ENUM('desktop','mobile','tablet','bot'),
    browser       VARCHAR(64),
    os            VARCHAR(64),
    ip_hash       CHAR(64),     -- salted SHA-256 of IP for unique-visitor estimation
    PRIMARY KEY (id, clicked_at)
) PARTITION BY RANGE (YEAR(clicked_at) * 100 + MONTH(clicked_at)) (
    PARTITION p202501 VALUES LESS THAN (202502),
    PARTITION p202502 VALUES LESS THAN (202503),
    PARTITION pmax    VALUES LESS THAN MAXVALUE   -- split as new months arrive
);

-- Pre-aggregated hourly rollups
CREATE TABLE link_stats_hourly (
    link_id       BIGINT UNSIGNED NOT NULL,
    hour_bucket   DATETIME     NOT NULL,   -- truncated to the hour
    total_clicks  BIGINT       DEFAULT 0,
    unique_ips    BIGINT       DEFAULT 0,
    PRIMARY KEY (link_id, hour_bucket)
);

-- Dimension breakdown table
CREATE TABLE link_stats_dimension (
    link_id       BIGINT UNSIGNED NOT NULL,
    day_bucket    DATE         NOT NULL,
    dimension     VARCHAR(32)  NOT NULL,   -- e.g., 'country', 'device', 'referrer'
    value         VARCHAR(128) NOT NULL,
    click_count   BIGINT       DEFAULT 0,
    PRIMARY KEY (link_id, day_bucket, dimension, value)
);
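The dimension table's composite primary key serves "top N" dashboard queries directly. A minimal sketch using SQLite in place of MySQL (same table shape as above; the sample rows and link ID are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE link_stats_dimension (
        link_id     INTEGER NOT NULL,
        day_bucket  TEXT    NOT NULL,
        dimension   TEXT    NOT NULL,
        value       TEXT    NOT NULL,
        click_count INTEGER DEFAULT 0,
        PRIMARY KEY (link_id, day_bucket, dimension, value)
    )
""")
rows = [
    (42, "2025-01-15", "country", "US", 900),
    (42, "2025-01-15", "country", "DE", 300),
    (42, "2025-01-15", "country", "IN", 550),
    (42, "2025-01-15", "device",  "mobile", 1200),
]
conn.executemany("INSERT INTO link_stats_dimension VALUES (?,?,?,?,?)", rows)

# Top countries for one link on one day -- the kind of query a dashboard
# issues against the pre-aggregated table, never against raw events.
top_countries = conn.execute("""
    SELECT value, click_count
    FROM link_stats_dimension
    WHERE link_id = 42 AND day_bucket = '2025-01-15' AND dimension = 'country'
    ORDER BY click_count DESC
    LIMIT 3
""").fetchall()
# top_countries == [('US', 900), ('IN', 550), ('DE', 300)]
```

The prefix of the primary key (link_id, day_bucket, dimension) covers this query, so no secondary index is needed.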

Click Tracking Workflow

The redirect hot path must stay fast. Click tracking is therefore fully asynchronous.

  1. Redirect service issues the 302 and immediately emits a lightweight event to a Kafka topic click-events. The emit is fire-and-forget with a local buffer to survive brief Kafka unavailability.
  2. Stream processor (Flink or Kafka Streams) consumes the topic, enriches each event with GeoIP lookup and User-Agent parsing, and writes enriched records to the click_event table.
  3. Aggregation job runs on a micro-batch schedule (every 60 seconds) to upsert counts into link_stats_hourly and link_stats_dimension.
  4. Dashboard API reads only from the aggregated tables; raw events are queried only for exports or deep-dive forensics.

// Pseudo-code: stream enrichment step
on_event(raw):
    geo   = geoip_lookup(raw.ip)
    ua    = parse_user_agent(raw.user_agent)
    event = merge(raw, geo, ua)
    event.ip_hash = sha256(raw.ip + daily_salt)
    write_to_click_event(event)
    emit_to_aggregation_queue(event)
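The micro-batch upsert from step 3 can be sketched with SQLite's ON CONFLICT clause (MySQL would use INSERT ... ON DUPLICATE KEY UPDATE); the table follows link_stats_hourly above, with unique_ips omitted for brevity:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE link_stats_hourly (
        link_id      INTEGER NOT NULL,
        hour_bucket  TEXT    NOT NULL,
        total_clicks INTEGER DEFAULT 0,
        PRIMARY KEY (link_id, hour_bucket)
    )
""")

def upsert_hourly_counts(conn, batch):
    """batch: (link_id, hour_bucket, clicks_in_batch) triples produced
    by one 60-second micro-batch of the aggregation job."""
    conn.executemany("""
        INSERT INTO link_stats_hourly (link_id, hour_bucket, total_clicks)
        VALUES (?, ?, ?)
        ON CONFLICT (link_id, hour_bucket)
        DO UPDATE SET total_clicks = total_clicks + excluded.total_clicks
    """, batch)

# Two consecutive micro-batches touching the same hour accumulate.
upsert_hourly_counts(conn, [(42, "2025-01-15 10:00", 120)])
upsert_hourly_counts(conn, [(42, "2025-01-15 10:00", 80),
                            (42, "2025-01-15 11:00", 5)])
total = conn.execute(
    "SELECT total_clicks FROM link_stats_hourly "
    "WHERE link_id = 42 AND hour_bucket = '2025-01-15 10:00'"
).fetchone()[0]  # 200
```

Because the upsert is additive, the job stays correct even if two micro-batches for the same hour run back to back.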

Unique Visitor Estimation

Exact unique-IP counting at scale is expensive. Use a HyperLogLog structure (native in Redis) per link per day. On each click, call PFADD link:{id}:uv:{date} {ip_hash}. Query with PFCOUNT for an estimate with <1% error and O(1) memory regardless of cardinality. Persist the HLL to the summary table during the nightly rollup.
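Redis's PFADD/PFCOUNT implement this natively; the estimation principle can be illustrated with a small pure-Python HyperLogLog (register count, hashing, and correction constants here are textbook choices for illustration, not Redis's internals):

```python
import hashlib
import math

P = 12                 # 2^12 = 4096 registers -> ~1.6% standard error
M = 1 << P
registers = [0] * M

def hll_add(item: str) -> None:
    """Fold one item (e.g., an ip_hash) into the sketch."""
    h = int.from_bytes(hashlib.sha256(item.encode()).digest()[:8], "big")
    idx = h >> (64 - P)                  # top P bits pick a register
    rest = h & ((1 << (64 - P)) - 1)     # remaining 52 bits
    # rank = 1 + number of leading zero bits in the remaining 52 bits
    rank = (64 - P) - rest.bit_length() + 1
    registers[idx] = max(registers[idx], rank)

def hll_count() -> float:
    """Harmonic-mean estimate with the standard small-range correction."""
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * M and zeros:
        return M * math.log(M / zeros)   # linear counting for small n
    return raw

for i in range(10_000):
    hll_add(f"ip-hash-{i}")
estimate = hll_count()   # close to 10,000, using only 4096 registers
```

The memory cost is fixed by the register count (Redis uses about 12 KB per key), which is why one sketch per link per day is affordable even for viral links.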

Failure Handling

  • Kafka producer buffer full: Drop the event and increment a clicks_dropped counter. Accept minor under-counting rather than slow the redirect. Alert when the drop rate exceeds 0.1%.
  • GeoIP service down: Enrich with empty geo fields and flag the record. A backfill job re-enriches flagged records once the service recovers.
  • Aggregation job lag: Dashboard queries fall back to the last successful rollup and display a staleness indicator. Use a Kafka consumer group lag metric to trigger auto-scaling of the processor.
  • Unbounded partition growth: Archive partitions older than 90 days to cold object storage (Parquet on S3). Query via Athena or BigQuery for historical reports without keeping old partitions hot in MySQL.
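The drop-rather-than-block policy from the first bullet can be sketched with a bounded in-process queue standing in for the Kafka producer's buffer (class name, buffer size, and metric names are illustrative):

```python
import queue

class ClickEmitter:
    """Fire-and-forget emit that never blocks the redirect hot path.
    A background sender (not shown) would drain the buffer to Kafka."""

    def __init__(self, buffer_size: int = 10_000):
        self.buffer = queue.Queue(maxsize=buffer_size)
        self.clicks_emitted = 0
        self.clicks_dropped = 0   # alert when dropped/emitted exceeds 0.1%

    def emit(self, event: dict) -> bool:
        try:
            self.buffer.put_nowait(event)   # non-blocking enqueue
            self.clicks_emitted += 1
            return True
        except queue.Full:
            self.clicks_dropped += 1        # accept under-counting; never slow the 302
            return False

emitter = ClickEmitter(buffer_size=2)
results = [emitter.emit({"link_id": 42, "n": i}) for i in range(3)]
# results == [True, True, False]; the third event is dropped and counted
```

The key property is that emit() has a bounded worst case: it either enqueues or increments a counter, so redirect latency is unaffected by downstream health.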

Scalability Considerations

  • Event volume: At 100K redirects/sec, Kafka partitions should be sized at ~10K events/sec each. Use at least 10 partitions per topic; increase as volume grows.
  • Write amplification: Batch inserts into click_event in micro-batches of 500-1000 rows to reduce per-row overhead.
  • Query isolation: Never run dashboard aggregation queries against the raw event table in production. All user-facing queries hit the pre-aggregated tables or a read replica.
  • Time-zone handling: Store all timestamps in UTC. Apply time-zone offsets at the API layer when returning bucketed data to clients.
  • Multi-tenant isolation: Shard link_stats_hourly and link_stats_dimension by link_id range or tenant ID to prevent noisy-neighbor effects on large accounts.
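The write-amplification bullet's micro-batching amounts to a chunking step in front of an executemany-style bulk insert. A minimal sketch (batch size per the 500-1000 row guidance above):

```python
def chunked(events, batch_size=500):
    """Yield successive micro-batches of events for bulk insertion
    into click_event, amortizing per-statement overhead."""
    batch = []
    for ev in events:
        batch.append(ev)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch   # trailing partial batch

batches = list(chunked(range(1250), batch_size=500))
# 3 batches: 500 + 500 + 250 rows
```

In the stream processor this would sit between the enrichment step and the database writer, flushing on either batch size or a time bound so a quiet link still gets written promptly.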

Summary

A Link Analytics Service succeeds by decoupling the critical redirect path from all analytics work. Kafka absorbs the event burst; a stream processor enriches and stores raw data; micro-batch jobs maintain fast-query aggregation tables. HyperLogLog handles unique-visitor estimation at negligible memory cost. Partition pruning and cold-tier archival keep the hot database lean as data accumulates over months and years.

Frequently Asked Questions

What are the main components of a link analytics system design?

A link analytics system needs an event ingestion layer (e.g., Kafka or Kinesis) to capture click events at high throughput, a stream processing layer (e.g., Flink or Spark Streaming) to enrich and aggregate events in near real-time, a time-series or columnar data store (e.g., ClickHouse, Druid, or BigQuery) for analytical queries, and a dashboard API that serves aggregated metrics to users.

How do you accurately count unique visitors without storing every user ID?

Use probabilistic data structures such as HyperLogLog to estimate unique visitor counts with very low memory overhead and configurable error rates (typically under 1%). For exact counts on smaller datasets, a bitmap (Roaring Bitmap) per time window per link is efficient. Storing a fingerprint of IP + User-Agent in a Bloom filter can also deduplicate clicks in real time before counting.

How do you design the storage layer to support both real-time and historical queries?

Use a lambda or kappa architecture. The speed layer writes pre-aggregated counters to a fast store like Redis for the most recent time windows. The batch layer writes raw or aggregated events to a columnar store like ClickHouse or BigQuery for historical range queries. The query API merges results from both layers, giving low-latency access to recent data and full historical depth.

How do you prevent bot traffic from polluting link analytics data?

Filter events server-side by matching the User-Agent string against known bot lists (the IAB/ABC list is a standard reference). Apply IP reputation scoring and rate limiting: if a single IP generates hundreds of clicks per minute it is almost certainly automated. For higher accuracy, use a JavaScript challenge or a CAPTCHA on redirect pages, and cross-reference with browser fingerprinting signals before counting an event as a genuine human click.
