Counting the number of distinct elements in a large stream — cardinality estimation — seems to require storing every unique element seen. HyperLogLog estimates the count with kilobytes of memory and sub-1% error for billions of elements. It powers Redis PFCOUNT, BigQuery's APPROX_COUNT_DISTINCT, and Spark's countApproxDistinct. This post covers the complete low-level design.
The Cardinality Estimation Problem
Exact distinct counting requires a hash set of every unique element seen, growing linearly with the true cardinality. For 1 billion unique visitors per day, that is tens of gigabytes. HyperLogLog uses a fixed-size register array (12 KB for b=14) regardless of stream size, with a standard error of ~0.81%.
The HLL Algorithm
Core insight: hash each element to a uniformly random bit string. A single hash has at least r leading zeros with probability 2^(-r), so among n distinct elements the maximum run of leading zeros is around log2(n). If the maximum observed run is r, the estimated cardinality is roughly 2^r. One register is too noisy; HLL averages across many buckets.
Algorithm:
- Choose precision b (typically 4–16). There are 2^b buckets.
- Hash each element to a 64-bit value.
- Use the first b bits to select a bucket index.
- Count the position of the leftmost 1-bit in the remaining bits (leading zeros + 1).
- Update the bucket register to the maximum of its current value and this count.
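To make the bucket/run-length split concrete, here is a minimal sketch of the hashing steps, assuming a 64-bit hash taken from the top 8 bytes of SHA-256 (an arbitrary choice; any well-mixed 64-bit hash works):

```python
import hashlib

b = 14  # precision: top b bits pick the bucket

def bucket_and_run(item: str):
    # 64-bit hash from the top 8 bytes of SHA-256
    h = int.from_bytes(hashlib.sha256(item.encode()).digest()[:8], "big")
    bucket = h >> (64 - b)                 # first b bits select the bucket
    remaining = h & ((1 << (64 - b)) - 1)  # low 64-b bits
    # position of the leftmost 1-bit = leading zeros + 1
    run = (64 - b) - remaining.bit_length() + 1
    return bucket, run

bucket, run = bucket_and_run("user-42")
```

The run length is what gets stored: the register keeps only the maximum run seen for its bucket.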
Estimator: the estimated cardinality is:
E = alpha_m * m^2 * (sum of 2^(-register[j]) for j in 0..m-1)^(-1)
where m = 2^b and alpha_m is a bias correction constant (~0.7213 for large m).
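Plugging toy numbers into the estimator shows the arithmetic; the register values below are invented for illustration, using b=4 so alpha_16 = 0.673:

```python
b = 4
m = 1 << b                  # 16 registers
alpha_m = 0.673             # bias correction constant for m = 16
# invented register values (max observed run length per bucket)
registers = [3, 2, 4, 3, 2, 3, 5, 2, 3, 4, 2, 3, 3, 2, 4, 3]
# harmonic-mean estimator: E = alpha_m * m^2 / sum(2^-register[j])
E = alpha_m * m * m / sum(2.0 ** (-r) for r in registers)
```

With these registers the indicator sum is 75/32, giving an estimate around 73.5.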
Precision, Memory, and Error
| Precision b | Registers m | Memory | Std Error |
|---|---|---|---|
| 10 | 1,024 | 768 B | 3.25% |
| 12 | 4,096 | 3 KB | 1.625% |
| 14 | 16,384 | 12 KB | 0.8125% |
| 16 | 65,536 | 48 KB | 0.4% |
Standard error = 1.04 / sqrt(m). For most applications b=14 gives a good balance: 12 KB and sub-1% error for cardinalities from zero to hundreds of billions.
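The table's columns follow directly from the two formulas (6 bits per register for the dense encoding):

```python
import math

def hll_stats(b: int):
    """Registers, standard error, and dense-format memory for precision b."""
    m = 1 << b
    std_err = 1.04 / math.sqrt(m)
    memory_bytes = m * 6 // 8   # 6 bits per register, packed
    return m, std_err, memory_bytes

m, err, mem = hll_stats(14)
```

For b=14 this yields 16,384 registers, 12,288 bytes (12 KB), and a 0.8125% standard error, matching the table.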
Small and Large Range Corrections
The base estimator breaks down at extremes:
- Small range: when many registers are zero, switch to linear counting — count the zero registers V and estimate m * ln(m / V). This is more accurate for sparse register arrays.
- Large range: when the estimate approaches 2^32 (32-bit hash saturation), apply a correction to account for hash collisions. With 64-bit hashes this correction is practically unnecessary.
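A quick sketch of the linear-counting fallback with invented numbers (m registers, V of them still zero):

```python
import math

m = 1024            # registers
V = 400             # registers that are still zero
# linear counting: more accurate than the raw estimator when the array is sparse
estimate = m * math.log(m / V)
```

With 400 of 1,024 registers untouched, linear counting estimates roughly 963 distinct elements.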
Merge for Set Union Cardinality
The HLL register array is mergeable: the union cardinality of two sets is estimated by taking the component-wise maximum of their register arrays, then running the estimator on the merged array.
merged[j] = max(hll_a[j], hll_b[j]) for j in 0..m-1
This is exact for the HLL estimator and enables distributed cardinality computation: each shard maintains its own HLL, the coordinator merges register arrays, and the merged estimate equals what you would get from a single HLL that saw all elements.
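The merge rule is a one-liner; here it is on invented 8-register arrays (real HLLs have m = 2^b registers, but the component-wise max is the same):

```python
# register arrays of two HLLs with the same precision (toy m = 8)
hll_a = [3, 0, 5, 2, 0, 4, 1, 2]
hll_b = [1, 2, 4, 6, 0, 3, 2, 2]
# union: component-wise maximum, then run the estimator on `merged`
merged = [max(x, y) for x, y in zip(hll_a, hll_b)]
```

Because each register already stores a maximum, taking the max of maxima is exactly what a single HLL observing both streams would have recorded.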
Redis Integration
Redis implements HyperLogLog natively:
- PFADD key elem1 elem2 ... — add elements
- PFCOUNT key [key2 ...] — estimate cardinality (merged estimate if multiple keys are given)
- PFMERGE dest src1 src2 ... — merge source HLLs into the destination key
Each Redis HLL is stored in 12 KB (dense format) or as a sparse encoding for low-cardinality HLLs (< ~300 elements). The dense format uses 6 bits per register (max run length = 63 for 64-bit hashes).
Use Cases
- Unique visitor counting: track daily active users per product feature with a per-day-per-feature HLL key. PFMERGE to get weekly or monthly uniques.
- Distinct query counting: count unique query patterns hitting a search engine or database — identifies cardinality of the query space.
- A/B experiment reach: estimate distinct users exposed to each variant without storing user IDs.
SQL Schema
-- HLL counter registry
CREATE TABLE HLLCounter (
id BIGSERIAL PRIMARY KEY,
name TEXT NOT NULL UNIQUE,
precision INTEGER NOT NULL DEFAULT 14, -- b value, registers = 2^precision
registers BYTEA, -- serialized register array
last_updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- Daily cardinality snapshots
CREATE TABLE HLLDailySnapshot (
id BIGSERIAL PRIMARY KEY,
counter_id BIGINT NOT NULL REFERENCES HLLCounter(id),
date DATE NOT NULL,
estimated_cardinality BIGINT NOT NULL,
serialized BYTEA NOT NULL,
UNIQUE(counter_id, date)
);
CREATE INDEX idx_hll_snap_counter ON HLLDailySnapshot(counter_id, date DESC);
Python Implementation
import math
import hashlib
import struct


class HyperLogLog:
    """
    HyperLogLog cardinality estimator.

    precision b: number of leading hash bits used for bucket selection.
    m = 2^b registers, standard error = 1.04 / sqrt(m).
    """

    # Bias correction constants alpha_m for small m
    ALPHA = {
        4: 0.673,
        5: 0.697,
        6: 0.709,
    }
    ALPHA_DEFAULT = 0.7213  # alpha_m ~= 0.7213 / (1 + 1.079 / m) for m >= 128

    def __init__(self, precision: int = 14):
        if not (4 <= precision <= 18):
            raise ValueError("precision must be between 4 and 18")
        self.b = precision
        self.m = 1 << precision
        self.registers = bytearray(self.m)
        self._alpha = self.ALPHA.get(precision, self.ALPHA_DEFAULT / (1 + 1.079 / self.m))

    def add(self, item: str) -> None:
        # 64-bit hash: top 8 bytes of SHA-256
        h = int.from_bytes(hashlib.sha256(item.encode()).digest()[:8], "big")
        # Use top b bits for bucket index
        bucket = h >> (64 - self.b)
        # Position of the leftmost 1-bit in the remaining bits (leading zeros + 1)
        remaining = h & ((1 << (64 - self.b)) - 1)
        run_length = self._count_leading_zeros(remaining, 64 - self.b) + 1
        if run_length > self.registers[bucket]:
            self.registers[bucket] = run_length

    def estimate(self) -> int:
        """Return estimated cardinality."""
        m = self.m
        raw = self._alpha * m * m / sum(2.0 ** (-r) for r in self.registers)
        # Small range correction: linear counting on the zero registers
        zero_count = self.registers.count(0)
        if raw <= 2.5 * m and zero_count > 0:
            return int(m * math.log(m / zero_count))
        # Large range correction only matters for 32-bit hashes; skipped for 64-bit
        return int(raw)

    def merge(self, other: "HyperLogLog") -> "HyperLogLog":
        """Return new HLL representing the union of this and other."""
        if self.b != other.b:
            raise ValueError("Cannot merge HLLs with different precision")
        merged = HyperLogLog(self.b)
        for i in range(self.m):
            merged.registers[i] = max(self.registers[i], other.registers[i])
        return merged

    def serialize(self) -> bytes:
        header = struct.pack(">B", self.b)
        return header + bytes(self.registers)

    @classmethod
    def deserialize(cls, data: bytes) -> "HyperLogLog":
        precision = struct.unpack(">B", data[:1])[0]
        hll = cls(precision)
        hll.registers = bytearray(data[1:])
        return hll

    @staticmethod
    def _count_leading_zeros(value: int, total_bits: int) -> int:
        if value == 0:
            return total_bits
        count = 0
        mask = 1 << (total_bits - 1)
        while value & mask == 0:
            count += 1
            mask >>= 1
        return count


class RedisHLL:
    """Thin wrapper around Redis native HyperLogLog commands."""

    def __init__(self, redis_client, key: str):
        self.redis = redis_client
        self.key = key

    def add(self, *items: str) -> bool:
        return bool(self.redis.pfadd(self.key, *items))

    def estimate(self) -> int:
        return self.redis.pfcount(self.key)

    def merge_into(self, dest_key: str, *other_keys: str) -> None:
        self.redis.pfmerge(dest_key, self.key, *other_keys)
Why Harmonic Mean?
The HLL estimator uses the harmonic mean of 2^register[j] values across buckets rather than the arithmetic mean. The arithmetic mean is skewed by outlier registers with large values (high run lengths from lucky hashes). The harmonic mean suppresses these outliers, giving a more stable and accurate estimate of the true cardinality.
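A small numeric illustration with invented values — seven typical buckets plus one outlier — shows why the choice of mean matters:

```python
# toy 2^register[j] values per bucket; one outlier from a lucky hash
vals = [4, 4, 4, 4, 4, 4, 4, 1024]
n = len(vals)
arith = sum(vals) / n                    # dominated by the outlier
harm = n / sum(1.0 / v for v in vals)    # outlier contributes almost nothing
```

The arithmetic mean lands above 100 while the harmonic mean stays near the typical value of 4: the outlier's reciprocal (1/1024) barely moves the harmonic sum.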