Low Level Design: Usage Metering Service

What Is a Usage Metering Service?

A Usage Metering Service measures how much of a resource a customer consumes, aggregates that usage, enforces limits, and feeds data to billing. It is the foundation of usage-based pricing (pay-per-seat, pay-per-API-call, pay-per-GB). The core challenge is ingesting a high-throughput stream of raw events, rolling them up accurately, and making the current totals queryable in near-real-time without overwhelming the database.

Data Model

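-- Raw event log (append-only, partitioned by time)
-- Every unique key on a partitioned table must include the partitioning column,
-- so recorded_at is part of both keys; retries must resend the same recorded_at.
CREATE TABLE usage_events (
    event_id         BIGINT NOT NULL,
    customer_id      BIGINT NOT NULL,
    metric_key       VARCHAR(100) NOT NULL,   -- e.g. 'api_calls', 'storage_gb'
    quantity         DECIMAL(18,6) NOT NULL,
    idempotency_key  VARCHAR(255) NOT NULL,
    recorded_at      TIMESTAMP NOT NULL,
    PRIMARY KEY (event_id, recorded_at),
    UNIQUE KEY (idempotency_key, recorded_at),
    INDEX (customer_id, metric_key, recorded_at)
) PARTITION BY RANGE (UNIX_TIMESTAMP(recorded_at)) (
    -- example monthly boundaries; add one partition per month
    PARTITION p2026_01 VALUES LESS THAN (UNIX_TIMESTAMP('2026-02-01 00:00:00')),
    PARTITION p_max    VALUES LESS THAN MAXVALUE
);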

-- Aggregated hourly rollups
CREATE TABLE usage_rollups (
    customer_id    BIGINT NOT NULL,
    metric_key     VARCHAR(100) NOT NULL,
    period_start   TIMESTAMP NOT NULL,   -- truncated to hour
    total_quantity DECIMAL(18,6) NOT NULL DEFAULT 0,
    PRIMARY KEY (customer_id, metric_key, period_start)
);

-- Per-subscription limits
CREATE TABLE usage_limits (
    subscription_id  BIGINT NOT NULL,
    metric_key       VARCHAR(100) NOT NULL,
    soft_limit       DECIMAL(18,6),   -- warn at this threshold
    hard_limit       DECIMAL(18,6),   -- block at this threshold
    PRIMARY KEY (subscription_id, metric_key)
);

Core Algorithm: Metering Pipeline

The pipeline has three stages:

  1. Ingest: clients POST events to the metering API. Each event carries an idempotency_key (a client-generated UUID). The service writes to usage_events; duplicate keys are silently ignored via INSERT IGNORE (MySQL) or ON CONFLICT DO NOTHING (PostgreSQL).
  2. Aggregate: a streaming processor (Flink, Kafka Streams, or a simple consumer group) reads from the event log and upserts into usage_rollups using SUM(quantity) grouped by (customer, metric, hour). Rollups older than the current billing period are frozen and never re-written.
  3. Query: entitlement checks sum the rollups for the current billing period and compare the total against usage_limits. The query reads one row per hour in the period, so at most 744 rows (31 days × 24 hours) per customer per metric per month.
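
The three stages can be sketched end to end against an in-memory SQLite database. This is a minimal illustration under several assumptions, not the production design: SQLite stands in for the real database, REAL stands in for DECIMAL, timestamps are stored as ISO-8601 strings, and the function names (open_db, ingest, aggregate, period_total) are hypothetical. The aggregate step recomputes every rollup from the event log for simplicity, whereas a streaming processor would update only the affected hours.

```python
import sqlite3
from datetime import datetime


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with simplified versions of the two tables."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE usage_events (
            event_id        INTEGER PRIMARY KEY,
            customer_id     INTEGER NOT NULL,
            metric_key      TEXT NOT NULL,
            quantity        REAL NOT NULL,
            idempotency_key TEXT NOT NULL UNIQUE,
            recorded_at     TEXT NOT NULL      -- 'YYYY-MM-DDTHH:MM:SS'
        );
        CREATE TABLE usage_rollups (
            customer_id    INTEGER NOT NULL,
            metric_key     TEXT NOT NULL,
            period_start   TEXT NOT NULL,      -- truncated to the hour
            total_quantity REAL NOT NULL DEFAULT 0,
            PRIMARY KEY (customer_id, metric_key, period_start)
        );
    """)
    return db


def ingest(db, customer_id, metric_key, quantity, idempotency_key,
           recorded_at: datetime) -> bool:
    """Stage 1: insert the event; return False if the key was already seen."""
    cur = db.execute(
        "INSERT INTO usage_events "
        "(customer_id, metric_key, quantity, idempotency_key, recorded_at) "
        "VALUES (?, ?, ?, ?, ?) ON CONFLICT (idempotency_key) DO NOTHING",
        (customer_id, metric_key, quantity, idempotency_key,
         recorded_at.isoformat(timespec="seconds")),
    )
    return cur.rowcount == 1


def aggregate(db) -> None:
    """Stage 2: recompute hourly sums from the event log and upsert them."""
    db.execute("""
        INSERT INTO usage_rollups
            (customer_id, metric_key, period_start, total_quantity)
        SELECT customer_id, metric_key,
               substr(recorded_at, 1, 13) || ':00:00', SUM(quantity)
        FROM usage_events
        WHERE 1  -- SQLite needs a WHERE clause before ON CONFLICT here
        GROUP BY customer_id, metric_key, substr(recorded_at, 1, 13)
        ON CONFLICT (customer_id, metric_key, period_start)
        DO UPDATE SET total_quantity = excluded.total_quantity
    """)


def period_total(db, customer_id, metric_key, start: str, end: str) -> float:
    """Stage 3: sum the hourly rollups in the half-open range [start, end)."""
    row = db.execute(
        "SELECT COALESCE(SUM(total_quantity), 0) FROM usage_rollups "
        "WHERE customer_id = ? AND metric_key = ? "
        "AND period_start >= ? AND period_start < ?",
        (customer_id, metric_key, start, end),
    ).fetchone()
    return row[0]
```

Calling ingest twice with the same idempotency_key returns True and then False, and running aggregate repeatedly leaves the rollup totals unchanged because each run overwrites the hourly sum rather than adding to it.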

Limit Enforcement Workflow

  • Soft limit: when usage crosses the soft threshold, emit a usage.soft_limit_reached event and send a warning email. Do not block.
  • Hard limit: once usage reaches the hard threshold, reject API calls with HTTP 429 and error code USAGE_LIMIT_EXCEEDED. Cache the limit status in a short-lived entry so it is re-checked at most once per minute per customer.
  • Overage billing: at period end, compute total usage minus included quota, multiply by the overage rate in the plan, and generate a line item for the invoice.
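
The threshold comparison and the overage arithmetic above might look like the following sketch. The names LimitStatus, check_limit, and overage_charge are hypothetical, and two choices are assumptions rather than something the design specifies: a limit counts as reached when usage is greater than or equal to it, and a NULL limit means "no limit".

```python
from decimal import Decimal
from enum import Enum
from typing import Optional


class LimitStatus(Enum):
    OK = "ok"
    SOFT_LIMIT_REACHED = "soft_limit_reached"    # warn, do not block
    HARD_LIMIT_EXCEEDED = "hard_limit_exceeded"  # reject with HTTP 429


def check_limit(usage: Decimal,
                soft_limit: Optional[Decimal],
                hard_limit: Optional[Decimal]) -> LimitStatus:
    """Classify current-period usage against the soft and hard thresholds."""
    # Assumption: None (SQL NULL) means the limit is not configured.
    if hard_limit is not None and usage >= hard_limit:
        return LimitStatus.HARD_LIMIT_EXCEEDED
    if soft_limit is not None and usage >= soft_limit:
        return LimitStatus.SOFT_LIMIT_REACHED
    return LimitStatus.OK


def overage_charge(total_usage: Decimal,
                   included_quota: Decimal,
                   overage_rate: Decimal) -> Decimal:
    """Period-end charge: (usage - quota) * rate, never negative."""
    overage = max(total_usage - included_quota, Decimal("0"))
    return overage * overage_rate
```

For example, 1,200 API calls against an included quota of 1,000 at a rate of 0.002 per call gives an overage of 200 calls and a line item of 0.40. Decimal is used rather than float so that the result matches the DECIMAL(18,6) columns without rounding drift.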

Failure Handling

At-least-once delivery: the event ingest endpoint is idempotent by design. Clients may safely retry any failed POST. The idempotency_key uniqueness constraint is the deduplication mechanism.

Rollup lag: the aggregate pipeline may be seconds to minutes behind real time. Enforce limits against slightly stale rollup data rather than raw events to avoid full table scans. Accept the small window of overage as a business policy decision, not a bug.

Partition pruning: if the event table grows to billions of rows, range partitioning by month lets queries that filter on recorded_at scan only the relevant partitions. Drop partitions older than the retention window rather than deleting rows.
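
A retention job for the drop-partitions step could be sketched as follows. The partition naming scheme (pYYYY_MM), the helper names, and the rule that the current month plus the previous retention_months months are kept are all assumptions made for illustration. The generated statement uses MySQL's ALTER TABLE ... DROP PARTITION syntax.

```python
from datetime import date
from typing import List


def partitions_to_drop(existing: List[str], today: date,
                       retention_months: int) -> List[str]:
    """Return monthly partitions (named pYYYY_MM) older than the retention window."""
    # Index months as year * 12 + (month - 1) so they can be compared as integers.
    current = today.year * 12 + (today.month - 1)
    cutoff = current - retention_months  # months strictly before this are dropped
    expired = []
    for name in existing:
        year, month = int(name[1:5]), int(name[6:8])
        if year * 12 + (month - 1) < cutoff:
            expired.append(name)
    return sorted(expired)


def drop_statement(table: str, partitions: List[str]) -> str:
    """Build one ALTER TABLE statement that drops all expired partitions."""
    if not partitions:
        return ""
    return f"ALTER TABLE {table} DROP PARTITION {', '.join(partitions)};"
```

Dropping a partition is a metadata operation, which is why it is preferred over a DELETE that would have to touch every expired row.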

Scalability Considerations

  • Write path: ingest is the hot path. Front it with a Kafka topic; consumers write to the DB in micro-batches rather than one row per HTTP request. This absorbs traffic spikes without back-pressure on the API tier.
  • Read path: cache current-period rollup totals per customer in Redis with a TTL of 60 seconds, and invalidate the entry on each rollup write. Entitlement checks read from the cache and fall through to the database only on a miss (cold start, expiry, or forced refresh).
  • Multi-region: ingest events in the local region; replicate to a central region for aggregation and billing. Avoid cross-region synchronous writes on the critical path.
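
The write-path batching and the read-path cache can be illustrated with a small in-process sketch. It assumes nothing about Kafka or Redis client APIs: micro_batches groups any iterable of events into fixed-size batches, and UsageTotalCache is a hypothetical dictionary-backed stand-in for the Redis entry, with an injectable clock so that expiry can be tested.

```python
import time
from typing import Callable, Dict, Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def micro_batches(events: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group a stream of events into lists of at most batch_size items."""
    batch: List[T] = []
    for event in events:
        batch.append(event)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch


class UsageTotalCache:
    """Read-through cache of current-period totals with a fixed TTL."""

    def __init__(self, load_total: Callable[[int, str], float],
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load_total = load_total  # falls through to the database
        self._ttl = ttl_seconds
        self._clock = clock
        # (customer_id, metric_key) -> (expires_at, total)
        self._entries: Dict[Tuple[int, str], Tuple[float, float]] = {}

    def get(self, customer_id: int, metric_key: str) -> float:
        """Return the cached total, reloading it on a miss or after expiry."""
        key = (customer_id, metric_key)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        total = self._load_total(customer_id, metric_key)
        self._entries[key] = (now + self._ttl, total)
        return total

    def invalidate(self, customer_id: int, metric_key: str) -> None:
        """Drop the entry; call this after each rollup write."""
        self._entries.pop((customer_id, metric_key), None)
```

In a real deployment each batch would become one multi-row INSERT, and the cache entry would live in Redis so that every API node sees the same total.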

Summary

A usage metering service is a pipeline: ingest events idempotently, aggregate them into rollups asynchronously, and serve limit checks from a cache. The raw event table should be append-only and partitioned by time. Idempotency keys at the ingest layer prevent double-counting when clients retry. Separating raw events from aggregated rollups keeps both the write path and the read path efficient. Overage billing ties the metering data back to the subscription and plan management layers to complete the billing stack.
