Low Level Design: Usage Metering Service

What Is a Usage Metering Service?

A Usage Metering Service measures how much of a resource a customer consumes, aggregates that usage, enforces limits, and feeds data to billing. It is the foundation of usage-based pricing (pay-per-seat, pay-per-API-call, pay-per-GB). The core challenge is ingesting a high-throughput stream of raw events, rolling them up accurately, and making the current totals queryable in near-real-time without overwhelming the database.

Data Model

-- Raw event log (append-only, partitioned by time)
CREATE TABLE usage_events (
    event_id         BIGINT NOT NULL,
    customer_id      BIGINT NOT NULL,
    metric_key       VARCHAR(100) NOT NULL,   -- e.g. 'api_calls', 'storage_gb'
    quantity         DECIMAL(18,6) NOT NULL,
    idempotency_key  VARCHAR(255) NOT NULL,
    recorded_at      TIMESTAMP NOT NULL,
    -- MySQL requires every unique key on a partitioned table to include the
    -- partitioning column, so both keys carry recorded_at. Client retries
    -- resend the original recorded_at, so deduplication still holds.
    PRIMARY KEY (event_id, recorded_at),
    UNIQUE KEY uq_idempotency (idempotency_key, recorded_at),
    INDEX idx_customer_metric (customer_id, metric_key, recorded_at)
) PARTITION BY RANGE (UNIX_TIMESTAMP(recorded_at));

-- Aggregated hourly rollups
CREATE TABLE usage_rollups (
    customer_id    BIGINT NOT NULL,
    metric_key     VARCHAR(100) NOT NULL,
    period_start   TIMESTAMP NOT NULL,   -- truncated to hour
    total_quantity DECIMAL(18,6) NOT NULL DEFAULT 0,
    PRIMARY KEY (customer_id, metric_key, period_start)
);

-- Per-subscription limits
CREATE TABLE usage_limits (
    subscription_id  BIGINT NOT NULL,
    metric_key       VARCHAR(100) NOT NULL,
    soft_limit       DECIMAL(18,6),   -- warn at this threshold
    hard_limit       DECIMAL(18,6),   -- block at this threshold
    PRIMARY KEY (subscription_id, metric_key)
);

Core Algorithm: Metering Pipeline

The pipeline has three stages:

  1. Ingest: clients POST events to the metering API. Each event carries an idempotency_key (client-generated UUID). The service writes to usage_events; duplicate keys are silently ignored via INSERT IGNORE or ON CONFLICT DO NOTHING.
  2. Aggregate: a streaming processor (Flink, Kafka Streams, or a simple consumer group) reads from the event log and upserts into usage_rollups using SUM(quantity) grouped by (customer, metric, hour). Rollups older than the current billing period are frozen and never re-written.
  3. Query: entitlement checks sum the rollups for the current billing period and compare against usage_limits. This query is O(hours in period), typically under 750 rows per customer per metric per month.
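The aggregate stage can be sketched as a micro-batch upsert. The following is a minimal Python sketch using an in-memory SQLite table as a stand-in for the rollup store; `apply_batch` and `hour_start` are illustrative names, not part of the design above:

```python
import sqlite3
from datetime import datetime

def hour_start(ts: str) -> str:
    """Truncate an ISO-8601 timestamp to the start of its hour."""
    dt = datetime.fromisoformat(ts)
    return dt.replace(minute=0, second=0, microsecond=0).isoformat()

def apply_batch(conn, events):
    """Upsert a micro-batch of events into hourly rollups."""
    for customer_id, metric_key, quantity, recorded_at in events:
        conn.execute(
            """
            INSERT INTO usage_rollups (customer_id, metric_key, period_start, total_quantity)
            VALUES (?, ?, ?, ?)
            ON CONFLICT (customer_id, metric_key, period_start)
            DO UPDATE SET total_quantity = total_quantity + excluded.total_quantity
            """,
            (customer_id, metric_key, hour_start(recorded_at), quantity),
        )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE usage_rollups (
        customer_id    INTEGER NOT NULL,
        metric_key     TEXT NOT NULL,
        period_start   TEXT NOT NULL,
        total_quantity REAL NOT NULL DEFAULT 0,
        PRIMARY KEY (customer_id, metric_key, period_start)
    )
""")
apply_batch(conn, [
    (1, "api_calls", 3, "2024-05-01T10:05:00"),
    (1, "api_calls", 2, "2024-05-01T10:40:00"),  # same hour -> merged into one row
    (1, "api_calls", 7, "2024-05-01T11:02:00"),  # next hour -> new row
])
rows = conn.execute(
    "SELECT period_start, total_quantity FROM usage_rollups ORDER BY period_start"
).fetchall()
print(rows)  # [('2024-05-01T10:00:00', 5.0), ('2024-05-01T11:00:00', 7.0)]
```

A real deployment would batch the upserts and run them inside the stream processor's checkpointed sink, but the grouping key and upsert shape are the same.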

Limit Enforcement Workflow

  • Soft limit: when usage crosses the soft threshold, emit a usage.soft_limit_reached event and send a warning email. Do not block.
  • Hard limit: reject API calls with HTTP 429 and error code USAGE_LIMIT_EXCEEDED. Re-check limit status at most once per minute per customer using a short-lived cache entry.
  • Overage billing: at period end, compute total usage minus included quota, multiply by the overage rate in the plan, and generate a line item for the invoice.
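The enforcement and overage rules above reduce to two pure functions. This is a sketch with hypothetical helper names (`check_limits`, `overage_line_item`); `Decimal` mirrors the DECIMAL(18,6) columns, and a `None` limit means unlimited:

```python
from decimal import Decimal

def check_limits(current_usage, soft_limit, hard_limit):
    """Classify usage against soft/hard thresholds (None = unlimited)."""
    if hard_limit is not None and current_usage >= hard_limit:
        return "blocked"   # caller rejects with HTTP 429, USAGE_LIMIT_EXCEEDED
    if soft_limit is not None and current_usage >= soft_limit:
        return "warn"      # caller emits usage.soft_limit_reached
    return "ok"

def overage_line_item(total_usage, included_quota, overage_rate):
    """Period-end overage charge: usage beyond the quota times the rate."""
    overage = max(Decimal("0"), total_usage - included_quota)
    return overage * overage_rate

print(check_limits(Decimal("900"), Decimal("800"), Decimal("1000")))         # warn
print(check_limits(Decimal("1000"), Decimal("800"), Decimal("1000")))        # blocked
print(overage_line_item(Decimal("1200"), Decimal("1000"), Decimal("0.02")))  # 4.00
```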

Failure Handling

At-least-once delivery: the event ingest endpoint is idempotent by design. Clients may safely retry any failed POST. The idempotency_key uniqueness constraint is the deduplication mechanism.
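A minimal sketch of the idempotent ingest path, using SQLite's `INSERT OR IGNORE` as a stand-in for MySQL's `INSERT IGNORE` or Postgres's `ON CONFLICT DO NOTHING` (the `ingest` function name is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE usage_events (
        event_id        INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_id     INTEGER NOT NULL,
        metric_key      TEXT NOT NULL,
        quantity        REAL NOT NULL,
        idempotency_key TEXT NOT NULL UNIQUE,
        recorded_at     TEXT NOT NULL
    )
""")

def ingest(conn, customer_id, metric_key, quantity, idempotency_key, recorded_at):
    """Insert an event; a duplicate idempotency_key is silently ignored,
    so client retries cannot double-count."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO usage_events "
        "(customer_id, metric_key, quantity, idempotency_key, recorded_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (customer_id, metric_key, quantity, idempotency_key, recorded_at),
    )
    conn.commit()
    return cur.rowcount == 1  # True only on first delivery

first = ingest(conn, 1, "api_calls", 1, "uuid-abc", "2024-05-01T10:05:00")
retry = ingest(conn, 1, "api_calls", 1, "uuid-abc", "2024-05-01T10:05:00")
total = conn.execute("SELECT SUM(quantity) FROM usage_events").fetchone()[0]
print(first, retry, total)  # True False 1.0
```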

Rollup lag: the aggregate pipeline may be seconds to minutes behind real time. Enforce limits against slightly stale rollup data rather than raw events to avoid full table scans. Accept the small window of overage as a business policy decision, not a bug.

Partition pruning: if the event table grows to billions of rows, range partitioning by month ensures queries only scan recent partitions. Drop partitions older than the retention window rather than deleting rows.
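Retention then reduces to computing which monthly partitions fall outside the window and dropping them whole. A sketch assuming partition names of the form `p<YYYYMM>` (the naming scheme is an assumption, not fixed by the schema above):

```python
from datetime import date

def partitions_to_drop(existing, today, retention_months=13):
    """Given monthly partition names like 'p202405', return those older than
    the retention window so they can be dropped, not row-deleted."""
    cutoff_index = (today.year * 12 + today.month - 1) - retention_months
    drop = []
    for name in existing:
        year, month = int(name[1:5]), int(name[5:7])
        if (year * 12 + month - 1) < cutoff_index:
            drop.append(name)
    return sorted(drop)

parts = ["p202301", "p202312", "p202401", "p202405"]
print(partitions_to_drop(parts, date(2024, 6, 1)))  # ['p202301']
```

A maintenance job would turn each returned name into an `ALTER TABLE ... DROP PARTITION` statement, which is a metadata operation rather than a row-by-row delete.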

Scalability Considerations

  • Write path: ingest is the hot path. Front it with a Kafka topic; consumers write to the DB in micro-batches rather than one row per HTTP request. This absorbs traffic spikes without back-pressure on the API tier.
  • Read path: cache current-period rollup totals per customer in Redis. TTL of 60 seconds. Invalidate on each rollup write. Entitlement checks read from cache; only miss to DB on cold start or forced refresh.
  • Multi-region: ingest events in the local region; replicate to a central region for aggregation and billing. Avoid cross-region synchronous writes on the critical path.
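The read-path cache described above can be sketched in-process. This `RollupCache` is a stand-in for Redis with the same TTL-plus-invalidation semantics; the class name and the fake loader are illustrative:

```python
import time

class RollupCache:
    """In-process stand-in for the Redis cache on the read path:
    short TTL, invalidated on each rollup write."""

    def __init__(self, loader, ttl=60.0, clock=time.monotonic):
        self.loader = loader   # falls through to the rollup table on a miss
        self.ttl = ttl
        self.clock = clock
        self._store = {}       # (customer_id, metric_key) -> (expires_at, total)

    def get_total(self, customer_id, metric_key):
        key = (customer_id, metric_key)
        entry = self._store.get(key)
        if entry and entry[0] > self.clock():
            return entry[1]                           # fresh hit
        total = self.loader(customer_id, metric_key)  # miss -> DB
        self._store[key] = (self.clock() + self.ttl, total)
        return total

    def invalidate(self, customer_id, metric_key):
        """Called after each rollup write for this customer/metric."""
        self._store.pop((customer_id, metric_key), None)

calls = []
def fake_db_loader(customer_id, metric_key):
    calls.append((customer_id, metric_key))
    return 42.0

cache = RollupCache(fake_db_loader)
cache.get_total(1, "api_calls")   # miss -> hits the loader
cache.get_total(1, "api_calls")   # hit  -> no loader call
print(len(calls))  # 1
```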

Summary

A usage metering service is a pipeline: ingest events idempotently, aggregate into rollups asynchronously, and serve limit checks from a cache. The schema must be append-only and partitioned by time. Idempotency keys at the ingest layer eliminate double-counting. Separating raw events from aggregated rollups keeps both the write path and the read path efficient. Overage billing ties the metering data back to the subscription and plan management layers to complete the billing stack.

Frequently Asked Questions

What is usage metering and why is it important in system design?
Usage metering is the process of accurately measuring and recording how much of a resource (API calls, compute hours, data processed, seats) each customer consumes. It is critical for usage-based billing models because revenue depends directly on the precision and reliability of these measurements. Errors in metering lead to under-billing, over-billing disputes, or audit failures.

How do you design a high-throughput usage metering pipeline?
Emit usage events from application services into a durable message queue (Kafka is common). A stream processing layer (Flink, Spark Streaming, or Kinesis Data Analytics) aggregates events by customer and time window, writing rollups to a time-series or columnar store. The billing engine reads pre-aggregated rollups at invoice time rather than scanning raw events, keeping query latency low even with billions of events.

How do you guarantee at-least-once delivery and avoid double-counting?
Use idempotent event ingestion: assign each usage event a globally unique identifier at the point of emission. The metering store deduplicates on that identifier within a deduplication window (e.g., 24 hours in Redis or a Bloom filter). Downstream aggregation is performed on deduplicated events, so retried deliveries from the queue do not inflate usage counts.

How do you handle late-arriving usage events?
Define a watermark: a cutoff time after which the billing period is considered closed. Events arriving before the watermark are applied to the correct billing period; events arriving after it are either charged in the next cycle with a line-item note or discarded per SLA policy. Configuring a grace window (e.g., 15 minutes of allowed lateness) in the stream processor catches most late events without holding invoices open indefinitely.
