What Is a Usage Metering Service?
A Usage Metering Service measures how much of a resource a customer consumes, aggregates that usage, enforces limits, and feeds data to billing. It is the foundation of usage-based pricing (pay-per-seat, pay-per-API-call, pay-per-GB). The core challenge is ingesting a high-throughput stream of raw events, rolling them up accurately, and making the current totals queryable in near-real-time without overwhelming the database.
Data Model
-- Raw event log (append-only, partitioned by time)
CREATE TABLE usage_events (
  event_id BIGINT NOT NULL,
  customer_id BIGINT NOT NULL,
  metric_key VARCHAR(100) NOT NULL, -- e.g. 'api_calls', 'storage_gb'
  quantity DECIMAL(18,6) NOT NULL,
  idempotency_key VARCHAR(255) NOT NULL,
  recorded_at TIMESTAMP NOT NULL,
  -- MySQL requires every unique key on a partitioned table to include the
  -- partitioning column, so recorded_at is part of both keys; retries must
  -- resend the original recorded_at for deduplication to hold.
  PRIMARY KEY (event_id, recorded_at),
  UNIQUE KEY (idempotency_key, recorded_at),
  INDEX (customer_id, metric_key, recorded_at)
) PARTITION BY RANGE (UNIX_TIMESTAMP(recorded_at));
-- Aggregated hourly rollups
CREATE TABLE usage_rollups (
  customer_id BIGINT NOT NULL,
  metric_key VARCHAR(100) NOT NULL,
  period_start TIMESTAMP NOT NULL, -- truncated to hour
  total_quantity DECIMAL(18,6) NOT NULL DEFAULT 0,
  PRIMARY KEY (customer_id, metric_key, period_start)
);
-- Per-subscription limits
CREATE TABLE usage_limits (
  subscription_id BIGINT NOT NULL,
  metric_key VARCHAR(100) NOT NULL,
  soft_limit DECIMAL(18,6), -- warn at this threshold
  hard_limit DECIMAL(18,6), -- block at this threshold
  PRIMARY KEY (subscription_id, metric_key)
);
Core Algorithm: Metering Pipeline
The pipeline has three stages:
- Ingest: clients POST events to the metering API. Each event carries an idempotency_key (a client-generated UUID). The service writes to usage_events; duplicate keys are silently ignored via INSERT IGNORE or ON CONFLICT DO NOTHING.
- Aggregate: a streaming processor (Flink, Kafka Streams, or a simple consumer group) reads from the event log and upserts into usage_rollups using SUM(quantity) grouped by (customer, metric, hour). Rollups older than the current billing period are frozen and never rewritten.
- Query: entitlement checks sum the rollups for the current billing period and compare against usage_limits. This query is O(hours in period), typically under 750 rows per customer per metric per month.
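The three stages can be exercised end to end in a single process. Below is a minimal sketch using Python's sqlite3 as a stand-in for the metering database; the table and column names follow the schema above (types simplified for SQLite), and the function names are illustrative, not part of any real API.

```python
import sqlite3

# In-memory stand-in for the metering store; mirrors usage_events /
# usage_rollups from the schema above, minus partitioning.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE usage_events (
  event_id INTEGER,
  customer_id INTEGER NOT NULL,
  metric_key TEXT NOT NULL,
  quantity REAL NOT NULL,
  idempotency_key TEXT NOT NULL UNIQUE,
  recorded_at TEXT NOT NULL
);
CREATE TABLE usage_rollups (
  customer_id INTEGER NOT NULL,
  metric_key TEXT NOT NULL,
  period_start TEXT NOT NULL,
  total_quantity REAL NOT NULL DEFAULT 0,
  PRIMARY KEY (customer_id, metric_key, period_start)
);
""")

def ingest(event_id, customer_id, metric_key, quantity, idem_key, recorded_at):
    """Stage 1: idempotent append. Duplicate idempotency keys are no-ops."""
    db.execute(
        "INSERT OR IGNORE INTO usage_events VALUES (?, ?, ?, ?, ?, ?)",
        (event_id, customer_id, metric_key, quantity, idem_key, recorded_at),
    )

def aggregate():
    """Stage 2: SUM(quantity) grouped by (customer, metric, hour).
    This sketch rebuilds all rollups; a real pipeline upserts incrementally."""
    db.execute("DELETE FROM usage_rollups")
    db.execute("""
        INSERT INTO usage_rollups
        SELECT customer_id, metric_key,
               strftime('%Y-%m-%d %H:00:00', recorded_at),
               SUM(quantity)
        FROM usage_events
        GROUP BY customer_id, metric_key,
                 strftime('%Y-%m-%d %H:00:00', recorded_at)
    """)

def period_total(customer_id, metric_key, period_start, period_end):
    """Stage 3: sum rollups for the billing period (entitlement-check input)."""
    row = db.execute(
        """SELECT COALESCE(SUM(total_quantity), 0) FROM usage_rollups
           WHERE customer_id = ? AND metric_key = ?
             AND period_start >= ? AND period_start < ?""",
        (customer_id, metric_key, period_start, period_end),
    ).fetchone()
    return row[0]

# Two events plus a retried duplicate of the first one.
ingest(1, 42, "api_calls", 10, "k1", "2024-06-01 10:15:00")
ingest(1, 42, "api_calls", 10, "k1", "2024-06-01 10:15:00")  # retry, deduped
ingest(2, 42, "api_calls", 5,  "k2", "2024-06-01 11:05:00")
aggregate()
total = period_total(42, "api_calls", "2024-06-01", "2024-07-01")  # 15.0
```

The duplicate ingest is absorbed by the UNIQUE constraint, so the rollup total reflects two events, not three.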
Limit Enforcement Workflow
- Soft limit: when usage crosses the soft threshold, emit a usage.soft_limit_reached event and send a warning email. Do not block.
- Hard limit: reject API calls with HTTP 429 and error code USAGE_LIMIT_EXCEEDED. Re-check limit status at most once per minute per customer using a short-lived cache entry.
- Overage billing: at period end, compute total usage minus included quota, multiply by the overage rate in the plan, and generate a line item for the invoice.
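The enforcement decision and the overage computation are both pure functions of the current total and the plan's configured values, which makes them easy to test in isolation. A sketch under those assumptions (function names are illustrative):

```python
from decimal import Decimal

def check_limits(total, soft_limit, hard_limit):
    """Map a current-period usage total to an enforcement action.
    Either limit may be None (not configured). The caller translates
    'block' into HTTP 429 / USAGE_LIMIT_EXCEEDED and 'warn' into a
    usage.soft_limit_reached event; 'allow' lets the request through."""
    if hard_limit is not None and total >= hard_limit:
        return "block"
    if soft_limit is not None and total >= soft_limit:
        return "warn"
    return "allow"

def overage_line_item(total, included_quota, overage_rate):
    """Period-end overage charge: (usage - included quota) * rate,
    floored at zero so under-quota customers owe nothing extra."""
    overage = max(Decimal(0), total - included_quota)
    return overage * overage_rate

print(check_limits(Decimal("950"), Decimal("800"), Decimal("1000")))   # warn
print(overage_line_item(Decimal("1200"), Decimal("1000"), Decimal("0.02")))
```

Decimal (matching the DECIMAL(18,6) columns) avoids the rounding drift that float arithmetic would introduce into invoices.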
Failure Handling
At-least-once delivery: the event ingest endpoint is idempotent by design. Clients may safely retry any failed POST. The idempotency_key uniqueness constraint is the deduplication mechanism.
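On the client side, the property that makes retries safe is that the idempotency key is generated once per logical event and reused on every attempt, never regenerated per retry. A sketch of a retry-safe client wrapper, assuming a pluggable transport callable (the endpoint and helper names are illustrative):

```python
import uuid

def send_with_retry(transport, event, max_attempts=3):
    """Attach one idempotency key per logical event, then retry the same
    payload on transient failure. The server-side UNIQUE constraint on
    idempotency_key turns duplicate deliveries into no-ops."""
    payload = dict(event, idempotency_key=str(uuid.uuid4()))
    last_error = None
    for _ in range(max_attempts):
        try:
            return transport(payload)   # e.g. POST to the metering API
        except ConnectionError as exc:  # transient failure: retry same key
            last_error = exc
    raise last_error

# Flaky transport that fails once, then records the payload. The `seen`
# dict keyed on idempotency_key mimics the database UNIQUE constraint.
seen = {}
calls = {"n": 0}
def flaky(payload):
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("network blip")
    seen.setdefault(payload["idempotency_key"], payload)  # INSERT ... IGNORE
    return "ok"

result = send_with_retry(flaky, {"customer_id": 42,
                                 "metric_key": "api_calls",
                                 "quantity": 1})  # succeeds on attempt 2
```

The first attempt fails, the second succeeds with the same key, and exactly one event is recorded despite two deliveries.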
Rollup lag: the aggregate pipeline may be seconds to minutes behind real time. Enforce limits against slightly stale rollup data rather than raw events to avoid full table scans. Accept the small window of overage as a business policy decision, not a bug.
Partition pruning: if the event table grows to billions of rows, range partitioning by month ensures queries only scan recent partitions. Drop partitions older than the retention window rather than deleting rows.
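The retention policy reduces to computing which monthly partitions fall entirely outside the window; dropping each one is a cheap metadata operation, unlike a row-by-row DELETE. A sketch of that planning step, assuming one partition per calendar month (the function and its window semantics are illustrative):

```python
from datetime import date

def partitions_to_drop(today, retention_months, oldest_partition):
    """Return (year, month) tuples for monthly partitions strictly older
    than the retention window ending at `today`'s month. Each result is a
    candidate for ALTER TABLE ... DROP PARTITION."""
    # First month still inside the retention window, in months-since-year-0.
    total = today.year * 12 + (today.month - 1) - retention_months
    cutoff = (total // 12, total % 12 + 1)
    out = []
    y, m = oldest_partition
    while (y, m) < cutoff:
        out.append((y, m))
        m += 1
        if m > 12:
            y, m = y + 1, 1
    return out

# With a 3-month window in June 2024, January and February fall out.
print(partitions_to_drop(date(2024, 6, 15), 3, (2024, 1)))
```

Handling the year boundary in the month arithmetic (rather than naive `month - n`) is the only subtlety worth testing.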
Scalability Considerations
- Write path: ingest is the hot path. Front it with a Kafka topic; consumers write to the DB in micro-batches rather than one row per HTTP request. This absorbs traffic spikes without back-pressure on the API tier.
- Read path: cache current-period rollup totals per customer in Redis. TTL of 60 seconds. Invalidate on each rollup write. Entitlement checks read from cache; only miss to DB on cold start or forced refresh.
- Multi-region: ingest events in the local region; replicate to a central region for aggregation and billing. Avoid cross-region synchronous writes on the critical path.
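The read path above amounts to a read-through cache with a short TTL plus write-side invalidation. In the sketch below an in-memory dict stands in for Redis (the Redis equivalents would be GET, SETEX, and DEL); class and parameter names are illustrative:

```python
import time

class RollupCache:
    """Read-through cache for current-period rollup totals.
    store[key] = (value, expiry): get ~ Redis GET, the TTL'd write ~ SETEX,
    invalidate ~ DEL after each rollup write."""

    def __init__(self, loader, ttl_seconds=60, clock=time.monotonic):
        self.loader = loader      # falls through to the rollup SUM query
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}

    def get_total(self, customer_id, metric_key):
        key = (customer_id, metric_key)
        hit = self.store.get(key)
        now = self.clock()
        if hit is not None and hit[1] > now:
            return hit[0]         # fresh hit: no database round trip
        value = self.loader(customer_id, metric_key)
        self.store[key] = (value, now + self.ttl)
        return value

    def invalidate(self, customer_id, metric_key):
        """Called by the rollup writer so the next check re-reads."""
        self.store.pop((customer_id, metric_key), None)

# Count loader calls to show the cache absorbing repeated entitlement checks.
db_calls = []
def load_from_db(cust, metric):
    db_calls.append((cust, metric))
    return 123.0

cache = RollupCache(load_from_db)
a = cache.get_total(42, "api_calls")   # miss -> one DB query
b = cache.get_total(42, "api_calls")   # hit within TTL, no query
cache.invalidate(42, "api_calls")      # rollup writer updated the totals
c = cache.get_total(42, "api_calls")   # miss again after invalidation
```

Injecting the clock keeps TTL expiry deterministic in tests; in production the TTL bounds staleness even if an invalidation message is lost.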
Summary
A usage metering service is a pipeline: ingest events idempotently, aggregate into rollups asynchronously, and serve limit checks from a cache. The schema must be append-only and partitioned by time. Idempotency keys at the ingest layer eliminate double-counting. Separating raw events from aggregated rollups keeps both the write path and the read path efficient. Overage billing ties the metering data back to the subscription and plan management layers to complete the billing stack.
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is usage metering and why is it important in system design?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Usage metering is the process of accurately measuring and recording how much of a resource (API calls, compute hours, data processed, seats) each customer consumes. It is critical for usage-based billing models because revenue depends directly on the precision and reliability of these measurements. Errors in metering lead to under-billing, over-billing disputes, or audit failures."
      }
    },
    {
      "@type": "Question",
      "name": "How do you design a high-throughput usage metering pipeline?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Emit usage events from application services into a durable message queue (Kafka is common). A stream processing layer (Flink, Spark Streaming, or Kinesis Data Analytics) aggregates events by customer and time window, writing rollups to a time-series or columnar store. The billing engine reads pre-aggregated rollups at invoice time rather than scanning raw events, keeping query latency low even with billions of events."
      }
    },
    {
      "@type": "Question",
      "name": "How do you guarantee at-least-once delivery and avoid double-counting in usage metering?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Use idempotent event ingestion: assign each usage event a globally unique event_id at the point of emission. The metering store deduplicates on event_id within a deduplication window (e.g., 24 hours in Redis or a bloom filter). Downstream aggregation is performed on deduplicated events, so retried deliveries from the queue do not inflate usage counts."
      }
    },
    {
      "@type": "Question",
      "name": "How do you handle late-arriving usage events in a metering system?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Define a watermark — a cutoff time after which the billing period is considered closed. Events arriving before the watermark are applied to the correct billing period; events arriving after it are either charged in the next cycle with a line-item note or discarded per SLA policy. Configuring a grace window (e.g., 15 minutes of allowed lateness) in the stream processor catches most late events without holding invoices open indefinitely."
      }
    }
  ]
}