Tenant Billing Service: Overview and Requirements
A tenant billing service meters resource consumption per tenant, applies tiered pricing rules, generates invoices on a billing cycle, and runs a dunning workflow to recover overdue accounts. It must be accurate, auditable, and resilient to metering pipeline gaps — a missed charge damages revenue while an overcharge damages customer trust.
Functional Requirements
- Ingest usage events (API calls, storage bytes, compute minutes) from the platform in real time.
- Aggregate usage per tenant per billing period and apply tiered pricing (e.g., first 1M API calls free, next 10M at $0.001 each).
- Generate itemized invoices at the end of each billing cycle.
- Charge payment methods via a payment processor integration (Stripe, Braintree).
- Run a dunning workflow for failed payments: retry on a schedule, send reminder emails, and suspend the tenant after configurable failed attempts.
Non-Functional Requirements
- Usage aggregation latency under 5 minutes from event to metered total.
- Invoice generation idempotent: generating the same invoice twice produces the same result.
- Exactly-once billing: each usage event contributes to exactly one invoice.
- Audit log retention for 7 years to satisfy financial compliance requirements.
Data Model
Usage Event
- event_id — UUID, used for deduplication.
- tenant_id — the consuming tenant.
- metric_name — api_calls, storage_gb_hours, compute_minutes.
- quantity — numeric usage amount.
- occurred_at — event timestamp from the source system.
- ingested_at — arrival time at the billing service.
Metered Usage Record
- tenant_id, billing_period_id, metric_name — composite key.
- total_quantity — aggregated usage for the period.
- last_event_id — watermark for idempotent replay.
Invoice
- invoice_id — UUID.
- tenant_id, billing_period_id.
- line_items — JSONB array: metric, quantity, unit_price, tier_label, subtotal.
- subtotal, tax, total.
- status — DRAFT, FINALIZED, PAID, OVERDUE, VOID.
- due_date, paid_at.
Dunning Attempt
- attempt_id — UUID.
- invoice_id — foreign key.
- attempt_number — 1-based sequence.
- scheduled_at, executed_at.
- outcome — SUCCESS, FAILED, DECLINED.
- processor_response_code.
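As an illustration of the Usage Event entity, here is a minimal Python sketch. The field names come from the list above; the concrete types (a string tenant_id, a Decimal quantity) and the non-negative quantity check are assumptions, not requirements stated in the design.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal
from enum import Enum
from uuid import UUID, uuid4


class Metric(str, Enum):
    """Metric names listed in the data model."""
    API_CALLS = "api_calls"
    STORAGE_GB_HOURS = "storage_gb_hours"
    COMPUTE_MINUTES = "compute_minutes"


@dataclass(frozen=True)
class UsageEvent:
    event_id: UUID          # deduplication key
    tenant_id: str          # assumed to be a string identifier
    metric_name: Metric
    quantity: Decimal       # Decimal avoids float rounding in billing math
    occurred_at: datetime   # timestamp from the source system
    ingested_at: datetime   # arrival time at the billing service

    def __post_init__(self) -> None:
        # Assumption: usage is never negative; corrections would go
        # through a separate mechanism rather than negative events.
        if self.quantity < 0:
            raise ValueError("quantity must be non-negative")


now = datetime.now(timezone.utc)
example_event = UsageEvent(
    event_id=uuid4(),
    tenant_id="tenant-123",
    metric_name=Metric.API_CALLS,
    quantity=Decimal("250"),
    occurred_at=now,
    ingested_at=now,
)
```

Freezing the dataclass keeps events immutable once ingested, which fits the audit requirement, though a real schema would live in the database rather than in application objects.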
Core Algorithms
Tiered Pricing Calculation
For each metric and billing period, evaluate the total quantity against the pricing tiers in ascending order. Consume each tier up to its upper bound at the tier rate, then apply the next tier rate to the remainder. Sum the tier subtotals to produce the metric line item total. Store the tier breakdown in the line_items JSONB for invoice transparency.
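The tier walk described above can be sketched in Python as follows. The `Tier` type, the function names, and the choice to express each tier's upper bound as a cumulative cap are assumptions made for illustration; the example tiers reuse the figures from the functional requirements (first 1M API calls free, next 10M at $0.001 each).

```python
from decimal import Decimal
from typing import NamedTuple, Optional


class Tier(NamedTuple):
    label: str
    upper_bound: Optional[int]  # cumulative cap in units; None means unbounded
    unit_price: Decimal


def price_tiers(quantity: int, tiers: list) -> list:
    """Return one line-item dict for each tier that the quantity reaches."""
    items = []
    lower = 0  # units already priced by earlier tiers
    for tier in tiers:
        if quantity <= lower:
            break
        cap = quantity if tier.upper_bound is None else min(quantity, tier.upper_bound)
        used = cap - lower
        items.append({
            "tier_label": tier.label,
            "quantity": used,
            "unit_price": str(tier.unit_price),
            "subtotal": str(used * tier.unit_price),
        })
        lower = cap
    if lower < quantity:
        # Assumption: usage beyond the last tier is a configuration error,
        # so fail loudly instead of silently leaving units unbilled.
        raise ValueError("usage exceeds the highest configured tier")
    return items


def metric_total(items: list) -> Decimal:
    """Sum the tier subtotals into the metric's line item total."""
    return sum((Decimal(i["subtotal"]) for i in items), Decimal("0"))


API_CALL_TIERS = [
    Tier("first 1M free", 1_000_000, Decimal("0")),
    Tier("next 10M", 11_000_000, Decimal("0.001")),
]

breakdown = price_tiers(3_500_000, API_CALL_TIERS)
total = metric_total(breakdown)  # 2,500,000 billable calls at $0.001
```

Amounts are kept as `Decimal` and serialized as strings so the breakdown can be stored in the line_items JSONB without floating-point drift.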
Idempotent Usage Ingestion
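Usage events arrive on a Kafka topic. The ingestor uses event_id as a deduplication key in Redis, stored as one key per event with a 48-hour TTL (members of a Redis set cannot expire individually, so a single set cannot provide a rolling window). On each event, atomically create the key only if it is absent (for example, SET with the NX and EX options) before updating the metered_usage_record. If the key already exists, skip and acknowledge, because the event was already counted. This prevents double-counting during consumer restarts or event replay within the 48-hour window.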
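The deduplication check might look like the sketch below. To keep it runnable without a server, a hypothetical `InMemoryDedupStore` stands in for Redis; its `claim` method models an atomic "set if absent, with expiry" operation, which in Redis corresponds to `SET key value NX EX seconds`. The `ingest` function and the shape of the event dicts are also assumptions.

```python
import time

DEDUP_TTL_SECONDS = 48 * 3600  # 48-hour deduplication window from the text


class InMemoryDedupStore:
    """Stand-in for Redis: remembers event IDs until their TTL elapses."""

    def __init__(self, clock=time.monotonic):
        self._expiry = {}    # event_id -> expiry time on the clock
        self._clock = clock  # injectable so tests can control time

    def claim(self, event_id: str, ttl: int = DEDUP_TTL_SECONDS) -> bool:
        """Return True if this is the first sighting within the TTL window."""
        now = self._clock()
        expires_at = self._expiry.get(event_id)
        if expires_at is not None and expires_at > now:
            return False  # already counted
        self._expiry[event_id] = now + ttl
        return True


def ingest(events, store, totals):
    """Add each unseen event to the running totals.

    Returns (accepted, duplicates).
    """
    accepted = duplicates = 0
    for ev in events:
        if not store.claim(ev["event_id"]):
            duplicates += 1
            continue
        key = (ev["tenant_id"], ev["metric_name"])
        totals[key] = totals.get(key, 0) + ev["quantity"]
        accepted += 1
    return accepted, duplicates
```

One caveat with this ordering: a crash after the claim but before the total is updated would drop the event, so a production design would likely need the marker and the total to be written together, or the claim to be released on failure.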
Invoice Generation
Invoice generation runs as a scheduled job at billing period close. For each tenant, read the finalized metered usage records, apply the pricing tiers, compute tax using the tenant address and a tax service integration, and write the invoice in DRAFT state. A second pass marks DRAFT invoices as FINALIZED after a human or automated review window. Finalization is a state machine transition enforced at the database level, for example by a trigger that rejects invalid status changes; a check constraint can restrict status to the allowed values but cannot compare the old and new status on its own.
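The invoice status rules can be illustrated with a small transition table. The status names come from the data model, but the exact set of allowed transitions below is an assumption inferred from the workflow (for instance, that PAID and VOID are terminal), and `transition` is a hypothetical helper rather than part of the described service.

```python
# Assumed transition table: each status maps to the statuses it may move to.
ALLOWED_TRANSITIONS = {
    "DRAFT": {"FINALIZED", "VOID"},
    "FINALIZED": {"PAID", "OVERDUE", "VOID"},
    "OVERDUE": {"PAID", "VOID"},
    "PAID": set(),  # terminal
    "VOID": set(),  # terminal
}


class InvalidTransition(Exception):
    """Raised when a status change is not in the transition table."""


def transition(current: str, target: str) -> str:
    """Return the new status, or raise if the change is not allowed."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise InvalidTransition(f"{current} -> {target} is not allowed")
    return target


status = transition("DRAFT", "FINALIZED")
```

An application-level check like this complements, but does not replace, enforcement in the database, since other writers could bypass the service.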
Dunning Workflow
- On invoice finalization, schedule the first payment attempt immediately.
- On failure, schedule retry attempts at day 3, day 7, and day 14 after the due date.
- Send a payment reminder email before each retry attempt.
- After the final retry fails, set the invoice status to OVERDUE, notify the customer success team, and trigger a tenant suspension workflow via the multi-tenant service API.
- On successful payment at any retry, cancel pending future retries and update invoice status to PAID.
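The retry schedule and the per-attempt decision above can be sketched as two small functions. The function names and the action strings are hypothetical; the offsets (day 3, 7, and 14 after the due date) and the outcomes match the workflow and the Dunning Attempt model. The sketch assumes attempt 1 is the immediate charge at finalization, so retries are numbered from 2.

```python
from datetime import date, timedelta

RETRY_OFFSETS_DAYS = (3, 7, 14)  # days after the due date, from the workflow


def retry_schedule(due_date: date, offsets=RETRY_OFFSETS_DAYS) -> list:
    """Return (attempt_number, scheduled_date) pairs for the retries."""
    return [
        (number, due_date + timedelta(days=days))
        for number, days in enumerate(offsets, start=2)
    ]


def next_action(attempt_number: int, outcome: str,
                total_attempts: int = 1 + len(RETRY_OFFSETS_DAYS)) -> str:
    """Decide what the workflow does after an attempt completes."""
    if outcome == "SUCCESS":
        return "MARK_PAID_AND_CANCEL_RETRIES"
    if attempt_number >= total_attempts:
        return "MARK_OVERDUE_AND_SUSPEND"
    return "SEND_REMINDER_AND_RETRY"


schedule = retry_schedule(date(2025, 1, 31))
```

Keeping the offsets in one tuple makes the number of attempts before suspension configurable, as the functional requirements ask.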
Scalability Design
- Partition the usage event Kafka topic by tenant_id to ensure ordered processing per tenant and allow horizontal scaling of ingestor consumers.
- Aggregate metered usage in ClickHouse for fast period-over-period reporting across millions of events without impacting the transactional invoice database.
- Use PostgreSQL advisory locks keyed on tenant_id when generating invoices to prevent concurrent generation for the same tenant.
- Enqueue dunning jobs in a durable job queue (Sidekiq with Redis persistence or AWS SQS) with visibility timeouts to handle worker failures without losing retry state.
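PostgreSQL advisory locks take integer keys, so a tenant_id that is not already a 64-bit integer has to be mapped to one. The sketch below shows one possible mapping; hashing with SHA-256 and taking the first 8 bytes is an assumption, not something the design specifies, and `advisory_lock_key` is a hypothetical helper. The SQL string uses a `%s` placeholder in the style of common Python PostgreSQL drivers and is not executed here.

```python
import hashlib
import struct


def advisory_lock_key(tenant_id: str) -> int:
    """Map a tenant ID to a stable signed 64-bit advisory lock key."""
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    (key,) = struct.unpack(">q", digest[:8])  # big-endian signed 64-bit
    return key


# pg_advisory_xact_lock blocks until the lock is free and releases it
# automatically when the surrounding transaction ends.
LOCK_SQL = "SELECT pg_advisory_xact_lock(%s)"

lock_params = (advisory_lock_key("tenant-123"),)
```

Two different tenants could in principle hash to the same key, which would only cause unnecessary serialization between them, not incorrect invoices.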
API Design
- POST /v1/usage-events — bulk-ingest usage events from platform services; returns accepted count and duplicate count.
- GET /v1/tenants/{tenant_id}/usage?period={billing_period_id} — return metered totals per metric for the period.
- GET /v1/invoices/{invoice_id} — retrieve a full invoice with line items.
- POST /v1/invoices/{invoice_id}/finalize — transition a DRAFT invoice to FINALIZED (operator action).
- POST /v1/invoices/{invoice_id}/void — void an invoice with a required reason; creates an audit log entry.
- GET /v1/tenants/{tenant_id}/invoices — paginated invoice history for a tenant.
Observability
- Track the deduplication rate (duplicates as a percentage of total ingested events) to monitor upstream pipeline health.
- Alert when metered usage for any tenant shows a gap larger than 10 minutes — missing events mean an undercharge in the current period.
- Monitor dunning success rate by attempt number; a drop in first-attempt success rate signals a payment processor issue, not a customer problem.
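The first two signals above reduce to simple calculations, sketched here. The function names are hypothetical, and the gap check assumes that a tenant's event timestamps (occurred_at) are available as a list; the 10-minute threshold is the one stated above.

```python
from datetime import datetime, timedelta


def deduplication_rate(accepted: int, duplicates: int) -> float:
    """Duplicates as a percentage of all ingested events."""
    total = accepted + duplicates
    return 0.0 if total == 0 else 100.0 * duplicates / total


def find_gaps(timestamps, max_gap=timedelta(minutes=10)) -> list:
    """Return (start, end) pairs where consecutive events are too far apart."""
    ordered = sorted(timestamps)
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > max_gap
    ]


seen = [
    datetime(2025, 1, 1, 0, 0),
    datetime(2025, 1, 1, 0, 5),
    datetime(2025, 1, 1, 0, 20),
]
gaps = find_gaps(seen)  # one gap: 00:05 to 00:20
```

A gap check like this only makes sense for tenants with steady traffic; for low-volume tenants, silence is not necessarily a pipeline fault.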