Quota Management Service: Low-Level Design
A quota management service defines resource limits per customer or user, tracks real-time usage against those limits, and enforces them at request time. The key engineering challenges are: making quota checks fast (in the hot request path), supporting hierarchical limits (org → team → user), and handling period resets cleanly.
Quota Types
- API call limits — requests per minute, per day, per month
- Storage quota — total bytes stored (files, database rows, media)
- Compute quota — CPU seconds, build minutes, GPU hours
- Seat limits — maximum number of active users per organization
- Feature-specific limits — number of projects, number of integrations, number of alerts
Hierarchical Quotas
Quotas are applied at multiple levels in a hierarchy:
Organization (10,000 API calls/day)
├── Team A (3,000 API calls/day)
│   ├── User 1 (1,000 API calls/day)
│   └── User 2 (1,000 API calls/day)
└── Team B (5,000 API calls/day)
Invariant: a child's limit cannot exceed its parent's limit. A request by User 1 consumes quota at three levels: user, team, and organization. All three must have remaining quota for the request to succeed. If any level is exhausted, the request is blocked.
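The multi-level consume can be sketched as follows, using a plain in-memory counter in place of Redis; the class and method names (`HierarchicalQuota`, `consume`, `QuotaExceeded`) are illustrative, not part of the design above.

```python
# Sketch of hierarchical quota consumption with short-circuit and rollback.
# An in-memory dict stands in for the real counter store (Redis).
class QuotaExceeded(Exception):
    def __init__(self, level):
        self.level = level

class HierarchicalQuota:
    def __init__(self, limits):
        # limits: dict mapping level name -> limit, e.g. {"org": 10000, ...}
        self.limits = limits
        self.usage = {level: 0 for level in limits}

    def consume(self, levels):
        """Increment usage at each level (user -> team -> org).

        Short-circuits at the first exhausted level and rolls back any
        increments already applied, so counters stay consistent.
        """
        applied = []
        for level in levels:
            if self.usage[level] + 1 > self.limits[level]:
                for done in applied:  # undo partial increments
                    self.usage[done] -= 1
                raise QuotaExceeded(level)
            self.usage[level] += 1
            applied.append(level)
        return True
```

Checking in child-to-parent order means the most specific (and usually tightest) limit fails first, and the rollback keeps a blocked request from consuming quota at any level.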
Quota Definition Schema
quotas(
quota_id UUID PRIMARY KEY,
entity_type ENUM('org','team','user'),
entity_id UUID NOT NULL,
resource_type VARCHAR NOT NULL, -- 'api_calls', 'storage_bytes', etc.
limit_value BIGINT NOT NULL,
period ENUM('per_minute','per_day','per_month','unlimited'),
overage_policy ENUM('block','warn','charge'),
created_at TIMESTAMP,
expires_at TIMESTAMP -- for temporary quota increases
)
Usage Tracking with Redis
Real-time usage counters live in Redis for O(1) check and increment in the hot request path:
Key pattern: quota:{entity_type}:{entity_id}:{resource_type}:{period_bucket}
Example: quota:user:usr_123:api_calls:2025-04-17
INCR quota:user:usr_123:api_calls:2025-04-17
-- Returns new count; compare against limit
The period_bucket encodes the current period: daily quotas use YYYY-MM-DD, monthly use YYYY-MM, per-minute use YYYY-MM-DDTHH:mm. Redis key TTL is set to 2x the period duration to handle clock skew at period boundaries.
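A minimal sketch of building the counter key and its TTL from the rules above; the key pattern and the 2x-period TTL follow the document, while the helper names (`quota_key`, `quota_ttl`) and the month-length approximation are assumptions.

```python
# Build the Redis key and TTL for a usage counter.
from datetime import datetime, timezone

PERIOD_FORMATS = {
    "per_minute": "%Y-%m-%dT%H:%M",
    "per_day": "%Y-%m-%d",
    "per_month": "%Y-%m",
}

# Approximate period lengths in seconds (31 days used for months).
PERIOD_SECONDS = {"per_minute": 60, "per_day": 86_400, "per_month": 31 * 86_400}

def quota_key(entity_type, entity_id, resource_type, period, now=None):
    now = now or datetime.now(timezone.utc)
    bucket = now.strftime(PERIOD_FORMATS[period])
    return f"quota:{entity_type}:{entity_id}:{resource_type}:{bucket}"

def quota_ttl(period):
    # TTL is 2x the period so keys survive clock skew at boundaries.
    return 2 * PERIOD_SECONDS[period]
```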
Atomic Check-and-Increment
The check and increment must be atomic to prevent race conditions. A Lua script executes both operations atomically on the Redis server (scripts run to completion without interleaving from other clients):
local current = redis.call('INCR', KEYS[1])
if current == 1 then
  redis.call('EXPIRE', KEYS[1], ARGV[1]) -- set TTL only on first write
end
if current > tonumber(ARGV[2]) then
  redis.call('DECR', KEYS[1]) -- roll back the increment
  return -1 -- signal: over limit
end
return current
Return value -1 means the request is over quota; any positive value means the increment succeeded and the new count is returned.
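The same semantics can be reproduced in plain Python for unit testing, with a dict standing in for Redis and lazy key expiry mimicking TTL; the `CounterStore` name and interface are illustrative.

```python
# In-memory equivalent of the atomic check-and-increment, for tests.
import time

class CounterStore:
    def __init__(self):
        self._counts = {}   # key -> current count
        self._expiry = {}   # key -> absolute expiry timestamp

    def check_and_increment(self, key, limit, ttl_seconds, now=None):
        now = time.time() if now is None else now
        # Expire the key lazily, mimicking Redis TTL.
        if key in self._expiry and now >= self._expiry[key]:
            self._counts.pop(key, None)
            self._expiry.pop(key, None)
        current = self._counts.get(key, 0) + 1
        if current == 1:
            self._expiry[key] = now + ttl_seconds  # TTL on first write
        if current > limit:
            return -1  # over limit; stored count left unchanged
        self._counts[key] = current
        return current
```

Note the store checks before writing rather than incrementing and rolling back, but the observable behavior matches the Lua script: -1 when over quota, otherwise the new count.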
Period Reset
Quota counters reset at the end of each billing period:
- Daily quotas — key includes date; a new key is automatically created the next day. Old keys expire via TTL.
- Monthly quotas — key includes year-month; resets on the first of each month (or billing anniversary date)
- Per-minute quotas — key includes minute; automatically expires after 2 minutes
No scheduled job is needed for most cases — the key expiry handles reset. For billing anniversary dates that differ per customer, a scheduled job resets the customer's Redis key at midnight of their anniversary.
Soft vs Hard Enforcement
- Hard limit — request is rejected with HTTP 429 (rate limit) or 403 (quota exceeded) when the limit is reached. No exceptions.
- Soft limit — requests are allowed to exceed the limit; warnings are sent at 80%, 90%, and 100% of the quota. Used for paid tiers where overage is billed rather than blocked.
- Warn policy — same as soft, but no billing; purely informational. Used for storage quotas on free tiers where abrupt blocking would cause data loss.
The overage_policy field in the quota definition controls which enforcement mode applies.
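A sketch of dispatching on overage_policy; the policy values and warning thresholds follow the document, while the `Decision` shape and `enforce` function are illustrative, and real code would track which thresholds have already fired to avoid repeat warnings.

```python
# Map overage_policy plus current usage to an enforcement decision.
from dataclasses import dataclass

WARN_THRESHOLDS = (0.8, 0.9, 1.0)

@dataclass
class Decision:
    allow: bool
    warn: bool = False
    bill_overage: bool = False

def enforce(policy, usage, limit):
    crossed = any(usage / limit >= t for t in WARN_THRESHOLDS)
    if policy == "block":      # hard limit: reject once the limit is reached
        return Decision(allow=usage <= limit)
    if policy == "charge":     # soft limit: allow, bill units above the limit
        return Decision(allow=True, warn=crossed, bill_overage=usage > limit)
    if policy == "warn":       # informational only, never blocks or bills
        return Decision(allow=True, warn=crossed)
    raise ValueError(f"unknown policy: {policy}")
```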
Quota Check in the Request Path
Middleware intercepts every API request and runs the quota check before handing off to the handler:
- Extract entity identifiers from the authenticated request (user_id, team_id, org_id)
- Look up applicable quotas from the quota cache (Redis or in-memory, refreshed every 60 seconds)
- Run atomic check-and-increment Lua script for each level of the hierarchy
- If any level returns -1, return 429 with a Retry-After header and X-RateLimit-Limit / X-RateLimit-Remaining headers
- On success, proceed to the handler
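The steps above can be sketched as a framework-agnostic middleware, using plain dicts for the request, response, and counter store; all names here are illustrative, and a production version would call the Lua script instead of mutating local counters.

```python
# Quota-check middleware wrapping a request handler.
def quota_middleware(handler, limits, counters):
    """limits:   dict mapping entity key (e.g. "user:usr_123") -> limit
       counters: mutable dict of current usage per entity key"""
    def wrapped(request):
        applied = []
        # Check user -> team -> org, short-circuiting at the first failure.
        for key in (request["user"], request["team"], request["org"]):
            counters[key] = counters.get(key, 0) + 1
            if counters[key] > limits[key]:
                counters[key] -= 1
                for k in applied:        # undo increments at lower levels
                    counters[k] -= 1
                return {
                    "status": 429,
                    "headers": {
                        "Retry-After": "60",
                        "X-RateLimit-Limit": str(limits[key]),
                        "X-RateLimit-Remaining": "0",
                    },
                }
            applied.append(key)
        return handler(request)
    return wrapped
```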
Quota Override Workflow
Administrators can grant temporary quota increases:
- Create a new row in the quotas table with a higher limit and an expires_at timestamp
- The override row takes precedence over the base quota (highest limit wins, or most specific entity wins)
- When the override expires, the base quota automatically applies again
- Override grants are logged in an admin audit trail
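Resolving the effective limit under the "highest limit wins" rule can be sketched as below, assuming quota rows are dicts shaped like the quotas table; the `effective_limit` name is illustrative.

```python
# Pick the applicable limit: drop expired overrides, highest limit wins.
from datetime import datetime, timezone

def effective_limit(rows, now=None):
    now = now or datetime.now(timezone.utc)
    active = [
        r for r in rows
        if r.get("expires_at") is None or r["expires_at"] > now
    ]
    if not active:
        raise LookupError("no active quota defined")
    return max(r["limit_value"] for r in active)
```

Because expired overrides are filtered out at read time, the base quota automatically applies again once expires_at passes, with no cleanup job required.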
Quota Aggregation for Reporting and Billing
Redis counters are the source of truth for real-time enforcement, but they are not durable for billing. A periodic job (every 15 minutes) snapshots Redis counters to a relational table:
usage_snapshots(entity_id, resource_type, period, usage_count, snapshotted_at)
The billing system reads from this table to compute invoices. For overage billing, usage above the plan limit is summed at month end and priced per unit.
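Month-end overage pricing from the snapshot table can be sketched as follows, assuming counters are cumulative within a period so the largest snapshot is the period total; the function name and per-unit pricing scheme are illustrative.

```python
# Compute the overage charge for one entity/resource/period.
def overage_charge(snapshots, plan_limit, unit_price):
    """snapshots: list of (snapshotted_at, usage_count) rows for the period.

    Counters are cumulative, so the maximum usage_count is the final
    usage for the period; only units above the plan limit are billed.
    """
    if not snapshots:
        return 0.0
    final_usage = max(count for _, count in snapshots)
    overage_units = max(0, final_usage - plan_limit)
    return overage_units * unit_price
```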
Inherited Quotas
If no explicit quota is defined for a child entity, it inherits the parent's limit. The lookup order: user-specific quota → team quota → org quota → plan default quota. The first match wins. This allows setting org-wide defaults without having to define quotas for every user individually.
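The fallback chain above amounts to a first-match lookup; a minimal sketch, assuming the quota store is a dict keyed by (entity_type, entity_id, resource_type), with all names illustrative:

```python
# Resolve a quota by walking user -> team -> org -> plan default.
def resolve_quota(store, resource_type, user_id, team_id, org_id, plan_default):
    """First match wins; plan default applies when nothing is defined."""
    for entity_type, entity_id in (("user", user_id),
                                   ("team", team_id),
                                   ("org", org_id)):
        limit = store.get((entity_type, entity_id, resource_type))
        if limit is not None:
            return limit
    return plan_default
```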
Summary
The quota management service combines hierarchical quota definitions, atomic Redis check-and-increment for O(1) enforcement in the request path, period-based key expiry for resets, soft vs hard enforcement policies, temporary override workflows, and periodic snapshot aggregation for billing — providing flexible, scalable resource governance across all customer tiers.