Quota Management Service: Low-Level Design
A quota management service defines resource limits per customer or user, tracks real-time usage against those limits, and enforces them at request time. The key engineering challenges are: making quota checks fast (in the hot request path), supporting hierarchical limits (org → team → user), and handling period resets cleanly.
Quota Types
- API call limits — requests per minute, per day, per month
- Storage quota — total bytes stored (files, database rows, media)
- Compute quota — CPU seconds, build minutes, GPU hours
- Seat limits — maximum number of active users per organization
- Feature-specific limits — number of projects, number of integrations, number of alerts
Hierarchical Quotas
Quotas are applied at multiple levels in a hierarchy:
Organization (10,000 API calls/day)
├── Team A (3,000 API calls/day)
│   ├── User 1 (1,000 API calls/day)
│   └── User 2 (1,000 API calls/day)
└── Team B (5,000 API calls/day)
Invariant: a child's limit cannot exceed its parent's limit. A request by User 1 consumes quota at three levels: user, team, and organization. All three must have remaining quota for the request to succeed. If any level is exhausted, the request is blocked.
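The multi-level consume can be sketched as follows, using a plain in-memory counter in place of Redis; the class and method names (`HierarchicalQuota`, `consume`, `QuotaExceeded`) are illustrative, not part of the design above.

```python
# Sketch of hierarchical quota consumption with short-circuit and rollback.
# An in-memory dict stands in for the real counter store (Redis).
class QuotaExceeded(Exception):
    def __init__(self, level):
        self.level = level

class HierarchicalQuota:
    def __init__(self, limits):
        # limits: dict mapping level name -> limit, e.g. {"org": 10000, ...}
        self.limits = limits
        self.usage = {level: 0 for level in limits}

    def consume(self, levels):
        """Increment usage at each level (user -> team -> org).

        Short-circuits at the first exhausted level and rolls back any
        increments already applied, so counters stay consistent.
        """
        applied = []
        for level in levels:
            if self.usage[level] + 1 > self.limits[level]:
                for done in applied:  # undo partial increments
                    self.usage[done] -= 1
                raise QuotaExceeded(level)
            self.usage[level] += 1
            applied.append(level)
        return True
```

Checking in child-to-parent order means the most specific (and usually tightest) limit fails first, and the rollback keeps a blocked request from consuming quota at any level.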
Quota Definition Schema
quotas(
quota_id UUID PRIMARY KEY,
entity_type ENUM('org','team','user'),
entity_id UUID NOT NULL,
resource_type VARCHAR NOT NULL, -- 'api_calls', 'storage_bytes', etc.
limit_value BIGINT NOT NULL,
period ENUM('per_minute','per_day','per_month','unlimited'),
overage_policy ENUM('block','warn','charge'),
created_at TIMESTAMP,
expires_at TIMESTAMP -- for temporary quota increases
)
Usage Tracking with Redis
Real-time usage counters live in Redis for O(1) check and increment in the hot request path:
Key pattern: quota:{entity_type}:{entity_id}:{resource_type}:{period_bucket}
Example: quota:user:usr_123:api_calls:2025-04-17
INCR quota:user:usr_123:api_calls:2025-04-17
-- Returns new count; compare against limit
The period_bucket encodes the current period: daily quotas use YYYY-MM-DD, monthly use YYYY-MM, per-minute use YYYY-MM-DDTHH:mm. Redis key TTL is set to 2x the period duration to handle clock skew at period boundaries.
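A minimal sketch of building the counter key and its TTL from the rules above; the key pattern and the 2x-period TTL follow the document, while the helper names (`quota_key`, `quota_ttl`) and the month-length approximation are assumptions.

```python
# Build the Redis key and TTL for a usage counter.
from datetime import datetime, timezone

PERIOD_FORMATS = {
    "per_minute": "%Y-%m-%dT%H:%M",
    "per_day": "%Y-%m-%d",
    "per_month": "%Y-%m",
}

# Approximate period lengths in seconds (31 days used for months).
PERIOD_SECONDS = {"per_minute": 60, "per_day": 86_400, "per_month": 31 * 86_400}

def quota_key(entity_type, entity_id, resource_type, period, now=None):
    now = now or datetime.now(timezone.utc)
    bucket = now.strftime(PERIOD_FORMATS[period])
    return f"quota:{entity_type}:{entity_id}:{resource_type}:{bucket}"

def quota_ttl(period):
    # TTL is 2x the period so keys survive clock skew at boundaries.
    return 2 * PERIOD_SECONDS[period]
```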
Atomic Check-and-Increment
The check and increment must be atomic to prevent race conditions. A Lua script executes both operations atomically on the Redis server (scripts run to completion without interleaving from other clients):
local current = redis.call('INCR', KEYS[1])
if current == 1 then
  redis.call('EXPIRE', KEYS[1], ARGV[1]) -- set TTL only on first write
end
if current > tonumber(ARGV[2]) then
  redis.call('DECR', KEYS[1]) -- roll back the increment
  return -1 -- signal: over limit
end
return current
Return value -1 means the request is over quota; any positive value means the increment succeeded and the new count is returned.
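The same semantics can be reproduced in plain Python for unit testing, with a dict standing in for Redis and lazy key expiry mimicking TTL; the `CounterStore` name and interface are illustrative.

```python
# In-memory equivalent of the atomic check-and-increment, for tests.
import time

class CounterStore:
    def __init__(self):
        self._counts = {}   # key -> current count
        self._expiry = {}   # key -> absolute expiry timestamp

    def check_and_increment(self, key, limit, ttl_seconds, now=None):
        now = time.time() if now is None else now
        # Expire the key lazily, mimicking Redis TTL.
        if key in self._expiry and now >= self._expiry[key]:
            self._counts.pop(key, None)
            self._expiry.pop(key, None)
        current = self._counts.get(key, 0) + 1
        if current == 1:
            self._expiry[key] = now + ttl_seconds  # TTL on first write
        if current > limit:
            return -1  # over limit; stored count left unchanged
        self._counts[key] = current
        return current
```

Note the store checks before writing rather than incrementing and rolling back, but the observable behavior matches the Lua script: -1 when over quota, otherwise the new count.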
Period Reset
Quota counters reset at the end of each billing period:
- Daily quotas — key includes date; a new key is automatically created the next day. Old keys expire via TTL.
- Monthly quotas — key includes year-month; resets on the first of each month (or billing anniversary date)
- Per-minute quotas — key includes minute; automatically expires after 2 minutes
No scheduled job is needed for most cases — the key expiry handles reset. For billing anniversary dates that differ per customer, a scheduled job resets the customer's Redis key at midnight of their anniversary.
Soft vs Hard Enforcement
- Hard limit — request is rejected with HTTP 429 (rate limit) or 403 (quota exceeded) when the limit is reached. No exceptions.
- Soft limit — requests are allowed to exceed the limit; warnings are sent at 80%, 90%, and 100% of the quota. Used for paid tiers where overage is billed rather than blocked.
- Warn policy — same as soft, but no billing; purely informational. Used for storage quotas on free tiers where abrupt blocking would cause data loss.
The overage_policy field in the quota definition controls which enforcement mode applies.
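A sketch of dispatching on overage_policy; the policy values and warning thresholds follow the document, while the `Decision` shape and `enforce` function are illustrative, and real code would track which thresholds have already fired to avoid repeat warnings.

```python
# Map overage_policy plus current usage to an enforcement decision.
from dataclasses import dataclass

WARN_THRESHOLDS = (0.8, 0.9, 1.0)

@dataclass
class Decision:
    allow: bool
    warn: bool = False
    bill_overage: bool = False

def enforce(policy, usage, limit):
    crossed = any(usage / limit >= t for t in WARN_THRESHOLDS)
    if policy == "block":      # hard limit: reject once the limit is reached
        return Decision(allow=usage <= limit)
    if policy == "charge":     # soft limit: allow, bill units above the limit
        return Decision(allow=True, warn=crossed, bill_overage=usage > limit)
    if policy == "warn":       # informational only, never blocks or bills
        return Decision(allow=True, warn=crossed)
    raise ValueError(f"unknown policy: {policy}")
```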
Quota Check in the Request Path
Middleware intercepts every API request and runs the quota check before handing off to the handler:
- Extract entity identifiers from the authenticated request (user_id, team_id, org_id)
- Look up applicable quotas from the quota cache (Redis or in-memory, refreshed every 60 seconds)
- Run atomic check-and-increment Lua script for each level of the hierarchy
- If any level returns -1, return 429 with a Retry-After header and X-RateLimit-Limit / X-RateLimit-Remaining headers
- On success, proceed to the handler
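The steps above can be sketched as a framework-agnostic middleware, using plain dicts for the request, response, and counter store; all names here are illustrative, and a production version would call the Lua script instead of mutating local counters.

```python
# Quota-check middleware wrapping a request handler.
def quota_middleware(handler, limits, counters):
    """limits:   dict mapping entity key (e.g. "user:usr_123") -> limit
       counters: mutable dict of current usage per entity key"""
    def wrapped(request):
        applied = []
        # Check user -> team -> org, short-circuiting at the first failure.
        for key in (request["user"], request["team"], request["org"]):
            counters[key] = counters.get(key, 0) + 1
            if counters[key] > limits[key]:
                counters[key] -= 1
                for k in applied:        # undo increments at lower levels
                    counters[k] -= 1
                return {
                    "status": 429,
                    "headers": {
                        "Retry-After": "60",
                        "X-RateLimit-Limit": str(limits[key]),
                        "X-RateLimit-Remaining": "0",
                    },
                }
            applied.append(key)
        return handler(request)
    return wrapped
```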
Quota Override Workflow
Administrators can grant temporary quota increases:
- Create a new row in the quotas table with a higher limit and an expires_at timestamp
- The override row takes precedence over the base quota (highest limit wins, or most specific entity wins)
- When the override expires, the base quota automatically applies again
- Override grants are logged in an admin audit trail
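Resolving the effective limit under the "highest limit wins" rule can be sketched as below, assuming quota rows are dicts shaped like the quotas table; the `effective_limit` name is illustrative.

```python
# Pick the applicable limit: drop expired overrides, highest limit wins.
from datetime import datetime, timezone

def effective_limit(rows, now=None):
    now = now or datetime.now(timezone.utc)
    active = [
        r for r in rows
        if r.get("expires_at") is None or r["expires_at"] > now
    ]
    if not active:
        raise LookupError("no active quota defined")
    return max(r["limit_value"] for r in active)
```

Because expired overrides are filtered out at read time, the base quota automatically applies again once expires_at passes, with no cleanup job required.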
Quota Aggregation for Reporting and Billing
Redis counters are the source of truth for real-time enforcement, but they are not durable for billing. A periodic job (every 15 minutes) snapshots Redis counters to a relational table:
usage_snapshots(entity_id, resource_type, period, usage_count, snapshotted_at)
The billing system reads from this table to compute invoices. For overage billing, usage above the plan limit is summed at month end and priced per unit.
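Month-end overage pricing from the snapshot table can be sketched as follows, assuming counters are cumulative within a period so the largest snapshot is the period total; the function name and per-unit pricing scheme are illustrative.

```python
# Compute the overage charge for one entity/resource/period.
def overage_charge(snapshots, plan_limit, unit_price):
    """snapshots: list of (snapshotted_at, usage_count) rows for the period.

    Counters are cumulative, so the maximum usage_count is the final
    usage for the period; only units above the plan limit are billed.
    """
    if not snapshots:
        return 0.0
    final_usage = max(count for _, count in snapshots)
    overage_units = max(0, final_usage - plan_limit)
    return overage_units * unit_price
```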
Inherited Quotas
If no explicit quota is defined for a child entity, it inherits the parent's limit. The lookup order: user-specific quota → team quota → org quota → plan default quota. The first match wins. This allows setting org-wide defaults without having to define quotas for every user individually.
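The fallback chain above amounts to a first-match lookup; a minimal sketch, assuming the quota store is a dict keyed by (entity_type, entity_id, resource_type), with all names illustrative:

```python
# Resolve a quota by walking user -> team -> org -> plan default.
def resolve_quota(store, resource_type, user_id, team_id, org_id, plan_default):
    """First match wins; plan default applies when nothing is defined."""
    for entity_type, entity_id in (("user", user_id),
                                   ("team", team_id),
                                   ("org", org_id)):
        limit = store.get((entity_type, entity_id, resource_type))
        if limit is not None:
            return limit
    return plan_default
```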
Summary
The quota management service combines hierarchical quota definitions, atomic Redis check-and-increment for O(1) enforcement in the request path, period-based key expiry for resets, soft vs hard enforcement policies, temporary override workflows, and periodic snapshot aggregation for billing — providing flexible, scalable resource governance across all customer tiers.