API key management handles the full lifecycle of programmatic access credentials: creation, scoped authorization, usage tracking, rotation, and revocation. API keys authenticate machine-to-machine requests where interactive OAuth flows are impractical. The design challenges are storing keys securely (you should not be able to recover a key after creation), enforcing per-key rate limits and scope restrictions, and supporting rotation without downtime.
Core Data Model
CREATE TABLE ApiKey (
    key_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id BIGINT NOT NULL REFERENCES User(id),
    name VARCHAR(100) NOT NULL,            -- human label: "production key", "CI/CD"
    key_prefix VARCHAR(8) NOT NULL,        -- 8-char identifier, shown in UI
    key_hash VARCHAR(64) NOT NULL UNIQUE,  -- SHA-256 of full key (64 hex chars)
    scopes TEXT[] NOT NULL DEFAULT '{}',   -- ['read:users', 'write:orders']
    rate_limit_rpm INT NOT NULL DEFAULT 1000,  -- requests per minute
    expires_at TIMESTAMPTZ,                -- NULL = no expiry
    last_used_at TIMESTAMPTZ,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    revoked_at TIMESTAMPTZ,                -- NULL = not revoked
    revoke_reason TEXT
);
CREATE INDEX idx_apikey_user ON ApiKey(user_id) WHERE revoked_at IS NULL;
CREATE INDEX idx_apikey_hash ON ApiKey(key_hash) WHERE revoked_at IS NULL;
-- Usage log for billing and audit
CREATE TABLE ApiKeyUsage (
    id BIGSERIAL,
    key_id UUID NOT NULL,
    endpoint VARCHAR(255) NOT NULL,
    status_code INT NOT NULL,
    latency_ms INT,
    ip_address INET,
    occurred_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    -- On a partitioned table, PostgreSQL requires the primary key
    -- to include the partition key column
    PRIMARY KEY (id, occurred_at)
) PARTITION BY RANGE (occurred_at);
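A range-partitioned table accepts rows only once a partition covering their timestamp exists. The following is a minimal sketch of a helper that builds the DDL for one monthly partition; the helper name, the monthly granularity, and the `ApiKeyUsage_YYYY_MM` naming scheme are assumptions made for illustration, not part of the schema above.

```python
from datetime import date


def monthly_partition_ddl(year: int, month: int, parent: str = "ApiKeyUsage") -> str:
    """Build the CREATE TABLE statement for one monthly partition (hypothetical helper)."""
    start = date(year, month, 1)
    # First day of the following month; range partition upper bounds are exclusive.
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    name = f"{parent}_{start:%Y_%m}"
    return (
        f"CREATE TABLE IF NOT EXISTS {name} PARTITION OF {parent} "
        f"FOR VALUES FROM ('{start.isoformat()}') TO ('{end.isoformat()}');"
    )


if __name__ == "__main__":
    print(monthly_partition_ddl(2025, 12))
```

A scheduled job could run the generated statement ahead of each month so that inserts never arrive before their partition exists.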
Key Generation and Storage
import hashlib
import secrets

# `db` is assumed to be an application-level database helper defined elsewhere.

def create_api_key(user_id: int, name: str, scopes: list[str],
                   rate_limit_rpm: int = 1000) -> dict:
    """
    Returns the full key ONCE; it is never stored in plaintext.
    After this call, only the hash and prefix are retained.
    """
    # Format: prefix_randompart e.g., "sk_live_aB3kR9xZmQpLnT7vWq2Y"
    prefix = "sk_live"
    random_part = secrets.token_urlsafe(32)  # 32 random bytes = 256 bits of entropy
    full_key = f"{prefix}_{random_part}"
    # full_key[:8] would always be "sk_live_" and identify nothing, so the
    # display prefix is taken from the random part (shown in UI to tell keys apart).
    key_prefix = random_part[:8]
    key_hash = hashlib.sha256(full_key.encode()).hexdigest()
    key_id = db.fetchone("""
        INSERT INTO ApiKey (user_id, name, key_prefix, key_hash, scopes, rate_limit_rpm)
        VALUES (%s, %s, %s, %s, %s, %s)
        RETURNING key_id
    """, [user_id, name, key_prefix, key_hash, scopes, rate_limit_rpm])['key_id']
    return {
        'key_id': str(key_id),
        'key': full_key,  # shown to user ONCE, never again
        'prefix': key_prefix,
        'message': 'Store this key securely. It will not be shown again.'
    }
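The hash-only storage scheme can be exercised without a database. The sketch below is self-contained and makes a few assumptions: `generate_key` and `verify_key` are hypothetical names, the display prefix is taken from the random part of the key, and the comparison uses `hmac.compare_digest` as a constant-time check, which the original flow does not mention.

```python
import hashlib
import hmac
import secrets


def generate_key(tag: str = "sk_live") -> tuple[str, str, str]:
    """Return (full_key, display_prefix, key_hash); only the last two would be stored."""
    random_part = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe base64
    full_key = f"{tag}_{random_part}"
    display_prefix = random_part[:8]  # assumption: identify keys by their random part
    key_hash = hashlib.sha256(full_key.encode()).hexdigest()
    return full_key, display_prefix, key_hash


def verify_key(presented_key: str, stored_hash: str) -> bool:
    """Hash the presented key and compare it to the stored hash in constant time."""
    presented_hash = hashlib.sha256(presented_key.encode()).hexdigest()
    return hmac.compare_digest(presented_hash, stored_hash)


if __name__ == "__main__":
    full_key, display_prefix, key_hash = generate_key()
    print(display_prefix, verify_key(full_key, key_hash))
```

Because the stored value is a one-way hash, a lost key cannot be recovered from the database; the only remedy is to issue a new one.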
Request Authentication and Rate Limiting
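import hashlib
import json
import time
from datetime import datetime, timezone

# `db`, `redis`, `usage_queue`, and the error classes are assumed to be
# application-level objects defined elsewhere.

def authenticate_request(api_key_raw: str, required_scope: str,
                         endpoint: str, ip: str) -> dict:
    key_hash = hashlib.sha256(api_key_raw.encode()).hexdigest()
    # Cache lookups: most keys are valid and hot
    cache_key = f"apikey:{key_hash}"
    cached = redis.get(cache_key)
    if cached:
        key_data = json.loads(cached)
    else:
        row = db.fetchone("""
            SELECT key_id, user_id, scopes, rate_limit_rpm, expires_at
            FROM ApiKey
            WHERE key_hash = %s AND revoked_at IS NULL
        """, [key_hash])
        if not row:
            raise UnauthorizedError("Invalid API key")
        # Normalize to JSON-friendly types so cached and fresh data look the same
        key_data = {
            'key_id': str(row['key_id']),
            'user_id': row['user_id'],
            'scopes': list(row['scopes'] or []),
            'rate_limit_rpm': row['rate_limit_rpm'],
            'expires_at': row['expires_at'].isoformat() if row['expires_at'] else None,
        }
        # Cache for 5 minutes; revocation takes up to 5 min to propagate
        redis.setex(cache_key, 300, json.dumps(key_data))
    # Check expiry on every request, including cache hits. TIMESTAMPTZ values
    # are timezone-aware, so compare against an aware "now".
    expires_at = key_data['expires_at']
    if expires_at and datetime.now(timezone.utc) > datetime.fromisoformat(expires_at):
        raise UnauthorizedError("API key expired")
    # Check scope
    if required_scope not in (key_data['scopes'] or []):
        raise ForbiddenError(f"Key lacks required scope: {required_scope}")
    # Per-key rate limiting: fixed one-minute window counter in Redis
    rate_key = f"rate:{key_data['key_id']}:{int(time.time()) // 60}"
    count = redis.incr(rate_key)
    redis.expire(rate_key, 120)  # keep the counter a little longer than its 60 s window
    if count > key_data['rate_limit_rpm']:
        raise RateLimitError(f"Rate limit exceeded: {key_data['rate_limit_rpm']} RPM")
    # Async usage logging (non-blocking)
    log_usage_async(key_data['key_id'], endpoint, ip)
    return key_data

def log_usage_async(key_id: str, endpoint: str, ip: str):
    """Fire-and-forget usage log update."""
    # Write coalescing for last_used_at: SET NX succeeds only when no marker
    # exists, i.e. last_used_at was not refreshed in the last 60 seconds.
    touch_last_used = bool(
        redis.set(f"last_used:{key_id}", int(time.time()), ex=60, nx=True))
    # `touch_last_used` is a hypothetical flag telling the worker whether to
    # also update ApiKey.last_used_at; the usage row is logged either way.
    usage_queue.enqueue('flush_key_usage', key_id=key_id, endpoint=endpoint,
                        ip=ip, touch_last_used=touch_last_used)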
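The Redis counter above is keyed by the minute number, which makes it a fixed-window counter: a caller can send a full quota at the end of one minute and another full quota at the start of the next. For comparison, here is a minimal in-memory sketch of a true sliding-window limiter. The class name and the injectable `clock` parameter are assumptions made for illustration; a production version would keep the timestamps in a shared store (for example a Redis sorted set) rather than in process memory.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `limit` requests per key in any trailing `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so the limiter can be tested deterministically
        self._hits: dict[str, deque] = {}

    def allow(self, key_id: str) -> bool:
        now = self.clock()
        hits = self._hits.setdefault(key_id, deque())
        # Drop timestamps that have fallen out of the trailing window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # rejected requests are not recorded
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(limit=2, window_seconds=60)
    print([limiter.allow("key-1") for _ in range(3)])
```

The trade-off is memory: the sliding log stores one timestamp per accepted request, while the fixed-window counter stores a single integer per key per minute.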
Key Rotation Without Downtime
from datetime import datetime, timedelta, timezone

def rotate_api_key(key_id: str, user_id: int) -> dict:
    """
    Issue a new key while keeping the old one valid for a grace period.
    Allows callers to update their key without downtime.
    """
    old_key = db.fetchone("""
        SELECT * FROM ApiKey WHERE key_id=%s AND user_id=%s AND revoked_at IS NULL
    """, [key_id, user_id])
    if not old_key:
        raise NotFoundError("Key not found")
    # Issue new key with same config
    new_key_result = create_api_key(
        user_id=user_id,
        name=f"{old_key['name']} (rotated)",
        scopes=old_key['scopes'],
        rate_limit_rpm=old_key['rate_limit_rpm']
    )
    # Let the old key lapse after the grace period (24 hours). Use expires_at,
    # not revoked_at: lookups filter on revoked_at IS NULL, so writing any
    # revoked_at value (even a future one) would reject the old key immediately.
    revoke_at = datetime.now(timezone.utc) + timedelta(hours=24)
    db.execute("""
        UPDATE ApiKey SET expires_at = LEAST(COALESCE(expires_at, %s), %s)
        WHERE key_id=%s
    """, [revoke_at, revoke_at, key_id])  # never extends an earlier expiry
    # Drop the cached entry so the new expiry is picked up on the next lookup
    redis.delete(f"apikey:{old_key['key_hash']}")
    return {
        'new_key': new_key_result['key'],
        'old_key_revokes_at': revoke_at.isoformat(),
        'message': 'Update your integration within 24 hours'
    }
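Because authentication only accepts rows where revoked_at IS NULL, one way to model the grace period is as an expiry deadline rather than a revocation. The sketch below shows that rule in isolation; `KeyRecord`, `is_usable`, and `start_rotation` are hypothetical names, and the record is an in-memory stand-in for an ApiKey row.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class KeyRecord:
    """In-memory stand-in for the fields of an ApiKey row that affect validity."""
    key_id: str
    expires_at: Optional[datetime] = None
    revoked_at: Optional[datetime] = None


def is_usable(key: KeyRecord, now: datetime) -> bool:
    """A key works if it was never revoked and has not reached its expiry."""
    if key.revoked_at is not None:
        return False
    return key.expires_at is None or now < key.expires_at


def start_rotation(old: KeyRecord, now: datetime,
                   grace: timedelta = timedelta(hours=24)) -> None:
    """Give the old key a deadline without ever extending an earlier expiry."""
    deadline = now + grace
    if old.expires_at is None or deadline < old.expires_at:
        old.expires_at = deadline


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    old = KeyRecord("old-key")
    start_rotation(old, now)
    print(is_usable(old, now + timedelta(hours=1)),
          is_usable(old, now + timedelta(hours=25)))
```

During the grace period both the old and the new key pass `is_usable`, which is what lets callers switch over without a failed request.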
Key Interview Points
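- Never store raw API keys; store a SHA-256 hash. If the database is breached, attackers get only hashes, which cannot be used to authenticate. A fast, unsalted hash is acceptable here because each key carries 256 bits of random entropy, so it cannot be brute-forced the way a hashed password can. The raw key is shown once at creation.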
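- The key_prefix field (8 characters stored in plaintext) lets users identify which key made a request in logs without exposing the full key. It is safe to display in UI and audit logs, but it only distinguishes keys if it includes some of the key's random characters rather than just the fixed sk_live_ tag.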
- Cache key lookups in Redis with a short TTL (5 minutes) — authentication happens on every request; a DB query per request is too expensive at scale. Revocation takes up to one TTL to propagate.
- Per-key rate limiting isolates misbehaving callers — one runaway integration cannot impact other users’ rate limits.
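- Rotation with a grace period allows zero-downtime key updates: callers have 24 hours to switch to the new key before the old one stops working. The old key must keep passing authentication during that window, so the deadline cannot live in revoked_at while lookups filter on revoked_at IS NULL.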
- Scopes limit blast radius: a key with only the read:reports scope cannot write data even if stolen. Always require callers to request the minimum scope needed.
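The cache point above implies that a revoked key can keep working for up to one TTL unless the revocation path also evicts the cached entry. Below is a minimal sketch of that eviction, using plain dictionaries as hypothetical stand-ins for the ApiKey table and the Redis cache; `lookup_key` and `revoke_key` are illustrative names, not functions from the code above.

```python
from datetime import datetime, timezone
from typing import Optional

# In-memory stand-ins (assumptions for this sketch): rows keyed by key_hash,
# and a cache keyed the same way as the Redis entries ("apikey:<hash>").
db_rows: dict[str, dict] = {}
cache: dict[str, dict] = {}


def lookup_key(key_hash: str) -> Optional[dict]:
    """Cache-first lookup that ignores revoked rows."""
    cache_key = f"apikey:{key_hash}"
    if cache_key in cache:
        return cache[cache_key]
    row = db_rows.get(key_hash)
    if row is None or row["revoked_at"] is not None:
        return None
    cache[cache_key] = dict(row)  # store a copy, as a real cache would
    return cache[cache_key]


def revoke_key(key_hash: str, reason: str) -> bool:
    """Mark the row revoked and evict the cache entry so it takes effect at once."""
    row = db_rows.get(key_hash)
    if row is None or row["revoked_at"] is not None:
        return False
    row["revoked_at"] = datetime.now(timezone.utc)
    row["revoke_reason"] = reason
    cache.pop(f"apikey:{key_hash}", None)
    return True


if __name__ == "__main__":
    db_rows["h1"] = {"key_id": "k1", "revoked_at": None, "revoke_reason": None}
    print(lookup_key("h1") is not None, revoke_key("h1", "leaked"), lookup_key("h1"))
```

Explicit eviction makes revocation immediate on the happy path, while the short TTL remains as a backstop if the eviction call fails.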