Authentication is one of the most common low-level design questions in system design interviews, and one of the features most often implemented incorrectly in production. This guide covers the full stack: password storage, token lifecycle, session management, OAuth2, and MFA, with schema and pseudocode for each component.
Password Storage
Never store plaintext passwords. Never store plain MD5 or SHA256 hashes either: they are fast to compute, which makes brute force cheap, and when unsalted they are also exposed to precomputed rainbow tables. Use a slow, salted, purpose-built password hashing function.
bcrypt
bcrypt is a long-standing industry standard. It generates and embeds a salt automatically and has a tunable cost factor; each increment of the cost doubles the computation time.
import bcrypt

# Hashing at registration
def hash_password(plaintext: str) -> bytes:
    cost = 12  # 2^12 = 4096 rounds, roughly 250ms on modern hardware
    salt = bcrypt.gensalt(rounds=cost)
    return bcrypt.hashpw(plaintext.encode(), salt)

# Verification at login
def verify_password(plaintext: str, hashed: bytes) -> bool:
    return bcrypt.checkpw(plaintext.encode(), hashed)
Cost factor 12 is a reasonable baseline in 2024. As hardware gets faster, increase it to keep verification time around 100-300ms. bcrypt stores the salt inside the hash string, so you only store one field.
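Raising the cost factor only affects newly created hashes, so existing rows need to be upgraded opportunistically. The following is a minimal sketch, assuming the standard bcrypt string layout ($2b$<cost>$<salt+digest>); the helper names bcrypt_cost and needs_rehash are hypothetical and not part of the bcrypt package.

```python
def bcrypt_cost(hashed: bytes) -> int:
    """Extract the cost factor from a bcrypt hash such as b'$2b$12$...'."""
    parts = hashed.decode("ascii").split("$")
    # Expected layout: ['', '2b', '12', '<salt and digest>']
    if len(parts) != 4 or not parts[1].startswith("2"):
        raise ValueError("not a bcrypt hash")
    return int(parts[2])


def needs_rehash(hashed: bytes, target_cost: int = 12) -> bool:
    """True when the stored hash was created with a lower cost than the target."""
    return bcrypt_cost(hashed) < target_cost
```

A login handler could call needs_rehash right after a successful verification, while the plaintext is still in memory, and store a fresh hash at the new cost.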
Argon2, winner of the Password Hashing Competition (2015), is the modern alternative. It is memory-hard, which makes GPU attacks more expensive. In Python, use argon2-cffi. Prefer the Argon2id variant, which resists both side-channel and GPU attacks.
JWT Access Tokens
JSON Web Tokens (JWTs) allow stateless authentication. The server signs a token at login; subsequent requests carry the token, and the server verifies the signature without a database lookup.
Structure
A JWT is three base64url-encoded parts joined by dots: header.payload.signature.
// Header
{"alg": "HS256", "typ": "JWT"}

// Payload (claims)
{
  "sub": "user_id_123",
  "email": "user@example.com",
  "roles": ["user"],
  "iat": 1700000000,
  "exp": 1700000900  // 15 minutes from iat
}

// Signature
HMAC-SHA256(base64url(header) + "." + base64url(payload), secret_key)
Key rules:
- Keep TTL short – 15 minutes is standard. Short TTL limits exposure if a token is stolen.
- Do NOT store sensitive data in the payload – it is only base64-encoded, not encrypted.
- Use RS256 (asymmetric) in multi-service architectures so services can verify tokens with the public key and never need to hold the private signing key.
- Validate exp, iat, and iss on every request.
Refresh Token Rotation
Short-lived access tokens require a mechanism to get new ones without forcing the user to log in again. Refresh tokens are long-lived credentials, held by the client and tracked server-side, that can be exchanged for new access tokens.
Flow
Login:
  -> server issues access_token (15 min) + refresh_token (30 days)
  -> refresh_token stored in DB (hashed), sent to client as HttpOnly cookie
Access token expired:
  -> client sends refresh_token
  -> server validates: hash(token) exists in DB, not revoked, not expired
  -> server issues new access_token + new refresh_token (rotation)
  -> old refresh_token is invalidated in DB
Reuse detection:
  -> if a refresh_token that was already rotated is presented again,
     the entire token FAMILY is invalidated (all sessions for that user/device)
  -> this detects token theft: once the attacker and the legitimate user have
     both presented the same token, the second presentation triggers family
     invalidation and forces a fresh login
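The rotation and reuse-detection flow can be sketched with an in-memory store. This is an illustrative stand-in for the sessions table, not a production implementation; RefreshStore, Session, and REFRESH_TTL are hypothetical names, and a real system would perform the same steps inside a database transaction.

```python
import hashlib
import secrets
import time
import uuid
from dataclasses import dataclass
from typing import Optional

REFRESH_TTL = 30 * 24 * 3600  # 30 days, matching the flow above


@dataclass
class Session:
    user_id: int
    family_id: str
    expires_at: float
    revoked: bool = False


class RefreshStore:
    """In-memory stand-in for the sessions table (hypothetical)."""

    def __init__(self) -> None:
        self._by_hash = {}

    @staticmethod
    def _hash(token: str) -> str:
        return hashlib.sha256(token.encode()).hexdigest()

    def issue(self, user_id: int, family_id: Optional[str] = None,
              now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(32)
        # Only the hash is stored; the raw token is returned to the caller
        self._by_hash[self._hash(token)] = Session(
            user_id, family_id or str(uuid.uuid4()), now + REFRESH_TTL
        )
        return token

    def revoke_family(self, family_id: str) -> None:
        for session in self._by_hash.values():
            if session.family_id == family_id:
                session.revoked = True

    def rotate(self, token: str, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        session = self._by_hash.get(self._hash(token))
        if session is None:
            raise PermissionError("unknown refresh token")
        if session.revoked:
            # An already-rotated token came back: assume theft, kill the family
            self.revoke_family(session.family_id)
            raise PermissionError("refresh token reuse detected")
        if now >= session.expires_at:
            raise PermissionError("refresh token expired")
        session.revoked = True
        return self.issue(session.user_id, session.family_id, now)
```

After a replay is detected, even the newest token in the family stops working, which is what forces both the attacker and the legitimate user back through login.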
Session Table Schema
CREATE TABLE sessions (
    id                  BIGINT PRIMARY KEY AUTO_INCREMENT,
    user_id             BIGINT NOT NULL REFERENCES users(id),
    refresh_token_hash  VARCHAR(64) NOT NULL UNIQUE,  -- SHA256 of token
    family_id           UUID NOT NULL,                -- for reuse detection
    device_info         VARCHAR(255),                 -- user-agent, device type
    ip_address          VARCHAR(45),
    created_at          TIMESTAMP NOT NULL DEFAULT NOW(),
    last_used_at        TIMESTAMP NOT NULL DEFAULT NOW(),
    expires_at          TIMESTAMP NOT NULL,
    revoked             BOOLEAN NOT NULL DEFAULT FALSE,
    revoked_at          TIMESTAMP
);

CREATE INDEX idx_sessions_user_id ON sessions(user_id);
CREATE INDEX idx_sessions_family ON sessions(family_id);
Store only a hash of the refresh token. SHA256 is fine here: the token is a long random value with far too much entropy to brute-force, so the hash only needs to keep a leaked database from yielding usable tokens. The raw token goes to the client; the server never stores it.
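As a minimal sketch of issuing such a token, the snippet below generates the raw value, derives the hash that would go into refresh_token_hash, and builds the Set-Cookie header with the standard library. The cookie name, the /auth/refresh path, and the function names are assumptions for illustration.

```python
import hashlib
import secrets
from http.cookies import SimpleCookie


def new_refresh_token() -> tuple[str, str]:
    """Return (raw_token, sha256_hex). Only the hash is persisted."""
    raw = secrets.token_urlsafe(32)  # 32 random bytes from the OS CSPRNG
    return raw, hashlib.sha256(raw.encode()).hexdigest()


def refresh_cookie_header(raw_token: str, max_age: int = 30 * 24 * 3600) -> str:
    """Build the value of a Set-Cookie header for the refresh token."""
    cookie = SimpleCookie()
    cookie["refresh_token"] = raw_token
    morsel = cookie["refresh_token"]
    morsel["httponly"] = True          # not readable from JavaScript
    morsel["secure"] = True            # sent over HTTPS only
    morsel["samesite"] = "Strict"      # not attached to cross-site requests
    morsel["path"] = "/auth/refresh"   # hypothetical refresh endpoint
    morsel["max-age"] = max_age
    return morsel.OutputString()
```

Scoping the cookie to the refresh endpoint's path keeps the browser from sending the refresh token on every other request.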
OAuth2 Authorization Code Flow
OAuth2 is used when your app allows users to log in via a third-party identity provider (Google, GitHub, etc.). The Authorization Code flow is the secure choice for server-side applications.
Step 1: Redirect to provider
  GET https://accounts.google.com/o/oauth2/auth
    ?client_id=YOUR_CLIENT_ID
    &redirect_uri=https://yourapp.com/callback
    &response_type=code
    &scope=openid email profile
    &state=RANDOM_CSRF_TOKEN            // store in session, verify on return
    &code_challenge=CODE_VERIFIER_HASH  // PKCE
    &code_challenge_method=S256

Step 2: User authenticates at provider, consents to scopes

Step 3: Provider redirects back
  GET https://yourapp.com/callback?code=AUTH_CODE&state=RANDOM_CSRF_TOKEN

Step 4: Verify state, exchange code for tokens
  POST https://accounts.google.com/o/oauth2/token
    code=AUTH_CODE
    client_id=YOUR_CLIENT_ID
    client_secret=YOUR_CLIENT_SECRET
    redirect_uri=https://yourapp.com/callback
    grant_type=authorization_code
    code_verifier=ORIGINAL_CODE_VERIFIER  // PKCE

Step 5: Provider returns access_token, id_token (JWT), refresh_token

Step 6: Extract user info from id_token or call userinfo endpoint,
        create/update local user record, issue your own session
PKCE (Proof Key for Code Exchange) prevents authorization code interception attacks. Always use it even for server-side apps.
Multi-Factor Authentication (TOTP)
Time-based One-Time Passwords (TOTP) use a shared secret and the current time to generate 6-digit codes that change every 30 seconds.
Algorithm
import hmac, hashlib, base64, time, struct

def totp(secret_base32: str, digits: int = 6, interval: int = 30) -> str:
    # Secrets are often displayed without '=' padding; restore it for b32decode
    secret = secret_base32.replace(" ", "").upper()
    key = base64.b32decode(secret + "=" * (-len(secret) % 8))
    # T = number of 30-second intervals since Unix epoch
    T = int(time.time()) // interval
    msg = struct.pack('>Q', T)  # 8-byte big-endian
    h = hmac.new(key, msg, hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte select a 4-byte window
    offset = h[-1] & 0x0F
    code = struct.unpack('>I', h[offset:offset+4])[0] & 0x7FFFFFFF
    return str(code % (10 ** digits)).zfill(digits)
The server generates the secret, displays it as a QR code (URI format), and the user scans it with an authenticator app. At verification, the server computes TOTP for T-1, T, and T+1 to allow for clock skew.
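The clock-skew window can be sketched as follows. The functions totp_at and verify_totp are hypothetical; totp_at repeats the algorithm above with the timestamp passed in explicitly so the window can be computed and tested deterministically.

```python
import base64
import hashlib
import hmac
import struct
import time


def totp_at(secret_base32: str, for_time: float, digits: int = 6, interval: int = 30) -> str:
    """TOTP code for an explicit Unix timestamp (HMAC-SHA1 variant)."""
    secret = secret_base32.replace(" ", "").upper()
    key = base64.b32decode(secret + "=" * (-len(secret) % 8))
    counter = int(for_time) // interval
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % (10 ** digits)).zfill(digits)


def verify_totp(secret_base32: str, submitted: str, now=None,
                window: int = 1, interval: int = 30) -> bool:
    """Accept the code for the current step or up to `window` steps either side."""
    now = time.time() if now is None else now
    ok = False
    for step in range(-window, window + 1):
        candidate = totp_at(secret_base32, now + step * interval, interval=interval)
        # Constant-time comparison; no early exit on the first match
        ok |= hmac.compare_digest(candidate.encode(), submitted.encode())
    return ok
```

A production verifier would also remember the last accepted time step per user, so that a code cannot be replayed within its validity window.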
Backup Codes
Generate 10 random backup codes at MFA setup. Store them hashed with bcrypt (same as passwords). Each code is single-use – mark it as consumed after use.
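A minimal sketch of generating and consuming backup codes follows. The text recommends bcrypt; to stay runnable without third-party packages, this sketch swaps in PBKDF2-HMAC-SHA256 from the standard library as a stand-in slow hash. The alphabet, code length, iteration count, and class name are all assumptions.

```python
import hashlib
import hmac
import secrets

# 32 unambiguous characters: no 0/O or 1/I lookalikes
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"


def generate_backup_codes(count: int = 10, length: int = 10) -> list[str]:
    return [
        "".join(secrets.choice(ALPHABET) for _ in range(length))
        for _ in range(count)
    ]


def hash_code(code: str, salt: bytes, iterations: int) -> bytes:
    # Stand-in for bcrypt: a salted, deliberately slow hash
    return hashlib.pbkdf2_hmac("sha256", code.encode(), salt, iterations)


class BackupCodeStore:
    """In-memory stand-in for the backup codes table (hypothetical)."""

    def __init__(self, codes: list[str], iterations: int = 100_000) -> None:
        self._iterations = iterations
        self._rows = []
        for code in codes:
            salt = secrets.token_bytes(16)
            self._rows.append(
                {"salt": salt, "hash": hash_code(code, salt, iterations), "used": False}
            )

    def consume(self, submitted: str) -> bool:
        """Return True and mark the code used if it matches an unused row."""
        for row in self._rows:
            if row["used"]:
                continue
            candidate = hash_code(submitted, row["salt"], self._iterations)
            if hmac.compare_digest(candidate, row["hash"]):
                row["used"] = True
                return True
        return False
```

The plaintext codes are shown to the user once at setup and then discarded; because each row has its own salt, verification has to try every unused row.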
MFA Config Schema
CREATE TABLE mfa_configs (
    id          BIGINT PRIMARY KEY AUTO_INCREMENT,
    user_id     BIGINT NOT NULL UNIQUE REFERENCES users(id),
    type        ENUM('totp', 'sms') NOT NULL DEFAULT 'totp',
    secret      VARCHAR(64) NOT NULL,  -- encrypted at rest
    verified    BOOLEAN NOT NULL DEFAULT FALSE,
    created_at  TIMESTAMP NOT NULL DEFAULT NOW()
);

CREATE TABLE mfa_backup_codes (
    id         BIGINT PRIMARY KEY AUTO_INCREMENT,
    user_id    BIGINT NOT NULL REFERENCES users(id),
    code_hash  VARCHAR(60) NOT NULL,  -- bcrypt hash
    used       BOOLEAN NOT NULL DEFAULT FALSE,
    used_at    TIMESTAMP
);
Rate Limiting Login Attempts
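Use Redis for fast, atomic counters. Key by both IP and username: the per-IP counter catches one source trying many accounts, and the per-username counter catches a distributed attack on one account from many IPs.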
import redis

r = redis.Redis()
MAX_FAILS = 10

def _keys(ip: str, username: str) -> tuple[str, str]:
    return f"login_fail:ip:{ip}", f"login_fail:user:{username}"

def check_rate_limit(ip: str, username: str) -> tuple[bool, int]:
    """Returns (allowed, wait_seconds)"""
    blocked, wait = False, 0
    for key in _keys(ip, username):
        if int(r.get(key) or 0) >= MAX_FAILS:
            blocked = True
            # Wait on whichever counter is actually over the limit
            wait = max(wait, r.ttl(key))
    return not blocked, max(wait, 0)

def record_failure(ip: str, username: str):
    for key in _keys(ip, username):
        fails = int(r.incr(key))
        # Exponential backoff: each counter lives 2^fails seconds, capped at 1 hour,
        # computed from that key's own count rather than the other key's
        r.expire(key, min(2 ** fails, 3600))

def record_success(ip: str, username: str):
    r.delete(*_keys(ip, username))
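The interaction between the 2^fails expiry and the 10-failure threshold is easier to see with the numbers written out. This small sketch only evaluates the formula used in record_failure; lockout_window is a hypothetical helper name.

```python
def lockout_window(fails: int, cap: int = 3600) -> int:
    """Seconds a failure counter lives after its `fails`-th consecutive failure."""
    return min(2 ** fails, cap)


# Counter lifetime after selected failure counts
schedule = {n: lockout_window(n) for n in (1, 5, 10, 12)}
print(schedule)
```

With these values, the first failures are forgotten within seconds, while reaching the 10-failure threshold blocks logins for 1024 seconds (about 17 minutes). The one-hour cap only comes into play from the 12th failure onward.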
Full Database Schema
CREATE TABLE users (
    id              BIGINT PRIMARY KEY AUTO_INCREMENT,
    email           VARCHAR(255) NOT NULL UNIQUE,
    password_hash   VARCHAR(60),  -- bcrypt hash; NULL for OAuth-only accounts (widen the column for Argon2)
    email_verified  BOOLEAN NOT NULL DEFAULT FALSE,
    mfa_enabled     BOOLEAN NOT NULL DEFAULT FALSE,
    created_at      TIMESTAMP NOT NULL DEFAULT NOW(),
    updated_at      TIMESTAMP NOT NULL DEFAULT NOW() ON UPDATE NOW(),
    deleted_at      TIMESTAMP     -- soft delete
);

CREATE TABLE oauth_providers (
    id                BIGINT PRIMARY KEY AUTO_INCREMENT,
    user_id           BIGINT NOT NULL REFERENCES users(id),
    provider          VARCHAR(32) NOT NULL,  -- 'google', 'github', etc.
    provider_user_id  VARCHAR(255) NOT NULL,
    access_token      TEXT,                  -- encrypted
    refresh_token     TEXT,                  -- encrypted
    expires_at        TIMESTAMP,
    created_at        TIMESTAMP NOT NULL DEFAULT NOW(),
    UNIQUE (provider, provider_user_id)
);
Token Revocation at Scale
The fundamental tension: JWTs are stateless (no DB lookup) but cannot be revoked before expiry.
Strategy 1 – Short TTL + refresh revocation (recommended):
Keep access token TTL at 15 minutes. When a user logs out or a compromise is detected, revoke the refresh token in the sessions table. The access token remains valid for up to 15 minutes, which is an acceptable tradeoff for most applications.
Strategy 2 – Token blocklist in Redis:
On logout, store the JWT's jti (unique token ID) in Redis as its own key, with a TTL equal to the token's remaining lifetime (members of a Redis set cannot expire individually). Every request checks the blocklist. This gives exact revocation but adds a Redis lookup to every request.
Strategy 3 – Opaque tokens:
Use random strings as tokens. Every request hits the database to look up the session. Simpler logic, perfect revocation, but adds DB latency to every authenticated request. Acceptable for lower-scale applications.
For most production systems: Strategy 1 for access tokens + database revocation for refresh tokens is the right balance.
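For comparison, Strategy 2 can be sketched with an in-memory map standing in for Redis keys with TTLs. JtiBlocklist is a hypothetical class; in a real deployment the same two operations would map onto a Redis write with an expiry and an existence check.

```python
import time
from typing import Optional


class JtiBlocklist:
    """In-memory stand-in for per-jti Redis keys with a TTL (hypothetical)."""

    def __init__(self) -> None:
        self._expiry_by_jti = {}

    def revoke(self, jti: str, token_exp: float, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        # An already-expired token needs no entry: normal exp validation rejects it
        if token_exp > now:
            self._expiry_by_jti[jti] = token_exp

    def is_revoked(self, jti: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        expiry = self._expiry_by_jti.get(jti)
        if expiry is None:
            return False
        if expiry <= now:
            # Entry outlived its token; drop it, as a Redis TTL would
            del self._expiry_by_jti[jti]
            return False
        return True
```

Because entries disappear when the token would have expired anyway, the blocklist never grows beyond the number of tokens revoked within one access-token lifetime.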
Security Best Practices
- HTTPS only: Set HSTS headers. Never transmit tokens over HTTP.
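- HttpOnly + SameSite=Strict cookies: Store refresh tokens in HttpOnly cookies so that injected JavaScript cannot read them. SameSite=Strict stops the browser from attaching the cookie to cross-site requests, which blocks most CSRF attacks.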
- Separate token storage: Access token in memory (JavaScript variable), refresh token in HttpOnly cookie. An XSS payload can then steal at most an access token that expires within 15 minutes, not the long-lived refresh token.
- PKCE for OAuth: Always use PKCE even for confidential clients. It prevents code interception in proxies.
- Rotate signing secrets: Support multiple valid signing keys with a key ID (kid) in the JWT header to allow seamless rotation.
- Log authentication events: Login success/failure, MFA events, and token issuance. Ship to a SIEM. Rate-limit log volume per user.
- Account lockout vs. progressive delays: Hard lockout enables denial-of-service against users. Progressive delays (exponential backoff) are usually preferable.