Problem Scope and Requirements
A Session Manager maintains authenticated user sessions across stateless HTTP services. The core challenges are fast session lookup, preventing session fixation and hijacking, implementing sliding expiry correctly, and enforcing concurrent session limits across a distributed fleet.
Functional Requirements
- Create a session on successful authentication; return an opaque session token to the client.
- Validate session tokens on each request; return session metadata (user ID, roles, device info).
- Implement sliding window expiry: each validated request extends the session lifetime.
- Enforce concurrent session limits per user (e.g., max 5 active sessions).
- Support explicit logout (immediate invalidation) and device-specific logout.
- Capture device fingerprint and flag anomalous access (new device type or geography).
Non-Functional Requirements
- Session validation under 1 ms P99 (it runs on every authenticated request).
- Sessions survive a single Redis node failure (replication required).
- Support 50 million active sessions with 10 million validations per minute.
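As a rough sanity check on these targets, the arithmetic below converts them into per-second load and raw memory. The 1 KiB average session size is an assumption made for illustration; the requirements do not state one.

```python
# Back-of-envelope sizing for the stated targets.
ACTIVE_SESSIONS = 50_000_000
VALIDATIONS_PER_MINUTE = 10_000_000
ASSUMED_SESSION_BYTES = 1024  # assumption: ~1 KiB per serialized session

validations_per_second = VALIDATIONS_PER_MINUTE / 60
session_memory_gib = ACTIVE_SESSIONS * ASSUMED_SESSION_BYTES / 2**30

print(f"{validations_per_second:,.0f} validations/s")
print(f"{session_memory_gib:.1f} GiB of session payload before replication")
```

Under that assumption the fleet handles roughly 167,000 validations per second and holds a little under 48 GiB of session payload, which is why the store has to be sharded rather than kept on a single node.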
Core Data Model
Session (stored in Redis)
Key: "sess:{token_id}"
Value: {
session_id: string // == token_id (128 bits from a CSPRNG; a UUID v4 carries only 122 random bits)
user_id: string
device_id: string // stable device fingerprint hash
ip_address: string
user_agent: string
roles: []string
created_at: int64
last_active_at: int64
absolute_expiry: int64 // hard max, not extended by activity
metadata: map[string]string
}
TTL: sliding_window (e.g., 30 minutes, reset on each access)
User Session Index (for concurrent limit enforcement)
Key: "user_sessions:{user_id}"
Type: Redis Sorted Set
Score: last_active_at (Unix timestamp)
Member: session_id
TTL: absolute_expiry of longest-lived session
Device Fingerprint
DeviceFingerprint {
raw_inputs: {
user_agent: string
accept_language: string
screen_resolution: string // from JS or client-side SDK
timezone: string
}
fingerprint: string // SHA-256(canonical(raw_inputs))
trust_score: float // 0.0 to 1.0, decays on anomalies
}
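The fingerprint field above is SHA-256 over a canonical form of the raw inputs. A minimal sketch of one possible canonicalization follows; the choice of sorted-key compact JSON with trimmed values is an assumption, since the data model does not define canonical().

```python
import hashlib
import json


def canonical(raw_inputs: dict[str, str]) -> bytes:
    """Serialize inputs deterministically: sorted keys, trimmed values, compact JSON."""
    normalized = {key: value.strip() for key, value in raw_inputs.items()}
    return json.dumps(normalized, sort_keys=True, separators=(",", ":")).encode("utf-8")


def device_fingerprint(raw_inputs: dict[str, str]) -> str:
    """Return SHA-256(canonical(raw_inputs)) as a 64-character hex string."""
    return hashlib.sha256(canonical(raw_inputs)).hexdigest()


# Same inputs in a different order, with stray whitespace, hash identically.
fp_a = device_fingerprint({
    "user_agent": "ExampleBrowser/1.0",
    "accept_language": "en-US",
    "screen_resolution": "1920x1080",
    "timezone": "Europe/Berlin",
})
fp_b = device_fingerprint({
    "timezone": "Europe/Berlin",
    "screen_resolution": "1920x1080 ",
    "accept_language": "en-US",
    "user_agent": "ExampleBrowser/1.0",
})
```

Whatever canonical form is chosen, it must be stable across client and server versions; otherwise every deployment would look like a device change.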
Session Token Design
The session token must be opaque (not a JWT, so the server alone decides validity), unpredictable, and self-identifying to allow routing. A good format is {version:1byte}{session_id:16bytes}, base64url-encoded without padding to 23 characters. The version byte allows future token format upgrades without invalidating existing sessions.
Never use sequential IDs or timestamp-derived UUIDs; generate all 128 bits from a cryptographically secure source such as crypto/rand. In Redis, key each session by the SHA-256 hash of its token rather than by the raw token, so that a leaked Redis dump does not yield directly usable tokens.
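A minimal sketch of this token format in Python, where the secrets module plays the role of crypto/rand. The function names and the "sess:" plus hex-digest key layout are illustrative assumptions.

```python
import base64
import hashlib
import secrets

TOKEN_VERSION = 1  # the single version byte described above


def new_token() -> str:
    """Return a 23-character token: 1 version byte + 16 CSPRNG bytes, base64url, no padding."""
    raw = bytes([TOKEN_VERSION]) + secrets.token_bytes(16)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def parse_token(token: str) -> tuple[int, bytes]:
    """Decode a token into (version, session_id); raise ValueError if malformed."""
    if len(token) != 23:
        raise ValueError("unexpected token length")
    raw = base64.urlsafe_b64decode(token + "=")  # restore the stripped padding
    if len(raw) != 17:
        raise ValueError("unexpected token payload")
    return raw[0], raw[1:]


def redis_key(token: str) -> str:
    """Key the session by the token's SHA-256 so a Redis dump holds no usable tokens."""
    return "sess:" + hashlib.sha256(token.encode("ascii")).hexdigest()


token = new_token()
version, session_id = parse_token(token)
```

Seventeen bytes encode to 24 base64 characters including one padding character, which is why the unpadded token is 23 characters long.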
Sliding Expiry Implementation
Sliding expiry means the TTL resets to the full window on each valid access. In Redis this can be a single EXPIRE call after a successful GET. Two concurrent requests that both read the session and then both call EXPIRE are harmless, because EXPIRE is idempotent. The subtle bug lies in the gap between the two commands: if the key expires after the GET but before the EXPIRE, the EXPIRE returns 0 and does nothing, so the current request succeeds while the next one finds the key missing. Separate commands also cannot update last_active_at atomically. Solve this with a Lua script that atomically gets, validates, updates last_active_at, and resets the TTL:
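-- Lua: KEYS[1] = session key, ARGV[1] = now (Unix seconds), ARGV[2] = sliding window (seconds)
local data = redis.call("GET", KEYS[1])
if not data then return nil end
local sess = cjson.decode(data)
local now = tonumber(ARGV[1])
-- Hard cap: a session past absolute_expiry is deleted, never extended.
if now >= sess.absolute_expiry then redis.call("DEL", KEYS[1]) return nil end
sess.last_active_at = now
-- The sliding TTL must not outlive absolute_expiry.
local ttl = math.min(tonumber(ARGV[2]), sess.absolute_expiry - now)
local updated = cjson.encode(sess)
redis.call("SET", KEYS[1], updated, "EX", ttl)
return updated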
Absolute expiry (e.g., 24 hours from creation regardless of activity) is enforced by checking absolute_expiry in the Lua script and returning nil if exceeded, preventing a session from being kept alive indefinitely by continuous activity.
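The interaction between the sliding window and the absolute cap can be modelled without Redis. The following in-memory sketch is illustrative only: the class and field names are assumptions, expires_at stands in for the key's TTL, and a real deployment would run the equivalent logic atomically inside Redis.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    user_id: str
    last_active_at: int
    absolute_expiry: int  # hard cap, never extended
    expires_at: int       # stands in for the Redis key TTL


class InMemorySessions:
    """Toy stand-in for Redis that applies sliding and absolute expiry together."""

    def __init__(self, sliding_window: int, max_lifetime: int) -> None:
        self.sliding_window = sliding_window
        self.max_lifetime = max_lifetime
        self._store: dict[str, Session] = {}

    def create(self, key: str, user_id: str, now: int) -> None:
        hard_cap = now + self.max_lifetime
        first_expiry = min(now + self.sliding_window, hard_cap)
        self._store[key] = Session(user_id, now, hard_cap, first_expiry)

    def validate(self, key: str, now: int) -> Optional[Session]:
        sess = self._store.get(key)
        if sess is None:
            return None
        if now >= sess.expires_at or now >= sess.absolute_expiry:
            del self._store[key]  # idle too long, or past the hard cap
            return None
        sess.last_active_at = now
        # Slide the window forward, but never beyond absolute_expiry.
        sess.expires_at = min(now + self.sliding_window, sess.absolute_expiry)
        return sess


store = InMemorySessions(sliding_window=1800, max_lifetime=86400)
store.create("sess:abc", "user-1", now=0)
still_active = store.validate("sess:abc", now=1000)   # inside the window, so it slides
gone_idle = store.validate("sess:abc", now=1000 + 1800)  # a full window with no activity
```

Capping the refreshed expiry at absolute_expiry means the key disappears on its own at the hard limit, even if no request arrives to trigger the explicit check.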
Concurrent Session Limits
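When a new session is created, add its session_id to the user's sorted set with score = last_active_at, which equals created_at for a new session. Check the cardinality. If it exceeds the limit (e.g., 5), evict the excess sessions with the lowest scores, that is, the least recently active ones. For this ordering to stay meaningful, each validation must also refresh the member's score in the sorted set. The entire create-and-evict operation runs in a Lua script for atomicity: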
- ZADD the new session_id.
- ZCARD to get count.
- If count > limit: ZRANGE by score ascending to get the excess oldest session IDs.
- DEL each excess session key and ZREM from the index.
Evicted sessions are optionally written to a notifications queue so the affected user can be informed (“You were logged out on another device”).
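The eviction steps above can be sketched in plain Python, with a dict standing in for the sorted set (member to score) and another for the session keys. The function name and both containers are illustrative assumptions; in production the same steps would run inside the Lua script.

```python
def admit_session(index: dict[str, int], sessions: dict[str, dict],
                  session_id: str, now: int, limit: int = 5) -> list[str]:
    """Register a new session for one user and evict the least recently active extras.

    `index` models the sorted set (member -> score) and `sessions` models the
    session keys. Returns the evicted session IDs so the caller can notify the user.
    """
    index[session_id] = now                  # ZADD
    excess = len(index) - limit              # ZCARD
    if excess <= 0:
        return []
    # ZRANGE 0 excess-1: lowest scores first, ties broken by member as in Redis.
    oldest = sorted(index, key=lambda sid: (index[sid], sid))[:excess]
    for sid in oldest:
        sessions.pop(sid, None)              # DEL the session key
        del index[sid]                       # ZREM from the index
    return oldest


index: dict[str, int] = {}
sessions: dict[str, dict] = {}
for i in range(5):
    sessions[f"s{i}"] = {"user_id": "u1"}
    admit_session(index, sessions, f"s{i}", now=100 + i)

sessions["s5"] = {"user_id": "u1"}
evicted = admit_session(index, sessions, "s5", now=200)  # sixth session pushes out s0
```

The returned list is what would be written to the notifications queue.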
Device Fingerprinting and Anomaly Detection
On each session validation, compare the request's device fingerprint against the fingerprint stored in the session. A mismatch (different browser or OS) does not immediately invalidate the session but decrements the trust_score and may trigger a step-up authentication challenge (re-enter password, or OTP). Geographic anomaly detection: if the request IP resolves to a country different from the session's origin and the time delta is too short for travel, flag the session for review.
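One way to make both checks concrete is sketched below. Everything numeric here is an assumption: the 900 km/h speed ceiling, the 0.25 trust penalty, and the 0.5 step-up threshold are placeholders, and the sketch compares coordinates (as a GeoIP lookup, not shown, might return) rather than countries.

```python
import math

MAX_PLAUSIBLE_KMH = 900.0   # assumption: roughly airliner cruising speed
FINGERPRINT_PENALTY = 0.25  # assumption: trust lost per fingerprint mismatch
STEP_UP_THRESHOLD = 0.5     # assumption: below this, require re-authentication


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def is_impossible_travel(prev: tuple[float, float, int], cur: tuple[float, float, int]) -> bool:
    """Each argument is (lat, lon, unix_seconds) for a geolocated request."""
    hours = max(cur[2] - prev[2], 1) / 3600
    return haversine_km(prev[0], prev[1], cur[0], cur[1]) / hours > MAX_PLAUSIBLE_KMH


def apply_fingerprint_check(trust_score: float, stored: str, seen: str) -> tuple[float, bool]:
    """Return (new_trust_score, step_up_required); a mismatch lowers trust but keeps the session."""
    if stored != seen:
        trust_score = max(0.0, trust_score - FINGERPRINT_PENALTY)
    return trust_score, trust_score < STEP_UP_THRESHOLD


london = (51.5074, -0.1278, 0)
new_york_one_hour_later = (40.7128, -74.0060, 3600)
flagged = is_impossible_travel(london, new_york_one_hour_later)
```

IP geolocation is coarse and VPNs move users across borders instantly, so a flag like this is better treated as a signal for review or step-up authentication than as grounds for silent invalidation.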
API Design
POST /sessions — create session (called post-auth with user_id, device_info)
GET /sessions/{token} — validate token; returns session data or 401
DELETE /sessions/{token} — logout (invalidate this session)
DELETE /sessions?user_id={uid} — logout all sessions for a user
GET /sessions?user_id={uid} — list active sessions (for "manage devices" UI)
PATCH /sessions/{token} — update metadata (e.g., roles after permission change)
Scalability Considerations
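- Redis cluster: Shard by user_id hash. Keep the session key and the user_sessions index on the same shard by using hash tags: "sess:{user_id}:{session_id}" and "user_sessions:{user_id}" both hash on user_id. The braces around user_id are literal hash-tag delimiters here; Redis Cluster hashes only the first {...} substring of a key, so both keys land in the same slot. This allows the concurrent-limit Lua script to run on a single shard without cross-slot operations. Note that this layout differs from the plain "sess:{token_id}" key in the data model: the validator must be able to derive user_id from the token, which is one reason the token is self-identifying.
- Read-through cache: For services that validate sessions on every request, run a local in-process cache of recently seen valid tokens with a 5-second TTL. This absorbs repeated validations within a single request burst without hitting Redis, at the cost of up to 5 seconds of stale session data after logout.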
- Session store alternatives: For extremely high throughput, consider a purpose-built store like Dragonfly (Redis-compatible, multithreaded) or a custom session table in Cassandra for geographic distribution, trading some Lua scripting capability for better write scalability.
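The read-through cache bullet above can be sketched as a small in-process class. The names are illustrative assumptions, and fetch is a hypothetical callable that performs the real Redis validation; an injectable clock keeps the sketch testable.

```python
import time
from typing import Callable, Optional


class LocalSessionCache:
    """In-process read-through cache of validated sessions with a short TTL."""

    def __init__(self, fetch: Callable[[str], Optional[dict]],
                 ttl_seconds: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch  # hypothetical loader that asks Redis
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, dict]] = {}

    def get(self, token: str) -> Optional[dict]:
        now = self._clock()
        hit = self._entries.get(token)
        if hit is not None and hit[0] > now:
            return hit[1]  # fresh hit: no Redis round trip
        session = self._fetch(token)
        if session is None:
            self._entries.pop(token, None)  # never cache invalid tokens
            return None
        self._entries[token] = (now + self._ttl, session)
        return session

    def invalidate(self, token: str) -> None:
        """Drop a token locally, e.g. when this instance handles the logout itself."""
        self._entries.pop(token, None)
```

This sketch never evicts entries that are not looked up again, so a production version would need a size bound. Local invalidation only helps the instance that processes the logout; every other instance still serves the session until its cached entry ages out.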