Session Manager Low-Level Design: Token Storage, Sliding Expiry, and Concurrent Session Limits

Problem Scope and Requirements

A Session Manager maintains authenticated user sessions across stateless HTTP services. The core challenges are fast session lookup, preventing session fixation and hijacking, implementing sliding expiry correctly, and enforcing concurrent session limits across a distributed fleet.

Functional Requirements

  • Create a session on successful authentication; return an opaque session token to the client.
  • Validate session tokens on each request; return session metadata (user ID, roles, device info).
  • Implement sliding window expiry: each validated request extends the session lifetime.
  • Enforce concurrent session limits per user (e.g., max 5 active sessions).
  • Support explicit logout (immediate invalidation) and device-specific logout.
  • Capture device fingerprint and flag anomalous access (new device type or geography).

Non-Functional Requirements

  • Session validation under 1 ms P99 (it runs on every authenticated request).
  • Sessions survive a single Redis node failure (replication required).
  • Support 50 million active sessions with 10 million validations per minute (roughly 167,000 per second).

Core Data Model

Session (stored in Redis)

Key:   "sess:{token_id}"
Value: {
    session_id:     string    // == token_id (128 random bits from a CSPRNG; a UUID v4 carries only 122)
    user_id:        string
    device_id:      string    // stable device fingerprint hash
    ip_address:     string
    user_agent:     string
    roles:          []string
    created_at:     int64
    last_active_at: int64
    absolute_expiry: int64    // hard max, not extended by activity
    metadata:       map[string]string
}
TTL: sliding_window (e.g., 30 minutes, reset on each access)

User Session Index (for concurrent limit enforcement)

Key:   "user_sessions:{user_id}"
Type:  Redis Sorted Set
Score: last_active_at (Unix timestamp)
Member: session_id
TTL:   absolute_expiry of longest-lived session

Device Fingerprint

DeviceFingerprint {
    raw_inputs: {
        user_agent:      string
        accept_language: string
        screen_resolution: string   // from JS or client-side SDK
        timezone:        string
    }
    fingerprint: string    // SHA-256(canonical(raw_inputs))
    trust_score: float     // 0.0 to 1.0, decays on anomalies
}
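A minimal sketch of how the fingerprint field could be derived. The canonicalization rule used here (trimmed, lower-cased values serialized as compact JSON with sorted keys) is an assumption for illustration; the design above only requires that canonical(raw_inputs) be deterministic. The function name device_fingerprint is hypothetical.

```python
import hashlib
import json


def device_fingerprint(raw_inputs: dict) -> str:
    """Return the hex SHA-256 of a canonical encoding of the raw inputs."""
    # Assumed canonical form: trim and lower-case every value, sort the keys,
    # and drop insignificant whitespace so equal inputs always hash equally.
    normalized = {k: str(v).strip().lower() for k, v in raw_inputs.items()}
    canonical = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


inputs = {
    "user_agent": "Mozilla/5.0",
    "accept_language": "en-US",
    "screen_resolution": "1920x1080",
    "timezone": "Europe/Berlin",
}
fingerprint = device_fingerprint(inputs)
```

Because the keys are sorted before hashing, the order in which the client reports the inputs does not change the result.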

Session Token Design

The session token must be opaque (not a JWT, so that the server alone controls validity), unpredictable, and self-identifying to allow routing. A good format: {version:1byte}{session_id:16bytes}, base64url-encoded without padding to 23 characters. The version byte allows future token format upgrades without invalidating existing sessions.

Never use sequential IDs or UUIDs derived from timestamps alone; use crypto/rand (or the equivalent CSPRNG on your platform) for the full 128 bits. Store tokens in Redis only as their SHA-256 hash, so that a leaked Redis dump does not yield directly usable tokens. The lookup key is then derived from the hash of the presented token rather than from the raw token.
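The token format and hashed storage can be sketched as follows. This uses Python's secrets module in place of Go's crypto/rand; TOKEN_VERSION, new_token, storage_key, and parse_version are hypothetical names, and the "sess:" prefix follows the data model above.

```python
import base64
import hashlib
import secrets

TOKEN_VERSION = 1  # hypothetical version number for this token layout


def storage_key(token: str) -> str:
    """Redis key for a token: only the SHA-256 hash is ever stored."""
    return "sess:" + hashlib.sha256(token.encode("ascii")).hexdigest()


def new_token() -> tuple:
    """Return (client_token, redis_key) for a newly created session."""
    session_id = secrets.token_bytes(16)       # 128 bits from the OS CSPRNG
    raw = bytes([TOKEN_VERSION]) + session_id  # 1 version byte + 16 ID bytes
    # 17 bytes encode to 24 base64url characters including one '=' pad;
    # stripping the padding leaves 23 characters.
    token = base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
    return token, storage_key(token)


def parse_version(token: str) -> int:
    """Decode a token and return its version byte, rejecting bad lengths."""
    padded = token + "=" * (-len(token) % 4)
    raw = base64.urlsafe_b64decode(padded)
    if len(raw) != 17:
        raise ValueError("unexpected token length")
    return raw[0]


token, key = new_token()
```

On validation, the service would recompute storage_key(token) from the presented token and look that key up, so the raw token never needs to be stored server-side.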

Sliding Expiry Implementation

Sliding expiry means the TTL resets to the full window on each valid access. In Redis the simplest form is an EXPIRE call after a successful GET. However, this naive implementation is not atomic. Two concurrent requests that both read the session and then both call EXPIRE are harmless (EXPIRE is idempotent), but there is a subtle bug: if the key expires between the GET and the EXPIRE, the EXPIRE returns 0 and does nothing, so the current request is accepted against a session that no longer exists and the next request sees a missing key. Solve this with a Lua script that atomically gets, validates, updates last_active_at, and resets TTL:

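-- Lua. KEYS[1] = session key; ARGV[1] = now (Unix seconds); ARGV[2] = sliding window (seconds)
local data = redis.call("GET", KEYS[1])
if not data then return nil end            -- missing or already expired
local sess = cjson.decode(data)
local now = tonumber(ARGV[1])
local remaining = sess.absolute_expiry - now
if remaining <= 0 then redis.call("DEL", KEYS[1]) return nil end  -- hard limit reached
sess.last_active_at = now
-- Cap the sliding TTL so the key never outlives the absolute expiry.
local ttl = math.min(tonumber(ARGV[2]), remaining)
redis.call("SET", KEYS[1], cjson.encode(sess), "EX", ttl)
return cjson.encode(sess)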

Absolute expiry (e.g., 24 hours from creation regardless of activity) is enforced by checking absolute_expiry in the Lua script and returning nil if exceeded, preventing a session from being kept alive indefinitely by continuous activity.
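To make the interaction between the sliding window and the absolute limit concrete, here is a small in-memory model of the validation logic. It is a sketch for reasoning and unit testing, not a replacement for the atomic Redis script; the class InMemorySessionStore and the expires_at field (which stands in for the Redis key TTL) are assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    user_id: str
    last_active_at: int
    absolute_expiry: int  # Unix seconds; never extended by activity
    expires_at: int       # stands in for the Redis key TTL


class InMemorySessionStore:
    def __init__(self, sliding_window: int):
        self.sliding_window = sliding_window
        self._sessions = {}

    def create(self, key: str, user_id: str, now: int, max_lifetime: int) -> Session:
        absolute_expiry = now + max_lifetime
        expires_at = min(now + self.sliding_window, absolute_expiry)
        sess = Session(user_id, now, absolute_expiry, expires_at)
        self._sessions[key] = sess
        return sess

    def validate(self, key: str, now: int) -> Optional[Session]:
        sess = self._sessions.get(key)
        if sess is None:
            return None
        # Reject if either the sliding window or the hard limit has passed.
        if now >= sess.expires_at or now >= sess.absolute_expiry:
            del self._sessions[key]
            return None
        sess.last_active_at = now
        # Slide the window, but never beyond the absolute expiry.
        sess.expires_at = min(now + self.sliding_window, sess.absolute_expiry)
        return sess
```

With a 30-minute window and a 24-hour maximum lifetime, a client that sends a request every few minutes stays logged in until the 24-hour mark and is then rejected regardless of activity.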

Concurrent Session Limits

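When a new session is created, add its session_id to the user's sorted set with score = last_active_at (equal to created_at at creation time), then check the cardinality. If it exceeds the limit (e.g., 5), evict the excess sessions with the lowest scores, i.e. the least recently active ones. For this ordering to hold, each successful validation must also refresh the member's score; otherwise eviction follows creation time. Sessions that expired by TTL leave stale members in the set, so remove members whose session key no longer exists before counting. This entire operation runs in a Lua script for atomicity:

  1. ZADD the new session_id.
  2. ZCARD to get count.
  3. If count > limit: ZRANGE 0 (count - limit - 1) returns the excess session IDs, lowest score (oldest) first.
  4. DEL each excess session key and ZREM it from the index.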

Evicted sessions are optionally written to a notifications queue so the affected user can be informed (“You were logged out on another device”).
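The eviction steps can be modeled in memory as below. This is a sketch of the logic only: a dict of scores stands in for the Redis sorted set, a second dict stands in for the session keys, and the function name register_session is hypothetical. In production the same steps would run inside one Lua script.

```python
def register_session(index: dict, sessions: dict, session_id: str,
                     now: int, limit: int) -> list:
    """Add a session and return the IDs evicted to stay within the limit.

    index    maps session_id -> score (last_active_at), like the sorted set
    sessions maps session_id -> session data, like the "sess:" keys
    """
    sessions[session_id] = {"last_active_at": now}
    index[session_id] = now                # step 1: ZADD
    excess = len(index) - limit            # step 2: ZCARD
    if excess <= 0:
        return []
    # step 3: lowest scores first; ties break by member, as in Redis
    oldest = sorted(index, key=lambda sid: (index[sid], sid))[:excess]
    for sid in oldest:                     # step 4: DEL + ZREM
        sessions.pop(sid, None)
        del index[sid]
    return oldest


index, sessions = {}, {}
for i in range(1, 6):
    register_session(index, sessions, f"s{i}", now=i, limit=5)
index["s1"] = 10                           # s1 was active again at t=10
evicted = register_session(index, sessions, "s6", now=11, limit=5)
```

In this run s2 is evicted rather than s1, because s1's score was refreshed by later activity. The returned list is what would be pushed to the notifications queue.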

Device Fingerprinting and Anomaly Detection

On each session validation, compare the request's device fingerprint against the fingerprint stored in the session. A mismatch (different browser or OS) does not immediately invalidate the session but decrements the trust_score and may trigger a step-up authentication challenge (re-enter password, or OTP). Geographic anomaly detection: if the request IP resolves to a country different from the session's origin and the time delta is too short for travel, flag the session for review.
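One way to express these checks is sketched below. The penalty, threshold, and minimum travel time are hypothetical tuning values, and SessionRisk and assess are illustrative names; a real deployment would likely use a geo-IP lookup with distance-based travel estimates rather than a fixed country-change interval.

```python
from dataclasses import dataclass

MISMATCH_PENALTY = 0.3     # hypothetical: trust lost per fingerprint mismatch
STEP_UP_THRESHOLD = 0.5    # hypothetical: below this, challenge the user
MIN_TRAVEL_SECONDS = 3600  # hypothetical: fastest plausible country change


@dataclass
class SessionRisk:
    fingerprint: str
    origin_country: str
    last_seen_at: int
    trust_score: float = 1.0


def assess(risk: SessionRisk, fingerprint: str, country: str, now: int) -> str:
    """Return "allow", "step_up", or "review" for one validated request."""
    # Geographic anomaly: a different country too soon after the last request.
    if country != risk.origin_country and now - risk.last_seen_at < MIN_TRAVEL_SECONDS:
        return "review"
    # A fingerprint mismatch lowers trust but does not invalidate the session.
    if fingerprint != risk.fingerprint:
        risk.trust_score = max(0.0, risk.trust_score - MISMATCH_PENALTY)
    risk.last_seen_at = now
    return "step_up" if risk.trust_score < STEP_UP_THRESHOLD else "allow"
```

With these values, a single mismatch leaves the session usable, while a second one drops the score below the threshold and triggers the step-up challenge.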

API Design

POST   /sessions                — create session (called post-auth with user_id, device_info)
GET    /sessions/{token}        — validate token; returns session data or 401
DELETE /sessions/{token}        — logout (invalidate this session)
DELETE /sessions?user_id={uid}  — logout all sessions for a user
GET    /sessions?user_id={uid}  — list active sessions (for "manage devices" UI)
PATCH  /sessions/{token}        — update metadata (e.g., roles after permission change)

Scalability Considerations

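  • Redis Cluster: Shard by user_id. Keep the session key and the user_sessions index in the same hash slot by using hash tags: Redis Cluster hashes only the text inside the first {...} pair of a key, so literal keys such as "sess:{u42}:<session_id>" and "user_sessions:{u42}" both hash on the user ID u42. This allows the concurrent-limit Lua script to run on a single shard without cross-slot operations. Note that this replaces the "sess:{token_id}" key shown in the data model, and the validator must be able to obtain the user_id (from the token or a lookup) in order to build the key.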
  • Read-through cache: For services that validate sessions on every request, run a local in-process cache of recently seen valid tokens with a 5-second TTL. This absorbs repeated validations within a single request burst without hitting Redis, at the cost of up to 5 seconds of stale session data after logout.
  • Session store alternatives: For extremely high throughput, consider a purpose-built store like Dragonfly (Redis-compatible, multithreaded) or a custom session table in Cassandra for geographic distribution, trading some Lua scripting capability for better write scalability.
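The hash-tag behaviour behind the first bullet can be checked offline. The sketch below follows the key-slot rule from the Redis Cluster specification (CRC16, XMODEM variant, modulo 16384, hashing only a non-empty {...} tag when present); the function name key_hash_slot and the example user ID u42 are illustrative.

```python
import binascii


def key_hash_slot(key: str) -> int:
    """Compute the Redis Cluster hash slot for a key, honouring hash tags."""
    data = key.encode("utf-8")
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' counts.
        if end > start + 1:
            data = data[start + 1:end]
    # binascii.crc_hqx with initial value 0 is CRC-16/XMODEM (poly 0x1021).
    return binascii.crc_hqx(data, 0) % 16384


session_slot = key_hash_slot("sess:{u42}:3f9a")
index_slot = key_hash_slot("user_sessions:{u42}")
```

Both keys land in the same slot because only "u42" is hashed, which is what lets a single script touch the session key and the index together.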
