What Is a Token Refresh Service?
A Token Refresh Service manages the lifecycle of short-lived access tokens and long-lived refresh tokens in an OAuth 2.0 / JWT-based authentication system. Access tokens expire quickly (minutes to hours); the refresh service silently issues new access tokens using a valid refresh token, keeping users authenticated without forcing re-login.
Data Model
A single table tracks both token families and refresh token state:
CREATE TABLE refresh_tokens (
    token_id     CHAR(64) PRIMARY KEY,
    token_family CHAR(64) NOT NULL,            -- groups rotated tokens together
    user_id      BIGINT NOT NULL,
    client_id    VARCHAR(128) NOT NULL,
    scope        TEXT,
    issued_at    TIMESTAMP NOT NULL DEFAULT NOW(),
    expires_at   TIMESTAMP NOT NULL,
    used_at      TIMESTAMP,                    -- NULL = not yet used
    is_revoked   BOOLEAN NOT NULL DEFAULT FALSE,
    replaced_by  CHAR(64)                      -- token_id of the next token in the chain
);
CREATE INDEX idx_rt_user_id ON refresh_tokens(user_id);
CREATE INDEX idx_rt_family ON refresh_tokens(token_family);
CREATE INDEX idx_rt_expires_at ON refresh_tokens(expires_at);
Core Algorithm and Workflow
Initial Issuance
- After successful login, generate a new token_family (random 32 bytes, hex-encoded).
- Issue a refresh token (token_id = 32 random bytes) tied to that family. Store in DB.
- Issue a signed JWT access token with short expiry (e.g., 15 minutes). Return both to the client.
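The issuance steps above can be sketched in Python using only the standard library. This is a minimal illustration, not a production implementation: the function names, the HS256 signing choice, the claim names, and the 30-day refresh lifetime (REFRESH_TTL_SECONDS) are assumptions that do not come from the schema. A real service would more likely use a maintained JWT library and asymmetric keys.

```python
import base64
import hashlib
import hmac
import json
import secrets
import time

REFRESH_TTL_SECONDS = 30 * 24 * 3600  # assumed refresh token lifetime (30 days)


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, the encoding JWTs use."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_access_token(user_id, scope, key, ttl_seconds=900, now=None):
    """Build an HS256-signed JWT that expires after ttl_seconds (15 minutes by default)."""
    now = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"sub": str(user_id), "scope": scope, "iat": now, "exp": now + ttl_seconds}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)


def issue_initial_tokens(user_id, client_id, scope, key, now=None):
    """Create the first refresh token row of a new family plus an access token."""
    now = int(time.time()) if now is None else now
    refresh_row = {
        "token_id": secrets.token_hex(32),      # 32 random bytes -> 64 hex chars
        "token_family": secrets.token_hex(32),  # new family for this login
        "user_id": user_id,
        "client_id": client_id,
        "scope": scope,
        "issued_at": now,
        "expires_at": now + REFRESH_TTL_SECONDS,
        "used_at": None,
        "is_revoked": False,
        "replaced_by": None,
    }
    access_token = sign_access_token(user_id, scope, key, now=now)
    return refresh_row, access_token
```

The returned dictionary mirrors the columns of refresh_tokens, so the caller can insert it as a row before returning both tokens to the client.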
Token Refresh Flow
- Client sends expired access token + refresh token to POST /token/refresh.
- Look up refresh token by token_id. Validate: not revoked, not expired, used_at IS NULL.
- Mark current token as used (used_at = NOW()).
- Generate a new refresh token in the same token_family; set replaced_by on the old row.
- Issue a new JWT access token. Return both new tokens to the client.
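The rotation part of this flow might look like the following sketch. It uses an in-process SQLite database as a stand-in for the real token store, with timestamps kept as integer epoch seconds. The function name rotate_refresh_token, the RefreshRejected exception, and the 30-day lifetime are hypothetical names chosen for illustration. Issuing the new access token is left out.

```python
import secrets
import sqlite3

# SQLite stand-in for the refresh_tokens table (timestamps as epoch seconds).
SCHEMA = """
CREATE TABLE refresh_tokens (
    token_id     TEXT PRIMARY KEY,
    token_family TEXT NOT NULL,
    user_id      INTEGER NOT NULL,
    client_id    TEXT NOT NULL,
    scope        TEXT,
    issued_at    INTEGER NOT NULL,
    expires_at   INTEGER NOT NULL,
    used_at      INTEGER,
    is_revoked   INTEGER NOT NULL DEFAULT 0,
    replaced_by  TEXT
)
"""

REFRESH_TTL_SECONDS = 30 * 24 * 3600  # assumed lifetime of each new refresh token


class RefreshRejected(Exception):
    """Raised when the presented refresh token must not be honored."""


def rotate_refresh_token(conn, token_id, now):
    """Validate the presented token, mark it used, and return its successor's id."""
    row = conn.execute(
        "SELECT token_family, user_id, client_id, scope, expires_at, used_at, is_revoked "
        "FROM refresh_tokens WHERE token_id = ?",
        (token_id,),
    ).fetchone()
    if row is None:
        raise RefreshRejected("unknown token")
    family, user_id, client_id, scope, expires_at, used_at, is_revoked = row
    if is_revoked:
        raise RefreshRejected("revoked")
    if expires_at <= now:
        raise RefreshRejected("expired")
    if used_at is not None:
        raise RefreshRejected("already used")

    new_id = secrets.token_hex(32)
    with conn:  # one transaction: both statements commit or neither does
        conn.execute(
            "INSERT INTO refresh_tokens "
            "(token_id, token_family, user_id, client_id, scope, issued_at, expires_at) "
            "VALUES (?, ?, ?, ?, ?, ?, ?)",
            (new_id, family, user_id, client_id, scope, now, now + REFRESH_TTL_SECONDS),
        )
        conn.execute(
            "UPDATE refresh_tokens SET used_at = ?, replaced_by = ? WHERE token_id = ?",
            (now, new_id, token_id),
        )
    return new_id
```

Note that this sketch reads and then writes, so two concurrent requests could both pass the used_at check. Closing that gap requires an atomic consume step at the storage layer.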
Refresh Token Rotation and Reuse Detection
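If a refresh token arrives whose used_at is already set (used_at IS NOT NULL), the token has been presented twice, which signals a possible replay attack: either an attacker or the legitimate client is holding a copy that was already rotated. Immediately revoke all tokens in the same token_family by setting is_revoked = TRUE on every row where token_family = ?. Force the user to re-authenticate.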
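Family revocation on reuse can be sketched as below, again with SQLite standing in for the real store and only the columns this check needs. The function name revoke_family_if_reused and the reduced table definition are assumptions made for the example.

```python
import sqlite3

# Reduced stand-in table: only the columns needed for reuse detection.
MINIMAL_SCHEMA = """
CREATE TABLE refresh_tokens (
    token_id     TEXT PRIMARY KEY,
    token_family TEXT NOT NULL,
    used_at      INTEGER,
    is_revoked   INTEGER NOT NULL DEFAULT 0
)
"""


def revoke_family_if_reused(conn, token_id):
    """Return True and revoke the whole family if the token was already used."""
    row = conn.execute(
        "SELECT token_family, used_at FROM refresh_tokens WHERE token_id = ?",
        (token_id,),
    ).fetchone()
    if row is None:
        return False  # unknown token: nothing to revoke, caller rejects it anyway
    family, used_at = row
    if used_at is None:
        return False  # first presentation: normal rotation can proceed
    with conn:
        # Reuse detected: revoke every token descended from the same login.
        conn.execute(
            "UPDATE refresh_tokens SET is_revoked = 1 WHERE token_family = ?",
            (family,),
        )
    return True
```

When the function returns True, the caller would reject the request and require a fresh login, since it cannot tell which of the two presenters is legitimate.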
Security Considerations and Failure Handling
- Rotation: Always rotate refresh tokens on use. Single-use tokens reduce the window of abuse if a token is stolen.
- Family revocation: Reuse detection via token families is the primary defense against stolen refresh tokens.
- Storage on client: Store refresh tokens in HttpOnly cookies, not localStorage. Anything in localStorage is readable by scripts on the page, so a single XSS flaw exposes the token.
- Scope binding: Refresh tokens should only issue access tokens for the scopes originally granted.
- Clock skew: Build a small grace period (30 seconds) into expiry checks to tolerate NTP drift between services.
- DB failure: Fail closed. If the token store is unavailable, reject the refresh request and return 503. Do not issue tokens without DB confirmation.
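Two of the points above, the clock-skew grace period and failing closed, lend themselves to a short sketch. The names here are hypothetical: lookup stands for whatever callable fetches a token record, and StoreUnavailable is an assumed exception that the storage layer raises when the database cannot be reached.

```python
CLOCK_SKEW_GRACE_SECONDS = 30


class StoreUnavailable(Exception):
    """Assumed exception raised by the storage layer when the DB is unreachable."""


def is_expired(expires_at, now, grace=CLOCK_SKEW_GRACE_SECONDS):
    """Treat a token as expired only once it is past expiry plus the grace period."""
    return now > expires_at + grace


def handle_refresh(lookup, token_id, now):
    """Return an (HTTP status, reason) pair for a refresh attempt."""
    try:
        record = lookup(token_id)
    except StoreUnavailable:
        # Fail closed: without DB confirmation, no token is issued.
        return 503, "token store unavailable"
    if record is None or record["is_revoked"] or is_expired(record["expires_at"], now):
        return 401, "invalid refresh token"
    return 200, "ok"
```

The grace period only loosens the expiry comparison. Revocation and unknown-token checks stay strict.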
Scalability Considerations
- Read-heavy validation: Cache valid refresh token metadata in Redis. On use, update the token row in Postgres inside a transaction (or with optimistic locking), then invalidate the cache entry so stale metadata is never served as valid.
- Atomic single-use enforcement: Use a DB-level unique constraint or Redis SET NX to guarantee a token can only be consumed once, even under concurrent requests.
- Partitioning: Partition refresh_tokens by issued_at month. Old partitions can be archived or dropped without a full-table operation.
- Cleanup job: Run a periodic job to delete rows where expires_at < NOW() - INTERVAL '30 days', keeping the table bounded.
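One way to get single-use enforcement at the storage layer is a conditional UPDATE that only succeeds while used_at is still NULL. This is a different mechanism from the unique constraint or Redis SET NX mentioned above, but it serves the same goal: of several concurrent requests, at most one sees an affected row. The sketch below uses SQLite as a stand-in, with integer epoch timestamps, and also includes the cleanup query. The function names are assumptions.

```python
import sqlite3

THIRTY_DAYS_SECONDS = 30 * 24 * 3600

# Reduced stand-in table: only the columns these two operations touch.
MINIMAL_SCHEMA = """
CREATE TABLE refresh_tokens (
    token_id   TEXT PRIMARY KEY,
    expires_at INTEGER NOT NULL,
    used_at    INTEGER,
    is_revoked INTEGER NOT NULL DEFAULT 0
)
"""


def consume_once(conn, token_id, now):
    """Atomically claim a token; True only for the request that wins the race."""
    with conn:
        cur = conn.execute(
            "UPDATE refresh_tokens SET used_at = ? "
            "WHERE token_id = ? AND used_at IS NULL AND is_revoked = 0",
            (now, token_id),
        )
    # Exactly one row changes for the first caller, zero for every later one.
    return cur.rowcount == 1


def purge_expired(conn, now):
    """Delete rows that expired more than 30 days ago; return how many were removed."""
    with conn:
        cur = conn.execute(
            "DELETE FROM refresh_tokens WHERE expires_at < ?",
            (now - THIRTY_DAYS_SECONDS,),
        )
    return cur.rowcount
```

A caller would run consume_once before minting the replacement token and treat a False result as a reuse signal for that token's family.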
Summary
The Token Refresh Service is the backbone of seamless session continuity in token-based auth. The critical design points are single-use refresh tokens with rotation, token family tracking for reuse detection, and atomic single-consumption enforcement at the storage layer. Keep access tokens short-lived and treat any refresh token reuse as a compromise signal requiring full family revocation.
Frequently Asked Questions
How would you design a token refresh service that handles millions of clients?
A token refresh service needs to be stateless and horizontally scalable, validating incoming refresh tokens against a persistent store (e.g., Redis or a SQL database) that records issued refresh tokens and their revocation status. Each refresh rotates the refresh token to limit replay attack windows, and the new access token is signed with a short TTL. Rate limiting per user and per IP prevents abuse of the refresh endpoint.
What is refresh token rotation and why is it important?
Refresh token rotation means issuing a new refresh token every time one is used, invalidating the old one immediately. This limits the window during which a stolen refresh token can be exploited, since any use of an already-rotated token signals a potential compromise and can trigger revocation of the entire token family. It is a core recommendation in the OAuth 2.0 Security Best Current Practice specification.
How do you handle refresh token revocation at scale?
Revocation can be implemented by storing a revocation record in a fast lookup store like Redis and checking it on every refresh attempt, keeping the list small by only storing revoked-but-not-yet-expired tokens. Alternatively, token families can be tracked so revoking a family ID cascades to all descendant tokens without storing each individually. A background job prunes expired revocation records to keep the store lean.
How would you ensure the token refresh service is resilient to outages?
The service should be deployed across multiple availability zones behind a load balancer, with the backing store replicated for high availability. Access tokens should have a long enough TTL (e.g., 15 minutes) that a brief refresh service outage does not immediately impact logged-in users. Circuit breakers and graceful degradation patterns can prevent cascading failures when the token store becomes slow or unavailable.