What Is an Online Status Service?
An online status service exposes a queryable view of whether users are currently active, idle, or offline. Unlike a raw presence service that manages connection-level state, an online status service adds a semantic layer: it aggregates multi-device presence, applies user-defined visibility rules (e.g., appear offline), and serves status queries from other services such as friend lists, profile pages, and notification routing. It is a read-heavy, eventually consistent system where slight staleness is acceptable.
Data Model
Two storage tiers serve different access patterns: a hot Redis cache for reads and a durable SQL table for persistence and fallback.
Redis (hot read cache)
status:{user_id} HASH fields: status, last_active_ms, device_count
status:visible:{user_id} STRING value = 'true' | 'false', no TTL (user preference)
status:bulk:{shard} ZSET score = last_active_ms, member = user_id
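The key shapes above can be sketched as follows. A plain dict stands in for Redis so the example is self-contained, and the 16-way shard count is an illustrative assumption, not part of the schema:

```python
import time

store = {}  # stand-in for Redis: key -> value

def record_activity(user_id: int, status: str, device_count: int, shards: int = 16):
    """Write one presence update into the hash and sorted-set tiers."""
    now_ms = int(time.time() * 1000)
    # status:{user_id} HASH with the three fields from the schema
    store[f"status:{user_id}"] = {
        "status": status,
        "last_active_ms": now_ms,
        "device_count": device_count,
    }
    # status:bulk:{shard} ZSET, modeled as member -> score
    shard = user_id % shards
    store.setdefault(f"status:bulk:{shard}", {})[user_id] = now_ms

record_activity(42, "online", 1)
```

In real Redis these writes would be an HSET plus a ZADD, typically pipelined so one presence event costs a single round trip.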
SQL Schema
CREATE TABLE user_status (
user_id BIGINT PRIMARY KEY,
status ENUM('online','idle','offline') NOT NULL DEFAULT 'offline',
last_active_at TIMESTAMP NOT NULL,
device_count SMALLINT NOT NULL DEFAULT 0,
visible BOOLEAN NOT NULL DEFAULT TRUE,
updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
INDEX idx_last_active (last_active_at)
);
Core Algorithm: Aggregation and Idle Detection
A user may be logged in on multiple devices simultaneously. The status service must synthesize a single canonical status.
- Presence events arrive from the underlying presence service via the Kafka topic presence.updates.
- Status aggregation: The aggregator reads the current device_count from Redis. On a session_connected event it increments the counter; on session_disconnected it decrements. Status transitions to offline only when device_count reaches zero.
- Idle detection: A background job scans the status:bulk:{shard} sorted set for users whose last_active_ms score is older than 5 minutes. It transitions those users to idle by updating their Redis hash and publishing a status_changed event.
- Visibility filter: Before any status is returned to callers, the service checks status:visible:{user_id}. If the user has chosen to appear offline, the API always returns offline regardless of true status.
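The aggregation and idle-detection rules above can be sketched as a minimal, testable loop. The helper names are hypothetical and a dict stands in for the Redis hashes; a real implementation would read events from the Kafka consumer and write through to Redis:

```python
import time

IDLE_AFTER_MS = 5 * 60 * 1000  # demote to idle after 5 minutes of inactivity

statuses = {}  # user_id -> {"status", "last_active_ms", "device_count"}

def handle_presence_event(user_id: int, event: str) -> None:
    """Apply one session_connected / session_disconnected event."""
    entry = statuses.setdefault(
        user_id, {"status": "offline", "last_active_ms": 0, "device_count": 0}
    )
    if event == "session_connected":
        entry["device_count"] += 1
        entry["status"] = "online"
        entry["last_active_ms"] = int(time.time() * 1000)
    elif event == "session_disconnected":
        entry["device_count"] = max(0, entry["device_count"] - 1)
        if entry["device_count"] == 0:
            entry["status"] = "offline"  # offline only when no devices remain

def scan_for_idle(now_ms: int) -> list[int]:
    """Background pass: demote stale online users to idle."""
    idle = []
    for uid, entry in statuses.items():
        if entry["status"] == "online" and now_ms - entry["last_active_ms"] > IDLE_AFTER_MS:
            entry["status"] = "idle"
            idle.append(uid)  # a real service would publish status_changed here
    return idle
```

Note that a user with two connected devices stays online after one disconnect, which is the multi-device aggregation rule in action.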
Read Path: Bulk Status Queries
Friend lists, contact pages, and notification services query the status of many users in a single call. The read path is optimized for bulk access:
- Caller sends a list of up to 1,000 user IDs.
- Status service issues a pipelined Redis HMGET across the relevant status:{user_id} hashes in a single round trip.
- Cache misses (cold users) are fetched from SQL in a single WHERE user_id IN (...) query and backfilled into Redis with a 60-second TTL.
- Results are filtered through visibility rules before being returned.
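The bulk read path above can be sketched as cache-first lookup, batched SQL fallback, then visibility filtering. The cache, database, and visibility dicts are illustrative stand-ins for Redis, the SQL table, and the status:visible keys:

```python
MAX_BATCH = 1000  # caller-facing limit on IDs per request

cache = {}     # user_id -> status string (stand-in for Redis hashes)
database = {}  # user_id -> status string (stand-in for the SQL table)
visible = {}   # user_id -> bool (stand-in for status:visible:{user_id})

def get_bulk_status(user_ids: list[int]) -> dict[int, str]:
    if len(user_ids) > MAX_BATCH:
        raise ValueError("at most 1000 user IDs per call")
    # Cache hits first, in one logical round trip.
    results = {uid: cache[uid] for uid in user_ids if uid in cache}
    misses = [uid for uid in user_ids if uid not in cache]
    if misses:
        # Equivalent of one SELECT ... WHERE user_id IN (...), then backfill.
        for uid in misses:
            status = database.get(uid, "offline")
            cache[uid] = status  # a real backfill would set a 60-second TTL
            results[uid] = status
    # Visibility filter: appear-offline users always read as offline.
    return {uid: (s if visible.get(uid, True) else "offline")
            for uid, s in results.items()}
```

Applying the visibility filter last, at the serving layer, keeps the cached value truthful while still hiding it from callers.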
Failure Handling
- Aggregator crash: Kafka consumer group rebalances and the aggregator replays from the last committed offset. Setting the same status twice is a no-op, and counter updates can be keyed on session ID so a replayed session_connected does not double-count, making replay safe.
- Redis unavailability: The service falls back to SQL for reads and queues writes to a local buffer (or a secondary Redis replica). A brief period of stale status data is acceptable.
- Idle scanner failure: Users remain in the online state longer than intended. The impact is cosmetic; correctness is restored when the scanner recovers or the presence service eventually sends an offline event.
- Split-brain multi-device: If two devices send conflicting events concurrently, a compare-and-swap on the device_count field with Redis WATCH/MULTI prevents counter corruption.
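The WATCH/MULTI pattern is an optimistic compare-and-swap: read the counter, queue the write, and retry if another client changed the key in between. The sketch below models that retry loop against an in-memory store, with a version number standing in for Redis's watched-key check (redis-py exposes the real mechanism via pipeline.watch(), multi(), and execute()):

```python
counters = {}  # key -> (value, version); version plays the role of WATCH

def cas_add(key: str, delta: int) -> int:
    """Optimistically adjust a counter, retrying on concurrent modification."""
    while True:
        value, version = counters.get(key, (0, 0))
        new_value = max(0, value + delta)  # device_count never goes negative
        # The "transaction": commit only if nobody bumped the version meanwhile.
        if counters.get(key, (0, 0))[1] == version:
            counters[key] = (new_value, version + 1)
            return new_value
        # Watched key changed under us: loop and retry, as redis-py does
        # when EXEC aborts after a WATCHed key is modified.
```

Clamping at zero also guards against a stray duplicate session_disconnected driving the counter negative.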
Scalability Considerations
The online status service is read-heavy: status is written once per session event but read thousands of times per second for friend lists, badge counts, and routing decisions.
- Read replicas: Route all bulk read queries to Redis read replicas. Replicate SQL status table to read replicas for fallback queries.
- Sharded aggregators: Partition the presence.updates Kafka topic by user_id mod N. Each aggregator shard owns a non-overlapping set of user IDs, eliminating cross-shard coordination.
- CDN-cached public profiles: For public-facing status (e.g., creator profiles), cache the status response at the CDN edge with a 30-second TTL. This absorbs spikes without hitting the origin.
- Rate-limit polling clients: Mobile clients that poll status via REST rather than maintaining a WebSocket are throttled to one request per 30 seconds per user to prevent thundering-herd patterns.
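The per-user polling throttle above can be sketched as a simple fixed-window check; the helper name and in-memory table are illustrative, and a production service would keep this state in Redis with a TTL rather than in process memory:

```python
POLL_INTERVAL_S = 30  # at most one REST status poll per user per 30 seconds

last_poll = {}  # user_id -> timestamp of the last allowed poll

def allow_poll(user_id: int, now_s: float) -> bool:
    """Return True if this poll is allowed, False if the client is throttled."""
    prev = last_poll.get(user_id)
    if prev is not None and now_s - prev < POLL_INTERVAL_S:
        return False  # throttled: a real API would answer HTTP 429
    last_poll[user_id] = now_s
    return True
```

Rejected polls are cheap to serve, so even a thundering herd of misbehaving clients costs only a dictionary lookup each.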
Summary
An online status service sits above raw presence data and adds multi-device aggregation, idle detection, visibility controls, and a bulk-read-optimized query layer. Redis is the primary store for hot status data; SQL provides durability and fallback. Kafka decouples the write path (status aggregation) from the read path (query serving), allowing each to scale independently. The result is an eventually consistent, horizontally scalable service capable of serving millions of status queries per second with sub-10-millisecond p99 latency.
Frequently Asked Questions
How do you design an online status system for a social platform?
An online status system typically relies on clients sending periodic heartbeats (e.g., every 30 seconds) to a presence server over a persistent connection such as WebSocket. The server stores status in a fast in-memory store like Redis with a TTL slightly longer than the heartbeat interval. If a heartbeat is missed, the TTL expires and the user is marked offline. Status changes are propagated to followers or friends via a pub/sub layer.
How do you scale online status reads for users with millions of followers?
For celebrity or high-fan-out users, direct push of status changes to all followers is impractical. Instead, systems use a pull-on-demand model where a follower's client fetches status only when the user opens a conversation or profile. Caching status at the CDN or application layer with a short TTL (e.g., 60 seconds) further reduces read load. Fan-out on write is reserved for users with manageable follower counts.
How do privacy controls affect online status system design?
Privacy controls add an authorization layer between status reads and the data store. When a user sets their status to hidden or restricts visibility to friends only, the system must check the requesting user's relationship before returning real status. This is typically enforced at the API gateway or presence service layer using a cached friends/permissions graph. Some platforms return a fake 'offline' status rather than a permission-denied response to avoid leaking that the user is hiding their status.
What is the difference between online status and last-seen timestamps?
Online status is a binary or categorical real-time signal (online, offline, idle) updated continuously via heartbeats. Last-seen is a historical timestamp recorded when the user was last active or disconnected, and it persists even when the user is offline. Systems often store both: online status in Redis with a TTL for fast expiry, and last-seen as a durable write to a relational or NoSQL database updated on each disconnection event.