A user presence system tracks whether users are online, offline, or idle — powering the green dots in Slack, WhatsApp, and Google Docs. Core challenges: detecting disconnects reliably (TCP connections can silently die), scaling to millions of concurrent connections, propagating status changes to subscribers without overwhelming the message bus, and respecting user privacy settings.
Core Data Model
-- Presence state (hot path: Redis)
-- Key: presence:{user_id}
-- Value: {"status":"online","last_seen":1713340800,"device":"web"}
-- TTL: 30 seconds (refreshed by heartbeat every 15s; expires = offline)
-- Presence subscription (who wants to know about whom)
CREATE TABLE PresenceSubscription (
    subscriber_id  UUID NOT NULL,
    target_id      UUID NOT NULL,
    subscribed_at  TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    PRIMARY KEY (subscriber_id, target_id)
);
CREATE INDEX idx_presence_target ON PresenceSubscription (target_id);
-- Persistent last-seen (cold path: Postgres, updated on disconnect)
CREATE TABLE UserLastSeen (
    user_id     UUID PRIMARY KEY,
    last_seen   TIMESTAMPTZ NOT NULL,
    status      TEXT NOT NULL DEFAULT 'offline',  -- 'online','away','offline'
    updated_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
WebSocket Heartbeat and TTL-Based Offline Detection
import asyncio
import json
from datetime import datetime, timezone

import redis.asyncio as aioredis

r = aioredis.Redis(host='redis', decode_responses=True)

HEARTBEAT_INTERVAL = 15  # client sends a heartbeat every 15s
PRESENCE_TTL = 30        # key expires after 30s; two missed heartbeats = offline
PRESENCE_KEY = "presence:{}"
async def handle_connection(websocket, user_id: str):
    """
    On connection: set presence to online.
    Expect a client heartbeat every 15s; refresh the TTL on each.
    On disconnect: delete the Redis key and persist last-seen to Postgres.
    """
    await set_online(user_id, websocket.client_info)
    try:
        async for message in websocket:
            data = json.loads(message)
            if data.get("type") == "heartbeat":
                await refresh_presence(user_id)
            elif data.get("type") == "set_status":
                # User manually sets away/busy/invisible
                await set_status(user_id, data["status"])
    except Exception:
        pass  # connection dropped or malformed frame; fall through to cleanup
    finally:
        await set_offline(user_id)
async def set_online(user_id: str, client_info: dict):
    payload = json.dumps({
        "status": "online",
        "last_seen": int(datetime.now(timezone.utc).timestamp()),
        "device": client_info.get("device", "web")
    })
    await r.setex(PRESENCE_KEY.format(user_id), PRESENCE_TTL, payload)
    await publish_presence_change(user_id, "online")
async def refresh_presence(user_id: str):
    """Reset the TTL without rewriting the value — a single O(1) EXPIRE."""
    # EXPIRE is a no-op (returns 0) if the key has already expired
    await r.expire(PRESENCE_KEY.format(user_id), PRESENCE_TTL)
async def set_offline(user_id: str):
    await r.delete(PRESENCE_KEY.format(user_id))
    await publish_presence_change(user_id, "offline")
    # Persist last-seen to Postgres
    await update_last_seen_db(user_id)
async def publish_presence_change(user_id: str, status: str):
    """Notify all subscribers via Redis pub/sub."""
    event = json.dumps({
        "user_id": user_id,
        "status": status,
        "ts": int(datetime.now(timezone.utc).timestamp())
    })
    await r.publish("presence:changes", event)
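On the other side of the socket, the client's only job is to send the heartbeat frame on schedule. A minimal sketch, assuming the frame shape expected by handle_connection above; the `send` callable is an assumption (e.g. `websocket.send` from any async WebSocket client library), and `max_beats` exists only to make the loop testable:

```python
import asyncio
import json
import time
from typing import Optional

HEARTBEAT_INTERVAL = 15  # must fire at least twice per server-side PRESENCE_TTL


def make_heartbeat() -> str:
    """Build the frame handle_connection expects: {"type": "heartbeat", ...}."""
    return json.dumps({"type": "heartbeat", "ts": int(time.time())})


async def heartbeat_loop(send, interval: float = HEARTBEAT_INTERVAL,
                         max_beats: Optional[int] = None) -> None:
    """Call the async `send(frame)` every `interval` seconds until cancelled
    (or until max_beats frames have been sent, for testing)."""
    beats = 0
    while max_beats is None or beats < max_beats:
        await send(make_heartbeat())
        beats += 1
        await asyncio.sleep(interval)
```

In production the loop would run as a background task alongside the client's receive loop and be cancelled when the connection closes.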
Reading Presence and Fan-out to Subscribers
async def get_presence(user_id: str) -> dict:
    """
    Check Redis first (live presence).
    Fall back to Postgres UserLastSeen for recently-offline users.
    """
    raw = await r.get(PRESENCE_KEY.format(user_id))
    if raw:
        data = json.loads(raw)
        # Note: the live payload may carry "invisible"; the API layer must
        # map it to "offline" for everyone but the user themselves.
        data["source"] = "live"
        return data
    # Fall back to DB for the last-seen timestamp
    # (simplified: would use an async DB client in production)
    return {"status": "offline", "last_seen": None, "source": "db"}
async def get_bulk_presence(user_ids: list[str]) -> dict[str, dict]:
    """Fetch presence for up to 100 users in a single pipeline round trip."""
    keys = [PRESENCE_KEY.format(uid) for uid in user_ids]
    pipeline = r.pipeline()
    for key in keys:
        pipeline.get(key)
    results = await pipeline.execute()
    presence = {}
    for uid, raw in zip(user_ids, results):
        if raw:
            data = json.loads(raw)
            # Preserve manual statuses (away/busy); hide invisible users
            if data.get("status") == "invisible":
                presence[uid] = {"status": "offline"}
            else:
                presence[uid] = data
        else:
            presence[uid] = {"status": "offline"}
    return presence
async def presence_fan_out_worker():
    """
    Subscribe to the presence:changes channel.
    For each event, notify subscribers via their WebSocket connections.
    """
    pubsub = r.pubsub()
    await pubsub.subscribe("presence:changes")
    async for message in pubsub.listen():
        if message["type"] != "message":
            continue
        event = json.loads(message["data"])
        user_id = event["user_id"]
        # Find all subscribers interested in this user's presence
        # (query PresenceSubscription or use a Redis set: subscribers:{user_id})
        subscriber_ids = await get_subscribers(user_id)
        for sub_id in subscriber_ids:
            await push_to_websocket(sub_id, event)
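The worker's get_subscribers helper is left undefined above. One way to back it, per the inline comment, is a Redis set per watched user. A minimal sketch, where the subscribers:{user_id} key layout follows that comment and the injected client parameter is an assumption made for testability (production code would close over the module-level client, and would write the durable PresenceSubscription row alongside the set update):

```python
SUBSCRIBERS_KEY = "subscribers:{}"  # assumed key layout, per the worker's comment


async def add_subscription(r, subscriber_id: str, target_id: str) -> None:
    # Mirror the durable PresenceSubscription row in a Redis set for fast fan-out
    await r.sadd(SUBSCRIBERS_KEY.format(target_id), subscriber_id)


async def remove_subscription(r, subscriber_id: str, target_id: str) -> None:
    await r.srem(SUBSCRIBERS_KEY.format(target_id), subscriber_id)


async def get_subscribers(r, target_id: str) -> set:
    # SMEMBERS is O(N) in subscriber count; a per-user subscription cap keeps N small
    return set(await r.smembers(SUBSCRIBERS_KEY.format(target_id)))
```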
Privacy and Status Controls
async def set_status(user_id: str, requested_status: str):
    """
    Allow users to set: online, away, busy, invisible.
    Invisible: user appears offline to others but can still use the app.
    """
    VALID_STATUSES = {"online", "away", "busy", "invisible"}
    if requested_status not in VALID_STATUSES:
        raise ValueError(f"Invalid status: {requested_status}")
    key = PRESENCE_KEY.format(user_id)
    raw = await r.get(key)
    if not raw:
        return  # Not connected — ignore
    data = json.loads(raw)
    data["status"] = requested_status
    # Invisible: store the true status in Redis but broadcast "offline" to subscribers
    broadcast_status = "offline" if requested_status == "invisible" else requested_status
    await r.setex(key, PRESENCE_TTL, json.dumps(data))
    await publish_presence_change(user_id, broadcast_status)
Key Interview Points
- TTL as implicit offline detection: TCP connections can die silently — a client behind a NAT or mobile network may lose connectivity without sending a FIN. Setting a Redis key with a 30s TTL and refreshing it every 15s means that two consecutive missed heartbeats let the key expire naturally, broadcasting “offline” without the server having to detect the dead connection explicitly.
- Multi-device presence: A user on both desktop and mobile should show “online” until both devices go offline. Use a per-device key (presence:{user_id}:{device_id}) and a counter: INCR connected:{user_id} on connect, DECR on disconnect. The user is “online” while the counter is > 0. The counter lives in Redis with no TTL; it is decremented on disconnect and deleted when it reaches 0. Note that a crashed server never sends its decrements, so the counter can leak; reconcile it periodically against the per-device TTL keys.
- Subscriber fan-out scaling: A celebrity with 1M followers sends a presence change that must fan-out to 1M subscribers. Strategies: (1) subscribe only to contacts/friends, not all followers (most apps); (2) cap presence subscriptions at 500 per user; (3) lazy fan-out — clients poll presence on demand rather than receiving push updates. Twitter-style apps don’t show real-time presence for non-contacts.
- Redis keyspace notifications: Enable expired-key events (CONFIG SET notify-keyspace-events Ex, i.e. keyevent notifications for expirations) to receive a pub/sub message when a presence key expires (the user went offline without an explicit disconnect). Subscribe to __keyevent@0__:expired and filter for presence:* keys. This lets the fan-out worker detect TTL expirations and broadcast “offline” events.
- Last-seen granularity: WhatsApp shows “last seen today at 3:42pm”. Store UserLastSeen.last_seen in Postgres (updated on each disconnect). For privacy: let users control visibility — “nobody”, “contacts only”, “everyone”. Cache the setting per user in Redis to avoid a DB lookup on every presence fetch.
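The multi-device scheme above (per-device keys plus a connection counter) can be sketched as follows. The key names mirror the bullet; the injected client parameter is an assumption made for testability, and the payload argument stands in for the JSON presence value:

```python
PRESENCE_TTL = 30
DEVICE_KEY = "presence:{}:{}"   # presence:{user_id}:{device_id}, per-device TTL key
COUNT_KEY = "connected:{}"      # live connection counter, no TTL


async def device_connect(r, user_id: str, device_id: str, payload: str) -> None:
    # Each device refreshes its own key via heartbeats; the counter tracks how many are live
    await r.setex(DEVICE_KEY.format(user_id, device_id), PRESENCE_TTL, payload)
    await r.incr(COUNT_KEY.format(user_id))


async def device_disconnect(r, user_id: str, device_id: str) -> None:
    await r.delete(DEVICE_KEY.format(user_id, device_id))
    remaining = await r.decr(COUNT_KEY.format(user_id))
    if remaining <= 0:
        # Last device gone: remove the counter so it cannot drift negative
        await r.delete(COUNT_KEY.format(user_id))


async def is_online(r, user_id: str) -> bool:
    count = await r.get(COUNT_KEY.format(user_id))
    return count is not None and int(count) > 0
```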
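The keyspace-notification approach can be sketched like this. The pure mapping function is separated from the listener loop so the filtering logic is testable without a Redis server; the helper names are assumptions, and the key parsing assumes the single-key presence:{user_id} layout (the multi-device variant would need to check the remaining device keys before declaring the user offline):

```python
import json
import time
from typing import Optional

EXPIRED_CHANNEL = "__keyevent@0__:expired"  # requires notify-keyspace-events Ex


def offline_event_for_expired_key(key: str) -> Optional[str]:
    """Map an expired key to an offline presence event; None for non-presence keys."""
    if not key.startswith("presence:"):
        return None
    user_id = key.split(":", 1)[1]
    return json.dumps({"user_id": user_id, "status": "offline",
                       "ts": int(time.time())})


async def expiry_listener(r):
    """Re-broadcast TTL expirations as explicit offline events on presence:changes."""
    pubsub = r.pubsub()
    await pubsub.subscribe(EXPIRED_CHANNEL)
    async for message in pubsub.listen():
        if message["type"] != "message":
            continue
        event = offline_event_for_expired_key(message["data"])
        if event:
            await r.publish("presence:changes", event)
```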
Frequently Asked Questions

How does Redis TTL replace explicit disconnect detection for presence?
TCP connections can silently drop: a mobile device entering a tunnel or a NAT gateway timing out does not send a TCP FIN packet, so the server gets no explicit notification. Redis TTL sidesteps this. The client sends a heartbeat (a WebSocket ping or application-level message) every 15 seconds, which resets the key's TTL to 30 seconds via EXPIRE. If two consecutive heartbeats are missed, the key expires naturally, no FIN required. The expiry event (via Redis keyspace notifications or a separate poller) triggers broadcasting "offline" to the user's subscribers. This TTL-as-offline-detection pattern is simpler and more reliable than relying on TCP close events, which are frequently missed in mobile and cloud environments.

How do you track presence for a user logged in on multiple devices simultaneously?
A single TTL key (presence:{user_id}) would be overwritten by the most recent device and expire incorrectly if just one device disconnects. Instead, maintain one Redis key per device (presence:{user_id}:{device_id}) with independent TTLs refreshed by each device's heartbeat. User status is the union: if any device key exists, the user is "online". Use a Redis counter (INCR connected:{user_id} on connect, DECR on disconnect) to track the active device count; the user is online while the counter is > 0. Optionally, show the most active device type (mobile vs desktop) by querying presence:{user_id}:* keys with SCAN. Clean up stale device keys when the counter reaches 0.

How does the fan-out worker propagate presence changes to subscribers?
When a user goes online or offline, their N subscribers need to be notified. The presence change is published to a Redis pub/sub channel (presence:changes), and all WebSocket server instances subscribe to it. When a server receives a change event, it looks up which of its currently connected clients are subscribed to that user's presence (via a local in-memory map: subscriber_id → [target_ids]) and sends the update over their WebSocket connections. This avoids querying the PresenceSubscription table on every presence change; the subscription data is held in memory per server process. On connection, each client registers which users' presence it wants to watch, and the server updates its local map.

What is the invisible status and how does it work technically?
In invisible mode the user can see others' presence and interact normally but appears "offline" to everyone else. Implementation: store the true status "invisible" in the Redis presence key (only the server reads this). When broadcasting to subscribers via pub/sub, always broadcast "offline" for invisible users regardless of their true status. The API endpoint GET /presence/{user_id} also returns "offline" for invisible users, except for the user themselves (check session user_id == target user_id and return the true status). An invisible user's own presence key has a normal TTL refreshed by heartbeats; they simply broadcast "offline" to others. This is why you cannot derive true status from the pub/sub channel alone.

How do you scale presence to 10 million concurrent users?
Redis can handle on the order of 1M SETEX operations/second; for 10M users each sending a heartbeat every 15s, that is roughly 667K writes/second, which a single node handles. For reads, GET presence:{user_id} is O(1), and bulk GETs via pipeline scale to millions of requests/second. For pub/sub fan-out, a single Redis channel can handle around 100K messages/second; if fan-out is too heavy (a user with 1M subscribers goes online), shard presence subscriptions across multiple pub/sub channels by subscriber ID range. WebSocket servers need sticky sessions (the load balancer routes a user's connections to the same server) or shared fan-out via Redis pub/sub so any server can notify any subscriber.