Presence System — Low-Level Design
A presence system tracks whether users are online, idle, or offline, and broadcasts status changes to other users. It is the kind of system behind the green dot in apps such as Slack, WhatsApp, and LinkedIn. The core challenge is handling millions of connections and heartbeats efficiently.
Presence States
online → User has an active connection and has been active within the last 5 minutes
idle → User has an active connection but no activity for 5+ minutes
offline → No active connection OR no heartbeat for 90+ seconds
busy → User-set manual override (do not disturb)
State transitions:
Connect → online
5 min inactivity → idle
90s without heartbeat → offline (connection assumed dead)
User action → online (from idle)
Disconnect → offline (after a short grace period)
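The state rules above can be condensed into one pure function. The following is a minimal sketch rather than part of the original design: the `Presence` enum, the function name `derive_state`, and its parameters are hypothetical, and the thresholds are arguments so they can be matched to whatever timeouts a deployment uses. The defaults assume the 5-minute idle window and the 90-second heartbeat TTL used in the heartbeat code.

```python
from enum import Enum


class Presence(str, Enum):
    ONLINE = "online"
    IDLE = "idle"
    OFFLINE = "offline"
    BUSY = "busy"


def derive_state(connected, seconds_since_heartbeat, seconds_since_activity,
                 manual_busy=False, idle_after=300, offline_after=90):
    """Map raw connection facts to a presence state (illustrative only)."""
    # No connection, or heartbeats stopped long enough to assume it is dead.
    if not connected or seconds_since_heartbeat >= offline_after:
        return Presence.OFFLINE
    # Assumption: a manual "busy" override beats idle/online but not offline.
    if manual_busy:
        return Presence.BUSY
    if seconds_since_activity >= idle_after:
        return Presence.IDLE
    return Presence.ONLINE
```

Keeping the rule in one function makes the precedence explicit: offline is checked first, then the manual override, then idleness.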
Heartbeat Architecture
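# Client sends a heartbeat every 30 seconds
# Server records it in Redis with a 90-second TTL
# If no heartbeat arrives for 90s (three consecutive missed intervals),
# the key expires and the user appears offline
import time
from redis import Redis

redis = Redis()  # shared client; connection settings depend on the deployment

HEARTBEAT_TTL = 90    # seconds
IDLE_THRESHOLD = 300  # seconds (5 minutes)

def heartbeat(user_id, connection_id):
    now = time.time()
    meta_key = f'presence:meta:{user_id}'
    pipe = redis.pipeline()
    pipe.setex(f'presence:{user_id}', HEARTBEAT_TTL, 'online')
    pipe.setex(f'presence:conn:{connection_id}', HEARTBEAT_TTL, user_id)
    # Track the heartbeat time, and seed last_activity on the first heartbeat
    # so idle detection has a baseline even before any user action
    pipe.hset(meta_key, 'last_heartbeat', now)
    pipe.hsetnx(meta_key, 'last_activity', now)
    pipe.expire(meta_key, HEARTBEAT_TTL)  # metadata expires with the presence key
    pipe.execute()

def get_presence(user_id):
    status = redis.get(f'presence:{user_id}')
    if not status:
        return 'offline'
    meta = redis.hgetall(f'presence:meta:{user_id}')
    # Hash fields come back as bytes unless the client sets decode_responses=True
    last_active = float(meta.get(b'last_activity', meta.get(b'last_heartbeat', 0)))
    if time.time() - last_active > IDLE_THRESHOLD:
        return 'idle'
    return 'online'

def user_activity(user_id):
    """Called when user sends a message, clicks, etc."""
    was_online = get_presence(user_id) == 'online'
    pipe = redis.pipeline()
    pipe.setex(f'presence:{user_id}', HEARTBEAT_TTL, 'online')
    pipe.hset(f'presence:meta:{user_id}', 'last_activity', time.time())
    pipe.execute()
    # Broadcast only on a real transition (idle or offline → online)
    if not was_online:
        broadcast_presence_change(user_id, 'online')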
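To make the timing of the 30-second heartbeat and 90-second TTL concrete, here is a small in-memory simulation. It is only a sketch: `TTLStore`, `FakeClock`, and `status_at` are hypothetical stand-ins that mimic the expiry behaviour of `SETEX`/`GET` with a controllable clock, not a replacement for Redis.

```python
class FakeClock:
    """Manually advanced clock so the timeline is deterministic."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


class TTLStore:
    """In-memory stand-in for Redis SETEX/GET expiry semantics."""

    def __init__(self, clock):
        self._clock = clock
        self._data = {}

    def setex(self, key, ttl_seconds, value):
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily drop the expired key
            return None
        return value


def status_at(store, user_id):
    return store.get(f"presence:{user_id}") or "offline"


clock = FakeClock()
store = TTLStore(clock)
store.setex("presence:42", 90, "online")  # last heartbeat received at t=0

# Sample the status while every later heartbeat is missed.
timeline = {}
for t in (30, 60, 89, 90):
    clock.now = t
    timeline[t] = status_at(store, 42)
```

With the last heartbeat at t=0, the user still reads as online at t=30, t=60, and t=89, and becomes offline at t=90. Two missed heartbeats alone are therefore not enough to expire the key.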
Presence for a Contact List
def get_presence_for_users(user_ids):
    """Batch presence lookup for a contact list."""
    pipe = redis.pipeline()
    for uid in user_ids:
        pipe.get(f'presence:{uid}')
    results = pipe.execute()  # one round trip for the whole list

    presence = {}
    for uid, status in zip(user_ids, results):
        if status is None:
            presence[uid] = 'offline'
        else:
            # Stored value only; idle is not resolved here
            # (could cache idle in Redis too)
            presence[uid] = status.decode()
    return presence

# For a contact list of 200 users: one pipeline is one network round trip,
# typically on the order of a millisecond within a data center
# Never query presence one user at a time (N+1 problem)
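The batch lookup above leaves the idle check open. One possible approach, sketched here under assumptions, is to queue an `HGET` of the `last_activity` field next to each `GET` in the same pipeline and merge the two result lists afterwards. The helper name `merge_presence` and its parameters are hypothetical; it only covers the merging step, so it runs without a Redis server.

```python
import time

IDLE_THRESHOLD = 300  # seconds; the 5-minute idle rule


def merge_presence(user_ids, statuses, last_activities, now=None):
    """Combine raw pipeline replies into final presence states.

    statuses and last_activities are parallel to user_ids and hold the
    raw replies (bytes or None) of one GET and one HGET per user.
    """
    now = time.time() if now is None else now
    presence = {}
    for uid, status, last_activity in zip(user_ids, statuses, last_activities):
        if status is None:
            presence[uid] = "offline"
        elif last_activity is not None and now - float(last_activity) > IDLE_THRESHOLD:
            presence[uid] = "idle"
        else:
            presence[uid] = status.decode()
    return presence
```

If the pipeline interleaves the two commands per user, the replies can be split with `results[0::2]` and `results[1::2]` before calling the helper, which keeps the whole lookup at one round trip.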
Broadcasting Presence Changes
import json
from datetime import datetime, timezone

def broadcast_presence_change(user_id, new_status):
    """Notify all users who care about this user's presence."""
    # Who needs to know? Followers, contacts, active chat participants
    subscribers = get_presence_subscribers(user_id)
    if not subscribers:
        return
    message = json.dumps({
        'user_id': user_id,
        'status': new_status,
        'timestamp': datetime.now(timezone.utc).isoformat(),
    })
    # Fan-out via Redis Pub/Sub, pipelined so the publishes share one round trip
    pipe = redis.pipeline()
    for subscriber_id in subscribers:
        pipe.publish(f'presence:updates:{subscriber_id}', message)
    pipe.execute()

# Scale concern: a celebrity with 1M followers going online
# would fan out to 1M subscribers simultaneously.
# Solution: limit presence visibility to mutual connections, or
# only show presence to users who have recently been in contact.
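The mitigation described in the comments, restricting fan-out to mutual connections, can be sketched as a filtering and batching step that runs before any publish. Everything here is an assumption for illustration: the function name `fanout_batches`, the `mutual_contacts` set, and the `batch_size` and `max_recipients` defaults are hypothetical values, not figures from the design.

```python
from itertools import islice


def fanout_batches(subscribers, mutual_contacts, batch_size=500, max_recipients=10_000):
    """Yield batches of subscriber IDs that should receive a presence update.

    Only subscribers who are also mutual contacts are eligible, and the
    total number of recipients is capped to bound the cost of one update.
    """
    eligible = (s for s in subscribers if s in mutual_contacts)
    capped = islice(eligible, max_recipients)
    while True:
        batch = list(islice(capped, batch_size))
        if not batch:
            return
        yield batch
```

Each yielded batch could then be published in its own pipeline, so a single status change never produces one unbounded burst of messages.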
Presence Subscription Management
def subscribe_to_presence(subscriber_id, target_user_id):
    """
    subscriber_id wants to know when target_user_id changes status.
    Call this when opening a DM or viewing a contact's profile.
    """
    key = f'presence:subscribers:{target_user_id}'
    pipe = redis.pipeline()
    pipe.sadd(key, subscriber_id)
    # The TTL covers the whole set, so each new subscription extends it for everyone
    pipe.expire(key, 3600)
    pipe.execute()

def unsubscribe_from_presence(subscriber_id, target_user_id):
    redis.srem(f'presence:subscribers:{target_user_id}', subscriber_id)

def get_presence_subscribers(user_id):
    members = redis.smembers(f'presence:subscribers:{user_id}')
    # Members come back as bytes; this assumes integer user IDs
    return [int(m) for m in members]
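Because the one-hour expiry above is set on the whole subscriber set, a single fresh subscription keeps older, possibly stale subscribers alive as well. One possible refinement is to expire each subscriber individually; in Redis this role could be played by a sorted set scored by expiry time (`ZADD` plus `ZREMRANGEBYSCORE`). The in-memory class below is a hypothetical model of that idea, not something the original design specifies.

```python
import time


class SubscriberRegistry:
    """In-memory model of per-subscriber expiry (illustrative only)."""

    def __init__(self, ttl_seconds=3600, clock=time.time):
        self._ttl = ttl_seconds
        self._clock = clock
        self._expiry = {}  # target_user_id -> {subscriber_id: expires_at}

    def subscribe(self, subscriber_id, target_user_id):
        entries = self._expiry.setdefault(target_user_id, {})
        entries[subscriber_id] = self._clock() + self._ttl

    def unsubscribe(self, subscriber_id, target_user_id):
        self._expiry.get(target_user_id, {}).pop(subscriber_id, None)

    def subscribers(self, target_user_id):
        now = self._clock()
        entries = self._expiry.get(target_user_id, {})
        # Drop subscriptions whose own TTL has elapsed.
        for sid in [s for s, expires_at in entries.items() if expires_at <= now]:
            del entries[sid]
        return sorted(entries)
```

With this shape, a subscriber who opened a DM an hour ago falls out of the fan-out list even if other users keep subscribing to the same target.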
Handling WebSocket Disconnection
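import time

GRACE_SECONDS = 30

def on_websocket_disconnect(user_id, connection_id):
    # Don't immediately mark offline: the user may reconnect within seconds
    # (network blip, tab reload). Use a grace period.
    redis.delete(f'presence:conn:{connection_id}')
    # Record when the disconnect happened; the TTL outlives the grace period
    # so the deferred check can still read it
    redis.setex(f'presence:disconnect:{connection_id}', GRACE_SECONDS * 2, time.time())
    # A delayed job runs once the grace period has elapsed
    enqueue_delayed_offline_check(user_id, connection_id, delay_seconds=GRACE_SECONDS)

def deferred_offline_check(user_id, connection_id):
    # The presence key alone cannot answer this: its 90s TTL outlives the
    # 30s grace period. Look for a heartbeat newer than the disconnect instead.
    disconnected_at = redis.get(f'presence:disconnect:{connection_id}')
    last_heartbeat = redis.hget(f'presence:meta:{user_id}', 'last_heartbeat')
    if disconnected_at and last_heartbeat and float(last_heartbeat) > float(disconnected_at):
        return  # Reconnected (or another device is active): still online
    # Mark offline
    redis.delete(f'presence:{user_id}', f'presence:meta:{user_id}')
    broadcast_presence_change(user_id, 'offline')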
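The effect of the grace period is easiest to see on a timeline of connect and disconnect events. The sketch below is a simplified model with a single connection per user; the function name `visible_transitions` and the event format are hypothetical. It returns the status changes that subscribers would actually see.

```python
def visible_transitions(events, grace=30):
    """Return the (timestamp, status) changes broadcast to subscribers.

    events is a time-sorted list of (timestamp, kind) tuples where kind is
    'connect' or 'disconnect'. A disconnect only becomes a visible
    'offline' if no reconnect happens within the grace period.
    """
    broadcasts = []
    online = False
    pending_offline_at = None
    for ts, kind in events:
        # A pending offline fires if the grace period ended before this event.
        if pending_offline_at is not None and ts >= pending_offline_at:
            broadcasts.append((pending_offline_at, "offline"))
            online = False
            pending_offline_at = None
        if kind == "connect":
            pending_offline_at = None  # reconnect cancels the pending offline
            if not online:
                broadcasts.append((ts, "online"))
                online = True
        elif kind == "disconnect" and online:
            pending_offline_at = ts + grace
    if pending_offline_at is not None:
        broadcasts.append((pending_offline_at, "offline"))
    return broadcasts
```

A disconnect at t=100 followed by a reconnect at t=105 produces no broadcast at all, while a disconnect with no reconnect turns into a single offline event 30 seconds later.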
Key Interview Points
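- Redis TTL is the mechanism, not a job: The 90-second TTL on the presence key means no background job is needed to detect stale heartbeats; Redis expires the key automatically. When the key is gone, the user is offline. Expiry by itself does not notify subscribers, so pushing the offline transition still relies on the deferred disconnect check.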
- Batch presence reads: Fetching a contact list of 200 users means 200 presence lookups. Use a Redis pipeline to execute all gets in one round-trip, not 200 separate calls.
- Deferred disconnect: Network blips cause brief disconnections. A 30-second grace period prevents flickering the online indicator for a user who reconnects immediately.
- Presence fan-out is the hardest scale problem: Limit who sees presence to actively interested parties (open DMs, recent contacts). Full-follower fan-out at celebrity scale requires special handling or disabling presence for high-follower accounts.