Presence System Low-Level Design

A presence system tracks whether users are online, idle, or offline, and broadcasts status changes to other users. This design powers the green dot on Slack, WhatsApp, and LinkedIn. The core challenge is handling millions of connections and heartbeats efficiently.

Presence States

online  → User has an active connection and recent activity (heartbeats arriving, activity within 5 min)
idle    → User has a connection but no activity for 5+ minutes
offline → No active connection, or no heartbeat for 90+ seconds
busy    → User-set manual override (do not disturb)

State transitions:
  Connect → online
  5 min inactivity → idle
  90s without heartbeat → offline (connection assumed dead)
  User action → online (from idle)
  Disconnect → offline
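The transition rules above can be expressed as a pure function over the raw timestamps. A minimal sketch using this design's thresholds — `resolve_state` and its signature are illustrative, not part of the system's API:

```python
import time

IDLE_AFTER = 300     # seconds of inactivity before idle
OFFLINE_AFTER = 90   # seconds without a heartbeat before offline (key TTL)

def resolve_state(last_heartbeat, last_activity, manual_override=None, now=None):
    """Derive the displayed presence state from raw timestamps."""
    now = now if now is not None else time.time()
    if now - last_heartbeat > OFFLINE_AFTER:
        return 'offline'          # connection assumed dead
    if manual_override == 'busy':
        return 'busy'             # user-set do-not-disturb wins over idle/online
    if now - last_activity > IDLE_AFTER:
        return 'idle'
    return 'online'
```

In the Redis-backed design, "offline" is not computed this way — the key TTL enforces it — but the function makes the precedence explicit: connection liveness first, manual override second, activity recency last.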

Heartbeat Architecture

# Client sends a heartbeat every 30 seconds
# Server records it in Redis with a 90-second TTL
# If 2 heartbeats are missed (90s), the key expires and user appears offline

import json
import time
from datetime import datetime, timezone

import redis as redis_lib  # assumes redis-py

redis = redis_lib.Redis()

def now():
    return datetime.now(timezone.utc)

def heartbeat(user_id, connection_id):
    pipe = redis.pipeline()
    pipe.setex(f'presence:{user_id}', 90, 'online')
    pipe.setex(f'presence:conn:{connection_id}', 90, user_id)
    # Track last activity time for idle detection
    pipe.hset(f'presence:meta:{user_id}', 'last_heartbeat', now().timestamp())
    pipe.execute()

def get_presence(user_id):
    status = redis.get(f'presence:{user_id}')
    if not status:
        return 'offline'
    meta = redis.hgetall(f'presence:meta:{user_id}')
    last_active = float(meta.get(b'last_activity', meta.get(b'last_heartbeat', 0)))
    if time.time() - last_active > 300:  # 5 minutes
        return 'idle'
    return 'online'

def user_activity(user_id):
    """Called when user sends a message, clicks, etc."""
    was_idle = get_presence(user_id) == 'idle'
    pipe = redis.pipeline()
    pipe.setex(f'presence:{user_id}', 90, 'online')
    pipe.hset(f'presence:meta:{user_id}', 'last_activity', now().timestamp())
    pipe.execute()
    # Broadcast only on an actual transition (idle → online),
    # not on every click — otherwise every keystroke fans out
    if was_idle:
        broadcast_presence_change(user_id, 'online')

Presence for a Contact List

def get_presence_for_users(user_ids):
    """Batch presence lookup for a contact list."""
    pipe = redis.pipeline()
    for uid in user_ids:
        pipe.get(f'presence:{uid}')
    results = pipe.execute()

    presence = {}
    for uid, status in zip(user_ids, results):
        if status is None:
            presence[uid] = 'offline'
        else:
            # Raw heartbeat status only; deriving idle here would need the
            # meta hash too (or cache the idle state in Redis directly)
            presence[uid] = status.decode()
    return presence

# For a contact list of 200 users: one Redis pipeline = ~1ms
# Never query presence one user at a time (N+1 problem)
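For even larger lists, a single MGET does the same work in one command. A sketch with the key-building and result-mapping steps factored out as pure helpers — `presence_keys`, `map_presence`, and `get_presence_mget` are illustrative names, and `redis_client` is assumed to be a redis-py client:

```python
def presence_keys(user_ids):
    return [f'presence:{uid}' for uid in user_ids]

def map_presence(user_ids, raw_values):
    """Pair MGET results back to user IDs; a missing key means offline."""
    return {
        uid: (val.decode() if isinstance(val, bytes) else val) or 'offline'
        for uid, val in zip(user_ids, raw_values)
    }

def get_presence_mget(redis_client, user_ids):
    # One round-trip, one command, regardless of list size
    return map_presence(user_ids, redis_client.mget(presence_keys(user_ids)))
```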

Broadcasting Presence Changes

def broadcast_presence_change(user_id, new_status):
    """Notify all users who care about this user's presence."""
    # Who needs to know? Followers, contacts, active chat participants
    subscribers = get_presence_subscribers(user_id)

    if not subscribers:
        return

    message = json.dumps({
        'user_id': user_id,
        'status': new_status,
        'timestamp': now().isoformat(),
    })

    # Fan-out via Redis Pub/Sub
    for subscriber_id in subscribers:
        redis.publish(f'presence:updates:{subscriber_id}', message)

# Scale concern: a celebrity with 1M followers going online
# would fan-out to 1M subscribers simultaneously.
# Solution: limit presence visibility to mutual connections or
# only show presence to users who have recently been in contact.
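Where full fan-out is unavoidable, it can at least be staged. A hedged sketch of sharded fan-out — the function names are illustrative, and `publish` stands in for `redis.publish` on the per-subscriber channel (in production each shard would be enqueued as an async job rather than published inline):

```python
def shard_subscribers(subscribers, shard_size=1000):
    """Split a large subscriber list into fixed-size shards for staged fan-out."""
    return [subscribers[i:i + shard_size]
            for i in range(0, len(subscribers), shard_size)]

def fanout_in_shards(publish, subscribers, message, shard_size=1000):
    # Each shard can be dispatched as its own job, spread over seconds,
    # with shards of currently-active subscribers prioritized first
    for shard in shard_subscribers(subscribers, shard_size):
        for subscriber_id in shard:
            publish(subscriber_id, message)
```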

Presence Subscription Management

def subscribe_to_presence(subscriber_id, target_user_id):
    """
    subscriber_id wants to know when target_user_id changes status.
    Call this when opening a DM or viewing a contact's profile.
    """
    redis.sadd(f'presence:subscribers:{target_user_id}', subscriber_id)
    redis.expire(f'presence:subscribers:{target_user_id}', 3600)

def unsubscribe_from_presence(subscriber_id, target_user_id):
    redis.srem(f'presence:subscribers:{target_user_id}', subscriber_id)

def get_presence_subscribers(user_id):
    members = redis.smembers(f'presence:subscribers:{user_id}')
    return [int(m) for m in members]

Handling WebSocket Disconnection

def on_websocket_disconnect(user_id, connection_id):
    # Don't immediately mark offline — the user may reconnect within seconds
    # (network blip, tab reload). Use a grace period.

    # Shorten the presence TTL to the grace period. A reconnect heartbeat
    # resets it to 90s; otherwise it expires before the deferred check runs.
    # (Without this, the 90s TTL would still be live at check time and the
    # exists() test below would always report the user as online.)
    redis.expire(f'presence:{user_id}', 30)
    redis.setex(f'presence:disconnect:{connection_id}', 30, user_id)

    # Schedule the deferred offline check after the grace period
    enqueue_delayed_offline_check(user_id, connection_id, delay_seconds=30)

def deferred_offline_check(user_id, connection_id):
    # Check if user reconnected
    if redis.exists(f'presence:{user_id}'):
        return  # Reconnected — still online

    # Mark offline
    redis.delete(f'presence:meta:{user_id}')
    broadcast_presence_change(user_id, 'offline')
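`enqueue_delayed_offline_check` is left abstract above; in production it would be a delayed job on a queue (Celery ETA tasks, a Redis sorted set polled by a worker, etc.). A minimal in-process stand-in, assuming a single-server deployment — the `_local` name is illustrative:

```python
import threading

def enqueue_delayed_offline_check_local(check, user_id, connection_id,
                                        delay_seconds=30):
    """Run check(user_id, connection_id) after the grace period elapses."""
    timer = threading.Timer(delay_seconds, check, args=(user_id, connection_id))
    timer.daemon = True   # don't block process shutdown
    timer.start()
    return timer          # caller may cancel() if the user reconnects early
```

An in-process timer is lost if the server restarts, which is why a durable delayed queue is the right choice beyond a single node.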

Key Interview Points

  • Redis TTL is the mechanism, not a job: The 90-second TTL on the presence key means no background job is needed to detect stale heartbeats — Redis expires the key automatically. When the key is gone, the user is offline.
  • Batch presence reads: Fetching a contact list of 200 users means 200 presence lookups. Use a Redis pipeline to execute all gets in one round-trip, not 200 separate calls.
  • Deferred disconnect: Network blips cause brief disconnections. A 30-second grace period prevents flickering the online indicator for a user who reconnects immediately.
  • Presence fan-out is the hardest scale problem: Limit who sees presence to actively interested parties (open DMs, recent contacts). Full-follower fan-out at celebrity scale requires special handling or disabling presence for high-follower accounts.


