WebSocket Server Low-Level Design: Real-Time Connections, Routing, and Scaling

A WebSocket server maintains persistent bidirectional connections between clients and servers, enabling real-time push without polling. The challenge is not the protocol — it is the stateful nature of WebSocket connections. Each connection is tied to one server instance; when you scale to multiple servers, a message must be routed to the correct server that holds the target client’s connection. This routing problem, and the memory/resource management of thousands of concurrent connections, are the core interview topics.

Core Data Model

-- Connection registry: maps client ID to server instance
-- Stored in Redis for cross-server routing

-- Redis Hash: ws:connections:{user_id}
-- Field: connection_id
-- Value: server_id (which server instance holds this connection)
-- TTL: set on each heartbeat, expire when client disconnects or times out

-- In-process state (per server instance):
class ConnectionRegistry:
    # connection_id -> websocket object
    connections: dict[str, WebSocket]
    # user_id -> set of connection_ids (one user may have multiple tabs)
    user_connections: dict[int, set[str]]
    # connection_id -> metadata (user_id, subscriptions, last_ping)
    metadata: dict[str, dict]

Connection Lifecycle

import asyncio
import json
import uuid
from fastapi import WebSocket, WebSocketDisconnect

class WebSocketServer:
    def __init__(self, server_id: str, redis_client):
        self.server_id = server_id
        self.redis = redis_client
        self.connections: dict[str, WebSocket] = {}
        self.user_connections: dict[int, set[str]] = {}

    async def connect(self, websocket: WebSocket, user_id: int):
        await websocket.accept()
        conn_id = str(uuid.uuid4())

        # Register locally
        self.connections[conn_id] = websocket
        self.user_connections.setdefault(user_id, set()).add(conn_id)

        # Register in Redis for cross-server routing
        await self.redis.hset(f"ws:connections:{user_id}", conn_id, self.server_id)
        await self.redis.expire(f"ws:connections:{user_id}", 300)  # 5 min TTL

        return conn_id

    async def disconnect(self, conn_id: str, user_id: int):
        self.connections.pop(conn_id, None)
        user_conns = self.user_connections.get(user_id, set())
        user_conns.discard(conn_id)
        if not user_conns:
            del self.user_connections[user_id]

        await self.redis.hdel(f"ws:connections:{user_id}", conn_id)

    async def send_to_connection(self, conn_id: str, message: dict) -> bool:
        ws = self.connections.get(conn_id)
        if not ws:
            return False
        try:
            await ws.send_json(message)
            return True
        except Exception:
            return False

    async def heartbeat_loop(self, conn_id: str, user_id: int):
        """Send ping every 30s; disconnect if pong not received."""
        while conn_id in self.connections:
            await asyncio.sleep(30)
            ws = self.connections.get(conn_id)
            if not ws:
                break
            try:
                await ws.send_json({"type": "ping"})
                # Refresh Redis TTL
                await self.redis.expire(f"ws:connections:{user_id}", 300)
            except Exception:
                await self.disconnect(conn_id, user_id)
                break

Cross-Server Message Delivery via Pub/Sub

class MessageRouter:
    def __init__(self, ws_server: WebSocketServer, redis_client):
        self.ws_server = ws_server
        self.redis = redis_client

    async def send_to_user(self, user_id: int, message: dict):
        """Deliver a message to all connections for a user, across all servers."""
        # Find which server(s) hold this user's connections
        conn_map = await self.redis.hgetall(f"ws:connections:{user_id}")
        # conn_map = {conn_id: server_id, ...}

        for conn_id, server_id in conn_map.items():
            if server_id.decode() == self.ws_server.server_id:
                # Local connection — send directly
                await self.ws_server.send_to_connection(conn_id, message)
            else:
                # Remote connection — publish to that server's channel
                await self.redis.publish(
                    f"ws:server:{server_id.decode()}",
                    json.dumps({'conn_id': conn_id, 'message': message})
                )

    async def subscribe_to_incoming(self):
        """Listen for messages routed to this server from other servers."""
        channel = f"ws:server:{self.ws_server.server_id}"
        pubsub = self.redis.pubsub()
        await pubsub.subscribe(channel)
        async for msg in pubsub.listen():
            if msg['type'] == 'message':
                data = json.loads(msg['data'])
                await self.ws_server.send_to_connection(
                    data['conn_id'], data['message']
                )

    async def broadcast_to_room(self, room_id: str, message: dict):
        """Send to all users subscribed to a room (e.g., a chat channel)."""
        # Publish to a room channel; all servers subscribed to this room deliver locally
        await self.redis.publish(f"ws:room:{room_id}", json.dumps(message))

Connection Limits and Backpressure

# Each WebSocket connection consumes:
# - File descriptor (OS limit: ulimit -n, typically 65535 per process)
# - ~64KB kernel buffers per socket
# - Application state (~1-5KB per connection)
# A single server can handle 10,000-50,000 concurrent WebSocket connections

# Nginx WebSocket proxy configuration:
# upstream ws_backend {
#     server ws-1:8080;
#     server ws-2:8080;
#     server ws-3:8080;
#     # Use ip_hash or sticky sessions to route reconnects to same server
#     ip_hash;
# }
# location /ws {
#     proxy_pass http://ws_backend;
#     proxy_http_version 1.1;
#     proxy_set_header Upgrade $http_upgrade;
#     proxy_set_header Connection "upgrade";
#     proxy_read_timeout 86400;  # 24h -- keep WS connections alive
# }

Key Interview Points

  • WebSocket connections are stateful and server-sticky — reconnections should go to the same server (use consistent hashing or sticky sessions at the load balancer) to avoid unnecessary cross-server routing.
  • Redis Pub/Sub is the standard solution for cross-server message delivery — each server subscribes to its own channel; other servers publish to it. Message delivery is fire-and-forget (no persistence, no acknowledgment).
  • Heartbeats are mandatory — TCP connections silently die due to NAT timeouts, mobile network switches, and proxy timeouts. Send ping every 30 seconds and disconnect if no pong within 10 seconds.
  • Connection registry cleanup: if a server crashes, its connections’ Redis entries become stale. Set a TTL on the Redis hash (refreshed each heartbeat); stale entries expire automatically within 5 minutes.
  • Horizontal scaling: stateless HTTP scales easily; WebSockets scale differently because connections are pinned to servers. Plan for sticky sessions at the load balancer and pub/sub for message routing.
  • For very large fan-outs (broadcast to 1M users), use a tiered architecture: a message queue feeds multiple delivery servers, each responsible for a subset of connections — not a single pub/sub channel.

WebSocket server and real-time messaging system design is discussed in Meta system design interview questions.

WebSocket server and real-time communication design is covered in Snap system design interview preparation.

WebSocket server and real-time collaboration design is discussed in Atlassian system design interview guide.

Scroll to Top