WebSocket Server Low-Level Design: Real-Time Connections, Routing, and Scaling

A WebSocket server maintains persistent bidirectional connections between clients and servers, enabling real-time push without polling. The challenge is not the protocol — it is the stateful nature of WebSocket connections. Each connection is tied to one server instance; when you scale to multiple servers, a message must be routed to the correct server that holds the target client’s connection. This routing problem, and the memory/resource management of thousands of concurrent connections, are the core interview topics.

Core Data Model

-- Connection registry: maps client ID to server instance
-- Stored in Redis for cross-server routing

-- Redis Hash: ws:connections:{user_id}
-- Field: connection_id
-- Value: server_id (which server instance holds this connection)
-- TTL: set on each heartbeat, expire when client disconnects or times out

-- In-process state (per server instance):
class ConnectionRegistry:
    # connection_id -> websocket object
    connections: dict[str, WebSocket]
    # user_id -> set of connection_ids (one user may have multiple tabs)
    user_connections: dict[int, set[str]]
    # connection_id -> metadata (user_id, subscriptions, last_ping)
    metadata: dict[str, dict]
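
As a concrete illustration of the in-process maps above (a self-contained sketch; the `register` helper and the `FakeSocket` stand-in are invented for the example), one user with two open tabs yields two connection entries under a single user_id:

```python
import uuid

class FakeSocket:
    """Stand-in for a real WebSocket object; only identity matters here."""

connections: dict[str, FakeSocket] = {}
user_connections: dict[int, set[str]] = {}

def register(user_id: int) -> str:
    """Mirror of the ConnectionRegistry bookkeeping performed on connect."""
    conn_id = str(uuid.uuid4())
    connections[conn_id] = FakeSocket()
    user_connections.setdefault(user_id, set()).add(conn_id)
    return conn_id

# One user, two browser tabs -> two connection_ids under the same user_id
tab_a = register(42)
tab_b = register(42)
assert user_connections[42] == {tab_a, tab_b}
assert len(connections) == 2
```

This is why delivery to a user must iterate over a *set* of connection IDs rather than assume one socket per user.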

Connection Lifecycle

import asyncio
import json
import time
import uuid
from fastapi import WebSocket, WebSocketDisconnect

class WebSocketServer:
    def __init__(self, server_id: str, redis_client):
        self.server_id = server_id
        self.redis = redis_client
        self.connections: dict[str, WebSocket] = {}
        self.user_connections: dict[int, set[str]] = {}
        self.last_pong: dict[str, float] = {}  # conn_id -> monotonic timestamp

    async def connect(self, websocket: WebSocket, user_id: int) -> str:
        await websocket.accept()
        conn_id = str(uuid.uuid4())

        # Register locally
        self.connections[conn_id] = websocket
        self.user_connections.setdefault(user_id, set()).add(conn_id)
        self.last_pong[conn_id] = time.monotonic()

        # Register in Redis for cross-server routing
        await self.redis.hset(f"ws:connections:{user_id}", conn_id, self.server_id)
        await self.redis.expire(f"ws:connections:{user_id}", 300)  # 5 min TTL

        return conn_id

    async def disconnect(self, conn_id: str, user_id: int):
        self.connections.pop(conn_id, None)
        self.last_pong.pop(conn_id, None)
        user_conns = self.user_connections.get(user_id)
        if user_conns is not None:
            user_conns.discard(conn_id)
            if not user_conns:  # last tab closed -- drop the user entry entirely
                del self.user_connections[user_id]

        await self.redis.hdel(f"ws:connections:{user_id}", conn_id)

    def record_pong(self, conn_id: str):
        """Called by the receive loop when a {"type": "pong"} message arrives."""
        if conn_id in self.last_pong:
            self.last_pong[conn_id] = time.monotonic()

    async def send_to_connection(self, conn_id: str, message: dict) -> bool:
        ws = self.connections.get(conn_id)
        if ws is None:
            return False
        try:
            await ws.send_json(message)
            return True
        except Exception:
            return False

    async def heartbeat_loop(self, conn_id: str, user_id: int):
        """Send a ping every 30s; disconnect if no pong arrives within 10s."""
        while conn_id in self.connections:
            await asyncio.sleep(30)
            ws = self.connections.get(conn_id)
            if ws is None:
                break
            try:
                ping_sent = time.monotonic()
                await ws.send_json({"type": "ping"})
                # Refresh the Redis TTL so the routing entry outlives the next ping
                await self.redis.expire(f"ws:connections:{user_id}", 300)
                # Give the client 10s to answer; the receive loop calls record_pong()
                await asyncio.sleep(10)
                if self.last_pong.get(conn_id, 0.0) < ping_sent:
                    await self.disconnect(conn_id, user_id)
                    break
            except Exception:
                # Send failed: the socket is already dead -- clean up state
                await self.disconnect(conn_id, user_id)
                break

Cross-Server Message Delivery via Pub/Sub

class MessageRouter:
    def __init__(self, ws_server: WebSocketServer, redis_client):
        self.ws_server = ws_server
        self.redis = redis_client

    async def send_to_user(self, user_id: int, message: dict):
        """Deliver a message to all connections for a user, across all servers."""
        # Find which server(s) hold this user's connections
        conn_map = await self.redis.hgetall(f"ws:connections:{user_id}")
        # conn_map = {conn_id: server_id, ...} (bytes unless decode_responses=True)

        for raw_conn_id, raw_server_id in conn_map.items():
            conn_id = raw_conn_id.decode()
            server_id = raw_server_id.decode()
            if server_id == self.ws_server.server_id:
                # Local connection -- send directly
                await self.ws_server.send_to_connection(conn_id, message)
            else:
                # Remote connection -- publish to that server's channel
                await self.redis.publish(
                    f"ws:server:{server_id}",
                    json.dumps({'conn_id': conn_id, 'message': message})
                )

    async def subscribe_to_incoming(self):
        """Listen for messages routed to this server from other servers."""
        channel = f"ws:server:{self.ws_server.server_id}"
        pubsub = self.redis.pubsub()
        await pubsub.subscribe(channel)
        async for msg in pubsub.listen():
            if msg['type'] == 'message':
                data = json.loads(msg['data'])
                await self.ws_server.send_to_connection(
                    data['conn_id'], data['message']
                )

    async def broadcast_to_room(self, room_id: str, message: dict):
        """Send to all users subscribed to a room (e.g., a chat channel)."""
        # Publish to a room channel; all servers subscribed to this room deliver locally
        await self.redis.publish(f"ws:room:{room_id}", json.dumps(message))
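
`broadcast_to_room` only publishes; each server must also hold a local room-membership map (and subscribe to the room's channel) so it can fan the message out to its own sockets. A minimal sketch of that local step, where the `room_members` map and the `deliver` callback are illustrative and not part of the classes above:

```python
# room_id -> set of local connection_ids subscribed to that room
room_members: dict[str, set[str]] = {}

def join_room(room_id: str, conn_id: str):
    room_members.setdefault(room_id, set()).add(conn_id)

def leave_room(room_id: str, conn_id: str):
    members = room_members.get(room_id)
    if members:
        members.discard(conn_id)
        if not members:
            # Last local member left: this server can unsubscribe from the channel
            del room_members[room_id]

def local_fanout(room_id: str, message: dict, deliver) -> int:
    """Deliver a room message to every local member; returns the delivery count."""
    delivered = 0
    for conn_id in room_members.get(room_id, ()):
        if deliver(conn_id, message):  # e.g. send_to_connection
            delivered += 1
    return delivered
```

Unsubscribing when the last local member leaves keeps pub/sub traffic proportional to the rooms a server actually serves.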

Connection Limits and Backpressure

# Each WebSocket connection consumes:
# - File descriptor (OS limit: ulimit -n, typically 65535 per process)
# - ~64KB kernel buffers per socket
# - Application state (~1-5KB per connection)
# A single server can handle 10,000-50,000 concurrent WebSocket connections

# Nginx WebSocket proxy configuration:
# upstream ws_backend {
#     server ws-1:8080;
#     server ws-2:8080;
#     server ws-3:8080;
#     # Use ip_hash or sticky sessions to route reconnects to same server
#     ip_hash;
# }
# location /ws {
#     proxy_pass http://ws_backend;
#     proxy_http_version 1.1;
#     proxy_set_header Upgrade $http_upgrade;
#     proxy_set_header Connection "upgrade";
#     proxy_read_timeout 86400;  # 24h -- keep WS connections alive
# }
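
The heading mentions backpressure, but the configuration above does not show it. A common application-level approach is a bounded per-connection outbound queue that evicts old messages for slow consumers instead of buffering without limit; a sketch under that assumption (the class name and drop-oldest policy are illustrative):

```python
import asyncio

class BoundedSender:
    """Per-connection outbound queue; evicts the oldest message when full."""

    def __init__(self, maxsize: int = 100):
        self.queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)
        self.dropped = 0

    def enqueue(self, message: dict) -> bool:
        """Returns False if an old message had to be dropped to make room."""
        try:
            self.queue.put_nowait(message)
            return True
        except asyncio.QueueFull:
            self.queue.get_nowait()       # evict oldest message
            self.queue.put_nowait(message)
            self.dropped += 1
            return False

    async def drain(self, send):
        """Writer task: the only coroutine that awaits on the socket."""
        while True:
            msg = await self.queue.get()
            await send(msg)
```

A single writer task per connection also prevents interleaved concurrent writes on one socket; a high `dropped` count is a signal to disconnect the client entirely.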

Key Interview Points

  • WebSocket connections are stateful and server-sticky — reconnections should go to the same server (use consistent hashing or sticky sessions at the load balancer) to avoid unnecessary cross-server routing.
  • Redis Pub/Sub is the standard solution for cross-server message delivery — each server subscribes to its own channel; other servers publish to it. Message delivery is fire-and-forget (no persistence, no acknowledgment).
  • Heartbeats are mandatory — TCP connections silently die due to NAT timeouts, mobile network switches, and proxy timeouts. Send ping every 30 seconds and disconnect if no pong within 10 seconds.
  • Connection registry cleanup: if a server crashes, its connections’ Redis entries become stale. Set a TTL on the Redis hash (refreshed each heartbeat); stale entries expire automatically within 5 minutes.
  • Horizontal scaling: stateless HTTP scales easily; WebSockets scale differently because connections are pinned to servers. Plan for sticky sessions at the load balancer and pub/sub for message routing.
  • For very large fan-outs (broadcast to 1M users), use a tiered architecture: a message queue feeds multiple delivery servers, each responsible for a subset of connections — not a single pub/sub channel.
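
The tiered fan-out in the last bullet can be sketched as stable partitioning of a broadcast across delivery workers; the worker count and hashing scheme here are illustrative assumptions:

```python
import hashlib

NUM_DELIVERY_WORKERS = 16

def worker_for(user_id: int) -> int:
    """Stable shard assignment: each delivery worker owns a fixed slice of users."""
    digest = hashlib.sha1(str(user_id).encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_DELIVERY_WORKERS

def partition(user_ids: list[int]) -> dict[int, list[int]]:
    """Split one broadcast job into per-worker batches instead of one channel."""
    batches: dict[int, list[int]] = {}
    for uid in user_ids:
        batches.setdefault(worker_for(uid), []).append(uid)
    return batches
```

Each worker then resolves only its own users' connections, so no single process (or pub/sub channel) has to touch all one million recipients.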

