A WebSocket server maintains persistent bidirectional connections between clients and servers, enabling real-time push without polling. The challenge is not the protocol — it is the stateful nature of WebSocket connections. Each connection is tied to one server instance; when you scale to multiple servers, a message must be routed to the correct server that holds the target client’s connection. This routing problem, and the memory/resource management of thousands of concurrent connections, are the core interview topics.
Core Data Model
-- Connection registry: maps client ID to server instance
-- Stored in Redis for cross-server routing
-- Redis Hash: ws:connections:{user_id}
-- Field: connection_id
-- Value: server_id (which server instance holds this connection)
-- TTL: set on each heartbeat, expire when client disconnects or times out
-- In-process state (per server instance):
class ConnectionRegistry:
# connection_id -> websocket object
connections: dict[str, WebSocket]
# user_id -> set of connection_ids (one user may have multiple tabs)
user_connections: dict[int, set[str]]
# connection_id -> metadata (user_id, subscriptions, last_ping)
metadata: dict[str, dict]
Connection Lifecycle
import asyncio
import json
import uuid
from fastapi import WebSocket, WebSocketDisconnect
class WebSocketServer:
def __init__(self, server_id: str, redis_client):
self.server_id = server_id
self.redis = redis_client
self.connections: dict[str, WebSocket] = {}
self.user_connections: dict[int, set[str]] = {}
async def connect(self, websocket: WebSocket, user_id: int):
await websocket.accept()
conn_id = str(uuid.uuid4())
# Register locally
self.connections[conn_id] = websocket
self.user_connections.setdefault(user_id, set()).add(conn_id)
# Register in Redis for cross-server routing
await self.redis.hset(f"ws:connections:{user_id}", conn_id, self.server_id)
await self.redis.expire(f"ws:connections:{user_id}", 300) # 5 min TTL
return conn_id
async def disconnect(self, conn_id: str, user_id: int):
self.connections.pop(conn_id, None)
user_conns = self.user_connections.get(user_id, set())
user_conns.discard(conn_id)
if not user_conns:
del self.user_connections[user_id]
await self.redis.hdel(f"ws:connections:{user_id}", conn_id)
async def send_to_connection(self, conn_id: str, message: dict) -> bool:
ws = self.connections.get(conn_id)
if not ws:
return False
try:
await ws.send_json(message)
return True
except Exception:
return False
async def heartbeat_loop(self, conn_id: str, user_id: int):
"""Send ping every 30s; disconnect if pong not received."""
while conn_id in self.connections:
await asyncio.sleep(30)
ws = self.connections.get(conn_id)
if not ws:
break
try:
await ws.send_json({"type": "ping"})
# Refresh Redis TTL
await self.redis.expire(f"ws:connections:{user_id}", 300)
except Exception:
await self.disconnect(conn_id, user_id)
break
Cross-Server Message Delivery via Pub/Sub
class MessageRouter:
def __init__(self, ws_server: WebSocketServer, redis_client):
self.ws_server = ws_server
self.redis = redis_client
async def send_to_user(self, user_id: int, message: dict):
"""Deliver a message to all connections for a user, across all servers."""
# Find which server(s) hold this user's connections
conn_map = await self.redis.hgetall(f"ws:connections:{user_id}")
# conn_map = {conn_id: server_id, ...}
for conn_id, server_id in conn_map.items():
if server_id.decode() == self.ws_server.server_id:
# Local connection — send directly
await self.ws_server.send_to_connection(conn_id, message)
else:
# Remote connection — publish to that server's channel
await self.redis.publish(
f"ws:server:{server_id.decode()}",
json.dumps({'conn_id': conn_id, 'message': message})
)
async def subscribe_to_incoming(self):
"""Listen for messages routed to this server from other servers."""
channel = f"ws:server:{self.ws_server.server_id}"
pubsub = self.redis.pubsub()
await pubsub.subscribe(channel)
async for msg in pubsub.listen():
if msg['type'] == 'message':
data = json.loads(msg['data'])
await self.ws_server.send_to_connection(
data['conn_id'], data['message']
)
async def broadcast_to_room(self, room_id: str, message: dict):
"""Send to all users subscribed to a room (e.g., a chat channel)."""
# Publish to a room channel; all servers subscribed to this room deliver locally
await self.redis.publish(f"ws:room:{room_id}", json.dumps(message))
Connection Limits and Backpressure
# Each WebSocket connection consumes:
# - File descriptor (OS limit: ulimit -n, typically 65535 per process)
# - ~64KB kernel buffers per socket
# - Application state (~1-5KB per connection)
# A single server can handle 10,000-50,000 concurrent WebSocket connections
# Nginx WebSocket proxy configuration:
# upstream ws_backend {
# server ws-1:8080;
# server ws-2:8080;
# server ws-3:8080;
# # Use ip_hash or sticky sessions to route reconnects to same server
# ip_hash;
# }
# location /ws {
# proxy_pass http://ws_backend;
# proxy_http_version 1.1;
# proxy_set_header Upgrade $http_upgrade;
# proxy_set_header Connection "upgrade";
# proxy_read_timeout 86400; # 24h -- keep WS connections alive
# }
Key Interview Points
- WebSocket connections are stateful and server-sticky — reconnections should go to the same server (use consistent hashing or sticky sessions at the load balancer) to avoid unnecessary cross-server routing.
- Redis Pub/Sub is the standard solution for cross-server message delivery — each server subscribes to its own channel; other servers publish to it. Message delivery is fire-and-forget (no persistence, no acknowledgment).
- Heartbeats are mandatory — TCP connections silently die due to NAT timeouts, mobile network switches, and proxy timeouts. Send ping every 30 seconds and disconnect if no pong within 10 seconds.
- Connection registry cleanup: if a server crashes, its connections’ Redis entries become stale. Set a TTL on the Redis hash (refreshed each heartbeat); stale entries expire automatically within 5 minutes.
- Horizontal scaling: stateless HTTP scales easily; WebSockets scale differently because connections are pinned to servers. Plan for sticky sessions at the load balancer and pub/sub for message routing.
- For very large fan-outs (broadcast to 1M users), use a tiered architecture: a message queue feeds multiple delivery servers, each responsible for a subset of connections — not a single pub/sub channel.
WebSocket server and real-time messaging system design is discussed in Meta system design interview questions.
WebSocket server and real-time communication design is covered in Snap system design interview preparation.
WebSocket server and real-time collaboration design is discussed in Atlassian system design interview guide.