WebSockets provide full-duplex, persistent communication between a browser and server over a single TCP connection. Unlike HTTP request-response, either side can send data at any time without the overhead of a new HTTP request. WebSockets power real-time features: live chat, collaborative document editing (Google Docs), real-time dashboards, multiplayer games, stock tickers, and live sports scores. Slack, Discord, Figma, and trading platforms rely on WebSocket connections at massive scale. Understanding WebSocket design is essential for any system requiring low-latency bidirectional communication.
WebSocket Handshake and Protocol
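A WebSocket connection starts as an HTTP/1.1 GET request carrying Upgrade: websocket, Connection: Upgrade, a random Sec-WebSocket-Key, and Sec-WebSocket-Version: 13. The server responds with 101 Switching Protocols and a Sec-WebSocket-Accept header derived from that key. From then on, the same TCP connection carries WebSocket frames instead of HTTP. Each frame has a 2-14 byte header followed by the payload. The header holds the FIN bit and opcode, a 7-bit payload length with an optional 16- or 64-bit extended length, and a 4-byte masking key on client frames. Opcodes: 0x0 (continuation), 0x1 (text frame), 0x2 (binary frame), 0x8 (close), 0x9 (ping), 0xA (pong). Ping/pong frames are the heartbeat mechanism. The server typically sends a ping every 30-60 seconds and treats a connection that stops answering with pongs as dead. TCP keepalive is not a substitute. Its default idle time before the first probe is two hours on Linux, and when a proxy terminates TCP, keepalive probes only check the hop to the proxy. Clients must mask every frame they send; RFC 6455 requires this of all clients, not just browsers, to protect intermediaries from cache-poisoning attacks. Servers must not mask the frames they send.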
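The handshake and framing rules above can be sketched with Node's built-in crypto module and Buffer. This is a minimal illustration, not a production codec. acceptKey and encodeFrame are hypothetical helper names. The sketch only builds single, unfragmented frames: no continuation frames and no extension (RSV) bits.

```javascript
const crypto = require('crypto');

// Fixed GUID defined by RFC 6455 for the opening handshake
const WS_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Server side of the handshake: Sec-WebSocket-Accept is
// base64(SHA-1(Sec-WebSocket-Key + GUID)).
function acceptKey(secWebSocketKey) {
  return crypto.createHash('sha1').update(secWebSocketKey + WS_GUID).digest('base64');
}

// Encode one final (FIN=1) frame. Pass a 4-byte maskKey for
// client-to-server frames; omit it for server-to-client frames.
function encodeFrame(opcode, payload, maskKey) {
  const len = payload.length;
  let header;
  if (len < 126) {
    header = Buffer.alloc(2);
    header[1] = len;                            // 7-bit length fits directly
  } else if (len < 65536) {
    header = Buffer.alloc(4);
    header[1] = 126;                            // 16-bit extended length follows
    header.writeUInt16BE(len, 2);
  } else {
    header = Buffer.alloc(10);
    header[1] = 127;                            // 64-bit extended length follows
    header.writeBigUInt64BE(BigInt(len), 2);
  }
  header[0] = 0x80 | opcode;                    // FIN bit + opcode
  if (!maskKey) return Buffer.concat([header, payload]);

  header[1] |= 0x80;                            // set the MASK bit
  const masked = Buffer.alloc(len);
  for (let i = 0; i < len; i++) masked[i] = payload[i] ^ maskKey[i % 4];
  return Buffer.concat([header, maskKey, masked]);
}
```

RFC 6455 includes sample values that match this sketch. For its sample key, acceptKey('dGhlIHNhbXBsZSBub25jZQ==') returns 's3pPLMBiTxaQ9kYGzzhZRbK+xOo='. Encoding the text "Hello" with the mask 0x37 0xfa 0x21 0x3d yields the RFC's masked example frame, 0x81 0x85 0x37 0xfa 0x21 0x3d 0x7f 0x9f 0x4d 0x51 0x58. A masked frame with a payload of 65,536 bytes or more uses the maximum 14-byte header (2 + 8 + 4).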
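```javascript
// Node.js WebSocket server (ws library): chat room example
const WebSocket = require('ws');

const wss = new WebSocket.Server({ port: 8080 });
const rooms = new Map(); // roomId -> Set of WebSocket connections in that room

wss.on('connection', (ws, req) => {
  const roomId = new URL(req.url, 'http://localhost').searchParams.get('room');
  if (!roomId) {
    ws.close(1008, 'missing room parameter'); // 1008 = policy violation
    return;
  }
  if (!rooms.has(roomId)) rooms.set(roomId, new Set());
  rooms.get(roomId).add(ws);

  // Heartbeat state: reset to true whenever a pong arrives
  ws.isAlive = true;
  ws.on('pong', () => { ws.isAlive = true; });

  ws.on('message', (data) => {
    let msg;
    try {
      msg = JSON.parse(data.toString());
    } catch {
      return; // ignore malformed JSON instead of crashing the process
    }
    // Broadcast to every other open client in the same room
    const payload = JSON.stringify(msg);
    rooms.get(roomId).forEach((client) => {
      if (client !== ws && client.readyState === WebSocket.OPEN) {
        client.send(payload);
      }
    });
  });

  ws.on('close', () => {
    const room = rooms.get(roomId);
    room.delete(ws);
    if (room.size === 0) rooms.delete(roomId); // do not leak empty rooms
  });

  ws.on('error', console.error); // an unhandled 'error' event would crash the process
});

// Heartbeat: terminate connections that did not answer the previous ping
const heartbeat = setInterval(() => {
  wss.clients.forEach((ws) => {
    if (!ws.isAlive) return ws.terminate();
    ws.isAlive = false;
    ws.ping();
  });
}, 30000); // ping every 30 seconds

wss.on('close', () => clearInterval(heartbeat));
```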
Scaling WebSockets: Pub/Sub and Sticky Sessions
A single WebSocket server can typically hold 10K-100K concurrent connections, depending on per-connection memory and message rate. To scale beyond one server:
- Sticky sessions: the load balancer always routes a given user to the same server (consistent hashing on user_id, or a session cookie). The user's WebSocket connection lives on that server, so messages from that user are handled locally, and a reconnect lands on the server that already holds the user's session state.
- Pub/Sub fan-out: some messages must reach users on other servers, such as a chat message to a room whose members are spread across servers. Publish these through Redis Pub/Sub or Kafka. Each server subscribes to the topics of the rooms its local users belong to. A message to room-42 is published to the room-42 topic; every server with room-42 users receives it and forwards it to its local connections.
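The pub/sub fan-out can be sketched in a single process. Broker, ChatNode, and the room:<id> topic naming are hypothetical. Node's EventEmitter stands in for Redis Pub/Sub so the sketch runs without a Redis server. Each node subscribes to a room's topic only while it has at least one local member, and it delivers published messages to its own connections.

```javascript
const { EventEmitter } = require('events');

// In-process stand-in for Redis Pub/Sub (assumption for this sketch):
// publish(topic, msg) reaches every current subscriber of that topic.
class Broker extends EventEmitter {
  publish(topic, msg) { this.emit(topic, msg); }
}

// One WebSocket server node. Connections are any objects with send(),
// standing in for ws sockets.
class ChatNode {
  constructor(broker) {
    this.broker = broker;
    this.rooms = new Map();     // roomId -> Set of local connections
    this.handlers = new Map();  // roomId -> this node's broker listener
  }

  join(roomId, conn) {
    if (!this.rooms.has(roomId)) {
      this.rooms.set(roomId, new Set());
      // Subscribe only when the first local user joins the room
      const handler = (msg) => this.deliverLocal(roomId, msg);
      this.handlers.set(roomId, handler);
      this.broker.on(`room:${roomId}`, handler);
    }
    this.rooms.get(roomId).add(conn);
  }

  leave(roomId, conn) {
    const local = this.rooms.get(roomId);
    if (!local) return;
    local.delete(conn);
    if (local.size === 0) {
      // Unsubscribe when the last local user leaves
      this.broker.off(`room:${roomId}`, this.handlers.get(roomId));
      this.handlers.delete(roomId);
      this.rooms.delete(roomId);
    }
  }

  // A local user sends to a room: publish instead of delivering directly,
  // so members on other nodes receive it too.
  send(roomId, msg) { this.broker.publish(`room:${roomId}`, msg); }

  deliverLocal(roomId, msg) {
    for (const conn of this.rooms.get(roomId) ?? []) conn.send(msg);
  }
}
```

Unlike the single-server example, this sketch also echoes a message back to its sender. A real node would skip the originating connection, for example by tagging each message with a sender id. With real Redis, subscriptions need a dedicated subscriber connection, and Redis Pub/Sub delivers at most once: a node that is disconnected when a message is published never sees it.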
WebSocket vs. SSE vs. Long Polling
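- Long polling: the client sends an HTTP request, and the server holds it until data is available (or a timeout expires). The server then responds, and the client immediately re-requests. It works everywhere, but every message costs a full HTTP request/response cycle, which adds latency and header overhead.
- Server-Sent Events (SSE): the server streams events over a long-lived HTTP response, and the client only receives. Browsers consume the stream with the EventSource API, which reconnects automatically and sends Last-Event-ID on reconnect. SSE is simpler than WebSockets for one-directional server-to-client streaming (notifications, dashboards), but it carries only UTF-8 text. Because SSE is ordinary HTTP, proxies need no Upgrade support, though buffering proxies must be configured not to buffer the stream. Over HTTP/2, many SSE streams share one connection instead of counting against the browser's limit of about six HTTP/1.1 connections per origin.
- WebSockets: full-duplex, binary-capable, with lower overhead per message. Required when the client also sends frequent messages to the server (chat, games).
Choose SSE for notification-style one-way streaming and WebSockets for interactive, bidirectional real-time communication.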
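For comparison with the WebSocket server, an SSE endpoint needs only Node's built-in http module. formatSseEvent and createSseServer are hypothetical names, and the one-second "tick" event is purely illustrative. The wire format is plain text: optional id: and event: lines, one data: line per line of payload, and a blank line that ends the event.

```javascript
const http = require('http');

// Serialize one event in the text/event-stream format.
function formatSseEvent({ id, event, data }) {
  let out = '';
  if (id !== undefined) out += `id: ${id}\n`;
  if (event !== undefined) out += `event: ${event}\n`;
  for (const line of String(data).split('\n')) out += `data: ${line}\n`;
  return out + '\n'; // blank line terminates the event
}

// Minimal SSE endpoint that pushes a numbered "tick" event every second.
function createSseServer() {
  return http.createServer((req, res) => {
    res.writeHead(200, {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      Connection: 'keep-alive',
    });
    // EventSource sends Last-Event-ID when it reconnects; continue after it
    let seq = Number(req.headers['last-event-id']) || 0;
    const timer = setInterval(() => {
      seq += 1;
      res.write(formatSseEvent({ id: seq, event: 'tick', data: JSON.stringify({ seq }) }));
    }, 1000);
    res.on('close', () => clearInterval(timer)); // client went away
  });
}
```

Calling createSseServer().listen(3000) starts the stream. A browser would consume it with new EventSource(url) and addEventListener('tick', ...). Events that carry a custom event: name are dispatched to listeners for that name, not to onmessage.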
Key Interview Discussion Points
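- Connection limits: each WebSocket holds one TCP connection and therefore one file descriptor, and the default soft limit on Linux is commonly 1024 descriptors per process. For high-connection servers, raise it with ulimit -n 1000000 (or LimitNOFILE in a systemd unit), and raise the system-wide fs.file-max sysctl to match.
- Presence and online detection: track connected users in Redis (SADD online_users user_id on connect, SREM on disconnect). One user may hold several connections (phone, laptop, multiple tabs), so remove the user only when their last connection closes. A crashed server never runs its SREM calls, so back the set with entries that expire unless heartbeats refresh them. A heartbeat failure should update presence only after a grace period, so brief network blips do not make the status flap.
- Message ordering: WebSocket delivers messages in order within a single connection (it runs over TCP), but there is no ordering across connections. Messages from different senders may therefore arrive in different orders at different clients, so include sequence numbers or logical timestamps to order them at the application layer.
- Reconnection with exponential backoff: on disconnect, clients should retry with growing delays (1s, 2s, 4s, 8s…) up to a cap (60s). Adding random jitter keeps clients that dropped at the same moment from reconnecting in lockstep. Include a session ID so the server can resume state from the last message sequence number the client received.
- WebSocket over HTTP/2: RFC 8441 bootstraps WebSockets over HTTP/2 streams using an extended CONNECT method, because HTTP/2 has no HTTP/1.1-style Upgrade. This enables WebSocket multiplexing (several WebSocket connections, alongside ordinary HTTP requests, over one TCP connection) and better compatibility with infrastructure that already speaks HTTP/2.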
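The reconnection bullet can be sketched for a browser-style client using the standard WebSocket constructor, which is a global in browsers. reconnectDelay and connectWithResume are hypothetical names. The session and after query parameters and the numeric seq field on each message are assumed conventions that the server would have to implement; they are not part of the WebSocket protocol. The delay uses "full jitter": the ceiling grows 1s, 2s, 4s, … up to 60s, and the actual wait is drawn uniformly below it.

```javascript
// Capped exponential backoff with full jitter: wait a random time in
// [0, min(capMs, baseMs * 2^attempt)). random is injectable for testing.
function reconnectDelay(attempt, { baseMs = 1000, capMs = 60000, random = Math.random } = {}) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Reconnects after every close and asks the server to resume after the last
// sequence number processed. The ?session=&after= parameters and msg.seq are
// assumed conventions for this sketch.
function connectWithResume(baseUrl, sessionId, onMessage) {
  let attempt = 0;
  let lastSeq = 0;

  function open() {
    const url = `${baseUrl}?session=${encodeURIComponent(sessionId)}&after=${lastSeq}`;
    const ws = new WebSocket(url);
    ws.onopen = () => { attempt = 0; };              // reset backoff once connected
    ws.onmessage = (event) => {
      const msg = JSON.parse(event.data);
      if (msg.seq <= lastSeq) return;                // duplicate replayed on resume
      lastSeq = msg.seq;
      onMessage(msg);
    };
    ws.onclose = () => {                             // also fires after a failed connect
      setTimeout(open, reconnectDelay(attempt));
      attempt += 1;
    };
  }
  open();
}
```

This sketch reconnects after every close, including intentional ones. A real client would stop retrying when the user logs out or the server closes with a "do not retry" code.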
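For the ordering bullet, a client-side reorder buffer is one way to apply sequence numbers. It holds messages that arrive early and releases them once the gap before them is filled. ReorderBuffer is a hypothetical name. The sketch assumes a single authority assigns consecutive integers starting at 1, for example the server that owns a room or a per-room Redis INCR counter.

```javascript
// Deliver messages in sequence-number order even when they arrive out of order.
class ReorderBuffer {
  constructor(deliver) {
    this.deliver = deliver;
    this.nextSeq = 1;          // next sequence number allowed through
    this.pending = new Map();  // seq -> message waiting for its predecessors
  }

  receive(msg) {
    // Drop duplicates: already delivered, or already buffered
    if (msg.seq < this.nextSeq || this.pending.has(msg.seq)) return;
    this.pending.set(msg.seq, msg);
    // Flush every consecutive message starting at nextSeq
    while (this.pending.has(this.nextSeq)) {
      const next = this.pending.get(this.nextSeq);
      this.pending.delete(this.nextSeq);
      this.nextSeq += 1;
      this.deliver(next);
    }
  }
}
```

A production client would also time out on a gap that never fills and request the missing range, for example through the resume-from-sequence-number mechanism in the reconnection bullet.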
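The presence bullet can be sketched with per-user connection sets and a grace period. The in-memory Presence class is a hypothetical stand-in for Redis state, for instance one set per user maintained with SADD/SREM and checked with SCARD. Time is passed in explicitly, which keeps the behavior deterministic and easy to test.

```javascript
// A user is online while at least one connection is open, and stays online
// for graceMs after the last one closes, so brief network blips do not flap.
class Presence {
  constructor(graceMs = 10000) {
    this.graceMs = graceMs;
    this.conns = new Map();       // userId -> Set of open connection ids
    this.lastSeenAt = new Map();  // userId -> time the last connection closed
  }

  connect(userId, connId) {
    if (!this.conns.has(userId)) this.conns.set(userId, new Set());
    this.conns.get(userId).add(connId);
    this.lastSeenAt.delete(userId);
  }

  disconnect(userId, connId, now = Date.now()) {
    const set = this.conns.get(userId);
    if (!set) return;
    set.delete(connId);
    if (set.size === 0) {          // last device or tab went away
      this.conns.delete(userId);
      this.lastSeenAt.set(userId, now);
    }
  }

  isOnline(userId, now = Date.now()) {
    if (this.conns.has(userId)) return true;
    const last = this.lastSeenAt.get(userId);
    return last !== undefined && now - last < this.graceMs;
  }
}
```

For example, if a user's laptop disconnects while their phone stays connected, the user remains online. After the last connection closes, the user still counts as online until graceMs has passed.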