Question 1

How does WebSocket routing work when connections are spread across multiple servers?

Accepted Answer

Each WebSocket connection is a stateful TCP connection held by one specific server instance. When server A needs to send a message to a user whose connection lives on server B, it cannot write to that socket directly. A common solution uses Redis Pub/Sub as a message bus: each server subscribes to its own channel (ws:server:{server_id}), and a per-user registry key in Redis (ws:connections:{user_id}) maps each connection_id to the server_id that holds it. To send to a user, look up their connection(s) in that registry. For local connections, deliver directly. For remote connections, publish the message to the owning server's Redis channel; that server receives the pub/sub message and delivers it to its local WebSocket connection. Cross-server delivery adds one Redis round-trip (typically around 1ms on a local network), but the sender needs to know nothing about other servers beyond the registry lookup. Redis Pub/Sub is fire-and-forget: a message published while the target server is disconnected from Redis is lost, so delivery is at-most-once.
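
The routing decision described above can be sketched in a few lines. This is a minimal illustration, not a production design: InMemoryBus stands in for Redis Pub/Sub, a plain dict stands in for the ws:connections:{user_id} registry, and each connection's "socket" is just a list. The class and method names are hypothetical.

```python
from collections import defaultdict


class InMemoryBus:
    """Stand-in for Redis Pub/Sub (assumption): channel -> subscriber callbacks."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, message):
        for callback in self.subscribers[channel]:
            callback(message)


class WsServer:
    def __init__(self, server_id, bus, registry):
        self.server_id = server_id
        self.bus = bus
        self.registry = registry  # user_id -> {connection_id: server_id}
        self.local = {}           # connection_id -> list standing in for the socket
        # Each server listens on its own channel, as in the answer above.
        bus.subscribe(f"ws:server:{server_id}", self._on_bus_message)

    def connect(self, user_id, connection_id):
        self.local[connection_id] = []
        self.registry.setdefault(user_id, {})[connection_id] = self.server_id

    def _on_bus_message(self, message):
        connection_id, payload = message
        outbox = self.local.get(connection_id)
        if outbox is not None:  # the connection may have closed in the meantime
            outbox.append(payload)

    def send_to_user(self, user_id, payload):
        for connection_id, server_id in self.registry.get(user_id, {}).items():
            if server_id == self.server_id:
                self.local[connection_id].append(payload)  # local: deliver directly
            else:
                # Remote: hand the message to the owning server's channel.
                self.bus.publish(f"ws:server:{server_id}", (connection_id, payload))


registry = {}
bus = InMemoryBus()
server_a = WsServer("a", bus, registry)
server_b = WsServer("b", bus, registry)
server_a.connect("user-42", "conn-2")  # one connection on each server
server_b.connect("user-42", "conn-1")
server_a.send_to_user("user-42", "hello")
```

After the last call, both connections have received the message: conn-2 by direct local delivery and conn-1 through server B's channel.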

Question 2

Why are heartbeats essential in WebSocket connections?

Accepted Answer

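WebSocket connections run over TCP, which does not promptly detect that an idle connection has silently died: with no traffic in flight, neither side learns that the peer is gone. NAT devices and stateful firewalls drop idle mappings, often after as little as 30-90 seconds. Mobile devices switch between WiFi and LTE, which changes the client's IP address and orphans the old TCP connection. Corporate proxies close idle connections. Without heartbeats, the server can hold a WebSocket object for a connection that has been dead for minutes, wasting memory and file descriptors. Solution: send a ping every 25-30 seconds, and if no pong is received within 10 seconds, close the connection and clean up server state. The WebSocket protocol has built-in ping/pong control frames (opcodes 0x9 and 0xA), and a compliant peer, including every browser, answers a ping with a pong automatically. The browser JavaScript API cannot send ping frames or observe pongs, however, so when the client also needs to detect a dead connection, use application-level messages instead: one side sends {"type": "ping"} and the other echoes back {"type": "pong"}. A missing pong triggers a disconnect on the server and a reconnect on the client.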
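
The timing rule above (ping every 25-30 seconds, close after 10 seconds without a pong) can be expressed as a small state machine. This is a sketch under assumptions: the Heartbeat class and its tick method are hypothetical names, and the caller is assumed to invoke tick periodically with the current time and to act on the returned action.

```python
from dataclasses import dataclass

PING_INTERVAL = 25.0  # seconds between pings (the answer suggests 25-30)
PONG_TIMEOUT = 10.0   # seconds to wait for a pong before giving up


@dataclass
class Heartbeat:
    last_ping_sent: float = 0.0  # set to the connect time when the socket opens
    awaiting_pong: bool = False

    def on_pong(self):
        """Call when a pong (protocol-level or application-level) arrives."""
        self.awaiting_pong = False

    def tick(self, now):
        """Return the action for the current time: 'ping', 'close', or 'idle'."""
        if self.awaiting_pong:
            if now - self.last_ping_sent >= PONG_TIMEOUT:
                return "close"  # no pong in time: treat the connection as dead
            return "idle"
        if now - self.last_ping_sent >= PING_INTERVAL:
            self.last_ping_sent = now
            self.awaiting_pong = True
            return "ping"
        return "idle"
```

Keeping the clock as a parameter rather than reading it inside tick makes the logic easy to unit-test without sleeping.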

Question 3

How many concurrent WebSocket connections can one server handle?

Accepted Answer

Each WebSocket connection consumes one file descriptor, kernel TCP socket buffers, and application-level state (1-10KB per connection). The file descriptor limit (ulimit -n) often defaults to 1024 per process on Linux and is commonly raised to 65535 or higher for connection-heavy servers. Kernel TCP buffers can reach roughly 64-128KB per connection under load, though idle sockets typically use far less. The binding constraint is usually memory or file descriptors. On a server with 8GB RAM dedicated to WebSocket state, 5KB/connection gives a theoretical maximum of 1.6M connections, before counting kernel buffers. The practical limit with a Node.js or asyncio server is 10,000-100,000 concurrent connections per instance, depending on message frequency; CPU becomes the bottleneck for high-message-rate connections. For 1M concurrent users, run 20-100 WebSocket server instances (10,000-50,000 connections each, which leaves headroom below the upper bound) behind a load balancer with sticky sessions. Profile with realistic load before estimating, because idle and active connections have very different resource profiles.
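
The arithmetic above is easy to reproduce. The helper names below are hypothetical, and decimal units (1KB = 1,000 bytes, 1GB = 10^9 bytes) are assumed to match the 1.6M figure in the answer. The second estimate adds 64KB of kernel buffer per connection as a pessimistic assumption, to show how much the per-connection figure moves the result.

```python
KB = 10**3
GB = 10**9


def max_connections_by_memory(ram_bytes, bytes_per_connection):
    """Upper bound on connections if memory is the only constraint."""
    return ram_bytes // bytes_per_connection


def instances_needed(total_connections, per_instance_capacity):
    """Ceiling division: instances required to hold all connections."""
    return -(-total_connections // per_instance_capacity)


# Application state only, as in the answer: 8GB at 5KB per connection.
app_state_only = max_connections_by_memory(8 * GB, 5 * KB)  # 1,600,000

# Pessimistic variant (assumption): 5KB of state plus 64KB of kernel buffers.
with_kernel_buffers = max_connections_by_memory(8 * GB, (5 + 64) * KB)  # 115,942

# Fleet size for 1M users at 50,000 and 10,000 connections per instance.
fleet_low = instances_needed(1_000_000, 50_000)   # 20
fleet_high = instances_needed(1_000_000, 10_000)  # 100
```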

Question 4

What is sticky session routing and why is it needed for WebSocket load balancing?

Accepted Answer

A load balancer normally distributes each request to any healthy backend. For stateless HTTP this is fine because each request carries all the state it needs (JWT, cookies). A WebSocket is different: the initial HTTP upgrade handshake establishes the connection with one specific server, and every subsequent frame travels over that same TCP connection, so the load balancer must keep proxying it to the same backend for the connection's lifetime. Sticky sessions (also called session affinity) extend this guarantee across connections: the load balancer routes all connections from the same client, identified by source IP or a cookie, to the same backend server. This matters when establishing a session takes several HTTP requests (for example, a long-polling fallback that later upgrades to WebSocket) or when a server caches per-client state. In Nginx, use the ip_hash directive; the sticky cookie directive is available only in NGINX Plus. In AWS ALB, enable stickiness on the target group. Reconnections after a server failure will naturally go to a new server, so handle this gracefully in the client with exponential backoff reconnect logic.
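
The exponential backoff mentioned at the end of the answer might look like the sketch below. It uses the "full jitter" variant (a random delay between zero and an exponentially growing window), which is one common choice rather than anything the answer prescribes; the function name and the default values for base, cap, and attempts are assumptions.

```python
import random


def backoff_delays(base=1.0, cap=30.0, attempts=6, rng=random.random):
    """Yield reconnect delays in seconds, one per attempt.

    The window doubles each attempt (base, 2*base, 4*base, ...) up to cap,
    and the actual delay is a random fraction of that window so that many
    clients dropped by the same server failure do not reconnect in lockstep.
    """
    for attempt in range(attempts):
        window = min(cap, base * 2 ** attempt)
        yield rng() * window


# Upper bounds of the delay windows, obtained by pinning the jitter to 1.0.
windows = list(backoff_delays(rng=lambda: 1.0))
```

A client would sleep for each yielded delay before the next connection attempt and restart the sequence after a successful connect.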

Question 5

How do you implement room-based broadcasting for a chat application?

Accepted Answer

A "room" is a logical group of connections that receive the same messages (a chat channel, a game lobby). Each connection joins its rooms at connect time. To broadcast to a room, publish the message to a Redis channel named ws:room:{room_id}. Every WebSocket server that has at least one local connection in that room is subscribed to the channel, receives the pub/sub message, and fans it out to its local connections in that room. No server needs to know which other servers hold members, so servers can be added without reconfiguration; each one subscribes only to the rooms its own connections join. Implementation: when the first local connection joins a room, call redis.subscribe("ws:room:{room_id}") on that server's pub/sub listener; later joins to the same room only update the local membership set. On leave or disconnect, if no local connections remain in that room, call redis.unsubscribe to reduce pub/sub overhead. Redis delivers each published message to every subscribed server, so the cost of a broadcast grows with the number of servers that hold members of the room.
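
The subscribe-on-first-join and unsubscribe-on-last-leave bookkeeping can be sketched as follows. RecordingPubSub is a stand-in for a real Redis pub/sub listener (it only records calls), and RoomManager, its methods, and the per-connection outbox lists are hypothetical names used for illustration.

```python
from collections import defaultdict


class RecordingPubSub:
    """Stand-in for a Redis pub/sub listener (assumption): records calls only."""

    def __init__(self):
        self.channels = set()
        self.calls = []

    def subscribe(self, channel):
        self.channels.add(channel)
        self.calls.append(("subscribe", channel))

    def unsubscribe(self, channel):
        self.channels.discard(channel)
        self.calls.append(("unsubscribe", channel))


class RoomManager:
    def __init__(self, pubsub):
        self.pubsub = pubsub
        self.members = defaultdict(set)    # room_id -> local connection ids
        self.outboxes = defaultdict(list)  # connection_id -> delivered payloads

    def join(self, room_id, connection_id):
        if not self.members[room_id]:
            # First local member: this server starts listening to the room.
            self.pubsub.subscribe(f"ws:room:{room_id}")
        self.members[room_id].add(connection_id)

    def leave(self, room_id, connection_id):
        if room_id not in self.members:
            return
        self.members[room_id].discard(connection_id)
        if not self.members[room_id]:
            # Last local member left: stop receiving this room's traffic.
            del self.members[room_id]
            self.pubsub.unsubscribe(f"ws:room:{room_id}")

    def on_room_message(self, room_id, payload):
        """Called when the pub/sub listener receives a message for a room."""
        for connection_id in self.members.get(room_id, ()):
            self.outboxes[connection_id].append(payload)


pubsub = RecordingPubSub()
rooms = RoomManager(pubsub)
rooms.join("lobby", "conn-1")
rooms.join("lobby", "conn-2")          # no second subscribe call
rooms.on_room_message("lobby", "hi")   # fans out to both local connections
```

On disconnect, a server would call leave for every room the connection had joined, which triggers the unsubscribe once the room has no local members left.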

WebSocket Server Low-Level Design: Real-Time Connections, Routing, and Scaling

Core Data Model

Connection Lifecycle

Cross-Server Message Delivery via Pub/Sub

Connection Limits and Backpressure

Key Interview Points