WebSockets in Production: Scaling, Auth, and Reconnect

WebSockets are easy to demo and hard to run at scale. Senior frontend interviews probe whether you understand the production realities — connection management, server-side scaling, auth, and the dozen ways a real-time system can fail.

The basic flow

  1. Client opens a WebSocket: new WebSocket('wss://...')
  2. The browser sends an HTTP Upgrade request; the server answers 101 Switching Protocols
  3. The underlying TCP connection stays open
  4. Either side can send messages at any time

Authentication

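Cookies are sent with the upgrade request, so cookie-based sessions work for same-origin connections. The server should still check the Origin header, because the same-origin policy does not apply to WebSockets. The browser WebSocket API cannot set custom request headers such as Authorization, so cross-origin and token-based setups need one of: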

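  • Token in the URL query string (ends up in server and proxy logs; avoid)
  • First-message auth: the client sends an auth message right after connecting; the server validates it before processing anything else
  • Subprotocol header: the client passes the token in Sec-WebSocket-Protocol via the second argument to new WebSocket; the server reads it and echoes an agreed subprotocol back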
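
The first-message approach can be sketched as a per-connection gate on the server. This is a minimal sketch, not a library API: the message shape {"type":"auth","token":"..."}, the AuthGate class, the injected verifyToken function, and close code 4401 are all assumptions made for illustration.

```typescript
// Outcome of feeding one incoming text message through the gate.
type AuthGateResult =
  | { kind: "authenticated"; userId: string }
  | { kind: "forward"; userId: string; payload: string }
  | { kind: "reject"; closeCode: number; reason: string };

// Hypothetical app-defined close code (4000-4999 is the private-use range).
const AUTH_FAILED_CODE = 4401;

// Extracts the token from {"type":"auth","token":"..."}; null for anything else.
function parseAuthToken(raw: string): string | null {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null) return null;
    const msg = parsed as { type?: unknown; token?: unknown };
    return msg.type === "auth" && typeof msg.token === "string" ? msg.token : null;
  } catch {
    return null; // not JSON at all
  }
}

class AuthGate {
  private userId: string | null = null;
  private readonly verifyToken: (token: string) => string | null;

  // verifyToken is injected: it returns a user id, or null for a bad token.
  constructor(verifyToken: (token: string) => string | null) {
    this.verifyToken = verifyToken;
  }

  handle(raw: string): AuthGateResult {
    // Once authenticated, every message is passed through with the user id attached.
    if (this.userId !== null) {
      return { kind: "forward", userId: this.userId, payload: raw };
    }
    const token = parseAuthToken(raw);
    if (token === null) {
      return { kind: "reject", closeCode: AUTH_FAILED_CODE, reason: "auth message required first" };
    }
    const userId = this.verifyToken(token);
    if (userId === null) {
      return { kind: "reject", closeCode: AUTH_FAILED_CODE, reason: "invalid token" };
    }
    this.userId = userId;
    return { kind: "authenticated", userId };
  }
}
```

A server would create one gate per connection, call handle for every incoming text message, and close the socket with the returned code on a reject. A real deployment would likely also drop connections that never send the auth message within a few seconds.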

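Token expiry handling: when the token expires, the server closes the connection with a dedicated close code (the 4000–4999 range is reserved for application use); the client recognizes that code, refreshes the token, and reconnects.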
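
On the client, the decision after a close can be a small pure function. This is a sketch under assumptions: the value 4001 for "token expired" is a hypothetical application-defined code, and the action names are invented for illustration.

```typescript
// Hypothetical application-defined close code meaning "token expired".
const TOKEN_EXPIRED_CODE = 4001;
// Standard code for a normal, intentional closure (RFC 6455).
const NORMAL_CLOSURE_CODE = 1000;

type CloseAction = "refresh-token-then-reconnect" | "reconnect" | "stop";

// Maps the close code the client observed to what it should do next.
function actionForClose(code: number): CloseAction {
  if (code === TOKEN_EXPIRED_CODE) return "refresh-token-then-reconnect";
  if (code === NORMAL_CLOSURE_CODE) return "stop";
  return "reconnect"; // e.g. 1006 (abnormal closure) after a network drop
}
```

In a browser this would be called from the socket's close event handler with event.code.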

Reconnection

Networks drop. Standard reconnect:

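  • Exponential backoff (1s, 2s, 4s, 8s), ideally with random jitter so clients do not all retry at the same instant after an outage
  • Cap the delay at 30s
  • Tell the user: “Reconnecting…”
  • Reset the backoff after a successful reconnect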
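
The schedule above can be sketched in a few lines. The function and class names are hypothetical, and the jitter (waiting between 50% and 100% of the scheduled delay) is an added assumption; it is one common way to keep many clients from retrying in lockstep.

```typescript
const BASE_DELAY_MS = 1000;
const MAX_DELAY_MS = 30000;

// attempt is 0 for the first retry. random is injectable so tests are deterministic.
function reconnectDelayMs(attempt: number, random: () => number = Math.random): number {
  // 1s, 2s, 4s, 8s, ... capped at 30s.
  const exponential = Math.min(MAX_DELAY_MS, BASE_DELAY_MS * 2 ** attempt);
  // Jitter (an assumption of this sketch): wait 50% to 100% of the scheduled delay.
  return Math.round(exponential * (0.5 + 0.5 * random()));
}

class ReconnectSchedule {
  private attempt = 0;

  // Returns the delay before the next retry and advances the schedule.
  nextDelayMs(random: () => number = Math.random): number {
    const delay = reconnectDelayMs(this.attempt, random);
    this.attempt += 1;
    return delay;
  }

  // Call this from the socket's open handler so the next outage starts at 1s again.
  reset(): void {
    this.attempt = 0;
  }
}
```

A client would call nextDelayMs() in its close handler, pass the result to setTimeout, and call reset() once the new connection opens.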

Libraries handle this: reconnecting-websocket, socket.io-client.

Message ordering on reconnect

The hard part. A common strategy:

  • Server tags each message with an increasing ID; the client tracks the last ID it received
  • On reconnect, the client sends “give me messages since X”
  • Server replays the missed messages from a buffer or log

Without this, users miss messages during reconnect.
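
A minimal sketch of both halves of that strategy follows. It assumes numeric, strictly increasing message IDs and a bounded in-memory buffer on the server; ReplayBuffer, LastSeenTracker, and the "resume" message shape are hypothetical names, and a production system would more likely back the buffer with a durable log.

```typescript
interface SequencedMessage {
  id: number;
  body: string;
}

// Server side: keeps the most recent messages so reconnecting clients can catch up.
class ReplayBuffer {
  private readonly messages: SequencedMessage[] = [];
  private readonly capacity: number;
  private nextId = 1;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  append(body: string): SequencedMessage {
    const message: SequencedMessage = { id: this.nextId, body };
    this.nextId += 1;
    this.messages.push(message);
    if (this.messages.length > this.capacity) this.messages.shift(); // evict the oldest
    return message;
  }

  // Messages newer than lastSeenId, or null when some of them were already
  // evicted, in which case the client needs a full resync instead of a replay.
  since(lastSeenId: number): SequencedMessage[] | null {
    const first = this.messages[0];
    const oldestId = first !== undefined ? first.id : this.nextId;
    if (lastSeenId < oldestId - 1) return null;
    return this.messages.filter((m) => m.id > lastSeenId);
  }
}

// Client side: remembers the last ID and ignores duplicates delivered during replay.
class LastSeenTracker {
  private lastId = 0;

  // Returns true when the message is new and should be shown to the user.
  accept(message: SequencedMessage): boolean {
    if (message.id <= this.lastId) return false;
    this.lastId = message.id;
    return true;
  }

  // Hypothetical "give me messages since X" request sent right after reconnecting.
  resumeRequest(): { type: "resume"; since: number } {
    return { type: "resume", since: this.lastId };
  }
}
```

Returning null rather than a partial list makes the "gap too old" case explicit, so the client can fall back to reloading state over HTTP.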

Heartbeat / ping

Networks may silently drop connections. Detect with heartbeat:

  • Client sends ping every 30s
  • Server responds with pong
  • If no pong within timeout, declare connection dead and reconnect

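The WebSocket protocol has built-in ping/pong control frames, but the browser WebSocket API does not expose them (browsers answer server pings automatically). Server libraries such as ws do expose them, so the server can ping and drop dead connections, while browser clients typically rely on application-level ping messages.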
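
The timing logic of such a heartbeat can be sketched independently of any socket. This is a minimal sketch: HeartbeatMonitor is a hypothetical name, time is passed in explicitly so the logic is easy to test, and the 10-second pong timeout in the usage note is an assumption.

```typescript
type HeartbeatAction = "send-ping" | "wait" | "dead";

class HeartbeatMonitor {
  private lastPingAtMs: number;
  private awaitingPong = false;
  private readonly intervalMs: number;
  private readonly timeoutMs: number;

  constructor(startMs: number, intervalMs: number, timeoutMs: number) {
    this.lastPingAtMs = startMs;
    this.intervalMs = intervalMs;
    this.timeoutMs = timeoutMs;
  }

  // Call when a pong (or an application-level pong message) arrives.
  onPong(): void {
    this.awaitingPong = false;
  }

  // Call periodically with the current time; tells the caller what to do now.
  tick(nowMs: number): HeartbeatAction {
    if (this.awaitingPong) {
      // A ping is outstanding: the connection is dead once the timeout passes.
      return nowMs - this.lastPingAtMs >= this.timeoutMs ? "dead" : "wait";
    }
    if (nowMs - this.lastPingAtMs >= this.intervalMs) {
      this.lastPingAtMs = nowMs;
      this.awaitingPong = true;
      return "send-ping";
    }
    return "wait";
  }
}
```

A client might construct it as new HeartbeatMonitor(Date.now(), 30000, 10000), call tick(Date.now()) from a one-second setInterval, send a ping message on "send-ping", and close the socket and start the reconnect flow on "dead".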

Server-side scaling

WebSockets are stateful. Each connection ties to a server instance. Scaling concerns:

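  • Connection limits: every connection consumes a file descriptor. The default Linux soft limit is often only 1024 per process and must be raised (ulimit -n). The frequently quoted 65K figure is the port range available to one client IP talking to one server address, which mostly matters between a proxy and its backends
  • Memory: ~10–50KB per connection, depending on framework
  • Sticky sessions: an established WebSocket stays on one instance for its lifetime, so stickiness matters mainly when the handshake spans several HTTP requests (for example Socket.io's long-polling fallback) or when per-client state is held in instance memory across reconnects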

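At 100K+ concurrent connections, per-connection memory alone comes to roughly 1–5GB before any application state, so plan instance counts, descriptor limits, and memory explicitly.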
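
A back-of-the-envelope estimate using the per-connection figures above can be written as a small helper. The estimateCapacity function is hypothetical, it treats 1KB as 1000 bytes for round numbers, and the 20,000 connections-per-instance budget is an assumed planning figure, not a measured limit.

```typescript
interface CapacityEstimate {
  memoryGB: number; // decimal gigabytes (1 GB = 1e9 bytes), connection overhead only
  instances: number; // instances needed at the given per-instance budget
}

function estimateCapacity(
  connections: number,
  kbPerConnection: number,
  maxConnectionsPerInstance: number,
): CapacityEstimate {
  const bytes = connections * kbPerConnection * 1000;
  return {
    memoryGB: bytes / 1e9,
    instances: Math.ceil(connections / maxConnectionsPerInstance),
  };
}

// 100K connections at the low and high ends of the 10-50KB range.
const lowEstimate = estimateCapacity(100000, 10, 20000);
const highEstimate = estimateCapacity(100000, 50, 20000);
console.log(lowEstimate, highEstimate);
```

With the assumed budget, both estimates call for five instances; real sizing should add headroom for reconnect storms and for draining instances during deploys.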

Pub/sub for fanout

Broadcasting messages to many users:

  • Server instances subscribe to a Redis pubsub channel
  • App publishes to Redis
  • All instances receive; broadcast to their connected clients

This pattern, with Redis or an alternative such as NATS or Kafka, is standard for chat, live updates, and similar features.
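
The shape of that fanout can be sketched with an in-memory broker standing in for Redis. Everything here is an assumption for illustration: InMemoryBroker only mimics the subscribe/publish contract, ServerInstance is a hypothetical name, and a real deployment would use a Redis client in its place.

```typescript
type FanoutHandler = (payload: string) => void;

// Stand-in for Redis pub/sub: delivers each published payload to every subscriber.
class InMemoryBroker {
  private readonly handlers = new Map<string, FanoutHandler[]>();

  subscribe(channel: string, handler: FanoutHandler): void {
    const list = this.handlers.get(channel) ?? [];
    list.push(handler);
    this.handlers.set(channel, list);
  }

  publish(channel: string, payload: string): void {
    for (const handler of this.handlers.get(channel) ?? []) handler(payload);
  }
}

// Minimal shape of a connected client; a real one would wrap a WebSocket.
interface FanoutClient {
  send(payload: string): void;
}

// One WebSocket server process: subscribes once, then relays to its own clients.
class ServerInstance {
  private readonly clients: FanoutClient[] = [];

  constructor(broker: InMemoryBroker, channel: string) {
    broker.subscribe(channel, (payload) => this.broadcastLocal(payload));
  }

  addClient(client: FanoutClient): void {
    this.clients.push(client);
  }

  private broadcastLocal(payload: string): void {
    for (const client of this.clients) client.send(payload);
  }
}
```

The publisher never needs to know which instance holds which connection; each instance only ever writes to sockets it owns.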

Channel / room management

Users care about specific topics (chat rooms, document IDs). Pattern:

  • Client joins channel after connecting
  • Server tracks which connections are in which channels
  • Broadcast only to relevant channels
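
Server-side membership tracking can be sketched as a map from channel name to a set of connections. ChannelRegistry is a hypothetical name, and the connection type is left generic so the sketch does not depend on any particular WebSocket library.

```typescript
class ChannelRegistry<C> {
  private readonly members = new Map<string, Set<C>>();

  join(channel: string, conn: C): void {
    let set = this.members.get(channel);
    if (set === undefined) {
      set = new Set<C>();
      this.members.set(channel, set);
    }
    set.add(conn);
  }

  leave(channel: string, conn: C): void {
    const set = this.members.get(channel);
    if (set === undefined) return;
    set.delete(conn);
    // Drop empty channels so the map does not grow without bound.
    if (set.size === 0) this.members.delete(channel);
  }

  // Call on disconnect so closed sockets do not linger in any channel.
  leaveAll(conn: C): void {
    // Copy the keys first because leave() may delete entries while we loop.
    Array.from(this.members.keys()).forEach((channel) => this.leave(channel, conn));
  }

  // The connections a broadcast to this channel should reach.
  connectionsIn(channel: string): C[] {
    const set = this.members.get(channel);
    return set === undefined ? [] : Array.from(set);
  }
}
```

Forgetting the leaveAll call on disconnect is a common source of leaks: the registry keeps references to dead connections and broadcasts start failing on closed sockets.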

Backpressure

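If a slow client cannot keep up with the rate at which the server sends:

  • The server-side send buffer for that connection keeps growing
  • Left unbounded, the server eventually runs out of memory
  • The remedy is to cap the buffer and close the slow connection, rather than let it affect other clients

Implement explicit backpressure: drop messages that are safe to lose, close clients that stay behind, and signal “you fell behind, reconnect.”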
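
One way to express such a policy is a send wrapper that checks how many bytes are still queued on the socket. This is a sketch under assumptions: BufferedSocket is a hypothetical minimal interface modeled on the bufferedAmount property of the standard WebSocket object, the thresholds are caller-supplied, and close code 4008 is an invented application-defined code.

```typescript
// Minimal socket shape this sketch relies on.
interface BufferedSocket {
  readonly bufferedAmount: number; // bytes queued but not yet written to the network
  send(data: string): void;
  close(code?: number, reason?: string): void;
}

interface BackpressureLimits {
  dropAboveBytes: number; // beyond this, droppable messages are skipped
  closeAboveBytes: number; // beyond this, the client is disconnected
}

// Hypothetical application-defined close code for "you fell behind, reconnect".
const FELL_BEHIND_CODE = 4008;

type SendOutcome = "sent" | "dropped" | "closed";

function sendWithBackpressure(
  socket: BufferedSocket,
  data: string,
  droppable: boolean,
  limits: BackpressureLimits,
): SendOutcome {
  if (socket.bufferedAmount >= limits.closeAboveBytes) {
    socket.close(FELL_BEHIND_CODE, "fell behind, reconnect");
    return "closed";
  }
  if (droppable && socket.bufferedAmount >= limits.dropAboveBytes) {
    return "dropped"; // e.g. cursor positions or typing indicators
  }
  socket.send(data);
  return "sent";
}
```

Marking each message as droppable or not is the key design choice: presence and cursor updates can be skipped safely, while chat messages cannot, so a client that misses those has to be closed and recover through replay.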

Proxy and load balancer issues

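  • Some proxies and load balancers close WebSocket connections that have been idle for 30–60 seconds; keep the heartbeat interval below the shortest idle timeout on the path
  • Older or misconfigured HTTP proxies may not pass the Upgrade handshake through; wss:// usually fares better because intermediaries cannot inspect the traffic
  • nginx and HAProxy both support WebSockets but need explicit config: nginx must proxy with HTTP/1.1 and forward the Upgrade and Connection headers, and HAProxy usually needs a longer tunnel timeout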

Test with your actual deployment topology.

Mobile-specific

  • Backgrounded apps lose their WebSocket connection
  • When iOS suspends an app, the connection dies; reconnect when the app returns to the foreground
  • Cellular handoff (Wi-Fi → LTE) drops the connection
  • For critical real-time, use push notifications as backup signal

Common mistakes

  • No reconnection logic
  • Token expires; connection silently dies
  • No heartbeat; zombie connections accumulate
  • No message replay; users miss messages on reconnect
  • Single-server architecture; cannot scale

Frequently Asked Questions

Should I use Socket.io or native WebSocket?

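Native WebSocket is leaner. Socket.io adds reconnection, a long-polling fallback, and namespaces, which is overhead worth paying if you need those features. Note that a Socket.io client speaks its own protocol on top of the transport, so it cannot connect to a plain WebSocket server.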

Can I run WebSockets on serverless?

Yes. AWS API Gateway WebSocket APIs and Cloudflare Durable Objects both support WebSockets. Expect higher latency than dedicated servers in exchange for easier operations.

How many concurrent WebSocket connections can a Node.js server handle?

Tens of thousands per instance with proper tuning. Beyond that, scale horizontally.
