WebSocket Gateway Low-Level Design: Connection Management, Message Routing, and Horizontal Scaling

Connection Lifecycle

WebSocket connections begin as HTTP requests and upgrade to a persistent bidirectional channel:

  1. Client sends HTTP GET with Upgrade: websocket and Connection: Upgrade headers.
  2. Server responds with 101 Switching Protocols — the connection is now a WebSocket.
  3. Both sides can send frames at any time without request-response ceremony.
  4. Connection closes when either side sends a close frame, or the TCP connection drops.
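
The handshake in steps 1 and 2 can be sketched in a few lines. The list above does not mention it, but RFC 6455 also requires the client to send a Sec-WebSocket-Key header and the server to answer with a derived Sec-WebSocket-Accept value. The function names below are illustrative, not taken from any particular gateway.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(sec_websocket_key: str) -> str:
    """Derive the Sec-WebSocket-Accept value for the 101 response."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def is_upgrade_request(method: str, headers: dict) -> bool:
    """Check the request looks like a WebSocket upgrade (step 1)."""
    h = {k.lower(): v for k, v in headers.items()}
    connection_tokens = [t.strip().lower() for t in h.get("connection", "").split(",")]
    return (
        method == "GET"
        and h.get("upgrade", "").lower() == "websocket"
        and "upgrade" in connection_tokens
        and "sec-websocket-key" in h
    )
```

In practice a WebSocket library performs this exchange for you; the sketch only shows what "upgrade" means on the wire.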

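Each connection holds memory and a file descriptor on the gateway. A single gateway node can typically handle 10k–100k concurrent connections depending on message frequency and payload size. Measure per-connection memory under realistic traffic, size the fleet from that figure, and make sure the process file-descriptor limit is at least as high as the target connection count.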

Connection State in Redis

WebSocket connections are stateful and long-lived. Store connection metadata in Redis so any gateway node can route messages to any connection:

// On connect: store connection metadata
HSET conn:{connection_id} user_id usr_abc123 gateway_node gw-node-2 connected_at 1713340800 subscriptions "[]"

EXPIRE conn:{connection_id} 86400

// Index by user_id for user-targeted messages
SADD user_conns:{user_id} {connection_id}

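When a connection closes, delete the conn:{connection_id} key and remove the ID from user_conns:{user_id}. The TTL cleans up conn:* keys left behind by crashed gateway nodes, but it does not remove their IDs from the user_conns sets, so drop any ID whose conn:* key no longer exists when you read the set. Refresh the TTL while the connection is alive (for example on each heartbeat); otherwise a connection that outlives 86400 seconds loses its metadata.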
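
A minimal sketch of the connect and disconnect bookkeeping, assuming `r` is a redis-py style client (its `hset`, `expire`, `sadd`, `srem`, and `delete` methods are used). The function names and the `now` parameter are illustrative.

```python
import time

CONN_TTL_SECONDS = 86400  # same TTL as the EXPIRE command above


def register_connection(r, connection_id, user_id, gateway_node, now=None):
    """Store connection metadata and index the connection by user."""
    key = f"conn:{connection_id}"
    r.hset(key, mapping={
        "user_id": user_id,
        "gateway_node": gateway_node,
        "connected_at": int(now if now is not None else time.time()),
        "subscriptions": "[]",  # hash values are strings, so the list is JSON-encoded
    })
    r.expire(key, CONN_TTL_SECONDS)
    r.sadd(f"user_conns:{user_id}", connection_id)


def unregister_connection(r, connection_id, user_id):
    """Remove the metadata key and the user index entry on close."""
    r.delete(f"conn:{connection_id}")
    r.srem(f"user_conns:{user_id}", connection_id)
```

The three writes in `register_connection` are not atomic here; wrapping them in a pipeline or transaction is a reasonable hardening step.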

Authentication

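Authenticate on the initial HTTP Upgrade request, before the WebSocket is established. Validate the JWT from the Authorization header or the token query parameter. Browser WebSocket clients cannot set custom headers, so they need the query parameter; keep those tokens short-lived because URLs tend to end up in access logs. Reject invalid requests with 401 Unauthorized instead of upgrading. Once upgraded, the only way to reject a client is to tear down the connection.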
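
The check can be sketched with the standard library alone. This assumes HS256-signed tokens and a shared secret, neither of which the text specifies, and the function names are illustrative; a production gateway would normally use a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time
from urllib.parse import parse_qs, urlsplit


def _b64url_decode(part: str) -> bytes:
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))


def extract_token(path: str, headers: dict):
    """Read the JWT from the Authorization header, else the token query parameter."""
    auth = headers.get("Authorization", "")
    if auth.startswith("Bearer "):
        return auth[len("Bearer "):]
    values = parse_qs(urlsplit(path).query).get("token")
    return values[0] if values else None


def verify_hs256(token: str, secret: bytes, now=None):
    """Return the claims if the signature and exp are valid, else None."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
            return None
        signed = f"{header_b64}.{payload_b64}".encode("ascii")
        expected = hmac.new(secret, signed, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            return None
        claims = json.loads(_b64url_decode(payload_b64))
        if claims.get("exp", 0) <= (now if now is not None else time.time()):
            return None
        return claims
    except (ValueError, AttributeError, TypeError):
        return None  # malformed token


def upgrade_status(path: str, headers: dict, secret: bytes, now=None) -> int:
    """Decide the HTTP status for an upgrade request: 101 or 401."""
    token = extract_token(path, headers)
    if token is None or verify_hs256(token, secret, now) is None:
        return 401
    return 101
```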

Token refresh during an active connection: clients send a refresh message type with a new JWT. The gateway validates it and updates the session. Do not disconnect clients the instant a token expires mid-session. Allow a grace period for the refresh to arrive, and close the connection only if no valid refresh is received before the grace period ends.
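
One way to model the grace period is a small per-connection session object. The 60-second default, the state names, and the use of the JWT `sub` claim to match the user are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Session:
    user_id: str
    token_exp: float              # expiry of the current token (Unix seconds)
    grace_seconds: float = 60.0   # hypothetical grace period

    def state(self, now: float) -> str:
        """Return 'active', 'grace' (expired, awaiting refresh), or 'expired'."""
        if now < self.token_exp:
            return "active"
        if now < self.token_exp + self.grace_seconds:
            return "grace"
        return "expired"

    def refresh(self, new_claims: dict, now: float) -> bool:
        """Apply already-validated claims from a refresh message."""
        if new_claims.get("sub") != self.user_id:
            return False  # a refresh must not switch the connection to another user
        if new_claims.get("exp", 0) <= now:
            return False
        self.token_exp = new_claims["exp"]
        return True
```

The gateway would close the connection once `state()` returns "expired".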

Message Routing to Backend Services

Clients send typed messages; the gateway routes by type to backend services:

// Client sends:
{ "type": "chat.send", "payload": { "room_id": "r1", "text": "hello" } }

// Gateway routes to chat-service via internal HTTP:
POST http://chat-service/messages
{ "connection_id": "conn_xyz", "user_id": "usr_abc", "room_id": "r1", "text": "hello" }

Alternatively, route via a message queue (Kafka, SQS) for decoupling and backpressure. The gateway publishes the message; the backend service consumes it asynchronously. Responses flow back through pub/sub fan-out, not the original HTTP call.
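
The type-based routing step can be sketched as a lookup table plus a function that builds the backend request. Matching on a type prefix such as "chat." is an assumption; the URL and field names come from the example above. The function only returns the URL and body, so the same result could be sent over HTTP or published to a queue.

```python
import json

# Type prefix -> backend endpoint. Only the chat route appears in the text.
ROUTES = {"chat.": "http://chat-service/messages"}


def route(raw_frame: str, connection_id: str, user_id: str):
    """Return (url, body) for a client frame, or None if it cannot be routed."""
    try:
        msg = json.loads(raw_frame)
        msg_type = msg["type"]
        payload = msg.get("payload", {})
    except (ValueError, KeyError, TypeError, AttributeError):
        return None
    if not isinstance(msg_type, str) or not isinstance(payload, dict):
        return None
    for prefix, url in ROUTES.items():
        if msg_type.startswith(prefix):
            # Gateway-owned fields go last so a client payload cannot override them.
            body = {**payload, "connection_id": connection_id, "user_id": user_id}
            return url, body
    return None
```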

Fan-Out with Redis Pub/Sub

Broadcasting a message to all connections subscribed to a channel:

// Backend service publishes to Redis channel:
PUBLISH channel:room:r1 '{"text":"hello","from":"usr_abc"}'

// Every gateway node subscribes to relevant channels:
// On message: look up local connections subscribed to room:r1
// Deliver frame to each local WebSocket connection

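This decouples message delivery from the originating gateway node. A message published by any service reaches all connected clients regardless of which gateway node holds their connection. Redis pub/sub is fire-and-forget: delivery is at-most-once, a gateway node that is disconnected from Redis when a message is published never receives it, and Redis drops subscribers whose output buffer grows past its configured limit. For reliability, use Redis Streams or Kafka, with each gateway node reading the full stream (its own consumer group or offset) so that every node still sees every message.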
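
The gateway-side half of the fan-out can be sketched as an in-memory registry of local subscribers. The class and method names are illustrative, and each connection object is assumed to expose a `send` method. The boolean return values tell the caller when to issue SUBSCRIBE or UNSUBSCRIBE against Redis for that channel.

```python
class LocalFanout:
    """Tracks which local connections are subscribed to which channel."""

    def __init__(self):
        self._subs = {}  # channel name -> set of local connections

    def subscribe(self, channel, conn) -> bool:
        """Add a subscriber. True means this node should SUBSCRIBE in Redis."""
        subs = self._subs.setdefault(channel, set())
        first = not subs
        subs.add(conn)
        return first

    def unsubscribe(self, channel, conn) -> bool:
        """Remove a subscriber. True means this node should UNSUBSCRIBE in Redis."""
        subs = self._subs.get(channel)
        if not subs:
            return False
        subs.discard(conn)
        if not subs:
            del self._subs[channel]
            return True
        return False

    def deliver(self, channel, data) -> int:
        """Called for each message from Redis; returns the number of frames sent."""
        delivered = 0
        for conn in list(self._subs.get(channel, ())):
            conn.send(data)
            delivered += 1
        return delivered
```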

Horizontal Scaling Challenge

WebSocket connections are stateful — a client connected to node A cannot receive a message delivered only to node B. Two solutions:

  • Sticky sessions: The load balancer pins each client to one gateway node (via cookie or IP hash), so publishers must send each message to the node that owns the target connection, for example by reading gateway_node from Redis. Simple, but a node failure disconnects every client pinned to it, and load skews when connection lifetimes vary.
  • Pub/sub fan-out: No stickiness required. Every gateway node subscribes to the channels its local connections need. Any service or node can publish a message, and each subscribed node delivers it to its local subscribers. This costs more Redis traffic but scales horizontally, and no single gateway node is a point of failure.

Heartbeat, Back-Pressure, and Rate Limiting

Heartbeat: gateway sends a WebSocket ping frame every 30 seconds. Client must respond with a pong within 10 seconds. No pong = dead connection — close it and clean up Redis state. This recovers file descriptors from zombie connections where the TCP connection silently dropped.
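
The timing rules can be kept in a small per-connection state object driven by the gateway's timer loop. The class name and the explicit `now` arguments are illustrative; the 30 and 10 second defaults come from the text.

```python
class Heartbeat:
    """Per-connection ping/pong bookkeeping; times are in seconds."""

    def __init__(self, now: float, ping_interval: float = 30.0, pong_timeout: float = 10.0):
        self.ping_interval = ping_interval
        self.pong_timeout = pong_timeout
        self.next_ping_at = now + ping_interval
        self.pong_deadline = None  # set only while a ping is outstanding

    def should_ping(self, now: float) -> bool:
        return self.pong_deadline is None and now >= self.next_ping_at

    def ping_sent(self, now: float) -> None:
        self.pong_deadline = now + self.pong_timeout
        self.next_ping_at = now + self.ping_interval

    def pong_received(self) -> None:
        self.pong_deadline = None

    def is_dead(self, now: float) -> bool:
        """True when a ping went unanswered past the timeout."""
        return self.pong_deadline is not None and now > self.pong_deadline
```

When `is_dead` returns True, the gateway closes the socket and removes the connection's Redis state.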

Back-pressure: if a client is consuming messages slower than they arrive, the gateway buffers up to N messages per connection in memory. When the buffer is full, drop the oldest messages (or disconnect the slow client). Never let a slow consumer exhaust gateway memory.
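
A minimal sketch of the drop-oldest policy, using a bounded queue per connection. The class name and the `dropped` counter are illustrative; the counter gives the gateway a signal for deciding when to disconnect a persistently slow client instead.

```python
from collections import deque


class OutboundBuffer:
    """Per-connection send buffer that drops the oldest message when full."""

    def __init__(self, max_messages: int):
        self._queue = deque()
        self._max = max_messages
        self.dropped = 0  # total messages discarded for this connection

    def push(self, message) -> None:
        if len(self._queue) >= self._max:
            self._queue.popleft()  # discard the oldest buffered message
            self.dropped += 1
        self._queue.append(message)

    def pop(self):
        """Next message to write to the socket, or None if the buffer is empty."""
        return self._queue.popleft() if self._queue else None

    def __len__(self) -> int:
        return len(self._queue)
```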

Rate limiting per connection: track message count per connection in a sliding window. Clients exceeding the limit receive an error frame and may be disconnected. Prevents a single connection from monopolizing gateway CPU.
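
A sliding-window counter can be sketched by keeping the timestamps of recently accepted messages. The class name and parameters are illustrative; the limit and window size are left to the operator.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allows at most `limit` messages in any `window_seconds` span."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self._stamps = deque()  # timestamps of accepted messages, oldest first

    def allow(self, now: float) -> bool:
        cutoff = now - self.window
        # Forget messages that have fallen out of the window.
        while self._stamps and self._stamps[0] <= cutoff:
            self._stamps.popleft()
        if len(self._stamps) >= self.limit:
            return False  # caller sends an error frame or disconnects
        self._stamps.append(now)
        return True
```

Memory per connection is bounded by `limit`, since rejected messages are never recorded.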

Connection Draining

Graceful gateway shutdown for deploys and scaling events:

  1. Stop accepting new WebSocket upgrade requests (remove from load balancer).
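  2. Send a close frame to all active connections with a 1001 Going Away code and a reconnect hint in the close reason (the reason is limited to 123 bytes).
  3. Well-behaved clients reconnect to another gateway node, ideally after a small random delay so that the drained node's clients do not all reconnect at the same instant.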
  4. Wait for connections to drain (clients close their end after receiving the close frame).
  5. After a timeout (30–60 seconds), force-close remaining connections and exit.
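
Steps 2 through 5 can be sketched with asyncio. The connection interface used here (`send_close`, `wait_closed`, `abort`) is hypothetical and stands in for whatever the WebSocket library provides; step 1 happens at the load balancer and is not shown.

```python
import asyncio

GOING_AWAY = 1001  # WebSocket close code for "Going Away"


async def drain(connections, timeout: float = 30.0, reason: str = "reconnect"):
    """Close all connections gracefully; return those that had to be force-closed."""
    connections = list(connections)
    if not connections:
        return []
    # Step 2: send a close frame to every active connection.
    await asyncio.gather(*(c.send_close(GOING_AWAY, reason) for c in connections))
    # Step 4: wait for clients to close their end, up to the timeout.
    waiters = {asyncio.ensure_future(c.wait_closed()): c for c in connections}
    _done, pending = await asyncio.wait(waiters, timeout=timeout)
    # Step 5: force-close whatever is still open.
    forced = []
    for task in pending:
        task.cancel()
        conn = waiters[task]
        conn.abort()
        forced.append(conn)
    return forced
```

The process can exit once `drain` returns.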
