Connection Lifecycle
WebSocket connections begin as HTTP requests and upgrade to a persistent bidirectional channel:
- Client sends an HTTP GET with Upgrade: websocket and Connection: Upgrade headers.
- Server responds with 101 Switching Protocols; the connection is now a WebSocket.
- Both sides can send frames at any time without request-response ceremony.
- Connection closes when either side sends a close frame, or the TCP connection drops.
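The server's side of the handshake hinges on one computed header: it must echo back a Sec-WebSocket-Accept value derived from the client's Sec-WebSocket-Key. A minimal sketch of that derivation, per RFC 6455:

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455, appended to the client's key.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept header the server must return
    in its 101 Switching Protocols response: base64(SHA-1(key + GUID))."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")
```

With the sample key from RFC 6455, `websocket_accept("dGhlIHNhbXBsZSBub25jZQ==")` yields `"s3pPLMBiTxaQ9kYGzzhZRbK+xOo="`.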
Each connection holds memory and a file descriptor on the gateway. A single gateway node can handle 10k–100k concurrent connections depending on message frequency and payload size. Plan capacity accordingly.
Connection State in Redis
WebSocket connections are stateful and long-lived. Store connection metadata in Redis so any gateway node can route messages to any connection:
// On connect: store connection metadata
HSET conn:{connection_id}
user_id usr_abc123
gateway_node gw-node-2
connected_at 1713340800
subscriptions []
EXPIRE conn:{connection_id} 86400
// Index by user_id for user-targeted messages
SADD user_conns:{user_id} {connection_id}
When a connection closes, delete the Redis key and remove from the user index. Expired keys auto-clean stale entries from crashed gateway nodes.
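The connect and close paths above can be sketched as two small handlers. This assumes a redis-py-style client (`hset`, `expire`, `sadd`, `srem`, `delete`); the function names are hypothetical, not a fixed API:

```python
CONN_TTL_SECONDS = 86400  # matches the EXPIRE in the snippet above

def register_connection(r, connection_id, user_id, gateway_node, now):
    """On connect: write connection metadata and index it by user."""
    key = f"conn:{connection_id}"
    r.hset(key, mapping={
        "user_id": user_id,
        "gateway_node": gateway_node,
        "connected_at": now,
        "subscriptions": "[]",
    })
    # TTL lets stale entries self-clean if a gateway node crashes
    # before running the unregister path.
    r.expire(key, CONN_TTL_SECONDS)
    r.sadd(f"user_conns:{user_id}", connection_id)

def unregister_connection(r, connection_id, user_id):
    """On close: delete the metadata key and remove the user index entry."""
    r.delete(f"conn:{connection_id}")
    r.srem(f"user_conns:{user_id}", connection_id)
```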
Authentication
Authenticate on the initial HTTP Upgrade request — before the WebSocket is established. Validate the JWT from the Authorization header or token query parameter. Reject with 401 Unauthorized before upgrading. Once upgraded, it is too late to gracefully reject without tearing down the connection.
Token refresh during an active connection: clients send a refresh message type with a new JWT. The gateway validates and updates the session. Do not disconnect clients on token expiry mid-session — implement a grace period and a refresh protocol.
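The grace-period logic can be sketched as a small per-connection state object. The class name, the 120-second grace value, and the injected timestamps are illustrative assumptions; real expiries would come from the validated JWT's exp claim:

```python
GRACE_SECONDS = 120  # illustrative grace window after token expiry

class SessionAuth:
    """Tracks token expiry for one connection and decides when the
    gateway should disconnect (sketch; JWT signature validation
    happens elsewhere and is not shown)."""
    def __init__(self, expires_at):
        self.expires_at = expires_at

    def refresh(self, new_expires_at):
        # Gateway has already validated the new JWT; extend the session.
        self.expires_at = max(self.expires_at, new_expires_at)

    def should_disconnect(self, now):
        # Disconnect only after expiry plus the grace period, giving
        # the client time to send its refresh message.
        return now > self.expires_at + GRACE_SECONDS
```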
Message Routing to Backend Services
Clients send typed messages; the gateway routes by type to backend services:
// Client sends:
{ "type": "chat.send", "payload": { "room_id": "r1", "text": "hello" } }
// Gateway routes to chat-service via internal HTTP:
POST http://chat-service/messages
{ "connection_id": "conn_xyz", "user_id": "usr_abc", "room_id": "r1", "text": "hello" }
Alternatively, route via a message queue (Kafka, SQS) for decoupling and backpressure. The gateway publishes the message; the backend service consumes it asynchronously. Responses flow back through pub/sub fan-out, not the original HTTP call.
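A minimal sketch of the type-based routing table, assuming hypothetical service URLs and the convention that the segment before the first dot names the owning service:

```python
# Hypothetical mapping of message-type prefixes to internal services.
ROUTES = {
    "chat": "http://chat-service",
    "presence": "http://presence-service",
}

def route(message_type):
    """Resolve a typed client message (e.g. 'chat.send') to a backend
    service base URL, or None for unknown types."""
    prefix, _, _ = message_type.partition(".")
    return ROUTES.get(prefix)

def enrich(payload, connection_id, user_id):
    """Attach gateway-known identity fields before forwarding,
    as in the POST body shown above."""
    return {"connection_id": connection_id, "user_id": user_id, **payload}
```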
Fan-Out with Redis Pub/Sub
Broadcasting a message to all connections subscribed to a channel:
// Backend service publishes to Redis channel:
PUBLISH channel:room:r1 '{"text":"hello","from":"usr_abc"}'
// Every gateway node subscribes to relevant channels:
// On message: look up local connections subscribed to room:r1
// Deliver frame to each local WebSocket connection
This decouples message delivery from the originating gateway node. A message published by any service reaches all connected clients regardless of which gateway node holds their connection. Redis pub/sub is fire-and-forget: messages published while a subscriber is disconnected are lost, and Redis will cut off a subscriber that falls too far behind its output buffer limit. For reliable delivery, use Redis Streams or Kafka with consumer groups.
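The per-node half of the fan-out, delivering only to locally held connections, can be sketched in memory. The class and the `delivered` list (a stand-in for actual socket writes) are illustrative:

```python
class GatewayNode:
    """Sketch of one gateway node's fan-out step: on a pub/sub
    message for a channel, deliver to local subscribers only."""
    def __init__(self):
        self.subscribers = {}  # channel -> set of local connection_ids
        self.delivered = []    # (connection_id, message); stand-in for socket writes

    def subscribe(self, connection_id, channel):
        self.subscribers.setdefault(channel, set()).add(connection_id)

    def on_pubsub_message(self, channel, message):
        # Every node receives every publish; each delivers only to
        # the connections it physically holds.
        for conn_id in self.subscribers.get(channel, ()):
            self.delivered.append((conn_id, message))
```

A publish to `room:r1` invokes `on_pubsub_message` on every node; clients on node A and node B each receive the frame from their own node.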
Horizontal Scaling Challenge
WebSocket connections are stateful — a client connected to node A cannot receive a message delivered only to node B. Two solutions:
- Sticky sessions: Load balancer pins each client to the same gateway node for the life of the connection (via cookie or IP hash). Simple, but node failures disconnect all pinned clients. Does not distribute load evenly if connection lifetimes vary.
- Pub/sub fan-out: No stickiness required. Every gateway node subscribes to all channels. Any node can publish a message and all nodes deliver it to their local subscribers. More Redis traffic but true horizontal scalability and no single point of failure.
Heartbeat, Back-Pressure, and Rate Limiting
Heartbeat: gateway sends a WebSocket ping frame every 30 seconds. Client must respond with a pong within 10 seconds. No pong = dead connection — close it and clean up Redis state. This recovers file descriptors from zombie connections where the TCP connection silently dropped.
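The pong-deadline bookkeeping can be sketched as a small tracker; timestamps are injected rather than read from the clock so the logic is testable. The class name is illustrative:

```python
PING_INTERVAL = 30  # seconds between pings, per the text above
PONG_TIMEOUT = 10   # seconds allowed for the pong

class HeartbeatTracker:
    """Tracks outstanding pings; connections that miss the pong
    deadline are reported dead so the gateway can close them and
    clean up their Redis state."""
    def __init__(self):
        self.ping_sent_at = {}  # connection_id -> timestamp of unanswered ping

    def on_ping_sent(self, conn_id, now):
        self.ping_sent_at[conn_id] = now

    def on_pong(self, conn_id):
        self.ping_sent_at.pop(conn_id, None)

    def reap_dead(self, now):
        dead = [c for c, t in self.ping_sent_at.items() if now - t > PONG_TIMEOUT]
        for c in dead:
            del self.ping_sent_at[c]
        return dead
```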
Back-pressure: if a client is consuming messages slower than they arrive, the gateway buffers up to N messages per connection in memory. When the buffer is full, drop the oldest messages (or disconnect the slow client). Never let a slow consumer exhaust gateway memory.
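The drop-oldest buffer maps directly onto a bounded deque; the tiny capacity here is for illustration only:

```python
from collections import deque

MAX_BUFFERED = 3  # per-connection cap; real values would be far larger

class SendBuffer:
    """Bounded per-connection outbound buffer. When full, the oldest
    message is discarded so a slow reader cannot grow gateway memory."""
    def __init__(self, maxlen=MAX_BUFFERED):
        self.queue = deque(maxlen=maxlen)  # deque discards from the left when full
        self.dropped = 0                   # count of discarded messages

    def push(self, message):
        if len(self.queue) == self.queue.maxlen:
            self.dropped += 1
        self.queue.append(message)

    def pop(self):
        return self.queue.popleft() if self.queue else None
```

A `dropped` count per connection is also a useful signal for the disconnect-the-slow-client policy.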
Rate limiting per connection: track message count per connection in a sliding window. Clients exceeding the limit receive an error frame and may be disconnected. Prevents a single connection from monopolizing gateway CPU.
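A sliding-window counter per connection can be sketched with a deque of timestamps; limit and window values are illustrative:

```python
from collections import deque

class SlidingWindowLimiter:
    """Allows at most `limit` messages per `window` seconds for one
    connection; timestamps are injected for testability."""
    def __init__(self, limit, window):
        self.limit, self.window = limit, window
        self.timestamps = deque()

    def allow(self, now):
        # Evict timestamps that have fallen out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False  # over limit: caller sends an error frame
```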
Connection Draining
Graceful gateway shutdown for deploys and scaling events:
- Stop accepting new WebSocket upgrade requests (remove from load balancer).
- Send a close frame to all active connections with a 1001 Going Away code and a reconnect hint in the payload.
- Well-behaved clients reconnect to another gateway node immediately.
- Wait for connections to drain (clients close their end after receiving the close frame).
- After a timeout (30–60 seconds), force-close remaining connections and exit.
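The steps above can be sketched as one shutdown routine. Everything here is a stand-in: `gateway`, `clock`, and `sleep` are injected in place of the real server object, time.monotonic, and time.sleep:

```python
GOING_AWAY = 1001
DRAIN_TIMEOUT = 45  # seconds, within the 30-60 s range above

def drain(gateway, clock, sleep):
    """Sketch of graceful shutdown: stop accepting, ask clients to
    leave, wait out the drain window, then force-close stragglers."""
    gateway.stop_accepting()                    # step 1: no new upgrades
    for conn in list(gateway.connections):      # step 2: tell clients to move
        conn.send_close(GOING_AWAY, reason="going away; please reconnect")
    deadline = clock() + DRAIN_TIMEOUT
    while gateway.connections and clock() < deadline:
        sleep(1)                                # step 3: wait for clients to leave
    for conn in list(gateway.connections):      # step 4: force-close the rest
        conn.force_close()
```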