How does HTTP/2 stream multiplexing eliminate head-of-line blocking?

HTTP/2 carries multiple logical streams over a single TCP connection. Each stream has an independent ID and flow control window. Requests and responses on different streams do not block each other — a slow response on stream 3 does not delay stream 5. This eliminates the HTTP/1.1 pipelining HOL blocking problem. However, TCP-level HOL blocking remains: a lost packet stalls all streams until it is retransmitted.

What is HPACK and how does it compress HTTP headers?

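HPACK compresses HTTP/2 headers using two tables: a static table (61 predefined header fields such as :method: GET) and a dynamic table of recently sent header fields, maintained per connection and separately for each direction. A header field found in either table is encoded as an index (typically 1 to 2 bytes) instead of the full string. New header fields can be added to the dynamic table, and Huffman coding further compresses literal strings. HPACK mitigates the CRIME-style attacks that affected DEFLATE-based header compression: it only matches whole header fields against its tables and uses a fixed Huffman code, so an attacker cannot confirm a secret one character at a time by watching compressed sizes.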

What is QUIC and how does it differ from TCP?

QUIC is a transport protocol built on UDP, designed to be the foundation for HTTP/3. It implements reliable delivery, flow control, and congestion control in userspace. Key differences from TCP: independent stream delivery (packet loss only blocks the affected stream, not all streams), built-in TLS 1.3 (no separate TLS handshake — 1-RTT connection setup), connection IDs (connections survive IP address changes, enabling mobile handoff), and 0-RTT session resumption (send data with the first packet for known servers).

What is 0-RTT in HTTP/3 and what are its security implications?

0-RTT (zero round trip time) allows a client to send data immediately on reconnection to a known server, before the TLS handshake completes. The client uses a session ticket from a prior connection. This eliminates the 1-RTT overhead for repeat connections. Security risk: 0-RTT data is vulnerable to replay attacks — an attacker who captures a 0-RTT packet can re-send it. Servers must treat 0-RTT requests as potentially replayed and reject non-idempotent operations (POST, payment processing).
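The server-side rule can be sketched in a few lines of Python. This is a minimal illustration, not any particular server's API: the function name `early_data_status` and the conservative choice to allow only GET, HEAD, and OPTIONS in early data are assumptions. The 425 (Too Early) status code is the standard way to ask a client to retry after the handshake completes.

```python
# Methods this sketch is willing to process from 0-RTT data (an assumed,
# conservative policy: safe methods only).
SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS"})

TOO_EARLY = 425  # HTTP status code "Too Early"


def early_data_status(method, in_early_data):
    """Return 425 if the request must be retried after the handshake,
    or None if it may be processed now."""
    if in_early_data and method.upper() not in SAFE_METHODS:
        return TOO_EARLY
    return None


# A replayed GET is harmless; a replayed POST could charge a card twice.
print(early_data_status("GET", in_early_data=True))
print(early_data_status("POST", in_early_data=True))
print(early_data_status("POST", in_early_data=False))
```

A client that receives 425 simply resends the request once the handshake has finished, at which point replay protection applies.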

When should you use HTTP/2 vs HTTP/3?

Use HTTP/2 when: clients and servers are on reliable, low-packet-loss networks (corporate LAN, data center); UDP is blocked by firewalls or network equipment; the backend infrastructure does not yet support QUIC. Use HTTP/3 when: serving mobile clients on high-packet-loss networks (cellular); connection migration matters (switching between Wi-Fi and cellular); tail latency is critical and 0-RTT savings are measurable. In practice, major CDNs (Cloudflare, Google, Akamai) serve HTTP/3 to supported clients with HTTP/2 fallback.

Low Level Design: HTTP/2 and HTTP/3 Internals

HTTP/2 and HTTP/3 represent fundamental redesigns of the web’s application protocol, motivated by the performance limitations of HTTP/1.1. This post covers the wire-level design of both protocols: binary framing, multiplexing, header compression, QUIC, and the engineering trade-offs behind each choice.

HTTP/1.1 Problems

HTTP/1.1 has three structural performance problems. First, head-of-line (HOL) blocking in pipelining: although HTTP/1.1 allows sending multiple requests on one connection without waiting for responses, the server must respond in order — a slow first response blocks all subsequent ones. Browsers largely abandoned pipelining as a result.

Second, browsers work around HOL blocking by opening 6–8 TCP connections per host, which is expensive in terms of connection setup, memory, and congestion window ramp-up. Third, HTTP/1.1 headers are plain text and repetitive — cookies, User-Agent, and Accept headers are resent verbatim on every request, adding hundreds to thousands of bytes of overhead per request.

HTTP/2 Binary Framing

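HTTP/2 replaces the text-based HTTP/1.1 format with a binary framing layer. All communication is split into frames. Every frame has a 9-byte header: 3 bytes for the payload length, 1 byte for type, 1 byte for flags, and 4 bytes for the stream identifier (1 reserved bit followed by a 31-bit ID). The 24-bit length field can describe payloads of up to 2^24 - 1 bytes (about 16 MiB), but the default maximum frame size is 16,384 bytes (16 KiB); larger frames are allowed only after the receiver raises SETTINGS_MAX_FRAME_SIZE.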
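To make the 9-byte layout concrete, here is a minimal Python sketch that packs and parses a frame header. The names `FrameHeader`, `pack_frame_header`, and `parse_frame_header` are illustrative, not taken from any library.

```python
import struct
from typing import NamedTuple

FRAME_HEADER_LEN = 9


class FrameHeader(NamedTuple):
    length: int     # payload length (24 bits)
    type: int       # frame type code (8 bits)
    flags: int      # type-specific flags (8 bits)
    stream_id: int  # stream identifier (31 bits)


def pack_frame_header(length, type_, flags, stream_id):
    if not 0 <= length < 2**24:
        raise ValueError("length must fit in 24 bits")
    if not 0 <= stream_id < 2**31:
        raise ValueError("stream_id must fit in 31 bits")
    # struct has no 24-bit field, so split the length into 1 + 2 bytes.
    return struct.pack(">BHBBI", length >> 16, length & 0xFFFF,
                       type_, flags, stream_id)


def parse_frame_header(data):
    if len(data) < FRAME_HEADER_LEN:
        raise ValueError("need at least 9 bytes")
    hi, lo, type_, flags, sid = struct.unpack(">BHBBI", data[:FRAME_HEADER_LEN])
    # Mask off the reserved high bit of the stream identifier.
    return FrameHeader((hi << 16) | lo, type_, flags, sid & 0x7FFFFFFF)


raw = pack_frame_header(length=16384, type_=0x0, flags=0x1, stream_id=5)
print(raw.hex(), parse_frame_header(raw))
```

A real parser would go on to read `length` bytes of payload and dispatch on the type code; this sketch stops at the header.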

The primary frame types are DATA (carries request/response body bytes) and HEADERS (carries HTTP headers, HPACK-compressed). Other types include WINDOW_UPDATE (flow control), RST_STREAM (cancel a stream), SETTINGS (negotiate connection parameters), PING, and GOAWAY (graceful connection shutdown). The binary format is unambiguous, efficient to parse, and enables all the multiplexing machinery described below.

Stream Multiplexing

HTTP/2 introduces the concept of streams: independent, bidirectional sequences of frames within a single TCP connection. Each stream has a unique integer ID (client-initiated streams use odd numbers; server-initiated streams use even numbers). A single TCP connection carries many concurrent streams interleaved at the frame level, up to the limit each peer advertises in SETTINGS_MAX_CONCURRENT_STREAMS.
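The odd/even numbering rule is easy to sketch. The class name `StreamIdAllocator` and the helper `initiated_by_client` below are hypothetical; the sketch only shows how each endpoint can hand out IDs without coordinating with its peer.

```python
class StreamIdAllocator:
    """Hands out stream IDs for one endpoint: odd for clients, even for servers."""

    MAX_STREAM_ID = 2**31 - 1  # stream IDs are 31-bit values

    def __init__(self, is_client):
        # Stream 0 is reserved for connection-level frames, so clients
        # start at 1 and servers at 2.
        self._next = 1 if is_client else 2

    def next_id(self):
        if self._next > self.MAX_STREAM_ID:
            raise RuntimeError("stream IDs exhausted; a new connection is needed")
        stream_id = self._next
        self._next += 2  # IDs on a connection only ever increase
        return stream_id


def initiated_by_client(stream_id):
    return stream_id % 2 == 1


client = StreamIdAllocator(is_client=True)
print([client.next_id() for _ in range(3)])
```

Because the two ID spaces never overlap, either side can open a stream at any time without a round trip to agree on its number.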

Each stream has its own flow control window: a credit-based system where the receiver advertises how many bytes it can accept. The sender must not exceed the window; the receiver sends WINDOW_UPDATE frames to extend credit. Flow control applies to DATA frames and operates both at the stream level and at the connection level, allowing fine-grained backpressure without stalling other streams. Stream prioritization (weights and dependencies) allows clients to hint that, say, CSS should be delivered before images, although this original priority scheme was later deprecated in RFC 9113.
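The two-level credit scheme can be sketched as a small bookkeeping class. `SendWindows` and its method names are illustrative assumptions; the default initial window of 65,535 bytes comes from the HTTP/2 specification, and the example uses a tiny window so the numbers stay readable.

```python
DEFAULT_WINDOW = 65_535  # initial window size in the HTTP/2 spec


class SendWindows:
    """Sender-side view of connection-level and per-stream credit."""

    def __init__(self, initial=DEFAULT_WINDOW):
        self.initial = initial
        self.connection = initial
        self.streams = {}  # stream ID -> remaining credit in bytes

    def open_stream(self, stream_id):
        self.streams[stream_id] = self.initial

    def consume(self, stream_id, wanted):
        """Return how many DATA bytes may be sent now and deduct that credit."""
        allowed = max(0, min(wanted, self.streams[stream_id], self.connection))
        self.streams[stream_id] -= allowed
        self.connection -= allowed
        return allowed

    def window_update(self, stream_id, increment):
        """Apply a WINDOW_UPDATE; stream ID 0 addresses the whole connection."""
        if increment <= 0:
            raise ValueError("increment must be positive")
        if stream_id == 0:
            self.connection += increment
        else:
            self.streams[stream_id] += increment


w = SendWindows(initial=100)
w.open_stream(1)
w.open_stream(3)
print(w.consume(1, 80))   # limited by nothing yet
print(w.consume(3, 50))   # limited by the connection window
w.window_update(0, 50)    # receiver grants more connection credit
print(w.consume(3, 10))
```

Note that stream 1 running low on credit never affects stream 3; only the shared connection window couples them.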

HPACK Header Compression

HPACK compresses HTTP headers using two mechanisms. The static table contains 61 pre-defined header name/value pairs (e.g., entry 2 is :method: GET, entry 8 is :status: 200). Sending a common header costs as little as 1 byte — the table index.

The dynamic table is a FIFO queue of recently sent header fields. Each direction of the connection has its own table, which the encoder on one side and the decoder on the other keep in sync. New headers are added to the dynamic table and can subsequently be referenced by index. Huffman encoding further compresses literal header values. HPACK was designed to mitigate CRIME-style attacks (CRIME exploited TLS-level compression to extract secrets by observing compressed size variations): a guess changes the encoded size only when it matches an entire header field rather than a prefix, and sensitive fields such as cookies can be sent as "never-indexed" literals so that they stay out of the dynamic table.
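The index lookup and FIFO eviction can be sketched as follows. This is a deliberately reduced model: `HpackSketchEncoder` is a hypothetical name, only a five-entry excerpt of the static table is included, literals and Huffman coding are left out, and `encode` returns `None` where a real encoder would emit a literal field. The integer encoding and the 32-byte per-entry overhead follow the HPACK specification.

```python
from collections import deque

STATIC_TABLE_LEN = 61   # size of the full HPACK static table
ENTRY_OVERHEAD = 32     # per-entry accounting overhead defined by HPACK

# Excerpt only; the real static table has 61 entries.
STATIC_EXCERPT = {
    (":method", "GET"): 2,
    (":method", "POST"): 3,
    (":path", "/"): 4,
    (":scheme", "https"): 7,
    (":status", "200"): 8,
}


def encode_int(value, prefix_bits, flags):
    """HPACK prefix integer encoding; `flags` holds the bits above the prefix."""
    limit = (1 << prefix_bits) - 1
    if value < limit:
        return bytes([flags | value])
    out = [flags | limit]
    value -= limit
    while value >= 128:
        out.append((value % 128) + 128)
        value //= 128
    out.append(value)
    return bytes(out)


class HpackSketchEncoder:
    def __init__(self, max_size=4096):
        self.max_size = max_size
        self.size = 0
        self.dynamic = deque()  # newest entry on the left

    def _find(self, field):
        if field in STATIC_EXCERPT:
            return STATIC_EXCERPT[field]
        for pos, entry in enumerate(self.dynamic):
            if entry == field:
                return STATIC_TABLE_LEN + 1 + pos  # dynamic indexes start at 62
        return None

    def _insert(self, field):
        # len() equals the octet count here because the example uses ASCII.
        needed = len(field[0]) + len(field[1]) + ENTRY_OVERHEAD
        while self.dynamic and self.size + needed > self.max_size:
            name, value = self.dynamic.pop()  # evict the oldest entry
            self.size -= len(name) + len(value) + ENTRY_OVERHEAD
        if needed <= self.max_size:
            self.dynamic.appendleft(field)
            self.size += needed

    def encode(self, name, value):
        """Return the indexed representation, or None if a literal is needed."""
        index = self._find((name, value))
        if index is not None:
            return encode_int(index, 7, 0x80)  # leading 1 bit marks "indexed"
        self._insert((name, value))
        return None


enc = HpackSketchEncoder()
print(enc.encode(":method", "GET"))          # static table hit: one byte
print(enc.encode("user-agent", "demo/1.0"))  # first use: literal, then indexed
print(enc.encode("user-agent", "demo/1.0"))  # now a one-byte dynamic index
```

The second `user-agent` request costs a single byte because the field now sits at dynamic index 62, which is the effect the static and dynamic tables are designed to produce.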

Server Push

Server push allows a server to proactively send resources the client will need before the client requests them. When serving an HTML page, the server can push the associated CSS and JS files in the same round trip, eliminating the latency of the client parsing HTML and issuing follow-up requests.

In practice, server push has been largely abandoned. The server cannot know whether the client already has the resource cached; pushed resources waste bandwidth if the client already has them. Browsers implemented complex heuristics or simply dropped push support (Chrome disabled it by default in 2022). The 103 Early Hints informational response is now the preferred mechanism for the same goal: the server sends preload hints ahead of the final response and the client decides what to fetch.

HTTP/2 HOL Blocking at the TCP Layer

HTTP/2 solves application-level HOL blocking but cannot solve TCP-level HOL blocking. TCP is a reliable, ordered byte stream: if a packet is lost, TCP holds all subsequent data in the receive buffer until the lost packet is retransmitted and delivered. All HTTP/2 streams on that connection are stalled, even those whose data was already received and buffered — they simply cannot be delivered to the application out of order.

On lossy networks (mobile, congested Wi-Fi), HTTP/2 can actually perform worse than HTTP/1.1 with multiple connections, because HTTP/1.1’s separate TCP connections are not all stalled by a single loss event. This fundamental limitation of TCP motivated HTTP/3.

HTTP/3 over QUIC

HTTP/3 runs over QUIC, a transport protocol built on UDP. QUIC implements its own reliability and congestion control, and it enforces ordering per stream rather than per connection. A lost UDP packet stalls only the streams whose data was in that packet; other streams proceed unaffected. This removes the transport-level HOL blocking between streams that HTTP/2 inherits from TCP.
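The difference between connection-level and stream-level ordering can be shown with a toy model. The function `deliverable` is hypothetical, and the model assumes each packet carries data for exactly one stream, which real TCP segments and QUIC packets do not guarantee.

```python
def deliverable(packets, lost, per_stream_ordering):
    """Return the (stream_id, payload) pairs the application can read now.

    packets: list of (stream_id, payload) in send order
    lost: set of packet positions that have not arrived yet
    per_stream_ordering: False models TCP, True models QUIC
    """
    delivered = []
    blocked_streams = set()
    for position, (stream_id, payload) in enumerate(packets):
        if position in lost:
            if not per_stream_ordering:
                break  # one ordered byte stream: nothing past the gap is readable
            blocked_streams.add(stream_id)  # only this stream has a gap
            continue
        if stream_id in blocked_streams:
            continue  # received, but buffered behind this stream's own gap
        delivered.append((stream_id, payload))
    return delivered


packets = [(1, "a1"), (3, "b1"), (5, "c1"), (1, "a2"), (3, "b2"), (5, "c2")]
lost = {1}  # the packet carrying b1 is lost

print(deliverable(packets, lost, per_stream_ordering=False))
print(deliverable(packets, lost, per_stream_ordering=True))
```

With connection-level ordering only `a1` is readable until the retransmission arrives; with per-stream ordering streams 1 and 5 finish and only stream 3 waits.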

QUIC has TLS 1.3 built in — there is no unencrypted QUIC. The TLS handshake is integrated into the QUIC handshake, so a new connection requires only 1 RTT (versus 2 RTT for TCP + TLS 1.3 separately). Repeated connections can use 0-RTT resumption: the client sends application data with the very first packet using a session ticket from a prior connection, achieving zero additional round trips for repeat visits (with the trade-off that 0-RTT data is vulnerable to replay attacks and should be used only for idempotent requests).
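The round-trip counts above translate directly into latency. The sketch below is back-of-the-envelope arithmetic only: the function name and the 100 ms RTT are assumptions, and server processing time, packet loss, and TCP Fast Open are all ignored.

```python
# Handshake round trips before the first request can be sent,
# using the figures quoted in the text.
HANDSHAKE_RTTS = {
    "tcp+tls1.3": 2,  # 1 RTT for TCP, then 1 RTT for TLS 1.3
    "quic": 1,        # combined transport and TLS handshake
    "quic-0rtt": 0,   # request rides in the first flight
}


def time_to_first_response_ms(rtt_ms, setup):
    """Handshake round trips plus one round trip for the request itself."""
    return rtt_ms * (HANDSHAKE_RTTS[setup] + 1)


for setup in HANDSHAKE_RTTS:
    print(setup, time_to_first_response_ms(100, setup), "ms")
```

On a 100 ms path this gives 300 ms, 200 ms, and 100 ms respectively, which is why the savings matter most on high-latency mobile links.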

QUIC Connection IDs and Migration

A TCP connection is identified by the 4-tuple (src IP, src port, dst IP, dst port). When a mobile device switches from Wi-Fi to LTE, its IP address changes, and the TCP connection breaks — the client must reconnect, re-establish TLS, and replay any in-flight requests.

QUIC connections are identified by connection IDs chosen by the endpoints, embedded in the QUIC packet header. When the client’s IP or port changes, it sends packets with the same connection ID from the new address. The server recognizes the connection ID and continues the session without interruption — seamless connection migration. This is a significant advantage for mobile users and long-lived connections.
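The contrast between the two lookup keys can be sketched with plain dictionaries. `tcp_lookup` and `QuicRouter` are hypothetical names, the addresses are examples, and the sketch omits the path validation (PATH_CHALLENGE and PATH_RESPONSE frames) that a real QUIC endpoint performs before trusting a new address.

```python
def tcp_lookup(table, four_tuple):
    """TCP-style demultiplexing: the address 4-tuple is the connection key."""
    return table.get(four_tuple)


class QuicRouter:
    """QUIC-style demultiplexing: the connection ID is the key."""

    def __init__(self):
        self._sessions = {}  # connection ID -> session state

    def register(self, cid, peer_addr):
        self._sessions[cid] = {"peer_addr": peer_addr, "migrations": 0}

    def on_packet(self, cid, peer_addr):
        session = self._sessions.get(cid)
        if session is None:
            return None
        if session["peer_addr"] != peer_addr:
            # Same connection, new network path (path validation omitted).
            session["peer_addr"] = peer_addr
            session["migrations"] += 1
        return session


wifi = ("10.0.0.5", 50000)
lte = ("172.16.0.9", 41000)
server = ("203.0.113.1", 443)

tcp_table = {wifi + server: "session-A"}
print(tcp_lookup(tcp_table, wifi + server))  # found
print(tcp_lookup(tcp_table, lte + server))   # None: the connection is gone

router = QuicRouter()
router.register(b"\x01\x02", wifi)
print(router.on_packet(b"\x01\x02", lte))    # same session, new address
```

The TCP lookup fails as soon as the source address changes, while the connection ID lookup finds the same session and simply records the new path.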

QPACK and Deployment Challenges

QPACK is the header compression scheme for HTTP/3, analogous to HPACK. HPACK could not be used directly because it assumes header blocks are processed in the order they were encoded; QUIC streams are independent, so header blocks on different streams can arrive out of order. QPACK solves this by carrying dynamic table updates on a dedicated unidirectional encoder stream, with acknowledgments flowing back on a decoder stream, which decouples table synchronization from header transmission.

HTTP/3 deployment faces practical challenges. Many enterprise firewalls and middleboxes block or rate-limit UDP beyond DNS, assuming UDP means untrusted or non-web traffic. Browsers fall back to HTTP/2 or HTTP/1.1 when QUIC is blocked. QUIC also costs more CPU per byte than TCP: packet processing and encryption run in userspace, and QUIC benefits far less from the mature NIC offloads available to TCP. High-performance implementations reduce this per-packet overhead with techniques such as batched system calls, UDP segmentation offload (GSO), io_uring, or kernel bypass with DPDK. Despite these challenges, HTTP/3 serves a large and growing fraction of web traffic at major CDNs.