HTTP/2 and HTTP/3 represent fundamental redesigns of the web’s application protocol, motivated by the performance limitations of HTTP/1.1. This post covers the wire-level design of both protocols: binary framing, multiplexing, header compression, QUIC, and the engineering trade-offs behind each choice.
HTTP/1.1 Problems
HTTP/1.1 has three structural performance problems. First, head-of-line (HOL) blocking in pipelining: although HTTP/1.1 allows sending multiple requests on one connection without waiting for responses, the server must respond in order — a slow first response blocks all subsequent ones. Browsers largely abandoned pipelining as a result.
Second, browsers work around HOL blocking by opening 6–8 TCP connections per host, which is expensive in terms of connection setup, memory, and congestion window ramp-up. Third, HTTP/1.1 headers are plain text and repetitive — cookies, User-Agent, and Accept headers are resent verbatim on every request, adding hundreds to thousands of bytes of overhead per request.
HTTP/2 Binary Framing
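HTTP/2 replaces the text-based HTTP/1.1 format with a binary framing layer. All communication is split into frames. Every frame has a 9-byte header: 3 bytes for payload length, 1 byte for type, 1 byte for flags, and 4 bytes for the stream identifier (1 reserved bit followed by a 31-bit ID). The 24-bit length field can describe payloads up to about 16 MiB, but the default maximum frame size is 16,384 bytes (16 KiB); a receiver must advertise a larger SETTINGS_MAX_FRAME_SIZE before its peer may send bigger frames.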
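The header layout above can be sketched in a few lines of Python. This is a minimal illustration, not a full frame parser: the function names pack_frame_header and parse_frame_header are hypothetical, and only the frame types mentioned in this post are mapped to names.

```python
import struct

# Type codes for the frame types discussed in this post (values from the HTTP/2 spec).
FRAME_TYPES = {
    0x0: "DATA",
    0x1: "HEADERS",
    0x3: "RST_STREAM",
    0x4: "SETTINGS",
    0x6: "PING",
    0x7: "GOAWAY",
    0x8: "WINDOW_UPDATE",
}


def pack_frame_header(length, frame_type, flags, stream_id):
    """Build the 9-byte header: 24-bit length, type, flags, 31-bit stream ID."""
    if not 0 <= length < 2**24:
        raise ValueError("length must fit in 24 bits")
    if not 0 <= stream_id < 2**31:
        raise ValueError("stream ID must fit in 31 bits")
    # struct has no 3-byte integer, so split the length into 1 + 2 bytes.
    return struct.pack(">BHBBI", length >> 16, length & 0xFFFF,
                       frame_type, flags, stream_id)


def parse_frame_header(data):
    """Return (length, type name, flags, stream ID) from the first 9 bytes."""
    if len(data) < 9:
        raise ValueError("need at least 9 bytes")
    hi, lo, frame_type, flags, raw_stream = struct.unpack(">BHBBI", data[:9])
    length = (hi << 16) | lo
    stream_id = raw_stream & 0x7FFFFFFF  # mask off the reserved high bit
    return length, FRAME_TYPES.get(frame_type, "UNKNOWN"), flags, stream_id
```

For example, parse_frame_header(pack_frame_header(16384, 0x0, 0x1, 5)) round-trips a DATA frame header on stream 5 with the default maximum payload size.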
The primary frame types are DATA (carries request/response body bytes) and HEADERS (carries HTTP headers, HPACK-compressed). Other types include WINDOW_UPDATE (flow control), RST_STREAM (cancel a stream), SETTINGS (negotiate connection parameters), PING, and GOAWAY (graceful connection shutdown). The binary format is unambiguous, efficient to parse, and enables all the multiplexing machinery described below.
Stream Multiplexing
HTTP/2 introduces the concept of streams: independent, bidirectional sequences of frames within a single TCP connection. Each stream has a unique integer ID (client-initiated streams use odd numbers; server-initiated use even). A single TCP connection carries many concurrent streams interleaved at the frame level, up to the limit each peer advertises in SETTINGS_MAX_CONCURRENT_STREAMS.
Each stream has its own flow control window: a credit-based system where the receiver advertises how many bytes it can accept. The sender must not exceed the window; the receiver sends WINDOW_UPDATE frames to extend credit. Flow control applies to DATA frames only and operates both at the stream level and at the connection level, so a sender is limited by whichever window is smaller. This gives fine-grained backpressure without stalling other streams. Stream prioritization (weights and dependencies) allows clients to hint that, say, CSS should be delivered before images, although the original priority scheme was inconsistently implemented and has since been deprecated in the revised HTTP/2 specification (RFC 9113).
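The two-level credit scheme can be sketched as follows. This is a simplified model under stated assumptions: the class and method names are hypothetical, and every window starts at 65,535 bytes, which is the protocol's default initial window size.

```python
DEFAULT_WINDOW = 65_535  # default initial flow control window in HTTP/2


class FlowControlWindow:
    """Credit counter: the sender spends it, WINDOW_UPDATE refills it."""

    def __init__(self, initial=DEFAULT_WINDOW):
        self.available = initial

    def consume(self, nbytes):
        if nbytes > self.available:
            raise ValueError("would exceed flow control window")
        self.available -= nbytes

    def replenish(self, nbytes):
        self.available += nbytes


class SenderSketch:
    """Tracks one connection-level window plus one window per stream."""

    def __init__(self):
        self.connection_window = FlowControlWindow()
        self.stream_windows = {}

    def open_stream(self, stream_id):
        self.stream_windows[stream_id] = FlowControlWindow()

    def send_data(self, stream_id, wanted):
        """Send as many DATA bytes as both windows allow; return the count."""
        stream_window = self.stream_windows[stream_id]
        allowed = min(wanted, self.connection_window.available,
                      stream_window.available)
        self.connection_window.consume(allowed)
        stream_window.consume(allowed)
        return allowed

    def window_update(self, stream_id, increment):
        """Apply a WINDOW_UPDATE; stream ID 0 addresses the connection window."""
        if stream_id == 0:
            self.connection_window.replenish(increment)
        else:
            self.stream_windows[stream_id].replenish(increment)
```

In this model a stream with plenty of its own credit can still be throttled by an exhausted connection window, which is why receivers need to replenish both.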
HPACK Header Compression
HPACK compresses HTTP headers using two mechanisms. The static table contains 61 pre-defined header name/value pairs (e.g., entry 2 is :method: GET, entry 8 is :status: 200). Sending a common header costs as little as 1 byte — the table index.
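To make the 1-byte claim concrete, here is a sketch of HPACK's prefix integer encoding and the indexed header field representation. The table below is only an excerpt of the 61-entry static table, and the function names are hypothetical.

```python
# Excerpt of the HPACK static table: (name, value) -> index.
STATIC_TABLE_EXCERPT = {
    (":method", "GET"): 2,
    (":method", "POST"): 3,
    (":path", "/"): 4,
    (":scheme", "https"): 7,
    (":status", "200"): 8,
}


def encode_integer(value, prefix_bits, first_byte_flags=0):
    """HPACK integer encoding with an N-bit prefix in the first byte."""
    max_prefix = (1 << prefix_bits) - 1
    if value < max_prefix:
        # Small values fit entirely in the prefix.
        return bytes([first_byte_flags | value])
    out = [first_byte_flags | max_prefix]
    value -= max_prefix
    while value >= 128:
        out.append((value % 128) + 128)  # 7 payload bits + continuation bit
        value //= 128
    out.append(value)
    return bytes(out)


def encode_indexed(name, value):
    """Indexed header field: high bit set, then the index in a 7-bit prefix."""
    index = STATIC_TABLE_EXCERPT[(name, value)]
    return encode_integer(index, 7, 0x80)
```

With this sketch, encode_indexed(":method", "GET") yields the single byte 0x82, and encode_indexed(":status", "200") yields 0x88.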
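The dynamic table is a FIFO queue of recently sent header fields. Each direction of the connection has its own table, which the encoder on one side and the decoder on the other keep in sync. New headers are added to the dynamic table and can subsequently be referenced by index; when the table exceeds its size limit, the oldest entries are evicted. Huffman encoding further compresses literal header names and values. HPACK was designed with the CRIME attack in mind (CRIME exploited TLS-level compression to extract secrets by observing variations in compressed size). Because HPACK matches only whole header fields rather than arbitrary substrings, an attacker cannot confirm a secret one character at a time, and sensitive fields such as cookies can be sent as "never-indexed" literals that are kept out of the dynamic table.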
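The dynamic table's FIFO behavior can be sketched as below. The class name is hypothetical; the 32-byte per-entry overhead and the indexing that starts right after the 61 static entries follow the HPACK specification, and 4,096 bytes is the default table size.

```python
from collections import deque


class DynamicTableSketch:
    STATIC_ENTRIES = 61   # dynamic indices start at 62
    ENTRY_OVERHEAD = 32   # bytes HPACK charges per entry on top of name + value

    def __init__(self, max_size=4096):
        self.max_size = max_size
        self.size = 0
        self.entries = deque()  # newest entry on the left

    def _entry_size(self, name, value):
        return len(name.encode()) + len(value.encode()) + self.ENTRY_OVERHEAD

    def add(self, name, value):
        """Insert a header, evicting the oldest entries until it fits."""
        needed = self._entry_size(name, value)
        while self.entries and self.size + needed > self.max_size:
            old_name, old_value = self.entries.pop()
            self.size -= self._entry_size(old_name, old_value)
        # An entry larger than the whole table empties it and is not stored.
        if needed <= self.max_size:
            self.entries.appendleft((name, value))
            self.size += needed

    def lookup(self, index):
        """Resolve an index from the combined static + dynamic address space."""
        return self.entries[index - self.STATIC_ENTRIES - 1]
```

Because the newest entry always takes index 62, existing entries shift to higher indices on every insertion, which is why encoder and decoder must process insertions in exactly the same order.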
Server Push
Server push allows a server to proactively send resources the client will need before the client requests them. When serving an HTML page, the server can push the associated CSS and JS files in the same round trip, eliminating the latency of the client parsing HTML and issuing follow-up requests.
In practice, server push has been largely abandoned. The server cannot know whether the client already has the resource cached, so pushed resources waste bandwidth whenever it does. Browsers implemented complex heuristics or simply dropped push support (Chrome disabled it by default in 2022). The 103 Early Hints informational response, which lets the server send preload hints before the final response is ready, is now the preferred mechanism for the same goal.
HTTP/2 HOL Blocking at the TCP Layer
HTTP/2 solves application-level HOL blocking but cannot solve TCP-level HOL blocking. TCP is a reliable, ordered byte stream: if a packet is lost, TCP holds all subsequent data in the receive buffer until the lost packet is retransmitted and delivered. All HTTP/2 streams on that connection are stalled, even those whose data was already received and buffered — they simply cannot be delivered to the application out of order.
On lossy networks (mobile, congested Wi-Fi), HTTP/2 can actually perform worse than HTTP/1.1 with multiple connections, because HTTP/1.1’s separate TCP connections are not all stalled by a single loss event. This fundamental limitation of TCP motivated HTTP/3.
HTTP/3 over QUIC
HTTP/3 runs over QUIC, a transport protocol built on UDP. QUIC implements its own reliability, congestion control, and flow control. Congestion control and loss detection operate per connection, but ordered delivery is enforced per stream rather than across the whole connection. A lost UDP packet stalls only the streams whose data that packet carried; other streams proceed unaffected. This removes the transport-level HOL blocking that HTTP/2 inherits from TCP.
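The difference in delivery semantics can be illustrated with a toy model. This is a deliberately simplified sketch, not a transport implementation: each packet carries one chunk for one stream, packets are identified by their position in the list, and the function names are hypothetical.

```python
def deliverable_single_ordered_stream(packets, lost):
    """TCP-like: one ordered byte stream, so nothing after the first gap is delivered."""
    delivered = []
    for seq, (stream_id, chunk) in enumerate(packets):
        if seq in lost:
            break  # everything behind the gap waits for the retransmission
        delivered.append((stream_id, chunk))
    return delivered


def deliverable_per_stream_ordering(packets, lost):
    """QUIC-like: a gap blocks only the stream whose data went missing."""
    blocked_streams = set()
    delivered = []
    for seq, (stream_id, chunk) in enumerate(packets):
        if seq in lost:
            blocked_streams.add(stream_id)
        elif stream_id not in blocked_streams:
            delivered.append((stream_id, chunk))
    return delivered


# Three streams interleaved on one connection; the second packet is lost.
packets = [(1, "a0"), (3, "b0"), (1, "a1"), (3, "b1"), (5, "c0")]
lost = {1}
```

With these inputs, the single ordered stream can deliver only the first chunk of stream 1 until the retransmission arrives, while per-stream ordering delivers everything for streams 1 and 5 and holds back only stream 3.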
QUIC has TLS 1.3 built in — there is no unencrypted QUIC. The TLS handshake is integrated into the QUIC handshake, so a new connection requires only 1 RTT (versus 2 RTT for TCP + TLS 1.3 separately). Repeated connections can use 0-RTT resumption: the client sends application data with the very first packet using a session ticket from a prior connection, achieving zero additional round trips for repeat visits (with the trade-off that 0-RTT data is vulnerable to replay attacks and should be used only for idempotent requests).
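The round-trip counts translate into latency with simple arithmetic. The sketch below is a back-of-the-envelope model that assumes one extra round trip for the request and response and ignores server processing time and packet loss; the names are hypothetical.

```python
# Handshake round trips before the first request can be sent.
HANDSHAKE_ROUND_TRIPS = {
    "TCP + TLS 1.3": 2,
    "QUIC, new connection": 1,
    "QUIC, 0-RTT resumption": 0,
}


def time_to_first_response_ms(rtt_ms, handshake_round_trips):
    """Handshake round trips plus one round trip for the request itself."""
    return rtt_ms * (handshake_round_trips + 1)


def compare(rtt_ms):
    """Estimated time to first response for each connection setup, in ms."""
    return {
        name: time_to_first_response_ms(rtt_ms, round_trips)
        for name, round_trips in HANDSHAKE_ROUND_TRIPS.items()
    }
```

On a 100 ms mobile link, compare(100) estimates 300 ms for TCP with TLS 1.3, 200 ms for a fresh QUIC connection, and 100 ms with 0-RTT resumption.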
QUIC Connection IDs and Migration
A TCP connection is identified by the 4-tuple (src IP, src port, dst IP, dst port). When a mobile device switches from Wi-Fi to LTE, its IP address changes, and the TCP connection breaks — the client must reconnect, re-establish TLS, and replay any in-flight requests.
QUIC connections are identified by connection IDs chosen by the endpoints, embedded in the QUIC packet header. When the client’s IP or port changes, it sends packets with the same connection ID from the new address. The server recognizes the connection ID and continues the session without interruption — seamless connection migration. This is a significant advantage for mobile users and long-lived connections.
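The contrast between the two lookup strategies can be sketched with two small tables. The class names, addresses, and connection ID here are hypothetical, and the sketch omits the path validation a real QUIC server performs before it starts sending to a new address.

```python
class FourTupleTable:
    """TCP-style demultiplexing: sessions are keyed by addresses and ports."""

    def __init__(self):
        self._sessions = {}

    def register(self, src, dst, session):
        self._sessions[(src, dst)] = session

    def lookup(self, src, dst):
        return self._sessions.get((src, dst))


class ConnectionIdTable:
    """QUIC-style demultiplexing: sessions are keyed by connection ID."""

    def __init__(self):
        self._sessions = {}

    def register(self, connection_id, session):
        self._sessions[connection_id] = session

    def lookup(self, connection_id, src=None):
        # The source address is deliberately not part of the key.
        return self._sessions.get(connection_id)


server = ("203.0.113.10", 443)
wifi = ("192.0.2.7", 50000)
lte = ("198.51.100.23", 41000)

tcp_table = FourTupleTable()
tcp_table.register(wifi, server, "session-A")

quic_table = ConnectionIdTable()
quic_table.register(b"\x1f\x2e\x3d\x4c", "session-A")
```

After the switch from Wi-Fi to LTE, tcp_table.lookup(lte, server) finds nothing, while quic_table.lookup(b"\x1f\x2e\x3d\x4c", src=lte) still resolves to the same session.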
QPACK and Deployment Challenges
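QPACK is the header compression scheme for HTTP/3, analogous to HPACK. HPACK could not be used directly because it assumes that header blocks are decoded in exactly the order they were encoded. QUIC streams are independent, so header blocks on different streams can arrive out of order. QPACK solves this with dedicated unidirectional streams: an encoder stream carries dynamic table updates and a decoder stream carries acknowledgments, decoupling table synchronization from header transmission. A header block that references an entry the decoder has not yet received blocks only its own stream.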
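One way to picture the decoupling is the decoder-side bookkeeping sketched below. It is a simplified model: each header block is assumed to declare how many dynamic table insertions it depends on (QPACK calls this the Required Insert Count), and the class and method names are hypothetical.

```python
class QpackDecoderSketch:
    """Tracks table insertions and which request streams are waiting on them."""

    def __init__(self):
        self.insert_count = 0   # insertions received on the encoder stream
        self.blocked = []       # (required_insert_count, stream_id) pairs

    def on_header_block(self, stream_id, required_insert_count):
        """Return True if the block can be decoded now, else park the stream."""
        if required_insert_count <= self.insert_count:
            return True
        self.blocked.append((required_insert_count, stream_id))
        return False

    def on_encoder_stream_insert(self):
        """Record one table insertion; return the streams it unblocks."""
        self.insert_count += 1
        ready = [sid for needed, sid in self.blocked
                 if needed <= self.insert_count]
        self.blocked = [(needed, sid) for needed, sid in self.blocked
                        if needed > self.insert_count]
        return ready
```

A header block that needs two insertions is parked until the second insertion arrives on the encoder stream, while blocks on other streams that reference only known entries decode immediately.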
HTTP/3 deployment faces practical challenges. Many enterprise firewalls and middleboxes block or rate-limit UDP beyond DNS, assuming UDP means untrusted or non-web traffic. Browsers fall back to HTTP/2 or HTTP/1.1 when QUIC is blocked. QUIC also tends to cost more CPU per byte than TCP: packet processing and encryption typically run in user space, and the NIC offloads that are mature for TCP, such as segmentation offload, are less widely available for QUIC. High-performance implementations reduce the per-packet overhead with batched I/O (for example UDP segmentation offload or io_uring on Linux) or with kernel bypass frameworks such as DPDK. Despite these challenges, HTTP/3 serves a large and growing fraction of web traffic at major CDNs.