Electronic trading platforms process millions of orders per second with microsecond-level latency. Designing a trading system tests your understanding of: ultra-low-latency architecture, order book data structures, matching engine algorithms, market data distribution, and the deterministic systems that underpin financial markets. This is a specialized system design question for fintech and quantitative trading interviews.
Order Book Data Structure
The order book is the central data structure: a sorted collection of buy orders (bids) and sell orders (asks) for a financial instrument. Bids are sorted highest-to-lowest (the best bid is the highest price a buyer will pay); asks are sorted lowest-to-highest (the best ask is the lowest price a seller will accept). The spread = best ask – best bid. Each price level holds a FIFO queue of orders — the first order at that price is filled first — so the book is conceptually Map<Price, Queue<Order>> for bids and another for asks.

Implementation:
(1) Red-black tree or skip list for the price levels — O(log P) insert, delete, and find-best, where P is the number of active price levels.
(2) Linked list per price level for the order queue — O(1) add to tail, O(1) remove from head.
(3) Hash map from order_id to order location — O(1) cancel by order_id.

Operations:
(1) Add order: insert at the correct price level, creating the level if it does not exist. O(log P).
(2) Cancel order: look up the order by order_id in the hash map and unlink it from its queue. O(1).
(3) Match: compare the best bid with the best ask. If best_bid >= best_ask, execute a trade and remove or reduce the matched orders. O(1) per match.

For high-frequency trading, the order book must handle 100,000+ updates per second per instrument with deterministic latency: no GC pauses and no memory allocation in the hot path.
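The structure above can be sketched as follows — a minimal Python illustration, not a low-latency implementation (a real engine would use an intrusive linked list per level, a balanced tree for the price levels, and pre-allocated memory; all class and field names here are illustrative):

```python
from collections import deque

class OrderBook:
    """Toy two-sided book: dict of price -> FIFO deque, plus an order index."""
    def __init__(self):
        self.bids = {}   # price -> deque of (order_id, qty), FIFO per level
        self.asks = {}
        self.index = {}  # order_id -> (side, price), for O(1)-style cancel

    def add(self, order_id, side, price, qty):
        book = self.bids if side == "buy" else self.asks
        book.setdefault(price, deque()).append((order_id, qty))
        self.index[order_id] = (side, price)

    def cancel(self, order_id):
        side, price = self.index.pop(order_id)
        book = self.bids if side == "buy" else self.asks
        level = book[price]
        # linear scan here stands in for the O(1) unlink of an intrusive list
        for i, (oid, _qty) in enumerate(level):
            if oid == order_id:
                del level[i]
                break
        if not level:          # empty price level: remove it from the book
            del book[price]

    def best_bid(self):
        # max over keys stands in for the tree's O(log P) / cached best
        return max(self.bids) if self.bids else None

    def best_ask(self):
        return min(self.asks) if self.asks else None
```

`max`/`min` over the dict keys is O(P) and only stands in for the tree's find-best; the FIFO deque per level is what gives time priority within a price.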
Matching Engine
The matching engine determines if and how orders execute against each other.

Matching rules:
(1) Price-time priority (most common): orders are matched by best price first, then by arrival time — earlier orders fill first at the same price.
(2) Pro-rata: at the same price, orders are filled in proportion to their size (used in some futures markets).

Order types the engine must handle:
(1) Market order — execute immediately at the best available price, walking the opposite side of the book until filled.
(2) Limit order — execute at the specified price or better; if not immediately matchable, it rests in the book as a passive order.
(3) Stop order — triggered when the market price reaches a threshold, then converts to a market or limit order.
(4) Immediate-or-Cancel (IOC) — fill whatever is available immediately and cancel the rest.
(5) Fill-or-Kill (FOK) — fill the entire order immediately or cancel it entirely (no partial fills).

Matching engine requirements:
(1) Deterministic — the same sequence of orders always produces the same trades (required for regulatory audit).
(2) Sequential — orders are processed in strict arrival order, with no parallelism in the matching logic (parallelism would introduce non-determinism). The matching engine is the one component that CANNOT be horizontally scaled — it is a single-threaded, deterministic state machine.
(3) Low latency — match and respond in microseconds, achieved with bare-metal hardware, kernel-bypass networking (DPDK), lock-free data structures, and pre-allocated memory (zero allocation in the hot path).
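The price-time-priority walk for an incoming buy order can be sketched like this (illustrative Python only; `asks` as a dict of FIFO deques stands in for the tree-of-queues structure, and all names are assumptions):

```python
from collections import deque

def match_buy(asks, limit_price, qty):
    """Match an incoming buy limit order against the ask side.
    Walks from the best (lowest) ask while it crosses the limit.
    Returns (trades, remaining_qty); any remainder would rest in the book."""
    trades = []
    while qty > 0 and asks:
        best = min(asks)               # stand-in for find-best on the tree
        if best > limit_price:         # book no longer crosses: stop walking
            break
        level = asks[best]
        while qty > 0 and level:
            oid, avail = level[0]      # head of queue = oldest = time priority
            fill = min(qty, avail)
            trades.append((oid, best, fill))   # trade at resting order's price
            qty -= fill
            if fill == avail:
                level.popleft()                # resting order fully filled
            else:
                level[0] = (oid, avail - fill) # partial fill: reduce in place
        if not level:
            del asks[best]             # exhausted price level
    return trades, qty
```

A market order is the same walk with no limit check; IOC discards the remainder instead of resting it, and FOK pre-checks that total available size covers the order before executing anything.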
Market Data Distribution
Every order book change produces a market data event: new order, cancel, trade execution, or price level update. Subscribers (traders, market makers, data vendors) receive these events in real time.

Data feed types:
(1) Level 1 (top of book) — best bid price/size, best ask price/size, and last trade price. Sufficient for most retail traders; low bandwidth.
(2) Level 2 (market depth) — the order book's price levels with aggregate size per level, typically the top 5-20 levels. Used by institutional traders and market makers.
(3) Level 3 (full order-by-order) — every individual order in the book, with order_id. Maximum information; used by high-frequency traders for order flow analysis.

Distribution architecture:
(1) Multicast — one-to-many UDP multicast. The exchange sends one packet and all subscribers in the multicast group receive it simultaneously. Most efficient for many subscribers; used by major exchanges (NYSE, NASDAQ).
(2) TCP — unicast per subscriber. More reliable (TCP guarantees delivery) but costs the exchange more bandwidth. Used by smaller venues and for guaranteed delivery.
(3) Sequencing — every market data message carries a sequence number. If a subscriber detects a gap (a missed sequence number), it requests a retransmission or resynchronizes from a snapshot.

Latency: market data must reach subscribers in microseconds. With colocation, traders place their servers physically adjacent to the exchange matching engine (same datacenter, same switch), cutting network latency to under a microsecond (versus milliseconds over the internet). Exchanges charge premium fees for colocation — proximity = speed = profit for high-frequency traders.
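Gap detection on a sequenced feed can be sketched as below — a deliberately simplified illustration in which the handler only detects gaps and fires a retransmission callback (a production handler would also buffer ahead-of-sequence messages and apply the recovered range in order; all names are illustrative):

```python
class FeedHandler:
    """Tracks the next expected sequence number on a sequenced feed."""
    def __init__(self, request_retransmit):
        self.expected = 1
        self.request_retransmit = request_retransmit  # callback(start, end)

    def on_message(self, seq, payload):
        if seq == self.expected:
            self.expected = seq + 1
            return True                # in order: apply to the local book
        if seq > self.expected:
            # gap detected: request the missing range [expected, seq-1],
            # then (in this simplified sketch) apply the current message
            self.request_retransmit(self.expected, seq - 1)
            self.expected = seq + 1
            return True
        return False                   # duplicate or stale: drop
```

If the gap is too large to replay, the subscriber falls back to resynchronizing from a book snapshot, as described above.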
Low-Latency Architecture
Trading systems optimize for latency at every layer:
(1) Hardware — bare-metal servers (no virtualization overhead); FPGAs (Field-Programmable Gate Arrays) parse market data in hardware at nanosecond speed.
(2) Networking — kernel bypass (DPDK, Solarflare OpenOnload): network packets go directly to userspace without passing through the Linux kernel, saving 10-50 microseconds per packet.
(3) Memory — pre-allocate all memory at startup. Zero allocation in the hot path (malloc/new introduces latency jitter from the memory allocator); object pools for reusable objects.
(4) No garbage collection — use C or C++ (manual memory management). In Java, use specialized low-latency JVMs (Azul Zing) or allocate off-heap. GC pauses, even 1 ms, are unacceptable.
(5) Lock-free data structures — threads communicate via lock-free queues (the LMAX Disruptor pattern), avoiding mutex contention.
(6) Thread affinity — pin threads to specific CPU cores so the OS scheduler cannot migrate them (migration invalidates CPU caches).
(7) Busy-wait instead of sleep — the main event loop continuously polls for new data rather than sleeping. This wastes CPU but gives the lowest latency (no wake-up delay).

Measurement: latency is reported in percentiles — for example P50 = 5 microseconds, P99 = 20 microseconds, P99.9 = 50 microseconds. Any outlier (say, P99.9 > 1 ms) is investigated — it may indicate a GC pause, a kernel interrupt, or a cache miss.
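The percentile reporting described above amounts to a simple nearest-rank calculation over collected samples (illustrative Python; a real system timestamps with hardware counters and histograms rather than sorting a list on the hot path):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile of latency samples (e.g. microseconds).
    p is in (0, 100]; returns the sample at rank ceil(p/100 * n)."""
    s = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(s)) - 1)  # 0-based nearest rank
    return s[k]
```

With samples of 1..100 microseconds, `percentile(samples, 50)` is 50 and `percentile(samples, 99.9)` is 100 — it is exactly these tail values, not the average, that expose a stray GC pause or kernel interrupt.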
Risk Management and Compliance
Pre-trade risk checks run on every order BEFORE it reaches the matching engine:
(1) Position limits — the trader cannot exceed their maximum position size per instrument.
(2) Order size limits — reject orders larger than a threshold (fat-finger protection).
(3) Price limits — reject orders priced far from the current market (e.g., a buy limit 50% above the current price is likely an error).
(4) Rate limits — a maximum number of orders per second per trader (prevents runaway algorithms).
(5) Buying power — the trader must have sufficient margin/collateral for the order.
These checks must be extremely fast, since they add latency to every single order.

Circuit breakers: if an instrument's price moves too far too fast (e.g., more than 5% in 5 minutes), the exchange halts trading for a cooling-off period (5-15 minutes). This prevents flash crashes (cascading algorithmic selling). Implemented as a price monitoring service that tracks price movement per instrument and triggers a halt by sending a special message to the matching engine; the engine stops processing orders for that instrument and enters an auction state.

Audit trail: every order, modification, cancellation, and trade is logged with nanosecond timestamps and retained for 7 years. Regulators (SEC, FINRA, MiFID II) require this for market surveillance and investigation of market manipulation.
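The pre-trade gate can be sketched as a single function applied to every inbound order (illustrative Python; the field names and limit values are assumptions, and a real implementation would be zero-allocation native code in the order path):

```python
def pre_trade_check(order, state, limits):
    """Run the five checks in order; return None to accept,
    or a rejection reason string for the first check that fails."""
    side_sign = 1 if order["side"] == "buy" else -1
    new_pos = state["position"] + side_sign * order["qty"]
    if abs(new_pos) > limits["max_position"]:
        return "position limit"
    if order["qty"] > limits["max_order_size"]:
        return "order size limit (fat finger)"
    ref = state["last_price"]
    if abs(order["price"] - ref) / ref > limits["max_price_deviation"]:
        return "price collar"
    if state["orders_this_second"] >= limits["max_orders_per_sec"]:
        return "rate limit"
    if order["qty"] * order["price"] > state["buying_power"]:
        return "insufficient buying power"
    return None   # all checks passed: forward to the matching engine
```

Rejections never reach the matching engine, so a runaway algorithm is stopped at the gateway rather than polluting the book.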