System Design Interview: Stock Exchange (NYSE / Nasdaq Order Book)

What Makes Stock Exchanges Hard to Design

A stock exchange is one of the most demanding distributed systems: it must process millions of orders per second with microsecond latency, maintain strict price-time priority fairness, guarantee zero message loss, and produce an authoritative audit trail for regulatory compliance. Firms spend hundreds of millions on custom hardware (FPGAs, kernel-bypass networking) to shave microseconds. This question tests knowledge of low-latency design, order matching algorithms, and financial systems concepts. It appears at hedge funds, trading firms, Robinhood, Coinbase, and financial infrastructure companies.

Order Types

  • Market order: buy or sell immediately at the best available price. Guaranteed execution, uncertain price.
  • Limit order: buy or sell at a specified price or better. Guaranteed price, uncertain execution (may not fill if price never reached).
  • Stop order: becomes a market order when price crosses a trigger level. Used for stop-loss protection.
  • IOC (Immediate or Cancel): fill as much as possible immediately, cancel the rest.
  • FOK (Fill or Kill): fill the entire order immediately or cancel the whole thing.
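
The order types above can be modeled with a small dataclass. This is an illustrative sketch, not a standard API — the names `Side`, `OrderType`, and the field layout are assumptions for this article:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Side(Enum):
    BUY = "BUY"
    SELL = "SELL"

class OrderType(Enum):
    MARKET = "MARKET"  # execute immediately at best price
    LIMIT = "LIMIT"    # execute at specified price or better
    STOP = "STOP"      # becomes a market order at the trigger
    IOC = "IOC"        # immediate-or-cancel
    FOK = "FOK"        # fill-or-kill

@dataclass
class Order:
    id: int
    side: Side
    type: OrderType
    qty: int
    price: Optional[int] = None       # integer ticks; None for market orders
    stop_price: Optional[int] = None  # trigger level for stop orders
```

Note that `price` is an integer tick count, not a float — this matters for matching correctness, as discussed below.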

The Order Book

The order book is a real-time record of all outstanding limit orders for a security. It has two sides: bids (buyers, sorted descending by price) and asks (sellers, sorted ascending by price). The best bid is the highest price a buyer will pay; the best ask is the lowest price a seller will accept; the spread is the difference between them. Orders at the same price level are queued in time order (price-time priority: better price executes first; equal price executes in FIFO order). The standard data structure is a sorted map from price level to a queue of orders (in Java, TreeMap<Long, Deque<Order>>). Prices are stored as integer ticks, not floats, to avoid rounding errors.


from sortedcontainers import SortedDict
from collections import deque

class OrderBook:
    def __init__(self):
        self.bids = SortedDict(lambda x: -x)  # descending by price
        self.asks = SortedDict()               # ascending by price
        self.orders = {}                       # order_id -> Order

    def add_order(self, order):
        book = self.bids if order.side == "BUY" else self.asks
        if order.price not in book:
            book[order.price] = deque()
        book[order.price].append(order)
        self.orders[order.id] = order

    def cancel_order(self, order_id):
        order = self.orders.pop(order_id, None)
        if order:
            book = self.bids if order.side == "BUY" else self.asks
            # O(N) scan of the price level; production engines keep an
            # intrusive doubly linked list per level for O(1) cancellation
            book[order.price].remove(order)
            if not book[order.price]:
                del book[order.price]

The Matching Engine

The matching engine is the heart of the exchange. When a new order arrives, it attempts to match against the opposite side of the book. For a market buy order: take the best ask (lowest price), fill as much quantity as possible, then move to the next ask level if more quantity is needed. For a limit buy order at price P: match against all asks with price less than or equal to P, fill in price-time order, then add any remaining quantity to the bid side of the book as a resting order.


def match(self, incoming_order):
    trades = []
    book = self.asks if incoming_order.side == "BUY" else self.bids
    while incoming_order.qty > 0 and book:
        best_price = next(iter(book))
        if incoming_order.is_limit:  # price condition: stop when levels no longer cross
            if incoming_order.side == "BUY" and best_price > incoming_order.price:
                break  # no more matchable orders
            if incoming_order.side == "SELL" and best_price < incoming_order.price:
                break
        queue = book[best_price]
        while queue and incoming_order.qty > 0:
            resting = queue[0]  # oldest order at this level (time priority)
            fill = min(incoming_order.qty, resting.qty)
            trades.append((incoming_order.id, resting.id, best_price, fill))
            incoming_order.qty -= fill
            resting.qty -= fill
            if resting.qty == 0:  # fully filled: remove from the book
                queue.popleft()
                del self.orders[resting.id]
        if not queue:
            del book[best_price]  # level exhausted
    if incoming_order.qty > 0 and incoming_order.is_limit:
        self.add_order(incoming_order)  # rest the unfilled portion
    return trades

Low-Latency Architecture

Production exchanges achieve microsecond matching latency through:

  • Single-threaded matching engine: no locks, no contention. One thread processes all orders sequentially, maintaining strict ordering guarantees.
  • Kernel-bypass networking (DPDK, RDMA): skip the OS network stack and read packets directly from the NIC into application memory, eliminating 10-50 microseconds of system-call overhead.
  • Pre-allocated memory pools: no malloc/free during order processing. All Order objects are pre-allocated at startup.
  • CPU pinning: the matching-engine thread is pinned to a dedicated CPU core shielded from OS interrupts (isolcpus).
  • FPGA-based matching: some exchanges implement the matching path entirely in hardware, achieving sub-microsecond latency that software cannot match.
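
The first three points can be sketched together: a single thread drains an inbound queue, matches, and writes trades to an outbound queue, drawing order objects from a pool allocated at startup. The names `OrderPool` and `run_engine` are illustrative; real engines use lock-free ring buffers (e.g., the LMAX Disruptor pattern) rather than Python deques:

```python
from collections import deque

class OrderPool:
    """Pre-allocate order slots at startup so the hot path never allocates."""
    def __init__(self, size):
        self._free = deque({"id": 0, "side": None, "price": 0, "qty": 0}
                           for _ in range(size))

    def acquire(self, id, side, price, qty):
        slot = self._free.popleft()  # O(1), no malloc on the hot path
        slot.update(id=id, side=side, price=price, qty=qty)
        return slot

    def release(self, slot):
        self._free.append(slot)

def run_engine(inbound, outbound, pool, match_fn):
    """Single thread: drain inbound, match, publish. No locks, no I/O.
    Persistence and broadcast are handled by other threads reading outbound."""
    while inbound:
        msg = inbound.popleft()
        order = pool.acquire(**msg)
        for trade in match_fn(order):
            outbound.append(trade)
        # Released immediately here for simplicity; a resting limit order
        # would keep its slot until filled or cancelled.
        pool.release(order)
```

Because one thread owns all the state, no operation ever blocks on a lock, and replaying the inbound queue reproduces the exact same trades — the determinism regulators require.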

Market Data Distribution

Every trade and order book change must be broadcast to all market participants simultaneously and in order. The exchange uses a sequenced multicast feed: all market data events are assigned a monotonically increasing sequence number by the matching engine. The sequencer publishes messages via UDP multicast to all subscribers. Receivers detect gaps in sequence numbers (packet loss) and request retransmission from a recovery service. Top-of-book data (best bid, best ask, last trade price) is published every 1ms; full depth-of-book is published on every change. Market data feeds are a separate, isolated subsystem from order entry — market data delay must not affect order processing latency.

Interview Tips

  • Order book data structure: sorted map of price levels, each with a FIFO queue of orders
  • Price-time priority is the fairness rule — mention it explicitly
  • Single-threaded matching engine eliminates locking — this is counterintuitive but correct
  • Integer ticks for prices, not floats — a critical correctness detail
  • Separate market data distribution from order processing — two independent subsystems

Frequently Asked Questions

What data structure is used for an order book in a stock exchange?

An order book uses a sorted map (TreeMap in Java, SortedDict in Python) keyed by price, where each price level holds a FIFO queue of orders at that price. The bid side is sorted descending by price (highest bid first); the ask side is sorted ascending by price (lowest ask first). This gives O(log N) insert, O(log N) cancel, and O(1) access to the best bid/ask where N is the number of price levels (typically small, under 1,000 for liquid instruments). Orders at the same price are queued in arrival order (price-time priority): better price executes first; equal price executes first-come-first-served. All prices are stored as integers (ticks) rather than floats to avoid floating-point rounding errors that could corrupt the matching logic.
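
The float problem is easy to demonstrate: accumulating prices as floats drifts, while integer ticks stay exact. A minimal illustration, assuming one tick = $0.01:

```python
# Floats: three ten-cent increments do not sum to exactly 30 cents.
print(0.1 + 0.1 + 0.1 == 0.3)  # False: binary rounding error

# Integer ticks (1 tick = $0.01): exact arithmetic.
ticks = 10 + 10 + 10           # three $0.10 increments
print(ticks == 30)             # True
print(f"${ticks / 100:.2f}")   # format as dollars only at the display edge
```

In a matching engine, a rounding error of this kind could cause two orders at the "same" price to compare unequal, silently breaking price-time priority.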

How does a matching engine work in a stock exchange?

The matching engine processes orders sequentially (single-threaded, no locks) to ensure strict ordering. When a new order arrives: (1) For a market buy order, start at the best ask (lowest price on the sell side), fill as much quantity as possible, then move to the next ask level until filled or no more asks. (2) For a limit buy order at price P, match against all asks with price less than or equal to P in price-time order, then rest any unfilled quantity on the bid side. Each match produces a trade record (buyer ID, seller ID, price, quantity, timestamp). After matching, the engine broadcasts the trade and updated order book snapshot to all market participants via a sequenced multicast feed. The single-threaded design eliminates lock contention and produces a deterministic, auditable sequence of events.

Why do stock exchanges use a single-threaded matching engine?

A single-threaded matching engine eliminates all lock contention and produces a strictly ordered, deterministic sequence of events. If matching were multi-threaded, two concurrent orders at the same price level would require locking, creating contention overhead and the risk of incorrect ordering (violating price-time priority). The matching engine does no I/O during matching — orders arrive via a pre-read queue, trades are written to an outbound queue, and the actual persistence and broadcast happen asynchronously on separate threads. Since matching is pure CPU computation on in-memory data structures, a single modern CPU core can process millions of order operations per second — far more than typical exchange throughput. Multi-threading would add complexity and latency without meaningful throughput benefit.

