Order-Matching Engine Internals: How Exchanges Match Trades

The order-matching engine is the heart of every electronic exchange. It’s the component that takes incoming orders, applies the matching rules (price-time priority, pro-rata, etc.), and produces trades. For SWE interviews at trading firms (especially exchanges, ECNs, and HFT firms that build their own matching simulators for backtesting and research), understanding matching engine internals is essential. The data structures, the concurrency model, the determinism guarantees, the failure modes — these come up in systems-design interviews and in real day-to-day work.

What a Matching Engine Does

A matching engine receives orders from market participants. Each order has:

  • Side (buy or sell)
  • Quantity
  • Price (for limit orders) or “any” (for market orders)
  • Order type (limit, market, stop, peg, etc.)
  • Time-in-force (day, IOC, FOK, GTC, etc.)
  • Other attributes (display vs hidden, reserve quantity, post-only flags)

The engine matches orders against the existing book according to its rules:

  • Price priority: better-priced orders match first.
  • Time priority (FIFO): at a given price, earlier orders match first. Used at most equity exchanges.
  • Pro-rata: at a given price, fills are allocated proportionally to size. Used at some futures exchanges (e.g., CME for some products).
  • Hybrid: mixes of priority types (e.g., part of each fill allocated FIFO and the remainder pro-rata); used at some futures and options venues.

The Core Data Structure

An efficient matching engine needs:

  • Fast lookup of the best bid and best ask
  • Fast insertion of new orders at any price level
  • Fast removal of orders (cancellations, fills)
  • Iteration through the queue at a given price level (for matching)

Common architecture:

Price levels

The set of all bid prices is a sorted structure (typically a sorted map keyed by price, or a vector indexed by price-tick). The best bid is the highest bid level; the best ask is the lowest ask level. Operations on the best level are O(1) amortized.

Order queues per level

At each price level, orders are stored in a queue (FIFO doubly-linked list for time-priority engines). Each order’s queue position is known so cancellations are O(1).

Order lookup

For cancellation by order ID, the engine keeps a hash map from order ID to the order's node in its queue. This makes cancel O(1).

Memory layout

For performance, orders are stored in pre-allocated arrays / object pools to avoid allocator overhead and improve cache behavior. The hash map and linked list use indices into the order pool rather than raw pointers.
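A minimal sketch of the index-based pool idea, using a hypothetical layout: pre-allocated parallel arrays addressed by integer index, with a free list so slots are recycled and nothing is allocated on the hot path.

```python
# Sketch (hypothetical layout): a pre-allocated order pool addressed by
# integer indices instead of raw pointers. A free list recycles slots.

class OrderPool:
    def __init__(self, capacity):
        self.price = [0] * capacity        # parallel arrays: cache-friendly
        self.qty = [0] * capacity
        self.next_free = list(range(1, capacity)) + [-1]  # free-list links
        self.free_head = 0

    def alloc(self, price, qty):
        idx = self.free_head               # pop a slot off the free list
        self.free_head = self.next_free[idx]
        self.price[idx], self.qty[idx] = price, qty
        return idx                         # callers store this index, not a pointer

    def release(self, idx):
        self.next_free[idx] = self.free_head  # push the slot back
        self.free_head = idx

pool = OrderPool(4)
a = pool.alloc(10050, 100)   # index 0
b = pool.alloc(10049, 200)   # index 1
pool.release(a)
c = pool.alloc(10051, 50)    # reuses index 0
```

In C++ the same idea applies with fixed-size structs in a contiguous array; indices also survive serialization and replay better than pointers.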

Data structure choice depends heavily on the venue’s needs. HFT-style matching simulators optimize for cache locality; production exchange engines also require auditability, durability, and replay capability.
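The pieces above fit together as follows. This is a simplified sketch, with the simplifications flagged: Python dicts preserve insertion order, so each price level doubles as a FIFO queue with O(1) cancel by key, and best-bid/best-ask are found by max()/min() scans where a production engine would keep a sorted structure.

```python
# Sketch of the book architecture: sorted-ish price levels, FIFO queues
# per level, and an order-ID index for O(1) cancellation.
from collections import defaultdict

class Book:
    def __init__(self):
        self.bids = defaultdict(dict)   # price -> {order_id: qty}, FIFO by insertion
        self.asks = defaultdict(dict)
        self.index = {}                 # order_id -> (side, price), for O(1) cancel

    def add(self, order_id, side, price, qty):
        levels = self.bids if side == "buy" else self.asks
        levels[price][order_id] = qty
        self.index[order_id] = (side, price)

    def cancel(self, order_id):
        side, price = self.index.pop(order_id)
        levels = self.bids if side == "buy" else self.asks
        del levels[price][order_id]
        if not levels[price]:           # drop empty price level
            del levels[price]

    def best_bid(self):
        return max(self.bids) if self.bids else None   # O(n) here; O(1)/O(log n) in production

    def best_ask(self):
        return min(self.asks) if self.asks else None

book = Book()
book.add(1, "buy", 10050, 100)
book.add(2, "buy", 10051, 50)
book.add(3, "sell", 10053, 75)
book.cancel(2)
print(book.best_bid(), book.best_ask())   # 10050 10053
```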

The Matching Algorithm (Price-Time Priority)

When a new order arrives:

  1. Determine if it’s marketable (i.e., crosses the spread or is a market order).
  2. If marketable, match against the opposite side starting from the best price:
    • Take the order at the front of the best level’s queue.
    • Match as much as possible (the smaller of the incoming and resting quantities).
    • Generate a trade.
    • If the resting order is fully filled, remove it from the queue.
    • If the incoming order still has remaining quantity, continue with the next resting order.
    • If the level is exhausted, move to the next-best level.
    • Stop when the incoming order is fully filled or no more marketable resting orders exist.
  3. If the incoming order is a limit order with remaining unfilled quantity, add it to its side of the book at the appropriate price level (back of queue at that level).
  4. If it’s a market order with remaining unfilled quantity, cancel the residual or queue it depending on time-in-force rules.

Cancellation: locate the order via hash map; remove from its queue; remove the price level if the queue is empty.

Concurrency Model

Matching engines fall into two main concurrency patterns:

Single-threaded

All matching happens on one thread. Other threads (network I/O, audit logging, persistence) are separate but never touch the order book directly. This gives strong determinism and simplifies reasoning. Performance is excellent because there’s no locking overhead. Many production exchanges use this model.

Sharded

Different instruments are matched on different threads or different physical hosts. Cross-instrument matching (rare but exists) requires coordination. This sharding scales horizontally with the number of instruments.

Multi-threaded matching of a single instrument is rare because the synchronization overhead defeats the latency benefits. The standard pattern is single-threaded matching with multi-threaded support layers.
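The standard pattern can be sketched with a queue between the support threads and the matching thread: gateways only enqueue, one thread drains the queue in arrival order, and the book is touched by exactly one thread.

```python
# Sketch of the single-threaded pattern: one matching thread owns the book;
# other threads communicate with it only through a queue.
import queue
import threading

inbound = queue.Queue()
processed = []                     # stand-in for the order book / trade log

def matching_loop():
    while True:
        msg = inbound.get()
        if msg is None:            # shutdown sentinel
            break
        processed.append(msg)      # real engine: match against the book here

t = threading.Thread(target=matching_loop)
t.start()
for seq in range(3):               # e.g. a gateway thread enqueueing orders
    inbound.put(("order", seq))
inbound.put(None)
t.join()
print(processed)                   # [('order', 0), ('order', 1), ('order', 2)]
```

Production engines replace the lock-based `queue.Queue` with a lock-free ring buffer, but the ownership model is the same.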

Determinism

Matching engines must be deterministic: given the same input sequence, they produce the same output. This matters for:

  • Auditability: regulators may need to replay matching decisions.
  • Disaster recovery: a backup engine receiving the same input must produce the same state.
  • Testing: replaying historical sessions during development must give consistent results.

Sources of non-determinism that must be controlled:

  • Floating-point order in summations (use integers or fixed-point for prices and quantities).
  • Hash map iteration order (use ordered structures where order matters).
  • Multi-threading (single-threaded matching avoids this).
  • Random ID generation (use deterministic ID schemes).
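The first control on that list is worth a concrete sketch: represent prices as integer ticks rather than floats, so arithmetic cannot depend on evaluation order. The tick size here is an assumption for illustration.

```python
# Sketch: integer-tick price representation (assumed tick size of 0.01).
from decimal import Decimal

TICK = Decimal("0.01")

def to_ticks(price_str):
    ticks = Decimal(price_str) / TICK
    assert ticks == ticks.to_integral_value(), "off-tick price rejected"
    return int(ticks)

print(to_ticks("100.53"))          # 10053
print(0.1 + 0.2 == 0.3)            # False: why floats are avoided
print(to_ticks("0.10") + to_ticks("0.20") == to_ticks("0.30"))  # True
```

Quantities get the same treatment (integer lots), so every matching computation is exact integer arithmetic.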

Common Interview Questions

Implement a basic matching engine

“Write a price-time priority matching engine for a single symbol.” Walk through: data structures (sorted map of price levels, FIFO queues, order lookup hash map), order acceptance logic, matching algorithm, cancellation. Strong candidates discuss the trade-offs of different data structures (tree-based vs price-indexed array).

Discuss algorithmic complexity

“What’s the complexity of inserting a new order, matching, and cancelling?” New order at a new price level: O(log n) for tree-based, O(1) for price-indexed. Matching: O(k) where k is the number of orders consumed. Cancellation: O(1) with hash map. Discuss memory vs latency trade-offs.

Handle pro-rata matching

“How would you implement pro-rata matching?” Different from price-time: at the matched level, compute total quantity, allocate fills proportionally, handle remainders. Discuss subtleties: minimum lot sizes, rounding conventions, leftover quantity disposition.
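One way those subtleties play out, as a sketch: floor each proportional share to the lot size, then dispose of the leftover lots under one common convention (largest resting order first). Real venues differ, so treat the rounding and remainder rules here as illustrative.

```python
# Sketch: pro-rata allocation with floored shares and remainder disposition.

def pro_rata(resting, incoming_qty, min_lot=1):
    """resting: list of (order_id, qty) at the matched price level."""
    total = sum(q for _, q in resting)
    fills = {oid: (incoming_qty * q // total) // min_lot * min_lot
             for oid, q in resting}
    leftover = incoming_qty - sum(fills.values())
    # hand out remainder lots, largest resting order first (one convention)
    for oid, q in sorted(resting, key=lambda r: -r[1]):
        if leftover == 0:
            break
        extra = min(leftover, q - fills[oid])   # never overfill an order
        fills[oid] += extra
        leftover -= extra
    return fills

print(pro_rata([("A", 60), ("B", 30), ("C", 30)], 50))
# {'A': 26, 'B': 12, 'C': 12}  -- A picks up the rounding remainder
```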

Discuss self-trade prevention

“A market participant submits both a buy and sell that would match against each other. What happens?” Common rules: cancel newer, cancel older, or cancel both. Implementation: track participant identity; check at matching time. Discuss why this matters (regulatory and gaming concerns).
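A sketch of the check itself, with the policy names and participant IDs illustrative: before generating a trade, compare the participant identity on both orders and apply the configured rule instead of matching.

```python
# Sketch: self-trade prevention check applied at matching time.

def check_stp(incoming_participant, resting_participant, policy="cancel_newest"):
    if incoming_participant != resting_participant:
        return "match"                  # different participants: trade normally
    if policy == "cancel_newest":
        return "cancel_incoming"        # the arriving order is rejected
    if policy == "cancel_oldest":
        return "cancel_resting"         # the resting order is pulled
    return "cancel_both"

print(check_stp("FIRM_A", "FIRM_B"))                   # match
print(check_stp("FIRM_A", "FIRM_A"))                   # cancel_incoming
print(check_stp("FIRM_A", "FIRM_A", "cancel_oldest"))  # cancel_resting
```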

Handle market data publication

“After matching, how do you publish updates to subscribers?” Generate trade messages, generate book-change messages, sequence them, distribute via multicast or stream. Discuss latency: publication should happen as soon as possible after matching but must be after the participant who triggered the match has been confirmed.
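The sequencing step can be sketched as follows; the message shapes are illustrative. The key property is that trades and book-change messages share one sequence-number stream, so every subscriber can reconstruct the same event order.

```python
# Sketch: stamp post-match events with a shared, monotonically
# increasing sequence number before fan-out.

def publish(seq, events):
    out = []
    for ev in events:
        seq += 1
        out.append({"seq": seq, **ev})    # sequence first, then distribute
    return seq, out

seq = 0
seq, msgs = publish(seq, [
    {"type": "trade", "price": 10053, "qty": 30},
    {"type": "book_change", "side": "sell", "price": 10053, "qty": -30},
])
print([m["seq"] for m in msgs])           # [1, 2]
```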

Discuss disaster recovery

“How do you recover after a matching engine crash?” Persistent log of all input messages with sequence numbers. After crash, replay log into a fresh engine to recover state. Discuss snapshot frequency, log sizes, recovery time. Strong candidates discuss zero-data-loss guarantees and the trade-offs.
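The replay idea is simple enough to sketch end to end: persist every input with a sequence number before matching it, then feed the same log into a fresh engine. Because matching is deterministic, the rebuilt state equals the pre-crash state. The engine here is a stand-in, not a real book.

```python
# Sketch: log-based recovery via deterministic replay.

class Engine:
    def __init__(self):
        self.seq = 0
        self.state = []                    # stand-in for the order book

    def apply(self, msg):
        self.seq += 1
        self.state.append((self.seq, msg))  # deterministic state transition

log = []                                   # durable input log (sketch)
live = Engine()
for msg in ["new A", "new B", "cancel A"]:
    log.append(msg)                        # persist first...
    live.apply(msg)                        # ...then match

recovered = Engine()                       # after a crash: replay into a fresh engine
for msg in log:
    recovered.apply(msg)

print(recovered.state == live.state)       # True
```

Snapshots shorten recovery by letting replay start from a checkpoint instead of the beginning of the session.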

Frequently Asked Questions

How important is matching engine knowledge for non-exchange roles?

Important even for non-exchange roles. HFT firms build matching simulators for backtesting; trading firms build simulators for new product testing; market makers’ research tools include order book replay infrastructure. The data structures and algorithms transfer. For interviews at HRT, Jump, Citadel Securities, and similar, expect questions about matching engine internals even if you wouldn’t be writing one in production.

What’s the typical latency for a production matching engine?

Major exchanges target single-digit microseconds for matching latency (the time between order arrival and match output). NASDAQ, NYSE, CME, and major options exchanges all operate in this range. Optimized HFT-built simulators can run faster (sub-microsecond) but production exchanges have additional overhead from auditability, durability, and regulatory features. The gap between best-case research-tool performance and production exchange performance is usually 5–20x.

Why do exchanges use single-threaded matching?

Determinism and simplicity. Single-threaded matching is trivially deterministic; the order of message arrival fully determines the output. Multi-threaded matching introduces non-determinism (which thread processes which order in what order) that’s hard to reproduce. The performance cost is small — modern CPUs can match millions of messages per second on a single core — while the regulatory and operational benefits are large.

What happens when a matching engine is overwhelmed?

Most exchanges implement queueing with backpressure: incoming messages are queued; if the queue grows beyond a threshold, the matching engine signals participants to slow down (pause new orders, restrict to cancels only). Some exchanges have explicit rate limits per participant. In extreme cases, exchanges have implemented circuit breakers (halt trading if message volume exceeds capacity). The Knight Capital incident and several flash crashes have led to industry standards for exchange behavior under load.

Should I build a matching engine for practice?

Yes, if you have time and target firms care about it. A simple price-time priority engine for a single symbol can be built in a weekend in C++ or Python. Add features incrementally: cancellation, multiple symbols, market-data publication, persistence, replay. The exercise builds intuition about the data structures, edge cases, and performance trade-offs that come up in interviews. Bonus: it’s a portfolio piece you can discuss in interviews.

See also: Order Book Dynamics for Quant Interviews · Algorithmic Trading System Architecture · C++ for Quants