Question 1

What are the tradeoffs between Dijkstra, A*, and Contraction Hierarchies for traffic routing?

Accepted Answer

Dijkstra's algorithm finds the shortest path in O((V + E) log V) time with a binary heap, but it explores the graph in all directions, making it slow on continental road networks with hundreds of millions of nodes. A* adds a geographic heuristic (great-circle distance to the destination, divided by the maximum speed when edge weights are travel times) that biases exploration toward the goal, cutting query time by 2-5x on typical road graphs. The result is guaranteed optimal only when the heuristic is admissible, that is, when it never overestimates the remaining cost; an inadmissible heuristic can return a suboptimal route. Contraction Hierarchies (CH) preprocess the graph by iteratively contracting less-important nodes and adding shortcut edges that preserve shortest-path distances. At query time, a bidirectional Dijkstra search runs from both endpoints, relaxing only edges that lead to more important nodes, so the two searches meet near the top of the hierarchy. CH achieves query times under 1 ms on full continental graphs, orders of magnitude faster than plain Dijkstra, at the cost of an offline preprocessing step (minutes to hours) that must be re-run when edge weights change significantly.
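
To make the Dijkstra versus A* difference concrete, here is a minimal sketch rather than a production implementation. The function names, the toy five-node graph, and the 10 m/s and 15 m/s speeds are all assumptions made for illustration. Edge weights are travel times in seconds, and the heuristic is great-circle distance divided by an assumed maximum speed so that it never overestimates the remaining time. Passing no maximum speed sets the heuristic to zero, which reduces the search to plain Dijkstra.

```python
import heapq
import math


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6_371_000 * math.asin(math.sqrt(h))


def shortest_path_cost(graph, coords, source, target, max_speed_mps=None):
    """Return (travel_time_seconds, number_of_settled_nodes).

    graph: dict node -> list of (neighbor, seconds).
    max_speed_mps=None means a zero heuristic, i.e. plain Dijkstra.
    """
    def heuristic(node):
        if max_speed_mps is None:
            return 0.0
        # Lower bound on remaining time: straight line at the fastest speed.
        return haversine_m(coords[node], coords[target]) / max_speed_mps

    best = {source: 0.0}
    heap = [(heuristic(source), source)]
    settled = set()
    while heap:
        _, u = heapq.heappop(heap)
        if u in settled:
            continue
        settled.add(u)
        if u == target:
            return best[u], len(settled)
        for v, seconds in graph.get(u, []):
            candidate = best[u] + seconds
            if candidate < best.get(v, math.inf):
                best[v] = candidate
                heapq.heappush(heap, (candidate + heuristic(v), v))
    return math.inf, len(settled)


# Toy road: five nodes on a straight east-west line, driven at 10 m/s.
coords = {"W2": (0.0, -0.02), "W1": (0.0, -0.01), "A": (0.0, 0.0),
          "B": (0.0, 0.01), "C": (0.0, 0.02)}
graph = {node: [] for node in coords}
for u, v in [("W2", "W1"), ("W1", "A"), ("A", "B"), ("B", "C")]:
    seconds = haversine_m(coords[u], coords[v]) / 10.0
    graph[u].append((v, seconds))
    graph[v].append((u, seconds))

dijkstra_cost, dijkstra_settled = shortest_path_cost(graph, coords, "A", "C")
astar_cost, astar_settled = shortest_path_cost(graph, coords, "A", "C",
                                               max_speed_mps=15.0)
```

On this toy graph both searches return the same travel time, but the A* run settles fewer nodes because it never expands the nodes west of the source.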

Question 2

How does Hidden Markov Model map matching work for GPS traces?

Accepted Answer

Raw GPS traces have a positional error of 5-15 meters and sample at intervals of 1-5 seconds, so picking the nearest road segment for each point is unreliable where roads run close together (parallel streets, ramps, intersections). HMM map matching models each GPS observation as an emission from a hidden state (the true road segment) and uses the Viterbi algorithm to find the most likely sequence of road segments. The emission probability decreases with the GPS-to-segment distance, modeled as a Gaussian. The transition probability combines road connectivity (only transitions between segments connected in the network are allowed) with a distance consistency check: the great-circle distance between consecutive GPS points should be close to the route distance between the candidate segments. The result is a topologically valid path on the road network that stays plausible through noisy stretches such as dense urban canyons and bridges signal gaps such as tunnels by following connected segments.
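
The Viterbi step can be sketched as follows. This is an illustrative sketch under stated assumptions, not a reference implementation: the function name, the input layout, the `sigma` and `beta` values, and the toy segment names are all hypothetical. Candidate segments and their distances to each GPS point are assumed to be precomputed, and route distances between candidate segments are supplied as a lookup table in which a missing pair means "not connected". Scores are log-probabilities with constant terms dropped.

```python
import math


def viterbi_map_match(candidates, gc_dists, route_dist, sigma=10.0, beta=5.0):
    """Return the most likely segment sequence for a GPS trace.

    candidates[t]: dict segment_id -> distance in meters from GPS point t.
    gc_dists[t]:   great-circle distance between GPS points t and t + 1.
    route_dist:    dict (seg_a, seg_b) -> network distance in meters;
                   a missing key means the transition is not allowed.
    """
    def emission(distance_m):
        # Log of a Gaussian in the GPS-to-segment distance.
        return -0.5 * (distance_m / sigma) ** 2

    score = {seg: emission(d) for seg, d in candidates[0].items()}
    back_pointers = []
    for t in range(1, len(candidates)):
        new_score, pointers = {}, {}
        for seg, d in candidates[t].items():
            best_prev, best_val = None, -math.inf
            for prev, prev_score in score.items():
                route = route_dist.get((prev, seg))
                if route is None:
                    continue  # not connected: transition forbidden
                # Penalize mismatch between straight-line and route distance.
                val = prev_score - abs(gc_dists[t - 1] - route) / beta
                if val > best_val:
                    best_prev, best_val = prev, val
            if best_prev is not None:
                new_score[seg] = best_val + emission(d)
                pointers[seg] = best_prev
        score = new_score
        back_pointers.append(pointers)

    if not score:
        return []  # no connected sequence explains the trace
    path = [max(score, key=score.get)]
    for pointers in reversed(back_pointers):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy trace: the middle point is slightly closer to side street "s1",
# but reaching "s1" would require a long detour.
candidates = [{"m1": 3.0}, {"m2": 10.0, "s1": 8.0}, {"m3": 2.0}]
gc_dists = [30.0, 30.0]
route_dist = {("m1", "m2"): 31.0, ("m1", "s1"): 120.0,
              ("m2", "m3"): 29.0, ("s1", "m3"): 110.0}
matched = viterbi_map_match(candidates, gc_dists, route_dist)
```

Nearest-segment matching would pick `s1` for the middle point; the transition term keeps the match on the main road because the detour distance is inconsistent with the 30 m the vehicle actually moved.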

Question 3

How does the k-shortest paths algorithm work in a routing service?

Accepted Answer

K-shortest paths (KSP) finds the k lowest-cost paths between a source and destination, enabling the routing service to offer alternatives. Yen's algorithm is the standard approach for loopless paths: it iteratively computes the next-best path by finding deviations from previously found paths. Each node on the most recently accepted path, except the destination, is taken in turn as a spur node, and the sub-path from the source to that node is the root path. The graph is temporarily modified to exclude the edges that leave the spur node along any already-accepted path sharing the same root, together with the other root-path nodes, and Dijkstra then finds the spur path from the spur node to the destination. Root path plus spur path forms a candidate, which is added to a min-heap; the cheapest candidate becomes the next accepted path. Eppstein's algorithm provides better worst-case complexity for large k, although the paths it returns may revisit nodes. In production, KSP is typically run with k=3 to 5 and results are post-filtered by diversity heuristics (minimum Jaccard distance between paths) so alternatives are meaningfully different rather than slight variations of the same route.
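
A compact sketch of Yen's algorithm, written for readability rather than speed, may help. The function names and the small example graph are assumptions for illustration, and the diversity post-filter is left out. The graph is a dict of dicts mapping each node to its outgoing neighbors and edge weights.

```python
import heapq


def dijkstra(graph, source, target, banned_nodes=frozenset(),
             banned_edges=frozenset()):
    """Return (cost, path) for the cheapest path, or None if unreachable."""
    heap = [(0.0, source, [source])]
    done = set()
    while heap:
        cost, u, path = heapq.heappop(heap)
        if u == target:
            return cost, path
        if u in done:
            continue
        done.add(u)
        for v, w in graph.get(u, {}).items():
            if v in done or v in banned_nodes or (u, v) in banned_edges:
                continue
            heapq.heappush(heap, (cost + w, v, path + [v]))
    return None


def path_cost(graph, path):
    return sum(graph[a][b] for a, b in zip(path, path[1:]))


def yen_k_shortest(graph, source, target, k):
    """Return up to k loopless paths as (cost, path), cheapest first."""
    first = dijkstra(graph, source, target)
    if first is None:
        return []
    accepted = [first]
    candidates = []                      # min-heap of (cost, path)
    seen = {tuple(first[1])}
    while len(accepted) < k:
        prev_path = accepted[-1][1]
        for i in range(len(prev_path) - 1):
            spur, root = prev_path[i], prev_path[:i + 1]
            # Ban edges leaving the spur node on accepted paths with this root.
            banned_edges = {(p[i], p[i + 1]) for _, p in accepted
                            if len(p) > i + 1 and p[:i + 1] == root}
            # Ban the other root nodes so the spur path stays loopless.
            banned_nodes = set(root[:-1])
            found = dijkstra(graph, spur, target, banned_nodes, banned_edges)
            if found is None:
                continue
            total = root[:-1] + found[1]
            if tuple(total) not in seen:
                seen.add(tuple(total))
                heapq.heappush(candidates, (path_cost(graph, total), total))
        if not candidates:
            break                        # fewer than k paths exist
        accepted.append(heapq.heappop(candidates))
    return accepted


graph = {"C": {"D": 3, "E": 2}, "D": {"F": 4}, "E": {"D": 1, "F": 2, "G": 3},
         "F": {"G": 2, "H": 1}, "G": {"H": 2}, "H": {}}
routes = yen_k_shortest(graph, "C", "H", 3)
```

For the example graph, the three accepted routes from `C` to `H` cost 5, 7, and 8. Each iteration reruns Dijkstra once per node of the previous path, which is why production systems keep k small.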

Question 4

How do you ingest real-time traffic data into a routing service?

Accepted Answer

Real-time traffic data arrives from multiple sources: probe vehicles reporting GPS traces (first-party fleet data), HERE/TomTom traffic feeds (third-party), and roadside sensor APIs. Each source is consumed by a dedicated ingestor that normalizes data into a canonical traffic event format (segment ID, speed, free-flow ratio, timestamp, confidence). Events are written to a Kafka topic and consumed by a traffic fusion service that applies sensor fusion — weighted averaging by confidence and recency — to produce a current-speed estimate per segment. Segment speeds are written to a distributed key-value store (Redis or DynamoDB) with a short TTL. The routing engine reads segment costs from this store at query time, using free-flow speed as fallback when no recent data is available. Batch historical aggregation runs hourly to compute time-of-day speed profiles used for departure-time routing.
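
The fusion step can be sketched as a weighted average. This is one plausible weighting scheme, not a description of any particular system: the `TrafficEvent` fields mirror the canonical format above, while the exponential half-life decay, the `half_life_s` and `max_age_s` defaults, and the function name are assumptions. Kafka consumption and the key-value store write are omitted.

```python
from dataclasses import dataclass


@dataclass
class TrafficEvent:
    segment_id: str
    speed_kph: float
    timestamp: float    # epoch seconds
    confidence: float   # 0.0 to 1.0, as reported by the ingestor


def fuse_segment_speed(events, now, free_flow_kph,
                       half_life_s=120.0, max_age_s=600.0):
    """Confidence- and recency-weighted mean speed for one segment.

    Falls back to the free-flow speed when no usable event remains.
    """
    weighted_sum = total_weight = 0.0
    for event in events:
        age = now - event.timestamp
        if age < 0 or age > max_age_s:
            continue  # ignore future-dated or stale readings
        # Weight halves every half_life_s seconds of age.
        weight = event.confidence * 0.5 ** (age / half_life_s)
        weighted_sum += weight * event.speed_kph
        total_weight += weight
    if total_weight == 0.0:
        return free_flow_kph
    return weighted_sum / total_weight


now = 1_000_000.0
events = [
    TrafficEvent("seg-1", speed_kph=30.0, timestamp=now, confidence=1.0),
    TrafficEvent("seg-1", speed_kph=60.0, timestamp=now - 120.0, confidence=1.0),
]
fused = fuse_segment_speed(events, now, free_flow_kph=80.0)
```

Here the fresh 30 km/h reading has weight 1.0 and the two-minute-old 60 km/h reading has weight 0.5, so the fused estimate is 40 km/h. With no recent events the function returns the free-flow speed, matching the fallback described above.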

Question 5

How do you build ETA prediction with machine learning in a routing service?

Accepted Answer

Graph-based ETA (sum of segment travel times) is accurate on free-flowing roads but degrades at intersections, traffic signals, and merge points where queuing effects dominate. ML-based ETA stacks a gradient-boosted model (XGBoost or LightGBM) or a sequence model (LSTM/Transformer) on top of the graph estimate. Features include: graph ETA, time of day, day of week, current congestion index along the route, weather conditions, number of turns and signals, and historical ETA error for the same origin-destination (OD) pair and time window. The model is trained on historical trip records where actual arrival time is the label. Inference runs per-route at query time with p99 latency under 5 ms. The model is retrained daily on a rolling 90-day window. Accuracy is monitored on a holdout set; if mean absolute error exceeds a threshold, an alert fires and the graph-only ETA is served as a fallback.
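
The monitoring and fallback rule can be sketched in a few lines. This is a sketch under assumptions: `choose_eta`, its parameters, and the stand-in model are hypothetical, and a real deployment would call the trained XGBoost or LightGBM model where the placeholder callable appears.

```python
from statistics import mean


def mean_absolute_error(predicted_s, actual_s):
    """Mean absolute ETA error in seconds over paired predictions and labels."""
    return mean(abs(p - a) for p, a in zip(predicted_s, actual_s))


def choose_eta(graph_eta_s, model, features, holdout_mae_s, mae_threshold_s):
    """Serve the ML ETA unless holdout error says the model has degraded.

    model: any callable mapping a feature dict to an ETA in seconds
           (a placeholder for the trained model).
    Returns (eta_seconds, source_label).
    """
    if holdout_mae_s > mae_threshold_s:
        return graph_eta_s, "graph_fallback"
    return model(features), "ml"


# Stand-in model: adds a fixed 20 seconds per traffic signal to the graph ETA.
def toy_model(features):
    return features["graph_eta_s"] + 20.0 * features["num_signals"]


features = {"graph_eta_s": 600.0, "num_signals": 4}
holdout_mae = mean_absolute_error([100, 200], [110, 180])

healthy = choose_eta(600.0, toy_model, features, holdout_mae, mae_threshold_s=60.0)
degraded = choose_eta(600.0, toy_model, features, holdout_mae, mae_threshold_s=10.0)
```

With a 60-second threshold the 15-second holdout error is acceptable and the model's 680-second ETA is served; with a 10-second threshold the same error trips the fallback and the 600-second graph ETA is returned.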

Low Level Design: Real-Time Traffic Routing Service

Road Graph Representation

Shortest Path Algorithms

Dijkstra

A* Search

Contraction Hierarchies

Real-Time Traffic Weight Updates

ETA Calculation

Turn Restrictions

Map Matching

Route Alternatives

Frequently Asked Questions