How do ride-sharing platforms predict ETA?

ETA prediction combines routing engine estimates (based on road graph edge weights) with machine learning models trained on historical trip data. Features include time of day, day of week, weather, current traffic conditions, and route-specific speed profiles. Uber's DeepETA model uses neural networks to correct routing engine baseline estimates.

What is the system design for an ETA prediction service?

An ETA prediction service typically includes a routing layer to get baseline path and distance, a feature store that provides real-time and historical features (traffic, weather, driver speed history), an ML inference service that applies a gradient-boosted or neural model, and a caching layer to avoid redundant computation for repeated origin-destination pairs.

How does Uber handle ETA prediction at scale?

Uber processes millions of ETA requests per minute. They use a tiered approach: fast routing-based estimates for initial display, followed by ML-corrected ETAs using their DeepETA model. Features are served from a low-latency online store such as Redis, with batch features computed offline in a warehouse such as Hive, and inference runs on a horizontally scaled prediction service with p99 latency targets under 100 ms.

What are the main sources of error in ETA prediction?

Common ETA error sources include stale traffic data, road graph inaccuracies (missing turns, incorrect speeds), driver behavior variance, unpredictable events (accidents, weather), and pickup/dropoff delays not captured by pure routing. ML models reduce systematic bias but must be retrained frequently to adapt to changing conditions.

Low Level Design: ETA Prediction Service


What Is an ETA Prediction Service?

An ETA (Estimated Time of Arrival) prediction service answers the question: given a route from A to B, when will the traveler arrive? The challenge is that static graph weights are insufficient — ETA depends on current traffic, historical patterns, time of day, weather, and route-specific variability. A production ETA service combines graph-based travel time estimation with machine learning to produce calibrated arrival time distributions.

Data Model

Historical segment speeds: (edge_id BIGINT, day_of_week TINYINT, hour_of_day TINYINT, speed_p50 FLOAT, speed_p85 FLOAT, speed_p95 FLOAT) — percentile speeds by time bucket, precomputed from probe data.
Live traffic: (edge_id BIGINT, observed_at TIMESTAMP, travel_time_s FLOAT, source ENUM('probe','sensor','incident'))
Incident: (incident_id BIGINT, edge_id BIGINT, type ENUM('accident','construction','closure'), delay_factor FLOAT, starts_at TIMESTAMP, ends_at TIMESTAMP)
ETA request log: (request_id UUID, route_id UUID, predicted_eta TIMESTAMP, actual_arrival TIMESTAMP, error_s INT) — used to monitor model accuracy and trigger retraining.
Feature store: precomputed route-level features (total distance, number of turns, road class distribution, historical variance) stored in Redis or a dedicated feature store such as Feast for low-latency ML inference.
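The tables above can be mirrored as typed records in application code. The following is a minimal sketch, not a prescribed schema: the class names are illustrative, the speed units are left open because the source does not state them, and deriving error_s as actual minus predicted (positive means the trip arrived late) is an assumed sign convention.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class TrafficSource(Enum):
    PROBE = "probe"
    SENSOR = "sensor"
    INCIDENT = "incident"


@dataclass(frozen=True)
class HistoricalSegmentSpeed:
    edge_id: int
    day_of_week: int   # 0-6; which day maps to 0 is not specified in the source
    hour_of_day: int   # 0-23
    speed_p50: float   # units (km/h or m/s) are not specified in the source
    speed_p85: float
    speed_p95: float


@dataclass(frozen=True)
class LiveTraffic:
    edge_id: int
    observed_at: datetime
    travel_time_s: float
    source: TrafficSource


@dataclass(frozen=True)
class EtaRequestLog:
    request_id: str
    route_id: str
    predicted_eta: datetime
    actual_arrival: Optional[datetime] = None  # assumed empty until the trip completes

    @property
    def error_s(self) -> Optional[int]:
        # Assumed convention: positive error means the traveler arrived late.
        if self.actual_arrival is None:
            return None
        return int((self.actual_arrival - self.predicted_eta).total_seconds())
```

Keeping error_s derived rather than stored separately avoids the two timestamps and the error drifting out of sync; a real table may still materialize it for cheaper aggregation.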

Core Algorithm: Hybrid Graph + ML

Step 1 — Base Travel Time

Sum edge-level travel times along the route, using the best available speed estimate for each edge in priority order: live observation, then historical percentile, then speed limit. This gives a baseline ETA.
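The fallback order can be sketched as a per-edge lookup followed by a sum. This is an illustrative sketch only: the Edge type and its field names are hypothetical, speeds are assumed to be in metres per second, and the historical value is assumed to be whichever percentile the service chose for the current time bucket.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Edge:
    edge_id: int
    length_m: float
    speed_limit_mps: float
    live_travel_time_s: Optional[float] = None    # fresh observation, if any
    historical_speed_mps: Optional[float] = None  # e.g. speed_p50 for this time bucket


def edge_travel_time_s(edge: Edge) -> float:
    """Pick the best available estimate: live, then historical, then speed limit."""
    if edge.live_travel_time_s is not None:
        return edge.live_travel_time_s
    if edge.historical_speed_mps is not None and edge.historical_speed_mps > 0:
        return edge.length_m / edge.historical_speed_mps
    return edge.length_m / edge.speed_limit_mps


def baseline_eta_s(route: List[Edge]) -> float:
    """Baseline ETA in seconds: the sum of per-edge travel times."""
    return sum(edge_travel_time_s(e) for e in route)


route = [
    Edge(1, 1000.0, 20.0, live_travel_time_s=80.0),   # live data wins
    Edge(2, 500.0, 10.0, historical_speed_mps=5.0),   # falls back to historical
    Edge(3, 300.0, 15.0),                             # falls back to speed limit
]
print(baseline_eta_s(route))  # 80 + 100 + 20 = 200.0
```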

Step 2 — ML Correction Layer

A gradient-boosted model (XGBoost or LightGBM) takes as input:

Baseline travel time from Step 1
Time of day and day of week
Historical variance of the route (coefficient of variation of past travel times)
Number and severity of active incidents on the route
Weather features (precipitation, visibility) from a weather API
Recent probe speed ratio: live speed / historical speed for key segments

The model outputs a corrected expected travel time and optionally a confidence interval. Training uses the ETA request log, pairing route features at request time with actual arrival times as labels. The model is retrained daily on a rolling 90-day window.
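To make the inputs concrete, here is a rough sketch of assembling the feature vector listed above and passing it to a model. Everything named here is an assumption for illustration: RouteContext and its fields are hypothetical, using the largest delay_factor as the incident severity is one possible encoding, and the model is represented as a plain callable so the example stays dependency-free. In practice that callable would wrap the trained XGBoost or LightGBM model.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List, Sequence


@dataclass
class RouteContext:
    baseline_s: float                    # output of Step 1
    request_time: datetime
    travel_time_cv: float                # coefficient of variation of past travel times
    incident_delay_factors: List[float]  # one entry per active incident on the route
    precipitation_mm_h: float
    visibility_km: float
    live_speed_mps: float                # averaged over key segments
    historical_speed_mps: float


def build_features(ctx: RouteContext) -> List[float]:
    """Flatten the request context into a fixed-order numeric vector."""
    if ctx.historical_speed_mps > 0:
        speed_ratio = ctx.live_speed_mps / ctx.historical_speed_mps
    else:
        speed_ratio = 1.0  # neutral value when no historical speed exists
    return [
        ctx.baseline_s,
        float(ctx.request_time.hour),
        float(ctx.request_time.weekday()),
        ctx.travel_time_cv,
        float(len(ctx.incident_delay_factors)),
        max(ctx.incident_delay_factors, default=1.0),  # assumed severity proxy
        ctx.precipitation_mm_h,
        ctx.visibility_km,
        speed_ratio,
    ]


def corrected_eta_s(ctx: RouteContext, model: Callable[[Sequence[float]], float]) -> float:
    """Apply the correction model to the assembled features."""
    return model(build_features(ctx))


def toy_model(features: Sequence[float]) -> float:
    # Stand-in only: scales the baseline by the inverse live/historical speed ratio.
    return features[0] / features[8]


ctx = RouteContext(600.0, datetime(2024, 1, 1, 8, 30), 0.2, [1.5], 0.0, 10.0, 10.0, 20.0)
print(corrected_eta_s(ctx, toy_model))  # traffic at half the usual speed doubles the baseline
```

Fixing the feature order in one function matters because tree-based models are typically trained and served against positional columns; the same build_features would be used both to generate training rows from the request log and at inference time.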

Step 3 — Uncertainty Quantification

For high-variance routes (busy highways, event venues) the service returns a range built from percentiles of the travel time distribution: best-case (p15), expected (p50), and worst-case (p85). The UI surfaces this as a range (e.g., “35–50 min”) rather than a point estimate, improving user trust.
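One way to produce such a range is sketched below. It assumes the percentiles are taken over a sample of past travel times for the route (so a low percentile is the fast case), uses a simple nearest-rank percentile, and rounds the lower bound down and the upper bound up so the displayed range never understates the spread. The function names are illustrative.

```python
import math
from typing import List, Tuple


def percentile(sorted_values: List[float], p: float) -> float:
    """Nearest-rank percentile of an ascending list; p is in (0, 100]."""
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def eta_range_minutes(travel_times_s: List[float]) -> Tuple[int, int, int]:
    """Return (best, expected, worst) travel time in whole minutes."""
    ordered = sorted(travel_times_s)
    best = percentile(ordered, 15)
    expected = percentile(ordered, 50)
    worst = percentile(ordered, 85)
    return (math.floor(best / 60), round(expected / 60), math.ceil(worst / 60))


def format_range(best_min: int, worst_min: int) -> str:
    """Render the range the way a rider-facing UI might show it."""
    return f"{best_min}\u2013{worst_min} min"


samples_s = [2100, 2160, 2220, 2400, 2520, 2580, 2700, 2880, 3000, 3300]
best, expected, worst = eta_range_minutes(samples_s)
print(format_range(best, worst), f"(expected {expected} min)")
```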

Failure Handling

ML model serving failure: fall back to the graph-only baseline ETA. Accuracy degrades but the service remains functional.
Stale live traffic: blend live data with historical using a staleness-weighted average; weight live data at 0 if older than 10 minutes.
Feature store unavailability: precompute a minimal feature set on the fly from the route graph; skip features requiring external lookups (weather).
Model drift: monitor mean absolute error (MAE) of ETA predictions against actuals as trips complete. Alert and trigger emergency retraining if MAE exceeds a threshold (e.g., 15% above baseline).
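Two of these rules are easy to express in code: the staleness-weighted blend and the drift check. The sketch below is one possible reading, not the only one. The linear decay of the live weight is an assumption (the text fixes only the 10-minute cutoff), and the function names and the 15% tolerance default are illustrative.

```python
from typing import Sequence

MAX_LIVE_AGE_S = 600.0  # live data older than 10 minutes gets weight 0


def live_weight(age_s: float, max_age_s: float = MAX_LIVE_AGE_S) -> float:
    """Weight for live data: 1 when fresh, decaying linearly to 0 at max_age_s."""
    return min(1.0, max(0.0, 1.0 - age_s / max_age_s))


def blended_travel_time_s(live_s: float, historical_s: float, age_s: float) -> float:
    """Staleness-weighted average of live and historical travel times."""
    w = live_weight(age_s)
    return w * live_s + (1.0 - w) * historical_s


def mae(errors_s: Sequence[float]) -> float:
    """Mean absolute error over a non-empty window of prediction errors."""
    return sum(abs(e) for e in errors_s) / len(errors_s)


def drift_detected(recent_errors_s: Sequence[float], baseline_mae_s: float,
                   tolerance: float = 0.15) -> bool:
    """True when recent MAE exceeds the baseline MAE by more than the tolerance."""
    return mae(recent_errors_s) > baseline_mae_s * (1.0 + tolerance)


print(blended_travel_time_s(100.0, 200.0, age_s=300.0))  # halfway stale
print(blended_travel_time_s(100.0, 200.0, age_s=900.0))  # fully stale: historical only
print(drift_detected([60, -80, 100], baseline_mae_s=60.0))
```

A smooth decay avoids a visible jump in the ETA at the moment a live observation crosses the cutoff; an exponential decay would serve the same purpose.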

Scalability Considerations

ETA inference is fast (<10 ms) once features are assembled; the bottleneck is feature retrieval. A Redis-backed feature store with sub-millisecond reads keeps p99 latency under 50 ms end-to-end.
Heavy probe data ingestion flows through Kafka; a Flink streaming job aggregates per-edge speeds and writes to the live traffic table every 30 seconds.
Batch retraining runs on a Spark cluster overnight; the model artifact is pushed to an artifact store (MLflow) and rolled out to inference servers via a canary deployment.
For very long routes (cross-country), the route is split into segments and ETA is computed per segment, then summed, reducing per-request feature volume.
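The per-edge aggregation performed by the streaming job can be illustrated in plain Python. This is a stand-in for the windowing logic only and does not use the Flink API: it assumes probes arrive as (edge_id, epoch_seconds, speed) tuples, groups them into 30-second tumbling windows, and takes the mean speed, which is one plausible aggregate among several.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

WINDOW_S = 30  # matches the 30-second write interval described above

Probe = Tuple[int, float, float]  # (edge_id, epoch_seconds, speed_mps)


def aggregate_probe_speeds(probes: Iterable[Probe]) -> Dict[Tuple[int, int], float]:
    """Mean probe speed per (edge_id, window_start) over tumbling windows."""
    sums: Dict[Tuple[int, int], List[float]] = defaultdict(lambda: [0.0, 0.0])
    for edge_id, ts, speed in probes:
        # Align the timestamp to the start of its window.
        window_start = int(ts // WINDOW_S) * WINDOW_S
        acc = sums[(edge_id, window_start)]
        acc[0] += speed  # running total of speeds
        acc[1] += 1      # number of probes in the window
    return {key: total / count for key, (total, count) in sums.items()}


probes = [
    (1, 0.0, 10.0),
    (1, 29.9, 20.0),   # same window as the first probe
    (1, 30.0, 30.0),   # starts the next window
    (2, 45.0, 8.0),
]
for key, mean_speed in sorted(aggregate_probe_speeds(probes).items()):
    print(key, mean_speed)
```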

Summary

A production ETA prediction service layers a graph-based travel time estimator with a machine learning correction model trained on historical arrival data. The key design decisions are: maintain live and historical speed profiles per edge, build a low-latency feature store, and always have a graph-only fallback. Continuous monitoring of prediction error against actuals closes the feedback loop and keeps accuracy high as traffic patterns evolve.