ETA Calculator Service: Low Level Design
Road Network Graph
Schema
Node (Intersection)
-------------------
id BIGINT PK
lat DOUBLE PRECISION
lng DOUBLE PRECISION
Edge (Road Segment)
-------------------
id BIGINT PK
from_node_id BIGINT FK
to_node_id BIGINT FK
distance_meters INT
base_travel_time_ms INT -- at free-flow speed
road_type ENUM('motorway','trunk','primary','secondary','residential')
speed_limit_kph SMALLINT
The graph is stored in PostgreSQL and loaded into memory (adjacency list) on service startup. Updates to the road network trigger a graph reload via a versioned snapshot.
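The load-and-reload behaviour described above can be sketched as follows. This is a minimal illustration, not the service's actual code: the names `Edge`, `GraphSnapshot`, `build_snapshot`, and `GraphHolder` are hypothetical, and rows are assumed to arrive as tuples in the column order shown in the comment.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    # Assumed row order: id, from_node_id, to_node_id,
    # distance_meters, base_travel_time_ms
    id: int
    from_node_id: int
    to_node_id: int
    distance_meters: int
    base_travel_time_ms: int


@dataclass(frozen=True)
class GraphSnapshot:
    version: int
    adjacency: dict  # from_node_id -> list of outgoing Edge objects


def build_snapshot(version, edge_rows):
    """Build an adjacency-list snapshot from Edge table rows."""
    adjacency = defaultdict(list)
    for row in edge_rows:
        edge = Edge(*row)
        adjacency[edge.from_node_id].append(edge)
    return GraphSnapshot(version=version, adjacency=dict(adjacency))


class GraphHolder:
    """Holds the current snapshot; a reload swaps the reference in one step."""

    def __init__(self, snapshot):
        self._snapshot = snapshot

    def current(self):
        return self._snapshot

    def reload(self, snapshot):
        # Ignore snapshots that are not newer than the one already loaded.
        if snapshot.version > self._snapshot.version:
            self._snapshot = snapshot
```

Swapping a whole snapshot, rather than mutating the adjacency list in place, lets in-flight searches keep using the old version until they finish.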
Shortest Path Computation
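A* search finds the minimum-travel-time path between origin and destination nodes. Because edge weights are travel times, the heuristic must also be a time: the straight-line (Euclidean) distance to the destination divided by the highest speed in the network, scaled by the minimum congestion factor (0.5). That estimate never exceeds the true remaining travel time, so the heuristic is admissible. Dijkstra is used as a fallback for cases where admissibility cannot be guaranteed (e.g., unusual traffic patterns that push a congestion factor outside its expected range).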
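A minimal sketch of such a search, under stated assumptions: `MAX_SPEED_KPH` is a hypothetical upper bound on free-flow speed, the straight-line distance is computed with the haversine formula, the adjacency list is simplified to `(to_node, edge_id, base_travel_time_ms)` tuples, and `factor` is any callable returning the congestion factor for an edge.

```python
import heapq
import math

MIN_CONGESTION_FACTOR = 0.5  # lower clamp of congestion_factor
MAX_SPEED_KPH = 130.0        # assumption: no edge has a faster free-flow speed


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lng) pairs."""
    lat1, lng1, lat2, lng2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * 6_371_000 * math.asin(math.sqrt(h))


def lower_bound_ms(a, b):
    """Optimistic travel time: straight line, top speed, best-case traffic."""
    seconds = haversine_m(a, b) / (MAX_SPEED_KPH / 3.6)
    return seconds * 1000 * MIN_CONGESTION_FACTOR


def a_star(coords, adjacency, factor, origin, dest):
    """coords: node -> (lat, lng); adjacency: node -> [(to, edge_id, base_ms)].

    Returns (total_ms, path) or None if dest is unreachable.
    """
    best = {origin: 0.0}
    parent = {}
    frontier = [(lower_bound_ms(coords[origin], coords[dest]), 0.0, origin)]
    while frontier:
        _, cost_so_far, node = heapq.heappop(frontier)
        if cost_so_far > best[node]:
            continue  # stale heap entry; a cheaper path was found later
        if node == dest:
            path = [node]
            while path[-1] in parent:
                path.append(parent[path[-1]])
            return cost_so_far, path[::-1]
        for to, edge_id, base_ms in adjacency.get(node, []):
            cost = cost_so_far + base_ms * factor(edge_id)
            if cost < best.get(to, math.inf):
                best[to] = cost
                parent[to] = node
                estimate = cost + lower_bound_ms(coords[to], coords[dest])
                heapq.heappush(frontier, (estimate, cost, to))
    return None
```

Passing a heuristic that always returns 0 turns the same loop into Dijkstra, which is one way the fallback could share code with the primary search.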
-- Adjusted edge weight used during search:
adjusted_ms = base_travel_time_ms * congestion_factor
-- congestion_factor fetched from:
EdgeTrafficFactor
-----------------
edge_id BIGINT FK
time_bucket SMALLINT -- 0-167 (hour of week: 0=Mon 00:00, 167=Sun 23:00)
congestion_factor FLOAT -- 0.5 (clear) to 2.0 (heavy traffic)
At query time the current time_bucket is computed from the UTC wall clock plus the region's timezone offset, so that buckets follow local time. Congestion factors are then looked up from an in-process cache, which is refreshed from the DB every 60 seconds.
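The bucket computation can be sketched as below. The function name `time_bucket` and the choice to pass the offset in minutes are assumptions for illustration; a named zone (for example via Python's `zoneinfo`) would be needed to follow daylight saving changes automatically.

```python
from datetime import datetime, timedelta, timezone


def time_bucket(utc_now, utc_offset_minutes):
    """Hour-of-week bucket in local time: 0 = Mon 00:00, 167 = Sun 23:00."""
    local_tz = timezone(timedelta(minutes=utc_offset_minutes))
    local = utc_now.astimezone(local_tz)
    # datetime.weekday() returns 0 for Monday through 6 for Sunday.
    return local.weekday() * 24 + local.hour
```

For example, 03:00 UTC on a Monday with an offset of -480 minutes is still Sunday 19:00 locally, so it maps to a Sunday bucket rather than a Monday one.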
Real-Time Traffic Ingestion
TrafficSensorReading
--------------------
edge_id BIGINT
measured_at TIMESTAMPTZ
observed_speed_kph FLOAT
Speed sensors and probe vehicles (via GPS) publish readings every 60 seconds to a Kafka topic. A stream processor computes the live congestion factor:
congestion_factor = speed_limit_kph / MAX(observed_speed_kph, 1)
clamped to [0.5, 2.0]
The result is written to EdgeTrafficFactor for the current time_bucket and also to the Redis key traffic:{edge_id}:{time_bucket}, from which all ETA service instances update their in-process caches.
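The live factor formula and the adjusted edge weight it feeds can be written out directly. The function names here are hypothetical; the arithmetic follows the formulas given above.

```python
def live_congestion_factor(speed_limit_kph, observed_speed_kph):
    """speed_limit / max(observed, 1), clamped to [0.5, 2.0]."""
    # The floor of 1 kph avoids division by zero for stationary traffic.
    raw = speed_limit_kph / max(observed_speed_kph, 1.0)
    return min(max(raw, 0.5), 2.0)


def adjusted_ms(base_travel_time_ms, congestion_factor):
    """Edge weight used during search."""
    return base_travel_time_ms * congestion_factor
```

With a 60 kph limit and an observed 40 kph, the factor is 1.5, so an edge with a 30,000 ms free-flow time is weighted at 45,000 ms.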
Historical Traffic Pattern Cache
Baseline congestion factors for all 168 time buckets (one per hour of the week) are pre-computed nightly from the last 90 days of historical sensor data. This gives a baseline congestion factor for any edge at any hour, which fills in when real-time data is missing. The batch job runs at 02:00 UTC and writes its results to EdgeTrafficFactor.
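One way the nightly aggregation could look is sketched below. The source does not say which statistic the batch job uses, so the mean of clamped per-reading factors is an assumption (a median would be more robust to outliers); the function name and input shapes are also hypothetical.

```python
from collections import defaultdict
from statistics import mean


def historical_factors(readings, speed_limits):
    """Aggregate readings into one baseline factor per (edge, bucket).

    readings: iterable of (edge_id, time_bucket, observed_speed_kph)
    speed_limits: dict of edge_id -> speed_limit_kph
    """
    samples = defaultdict(list)
    for edge_id, bucket, speed in readings:
        raw = speed_limits[edge_id] / max(speed, 1.0)
        samples[(edge_id, bucket)].append(min(max(raw, 0.5), 2.0))
    # Assumption: the baseline is the arithmetic mean of the samples.
    return {key: mean(values) for key, values in samples.items()}
```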
ML-Based ETA Model
Features
- route_distance_meters
- graph_travel_time_ms (from A* with current congestion)
- time_of_day_sin / cos (cyclical encoding)
- day_of_week_sin / cos
- weather_code (clear/rain/snow/fog)
- special_event_flag (stadium event or holiday within 5 km)
Output
- eta_p50_seconds (median predicted ETA)
- eta_p90_seconds (90th percentile; shown to the user as "arrives by")
The model (gradient-boosted trees) is retrained daily using completed trips as ground truth. The serving layer loads the model artifact from object storage at startup and hot-reloads it when a new artifact becomes available, without downtime.
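The cyclical encoding used for the time features listed above maps a periodic value onto the unit circle, so that 23:00 and 00:00 end up close together. A short sketch, assuming hours run 0-23 and days run 0-6 with 0 as Monday (the helper names are hypothetical):

```python
import math


def cyclical(value, period):
    """Return (sin, cos) of the value's position within its period."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def time_features(hour_of_day, day_of_week):
    """Build the four cyclical time features for the ETA model."""
    tod_sin, tod_cos = cyclical(hour_of_day, 24)
    dow_sin, dow_cos = cyclical(day_of_week, 7)
    return {
        "time_of_day_sin": tod_sin,
        "time_of_day_cos": tod_cos,
        "day_of_week_sin": dow_sin,
        "day_of_week_cos": dow_cos,
    }
```

Both components are needed: sine alone cannot distinguish 03:00 from 09:00, since the two hours share the same sine value.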
API
Single ETA
POST /eta
{
"origin_lat": 37.7749,
"origin_lng": -122.4194,
"dest_lat": 37.3382,
"dest_lng": -121.8863
}
Response (eta_seconds carries the model's eta_p50_seconds):
{
"eta_seconds": 2640,
"eta_p90_seconds": 3120,
"distance_meters": 72400,
"polyline": "encodedPolylineString..."
}
Batch ETA
POST /eta/batch
{
"pairs": [
{ "origin_lat": ..., "origin_lng": ..., "dest_lat": ..., "dest_lng": ... },
...
]
}
Pairs are computed in parallel using a goroutine or thread pool. This is useful for dispatch systems that need to evaluate many driver-to-rider assignments at once.
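A thread-pool version of the batch fan-out might look like the sketch below. `batch_eta` and its `compute_eta` argument are hypothetical names; `compute_eta` stands in for whatever single-pair function the service uses. Note that in CPython, threads do not speed up CPU-bound work because of the global interpreter lock, so this shape suits cases where the per-pair work releases the lock or waits on I/O.

```python
from concurrent.futures import ThreadPoolExecutor


def batch_eta(pairs, compute_eta, max_workers=8):
    """Compute ETAs for all pairs in parallel.

    Results are returned in the same order as the input pairs, so the
    caller can match each response to its request by index.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves input order regardless of completion order.
        return list(pool.map(compute_eta, pairs))
```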
Scalability Notes
- The road graph fits in RAM (~2 GB for a metro area); loaded once per instance.
- Congestion factor cache is refreshed every 60 seconds; if the live data for an edge is stale, the lookup falls back to the historical bucket.
- ETA service instances are stateless; scale horizontally behind a load balancer.
- For city-scale routing, the graph is partitioned into tiles; cross-tile queries use contraction hierarchies (queried with CH-Dijkstra) for sub-second performance.
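The stale-data fallback mentioned in the notes above can be sketched as a small cache wrapper. Everything here is illustrative: the class name, the 120-second staleness threshold, and the default factor of 1.0 for edges with no historical entry are all assumptions, since the source does not specify them.

```python
import time


class CongestionCache:
    """Serve live factors while fresh, else the historical bucket."""

    def __init__(self, historical, max_age_s=120.0, clock=time.monotonic):
        self._historical = historical  # (edge_id, time_bucket) -> factor
        self._live = {}                # edge_id -> (factor, stored_at)
        self._max_age_s = max_age_s    # assumed staleness threshold
        self._clock = clock

    def put_live(self, edge_id, factor):
        self._live[edge_id] = (factor, self._clock())

    def get(self, edge_id, bucket):
        entry = self._live.get(edge_id)
        if entry is not None and self._clock() - entry[1] <= self._max_age_s:
            return entry[0]
        # Assumption: treat unknown edges as free-flow (factor 1.0).
        return self._historical.get((edge_id, bucket), 1.0)
```

Injecting the clock keeps the staleness logic testable without sleeping.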