Low Level Design: ETA Calculator Service

Road Network Graph

Schema

Node (Intersection)
-------------------
id              BIGINT PK
lat             DOUBLE PRECISION
lng             DOUBLE PRECISION

Edge (Road Segment)
-------------------
id              BIGINT PK
from_node_id    BIGINT FK
to_node_id      BIGINT FK
distance_meters INT
base_travel_time_ms INT      -- at free-flow speed
road_type       ENUM('motorway','trunk','primary','secondary','residential')
speed_limit_kph SMALLINT

The graph is stored in PostgreSQL and loaded into memory (adjacency list) on service startup. Updates to the road network trigger a graph reload via a versioned snapshot.
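
As a rough illustration of the in-memory layout, the sketch below groups Edge rows by their source node to form an adjacency list. The `Edge` tuple and `build_adjacency` helper are hypothetical names, and the rows are made-up sample data; the database query and the snapshot versioning are left out.

```python
from collections import defaultdict
from typing import NamedTuple


class Edge(NamedTuple):
    """Subset of the Edge table columns needed for routing."""
    id: int
    from_node_id: int
    to_node_id: int
    distance_meters: int
    base_travel_time_ms: int


def build_adjacency(edges):
    """Group directed edges by their source node."""
    adjacency = defaultdict(list)
    for edge in edges:
        adjacency[edge.from_node_id].append(edge)
    return dict(adjacency)


# Sample rows standing in for the result of a SELECT over the Edge table.
rows = [
    Edge(1, 100, 101, 250, 18_000),
    Edge(2, 100, 102, 400, 24_000),
    Edge(3, 101, 102, 120, 9_000),
]
graph = build_adjacency(rows)
```

Building a fresh dictionary per snapshot and swapping the reference once it is complete is one way to reload without exposing a half-built graph to readers.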

Shortest Path Computation

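A* search with a Euclidean distance heuristic, converted to time by dividing the straight-line distance by the maximum attainable speed, finds the minimum-travel-time path between origin and destination nodes. The conversion keeps the heuristic a lower bound on the remaining travel time. Dijkstra is used as a fallback for cases where that bound cannot be guaranteed and the heuristic becomes inadmissible (e.g., unusual traffic patterns).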
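
A minimal sketch of the search, under several assumptions that are not part of the original design: nodes are keyed by name with (lat, lng) coordinates, edges are (to_node, travel_ms) pairs, great-circle distance stands in for Euclidean distance because coordinates are in degrees, and `MAX_SPEED_MPS` is an assumed upper bound on vehicle speed.

```python
import heapq
import math

MAX_SPEED_MPS = 40.0          # assumed upper bound on speed (144 km/h)
EARTH_RADIUS_M = 6_371_000.0


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lng) points."""
    lat1, lng1, lat2, lng2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def heuristic_ms(a, b):
    """Optimistic remaining time: straight line at the assumed top speed."""
    return haversine_m(a, b) / MAX_SPEED_MPS * 1000.0


def a_star(coords, adjacency, origin, dest):
    """Return (total_ms, path), or (math.inf, []) if dest is unreachable."""
    best = {origin: 0.0}
    parent = {}
    closed = set()
    frontier = [(heuristic_ms(coords[origin], coords[dest]), origin)]
    while frontier:
        _, node = heapq.heappop(frontier)
        if node == dest:
            path = [node]
            while node in parent:
                node = parent[node]
                path.append(node)
            return best[dest], path[::-1]
        if node in closed:
            continue
        closed.add(node)
        for nxt, cost_ms in adjacency.get(node, []):
            candidate = best[node] + cost_ms
            if candidate < best.get(nxt, math.inf):
                best[nxt] = candidate
                parent[nxt] = node
                priority = candidate + heuristic_ms(coords[nxt], coords[dest])
                heapq.heappush(frontier, (priority, nxt))
    return math.inf, []


# Toy graph: the two-hop route A -> B -> C beats the direct edge A -> C.
coords = {"A": (0.0, 0.0), "B": (0.0, 0.01), "C": (0.01, 0.01)}
adjacency = {"A": [("B", 60_000), ("C", 150_000)], "B": [("C", 60_000)]}
total_ms, path = a_star(coords, adjacency, "A", "C")
```

Setting the heuristic to zero turns the same loop into Dijkstra, which is one simple way to implement the fallback.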

-- Adjusted edge weight used during search:
adjusted_ms = base_travel_time_ms * congestion_factor

-- congestion_factor fetched from:
EdgeTrafficFactor
-----------------
edge_id         BIGINT FK
time_bucket     SMALLINT    -- 0-167 (hour of week: 0=Mon 00:00, 167=Sun 23:00)
congestion_factor FLOAT     -- 0.5 (clear) to 2.0 (heavy traffic); 1.0 = free flow

At query time, the current time_bucket is computed from the UTC wall clock plus the local timezone offset, and congestion factors are looked up from an in-process cache (refreshed every 60 seconds from the DB).
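
The bucket arithmetic can be sketched as follows. The function name and the choice to pass the offset in minutes are assumptions made for illustration; a production service might resolve the offset from a timezone database instead.

```python
from datetime import datetime, timedelta, timezone


def time_bucket(utc_now, utc_offset_minutes):
    """Hour-of-week bucket in local time: 0 = Mon 00:00, 167 = Sun 23:00."""
    local_tz = timezone(timedelta(minutes=utc_offset_minutes))
    local = utc_now.astimezone(local_tz)
    # datetime.weekday() returns 0 for Monday, matching the bucket layout.
    return local.weekday() * 24 + local.hour


# 2024-01-01 is a Monday.
monday_utc = datetime(2024, 1, 1, 0, 30, tzinfo=timezone.utc)
bucket_utc = time_bucket(monday_utc, 0)         # Monday 00:30 local
bucket_pst = time_bucket(monday_utc, -8 * 60)   # Sunday 16:30 local
```

The same instant maps to different buckets depending on the offset, which is why the lookup has to use local time rather than raw UTC.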

Real-Time Traffic Ingestion

TrafficSensorReading
--------------------
edge_id         BIGINT
measured_at     TIMESTAMPTZ
observed_speed_kph FLOAT

Speed sensors (and probe vehicle GPS data) publish readings every 60 seconds to a Kafka topic. A stream processor computes the live congestion factor:

congestion_factor = speed_limit_kph / MAX(observed_speed_kph, 1)
                    clamped to [0.5, 2.0]
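
The formula above, written out as a small function together with the adjusted edge weight it feeds into. The function names are illustrative, not taken from the original service.

```python
def congestion_factor(speed_limit_kph, observed_speed_kph):
    """speed_limit / max(observed, 1), clamped to [0.5, 2.0]."""
    raw = speed_limit_kph / max(observed_speed_kph, 1.0)
    return min(max(raw, 0.5), 2.0)


def adjusted_ms(base_travel_time_ms, factor):
    """Edge weight used during search."""
    return base_travel_time_ms * factor


# Traffic moving at 40 kph on a 60 kph road gives a factor of 1.5,
# so an 18-second free-flow segment is weighted as 27 seconds.
factor = congestion_factor(60, 40)
weight = adjusted_ms(18_000, factor)
```

The `max(observed, 1)` guard avoids division by zero for stationary traffic, and the clamp means a fully stopped road is still only weighted at twice its free-flow time.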

The result is written to EdgeTrafficFactor for the current time_bucket and also published to Redis under the key traffic:{edge_id}:{time_bucket}, from which all ETA service instances update their in-process caches.

Historical Traffic Pattern Cache

Baseline congestion factors for all 168 time buckets (one per hour of the week) are pre-computed nightly from the last 90 days of historical sensor data. This gives a baseline for any edge at any hour and fills in when real-time data is missing. The batch job runs at 02:00 UTC and writes results to EdgeTrafficFactor.
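
One possible shape for the nightly aggregation is sketched below. The original does not say how readings are combined, so the arithmetic mean here is an assumption, as are the function name, the in-memory input format, and the premise that `measured_at` has already been converted to local time.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def historical_factors(readings, speed_limits):
    """Average clamped congestion factor per (edge_id, time_bucket).

    readings: iterable of (edge_id, measured_at, observed_speed_kph)
    speed_limits: {edge_id: speed_limit_kph}
    """
    samples = defaultdict(list)
    for edge_id, measured_at, speed_kph in readings:
        bucket = measured_at.weekday() * 24 + measured_at.hour
        raw = speed_limits[edge_id] / max(speed_kph, 1.0)
        samples[(edge_id, bucket)].append(min(max(raw, 0.5), 2.0))
    return {key: mean(values) for key, values in samples.items()}


# Two Monday 08:00-08:59 readings for edge 7, one week apart.
readings = [
    (7, datetime(2024, 1, 1, 8, 0), 60.0),
    (7, datetime(2024, 1, 8, 8, 30), 30.0),
]
baseline = historical_factors(readings, {7: 60})
```

A median or a trimmed mean would be less sensitive to one-off incidents; which one fits depends on how noisy the sensor data is.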

ML-Based ETA Model

Features

- route_distance_meters
- graph_travel_time_ms       (from A* with current congestion)
- time_of_day_sin / cos      (cyclical encoding)
- day_of_week_sin / cos
- weather_code               (clear/rain/snow/fog)
- special_event_flag         (public holiday, or stadium event within 5 km)
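
The cyclical encoding used for the time features can be sketched like this. The helper name is hypothetical; the idea is that sine and cosine together place hour 23 next to hour 0, which a raw integer would not.

```python
import math


def cyclical(value, period):
    """Map a periodic value onto the unit circle as (sin, cos)."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


time_of_day_sin, time_of_day_cos = cyclical(6, 24)   # 06:00
day_of_week_sin, day_of_week_cos = cyclical(0, 7)    # Monday


def gap(a, b):
    """Straight-line distance between two encoded points."""
    return math.dist(a, b)


# 23:00 is much closer to 00:00 than 12:00 is.
near = gap(cyclical(23, 24), cyclical(0, 24))
far = gap(cyclical(12, 24), cyclical(0, 24))
```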

Output

- eta_p50_seconds   (median predicted ETA)
- eta_p90_seconds   (90th percentile — shown to user as "arrives by")

The model (gradient-boosted trees) is retrained daily using completed trips as ground truth. The serving layer loads the model artifact from object storage at startup and hot-reloads on new artifact availability without downtime.
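
Hot-reloading could look roughly like the sketch below. `ModelHolder`, `maybe_reload`, and the version strings are hypothetical, and `load_fn` stands in for whatever downloads and deserializes the artifact from object storage. The new model is loaded before the swap so that in-flight requests keep using the old one.

```python
import threading


class ModelHolder:
    """Holds the active model and swaps it when a new version appears."""

    def __init__(self, model, version):
        self._lock = threading.Lock()
        self._model = model
        self._version = version

    def current(self):
        with self._lock:
            return self._model

    def maybe_reload(self, latest_version, load_fn):
        """Load and swap in latest_version if it differs; return True on swap."""
        with self._lock:
            if latest_version == self._version:
                return False
        new_model = load_fn(latest_version)  # slow work happens outside the lock
        with self._lock:
            self._model = new_model
            self._version = latest_version
        return True


# Stub loader used only for illustration.
holder = ModelHolder(model="model-v1", version="v1")
unchanged = holder.maybe_reload("v1", load_fn=lambda v: f"model-{v}")
swapped = holder.maybe_reload("v2", load_fn=lambda v: f"model-{v}")
```

A background thread polling object storage for the latest version and calling `maybe_reload` would complete the loop.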

API

Single ETA

POST /eta
{
  "origin_lat": 37.7749,
  "origin_lng": -122.4194,
  "dest_lat": 37.3382,
  "dest_lng": -121.8863
}

Response:
{
  "eta_seconds": 2640,
  "eta_p90_seconds": 3120,
  "distance_meters": 72400,
  "polyline": "encodedPolylineString..."
}

Batch ETA

POST /eta/batch
{
  "pairs": [
    { "origin_lat": ..., "origin_lng": ..., "dest_lat": ..., "dest_lng": ... },
    ...
  ]
}

Pairs are computed in parallel using a goroutine/thread pool. This is useful for dispatch systems that need to evaluate multiple driver-to-rider assignments simultaneously.
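
A thread-pool version of the batch handler might look like this. `compute_batch` and the `compute_eta` callback are illustrative names, and the stub ETA function below just returns a fixed value per pair rather than running a real route search.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_batch(pairs, compute_eta, max_workers=8):
    """Compute an ETA for every pair; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(compute_eta, pairs))


def stub_eta(pair):
    """Placeholder for the real single-pair ETA computation."""
    origin, dest = pair
    return {"origin": origin, "dest": dest, "eta_seconds": 600}


pairs = [
    ((37.7749, -122.4194), (37.3382, -121.8863)),
    ((37.8044, -122.2712), (37.7749, -122.4194)),
]
results = compute_batch(pairs, stub_eta)
```

`Executor.map` returns results in input order, so the caller can match each ETA back to its pair by index. In CPython, threads help mainly when the per-pair work releases the GIL (I/O or native code); a CPU-bound pure-Python search would need processes or a native routing core to run truly in parallel.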

Scalability Notes

- The road graph fits in RAM (~2 GB for a metro area) and is loaded once per instance.
- The congestion factor cache is refreshed every 60 seconds; when live data is stale, lookups fall back to the historical bucket.
- ETA service instances are stateless and scale horizontally behind a load balancer.
- For city-scale routing, the graph is partitioned into tiles; cross-tile queries use contraction hierarchies (queried with CH-Dijkstra) for sub-second performance.
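
The stale-data fallback can be sketched as a small lookup class. The 120-second staleness threshold, the default factor of 1.0 for edges with no data at all, and the class and method names are assumptions for illustration only.

```python
import time

STALE_AFTER_S = 120.0   # assumed: two missed 60-second refreshes


class CongestionCache:
    """Serves live factors while fresh, otherwise the historical baseline."""

    def __init__(self, historical, clock=time.monotonic):
        self._historical = historical   # {(edge_id, bucket): factor}
        self._live = {}                 # {(edge_id, bucket): (factor, stored_at)}
        self._clock = clock

    def put_live(self, edge_id, bucket, factor):
        self._live[(edge_id, bucket)] = (factor, self._clock())

    def get(self, edge_id, bucket):
        entry = self._live.get((edge_id, bucket))
        if entry is not None and self._clock() - entry[1] <= STALE_AFTER_S:
            return entry[0]
        # Live reading missing or too old: use the baseline, or free flow.
        return self._historical.get((edge_id, bucket), 1.0)


# A fake clock makes the staleness behaviour easy to demonstrate.
now = [0.0]
cache = CongestionCache({(1, 8): 1.4}, clock=lambda: now[0])
cache.put_live(1, 8, 1.9)
fresh = cache.get(1, 8)
now[0] = 500.0
stale = cache.get(1, 8)
```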
