Dispatch System Requirements
A ride-sharing dispatch system matches riders with nearby available drivers in real time. Core requirements: find available drivers within N km of a rider in < 100ms; send driver location updates from millions of drivers every 4 seconds; compute ETA accurately; implement surge pricing when supply is low; track trip state machine in real time; handle driver and rider app disconnections gracefully. Uber processes 15+ million trips per day across 10,000+ cities with 5 million drivers active globally.
Geospatial Indexing
Finding nearby drivers requires efficient geospatial queries. Three approaches:
# Approach 1: Geohash
# Geohash encodes a lat/lng into a string where common prefix = spatial proximity
# e.g., "dr5ru" = Manhattan, "dr5rv" = adjacent cell
# Driver lookup: compute geohash for rider location, query all drivers in same cell
# and adjacent 8 cells (to handle boundary effects)
import geohash2
def find_nearby_drivers(rider_lat: float, rider_lng: float, precision: int = 6) -> list:
rider_hash = geohash2.encode(rider_lat, rider_lng, precision=precision)
neighbors = geohash2.neighbors(rider_hash) # 8 adjacent cells
search_hashes = [rider_hash] + list(neighbors.values())
# Redis: store drivers as sets per geohash cell
# SUNION all cells to get candidate drivers
driver_ids = redis.sunion([f"drivers:{h}" for h in search_hashes])
return list(driver_ids)
# Approach 2: Redis GEOADD / GEOSEARCH (recommended)
# Redis GEO commands store coordinates and support radius queries natively
# Store driver location:
redis.geoadd("drivers:available", lng, lat, driver_id)
# Find drivers within 2km of rider:
drivers = redis.geosearch(
"drivers:available",
longitude=rider_lng,
latitude=rider_lat,
radius=2,
unit="km",
sort="ASC", # sort by distance
count=20, # return top 20 nearest
withcoord=True,
withdist=True
)
# Returns: [(driver_id, distance_km, (lng, lat)), ...]
# Approach 3: S2 Geometry (Google)
# S2 divides the sphere into hierarchical cells (30 levels)
# Uber H3 (similar but hexagonal grid) enables more uniform area cells
# Better than geohash for circular radius queries (geohash cells are rectangular)
Driver Location Update Pipeline
# Scale: 5M active drivers, each updating location every 4 seconds
# = 1.25M location updates per second
# Architecture:
# Driver app → WebSocket/HTTP → Location Service → Kafka → Location Processor → Redis GEO
# Location Service (WebSocket gateway):
# - Accepts WebSocket connections from driver apps
# - Receives location updates, publishes to Kafka topic "driver-locations"
# - Key = driver_id (ensures same driver's updates go to same Kafka partition = ordered)
# Location Processor (Kafka consumer):
# - Consumes driver location events
# - Updates Redis GEO index: GEOADD drivers:{city_id} lng lat driver_id
# - Also updates driver metadata: status, car_type, rating
# - Removes inactive drivers (no update in > 30 seconds)
# Geospatial sharding:
# A global Redis instance cannot handle all 5M drivers
# Shard by city_id: all drivers in city 42 → Redis shard #(42 % NUM_SHARDS)
# Rider search: determine city → query appropriate shard
Driver-Rider Matching
# Matching algorithm considerations:
# 1. ETA accuracy: nearest driver ≠ fastest driver (traffic, turns)
# Use routing engine (OSRM, Google Maps, Valhalla) to compute actual drive time
# 2. Car type filter: rider requests UberX → only match UberX drivers
# 3. Driver rating: prefer higher-rated drivers (above threshold)
# 4. Multi-rider batching (UberPool): match rider to existing trip if detour Driver:
# Filter candidates by car type and availability
eligible = [d for d in candidates if d.car_type == rider.car_type and d.status == "available"]
# Compute ETA for each candidate (batch routing API call)
etas = routing_service.batch_eta(
origins=[(d.lat, d.lng) for d in eligible],
destination=(rider.lat, rider.lng)
)
# Score each driver (lower is better)
def score(driver, eta):
return (
eta * 1.0 # ETA in minutes (primary factor)
- driver.rating * 2 # bonus for high rating
)
best = min(zip(eligible, etas), key=lambda x: score(x[0], x[1]))
return best[0]
# Dispatch flow:
# 1. Find candidates (Redis GEOSEARCH): < 20ms
# 2. Compute ETAs via routing engine: 50-100ms (batch)
# 3. Select best driver: < 1ms
# 4. Send trip offer to driver (push notification via WebSocket)
# 5. Driver has 15 seconds to accept; if timeout → reassign to next candidate
Trip State Machine
# Trip states and valid transitions:
# searching → driver_assigned → driver_en_route → pickup → in_progress → completed
# → cancelled (any state before pickup)
class TripStatus(Enum):
SEARCHING = "searching" # finding a driver
ASSIGNED = "driver_assigned" # driver accepted, heading to pickup
EN_ROUTE = "en_route" # driver confirmed pickup, en route to destination
COMPLETED = "completed"
CANCELLED = "cancelled"
# State stored in: Redis (for real-time queries) + PostgreSQL (for billing/history)
# Redis TTL: active trip data expires 24h after completion
# Real-time trip tracking:
# Driver app → updates GPS every 4s → Location Service → updates Redis GEO
# Rider app → polls trip status endpoint every 4s (or WebSocket push)
# → gets current driver location and ETA to destination
Surge Pricing
# Surge = supply/demand imbalance: more riders requesting than drivers available
def compute_surge_multiplier(city_id: int, zone_id: int) -> float:
# Count available drivers in zone
available_drivers = redis.zcard(f"drivers:{city_id}:{zone_id}:available")
# Count pending ride requests in zone (last 5 minutes)
pending_requests = redis.llen(f"requests:{city_id}:{zone_id}:pending")
if available_drivers == 0:
return 3.0 # max surge
supply_demand_ratio = available_drivers / max(pending_requests, 1)
if supply_demand_ratio >= 1.5:
return 1.0 # no surge (plenty of drivers)
elif supply_demand_ratio >= 1.0:
return 1.2 # light surge
elif supply_demand_ratio >= 0.7:
return 1.5 # moderate surge
elif supply_demand_ratio >= 0.4:
return 2.0 # high surge
else:
return 3.0 # extreme surge
# Surge zones: city divided into H3 hexagonal cells
# Each cell computes its own surge multiplier
# Smoothing: apply Gaussian blur across adjacent cells (avoid sharp boundaries)
# Display: show surge map to riders in-app
ETA Calculation
Accurate ETA requires more than straight-line distance. A routing engine computes the shortest path through the road network, accounting for turn restrictions, traffic, and road speed.
# ETA pipeline:
# 1. Real-time traffic data: aggregate driver GPS speeds on each road segment
# Uber collects speeds from all active driver phones → road network speed map
# 2. Map matching: snap driver GPS coordinates to road network (Viterbi HMM)
# 3. Routing engine (OSRM): Dijkstra or A* on road graph with live traffic weights
# 4. Machine learning adjustment:
# Historical data shows OSRM ETA is 20% optimistic on I-95 at 5pm on Fridays
# XGBoost model adjusts raw OSRM ETA based on time-of-day, day-of-week,
# weather, events nearby, historical patterns
# 5. Uncertainty bounds: P50 ETA (median) and P90 ETA (90th percentile)
# Show rider P50; internally use P90 for driver assignment decisions
Interview Questions
- Design the location update system for 5 million concurrent drivers
- How do you match riders to drivers when thousands of requests arrive simultaneously?
- Design the ETA system — how do you incorporate real-time traffic?
- How does surge pricing work and how do you prevent it from being gamed?
- How do you handle a driver app disconnect mid-trip?