System Design Interview: Design a Maps and Navigation System (Google Maps)


Designing a maps and navigation system is a common interview question at Uber, Lyft, Google, Apple, and Grab. The problem covers geospatial data storage, routing algorithms, real-time traffic, map tile serving, and ETA prediction.

Requirements Clarification

Functional Requirements

  • Display map tiles at different zoom levels
  • Search for locations (addresses, POIs)
  • Route from point A to point B (driving, walking, transit)
  • Turn-by-turn navigation with live traffic updates
  • ETA estimation
  • Report incidents (accidents, road closures)

Non-Functional Requirements

  • Scale: 1B users, 1M active navigators at peak
  • Map tile serving: <100ms
  • Routing: <500ms for cross-city routes
  • Traffic update freshness: <1 minute

Map Data Storage

Road Network: Graph Database

Roads and intersections modeled as a directed weighted graph:

  • Nodes: intersections, points of interest
  • Edges: road segments with properties (distance, speed limit, road type, one-way)
  • Edge weights: current travel time (updated with real-time traffic)

Storage: custom binary graph format for routing engine (not a general-purpose graph DB). Graph partitioned by geographic tile for distributed routing. OpenStreetMap (OSM) is the source data for many mapping systems.
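
In memory, the node/edge model above maps to a plain adjacency list. A toy sketch (node ids, distances, and travel times are illustrative, not real data):

```python
# Toy adjacency-list road graph. Each directed edge carries static
# properties plus a mutable travel-time weight that real-time traffic
# updates would overwrite.
road_graph = {
    "n1": [
        {"to": "n2", "distance_m": 250, "speed_limit_kmh": 50,
         "one_way": False, "travel_time_s": 18.0},
        {"to": "n3", "distance_m": 900, "speed_limit_kmh": 80,
         "one_way": True, "travel_time_s": 40.5},
    ],
    "n2": [
        {"to": "n1", "distance_m": 250, "speed_limit_kmh": 50,
         "one_way": False, "travel_time_s": 18.0},
    ],
    "n3": [],  # one-way edges lead here; nothing leaves
}
```

A two-way street appears as two directed edges (n1→n2 and n2→n1); a one-way street appears once. Production engines pack the same structure into a compact binary array layout for cache efficiency.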

Geospatial Indexing

Find nearest nodes, POIs, or road segments to a GPS coordinate:

  • Geohash: encode lat/lng as a base-32 string. Nearby locations share geohash prefixes, so proximity queries become prefix searches on an ordinary index. Resolution depends on string length (6 chars ≈ a 1.2 km × 0.6 km cell). Caveat: two nearby points on opposite sides of a cell boundary can have very different geohashes.
  • Quadtree: recursively divide the map into 4 quadrants; navigate the tree to find entities in a bounding box. The same hierarchical decomposition underlies S2's cell structure; PostGIS uses a related R-tree (GiST) index for the same job.
  • S2 (Google): Hilbert curve-based spatial indexing. Maps 2D sphere to 1D curve preserving locality. Google Maps uses S2 cells internally.
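
The geohash scheme above fits in a few lines: alternately bisect the longitude and latitude ranges, collect one bit per bisection, and map each 5-bit group to the standard base-32 alphabet. A minimal sketch:

```python
# Minimal geohash encoder following the standard definition:
# interleave longitude/latitude bisection bits, then emit base-32 chars.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat: float, lng: float, precision: int = 6) -> str:
    lat_lo, lat_hi = -90.0, 90.0
    lng_lo, lng_hi = -180.0, 180.0
    bits = []
    even = True  # geohash starts with a longitude bit
    while len(bits) < precision * 5:
        if even:
            mid = (lng_lo + lng_hi) / 2
            if lng >= mid:
                bits.append(1)
                lng_lo = mid
            else:
                bits.append(0)
                lng_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits.append(1)
                lat_lo = mid
            else:
                bits.append(0)
                lat_hi = mid
        even = not even
    chars = []
    for i in range(0, len(bits), 5):
        idx = 0
        for b in bits[i:i + 5]:
            idx = idx * 2 + b
        chars.append(BASE32[idx])
    return "".join(chars)
```

`geohash_encode(57.64911, 10.40744, 7)` yields `"u4pruyd"` (the classic example coordinate). A lower precision simply truncates the same string, which is exactly why prefix search works for proximity.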

Map Tile Serving

Maps are rendered as tiles (256×256 or 512×512 pixel PNG/WebP images) at different zoom levels (0-22). Tile URL: /tiles/{z}/{x}/{y}.png where z=zoom, x/y=tile coordinates.
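
The {z}/{x}/{y} addressing follows the standard Web Mercator ("slippy map") tiling math: 2^z tiles per axis, x derived linearly from longitude, y from the Mercator projection of latitude. A sketch:

```python
import math

def latlng_to_tile(lat: float, lng: float, zoom: int) -> tuple[int, int]:
    """Standard Web Mercator tile coordinates for a lat/lng at a zoom level.
    There are 2**zoom tiles along each axis."""
    n = 2 ** zoom
    x = int((lng + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y
```

At zoom 0 the whole world is tile (0, 0); each zoom step quadruples the tile count. San Francisco (37.7749, -122.4194) at zoom 12 falls in tile (655, 1583).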

Tile generation pipeline:
Raw OSM data -> Tile rendering (Mapnik/vector tiles) -> Tile storage (S3)
                                                              |
                                                        CDN edge cache
                                                              |
                                                         User browser

Tiles are pre-rendered and cached aggressively (TTL: days for base tiles, minutes for traffic overlay). Cache hit rate: 99%+ for popular areas. Vector tiles (MVT format) are smaller and rendered client-side with styling flexibility.

Routing Algorithm

Dijkstra vs A*

  • Dijkstra: finds shortest path in weighted graph. Explores all nodes with cost <= current best. O((V+E) log V) with priority queue. Too slow for continental routing (billions of nodes).
  • A* (A-star): Dijkstra + heuristic (straight-line distance to destination). Guides search toward destination, exploring fewer nodes. Optimal when heuristic is admissible (never overestimates).
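
A minimal A* sketch over a travel-time graph. To keep the heuristic simple it assumes planar node coordinates in meters (a toy stand-in for projected lat/lng); straight-line distance divided by a maximum road speed never overestimates travel time, so the heuristic stays admissible:

```python
import heapq
import math

def a_star(graph, pos, start, goal, max_speed=30.0):
    """A* shortest path on a travel-time graph.
    graph: {node: [(neighbor, travel_time_s), ...]}
    pos:   {node: (x_m, y_m)} planar coordinates in meters.
    Heuristic: straight-line distance at max_speed m/s (admissible)."""
    def h(n):
        dx = pos[n][0] - pos[goal][0]
        dy = pos[n][1] - pos[goal][1]
        return math.hypot(dx, dy) / max_speed

    dist = {start: 0.0}
    parent = {start: None}
    pq = [(h(start), start)]
    settled = set()
    while pq:
        _, u = heapq.heappop(pq)
        if u == goal:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return dist[goal], path[::-1]
        if u in settled:
            continue
        settled.add(u)
        for v, w in graph.get(u, []):
            nd = dist[u] + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(pq, (nd + h(v), v))
    return math.inf, []

# Toy network: A -> B -> C (20s total) beats the direct A -> C edge (25s).
toy_graph = {"A": [("B", 10.0), ("C", 25.0)], "B": [("C", 10.0)], "C": []}
toy_pos = {"A": (0.0, 0.0), "B": (300.0, 0.0), "C": (600.0, 0.0)}
```

Dropping the heuristic (returning 0 from `h`) turns this into plain Dijkstra, which makes the two algorithms easy to contrast in an interview.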

Contraction Hierarchies (CH)

Used in production routing engines such as OSRM and Valhalla (Google Maps reportedly uses related hierarchical techniques). A preprocessing step contracts unimportant nodes, adding shortcut edges; queries then run bidirectional Dijkstra on the contracted graph, touching only important nodes. A New York to LA route computes in milliseconds, roughly 1000x faster than plain Dijkstra on real road networks.
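
The contraction step can be illustrated on a toy directed graph. This sketch always adds a shortcut when it improves the known distance and then deletes the node; real CH implementations contract nodes in importance order, run "witness" searches to skip unnecessary shortcuts, and keep all nodes around for the bidirectional query phase:

```python
import math

def contract_node(graph, node):
    """Toy CH contraction: for every (in-neighbor, out-neighbor) pair of
    `node`, add a shortcut edge u -> v if routing through `node` beats
    the best known u -> v weight, then remove `node`.
    graph: {u: {v: weight}} directed adjacency map, mutated in place."""
    incoming = [(u, nbrs[node]) for u, nbrs in graph.items() if node in nbrs]
    outgoing = list(graph.get(node, {}).items())
    for u, w_in in incoming:
        for v, w_out in outgoing:
            if u == v:
                continue
            shortcut = w_in + w_out
            if shortcut < graph[u].get(v, math.inf):
                graph[u][v] = shortcut  # shortcut stands in for u -> node -> v
    graph.pop(node, None)
    for nbrs in graph.values():
        nbrs.pop(node, None)
    return graph
```

Contracting the middle node of A→B→C (weights 1 and 2) leaves a single A→C shortcut of weight 3, so later queries never need to expand B.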

Real-Time Traffic

Two sources of traffic data:

  • Probe data: GPS coordinates from phones of active navigators. Aggregate speed on road segments. Privacy: aggregate anonymized data, not individual tracks.
  • Incident reports: user reports (accidents, closures) + automated detection from probe speed drops.

Phone GPS -> Traffic Ingest API -> Kafka
                                    |
                            Stream Processor (Flink)
                            - Map-match GPS to road segment
                            - Compute average speed per segment
                            - Detect speed anomalies (incidents)
                                    |
                            Traffic DB (edge weight updates)
                                    |
                            Routing engine reads updated weights
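
A toy version of the aggregation and anomaly-detection steps in the pipeline above. Segment ids and the 50% threshold are illustrative; a real Flink job would also window probes by event time and handle map-matching upstream:

```python
from collections import defaultdict

def aggregate_segment_speeds(probes):
    """probes: iterable of (segment_id, speed_mps) pairs that have
    already been map-matched to road segments.
    Returns average observed speed per segment."""
    total = defaultdict(float)
    count = defaultdict(int)
    for seg, speed in probes:
        total[seg] += speed
        count[seg] += 1
    return {seg: total[seg] / count[seg] for seg in total}

def detect_incidents(current_speeds, freeflow_speeds, ratio=0.5):
    """Flag segments running below `ratio` of their free-flow speed,
    a crude stand-in for probe-based incident detection."""
    return [seg for seg, v in current_speeds.items()
            if v < ratio * freeflow_speeds.get(seg, float("inf"))]
```

The per-segment averages become the new edge weights the routing engine reads; flagged segments can additionally trigger incident workflows.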

ETA Prediction

Simple ETA: sum of (segment_distance / current_speed) along route. Production ETA uses ML: gradient boosted model trained on historical travel times. Features: time of day, day of week, weather, current traffic, route characteristics. ETA updated every 30s during navigation.
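
The simple per-segment sum is one line; the ML model then adjusts this baseline rather than replacing it. A sketch with illustrative numbers:

```python
def simple_eta(route_segments):
    """route_segments: list of (distance_m, current_speed_mps) pairs
    along the chosen route. Returns the naive ETA in seconds."""
    return sum(dist / speed for dist, speed in route_segments)
```

For example, a 1 km segment at 10 m/s plus a 2 km segment at 20 m/s gives a 200-second baseline ETA, recomputed as traffic updates the per-segment speeds.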

Search and Geocoding

  • Forward geocoding: address string to lat/lng. Elasticsearch index on POI and address data.
  • Reverse geocoding: lat/lng to address. Spatial index lookup (quadtree/S2) to find nearest road segment and building.
  • Autocomplete: prefix search on indexed place names. Redis sorted set or Elasticsearch edge n-grams.
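
Without Redis or Elasticsearch at hand, prefix search over a sorted list of names shows the core idea: binary-search to the first candidate, then scan while the prefix matches (place names here are illustrative):

```python
import bisect

class Autocomplete:
    """Prefix search via binary search over a sorted, lowercased list.
    Sketches the same contract a Redis sorted set or ES edge n-gram
    index would serve at scale."""

    def __init__(self, names):
        self.names = sorted(n.lower() for n in names)

    def suggest(self, prefix, limit=5):
        prefix = prefix.lower()
        i = bisect.bisect_left(self.names, prefix)
        out = []
        while (i < len(self.names) and len(out) < limit
               and self.names[i].startswith(prefix)):
            out.append(self.names[i])
            i += 1
        return out
```

Production systems additionally rank suggestions by popularity and proximity to the user rather than returning them in lexicographic order.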

Interview Tips

  • Lead with geospatial indexing (geohash, S2) – it is the foundational concept
  • Explain why plain Dijkstra is too slow and introduce Contraction Hierarchies
  • Discuss map tiles: pre-rendering, CDN caching, vector vs raster
  • Explain how probe GPS data becomes traffic edge weights (map-matching)
  • Know ETA = route time sum, updated with real-time traffic



{
"@context": "https://schema.org",
"@type": "FAQPage",
"mainEntity": [
{
"@type": "Question",
"name": "What is geohash and how is it used in a maps system?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Geohash encodes a latitude/longitude coordinate as a base-32 string. Nearby locations share long geohash prefixes, enabling proximity queries as prefix searches. A 6-character geohash represents an area of about 1.2km x 0.6km. For finding nearby restaurants: encode user location as geohash, search for all POIs with matching prefix. Shorter prefix = larger area (zoom out). Geohash is simple to implement but has edge cases at cell boundaries (two nearby points may have very different geohashes at the cell edge)."
}
},
{
"@type": "Question",
"name": "Why is Dijkstra too slow for continent-scale routing and what is the alternative?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Dijkstra explores all nodes with cost up to the optimal path cost. On a continental road network with billions of nodes and edges, this is too slow (minutes per query). Contraction Hierarchies (CH) preprocess the graph by contracting unimportant nodes (adding shortcut edges for frequently used paths). Query uses bidirectional Dijkstra on the contracted graph, exploring only the most important nodes. This gives millisecond-level routing for continent-scale queries. A* with a good heuristic is an intermediate solution for smaller networks."
}
},
{
"@type": "Question",
"name": "How does real-time traffic data get integrated into routing?",
"acceptedAnswer": {
"@type": "Answer",
"text": "GPS probe data from active navigation users is continuously sent to backend systems. A stream processor (Flink) map-matches raw GPS coordinates to road segments in the graph. It computes average speed per segment by aggregating GPS probe speeds. These speeds become the edge weights in the routing graph. The routing engine reads updated weights from a traffic database (updated every 30-60 seconds). User incident reports (accidents, closures) are also incorporated as edge weight increases or edge removals."
}
},
{
"@type": "Question",
"name": "How are map tiles served at scale?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Map tiles are pre-rendered 256×256 pixel images at each zoom level (0-22). At zoom 0, the entire world is one tile. Each zoom level doubles the number of tiles in each dimension. Tiles are addressed by (zoom, x, y). Pre-rendered tiles are stored in S3 and served via CDN. Popular areas (cities) have near-100% CDN cache hit rates. Tile TTL: days for base tiles (roads rarely change), minutes for traffic overlays. Vector tiles (MVT format) are sent as raw data and rendered client-side, allowing dynamic styling and smaller transfer sizes."
}
},
{
"@type": "Question",
"name": "What geospatial data structure does Google Maps use internally?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Google Maps uses S2, a library based on Hilbert space-filling curves. S2 maps the Earth sphere to a cube, then uses Hilbert curves to create a 1D index where nearby 2D points remain nearby in the 1D ordering. S2 cells can represent regions at varying granularities. Benefits over geohash: no discontinuities at cell edges, better area coverage, hierarchical decomposition, and efficient set operations (union, intersection of regions). S2 is open-source and used for spatial indexing, coverage areas, and POI lookup."
}
}
]
}
