System Design Interview: Design a Maps and Navigation System (Google Maps)


Designing a maps and navigation system is a common interview question at Uber, Lyft, Google, Apple, and Grab. The problem covers geospatial data storage, routing algorithms, real-time traffic, map tile serving, and ETA prediction.

Requirements Clarification

Functional Requirements

  • Display map tiles at different zoom levels
  • Search for locations (addresses, POIs)
  • Route from point A to point B (driving, walking, transit)
  • Turn-by-turn navigation with live traffic updates
  • ETA estimation
  • Report incidents (accidents, road closures)

Non-Functional Requirements

  • Scale: 1B users, 1M active navigators at peak
  • Map tile serving: <100ms
  • Routing: <500ms for cross-city routes
  • Traffic update freshness: <1 minute

Map Data Storage

Road Network: Graph Database

Roads and intersections modeled as a directed weighted graph:

  • Nodes: intersections, points of interest
  • Edges: road segments with properties (distance, speed limit, road type, one-way)
  • Edge weights: current travel time (updated with real-time traffic)

Storage: custom binary graph format for routing engine (not a general-purpose graph DB). Graph partitioned by geographic tile for distributed routing. OpenStreetMap (OSM) is the source data for many mapping systems.
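
In memory, the node/edge model above maps to a plain adjacency list. A toy sketch (node ids, distances, and travel times are illustrative, not real data):

```python
# Toy adjacency-list road graph. Each directed edge carries static
# properties plus a mutable travel-time weight that real-time traffic
# updates would overwrite.
road_graph = {
    "n1": [
        {"to": "n2", "distance_m": 250, "speed_limit_kmh": 50,
         "one_way": False, "travel_time_s": 18.0},
        {"to": "n3", "distance_m": 900, "speed_limit_kmh": 80,
         "one_way": True, "travel_time_s": 40.5},
    ],
    "n2": [
        {"to": "n1", "distance_m": 250, "speed_limit_kmh": 50,
         "one_way": False, "travel_time_s": 18.0},
    ],
    "n3": [],  # one-way edges lead here; nothing leaves
}
```

A two-way street appears as two directed edges (n1→n2 and n2→n1); a one-way street appears once. Production engines pack the same structure into a compact binary array layout for cache efficiency.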

Geospatial Indexing

Find nearest nodes, POIs, or road segments to a GPS coordinate:

  • Geohash: encode lat/lng as a base-32 string. Nearby locations share geohash prefixes, so proximity queries become prefix searches on an ordinary index. Resolution depends on string length (6 chars ≈ a 1.2 km × 0.6 km cell). Caveat: two nearby points on opposite sides of a cell boundary can have very different geohashes.
  • Quadtree: recursively divide the map into 4 quadrants; navigate the tree to find entities in a bounding box. The same hierarchical decomposition underlies S2's cell structure; PostGIS uses a related R-tree (GiST) index for the same job.
  • S2 (Google): Hilbert curve-based spatial indexing. Maps 2D sphere to 1D curve preserving locality. Google Maps uses S2 cells internally.
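
The geohash scheme above fits in a few lines: alternately bisect the longitude and latitude ranges, collect one bit per bisection, and map each 5-bit group to the standard base-32 alphabet. A minimal sketch:

```python
# Minimal geohash encoder following the standard definition:
# interleave longitude/latitude bisection bits, then emit base-32 chars.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat: float, lng: float, precision: int = 6) -> str:
    lat_lo, lat_hi = -90.0, 90.0
    lng_lo, lng_hi = -180.0, 180.0
    bits = []
    even = True  # geohash starts with a longitude bit
    while len(bits) < precision * 5:
        if even:
            mid = (lng_lo + lng_hi) / 2
            if lng >= mid:
                bits.append(1)
                lng_lo = mid
            else:
                bits.append(0)
                lng_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits.append(1)
                lat_lo = mid
            else:
                bits.append(0)
                lat_hi = mid
        even = not even
    chars = []
    for i in range(0, len(bits), 5):
        idx = 0
        for b in bits[i:i + 5]:
            idx = idx * 2 + b
        chars.append(BASE32[idx])
    return "".join(chars)
```

`geohash_encode(57.64911, 10.40744, 7)` yields `"u4pruyd"` (the classic example coordinate). A lower precision simply truncates the same string, which is exactly why prefix search works for proximity.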

Map Tile Serving

Maps are rendered as tiles (256×256 or 512×512 pixel PNG/WebP images) at different zoom levels (0-22). Tile URL: /tiles/{z}/{x}/{y}.png where z=zoom, x/y=tile coordinates.
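
The {z}/{x}/{y} addressing follows the standard Web Mercator ("slippy map") tiling math: 2^z tiles per axis, x derived linearly from longitude, y from the Mercator projection of latitude. A sketch:

```python
import math

def latlng_to_tile(lat: float, lng: float, zoom: int) -> tuple[int, int]:
    """Standard Web Mercator tile coordinates for a lat/lng at a zoom level.
    There are 2**zoom tiles along each axis."""
    n = 2 ** zoom
    x = int((lng + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y
```

At zoom 0 the whole world is tile (0, 0); each zoom step quadruples the tile count. San Francisco (37.7749, -122.4194) at zoom 12 falls in tile (655, 1583).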

Tile generation pipeline:
Raw OSM data -> Tile rendering (Mapnik/vector tiles) -> Tile storage (S3)
                                                              |
                                                        CDN edge cache
                                                              |
                                                         User browser

Tiles are pre-rendered and cached aggressively (TTL: days for base tiles, minutes for traffic overlay). Cache hit rate: 99%+ for popular areas. Vector tiles (MVT format) are smaller and rendered client-side with styling flexibility.

Routing Algorithm

Dijkstra vs A*

  • Dijkstra: finds shortest path in weighted graph. Explores all nodes with cost <= current best. O((V+E) log V) with priority queue. Too slow for continental routing (billions of nodes).
  • A* (A-star): Dijkstra + heuristic (straight-line distance to destination). Guides search toward destination, exploring fewer nodes. Optimal when heuristic is admissible (never overestimates).
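
A minimal A* sketch over a travel-time graph. To keep the heuristic simple it assumes planar node coordinates in meters (a toy stand-in for projected lat/lng); straight-line distance divided by a maximum road speed never overestimates travel time, so the heuristic stays admissible:

```python
import heapq
import math

def a_star(graph, pos, start, goal, max_speed=30.0):
    """A* shortest path on a travel-time graph.
    graph: {node: [(neighbor, travel_time_s), ...]}
    pos:   {node: (x_m, y_m)} planar coordinates in meters.
    Heuristic: straight-line distance at max_speed m/s (admissible)."""
    def h(n):
        dx = pos[n][0] - pos[goal][0]
        dy = pos[n][1] - pos[goal][1]
        return math.hypot(dx, dy) / max_speed

    dist = {start: 0.0}
    parent = {start: None}
    pq = [(h(start), start)]
    settled = set()
    while pq:
        _, u = heapq.heappop(pq)
        if u == goal:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return dist[goal], path[::-1]
        if u in settled:
            continue
        settled.add(u)
        for v, w in graph.get(u, []):
            nd = dist[u] + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(pq, (nd + h(v), v))
    return math.inf, []

# Toy network: A -> B -> C (20s total) beats the direct A -> C edge (25s).
toy_graph = {"A": [("B", 10.0), ("C", 25.0)], "B": [("C", 10.0)], "C": []}
toy_pos = {"A": (0.0, 0.0), "B": (300.0, 0.0), "C": (600.0, 0.0)}
```

Dropping the heuristic (returning 0 from `h`) turns this into plain Dijkstra, which makes the two algorithms easy to contrast in an interview.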

Contraction Hierarchies (CH)

Used in production routing engines such as OSRM and Valhalla (Google Maps reportedly uses related hierarchical techniques). A preprocessing step contracts unimportant nodes, adding shortcut edges; queries then run bidirectional Dijkstra on the contracted graph, touching only important nodes. A New York to LA route computes in milliseconds, roughly 1000x faster than plain Dijkstra on real road networks.
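
The contraction step can be illustrated on a toy directed graph. This sketch always adds a shortcut when it improves the known distance and then deletes the node; real CH implementations contract nodes in importance order, run "witness" searches to skip unnecessary shortcuts, and keep all nodes around for the bidirectional query phase:

```python
import math

def contract_node(graph, node):
    """Toy CH contraction: for every (in-neighbor, out-neighbor) pair of
    `node`, add a shortcut edge u -> v if routing through `node` beats
    the best known u -> v weight, then remove `node`.
    graph: {u: {v: weight}} directed adjacency map, mutated in place."""
    incoming = [(u, nbrs[node]) for u, nbrs in graph.items() if node in nbrs]
    outgoing = list(graph.get(node, {}).items())
    for u, w_in in incoming:
        for v, w_out in outgoing:
            if u == v:
                continue
            shortcut = w_in + w_out
            if shortcut < graph[u].get(v, math.inf):
                graph[u][v] = shortcut  # shortcut stands in for u -> node -> v
    graph.pop(node, None)
    for nbrs in graph.values():
        nbrs.pop(node, None)
    return graph
```

Contracting the middle node of A→B→C (weights 1 and 2) leaves a single A→C shortcut of weight 3, so later queries never need to expand B.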

Real-Time Traffic

Two sources of traffic data:

  • Probe data: GPS coordinates from phones of active navigators. Aggregate speed on road segments. Privacy: aggregate anonymized data, not individual tracks.
  • Incident reports: user reports (accidents, closures) + automated detection from probe speed drops.

Phone GPS -> Traffic Ingest API -> Kafka
                                    |
                            Stream Processor (Flink)
                            - Map-match GPS to road segment
                            - Compute average speed per segment
                            - Detect speed anomalies (incidents)
                                    |
                            Traffic DB (edge weight updates)
                                    |
                            Routing engine reads updated weights
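
A toy version of the aggregation and anomaly-detection steps in the pipeline above. Segment ids and the 50% threshold are illustrative; a real Flink job would also window probes by event time and handle map-matching upstream:

```python
from collections import defaultdict

def aggregate_segment_speeds(probes):
    """probes: iterable of (segment_id, speed_mps) pairs that have
    already been map-matched to road segments.
    Returns average observed speed per segment."""
    total = defaultdict(float)
    count = defaultdict(int)
    for seg, speed in probes:
        total[seg] += speed
        count[seg] += 1
    return {seg: total[seg] / count[seg] for seg in total}

def detect_incidents(current_speeds, freeflow_speeds, ratio=0.5):
    """Flag segments running below `ratio` of their free-flow speed,
    a crude stand-in for probe-based incident detection."""
    return [seg for seg, v in current_speeds.items()
            if v < ratio * freeflow_speeds.get(seg, float("inf"))]
```

The per-segment averages become the new edge weights the routing engine reads; flagged segments can additionally trigger incident workflows.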

ETA Prediction

Simple ETA: sum of (segment_distance / current_speed) along route. Production ETA uses ML: gradient boosted model trained on historical travel times. Features: time of day, day of week, weather, current traffic, route characteristics. ETA updated every 30s during navigation.
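
The simple per-segment sum is one line; the ML model then adjusts this baseline rather than replacing it. A sketch with illustrative numbers:

```python
def simple_eta(route_segments):
    """route_segments: list of (distance_m, current_speed_mps) pairs
    along the chosen route. Returns the naive ETA in seconds."""
    return sum(dist / speed for dist, speed in route_segments)
```

For example, a 1 km segment at 10 m/s plus a 2 km segment at 20 m/s gives a 200-second baseline ETA, recomputed as traffic updates the per-segment speeds.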

Search and Geocoding

  • Forward geocoding: address string to lat/lng. Elasticsearch index on POI and address data.
  • Reverse geocoding: lat/lng to address. Spatial index lookup (quadtree/S2) to find nearest road segment and building.
  • Autocomplete: prefix search on indexed place names. Redis sorted set or Elasticsearch edge n-grams.
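
Without Redis or Elasticsearch at hand, prefix search over a sorted list of names shows the core idea: binary-search to the first candidate, then scan while the prefix matches (place names here are illustrative):

```python
import bisect

class Autocomplete:
    """Prefix search via binary search over a sorted, lowercased list.
    Sketches the same contract a Redis sorted set or ES edge n-gram
    index would serve at scale."""

    def __init__(self, names):
        self.names = sorted(n.lower() for n in names)

    def suggest(self, prefix, limit=5):
        prefix = prefix.lower()
        i = bisect.bisect_left(self.names, prefix)
        out = []
        while (i < len(self.names) and len(out) < limit
               and self.names[i].startswith(prefix)):
            out.append(self.names[i])
            i += 1
        return out
```

Production systems additionally rank suggestions by popularity and proximity to the user rather than returning them in lexicographic order.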

Interview Tips

  • Lead with geospatial indexing (geohash, S2) – it is the foundational concept
  • Explain why plain Dijkstra is too slow and introduce Contraction Hierarchies
  • Discuss map tiles: pre-rendering, CDN caching, vector vs raster
  • Explain how probe GPS data becomes traffic edge weights (map-matching)
  • Know ETA = route time sum, updated with real-time traffic



{
"@context": "https://schema.org",
"@type": "FAQPage",
"mainEntity": [
{
"@type": "Question",
"name": "What is geohash and how is it used in a maps system?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Geohash encodes a latitude/longitude coordinate as a base-32 string. Nearby locations share long geohash prefixes, enabling proximity queries as prefix searches. A 6-character geohash represents an area of about 1.2km x 0.6km. For finding nearby restaurants: encode user location as geohash, search for all POIs with matching prefix. Shorter prefix = larger area (zoom out). Geohash is simple to implement but has edge cases at cell boundaries (two nearby points may have very different geohashes at the cell edge)."
}
},
{
"@type": "Question",
"name": "Why is Dijkstra too slow for continent-scale routing and what is the alternative?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Dijkstra explores all nodes with cost up to the optimal path cost. On a continental road network with billions of nodes and edges, this is too slow (minutes per query). Contraction Hierarchies (CH) preprocess the graph by contracting unimportant nodes (adding shortcut edges for frequently used paths). Query uses bidirectional Dijkstra on the contracted graph, exploring only the most important nodes. This gives millisecond-level routing for continent-scale queries. A* with a good heuristic is an intermediate solution for smaller networks."
}
},
{
"@type": "Question",
"name": "How does real-time traffic data get integrated into routing?",
"acceptedAnswer": {
"@type": "Answer",
"text": "GPS probe data from active navigation users is continuously sent to backend systems. A stream processor (Flink) map-matches raw GPS coordinates to road segments in the graph. It computes average speed per segment by aggregating GPS probe speeds. These speeds become the edge weights in the routing graph. The routing engine reads updated weights from a traffic database (updated every 30-60 seconds). User incident reports (accidents, closures) are also incorporated as edge weight increases or edge removals."
}
},
{
"@type": "Question",
"name": "How are map tiles served at scale?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Map tiles are pre-rendered 256×256 pixel images at each zoom level (0-22). At zoom 0, the entire world is one tile. Each zoom level doubles the number of tiles in each dimension. Tiles are addressed by (zoom, x, y). Pre-rendered tiles are stored in S3 and served via CDN. Popular areas (cities) have near-100% CDN cache hit rates. Tile TTL: days for base tiles (roads rarely change), minutes for traffic overlays. Vector tiles (MVT format) are sent as raw data and rendered client-side, allowing dynamic styling and smaller transfer sizes."
}
},
{
"@type": "Question",
"name": "What geospatial data structure does Google Maps use internally?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Google Maps uses S2, a library based on Hilbert space-filling curves. S2 maps the Earth sphere to a cube, then uses Hilbert curves to create a 1D index where nearby 2D points remain nearby in the 1D ordering. S2 cells can represent regions at varying granularities. Benefits over geohash: no discontinuities at cell edges, better area coverage, hierarchical decomposition, and efficient set operations (union, intersection of regions). S2 is open-source and used for spatial indexing, coverage areas, and POI lookup."
}
}
]
}
