Introduction
Geospatial indexing enables efficient proximity queries — find all restaurants within 2km — over large datasets of lat/lng coordinates. A naive linear scan is O(N) and becomes too slow once datasets exceed tens of thousands of points. Spatial indexes reduce the search space by organizing coordinates into hierarchical structures that can be traversed in O(log N) or better.
Geohash
Geohash encodes a (lat, lng) pair into a base-32 string of variable length. Each additional character roughly doubles precision by subdividing the geographic cell. A geohash prefix corresponds to a rectangular geographic cell; nearby locations share common prefixes — with one important edge case: locations near a cell boundary may have completely different prefixes despite being physically close.
Geohash is stored as an indexed string column in the database. Proximity query: compute the geohash of the query point, then search the 9 surrounding cells (target cell + 8 neighbors) to handle the boundary edge case. Precision reference: length 6 ≈ 1.2km cell, length 7 ≈ 150m cell. Prefix search with WHERE geohash LIKE ‘abc%’ uses a B-tree index efficiently.
H3 Hexagonal Grid (Uber)
H3 divides the Earth into hexagonal cells at 16 resolution levels. Each cell is identified by a 64-bit integer cell_id. Hexagons have uniform area at each resolution level — unlike geohash rectangles — and equal distance to all 6 neighbors, eliminating the directional bias present in rectangular grids.
k-ring(cell_id, k) returns all cells within k rings of the target cell, providing a clean API for proximity queries. Uber uses H3 for surge pricing zones, driver heat maps, and demand forecasting. H3 resolution 9 cells are approximately 0.1 km² — suitable for city-block-level granularity.
R-tree Index
An R-tree is a tree data structure for spatial indexing. Leaf nodes hold bounding rectangles of spatial objects. Internal nodes hold the minimum bounding rectangle (MBR) of all their children. Range query: traverse the tree, descending only into nodes whose bounding rectangle intersects the query rectangle — branches that do not intersect are pruned entirely.
PostgreSQL’s PostGIS extension uses an R-tree implemented as a GiST index on geometry columns. It supports point, line, and polygon queries. Insertions update MBRs up the tree; the tree periodically rebalances. R-trees outperform geohash for arbitrary polygon queries because they handle non-rectangular shapes natively.
Proximity Query with Geohash
Example: find all drivers within 5km of (lat, lng). Steps: (1) compute geohash of query point at precision 5 (cell size ≈ 4.9km); (2) identify 9 surrounding geohash cells (target + 8 neighbors); (3) query index: WHERE geohash IN (cell1, cell2, …, cell9); (4) filter results to exact distance using the Haversine formula; (5) sort by distance; (6) return top K results.
The geohash index reduces candidates from millions of rows to hundreds, after which the Haversine filter runs cheaply in application memory. The 9-cell search is necessary because a point near a cell edge may have neighbors in adjacent cells that are physically closer than points within its own cell.
Bounding Box Search
A bounding box is defined by (min_lat, max_lat, min_lng, max_lng). Used for map viewport queries: show all POIs visible in the current map view. PostGIS query: SELECT * FROM locations WHERE ST_Within(geom, ST_MakeEnvelope(min_lng, min_lat, max_lng, max_lat, 4326)). The R-tree GiST index handles the envelope intersection efficiently without a full table scan.
Bounding box search is simpler than radius search because rectangles compose naturally with the tree structure. For map tiles, the viewport is a bounding box; the spatial index returns only objects within that box, avoiding scan of the entire dataset regardless of total size.
Quad-tree
A quad-tree divides 2D space into four quadrants recursively. Each quadrant is subdivided until cells contain fewer than a threshold number of points. Unlike geohash, quad-trees are dynamic: they adapt to data density, so dense urban areas are subdivided into finer cells while sparse rural areas remain coarse. This makes quad-trees suitable for non-uniform data distributions.
Google Maps uses quad-trees for tile rendering region management. Spatial queries traverse the tree, descending into quadrants that intersect the query region. Insertion finds the appropriate leaf node, inserts the point, and splits the node if it exceeds the threshold — splitting propagates up the tree as needed.
Frequently Asked Questions: Geospatial Indexing System Design
Q: What are geohash precision levels and their corresponding cell sizes?
A: Geohash precision determines the geographic resolution of each encoded cell. Precision 1 encodes a cell of roughly 5,000 km x 5,000 km. Precision 4 yields ~40 km x 20 km cells, suitable for city-level queries. Precision 6 gives ~1.2 km x 0.6 km cells, useful for neighborhood searches. Precision 8 produces ~38 m x 19 m cells for point-of-interest proximity. Precision 12, the maximum, encodes cells of ~3.7 cm x 1.9 cm. Each additional character halves the cell dimensions alternately in longitude then latitude, doubling spatial resolution.
Q: Why is a 9-cell neighbor search necessary for geohash boundary edge cases?
A: A geohash cell shares borders with eight neighbors. Points near a cell boundary may be physically close to a query origin but fall in an adjacent cell with a completely different hash prefix. Searching only the origin cell misses these nearby points. The standard fix is to compute all eight neighbors of the query cell and search all nine cells (origin plus eight neighbors) in the spatial index. Most geohash libraries expose a neighbors() function that returns the eight surrounding hashes at the same precision level.
Q: How does the H3 hexagonal grid differ from geohash for geospatial indexing?
A: Geohash divides the world into rectangular cells using a Z-order (Morton) curve, which causes variable distortion at high latitudes and unequal neighbor distances at corners vs edges. H3, developed by Uber, uses a hierarchical hexagonal grid. Hexagons have only one distance to all six neighbors (equidistant centroids), which reduces the directional bias present in square grids. H3 supports 16 resolution levels (0–15), with resolution 9 cells covering roughly 0.1 km². H3 is better suited for uniform-distance radius queries and ride-sharing demand heat maps; geohash is simpler to implement and widely supported in databases like Redis and Elasticsearch.
Q: How does an R-tree traverse bounding rectangles to answer a spatial query?
A: An R-tree organizes spatial objects into a hierarchy of minimum bounding rectangles (MBRs). To answer a range or nearest-neighbor query the tree is traversed top-down from the root. At each internal node, the algorithm checks which child MBRs intersect the query rectangle; non-intersecting subtrees are pruned entirely. Leaf nodes contain the actual geometries, which are then tested for precise intersection or distance. Insertion and deletion maintain the invariant that each node’s MBR tightly encloses all its children. Variants like R*-tree improve query performance by minimizing MBR overlap and area during insertion, at the cost of more expensive writes.
Q: What is the Haversine formula and when is it used in geospatial indexing?
A: The Haversine formula computes the great-circle distance between two points on a sphere given their latitudes and longitudes. It is used as the exact distance filter after a coarse candidate set has been retrieved from a spatial index (geohash, H3, R-tree, or a database spatial index). The formula: a = sin²(Δlat/2) + cos(lat1)·cos(lat2)·sin²(Δlon/2); distance = 2·R·arcsin(√a), where R ≈ 6,371 km. For most LLD interview purposes Haversine is accurate enough. Vincenty’s formula or PostGIS geography types handle oblate-spheroid corrections when sub-meter accuracy is required over long distances.
See also: Uber Interview Guide 2026: Dispatch Systems, Geospatial Algorithms, and Marketplace Engineering
See also: Scale AI Interview Guide 2026: Data Infrastructure, RLHF Pipelines, and ML Engineering