Question 1

What are geohash precision levels and their corresponding cell sizes?

Accepted Answer

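Geohash precision (the number of base-32 characters in the hash) determines the geographic resolution of each encoded cell. Precision 1 encodes a cell of roughly 5,000 km x 5,000 km. Precision 4 yields ~39 km x 20 km cells, suitable for city-level queries. Precision 6 gives ~1.2 km x 0.6 km cells, useful for neighborhood searches. Precision 8 produces ~38 m x 19 m cells for point-of-interest proximity. Precision 12, commonly the maximum that libraries and databases support, encodes cells of ~3.7 cm x 1.9 cm. These sizes are measured at the equator; cell width shrinks with the cosine of latitude toward the poles. Each additional character adds 5 bits, which alternate between longitude and latitude starting with longitude, and each bit halves the cell in its dimension. One more character therefore shrinks the cell area by a factor of 32: 8x in one dimension and 4x in the other, with the two roles swapping on every character.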
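The cell sizes above can be reproduced from the bit layout alone. The following is a minimal sketch, assuming a spherical-style conversion from degrees to kilometers at the equator; the function name geohash_cell_size_km and the two circumference constants are illustrative choices, not part of any geohash library.

```python
# Approximate Earth circumferences in km (assumed constants for this sketch).
EQUATOR_KM = 40075.017    # around the equator
MERIDIAN_KM = 40007.863   # around the poles (full circle)

def geohash_cell_size_km(precision):
    """Return (width_km, height_km) of a geohash cell at the equator."""
    bits = 5 * precision
    lon_bits = (bits + 1) // 2   # longitude gets the first bit, so the extra one
    lat_bits = bits // 2
    width_deg = 360.0 / (1 << lon_bits)
    height_deg = 180.0 / (1 << lat_bits)
    width_km = width_deg * EQUATOR_KM / 360.0
    height_km = height_deg * MERIDIAN_KM / 360.0
    return width_km, height_km

if __name__ == "__main__":
    for p in (1, 4, 6, 8, 12):
        w, h = geohash_cell_size_km(p)
        print(f"precision {p:2d}: {w:.6g} km x {h:.6g} km")
```

Odd precisions give longitude one more bit than latitude, which is why those cells are close to square in degrees while even precisions are twice as wide as they are tall.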

Question 2

Why is a 9-cell neighbor search necessary for geohash boundary edge cases?

Accepted Answer

A geohash cell shares borders with eight neighbors. Points near a cell boundary may be physically close to a query origin but fall in an adjacent cell whose hash can differ as early as the first character, so a shared prefix is not a reliable test of proximity. Searching only the origin cell misses these nearby points. The standard fix is to compute all eight neighbors of the query cell and search all nine cells (origin plus eight neighbors) in the spatial index. This covers the full search radius only when the cell's smaller dimension is at least as large as the radius, so the precision should be chosen with the radius in mind. Most geohash libraries expose a neighbors() function that returns the eight surrounding hashes at the same precision level.
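As an illustration of the nine-cell search, here is a self-contained sketch that derives the neighbors by stepping one cell width from the cell center and re-encoding. The function names encode, bounds, and nine_cells are hypothetical; real libraries usually compute neighbors with lookup tables or bit arithmetic instead, and their neighbors() signatures vary.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def encode(lat, lon, precision):
    """Encode a point as a geohash; bits alternate lon, lat, lon, ..."""
    lat_lo, lat_hi, lon_lo, lon_hi = -90.0, 90.0, -180.0, 180.0
    chars, value, bit_count, even = [], 0, 0, True
    while len(chars) < precision:
        if even:   # even-numbered bits refine longitude
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:      # odd-numbered bits refine latitude
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        value = (value << 1) | int(bit)
        even = not even
        bit_count += 1
        if bit_count == 5:   # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            value, bit_count = 0, 0
    return "".join(chars)

def bounds(geohash):
    """Return (lat_lo, lat_hi, lon_lo, lon_hi) of the cell."""
    lat_lo, lat_hi, lon_lo, lon_hi = -90.0, 90.0, -180.0, 180.0
    even = True
    for ch in geohash:
        idx = BASE32.index(ch)
        for shift in range(4, -1, -1):
            bit = (idx >> shift) & 1
            if even:
                mid = (lon_lo + lon_hi) / 2
                lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
            else:
                mid = (lat_lo + lat_hi) / 2
                lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
            even = not even
    return lat_lo, lat_hi, lon_lo, lon_hi

def nine_cells(geohash):
    """Return the origin cell plus its neighbors at the same precision."""
    lat_lo, lat_hi, lon_lo, lon_hi = bounds(geohash)
    lat_c, lon_c = (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2
    dlat, dlon = lat_hi - lat_lo, lon_hi - lon_lo
    cells = []
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            lat = lat_c + dy * dlat
            if not -90.0 <= lat <= 90.0:
                continue   # no cell exists beyond a pole
            lon = (lon_c + dx * dlon + 180.0) % 360.0 - 180.0   # wrap at the antimeridian
            cells.append(encode(lat, lon, len(geohash)))
    return cells

if __name__ == "__main__":
    west = encode(51.5, -0.0001, 6)   # just west of the prime meridian
    east = encode(51.5, 0.0001, 6)    # just east of it, about 14 m away
    print(west, east, west in nine_cells(east))
```

The two points in the usage example are meters apart, yet their hashes start with different characters because the prime meridian is the very first longitude split. The nine-cell lookup on either point still finds the other.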

Question 3

How does the H3 hexagonal grid differ from geohash for geospatial indexing?

Accepted Answer

Geohash divides the world into latitude/longitude rectangles and orders them along a Z-order (Morton) curve. Because the grid is defined in degrees, the ground width of a cell shrinks toward the poles, and each cell has eight neighbors at two different centroid distances: the four edge neighbors are closer than the four corner neighbors. H3, developed by Uber, uses a hierarchical hexagonal grid. A hexagon's six neighbors all share an edge with it and have approximately equidistant centroids, which reduces the directional bias present in square grids. H3 supports 16 resolution levels (0–15), with resolution 9 cells averaging roughly 0.1 km². H3 is better suited for uniform-distance radius queries and ride-sharing demand heat maps; geohash is simpler to implement and widely supported in databases like Redis and Elasticsearch.
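The neighbor-distance difference can be checked with plain planar geometry. This sketch does not use the H3 library or its cell indexing; it models an idealized flat hexagonal grid in axial coordinates (pointy-top layout) and a unit square grid, and the function names are illustrative only.

```python
import math

def square_neighbor_distances(size=1.0):
    """Distinct centroid distances from a square cell to its 8 neighbors."""
    offsets = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
    return sorted({round(math.hypot(dx * size, dy * size), 9) for dx, dy in offsets})

def hex_center(q, r, size=1.0):
    """Planar center of a pointy-top hexagon at axial coordinates (q, r)."""
    return (size * math.sqrt(3) * (q + r / 2), size * 1.5 * r)

# The six axial-coordinate offsets of a hexagon's edge-sharing neighbors.
HEX_NEIGHBORS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]

def hex_neighbor_distances(size=1.0):
    """Distinct centroid distances from a hexagon to its 6 neighbors."""
    cx, cy = hex_center(0, 0, size)
    centers = (hex_center(q, r, size) for q, r in HEX_NEIGHBORS)
    return sorted({round(math.hypot(x - cx, y - cy), 9) for x, y in centers})

if __name__ == "__main__":
    print("square:", square_neighbor_distances())   # two distinct distances
    print("hex:   ", hex_neighbor_distances())      # a single distance
```

On a real sphere H3 cells are slightly distorted and the grid includes 12 pentagons per resolution, so the equal spacing holds only approximately there.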

Question 4

How does an R-tree traverse bounding rectangles to answer a spatial query?

Accepted Answer

An R-tree organizes spatial objects into a hierarchy of minimum bounding rectangles (MBRs). To answer a range query the tree is traversed top-down from the root. At each internal node, the algorithm checks which child MBRs intersect the query rectangle; non-intersecting subtrees are pruned entirely. Nearest-neighbor queries use the same hierarchy but typically visit nodes in order of their MBR's distance to the query point (a best-first search with a priority queue), pruning subtrees that lie farther away than the best candidate found so far. Leaf entries hold each object's MBR together with the geometry or a reference to it, and the geometry is then tested for precise intersection or distance. Insertion and deletion maintain the invariant that each node's MBR tightly encloses all its children. Variants like the R*-tree improve query performance by minimizing MBR overlap and area during insertion, at the cost of more expensive writes.
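The pruning step of a range query can be sketched as follows. This is a simplified, static illustration rather than a full R-tree: the tree is bulk-loaded once by sorting on min_x and packing entries into nodes, there is no dynamic insertion or node splitting, and Node, build, and search are hypothetical names.

```python
from dataclasses import dataclass, field

# Rectangles are (min_x, min_y, max_x, max_y) tuples.

def intersects(a, b):
    """True if rectangles a and b overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def union(rects):
    """Smallest rectangle enclosing all given rectangles (the MBR)."""
    rects = list(rects)
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

@dataclass
class Node:
    mbr: tuple
    children: list = field(default_factory=list)   # child Nodes (internal node)
    entries: list = field(default_factory=list)    # (rect, obj) pairs (leaf node)

def chunks(seq, size):
    return [seq[i:i + size] for i in range(0, len(seq), size)]

def build(items, fanout=4):
    """Naive bulk load: sort by min_x, pack leaves, then pack levels upward."""
    if not items:
        raise ValueError("build() needs at least one (rect, obj) item")
    items = sorted(items, key=lambda e: e[0][0])
    level = [Node(union(r for r, _ in c), entries=c) for c in chunks(items, fanout)]
    while len(level) > 1:
        level = [Node(union(n.mbr for n in c), children=c) for c in chunks(level, fanout)]
    return level[0]

def search(node, query):
    """Return objects whose rectangles intersect the query rectangle."""
    if not intersects(node.mbr, query):
        return []   # prune: nothing below this node can match
    hits = [obj for rect, obj in node.entries if intersects(rect, query)]
    for child in node.children:
        hits.extend(search(child, query))
    return hits

if __name__ == "__main__":
    items = [((i, i, i + 0.5, i + 0.5), f"p{i}") for i in range(20)]
    root = build(items)
    print(search(root, (3, 3, 5.2, 5.2)))
```

In a real index the leaf hits are only candidates by bounding box, and the exact geometry test described above would still run on each of them.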

Question 5

What is the Haversine formula and when is it used in geospatial indexing?

Accepted Answer

The Haversine formula computes the great-circle distance between two points on a sphere given their latitudes and longitudes. It is used as the precise distance filter after a coarse candidate set has been retrieved from a spatial index (geohash, H3, R-tree, or a database spatial index). The formula, with all angles in radians: a = sin²(Δlat/2) + cos(lat1)·cos(lat2)·sin²(Δlon/2); distance = 2·R·arcsin(√a), where R ≈ 6,371 km is the mean Earth radius. For most LLD interview purposes Haversine is accurate enough; treating the Earth as a sphere introduces an error of up to about 0.5%. Vincenty's formula or PostGIS geography types handle oblate-spheroid corrections when sub-meter accuracy is required over long distances.
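A direct translation of the formula, followed by the refinement step it is typically used for, might look like the sketch below. The helper within_radius and the (name, lat, lon) candidate tuple shape are assumptions made for illustration; the candidates stand in for whatever the spatial index returned.

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius used by the formula above

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))

def within_radius(origin, candidates, radius_km):
    """Keep only candidates (name, lat, lon) within radius_km of origin (lat, lon)."""
    lat0, lon0 = origin
    return [name for name, lat, lon in candidates
            if haversine_km(lat0, lon0, lat, lon) <= radius_km]

if __name__ == "__main__":
    candidates = [("a", 0.0, 0.5), ("b", 0.0, 2.0)]   # e.g. from a nine-cell lookup
    print(within_radius((0.0, 0.0), candidates, 100.0))
```

One degree of longitude at the equator comes out to about 111.19 km with this radius, which is a quick sanity check for any implementation.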

Low Level Design: Geospatial Indexing System

Introduction

Geohash

H3 Hexagonal Grid (Uber)

R-tree Index

Proximity Query with Geohash

Bounding Box Search

Quad-tree

Frequently Asked Questions: Geospatial Indexing System Design