What is a geospatial index and why is it used in system design?

A geospatial index is a data structure that organizes geographic data to enable efficient spatial queries such as finding all points within a bounding box or nearest-neighbor lookups. It is used in system design to avoid full-table scans over millions of location records, reducing query latency from seconds to milliseconds for location-aware features like restaurant search or ride matching.

What are the most common geospatial indexing techniques compared in interviews?

The most common techniques are QuadTree, R-Tree, Geohash, and S2 cells. QuadTree recursively subdivides a 2D plane into four quadrants and is intuitive to explain. Geohash encodes latitude/longitude into a base-32 string where shared prefixes indicate proximity. S2 uses a Hilbert curve to map the sphere to a 1D index. R-Tree is the standard in relational databases like PostGIS. Interviewers typically expect you to justify your choice based on update frequency, query shape, and storage constraints.

How does Geohash enable proximity search, and what are its limitations?

Geohash encodes a location into a hierarchical string where longer strings represent finer precision. Nearby locations share a common prefix, so a proximity query becomes a prefix search. The main limitation is boundary anomalies: two points very close to each other across a Geohash cell boundary may have completely different hashes, requiring queries across all 8 neighboring cells to avoid missing results.

How would you design a scalable geospatial index for a service handling 100 million locations?

At 100 million locations, an in-memory QuadTree or Geohash shard per region is a common approach. Partition data by geohash prefix so each shard owns a geographic tile. Use a consistent hashing ring or a directory service to route queries to the correct shard. For high write throughput such as live driver locations, keep the index in Redis using sorted sets with geospatial commands (GEOADD, GEORADIUS), which provide O(N+log M) range queries. Persist to a database asynchronously and rebuild shards on failure.

Low Level Design: Geospatial Index Service

⏱ 4 min read

What Is a Geospatial Index Service?

A Geospatial Index Service stores, indexes, and queries geographic coordinates at scale. It powers features like restaurant search, ride-hailing dispatch, friend-nearby lists, and asset tracking dashboards. The core challenge is answering bounding-box and radius queries over millions of lat/lng points with single-digit-millisecond latency.

Data Model and Schema

Every indexed entity is stored with a canonical row that captures its position and a pre-computed geohash string:

CREATE TABLE locations (
    entity_id   BIGINT       NOT NULL,
    entity_type VARCHAR(32)  NOT NULL,  -- e.g. 'driver', 'restaurant', 'asset'
    lat         DOUBLE       NOT NULL,
    lng         DOUBLE       NOT NULL,
    geohash     CHAR(12)     NOT NULL,
    updated_at  BIGINT       NOT NULL,  -- Unix ms timestamp
    PRIMARY KEY (entity_id, entity_type),
    INDEX idx_geohash (geohash),
    INDEX idx_updated (updated_at)
);

Geohash encodes a latitude/longitude pair into a base-32 string where shared prefixes mean geographic proximity. A precision-6 geohash covers roughly a 1.2 km x 0.6 km cell, making prefix queries on the index fast and predictable.

For more complex polygon queries, a secondary spatial index using an R-tree is maintained either via PostGIS (GEOGRAPHY columns with GIST indexes) or an in-memory structure built on startup from a Redis sorted set.

Core Algorithm: Radius Search via Geohash Ring

Convert the query point (lat, lng) and radius r into a target geohash precision level p such that each cell diagonal is roughly equal to r.
Compute the cell containing the query point and its 8 neighbors, yielding a 9-cell cover of the search area.
Prefix-scan the geohash index for all 9 prefixes: SELECT * FROM locations WHERE geohash LIKE 'u4pruyd%' for each neighbor cell.
Post-filter results with the Haversine formula to discard candidates that fall outside the exact circle.
Rank by distance and return the top-K results.

For rectangular bounding-box queries the neighbor expansion step is replaced with a multi-prefix union covering all cells that intersect the box.

Quadtree Alternative

When the entity distribution is highly non-uniform (dense city centers, sparse rural areas) a quadtree outperforms geohash because it adapts cell granularity to point density. Each leaf node stores up to B points; when full it splits into four quadrants. Traversal depth is O(log n) and nearest-neighbor search prunes entire subtrees using min-corner distance.

Scalability Considerations

Sharding: Partition by top-level geohash prefix (e.g., first 2 characters) so each shard owns a geographic region. Cross-shard queries near region boundaries need a scatter-gather fan-out, but these are rare.
Read replicas: Radius queries are read-heavy. Route all reads to replicas with lag tolerance of <5 seconds; most location data is stale within seconds anyway.
Caching: Cache the 9-cell geohash neighbor lists in application memory; they are pure functions of the precision level and do not change.
Index warm-up: On startup, load the full geohash column into a Redis sorted set scored by geohash numeric encoding for sub-millisecond in-memory prefix scans.

Summary

A Geospatial Index Service is a thin spatial query engine over a geohash-indexed table, a Redis sorted set for hot reads, and optional R-tree/PostGIS for polygon workloads. Design interviews should highlight the geohash neighbor expansion pattern, the Haversine post-filter step, and shard-by-prefix to handle geographic hotspots without cross-region joins.