What Is a Geospatial Index Service?
A Geospatial Index Service stores, indexes, and queries geographic coordinates at scale. It powers features like restaurant search, ride-hailing dispatch, friend-nearby lists, and asset tracking dashboards. The core challenge is answering bounding-box and radius queries over millions of lat/lng points with single-digit-millisecond latency.
Data Model and Schema
Every indexed entity is stored with a canonical row that captures its position and a pre-computed geohash string:
CREATE TABLE locations (
entity_id BIGINT NOT NULL,
entity_type VARCHAR(32) NOT NULL, -- e.g. 'driver', 'restaurant', 'asset'
lat DOUBLE NOT NULL,
lng DOUBLE NOT NULL,
geohash CHAR(12) NOT NULL,
updated_at BIGINT NOT NULL, -- Unix ms timestamp
PRIMARY KEY (entity_id, entity_type),
INDEX idx_geohash (geohash),
INDEX idx_updated (updated_at)
);
Geohash encodes a latitude/longitude pair into a base-32 string where shared prefixes mean geographic proximity. A precision-6 geohash covers roughly a 1.2 km x 0.6 km cell, making prefix queries on the index fast and predictable.
For more complex polygon queries, a secondary spatial index using an R-tree is maintained either via PostGIS (GEOGRAPHY columns with GIST indexes) or an in-memory structure built on startup from a Redis sorted set.
Core Algorithm: Radius Search via Geohash Ring
- Convert the query point (lat, lng) and radius r into a target geohash precision level p such that each cell diagonal is roughly equal to r.
- Compute the cell containing the query point and its 8 neighbors, yielding a 9-cell cover of the search area.
- Prefix-scan the geohash index for all 9 prefixes:
SELECT * FROM locations WHERE geohash LIKE 'u4pruyd%'for each neighbor cell. - Post-filter results with the Haversine formula to discard candidates that fall outside the exact circle.
- Rank by distance and return the top-K results.
For rectangular bounding-box queries the neighbor expansion step is replaced with a multi-prefix union covering all cells that intersect the box.
Quadtree Alternative
When the entity distribution is highly non-uniform (dense city centers, sparse rural areas) a quadtree outperforms geohash because it adapts cell granularity to point density. Each leaf node stores up to B points; when full it splits into four quadrants. Traversal depth is O(log n) and nearest-neighbor search prunes entire subtrees using min-corner distance.
Scalability Considerations
- Sharding: Partition by top-level geohash prefix (e.g., first 2 characters) so each shard owns a geographic region. Cross-shard queries near region boundaries need a scatter-gather fan-out, but these are rare.
- Read replicas: Radius queries are read-heavy. Route all reads to replicas with lag tolerance of <5 seconds; most location data is stale within seconds anyway.
- Caching: Cache the 9-cell geohash neighbor lists in application memory; they are pure functions of the precision level and do not change.
- Index warm-up: On startup, load the full geohash column into a Redis sorted set scored by geohash numeric encoding for sub-millisecond in-memory prefix scans.
Summary
A Geospatial Index Service is a thin spatial query engine over a geohash-indexed table, a Redis sorted set for hot reads, and optional R-tree/PostGIS for polygon workloads. Design interviews should highlight the geohash neighbor expansion pattern, the Haversine post-filter step, and shard-by-prefix to handle geographic hotspots without cross-region joins.
See also: Uber Interview Guide 2026: Dispatch Systems, Geospatial Algorithms, and Marketplace Engineering
See also: Airbnb Interview Guide 2026: Search Systems, Trust and Safety, and Full-Stack Engineering