Low Level Design: Geospatial Index Service

What Is a Geospatial Index Service?

A Geospatial Index Service stores, indexes, and queries geographic coordinates at scale. It powers features like restaurant search, ride-hailing dispatch, friend-nearby lists, and asset tracking dashboards. The core challenge is answering bounding-box and radius queries over millions of lat/lng points with single-digit-millisecond latency.

Data Model and Schema

Every indexed entity is stored with a canonical row that captures its position and a pre-computed geohash string:

CREATE TABLE locations (
    entity_id   BIGINT       NOT NULL,
    entity_type VARCHAR(32)  NOT NULL,  -- e.g. 'driver', 'restaurant', 'asset'
    lat         DOUBLE       NOT NULL,
    lng         DOUBLE       NOT NULL,
    geohash     CHAR(12)     NOT NULL,
    updated_at  BIGINT       NOT NULL,  -- Unix ms timestamp
    PRIMARY KEY (entity_id, entity_type),
    INDEX idx_geohash (geohash),
    INDEX idx_updated (updated_at)
);

Geohash encodes a latitude/longitude pair into a base-32 string where shared prefixes mean geographic proximity. A precision-6 geohash covers roughly a 1.2 km x 0.6 km cell, making prefix queries on the index fast and predictable.

For more complex polygon queries, a secondary spatial index using an R-tree is maintained either via PostGIS (GEOGRAPHY columns with GIST indexes) or an in-memory structure built on startup from a Redis sorted set.

Core Algorithm: Radius Search via Geohash Ring

  1. Convert the query point (lat, lng) and radius r into a target geohash precision level p such that each cell diagonal is roughly equal to r.
  2. Compute the cell containing the query point and its 8 neighbors, yielding a 9-cell cover of the search area.
  3. Prefix-scan the geohash index for all 9 prefixes: SELECT * FROM locations WHERE geohash LIKE 'u4pruyd%' for each neighbor cell.
  4. Post-filter results with the Haversine formula to discard candidates that fall outside the exact circle.
  5. Rank by distance and return the top-K results.

For rectangular bounding-box queries the neighbor expansion step is replaced with a multi-prefix union covering all cells that intersect the box.

Quadtree Alternative

When the entity distribution is highly non-uniform (dense city centers, sparse rural areas) a quadtree outperforms geohash because it adapts cell granularity to point density. Each leaf node stores up to B points; when full it splits into four quadrants. Traversal depth is O(log n) and nearest-neighbor search prunes entire subtrees using min-corner distance.

Scalability Considerations

  • Sharding: Partition by top-level geohash prefix (e.g., first 2 characters) so each shard owns a geographic region. Cross-shard queries near region boundaries need a scatter-gather fan-out, but these are rare.
  • Read replicas: Radius queries are read-heavy. Route all reads to replicas with lag tolerance of <5 seconds; most location data is stale within seconds anyway.
  • Caching: Cache the 9-cell geohash neighbor lists in application memory; they are pure functions of the precision level and do not change.
  • Index warm-up: On startup, load the full geohash column into a Redis sorted set scored by geohash numeric encoding for sub-millisecond in-memory prefix scans.

Summary

A Geospatial Index Service is a thin spatial query engine over a geohash-indexed table, a Redis sorted set for hot reads, and optional R-tree/PostGIS for polygon workloads. Design interviews should highlight the geohash neighbor expansion pattern, the Haversine post-filter step, and shard-by-prefix to handle geographic hotspots without cross-region joins.

See also: Netflix Interview Guide 2026: Streaming Architecture, Recommendation Systems, and Engineering Excellence

See also: Uber Interview Guide 2026: Dispatch Systems, Geospatial Algorithms, and Marketplace Engineering

See also: Airbnb Interview Guide 2026: Search Systems, Trust and Safety, and Full-Stack Engineering

Scroll to Top