Low Level Design: Geo-Spatial Indexing

⏱ 4 min read

Geo-spatial indexing enables efficient queries on geographic data: finding restaurants within 1km, matching drivers to riders within 500m, showing events near the user’s location. Naive approaches (scanning all locations and computing Haversine distance) are O(N) and unscalable. Geo-spatial indexes — Geohash, R-tree, S2 — reduce location queries to sub-millisecond lookups across millions of points.

Geohash

Geohash encodes latitude/longitude into a short string by recursively dividing the Earth into a grid. At precision 1 (1 character), the world is divided into 32 cells (~5000km x 5000km). At precision 6 (6 characters), cells are ~1.2km x 0.6km. At precision 9, cells are ~5m x 5m. Nearby locations share a common prefix: two points in the same geohash cell share all characters; adjacent cells share most characters. Proximity query: find all points with geohash prefix “u4pruyd” (the cell containing the query point) and all 8 neighboring cells. Store geohash as a string column in a B-tree index and query: WHERE geohash LIKE ‘u4pru%’. Redis GEOADD / GEORADIUSBYMEMBER uses geohash internally.

Redis Geo Commands

Redis provides native geo commands built on geohash and sorted sets. GEOADD locations {lng} {lat} {member}: add a location. GEODIST locations {member1} {member2} km: distance between two members. GEOSEARCH locations FROMMEMBER {member} BYRADIUS 1 km ASC COUNT 10: find the nearest 10 members within 1km, sorted by distance. The underlying sorted set score is the geohash integer value, enabling O(log N) range queries. For ride-sharing, store driver locations and use GEOSEARCH to find the nearest available drivers. GEOADD on every driver GPS update (every 4 seconds) → sorted set sorted by geohash → GEOSEARCH finds nearby drivers in O(log N + K).

R-Tree Index

R-trees (Rectangular Trees) are the standard spatial index in relational databases. PostgreSQL with PostGIS, MySQL 8, SQLite (with SpatiaLite) use R-trees for geometry columns. An R-tree groups nearby objects into bounding rectangles, recursively. Queries: find all restaurants within a polygon, find the nearest point to a location (KNN query), and spatial joins (find all rides that intersect a geographic region). PostGIS: CREATE INDEX ON locations USING GIST (geom); ST_DWithin(geom, ST_MakePoint(lng, lat)::geography, 1000) — find all points within 1000 meters using the spatial index. R-trees handle complex geometries (polygons, lines) beyond what geohash supports.

Google S2 Library

Google S2 maps the Earth’s surface onto a unit cube face and subdivides it into a quadtree. S2 cells are hierarchical: level 0 is one of 6 cube faces (continent-scale), level 30 is ~1cm². S2 cells have consistent area at each level (unlike geohash, which has variable cell shapes). S2 region covering: given a circle or polygon, compute the minimal set of S2 cell IDs that cover the region — then query the index for points whose S2 cell ID falls within those ranges. Used by Google Maps, Foursquare, and ride-sharing companies. Better accuracy for proximity queries (fewer edge cases at cell boundaries) than geohash at the cost of higher implementation complexity.

Indexing Moving Objects

Driver locations update continuously (every 4 seconds for a ride-sharing fleet). A database index optimized for reads (R-tree, geohash) is expensive to update at high write rates — every driver position update requires index modification. Pattern: write location updates to an in-memory store (Redis geo sorted set) optimized for point updates. The location database (PostgreSQL + PostGIS) is updated less frequently or used for historical queries. Read path: GEOSEARCH on Redis for real-time nearby queries. Write path: GEOADD on Redis (O(log N)); async sync to the database for persistence. This separates the real-time location index (Redis) from the persistent location history (database).

Proximity Matching Architecture

A proximity matching system (rider requests a ride → find nearest available driver) must complete in under 1 second. Architecture: drivers publish location updates to a geo index service (Redis geo); when a rider requests a ride, the matching service queries GEOSEARCH for the nearest N available drivers within M km; the matching service contacts each driver candidate in parallel via gRPC and offers the ride; the first driver to accept is matched; unmatched candidates are tried in order. Sharding: for city-scale deployments, one Redis instance per city (partition by geographic region) — no cross-region geo queries needed. Monitor: match latency (p99 N% indicates supply shortage in a region).

Geofencing

Geofencing determines whether a location is inside or outside a defined geographic boundary (airport pickup zone, delivery area, restricted zone). Point-in-polygon test: ray casting algorithm — cast a horizontal ray from the point and count how many polygon edges it crosses (odd = inside, even = outside). For simple convex regions, use bounding box pre-filter (reject obvious misses before expensive polygon test). Index geofences for “which geofences does this point fall in?”: store each geofence as a PostGIS polygon; spatial index query ST_Within(point, geofence_polygon). For real-time geofence entry/exit events, compare the current and previous location against all relevant geofences on each location update — trigger events when status changes from outside to inside.