System Design Interview: Design a Geospatial Service (Nearby Search)

Geospatial services power “find restaurants near me,” Uber driver lookup, and Airbnb property search. The core challenge is efficiently querying for points within a radius or bounding box from billions of stored locations. This guide covers the data structures and databases used in production.

Requirements

Functional: store location data (latitude, longitude) for entities (restaurants, drivers, properties), query all entities within radius R of a given point, query within a bounding box, support real-time updates (moving drivers), support metadata filtering (category, rating).

Non-functional: 1B locations stored, <100ms radius query, 100K writes/second (moving drivers), support for global scale.
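A quick back-of-envelope estimate helps frame these numbers (the 64-byte record size is an assumed figure, not part of the requirements):

```python
# Back-of-envelope sizing — record size is an assumption for illustration
locations = 1_000_000_000          # 1B stored locations
bytes_per_row = 64                 # id + lat/lng doubles + geohash + metadata pointer
storage_gb = locations * bytes_per_row / 1e9
print(storage_gb)                  # → 64.0 GB of raw data, before indexes/replication

writes_per_sec = 100_000           # moving drivers
seconds_per_day = 86_400
print(writes_per_sec * seconds_per_day)  # → 8640000000 location updates per day
```

Tens of gigabytes of point data fits on a few shards; the write rate, not storage, is what pushes the design toward an in-memory store for moving objects.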

Why Naive Approaches Fail

-- Naive: compute the distance to every one of 1B rows
-- (note: standard SQL cannot reference the alias in WHERE,
--  and PostGIS point constructors take (lng, lat) order):
SELECT id, lat, lng,
       ST_Distance(ST_MakePoint(lng, lat)::geography,
                   ST_MakePoint(user_lng, user_lat)::geography) AS dist
FROM locations
WHERE ST_Distance(ST_MakePoint(lng, lat)::geography,
                  ST_MakePoint(user_lng, user_lat)::geography) < 1000  -- meters
ORDER BY dist;
-- Problem: no index can help here — a full table scan on 1B rows,
-- with a spherical distance computation per row

We need a spatial index that eliminates the vast majority of rows before computing distances.
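A minimal sketch of the idea: bucket points into a coarse fixed grid so a query touches only the 3×3 block of cells around the target rather than every row. (The dict-based index, cell size, and names here are illustrative, not a production design.)

```python
from collections import defaultdict

CELL = 0.01  # degrees per cell — roughly 1.1 km of latitude (illustrative)

def cell_of(lat, lng):
    # floor division keeps cell assignment consistent for negative coordinates
    return (int(lat // CELL), int(lng // CELL))

index = defaultdict(list)

def add(lat, lng, name):
    index[cell_of(lat, lng)].append((lat, lng, name))

def candidates(lat, lng):
    # examine only the 9 cells around the query point; exact distance
    # filtering then runs on this small candidate set
    ci, cj = cell_of(lat, lng)
    return [p for di in (-1, 0, 1) for dj in (-1, 0, 1)
            for p in index[(ci + di, cj + dj)]]

add(37.7749, -122.4194, "philz")
add(37.7750, -122.4195, "sightglass")
add(40.7128, -74.0060, "nyc_deli")
print(sorted(n for _, _, n in candidates(37.7749, -122.4194)))
# → ['philz', 'sightglass'] — the New York point is never examined
```

Geohash and quadtrees are refinements of exactly this pruning step.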

Geohash

Geohash encodes a (lat, lng) coordinate into a string of characters. Properties:

  • Characters are added to increase precision: “9q8y” (4 characters) covers a cell of roughly 39 km × 20 km; “9q8yy” (5) covers ≈ 4.9 km × 4.9 km; “9q8yyh” (6) covers ≈ 1.2 km × 0.6 km
  • Nearby locations usually share a common prefix (with exceptions at cell boundaries)
  • 8 characters ≈ 19-meter precision; 6 characters ≈ 1.2 km
import geohash  # pip install python-geohash (several geohash libraries exist)

# Encode location:
gh = geohash.encode(37.7749, -122.4194, precision=7)  # "9q8yyk8"

# Find all geohash cells covering a radius:
neighbors = geohash.neighbors(gh)   # the 8 surrounding cells
cells = [gh] + neighbors            # 9 cells to search

-- Query: find all entities whose geohash matches any of the 9 cells
SELECT * FROM locations WHERE geohash7 IN ('9q8yyk8', ...);

Index the geohash prefix column: an O(log n) B-tree lookup. The 9-cell approach handles the boundary problem: an entity near a cell edge may fall in an adjacent cell while still being within the search radius.
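For intuition, geohash encoding is just an alternating binary search on longitude and latitude, emitting one base32 character per 5 bits. A from-scratch sketch, no library needed:

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash_encode(lat, lng, precision=7):
    lat_rng, lng_rng = [-90.0, 90.0], [-180.0, 180.0]
    chunk, nbits, chars = 0, 0, []
    even = True                    # even-numbered bits bisect longitude
    while len(chars) < precision:
        rng, val = (lng_rng, lng) if even else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:             # upper half → emit a 1 bit
            chunk = chunk * 2 + 1
            rng[0] = mid
        else:                      # lower half → emit a 0 bit
            chunk = chunk * 2
            rng[1] = mid
        even = not even
        nbits += 1
        if nbits == 5:             # 5 bits → one base32 character
            chars.append(BASE32[chunk])
            chunk, nbits = 0, 0
    return "".join(chars)

print(geohash_encode(37.7749, -122.4194))       # → 9q8yyk8
print(geohash_encode(37.7750, -122.4195)[:5])   # nearby point shares prefix → 9q8yy
```

Each added character narrows both coordinate ranges, which is why prefix length maps directly to cell size.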

Quadtree

A quadtree recursively divides a 2D area into quadrants. Each leaf node represents a region with <= K points (e.g., K=100). Querying: start at root, recurse into all quadrants that overlap the search circle. Advantages over geohash: dynamically adjusts to data density (urban areas have more splits, rural areas fewer).

K = 100  # max points per leaf before it splits

def distance(a, b):
    # Euclidean distance in coordinate units; production code would use haversine
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

class QuadTreeNode:
    def __init__(self, bounds):
        self.bounds = bounds      # (min_lat, min_lng, max_lat, max_lng)
        self.points = []          # up to K points if leaf
        self.children = None      # [NW, NE, SW, SE] if internal node

    def insert(self, point):      # point = (lat, lng)
        if self.children:
            self.get_quadrant(point).insert(point)
        else:
            self.points.append(point)
            if len(self.points) > K:
                self.subdivide()

    def subdivide(self):
        lo_lat, lo_lng, hi_lat, hi_lng = self.bounds
        mid_lat, mid_lng = (lo_lat + hi_lat) / 2, (lo_lng + hi_lng) / 2
        self.children = [QuadTreeNode(b) for b in (
            (mid_lat, lo_lng, hi_lat, mid_lng),   # NW
            (mid_lat, mid_lng, hi_lat, hi_lng),   # NE
            (lo_lat, lo_lng, mid_lat, mid_lng),   # SW
            (lo_lat, mid_lng, mid_lat, hi_lng))]  # SE
        points, self.points = self.points, []
        for p in points:
            self.get_quadrant(p).insert(p)

    def get_quadrant(self, point):
        lo_lat, lo_lng, hi_lat, hi_lng = self.bounds
        north = point[0] >= (lo_lat + hi_lat) / 2
        east = point[1] >= (lo_lng + hi_lng) / 2
        return self.children[(0 if north else 2) + (1 if east else 0)]

    def overlaps_circle(self, center, radius):
        # distance from center to the nearest point of this cell's rectangle
        lo_lat, lo_lng, hi_lat, hi_lng = self.bounds
        nearest = (min(max(center[0], lo_lat), hi_lat),
                   min(max(center[1], lo_lng), hi_lng))
        return distance(center, nearest) <= radius

    def query_radius(self, center, radius, results):
        if not self.overlaps_circle(center, radius): return
        if self.children:
            for child in self.children:
                child.query_radius(center, radius, results)
        else:
            for p in self.points:
                if distance(center, p) <= radius:
                    results.append(p)

PostGIS (PostgreSQL Extension)

For production systems, PostGIS provides native spatial types and indexes:

-- Schema:
CREATE TABLE locations (
    id       BIGSERIAL PRIMARY KEY,
    name     TEXT,
    category TEXT,
    geom     GEOGRAPHY(POINT, 4326)   -- WGS84 coordinate system
);
CREATE INDEX idx_geom ON locations USING GIST(geom);

-- Insert:
INSERT INTO locations (name, geom) VALUES
  ('Philz Coffee', ST_MakePoint(-122.4194, 37.7749)::geography);

-- Radius query (1 km):
SELECT id, name,
       ST_Distance(geom, ST_MakePoint(-122.4194, 37.7749)::geography) AS dist_m
FROM locations
WHERE ST_DWithin(geom, ST_MakePoint(-122.4194, 37.7749)::geography, 1000)
ORDER BY dist_m
LIMIT 20;
-- Uses GIST index — only examines nearby cells, not full table scan

Redis GEO for Real-Time Data

For frequently-updated locations (drivers, riders):

GEOADD drivers -122.4194 37.7749 "driver_123"
GEOADD drivers -122.4100 37.7800 "driver_456"

-- Find the 10 nearest drivers within 5 km (ASC = nearest first):
GEORADIUS drivers -122.4194 37.7749 5 km ASC COUNT 10
→ ["driver_123", "driver_456"]

-- With distances (in km, the unit given in the command):
GEORADIUS drivers -122.4194 37.7749 5 km ASC COUNT 10 WITHDIST
→ [["driver_123", 0.0], ["driver_456", ~1.0]]

Redis GEO stores each coordinate as a 52-bit geohash score in a sorted set. GEORADIUS (deprecated since Redis 6.2 in favor of GEOSEARCH) executes in O(N + log M), where N is the number of results and M is the total number of entries in the key. Sharding: partition by region/city to keep each key manageable.
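To see why a sorted set works at all, here is a toy model (not Redis's exact bit layout) of collapsing (lat, lng) into one interleaved 52-bit score: nearby points usually land on nearby scores, so a radius query becomes a handful of range scans over the sorted set.

```python
def interleave_score(lat, lng, bits=26):
    # quantize each axis to 26 bits, then interleave → 52-bit score,
    # mirroring the width Redis GEO uses (bit order here is illustrative)
    lat_q = int((lat + 90) / 180 * (1 << bits))
    lng_q = int((lng + 180) / 360 * (1 << bits))
    score = 0
    for i in range(bits):
        score |= ((lng_q >> i) & 1) << (2 * i + 1)
        score |= ((lat_q >> i) & 1) << (2 * i)
    return score

sf_a = interleave_score(37.7749, -122.4194)   # San Francisco
sf_b = interleave_score(37.7800, -122.4100)   # ~1 km away
nyc  = interleave_score(40.7128, -74.0060)    # New York
print(abs(sf_a - sf_b) < abs(sf_a - nyc))     # → True
```

The "usually" matters: points straddling a high-order bit boundary get distant scores, which is the same boundary problem geohash prefixes have, and why the server still distance-filters candidates.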

Interview Tips

  • Lead with why a database index alone is insufficient — 2D coordinates cannot be efficiently indexed with a 1D B-tree
  • Geohash is the most interviewer-friendly answer — easy to explain, widely used
  • The 9-cell (geohash + 8 neighbors) approach solves the boundary problem — mention it
  • Redis GEO for real-time moving objects (drivers), PostGIS for static persistent data (restaurants)
  • For scale: shard by region, keep each Redis GEO key to <1M entries

FAQ

Q: What is geohash and how does it handle the boundary problem?

A: Geohash encodes a (lat, lng) coordinate into a base32 string by recursively subdividing the globe: first split into east/west halves (longitude), then north/south (latitude), alternating. Longer strings = smaller area. A 6-character geohash covers ~1.2km × 0.6km. The boundary problem: two nearby points near a geohash cell boundary get different hashes and won't appear in the same prefix search. Solution: always query the target cell PLUS its 8 neighbors (a 3×3 grid). Redis GEORADIUS uses geohash internally and handles this automatically. For SQL, use the 9-cell approach: compute 9 geohash prefixes and query WHERE geohash LIKE prefix%.

Q: When should you use a quadtree instead of geohash for geospatial indexing?

A: Quadtree: recursively subdivide space into four quadrants, splitting only when a cell exceeds a density threshold (e.g., >100 points). Cells adapt to data density — dense cities get small cells, sparse rural areas get large cells. Best for: non-uniform data distribution, variable-precision queries, in-memory spatial indexing (e.g., game engines, ride-sharing dispatch). Geohash: uniform fixed-precision grid. Best for: Redis GEO (built-in), simple proximity queries, caching by region, when you need string-sortable spatial keys. In most system design interviews, either works — explain the tradeoff and pick geohash for simplicity unless the interviewer asks about non-uniform density.

Q: How does Redis GEORADIUS work and what are its performance characteristics?

A: Redis GEO stores locations in a sorted set where the score is a 52-bit geohash of (lat, lng). GEOADD: encodes coordinates to geohash score, stores as sorted set member. GEORADIUS (deprecated in favor of GEOSEARCH): converts the center+radius to a set of covering geohash ranges, performs range queries on the sorted set, then filters by exact distance. Time complexity: O(N+log M) where N is members in the radius and M is total members. At 1M drivers updating every 4 seconds = 250K writes/sec. Shard by city: each city gets its own Redis GEO key. Reads: typically <1ms. For large-scale deployment, use Redis Cluster with city-based sharding.
