System Design Interview: Design a Geo-Proximity Service (Yelp / Nearby)

A geo-proximity service finds places (restaurants, drivers, stores) near a given location. It powers Yelp’s “restaurants near me,” Uber’s driver matching, and Google Maps’ nearby search. The core challenge: efficiently querying a 2D space for points within a radius — a problem that traditional B-tree indexes handle poorly.

Requirements

Functional: Add/update business locations. Search for businesses within radius R of a point. Filter by category, rating. Sort by distance or relevance. Support 200M businesses globally.

Non-functional: Search latency <100ms, 100K QPS read, eventual consistency on writes, handle global distribution.

Why Standard Indexes Don’t Work

A query “find all restaurants within 1km of (lat=40.7, lng=-74.0)” translates to a bounding box: WHERE lat BETWEEN 40.69 AND 40.71 AND lng BETWEEN -74.01 AND -73.99. A compound B-tree index on (lat, lng) sorts by lat first: it narrows the lat range efficiently, but every row in that latitude band — a thin slice wrapping the entire globe — must then be scanned to filter by lng, so selectivity is poor. True 2D proximity queries need an index that preserves locality in both dimensions.

Geospatial Indexing Approaches

Approach 1: Geohash (Recommended for Interviews)

Geohash divides the world into a grid of cells using a base-32 encoding. Each character appended to a geohash string adds 5 bits (alternating between longitude and latitude), subdividing the cell into 32 smaller cells:

  • 4 characters → ~40km × 20km cell
  • 6 characters → ~1.2km × 0.6km cell (good for “nearby” searches)
  • 8 characters → ~38m × 19m cell (precise location)

Key property: two locations that share a geohash prefix are close to each other. A standard B-tree index on the geohash column handles prefix queries efficiently.
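The encoding itself is simple enough to sketch. A minimal pure-Python encoder, assuming the standard geohash scheme (bisection bits interleaved starting with longitude, base-32 alphabet omitting a, i, l, o):

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet

def geohash_encode(lat: float, lng: float, precision: int = 6) -> str:
    """Encode a coordinate to a geohash string of the given length."""
    lat_range = [-90.0, 90.0]
    lng_range = [-180.0, 180.0]
    chars, bits, bit_count = [], 0, 0
    even = True  # geohash interleaves bits starting with longitude
    while len(chars) < precision:
        rng, val = (lng_range, lng) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        bit = val >= mid
        rng[0 if bit else 1] = mid  # keep the half containing the point
        bits = (bits << 1) | int(bit)
        bit_count += 1
        even = not even
        if bit_count == 5:          # 5 bits -> one base-32 character
            chars.append(BASE32[bits])
            bits = bit_count = 0
    return "".join(chars)
```

By construction, a shorter hash of a point is always a prefix of its longer hash — `geohash_encode(40.7, -74.0, 4)` equals the first 4 characters of `geohash_encode(40.7, -74.0, 6)` — which is exactly what makes prefix queries work.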

import geohash  # assumes the python-geohash package

def find_nearby(lat: float, lng: float, radius_km: float) -> list[Business]:
    precision = 6   # ~1.2km x 0.6km cells; suits radii up to ~1km
    center_hash = geohash.encode(lat, lng, precision)

    # Include the 8 neighboring cells plus the center cell, so points
    # just across a cell boundary are not missed
    neighbor_hashes = geohash.neighbors(center_hash) + [center_hash]

    # Prefix match runs on the ordinary B-tree index over geohash6
    candidates = db.query(
        "SELECT * FROM businesses WHERE geohash6 LIKE ANY(%s)",
        [h + '%' for h in neighbor_hashes]
    )
    # The 9 cells over-approximate the circle; filter by exact distance
    return [b for b in candidates if haversine(lat, lng, b.lat, b.lng) <= radius_km]

Edge case: Cells near grid boundaries have different hashes but are geographically close. Always include 8 neighboring cells, not just the center cell. This ensures a point at the border of a cell is correctly matched with nearby points in the adjacent cell.

Approach 2: Quadtree

A quadtree recursively divides 2D space into 4 quadrants. A node splits when it holds more than M points (e.g. M=100), so dense areas get deep subdivision while sparse areas stay shallow. Search: traverse from the root, descend into any quadrant that overlaps the search circle, and collect points from leaf nodes. Pros: adapts to uneven point density. Cons: harder to implement in a distributed setting than a flat geohash index. Note: Uber’s H3 is a related hierarchical grid but uses hexagonal cells rather than quadtree squares; Google’s S2 is a true quadtree over the faces of a cube projected onto the sphere.
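To make the split-and-descend logic concrete, here is a toy in-memory quadtree (names and structure are illustrative; a production system would shard nodes across machines):

```python
class QuadTree:
    """Toy point quadtree over a half-open rectangle [x0, x1) x [y0, y1)."""

    def __init__(self, x0, y0, x1, y1, capacity=4):
        self.bounds = (x0, y0, x1, y1)
        self.capacity = capacity      # M: split threshold
        self.points = []
        self.children = None          # None until this node splits

    def insert(self, x, y):
        x0, y0, x1, y1 = self.bounds
        if not (x0 <= x < x1 and y0 <= y < y1):
            return False              # point outside this node
        if self.children is None:
            if len(self.points) < self.capacity:
                self.points.append((x, y))
                return True
            self._split()             # over capacity: subdivide
        return any(c.insert(x, y) for c in self.children)

    def _split(self):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        self.children = [QuadTree(x0, y0, mx, my, self.capacity),
                         QuadTree(mx, y0, x1, my, self.capacity),
                         QuadTree(x0, my, mx, y1, self.capacity),
                         QuadTree(mx, my, x1, y1, self.capacity)]
        for px, py in self.points:    # push existing points down
            any(c.insert(px, py) for c in self.children)
        self.points = []

    def query_circle(self, cx, cy, r):
        x0, y0, x1, y1 = self.bounds
        # Prune: closest point of this rectangle to the circle center
        nx, ny = min(max(cx, x0), x1), min(max(cy, y0), y1)
        if (nx - cx) ** 2 + (ny - cy) ** 2 > r * r:
            return []                 # rectangle cannot overlap circle
        hits = [(x, y) for x, y in self.points
                if (x - cx) ** 2 + (y - cy) ** 2 <= r * r]
        if self.children:
            for c in self.children:
                hits.extend(c.query_circle(cx, cy, r))
        return hits
```

The pruning step in `query_circle` is the whole point: whole quadrants that cannot overlap the search circle are skipped without visiting their points.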

Approach 3: PostGIS / MySQL Spatial

Databases with spatial extensions use R-tree-style indexes (GiST in PostGIS) for geographic data. Query (with location stored as geography, distance in meters): SELECT * FROM businesses WHERE ST_DWithin(location, ST_MakePoint(-74.0, 40.7)::geography, 1000) — ST_DWithin is index-aware, whereas filtering on ST_Distance(...) < 1000 forces a full scan. Pros: SQL-based, no custom infrastructure. Cons: limited to single-node throughput; sharding spatial data is complex.

System Architecture

Client → API Gateway → Location Service
                            ↓
              [Geohash Cache (Redis)]
                    ↓ (cache miss)
              [Location DB (PostgreSQL + PostGIS)]
                            ↓
              [Business Service] → Business DB (Cassandra)

Read path: Compute geohash of search center → look up Redis for nearby business IDs (TTL=300s, rarely changes) → cache miss: query Location DB → fetch business details from Business Service → rank by distance/rating → return paginated results.
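The read path is a standard cache-aside pattern. A sketch with a plain dict standing in for Redis and a stubbed DB call (all names here are hypothetical):

```python
import time

CACHE: dict[str, tuple[float, list[int]]] = {}  # key -> (expires_at, business_ids)
TTL_SECONDS = 300

def query_location_db(geohash6: str, category: str) -> list[int]:
    # Stand-in for the PostGIS query; returns IDs sorted by distance
    return [101, 205, 333]

def nearby_business_ids(geohash6: str, category: str) -> list[int]:
    """Cache-aside lookup: check the cache first, fall through to the DB."""
    key = f"nearby:{geohash6}:{category}"
    entry = CACHE.get(key)
    if entry and entry[0] > time.time():
        return entry[1]                              # cache hit
    ids = query_location_db(geohash6, category)      # cache miss
    CACHE[key] = (time.time() + TTL_SECONDS, ids)    # populate for next reader
    return ids
```

With real Redis this is a GET, a DB query on miss, and a SETEX; the dict just keeps the sketch self-contained.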

Write path: Business owner updates location → write to Location DB → update geohash index → invalidate Redis cache for affected geohash cells.

Caching Strategy

Cache key: nearby:{geohash6}:{category}. Value: list of business IDs sorted by distance. TTL: 5 minutes. This works because: (1) Most users search the same popular areas. (2) Business locations change infrequently. (3) The geohash cell covers ~1km² — any business in the cell fits one cache entry.

Hot spots (Times Square, tourist areas): higher QPS, same cache key — Redis handles millions of reads per key. No thundering herd because TTL is staggered (randomized ± 10% on write).
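The staggered TTL is a one-line jitter on write (a sketch; the ±10% figure is this article’s choice, not a Redis default):

```python
import random

BASE_TTL = 300  # seconds

def jittered_ttl(base: int = BASE_TTL, spread: float = 0.10) -> int:
    """Randomize the TTL by +/- spread so entries for a hot cell
    don't all expire in the same instant (thundering herd)."""
    return int(base * random.uniform(1 - spread, 1 + spread))
```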

Distance Calculation

import math

def haversine(lat1, lng1, lat2, lng2) -> float:
    """Returns distance in kilometers."""
    R = 6371   # Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lng2 - lng1)
    a = math.sin(dphi/2)**2 + math.cos(phi1)*math.cos(phi2)*math.sin(dlambda/2)**2
    return 2 * R * math.asin(math.sqrt(a))

For short distances, an equirectangular approximation is sufficient: treat Δlat and Δlng as Cartesian, but scale Δlng by cos(lat), since longitude degrees shrink toward the poles. Within ~100km the error is typically well under 1%. Use Haversine for accurate distances at larger radii or high latitudes.
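A sketch comparing the equirectangular shortcut with Haversine over a city-scale distance:

```python
import math

R = 6371  # Earth radius, km

def haversine_km(lat1, lng1, lat2, lng2):
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lng2 - lng1)
    a = math.sin(dphi/2)**2 + math.cos(phi1)*math.cos(phi2)*math.sin(dlam/2)**2
    return 2 * R * math.asin(math.sqrt(a))

def equirectangular_km(lat1, lng1, lat2, lng2):
    """Flat-Earth approximation: scale the longitude delta by cos(mean lat)."""
    x = math.radians(lng2 - lng1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return R * math.hypot(x, y)
```

Over a ~20km hop (Times Square to JFK), the two agree to well under 0.1% — the flat approximation is much cheaper per candidate, which matters when filtering thousands of them.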

Handling Real-Time Location Updates (Driver Tracking)

For dynamic locations (drivers, delivery workers): drivers send GPS updates every 4 seconds. Use H3 hexagonal grid: encode location to H3 cell at resolution 9 (~174m). Store in Redis: SADD cell:{h3_cell} driver_id with 30-second TTL per driver entry. On match request: query center cell + k-ring of neighbors. Kafka ingests the ~125K location updates/second, consumer workers update Redis cells.
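A sketch of the cell-set pattern, with a plain dict standing in for Redis and cell IDs passed in directly (real code would compute them with the h3 library; class and method names here are hypothetical):

```python
import time

class DriverIndex:
    """In-memory stand-in for Redis `SADD cell:{h3_cell} driver_id` with TTL."""
    TTL = 30.0  # seconds -- a stale GPS fix ages out automatically

    def __init__(self, clock=time.time):
        self._clock = clock  # injectable clock for testing
        self._cells: dict[str, dict[str, float]] = {}  # cell -> {driver: expires_at}

    def update(self, driver_id: str, cell: str) -> None:
        # A real implementation would also remove the driver's previous cell entry
        self._cells.setdefault(cell, {})[driver_id] = self._clock() + self.TTL

    def drivers_in(self, cells: list[str]) -> set[str]:
        """Query the center cell plus a k-ring of neighbors, skipping expired fixes."""
        now = self._clock()
        return {d for c in cells
                for d, exp in self._cells.get(c, {}).items() if exp > now}
```

The expiry-on-read mirrors Redis per-entry TTLs: a driver who stops sending updates silently disappears from matching within 30 seconds.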

Interview Tips

  • Lead with geohash — it’s the simplest explanation of spatial indexing and directly maps to standard DB indexes.
  • Always mention the boundary cell problem and the neighbor-cell solution.
  • Distinguish static (Yelp) from dynamic (Uber) location use cases — different staleness requirements.
  • Haversine vs Euclidean: Euclidean is fine for <100km; mention Haversine for precision.
  • QuadTree vs Geohash: QuadTree handles density-varying distributions better (dense cities, sparse countryside); Geohash is simpler to implement and cache.


