System Design Interview: Location-Based Service (Yelp / Google Maps)

⏱ 8 min read

Problem Overview

Design a location-based service that allows users to find nearby businesses (Yelp) or navigate between locations (Google Maps). Two sub-problems: (1) Proximity search — given a location, find the N closest businesses within radius R. (2) Routing — given origin and destination, find the fastest route accounting for real-time traffic. Both are asked separately and together in senior engineering interviews at Google, Uber, Lyft, DoorDash, and Yelp.

Geospatial Indexing: Geohashing

A geohash encodes a latitude/longitude into a string by recursively dividing the world into a grid. Each additional character doubles the precision. Geohash “9q8yy” (length 5) represents a 4.9km x 4.9km cell; “9q8yyzn” (length 7) represents a 76m x 76m cell. The key insight: locations with the same geohash prefix are geographically close. Nearby search: (1) Compute geohash of search center at appropriate precision. (2) Find the 8 neighboring cells (the searched cell plus 8 surrounding cells). (3) Query the database for all businesses with geohash starting with those 9 prefixes. This is a simple string prefix query — efficient with a standard B-tree index on the geohash column.


import geohash2 as geohash

def find_nearby(lat, lon, radius_km):
    # Choose precision based on radius
    # 5 chars = 4.9km x 4.9km, 6 chars = 1.2km x 0.6km
    precision = 5 if radius_km > 2 else 6

    center_hash = geohash.encode(lat, lon, precision=precision)
    neighbors = geohash.neighbors(center_hash)
    search_hashes = [center_hash] + list(neighbors.values())

    # Single database query with OR across 9 geohash prefixes
    businesses = db.query(
        "SELECT * FROM businesses WHERE geohash_prefix = ANY(%s)",
        search_hashes
    )
    # Filter by exact distance (geohash has bounding box, not circle)
    return [b for b in businesses if haversine(lat, lon, b.lat, b.lon) <= radius_km]

Quadtree: The Alternative Approach

A quadtree recursively divides a 2D space into four quadrants. Each leaf node holds up to K businesses (e.g., 100). When a leaf is full, it splits into four children. The tree is deep in dense urban areas (Manhattan) and shallow in rural areas. Search: traverse from root, following quadrants that intersect the search circle, collect businesses from matching leaf nodes. Quadtrees are dynamic (insert/delete without rebuilding) and density-adaptive. They are used internally at Uber and Lyft for driver location tracking. Geohashing is simpler to implement and better for database-backed storage; quadtrees are better for in-memory structures with frequent updates.

Business Search and Ranking

Nearby search returns a candidate set — 50-200 businesses within the radius. Ranking sorts them by relevance. For Yelp-style search: rank = f(distance, rating, review_count, relevance_to_query, business_activity). Distance has diminishing returns — a 4-star place 0.5km away often beats a 5-star place 5km away. Text relevance uses Elasticsearch with BM25 scoring on business name, categories, and reviews. Personalization boosts businesses the user visited before or that match their past behavior. The ranking model is typically a gradient-boosted tree trained on click-through and booking data. Real-time signals (current wait time, current availability) are fetched and injected into the ranking computation.

Map Tiles

Google Maps serves the map itself as a grid of static image tiles (256×256 or 512×512 pixels). Tiles are organized by zoom level: zoom 0 is the whole world in one tile; each additional zoom level quadruples the number of tiles. At zoom 15 (street level), there are 2^30 tiles for the whole world. The tile URL encodes zoom/x/y coordinates: /tiles/{zoom}/{x}/{y}.png. Tiles are pre-rendered offline and served from CDN with long TTL (tiles change rarely). The map client requests only the tiles visible in the current viewport, fetching ahead as the user pans. Rendering vector tiles (data sent to client, rendered in browser via WebGL) instead of raster tiles reduces bandwidth and enables smooth zoom.

Routing: Shortest Path at Scale

Dijkstra finds shortest path on a road network graph. The world road network has 1 billion+ nodes — naive Dijkstra is too slow (O(E log V) = minutes). Optimizations: (1) Bidirectional Dijkstra: expand from both source and destination simultaneously. Terminates when the two frontiers meet. Reduces explored nodes by ~50%. (2) A*: Dijkstra with a heuristic — prioritize nodes closer to the destination (haversine distance as lower bound). 10x faster than Dijkstra on road networks. (3) Contraction Hierarchies (CH): precompute shortcuts. Important nodes (highways) are contracted — a shortcut edge represents many-hop paths. Routing then uses only shortcuts through a hierarchy of importance, reducing search space by 1000x. Google Maps uses a variant of CH for its core routing engine.

Real-Time Traffic

Static road networks give distance-optimal routes. Traffic-aware routing uses time as the edge weight instead of distance. Edge weight = distance / current_speed. Current speed comes from: (1) Historical patterns (Tuesday 8am on Highway 101 is usually congested). (2) Real-time probe data: GPS speeds from millions of navigation app users. (3) Incident reports: accidents, road closures. A stream processing job (Kafka + Flink) aggregates probe data every 60 seconds and updates edge weights in the routing graph. The routing engine re-computes routes with updated weights. Traffic prediction ML models extend historical + real-time data to estimate conditions 15-60 minutes ahead for better ETA accuracy.

Interview Tips

Geohashing vs quadtree: geohashing for database-backed search, quadtree for in-memory with frequent updates
Always filter by exact distance after geohash search (bounding box vs circle)
For routing: mention A* or Contraction Hierarchies — naive Dijkstra at scale is too slow
Map tiles are CDN-served pre-rendered images — separate from the business/routing systems
Real-time traffic requires probe data aggregation — not just static road data

Frequently Asked Questions

What is geohashing and how does it enable nearby search?

Geohashing encodes a latitude/longitude coordinate into a short string by recursively dividing the world into a grid. Each additional character in the geohash halves the cell size: a 5-character geohash represents roughly a 4.9km x 4.9km area; a 7-character geohash represents a 76m x 76m area. The critical property: locations with the same geohash prefix are geographically close. Nearby search uses this: compute the geohash of the search center at appropriate precision, find the 8 neighboring cells, then query the database for all businesses whose geohash starts with any of those 9 prefixes. This turns a geospatial query into a simple string prefix query, which a standard B-tree index handles efficiently. After retrieving the candidate set, filter by exact Haversine distance to handle the bounding-box-to-circle conversion.

How does Google Maps find the fastest route at scale?

The world road network has over 1 billion nodes — naive Dijkstra would take minutes per query. Google Maps uses Contraction Hierarchies (CH): in a preprocessing step, less-important nodes (local roads) are contracted by adding shortcut edges that represent multi-hop paths. During routing, the algorithm searches upward through the hierarchy (from local roads to highways), finds the meeting point at a high-level node, then searches downward to the destination. This reduces the search space by 1000x compared to plain Dijkstra. For real-time traffic, edge weights (travel times) are updated continuously using probe data from millions of GPS-equipped devices. A stream processing pipeline aggregates speed readings by road segment every 60 seconds and feeds updated weights into the routing engine.

What is the difference between geohashing and a quadtree for proximity search?

Both geohashing and quadtrees partition geographic space for fast proximity queries, but they have different tradeoffs. Geohashing encodes locations as strings with a fixed precision level — it is simple to implement, works natively with SQL string indexes, and is easy to shard by geohash prefix. The main weakness: geohash cells are fixed-size, so a dense city and a rural area get the same cell size at the same precision level. A quadtree recursively subdivides space only where needed — cells in Manhattan are tiny, cells in the Sahara are huge. Quadtrees are density-adaptive, which is ideal for in-memory structures tracking real-time object positions (driver locations in a ride-sharing app). Geohashing is better for database-backed business directories; quadtrees are better for in-memory, high-frequency update scenarios like live location tracking.