System Design: Location Service — Proximity Search, Geohashing, and Real-Time Tracking

Requirements

A location service has two core functions: proximity search (find entities near a given coordinate: restaurants, drivers, friends) and real-time location tracking (store and retrieve the current position of moving entities). Examples: Yelp's "restaurants near me", Uber's driver tracking, WhatsApp "share live location", Pokémon GO. Scale: Google Maps indexes 200M+ businesses; Uber has 5M+ active drivers sending location updates every 4 seconds; Yelp serves 150M+ monthly unique visitors. The key technical challenge is geospatial indexing: a standard B-tree index on lat and lng cannot answer a 2D radius query efficiently.

Geospatial Indexing Approaches

Problem: find all entities within R km of (lat, lng). The naive approach computes the distance to every entity, which is O(N) and unacceptable at scale.

Geohash: encode a (lat, lng) pair into a base-32 string (e.g., "9q8yy"). Nearby locations share a common prefix. Partition the map into a grid; each cell has a geohash. To find nearby entities, look up entities in the same geohash cell and its 8 neighbors. This is a fast prefix-based lookup in a standard index (LIKE '9q8yy%'). Precision: length 5 ≈ 5km × 5km cells; length 6 ≈ 1.2km × 0.6km cells; length 7 ≈ 150m × 150m cells. Choose the precision based on the search radius.

Quadtree: recursively divide 2D space into quadrants until each cell contains ≤ K points. Cell size adapts to density: densely populated areas get smaller cells. Efficient for point-in-polygon and range queries. Used by Twitter for geotagged tweets.

R-tree: a tree in which each node covers a bounding rectangle. Supports range queries and nearest-neighbor search. Used by PostGIS (the PostgreSQL spatial extension).

Redis GEOSEARCH: Redis natively supports geospatial indexing via GEOADD (stores points) and GEORADIUS / GEOSEARCH (radius search). Internally it uses a sorted set with geohash-encoded scores; a radius query is O(N + log M). The simplest production-ready solution at moderate scale.
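The prefix-sharing property is easiest to see in a from-scratch encoder. This is a minimal sketch; a production system would use a library or the database's built-in geohash support rather than hand-rolling this:

```python
# Minimal geohash encoder: interleave longitude/latitude bits (longitude
# first), then map each 5-bit group to a base-32 character. Nearby points
# fall in the same grid cell and therefore share a string prefix.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat: float, lng: float, precision: int) -> str:
    lat_lo, lat_hi = -90.0, 90.0
    lng_lo, lng_hi = -180.0, 180.0
    bits, even = [], True            # even bit positions encode longitude
    while len(bits) < precision * 5:
        if even:
            mid = (lng_lo + lng_hi) / 2
            if lng >= mid:
                bits.append(1); lng_lo = mid
            else:
                bits.append(0); lng_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits.append(1); lat_lo = mid
            else:
                bits.append(0); lat_hi = mid
        even = not even
    # pack each group of 5 bits into one base-32 character
    return "".join(
        BASE32[int("".join(map(str, bits[i:i + 5])), 2)]
        for i in range(0, len(bits), 5)
    )

# Downtown San Francisco encodes to the "9q8yy" cell used as the example above:
print(geohash_encode(37.7749, -122.4194, 5))  # → 9q8yy
```

Note that a longer hash is always an extension of the shorter one for the same point, which is why a single string column can serve several precisions via prefix queries.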

Proximity Search Architecture

Read-heavy workload (searches heavily outnumber writes).

Storage: location data lives in a Location table (entity_id, entity_type, lat, lng, geohash6, geohash7, updated_at), with indexes on geohash6 and geohash7 for prefix queries.

Search, given a query (lat, lng, radius_km):
1. Compute the geohash prefix for the query point at the appropriate precision.
2. Query all entities matching the 9 geohash cells (current + 8 neighbors).
3. Filter results to those within the exact radius (Haversine distance; a Euclidean approximation is acceptable at small radii).
4. Rank by distance and return the top K.

Caching: for popular fixed queries (e.g., "restaurants near Times Square"), cache the result in Redis for 5 minutes. For user-specific queries (near the current GPS location), caching is impractical, since locations differ by meters.

Database sharding: shard by geohash prefix (all entities under prefix "9q" go to shard 1, "9r" to shard 2). This clusters nearby entities on the same shard, minimizing cross-shard queries for proximity searches.
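The exact-radius filter and distance ranking can be sketched as below. The candidate list and entity names are hypothetical, standing in for the rows returned by the 9-cell prefix query:

```python
import math

def haversine_km(lat1, lng1, lat2, lng2):
    # Great-circle distance between two (lat, lng) points, in kilometers.
    r = 6371.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lng2 - lng1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def rank_within_radius(query, candidates, radius_km, k):
    # candidates: iterable of (entity_id, lat, lng) from the 9-cell lookup.
    qlat, qlng = query
    hits = []
    for entity_id, lat, lng in candidates:
        d = haversine_km(qlat, qlng, lat, lng)
        if d <= radius_km:              # exact-radius filter
            hits.append((d, entity_id))
    hits.sort()                         # rank by distance
    return [eid for _, eid in hits[:k]]

# Hypothetical candidates from the prefix query around San Francisco:
candidates = [
    ("la_restaurant", 34.0522, -118.2437),   # ~560 km away, filtered out
    ("corner_cafe",   37.7750, -122.4200),   # a few hundred meters
    ("marina_diner",  37.8000, -122.4100),   # ~3 km
]
print(rank_within_radius((37.7749, -122.4194), candidates, 5.0, 10))
```

The geohash cells are rectangles that over-cover the circle, so this exact-distance pass is what guarantees correct results; the index only narrows the candidate set.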

Real-Time Location Tracking

Use case: track the live position of drivers, delivery couriers, or friends.

Write volume: 5M drivers × 1 update / 4 s = 1.25M writes/second. This requires a write-optimized architecture.

Write path: device sends a UDP location update → location ingestion service → writes the current position to Redis (fast, for serving) and publishes to Kafka (for durability and async processing). Redis key: SET driver:{id}:location "{lat},{lng},{timestamp}" with a TTL of 30 seconds, so stale locations auto-expire. A Kafka consumer persists location history to Cassandra (time-series, write-optimized), with schema (entity_id, timestamp, lat, lng) partitioned by entity_id.

Read path: a driver's current location is read from Redis (< 1 ms); a driver's trip history is queried from Cassandra by entity_id and time range.

Live updates: WebSocket or Server-Sent Events. The server subscribes to a Redis pub/sub channel per entity; when a new location arrives it is published to the channel, and all subscribers (e.g., the rider's app) receive it in real time.
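The fast-path semantics (write the current position with a 30-second TTL so stale drivers disappear from reads) can be sketched with an in-memory stand-in for Redis. In production this would be a real Redis SET with the EX option; the class below only models the expiry behavior, with an injectable clock for deterministic testing:

```python
import time

class CurrentLocationStore:
    """In-memory stand-in for Redis `SET driver:{id}:location ... EX 30`:
    holds the latest position per driver and treats entries older than
    the TTL as absent, mimicking Redis key expiry."""

    def __init__(self, ttl_seconds=30, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock           # injectable for deterministic tests
        self._data = {}              # key -> ((lat, lng, ts), expires_at)

    def set_location(self, driver_id, lat, lng):
        key = f"driver:{driver_id}:location"
        now = self.clock()
        self._data[key] = ((lat, lng, now), now + self.ttl)

    def get_location(self, driver_id):
        key = f"driver:{driver_id}:location"
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:   # TTL elapsed: location is stale
            del self._data[key]
            return None
        return value

# With a fake clock, expiry is observable without waiting 30 real seconds:
now = [0.0]
store = CurrentLocationStore(ttl_seconds=30, clock=lambda: now[0])
store.set_location(42, 37.7749, -122.4194)
now[0] = 31.0
print(store.get_location(42))  # → None (stale entry expired)
```

The TTL doubles as a liveness signal: a driver whose app stops sending updates simply vanishes from reads, with no explicit "offline" write required.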

Interview Tips

  • Geohash is the most interview-friendly geospatial index — explain prefix sharing, cell size by precision, and the 9-cell neighbor query.
  • For driver tracking writes at scale, always separate the fast path (Redis for current position) from the durable path (Kafka → Cassandra for history).
  • Redis GEOSEARCH is the “just works” answer for moderate scale; mention geohash + SQL or quadtree for custom implementations.
  • Distinguish proximity search (read-heavy, index-optimized) from real-time tracking (write-heavy, Redis + stream).



See also: Uber Interview Prep

See also: Lyft Interview Prep

See also: DoorDash Interview Prep

See also: Snap Interview Prep
