Question 1

What is a geohash and how does it enable proximity queries?

Accepted Answer

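A geohash divides the world into a rectangular grid by recursively bisecting the longitude and latitude ranges. Each bisection yields one bit, the bits alternate between longitude and latitude, and every 5 bits are encoded as one base32 character. A 6-character geohash represents a cell of roughly 1.2 km x 0.6 km; 8 characters represent roughly 38 m x 19 m. These sizes hold at the equator; cells get narrower east-west at higher latitudes.

Key property: locations that share a long common prefix are close together. The converse does not always hold: two points on opposite sides of a cell boundary can be metres apart and still share only a short prefix, which is why neighboring cells must be checked.

To find all locations within radius R: pick a precision whose cells are at least as large as R, determine the geohash cell for the query point, also check the 8 neighboring cells at the same precision, query all records in those 9 cells, and filter the candidates by actual distance. The set of 9 cells guarantees coverage only if the cell size is appropriate for the radius.

Prefix matching: WHERE geohash LIKE 'dr5ru%' scans one cell. With a B-tree index on the geohash column this becomes a range scan costing O(log n + k), where k is the number of rows in the cell. Redis GEO commands use geohash internally.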
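The recursive bisection described above can be sketched in a few lines. This is a minimal illustration rather than a production encoder; the function name geohash_encode and its parameters are chosen here for the example.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet: no a, i, l, o


def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Encode (lat, lon) in degrees as a geohash of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0        # accumulates 5 bits per output character
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half of the range
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half of the range
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


if __name__ == "__main__":
    print(geohash_encode(57.64911, 10.40744, 5))  # u4pru
    print(geohash_encode(42.6, -5.6, 5))          # ezs42
```

Because each extra character only refines the same cell, a shorter geohash of a point is always a prefix of a longer one, which is what makes prefix queries work.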

Question 2

How do Redis GEO commands work for proximity search?

Accepted Answer

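Redis GEO stores each member in a sorted set, with the coordinates encoded as a 52-bit geohash integer that serves as the score. Commands:

GEOADD key longitude latitude member: stores the member with its geohash as the score. Note that longitude comes first.
GEORADIUS key lon lat radius km [WITHCOORD] [WITHDIST] [COUNT n] [ASC|DESC]: returns members within the radius, optionally with coordinates and distances. Results are sorted by distance only when ASC or DESC is given.
GEOPOS key member: retrieves the stored coordinates.
GEODIST key m1 m2 km: great-circle distance between two members.
GEORADIUSBYMEMBER key member radius km: searches around an existing member.

GEORADIUS and GEORADIUSBYMEMBER are deprecated as of Redis 6.2 in favor of GEOSEARCH (FROMLONLAT or FROMMEMBER, with BYRADIUS or BYBOX); the older commands still work.

Time complexity of a radius search: O(log n + m), where n is the number of members in the set and m is the number of members inside the area that has to be scanned to cover the radius, which can be larger than the final result size. For driver tracking: GEOADD drivers lon lat driver_id on each update; GEORADIUS drivers lon lat 5 km COUNT 20 ASC on each ride request.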
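To make the semantics of a radius search with COUNT and ASC concrete, here is a brute-force, in-memory sketch. It is not how Redis implements the command (Redis narrows candidates through the sorted set first); it only mimics the observable result. The names radius_search and haversine_m, the sample driver IDs, and the Earth radius constant are assumptions for illustration, and Redis's own constant may differ slightly.

```python
import math
from typing import Dict, List, Optional, Tuple

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius in metres


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in metres (arguments in lon, lat order like Redis)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def radius_search(
    members: Dict[str, Tuple[float, float]],  # name -> (lon, lat)
    lon: float,
    lat: float,
    radius_km: float,
    count: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Return (member, distance_km) pairs within the radius, nearest first."""
    hits = []
    for name, (mlon, mlat) in members.items():
        d_km = haversine_m(lon, lat, mlon, mlat) / 1000.0
        if d_km <= radius_km:
            hits.append((name, d_km))
    hits.sort(key=lambda h: h[1])  # equivalent of ASC
    return hits if count is None else hits[:count]


if __name__ == "__main__":
    drivers = {
        "d1": (-74.0060, 40.7128),
        "d2": (-73.9857, 40.7484),
        "d3": (-118.2437, 34.0522),
    }
    for name, dist in radius_search(drivers, -74.0, 40.72, 5, count=20):
        print(name, round(dist, 2))
```

A linear scan like this is O(n) per query, which is exactly the cost the sorted-set encoding avoids.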

Question 3

How do you choose between geohash, quadtree, and spatial database index?

Accepted Answer

Geohash: best for simple radius queries and Redis-based implementations. O(log n) lookup with a prefix index. Limitation: cells are rectangular, not circular, so the candidate set contains false positives near the cell corners and edges that must be filtered by actual distance. Good for: location tracking, find-nearby, driver matching.

Quadtree: adaptive, because it subdivides dense areas more finely than sparse ones. Better for: variable-density datasets (city centers vs rural areas) and range queries of arbitrary shapes. Harder to implement than geohash.

Spatial database (PostGIS/MySQL spatial): best when location data is joined with other relational data, when complex geographic queries are needed (polygon containment, road network), or when write frequency is low. A GiST index supports O(log n) spatial queries. Use PostGIS when you need the full power of SQL with spatial predicates.
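The adaptive subdivision that distinguishes a quadtree from a fixed geohash grid can be sketched as a small point quadtree. This is a simplified illustration, not a tuned implementation: the class name, the capacity and max_depth defaults, and the half-open bounding boxes are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (lon, lat)


@dataclass
class QuadTree:
    x0: float
    y0: float
    x1: float
    y1: float
    capacity: int = 4    # max points per leaf before it splits
    depth: int = 0
    max_depth: int = 16  # stops endless splitting on duplicate points
    points: List[Point] = field(default_factory=list)
    children: Optional[List["QuadTree"]] = None

    def contains(self, p: Point) -> bool:
        # Half-open box: the upper edges belong to the neighboring node.
        return self.x0 <= p[0] < self.x1 and self.y0 <= p[1] < self.y1

    def insert(self, p: Point) -> bool:
        if not self.contains(p):
            return False
        if self.children is None:
            if len(self.points) < self.capacity or self.depth >= self.max_depth:
                self.points.append(p)
                return True
            self._split()
        return any(child.insert(p) for child in self.children)

    def _split(self) -> None:
        mx, my = (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2
        kw = dict(capacity=self.capacity, depth=self.depth + 1, max_depth=self.max_depth)
        self.children = [
            QuadTree(self.x0, self.y0, mx, my, **kw),
            QuadTree(mx, self.y0, self.x1, my, **kw),
            QuadTree(self.x0, my, mx, self.y1, **kw),
            QuadTree(mx, my, self.x1, self.y1, **kw),
        ]
        old, self.points = self.points, []
        for q in old:  # push existing points down into the new leaves
            any(child.insert(q) for child in self.children)

    def query(self, qx0: float, qy0: float, qx1: float, qy1: float) -> List[Point]:
        """Return points inside the half-open box [qx0, qx1) x [qy0, qy1)."""
        if qx1 <= self.x0 or qx0 >= self.x1 or qy1 <= self.y0 or qy0 >= self.y1:
            return []  # no overlap, prune this subtree
        if self.children is None:
            return [p for p in self.points if qx0 <= p[0] < qx1 and qy0 <= p[1] < qy1]
        found: List[Point] = []
        for child in self.children:
            found.extend(child.query(qx0, qy0, qx1, qy1))
        return found

    def height(self) -> int:
        """Depth of the deepest leaf below this node."""
        if self.children is None:
            return self.depth
        return max(child.height() for child in self.children)
```

Inserting a tight cluster of points makes the tree deep only around that cluster, while sparse regions stay as large, shallow leaves.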

Question 4

How do you scale location tracking for 1M moving objects?

Accepted Answer

1M drivers each sending one update every 4 seconds produce 1,000,000 / 4 = 250K writes/second. Architecture:

(1) The client batches GPS readings and sends an update every 4-10 seconds (not every second).
(2) The location update service receives updates and writes them to Redis with GEOADD, which is O(log n). A single Redis instance handles 100K+ writes/second.
(3) For 250K writes/second: use a Redis cluster sharded by geographic region, with one GEO key per city or region so that each shard handles its own area.
(4) Async persistence: also publish each update to Kafka; consumers write to a DriverLocation DB table for trip history and analytics.
(5) Stale driver filtering: members of a GEO sorted set cannot expire individually, so store last_update_time in a Redis hash alongside each driver. Filter GEORADIUS results to exclude drivers whose last_update_time is more than 30 seconds old.

Never write every GPS update to the primary DB; it cannot sustain 250K writes/second.
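The throughput arithmetic and the stale-driver filter can be sketched as follows. The helper names, the 50% headroom factor, and the in-memory dict standing in for the Redis hash of last_update_time values are assumptions made for this example.

```python
import math
from typing import Dict, List, Tuple


def required_shards(
    drivers: int,
    update_interval_s: float,
    per_instance_writes: float,
    headroom: float = 0.5,  # assumed: run each instance at 50% of its peak
) -> Tuple[float, int]:
    """Return (writes per second, number of Redis shards needed)."""
    writes_per_s = drivers / update_interval_s
    usable = per_instance_writes * headroom
    return writes_per_s, math.ceil(writes_per_s / usable)


def filter_stale(
    candidates: List[str],
    last_update: Dict[str, float],  # driver_id -> unix timestamp of last update
    now: float,
    max_age_s: float = 30.0,
) -> List[str]:
    """Drop drivers whose last update is older than max_age_s (or unknown)."""
    return [
        d for d in candidates
        if now - last_update.get(d, float("-inf")) <= max_age_s
    ]


if __name__ == "__main__":
    print(required_shards(1_000_000, 4, 100_000))  # (250000.0, 5)
    seen = {"d1": 1000.0, "d2": 960.0}
    print(filter_stale(["d1", "d2", "d3"], seen, now=1005.0))  # ['d1']
```

Treating a driver with no recorded timestamp as stale is a deliberate choice here: it is safer to skip an unknown driver than to dispatch one that may be offline.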

Question 5

How do you handle the edge case of geohash cell boundaries in proximity search?

Accepted Answer

A radius query using a single geohash cell misses nearby points in adjacent cells. Solution: always query the target cell AND the 8 neighboring cells (3x3 grid centered on the target). The 9-cell search guarantees complete coverage for any radius no larger than the smaller side of the cell. For larger radii: drop to a coarser precision, expand to a 5x5 or 7x7 grid, or switch to a different spatial index.

In Redis: use GEORADIUS (or GEOSEARCH in Redis 6.2+), which handles this automatically by computing the appropriate set of geohash cells to cover the search radius.

If implementing geohash search manually: compute the geohash of the query point, use a geohash library to get the 8 neighbors, query all 9 cells, and filter the results by actual distance, because some candidates from the 9 cells fall outside the radius.
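The manual 9-cell search can be sketched as below. To keep the example short, cells are identified by integer (row, col) pairs on the same grid that a geohash of the given precision describes, rather than by base32 strings; a real implementation would typically get neighbors from a geohash library. All function names and the Earth radius constant are assumptions for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]
EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius in metres


def grid_dims(precision: int) -> Tuple[int, int]:
    """(rows, cols) of the geohash grid: 5 bits per character, longitude first."""
    total_bits = 5 * precision
    return 1 << (total_bits // 2), 1 << ((total_bits + 1) // 2)


def cell_of(lat: float, lon: float, precision: int) -> Cell:
    rows, cols = grid_dims(precision)
    row = min(int((lat + 90.0) / 180.0 * rows), rows - 1)
    col = min(int((lon + 180.0) / 360.0 * cols), cols - 1)
    return row, col


def nine_cells(cell: Cell, precision: int) -> Set[Cell]:
    """The cell plus its neighbors; columns wrap at the antimeridian."""
    rows, cols = grid_dims(precision)
    row, col = cell
    out = set()
    for dr in (-1, 0, 1):
        if 0 <= row + dr < rows:  # no rows beyond the poles
            for dc in (-1, 0, 1):
                out.add((row + dr, (col + dc) % cols))
    return out


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def build_index(points: List[Tuple[str, float, float]], precision: int) -> Dict[Cell, list]:
    """Group (id, lat, lon) records by the cell they fall in."""
    index: Dict[Cell, list] = defaultdict(list)
    for pid, lat, lon in points:
        index[cell_of(lat, lon, precision)].append((pid, lat, lon))
    return index


def search(index: Dict[Cell, list], lat: float, lon: float,
           radius_m: float, precision: int) -> List[str]:
    """IDs within radius_m of (lat, lon), nearest first, using the 3x3 grid."""
    hits = []
    for cell in nine_cells(cell_of(lat, lon, precision), precision):
        for pid, plat, plon in index.get(cell, []):
            d = haversine_m(lat, lon, plat, plon)
            if d <= radius_m:  # drop candidates outside the circle
                hits.append((d, pid))
    return [pid for _, pid in sorted(hits)]
```

The result is only complete while radius_m does not exceed the smaller cell dimension at the query latitude, so the precision should be chosen with the largest expected radius in mind.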

Parameter	Value
Active drivers (peak)	1,000,000
Location update frequency	Every 4 seconds
Write throughput	250,000 writes/sec
Redis memory per driver	~50 bytes (geohash + member name)
Total Redis memory for all drivers	~50 MB
Radius query latency (Redis)	< 1 ms
Radius query latency (PostGIS)	1-10 ms with GiST index

Proximity Service (Find Nearby) Low-Level Design

Requirements

Naive Approach – Why It Fails

Geohash

Redis GEO Commands

Quadtree

Database with Spatial Index (PostGIS)

Read-Write Split Architecture

Handling High Write Throughput for Moving Objects

Scale Numbers and Capacity Estimates

Summary – Which Approach to Use