Proximity Search System: Overview
A proximity search system finds entities (restaurants, drivers, users) within a geographic radius of a query point. Low-level design covers coordinate storage, geohash-based bounding box filtering, haversine distance computation, spatial indexing, k-nearest neighbor retrieval, and strategies for handling moving entities.
Coordinate Storage
Latitude and longitude are stored as DECIMAL(9,6) columns, providing ~11cm precision at the equator — more than sufficient for any proximity use case:
lat DECIMAL(9,6) -- range: -90.000000 to 90.000000
lng DECIMAL(9,6) -- range: -180.000000 to 180.000000
Alternatively, PostGIS uses a GEOGRAPHY(POINT, 4326) column that stores coordinates as WGS84 and unlocks native spatial functions. For non-PostGIS environments, the DECIMAL approach with application-level haversine is standard.
Geohash Encoding
A geohash encodes a coordinate pair into a short alphanumeric string. Each additional character halves the cell size:
- 5 characters: ~4.9km x 4.9km cell
- 6 characters: ~1.2km x 0.6km cell
- 7 characters: ~152m x 152m cell
- 8 characters: ~38m x 19m cell
Entities within the same geohash prefix are geographically close. A prefix search on the geohash column returns a bounding box of candidates without computing exact distances for all rows.
For a given query point and radius, compute the minimum geohash prefix length that contains the radius circle, then search that prefix plus its 8 neighbors (to handle boundary crossing at cell edges).
Neighbor Geohash Cells for Boundary Crossing
A query point near a geohash cell boundary will have nearby entities in adjacent cells. Always expand the search to the 8 neighboring geohash cells at the chosen precision level. Most geohash libraries expose a neighbors(geohash) function returning the 8 adjacent cells.
After fetching candidates from all 9 cells (center + 8 neighbors), apply exact haversine filtering to discard candidates outside the true radius.
Haversine Formula
The haversine formula computes the great-circle distance between two points on a sphere:
a = sin^2((lat2-lat1)/2) + cos(lat1)*cos(lat2)*sin^2((lng2-lng1)/2)
distance = 2 * R * arcsin(sqrt(a)) -- R = 6371 km (Earth radius)
This is accurate for distances up to a few hundred km. For precision over long distances, use the Vincenty formula on the WGS84 ellipsoid.
PostGIS for Accurate Radius Queries
When the database has PostGIS, use ST_DWithin on a GEOGRAPHY column for accurate spherical distance filtering:
SELECT id, entity_type, entity_id
FROM Location
WHERE ST_DWithin(
geog,
ST_SetSRID(ST_MakePoint(:lng, :lat), 4326)::geography,
:radius_meters
)
ORDER BY ST_Distance(geog, ST_SetSRID(ST_MakePoint(:lng, :lat), 4326)::geography)
LIMIT :k;
ST_DWithin on a GEOGRAPHY column uses the GiST spatial index automatically for efficient bounding box pre-filtering followed by exact distance computation.
Spatial Index: GiST vs R-tree
- GiST (Generalized Search Tree): PostgreSQL's native spatial index, used by PostGIS. Supports bounding box overlap queries (
&&) and distance ordering (<->operator). Default choice for PostGIS. - R-tree: The underlying data structure in many spatial databases (SpatiaLite, MySQL spatial, Oracle Spatial). Hierarchically groups bounding boxes. GiST in PostgreSQL implements an R-tree variant.
Without PostGIS, index the geohash column with a standard B-tree index and use LIKE 'prefix%' queries for prefix matching. This is less accurate but works in plain SQL.
K-Nearest Neighbor Query
To find the k closest entities regardless of a fixed radius:
SELECT id, entity_id, lat, lng,
ST_Distance(geog, :query_point::geography) AS dist_meters
FROM Location
WHERE entity_type = :entity_type
ORDER BY geog <-> :query_point::geography
LIMIT :k;
The <-> distance operator in PostGIS uses the GiST index for efficient kNN without a radius pre-filter. For non-PostGIS, fetch candidates from geohash prefix + neighbors, compute haversine for each, sort, and take the top k.
Partitioning: Tile-Based Sharding
For global scale, partition the Location table by geohash prefix (first 2-3 characters). Each shard handles a geographic region. Queries for a given area route to one or two shards (for boundary queries). This collapses hotspots in dense cities to predictable shards that can be independently scaled.
Caching
Popular area results (e.g., restaurants near city center) are cached with a short TTL (30-60 seconds). Cache key: nearby:{geohash_prefix}:{radius}:{entity_type}. For moving entities (ride-share drivers), skip caching or use a very short TTL (5 seconds) since positions change rapidly.
SQL Schema
-- Location table (non-PostGIS variant)
CREATE TABLE Location (
id BIGSERIAL PRIMARY KEY,
entity_type VARCHAR(32) NOT NULL, -- driver, restaurant, user
entity_id BIGINT NOT NULL,
lat DECIMAL(9,6) NOT NULL,
lng DECIMAL(9,6) NOT NULL,
geohash VARCHAR(12) NOT NULL, -- pre-computed geohash
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- B-tree index for geohash prefix search
CREATE INDEX idx_location_geohash ON Location(geohash, entity_type);
-- Index for point lookups by entity
CREATE UNIQUE INDEX idx_location_entity ON Location(entity_type, entity_id);
-- PostGIS variant: add a GEOGRAPHY column and GiST index
-- ALTER TABLE Location ADD COLUMN geog GEOGRAPHY(POINT, 4326);
-- CREATE INDEX idx_location_geog ON Location USING GIST(geog);
Python Implementation
import math
import geohash2 # pip install geohash2
EARTH_RADIUS_KM = 6371.0
def encode_geohash(lat: float, lng: float, precision: int = 7) -> str:
"""Encode coordinates to geohash string at given precision."""
return geohash2.encode(lat, lng, precision=precision)
def geohash_precision_for_radius(radius_km: float) -> int:
"""Choose geohash precision so cell size 4.9:
return 4
elif radius_km > 1.2:
return 5
elif radius_km > 0.15:
return 6
return 7
def compute_haversine(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
"""Return great-circle distance in km."""
r = EARTH_RADIUS_KM
phi1, phi2 = math.radians(lat1), math.radians(lat2)
dphi = math.radians(lat2 - lat1)
dlambda = math.radians(lng2 - lng1)
a = math.sin(dphi/2)**2 + math.cos(phi1)*math.cos(phi2)*math.sin(dlambda/2)**2
return 2 * r * math.asin(math.sqrt(a))
def find_nearby(lat: float, lng: float, radius_km: float,
entity_type: str, limit: int = 20) -> list:
"""Find nearby entities using geohash prefix filter + haversine exact check."""
precision = geohash_precision_for_radius(radius_km)
center_hash = encode_geohash(lat, lng, precision)
neighbor_hashes = geohash2.neighbors(center_hash)
search_hashes = [center_hash] + list(neighbor_hashes.values())
placeholders = ','.join(['%s'] * len(search_hashes))
rows = db.execute(
f"SELECT entity_id, lat, lng FROM Location"
f" WHERE entity_type=%s"
f" AND LEFT(geohash, %s) IN ({placeholders})",
(entity_type, precision, *search_hashes)
).fetchall()
results = []
for entity_id, elat, elng in rows:
dist = compute_haversine(lat, lng, float(elat), float(elng))
if dist <= radius_km:
results.append({"entity_id": entity_id, "distance_km": round(dist, 4)})
results.sort(key=lambda x: x["distance_km"])
return results[:limit]
Key Design Decisions Summary
- Geohash prefix search reduces candidates from millions to hundreds before exact distance computation.
- Always search 8 neighbor cells to prevent boundary-crossing misses.
- PostGIS
ST_DWithin+ GiST index is the most accurate and performant option when available. - Haversine is sufficient for distances up to ~500km; use Vincenty for global precision requirements.
- Moving entities (drivers) require frequent location updates — use Redis geospatial (
GEOADD/GEORADIUS) for sub-second update and query latency.
See also: Uber Interview Guide 2026: Dispatch Systems, Geospatial Algorithms, and Marketplace Engineering
See also: Airbnb Interview Guide 2026: Search Systems, Trust and Safety, and Full-Stack Engineering