Location Tracking System Low-Level Design

Requirements

  • Track real-time location of mobile assets (drivers, delivery couriers, field workers)
  • 1M active devices sending location updates every 4 seconds
  • Query nearby assets within a radius in <100ms
  • Store location history for trip replay and analytics
  • Detect geofence entry/exit events

Data Flow

Device (GPS ping every 4s) → Location API → Redis Geo (live location)
                           → Kafka        → Cassandra (location history)
                                          → Geofence Service (zone events)
                                          → Analytics Pipeline
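The hot path in the flow above can be sketched as a handler that does the fast synchronous Redis write and hands everything slow to Kafka. A minimal sketch in Python, assuming injected `geo_store` (Redis-like) and `event_bus` (Kafka-producer-like) clients; the names and method signatures are illustrative, not any specific client library:

```python
import json
import time

def handle_location_ping(geo_store, event_bus, device_id, lat, lng):
    """Hot path for one GPS ping: one fast in-memory write, one async publish.

    geo_store -- object with geoadd(key, lng, lat, member) and
                 hset(key, field, value), e.g. a Redis client (illustrative API)
    event_bus -- object with publish(topic, payload), e.g. a Kafka producer
                 doing fire-and-forget sends (illustrative API)
    """
    now_ms = int(time.time() * 1000)
    # 1. Update the live location (O(log n) sorted-set write).
    geo_store.geoadd("locations", lng, lat, device_id)
    # 2. Record freshness for stale-location filtering.
    geo_store.hset("device_meta", device_id, now_ms)
    # 3. Everything slow (Cassandra history, geofence checks, analytics)
    #    goes through Kafka so it never blocks the API response.
    event_bus.publish("location-updates", json.dumps(
        {"device_id": device_id, "lat": lat, "lng": lng, "ts": now_ms}))
    return now_ms
```

With real clients, step 3 would use an async producer with batching, so API latency is dominated by the Redis write alone.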

Live Location Storage: Redis Geospatial

Redis stores live locations using the Geo API (internally a sorted set with geohash scores):

# Update device location
GEOADD locations {lng} {lat} {device_id}

# Find all devices within 5km of a point
GEORADIUS locations {lng} {lat} 5 km ASC COUNT 20

# Equivalent on Redis 6.2+ (GEORADIUS is deprecated in favor of GEOSEARCH)
GEOSEARCH locations FROMLONLAT {lng} {lat} BYRADIUS 5 km ASC COUNT 20

# Get location of specific device
GEOPOS locations {device_id}

# Distance between two devices
GEODIST locations {device_id_1} {device_id_2} km

At 1M active devices pinging every 4 seconds, GEOADD throughput is ~250K writes/second. A single Redis node handles this comfortably (single-threaded, ~1M simple ops/second); shard with Redis Cluster if headroom is needed. Memory: ~100 bytes per member → ~100MB for 1M devices. GEORADIUS runs in O(N+log(M)), where N is the number of results and M is the total number of members.

Stale Location Filtering

Devices go offline without explicit deregistration. Maintain a companion Redis hash, updated on every ping: HSET device_meta {device_id} last_update {epoch_ms}. When serving nearby-device queries, filter out any device whose last_update is more than 30 seconds old; it is presumed offline. Per-member expiry would be simpler, but Redis does not support TTL on individual geo members (they are ordinary sorted-set entries), so client-side filtering against the metadata hash is the cleaner approach. A periodic cleanup job can additionally ZREM members whose last_update is long stale, so the geo set does not grow unbounded.
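The filtering step is a small pure function applied to GEORADIUS/GEOSEARCH results together with the device_meta hash contents (fetched with HMGET); a sketch with illustrative names:

```python
def filter_stale(nearby_ids, last_update_ms, now_ms, max_age_ms=30_000):
    """Drop devices whose last ping is older than max_age_ms.

    nearby_ids     -- device ids from GEORADIUS/GEOSEARCH, nearest first
    last_update_ms -- dict of device_id -> last ping epoch ms
                      (the device_meta hash contents)
    """
    return [d for d in nearby_ids
            if now_ms - last_update_ms.get(d, 0) <= max_age_ms]
```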

Location History: Cassandra

CREATE TABLE location_history (
    device_id  UUID,
    trip_id    UUID,
    recorded_at TIMESTAMP,
    lat        DOUBLE,
    lng        DOUBLE,
    speed_kmh  FLOAT,
    heading    SMALLINT,
    PRIMARY KEY (device_id, recorded_at)
) WITH CLUSTERING ORDER BY (recorded_at DESC)
  AND default_time_to_live = 7776000;  -- 90 days, matching the retention policy

Partition by device_id — all locations for a device land on the same node. Clustering by recorded_at DESC makes “latest N locations” queries efficient. One caveat: at one ping every 4 seconds a device writes ~21,600 rows/day, so a 90-day partition holds ~1.9M rows; if partitions approach Cassandra’s ~100MB guideline, add a time bucket (e.g. ((device_id, day), recorded_at)) to the partition key. Write throughput: Cassandra’s LSM-tree storage handles 250K writes/second with proper replication (RF=3) and horizontal scaling. Retention policy: keep 90 days in Cassandra (hot), archive to S3/Parquet after 90 days (cold).
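The sizing figures above follow from simple arithmetic; a quick check (the ~50 bytes/row figure is a rough assumption for this schema, not a measurement):

```python
DEVICES = 1_000_000
PING_INTERVAL_S = 4

# Sustained write rate across the fleet.
writes_per_sec = DEVICES // PING_INTERVAL_S           # 250,000 writes/s

# Rows accumulated per device partition over the 90-day hot window.
rows_per_device_day = 86_400 // PING_INTERVAL_S       # 21,600 rows/day
rows_per_partition_90d = rows_per_device_day * 90     # 1,944,000 rows

# Rough partition size at an assumed ~50 bytes/row on disk.
APPROX_BYTES_PER_ROW = 50
partition_mb = rows_per_partition_90d * APPROX_BYTES_PER_ROW / 1e6
```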

Geofencing

A geofence is a geographic boundary (polygon or circle) that triggers an event on entry or exit. Implementation:

  1. Store geofence definitions: Geofence(geofence_id, name, type ENUM(CIRCLE,POLYGON), center_lat, center_lng, radius_m, polygon_coords JSON)
  2. For each location update: determine which geofences the device is currently inside (point-in-polygon test)
  3. Compare to the device’s previous geofence set: if newly inside → ENTRY event; if no longer inside → EXIT event
  4. Publish ENTRY/EXIT events to Kafka for downstream processing
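Step 3 reduces to a set diff between the previous and current geofence memberships; a minimal sketch (names are illustrative):

```python
def geofence_events(prev_ids, curr_ids):
    """Diff two sets of geofence ids into ENTRY/EXIT events.

    prev_ids -- geofence ids the device was inside at its last ping
    curr_ids -- geofence ids the device is inside now
    """
    entries = [("ENTRY", g) for g in sorted(curr_ids - prev_ids)]
    exits = [("EXIT", g) for g in sorted(prev_ids - curr_ids)]
    return entries + exits
```

The resulting event tuples would then be published to Kafka, and curr_ids written back to the device_geofences hash for the next ping.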

Point-in-polygon test (ray casting algorithm): O(n) where n is polygon vertices. For circles: distance to center <= radius. Store device’s current geofences in Redis hash: HSET device_geofences {device_id} {json_set_of_geofence_ids}.
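A minimal ray-casting implementation: cast a ray from the point in the +lng direction and count edge crossings; an odd count means the point is inside. This treats lat/lng as planar coordinates, which is adequate for small geofences away from the poles and the antimeridian:

```python
def point_in_polygon(lat, lng, polygon):
    """Ray casting: O(n) in the number of polygon vertices.

    polygon -- list of (lat, lng) vertices; the closing edge from the
               last vertex back to the first is implied.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lng1 = polygon[i]
        lat2, lng2 = polygon[(i + 1) % n]
        # Does this edge straddle the point's latitude?
        if (lat1 > lat) != (lat2 > lat):
            # Longitude where the edge crosses that latitude.
            cross_lng = lng1 + (lat - lat1) * (lng2 - lng1) / (lat2 - lat1)
            if cross_lng > lng:
                inside = not inside
    return inside
```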

Spatial Indexing for Geofence Lookup

For each location update we need the set of geofences containing the point, and checking every geofence is O(G) per ping — too slow for large G. Use a spatial index: an R-tree (spatial index for rectangles/polygons) or an H3 hexagonal grid. The H3 approach:

  1. Offline, convert each geofence to the set of H3 cells it overlaps at resolution 9 (~0.1km² per cell)
  2. Build an inverted index: H3_cell → [geofence_ids]
  3. On each location update, convert the device location to its H3 cell and look up the candidate geofence_ids
  4. Verify each candidate with the exact point-in-polygon (or circle-distance) test

This reduces the candidates from all geofences to only those overlapping the device’s H3 cell.
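The inverted-index pattern can be sketched with a plain dict; here a fixed-step lat/lng grid stands in for H3 cells (a real system would use the h3 library’s cell functions), so every name below is illustrative:

```python
from collections import defaultdict

def grid_cell(lat, lng):
    """Stand-in for an H3 cell id: snap to a 0.01-degree grid (~1km).
    A real system would call the h3 library here instead."""
    return (round(lat * 100), round(lng * 100))

def build_inverted_index(geofence_cells):
    """Offline step. geofence_cells maps geofence_id -> set of cells the
    geofence overlaps. Returns the inverted index: cell -> [geofence_ids]."""
    index = defaultdict(list)
    for gid, cells in geofence_cells.items():
        for cell in cells:
            index[cell].append(gid)
    return dict(index)

def candidate_geofences(index, lat, lng):
    """Online step: O(1) expected lookup of candidates for one ping.
    Each candidate must still pass the exact point-in-polygon test."""
    return index.get(grid_cell(lat, lng), [])
```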

Key Design Decisions

  • Redis Geo for live spatial queries — purpose-built for radius search, O(log n) per location update
  • Cassandra for location history — high write throughput, time-series partitioning by device_id
  • Kafka as the async path — Cassandra writes and geofence checks never block the location API
  • H3 spatial index for geofencing — reduces candidate geofences from O(G) to O(1) expected


