Location History Service: Low-Level Design
A location history service ingests continuous GPS streams from mobile clients, compresses raw coordinate sequences, enforces retention policies, and serves historical traces to authorized consumers. At scale this means handling tens of thousands of GPS pings per second while keeping storage costs under control and respecting user privacy preferences.
Requirements
Functional
- Ingest GPS events (latitude, longitude, accuracy, timestamp, device ID) at up to 1 Hz per device
- Store compressed path segments with configurable retention windows (7 days, 30 days, 1 year)
- Query historical trace for a device within a time range with optional resolution downsampling
- Support privacy controls: pause recording, delete history, export personal data (GDPR)
- Emit enriched location events (reverse-geocoded place, speed, heading) to downstream consumers
Non-Functional
- Ingest latency under 100 ms p99; query latency under 500 ms for 30-day traces
- Durability: no GPS event loss after acknowledgment
- Horizontal scalability to 50 million active devices
Data Model
Raw events land in a time-series store partitioned by device_id and truncated hour. Each row holds a packed binary blob of delta-encoded coordinates to minimize row overhead.
- raw_events: device_id (UUID), bucket_hour (TIMESTAMP), points (BYTEA — packed delta list), accuracy_m (FLOAT)
- compressed_segments: device_id, segment_start (TIMESTAMP), segment_end, polyline (TEXT — Encoded Polyline format), point_count (INT), distance_m (FLOAT)
- privacy_settings: device_id, recording_paused (BOOL), retention_days (INT), last_purge_at (TIMESTAMP)
- place_visits: device_id, place_id, arrived_at, departed_at, confidence (FLOAT)
ScyllaDB or Cassandra suits the raw ingest table due to high write throughput and time-range scan efficiency. PostgreSQL with TimescaleDB handles compressed segments and privacy metadata where consistency matters.
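The `polyline` column in compressed_segments can be illustrated with a minimal encoder. This is a sketch of Google's published Encoded Polyline algorithm at the standard 5-digit precision: coordinates are scaled to integers, delta-encoded, sign-folded with a left shift, and emitted as 5-bit chunks offset into printable ASCII.

```python
def encode_polyline(coords, precision=5):
    """Encode (lat, lon) pairs using Google's Encoded Polyline algorithm."""
    factor = 10 ** precision
    chars = []
    prev_lat = prev_lon = 0
    for lat, lon in coords:
        ilat, ilon = round(lat * factor), round(lon * factor)
        for delta in (ilat - prev_lat, ilon - prev_lon):
            # Left-shift to free the sign bit; invert negatives.
            v = ~(delta << 1) if delta < 0 else delta << 1
            # Emit 5-bit chunks, low bits first, with a continuation flag.
            while v >= 0x20:
                chars.append(chr((0x20 | (v & 0x1F)) + 63))
                v >>= 5
            chars.append(chr(v + 63))
        prev_lat, prev_lon = ilat, ilon
    return "".join(chars)
```

Encoding `[(38.5, -120.2), (40.7, -120.95), (43.252, -126.453)]` reproduces Google's documented test vector `_p~iF~ps|U_ulLnnqC_mqNvxq`@`. Because consecutive points are delta-encoded, the format rewards the low point-to-point variance that Douglas-Peucker output exhibits.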
Core Algorithms
Douglas-Peucker Path Compression
After each 5-minute tumbling window closes, a compression worker applies the Douglas-Peucker algorithm with an epsilon of 10 meters. This reduces a 300-point raw trace to roughly 15 to 40 anchor points while preserving the visual shape of the path. The compressed polyline is stored in Google Encoded Polyline format, cutting storage by up to 85 percent compared to raw coordinates.
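The compression step can be sketched as a compact recursive implementation. This uses planar distances for brevity; production code would use projected or geodesic distances for the epsilon-in-meters semantics, and an iterative form to bound recursion depth on long traces.

```python
import math

def perpendicular_distance(pt, start, end):
    """Distance from pt to the line through start and end (planar)."""
    if start == end:
        return math.dist(pt, start)
    (x0, y0), (x1, y1), (x2, y2) = pt, start, end
    num = abs((y2 - y1) * x0 - (x2 - x1) * y0 + x2 * y1 - y2 * x1)
    return num / math.dist(start, end)

def douglas_peucker(points, epsilon):
    """Recursively drop points closer than epsilon to the chord between endpoints."""
    if len(points) < 3:
        return points
    # Find the point farthest from the chord.
    idx, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > dmax:
            idx, dmax = i, d
    # Everything within epsilon of the chord: keep only the endpoints.
    if dmax <= epsilon:
        return [points[0], points[-1]]
    # Otherwise split at the farthest point and simplify each half.
    left = douglas_peucker(points[:idx + 1], epsilon)
    right = douglas_peucker(points[idx:], epsilon)
    return left[:-1] + right
```

A nearly straight trace collapses to its two endpoints, while a sharp turn survives because its apex exceeds epsilon, which is exactly the turn-preserving behavior the compression relies on.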
Delta Encoding for Ingest
Each GPS event is delta-encoded against the previous point in the same device bucket: only the difference in latitude, longitude, and timestamp is stored as a varint. A full coordinate pair costs 16 bytes raw; the delta typically costs 3 to 5 bytes, enabling efficient bulk writes and smaller Kafka messages.
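A sketch of the delta-plus-varint idea, assuming coordinates are first fixed to integer microdegrees (1e-6 degrees, roughly 0.1 m of latitude resolution) so deltas are small integers; the zigzag step maps signed deltas to unsigned values so small negative movements also encode in one or two bytes:

```python
def zigzag(n):
    """Map signed to unsigned so small-magnitude deltas stay small."""
    return (n << 1) ^ (n >> 63)

def varint(n):
    """LEB128-style variable-length encoding: 7 data bits per byte."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        out.append(byte | 0x80 if n else byte)
        if not n:
            return bytes(out)

def encode_delta(prev, cur):
    """prev/cur are (lat_e6, lon_e6, ts_s) integer triples; emit the delta."""
    buf = bytearray()
    for p, c in zip(prev, cur):
        buf += varint(zigzag(c - p))
    return bytes(buf)
```

A one-second, sub-meter movement encodes in 3 bytes against the 16 bytes a raw double-precision coordinate pair would cost, which is where the bulk-write and Kafka-message savings come from.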
Stay-Point Detection
A sliding window checks whether a sequence of points falls within a 50-meter radius for more than 5 minutes. Qualifying clusters are collapsed into a single place-visit record, reducing segment noise and enabling place inference.
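The stay-point rule above can be sketched as an anchor-based scan: grow a window while points stay within the radius of the anchor, and emit a visit when the dwell time clears the threshold. Distances use the haversine formula; the 50 m / 5 min defaults mirror the text.

```python
import math

def haversine_m(p1, p2):
    """Great-circle distance in meters between (lat, lon) pairs."""
    R = 6371000.0
    lat1, lon1, lat2, lon2 = map(math.radians, (*p1, *p2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * R * math.asin(math.sqrt(a))

def detect_stay_points(points, radius_m=50, min_dwell_s=300):
    """points: time-ordered (lat, lon, ts). Returns (centroid, arrive, depart) visits."""
    visits, i, n = [], 0, len(points)
    while i < n:
        # Extend the window while points remain near the anchor point i.
        j = i + 1
        while j < n and haversine_m(points[i][:2], points[j][:2]) <= radius_m:
            j += 1
        if points[j - 1][2] - points[i][2] >= min_dwell_s:
            cluster = points[i:j]
            lat = sum(p[0] for p in cluster) / len(cluster)
            lon = sum(p[1] for p in cluster) / len(cluster)
            visits.append(((lat, lon), points[i][2], points[j - 1][2]))
            i = j  # skip past the collapsed cluster
        else:
            i += 1
    return visits
```

The collapsed cluster becomes one place_visits row (centroid, arrived_at, departed_at), replacing dozens of jittery raw points.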
Scalability and Architecture
Ingest follows a pipeline pattern: mobile clients POST to a regional edge gateway, which publishes to a Kafka topic partitioned by device_id. A fleet of stateless ingest workers consumes the topic, performs delta encoding, and batches writes to ScyllaDB. Compression workers consume a separate compaction topic triggered when a time bucket closes.
- Kafka provides at-least-once delivery; idempotency keys prevent duplicate storage
- A TTL-based purge job runs nightly, enforcing per-device retention_days from the privacy_settings table
- Read path uses a query service that merges compressed segments and optionally re-simplifies points with Douglas-Peucker at the requested resolution
- Reverse geocoding results are cached in Redis with a 24-hour TTL to avoid redundant API calls
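The interplay between at-least-once delivery and idempotency keys can be sketched with in-memory stand-ins. Here the `store` list stands in for a batched ScyllaDB write and the `seen` set for a dedupe cache; a production dedupe cache would be TTL-bounded rather than unbounded memory, and the class name is illustrative.

```python
class IdempotentBatchWriter:
    """Dedupes events by (device_id, ts) and flushes writes in fixed-size batches."""

    def __init__(self, store, batch_size=100):
        self.store = store          # stand-in for a ScyllaDB session
        self.batch_size = batch_size
        self.seen = set()           # stand-in for a TTL'd dedupe cache
        self.pending = []

    def handle(self, event):
        key = (event["device_id"], event["ts"])
        if key in self.seen:        # Kafka redelivery: drop the duplicate
            return
        self.seen.add(key)
        self.pending.append(event)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.store.extend(self.pending)   # one batched write
            self.pending = []
```

Because the key is (device_id, ts), a consumer restart that replays uncommitted offsets re-sends events the writer has already seen, and they are dropped before reaching storage.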
API Design
Ingest Endpoint
POST /v1/locations — accepts a batch of up to 100 GPS events, returns HTTP 202 with an ack token. Clients retry on 5xx; the server deduplicates using the event timestamp plus device ID.
History Query
GET /v1/devices/{device_id}/history?start=ISO8601&end=ISO8601&resolution=high|medium|low — returns a GeoJSON LineString. Resolution controls the Douglas-Peucker epsilon applied at read time (10 m, 50 m, 200 m).
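A sketch of assembling the response body, including the resolution-to-epsilon mapping from the endpoint above; the segment-merge and simplification steps are elided. Note that GeoJSON orders coordinates [longitude, latitude], the reverse of the usual (lat, lon) convention.

```python
EPSILON_M = {"high": 10, "medium": 50, "low": 200}

def history_response(points, resolution="medium"):
    """points: list of (lat, lon) after segment merge and simplification."""
    return {
        "type": "Feature",
        "geometry": {
            "type": "LineString",
            # GeoJSON coordinate order is [longitude, latitude].
            "coordinates": [[lon, lat] for lat, lon in points],
        },
        "properties": {
            "resolution": resolution,
            "epsilon_m": EPSILON_M[resolution],
        },
    }
```

Echoing the applied epsilon in `properties` lets clients know how aggressively the trace was thinned without re-deriving it from the resolution name.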
Privacy Controls
- PUT /v1/devices/{device_id}/privacy: set retention_days, toggle recording_paused
- DELETE /v1/devices/{device_id}/history: enqueues an async purge job, returns a job ID
- GET /v1/devices/{device_id}/export: triggers a GDPR data export to a pre-signed S3 URL
Interview Tips
Interviewers often probe the tradeoff between compression fidelity and storage cost. Be ready to quantify: a 10-meter epsilon on Douglas-Peucker drops 80 to 90 percent of points for typical urban driving while preserving turn detection. Also discuss how you handle GPS drift indoors (accuracy filter: drop points with accuracy_m greater than 50) and the privacy-by-design angle of server-side aggregation before long-term storage.
Frequently Asked Questions
How do you handle high-frequency GPS ingestion at scale?
High-frequency GPS ingestion uses a write-optimized pipeline: devices batch points locally and flush over a persistent connection to an ingestion gateway. The gateway publishes to a partitioned message queue (Kafka), with partitions keyed by device ID to preserve ordering. Stream consumers write raw points to a time-series store and trigger async compression.
What is Douglas-Peucker path compression and why is it used for location history?
Douglas-Peucker is a polyline simplification algorithm that removes GPS points deviating less than a configurable epsilon from the straight line between their neighbors. It reduces storage by 60 to 90 percent on straight roads while preserving turns and stops. The epsilon is tuned per use case: tighter for navigation replay, looser for analytics.
How are privacy retention controls implemented for location data?
Each point is stored with a user ID, timestamp, and a retention-class tag derived from the user's consent tier. A scheduled job runs TTL sweeps against the time-series store, deleting or anonymizing points past their class deadline. Sensitive zones (home, medical) can be obfuscated at write time rather than deleted later, per configurable policy.
What data structures enable efficient range queries over location history?
Spatial range queries use a combination of geohash-bucketed indexes (for bounding-box lookups) and time-partitioned tables, so scans are bounded by both geography and time window. For radius queries, an S2 or H3 cell covering is computed first, rows are fetched by cell ID, and then an exact distance filter is applied in the application layer.
See also: Uber Interview Guide 2026: Dispatch Systems, Geospatial Algorithms, and Marketplace Engineering
See also: Lyft Interview Guide 2026: Rideshare Engineering, Real-Time Dispatch, and Safety Systems