Modern mobile translation apps (Google Translate, Apple Translate) increasingly run inference on-device. The interview tests whether you understand the tradeoffs of running ML models on a phone, the realities of language pack distribution, and the privacy benefits of keeping translations local.
Functional requirements
- Translate text input across many language pairs
- Voice input → translated voice output
- Camera-based translation (point at a sign, see translation overlay)
- Conversation mode (two-way real-time translation)
- Work offline for downloaded language pairs
On-device vs cloud
Tradeoffs:
- On-device: works offline, keeps data on the device, lower latency, usable in low-network areas
- Cloud: better quality (larger models), supports more languages, no storage cost on device
Modern apps default to on-device for popular pairs (en-es, en-fr, en-zh) and fall back to cloud for less common pairs or when higher quality is needed.
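A minimal sketch of that routing decision in Kotlin, assuming hypothetical `LanguagePackStore` and `CloudTranslationClient` interfaces (neither is a real library API; a real app would wrap ML Kit or a cloud client behind them):

```kotlin
// Hypothetical interfaces; a real app would wrap ML Kit / a cloud API.
interface Translator { suspend fun translate(text: String): String }

interface LanguagePackStore {
    fun isInstalled(source: String, target: String): Boolean
    fun translator(source: String, target: String): Translator
}

interface CloudTranslationClient {
    suspend fun isReachable(): Boolean
    suspend fun translate(text: String, source: String, target: String): String
}

class TranslationRouter(
    private val packs: LanguagePackStore,
    private val cloud: CloudTranslationClient,
) {
    suspend fun translate(text: String, source: String, target: String): String =
        when {
            // Prefer the local model: offline, private, low latency.
            packs.isInstalled(source, target) ->
                packs.translator(source, target).translate(text)
            // Fall back to cloud for pairs that aren't downloaded.
            cloud.isReachable() -> cloud.translate(text, source, target)
            else -> error("$source-$target not downloaded and no network")
        }
}
```

A “Private mode” (see Privacy below) amounts to dropping the cloud branch entirely.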
Language packs
Per-language-pair model files. Sizes:
- Text translation: 30–100MB per language pair
- Voice: 100–500MB additional
- Camera/OCR: depends on script complexity
Users download a pack on first use of a pair, or eagerly via a “download for offline” UI. Storage budget: a few GB for heavy users.
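On Android, ML Kit handles pack management directly. A sketch of the download/list/delete flow (Spanish chosen arbitrarily; note that ML Kit stores one model per language and pivots through English, so “pairs” map to per-language downloads):

```kotlin
import com.google.mlkit.common.model.DownloadConditions
import com.google.mlkit.common.model.RemoteModelManager
import com.google.mlkit.nl.translate.TranslateLanguage
import com.google.mlkit.nl.translate.TranslateRemoteModel

private val manager = RemoteModelManager.getInstance()
private val spanish = TranslateRemoteModel.Builder(TranslateLanguage.SPANISH).build()

// Eager "download for offline" path: fetch the pack on Wi-Fi only.
fun downloadSpanishPack() {
    val wifiOnly = DownloadConditions.Builder().requireWifi().build()
    manager.download(spanish, wifiOnly)
        .addOnSuccessListener { /* pack ready for offline use */ }
        .addOnFailureListener { /* surface the error, retry later */ }
}

// Feed the storage-management UI: what is already on disk?
fun listInstalledPacks() {
    manager.getDownloadedModels(TranslateRemoteModel::class.java)
        .addOnSuccessListener { models -> models.forEach { println(it.language) } }
}

// Reclaim space when a pair is no longer needed.
fun deleteSpanishPack() {
    manager.deleteDownloadedModel(spanish)
        .addOnSuccessListener { /* pack removed */ }
}
```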
Camera translation
Pipeline:
- A camera frame is captured (continuous video)
- Text detection finds text regions in the frame
- Text recognition reads the characters in each region
- The recognized text is translated
- The translation is rendered over the camera frame, replacing the original text
The challenge is doing this at 30+ fps on a phone, so each step must be hardware-accelerated. ML Kit (Android) and the Vision framework (iOS) provide pre-built detection and recognition models.
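On Android, the OCR step plugs into a CameraX analyzer. A sketch using ML Kit's Latin-script recognizer, with the translate and overlay steps stubbed out behind a callback:

```kotlin
import androidx.camera.core.ExperimentalGetImage
import androidx.camera.core.ImageAnalysis
import androidx.camera.core.ImageProxy
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.text.Text
import com.google.mlkit.vision.text.TextRecognition
import com.google.mlkit.vision.text.latin.TextRecognizerOptions

class TranslatingAnalyzer(
    // Placeholder for the translate + render steps.
    private val onRecognized: (Text.TextBlock) -> Unit,
) : ImageAnalysis.Analyzer {
    private val recognizer =
        TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)

    @ExperimentalGetImage
    override fun analyze(frame: ImageProxy) {
        val mediaImage = frame.image ?: run { frame.close(); return }
        val input =
            InputImage.fromMediaImage(mediaImage, frame.imageInfo.rotationDegrees)
        recognizer.process(input)
            .addOnSuccessListener { result ->
                // Each block carries text plus a bounding box for the overlay.
                result.textBlocks.forEach(onRecognized)
            }
            // Closing the frame releases it for the next one; CameraX drops
            // frames while the analyzer is busy, which throttles naturally.
            .addOnCompleteListener { frame.close() }
    }
}
```

A practical refinement is caching translations per recognized string, so identical text in consecutive frames is not re-translated.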
Voice translation
Pipeline:
- Audio is captured
- Speech recognition converts it to text (in the source language)
- The text is translated
- Text-to-speech synthesizes audio (in the target language)
For conversation mode, the device runs the pipeline in both directions concurrently.
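A sketch of one direction on Android, chaining the platform `SpeechRecognizer` and `TextToSpeech`; the `translate` lambda is a placeholder for the on-device MT model, and `EXTRA_PREFER_OFFLINE` keeps recognition local where the device supports it:

```kotlin
import android.content.Context
import android.content.Intent
import android.os.Bundle
import android.speech.RecognitionListener
import android.speech.RecognizerIntent
import android.speech.SpeechRecognizer
import android.speech.tts.TextToSpeech
import java.util.Locale

// One direction of the pipeline: ASR -> MT -> TTS.
class VoicePipeline(context: Context, private val translate: (String) -> String) {
    private val tts = TextToSpeech(context) { /* init status ignored in sketch */ }
    private val asr = SpeechRecognizer.createSpeechRecognizer(context)

    fun start() {
        tts.setLanguage(Locale("es")) // speak the *target* language
        asr.setRecognitionListener(object : RecognitionListener {
            override fun onResults(results: Bundle) {
                val best = results
                    .getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION)
                    ?.firstOrNull() ?: return
                // Translate the top hypothesis, then speak it.
                tts.speak(translate(best), TextToSpeech.QUEUE_ADD, null, "utt")
            }
            // Remaining callbacks elided for brevity.
            override fun onReadyForSpeech(params: Bundle?) {}
            override fun onBeginningOfSpeech() {}
            override fun onRmsChanged(rmsdB: Float) {}
            override fun onBufferReceived(buffer: ByteArray?) {}
            override fun onEndOfSpeech() {}
            override fun onError(error: Int) {}
            override fun onPartialResults(partial: Bundle?) {}
            override fun onEvent(eventType: Int, params: Bundle?) {}
        })
        asr.startListening(Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH).apply {
            putExtra(RecognizerIntent.EXTRA_LANGUAGE, "en-US")    // source language
            putExtra(RecognizerIntent.EXTRA_PREFER_OFFLINE, true) // stay on-device
        })
    }
}
```

Conversation mode would instantiate two of these, one per direction.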
Privacy
The strongest argument for on-device translation:
- Translated text never leaves the device
- Voice input never reaches a server
- Sensitive conversations (medical, legal) stay local
Some apps offer an explicit “Private mode” that disables any cloud fallback.
Model updates
Models improve over time. Background updates:
- App periodically checks for newer model versions
- Downloads on Wi-Fi only
- Replaces the old model atomically (see the sketch below)
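A sketch of the atomic swap, assuming the updater has already downloaded to a temp file on the same filesystem. `Files.move` with `ATOMIC_MOVE` is a single rename on Android's Linux filesystems, so a crash mid-update never exposes a half-written model:

```kotlin
import java.io.File
import java.nio.file.Files
import java.nio.file.StandardCopyOption

// Swap the downloaded model into place. On Linux, rename() atomically
// replaces the existing target, so readers see old-or-new, never partial.
fun installModelUpdate(downloaded: File, installed: File) {
    require(downloaded.length() > 0) { "empty or missing download" }
    // A real updater would also verify a checksum/signature here.
    Files.move(
        downloaded.toPath(),
        installed.toPath(),
        StandardCopyOption.ATOMIC_MOVE,
    )
}
```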
Battery
On-device inference is GPU-intensive. Mitigations:
- Use platform-optimized inference (Core ML on iOS; TensorFlow Lite or ONNX Runtime on Android)
- Quantize models to int8 where quality permits
- For camera mode, throttle the frame rate based on device temperature (sketched below)
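For the last point, Android exposes a thermal status signal on API 29+. A sketch that maps it to a target frame rate (the cutoffs are illustrative, not tuned values):

```kotlin
import android.content.Context
import android.os.PowerManager

// Map the device's current thermal status to a camera-mode frame rate.
fun targetFps(context: Context): Int {
    val pm = context.getSystemService(Context.POWER_SERVICE) as PowerManager
    return when (pm.currentThermalStatus) {
        PowerManager.THERMAL_STATUS_NONE,
        PowerManager.THERMAL_STATUS_LIGHT -> 30    // full rate
        PowerManager.THERMAL_STATUS_MODERATE -> 15 // halve OCR work
        else -> 5                                  // severe and up: bare minimum
    }
}
```

`PowerManager.addThermalStatusListener` delivers the same signal as a callback, which avoids polling per frame.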
Frequently Asked Questions
How does on-device translation compare to cloud quality in 2026?
For popular language pairs, on-device is now within 5–10% of cloud BLEU scores. For low-resource languages, cloud still wins meaningfully.
Why does my translation app sometimes use the cloud?
If the language pair is not downloaded locally, the app falls back to the cloud. Some apps also deliberately route more nuanced text (long documents, formal prose) to larger cloud models.
Can on-device models be updated without app updates?
Yes. Models are typically distributed separately from the binary, downloaded on demand or in the background.