Modern mobile translation apps (Google Translate, Apple Translate) increasingly run inference on-device. The interview tests whether you understand the tradeoffs of running ML models on a phone, the realities of language pack distribution, and the privacy benefits of keeping translations local.
Functional requirements
- Translate text input across many language pairs
- Voice input → translated voice output
- Camera-based translation (point at a sign, see translation overlay)
- Conversation mode (two-way real-time translation)
- Work offline for downloaded language pairs
On-device vs cloud
Tradeoffs:
- On-device: works offline, keeps data on the device, lower latency, usable in low-network areas
- Cloud: better quality (larger models), supports more languages, no storage cost on device
Modern apps default to on-device for popular pairs (en-es, en-fr, en-zh) and fall back to cloud for less common pairs or when higher quality is needed.
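A minimal sketch of that routing decision in Kotlin, assuming hypothetical `LanguagePackStore` and `CloudTranslationClient` interfaces (neither is a real library API; a real app would wrap ML Kit or a cloud client behind them):

```kotlin
// Hypothetical interfaces; a real app would wrap ML Kit / a cloud API.
interface Translator { suspend fun translate(text: String): String }

interface LanguagePackStore {
    fun isInstalled(source: String, target: String): Boolean
    fun translator(source: String, target: String): Translator
}

interface CloudTranslationClient {
    suspend fun isReachable(): Boolean
    suspend fun translate(text: String, source: String, target: String): String
}

class TranslationRouter(
    private val packs: LanguagePackStore,
    private val cloud: CloudTranslationClient,
) {
    suspend fun translate(text: String, source: String, target: String): String =
        when {
            // Prefer the local model: offline, private, low latency.
            packs.isInstalled(source, target) ->
                packs.translator(source, target).translate(text)
            // Fall back to cloud for pairs that aren't downloaded.
            cloud.isReachable() -> cloud.translate(text, source, target)
            else -> error("$source-$target not downloaded and no network")
        }
}
```

A “Private mode” (see Privacy below) amounts to dropping the cloud branch entirely.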
Language packs
Per-language-pair model files. Sizes:
- Text translation: 30–100MB per language pair
- Voice: 100–500MB additional
- Camera/OCR: depends on script complexity
Users download a pack on first use of a pair, or eagerly via a “download for offline” UI. Storage budget: a few GB for heavy users.
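On Android, ML Kit handles pack management directly. A sketch of the download/list/delete flow (Spanish chosen arbitrarily; note that ML Kit stores one model per language and pivots through English, so “pairs” map to per-language downloads):

```kotlin
import com.google.mlkit.common.model.DownloadConditions
import com.google.mlkit.common.model.RemoteModelManager
import com.google.mlkit.nl.translate.TranslateLanguage
import com.google.mlkit.nl.translate.TranslateRemoteModel

private val manager = RemoteModelManager.getInstance()
private val spanish = TranslateRemoteModel.Builder(TranslateLanguage.SPANISH).build()

// Eager "download for offline" path: fetch the pack on Wi-Fi only.
fun downloadSpanishPack() {
    val wifiOnly = DownloadConditions.Builder().requireWifi().build()
    manager.download(spanish, wifiOnly)
        .addOnSuccessListener { /* pack ready for offline use */ }
        .addOnFailureListener { /* surface the error, retry later */ }
}

// Feed the storage-management UI: what is already on disk?
fun listInstalledPacks() {
    manager.getDownloadedModels(TranslateRemoteModel::class.java)
        .addOnSuccessListener { models -> models.forEach { println(it.language) } }
}

// Reclaim space when a pair is no longer needed.
fun deleteSpanishPack() {
    manager.deleteDownloadedModel(spanish)
        .addOnSuccessListener { /* pack removed */ }
}
```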
Camera translation
Pipeline:
- A camera frame is captured (continuous video)
- Text detection finds text regions in the frame
- Text recognition reads the characters in each region
- The recognized text is translated
- The translation is rendered over the camera frame, replacing the original text
The challenge is doing this at 30+ fps on a phone, so each step must be hardware-accelerated. ML Kit (Android) and the Vision framework (iOS) provide pre-built detection and recognition models.
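On Android, the OCR step plugs into a CameraX analyzer. A sketch using ML Kit's Latin-script recognizer, with the translate and overlay steps stubbed out behind a callback:

```kotlin
import androidx.camera.core.ExperimentalGetImage
import androidx.camera.core.ImageAnalysis
import androidx.camera.core.ImageProxy
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.text.Text
import com.google.mlkit.vision.text.TextRecognition
import com.google.mlkit.vision.text.latin.TextRecognizerOptions

class TranslatingAnalyzer(
    // Placeholder for the translate + render steps.
    private val onRecognized: (Text.TextBlock) -> Unit,
) : ImageAnalysis.Analyzer {
    private val recognizer =
        TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)

    @ExperimentalGetImage
    override fun analyze(frame: ImageProxy) {
        val mediaImage = frame.image ?: run { frame.close(); return }
        val input =
            InputImage.fromMediaImage(mediaImage, frame.imageInfo.rotationDegrees)
        recognizer.process(input)
            .addOnSuccessListener { result ->
                // Each block carries text plus a bounding box for the overlay.
                result.textBlocks.forEach(onRecognized)
            }
            // Closing the frame releases it for the next one; CameraX drops
            // frames while the analyzer is busy, which throttles naturally.
            .addOnCompleteListener { frame.close() }
    }
}
```

A practical refinement is caching translations per recognized string, so identical text in consecutive frames is not re-translated.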
Voice translation
Pipeline:
- Audio is captured
- Speech recognition converts it to text (in the source language)
- The text is translated
- Text-to-speech synthesizes audio (in the target language)
For conversation mode, the device runs the pipeline in both directions concurrently.
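A sketch of one direction on Android, chaining the platform `SpeechRecognizer` and `TextToSpeech`; the `translate` lambda is a placeholder for the on-device MT model, and `EXTRA_PREFER_OFFLINE` keeps recognition local where the device supports it:

```kotlin
import android.content.Context
import android.content.Intent
import android.os.Bundle
import android.speech.RecognitionListener
import android.speech.RecognizerIntent
import android.speech.SpeechRecognizer
import android.speech.tts.TextToSpeech
import java.util.Locale

// One direction of the pipeline: ASR -> MT -> TTS.
class VoicePipeline(context: Context, private val translate: (String) -> String) {
    private val tts = TextToSpeech(context) { /* init status ignored in sketch */ }
    private val asr = SpeechRecognizer.createSpeechRecognizer(context)

    fun start() {
        tts.setLanguage(Locale("es")) // speak the *target* language
        asr.setRecognitionListener(object : RecognitionListener {
            override fun onResults(results: Bundle) {
                val best = results
                    .getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION)
                    ?.firstOrNull() ?: return
                // Translate the top hypothesis, then speak it.
                tts.speak(translate(best), TextToSpeech.QUEUE_ADD, null, "utt")
            }
            // Remaining callbacks elided for brevity.
            override fun onReadyForSpeech(params: Bundle?) {}
            override fun onBeginningOfSpeech() {}
            override fun onRmsChanged(rmsdB: Float) {}
            override fun onBufferReceived(buffer: ByteArray?) {}
            override fun onEndOfSpeech() {}
            override fun onError(error: Int) {}
            override fun onPartialResults(partial: Bundle?) {}
            override fun onEvent(eventType: Int, params: Bundle?) {}
        })
        asr.startListening(Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH).apply {
            putExtra(RecognizerIntent.EXTRA_LANGUAGE, "en-US")    // source language
            putExtra(RecognizerIntent.EXTRA_PREFER_OFFLINE, true) // stay on-device
        })
    }
}
```

Conversation mode would instantiate two of these, one per direction.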
Privacy
The strongest argument for on-device translation:
- Translated text never leaves the device
- Voice input never reaches a server
- Sensitive conversations (medical, legal) stay local
Some apps offer an explicit “Private mode” that disables any cloud fallback.
Model updates
Models improve over time. Background updates:
- App periodically checks for newer model versions
- Downloads on Wi-Fi only
- Replaces the old model atomically (see the sketch below)
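A sketch of the atomic swap, assuming the updater has already downloaded to a temp file on the same filesystem. `Files.move` with `ATOMIC_MOVE` is a single rename on Android's Linux filesystems, so a crash mid-update never exposes a half-written model:

```kotlin
import java.io.File
import java.nio.file.Files
import java.nio.file.StandardCopyOption

// Swap the downloaded model into place. On Linux, rename() atomically
// replaces the existing target, so readers see old-or-new, never partial.
fun installModelUpdate(downloaded: File, installed: File) {
    require(downloaded.length() > 0) { "empty or missing download" }
    // A real updater would also verify a checksum/signature here.
    Files.move(
        downloaded.toPath(),
        installed.toPath(),
        StandardCopyOption.ATOMIC_MOVE,
    )
}
```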
Battery
On-device inference is GPU-intensive. Mitigations:
- Use platform-optimized inference (Core ML on iOS; TensorFlow Lite or ONNX Runtime on Android)
- Quantize models to int8 where quality permits
- For camera mode, throttle the frame rate based on device temperature (sketched below)
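For the last point, Android exposes a thermal status signal on API 29+. A sketch that maps it to a target frame rate (the cutoffs are illustrative, not tuned values):

```kotlin
import android.content.Context
import android.os.PowerManager

// Map the device's current thermal status to a camera-mode frame rate.
fun targetFps(context: Context): Int {
    val pm = context.getSystemService(Context.POWER_SERVICE) as PowerManager
    return when (pm.currentThermalStatus) {
        PowerManager.THERMAL_STATUS_NONE,
        PowerManager.THERMAL_STATUS_LIGHT -> 30    // full rate
        PowerManager.THERMAL_STATUS_MODERATE -> 15 // halve OCR work
        else -> 5                                  // severe and up: bare minimum
    }
}
```

`PowerManager.addThermalStatusListener` delivers the same signal as a callback, which avoids polling per frame.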
Frequently Asked Questions
How does on-device translation compare to cloud quality in 2026?
For popular language pairs, on-device is now within 5–10% of cloud BLEU scores. For low-resource languages, cloud still wins meaningfully.
Why does my translation app sometimes use the cloud?
If the language pair is not downloaded locally, the app falls back to the cloud. Some apps also deliberately route more nuanced text (long documents, formal prose) to larger cloud models.
Can on-device models be updated without app updates?
Yes. Models are typically distributed separately from the binary, downloaded on demand or in the background.