“Design a camera app” is a senior-level mobile-system-design question that probes whether you understand the capture pipeline, on-device ML for computational photography, and the data flow from sensor to photo library. iPhone and Pixel cameras are software products as much as hardware; the interview tests whether you can articulate that.
Clarify scope
- Stock camera or social camera (Instagram, Snapchat)?
- Photo only, or also video and slow-mo?
- Real-time filters/effects?
- HDR, Night mode, Live Photo, ProRAW?
- Editor included, or hand off to Photos?
The capture pipeline
- Sensor → ISP (image signal processor) hardware: demosaic, white balance, noise reduction
- Multi-frame buffer: the app maintains a rolling buffer of recent frames so the “shutter” is zero-latency (see the ring-buffer sketch after this list)
- On shutter: select frames, fuse (HDR / Night), denoise, sharpen
- Encode: HEIF/JPEG/RAW
- Save to photo library; trigger thumbnail generation
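A minimal sketch of that rolling buffer, with a stand-in `Frame` type (a real pipeline holds `CVPixelBuffer`s from the preview stream and does far less copying):

```swift
import Foundation

// Stand-in for a preview frame; a real pipeline holds CVPixelBuffers.
struct Frame {
    let pixelData: Data
    let timestamp: TimeInterval
}

// Rolling buffer of the most recent preview frames. On shutter, the app
// selects the frames nearest the shutter timestamp, so the captured
// moment predates the button press ("zero shutter lag").
final class PrerollBuffer {
    private var frames: [Frame] = []
    private let capacity: Int
    private let queue = DispatchQueue(label: "preroll.buffer")

    init(capacity: Int = 15) { self.capacity = capacity }

    // Called for every preview frame; keeps only the newest `capacity`.
    func append(_ frame: Frame) {
        queue.sync {
            frames.append(frame)
            if frames.count > capacity { frames.removeFirst() }
        }
    }

    // Returns the `count` frames closest to the shutter press, in capture order.
    func framesAroundShutter(at shutterTime: TimeInterval, count: Int) -> [Frame] {
        queue.sync {
            Array(frames
                .sorted { abs($0.timestamp - shutterTime) < abs($1.timestamp - shutterTime) }
                .prefix(count))
                .sorted { $0.timestamp < $1.timestamp }
        }
    }
}
```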
HDR and multi-frame fusion
Single-exposure capture has limited dynamic range. Modern cameras capture multiple frames at different exposures (and sometimes at the same exposure, to average out noise) and fuse them. Steps:
- Align frames (tiny camera shake between frames; use feature matching or hardware gyro hints)
- Compute per-pixel weights based on motion and exposure
- Blend in linear (not sRGB) space
- Tone-map the fused result back into display range
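A toy per-pixel version of the blend step, assuming the alignment stage has already produced a confidence weight per frame; real pipelines run this on the GPU over whole frames:

```swift
import Foundation

// Weighted average of one pixel across frames, in linear light.
// Weights near 0 mark pixels that moved between frames or are clipped.
func fusePixel(values: [Double], weights: [Double]) -> Double {
    let totalWeight = weights.reduce(0, +)
    guard totalWeight > 0 else { return values.first ?? 0 }
    var acc = 0.0
    for (v, w) in zip(values, weights) { acc += v * w }
    return acc / totalWeight
}

// Blending must happen before the gamma curve; averaging sRGB values
// directly skews the result toward highlights.
func srgbToLinear(_ v: Double) -> Double {
    v <= 0.04045 ? v / 12.92 : pow((v + 0.055) / 1.055, 2.4)
}
func linearToSRGB(_ v: Double) -> Double {
    v <= 0.0031308 ? v * 12.92 : 1.055 * pow(v, 1.0 / 2.4) - 0.055
}
```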
Night mode
A single long exposure is a poor fit for a handheld phone (motion blur). Night mode instead captures many short exposures, aligns and stacks them, and tone-maps the result. The capture UI shows a 3–10 s “hold still” countdown. The fusion runs on the GPU/Neural Engine.
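A back-of-envelope sketch of the capture plan; the 1/8 s handheld limit and the types are illustrative, not platform constants:

```swift
import Foundation

// Hypothetical capture planner: split one long exposure into short frames.
struct NightModePlan {
    let frameCount: Int
    let perFrameExposure: TimeInterval
}

func planNightCapture(targetExposure: TimeInterval,
                      maxHandheldExposure: TimeInterval = 1.0 / 8.0) -> NightModePlan {
    // Each frame stays short enough that hand shake can't blur it;
    // aligning and stacking recovers the light of the long exposure.
    let frames = max(1, Int((targetExposure / maxHandheldExposure).rounded(.up)))
    return NightModePlan(frameCount: frames,
                         perFrameExposure: targetExposure / Double(frames))
}
```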
Burst
Hold the shutter (or press a volume button) and the app captures frames continuously. Each frame is timestamped and stored as a “burst” group in the photo library. The app shows the user the best frame (smile/eyes-open detection) by default; the user can pick any frame from the group.
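A sketch of how the burst group and its default pick could be modeled; the quality score is assumed to come from an ML model (sharpness, eyes open, smiles):

```swift
import Foundation

struct BurstFrame {
    let assetID: String
    let timestamp: Date
    let qualityScore: Double  // higher = better; produced by an ML model
}

struct BurstGroup {
    let id: String
    var frames: [BurstFrame]

    // The frame shown by default in the library UI; the user can override.
    var bestFrame: BurstFrame? {
        frames.max { $0.qualityScore < $1.qualityScore }
    }
}
```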
Live Photo
The pre-roll buffer is part of the magic. The camera is constantly capturing; on shutter, the app saves 1.5 s before and 1.5 s after as a 3-second clip stored alongside the still. Storage is one HEIF still + one HEVC clip referenced from the same asset.
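Carving the clip out of the buffer is then a filter over timestamps; a sketch, reusing the `Frame` stand-in from the pipeline section:

```swift
// Keep everything within 1.5 s on either side of the shutter; the result
// is encoded as a ~3 s HEVC clip paired with the HEIF still.
func livePhotoFrames(buffer: [Frame],
                     shutterTime: TimeInterval,
                     window: TimeInterval = 1.5) -> [Frame] {
    buffer.filter { abs($0.timestamp - shutterTime) <= window }
          .sorted { $0.timestamp < $1.timestamp }
}
```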
RAW vs HEIF
- HEIF: small, processed, ready to share
- RAW (DNG / ProRAW): large, sensor-level data, lets editors recover shadows and highlights
- Apple ProRAW: a hybrid, a linear DNG that carries demosaiced, multi-frame-merged data plus processing metadata, so you get RAW editing latitude with the computational benefits baked in
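On iOS the choice maps to `AVCapturePhotoSettings`; a minimal sketch (session setup and error handling omitted):

```swift
import AVFoundation

func makePhotoSettings(output: AVCapturePhotoOutput, wantsRAW: Bool) -> AVCapturePhotoSettings {
    if wantsRAW, let rawFormat = output.availableRawPhotoPixelFormatTypes.first {
        // RAW (DNG) capture; apps commonly pair it with a processed preview.
        return AVCapturePhotoSettings(rawPixelFormatType: rawFormat)
    }
    // Default: HEVC-compressed image in a HEIF container.
    return AVCapturePhotoSettings(format: [AVVideoCodecKey: AVVideoCodecType.hevc])
}
```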
On-device ML
- Smart HDR: a deep-learning model guides per-pixel exposure blending
- Photonic Engine / Deep Fusion: fuse frames at multiple resolutions guided by an ML model
- Portrait mode: depth from dual camera or ML; segmentation refines the matte
- Subject detection: faces, pets, text — drives focus, exposure, and album organization
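A sketch of the subject-detection piece using the Vision framework; converting the normalized rectangles into a focus point of interest on the capture device is omitted:

```swift
import Vision
import CoreVideo

// Detect faces in one preview frame; results arrive as normalized (0–1)
// bounding boxes that can drive focus and exposure metering.
func detectFaces(in pixelBuffer: CVPixelBuffer,
                 completion: @escaping ([CGRect]) -> Void) {
    let request = VNDetectFaceRectanglesRequest { request, _ in
        let boxes = (request.results as? [VNFaceObservation])?.map(\.boundingBox) ?? []
        completion(boxes)
    }
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
    try? handler.perform([request])
}
```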
The editor
Non-destructive edit graph. Each adjustment (exposure, contrast, crop, filter) is a node; the original is preserved. Save = serialize the graph. On read, the renderer applies the graph to the original and produces the displayed image. Undo/redo = navigate the graph history.
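A sketch of that recipe model with illustrative adjustment types; undo/redo is a cursor over immutable recipe snapshots:

```swift
import Foundation

// Illustrative adjustment types; a real editor has many more.
enum Adjustment: Codable {
    case exposure(Double)
    case contrast(Double)
    case crop(x: Double, y: Double, width: Double, height: Double)
}

// A recipe is the ordered list of adjustments; serializing it is "saving"
// the edit. The original pixels are never touched.
struct EditRecipe: Codable {
    var steps: [Adjustment] = []
}

struct EditSession {
    let originalAssetID: String
    private(set) var history: [EditRecipe] = [EditRecipe()]
    private(set) var cursor = 0

    var current: EditRecipe { history[cursor] }

    // Each edit appends a new snapshot; redo states past the cursor are
    // discarded, as in a standard undo stack.
    mutating func apply(_ step: Adjustment) {
        var next = current
        next.steps.append(step)
        history.removeSubrange((cursor + 1)...)
        history.append(next)
        cursor += 1
    }

    mutating func undo() { cursor = max(0, cursor - 1) }
    mutating func redo() { cursor = min(history.count - 1, cursor + 1) }
}
```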
Storage and library
- The photo library is a system feature; your app writes via PHPhotoLibrary (iOS) or MediaStore (Android)
- Thumbnails generated on save and cached at multiple sizes
- iCloud Photos / Google Photos sync is handled by the OS, not your app
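A minimal sketch of the iOS write path (authorization checks trimmed):

```swift
import Photos

// Write encoded image data (HEIF/JPEG) into the system photo library.
func save(imageData: Data, completion: @escaping (Bool) -> Void) {
    PHPhotoLibrary.shared().performChanges({
        let request = PHAssetCreationRequest.forAsset()
        request.addResource(with: .photo, data: imageData, options: nil)
    }, completionHandler: { success, _ in
        completion(success)
    })
}
```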
Performance and battery
- Hand off heavy work to the Neural Engine / GPU; CPU is for orchestration
- Drop the preview frame rate when the device is hot
- Defer noncritical work (thumbnail generation, ML scene classification) to idle
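A sketch of thermal-aware preview throttling; the frame-rate targets are illustrative, the thermal-state API is standard Foundation:

```swift
import Foundation

// Map the system thermal state to a preview frame-rate target.
func targetPreviewFrameRate(for state: ProcessInfo.ThermalState) -> Int {
    switch state {
    case .nominal, .fair: return 60
    case .serious:        return 30   // shed GPU load before the OS does
    case .critical:       return 15
    @unknown default:     return 30
    }
}

// Observe changes and adjust the capture connection accordingly.
// Keep a reference to `observer` for the session's lifetime.
let observer = NotificationCenter.default.addObserver(
    forName: ProcessInfo.thermalStateDidChangeNotification,
    object: nil, queue: .main
) { _ in
    let fps = targetPreviewFrameRate(for: ProcessInfo.processInfo.thermalState)
    print("Thermal state changed; target preview fps: \(fps)")
}
```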
What interviewers reward
- Naming the rolling pre-roll buffer (the “zero-shutter-lag” trick)
- Explaining HDR as a software fusion problem, not a hardware setting
- Discussing the edit graph as immutable original + recipe
- Mentioning RAW vs HEIF tradeoffs
- Discussing thermal throttling and battery
Frequently Asked Questions
How do I handle a camera permission prompt smoothly?
Show a priming screen that explains why you need camera access before triggering the system prompt (iOS only shows it once). If the user denies, route them to Settings; do not crash the app.
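A sketch of that flow on iOS:

```swift
import AVFoundation
import UIKit

func requestCameraAccess() {
    switch AVCaptureDevice.authorizationStatus(for: .video) {
    case .authorized:
        break // start the capture session
    case .notDetermined:
        // Show your "why we need the camera" screen before this call.
        AVCaptureDevice.requestAccess(for: .video) { granted in
            // Start the session on success; granted == false means denied.
        }
    case .denied, .restricted:
        // Route to Settings; never crash or dead-end.
        if let url = URL(string: UIApplication.openSettingsURLString) {
            UIApplication.shared.open(url)
        }
    @unknown default:
        break
    }
}
```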
What about real-time filters (Snapchat-style)?
Process the camera preview through a Metal/OpenGL shader pipeline. Each filter is a fragment shader. Heavy ML filters (face mesh) run on the Neural Engine and feed parameters to the shader.
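A sketch of the per-frame structure using Core Image (which compiles down to Metal); a Snapchat-style pipeline swaps the built-in filter for custom fragment shaders, but the shape is the same:

```swift
import CoreImage
import CoreVideo

let context = CIContext()  // reuse across frames; creation is expensive

// Preview frame in, filtered frame out, called for every frame.
func filterFrame(_ pixelBuffer: CVPixelBuffer) -> CGImage? {
    let input = CIImage(cvPixelBuffer: pixelBuffer)
    let filter = CIFilter(name: "CIPhotoEffectNoir")!  // any effect chain here
    filter.setValue(input, forKey: kCIInputImageKey)
    guard let output = filter.outputImage else { return nil }
    return context.createCGImage(output, from: output.extent)
}
```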
Should I use AVFoundation/CameraX directly or a higher-level library?
Direct is right for a camera-first app. Libraries are fine for in-app QR scanners or simple capture flows where you don’t need ProRAW or burst.