“Design a camera app” is a senior-level mobile-system-design question that probes whether you understand the capture pipeline, on-device ML for computational photography, and the data flow from sensor to photo library. iPhone and Pixel cameras are software products as much as hardware; the interview tests whether you can articulate that.
Clarify scope
- Stock camera or social camera (Instagram, Snapchat)?
- Photo only, or also video and slow-mo?
- Real-time filters/effects?
- HDR, Night mode, Live Photo, ProRAW?
- Editor included, or hand off to Photos?
The capture pipeline
- Sensor → ISP (image signal processor) hardware: demosaic, white balance, noise reduction
- Multi-frame buffer: the app maintains a rolling buffer of recent frames so the “shutter” is zero-latency (see the ring-buffer sketch after this list)
- On shutter: select frames, fuse (HDR / Night), denoise, sharpen
- Encode: HEIF/JPEG/RAW
- Save to photo library; trigger thumbnail generation
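A minimal sketch of that rolling buffer, with a stand-in `Frame` type (a real pipeline holds `CVPixelBuffer`s from the preview stream and does far less copying):

```swift
import Foundation

// Stand-in for a preview frame; a real pipeline holds CVPixelBuffers.
struct Frame {
    let pixelData: Data
    let timestamp: TimeInterval
}

// Rolling buffer of the most recent preview frames. On shutter, the app
// selects the frames nearest the shutter timestamp, so the captured
// moment predates the button press ("zero shutter lag").
final class PrerollBuffer {
    private var frames: [Frame] = []
    private let capacity: Int
    private let queue = DispatchQueue(label: "preroll.buffer")

    init(capacity: Int = 15) { self.capacity = capacity }

    // Called for every preview frame; keeps only the newest `capacity`.
    func append(_ frame: Frame) {
        queue.sync {
            frames.append(frame)
            if frames.count > capacity { frames.removeFirst() }
        }
    }

    // Returns the `count` frames closest to the shutter press, in capture order.
    func framesAroundShutter(at shutterTime: TimeInterval, count: Int) -> [Frame] {
        queue.sync {
            Array(frames
                .sorted { abs($0.timestamp - shutterTime) < abs($1.timestamp - shutterTime) }
                .prefix(count))
                .sorted { $0.timestamp < $1.timestamp }
        }
    }
}
```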
HDR and multi-frame fusion
Single-exposure capture has limited dynamic range. Modern cameras capture multiple frames at different exposures (and sometimes at the same exposure, to average out noise) and fuse them. Steps:
- Align frames (tiny camera shake between frames; use feature matching or hardware gyro hints)
- Compute per-pixel weights based on motion and exposure
- Blend in linear (not sRGB) space
- Tone-map the fused result back into display range
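A toy per-pixel version of the blend step, assuming the alignment stage has already produced a confidence weight per frame; real pipelines run this on the GPU over whole frames:

```swift
import Foundation

// Weighted average of one pixel across frames, in linear light.
// Weights near 0 mark pixels that moved between frames or are clipped.
func fusePixel(values: [Double], weights: [Double]) -> Double {
    let totalWeight = weights.reduce(0, +)
    guard totalWeight > 0 else { return values.first ?? 0 }
    var acc = 0.0
    for (v, w) in zip(values, weights) { acc += v * w }
    return acc / totalWeight
}

// Blending must happen before the gamma curve; averaging sRGB values
// directly skews the result toward highlights.
func srgbToLinear(_ v: Double) -> Double {
    v <= 0.04045 ? v / 12.92 : pow((v + 0.055) / 1.055, 2.4)
}
func linearToSRGB(_ v: Double) -> Double {
    v <= 0.0031308 ? v * 12.92 : 1.055 * pow(v, 1.0 / 2.4) - 0.055
}
```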
Night mode
A single long exposure is a poor fit for a handheld phone (motion blur). Night mode instead captures many short exposures, aligns and stacks them, and tone-maps the result. The capture UI shows a 3–10 s “hold still” countdown. The fusion runs on the GPU/Neural Engine.
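A back-of-envelope sketch of the capture plan; the 1/8 s handheld limit and the types are illustrative, not platform constants:

```swift
import Foundation

// Hypothetical capture planner: split one long exposure into short frames.
struct NightModePlan {
    let frameCount: Int
    let perFrameExposure: TimeInterval
}

func planNightCapture(targetExposure: TimeInterval,
                      maxHandheldExposure: TimeInterval = 1.0 / 8.0) -> NightModePlan {
    // Each frame stays short enough that hand shake can't blur it;
    // aligning and stacking recovers the light of the long exposure.
    let frames = max(1, Int((targetExposure / maxHandheldExposure).rounded(.up)))
    return NightModePlan(frameCount: frames,
                         perFrameExposure: targetExposure / Double(frames))
}
```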
Burst
Hold the shutter (or press a volume button) and the app captures frames continuously. Each frame is timestamped and stored as a “burst” group in the photo library. The app shows the user the best frame (smile/eyes-open detection) by default; the user can pick any frame from the group.
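A sketch of how the burst group and its default pick could be modeled; the quality score is assumed to come from an ML model (sharpness, eyes open, smiles):

```swift
import Foundation

struct BurstFrame {
    let assetID: String
    let timestamp: Date
    let qualityScore: Double  // higher = better; produced by an ML model
}

struct BurstGroup {
    let id: String
    var frames: [BurstFrame]

    // The frame shown by default in the library UI; the user can override.
    var bestFrame: BurstFrame? {
        frames.max { $0.qualityScore < $1.qualityScore }
    }
}
```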
Live Photo
The pre-roll buffer is part of the magic. The camera is constantly capturing; on shutter, the app saves 1.5 s before and 1.5 s after as a 3-second clip stored alongside the still. Storage is one HEIF still + one HEVC clip referenced from the same asset.
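Carving the clip out of the buffer is then a filter over timestamps; a sketch, reusing the `Frame` stand-in from the pipeline section:

```swift
// Keep everything within 1.5 s on either side of the shutter; the result
// is encoded as a ~3 s HEVC clip paired with the HEIF still.
func livePhotoFrames(buffer: [Frame],
                     shutterTime: TimeInterval,
                     window: TimeInterval = 1.5) -> [Frame] {
    buffer.filter { abs($0.timestamp - shutterTime) <= window }
          .sorted { $0.timestamp < $1.timestamp }
}
```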
RAW vs HEIF
- HEIF: small, processed, ready to share
- RAW (DNG / ProRAW): large, sensor-level data, lets editors recover shadows and highlights
- Apple ProRAW: a hybrid, a linear DNG that carries demosaiced, multi-frame-merged data plus processing metadata, so you get RAW editing latitude with the computational benefits baked in
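On iOS the choice maps to `AVCapturePhotoSettings`; a minimal sketch (session setup and error handling omitted):

```swift
import AVFoundation

func makePhotoSettings(output: AVCapturePhotoOutput, wantsRAW: Bool) -> AVCapturePhotoSettings {
    if wantsRAW, let rawFormat = output.availableRawPhotoPixelFormatTypes.first {
        // RAW (DNG) capture; apps commonly pair it with a processed preview.
        return AVCapturePhotoSettings(rawPixelFormatType: rawFormat)
    }
    // Default: HEVC-compressed image in a HEIF container.
    return AVCapturePhotoSettings(format: [AVVideoCodecKey: AVVideoCodecType.hevc])
}
```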
On-device ML
- Smart HDR: a deep-learning model guides per-pixel exposure blending
- Photonic Engine / Deep Fusion: fuse frames at multiple resolutions guided by an ML model
- Portrait mode: depth from dual camera or ML; segmentation refines the matte
- Subject detection: faces, pets, text — drives focus, exposure, and album organization
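A sketch of the subject-detection piece using the Vision framework; converting the normalized rectangles into a focus point of interest on the capture device is omitted:

```swift
import Vision
import CoreVideo

// Detect faces in one preview frame; results arrive as normalized (0–1)
// bounding boxes that can drive focus and exposure metering.
func detectFaces(in pixelBuffer: CVPixelBuffer,
                 completion: @escaping ([CGRect]) -> Void) {
    let request = VNDetectFaceRectanglesRequest { request, _ in
        let boxes = (request.results as? [VNFaceObservation])?.map(\.boundingBox) ?? []
        completion(boxes)
    }
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
    try? handler.perform([request])
}
```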
The editor
Non-destructive edit graph. Each adjustment (exposure, contrast, crop, filter) is a node; the original is preserved. Save = serialize the graph. On read, the renderer applies the graph to the original and produces the displayed image. Undo/redo = navigate the graph history.
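A sketch of that recipe model with illustrative adjustment types; undo/redo is a cursor over immutable recipe snapshots:

```swift
import Foundation

// Illustrative adjustment types; a real editor has many more.
enum Adjustment: Codable {
    case exposure(Double)
    case contrast(Double)
    case crop(x: Double, y: Double, width: Double, height: Double)
}

// A recipe is the ordered list of adjustments; serializing it is "saving"
// the edit. The original pixels are never touched.
struct EditRecipe: Codable {
    var steps: [Adjustment] = []
}

struct EditSession {
    let originalAssetID: String
    private(set) var history: [EditRecipe] = [EditRecipe()]
    private(set) var cursor = 0

    var current: EditRecipe { history[cursor] }

    // Each edit appends a new snapshot; redo states past the cursor are
    // discarded, as in a standard undo stack.
    mutating func apply(_ step: Adjustment) {
        var next = current
        next.steps.append(step)
        history.removeSubrange((cursor + 1)...)
        history.append(next)
        cursor += 1
    }

    mutating func undo() { cursor = max(0, cursor - 1) }
    mutating func redo() { cursor = min(history.count - 1, cursor + 1) }
}
```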
Storage and library
- The photo library is a system feature; your app writes via PHPhotoLibrary (iOS) or MediaStore (Android)
- Thumbnails generated on save and cached at multiple sizes
- iCloud Photos / Google Photos sync is handled by the OS, not your app
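A minimal sketch of the iOS write path (authorization checks trimmed):

```swift
import Photos

// Write encoded image data (HEIF/JPEG) into the system photo library.
func save(imageData: Data, completion: @escaping (Bool) -> Void) {
    PHPhotoLibrary.shared().performChanges({
        let request = PHAssetCreationRequest.forAsset()
        request.addResource(with: .photo, data: imageData, options: nil)
    }, completionHandler: { success, _ in
        completion(success)
    })
}
```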
Performance and battery
- Hand off heavy work to the Neural Engine / GPU; CPU is for orchestration
- Drop the preview frame rate when the device is hot
- Defer noncritical work (thumbnail generation, ML scene classification) to idle
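A sketch of thermal-aware preview throttling; the frame-rate targets are illustrative, the thermal-state API is standard Foundation:

```swift
import Foundation

// Map the system thermal state to a preview frame-rate target.
func targetPreviewFrameRate(for state: ProcessInfo.ThermalState) -> Int {
    switch state {
    case .nominal, .fair: return 60
    case .serious:        return 30   // shed GPU load before the OS does
    case .critical:       return 15
    @unknown default:     return 30
    }
}

// Observe changes and adjust the capture connection accordingly.
// Keep a reference to `observer` for the session's lifetime.
let observer = NotificationCenter.default.addObserver(
    forName: ProcessInfo.thermalStateDidChangeNotification,
    object: nil, queue: .main
) { _ in
    let fps = targetPreviewFrameRate(for: ProcessInfo.processInfo.thermalState)
    print("Thermal state changed; target preview fps: \(fps)")
}
```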
What interviewers reward
- Naming the rolling pre-roll buffer (the “zero-shutter-lag” trick)
- Explaining HDR as a software fusion problem, not a hardware setting
- Discussing the edit graph as immutable original + recipe
- Mentioning RAW vs HEIF tradeoffs
- Discussing thermal throttling and battery
Frequently Asked Questions
How do I handle a camera permission prompt smoothly?
Show a priming screen that explains why you need camera access before triggering the system prompt (iOS only shows it once). If the user denies, route them to Settings; do not crash the app.
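A sketch of that flow on iOS:

```swift
import AVFoundation
import UIKit

func requestCameraAccess() {
    switch AVCaptureDevice.authorizationStatus(for: .video) {
    case .authorized:
        break // start the capture session
    case .notDetermined:
        // Show your "why we need the camera" screen before this call.
        AVCaptureDevice.requestAccess(for: .video) { granted in
            // Start the session on success; granted == false means denied.
        }
    case .denied, .restricted:
        // Route to Settings; never crash or dead-end.
        if let url = URL(string: UIApplication.openSettingsURLString) {
            UIApplication.shared.open(url)
        }
    @unknown default:
        break
    }
}
```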
What about real-time filters (Snapchat-style)?
Process the camera preview through a Metal/OpenGL shader pipeline. Each filter is a fragment shader. Heavy ML filters (face mesh) run on the Neural Engine and feed parameters to the shader.
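A sketch of the per-frame structure using Core Image (which compiles down to Metal); a Snapchat-style pipeline swaps the built-in filter for custom fragment shaders, but the shape is the same:

```swift
import CoreImage
import CoreVideo

let context = CIContext()  // reuse across frames; creation is expensive

// Preview frame in, filtered frame out, called for every frame.
func filterFrame(_ pixelBuffer: CVPixelBuffer) -> CGImage? {
    let input = CIImage(cvPixelBuffer: pixelBuffer)
    let filter = CIFilter(name: "CIPhotoEffectNoir")!  // any effect chain here
    filter.setValue(input, forKey: kCIInputImageKey)
    guard let output = filter.outputImage else { return nil }
    return context.createCGImage(output, from: output.extent)
}
```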
Should I use AVFoundation/CameraX directly or a higher-level library?
Direct is right for a camera-first app. Libraries are fine for in-app QR scanners or simple capture flows where you don’t need ProRAW or burst.