Design a Mobile Camera App: HDR, Burst, and On-Device Editing

“Design a camera app” is a senior-level mobile-system-design question that probes whether you understand the capture pipeline, on-device ML for computational photography, and the data flow from sensor to photo library. iPhone and Pixel cameras are as much software products as hardware; the interview tests whether you can articulate that.

Clarify scope

  • Stock camera or social camera (Instagram, Snapchat)?
  • Photo only, or also video and slow-mo?
  • Real-time filters/effects?
  • HDR, Night mode, Live Photo, ProRAW?
  • Editor included, or hand off to Photos?

The capture pipeline

  1. Sensor → ISP (image signal processor) hardware: demosaic, white balance, noise reduction
  2. Multi-frame buffer: the app keeps a rolling ring buffer of recent frames so the “shutter” has zero perceived latency (sketched below)
  3. On shutter: select frames, fuse (HDR / Night), denoise, sharpen
  4. Encode: HEIF/JPEG/RAW
  5. Save to photo library; trigger thumbnail generation
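
Step 2 is the heart of “zero shutter lag”: the shutter reaches back into a ring buffer instead of starting a capture. A minimal sketch of the idea, with a hypothetical `Frame` type standing in for a real sample buffer:

```swift
import Foundation

/// A captured preview frame; stands in for a real CMSampleBuffer (hypothetical type).
struct Frame {
    let timestamp: TimeInterval
    let sharpness: Double   // precomputed blur/focus score for this frame
}

/// Fixed-capacity ring buffer of the most recent preview frames.
/// On shutter press the app "goes back in time" and picks a frame
/// captured just before the tap, so perceived latency is zero.
final class PreRollBuffer {
    private var frames: [Frame] = []
    private let capacity: Int

    init(capacity: Int = 90) { self.capacity = capacity }   // ~3 s at 30 fps

    func append(_ frame: Frame) {
        frames.append(frame)
        if frames.count > capacity { frames.removeFirst() }  // O(n); fine for a sketch
    }

    /// Pick the sharpest frame captured at or just before the shutter tap.
    func frameForShutter(at tapTime: TimeInterval) -> Frame? {
        frames.filter { $0.timestamp <= tapTime }
              .max { $0.sharpness < $1.sharpness }
    }

    /// All frames inside a time window (used for Live Photo, below).
    func frames(in window: ClosedRange<TimeInterval>) -> [Frame] {
        frames.filter { window.contains($0.timestamp) }
    }
}
```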

HDR and multi-frame fusion

Single-exposure capture has limited dynamic range. Modern cameras capture multiple frames at different exposures (and sometimes at the same exposure, for noise reduction) and fuse them. The steps, with a toy fusion sketch after the list:

  • Align frames (tiny camera shake between frames; use feature matching or hardware gyro hints)
  • Compute per-pixel weights based on motion and exposure
  • Blend in linear (not sRGB) space
  • Tone-map the fused result back into the display’s dynamic range
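
A toy per-pixel fusion, assuming frames are already aligned and converted to linear values in [0, 1]. The well-exposedness weight is a common heuristic from Mertens-style exposure fusion, not Apple’s or Google’s actual model:

```swift
import Foundation

/// Weight a pixel by how well exposed it is: values near mid-gray get
/// high weight; near-black and near-white (clipped) values get low weight.
func wellExposedness(_ value: Double, sigma: Double = 0.2) -> Double {
    exp(-pow(value - 0.5, 2) / (2 * sigma * sigma))
}

/// Fuse one pixel across N aligned exposures in linear space.
/// `exposureScale` maps each frame's recorded value back to a common
/// radiance scale (e.g. 0.5 for a +1 EV frame, 2.0 for a -1 EV frame).
func fusePixel(pixels: [Double], exposureScale: [Double]) -> Double {
    guard !pixels.isEmpty else { return 0 }
    var weighted = 0.0
    var totalWeight = 0.0
    for (value, scale) in zip(pixels, exposureScale) {
        let w = wellExposedness(value)
        weighted += w * value * scale   // blend radiance in linear space, not sRGB
        totalWeight += w
    }
    return totalWeight > 0 ? weighted / totalWeight : pixels[0]
}

// A highlight that clipped in the base frame is recovered mostly from the
// darker frame: fusePixel(pixels: [0.98, 0.55], exposureScale: [1.0, 2.0])
```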

Night mode

A single long exposure is a poor fit for a handheld phone (hand shake means motion blur). Night mode instead captures many short exposures, aligns and stacks them, and tone-maps the result; the capture indicator shows a 3–10 s “hold still” countdown while this happens. Alignment and stacking run on the GPU/Neural Engine.
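
The payoff of stacking is statistical: averaging N aligned frames cuts random shot noise by roughly √N, so 9 frames give about 3× cleaner shadows. A minimal sketch, assuming alignment has already happened:

```swift
/// Average N aligned exposures per pixel. Shot noise is random, so the
/// mean of N frames has roughly 1/sqrt(N) the noise of a single frame.
func stack(alignedFrames: [[Double]]) -> [Double] {
    guard let first = alignedFrames.first else { return [] }
    var sums = [Double](repeating: 0, count: first.count)
    for frame in alignedFrames {
        for i in frame.indices { sums[i] += frame[i] }
    }
    let n = Double(alignedFrames.count)
    return sums.map { $0 / n }
}
```

Real pipelines don’t average blindly: pixels where alignment failed (a moving subject) are weighted down, which is why Night mode ghosts less than a naive stack would.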

Burst

Hold the shutter (or press a volume button) and the app captures frames continuously. Each frame is timestamped and stored as a “burst” group in the photo library. The app surfaces the best frame (smile/eyes-open detection) by default; the user can pick any frame from the group.
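
Best-frame selection reduces to scoring plus argmax. The signals and weights below are illustrative only; production pipelines use learned quality scores:

```swift
import Foundation

/// Per-frame quality signals from cheap on-capture analysis
/// (hypothetical fields for illustration).
struct BurstFrame {
    let id: UUID
    let sharpness: Double   // e.g. variance of Laplacian, normalized to [0, 1]
    let eyesOpen: Double    // face-landmark confidence
    let smiling: Double
}

/// Pick the default "best" frame of a burst group.
func bestFrame(of burst: [BurstFrame]) -> BurstFrame? {
    burst.max { score($0) < score($1) }
}

private func score(_ f: BurstFrame) -> Double {
    // Weights are made up for illustration.
    0.5 * f.sharpness + 0.3 * f.eyesOpen + 0.2 * f.smiling
}
```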

Live Photo

The pre-roll buffer is part of the magic. The camera is constantly capturing; on shutter, the app saves 1.5 s before and 1.5 s after as a 3-second clip stored alongside the still. Storage is one HEIF still + one HEVC clip referenced from the same asset.
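
Given the pre-roll ring buffer from the pipeline sketch above, assembling the clip is a slicing problem (reusing the same hypothetical types):

```swift
import Foundation

/// Collect the Live Photo's motion frames: everything within ±1.5 s
/// of the shutter, drawn from the rolling pre-roll buffer.
func livePhotoFrames(buffer: PreRollBuffer, shutterTime: TimeInterval) -> [Frame] {
    buffer.frames(in: (shutterTime - 1.5)...(shutterTime + 1.5))
}
```

Note that the app has to keep capturing for 1.5 s after the tap before it can encode the HEVC clip, which is why the saved Live Photo finalizes a beat after the shutter.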

RAW vs HEIF

  • HEIF: small, processed, ready to share
  • RAW (DNG / ProRAW): large, sensor-level data, lets editors recover shadows and highlights
  • Apple ProRAW: a hybrid (a linear DNG that bakes in the multi-frame pipeline’s output plus its processing metadata), so you get RAW-grade editing latitude with the computational benefits; capture setup is sketched below
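
On iOS the HEIF-vs-ProRAW decision is a per-shot capture setting. A sketch using AVFoundation’s ProRAW calls (iOS 14.3+ on supported devices); session setup and the capture delegate are omitted:

```swift
import AVFoundation

/// Build capture settings for a ProRAW shot with an HEVC-processed companion.
func makeProRAWSettings(for output: AVCapturePhotoOutput) -> AVCapturePhotoSettings? {
    // Opt the output into ProRAW before capturing.
    guard output.isAppleProRAWSupported else { return nil }
    output.isAppleProRAWEnabled = true

    // Pick a ProRAW pixel format from those the output offers.
    guard let rawFormat = output.availableRawPhotoPixelFormatTypes
        .first(where: { AVCapturePhotoOutput.isAppleProRAWPixelFormat($0) })
    else { return nil }

    // One DNG plus one HEVC-compressed processed image per capture.
    return AVCapturePhotoSettings(
        rawPixelFormatType: rawFormat,
        processedFormat: [AVVideoCodecKey: AVVideoCodecType.hevc]
    )
}

// Usage: photoOutput.capturePhoto(with: settings, delegate: self)
// The delegate then receives both the DNG and the processed HEIC data.
```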

On-device ML

  • Smart HDR: a learned model decides per-pixel exposure blending
  • Photonic Engine / Deep Fusion: fuse frames at multiple resolutions guided by an ML model
  • Portrait mode: depth from dual camera or ML; segmentation refines the matte
  • Subject detection: faces, pets, text; drives focus, exposure, and album organization (focus wiring sketched below)
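
The “drives focus and exposure” part can be sketched with the real Vision and AVFoundation APIs; frame orientation handling and session wiring are omitted:

```swift
import AVFoundation
import Vision

/// Detect a face in a preview frame and steer focus/exposure toward it.
func focusOnFace(in pixelBuffer: CVPixelBuffer, device: AVCaptureDevice) {
    let request = VNDetectFaceRectanglesRequest { request, _ in
        guard let face = (request.results as? [VNFaceObservation])?.first else { return }
        // Vision uses a bottom-left-origin normalized rect; the capture
        // device's point of interest is top-left-origin, so flip y.
        let point = CGPoint(x: face.boundingBox.midX, y: 1 - face.boundingBox.midY)
        do {
            try device.lockForConfiguration()
            defer { device.unlockForConfiguration() }
            if device.isFocusPointOfInterestSupported {
                device.focusPointOfInterest = point
                device.focusMode = .continuousAutoFocus
            }
            if device.isExposurePointOfInterestSupported {
                device.exposurePointOfInterest = point
                device.exposureMode = .continuousAutoExposure
            }
        } catch { /* device busy; try again on a later frame */ }
    }
    try? VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:]).perform([request])
}
```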

The editor

Non-destructive edit graph. Each adjustment (exposure, contrast, crop, filter) is a node; the original is preserved. Save = serialize the graph. On read, the renderer applies the graph to the original and produces the displayed image. Undo/redo = navigate the graph history.
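
A minimal sketch of the recipe idea with hypothetical adjustment types. The key property: the original bytes are never rewritten, and “save” is just serializing the list:

```swift
import Foundation

/// One node of the edit graph. Codable, so saving an edit session
/// is plain serialization of the recipe.
enum Adjustment: Codable {
    case exposure(ev: Double)
    case contrast(amount: Double)
    case crop(x: Double, y: Double, width: Double, height: Double)  // normalized
}

/// Non-destructive edit state: an immutable original plus a recipe.
struct EditedPhoto: Codable {
    let originalAssetID: String         // the untouched original in the library
    private(set) var recipe: [Adjustment] = []
    private var redoStack: [Adjustment] = []

    mutating func apply(_ adjustment: Adjustment) {
        recipe.append(adjustment)
        redoStack.removeAll()           // a new edit invalidates redo history
    }
    mutating func undo() { if let last = recipe.popLast() { redoStack.append(last) } }
    mutating func redo() { if let next = redoStack.popLast() { recipe.append(next) } }
}

// Rendering folds the recipe over the decoded original, e.g.:
// recipe.reduce(originalImage) { image, adjustment in render(adjustment, on: image) }
```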

Storage and library

  • Photo library is the system feature; your app writes via PHPhotoLibrary (iOS) or MediaStore (Android); the iOS write path is sketched below
  • Thumbnails generated on save and cached at multiple sizes
  • iCloud Photos / Google Photos sync is handled by the OS, not your app
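
On iOS the write path goes through PhotoKit. A sketch with the real calls, assuming `heifData` comes out of the encode step:

```swift
import Photos

/// Save encoded photo bytes into the system photo library.
/// Requires an NSPhotoLibraryAddUsageDescription entry in Info.plist.
func save(heifData: Data) {
    PHPhotoLibrary.requestAuthorization(for: .addOnly) { status in
        guard status == .authorized || status == .limited else { return }
        PHPhotoLibrary.shared().performChanges({
            let request = PHAssetCreationRequest.forAsset()
            request.addResource(with: .photo, data: heifData, options: nil)
        }) { success, error in
            // The system generates thumbnails and handles cloud sync from here.
            if !success { print("Save failed: \(String(describing: error))") }
        }
    }
}
```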

Performance and battery

  • Hand off heavy work to the Neural Engine / GPU; CPU is for orchestration
  • Drop the preview frame rate when the device runs hot (sketched below)
  • Defer noncritical work (thumbnail generation, ML scene classification) to idle
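
On iOS the thermal signal comes from ProcessInfo. A sketch of the preview-frame-rate downshift from the list above; the rate tiers are illustrative:

```swift
import Foundation

/// Lower the preview frame rate as the device heats up.
final class ThermalGovernor {
    var onFrameRateChange: ((Int) -> Void)?

    init() {
        NotificationCenter.default.addObserver(
            forName: ProcessInfo.thermalStateDidChangeNotification,
            object: nil,
            queue: .main
        ) { [weak self] _ in self?.adjust() }
        adjust()
    }

    private func adjust() {
        switch ProcessInfo.processInfo.thermalState {
        case .nominal:  onFrameRateChange?(60)
        case .fair:     onFrameRateChange?(30)
        case .serious:  onFrameRateChange?(24)  // also a good point to pause ML scene analysis
        case .critical: onFrameRateChange?(15)
        @unknown default: onFrameRateChange?(30)
        }
    }
}
```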

What interviewers reward

  • Naming the rolling pre-roll buffer (the “zero-shutter-lag” trick)
  • Explaining HDR as a software fusion problem, not a hardware setting
  • Discussing the edit graph as immutable original + recipe
  • Mentioning RAW vs HEIF tradeoffs
  • Discussing thermal throttling and battery

Frequently Asked Questions

How do I handle a camera permission prompt smoothly?

Show a pre-prompt screen explaining why the app needs camera access before triggering the system prompt; you only get one chance at the system dialog. If the user later denies, route them to Settings rather than dead-ending; never crash.
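
The flow maps directly onto AVFoundation and UIKit calls; the explanatory pre-prompt screen itself is ordinary app UI and omitted:

```swift
import AVFoundation
import UIKit

/// Call this after the user accepts your explanatory pre-prompt screen.
func requestCameraAccess(onGranted: @escaping () -> Void) {
    switch AVCaptureDevice.authorizationStatus(for: .video) {
    case .authorized:
        onGranted()
    case .notDetermined:
        // Triggers the one-shot system permission prompt.
        AVCaptureDevice.requestAccess(for: .video) { granted in
            if granted { DispatchQueue.main.async(execute: onGranted) }
        }
    case .denied, .restricted:
        // The system won't re-prompt; send the user to the app's Settings page.
        if let url = URL(string: UIApplication.openSettingsURLString) {
            UIApplication.shared.open(url)
        }
    @unknown default:
        break
    }
}
```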

What about real-time filters (Snapchat-style)?

Process the camera preview through a Metal/OpenGL shader pipeline. Each filter is a fragment shader. Heavy ML filters (face mesh) run on the Neural Engine and feed parameters to the shader.
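
A Core Image version of the per-frame filter stage as a sketch (a Metal fragment shader is the lower-level equivalent); session wiring and the render target are assumed:

```swift
import CoreImage
import CoreVideo

let ciContext = CIContext()  // create once; per-frame contexts are expensive

/// Filter a single camera preview frame. In a real pipeline this runs
/// inside captureOutput(_:didOutput:from:) on the video data queue.
func filtered(pixelBuffer: CVPixelBuffer) -> CIImage? {
    let input = CIImage(cvPixelBuffer: pixelBuffer)
    guard let filter = CIFilter(name: "CIPhotoEffectNoir") else { return nil }
    filter.setValue(input, forKey: kCIInputImageKey)
    // Render filter.outputImage into a Metal-backed view or CVPixelBuffer
    // via ciContext for display or encoding.
    return filter.outputImage
}
```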

Should I use AVFoundation/CameraX directly or a higher-level library?

Direct is right for a camera-first app. Libraries are fine for in-app QR scanners or simple capture flows where you don’t need ProRAW or burst.
