Design a Mobile Audiobook App: Background Playback and Sync

“Design an audiobook app” is the audio-only cousin of the music-streaming and podcast prompts. Audible, Libby, Spotify Audiobooks, and Apple Books are the reference products. The interview tests whether you understand long-form audio playback, cross-device position sync, the ergonomics of the listening UX, and the constraints mobile operating systems place on background playback.

Clarify scope

  • Library + listening, or also purchase / borrowing?
  • Streaming, offline downloads, or both?
  • Speed control, sleep timer, chapter navigation?
  • Cross-device sync (phone, tablet, watch, web)?
  • CarPlay and Android Auto?

Audio playback architecture

  • iOS: AVAudioSession + AVPlayer (or AVAudioEngine, optionally through a wrapper such as AudioKit, for more control)
  • Android: ExoPlayer / Media3 with MediaSession integration
  • Background-capable audio category configured at app launch
  • Lock-screen and system media controls: MediaSession on Android; on iOS, MPNowPlayingInfoCenter for metadata plus MPRemoteCommandCenter for play/pause/skip handlers, which also feed the Dynamic Island

Background playback constraints

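  • iOS: enable the “Audio, AirPlay, and Picture in Picture” background mode (the audio value under UIBackgroundModes in Info.plist) and use a playback session category; the OS keeps the process alive while audio plays
  • Android: a foreground service (type mediaPlayback) is required to keep playing while the app is in the background or the screen is locked; its ongoing notification is mandatory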
  • Both: when audio stops, the OS may suspend or kill the process within seconds
  • Resume on Bluetooth headset reconnect, headphone re-plug, alarm finishing

Audio interruption handling

Phone call comes in, alarm fires, navigation announcement plays. The OS issues an interruption event. Your app should:

  • Pause cleanly on interruption begin
  • Resume on interruption end if the interruption was brief and the user was actively listening
  • Save current position regardless
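
The three rules above can be sketched as a small platform-neutral state machine. This is a minimal sketch, not a platform API: `PlayerState`, `MAX_AUTO_RESUME_GAP_S`, and the `os_allows_resume` flag are hypothetical names. The flag stands in for whatever hint the OS gives when the interruption ends (for example, a should-resume option on iOS or regained audio focus on Android), and the 120-second cutoff for "brief" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed cutoff: interruptions longer than this do not auto-resume.
MAX_AUTO_RESUME_GAP_S = 120.0


@dataclass
class PlayerState:
    position_s: float = 0.0
    is_playing: bool = False
    saved_position_s: Optional[float] = None
    interrupted_at: Optional[float] = None  # clock reading at interruption begin
    was_playing_before: bool = False


def on_interruption_begin(state: PlayerState, now: float) -> None:
    """Pause, remember whether the user was listening, and save the position."""
    state.was_playing_before = state.is_playing
    state.is_playing = False
    state.interrupted_at = now
    state.saved_position_s = state.position_s  # saved regardless of what follows


def on_interruption_end(state: PlayerState, now: float, os_allows_resume: bool) -> bool:
    """Resume only after a brief interruption of active playback. Returns True if resumed."""
    if state.interrupted_at is None:
        return False
    gap = now - state.interrupted_at
    state.interrupted_at = None
    should_resume = (
        state.was_playing_before
        and os_allows_resume
        and gap <= MAX_AUTO_RESUME_GAP_S
    )
    state.is_playing = should_resume
    return should_resume
```

Keeping the policy in one pure function makes it easy to unit-test independently of the platform callbacks that feed it.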

Position sync — the senior signal

Cross-device sync is the killer feature. State to sync:

  • Current book, chapter, and time offset (prefer time over a byte offset, which changes with bitrate and encoding)
  • Last-played timestamp (for conflict resolution)
  • Listening speed, sleep timer, bookmark list

Sync strategy:

  • Optimistic local update; periodic upload (every 30s of playback)
  • On app foreground: pull latest from server; if remote is newer than local, ask user “Continue from device X at 2:34:15?”
  • Conflict resolution: last-write-wins is fine for a single-user product, provided timestamps are comparable across devices (server-assigned or skew-corrected)
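
As an illustration of the strategy above, here is a minimal sketch of a last-write-wins merge and a 30-second upload throttle. `PositionRecord`, `merge_last_write_wins`, and `UploadThrottle` are hypothetical names, and the sketch assumes `updated_at_ms` values are comparable across devices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PositionRecord:
    book_id: str
    chapter: int
    offset_s: float
    updated_at_ms: int  # assumed comparable across devices (e.g. server-assigned)
    device_id: str


def merge_last_write_wins(local: PositionRecord, remote: PositionRecord) -> PositionRecord:
    """Return the newer record; ties are broken by device_id so the result is deterministic."""
    if local.book_id != remote.book_id:
        raise ValueError("records refer to different books")
    return max(local, remote, key=lambda r: (r.updated_at_ms, r.device_id))


class UploadThrottle:
    """Accumulates played seconds and signals when a periodic upload is due."""

    def __init__(self, interval_s: float = 30.0) -> None:
        self.interval_s = interval_s
        self._played_since_upload = 0.0

    def on_playback_tick(self, played_s: float) -> bool:
        """Call with the seconds played since the last tick; True means upload now."""
        self._played_since_upload += played_s
        if self._played_since_upload >= self.interval_s:
            self._played_since_upload = 0.0
            return True
        return False
```

Counting seconds of playback rather than wall-clock time means a paused player does not generate upload traffic.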

Downloads

  • Per-chapter downloads, not whole-book — shorter chunks resume better
  • Background download via OS background-transfer APIs (a background URLSession configuration on iOS, WorkManager on Android)
  • Automatic download of next chapter on Wi-Fi, configurable
  • Storage management: show usage and let the user delete downloaded books
  • Encrypted at rest if the content is licensed (DRM)
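
Two of the download rules above lend themselves to a short sketch: choosing the next chapter to fetch automatically, and resuming a partial chapter with an HTTP `Range` request. The function names are hypothetical, chapters are assumed to be 0-indexed, and resuming this way assumes the server honors range requests.

```python
from typing import Dict, Optional, Set


def next_chapter_to_download(
    current: int,
    total_chapters: int,
    downloaded: Set[int],
    on_wifi: bool,
    auto_download: bool = True,
) -> Optional[int]:
    """Return the index of the next chapter to fetch, or None if nothing should be fetched."""
    if not auto_download or not on_wifi:
        return None
    nxt = current + 1
    if nxt >= total_chapters or nxt in downloaded:
        return None
    return nxt


def resume_headers(bytes_on_disk: int) -> Dict[str, str]:
    """Headers that ask for the remainder of a partially downloaded chapter."""
    if bytes_on_disk <= 0:
        return {}  # nothing on disk yet: plain full request
    return {"Range": f"bytes={bytes_on_disk}-"}
```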

Sleep timer

  • Fixed durations (15, 30, 45, 60 min) plus “end of chapter”
  • Fade the audio out smoothly over the last 5–10 seconds
  • Timer state persists if the user backgrounds the app and returns
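
A minimal sketch of a fixed-duration timer with a fade-out follows. `SleepTimer` is a hypothetical name, the 8-second linear fade is an assumed value within the 5–10 second range, and the timer stores an absolute deadline so that it can be persisted and survive backgrounding. The "end of chapter" mode is not covered here.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class SleepTimer:
    deadline_epoch_s: float  # absolute deadline; persist this value to survive backgrounding
    fade_s: float = 8.0      # assumed fade length

    @classmethod
    def start(cls, minutes: float, now: Optional[float] = None) -> "SleepTimer":
        now = time.time() if now is None else now
        return cls(deadline_epoch_s=now + minutes * 60.0)

    def remaining(self, now: Optional[float] = None) -> float:
        now = time.time() if now is None else now
        return max(0.0, self.deadline_epoch_s - now)

    def volume(self, now: Optional[float] = None) -> float:
        """Volume multiplier in [0, 1]: full until the final fade_s seconds, then a linear ramp to 0."""
        remaining = self.remaining(now)
        if remaining >= self.fade_s:
            return 1.0
        return remaining / self.fade_s
```

The player would poll `volume()` a few times per second near the deadline and pause once it reaches zero.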

Speed control

  • 0.5× to 3.0× typical
  • Pitch correction (preserves voice naturalness): AVPlayerItem's audioTimePitchAlgorithm or AVAudioUnitTimePitch in AVAudioEngine on iOS; ExoPlayer's PlaybackParameters on Android
  • Per-book preference saved (some users speed up only certain narrators)
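
The speed range and per-book preference can be sketched as below. `SpeedPreferences` and `wall_clock_remaining_s` are hypothetical names; the clamp uses the 0.5× to 3.0× range quoted above, and a real app would persist the mapping rather than keep it in memory.

```python
from typing import Dict

MIN_SPEED, MAX_SPEED = 0.5, 3.0


def clamp_speed(speed: float) -> float:
    """Keep a requested speed inside the supported range."""
    return max(MIN_SPEED, min(MAX_SPEED, speed))


class SpeedPreferences:
    """Per-book playback speed with a global default (in-memory sketch)."""

    def __init__(self, default: float = 1.0) -> None:
        self._default = clamp_speed(default)
        self._per_book: Dict[str, float] = {}

    def set(self, book_id: str, speed: float) -> None:
        self._per_book[book_id] = clamp_speed(speed)

    def get(self, book_id: str) -> float:
        return self._per_book.get(book_id, self._default)


def wall_clock_remaining_s(remaining_audio_s: float, speed: float) -> float:
    """Real time left at the given speed, e.g. for a 'time left' label."""
    return remaining_audio_s / clamp_speed(speed)
```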

Bookmarks and notes

  • Audio bookmark = (book ID, time offset, optional text note)
  • Sync alongside position
  • Export to user’s notes app or share by URL
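
The bookmark tuple and share-by-URL idea might look like the following sketch. The `https://example.com/listen` base and the `book`, `t`, and `note` query parameters are placeholders invented for illustration, not a real service's URL scheme.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse


@dataclass(frozen=True)
class Bookmark:
    book_id: str
    offset_s: int               # time offset from the start of the book, in seconds
    note: Optional[str] = None  # optional text note


def share_url(bookmark: Bookmark, base: str = "https://example.com/listen") -> str:
    """Encode a bookmark as a shareable URL (hypothetical scheme)."""
    params = {"book": bookmark.book_id, "t": str(bookmark.offset_s)}
    if bookmark.note:
        params["note"] = bookmark.note
    return f"{base}?{urlencode(params)}"


def parse_share_url(url: str) -> Bookmark:
    """Inverse of share_url: rebuild the bookmark from the query string."""
    query = parse_qs(urlparse(url).query)
    note = query.get("note", [None])[0]
    return Bookmark(query["book"][0], int(query["t"][0]), note)
```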

CarPlay and Android Auto

  • iOS: CarPlay audio app entitlement, CarPlay framework UI templates, and MPRemoteCommandCenter for transport controls
  • Android Auto: Media browser + MediaSession; templates for Now Playing and library
  • Voice commands (“Play the next chapter”) via Siri/Assistant intent extensions
  • Constrained UI — no scroll lists past N items, no inputs requiring keyboard

Whispersync-style “last position from any device”

Audible’s Whispersync is the gold standard. Pattern:

  • On each playback start, fetch latest server position
  • If server position is newer than local last-known and meaningfully different, prompt the user to jump
  • Otherwise resume locally
  • On every position upload, server stamps the device that made the update
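
The resume decision above can be sketched as a pure function. This is an illustration of the pattern, not Audible's implementation: `Position`, `decide_resume`, and the 30-second threshold for "meaningfully different" are assumptions, and offsets are taken to be seconds from the start of the book.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ResumeAction(Enum):
    RESUME_LOCAL = "resume_local"
    PROMPT_JUMP = "prompt_jump"


@dataclass(frozen=True)
class Position:
    chapter: int
    offset_s: float     # seconds from the start of the book (assumed)
    updated_at_ms: int
    device_id: str      # stamped by the server on upload


MIN_DIFF_S = 30.0  # assumed threshold for "meaningfully different"


def decide_resume(local: Position, server: Optional[Position],
                  min_diff_s: float = MIN_DIFF_S) -> ResumeAction:
    """Prompt only when the server position is newer and far enough from the local one."""
    if server is None or server.updated_at_ms <= local.updated_at_ms:
        return ResumeAction.RESUME_LOCAL
    differs = (server.chapter != local.chapter
               or abs(server.offset_s - local.offset_s) >= min_diff_s)
    return ResumeAction.PROMPT_JUMP if differs else ResumeAction.RESUME_LOCAL


def prompt_text(server: Position) -> str:
    """Build the user-facing prompt, formatting the offset as H:MM:SS."""
    hours, rem = divmod(int(server.offset_s), 3600)
    minutes, seconds = divmod(rem, 60)
    return f"Continue from {server.device_id} at {hours}:{minutes:02d}:{seconds:02d}?"
```

The threshold avoids nagging the user when two devices differ by only a few seconds.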

Performance

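  • Buffer far enough ahead to sustain 30+ minutes of playback if the network drops; spilling to a disk cache keeps this cheap (for example, 30 minutes at 64 kbps is about 14 MB)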
  • Pre-fetch next chapter when current is 80% complete
  • Memory budget tight; release decoded buffers behind the playback head
  • Battery: avoid CPU-heavy DSP effects; rely on hardware-accelerated decoding
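
Two of the numbers above are easy to make concrete: the 80% pre-fetch trigger and the size of a buffer that covers a given number of minutes. The function names are hypothetical, and the size estimate assumes a constant bitrate.

```python
def should_prefetch_next(position_s: float, chapter_duration_s: float,
                         has_next: bool, already_fetched: bool,
                         threshold: float = 0.8) -> bool:
    """True once playback passes the threshold fraction of the current chapter."""
    if not has_next or already_fetched or chapter_duration_s <= 0:
        return False
    return position_s / chapter_duration_s >= threshold


def buffer_bytes(minutes: float, bitrate_kbps: float) -> int:
    """Bytes needed to hold `minutes` of audio at a constant bitrate (1 kbps = 1000 bits/s)."""
    return int(minutes * 60 * bitrate_kbps * 1000 / 8)
```

For instance, `buffer_bytes(30, 64)` gives 14,400,000 bytes, which is why a disk-backed cache is a more natural home for a 30-minute buffer than decoded audio in memory.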

Edge cases interviewers love

  • Audio session reactivation after the user dismisses then reopens an alarm
  • User puts on AirPods after starting playback on speaker — route audio
  • Headphones unplugged — pause (industry convention)
  • Bluetooth disconnect — pause and surface a clear error
  • App killed while listening — restore on relaunch with a “Continue listening” CTA
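
For the last case, restoring after the app is killed depends on the last position having reached disk intact. A minimal sketch of an atomic save and a tolerant load follows; the JSON layout and function names are assumptions, and a real app would more likely use the platform's preferences store or database.

```python
import json
import os
import tempfile
from typing import Any, Dict, Optional


def save_session(path: str, book_id: str, chapter: int, offset_s: float) -> None:
    """Write the session to a temp file, then rename it over the target so a kill
    mid-write never leaves a half-written file behind."""
    data = {"book_id": book_id, "chapter": chapter, "offset_s": offset_s}
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)  # atomic rename within the same directory
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_session(path: str) -> Optional[Dict[str, Any]]:
    """Return the saved session, or None if it is missing or unreadable."""
    try:
        with open(path) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return None
```

On relaunch, a non-None result is what drives the "Continue listening" CTA.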

Frequently Asked Questions

Audiobook vs podcast — what differs?

Audiobooks have stable position-of-record (you do not skip around as much), longer chapters, and stronger speed-control expectations. Podcasts have RSS-driven catalog updates and ad insertion. UX overlap is large; backend differs.

How do I handle DRM?

FairPlay on iOS, Widevine on Android. Decrypt at the playback boundary; never expose raw audio to the file system. License renewal happens periodically; handle expired-license errors gracefully.

What about variable narrator audio quality?

Encode at multiple bitrates and let the user pick, or auto-select based on Wi-Fi vs cellular. Apply LUFS-based loudness normalization so chapters do not vary in volume.
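
Both ideas reduce to small calculations, sketched here under stated assumptions: the measured loudness of each chapter is taken as given (measuring it is a separate step), the -16 LUFS target is an assumed value rather than a requirement, and the bitrate rule is deliberately simplistic.

```python
from typing import Sequence

TARGET_LUFS = -16.0  # assumed normalization target


def normalization_gain_db(measured_lufs: float, target_lufs: float = TARGET_LUFS) -> float:
    """Gain in dB that moves a chapter from its measured loudness to the target."""
    return target_lufs - measured_lufs


def db_to_linear(gain_db: float) -> float:
    """Convert a dB gain to the linear amplitude multiplier applied to samples."""
    return 10 ** (gain_db / 20)


def pick_bitrate_kbps(available: Sequence[int], on_wifi: bool) -> int:
    """Simplistic auto-select: highest encode on Wi-Fi, lowest on cellular."""
    if not available:
        raise ValueError("no encodes available")
    return max(available) if on_wifi else min(available)
```

A chapter measured at -22 LUFS would get a gain of +6 dB, roughly doubling sample amplitude, so a production pipeline would also need peak limiting to avoid clipping.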
