TikTok’s mobile feed is one of the hardest problems in mobile engineering: vertical full-screen video, instant swipe navigation, autoplay with audio, and sub-100ms perceived latency between videos. The interviewer wants to see whether you understand video decoding pipelines, prefetch budgets, and the tradeoff between data usage and snap-to-the-next-video responsiveness.
Functional requirements
- Vertical scrolling feed of short-form videos (15s–3min)
- Autoplay with sound
- Like, comment, share, follow
- Effects, filters, and the camera (out of scope for this question)
Non-functional requirements
- First video in the feed starts playing within 500ms of a cold start
- After a swipe, the next video starts playing within 100ms
- Smooth 60fps scroll
- Reasonable cellular data usage, with an optional data-saver mode
Architecture
The unit of work is a “video card.” The feed is a scrolling list of cards, but only about three players are ever active (current, next, prev). All other cards are idle and hold no player.
Player pool
Three-player carousel: current, next, prev. As the user scrolls down:
- Current → prev (kept around for swipe-back)
- Next → current (already prepared, plays immediately)
- The player that held the old prev is recycled for the new “next” and starts prefetching (a fresh player is allocated only while the pool is still filling)
Reusing players avoids the cold-start cost of allocating AVPlayer/ExoPlayer instances, which can be tens of milliseconds each.
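The rotation above can be sketched as a small model. This is a minimal illustration, not a real player integration: `Player` is a hypothetical stand-in for an AVPlayer/ExoPlayer wrapper, and `prepare` only records state where a real player would start buffering.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Player:
    """Hypothetical stand-in for a platform player instance."""
    ident: int
    video_id: Optional[str] = None

    def prepare(self, video_id: str) -> None:
        # A real player would begin buffering here; this sketch only records state.
        self.video_id = video_id


class PlayerPool:
    """Three-slot carousel: prev, current, next."""

    def __init__(self, feed: List[str]) -> None:
        self.feed = feed
        self.index = 0
        self.allocations = 0  # counts how many players were ever created
        self.prev: Optional[Player] = None
        self.current = self._allocate(feed[0])
        self.next = self._allocate(feed[1]) if len(feed) > 1 else None

    def _allocate(self, video_id: str) -> Player:
        self.allocations += 1
        player = Player(ident=self.allocations)
        player.prepare(video_id)
        return player

    def swipe_next(self) -> bool:
        """Advance one video; returns False at the end of the feed."""
        if self.next is None:
            return False
        recycled = self.prev  # the old prev is about to fall out of the window
        self.prev, self.current = self.current, self.next
        self.index += 1
        upcoming = self.index + 1
        if upcoming >= len(self.feed):
            self.next = None
        elif recycled is not None:
            recycled.prepare(self.feed[upcoming])  # reuse instead of allocating
            self.next = recycled
        else:
            self.next = self._allocate(self.feed[upcoming])  # pool still filling
        return True


pool = PlayerPool(["a", "b", "c", "d", "e"])
while pool.swipe_next():
    pass
```

After scrolling through the whole five-video feed, `pool.allocations` stays at three. Swipe-back would mirror the same rotation in the other direction and is left out to keep the sketch short.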
Prefetch budget
The next video is prefetched up to ~500KB or the first 2 seconds, whichever comes first. Downloading the entire next video would waste data whenever the user swipes past it. The server returns a manifest with byte-range hints so the client can issue range requests.
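As a rough sketch of the budget rule, the client can turn it into an HTTP Range header. The function name `prefetch_range` is hypothetical, and it assumes the manifest exposes an average bitrate so that "first 2 seconds" can be approximated in bytes; real byte-range hints from the server would be more precise.

```python
def prefetch_range(bitrate_bps: int,
                   max_bytes: int = 500_000,
                   max_seconds: float = 2.0) -> str:
    """Return a Range header value covering the smaller of the two budgets."""
    if bitrate_bps <= 0:
        raise ValueError("bitrate_bps must be positive")
    # Approximate the size of the first max_seconds from the average bitrate.
    bytes_for_duration = int(bitrate_bps / 8 * max_seconds)
    budget = max(1, min(max_bytes, bytes_for_duration))
    return f"bytes=0-{budget - 1}"  # Range end offsets are inclusive
```

At 1 Mbps the 2-second limit wins (`bytes=0-249999`); at 4 Mbps the 500KB cap wins (`bytes=0-499999`).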
HLS or progressive MP4?
Both work. HLS lets you switch quality on the fly but has manifest overhead. Many short-form apps use single-bitrate, fast-start progressive MP4 served from a CDN, because at ~15s clip length the manifest cost outweighs the bitrate-switching benefit.
Data saver
On cellular with data-saver enabled, default to a lower-bitrate rendition (480p), pause prefetch, and disable autoplay until the user explicitly taps.
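One way to keep this rule in a single place is a small policy function that the player and prefetcher both consult. The names below are hypothetical, and the 1080p default for the non-data-saver case is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaybackPolicy:
    max_height: int         # tallest rendition the client will request
    prefetch_enabled: bool
    autoplay: bool


def policy_for(on_cellular: bool, data_saver: bool) -> PlaybackPolicy:
    """Map network conditions to playback behavior."""
    if on_cellular and data_saver:
        # Data saver: 480p, no prefetch, playback starts only on an explicit tap.
        return PlaybackPolicy(max_height=480, prefetch_enabled=False, autoplay=False)
    # Assumed default for Wi-Fi, or cellular without data saver.
    return PlaybackPolicy(max_height=1080, prefetch_enabled=True, autoplay=True)
```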
Caching
Disk cache of up to ~200MB with LRU eviction. Videos already watched are kept longer to support swipe-back. Entries are keyed by CDN URL, so a repeat request for the same URL is served from disk, and the CDN returns appropriate Cache-Control headers.
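A byte-budgeted LRU index might look like the sketch below. It tracks only URLs and sizes; the class name `VideoCacheIndex` is hypothetical, file I/O is left out, and the extra retention for watched videos is not modeled (this is plain LRU).

```python
from collections import OrderedDict
from typing import List, Optional


class VideoCacheIndex:
    """Tracks cached URLs and sizes; the oldest-used entry is evicted first."""

    def __init__(self, capacity_bytes: int = 200 * 1024 * 1024) -> None:
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self._entries: "OrderedDict[str, int]" = OrderedDict()  # url -> size

    def get(self, url: str) -> Optional[int]:
        """Return the cached size and mark the entry as recently used."""
        if url not in self._entries:
            return None
        self._entries.move_to_end(url)
        return self._entries[url]

    def put(self, url: str, size_bytes: int) -> List[str]:
        """Insert an entry and return the URLs evicted to make room."""
        evicted: List[str] = []
        if size_bytes > self.capacity_bytes:
            return evicted  # larger than the whole cache, so do not store it
        if url in self._entries:
            self.used_bytes -= self._entries.pop(url)
        self._entries[url] = size_bytes
        self.used_bytes += size_bytes
        while self.used_bytes > self.capacity_bytes:
            old_url, old_size = self._entries.popitem(last=False)
            self.used_bytes -= old_size
            evicted.append(old_url)  # the caller deletes the file on disk
        return evicted
```

The caller is expected to delete the files for whatever `put` returns, which keeps the index and the disk in step.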
Telemetry
The client logs every video impression, watch duration, and skip event. Events are batched and uploaded every 30 seconds to minimize battery and network use. The recommendation system runs entirely server-side; the client only reports signals.
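The batching window can be sketched as follows. `TelemetryBatcher` and the `upload` callback are hypothetical names, and the clock is passed in as `now` (in seconds) so the logic stays testable; a production client would also flush when the app is backgrounded, which is not shown here.

```python
from typing import Callable, Dict, List, Optional


class TelemetryBatcher:
    """Buffers events and uploads them once the window has elapsed."""

    def __init__(self, upload: Callable[[List[Dict]], None],
                 window_s: float = 30.0) -> None:
        self._upload = upload
        self._window_s = window_s
        self._pending: List[Dict] = []
        self._window_start: Optional[float] = None

    def log(self, event: Dict, now: float) -> None:
        if self._window_start is None:
            self._window_start = now  # first event opens a new window
        self._pending.append(event)
        self.maybe_flush(now)

    def maybe_flush(self, now: float) -> bool:
        """Upload the batch if the window has elapsed; returns True if it did."""
        if not self._pending or now - self._window_start < self._window_s:
            return False
        batch, self._pending = self._pending, []
        self._window_start = None
        self._upload(batch)
        return True
```

In this sketch the window opens on the first buffered event, so an idle client triggers no uploads at all.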
Frequently Asked Questions
Why does TikTok feel so much faster than other video apps?
Aggressive prefetch, dedicated player pool, and tight integration with the recommendation feed (next video is always known and pre-warmed). Most general-purpose video apps cannot do this because they do not control the playlist.
How do you handle network drops mid-video?
Buffer enough of the current video that a 5-second drop is invisible. If the buffer underruns, pause and show a small spinner; resume automatically when bandwidth returns.
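The pause-and-resume behavior reduces to a two-state machine. The sketch below is an illustration with hypothetical names, and the one-second resume threshold is an assumption added to avoid flapping between states when only a few frames have arrived.

```python
from enum import Enum


class PlaybackState(Enum):
    PLAYING = "playing"
    STALLED = "stalled"  # paused, spinner visible


def on_buffer_update(state: PlaybackState, buffered_s: float,
                     resume_threshold_s: float = 1.0) -> PlaybackState:
    """Return the next state given the seconds of media currently buffered."""
    if state is PlaybackState.PLAYING and buffered_s <= 0.0:
        return PlaybackState.STALLED  # underrun: pause and show the spinner
    if state is PlaybackState.STALLED and buffered_s >= resume_threshold_s:
        return PlaybackState.PLAYING  # enough data again: resume automatically
    return state
```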
What is the right video format for short-form mobile?
H.264 progressive MP4 with moov atom at the front (fast-start). H.265/HEVC for newer devices to save ~30% bandwidth, with H.264 fallback for compatibility.
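To check that a file really is fast-start, one option is to walk its top-level MP4 boxes and confirm that `moov` appears before `mdat`. The sketch below assumes the whole file (or at least its leading boxes) is already in memory; the function names are hypothetical.

```python
import struct
from typing import Iterator, Tuple


def top_level_boxes(data: bytes) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, offset, size) for each top-level box in an MP4 buffer."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:  # a 64-bit size follows the type field
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:  # box extends to the end of the file
            size = len(data) - offset
        if size < header:
            break  # malformed box; stop rather than loop forever
        yield box_type.decode("ascii", errors="replace"), offset, size
        offset += size


def is_fast_start(data: bytes) -> bool:
    """True if the moov box comes before the first mdat box."""
    for box_type, _, _ in top_level_boxes(data):
        if box_type == "moov":
            return True
        if box_type == "mdat":
            return False
    return False
```

A check like this fits naturally in the upload or transcode pipeline, so files with `moov` at the end are caught before they reach the CDN.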