System Design: Design Spotify — Music Streaming, Audio Encoding, Playlist, Recommendation, Offline Mode, CDN

Spotify serves 600+ million users across 180+ markets, streaming over 100 million tracks with sub-second start times. Designing a music streaming platform is a popular system design interview question at senior levels, and it tests your understanding of audio encoding, content delivery, recommendation systems, and offline synchronization. This guide covers the architecture from audio ingestion to playback.

Audio Encoding and Storage

When a label uploads a track, the ingestion pipeline encodes it into multiple quality levels: Low (24 kbps AAC) for very slow connections, Normal (96 kbps OGG Vorbis), High (160 kbps OGG Vorbis), Very High (320 kbps OGG Vorbis), and Lossless (FLAC, ~1000 kbps) for premium subscribers. Each quality level is stored as a separate file in object storage (GCS). A 4-minute song at 320 kbps is about 10 MB (240 s × 320 kbps / 8 = 9.6 MB). Summed over all five quality levels, the same song takes about 48 MB, so 100 million tracks of that length need roughly 4.8 PB of audio storage before replication.

Audio is streamed in pieces: instead of downloading the entire file, the player requests byte ranges. The first few seconds are pre-fetched for instant playback, and the player buffers 30-60 seconds ahead. If the connection drops, buffered audio continues playing while the app attempts to reconnect.

DRM (Digital Rights Management): audio files are encrypted. The player obtains a decryption key from the license server on playback. Keys are session-scoped and device-bound. This prevents unauthorized redistribution while allowing seamless playback for paying subscribers.
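The storage estimate and the byte-range requests described above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes every track is exactly 4 minutes long and encoded at a constant bitrate (real Vorbis and FLAC streams are variable, so the second-to-byte mapping is only approximate). The function names are illustrative, not part of any Spotify API.

```python
# Bitrates (kbps) for the five quality levels listed in the text.
BITRATES_KBPS = {
    "low": 24, "normal": 96, "high": 160, "very_high": 320, "lossless": 1000,
}


def track_size_mb(bitrate_kbps: int, duration_s: int) -> float:
    """Approximate file size in MB (10^6 bytes), assuming constant bitrate."""
    return bitrate_kbps * 1000 * duration_s / 8 / 1_000_000


def catalog_size_pb(num_tracks: int, duration_s: int) -> float:
    """Total storage in PB (10^15 bytes) across all levels, before replication."""
    per_track_mb = sum(track_size_mb(b, duration_s) for b in BITRATES_KBPS.values())
    return per_track_mb * num_tracks / 1_000_000_000  # 1 PB = 10^9 MB


def range_header(bitrate_kbps: int, start_s: int, length_s: int) -> str:
    """HTTP Range header value covering `length_s` seconds from `start_s`."""
    bytes_per_s = bitrate_kbps * 1000 // 8
    first = start_s * bytes_per_s
    last = (start_s + length_s) * bytes_per_s - 1  # Range end is inclusive
    return f"bytes={first}-{last}"


if __name__ == "__main__":
    print(track_size_mb(320, 240))                       # 9.6
    print(round(catalog_size_pb(100_000_000, 240), 2))   # 4.8
    print(range_header(96, 0, 10))                       # bytes=0-119999
```

The last line shows the request a player might issue to pre-fetch the first 10 seconds of a Normal-quality track: 96 kbps is 12,000 bytes per second, so the range covers the first 120,000 bytes.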

Content Delivery for Audio

Audio CDN strategy: Spotify uses a CDN with edge servers in major cities. Popular tracks (top 1% of the catalog) are cached at edge servers. These represent 80%+ of all streams (an extreme Pareto distribution in music). The long tail (millions of tracks rarely played) is served from regional caches or origin.

Pre-caching: when a user opens the app, Spotify predicts which tracks they are likely to play next (based on current playlist, listening history, time of day) and pre-fetches the first 30 seconds of those tracks. By the time the user presses play, the audio is already cached locally.

Crossfade and gapless playback: the player starts downloading the next track before the current one finishes. For gapless albums, the transition is seamless. For crossfade mode, both tracks overlap for a configurable duration.

Bandwidth adaptation: similar to video ABR, the player monitors download speed. If bandwidth drops, it switches to a lower quality level for subsequent segments. The quality setting (Normal/High/Very High) sets the maximum; the player may use a lower quality to prevent buffering.
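The bandwidth adaptation rule can be sketched as a small selection function: pick the highest quality level that fits the measured throughput without exceeding the user's setting. The 0.8 safety factor and the function name are assumptions for illustration; the bitrates are the ones listed in the encoding section.

```python
# Quality ladder in ascending order (kbps values from the encoding section).
QUALITY_KBPS = [("low", 24), ("normal", 96), ("high", 160), ("very_high", 320)]


def pick_quality(measured_kbps: float, max_quality: str, safety: float = 0.8) -> str:
    """Return the best level that fits within the bandwidth budget.

    `max_quality` is the user's setting and acts as a ceiling.
    `safety` (an assumed value) leaves headroom so the buffer can still grow.
    """
    names = [name for name, _ in QUALITY_KBPS]
    ceiling = names.index(max_quality)      # raises ValueError on unknown names
    budget = measured_kbps * safety
    chosen = names[0]                       # always fall back to the lowest level
    for name, kbps in QUALITY_KBPS[: ceiling + 1]:
        if kbps <= budget:
            chosen = name
    return chosen


if __name__ == "__main__":
    print(pick_quality(1000, "high"))       # high (capped by the user setting)
    print(pick_quality(150, "very_high"))   # normal (budget is 120 kbps)
    print(pick_quality(10, "high"))         # low (nothing fits, use the floor)
```

A real player would also smooth the throughput measurement and avoid switching too often, which this sketch leaves out.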

Music Recommendation

Spotify recommendations power Discover Weekly, Release Radar, Daily Mixes, and the home page. Three approaches are combined:

(1) Collaborative filtering: users with similar listening histories receive similar recommendations. If User A and User B both listen to artists X, Y, Z, and User B also listens to W, recommend W to User A. Spotify uses matrix factorization on the user-track interaction matrix (billions of interactions).

(2) Content-based: analyze audio features (tempo, key, energy, danceability, acousticness) extracted by ML models from the raw audio waveform. Recommend tracks with audio features similar to those of tracks the user enjoys. This helps recommend new or unknown tracks that have no collaborative filtering data yet (the cold start problem).

(3) Natural language processing: analyze playlists, reviews, blog posts, and social media to understand how tracks are described. “Chill Sunday morning vibes” and similar text descriptions create semantic embeddings that cluster similar tracks.

Discover Weekly: generated every Monday. It is a 30-track playlist of songs the user has not heard but is predicted to enjoy. The pipeline runs offline as a batch job (for example, Spark reading from GCS), combining all three approaches. Engagement metric: skip rate. Tracks with high skip rates are deprioritized in future recommendations.
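The User A / User B example can be made concrete with a tiny neighbourhood-based recommender. Note that this is a simpler technique than the matrix factorization mentioned above: it scores unseen items by the Jaccard similarity of the users who listened to them. It is only a sketch of the collaborative filtering idea, and the data and function names are made up for illustration.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two listening histories, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target: str, histories: dict, top_n: int = 5) -> list:
    """Rank items the target has not heard by the similarity of their listeners."""
    seen = histories[target]
    scores = {}
    for user, items in histories.items():
        if user == target:
            continue
        sim = jaccard(seen, items)
        if sim == 0:
            continue  # users with nothing in common contribute no signal
        for item in items - seen:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


if __name__ == "__main__":
    histories = {
        "A": {"X", "Y", "Z"},
        "B": {"X", "Y", "Z", "W"},
        "C": {"Q"},
    }
    print(recommend("A", histories))  # ['W']
```

User C shares nothing with User A, so Q is never suggested. At catalog scale this pairwise comparison is too expensive, which is one reason factorized embeddings are used instead.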

Offline Mode

Premium users can download tracks for offline playback. Architecture: the user selects a playlist or album for download. The app downloads all tracks at the user's preferred quality, encrypted with a device-specific key. Downloaded files are stored in the app's local storage with metadata (track info, album art, playlist order).

Offline license: the app must verify the user's subscription status at least every 30 days. If the subscription expires or the device does not connect to the internet within 30 days, offline playback is disabled. The license check uses a signed token with an expiration date.

Storage management: display the total space used by downloads. Allow quality selection for downloads (lower quality = less storage). Auto-remove tracks not played in 30+ days if storage is low.

Sync: when the user comes back online, the app uploads play counts, skip events, and listening history to the server. This data feeds the recommendation engine. Playlists modified offline (adding/removing tracks) are merged with server-side changes using last-write-wins per track, with conflict detection for simultaneous modifications.
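The offline license check can be illustrated with a signed, expiring token. This sketch uses an HMAC from the Python standard library to keep it self-contained. That is an assumption made for brevity: a production client would more plausibly verify an asymmetric signature so the device never holds a key that can mint tokens. The token layout and function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json

OFFLINE_WINDOW_S = 30 * 24 * 3600  # the 30-day window from the text


def issue_license(secret: bytes, user_id: str, device_id: str, now: float) -> str:
    """Server side: sign the claims and return 'payload.signature'."""
    claims = {"user": user_id, "device": device_id, "exp": now + OFFLINE_WINDOW_S}
    payload = json.dumps(claims, sort_keys=True).encode()
    sig = hmac.new(secret, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode() + "."
            + base64.urlsafe_b64encode(sig).decode())


def can_play_offline(secret: bytes, token: str, device_id: str, now: float) -> bool:
    """Client side: accept only an untampered, unexpired token for this device."""
    try:
        payload_b64, sig_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except ValueError:  # wrong number of parts or invalid base64
        return False
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        return False
    claims = json.loads(payload)
    return claims["device"] == device_id and now < claims["exp"]


if __name__ == "__main__":
    day = 24 * 3600
    token = issue_license(b"demo-secret", "user-1", "phone-1", now=0.0)
    print(can_play_offline(b"demo-secret", token, "phone-1", now=29 * day))   # True
    print(can_play_offline(b"demo-secret", token, "phone-1", now=31 * day))   # False
    print(can_play_offline(b"demo-secret", token, "tablet-2", now=1 * day))   # False
```

Each successful online check would issue a fresh token, which pushes the expiry forward by another 30 days.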

Playlist and Social Features

Playlists are the primary content organization unit. Data model: playlist_id, owner_id, name, description, cover_image, track_list (ordered list of track_ids), follower_count, collaborative (boolean), created_at, updated_at.

Collaborative playlists: multiple users can add/remove tracks. Last-write-wins for simple operations (add/remove). The track list is an ordered list, so concurrent reordering uses OT-like conflict resolution (simpler than text editing because the operations are coarser: add at position, remove at position).

Social: users follow other users and artists. The activity feed shows what friends are listening to (opt-in privacy). Social listening: shared queues and group sessions where multiple users control the same playback (using a real-time sync protocol similar to collaborative editing).

Artist pages: display discography, popular tracks (ranked by stream count), monthly listeners, and related artists. The artist page is a read-heavy page served from a denormalized cache, updated daily with new stream counts and monthly listener calculations from the analytics pipeline.
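The OT-like resolution for the ordered track list can be sketched for the two coarse operations named above, add at position and remove at position. The transform below rewrites one operation so it can be applied after a concurrent one, and both replicas end up with the same list. It handles a single pair of concurrent operations only, and the `site` tie-breaker is an assumed client identifier, so treat it as an illustration rather than a complete protocol.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Op:
    kind: str            # "add" or "remove"
    pos: int             # index into track_list
    track_id: str = ""   # only used by "add"
    site: str = ""       # hypothetical client id, breaks ties between adds


def apply(tracks: list, op: Optional[Op]) -> list:
    """Return a new track list with `op` applied (None means no-op)."""
    out = list(tracks)
    if op is None:
        return out
    if op.kind == "add":
        out.insert(op.pos, op.track_id)
    else:
        del out[op.pos]
    return out


def transform(op: Op, against: Op) -> Optional[Op]:
    """Adjust `op` so it can run after the concurrent operation `against`."""
    if against.kind == "add":
        if op.kind == "add":
            # Same position: the lower site id keeps the earlier slot.
            shift = against.pos < op.pos or (
                against.pos == op.pos and against.site < op.site)
        else:
            shift = against.pos <= op.pos
        return replace(op, pos=op.pos + 1) if shift else op
    if op.kind == "remove" and op.pos == against.pos:
        return None  # both users removed the same track
    if against.pos < op.pos:
        return replace(op, pos=op.pos - 1)
    return op


if __name__ == "__main__":
    base = ["a", "b", "c"]
    add_x = Op("add", 1, "x", site="s1")
    rem_c = Op("remove", 2, site="s2")
    left = apply(apply(base, add_x), transform(rem_c, add_x))
    right = apply(apply(base, rem_c), transform(add_x, rem_c))
    print(left, right)  # ['a', 'x', 'b'] ['a', 'x', 'b']
```

Whichever operation a replica sees first, applying the transformed version of the other yields the same final order, which is the convergence property the playlist service needs.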
