System Design: Instagram/Photo Sharing — Image Upload, News Feed, Stories, Explore Page, CDN, Image Processing

Designing a photo-sharing platform like Instagram tests your ability to handle media-heavy workloads, social graph queries, feed generation, and content discovery. Instagram processes over 100 million photo uploads per day and serves billions of feed requests. This guide covers the end-to-end architecture from image upload to feed delivery, with the depth expected in senior engineering interviews.

Image Upload Pipeline

Upload flow:
(1) The client requests a presigned S3 upload URL from the backend. The backend generates the URL with constraints (max file size 10MB; allowed content types: JPEG, PNG, HEIC).
(2) The client uploads the image directly to S3 using the presigned URL. This bypasses the application server — no bandwidth or CPU is consumed on the backend for the raw upload.
(3) An S3 event notification triggers an image processing Lambda function.
(4) The Lambda function validates the image (checks for corruption, runs content moderation via AWS Rekognition or a custom ML model), generates multiple resized versions (150×150 thumbnail, 640×640 feed, 1080×1080 full), converts to WebP/AVIF for modern browsers while keeping JPEG as a fallback, strips EXIF data (privacy: remove GPS location unless the user explicitly adds a location), and stores all versions in S3 with a deterministic key pattern: images/{user_id}/{post_id}/{size}.webp.
(5) After processing completes, the Lambda publishes a "post ready" event to Kafka.
(6) The post service consumes the event, creates the post record in the database (post_id, user_id, image_urls, caption, location, created_at), and triggers feed fanout.
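The presigned-URL step can be sketched as follows. This is a minimal illustration using a plain HMAC signature rather than AWS Signature V4; the bucket name, storage domain, and signing key are placeholders:

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

SECRET = b"demo-signing-key"  # placeholder; real presigned URLs use AWS SigV4

def presign(bucket, key, expires_in=300, now=None):
    """Sketch of a presigned upload URL: embed an expiry and an HMAC
    signature over (key, expiry) in the URL, so the storage service can
    verify the backend authorized this exact upload without a lookup."""
    expires = int((now if now is not None else time.time()) + expires_in)
    payload = f"{key}:{expires}".encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    query = urlencode({"expires": expires, "signature": sig})
    return f"https://{bucket}.example-storage.com/{key}?{query}"

url = presign("photo-uploads", "images/42/1001/original.jpg", now=1_700_000_000)
```

The storage side recomputes the HMAC and rejects the PUT if the signature does not match or the expiry has passed, which is why the backend never needs to see the image bytes.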

News Feed Generation

Instagram uses a hybrid fanout approach similar to Twitter. When a user with fewer than 10,000 followers posts, the fanout service pushes the post_id to each follower's timeline cache (a Redis sorted set, score = timestamp). When a celebrity (more than 10,000 followers) posts, fanout is skipped; their posts are fetched at read time and merged with the pre-computed timeline.
Feed loading: the client requests GET /feed?cursor=last_post_id. The backend reads the pre-computed timeline from Redis (ZREVRANGEBYSCORE for the next page of post_ids), fetches celebrity posts from the celebrity post cache, merges and ranks them with the ML ranking model, hydrates the post_ids into full post objects (image URLs, captions, like counts, author info, fetched from cache or database), and returns the hydrated feed.
Pagination is cursor-based using the post_id (which is time-sorted via Snowflake). The client sends the last_post_id from the previous page, and the backend returns posts older than that ID. This is stable under concurrent inserts (unlike offset pagination).
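The read-time merge of the pre-computed timeline with celebrity posts, plus cursor pagination, can be sketched in memory. The function name and page size are illustrative; in production the two inputs would come from Redis and the celebrity post cache:

```python
import heapq

def feed_page(timeline, celeb_posts, cursor=None, page_size=3):
    """Merge the precomputed timeline with celebrity posts fetched at
    read time. post_ids are Snowflake-style (time-ordered), so sorting
    by id descending sorts by recency; the cursor is the last post_id
    of the previous page."""
    merged = heapq.merge(sorted(timeline, reverse=True),
                         sorted(celeb_posts, reverse=True),
                         reverse=True)
    page = []
    for post_id in merged:
        if cursor is not None and post_id >= cursor:
            continue  # skip everything at or above the cursor
        page.append(post_id)
        if len(page) == page_size:
            break
    return page
```

Because the cursor is a value, not an offset, a new post arriving mid-scroll shifts nothing: the next page is still "everything strictly older than the last id I saw."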

Stories Architecture

Stories are ephemeral content that disappears after 24 hours. Architecture differences from the feed:
(1) TTL-based storage — stories are stored with a 24-hour TTL. After expiration they are deleted from the active store (and moved to an archive if the user has "Highlights" enabled).
(2) Stories tray — the horizontal list of story circles at the top of the feed. This is a separate data structure: for each user the viewer follows, check whether they have active stories (posted within 24 hours). Sort unseen stories first, then by recency. Pre-compute the stories tray per user and cache it; invalidate when a followed user posts a new story or a story expires.
(3) Viewing order — within one user's stories, show them in chronological order (oldest first). Between users, show unseen stories first, then stories from users the viewer engages with most.
(4) View tracking — when a user views a story, record the view (viewer_id, story_id, timestamp). The story creator sees the view count and viewer list. This generates massive write volume (a celebrity story with 10M views means 10M write operations). Batch the writes and use a counter service (Redis INCR for the real-time count, Kafka plus batch writes for the view list).
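The batched view-tracking idea can be sketched with in-memory stand-ins for Redis, Kafka, and the durable store (class and field names are illustrative):

```python
from collections import defaultdict

class ViewTracker:
    """Sketch of batched story-view tracking: bump a real-time counter
    per story immediately (Redis INCR in production), buffer the
    (viewer_id, story_id, ts) rows, and flush them to durable storage
    in batches to absorb celebrity-scale write spikes."""

    def __init__(self, flush_size=1000):
        self.counts = defaultdict(int)   # stands in for Redis counters
        self.buffer = []                 # stands in for a Kafka topic
        self.flushed = []                # stands in for the durable view list
        self.flush_size = flush_size

    def record_view(self, viewer_id, story_id, ts):
        self.counts[story_id] += 1                     # real-time count
        self.buffer.append((viewer_id, story_id, ts))  # queued for batch write
        if len(self.buffer) >= self.flush_size:
            self.flush()

    def flush(self):
        self.flushed.extend(self.buffer)  # one batch write instead of N writes
        self.buffer.clear()
```

The counter is always current for the "N views" badge, while the viewer list tolerates a short delay: the trade that makes 10M views per story affordable.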

Explore Page and Content Discovery

The Explore page shows personalized content from accounts the user does not follow. This is a recommendation system. Architecture:
(1) Candidate generation — generate a pool of thousands of candidate posts from: posts liked by users similar to the viewer (collaborative filtering), posts popular in the viewer's geographic region, and posts with high engagement rates in topics the viewer has interacted with. Use an embedding model to represent users and posts in the same vector space, then retrieve posts whose embeddings are close to the user embedding (approximate nearest neighbor search using FAISS or Pinecone).
(2) Ranking — an ML model scores each candidate for the specific user. Features: post engagement rate, author-viewer affinity, content type preference, and recency. The model predicts the probability of engagement (like, comment, save, share).
(3) Filtering — remove posts from blocked users, posts violating community guidelines (content moderation), and posts the user has already seen.
(4) Diversification — ensure the Explore page shows varied content (not all food photos, even if the user likes food) by injecting posts from different categories.
The Explore page is computationally expensive (ML inference per user), so pre-compute candidate pools and cache rankings with a 15-30 minute refresh cycle.
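The embedding-retrieval step in candidate generation can be illustrated with an exact cosine-similarity scan; a real system swaps this linear scan for an ANN index such as FAISS once the corpus grows. All names here are illustrative:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def candidates(user_vec, post_vecs, k=2):
    """Return the k post ids whose embeddings are closest (by cosine
    similarity) to the user embedding -- exact nearest neighbors."""
    ranked = sorted(post_vecs,
                    key=lambda pid: cosine(user_vec, post_vecs[pid]),
                    reverse=True)
    return ranked[:k]
```

The retrieved ids then flow into the ranking model; retrieval only needs to be roughly right, which is why approximate indexes are acceptable here.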

CDN and Image Serving

Images are served via CDN (CloudFront, Fastly, or Akamai). When a client requests an image, the CDN edge server checks its cache. On a cache hit (90%+ of requests for popular content), it returns immediately from the edge with sub-10ms latency. On a cache miss, the CDN fetches from the S3 origin, caches the image, and returns it to the client.
Image URL format: cdn.instagram.com/images/{user_id}/{post_id}/640.webp. The URL encodes the size, allowing the client to request the appropriate size for the device (150px thumbnail for the grid, 640px for the feed, 1080px for full screen).
Format negotiation: the CDN or a Cloudflare Worker checks the Accept header. If the browser supports AVIF, serve AVIF (50% smaller than JPEG). If WebP, serve WebP (25% smaller). Otherwise, serve JPEG. A Vary: Accept header ensures correct per-format caching.
Bandwidth savings: Instagram serves approximately 1 billion images per hour. WebP/AVIF saves 25-50% of bandwidth compared to JPEG, saving petabytes of transfer per day and reducing page load times for users on slow connections.
Cache invalidation: images are immutable (a new post gets a new URL). For deleted posts, the origin object is removed, and the CDN serves 404 after its cache TTL expires.
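The format-negotiation logic at the edge reduces to a preference check on the Accept header. A minimal sketch (the function name is made up, and real negotiation may also weigh q-values):

```python
def pick_format(accept_header):
    """Serve the smallest image format the client advertises support
    for, falling back to universally supported JPEG."""
    accept = accept_header.lower()
    if "image/avif" in accept:
        return "avif"   # roughly 50% smaller than JPEG
    if "image/webp" in accept:
        return "webp"   # roughly 25% smaller than JPEG
    return "jpeg"       # universal fallback
```

Because the response body now depends on the Accept header, the edge must cache per format (Vary: Accept), otherwise an AVIF response cached for one client could be served to a browser that cannot decode it.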

Social Graph and Interactions

The social graph stores follow relationships. Query patterns: "who does user A follow?" (following list), "who follows user A?" (follower list), and "does user A follow user B?" (relationship check). Storage: a wide-column store (Cassandra) or a graph data layer (TAO at Meta). Partition by both follower_id and followee_id to serve bi-directional queries. Cache hot relationships in Redis.
Likes: each post has a like count and a set of users who liked it. Like count: Redis INCR for real-time updates, periodically flushed to the database. Like check ("did I like this post?"): a Redis SET per post containing the user_ids who liked it. For posts with millions of likes, use a Bloom filter for the "did I like it?" check and store the full list in Cassandra.
Comments: stored per post in a database, paginated by timestamp. Cache the first N comments per post (those displayed in the feed).
Notifications: likes, comments, follows, and mentions generate notifications. The notification service consumes events from Kafka and delivers them via push notification (APNs/FCM) and the in-app notification feed.
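A minimal Bloom filter for the "did I like it?" check might look like this. The bit-array size and hash count are illustrative; a production filter would be sized for the expected number of likers and target false-positive rate:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: no false negatives, small tunable
    false-positive rate, far cheaper than keeping millions of user_ids
    per post in hot cache."""

    def __init__(self, size_bits=1 << 16, hashes=3):
        self.size = size_bits
        self.hashes = hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive k independent bit positions by salting the hash input.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))
```

A negative answer is definitive (the heart stays unfilled with no further lookup); a positive answer is only probable, so it can be confirmed against the full list in Cassandra when precision matters.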
