Designing a photo-sharing platform like Instagram tests your ability to handle media-heavy workloads, social graph queries, feed generation, and content discovery. Instagram processes over 100 million photo uploads per day and serves billions of feed requests. This guide covers the end-to-end architecture, from image upload to feed delivery, at the depth expected in senior engineering interviews.
Image Upload Pipeline
Upload flow:

(1) The client requests a presigned S3 upload from the backend. The backend signs the request with constraints (max file size 10MB; allowed content types: JPEG, PNG, HEIC). Enforcing a size limit on the S3 side requires a presigned POST policy with a content-length-range condition; a plain presigned PUT URL cannot express a size range.

(2) The client uploads the image directly to S3 using the presigned request. This bypasses the application server, so the raw upload consumes no backend bandwidth or CPU.

(3) An S3 event notification triggers an image-processing Lambda function.

(4) The Lambda function validates the image (checks for corruption and runs content moderation via AWS Rekognition or a custom ML model), generates multiple resized versions (150×150 thumbnail, 640×640 feed, 1080×1080 full), converts them to WebP/AVIF for modern browsers while keeping JPEG as a fallback, strips EXIF data (for privacy, removing GPS location unless the user explicitly adds a location), and stores all versions in S3 under a deterministic key pattern: images/{user_id}/{post_id}/{size}.webp, with the extension varying by format.

(5) After processing completes, the Lambda publishes a “post ready” event to Kafka.

(6) The post service consumes the event, creates the post record in the database (post_id, user_id, image_urls, caption, location, created_at), and triggers feed fanout.
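The constraint check in step (1) and the deterministic keys in step (4) can be sketched in a few lines. This is a minimal illustration, not Instagram's implementation: the function names, the `PostReadyEvent` payload shape, and the choice to store one object per format (webp, avif, jpg) are assumptions made for the example.

```python
from dataclasses import dataclass

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # 10MB limit from the upload constraints
ALLOWED_CONTENT_TYPES = {"image/jpeg", "image/png", "image/heic"}
RENDITION_SIZES = (150, 640, 1080)  # thumbnail, feed, full
RENDITION_FORMATS = ("webp", "avif", "jpg")  # assumption: one object per format


def validate_upload_request(content_type: str, size_bytes: int) -> None:
    """Reject a request that violates the upload constraints before signing it."""
    if content_type not in ALLOWED_CONTENT_TYPES:
        raise ValueError(f"unsupported content type: {content_type}")
    if not 0 < size_bytes <= MAX_UPLOAD_BYTES:
        raise ValueError(f"file size out of range: {size_bytes}")


def rendition_key(user_id: str, post_id: str, size: int, fmt: str = "webp") -> str:
    """Build the deterministic key images/{user_id}/{post_id}/{size}.{fmt}."""
    return f"images/{user_id}/{post_id}/{size}.{fmt}"


def rendition_keys(user_id: str, post_id: str) -> list[str]:
    """List every key the processing step would write for one post."""
    return [
        rendition_key(user_id, post_id, size, fmt)
        for size in RENDITION_SIZES
        for fmt in RENDITION_FORMATS
    ]


@dataclass(frozen=True)
class PostReadyEvent:
    """Hypothetical payload for the "post ready" event published in step (5)."""
    user_id: str
    post_id: str
    image_keys: tuple[str, ...]


def build_post_ready_event(user_id: str, post_id: str) -> PostReadyEvent:
    return PostReadyEvent(user_id, post_id, tuple(rendition_keys(user_id, post_id)))
```

Because the keys are a pure function of user_id, post_id, size, and format, any service can reconstruct an image location without a lookup, and reprocessing the same upload overwrites the same objects instead of creating duplicates.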
News Feed Generation
Instagram uses a hybrid fanout approach similar to Twitter's. When a user with fewer than 10,000 followers posts, the fanout service pushes the post_id to each follower's timeline cache (a Redis sorted set, score = timestamp). When a celebrity (more than 10,000 followers) posts, fanout is skipped: their posts are fetched at read time and merged with the pre-computed timeline.

Feed loading: the client requests GET /feed?cursor=last_post_id. The backend reads the pre-computed timeline from Redis (ZREVRANGEBYSCORE for the next page of post_ids), fetches celebrity posts from the celebrity post cache, merges the two sets and ranks them with the ML ranking model, hydrates the post_ids into full post objects (image URLs, captions, like counts, and author info, fetched from cache or database), and returns the hydrated feed.

Pagination: cursor-based, using the post_id (which is time-sorted because it is a Snowflake ID). The client sends the last_post_id from the previous page, and the backend returns posts older than that ID. Because the sorted set is scored by timestamp, the backend derives the score bound from the timestamp embedded in the Snowflake ID. This is stable under concurrent inserts, unlike offset pagination, where new posts shift every later offset.
Stories Architecture
Stories are ephemeral content that disappears after 24 hours. Architecture differences from the feed:

(1) TTL-based storage: stories are stored with a 24-hour TTL. After expiration, they are deleted from the active store (or moved to an archive if the user has “Highlights” enabled).

(2) Stories tray: the horizontal list of story circles at the top of the feed. This is a separate data structure: for each user the viewer follows, check whether they have active stories (posted within the last 24 hours). Sort with unseen stories first, then by recency. Pre-compute the stories tray per user and cache it. Invalidate it when a followed user posts a new story or a story expires.

(3) Viewing order: within one user's stories, show them in chronological order (oldest first). Between users, show unseen stories first, then stories from users the viewer engages with most.

(4) View tracking: when a user views a story, record the view (viewer_id, story_id, timestamp). The story creator sees the view count and viewer list. This generates massive write volume (a celebrity story with 10M views = 10M write operations). Batch the writes and use a counter service (Redis INCR for the real-time count, Kafka plus batch writes for the view list).
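The tray ordering in (2) and the viewing order in (3) reduce to a filter and two sorts. The sketch below is a simplified model: the `Story` type and function names are hypothetical, timestamps are Unix seconds, and recency is the only tiebreaker after the unseen-first rule (engagement-based ordering is left out).

```python
from dataclasses import dataclass

STORY_TTL_SECONDS = 24 * 60 * 60  # stories expire after 24 hours


@dataclass(frozen=True)
class Story:
    story_id: str
    author_id: str
    created_at: float  # Unix seconds


def active_stories(stories, now: float):
    """Return stories still inside the 24-hour window, oldest first (viewing order)."""
    live = [s for s in stories if now - s.created_at < STORY_TTL_SECONDS]
    return sorted(live, key=lambda s: s.created_at)


def build_tray(stories_by_author, seen_story_ids, now: float):
    """Order authors for the tray: any unseen story first, then most recent story first."""
    entries = []
    for author_id, stories in stories_by_author.items():
        live = active_stories(stories, now)
        if not live:
            continue  # no active stories, so no circle in the tray
        has_unseen = any(s.story_id not in seen_story_ids for s in live)
        latest = live[-1].created_at
        # False sorts before True, so authors with unseen stories come first.
        entries.append((not has_unseen, -latest, author_id))
    entries.sort()
    return [author_id for _, _, author_id in entries]
```

In a cached deployment this function would run when the tray is invalidated (a followed user posts, or a story expires) rather than on every feed request.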
Explore Page and Content Discovery
The Explore page shows personalized content from accounts the user does not follow. This is a recommendation system. Architecture:

(1) Candidate generation: generate a pool of thousands of candidate posts from posts liked by users similar to the viewer (collaborative filtering), posts popular in the viewer's geographic region, and posts with high engagement rates in topics the viewer has interacted with. Use an embedding model to represent users and posts in the same vector space, then retrieve posts whose embeddings are close to the user embedding (approximate nearest neighbor search using FAISS or Pinecone).

(2) Ranking: an ML model scores each candidate for the specific user. Features: post engagement rate, author-viewer affinity, content type preference, recency. The model predicts the probability of engagement (like, comment, save, share).

(3) Filtering: remove posts from blocked users, posts violating community guidelines (content moderation), and posts the user has already seen.

(4) Diversification: ensure the Explore page shows varied content (not all food photos, even if the user likes food). Inject posts from different categories.

The Explore page is computationally expensive (ML inference for each user), so pre-compute candidate pools and cache rankings with a 15-30 minute refresh cycle.
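The four stages above can be walked through end to end on toy data. This sketch makes several simplifying assumptions: brute-force cosine similarity stands in for both the approximate nearest neighbor search and the ML ranking model, diversification is a simple per-category cap, and the `Post` type and `explore` function are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    post_id: str
    author_id: str
    category: str
    embedding: tuple[float, ...]


def cosine(a, b) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def explore(user_embedding, posts, blocked_authors, seen_post_ids,
            k: int = 10, max_per_category: int = 2):
    # Stages 1-2: score every post against the user embedding, best first.
    ranked = sorted(posts, key=lambda p: cosine(user_embedding, p.embedding), reverse=True)
    # Stage 3: drop posts from blocked authors and posts already seen.
    eligible = [
        p for p in ranked
        if p.author_id not in blocked_authors and p.post_id not in seen_post_ids
    ]
    # Stage 4: cap how many posts any one category may contribute.
    result, per_category = [], {}
    for post in eligible:
        if per_category.get(post.category, 0) >= max_per_category:
            continue
        per_category[post.category] = per_category.get(post.category, 0) + 1
        result.append(post.post_id)
        if len(result) == k:
            break
    return result
```

With a real index, stage 1 would return a few thousand nearest neighbors instead of scanning every post, and the output of this function is what the 15-30 minute cache would hold per user.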
CDN and Image Serving
Images are served via a CDN (CloudFront, Fastly, or Akamai). When a client requests an image, the CDN edge server checks its cache. On a cache hit (90%+ of requests for popular content), it returns the image immediately from the edge with sub-10ms latency. On a cache miss, the CDN fetches the image from the S3 origin, caches it, and returns it to the client.

Image URL format: cdn.instagram.com/images/{user_id}/{post_id}/640.webp. The URL encodes the size, allowing the client to request the appropriate size for the device (150px thumbnail for the grid, 640px for the feed, 1080px for full-screen).

Format negotiation: the CDN or an edge function (such as a Cloudflare Worker) checks the Accept header. If the browser supports AVIF, serve AVIF (roughly 50% smaller than JPEG). If it supports WebP, serve WebP (roughly 25% smaller). Otherwise, serve JPEG. A Vary: Accept response header keeps the cache from serving one format's bytes to a client that cannot decode them.

Bandwidth savings: Instagram serves approximately 1 billion images per hour. WebP/AVIF saves 25-50% of bandwidth compared to JPEG, which saves petabytes of transfer per day and reduces page load times for users on slow connections.

Cache invalidation: images are immutable (a new post gets a new URL), so cached copies never need refreshing. Deleted posts: the object is removed from the origin, and the CDN returns 404 once its cached copy expires (or sooner, if the URL is explicitly purged).
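The format negotiation step is small enough to show in full. The sketch below is a simplified take on what an edge function might do: it parses the Accept header, ignores types the client has disabled with q=0, and prefers AVIF, then WebP, then JPEG. The function names are hypothetical, and q-values are used only to detect q=0, not to order preferences.

```python
def accepted_types(accept_header: str) -> set[str]:
    """Parse an Accept header into the set of media types with q > 0."""
    types = set()
    for part in accept_header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not accepted"
        if q > 0:
            types.add(media_type)
    return types


def choose_format(accept_header: str) -> str:
    """Pick the smallest format the client explicitly accepts, else JPEG."""
    types = accepted_types(accept_header)
    if "image/avif" in types:
        return "avif"
    if "image/webp" in types:
        return "webp"
    return "jpeg"


def image_response_headers(accept_header: str) -> dict[str, str]:
    """Headers for the negotiated response; Vary keys the cache on Accept."""
    fmt = choose_format(accept_header)
    return {"Content-Type": f"image/{fmt}", "Vary": "Accept"}
```

Wildcards such as `image/*` deliberately fall through to JPEG here: a client that does not name AVIF or WebP explicitly is treated as not supporting them.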
Social Graph and Interactions
The social graph stores follow relationships. Query patterns: “who does user A follow?” (following list), “who follows user A?” (follower list), “does user A follow user B?” (relationship check). Storage: a wide-column store (Cassandra) or a graph database (TAO at Meta). Store each edge twice, once in a table partitioned by follower_id and once in a table partitioned by followee_id, so that queries in both directions are single-partition reads. Cache hot relationships in Redis.

Likes: each post has a like count and a set of users who liked it. Like count: Redis INCR for real-time updates, periodically flushed to the database. Like check (“did I like this post?”): a Redis SET per post containing the user_ids who liked it. For posts with millions of likes, use a Bloom filter for the “did I like it?” check and store the full list in Cassandra. A Bloom filter has no false negatives but can return false positives, so a negative answer is final while a positive one is confirmed against Cassandra.

Comments: stored per post in a database, paginated by timestamp. Cache the first N comments per post (displayed in the feed).

Notifications: likes, comments, follows, and mentions generate notifications. The notification service consumes events from Kafka and delivers them via push notification (APNs/FCM) and the in-app notification feed.
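The Bloom filter path for the “did I like it?” check can be sketched as follows. This is an illustrative model, not a production design: the filter parameters (8192 bits, 4 hashes) are arbitrary, an in-memory set stands in for the authoritative list in Cassandra, an integer stands in for the Redis counter, and the `BloomFilter` and `PostLikes` classes are hypothetical.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits: int = 8192, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


class PostLikes:
    """Like state for a single post."""

    def __init__(self):
        self.count = 0          # stand-in for the Redis INCR counter
        self.bloom = BloomFilter()
        self.liked_by = set()   # stand-in for the full like list in Cassandra
        self.store_lookups = 0  # how often the authoritative store was consulted

    def like(self, user_id: str) -> bool:
        """Record a like; return False if the user had already liked the post."""
        if user_id in self.liked_by:
            return False
        self.liked_by.add(user_id)
        self.bloom.add(user_id)
        self.count += 1
        return True

    def has_liked(self, user_id: str) -> bool:
        if not self.bloom.might_contain(user_id):
            return False  # no false negatives: the user definitely has not liked it
        self.store_lookups += 1  # possible false positive: confirm in the store
        return user_id in self.liked_by
```

Most viewers of a popular post have not liked it, so the filter answers the common case from memory and only likely-positive checks reach the store. A standard Bloom filter cannot remove entries, so supporting unlikes would need a rebuild or a counting variant.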