Low-Level Design: Blog Platform — Content Management, Comments, and SEO-Friendly URLs

Core Entities

User: user_id, username (unique, URL-safe), email, password_hash, display_name, bio, avatar_url, role (READER, AUTHOR, EDITOR, ADMIN), created_at.

Post: post_id, author_id, title, slug (unique, URL-safe: "my-first-post"), body (Markdown or HTML), excerpt (auto-generated or manual), status (DRAFT, PUBLISHED, ARCHIVED), featured_image_url, published_at, created_at, updated_at, view_count, reading_time_minutes.

Tag: tag_id, name, slug.

PostTag: post_id, tag_id.

Comment: comment_id, post_id, author_id (nullable for guests), author_name (for guests), author_email, body, status (PENDING, APPROVED, SPAM), parent_id (nullable for threading), created_at.

PostRevision: revision_id, post_id, author_id, body_snapshot, title_snapshot, created_at, change_summary.
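The Post entity above can be sketched as a dataclass with its status as an enum; the concrete field types and defaults are assumptions, not a prescribed schema:

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional

class PostStatus(Enum):
    DRAFT = "DRAFT"
    PUBLISHED = "PUBLISHED"
    ARCHIVED = "ARCHIVED"

@dataclass
class Post:
    post_id: int
    author_id: int
    title: str
    slug: str                           # unique per author, URL-safe
    body: str                           # Markdown or HTML
    status: PostStatus = PostStatus.DRAFT
    excerpt: Optional[str] = None       # auto-generated if not set manually
    featured_image_url: Optional[str] = None
    published_at: Optional[datetime] = None
    view_count: int = 0
    reading_time_minutes: int = 0
```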

SEO-Friendly URLs and Slug Generation

Posts are accessed via /author-username/post-slug (e.g., /john/how-to-design-systems).

Slug generation: convert the title to lowercase, replace spaces with hyphens, strip non-alphanumeric characters except hyphens, and truncate to 100 characters. Ensure uniqueness per author: if the slug already exists, append -2, -3, etc.

Redirect handling: when a post's title (and thus slug) changes, keep the old slug mapping in a PostSlug table: (slug, post_id, is_primary). The old slug returns HTTP 301 to the new slug. Never return 404 for previously valid URLs; search engines penalize broken links.

Canonical URL: set the canonical link tag to the primary slug URL to consolidate SEO credit if multiple paths reach the same post.

RSS feed: generate /feed.xml for each author and globally. Cache the RSS XML (regenerate only when a new post is published).

Sitemap: generate /sitemap.xml with all published posts, tags, and author pages. Refresh daily. Submit to Google Search Console.
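The slug rules above can be sketched as two small helpers; this is a minimal illustration assuming an in-memory set of the author's existing slugs (production code would query the PostSlug table instead):

```python
import re

def slugify(title: str, max_length: int = 100) -> str:
    """Lowercase, hyphenate spaces, keep only [a-z0-9-], truncate to 100 chars."""
    slug = title.lower().strip()
    slug = re.sub(r"\s+", "-", slug)                # spaces -> hyphens
    slug = re.sub(r"[^a-z0-9-]", "", slug)          # drop other characters
    slug = re.sub(r"-{2,}", "-", slug).strip("-")   # collapse/trim hyphens
    return slug[:max_length].rstrip("-")

def unique_slug(title: str, existing: set) -> str:
    """Append -2, -3, ... until the slug is unique for this author."""
    base = slugify(title)
    if base not in existing:
        return base
    n = 2
    while f"{base}-{n}" in existing:
        n += 1
    return f"{base}-{n}"
```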

Draft and Publishing Workflow

from datetime import datetime, timezone

class PostService:
    def publish_post(self, post_id: int, publisher_id: int) -> Post:
        post = self.repo.get_post(post_id)
        if post.status != PostStatus.DRAFT:
            raise InvalidStatusError(f"Cannot publish post in status {post.status}")
        if post.author_id != publisher_id and not self.is_editor(publisher_id):
            raise PermissionError("Only the author or an editor may publish")

        # Compute auto-fields
        post.reading_time_minutes = self._estimate_reading_time(post.body)
        post.excerpt = post.excerpt or self._generate_excerpt(post.body)
        post.published_at = datetime.now(timezone.utc)  # timezone-aware; utcnow() is deprecated
        post.status = PostStatus.PUBLISHED

        self.repo.save(post)

        # Invalidate every cache entry that includes this post
        self.cache.delete(f"post:{post.slug}")
        self.cache.delete("recent_posts")
        self.cache.delete(f"author_posts:{post.author_id}")

        # Trigger async jobs (newsletter, search-engine ping, etc.)
        self.events.publish("post.published", {
            "post_id": post.post_id,
            "author_id": post.author_id,
        })
        return post

The post.published event triggers: notification to subscribers (newsletter send), ping to search engines (IndexNow API for instant indexing), social media auto-post (if configured), CDN cache warmup.
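The fan-out above can be sketched as a minimal in-process handler registry; the registry, decorator, and handler names are illustrative assumptions (a real deployment would publish to a message queue and run consumers separately):

```python
# Sketch of the "post.published" fan-out; names are illustrative, not the
# platform's actual event bus.
POST_PUBLISHED_HANDLERS = []

def on_post_published(handler):
    """Register a handler to run when a post is published."""
    POST_PUBLISHED_HANDLERS.append(handler)
    return handler

def dispatch_post_published(payload: dict) -> None:
    # Handlers run independently: a failing search-engine ping must not
    # block the newsletter send. Production code would log and retry.
    for handler in POST_PUBLISHED_HANDLERS:
        try:
            handler(payload)
        except Exception:
            pass

@on_post_published
def send_newsletter(payload: dict) -> None:
    ...  # enqueue batched subscriber emails

@on_post_published
def ping_search_engines(payload: dict) -> None:
    ...  # submit the new URL to the IndexNow API
```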

Comments with Moderation

Comment workflow: (1) A guest or user submits a comment → status=PENDING. (2) Spam check: run it through the Akismet API (the industry standard for blog spam). If spam: status=SPAM (not shown). (3) Auto-approve trusted commenters: users with 2+ previously approved comments are approved immediately (status=APPROVED). First-time commenters wait for moderation. (4) Author/editor moderation queue: PENDING comments appear in the admin dashboard for approval or marking as spam. Email the post author when a new comment awaits moderation.

Threading: store parent_id on each comment. Load top-level comments first; load replies on expand. Flat representation in the database, tree representation in the UI.

Comment count: cache the approved comment count per post in Redis. Increment on approve, decrement on delete. Avoid counting in SQL on every page load.

Anti-spam: rate limit by IP (max 3 comments per 10 minutes), add a honeypot field as a bot trap (bots fill it in, humans never see it), and validate email format for guests.
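The status decision and the IP rate limit above can be sketched as pure functions; the in-memory `ip_log` dict stands in for whatever store (e.g., Redis) production would use:

```python
from datetime import datetime, timedelta

TRUST_THRESHOLD = 2                  # approved comments needed for auto-approval
RATE_LIMIT = 3                       # max comments per IP per window
RATE_WINDOW = timedelta(minutes=10)

def initial_comment_status(author_id, approved_count: int, is_spam: bool) -> str:
    """Decide the initial status of a newly submitted comment."""
    if is_spam:                      # e.g., flagged by the Akismet check
        return "SPAM"
    if author_id is not None and approved_count >= TRUST_THRESHOLD:
        return "APPROVED"            # trusted returning commenter
    return "PENDING"                 # guests and first-timers await moderation

def allow_comment(ip_log: dict, ip: str, now: datetime) -> bool:
    """Sliding-window rate limit: at most RATE_LIMIT comments per RATE_WINDOW per IP."""
    recent = [t for t in ip_log.get(ip, []) if now - t < RATE_WINDOW]
    if len(recent) >= RATE_LIMIT:
        ip_log[ip] = recent
        return False
    recent.append(now)
    ip_log[ip] = recent
    return True
```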

Post Discovery and Recommendations

Tag-based discovery: posts are tagged with topics. A tag page (/tag/system-design) shows all posts for a tag sorted by published_at DESC. Index: (tag_id, published_at DESC) on the PostTag join with Posts. An author page (/john) shows all of an author's published posts.

Related posts: after reading a post, show 3-5 related posts. Simple approach: posts sharing the most tags with the current post. Query: find posts sharing any tag, rank by number of shared tags, return the top 5. More sophisticated: TF-IDF similarity on post content (computed offline, stored in a related_posts cache). Cache related posts per post_id for 24 hours.

Trending posts: posts with the most views in the last 7 days. Maintain a sorted set in Redis: ZINCRBY trending:posts 1 post_id on each page view. Refresh daily from the database for accuracy (the Redis count is approximate).

Full-text search: an Elasticsearch index for searching across titles, bodies, and tags. Autocomplete suggestions come from a completion suggester on post titles.
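The simple shared-tag ranking can be sketched in memory; `post_tags` here stands in for the PostTag rows (in SQL, the equivalent is a self-join on PostTag, GROUP BY post_id, ORDER BY COUNT(*) DESC LIMIT 5):

```python
from collections import Counter

def related_posts(current_post_id: int, post_tags: dict, limit: int = 5) -> list:
    """Rank other posts by the number of tags shared with the current post.

    post_tags maps post_id -> set of tag_ids.
    """
    current = post_tags.get(current_post_id, set())
    overlap = Counter()
    for pid, tags in post_tags.items():
        if pid == current_post_id:
            continue
        shared = len(current & tags)
        if shared:
            overlap[pid] = shared          # posts with zero shared tags are excluded
    return [pid for pid, _ in overlap.most_common(limit)]
```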

{
"@context": "https://schema.org",
"@type": "FAQPage",
"mainEntity": [
{
"@type": "Question",
"name": "How do you handle URL slug changes without breaking existing links?",
"acceptedAnswer": {
"@type": "Answer",
"text": "When a post title changes, the slug changes (the SEO-friendly URL changes). Existing links from other sites, bookmarks, and search engine indexes should continue to work. Solution: store all historical slugs for a post. Schema: PostSlug table: (slug, post_id, is_primary, created_at). On slug change: set the old slug's is_primary=false and insert the new slug with is_primary=true. URL resolution: when serving /:author/:slug, look up the slug in PostSlug. If is_primary=false: return an HTTP 301 redirect to the current primary slug URL. If is_primary=true: serve the post. Canonical link: in the HTML head, always set the canonical link tag to the current primary slug URL. This prevents duplicate content penalties even if old URLs are still being accessed. Slug uniqueness: enforce unique (author_id, slug) across ALL slugs (not just primary). This prevents a new post from claiming a slug that redirects to an old post. Index: (slug, author_id) for fast resolution. 410 Gone: if a post is deleted (not just archived), return 410 Gone for old slugs (this tells search engines the content is permanently removed, not just moved)."
}
},
{
"@type": "Question",
"name": "How do you implement a rich text editor for blog posts with image uploads?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Rich text editors for blogs typically build on ProseMirror (the foundation of Tiptap and of Notion's editor) or Quill. These serialize content as JSON (a structured document model) or HTML. Recommendation: store as JSON (portable, structured, queryable) and render to HTML at display time. Image upload flow in the editor: (1) User pastes or drags an image into the editor. (2) Editor captures the image file (Blob). (3) Client requests a pre-signed S3 upload URL from the server. (4) Client uploads directly to S3. (5) On upload completion: client inserts an image node in the document with the S3 URL. Server-side image processing, on S3 upload notification: (1) Validate the file (MIME type, max size 10MB, scan for malware with ClamAV; optionally moderate content with AWS Rekognition). (2) Resize to multiple sizes (thumbnail 300px, medium 800px, full size) using Pillow or Sharp. (3) Convert to WebP for smaller file sizes. (4) Store all sizes; use the medium size as the default in blog posts. CDN: serve images through CloudFront or Cloudflare with caching headers (Cache-Control: max-age=31536000, immutable). Image URLs are content-addressed (hash in the filename), so they never need to be invalidated."
}
},
{
"@type": "Question",
"name": "How do you design the email newsletter subscription system for a blog?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Newsletter subscriptions: readers subscribe to an author's blog. On new post publish: send emails to all subscribers. Schema: Subscription: subscription_id, subscriber_email, author_id, status (ACTIVE, UNSUBSCRIBED, BOUNCED), confirmed (boolean for double opt-in), subscribed_at, token (random token for unsubscribe links). Double opt-in: on subscribe, send a confirmation email with a link containing the token. Only set confirmed=true after the link is clicked. Prevents spam sign-ups (someone else's email being subscribed without consent). CAN-SPAM/GDPR compliance: include an unsubscribe link in every email. Unsubscribe = one-click (no login required). Process the unsubscribe via a token link: GET /unsubscribe?token=xxx → set status=UNSUBSCRIBED. NEVER re-subscribe a user who unsubscribed without explicit re-consent. Send time: when a new post is published, enqueue a newsletter job. The job sends emails in batches (50 emails/second via SES API to stay within rate limits). For 10K subscribers: ~3 minutes to send all emails. Track bounces: hard bounces (invalid email) → set status=BOUNCED, never send again. Soft bounces (mailbox full) → retry 3 times, then BOUNCED. Use SES bounce notifications via SNS."
}
},
{
"@type": "Question",
"name": "How do you implement reading time estimation and auto-generated excerpts?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Reading time: based on the average adult reading speed of 200-250 words per minute. Algorithm: strip HTML tags from the post body, split by whitespace, count words. reading_time_minutes = ceil(word_count / 200). Account for images: each image adds ~10 seconds (0.17 minutes) of viewing time. Formula: reading_time_minutes = ceil(word_count / 200 + image_count * 0.17). Display: \"5 min read.\" Store on the Post row so it doesn't need to be recomputed on each page view. Recompute when the post body is updated. Auto-generated excerpt: take the first 150-200 characters of the stripped text (no HTML). Truncate at a word boundary (don't cut mid-word). Append \"…\" if truncated. This excerpt is used: in post listings, in RSS feed summaries, and in the og:description meta tag (for social sharing previews). If the author manually sets an excerpt: use that instead (override). Store in the excerpt column. Excerpt length: 150-160 characters is the target, matching search engine meta description length. Longer excerpts are truncated by search engines anyway."
}
},
{
"@type": "Question",
"name": "How do you handle SEO meta tags and Open Graph for a blog platform?",
"acceptedAnswer": {
"@type": "Answer",
"text": "SEO meta tags: title tag: use post.title | site_name (max 60 characters). meta description: use post.excerpt (max 160 characters). meta robots: published posts = index,follow. Draft and archived posts = noindex,nofollow. Open Graph (og:) tags for social sharing: og:title, og:description (post.excerpt), og:image (post.featured_image_url or a generated social card), og:type = article, og:url = canonical URL, og:article:author = author profile URL, og:article:published_time = ISO-8601 publish date. Twitter Card meta tags: twitter:card = summary_large_image (shows a large preview image), twitter:title, twitter:description, twitter:image, twitter:creator = @author_twitter_handle. JSON-LD structured data: Article schema with author, datePublished, dateModified, image, headline. FAQPage schema for posts with FAQ sections (improves search result display with Q&A rich snippets). Generated at render time and cached. Cache invalidation: when a post is updated (title, excerpt, or featured image changed), purge the CDN page cache for that URL. CDN edge caching: cache the full rendered HTML at the edge (CDN) for 60 seconds. Blog posts are read-heavy; CDN caching absorbs most traffic without hitting the origin."
}
}
]
}
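The reading-time and excerpt rules described in the FAQ above can be sketched as two helpers matching the `_estimate_reading_time` and `_generate_excerpt` calls in PostService; the regex-based tag stripping is a simplification for illustration:

```python
import math
import re

WORDS_PER_MINUTE = 200
SECONDS_PER_IMAGE = 10   # ~0.17 minutes of viewing time per image

def estimate_reading_time(html_body: str) -> int:
    """ceil(word_count / 200 + image_count * 10s), per the formula above."""
    text = re.sub(r"<[^>]+>", " ", html_body)          # strip HTML tags
    words = len(text.split())
    images = len(re.findall(r"<img\b", html_body, re.I))
    return math.ceil(words / WORDS_PER_MINUTE + images * SECONDS_PER_IMAGE / 60)

def generate_excerpt(html_body: str, max_len: int = 160) -> str:
    """First ~160 chars of stripped text, truncated at a word boundary."""
    text = re.sub(r"<[^>]+>", " ", html_body)
    text = " ".join(text.split())                      # collapse whitespace
    if len(text) <= max_len:
        return text
    cut = text[:max_len].rsplit(" ", 1)[0]             # don't cut mid-word
    return cut + "…"
```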
