Low Level Design: Sitemap Generator Service

Sitemap XML Format

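A sitemap file is a urlset root element containing one url entry per page. Each entry has a required loc (the absolute page URL) and optional lastmod, changefreq, and priority fields.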
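
As a rough illustration of the format above, the following sketch renders entries with Python's standard xml.etree.ElementTree. The function name render_urlset and the dict-based entry shape are assumptions made for this example, not part of the design.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def render_urlset(entries):
    """Render entry dicts as a sitemap <urlset> document (UTF-8 bytes).

    Each entry is assumed to be a dict with a required "loc" key and
    optional "lastmod", "changefreq", and "priority" keys.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = entry["loc"]
        # Optional fields are emitted only when a value is present.
        for field in ("lastmod", "changefreq", "priority"):
            if entry.get(field) is not None:
                ET.SubElement(url, field).text = str(entry[field])
    # ElementTree escapes characters such as "&" in URLs automatically.
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
```

Building the tree with an XML library rather than string concatenation keeps URLs containing characters like & correctly escaped.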

Sitemap Index

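The sitemap protocol caps a single file at 50,000 URLs and 50 MB uncompressed. Sites that exceed either limit generate multiple sitemap files plus a sitemap index file that references them.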
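
A minimal sketch of the index file, again using xml.etree.ElementTree. The function name render_sitemap_index and its lastmod parameter are hypothetical; the sitemapindex, sitemap, loc, and lastmod element names come from the sitemap protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def render_sitemap_index(file_urls, lastmod=None):
    """Render a <sitemapindex> document (UTF-8 bytes) listing sitemap files.

    file_urls: absolute URLs of the individual sitemap files.
    lastmod:   optional date string applied to every file in this sketch.
    """
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for file_url in file_urls:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = file_url
        if lastmod is not None:
            ET.SubElement(sitemap, "lastmod").text = lastmod
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)
```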

SitemapEntry Table

SitemapEntry (
  id,
  url,
  lastmod,
  changefreq: always/hourly/daily/weekly/monthly/yearly/never,
  priority DECIMAL 0.0-1.0,
  sitemap_file_id
)
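
One way to make the table above concrete is the SQLite sketch below. The column types, the NOT NULL and UNIQUE constraints, the index, and the snake_case table name are all assumptions added for illustration; only the column names, the changefreq values, and the 0.0-1.0 priority range come from the definition above.

```python
import sqlite3

# Assumed schema: lastmod is stored as an ISO 8601 string, url is unique.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sitemap_entry (
    id              INTEGER PRIMARY KEY,
    url             TEXT NOT NULL UNIQUE,
    lastmod         TEXT NOT NULL,
    changefreq      TEXT NOT NULL CHECK (changefreq IN
                    ('always', 'hourly', 'daily', 'weekly',
                     'monthly', 'yearly', 'never')),
    priority        REAL NOT NULL CHECK (priority BETWEEN 0.0 AND 1.0),
    sitemap_file_id INTEGER
);
-- Supports "all entries in file N" lookups during regeneration.
CREATE INDEX IF NOT EXISTS idx_sitemap_entry_file
    ON sitemap_entry (sitemap_file_id);
"""


def create_schema(conn):
    """Create the sitemap_entry table and its index if they do not exist."""
    conn.executescript(SCHEMA)
```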

Generation Process

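Query SitemapEntry ordered by priority DESC, batch the rows into files of at most 50,000 URLs each, render each batch as XML, gzip it, and upload it to S3 (sitemap1.xml.gz, sitemap2.xml.gz, and so on). Once every file is uploaded, generate sitemap-index.xml referencing all of them.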
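
The batching and compression steps could look roughly like the sketch below. The names batched and generate_files are hypothetical, and the render and upload callables are placeholders for the XML renderer and the S3 client, which are deliberately left out so the example stays within the standard library.

```python
import gzip
from itertools import islice

MAX_URLS_PER_FILE = 50_000  # per-file URL limit from the sitemap protocol


def batched(iterable, size):
    """Yield consecutive lists of at most `size` items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def generate_files(entries, render, upload, max_urls=MAX_URLS_PER_FILE):
    """Split entries into files, gzip each one, and pass it to `upload`.

    render(chunk) must return the XML for one file as bytes.
    upload(name, data) stores the gzipped bytes (for example, in S3).
    Returns the file names in order, for use by the index step.
    """
    names = []
    for number, chunk in enumerate(batched(entries, max_urls), start=1):
        name = f"sitemap{number}.xml.gz"
        upload(name, gzip.compress(render(chunk)))
        names.append(name)
    return names
```

Because entries are consumed through an iterator, the rows can be streamed from a database cursor without loading the whole table into memory.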

Incremental Update

On content publish or update, upsert the SitemapEntry row and regenerate only the sitemap file that row belongs to, identified by its sitemap_file_id.
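
A possible shape for the upsert, sketched with sqlite3. It assumes a sitemap_entry table with a UNIQUE constraint on url (the constraint is an assumption of this example), and the new_file_id parameter is a hypothetical way for the caller to choose a file for URLs that have not been seen before.

```python
import sqlite3


def upsert_entry(conn, url, lastmod, changefreq, priority, new_file_id):
    """Insert or update one entry and return the sitemap_file_id to rebuild.

    new_file_id is used only for first-time URLs. Existing rows keep their
    current sitemap_file_id, so only that one file needs regenerating.
    Requires a UNIQUE constraint on sitemap_entry.url.
    """
    conn.execute(
        """
        INSERT INTO sitemap_entry
            (url, lastmod, changefreq, priority, sitemap_file_id)
        VALUES (?, ?, ?, ?, ?)
        ON CONFLICT(url) DO UPDATE SET
            lastmod    = excluded.lastmod,
            changefreq = excluded.changefreq,
            priority   = excluded.priority
        """,
        (url, lastmod, changefreq, priority, new_file_id),
    )
    row = conn.execute(
        "SELECT sitemap_file_id FROM sitemap_entry WHERE url = ?", (url,)
    ).fetchone()
    return row[0]
```

In this sketch a priority change does not move an entry to a different file; any such reshuffling is left to the full regeneration.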

Priority Rules

  • Homepage: 1.0
  • Category pages: 0.8
  • Content pages: 0.7
  • Tag pages: 0.5
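
These rules map naturally onto a lookup table. In the sketch below the page-type keys and the priority_for function are hypothetical names; the fallback of 0.5 is the default priority defined by the sitemap protocol.

```python
# Priorities from the rules above, keyed by assumed page-type names.
PRIORITY_BY_PAGE_TYPE = {
    "homepage": 1.0,
    "category": 0.8,
    "content": 0.7,
    "tag": 0.5,
}

DEFAULT_PRIORITY = 0.5  # sitemap protocol default when none is specified


def priority_for(page_type):
    """Return the sitemap priority for a page type, or the protocol default."""
    return PRIORITY_BY_PAGE_TYPE.get(page_type, DEFAULT_PRIORITY)
```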

Exclusions

Exclude dynamic pages from the sitemap: search results, user-specific pages, and admin URLs.
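
A simple way to apply these exclusions is a path-prefix filter run before entries are upserted. The prefixes below (/search, /account, /admin) and the is_indexable function are illustrative assumptions; a real site would substitute its own routes.

```python
from urllib.parse import urlsplit

# Hypothetical path prefixes for search, user-specific, and admin pages.
EXCLUDED_PREFIXES = ("/search", "/account", "/admin")


def is_indexable(url):
    """Return False when the URL path is an excluded prefix or sits under one."""
    path = urlsplit(url).path or "/"
    return not any(
        path == prefix or path.startswith(prefix + "/")
        for prefix in EXCLUDED_PREFIXES
    )
```

Matching on the exact prefix or the prefix followed by a slash avoids excluding unrelated pages such as /searching-tips.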

robots.txt Integration

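Automatically append a Sitemap: line to robots.txt giving the absolute URL of the sitemap index, so crawlers can discover it without a manual submission.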
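
Since the append runs after every generation, it should be idempotent. The sketch below shows one way to do that on the file's text; with_sitemap_line is a hypothetical helper name.

```python
def with_sitemap_line(robots_txt, sitemap_url):
    """Return robots.txt content containing a Sitemap line for sitemap_url.

    The line is added at most once, so repeated runs leave the file unchanged.
    """
    for raw in robots_txt.splitlines():
        name, _, value = raw.partition(":")
        # Directive names are matched case-insensitively; URLs exactly.
        if name.strip().lower() == "sitemap" and value.strip() == sitemap_url:
            return robots_txt  # already present
    # Make sure the new directive starts on its own line.
    if robots_txt and not robots_txt.endswith("\n"):
        robots_txt += "\n"
    return f"{robots_txt}Sitemap: {sitemap_url}\n"
```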

Search Engine Ping

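Send a GET request to each search engine's sitemap ping URL after generation. Google retired its sitemap ping endpoint in 2023, so for Google rely on the robots.txt Sitemap line, Search Console submission, and accurate lastmod values instead.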

Schedule

Run a full regeneration weekly and an incremental update on every publish.
