Sitemap XML Format
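A <urlset> root element (namespace http://www.sitemaps.org/schemas/sitemap/0.9) containing one <url> entry per page. Each entry has a required <loc> and optional <lastmod>, <changefreq>, and <priority>.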
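A minimal sketch of rendering that format with only the standard library, assuming each entry is a dict with a required "loc" and optional "lastmod" (a datetime.date), "changefreq", and "priority". The function name render_urlset is hypothetical.

```python
from datetime import date
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
VALID_CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}


def render_urlset(entries):
    """Render dicts (loc required; lastmod, changefreq, priority optional) as a <urlset> document."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>', f'<urlset xmlns="{SITEMAP_NS}">']
    for e in entries:
        lines.append("  <url>")
        # loc must be XML-escaped: a raw '&' in a query string would make the file invalid XML.
        lines.append(f"    <loc>{escape(e['loc'])}</loc>")
        if e.get("lastmod") is not None:
            # date.isoformat() yields YYYY-MM-DD, a valid W3C Datetime value.
            lines.append(f"    <lastmod>{e['lastmod'].isoformat()}</lastmod>")
        if e.get("changefreq") is not None:
            if e["changefreq"] not in VALID_CHANGEFREQ:
                raise ValueError(f"invalid changefreq: {e['changefreq']}")
            lines.append(f"    <changefreq>{e['changefreq']}</changefreq>")
        if e.get("priority") is not None:
            if not 0.0 <= e["priority"] <= 1.0:
                raise ValueError(f"priority out of range: {e['priority']}")
            # Rendered to one decimal place, matching the priority rules used in this design.
            lines.append(f"    <priority>{e['priority']:.1f}</priority>")
        lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines)
```

For example, render_urlset([{"loc": "https://example.com/?a=1&b=2", "lastmod": date(2024, 1, 15), "priority": 1.0}]) emits the loc as https://example.com/?a=1&amp;b=2.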
Sitemap Index
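A single sitemap file may hold at most 50,000 URLs and 50 MB uncompressed. Larger sites split their URLs across multiple sitemap files and publish a sitemap index file that references each of them.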
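A small sketch of the split and the index file, assuming the child sitemaps are already reachable at absolute URLs. The helper names chunk and render_sitemap_index are hypothetical, and the 50 MB size limit is not checked here.

```python
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50_000  # protocol limit on URLs per sitemap file


def chunk(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive batches of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def render_sitemap_index(sitemap_locs):
    """Render a <sitemapindex> that points at each child sitemap by absolute URL."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for loc in sitemap_locs:
        lines.append(f"  <sitemap><loc>{escape(loc)}</loc></sitemap>")
    lines.append("</sitemapindex>")
    return "\n".join(lines)
```

With 120,001 URLs, chunk returns three batches of 50,000, 50,000, and 20,001, and the index lists three <sitemap> entries.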
SitemapEntry Table
SitemapEntry (
  id,
  url UNIQUE,
  lastmod,
  changefreq ENUM(always, hourly, daily, weekly, monthly, yearly, never),
  priority DECIMAL(2,1) CHECK (priority BETWEEN 0.0 AND 1.0),
  sitemap_file_id
)
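One way the table could be expressed concretely, sketched against SQLite via Python's sqlite3 module. SQLite has no ENUM or bounded DECIMAL type, so CHECK constraints stand in for them; the table name sitemap_entry, the index, and the 0.5 default (the protocol's default priority) are assumptions for illustration.

```python
import sqlite3

# Hypothetical SQLite DDL for SitemapEntry; CHECK constraints emulate ENUM and the 0.0-1.0 range.
SCHEMA = """
CREATE TABLE sitemap_entry (
    id              INTEGER PRIMARY KEY,
    url             TEXT NOT NULL UNIQUE,   -- upsert key
    lastmod         TEXT,                   -- ISO 8601 date, e.g. '2024-01-15'
    changefreq      TEXT CHECK (changefreq IN
                        ('always','hourly','daily','weekly','monthly','yearly','never')),
    priority        REAL NOT NULL DEFAULT 0.5 CHECK (priority BETWEEN 0.0 AND 1.0),
    sitemap_file_id INTEGER
);
-- Lets incremental updates find every row belonging to one sitemap file.
CREATE INDEX idx_sitemap_entry_file ON sitemap_entry (sitemap_file_id);
"""


def create_schema(conn):
    """Create the sitemap_entry table and its index on an open sqlite3 connection."""
    conn.executescript(SCHEMA)
```

An out-of-range priority such as 1.5 is rejected by the database with sqlite3.IntegrityError rather than silently stored.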
Generation Process
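- Query SitemapEntry ordered by priority DESC.
- Batch the rows into files of at most 50,000 URLs (and at most 50 MB uncompressed).
- Render each batch as XML and gzip it.
- Upload each file to S3 (sitemap1.xml.gz, sitemap2.xml.gz, ...).
- Generate sitemap-index.xml listing every uploaded file.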
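The pipeline above can be sketched end to end as follows. This is an illustration under stated assumptions, not a production job: rows are (url, lastmod, changefreq, priority) tuples as they might come back from the SitemapEntry query, the in-memory sort stands in for ORDER BY priority DESC, and upload is a hypothetical callable (for instance a thin wrapper around an S3 put) so the sketch runs without cloud credentials. The 50 MB limit is not enforced.

```python
import gzip
from xml.sax.saxutils import escape

MAX_URLS = 50_000
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def _urlset(rows):
    """Render (url, lastmod, changefreq, priority) tuples; lastmod is an ISO date string or None."""
    out = ['<?xml version="1.0" encoding="UTF-8"?>', f'<urlset xmlns="{NS}">']
    for url, lastmod, changefreq, priority in rows:
        out.append(f"<url><loc>{escape(url)}</loc>")
        if lastmod:
            out.append(f"<lastmod>{lastmod}</lastmod>")
        if changefreq:
            out.append(f"<changefreq>{changefreq}</changefreq>")
        out.append(f"<priority>{priority:.1f}</priority></url>")
    out.append("</urlset>")
    return "\n".join(out)


def generate_sitemaps(rows, base_url, upload, max_urls=MAX_URLS):
    """Build gzipped sitemap files plus an index, passing each to upload(name, data).

    upload is a hypothetical callable supplied by the caller. Returns the file names written.
    """
    # Highest-priority URLs first, so they land in sitemap1.xml.gz.
    ordered = sorted(rows, key=lambda r: r[3], reverse=True)
    names = []
    for n, start in enumerate(range(0, len(ordered), max_urls), start=1):
        name = f"sitemap{n}.xml.gz"
        body = _urlset(ordered[start:start + max_urls]).encode("utf-8")
        upload(name, gzip.compress(body))
        names.append(name)
    # The index references every child file by absolute URL.
    index = ['<?xml version="1.0" encoding="UTF-8"?>', f'<sitemapindex xmlns="{NS}">']
    index += [f"<sitemap><loc>{escape(base_url)}/{name}</loc></sitemap>" for name in names]
    index.append("</sitemapindex>")
    upload("sitemap-index.xml", "\n".join(index).encode("utf-8"))
    return names + ["sitemap-index.xml"]
```

Passing upload as a parameter keeps the batching logic testable locally; a dict-backed stub such as lambda name, data: store.__setitem__(name, data) is enough to inspect the output.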
Incremental Update
When content is published or updated, upsert its SitemapEntry row (keyed by url) and regenerate only the sitemap file identified by its sitemap_file_id.
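A sketch of the upsert step against SQLite (ON CONFLICT ... DO UPDATE needs SQLite 3.24 or later), assuming a sitemap_entry table with the SitemapEntry columns and a UNIQUE constraint on url. The rule for placing a brand-new URL (append to the highest-numbered file if it has room, otherwise open a new file) is a hypothetical choice; the source does not specify one.

```python
import sqlite3

MAX_URLS = 50_000


def upsert_entry(conn, url, lastmod, changefreq, priority, max_urls=MAX_URLS):
    """Insert or update one entry; return the sitemap_file_id whose file must be regenerated."""
    row = conn.execute("SELECT sitemap_file_id FROM sitemap_entry WHERE url = ?", (url,)).fetchone()
    if row is not None:
        file_id = row[0]  # existing URL stays in its current file
    else:
        # Hypothetical placement: fill the last file, roll over to a new one when it is full.
        last = conn.execute(
            "SELECT sitemap_file_id, COUNT(*) FROM sitemap_entry "
            "WHERE sitemap_file_id IS NOT NULL "
            "GROUP BY sitemap_file_id ORDER BY sitemap_file_id DESC LIMIT 1"
        ).fetchone()
        if last is None:
            file_id = 1
        elif last[1] < max_urls:
            file_id = last[0]
        else:
            file_id = last[0] + 1
    conn.execute(
        """INSERT INTO sitemap_entry (url, lastmod, changefreq, priority, sitemap_file_id)
           VALUES (?, ?, ?, ?, ?)
           ON CONFLICT(url) DO UPDATE SET
               lastmod = excluded.lastmod,
               changefreq = excluded.changefreq,
               priority = excluded.priority""",
        (url, lastmod, changefreq, priority, file_id),
    )
    return file_id
```

Because new URLs are appended rather than slotted by priority, files drift from strict priority order between runs; the weekly full regeneration re-sorts everything.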
Priority Rules
- Homepage: 1.0
- Category pages: 0.8
- Content pages: 0.7
- Tag pages: 0.5
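The rules above map naturally onto a path-based lookup. This sketch assumes a hypothetical URL scheme where category pages live under /category/ and tag pages under /tag/; everything else is treated as a content page.

```python
from urllib.parse import urlparse


def priority_for(url):
    """Return the sitemap priority for a URL under a hypothetical /category/ and /tag/ URL scheme."""
    path = urlparse(url).path
    if path in ("", "/"):
        return 1.0  # homepage
    if path.startswith("/category/"):
        return 0.8  # category pages
    if path.startswith("/tag/"):
        return 0.5  # tag pages
    return 0.7  # content pages
```

Priority only ranks URLs relative to others on the same site. Google's sitemap documentation states that it ignores priority and changefreq, so in this design the values mainly drive the ordering used during generation.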
Exclusions
Dynamic pages are excluded: search results, user-specific pages, and admin URLs.
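A sketch of the exclusion check, assuming hypothetical path prefixes /search, /account, and /admin for the three excluded page types. Matching on whole path segments avoids accidentally dropping a content page such as /searching-tips.

```python
from urllib.parse import urlparse

# Hypothetical prefixes for search results, user-specific pages, and admin URLs.
EXCLUDED_PREFIXES = ("/search/", "/account/", "/admin/")


def is_sitemap_eligible(url):
    """Return False for URLs under an excluded prefix; query strings are ignored."""
    # Appending "/" makes "/admin" match "/admin/" while "/searching-tips/" does not match "/search/".
    path = urlparse(url).path + "/"
    return not path.startswith(EXCLUDED_PREFIXES)
```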
robots.txt Integration
Automatically append a "Sitemap: <absolute URL of sitemap-index.xml>" line to robots.txt if it is not already present.
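An idempotent version of the append, sketched as a pure function on the robots.txt text so the caller decides how the file is read and written. The function name is hypothetical. The Sitemap directive is independent of User-agent groups, so appending it at the end of the file is valid.

```python
def add_sitemap_directive(robots_txt, sitemap_url):
    """Return robots.txt text with a 'Sitemap: <url>' line appended unless it is already present."""
    line = f"Sitemap: {sitemap_url}"
    if line in {l.strip() for l in robots_txt.splitlines()}:
        return robots_txt  # already listed; repeated weekly runs leave the file unchanged
    if robots_txt and not robots_txt.endswith("\n"):
        robots_txt += "\n"
    return robots_txt + line + "\n"
```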
Search Engine Ping
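Send a GET request to the search engines' sitemap ping URLs after generation. Google retired its sitemap ping endpoint in 2023, and Bing has deprecated anonymous ping submission in favor of IndexNow, so the robots.txt Sitemap line and submission through Google Search Console and Bing Webmaster Tools are the dependable discovery paths.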
Schedule
Full regeneration weekly, plus incremental updates on every publish.