---
name: sitemap.xml
type: protocol artifact
status: running
version: 0.9
released: 2005-11-16
maintainer: sitemaps.org (originally Google, now a quiet coalition of search engines and polite neglect)
dependencies:
- XML 1.0
- robots.txt
- a web server that does not return 404 for /sitemap.xml
- someone remembering to update it
license: Sitemaps Protocol (open, community-owned, nobody's problem)
tags:
- SEO
- XML
- crawling
- web infrastructure
- forgotten file
- good intentions
---
A letter to search engines that says: here is everything on this site, please notice it, I tried.
A sitemap is an XML document placed at a predictable location (usually /sitemap.xml) listing URLs the site owner wants crawled. Search engine bots check robots.txt first, find a Sitemap: directive pointing here, then arrive to find either a carefully maintained index or a list of pages that no longer exist.
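The robots.txt handshake mentioned above is a single directive. A minimal sketch, assuming the sitemap lives at the site root of a hypothetical example.com:

```
# https://example.com/robots.txt
User-agent: *
Sitemap: https://example.com/sitemap.xml
```

The Sitemap: line takes an absolute URL, so it can point anywhere — which is also how the "points to wrong domain" failure mode happens.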
The document follows a simple schema:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page</loc>
    <lastmod>2024-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
changefreq and priority are accepted, logged, and mostly ignored by crawlers. They are the XML equivalent of a suggestion box.
<lastmod> tells crawlers when content changed, in theory. In practice:

| Bug | Frequency | Severity |
|---|---|---|
| Contains URLs that 301 redirect | Very common | Low |
| <lastmod> is the deploy date, not the edit date | Almost universal | Medium |
| File not regenerated after content deletion | Common | Medium |
| Listed in robots.txt but points to wrong domain | Rare, humiliating | High |
| Sitemap exists but no one submitted it to Google Search Console | Extremely common | Philosophical |
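Some of these bugs are cheap to catch offline. A hypothetical lint pass, sketched with Python's standard library (redirect checking would need HTTP and is left out):

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def lint(xml_text):
    """Return a list of problems found in a sitemap document string."""
    root = ET.fromstring(xml_text)
    problems = []
    seen = set()
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS)
        # the spec requires absolute URLs; relative ones get silently dropped
        if not loc.startswith(("http://", "https://")):
            problems.append(f"not an absolute URL: {loc!r}")
        if loc in seen:
            problems.append(f"duplicate entry: {loc}")
        seen.add(loc)
        # crude heuristic: a valid ISO 8601 date is at least 10 characters
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=NS)
        if lastmod is not None and len(lastmod) < 10:
            problems.append(f"suspicious lastmod {lastmod!r} on {loc}")
    return problems
```

It will not tell you whether anyone submitted the file to Search Console. Nothing will.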
Most sites generate this file automatically via CMS plugins, static site generators, or a script someone wrote in 2019 and has not touched since. Recommended fields:
loc → required. the URL. spelled correctly. please.
lastmod → ISO 8601 date. approximate is fine. blank is also fine.
changefreq → optional. crawlers treat it as decorative.
priority → float between 0.0 and 1.0. everyone sets 1.0 for everything.
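The generator script that someone wrote in 2019 probably looked something like this. A minimal sketch with Python's standard library, assuming a hypothetical hard-coded page list:

```python
import xml.etree.ElementTree as ET
from datetime import date

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# hypothetical page list; a real generator would walk the CMS or build output
PAGES = ["https://example.com/", "https://example.com/about"]

def build_sitemap(urls):
    """Serialize a list of URLs into a sitemap.xml document string."""
    ET.register_namespace("", NS)  # serialize without an 'ns0:' prefix
    urlset = ET.Element(f"{{{NS}}}urlset")
    for u in urls:
        url = ET.SubElement(urlset, f"{{{NS}}}url")
        ET.SubElement(url, f"{{{NS}}}loc").text = u
        # lastmod is the generation date — which is exactly the
        # "deploy date, not the edit date" bug from the table above
        ET.SubElement(url, f"{{{NS}}}lastmod").text = date.today().isoformat()
    return ET.tostring(urlset, encoding="unicode", xml_declaration=True)

if __name__ == "__main__":
    print(build_sitemap(PAGES))
```

Note what it omits: changefreq and priority, on the grounds that crawlers treat them as decorative anyway.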
Does having a sitemap improve SEO? It improves crawlability. Whether SEO improves depends on the content, the domain authority, the phase of the moon, and factors Google has not published.
What happens if I don't have one? Crawlers find pages through hyperlinks instead. They manage. It just takes longer, like navigation before GPS.
Should priority be 1.0 for everything?
No. Is it? Yes.