---
name: sitemap.xml
type: document
status: running
version: 0.9
released: 2005-11-16
maintainer: sitemaps.org (formerly Google, Yahoo, Microsoft, Ask)
dependencies:
- HTTP server
- robots.txt
- at least one URL worth indexing
- existential confidence about your site structure
license: Sitemaps Protocol (open standard)
tags:
- xml
- seo
- crawling
- indexing
- infrastructure
- the-internet
---
A polite letter to search engines saying: "Here is everything. Please notice us."
You declare every URL on your site in a structured XML file. Googlebot and its colleagues read it, nod, and then crawl whatever they were going to crawl anyway. The file lives at /sitemap.xml by convention, gets referenced in robots.txt by etiquette, and gets ignored in ways you will never fully audit.
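The robots.txt etiquette amounts to a single directive, usually at the bottom of the file (the URL here is a placeholder):

```
Sitemap: https://example.com/sitemap.xml
```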
The format is simple enough to hand-write, complex enough that everyone uses a generator.
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page</loc>
    <lastmod>2024-01-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```
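Since everyone reaches for a generator anyway, a minimal hand-rolled one is a sketch away with the Python standard library. The URLs and dates below are placeholders:

```python
# Minimal sitemap generator sketch using only the standard library.
# URLs and lastmod dates are illustrative, not real pages.
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """pages: iterable of (loc, lastmod) pairs; returns serialized XML bytes."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    # encoding="UTF-8" is non-negotiable per the protocol.
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)

xml_bytes = build_sitemap([
    ("https://example.com/", "2024-01-01"),
    ("https://example.com/page", "2024-01-15"),
])
print(xml_bytes.decode("UTF-8"))
```

Note the deliberate omission of `<changefreq>` and `<priority>`; see below for why they are mostly decorative.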
- `<loc>`: The only field that actually matters. Everything else is advisory.
- `<lastmod>`: Tells crawlers when you last changed a page. Believed approximately never.
- `<changefreq>`: A hint. Values range from `always` to `never`. Ignored at the crawler's discretion.
- `<priority>`: Relative importance, 0.0 to 1.0. Every site sets everything to 0.8. The field has learned nothing from this.

`changefreq: always` on a page that changes monthly is technically lying to a robot. `<priority>` is so universally inflated it has ceased to carry signal. A rounding error with ambitions. `<link rel="canonical">` tags cause quiet suffering.

Max URLs per file: 50,000
Max file size: 50MB (uncompressed)
Encoding: UTF-8 (non-negotiable)
Submission: Google Search Console, Bing Webmaster Tools, or robots.txt
Compression: .xml.gz accepted and appreciated
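Past those limits, the protocol's escape hatch is a sitemap index: a sitemap of sitemaps, same namespace, same 50,000-entry ceiling. A sketch, with placeholder filenames:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-pages.xml.gz</loc>
    <lastmod>2024-01-01</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-posts.xml.gz</loc>
    <lastmod>2024-01-01</lastmod>
  </sitemap>
</sitemapindex>
```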
| Code | Meaning |
|---|---|
| `FETCH_ERROR` | Server returned non-200 for the sitemap itself. Embarrassing. |
| `INVALID_URL` | A `<loc>` that does not resolve. Usually staging URLs that escaped. |
| `OVERSIZED` | File exceeded limits. Split it. |
| `NOT_INDEXED` | Submitted. Acknowledged. Ignored. This is normal. |
Does submitting a sitemap guarantee indexing? No. It guarantees Google knows you want to be indexed. Hope is not a protocol.
Do I need one? Small sites with good internal linking can survive without one. Large sites, SPAs, and anything with pagination probably cannot.
Why XML? It was 2005. XML was having a moment. We do not talk about it.
0.9 (2005): Initial release. Three search engines agreed on a thing. Historic.
0.9 (2024): Still 0.9. Some things are finished.