---
name: sitemap.xml
type: document
status: running
version: 0.9
released: 2006-11-16
maintainer: sitemaps.org (via uneasy truce between Google, Yahoo, and Microsoft)
dependencies:
  - XML parser
  - a web server with a pulse
  - someone who remembers to update this
  - robots.txt
license: Sitemaps Protocol (open, uncredited, slightly forgotten)
tags:
  - crawling
  - indexing
  - SEO
  - xml
  - infrastructure
  - quietly important
---
A politely worded letter to search engines saying: here is every page I care about, please read them, and yes I know you will ignore half of them anyway.
yourdomain.com/sitemap.xml or referenced in robots.txt

The spec supports <loc>, <lastmod>, <changefreq>, and <priority>. Of these, <priority> is treated as decorative by every major crawler. <changefreq> is considered optimistic fiction.
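For the record, here is roughly what producing such a file could look like. This is a minimal sketch using Python's standard `xml.etree.ElementTree`. The function name `build_sitemap` and the `(url, date)` input shape are assumptions made for illustration, not part of the protocol. It emits only <loc> and <lastmod>, on the theory that the other two elements are not worth the bytes.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace required on the root <urlset> element by the Sitemaps 0.9 schema.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (absolute_url, last_edit_date) pairs.

    `pages` is a hypothetical input shape: each item pairs a full URL
    with the date the content was actually edited, not deployed.
    Returns UTF-8 encoded bytes, XML declaration included.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, last_edit in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # date.isoformat() gives YYYY-MM-DD, a valid W3C Datetime value.
        ET.SubElement(url, "lastmod").text = last_edit.isoformat()
    # xml_declaration requires Python 3.8 or newer.
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    print(build_sitemap([("https://yourdomain.com/", date(2024, 1, 15))]).decode("utf-8"))
```

To have it referenced in robots.txt, the usual form is a single line such as `Sitemap: https://yourdomain.com/sitemap.xml`.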
<lastmod> communicates recency when accurate, which is rarely
<priority> has been effectively deprecated by reality, not by spec
<lastmod> is often set to the deployment date rather than the actual edit date, which makes it a lie in XML format

<!-- Soft errors. Crawlers will not tell you. You will never know. -->
SITEMAP_001 URL limit exceeded (50,000 per file). Split it.
SITEMAP_002 File size over 50MB uncompressed. Who did this.
SITEMAP_003 <loc> contains relative URL. Invalid. Silently ignored.
SITEMAP_004 <lastmod> format incorrect. Should be W3C Datetime. Is "yesterday".
SITEMAP_005 URL listed but returns 301 redirect. Noted. Sighed at.
SITEMAP_006 sitemap.xml itself returns 404. A philosophical condition.
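
Since crawlers will not report any of this, the offline checks can be run locally. The sketch below covers SITEMAP_001 through SITEMAP_004 only; 005 and 006 need real HTTP requests and are left out. The function name `lint_entries` and its input shape are assumptions for illustration, and the regular expression checks only the shape of a W3C Datetime value, not whether the month or day is plausible.

```python
import re
from urllib.parse import urlparse

MAX_URLS = 50_000                 # per-file URL limit
MAX_BYTES = 50 * 1024 * 1024      # 50MB, measured uncompressed

# Shape of a W3C Datetime: YYYY, YYYY-MM, YYYY-MM-DD, or a full
# timestamp with a mandatory time zone designator (Z or +hh:mm).
W3C_DATETIME = re.compile(
    r"^\d{4}(-\d{2}(-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?)?)?$"
)


def lint_entries(entries, size_bytes):
    """Return the error codes triggered by a sitemap.

    `entries` is a list of (loc, lastmod_or_None) pairs and `size_bytes`
    is the uncompressed file size; both are hypothetical inputs that the
    caller extracts from the file however it likes.
    """
    errors = []
    if len(entries) > MAX_URLS:
        errors.append("SITEMAP_001")
    if size_bytes > MAX_BYTES:
        errors.append("SITEMAP_002")
    for loc, lastmod in entries:
        parsed = urlparse(loc)
        # A usable <loc> is absolute: it has both a scheme and a host.
        if not (parsed.scheme in ("http", "https") and parsed.netloc):
            errors.append("SITEMAP_003")
        if lastmod is not None and not W3C_DATETIME.match(lastmod):
            errors.append("SITEMAP_004")
    return errors
```

Called as `lint_entries([("/about", "yesterday")], 1024)`, it flags both the relative URL and the date.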
Does Google use <priority>?
No. It was hope encoded in XML.
Should I include every URL?
Only URLs you want indexed. Not /admin, not /thank-you-for-your-order, not whatever /wp-json/ is doing.
Do I need a sitemap?
If your site is well-linked and small, no. If your site is a large, poorly-linked sprawl of content: yes, and also consider therapy.
Who reads this file?
Crawlers. You, once. A competitor running a scraper. That is the complete list.
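
The "only URLs you want indexed" answer above can be sketched as a filter applied before the file is written. The exclusion list here simply reuses the three paths named in that answer, and `indexable` is a hypothetical helper name. A real site would likely need its own list.

```python
from urllib.parse import urlparse

# Illustrative exclusion list, taken from the examples in the answer above.
EXCLUDED_PREFIXES = ("/admin", "/thank-you-for-your-order", "/wp-json/")


def indexable(urls, excluded_prefixes=EXCLUDED_PREFIXES):
    """Keep only URLs whose path does not start with an excluded prefix.

    Plain prefix matching is crude: "/admin" also excludes a path such
    as "/administration-guide". Tighten the prefixes if that matters.
    """
    return [
        u for u in urls
        if not urlparse(u).path.startswith(excluded_prefixes)
    ]
```

The result can then be handed to whatever writes the sitemap, so excluded pages never appear in it at all.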
| Version | Note |
|---|---|
| 0.9 | Initial release. Protocol open-sourced by Google, adopted by rivals with minimal enthusiasm |
| 0.9 | Still version 0.9. Has not been updated. The spec is, in this way, honest |