---
name: sitemap.xml
type: file / protocol / digital artifact
status: running
version: 0.9
released: 2006-11-16
maintainer: sitemaps.org (originally Google, now a consortium of search engines pretending to agree)
dependencies:
  - XML
  - robots.txt
  - a web server that won't return a 403
  - someone remembering to regenerate this after the redesign
license: Unlicense (the format, not your sins)
tags:
  - SEO
  - XML
  - crawling
  - indexing
  - infrastructure
  - the thing nobody updates
---
A promise, in XML, that you have organized your website. Search engines will politely read it and then do whatever they were going to do anyway.
You enumerate every URL on your site inside <url> tags, optionally lying about how often the page changes (changefreq) and how important it is (priority). A crawler arrives, reads the file, nods slowly, and indexes whatever it finds interesting. The sitemap is advisory. The crawler is sovereign.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page-we-are-proud-of</loc>
    <lastmod>2024-01-15</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>https://example.com/blog/post-from-2011</loc>
    <lastmod>2011-03-04</lastmod>
    <changefreq>never</changefreq>
    <priority>0.1</priority>
  </url>
</urlset>
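A file like the one above is rarely written by hand. As a minimal sketch, assuming Python's standard `xml.etree.ElementTree` and a hypothetical helper named `build_sitemap` (not part of any sitemap tooling), generation might look like this. It emits only `loc` and `lastmod`, and it lets the XML library escape ampersands so you do not have to remember.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build a <urlset> from (loc, lastmod) pairs and return UTF-8 bytes.

    `entries` is a hypothetical input shape chosen for this sketch;
    pass None as lastmod to omit the element for that URL.
    """
    # Serialize the sitemap namespace as the default xmlns, without a prefix.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # ElementTree escapes '&' in text as '&amp;' when serializing.
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        if lastmod:
            ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    print(build_sitemap([
        ("https://example.com/page-we-are-proud-of", "2024-01-15"),
        ("https://example.com/search?q=a&page=2", None),
    ]).decode("utf-8"))
```

The sketch leaves out `changefreq` and `priority` on purpose, since the crawler was going to ignore them anyway.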
The priority field ranges from 0.0 to 1.0 and is ignored by every major search engine. It remains in the spec as a monument to optimism.
lastmod is typically the deploy date, not the actual content modification date, making it decorative. When every entry is stamped at deploy time, all lastmod values end up identical and meaningless.

| Field | Type | Actually respected |
|---|---|---|
| loc | URL string | Yes |
| lastmod | ISO 8601 date | Sometimes |
| changefreq | Hint string | Rarely |
| priority | Float 0.0-1.0 | No |
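One way to keep lastmod from collapsing into the deploy date is to bump it only when the page content actually changes. The following is a sketch under stated assumptions: the function name `refresh_lastmod`, the in-memory `state` dictionary, and the use of a SHA-256 content hash are all illustrative choices, not anything the protocol prescribes.

```python
import hashlib


def refresh_lastmod(state, url, content, today):
    """Return the lastmod for `url`, changing it only if `content` changed.

    `state` maps url -> (content_hash, lastmod) and is updated in place.
    `content` is the rendered page as bytes; `today` is an ISO 8601 date string.
    """
    digest = hashlib.sha256(content).hexdigest()
    previous = state.get(url)
    if previous is not None and previous[0] == digest:
        # Same bytes as last time: keep the old date, however many deploys ago.
        return previous[1]
    state[url] = (digest, today)
    return today


if __name__ == "__main__":
    state = {}
    print(refresh_lastmod(state, "https://example.com/", b"<h1>v1</h1>", "2024-01-15"))
    # A redeploy with unchanged content does not move the date.
    print(refresh_lastmod(state, "https://example.com/", b"<h1>v1</h1>", "2024-02-01"))
```

In practice `state` would need to persist between builds (a JSON file next to the sitemap would do), and pages with per-request noise such as timestamps or CSRF tokens would need normalizing before hashing.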
SITEMAP_404 # The sitemap itself cannot be found. Classic.
MALFORMED_XML # An unescaped ampersand in a URL. Always an ampersand.
OVERSIZED # File exceeds 50MB uncompressed. You have too many pages.
STALE_LASTMOD # All dates are the same. We know.
BLOCKED_BY_ROBOTS # robots.txt disallows crawling the sitemap. Poetic.
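Most of the conditions above can be checked before a crawler finds them for you. Here is a rough sketch using only the Python standard library; `diagnose` is a hypothetical helper, the code names are borrowed from the list above, and SITEMAP_404 is left out because it needs a real HTTP request. The stale-date check (more than one lastmod, all identical) is an assumed heuristic.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_BYTES = 50 * 1024 * 1024  # 50MB uncompressed


def diagnose(sitemap_bytes, sitemap_url, robots_txt="", max_bytes=MAX_BYTES):
    """Return a list of problem codes found in a sitemap (empty if none)."""
    problems = []

    # BLOCKED_BY_ROBOTS: would a generic crawler be allowed to fetch the file?
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    if not robots.can_fetch("*", sitemap_url):
        problems.append("BLOCKED_BY_ROBOTS")

    # OVERSIZED: compare the raw, uncompressed size against the limit.
    if len(sitemap_bytes) > max_bytes:
        problems.append("OVERSIZED")

    # MALFORMED_XML: an unescaped ampersand lands here.
    try:
        root = ET.fromstring(sitemap_bytes)
    except ET.ParseError:
        problems.append("MALFORMED_XML")
        return problems  # nothing further can be inspected

    # STALE_LASTMOD: several entries, one shared date.
    dates = [el.text for el in root.iter(NS + "lastmod")]
    if len(dates) > 1 and len(set(dates)) == 1:
        problems.append("STALE_LASTMOD")

    return problems
```

For example, calling `diagnose(data, "https://example.com/sitemap.xml", robots_txt="User-agent: *\nDisallow: /sitemap.xml")` should report BLOCKED_BY_ROBOTS regardless of how tidy the XML is.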
Does having a sitemap improve my ranking?
It improves your crawl coverage. Rankings are a separate negotiation.
Should I include noindex pages?
No. You are sending conflicting signals to an entity that will resolve them against you.
What is changefreq: always for?
Technically, pages that change every time they are accessed. Practically, never use this.
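On the noindex question above: one way to avoid the conflicting signal is to filter such pages out before the sitemap is written. The sketch below assumes you have each page's HTML at hand and only looks for a `<meta name="robots">` tag containing `noindex`; the names `is_noindex` and `sitemap_candidates` are hypothetical, and a noindex sent via the `X-Robots-Tag` HTTP header is not covered.

```python
from html.parser import HTMLParser


class _RobotsMeta(HTMLParser):
    """Record whether a <meta name="robots"> tag asks for noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = (attr.get("content") or "").lower()
        if name == "robots" and "noindex" in content:
            self.noindex = True


def is_noindex(html):
    """Return True if the HTML source carries a robots noindex meta tag."""
    parser = _RobotsMeta()
    parser.feed(html)
    return parser.noindex


def sitemap_candidates(pages):
    """`pages` maps URL -> HTML source; keep URLs not marked noindex."""
    return [url for url, html in pages.items() if not is_noindex(html)]
```

The returned list can then be handed to whatever writes the `<url>` entries, so the sitemap and the page agree with each other for once.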