---
name: sitemap.xml
type: document
status: running
version: 0.9
released: 2005-11-16
maintainer: sitemaps.org (unofficial coalition of search engines who agreed on something once)
dependencies:
  - XML parser
  - web server
  - robots.txt (optional, frequently ignored)
  - human who remembers to update it
license: Sitemap Protocol 0.9 (public domain, approximately)
tags:
  - seo
  - xml
  - crawling
  - discovery
  - web infrastructure
  - forgotten files
---
A polite letter to search engines listing every URL on your site, because apparently they cannot find things on their own.
The webmaster (or a plugin acting in the webmaster's name) generates an XML file enumerating pages, their last-modified dates, their change frequency, and a priority score that Googlebot will read and then ignore. The file is placed at the root of the domain. A reference is added to robots.txt. Everyone involved feels productive.
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/about</loc>
    <lastmod>2019-03-04</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```
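The robots.txt reference mentioned earlier is a single line, which is how crawlers discover the file without being told. The domain is illustrative:

```text
Sitemap: https://example.com/sitemap.xml
```

The directive takes an absolute URL, so the sitemap does not actually have to live at the root, though it traditionally does.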
The crawler arrives. It reads the file. It crawls what it wants anyway.
Common failure modes, observed in the wild:

- `lastmod` is populated by CMS autogeneration and reflects template saves, not content edits
- `changefreq: always` is used on homepages and is never accurate
- 404 on sitemap.xml - the most common configuration. Nothing breaks. No one notices.
- XML parse error - a plugin added a null character somewhere around URL 847
- 999 URLs submitted - fine
- 50,000 URLs submitted - you have hit the protocol's per-file limit; you need a sitemap index file and probably a different job
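Past the 50,000-URL limit, the protocol's answer is a sitemap index: a sitemap of sitemaps, using the same 0.9 schema. A minimal sketch, with illustrative filenames:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-posts.xml</loc>
    <lastmod>2019-03-04</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-pages.xml</loc>
  </sitemap>
</sitemapindex>
```

Each child sitemap carries its own 50,000-URL budget. An index of indexes is not allowed, which is the protocol's one concession to restraint.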
The sitemap exists in one of three states in the wild:
| State | Description |
|---|---|
| Auto-generated | Correct at deploy time, drifts immediately |
| Hand-maintained | Heroic. Wrong in different ways. |
| Not present | The crawler finds everything anyway. This is fine. |
**Does Google actually use the priority field?** No. It uses PageRank and its own internal signals. The priority field is for your peace of mind.

**Should I submit my sitemap to Bing?** You can. It is a kind gesture. Like sending a postcard to an old address.

**What happens if I have duplicate URLs?** The crawler deduplicates them. You should too. You won't.

**Is sitemap.xml required for SEO?** No. It helps large or new sites get discovered faster. For everyone else it is a ritual, and rituals have their own value.