---
name: sitemap.xml
type: document
status: running
version: 0.9
released: 2005-11-16
maintainer: sitemaps.org (consortium of search engines who agreed on something, once)
dependencies:
  - xml
  - robots.txt
  - a website that actually exists
  - googlebot
license: Sitemaps Protocol 0.9 (public, unencumbered, quietly ignored)
tags:
  - crawling
  - seo
  - xml
  - infrastructure
  - the-boring-stuff-that-works
---
A polite letter to search engines listing every URL you want them to care about, written in XML because it was 2005 and nobody had processed their feelings yet.
A list of `<url>` entries inside a `<urlset>` tag, each with optional metadata: last modified date, change frequency, priority. Serve it at /sitemap.xml or wherever, then tell robots.txt where it lives. The metadata fields are technically optional and practically ceremonial. `<changefreq>always</changefreq>` does not make your homepage refresh faster in anyone's index. It is a wish, not a contract.
- `<loc>`: the URL. The one real field.
- `<lastmod>`: when you last touched this page. Frequently a lie.
- `<changefreq>`: how often the page changes. Options: always, hourly, daily, weekly, monthly, yearly, never. Crawlers treat all of these as advisory.
- `<priority>`: a float from 0.0 to 1.0 expressing your feelings. Default is 0.5. Setting everything to 1.0 is the equivalent of highlighting your entire essay.

> "I set `<priority>1.0</priority>` on all 4,000 pages. My rankings did not change. I did feel briefly powerful." — anonymous webmaster, 2019
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2024-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>1.0</priority> <!-- sure, why not -->
  </url>
</urlset>
```
Note: `encoding="UTF-8"` is required; the protocol accepts nothing else. Forgetting it causes XML parsers to experience existential dread and return errors that blame you specifically.
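Telling robots.txt where the sitemap lives is one line. The `Sitemap:` directive takes an absolute URL and may appear anywhere in the file (example.com is a placeholder):

```text
# robots.txt
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
```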
| Symptom | Likely cause |
|---|---|
| Crawler ignores sitemap | robots.txt doesn't reference it |
| 50,000 URL limit hit | You have a content sprawl problem, not a sitemap problem |
| Pages indexed that were deleted | entropy |
| Pages not indexed despite sitemap | Google has opinions. They are its own. |
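Past 50,000 URLs per file (or 50 MB uncompressed), the protocol's answer is a sitemap index: a sitemap of sitemaps, each child file subject to the same limits. A sketch, with placeholder filenames:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-posts.xml</loc>
    <lastmod>2024-01-01</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-pages.xml</loc>
  </sitemap>
</sitemapindex>
```

This does not fix the content sprawl problem. It files it under a different name.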
Does submitting a sitemap guarantee indexing? No. It guarantees that Googlebot is aware of your intentions. What it does with that information is between Googlebot and the algorithm.
Should I include noindex pages in my sitemap?
No. But many people do. The crawler will be confused. You will be confused. A small war will occur in log files.
Is sitemap.xml still necessary? For large or complex sites: yes. For a five-page portfolio: crawlers will find it by following links; robots.txt never did the discovering. You will make a sitemap anyway. This is fine.
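If you do make one anyway, generating it by hand is beneath you. A minimal sketch using only Python's standard library, assuming your pages arrive as (URL, lastmod) pairs; the URLs and function name are hypothetical:

```python
# Minimal sitemap generator: builds sitemap.xml bytes from (loc, lastmod)
# pairs using the standard library only. Skips changefreq and priority,
# which are ceremonial anyway.
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """pages: iterable of (loc, lastmod) tuples -> UTF-8 XML bytes."""
    # Register the default namespace so output reads xmlns="..."
    # instead of ns0:urlset.
    ET.register_namespace("", NS)
    urlset = ET.Element(f"{{{NS}}}urlset")
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, f"{{{NS}}}url")
        ET.SubElement(url, f"{{{NS}}}loc").text = loc
        ET.SubElement(url, f"{{{NS}}}lastmod").text = lastmod
    # encoding="UTF-8" (not "unicode") returns bytes with the
    # XML declaration the parsers demand.
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)

xml_bytes = build_sitemap([("https://example.com/", "2024-01-01")])
```

Write `xml_bytes` to disk at /sitemap.xml, point robots.txt at it, and resume waiting for Googlebot's opinions.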