Free sitemap.xml analyzer

Your sitemap tells Google which pages exist. When an entry is malformed, Google skips it silently — nothing turns up in Search Console, and you keep waiting on pages that will never get crawled. Paste a sitemap URL or your domain root and see exactly what's broken. No signup.

Direct sitemap URL, or just your domain — we try /sitemap.xml, /sitemap_index.xml, /sitemap-index.xml automatically.

Free, anonymous. Up to 10 analyses per hour. We fetch the sitemap URL you provide — plus, if you tick the box above, up to 10 of its child sitemaps.

What this tool checks

It fetches the URL you enter and reads the raw sitemap XML. It then validates every <url> entry against the sitemap 0.9 protocol:

  • The loc element. Every <url> entry needs a present, non-empty <loc> that is an absolute URL with a valid scheme (http or https) and a host. This is the one that actually stops a page from being queued when it's wrong.
  • Cross-host entries. If a <loc> belongs to a different registrable domain than the sitemap itself, Google ignores it — so we flag it as a warning.
  • The optional hints get checked for format, not truth: <lastmod> must be a valid W3C Datetime value (the ISO 8601 profile the protocol requires, such as 2026-05-27 or 2026-05-27T09:30:00+00:00), <changefreq> one of the seven spec values (always, hourly, daily, weekly, monthly, yearly, never), and <priority> a decimal from 0.0 to 1.0. A malformed value here doesn't break the entry. It just gets dropped, which is arguably worse, because you assume it's working.
  • URLs longer than 2,048 characters can be silently skipped, so those get flagged too.
  • If the file is a sitemapindex — a sitemap of sitemaps — you get the child list with a link to run each one through the analyzer. Gzipped .xml.gz files are detected and decompressed on the way in.
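The per-entry checks listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analyzer's actual code: check_entry, LASTMOD_RE and the message strings are hypothetical names, and the date pattern checks only the shape of a W3C Datetime value, not whether the calendar date exists.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}
MAX_LOC_LENGTH = 2048
# Shape of a W3C Datetime value: YYYY, YYYY-MM, YYYY-MM-DD,
# or a timestamp with a timezone designator. Shape only, no calendar check.
LASTMOD_RE = re.compile(
    r"\d{4}(-\d{2}(-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?)?)?"
)


def check_entry(url_el):
    """Return (errors, warnings) for one <url> element (hypothetical helper)."""
    errors, warnings = [], []

    loc = (url_el.findtext(NS + "loc") or "").strip()
    try:
        parts = urlsplit(loc)
        absolute = parts.scheme in ("http", "https") and bool(parts.hostname)
    except ValueError:  # for example a malformed IPv6 host
        absolute = False
    if not loc:
        errors.append("missing or empty <loc>")
    elif not absolute:
        errors.append("<loc> is not an absolute http(s) URL")
    elif len(loc) > MAX_LOC_LENGTH:
        warnings.append("<loc> is longer than 2,048 characters")

    lastmod = url_el.findtext(NS + "lastmod")
    if lastmod is not None and not LASTMOD_RE.fullmatch(lastmod.strip()):
        warnings.append("<lastmod> is not in W3C Datetime format")

    changefreq = url_el.findtext(NS + "changefreq")
    if changefreq is not None and changefreq.strip() not in CHANGEFREQ:
        warnings.append("<changefreq> is not one of the seven allowed values")

    priority = url_el.findtext(NS + "priority")
    if priority is not None:
        try:
            in_range = 0.0 <= float(priority) <= 1.0
        except ValueError:
            in_range = False
        if not in_range:
            warnings.append("<priority> is not a decimal from 0.0 to 1.0")

    return errors, warnings
```

Only a bad <loc> is treated as an error here. Everything else is a warning, mirroring the split described above between what stops a page from being queued and what merely gets dropped.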

How to use it

  1. Paste a URL. Either the direct sitemap (https://example.com/sitemap.xml) or just the domain (example.com) — for a bare domain we probe the three most common sitemap locations for you.
  2. Glance at the summary tiles for a quick health read: total URLs, valid entries, warnings, errors.
  3. Open the Issues tab. Most people go straight to the URL table because it's long and looks impressive. Skip it. The Issues tab is where the actual problems live — every warning and error grouped with the exact entry that caused it.
  4. Got a sitemapindex? Click into any child sitemap to run it on its own.
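The bare-domain probing from step 1 can be sketched as follows. It is an illustration only: candidate_sitemap_urls is a hypothetical helper, and defaulting to https when no scheme is typed is an assumption, not documented behavior of the tool.

```python
from urllib.parse import urlsplit

# The three common locations named on this page.
COMMON_PATHS = ("/sitemap.xml", "/sitemap_index.xml", "/sitemap-index.xml")


def candidate_sitemap_urls(user_input):
    """Turn what the user typed into the list of URLs worth fetching."""
    text = user_input.strip()
    if "://" not in text:
        text = "https://" + text  # assumption: prefer https for bare domains
    parts = urlsplit(text)
    if parts.path not in ("", "/"):
        # A path was given, so treat the input as a direct sitemap URL.
        return [text]
    origin = f"{parts.scheme}://{parts.netloc}"
    return [origin + path for path in COMMON_PATHS]
```

For example, candidate_sitemap_urls("example.com") yields the three https://example.com/... candidates in the order listed, while a full sitemap URL comes back unchanged as a single-item list.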

Why sitemap health matters for SEO

A sitemap is not magic crawl juice. But a broken sitemap is a real cost. If a URL has a malformed <loc>, Google never queues it. If your <lastmod> dates are inaccurate, Google stops trusting them and crawls less efficiently. Cross-host entries signal a configuration problem and get ignored without any error in Search Console. None of these break your rankings on pages Google already knows about. But they slow discovery of new content, and they make recovery after a site migration harder.

Sitemap vs robots.txt: two different jobs

robots.txt tells crawlers which paths they are allowed to request. A sitemap tells them which URLs exist and are worth crawling. They are complementary. A page in your sitemap that is also blocked in robots.txt sends crawlers a contradictory signal: you are simultaneously saying "here is a URL I want indexed" and "do not fetch it." Google resolves this by respecting the robots.txt block, so the page is not fetched and the sitemap entry accomplishes nothing. Check both with the Robots.txt tester and this tool side by side.
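A rough way to spot that conflict yourself is to run the sitemap URLs through Python's standard-library robots.txt parser. The helper name blocked_sitemap_urls is hypothetical. Note that urllib.robotparser does plain prefix matching and does not implement Google's * and $ wildcard extensions, so treat the result as a first pass rather than a verdict.

```python
from urllib.robotparser import RobotFileParser


def blocked_sitemap_urls(robots_txt, sitemap_urls, user_agent="Googlebot"):
    """Return the sitemap URLs that robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text already in hand, no network call
    return [url for url in sitemap_urls if not parser.can_fetch(user_agent, url)]


robots = "User-agent: *\nDisallow: /private/\n"
urls = ["https://example.com/", "https://example.com/private/report"]
conflicts = blocked_sitemap_urls(robots, urls)
```

Any URL that ends up in conflicts should either leave the sitemap or be unblocked in robots.txt.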

Sitemapindex vs urlset

A plain urlset is a flat list of page URLs. A sitemapindex is a list of other sitemaps, each of which may be a urlset. Large sites use a sitemapindex to stay under the 50,000 URL and 50 MB per-sitemap limits. Google crawls both formats; the difference is only organizational. This tool recognizes both and reports the structure so you know which one you have.
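Telling the two formats apart comes down to the root element. The sketch below is an assumption about how one might do it with the standard library, not the tool's own code; describe_sitemap and the returned dictionary keys are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50 MB, measured on the uncompressed file


def describe_sitemap(xml_bytes):
    """Classify an uncompressed sitemap document by its root element."""
    root = ET.fromstring(xml_bytes)
    if root.tag == NS + "sitemapindex":
        children = [
            (child.findtext(NS + "loc") or "").strip()
            for child in root.findall(NS + "sitemap")
        ]
        return {"kind": "sitemapindex", "children": children}
    if root.tag == NS + "urlset":
        url_count = len(root.findall(NS + "url"))
        over_limit = url_count > MAX_URLS or len(xml_bytes) > MAX_BYTES
        return {"kind": "urlset", "url_count": url_count, "over_limit": over_limit}
    return {"kind": "unknown"}
```

A urlset that reports over_limit is the usual sign that it is time to split the file and publish a sitemapindex instead.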

When lastmod is ignored

Google has publicly stated it ignores lastmod when the dates look unrealistic — for instance, every page showing today's date regardless of whether it was changed. If your CMS auto-sets lastmod on every deploy, you may be training Google to distrust the hint. The spec-compliant behavior is to only set lastmod when the page content changed. This tool validates the format but cannot tell you whether the dates are accurate.
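Accuracy cannot be verified from the file alone, but one symptom is easy to measure yourself: how many entries share the same date. The heuristic below is purely illustrative. lastmod_uniformity is a hypothetical helper, it is not something this tool or Google is known to compute, and a high value is only a prompt to check how your CMS sets the field.

```python
from collections import Counter


def lastmod_uniformity(lastmods):
    """Share of entries (0.0 to 1.0) that carry the single most common date."""
    # Compare calendar dates only, so keep the YYYY-MM-DD prefix.
    dates = [value.strip()[:10] for value in lastmods if value and value.strip()]
    if not dates:
        return 0.0
    _, count = Counter(dates).most_common(1)[0]
    return count / len(dates)
```

A sitemap where nearly every page shares one date, especially the date of the latest deploy, is worth a second look.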

Frequently asked questions

What is a sitemap.xml file?

A sitemap.xml is an XML file that lists the URLs you want search engines to discover and crawl. It is a direct way to tell Google which pages exist on your site, especially new ones or ones that are hard to reach through internal links. It can also carry hints about when each page was last updated (lastmod), how often it changes (changefreq), and relative priority (priority). Treat these as hints at best: Google's documentation says it uses lastmod only when the values are consistently accurate, and that it ignores changefreq and priority. Other crawlers may still read them.

What does this tool check?

It fetches your sitemap, determines whether it is a plain urlset or a sitemapindex, and validates each URL entry against the sitemap 0.9 protocol: presence and format of the loc element, whether the URL is absolute with a valid scheme and host, whether the loc domain matches the sitemap host (cross-host entries are ignored by Google), whether lastmod is a valid ISO 8601 date, whether changefreq is one of the allowed values, and whether priority is a float between 0.0 and 1.0. It also checks URL length and surfaces gzip decompression problems.

What is a sitemapindex?

A sitemapindex is a sitemap that contains links to other sitemaps rather than to individual pages. It is the standard approach for large sites that have more than 50,000 URLs or whose sitemap would exceed the 50 MB uncompressed limit. Google follows the child sitemaps listed in the index and crawls them separately. This tool surfaces the sitemapindex structure and links to each child so you can analyze them individually.

Does Google require a lastmod date?

No, lastmod is optional. But when it is present and accurate, it helps Google prioritize which pages to recrawl. The key word is accurate: if you always set lastmod to today regardless of whether the page changed, Google learns to ignore it. Use lastmod only for pages that were genuinely updated on that date. A value in an invalid date format is typically treated as if the tag were missing.

What is a cross-host URL in a sitemap?

A cross-host URL is a loc entry whose domain differs from the domain the sitemap was fetched from. For example, if your sitemap is at example.com/sitemap.xml but contains a loc pointing to cdn.otherdomain.com/page, that is a cross-host entry. Google's documented behavior is to ignore them, so they add noise without any indexation benefit. This tool flags them as warnings.
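The simplest version of this check compares hostnames, as sketched below. is_cross_host is a hypothetical helper, and exact host comparison is an assumption: it is stricter than a registrable-domain comparison (www.example.com and example.com count as different hosts here), and matching on registrable domain would additionally need the Public Suffix List, which the standard library does not ship.

```python
from urllib.parse import urlsplit


def is_cross_host(sitemap_url, loc):
    """True when loc points at a different host than the sitemap itself."""
    # .hostname is lowercased and excludes any port or credentials.
    return urlsplit(sitemap_url).hostname != urlsplit(loc).hostname
```

With the example above, is_cross_host("https://example.com/sitemap.xml", "https://cdn.otherdomain.com/page") is true, so the entry would be flagged.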

Does this tool check if my URLs are actually indexed?

Not in this free version. Checking Google indexation status for each URL in a sitemap would require one API call per URL, which becomes slow and costly at scale. This tool focuses on structural validation: whether your entries are well-formed and spec-compliant. For individual URL indexation, use the Backlink index checker.

Why might my sitemap fail to load?

The most common reasons: the URL returns a non-200 status (404, 403, 500); the site is behind a bot-protection system that blocks automated fetches; the file is over 10 MB; or the gzip compression is corrupt. This tool reports the HTTP status code and top-level errors to help diagnose each case.
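Most of these failure modes can be told apart from the HTTP status and the raw response bytes. The sketch below is an assumption about one way to do that, not the tool's implementation: load_body is a hypothetical helper, and applying the 10 MB limit to the downloaded bytes rather than the decompressed size is a guess.

```python
import gzip
import zlib

MAX_BYTES = 10 * 1024 * 1024  # assumption: limit applies to the bytes as downloaded
GZIP_MAGIC = b"\x1f\x8b"      # first two bytes of every gzip stream


def load_body(status, body):
    """Return (xml_bytes, error); exactly one of the two is None."""
    if status != 200:
        return None, f"HTTP {status}"
    if len(body) > MAX_BYTES:
        return None, "file is over 10 MB"
    if body[:2] == GZIP_MAGIC:
        try:
            body = gzip.decompress(body)
        except (OSError, EOFError, zlib.error):
            # Bad header, truncated stream, or damaged deflate data.
            return None, "corrupt gzip data"
    return body, None
```

Sniffing the magic bytes instead of trusting the .gz extension also covers servers that send a gzipped file under a plain .xml URL.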

Should I submit my sitemap to Google Search Console?

Yes. Submitting via GSC tells Google where to find it and gives you feedback on crawl errors. You should also declare it in your robots.txt with a Sitemap: line. GSC is the authoritative place to see which URLs Google has discovered and indexed. Our GSC Anchor Text Analyzer helps you work with your GSC data.
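To confirm that the Sitemap: line in robots.txt is actually picked up, you can read it back with Python's standard-library parser (the site_maps() method needs Python 3.8 or later). declared_sitemaps is a hypothetical helper and the robots.txt content is made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def declared_sitemaps(robots_txt):
    """Return the sitemap URLs declared via Sitemap: lines in robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []  # site_maps() returns None when nothing is declared


example_robots = (
    "User-agent: *\n"
    "Allow: /\n"
    "Sitemap: https://example.com/sitemap.xml\n"
)
found = declared_sitemaps(example_robots)
```

An empty list means crawlers that rely on robots.txt for discovery have no pointer to your sitemap.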

Last updated: 2026-05-27