Free robots.txt tester

One wrong line in robots.txt can quietly wall a site off from Google. Enter a domain and a path, and see whether Googlebot, Bingbot, and the main AI crawlers are allowed to crawl it, plus the exact rule that decides. No signup.

Free, anonymous. Up to 10 checks per hour. We fetch the public robots.txt only.

What this tool checks

It fetches the live robots.txt at the root of the domain you enter, parses every user-agent group, and evaluates the path you gave it the way Google does. For each crawler you get a clear allowed or blocked verdict and the single rule that produced it.

  • The verdict per crawler — Googlebot, Bingbot, and the main AI crawlers (GPTBot, ClaudeBot, PerplexityBot, Google-Extended), because they don't always get the same answer.
  • The deciding rule — the exact Allow or Disallow line that won.
  • Every user-agent group, parsed and laid out, including which group each crawler obeys.
  • Declared sitemaps, or a flag if none are declared.
  • Common mistakes: a site-wide Disallow: /, blocked CSS or JS, missing sitemap, AI-crawler blocks.
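To make the parsing step concrete, here is a minimal sketch of splitting a robots.txt file into user-agent groups and sitemap lines. It is an illustration only, not this tool's actual code: the function name parse_robots, the dictionary layout, and the sample file are all assumptions.

```python
def parse_robots(text):
    """Split robots.txt text into user-agent groups and declared sitemaps."""
    groups = []            # each group: {"agents": [...], "rules": [(kind, path), ...]}
    sitemaps = []
    current = None
    last_was_agent = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop comments and whitespace
        if ":" not in line:
            continue                              # blank or malformed line
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            # Consecutive User-agent lines share one group.
            if not last_was_agent:
                current = {"agents": [], "rules": []}
                groups.append(current)
            current["agents"].append(value.lower())
            last_was_agent = True
        elif field in ("allow", "disallow"):
            if current is not None:               # rules before any group are skipped
                current["rules"].append((field, value))
            last_was_agent = False
        elif field == "sitemap":
            sitemaps.append(value)                # sitemaps belong to no group
    return groups, sitemaps


# Hypothetical sample file, for illustration only.
SAMPLE = """\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
User-agent: ClaudeBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
"""

groups, sitemaps = parse_robots(SAMPLE)
```

An empty sitemaps list here corresponds to the "no sitemap declared" flag described above.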

How to use it

  1. Enter the domain. A bare domain is fine (example.com); we look for /robots.txt at its root.
  2. Add a path to test (optional). For example /admin/ or /blog/my-post. Leave it blank to test the homepage. You can also paste a full URL and we'll use its path.
  3. Read the verdicts. Each crawler shows allowed or blocked plus the deciding rule. Check Googlebot first.
  4. Fix the line, not the symptom. Use the parsed groups to find the exact rule, edit your robots.txt, and re-test.
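The input handling in steps 1 and 2 can be sketched roughly as follows. This is a guess at the behavior described, not the tool's real implementation; split_input is a hypothetical name, and defaulting to https for bare domains is an assumption made only so the standard library can parse the input.

```python
from urllib.parse import urlsplit


def split_input(domain_or_url, path=""):
    """Return (host, path_to_test) for a bare domain or a pasted full URL."""
    text = domain_or_url.strip()
    if "://" not in text:
        text = "https://" + text        # assumption: scheme added just for parsing
    parts = urlsplit(text)
    # An explicit path wins; otherwise use the pasted URL's path, else the homepage.
    test_path = path or parts.path or "/"
    if not path and parts.query:
        test_path += "?" + parts.query  # keep the query string, since rules can match it
    return parts.netloc, test_path
```

For example, split_input("example.com") tests the homepage, while split_input("https://example.com/blog/my-post") tests /blog/my-post on the same host.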

Why robots.txt is worth testing

robots.txt is a few lines of text with an outsized blast radius. It is the first file a crawler reads, and a single misplaced Disallow can take a section, or a whole site, out of search. The classic failure is a launch: a staging robots.txt with Disallow: / gets pushed live, and weeks later traffic quietly falls off a cliff. Testing takes seconds; finding out from a ranking drop takes months.

Crawling is not the same as indexing

This trips up almost everyone. Disallow stops a crawler from fetching a URL. It does not stop that URL from being indexed. If other pages link to a blocked URL, Google can still list it in results, usually as a bare link with no description, because it was never allowed to read the page. So robots.txt is the wrong tool for keeping a page out of Google. To do that, let the page be crawled and add a noindex meta tag or X-Robots-Tag header. Block it in robots.txt and Google never sees the noindex.
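The interaction between blocking and noindex can be written out as a small decision function. This is a simplified model for illustration: the function names and outcome strings are made up here, and it only handles plain comma-separated directives, not the form of X-Robots-Tag that names a specific crawler.

```python
def has_noindex(x_robots_tag="", meta_robots=""):
    """True if the header value or the meta robots content lists noindex."""
    directives = []
    for value in (x_robots_tag, meta_robots):
        directives += [d.strip().lower() for d in value.split(",")]
    return "noindex" in directives


def indexing_outcome(crawl_allowed, x_robots_tag="", meta_robots=""):
    """Describe what a crawler can learn about a page, given robots.txt and noindex."""
    if not crawl_allowed:
        # The page is never fetched, so any noindex on it goes unseen.
        return "not crawled; may still be indexed from links"
    if has_noindex(x_robots_tag, meta_robots):
        return "crawled; kept out of the index"
    return "crawled; eligible for indexing"
```

Note that the first branch ignores the noindex arguments entirely, which is the point: a blocked page cannot deliver its own noindex.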

How crawlers actually pick a rule

Two things decide every verdict, and both surprise people. First, a crawler obeys only the single most specific user-agent group that matches it; once it picks that group, the rules in every other group, including the catch-all * group, are ignored for that crawler. Second, within the chosen group the rule with the longest matching path wins, and on an exact-length tie Allow beats Disallow. Wildcards (*) and end-anchors ($) change what counts as a match. That is why a page can look allowed in the * group yet be blocked by a more specific Googlebot group. This tool shows you the group and the rule, so the logic is visible instead of guessed.
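The first of those two steps, choosing the group, might look like the sketch below. It assumes groups are dictionaries with "agents" and "rules" keys (a layout invented for this example) and treats a longer matching user-agent token as more specific. It is a simplification: it does not merge several groups that name the same crawler.

```python
def pick_group(groups, crawler):
    """Return the one group whose user-agent token best matches the crawler."""
    crawler = crawler.lower()
    best, best_len = None, -1
    for group in groups:
        for agent in group["agents"]:
            agent = agent.lower()
            if agent == "*":
                length = 0                  # catch-all: the weakest possible match
            elif crawler.startswith(agent):
                length = len(agent)         # longer token means more specific
            else:
                continue
            if length > best_len:
                best, best_len = group, length
    return best


# Hypothetical parsed groups: everything allowed for *, but Googlebot has its own group.
example_groups = [
    {"agents": ["*"], "rules": [("allow", "/")]},
    {"agents": ["googlebot"], "rules": [("disallow", "/private/")]},
]
googlebot_group = pick_group(example_groups, "Googlebot")
bingbot_group = pick_group(example_groups, "Bingbot")
```

Here Googlebot gets only its own group, so the Allow: / in the * group never applies to it, while Bingbot falls back to the * group.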

AI crawlers are a separate decision

The crawlers reading the web for AI answers, GPTBot, ClaudeBot, PerplexityBot, and Google-Extended, obey robots.txt too, and you can allow or block them independently of Googlebot. (Google-Extended is a control token rather than a separate crawler: Google fetches with its existing user agents and uses the token to decide whether content may feed its AI models.) There is no SEO penalty either way; it is a call about whether you want your content feeding AI answers and brand mentions, or kept out of them. The tool reports where each AI crawler stands so the decision is at least a conscious one.
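As a quick illustration of blocking some AI crawlers while leaving everyone else open, the sketch below feeds a made-up robots.txt to Python's standard urllib.robotparser. One caveat: that module applies the first matching rule in file order rather than the longest-match logic described earlier, so it agrees with Google only on simple files like this one.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: two AI crawlers blocked, everyone else allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask the same question once per crawler; the answers differ by user-agent.
verdicts = {
    bot: parser.can_fetch(bot, "https://example.com/blog/my-post")
    for bot in ("Googlebot", "GPTBot", "ClaudeBot", "PerplexityBot")
}
```

In this example PerplexityBot is still allowed, because it is not named and falls through to the * group.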

Frequently asked questions

What is a robots.txt file?

robots.txt is a plain-text file at the root of a site (for example example.com/robots.txt) that tells crawlers which parts of the site they may or may not request. Its rules are grouped by user-agent, with Allow and Disallow lines under each group. It is a crawling instruction, not a security control: well-behaved bots like Googlebot obey it, but it does not stop anyone from visiting a URL directly.

How do I test if a URL is blocked by robots.txt?

Enter the domain (and optionally the exact path you care about) above. This tool fetches the live robots.txt, parses the user-agent groups, and tells you whether the path is allowed or blocked for Googlebot, Bingbot, and the main AI crawlers, plus the exact rule that decides. It uses Google's matching algorithm: the most specific user-agent group, then the longest matching path rule, with Allow winning ties.

Does robots.txt stop a page from being indexed?

No, and this is the most common robots.txt mistake. Disallow stops Google from crawling a URL, not from indexing it. If other pages link to a blocked URL, Google can still index it, usually showing a bare link with no description. To keep a page out of Google, let it be crawled and add a noindex meta tag or X-Robots-Tag header. If you block it in robots.txt, Google never sees the noindex.

What does Disallow: / mean?

Disallow: / blocks the entire site for the user-agent in that group. It is correct for a staging site you never want crawled, and a disaster if it lands on a production site by accident: every crawler that group covers, Googlebot included if it applies, is told to stop crawling everything. A common cause is a robots.txt copied over from a staging environment at launch. This tool flags it when it affects Googlebot.
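A check for this mistake could be as simple as the sketch below. How this tool actually detects it is not documented here, so treat the function name and the (kind, path) rule format as assumptions. The check is deliberately conservative: any non-empty Allow rule means the site is not fully blocked.

```python
def blocks_everything(rules):
    """True if a group's rules disallow the whole site and allow nothing back."""
    has_root_disallow = any(
        kind == "disallow" and path in ("/", "/*") for kind, path in rules
    )
    has_allow = any(kind == "allow" and path for kind, path in rules)
    return has_root_disallow and not has_allow
```

An empty Disallow: line is the opposite case: it blocks nothing, so the function returns False for it.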

Should I block AI crawlers like GPTBot in robots.txt?

It depends on your goal. Blocking GPTBot, ClaudeBot, PerplexityBot, and Google-Extended keeps your content out of those AI models and AI answers. Some publishers do this to protect content; others leave them open because AI answers are becoming a real source of traffic and brand mentions. There is no SEO penalty either way; it is a business decision. This tool shows you which AI crawlers a site currently allows or blocks.

Where should the robots.txt file be located?

It must be at the root of the host, at /robots.txt (for example https://example.com/robots.txt). A robots.txt in a subfolder is ignored. Each subdomain and each scheme is separate: https://example.com and https://blog.example.com need their own robots.txt, and http and https are treated independently.
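In code, the rule amounts to keeping the scheme and host of a URL and discarding everything else. The helper name robots_url_for is made up for this sketch.

```python
from urllib.parse import urlsplit


def robots_url_for(page_url):
    """Return the robots.txt URL that governs the given page URL."""
    parts = urlsplit(page_url)
    # Only scheme and host (plus port, if any) matter; path and query are dropped.
    return f"{parts.scheme}://{parts.netloc}/robots.txt"
```

So a page at https://blog.example.com/2024/post is governed by https://blog.example.com/robots.txt, not by the file on example.com.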

What is the difference between Allow and Disallow?

Disallow tells a crawler not to request matching URLs; Allow explicitly permits them, mainly to carve an exception out of a broader Disallow. For example Disallow: /admin/ with Allow: /admin/public/ blocks the admin area except that one subfolder. When an Allow and a Disallow both match a URL, the more specific (longer) rule wins, and on an exact-length tie Allow wins.
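The /admin/ example can be traced with a short sketch of the longest-match rule. It assumes rules are (kind, pattern) pairs from a single group and uses plain prefix matching only, so wildcards and end-anchors are left out to keep the tie-break logic visible.

```python
def decide(rules, path):
    """Return (allowed, winning_rule): longest matching pattern wins, Allow wins ties."""
    best = None
    for kind, pattern in rules:
        if not pattern or not path.startswith(pattern):
            continue                          # an empty pattern matches nothing
        longer = best is None or len(pattern) > len(best[1])
        tie_for_allow = (
            best is not None and len(pattern) == len(best[1]) and kind == "allow"
        )
        if longer or tie_for_allow:
            best = (kind, pattern)
    if best is None:
        return True, None                     # no rule matched: allowed by default
    return best[0] == "allow", best


admin_rules = [("disallow", "/admin/"), ("allow", "/admin/public/")]
```

With admin_rules, /admin/users is blocked by Disallow: /admin/, while /admin/public/logo.png is allowed because the Allow pattern is the longer match.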

Why is my page blocked when the robots.txt looks fine?

Usually because of which group applies and which rule is most specific. A crawler obeys only the single most specific user-agent group that matches it, and rules from other groups (including the * group) are then ignored for that crawler. Within the chosen group, the longest matching path pattern wins, and wildcards (*) and end-anchors ($) change what matches. This tool shows you the exact rule and group that decided each verdict, so the surprise becomes obvious.
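Wildcards and end-anchors are often the hidden cause, so here is one way to model them: translate each robots.txt pattern into a regular expression. This is a sketch under the assumption that * matches any run of characters and a trailing $ pins the match to the end of the URL; the helper names are invented.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regular expression."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with "match anything".
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def matches(pattern, path):
    """True if the pattern matches the path, starting from the first character."""
    return pattern_to_regex(pattern).match(path) is not None
```

For instance, /*.pdf$ matches /files/report.pdf but not /files/report.pdf?download=1, because the anchor requires the URL to end right after .pdf.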

Last updated: 2026-05-25