Question 1

What is a robots.txt file?

Accepted Answer

robots.txt is a plain-text file at the root of a site (for example example.com/robots.txt) that tells crawlers which parts of the site they may or may not request. It is grouped by user-agent, with Allow and Disallow rules under each. It is a crawling instruction, not a security control: well-behaved bots like Googlebot obey it, but it does not stop anyone from visiting a URL directly.

Question 2

How do I test if a URL is blocked by robots.txt?

Accepted Answer

Enter the domain (and optionally the exact path you care about) above. This tool fetches the live robots.txt, parses the user-agent groups, and tells you whether the path is allowed or blocked for Googlebot, Bingbot, and the main AI crawlers, along with the exact rule that decides each verdict. It uses Google's matching algorithm: first pick the most specific user-agent group, then apply the longest matching path rule, with Allow winning ties.
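For a quick offline check outside the tool, Python's standard library ships a basic parser. The sketch below is only an approximation: the robots.txt content and URLs are hypothetical, and urllib.robotparser applies the first matching rule in file order rather than Google's longest-match rule, so verdicts can differ on files that mix Allow and Disallow or use wildcards.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example runs without network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the standard-library parser lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/private/report.html"))  # False
print(is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/blog/"))  # True
```

To check a live file instead, the same parser offers set_url() followed by read(), which downloads the robots.txt before can_fetch() is called.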

Question 3

Does robots.txt stop a page from being indexed?

Accepted Answer

No, and this is the most common robots.txt mistake. Disallow stops Google from crawling a URL, not from indexing it. If other pages link to a blocked URL, Google can still index it, usually as a bare URL or title with no description. To keep a page out of Google, let it be crawled and add a noindex robots meta tag or an X-Robots-Tag: noindex response header. If you block the page in robots.txt, Google never fetches it and so never sees the noindex.
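The two noindex signals mentioned above can be checked with a short script. This is a minimal sketch, assuming you already have the response headers and HTML of the page in hand; has_noindex is a hypothetical helper name, and it ignores crawler-specific variants such as a meta tag named googlebot.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the content attribute of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            self.directives.append(attr.get("content", "").lower())


def has_noindex(headers: dict, html: str) -> bool:
    """Return True if the header or the robots meta tag asks for noindex."""
    # Header form, for example "X-Robots-Tag: noindex".
    for name, value in headers.items():
        if name.lower() == "x-robots-tag" and "noindex" in value.lower():
            return True
    # Meta tag form, for example <meta name="robots" content="noindex">.
    parser = RobotsMetaParser()
    parser.feed(html)
    return any("noindex" in directive for directive in parser.directives)


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex({}, page))  # True
```

A crawler can only act on either signal if it is allowed to fetch the page, which is why combining noindex with a robots.txt Disallow defeats the purpose.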

Question 4

What does Disallow: / mean?

Accepted Answer

Disallow: / blocks the entire site for the user-agent in that group. It is correct for a staging site you never want crawled, and a disaster if it lands on a production site by accident, because it tells Google to stop crawling everything. A common cause is a robots.txt copied over from a staging environment at launch. This tool flags it when it affects Googlebot.

Question 5

Should I block AI crawlers like GPTBot in robots.txt?

Accepted Answer

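It depends on your goal. Blocking GPTBot, ClaudeBot, PerplexityBot, and Google-Extended asks those crawlers not to collect your content for AI models and AI answers. It only works for crawlers that honor robots.txt, and it does not remove content that was already collected. Note that Google-Extended is a control token rather than a separate crawler, and blocking it does not affect how Googlebot crawls or how your pages appear in Google Search. Some publishers block AI crawlers to protect content; others leave them open because AI answers are becoming a real source of traffic and brand mentions. There is no SEO penalty either way; it is a business decision. This tool shows you which AI crawlers a site currently allows or blocks.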
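If you do decide to block them, the rules themselves are short. The sketch below generates one robots.txt group for the four tokens named above; build_ai_block is a hypothetical helper, and the token list should be checked against each vendor's current documentation before use.

```python
# Tokens taken from the answer above; vendors may add or rename tokens over time.
AI_CRAWLER_TOKENS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def build_ai_block(tokens):
    """Return one robots.txt group that disallows everything for the given tokens."""
    lines = [f"User-agent: {token}" for token in tokens]
    lines.append("Disallow: /")  # one rule set shared by all user-agent lines above
    return "\n".join(lines) + "\n"


print(build_ai_block(AI_CRAWLER_TOKENS))
```

Several User-agent lines may share one set of rules, so a single group is enough. Other crawlers, including Googlebot, are unaffected as long as they are not listed in the group.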

Question 6

Where should the robots.txt file be located?

Accepted Answer

It must be at the root of the host, at /robots.txt (for example https://example.com/robots.txt). A robots.txt in a subfolder is ignored. Each subdomain and each scheme is separate: https://example.com and https://blog.example.com need their own robots.txt, and http and https are treated independently.
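Because the file is tied to scheme and host, the robots.txt that governs a given page can be derived from the page URL alone. A minimal sketch, with robots_txt_url as a hypothetical helper name:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the scheme and host (and port) of page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, drop query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://blog.example.com/posts/1?ref=home"))
# https://blog.example.com/robots.txt
print(robots_txt_url("http://example.com/shop/item"))
# http://example.com/robots.txt
```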

Question 7

What is the difference between Allow and Disallow?

Accepted Answer

Disallow tells a crawler not to request matching URLs; Allow explicitly permits them, mainly to carve an exception out of a broader Disallow. For example, Disallow: /admin/ combined with Allow: /admin/public/ blocks the admin area except that one subfolder. When an Allow and a Disallow both match a URL, the more specific (longer) rule wins, and on an exact-length tie Allow wins.
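The precedence rule can be sketched in a few lines. This is an illustration under stated assumptions, not the tool's implementation: decide and pattern_to_regex are hypothetical names, rules are passed in as (directive, pattern) pairs from one already-selected group, and specificity is measured as the character length of the pattern.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern: * matches any run, a trailing $ anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def decide(rules, path):
    """Return (allowed, winning_rule); winning_rule is None when nothing matches."""
    best_key, best_rule = None, None
    for directive, pattern in rules:
        if not pattern:  # an empty pattern matches nothing
            continue
        if not pattern_to_regex(pattern).match(path):
            continue
        # Longer pattern wins; on equal length, Allow (True) beats Disallow (False).
        key = (len(pattern), directive.lower() == "allow")
        if best_key is None or key > best_key:
            best_key, best_rule = key, (directive, pattern)
    if best_rule is None:
        return True, None  # no matching rule means the path is allowed
    return best_key[1], best_rule


rules = [("Disallow", "/admin/"), ("Allow", "/admin/public/")]
print(decide(rules, "/admin/settings"))     # (False, ('Disallow', '/admin/'))
print(decide(rules, "/admin/public/help"))  # (True, ('Allow', '/admin/public/'))
```

Returning the winning rule alongside the verdict makes it easy to see why a path was allowed or blocked.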

Question 8

Why is my page blocked when the robots.txt looks fine?

Accepted Answer

Usually because of which group applies and which rule is most specific. A crawler obeys only the single most specific user-agent group that matches it, and rules from other groups (including the * group) are then ignored for that crawler. Within the chosen group, the longest matching path pattern wins, and wildcards (*) and end-anchors ($) change what matches. This tool shows you the exact rule and group that decided each verdict, so the surprise becomes obvious.
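Group selection is the part that surprises people most, so here is a rough sketch of it. It is simplified on purpose: parse_groups and rules_for are hypothetical helpers, the robots.txt content is made up, and crawler names are matched as exact tokens (case-insensitive) with a fallback to the * group, without any prefix matching.

```python
# Hypothetical robots.txt with a generic group and a Googlebot-specific group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /drafts/
"""


def parse_groups(robots_txt):
    """Return a list of (agents, rules); agents are lowercased tokens."""
    groups, agents, rules = [], [], []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if rules:  # a user-agent line after rules starts a new group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif field in ("allow", "disallow") and agents:
            rules.append((field, value))
    if agents:
        groups.append((agents, rules))
    return groups


def rules_for(robots_txt, crawler):
    """Return (matched_token, rules) for the crawler, falling back to the * group."""
    groups = parse_groups(robots_txt)
    for wanted in (crawler.lower(), "*"):
        matching = [rules for agents, rules in groups if wanted in agents]
        if matching:
            # Groups that name the same token are merged.
            return wanted, [rule for rules in matching for rule in rules]
    return None, []


print(rules_for(ROBOTS_TXT, "Googlebot"))  # ('googlebot', [('disallow', '/drafts/')])
print(rules_for(ROBOTS_TXT, "Bingbot"))    # ('*', [('disallow', '/private/')])
```

In this example Googlebot is not bound by the /private/ rule at all, because its own group replaces the * group rather than adding to it.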

Free robots.txt tester
