AI crawler access checker

There's a robots.txt snippet doing the rounds that "blocks all AI." Paste it without reading it and you can quietly delete yourself from the AI answers you actually wanted to be in. Enter your domain and this checks the whole AI-crawler fleet against your live robots.txt — GPTBot, ClaudeBot, PerplexityBot, the search bots, the -Extended tokens. It tells you, in plain language, whether you've opted out of training while staying in answers, or walled off both by accident.

Free and anonymous. Reads your /robots.txt once. Fair-use limit: 10 checks per hour per IP.

"AI crawler" isn't one thing

The reason the "block all AI" snippet backfires is that the bots it lumps together do three different jobs, and most providers run a separately named bot for each. Allow one, block another. That split is everything:

  • Training crawlers scrape your pages to train a model — GPTBot, ClaudeBot, CCBot. This is the one most people want to opt out of, and it's nearly free: blocking them costs you nothing in Google rankings. (One catch — ByteDance's Bytespider is reported to ignore robots.txt, so a rule against it may do nothing.)
  • AI search indexers build the index an assistant cites from — OAI-SearchBot, Claude-SearchBot, PerplexityBot. Block these and you can't be cited; this is your path into AI answers, not out of them.
  • Live retrieval bots fetch a page in real time because a user asked the assistant to read it: ChatGPT-User, Claude-User, Perplexity-User. Two of the three, ChatGPT-User and Perplexity-User, may ignore robots.txt because the request came from a person.
  • Opt-out tokens like Google-Extended and Applebot-Extended aren't crawlers at all — they're switches. Disallowing Google-Extended opts you out of Gemini training and, in Google's own words, does not affect your place in Google Search.

So "should I block AI?" is the wrong question. "Do I want out of training while staying in answers and search?" is the right one, and for most sites the answer is yes. This tool tells you which of those your robots.txt is actually doing right now.
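Sorting a site into one of those answers is a small decision over per-bot verdicts. The Python below is a minimal sketch, not this tool's actual code. It assumes a dict mapping each bot token to True when blocked, treats missing tokens as allowed, and counts the -Extended opt-out tokens with training. The name summarize, the message strings, and the fourth "partially opted out" state are hypothetical additions for illustration.

```python
from typing import Dict

# Token sets follow this page's categories. The -Extended opt-out tokens are
# grouped with training because disallowing them is a training opt-out.
TRAINING = {"GPTBot", "ClaudeBot", "CCBot", "Meta-ExternalAgent", "Bytespider",
            "Google-Extended", "Applebot-Extended"}
SEARCH = {"OAI-SearchBot", "Claude-SearchBot", "PerplexityBot"}


def summarize(blocked: Dict[str, bool]) -> str:
    """One-line verdict from {bot token: True if blocked}; missing means allowed."""
    def is_blocked(token: str) -> bool:
        return blocked.get(token, False)

    # Blocking any search indexer costs citations, so report that first.
    cut_off = sorted(t for t in SEARCH if is_blocked(t))
    if cut_off:
        return "Blocking your own citations: " + ", ".join(cut_off)
    if all(is_blocked(t) for t in TRAINING):
        return "Opted out of training, still citable in AI answers"
    if not any(is_blocked(t) for t in TRAINING):
        return "Open to everything"
    # Hypothetical fourth state this sketch adds for mixed configurations.
    return "Partially opted out of training"
```

Checking search indexers before training means a site that walled off both by accident is told about the citation loss, which is the costlier mistake.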

How to use it

  1. Enter your domain — a bare example.com is fine. We read the robots.txt at its root.
  2. Read the summary first. It tells you in one line whether you're opted out of training, blocking your own citations, or open to everything.
  3. Scan the table. Each bot shows allowed or blocked, the rule that decided it, and whether that bot actually honors robots.txt — so you know which verdicts are reliable.
  4. Fix it if it's wrong. The guide to blocking AI crawlers has copy-paste directives, and the robots.txt tester lets you confirm a specific path before you ship.
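Reading "the robots.txt at its root" means discarding everything except the scheme, host and port, because RFC 9309 scopes a robots.txt file to exactly that combination. A minimal Python sketch of that normalization follows. Defaulting a bare domain to HTTPS is an assumption, not documented behavior of this tool, and robots_url is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def robots_url(user_input: str) -> str:
    """Turn 'example.com' or any URL on a site into that site's robots.txt URL."""
    raw = user_input.strip()
    if "://" not in raw:
        raw = "https://" + raw  # assumption: a bare domain defaults to HTTPS
    parts = urlsplit(raw)  # urlsplit lowercases the scheme
    if not parts.hostname:
        raise ValueError(f"no host in {user_input!r}")
    # .hostname is lowercased and drops any user:pass@ prefix; path,
    # query and fragment are discarded because robots.txt lives at the root.
    port = f":{parts.port}" if parts.port else ""
    return f"{parts.scheme}://{parts.hostname}{port}/robots.txt"


if __name__ == "__main__":
    print(robots_url("example.com"))
```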

A verdict is only as honest as the bot

robots.txt is a polite request, not a firewall. A "blocked" verdict here means a well-behaved crawler will stay out — and most of them are well-behaved. But two things are worth saying out loud, because a tool that hides them is lying to you. The user-initiated fetchers (ChatGPT-User by OpenAI's own statement, Perplexity-User by Perplexity's) may fetch a page anyway when a person asks for it. And bots that disregard robots.txt entirely — the behavior third parties report from Bytespider — won't be stopped by any line in the file.

We flag each bot's behavior in the table so the verdict comes with its own asterisk. If you need to actually enforce a block against a bot that ignores the rules, that's a job for your CDN or WAF, not robots.txt.

Frequently asked questions

Which AI crawlers does this check?

Thirteen — the ones that actually matter as of June 2026: five model-training crawlers, two training opt-out tokens, three AI-search indexers, and three live retrieval fetchers. In full: the training crawlers GPTBot (OpenAI), ClaudeBot (Anthropic), CCBot (Common Crawl), Meta-ExternalAgent (Meta) and Bytespider (ByteDance); the opt-out tokens Google-Extended and Applebot-Extended; the search indexers OAI-SearchBot, Claude-SearchBot and PerplexityBot, the bots that decide whether you can be cited; and the retrieval fetchers ChatGPT-User, Claude-User and Perplexity-User. We check each one against your site root and group the verdicts by what the bot is for.
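The thirteen bots, their jobs and the robots.txt caveats from earlier on this page fit in a small table. This Python sketch encodes them as data. The Job and Compliance labels, the Bot dataclass, and the by_job and describe helpers are hypothetical names for illustration. The compliance flags only restate what this page reports: operators' own statements for ChatGPT-User and Perplexity-User, and third-party reports for Bytespider.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Job(Enum):
    TRAINING = "model training"
    OPT_OUT_TOKEN = "training opt-out token"
    SEARCH = "AI search indexing"
    RETRIEVAL = "live retrieval"


class Compliance(Enum):
    HONORS = "honors robots.txt"
    MAY_IGNORE = "may ignore robots.txt for user-initiated fetches"
    REPORTED_TO_IGNORE = "reported by third parties to ignore robots.txt"


@dataclass(frozen=True)
class Bot:
    token: str
    operator: str
    job: Job
    compliance: Compliance = Compliance.HONORS


BOTS = [
    Bot("GPTBot", "OpenAI", Job.TRAINING),
    Bot("ClaudeBot", "Anthropic", Job.TRAINING),
    Bot("CCBot", "Common Crawl", Job.TRAINING),
    Bot("Meta-ExternalAgent", "Meta", Job.TRAINING),
    Bot("Bytespider", "ByteDance", Job.TRAINING, Compliance.REPORTED_TO_IGNORE),
    Bot("Google-Extended", "Google", Job.OPT_OUT_TOKEN),
    Bot("Applebot-Extended", "Apple", Job.OPT_OUT_TOKEN),
    Bot("OAI-SearchBot", "OpenAI", Job.SEARCH),
    Bot("Claude-SearchBot", "Anthropic", Job.SEARCH),
    Bot("PerplexityBot", "Perplexity", Job.SEARCH),
    Bot("ChatGPT-User", "OpenAI", Job.RETRIEVAL, Compliance.MAY_IGNORE),
    Bot("Claude-User", "Anthropic", Job.RETRIEVAL),
    Bot("Perplexity-User", "Perplexity", Job.RETRIEVAL, Compliance.MAY_IGNORE),
]


def by_job(bots: List[Bot]) -> Dict[Job, List[str]]:
    """Group bot tokens by what the bot is for, in Job definition order."""
    grouped: Dict[Job, List[str]] = {job: [] for job in Job}
    for bot in bots:
        grouped[bot.job].append(bot.token)
    return grouped


def describe(bot: Bot, blocked: bool) -> str:
    """Render one table row; unreliable verdicts get an asterisked caveat."""
    verdict = "blocked" if blocked else "allowed"
    note = "" if bot.compliance is Compliance.HONORS else f" (* {bot.compliance.value})"
    return f"{bot.token} [{bot.job.value}]: {verdict}{note}"
```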

How do you decide whether a bot is blocked?

We fetch your live /robots.txt, parse it into user-agent groups, and run each bot through Google's documented matching rules: the most specific user-agent group wins, the longest matching path pattern inside that group wins, and when an Allow and a Disallow of equal length both match, the Allow wins. Then we evaluate the site root, so a blocked verdict means the bot can't fetch your root URL, which in practice means a site-wide rule such as Disallow: / rather than one that only covers a folder. If there's no robots.txt at all, every crawler is allowed by default; that's how robots.txt works.
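Here is a minimal Python sketch of those matching rules. It is a simplification, not this tool's parser. It treats "most specific" as an exact, case-insensitive match on the bot's token with a fallback to the * group. It merges groups that name the same agent, supports the * and trailing $ wildcards, and ignores every field other than User-agent, Allow and Disallow. The names parse_groups, pattern_to_regex and is_allowed are hypothetical.

```python
import re
from typing import Dict, List, Tuple

Rule = Tuple[bool, str]  # (is_allow, path pattern)


def parse_groups(text: str) -> Dict[str, List[Rule]]:
    """Map each lowercased user-agent token to the combined rules of its groups."""
    groups: Dict[str, List[Rule]] = {}
    agents: List[str] = []
    rules_seen = True  # so the first User-agent line opens a new group
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if rules_seen:  # a User-agent line after rules starts a new group
                agents, rules_seen = [], False
            token = value.lower()
            agents.append(token)
            groups.setdefault(token, [])
        elif field in ("allow", "disallow"):
            rules_seen = True
            for agent in agents:  # rules before any User-agent line are dropped
                groups[agent].append((field == "allow", value))
        # Other fields (Sitemap, Crawl-delay, ...) are ignored in this sketch.
    return groups


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a path pattern with * and a trailing $ into an anchored regex."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(groups: Dict[str, List[Rule]], bot: str, path: str = "/") -> bool:
    """Decide whether `bot` may fetch `path` (the site root by default)."""
    rules = groups.get(bot.lower())
    if rules is None:  # no group names this bot, so fall back to *
        rules = groups.get("*", [])
    best_length, allowed = -1, True  # no matching rule means allowed
    for is_allow, pattern in rules:
        if not pattern:  # an empty Allow or Disallow matches nothing
            continue
        if pattern_to_regex(pattern).match(path):  # matched from the path start
            length = len(pattern)
            # Longest pattern wins; on a tie, Allow beats Disallow.
            if length > best_length or (length == best_length and is_allow):
                best_length, allowed = length, is_allow
    return allowed


if __name__ == "__main__":
    sample = "User-agent: GPTBot\nDisallow: /\n"
    print(is_allowed(parse_groups(sample), "GPTBot"))  # False
```

Because the root is tested with the path "/", a rule such as Disallow: /private/ on its own never produces a blocked verdict in this sketch.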

Does blocking GPTBot hurt my Google ranking?

No. GPTBot is OpenAI's training crawler; it has nothing to do with Googlebot or how Google ranks you. The same goes for Google-Extended — Google states plainly that disallowing it does not change your inclusion in Search and is not a ranking signal. So opting out of AI training carries no classic-SEO penalty. The only thing you can lose is AI visibility, and only if you also block the search and retrieval bots.

If some bots ignore robots.txt, why bother checking?

Because most of them honor it. GPTBot, ClaudeBot, CCBot and the search indexers all respect robots.txt, and Google and Apple honor their -Extended tokens, so a few lines genuinely opt you out of large-scale training. The honest limits: the user-initiated fetchers (ChatGPT-User, Perplexity-User) may ignore it, and bots reported to disregard it entirely, such as Bytespider, won't be stopped by it at all. We flag each bot's behavior in the table so you know which verdicts are reliable and which are best-effort. For the ones that ignore it, you need CDN- or WAF-level blocking, not robots.txt.

How is this different from your robots.txt tester?

The robots.txt tester is general: you give it one URL and one user-agent and it tells you whether that path is crawlable. This tool is specialized for the AI question — you give it a domain and it checks the entire AI-crawler fleet at once, sorts the answers into training, search and retrieval, and tells you in plain language whether you've opted out of training while staying citable. If you want to test an arbitrary bot or a specific path, use the robots.txt tester; if you want the AI picture for your whole site, use this.

Can I block AI training but still get cited in AI search?

Yes, and that's the configuration most sites want. Block the training crawlers (GPTBot, ClaudeBot, CCBot and the others), disallow the Google-Extended and Applebot-Extended opt-out tokens, and leave the search bots (OAI-SearchBot, Claude-SearchBot, PerplexityBot) allowed. You stop feeding the models for free while staying indexable and citable in AI answers. There is no single AI switch; the granularity is the whole point, and this tool shows you which side of it you're on.
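That configuration can be generated rather than hand-typed. The Python below is a sketch under stated assumptions. It puts every training crawler and opt-out token from this page's list into one group with Disallow: /. It gives the three search indexers an explicit Allow: /, which is harmless on its own and matters if your * group disallows broadly, because a bot-specific group overrides *. It leaves the retrieval fetchers unmentioned, so they fall through to whatever your * group says. build_robots_txt and the token-list names are hypothetical.

```python
from typing import Iterable, List

# Tokens taken from this page's list; the grouping is this sketch's choice.
TRAINING_TOKENS = [
    "GPTBot", "ClaudeBot", "CCBot", "Meta-ExternalAgent",
    "Bytespider",  # reported to ignore robots.txt, so this line may do nothing
    "Google-Extended", "Applebot-Extended",  # opt-out tokens, not crawlers
]
SEARCH_TOKENS = ["OAI-SearchBot", "Claude-SearchBot", "PerplexityBot"]


def build_robots_txt(block: Iterable[str], allow: Iterable[str]) -> str:
    """Return robots.txt text: one group disallowing `block`, one allowing `allow`."""
    lines: List[str] = ["# Opt out of model training"]
    lines += [f"User-agent: {token}" for token in block]
    lines += ["Disallow: /", "", "# Stay indexable for AI search citations"]
    lines += [f"User-agent: {token}" for token in allow]
    lines += ["Allow: /"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(TRAINING_TOKENS, SEARCH_TOKENS), end="")
```

Stacking several User-agent lines above a single Disallow is a valid way to give many bots the same rules; it keeps the file short and makes the training/search split visible at a glance.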

Last updated: 2026-06-21