Question 1

Which AI crawlers does this check?

Accepted Answer

Thirteen, the ones that actually matter as of June 2026: five model-training crawlers, two training opt-out tokens, three AI-search indexers, and three live retrieval fetchers. In full: the training crawlers GPTBot (OpenAI), ClaudeBot (Anthropic), CCBot (Common Crawl), Meta-ExternalAgent (Meta) and Bytespider (ByteDance); the opt-out tokens Google-Extended and Applebot-Extended; the search indexers OAI-SearchBot, Claude-SearchBot and PerplexityBot (these are the bots that decide whether you can be cited); and the retrieval fetchers ChatGPT-User, Claude-User and Perplexity-User. We check each one against your site root and group the verdicts by what the bot is for.

Question 2

How do you decide whether a bot is blocked?

Accepted Answer

We fetch your live /robots.txt, parse it into user-agent groups, and run each bot through Google's documented matching algorithm: the most specific user-agent group wins, the longest matching path pattern inside it wins, and an Allow beats a Disallow of equal length. Then we evaluate the site root, so a verdict of blocked means the bot is shut out of the whole site, not just one folder. If there's no robots.txt at all, every crawler is allowed by default; that's how robots.txt works.
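The matching rules described above can be sketched in a few lines of Python. This is an illustrative approximation, not the tool's actual code: the function names (parse_groups, pattern_to_regex, is_allowed) and the SAMPLE file are made up for the example, and group selection is simplified to an exact, case-insensitive match on the bot's token with a fallback to the `*` group.

```python
import re

def parse_groups(text):
    """Parse robots.txt text into {user_agent: [(kind, pattern), ...]}."""
    groups = {}
    agents = []        # user-agents of the group currently being read
    in_rules = False   # True once the current group has seen a rule line
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:                      # a rule line closed the previous group
                agents, in_rules = [], False
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow"):
            in_rules = True
            if value:                         # an empty pattern matches nothing
                for agent in agents:
                    groups[agent].append((field, value))
    return groups

def pattern_to_regex(pattern):
    """Translate a path pattern ('*' wildcard, trailing '$' anchor) to a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(text, user_agent, path="/"):
    """Return True if user_agent may fetch path under the given robots.txt text."""
    groups = parse_groups(text)
    # Simplification: exact token match, otherwise fall back to the '*' group.
    rules = groups.get(user_agent.lower(), groups.get("*", []))
    best = None  # (pattern length, is_allow); tuple order makes Allow win ties
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), kind == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]

SAMPLE = """
User-agent: *
Disallow: /private/

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
Allow: /
"""

for bot in ("GPTBot", "ClaudeBot", "PerplexityBot"):
    print(bot, "allowed at /:", is_allowed(SAMPLE, bot))
```

In this sample, GPTBot is blocked at the root, ClaudeBot is allowed because the Allow and Disallow patterns have equal length, and PerplexityBot falls through to the `*` group, whose only rule does not match the root path.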

Question 3

Does blocking GPTBot hurt my Google ranking?

Accepted Answer

No. GPTBot is OpenAI's training crawler; it has nothing to do with Googlebot or how Google ranks you. The same goes for Google-Extended — Google states plainly that disallowing it does not change your inclusion in Search and is not a ranking signal. So opting out of AI training carries no classic-SEO penalty. The only thing you can lose is AI visibility, and only if you also block the search and retrieval bots.

Question 4

If some bots ignore robots.txt, why bother checking?

Accepted Answer

Because most of them honor it. GPTBot, ClaudeBot, CCBot, the search indexers and the -Extended tokens all respect robots.txt, so a few lines genuinely opt you out of large-scale training. The honest limits: the user-initiated fetchers (ChatGPT-User, Perplexity-User) may ignore it, and bad actors, reportedly including Bytespider, don't respect it at all. We flag each bot's behavior in the table so you know which verdicts are reliable and which are best-effort. For the ones that ignore it, you need CDN- or WAF-level blocking, not robots.txt.

Question 5

How is this different from your robots.txt tester?

Accepted Answer

The robots.txt tester is general: you give it one URL and one user-agent and it tells you whether that path is crawlable. This tool is specialized for the AI question — you give it a domain and it checks the entire AI-crawler fleet at once, sorts the answers into training, search and retrieval, and tells you in plain language whether you've opted out of training while staying citable. If you want to test an arbitrary bot or a specific path, use the robots.txt tester; if you want the AI picture for your whole site, use this.

Question 6

Can I block AI training but still get cited in AI search?

Accepted Answer

Yes, and that's the configuration most sites want. Block the training crawlers and opt-out tokens (GPTBot, ClaudeBot, CCBot, Google-Extended) and leave the search bots (OAI-SearchBot, Claude-SearchBot, PerplexityBot) allowed. You stop feeding the models for free while staying indexable and citable in AI answers. There is no single AI switch: the granularity is the whole point, and this tool shows you which side of it you're on.
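As a rough illustration, the configuration described above could look like the robots.txt embedded in the snippet below, which also checks the root verdicts with Python's standard-library urllib.robotparser. The file contents and the helper name root_verdicts are assumptions made for the example, not output from this tool. Note that the standard-library parser applies the first matching rule rather than Google's longest-match rule; for whole-site rules like these the two should agree.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of training, stay citable in AI search.
ROBOTS_TXT = """\
# Opt out of model training
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
User-agent: Google-Extended
Disallow: /

# Stay indexable for AI search
User-agent: OAI-SearchBot
User-agent: Claude-SearchBot
User-agent: PerplexityBot
Allow: /
"""

def root_verdicts(robots_txt, bots):
    """Return {bot: True if the bot may fetch the site root, else False}."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, "/") for bot in bots}

verdicts = root_verdicts(
    ROBOTS_TXT,
    ["GPTBot", "ClaudeBot", "CCBot", "Google-Extended",
     "OAI-SearchBot", "Claude-SearchBot", "PerplexityBot"],
)
for bot, allowed in verdicts.items():
    print(f"{bot}: {'allowed' if allowed else 'blocked'}")
```

Bots that are not named in any group, such as the retrieval fetchers here, stay allowed by default because the file has no `*` group.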

AI crawler access checker

"AI crawler" isn't one thing

How to use it

A verdict is only as honest as the bot

Frequently asked questions

Related tools and reading