You open robots.txt and realise it hasn't been touched since 2022. Back then AI crawlers weren't a category. Now there are at least ten user-agent names you need an opinion on, and getting one of them wrong means either six weeks of being invisible to ChatGPT or six weeks of being scraped while you think you blocked the door.
The worst version of this story isn't the typo. It's the 3am Sunday when you discover the typo and realise nobody's been reading the file for two months — long enough that the missed citations are someone else's now.
This is the configuration-checklist version of our long-form article on AI bots in robots.txt. Same opinions, different format: a 22-item tiered list you can work through and re-open the next time you audit a site, with your progress saved in your browser. Items are tagged critical (skipping costs you real traffic), important (skipping is a footgun), or nice to have (skipping is fine).
How to use this checklist
Tick items as you go; progress lives in your browser, no account needed. Click "How to do this" inside an item for the exact steps. Filter by tier if you only want to ship the critical fixes first. When you finish, hit "Share progress" to copy a one-line summary you can paste into your team's Slack ("22/22 done, robots.txt audit live").
If you only have 15 minutes, do every critical item in order. The rest can wait for the quarterly review.
Who this is for
Anyone touching robots.txt on a production site. Specifically:
- SEO leads doing a quarterly audit of a client site.
- SaaS founders configuring a new marketing site or migration.
- Publishers deciding which AI crawlers to allow vs block.
- Agencies standardising a robots.txt template across many clients.
If you have under a dozen pages and no plan to be cited by ChatGPT, you can probably skip most of this and ship the default User-agent: * group with a sensible Sitemap: line. For everyone else, the twenty-two items below.
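For the small-site default mentioned above, here is a minimal sketch of what such a file might look like, checked with Python's standard-library urllib.robotparser. The domain example.com, the sitemap URL, and the bot name AnyBot are placeholders chosen for illustration, not values from this checklist.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical default file: one catch-all group that blocks nothing,
# plus a Sitemap line. example.com is a placeholder domain.
DEFAULT_ROBOTS_TXT = """\
User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(DEFAULT_ROBOTS_TXT.splitlines())

# An empty Disallow value leaves every path open to every crawler.
allowed = parser.can_fetch("AnyBot", "https://example.com/pricing")

# site_maps() is available from Python 3.8 onward.
sitemaps = parser.site_maps()

print(allowed, sitemaps)
```

Python's parser is only a stand-in here; real crawlers may interpret edge cases differently, so treat this as a sanity check rather than a guarantee.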
One vocabulary note before we start
A user-agent group is one or more consecutive User-agent: lines plus every Disallow: and Allow: line beneath them, up to the next User-agent: line that starts a new group. Each crawler reads the file, picks the most specific group that names it, and follows only that group. User-agent: * is the fallback for crawlers that don't have their own group, never a base layer that other groups inherit from. Carry that mental model through the rest of the checklist.
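The "fallback, not base layer" rule is easy to check for yourself. The sketch below uses Python's standard-library urllib.robotparser on a made-up file. The paths /private/ and /drafts/, the domain example.com, and the name SomeOtherBot are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: a catch-all group and a GPTBot-specific group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: GPTBot
Disallow: /drafts/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# GPTBot has its own group, so it follows only that group:
# /drafts/ is blocked, but /private/ is NOT inherited from "*".
gptbot_private = parser.can_fetch("GPTBot", "https://example.com/private/page")
gptbot_drafts = parser.can_fetch("GPTBot", "https://example.com/drafts/page")

# A crawler with no group of its own falls back to "*".
other_private = parser.can_fetch("SomeOtherBot", "https://example.com/private/page")

print(gptbot_private, gptbot_drafts, other_private)
```

If you wanted GPTBot kept out of /private/ as well, that rule would have to be repeated inside the GPTBot group.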
What success looks like
Done right, robots.txt is a file you touch four times a year, sleep easy about the rest of the time, and forget exists. A month from now, when somebody in your buyer's Slack asks ChatGPT which tools to use, your name is in the answer — not part of the cohort that got the file wrong. The 22 items below are the cost of buying that quiet.
Three concept checks before you open the file. Skip these and you'll make confident decisions for the wrong reasons.
Training crawlers fetch your pages to fold into the next model. Blocking them costs nothing today; allowing them buys a non-zero chance of being named in future AI answers.
User-triggered fetchers only fire when a specific user pastes your URL into ChatGPT, Claude, or Perplexity. Blocking them breaks a feature your readers might actually use: the user already chose to read your site, and you're just refusing the assistant they're using to read it.
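To make the two categories concrete, here is a sketch of a file that treats them differently, again checked with Python's urllib.robotparser. GPTBot and ChatGPT-User are the names OpenAI documents for its training crawler and its user-triggered agent. The block-one, allow-the-other policy, the /admin/ path, and the example.com URL are assumptions for illustration, not this checklist's recommendation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: keep the training crawler out, let the
# user-triggered fetcher in, and give everyone else a catch-all group.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /

User-agent: *
Disallow: /admin/
"""


def decisions(robots_txt, agents, url):
    """Map each user-agent name to whether it may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = decisions(
    ROBOTS_TXT,
    ["GPTBot", "ChatGPT-User"],
    "https://example.com/blog/post",
)
print(result)
```

Python's parser matches user-agent names by simple substring, which is looser than what real crawlers do, so use it to catch obvious mistakes rather than to predict exact crawler behaviour.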
The syntax mistakes that show up most often in robots.txt code review. Two of these can take a site out of Google.
Six checks between "saved the file" and "pushed to production". Skipping these is how a typo silently breaks a site for six weeks — Google takes 4-8 weeks of recrawl cycles to fully reverse a bad robots.txt.
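One way to automate part of that gap between saving and deploying is a small smoke test that fails when a key URL becomes blocked. The sketch below is an assumption about how such a check could look, not one of the six checks themselves. The helper check_robots, the sample files, and the example.com URLs are all hypothetical.

```python
from urllib.robotparser import RobotFileParser


def check_robots(robots_txt, must_allow, must_block):
    """Return a list of failure messages; an empty list means the file passed."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    failures = []
    for agent, url in must_allow:
        if not parser.can_fetch(agent, url):
            failures.append(f"{agent} is blocked from {url}")
    for agent, url in must_block:
        if parser.can_fetch(agent, url):
            failures.append(f"{agent} can still fetch {url}")
    return failures


# Intended file: only the admin area is off limits.
GOOD = """\
User-agent: *
Disallow: /admin/
"""

# Truncated path: "Disallow: /" blocks the whole site.
BAD = """\
User-agent: *
Disallow: /
"""

must_allow = [("Googlebot", "https://example.com/")]
must_block = [("Googlebot", "https://example.com/admin/login")]

good_failures = check_robots(GOOD, must_allow, must_block)
bad_failures = check_robots(BAD, must_allow, must_block)

print(good_failures)
print(bad_failures)
```

Run against the candidate file in CI, a non-empty failure list can block the deploy before a crawler ever sees the mistake.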