You are 72 hours from launch. The new site has been in staging for three months. The dev team wants to merge and ship. The founder is in the deploy-day Slack channel asking when the marketing campaign can go live. You have one window to catch the launch-day mistakes that take weeks to undo — the staging noindex tag that survived into production, the Disallow: / left behind in robots.txt, the canonical that points to staging.acme.com from every page on the live site, the redirect chain that the migration consultant promised was clean but isn't.
This checklist is for that window. It is not a tour of basic SEO. It assumes you know what noindex does. It focuses on the four 2026 shifts that now belong on every serious pre-launch checklist: (1) the post-leak hostAge cold-start tax that makes new-domain trust signals harder to earn; (2) field Core Web Vitals replacing lab data as the only CWV measurement Google uses for ranking; (3) AI Overviews cutting position-1 organic CTR by 58% on the queries where they appear; (4) per-bot AI-crawler control turning into a real operator decision instead of a binary opt-in/opt-out.
Three numbers anchor what's actually at stake.
1. Only 1.74% of newly-published pages reach a Google top-10 ranking in their first year, down from 5.7% in 2017. 72.9% of the pages currently in Google's top 10 are more than three years old; the average #1-ranked page is five years old, up from two years in 2017. Source: Ahrefs' 2025 analysis of 2M+ pages. The implication for launch: the cold-start tax is real and is meaningfully higher than it was even three years ago. Brand-entity signals, structured author data, and inbound mentions BEFORE launch shorten that lag — not Indexing-API tricks, not aged-domain purchases, not llms.txt files. The leaked Google hostAge attribute, confirmed by Mike King and Rand Fishkin's analysis of the May 2024 Content Warehouse API leak, exists specifically to demote fresh-and-suspicious entities at serving time. Looking established matters; pretending to be established backfires.
2. Only 48% of mobile pages and 56% of desktop pages pass all three Core Web Vitals in field data. On mobile, INP passes on 77% of pages, LCP on 62%, and CLS on 81%. Source: HTTP Archive 2025 Web Almanac, performance chapter, drawn from the open web's actual Chrome User Experience field data. The pass-rate gap is dominated by LCP. Half the public web fails the bar Google has been ranking on for two years. Lab data (Lighthouse, PageSpeed Insights' synthetic run) does not count for ranking. Google's web.dev is explicit: "lab data isn't used for Google search rankings." You launch knowing you have zero field data on day one, and you instrument to start collecting it the moment real users arrive.
3. AI Overviews reduce position-1 organic CTR by 58% when present, per Ahrefs' Feb 2026 update covering 300,000 keywords' GSC data (up from a 34.5% drop the same researchers measured in April 2025). AI Overviews (AIO) appear on roughly 15-25% of queries depending on the month. Sites cited as sources inside an AI Overview see 35% more organic clicks than non-cited competitors. 83% of AIO-triggering searches end without a click. The pre-launch implication didn't exist in 2021: ranking #1 organically is no longer the only finish line on informational queries. AIO citation is a parallel one, and your launch needs to be instrumented for both surfaces from day one.
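To put the third number in launch-planning terms, here is a rough arithmetic sketch. It assumes (this is not from the Ahrefs data) that your position-1 queries trigger AI Overviews at the same 15-25% rate as queries overall and that every query carries equal click weight; the function name is hypothetical.

```python
def blended_ctr_loss(aio_share: float, ctr_drop_when_present: float = 0.58) -> float:
    """Estimate the fraction of position-1 clicks lost across a whole query set.

    aio_share: fraction of your queries that show an AI Overview (0.0 to 1.0).
    ctr_drop_when_present: relative CTR drop on those queries (58% per the text).
    Assumes equal click weight per query, which real query sets rarely have.
    """
    if not 0.0 <= aio_share <= 1.0:
        raise ValueError("aio_share must be between 0 and 1")
    return aio_share * ctr_drop_when_present


# The text gives a 15-25% range for AIO presence, so print both ends.
for share in (0.15, 0.25):
    print(f"AIO on {share:.0%} of queries: about {blended_ctr_loss(share):.1%} of position-1 clicks lost")
```

Informational query sets tend to trigger AI Overviews more often than transactional ones, so treat the result as a floor for content-heavy sites rather than a forecast.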
This checklist has 36 items across 7 categories: foundations & domain hygiene, indexability controls, URL structure + canonicals + redirects, rendering & Core Web Vitals, structured data & entity signals, off-page foundations & launch-day instrumentation, and anti-patterns being sold as pre-launch SEO that don't survive scrutiny in 2026. Tier filter, browser-saved progress, no account required.
Vocabulary
If you've read our GEO readiness checklist and indexation troubleshooting checklist, you have most of these. New terms in italics.
- CWV (Core Web Vitals): Google's three field-measured page-experience metrics: LCP, INP, CLS. Field data (real users, via the Chrome User Experience Report) ranks; lab data (Lighthouse, synthetic) only diagnoses.
- LCP (Largest Contentful Paint): time until the largest above-the-fold element renders. Field target: ≤ 2.5s for "good".
- INP (Interaction to Next Paint): time from a user interaction to the next paint. Replaced FID on 12 March 2024. Field target: ≤ 200ms for "good".
- CLS (Cumulative Layout Shift): sum of unexpected layout shifts during the page's lifecycle. Field target: ≤ 0.1.
- 75th percentile (p75): Google judges CWV at the 75th-percentile sample of your real users — i.e. three out of four pageviews must hit the "good" threshold.
- CrUX (Chrome User Experience Report): Google's public field-data dataset for CWV. A fresh launch has no CrUX data on day one — expect 28+ days of meaningful traffic before your site shows up.
- hostAge: a leaked Google Content Warehouse API attribute that quietly down-ranks brand-new sites until they accumulate real-world trust signals. Not a "sandbox" Google admits to; more accurately a fresh-and-suspicious demotion that backs off as you look established.
- Candidate set: the pool of URLs an AI engine (or Google's own ranking) considers when deciding what to surface for a query. If you're not retrievable, you're not in the candidate set, and tactical GEO tricks lift you by zero.
- hreflang / x-default: link/header annotations that tell Google which language+region variant of a page serves which audience; x-default is the geo-agnostic fallback. 75% of implementations contain errors per Search Engine Land's aggregated data.
- sameAs: schema.org property listing a brand or person's other canonical URLs (LinkedIn, Crunchbase, Wikidata, X, GitHub) so search engines link them as one entity.
- E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness): Google's quality-rater framework. Author bio + credentials + sameAs cluster + structured Person schema are how a page demonstrates it.
- Person / ProfilePage / Organization schema: schema.org types used for E-E-A-T signals.
- OAI-SearchBot / GPTBot / ChatGPT-User: OpenAI's three crawlers, each independently controllable in robots.txt.
- ClaudeBot / Claude-User / Claude-SearchBot: Anthropic's three crawlers as of late 2025, splitting training, live fetches, and search indexing.
- lastmod: sitemap-entry timestamp of the last meaningful content change. Google actively polices honesty here; per Mueller, "faking the lastmod date in your XML sitemaps won't help your SEO."
- Soft-404: a page that returns HTTP 200 but is functionally a not-found page (no content, "page not found" message). Google treats these as 404 anyway and trust drops if your CMS emits them.
- SSR / SSG / hybrid / CSR: SSR (server-side render: your server returns ready HTML per request); SSG (static site generation: pre-rendered HTML at build time); hybrid (SSR/SSG for above-the-fold, JS-hydrated for interactivity); CSR (client-side render: the server returns an empty shell and JS builds the page in the browser). Google can render all four; most AI crawlers do not execute JS, so CSR-only content is effectively invisible to them.
- Hydration: the process where JavaScript "attaches" to server-rendered HTML to make it interactive (event handlers, dynamic state). Critical above-the-fold content should be in the HTML pre-hydration; the JS just adds interactivity.
- Sitemap ping: the old https://google.com/ping?sitemap=... URL operators used to "nudge" Google to recrawl a sitemap. Deprecated June 2023, fully shut down by end of 2023.
- SpamBrain: Google's AI-driven spam detection system, the engine behind link-spam and quality demotions since 2022.
- PBN (private blog network): a ring of low-quality sites secretly owned by one operator to pass link equity to a target. SpamBrain detects most of them and the practice is net-negative under modern Google.
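The per-bot crawler entries in the vocabulary above can be checked against a draft robots.txt before launch. The sketch below uses Python's standard-library `urllib.robotparser`. The policy in `ROBOTS_TXT` (block training crawlers, allow search and user-triggered fetches) and the example.com URL are illustrative assumptions, not recommendations, and the standard-library parser's matching rules may differ in edge cases from how each vendor's crawler reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy for illustration only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: *
Allow: /
"""

AI_BOTS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "ClaudeBot", "Claude-User", "Claude-SearchBot",
]


def bot_access(robots_txt: str, bots: list[str], url: str) -> dict[str, bool]:
    """Return, per user agent, whether robots_txt permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in bots}


if __name__ == "__main__":
    # Print one line per crawler so the policy can be eyeballed before launch.
    for bot, allowed in bot_access(ROBOTS_TXT, AI_BOTS, "https://www.example.com/pricing").items():
        print(f"{bot:18} {'allowed' if allowed else 'blocked'}")
```

Bots without their own group fall through to the `User-agent: *` rules, which is why the table is worth printing: a per-bot decision you never wrote down is still a decision.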
What success looks like
You ship with the staging-leak vectors closed, robots.txt + canonical + sitemap signals internally consistent, schema validated, two analytics surfaces wired, and three baselines captured (CrUX field data, AI-citation manual baseline, branded-search GSC baseline). You do NOT promise the founder a top-10 ranking in week 2. You set the expectation that cold-start is 3-9 months and that the work on this checklist shortens it — not by gaming Google but by giving its entity layer real signals on launch day.
Foundations & domain hygiene
Lock these on launch day. Every other category assumes these are right; getting them wrong takes weeks to undo.
Indexability controls
The highest blast-radius bucket. A single forgotten staging directive that survives into production can de-index the whole site for weeks. Every item here is 'check it twice on launch day'.
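A minimal sketch of a staging-leak check, assuming you have already fetched a page's HTML and the site's robots.txt as strings. It covers three of the vectors named in the introduction: a surviving noindex meta tag, a canonical pointing at another host, and a blanket Disallow. The function and class names and the `www.acme.com` production host are hypothetical. It does not inspect the `X-Robots-Tag` response header, which needs a separate check.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


class _HeadSignals(HTMLParser):
    """Collect robots meta directives and canonical hrefs from a page."""

    def __init__(self) -> None:
        super().__init__()
        self.robots_meta: list[str] = []
        self.canonicals: list[str] = []

    def handle_starttag(self, tag, attrs):
        a = {name: (value or "") for name, value in attrs}
        if tag == "meta" and a.get("name", "").lower() in ("robots", "googlebot"):
            self.robots_meta.append(a.get("content", "").lower())
        elif tag == "link" and "canonical" in a.get("rel", "").lower().split():
            self.canonicals.append(a.get("href", ""))


def staging_leaks(html: str, robots_txt: str, production_host: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    signals = _HeadSignals()
    signals.feed(html)
    if any("noindex" in content for content in signals.robots_meta):
        problems.append("noindex meta tag present")
    for href in signals.canonicals:
        host = urlparse(href).netloc.lower()
        # Relative canonicals have no host and are skipped here.
        if host and host != production_host.lower():
            problems.append(f"canonical points to {host}")
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    if not robots.can_fetch("Googlebot", f"https://{production_host}/"):
        problems.append("robots.txt blocks Googlebot from /")
    return problems


if __name__ == "__main__":
    page = ('<head><meta name="robots" content="noindex">'
            '<link rel="canonical" href="https://staging.acme.com/pricing"></head>')
    for problem in staging_leaks(page, "User-agent: *\nDisallow: /\n", "www.acme.com"):
        print("FAIL:", problem)
```

Run it against the production URLs after the deploy, not against staging before it: the failure mode is a directive that was correct on staging and wrong the moment it shipped.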
URL structure + canonicals + redirects
What determines whether the link equity you earn aggregates to the right URL. Get the signals internally consistent: canonical, sitemap, internal links, and redirects all agreeing.
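One way to verify a "clean" redirect map before trusting it is to walk it offline. This sketch assumes the migration's redirects are available as a simple source-to-target dictionary (for example, exported from the server config); the URLs and function names are hypothetical. It flags any source that needs more than one hop to reach its final URL and raises on loops.

```python
def redirect_chain(start: str, redirects: dict[str, str], max_hops: int = 10) -> list[str]:
    """Follow redirects from start and return every URL visited, in order."""
    chain = [start]
    while chain[-1] in redirects:
        nxt = redirects[chain[-1]]
        if nxt in chain:
            raise ValueError("redirect loop: " + " -> ".join(chain + [nxt]))
        chain.append(nxt)
        if len(chain) - 1 > max_hops:
            raise ValueError(f"more than {max_hops} hops from {start}")
    return chain


def multi_hop_sources(redirects: dict[str, str]) -> dict[str, list[str]]:
    """Return the sources whose redirect takes more than one hop."""
    flagged = {}
    for source in redirects:
        chain = redirect_chain(source, redirects)
        if len(chain) > 2:  # start + final target is one hop; anything longer is a chain
            flagged[source] = chain
    return flagged


# Hypothetical migration map: the first entry chains through the second.
REDIRECTS = {
    "http://acme.com/old-pricing": "https://acme.com/old-pricing",
    "https://acme.com/old-pricing": "https://www.acme.com/pricing",
    "https://acme.com/about-us": "https://www.acme.com/about",
}

if __name__ == "__main__":
    for source, chain in multi_hop_sources(REDIRECTS).items():
        print(f"{len(chain) - 1} hops: " + " -> ".join(chain))
```

The fix for each flagged source is to point it directly at the last URL in its chain. A map that passes offline still deserves a live spot check, since CDN and application-level redirects can stack on top of it.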
Rendering & Core Web Vitals
Half the open web fails the CWV bar. Lab data diagnoses; field data ranks. Get the architecture right at launch: the field data you collect in the first 28 days becomes your CrUX baseline.
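Until CrUX has data for the new site, you can apply the same p75 rule to samples from your own real-user monitoring. The sketch below uses the "good" thresholds from the vocabulary (LCP ≤ 2.5s, INP ≤ 200ms, CLS ≤ 0.1) and a nearest-rank percentile. The percentile method and the sample values are assumptions for illustration; CrUX's own aggregation may differ in detail.

```python
import math

# "Good" thresholds as listed in the vocabulary: LCP in seconds, INP in ms, CLS unitless.
GOOD_THRESHOLDS = {"LCP": 2.5, "INP": 200, "CLS": 0.1}


def p75(samples: list[float]) -> float:
    """75th percentile by the nearest-rank method (an assumed simplification)."""
    if not samples:
        raise ValueError("no samples collected yet")
    ordered = sorted(samples)
    rank = math.ceil(0.75 * len(ordered))
    return ordered[rank - 1]


def passes_cwv(field_samples: dict[str, list[float]]) -> dict[str, bool]:
    """Per metric, True when the p75 of the field samples meets the 'good' threshold."""
    return {
        metric: p75(values) <= GOOD_THRESHOLDS[metric]
        for metric, values in field_samples.items()
    }


if __name__ == "__main__":
    # Made-up samples standing in for real-user measurements.
    samples = {
        "LCP": [1.8, 2.1, 2.4, 3.9],
        "INP": [90, 110, 120, 150, 180, 260, 320, 400],
        "CLS": [0.02, 0.05, 0.08, 0.30],
    }
    print(passes_cwv(samples))
```

Note how a single slow LCP sample out of four does not fail the page, while a slow quarter-plus of INP samples does: the p75 rule tolerates a tail, not a trend.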
Structured data & entity signals
Ship only what Google's rich-results docs list, but ship it correctly. Entity signals (Organization, Person, sameAs) shorten cold-start lag per the leaked hostAge logic.
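As an illustration of the Organization plus sameAs signal, here is a small sketch that builds the JSON-LD script tag from data rather than hand-editing it in a template. The brand name and profile URLs are placeholders, the function name is hypothetical, and the property set is deliberately minimal; validate the output with a structured-data testing tool before shipping.

```python
import json


def organization_jsonld(name: str, url: str, same_as: list[str]) -> str:
    """Return a <script> tag carrying schema.org Organization markup with sameAs links."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "sameAs": list(same_as),
    }
    # Escape "<" so a stray "</script>" inside a value cannot close the tag early.
    payload = json.dumps(data, indent=2).replace("<", "\\u003c")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


if __name__ == "__main__":
    # Placeholder values; replace with the brand's real canonical profiles.
    print(organization_jsonld(
        "Acme",
        "https://www.acme.com/",
        [
            "https://www.linkedin.com/company/acme",
            "https://github.com/acme",
        ],
    ))
```

Generating the tag from one source of truth keeps the `url` value aligned with the production canonical host, which is the same consistency the staging-leak checks are meant to protect.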
Off-page foundations & launch-day instrumentation
Things that aren't on your site but determine how fast you escape cold-start. The pre-launch trust-seeding work that the post-leak operator consensus says shortens hostAge lag.
Anti-patterns
Four things vendor blogs and SEO course-sellers currently sell as pre-launch best practice that don't survive scrutiny against 2025-2026 data. If you've been paying for any of these, you're not behind; you're early enough to redirect budget to the things that actually move the needle.