Technical SEO checklist before site launch: 36 items for a clean 2026 go-live

You are 72 hours from launch. The new site has been in staging for three months. The dev team wants to merge and ship. The founder is in the deploy-day Slack channel asking when the marketing campaign can go live. You have one window to catch the launch-day mistakes that take weeks to undo — the staging noindex tag that survived into production, the Disallow: / left behind in robots.txt, the canonical that points to staging.acme.com from every page on the live site, the redirect chain that the migration consultant promised was clean but isn't.

This checklist is for that window. It is not a tour of basic SEO. It assumes you know what noindex does. It focuses on the four 2026 shifts that now belong on every serious pre-launch checklist: (1) the post-leak hostAge cold-start tax that makes new-domain trust signals harder to earn; (2) field Core Web Vitals, not lab data, being the only performance data Google actually ranks on; (3) AI Overviews eating a 58% chunk of position-1 organic CTR on the queries where they appear; (4) per-bot AI-crawler control turning into a real operator decision instead of a binary opt-in/opt-out.

Three numbers anchor what's actually at stake.

1. Only 1.74% of newly-published pages reach a Google top-10 ranking in their first year, down from 5.7% in 2017. 72.9% of the pages currently in Google's top 10 are more than three years old; the average #1-ranked page is five years old, up from two years in 2017. Source: Ahrefs' 2025 analysis of 2M+ pages. The implication for launch: the cold-start tax is real and is meaningfully higher than it was even three years ago. Brand-entity signals, structured author data, and inbound mentions BEFORE launch shorten that lag — not Indexing-API tricks, not aged-domain purchases, not llms.txt files. The leaked Google hostAge attribute, confirmed by Mike King and Rand Fishkin's analysis of the May 2024 Content Warehouse API leak, exists specifically to demote fresh-and-suspicious entities at serving time. Looking established matters; pretending to be established backfires.

2. Only 48% of mobile pages and 56% of desktop pages pass all three Core Web Vitals in field data. INP at 77% mobile pass, LCP at 62%, CLS at 81%. Source: HTTP Archive 2025 Web Almanac, performance chapter, drawn from the open web's actual Chrome User Experience field data. The pass-rate gap is dominated by LCP. Half the public web fails the bar Google has been ranking on for two years. Lab data — Lighthouse, PageSpeed Insights' synthetic run — does not count for ranking. Google web.dev is explicit: "lab data isn't used for Google search rankings." You launch knowing you have zero field data on day one, and you instrument to start collecting it the moment real users arrive.

3. AI Overviews reduce position-1 organic CTR by 58% when present, per Ahrefs' Feb 2026 update covering 300,000 keywords' GSC data (up from a 34.5% drop the same researchers measured in April 2025). AI Overviews (AIO) appear on roughly 15-25% of queries depending on the month. Sites cited as sources inside an AI Overview see 35% more organic clicks than non-cited competitors. 83% of AIO-triggering searches end without a click. The pre-launch implication didn't exist in 2021: ranking #1 organically is no longer the only finish line on informational queries. AIO citation is a parallel one, and your launch needs to be instrumented for both surfaces from day one.

This checklist has 36 items across 7 categories: foundations & domain hygiene, indexability controls, URL structure + canonicals + redirects, rendering & Core Web Vitals, structured data & entity signals, off-page foundations & launch-day instrumentation, and anti-patterns being sold as pre-launch SEO that don't survive scrutiny in 2026. Tier filter, browser-saved progress, no account required.

Vocabulary

If you've read our GEO readiness checklist and indexation troubleshooting checklist you have most of these. New terms in italics.

CWV (Core Web Vitals): Google's three field-measured page-experience metrics: LCP, INP, CLS. Field data (real users, via the Chrome User Experience Report) ranks; lab data (Lighthouse, synthetic) only diagnoses.
LCP (Largest Contentful Paint): time until the largest above-the-fold element renders. Field target: ≤ 2.5s for "good".
INP (Interaction to Next Paint): time from a user interaction to the next paint. Replaced FID on 12 March 2024. Field target: ≤ 200ms for "good".
CLS (Cumulative Layout Shift): sum of unexpected layout shifts during the page's lifecycle. Field target: ≤ 0.1.
75th percentile (p75): Google judges CWV at the 75th-percentile sample of your real users — i.e. three out of four pageviews must hit the "good" threshold.
CrUX (Chrome User Experience Report): Google's public field-data dataset for CWV. A fresh launch has no CrUX data on day one — expect 28+ days of meaningful traffic before your site shows up.
hostAge: a leaked Google Content Warehouse API attribute that quietly down-ranks brand-new sites until they accumulate real-world trust signals. Not a "sandbox" Google admits to; more accurately a fresh-and-suspicious demotion that backs off as you look established.
Candidate set: the pool of URLs an AI engine (or Google's own ranking) considers when deciding what to surface for a query. If you're not retrievable, you're not in the candidate set, and tactical GEO tricks lift you by zero.
hreflang / x-default: link/header annotations that tell Google which language+region variant of a page serves which audience; x-default is the geo-agnostic fallback. 75% of implementations contain errors per Search Engine Land's aggregated data.
sameAs: schema.org property listing a brand or person's other canonical URLs (LinkedIn, Crunchbase, Wikidata, X, GitHub) so search engines link them as one entity.
E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness): Google's quality-rater framework. Author bio + credentials + sameAs cluster + structured Person schema are how a page demonstrates it.
Person / ProfilePage / Organization schema: schema.org types used for E-E-A-T signals.
OAI-SearchBot / GPTBot / ChatGPT-User: OpenAI's three crawlers, each independently controllable in robots.txt.
ClaudeBot / Claude-User / Claude-SearchBot: Anthropic's three crawlers as of late 2025, splitting training, live fetches, and search indexing.
lastmod: sitemap-entry timestamp of the last meaningful content change. Google actively polices honesty here; per Mueller, "faking the lastmod date in your XML sitemaps won't help your SEO."
Soft-404: a page that returns HTTP 200 but is functionally a not-found page (no content, "page not found" message). Google treats these as 404 anyway and trust drops if your CMS emits them.
SSR / SSG / hybrid / CSR: SSR (server-side render — your server returns ready HTML per request); SSG (static site generation — pre-rendered HTML at build time); hybrid (SSR/SSG for above-the-fold, JS-hydrated for interactivity); CSR (client-side render — the server returns an empty shell and JS builds the page in the browser). Google can render all four; AI crawlers can't execute JS, so CSR is invisible to them.
Hydration: the process where JavaScript "attaches" to server-rendered HTML to make it interactive (event handlers, dynamic state). Critical above-the-fold content should be in the HTML pre-hydration; the JS just adds interactivity.
Sitemap ping: the old https://google.com/ping?sitemap=... URL operators used to "nudge" Google to recrawl a sitemap. Deprecated June 2023, fully shut down by end of 2023.
SpamBrain: Google's AI-driven spam detection system, the engine behind link-spam and quality demotions since 2022.
PBN (private blog network): a ring of low-quality sites secretly owned by one operator to pass link equity to a target. SpamBrain detects most of them and the practice is net-negative under modern Google.
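
The p75 rule and the "good" thresholds defined above can be sketched in a few lines. This is a minimal illustration, assuming a nearest-rank percentile over a list of real-user samples; the exact percentile method CrUX uses is not stated in this checklist, and the function names are hypothetical.

```python
import math

# "Good" thresholds from the vocabulary above: LCP in seconds, INP in ms, CLS unitless.
GOOD_THRESHOLDS = {"LCP": 2.5, "INP": 200, "CLS": 0.1}

def p75(samples):
    """Nearest-rank 75th percentile of a non-empty list of field samples."""
    ordered = sorted(samples)
    rank = math.ceil(0.75 * len(ordered))  # 1-based rank of the p75 sample
    return ordered[rank - 1]

def passes(metric, samples):
    """True when the p75 sample is at or under the 'good' threshold."""
    return p75(samples) <= GOOD_THRESHOLDS[metric]
```

With four LCP samples of 1.0, 2.0, 2.4 and 3.0 seconds, the p75 sample is 2.4s, so the page passes even though one visit in four was slow.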

What success looks like

You ship with the staging-leak vectors closed, robots.txt + canonical + sitemap signals internally consistent, schema validated, two analytics surfaces wired, and three baselines captured (CrUX field data, AI-citation manual baseline, branded-search GSC baseline). You do NOT promise the founder a top-10 ranking in week 2. You set the expectation that cold-start is 3-9 months and that the work on this checklist shortens it — not by gaming Google but by giving its entity layer real signals on launch day.

Lock these on launch day. Every other category assumes these are right; getting them wrong takes weeks to undo.

[Critical] HTTPS enforced site-wide; HTTP redirects to HTTPS via 301
HTTPS has been a confirmed Google ranking signal since 2014 and is the table-stakes baseline in 2026: Chrome marks non-HTTPS pages as 'Not Secure'. Launching with mixed HTTP/HTTPS exposure, a missing 301 from HTTP to HTTPS, or an invalid/self-signed certificate kills trust signals on day one and creates duplicate-URL canonicalisation problems Google will resolve unpredictably.
How to do this
Verify in this order before flipping DNS:
Certificate valid (not self-signed, covers all hostnames including www and apex), auto-renewal configured.
http://example.com/ → 301 → https://example.com/; http://www.example.com/... → 301 → https://www.example.com/... (or your chosen variant).
HSTS header (Strict-Transport-Security) set, ideally with preload after the site has been live for a few days.
Mixed-content audit: no http:// references in HTML/CSS/JS (DevTools → Security tab will flag).
Sanity-check with curl -sI http://yoursite.com — first response line should be HTTP/1.1 301 Moved Permanently.
[Critical] Pick ONE canonical hostname (www vs non-www, trailing slash) and enforce it via 301s
Google treats https://example.com/foo and https://www.example.com/foo as different URLs by default. Without server-side 301 enforcement to a single chosen variant, link equity splits across both, canonical signals disagree, and you end up with two indexed copies of every page. The same is true for trailing-slash policy: pick one and stick with it.
How to do this
Decide and document:
Hostname: www or apex. No SEO advantage either way — pick one.
Trailing slash: /foo/ or /foo. Same — pick one.
Then enforce server-side:
Non-chosen hostname → 301 → chosen hostname (Nginx/Apache/CDN rule).
Non-chosen slash policy → 301 → chosen policy.
Internal links use the chosen variant directly (no internal redirects).
Self-referencing canonical on every page uses the chosen variant.
Sitemap entries use the chosen variant.
Test 4 variants of every key URL: http/https × www/apex. Three of them must 301 to the fourth.
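
The four-variant test can be scripted once you have the status and Location header for each variant (for example from curl -sI). The sketch below is an illustration under assumptions: the function names are hypothetical, and it is deliberately strict in expecting each non-canonical variant to 301 straight to the canonical URL in one hop.

```python
from urllib.parse import urlsplit, urlunsplit

def four_variants(canonical_url):
    """Return the http/https x apex/www variants of a canonical URL."""
    parts = urlsplit(canonical_url)
    apex = parts.netloc[4:] if parts.netloc.startswith("www.") else parts.netloc
    return [
        urlunsplit((scheme, host, parts.path, parts.query, ""))
        for scheme in ("http", "https")
        for host in (apex, "www." + apex)
    ]

def misrouted(canonical_url, observed):
    """observed maps URL -> (status, Location header or None).

    Returns the variants that break the 'three 301 to the fourth' rule.
    """
    bad = []
    for url in four_variants(canonical_url):
        status, location = observed.get(url, (None, None))
        if url == canonical_url:
            ok = status == 200
        else:
            ok = status == 301 and location == canonical_url
        if not ok:
            bad.append(url)
    return bad
```

A variant that answers with a 302, or that 301s to an intermediate URL first, shows up in the returned list.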
[Important] Google Search Console Domain property verified via DNS, not URL-prefix
Domain-property verification (DNS TXT record) covers all subdomains and both HTTP/HTTPS: one property, one source of truth. URL-prefix verification limits you to that exact subdomain+protocol, so you need 4+ properties to cover the same surface, which routinely fragments dashboards and obscures staging or subdomain leaks.

How to do this

In GSC → Add property → Domain (left option, not the URL-prefix on the right). Paste the apex domain (example.com, no protocol, no www). Google generates a TXT record; add it to your DNS. Verification can take a few minutes after DNS propagates.
Once verified, remove any redundant URL-prefix properties you set up earlier (or keep them only if you need per-subdomain dashboards). All your sitemaps and Inspect-URL checks now run against one authoritative property.
Also verify Bing Webmaster Tools the same day — it's free and Bing powers ChatGPT Search's web index.
[Important] GA4 wired and validated in DebugView before real traffic arrives
Launching with broken analytics means the first 1-3 weeks of launch data (usually the highest-traffic period from press, social, and direct) is lost forever. GA4's DebugView shows events in near-real-time while you click around, so you confirm pageview, scroll, click, and form events fire BEFORE anyone real visits.
How to do this
Two-step validation:
Install the GA4 measurement tag (gtag.js or GTM container) on every page, including the homepage and any conversion pages.
Open your browser with the GA Debugger extension (or append ?debug_mode=1 if your GTM is configured for it). Visit the live site. Open GA4 → Admin → Data display → DebugView (Configure → DebugView in the older UI). You should see your events within seconds.
Verify: page_view on every navigation, scroll at 90%, click on outbound links, your defined conversion events on form submission / purchase. If any are missing, fix BEFORE launch — re-attribution of week-1 data is impossible.
Repeat the validation on production with a fresh incognito session right after DNS cuts over.
[Nice to have] DNS TTL shortened to 300-600s during the launch window
If you discover post-launch that DNS is routing to the wrong target (old host, mis-pointed CDN, broken edge config), a TTL of 86400 (24h) means visitors can keep hitting the wrong destination for up to a day. A 300-600s TTL lets you pivot in minutes. Raise it back to a normal TTL (3600-86400) once the launch is stable.

How to do this

72 hours BEFORE launch, lower your A/AAAA/CNAME TTL on the launch hostnames (apex, www, any subdomains) to 300 (5 minutes). The 72-hour lead time lets DNS resolvers worldwide flush their cached older-TTL copies.
Keep the short TTL for the first 48-72 hours after launch. Once you're confident routing is stable and you don't need to pivot, raise it back to 3600 (1 hour) or 86400 (1 day) to reduce DNS-lookup overhead for normal traffic.
Note: this is a launch-window technique. A permanent 300s TTL has no SEO downside, but resolvers re-query more often, which adds a small DNS-lookup cost whenever a cached record expires.
[Critical] CDN cache purged before DNS cutover, and cache-key includes the host
If your CDN still has the staging build cached when DNS flips to point at the production origin, real users hit the stale staging response for the cache lifetime (often hours). And if the cache-key doesn't include the host, prod and staging requests collide in the same cache entry, so Googlebot pulls whichever version was cached most recently. Both failure modes are silent until rankings drop.
How to do this
15 minutes BEFORE DNS cutover:
Purge your CDN cache for the production hostname. Cloudflare: Caching → Configuration → Purge Everything. Fastly: fastly purge --all. Vercel/Netlify: usually automatic on deploy, but verify in the dashboard.
Verify the cache-key includes the host. Cloudflare: Caching → Cache Rules → ensure no rule strips the host from the cache-key. Custom edge workers: check that the request hostname (for example new URL(request.url).host in a Cloudflare Worker, where request.url is a string) participates in the cache-key hash.
After cutover, run curl -I https://yoursite.com/ twice and confirm the second response includes cf-cache-status: HIT (or your CDN's equivalent) AND that the last-modified/etag matches your production build, not staging.
For static asset paths (JS/CSS bundles), include the build hash in the filename so you don't need to purge — old visitors get the old hashed bundle (cached forever), new visitors get the new hash. main.a3f2c8.js, not main.js.
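
Two ideas from this item, a cache-key in which the host always participates and a build hash in asset filenames, can be illustrated as follows. This is a sketch only: real CDNs build cache keys internally, and cache_key and hashed_filename are hypothetical helpers, not any vendor's API.

```python
import hashlib

def cache_key(method, host, path, query=""):
    """Build a cache key in which the host always participates."""
    return "|".join((method.upper(), host.lower(), path, query))

def hashed_filename(name, content, digest_len=6):
    """Insert a short content hash before the extension: main.js -> main.<hash>.js."""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    stem, dot, ext = name.rpartition(".")
    return f"{stem}.{digest}.{ext}" if dot else f"{name}.{digest}"
```

Because the host is part of the key, the same path on staging and production can never share a cache entry; because the filename changes whenever the bytes change, a new build never needs a purge for its bundles.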

The highest blast-radius bucket. A single forgotten staging directive that survives into production can de-index the whole site for weeks. Every item here is 'check it twice on launch day'.

[Critical] robots.txt does NOT contain Disallow: / (the staging leftover)
The single most common launch-day disaster: Disallow: / from the staging robots.txt survives into the production deploy and tells Googlebot not to crawl anything. Google honours it within hours, and organic traffic flatlines for as long as it takes you to notice. This check costs five seconds; missing it costs weeks of lost ranking.
How to do this
Within the first 5 minutes of DNS cutover:
Open https://yoursite.com/robots.txt in an incognito tab.
Confirm there is NO Disallow: / line under User-agent: * or User-agent: Googlebot.
Confirm the file has positive content: at minimum a Sitemap: line and any specific Allow:/Disallow: rules you actually want.
If you discover Disallow: / post-launch: fix the file immediately, then in Google Search Console → robots.txt report → request a re-fetch. Google usually re-checks within an hour.
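
The manual check above can be automated against the fetched file contents. The sketch below uses Python's standard-library urllib.robotparser; note that this parser applies the first matching rule and has no wildcard support, so treat it as an approximation of Googlebot's matching rather than a faithful copy. The function name is hypothetical.

```python
from urllib.robotparser import RobotFileParser

def blocked_agents(robots_txt, agents=("*", "Googlebot")):
    """Return the agents for which the site root is disallowed."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, "/")]
```

An empty list means neither the catch-all group nor a Googlebot-specific group blocks the root; anything else is a launch blocker.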
[Critical] robots.txt does NOT block /assets/, /static/, /css/, /js/
Googlebot's renderer needs CSS and JavaScript to lay out the page and decide whether content is visible to users. Blocking /assets/, /static/, or specific .css/.js paths silently degrades rendering quality across the entire site. Old WordPress and CMS templates routinely ship robots.txt with these blocks left over from 2014 best practice.
How to do this
Run a Googlebot-perspective render in GSC → URL Inspection → Test live URL → View tested page → Screenshot. If the rendered screenshot is missing CSS (looks like a 1995-style unstyled page) or images, robots.txt is blocking the assets Googlebot needs.
Common offenders to remove from robots.txt:
Disallow: /wp-content/ (legacy WP advice — blocks the entire theme + uploads)
Disallow: /assets/
Disallow: /static/
Disallow: /css/ / Disallow: /js/
If you need to block specific spider-trap paths under those folders, use precise rules — never the parent.
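
As a complement to the GSC screenshot, you can list which of a page's own CSS and JS files robots.txt blocks. The sketch below is a rough illustration: it assumes all assets are served from the same host as the robots.txt being checked, uses the standard-library parser (first-match rules, no wildcards), and the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

class AssetCollector(HTMLParser):
    """Collect stylesheet and script URLs referenced by a page."""

    def __init__(self):
        super().__init__()
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        if tag == "script" and attrs.get("src"):
            self.assets.append(attrs["src"])
        elif tag == "link" and "stylesheet" in rel and attrs.get("href"):
            self.assets.append(attrs["href"])

def blocked_assets(html, robots_txt, agent="Googlebot"):
    """Return the page's CSS/JS paths that robots.txt disallows for the agent."""
    collector = AssetCollector()
    collector.feed(html)
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    paths = [urlsplit(asset).path for asset in collector.assets]
    return [path for path in paths if not parser.can_fetch(agent, path)]
```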
[Critical] No site-wide noindex left in templates (header, meta tag, X-Robots-Tag)
Three independent surfaces can carry a noindex directive: the HTML <meta name="robots" content="noindex">, the HTTP X-Robots-Tag header set by CDN/edge/server config, and per-CMS plugin defaults. Any one of them shipped from staging silently de-indexes pages. Per Mueller, Google honours the strictest signal, so the directive only has to leak in one place.
How to do this
Pre-launch audit (run a crawler — Screaming Frog free tier handles up to 500 URLs):
Crawl the staging environment. Note every URL that emits a noindex (meta tag or X-Robots-Tag header).
For each noindex URL: decide whether it SHOULD be noindex in production (cart, account pages, search results, internal admin) or whether the directive was a staging-only artefact.
Strip staging-only noindexes from templates BEFORE deploy.
Post-launch verification (within 1 hour):
Run curl -sI https://yoursite.com/ | grep -i x-robots-tag on the homepage and 5 random indexable URLs. Should return nothing or no noindex value.
View-source the homepage; search for noindex. Should be absent.
GSC → URL Inspection → 5 random indexable URLs → 'Indexing allowed: yes'.
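
The header and view-source checks can be combined into one function that takes the response headers and raw HTML you have already fetched. This is a minimal sketch with hypothetical names; it only looks for the literal noindex token in the X-Robots-Tag header and in robots/googlebot meta tags.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the content of <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if tag == "meta" and name in ("robots", "googlebot"):
            self.directives.append((attrs.get("content") or "").lower())

def noindex_sources(headers, html):
    """Return which surfaces carry noindex: 'header', 'meta', both, or neither."""
    found = []
    if any(k.lower() == "x-robots-tag" and "noindex" in v.lower()
           for k, v in headers.items()):
        found.append("header")
    parser = RobotsMetaParser()
    parser.feed(html)
    if any("noindex" in directive for directive in parser.directives):
        found.append("meta")
    return found
```

Body copy that merely mentions the word noindex is ignored, because only the header and the meta tags are inspected.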
[Critical] Staging environment is password-protected, NOT relying on robots.txt or noindex
Mueller's standing recommendation since 2019, re-confirmed in 2025 staging guidance: 'it's better to use server-side authentication.' robots.txt blocks crawling but not necessarily indexing if external links to staging exist. A noindex on staging carries the opposite risk: a small template bug or a staging-to-prod leak takes the directive into production, and nothing gets indexed. HTTP Basic Auth or a VPN-only requirement is the only defence that can't be bypassed by a stray inbound link.
How to do this
Configure at the staging hostname level, before any application code runs:
Nginx: auth_basic "Staging"; + auth_basic_user_file with htpasswd-generated credentials.
Apache: AuthType Basic in .htaccess for the staging vhost.
Vercel/Netlify: enable Password Protection on the staging deployment.
CDN-fronted: edge-level basic auth (Cloudflare Workers, Fastly VCL).
Bonus: a staging environment behind auth ALSO stops press-release, analytics, and other crawlers from accidentally picking up the wrong URL before launch.
Whatever you do, do NOT rely on Disallow: / in robots.txt OR <meta name="robots" content="noindex"> as the only staging defence — both are advisory and both have failure modes that bite in production.
[Important] AI-crawler robots.txt policy is explicit per bot, not 'allow-all' by default and not blanket-blocked
OpenAI, Anthropic, and Perplexity each run multiple bots with different roles: GPTBot (training) vs OAI-SearchBot (search indexing) vs ChatGPT-User (live fetches on user prompt); ClaudeBot vs Claude-User vs Claude-SearchBot. Late-2025 research found publishers who blanket-blocked AI experienced a ~23.1% total traffic decline without a reliable reduction in citations [weak provenance, treat as directional]. Decide per bot, document why, ship it.
How to do this
For each of the following bots, decide ALLOW or DISALLOW and write the rule. Defaults below are the typical 2026 operator choice for a content/SaaS launch:
User-agent: GPTBot — OpenAI training. Default: Disallow if you don't want your content used to train future models, Allow otherwise.
User-agent: OAI-SearchBot — ChatGPT Search indexing. Default: Allow (this is the path to ChatGPT Search citations).
User-agent: ChatGPT-User — live fetch on user prompt. Default: Allow.
User-agent: ClaudeBot — Anthropic training. Default: Disallow if you opt out of training; Allow otherwise.
User-agent: Claude-User — live fetch. Default: Allow.
User-agent: Claude-SearchBot — search indexing. Default: Allow.
User-agent: PerplexityBot — Perplexity index + live. Default: Allow.
User-agent: Google-Extended — controls inclusion in Gemini training, separate from Googlebot. Default: Disallow for training opt-out; does not affect Google Search ranking.
User-agent: CCBot — Common Crawl, used by many AI vendors. Default: Disallow for training opt-out.
Document your choice in a comment at the top of robots.txt so the next person doesn't undo it. Cross-link to our robots.txt AI-crawler checklist for the full reasoning per bot.
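
One way to keep the per-bot decisions documented and reproducible is to generate robots.txt from a small policy table at build time. The sketch below assumes a training opt-out policy purely for illustration; the POLICY values, the render_robots helper, and the example.com sitemap URL are hypothetical, and the bot names are the ones listed above.

```python
# Hypothetical policy: True = Allow, False = Disallow.
# Training crawlers are opted out; search and live-fetch crawlers are allowed.
POLICY = {
    "GPTBot": False,
    "OAI-SearchBot": True,
    "ChatGPT-User": True,
    "ClaudeBot": False,
    "Claude-User": True,
    "Claude-SearchBot": True,
    "PerplexityBot": True,
    "Google-Extended": False,
    "CCBot": False,
}

def render_robots(policy, sitemap_url, note):
    """Render a robots.txt with one explicit group per bot and a leading comment."""
    lines = [f"# {note}"]
    for agent, allowed in policy.items():
        lines += ["", f"User-agent: {agent}", "Allow: /" if allowed else "Disallow: /"]
    lines += ["", "User-agent: *", "Allow: /", "", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"
```

Keeping the table in version control gives the next person a diff and a commit message explaining why a bot was allowed or blocked.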
[Critical] robots.txt Sitemap directive points at the LIVE host, not staging
After the staging-leftover Disallow check, the second-most-common launch disaster is a robots.txt where the Sitemap: directive still reads https://staging.acme.com/sitemap.xml. Googlebot follows it, hits a password-protected (or 404) staging URL, and gives up on your sitemap entirely. You lose crawl-discovery on every URL not already linked from a strong indexed page.
How to do this
Within the first 5 minutes of DNS cutover:
curl -s https://yoursite.com/robots.txt | grep -i sitemap
Every Sitemap: line MUST start with https://yoursite.com/... — your production hostname.
No staging., no dev., no preview., no IP addresses, no localhost.
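Each sitemap URL itself must return 200: curl -s https://yoursite.com/robots.txt | tr -d '\r' | grep -i '^sitemap:' | awk '{print $2}' | xargs -n 1 curl -sI (this reads the live robots.txt rather than a local copy, and checks every Sitemap: line).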
If you find a staging reference: fix it immediately, then in GSC → Sitemaps → remove the old/broken sitemap entry and resubmit the corrected one.
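
The hostname rule for Sitemap: lines can be checked mechanically on the fetched robots.txt text. This is a small sketch with a hypothetical function name; it flags any Sitemap URL that is not HTTPS on the exact production hostname you pass in.

```python
from urllib.parse import urlsplit

def bad_sitemap_lines(robots_txt, production_host):
    """Return Sitemap: URLs that are not https on the production host."""
    bad = []
    for line in robots_txt.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "sitemap":
            url = value.strip()
            parts = urlsplit(url)
            if parts.scheme != "https" or parts.hostname != production_host:
                bad.append(url)
    return bad
```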
[Critical] XML sitemap contains zero 4xx, zero 5xx, and zero redirected URLs
A sitemap full of stale 301s, 404s, or 5xx URLs actively hurts a fresh site's crawl budget. Googlebot rate-limits crawl on new domains harder than established ones; every dead URL Google fetches from your sitemap is one fewer real URL it gets to that day. The most common cause: shipping a relaunch while the CMS still serves the old sitemap, before the new URLs have been regenerated.
How to do this
Pre-launch audit (Screaming Frog free tier — Mode → List → paste sitemap URL):
Crawl every URL listed in your sitemap.
Filter the result: status MUST be 200 for every entry.
Remove from sitemap: any 301/302 redirected URLs (sitemap should list the FINAL canonical URL, not redirected ones), any 404s, any 5xxs, any URLs blocked by robots.txt, any URLs with noindex.
After fix, in GSC → Sitemaps → resubmit. Google re-processes within hours-to-days.
Ongoing: regenerate the sitemap on deploy, not on a stale cron. If your CMS generates it lazily, set up a build-time hook that calls the sitemap-generate endpoint as part of CI.
[Important] /sitemap.xml resolves, returns 200, is valid XML, and is under 50MB / 50k URLs per file
Google's sitemap protocol caps each sitemap file at 50,000 URLs OR 50MB uncompressed. Exceed either and Google silently truncates (or rejects). Sites that ship a single mega-sitemap on launch day discover this when GSC's Sitemap report shows 'Couldn't fetch' or 'Submitted URLs: 50000' against an actual catalog of 80,000 pages. Fix by splitting into a sitemap-index with multiple child sitemaps.
How to do this
Verification before launch:
curl -sI https://yoursite.com/sitemap.xml: the status line MUST show 200 (HTTP/1.1 200 OK, or HTTP/2 200 when the server speaks HTTP/2), not 404 or 403.
curl -s https://yoursite.com/sitemap.xml | xmllint --noout - — must parse without errors (no syntax errors, proper namespace).
curl -s https://yoursite.com/sitemap.xml | wc -c — under 52,428,800 bytes (50 MB).
If your catalog is large: use a sitemap-index.xml with multiple child sitemaps (e.g. sitemap-products-1.xml, sitemap-products-2.xml) — each child stays under the cap.
Submit the sitemap-index (not the children) in GSC → Sitemaps. Google discovers and crawls the children automatically.
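
Splitting a large catalog into child sitemaps plus an index can be sketched as below. It enforces only the 50,000-URL cap; the 50MB byte cap would need a separate size check on each generated file. The helper names and the example URLs are hypothetical.

```python
from xml.sax.saxutils import escape

MAX_URLS = 50_000  # per-file URL cap stated above
NS = 'xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"'
HEADER = '<?xml version="1.0" encoding="UTF-8"?>'

def chunk(urls, size=MAX_URLS):
    """Split the URL list into groups that each fit in one child sitemap."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]

def child_sitemap(urls):
    """Render one child sitemap; URLs are XML-escaped (& becomes &amp;)."""
    body = "".join(f"<url><loc>{escape(u)}</loc></url>" for u in urls)
    return f"{HEADER}<urlset {NS}>{body}</urlset>"

def sitemap_index(child_urls):
    """Render the index file that points at every child sitemap."""
    body = "".join(f"<sitemap><loc>{escape(u)}</loc></sitemap>" for u in child_urls)
    return f"{HEADER}<sitemapindex {NS}>{body}</sitemapindex>"
```

For the 80,000-page catalog mentioned above, chunk yields two groups (50,000 and 30,000 URLs), and the index lists both children.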

What determines whether the link equity you earn aggregates to the right URL. Get the signals internally consistent — canonical, sitemap, internal links, redirects all agreeing.

[Critical] Self-referencing canonical on every indexable page, in raw HTML (not JS-injected)
Operator default across SurferSEO, Ahrefs, Yoast, and Semrush 2025-2026 guides: every primary page gets a self-referencing canonical because it reinforces that page's authority and prevents URL variations (tracking parameters, session IDs, A/B test parameters) from creating unintended duplicate competition. Google's December 2025 JS SEO docs update added: set canonical in raw HTML, not via JS injection, so the pre-render signal is consistent.
How to do this
Every indexable page should have, inside <head>, exactly one:
<link rel="canonical" href="https://example.com/exact-page-url" />
Rules:
Absolute URL, not relative.
HTTPS, the chosen hostname variant (per the canonical-hostname item in the first category), the chosen trailing-slash policy.
Points to itself (self-referencing), not to a different page, unless the page is a deliberate duplicate of a canonical elsewhere.
Present in the raw HTML returned by the server — verify with curl -s https://yoursite.com/page | grep canonical. If the canonical only appears after JS execution, fix the template.
Crawl your staging site with Screaming Frog → 'Canonicals' tab → confirm every indexable URL has a canonical and it matches the URL itself. Any divergence is a bug.
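
The canonical rules above can be checked against the raw HTML the server returns (the same bytes curl sees, before any JavaScript runs). This is a minimal sketch with hypothetical names; it does not verify that the tag sits inside <head>, and it compares URLs as exact strings.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

class CanonicalParser(HTMLParser):
    """Collect the href of every <link rel="canonical"> in the document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and "canonical" in (attrs.get("rel") or "").lower().split():
            self.canonicals.append(attrs.get("href") or "")

def canonical_problems(page_url, raw_html):
    """Return a list of rule violations; an empty list means the page passes."""
    parser = CanonicalParser()
    parser.feed(raw_html)
    found = parser.canonicals
    if len(found) != 1:
        return [f"expected exactly one canonical, found {len(found)}"]
    href, problems = found[0], []
    parts = urlsplit(href)
    if not parts.scheme or not parts.netloc:
        problems.append("canonical is not an absolute URL")
    elif parts.scheme != "https":
        problems.append("canonical is not HTTPS")
    if href != page_url:
        problems.append("canonical is not self-referencing")
    return problems
```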
[Critical] All canonical, sitemap, internal-link, and hreflang signals agree, with no contradictions
Google resolves canonical conflicts unpredictably and demotes signals it can't trust. If your sitemap lists /foo but the page's canonical points to /bar and your menu links to /foo?utm_source=..., Google has three votes and picks the one its algorithm prefers, which may not be the one you want indexed. Internal consistency is the cheapest ranking signal you can ship.
How to do this
Audit (Screaming Frog → Reports → Canonicals):
Every URL in the XML sitemap MUST have a canonical that points to itself.
Every internal link points to the canonical URL — not a tracking-parameter or A/B-test variant.
If hreflang is configured: every hreflang URL has a canonical that points to itself, not to the default-language version.
Pagination uses self-canonical (Google deprecated rel=prev/next in 2019 — strip leftover tags but don't replace with anything).
Common failure modes to fix:
Sitemap has /foo/, canonical has /foo (trailing-slash mismatch).
Internal links use HTTP, canonical uses HTTPS.
Internal links use uppercase, canonical uses lowercase.
UTM-tagged campaign links inside the same site (use fragment tracking or rewrite to canonical URLs).
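
The failure modes listed above can be named automatically by comparing a sitemap entry or internal link against the page's canonical. This is a rough sketch with a hypothetical function name; it treats any utm_ query parameter as a tracking parameter and does not cover hreflang.

```python
from urllib.parse import urlsplit, parse_qsl

def mismatches(canonical, other):
    """Name the ways `other` (a sitemap entry or internal link) differs from the canonical."""
    c, o = urlsplit(canonical), urlsplit(other)
    found = []
    if c.scheme != o.scheme:
        found.append("scheme")
    if c.netloc.lower() != o.netloc.lower():
        found.append("host")
    if c.path != o.path:
        if c.path.rstrip("/") == o.path.rstrip("/"):
            found.append("trailing-slash")
        elif c.path.lower() == o.path.lower():
            found.append("case")
        else:
            found.append("path")
    if any(key.lower().startswith("utm_") for key, _ in parse_qsl(o.query)):
        found.append("tracking-parameter")
    return found
```

An empty list means the two signals agree; each label in a non-empty list maps to one of the fixes above.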
[Important] All redirects are server-side 301/308; no chains longer than 1 hop on critical URLs; no JS redirects
Google passes link equity through 301/308 with minimal loss, but redirect chains compound that loss and slow first-byte time. Client-side redirects (JavaScript window.location, meta refresh under 1s) work, but Google has to process the page first to discover the redirect, which is slower, less reliable, and breaks entirely for non-rendering AI crawlers. Use them only when no server-side option exists.
How to do this
Pre-launch redirect audit (Screaming Frog → Reports → Redirects & Canonicals):
Crawl the new site PLUS any old URLs you're redirecting from (if this is a relaunch / migration).
Filter for status 301 → identify any chains (A → B → C → D). Compress to A → D directly.
Filter for any meta-refresh or JS redirects. Replace with server-side 301 where possible.
Verify every redirect lands on a 200 (not a 404, not a soft-404, not another redirect).
Internal links should point to FINAL URLs, not redirecting ones. Update the source HTML — don't rely on the redirect doing the work on every request.
For migration redirects from old URLs: 1:1 map every old indexable URL to its closest new equivalent. Generic 'redirect everything to homepage' destroys link equity and Google flags it as a soft-404 pattern.
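
Chain detection and compression (A → B → C → D becoming A → D) can be sketched over a plain source-to-target redirect map, for example one exported from your server config. The function names and the max_hops limit are assumptions for illustration.

```python
def resolve(url, redirects, max_hops=10):
    """Follow `redirects` (source -> target) from url; return (final_url, hops)."""
    hops, seen = 0, {url}
    while url in redirects:
        url = redirects[url]
        hops += 1
        if url in seen or hops > max_hops:
            raise ValueError(f"redirect loop or runaway chain at {url}")
        seen.add(url)
    return url, hops

def chains(redirects):
    """Return the sources that take more than one hop to resolve."""
    return [src for src in redirects if resolve(src, redirects)[1] > 1]

def flatten(redirects):
    """Rewrite every rule to point straight at its final destination."""
    return {src: resolve(src, redirects)[0] for src in redirects}
```

After flattening, every rule is a single hop; you still need to confirm separately that each final destination returns 200.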
[Important] 404 pages return HTTP 404; no soft-404s from CMS misconfiguration
A soft-404 is a page that returns HTTP 200 but is functionally a not-found page (no real content, 'page not found' message, redirect-to-homepage with no real URL match). Google detects these heuristically and treats them as 404 anyway, but trust in your URL→status mapping drops, which can cascade into 'Crawled - currently not indexed' for borderline pages. Most CMSes ship soft-404 generators by default.
How to do this
Verify:
curl -sI https://yoursite.com/this-url-does-not-exist: the status line must show 404 (HTTP/1.1 404 Not Found, or HTTP/2 404 when the server speaks HTTP/2), not 200.
The body of the 404 page should be a real, branded 404 page (helpful: site search, top-pages list, contact link) — not just blank.
Avoid 'redirect 404 to homepage' (very common WordPress pattern). Google flags this as soft-404; users lose URL context.
GSC monitoring: Indexing → Pages → Reasons → 'Not found (404)' and 'Soft 404'. Both are normal at low volume on a launched site; if 'Soft 404' spikes after launch, your error handling is misconfigured.
[Critical] Relaunch / migration: 1:1 redirect map for every old indexable URL verified to land on 200
Skip this and you destroy the link equity you earned on the old site. The default failure mode, 'redirect everything to the homepage', is the single biggest reason migrations tank rankings. Google flags it as a soft-404 pattern and the old URLs lose their accumulated authority. Per Kelly-Anne Crean (Search Engine Land, March 2026): 'Most migration failures are preventable before launch.' This is what she means.
How to do this
Pre-cutover work (one full day, minimum):
In old-site GSC → Performance → Pages → export top 500-1000 URLs by clicks (these are the URLs carrying real equity).
For each old URL: map it to the closest content equivalent on the new site. If there's no equivalent, map to the nearest topical parent category — NEVER to homepage.
Build the redirect map as server-side 301s (Nginx rewrite, Apache RewriteRule, or CDN edge rules).
Run a Screaming Frog crawl of the old-URL list AFTER cutover: every entry MUST return 301 → 200 (not 301 → 404, not 301 → 301 → 200 chain).
Cross-link our indexation troubleshooting checklist for what to monitor in the 30 days after a migration. The first 14 days post-cutover is when most of the visible damage shows up.

Half the open web fails the CWV bar. Lab data diagnoses; field data ranks. Get the architecture right at launch — the field data you collect in the first 28 days becomes your CrUX baseline.

[Critical] Critical above-the-fold content is in initial server-rendered HTML (not pure CSR)
Vercel + MERJ's July 2024 analysis of 100,000+ Googlebot fetches found Google does fully render JavaScript, but Vercel's follow-up confirmed AI crawlers (ChatGPT, Claude, Perplexity) do NOT execute JS. With zero-click at 58.5% of US searches (SparkToro) and AI Overviews cutting position-1 CTR by 58% on AIO-present queries (Ahrefs Feb 2026, 300k keywords), launching with a pure CSR architecture means AI-search visibility starts at zero.
How to do this
Architecture check (do this BEFORE the launch window — it's not a 5-minute fix):
Run curl -s https://yoursite.com/key-page and read the raw HTML (piping to wc -l gives a quick size check, but the line count alone proves nothing). Critical content (H1, primary copy, hero image src, CTA links) should be present in the raw HTML response.
If the raw HTML is mostly an empty <div id="root"></div> shell, you're shipping CSR. AI crawlers will see nothing.
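A rough way to automate the 'is this an empty shell?' check is to strip scripts and tags from the raw response and see what text is left. The sketch below is a heuristic under stated assumptions: inspectRawHtml is a hypothetical helper, the 200-character threshold is arbitrary, and regex-based HTML handling is good enough for a smoke test only.

```javascript
// Heuristic: does the raw (pre-JavaScript) HTML carry real content?
// minTextLength is an arbitrary cut-off chosen for illustration.
function inspectRawHtml(html, minTextLength = 200) {
  // Remove scripts, styles and comments so only markup and copy remain.
  const withoutCode = html
    .replace(/<script\b[\s\S]*?<\/script>/gi, " ")
    .replace(/<style\b[\s\S]*?<\/style>/gi, " ")
    .replace(/<!--[\s\S]*?-->/g, " ");

  // Visible text is whatever is left once the remaining tags are stripped.
  const visibleText = withoutCode
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();

  // An H1 counts only if it contains text in the raw response.
  const h1Match = withoutCode.match(/<h1\b[^>]*>([\s\S]*?)<\/h1>/i);
  const hasH1 =
    h1Match !== null && h1Match[1].replace(/<[^>]+>/g, "").trim().length > 0;

  return {
    hasH1,
    visibleTextLength: visibleText.length,
    // No H1 and almost no text: very likely a client-rendered shell.
    likelyCsrShell: !hasH1 && visibleText.length < minTextLength,
  };
}
```

Feed it the body of the curl response for each key URL; anything flagged likelyCsrShell goes on the engineering ticket.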
Fix paths (in order of effort):
Best: SSR or SSG via Next.js, Nuxt, SvelteKit, Astro, Remix. Critical pages pre-rendered to HTML at build or request time.
Acceptable: hybrid — SSR for above-the-fold, hydrate to SPA for interactivity.
Last resort: dynamic rendering (server-side rendering for known bot user-agents only). Google supports it as a workaround but no longer recommends it — it's tech debt.
Pure CSR is now operator-flagged as a launch handicap for any organic-traffic-seeking site. Exception: closed-loop apps behind auth where SEO doesn't matter.
SEO hand-off note: this is engineering work, not an SEO check. Your job at launch is to FLAG the architecture decision and document the cost of CSR (zero AI-search visibility) — not to rewrite the framework. File the ticket with the specific failing URLs, link this item, and put the eng-lead on it.
Critical
LCP field-data target: ≤ 2.5s on mobile (the CWV bar sites fail most often)
HTTP Archive 2025 Web Almanac: only 62% of mobile pages pass LCP, which makes it the CWV metric that drags the total pass-rate down to 48%. Lab tools (Lighthouse, PageSpeed Insights' synthetic run) help you DIAGNOSE LCP but they don't determine ranking. Google web.dev: 'lab data isn't used for Google search rankings.' Field data via CrUX is what ranks. A fresh launch has no CrUX data, so instrument Real User Monitoring on day one.
How to do this
Diagnose pre-launch (lab data):
PageSpeed Insights on 5 representative URLs. Note the LCP number AND what element is the LCP (usually the hero image or above-the-fold H1).
Optimise the LCP element: preload the hero image with <link rel="preload" as="image" href="/hero.webp">; use fetchpriority="high"; serve a properly-sized variant (don't deliver a 4000px image to a 375px mobile viewport).
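Re-test until lab LCP is comfortably below 2.5s. If real-world networks and devices inflate lab numbers by ~30%, a lab LCP of about 1.9s is what actually leaves headroom for the 2.5s field target (1.9s × 1.3 ≈ 2.5s).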
Instrument field data on launch day:
The web-vitals JS library sends real-user LCP/INP/CLS to your analytics endpoint.
After 28 days of real traffic, your site starts appearing in the public CrUX dataset and GSC's Core Web Vitals report.
Important
INP field-data target: ≤ 200ms (the metric that replaced FID)
INP replaced FID on 12 March 2024. Per HTTP Archive 2025 Web Almanac, 77% of mobile pages pass INP. That is better than LCP, but heavily interactive sites (filters, autocomplete, large menus, complex forms) still fail. FID measured only the input delay of the first interaction; INP measures the full latency (input delay, processing, and the next paint) of interactions across the whole visit and reports close to the worst one. A site that passed FID can fail INP if a single interaction (e.g., opening a 12-item dropdown) takes >200ms to respond.
How to do this
Diagnose:
Open PageSpeed Insights → Diagnostics → look for 'Avoid long main-thread tasks' or 'Total Blocking Time' warnings.
Use Chrome DevTools → Performance → record interactions on your most-interactive surfaces (search, filter, dropdown menus, login). Look for individual long tasks (> 50ms) — those are INP killers.
Common fixes:
Break large synchronous tasks into chunks (e.g., process 100 rows then yield, not all 5000 at once).
Use requestIdleCallback for non-critical work.
Code-split heavy JS bundles — load only what each page needs.
Defer third-party scripts (analytics, chat widgets, marketing pixels) until after first interaction.
Instrument INP in production via the web-vitals JS library — same instrumentation as LCP. Watch the 75th-percentile INP per page type in your RUM dashboard.
SEO hand-off note: the actual code fixes (code-splitting, deferring third-party scripts, breaking long tasks) are engineering work. Your job at launch is to identify the specific interactions failing > 200ms in DevTools and hand them to eng as a spec, not to ship the patch yourself.
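The 'process 100 rows then yield' fix looks roughly like this in practice. A minimal sketch for the eng spec: chunk and processInChunks are hypothetical helper names, the batch size of 100 mirrors the example in the list, and setTimeout(0) is used as the most widely supported way to hand control back to the main thread.

```javascript
// Split an array into fixed-size batches.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Process a large list without blocking input handling: finish one batch,
// then give the event loop a turn before starting the next.
async function processInChunks(items, handleItem, batchSize = 100) {
  for (const batch of chunk(items, batchSize)) {
    for (const item of batch) {
      handleItem(item);
    }
    // Yield so pending clicks and keypresses can run between batches.
    await new Promise((resolve) => setTimeout(resolve, 0));
  }
}
```

Each batch becomes its own short task instead of one long one, which is what keeps any single interaction from waiting behind all 5000 rows.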
Important
CLS field-data target: ≤ 0.1 (reserve space for images, fonts, ads)
CLS measures unexpected layout shifts during the page lifecycle: content jumping because an image loads, an ad slot pushes the article down, or a web font swaps in. 81% of mobile pages pass CLS (Web Almanac 2025). It's the easiest CWV to fix because the causes are mechanical. Almost every CLS failure traces to images without dimensions, fonts loading without size-adjust, or ad/embed slots without reserved height.
How to do this
Mechanical checks:
Every <img> tag has explicit width and height attributes (so the browser reserves the correct space before the image loads). Use the natural pixel dimensions; CSS scaling still works.
Every <iframe> (YouTube, ads, embeds) has explicit width and height.
Web fonts use font-display: swap AND size-adjust / ascent-override / descent-override to match the fallback font's metrics, so the swap doesn't shift text.
Ad slots and any conditionally-rendered above-the-fold content reserve their final height with CSS (min-height) so the load doesn't push other content down.
PageSpeed Insights → Diagnostics → 'Avoid large layout shifts' lists offenders. Fix in order of largest shift contribution.
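The first two mechanical checks are easy to script against a page's HTML. The sketch below is an illustration only: findUnsizedMedia is a hypothetical helper, and the regex approach assumes ordinary markup (an attribute value containing '>' would confuse it), so treat hits as leads to confirm in DevTools.

```javascript
// Return every <img> or <iframe> tag that lacks a width or a height
// attribute, i.e. the elements most likely to cause layout shift.
function findUnsizedMedia(html) {
  const offenders = [];
  const tagPattern = /<(img|iframe)\b[^>]*>/gi;
  for (const match of html.matchAll(tagPattern)) {
    const tag = match[0];
    // Require whitespace before the name so data-width etc. do not count.
    const hasWidth = /\swidth\s*=/i.test(tag);
    const hasHeight = /\sheight\s*=/i.test(tag);
    if (!hasWidth || !hasHeight) {
      offenders.push(tag);
    }
  }
  return offenders;
}
```

Run it over the templates for each page type and attach the offending tags to the ticket.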
Important
Day-one Real User Monitoring instrumentation collecting field CWV data
A fresh site has no CrUX data on launch day: Google's field dataset needs ~28 days of meaningful traffic before your site appears. The web-vitals JS library captures LCP, INP, and CLS from every real user from your first visitor onward, lets you see issues weeks before CrUX surfaces them, and gives you 75th-percentile metrics sliced however you need (by page type, by country) rather than the fixed per-URL and per-origin aggregates that GSC and PageSpeed Insights surface from CrUX.
How to do this
Install on day one — takes 15 minutes:
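npm install web-vitals (or load the classic-script build from a CDN: <script src="https://unpkg.com/web-vitals@4/dist/web-vitals.iife.js"></script>, which exposes a global webVitals object; the IIFE build is not an ES module, so don't add type="module").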
In your main JS bundle:
import {onCLS, onINP, onLCP} from 'web-vitals';

// Send each metric to the RUM endpoint; sendBeacon still fires on page unload.
function send(metric) {
  navigator.sendBeacon('/api/rum', JSON.stringify(metric));
}

onCLS(send);
onINP(send);
onLCP(send);
Stand up a tiny endpoint that accepts the beacon and writes to your analytics warehouse / Postgres / GA4 (custom event).
Dashboard: 75th-percentile LCP/INP/CLS by page type and by country. Weekly trend. Alert on regression > 20%.
This is your weeks-earlier warning system. CrUX shows you the public number; RUM shows you the cause.
SEO hand-off note: standing up the beacon endpoint and wiring it to your analytics store is an engineering ticket. Your job at launch is to write the spec (which metrics, what shape, where to send them, who alerts on regression) and confirm the instrument went live on day one — not to write the backend yourself.
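For the dashboard spec, it helps to pin down exactly what '75th percentile by page type' and 'regression > 20%' mean. The sketch below is one reasonable reading, not the only one: p75 uses the nearest-rank method, pageType is a hypothetical field you would attach to each beacon yourself (the web-vitals metric object does not include it), and hasRegressed compares two weekly p75 values.

```javascript
// 75th percentile by the nearest-rank method, computed on a sorted copy.
function p75(values) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil(0.75 * sorted.length); // 1-based rank
  return sorted[rank - 1];
}

// Group beacons by page type and metric name, then take p75 of each group.
// Each beacon is assumed to look like { pageType, name, value }.
function p75ByPageType(beacons) {
  const groups = new Map();
  for (const { pageType, name, value } of beacons) {
    const key = `${pageType}:${name}`;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  const result = {};
  for (const [key, values] of groups) {
    result[key] = p75(values);
  }
  return result;
}

// True when the current p75 is more than `threshold` worse than the previous
// one (higher is worse for LCP, INP and CLS alike).
function hasRegressed(previous, current, threshold = 0.2) {
  return previous > 0 && (current - previous) / previous > threshold;
}
```

With this definition, a product-page LCP p75 moving from 2000ms to 2500ms is a 25% regression and should page someone; 2000ms to 2300ms should not.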

Ship only what Google's rich-results docs list — but ship it correctly. Entity signals (Organization, Person, sameAs) shorten cold-start lag per the leaked hostAge logic.

Important
Organization schema on homepage with logo, url, and sameAs cluster
Organization schema with a populated sameAs array (LinkedIn Company Page, Crunchbase, Wikidata where applicable, your X profile, GitHub for dev-tooling brands) is the entity-establishment lever for SMBs. It tells Google's Knowledge Graph that all these online identities are the same entity, which feeds the corroboration signals that let the leaked hostAge demotion back off as the entity accumulates history. Cheaper than Wikipedia, almost as effective.
How to do this
JSON-LD in the homepage <head>:
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Acme Inc.",
  "url": "https://acme.com/",
  "logo": "https://acme.com/logo.png",
  "sameAs": [
    "https://www.linkedin.com/company/acme/",
    "https://www.crunchbase.com/organization/acme",
    "https://x.com/acmehq",
    "https://github.com/acme"
  ]
}
Rules:
Every URL in sameAs must be a profile you OWN and that links back to your homepage. One-sided sameAs without reciprocal mention is weak signal.
Don't list profiles you haven't actually created. Validate every link returns 200 and resolves to a profile page for the same entity.
JSON-LD (not Microdata, not RDFa) per Google's official recommendation.
Validate in Google's Rich Results Test before launch — paste the live URL or raw JSON.
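Before pasting the block into the Rich Results Test, a quick structural lint catches the cheap mistakes. This is a sketch with a hypothetical checkOrganizationSchema helper: it checks shape only (required fields present, sameAs entries are unique https URLs) and does not fetch the profiles, so the 200-status and links-back rules above still need a manual or crawled check.

```javascript
// Return a list of human-readable problems; an empty list means the
// object passed these structural checks.
function checkOrganizationSchema(org) {
  const problems = [];
  if (org["@type"] !== "Organization") {
    problems.push("@type is not Organization");
  }
  for (const field of ["name", "url", "logo"]) {
    if (!org[field]) problems.push(`missing ${field}`);
  }

  const sameAs = Array.isArray(org.sameAs) ? org.sameAs : [];
  if (sameAs.length === 0) problems.push("sameAs is empty");

  const seen = new Set();
  for (const link of sameAs) {
    let parsed;
    try {
      parsed = new URL(link);
    } catch {
      problems.push(`invalid URL: ${link}`);
      continue;
    }
    if (parsed.protocol !== "https:") problems.push(`not https: ${link}`);
    if (seen.has(parsed.href)) problems.push(`duplicate: ${link}`);
    seen.add(parsed.href);
  }
  return problems;
}
```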
Nice
Person / ProfilePage schema on author bios with sameAs to authoritative profiles
Mueller's standing recommendation: 'link to a central place where you say everything comes together for this author... an entity home.' Aleyda Solís's 2025 AI Search Optimization Checklist lists structured author data as non-negotiable for AI-search launch. AI engines lean on author credibility signals when ranking citation candidates: a named author with a bio page, credentials, LinkedIn link, and sameAs schema is more citable than an anonymous 'team' byline. This compounds with E-E-A-T for classic Google ranking.
How to do this
For every article author, ship a /authors/firstname-lastname/ page with Person + ProfilePage schema:
{
  "@context": "https://schema.org",
  "@type": "ProfilePage",
  "mainEntity": {
    "@type": "Person",
    "name": "Andrei Andriievskyi",
    "jobTitle": "Founder, LinkGuard",
    "url": "https://linkguard.ai/authors/andrei/",
    "sameAs": [
      "https://www.linkedin.com/in/andrievskiiandr/",
      "https://x.com/o638562538",
      "https://github.com/Black-coffe"
    ],
    "knowsAbout": ["SEO", "backlink monitoring", "technical SEO"]
  }
}
Reference the author from inside each Article's author property:
"author": {
  "@type": "Person",
  "name": "Andrei Andriievskyi",
  "url": "https://linkguard.ai/authors/andrei/"
}
The author page is the 'entity home' Mueller refers to. sameAs URLs must resolve and ideally link back.
Important
Per-page-type schema (Article / Product / FAQPage / HowTo) matches ONLY what Google's rich-results docs list
Google's own structured-data intro: 'rely on the Google Search Central documentation as definitive for Google Search behavior, rather than the schema.org documentation.' schema.org defines hundreds of types and properties Google doesn't surface as rich results. Shipping speculative schema is wasted effort now and a maintenance liability when the spec drifts. Ship only what Google's Search Gallery lists for your page type.
How to do this
Map each page type to the Google-documented schema only:
Article / BlogPosting / NewsArticle: Google's Article docs list no strictly required properties, but treat headline, image, datePublished, dateModified, and author (with a Person reference) as the working minimum. Optional: publisher (Organization). wordCount is valid schema.org but is not in Google's Article docs, so by this item's own rule it can be skipped.
Product: required: name, image, description. Recommended: brand, sku, gtin, offers (with price and availability), aggregateRating (only if you have REAL on-site reviews; see the fake-AggregateRating item at the end of this checklist).
FAQPage: Google has deprecated FAQ rich results for most sites (announced 2023, expanded since); the schema still validates but the SERP feature is gone. Ship it only if it organises your content cleanly, not for rich-result lift.
HowTo: Google limited HowTo rich results to desktop in August 2023 and dropped them entirely the following month. The markup still validates and may still help AI-Overview extraction of step lists, but expect no rich result from it.
BreadcrumbList: on every page deeper than 1 level. Cheap, universally surfaced.
Validate every JSON-LD block in Rich Results Test AND the Schema Markup Validator (validator.schema.org). The first checks Google eligibility; the second checks spec correctness.
Important
JSON-LD only, in raw HTML, not injected by JavaScript
Google's official recommendation is JSON-LD over Microdata or RDFa (cleaner separation from markup, easier to maintain). Critically, JSON-LD should be present in the raw server-rendered HTML: Google's December 2025 JS SEO docs update flagged that JS-injected schema can be processed inconsistently between Google's first-pass crawl and rendering pass, and it is invisible to AI crawlers entirely (they don't execute JS).
How to do this
Verify each page's schema is server-rendered:
curl -s https://yoursite.com/page | grep -A 20 'application/ld+json'
The JSON-LD block should appear in the raw response, NOT after JS execution.
If your CMS/framework injects schema client-side (some React/Vue plugins do this by default), move it to server-rendered output:
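Next.js: render the JSON-LD <script> tag as part of the server-rendered page (Pages Router: inside <Head>, with data from getStaticProps / getServerSideProps; App Router: in a Server Component), not inside useEffect.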
Nuxt: useHead() in setup, not onMounted.
WordPress: any structured-data plugin worth using outputs to raw HTML by default.
Common bug: theme outputs JSON-LD twice (once server-side, once via a JS plugin). Run Rich Results Test on the live URL and look for duplicate blocks; conflicting duplicates can lead Google to ignore the markup.
SEO hand-off note: moving JSON-LD from a JS plugin to server-rendered output is an engineering change in most frameworks. Your job is to flag every page where the raw-HTML curl check finds no JSON-LD but Rich Results Test (which renders JS) does, and to spec the move (which file, which template hook); eng implements.
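The curl-and-grep check and the duplicate-block check can be combined into one pass over the raw HTML. A minimal sketch, with assumptions stated: extractJsonLd and duplicatedTypes are hypothetical helpers, the regex expects the type attribute to be quoted, and only top-level objects are inspected (arrays and @graph wrappers would need extra handling).

```javascript
// Pull every JSON-LD block out of raw (pre-JavaScript) HTML and parse it.
// Blocks that fail to parse are kept as a marker object so they get noticed.
function extractJsonLd(html) {
  const pattern =
    /<script\b[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const blocks = [];
  for (const match of html.matchAll(pattern)) {
    try {
      blocks.push(JSON.parse(match[1]));
    } catch {
      blocks.push({ "@type": "INVALID_JSON" });
    }
  }
  return blocks;
}

// List the @type values that appear in more than one top-level block.
function duplicatedTypes(blocks) {
  const counts = {};
  for (const block of blocks) {
    const type = String(block["@type"]);
    counts[type] = (counts[type] || 0) + 1;
  }
  return Object.keys(counts).filter((type) => counts[type] > 1);
}
```

An empty result from extractJsonLd on a page that passes Rich Results Test means the schema is JS-injected; a non-empty duplicatedTypes result points at the double-output bug.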

Things that aren't on your site but determine how fast you escape cold-start. The pre-launch trust-seeding work that the post-leak operator consensus says shortens hostAge lag.

Important
Brand entity rows live on Wikidata, Crunchbase, and LinkedIn BEFORE launch
These three third-party entity records seed the sameAs cluster the day you launch: your Organization schema gets to point at real, third-party-verified profiles instead of empty pages. The leaked Google hostAge logic backs off as fresh entities accumulate real-world signals; having LinkedIn Company + Crunchbase + Wikidata live BEFORE launch means day-one Googlebot already sees you as an entity with external corroboration, not a fresh-and-suspicious string.
How to do this
Two weeks before launch:
LinkedIn Company Page — create, populate (logo, banner, description, website URL pointing to launch URL). Free.
Crunchbase — submit a company profile (founders, founding date, headline, website). Free; can take days to a few weeks to approve.
Wikidata — only if you have an actual entity claim (notability standards are lower than Wikipedia). Add Q-item with type 'business', founded date, official website. Free.
Every profile must link back to your homepage. Use the EXACT homepage URL — same hostname variant, same trailing-slash policy.
Then your Organization schema's sameAs array on day one points at four real things (LinkedIn, Crunchbase, your X profile, your GitHub if applicable) — instead of three placeholder profiles that scream 'I created these yesterday for SEO purposes.'
For SaaS brands: also create G2 and Capterra profiles; both are entity sources Google's Knowledge Graph reads.
Important
Land 1-2 real third-party mentions in the first 30 days (press, podcast, or named newsletter)
Cold-start lag (1.74% top-10 in year one, Ahrefs 2M-page study 2025) shortens fastest when fresh entities accumulate real inbound signals. Not PBNs, not paid-link networks, not guest-post farms: all are net-negative under SpamBrain. ONE real mention from a trade publication, podcast, or named operator's newsletter beats 50 farmed links and feeds the brand-mention signal AI search now weighs heavily: Ahrefs' 2025 75K-brand study found brand mentions correlate with AI citation ~3× more strongly than typical backlinks (directional, not causal).
How to do this
Three plays that consistently work for technical-SaaS launches:
Founder podcast interview — find a niche-relevant podcast with 1k-50k listeners (small enough to book you, big enough to count). Note: podcast show-notes links are usually nofollow and the link itself is low link-equity; the real value is the named brand mention + journalists/podcasters citing you later.
Trade publication launch coverage — pitch your industry-specific publication (e.g. Search Engine Journal for SEO tools, GeekWire for PNW startups) with a real angle, not a press release. One real piece beats five syndicated rewrites.
Named operator's Substack / newsletter — if you're solving a real problem for SEOs / link builders / etc., contributors to large-audience newsletters in that space often link to genuinely useful new tools without a payment. Build the relationship 60-180 days BEFORE launch (60 only works if there's already a warm intro).
Realistic pitch-to-placement ratio: expect 15-25 personalised pitches per real editorial placement, even for technically interesting launches. Plan calendars accordingly — 'we'll just get a few mentions in the first month' is fantasy; 'we'll send 60 personalised pitches and expect 3 placements' is operator-truth.
NOT this:
Paid sponsored 'review' farms with no editorial standards.
'PR distribution' services that syndicate the same press release to 200 low-quality outlets — Google flags the pattern.
Guest-post networks. Honest test: any network with a public order form or a monthly inventory of 'DA 50+ sites' is PBN-adjacent.
The current 2026 failure mode: AI-generated 'editorial' placements on aged-but-thin sites, sold via Telegram / Slack brokers at $50-200 per 'guest post'. SpamBrain detects the cluster.
Critical
Uptime + 5xx monitoring active, alerting within 5 minutes of incident
Per 2026 indexing-error guidance attributed to Mueller (paraphrased in multiple operator guides), Google is less patient with unreliable response codes than in 2021: repeated 5xx responses during a launch window can demote URLs in the candidate set and slow re-crawl rates for weeks. A site that returns 200 on launch day and 503 during the press-coverage surge teaches Googlebot to back off exactly when you need it to crawl most.
How to do this
Three monitors, configured BEFORE launch:
External uptime check — UptimeRobot (free), Pingdom, or Statuspage Edge. Hits homepage + 3-5 key URLs every 1-5 minutes from multiple regions.
5xx error rate — your CDN (Cloudflare, Fastly, Vercel) exposes 5xx-rate dashboards. Set an alert at >1% of requests over 5 minutes.
Synthetic transactional check — for sites with critical flows (signup, checkout, login), a Playwright/Cypress script run every 15 minutes from a monitoring service.
Alert routes: Slack channel + at least one human's phone (PagerDuty / Opsgenie / a personal SMS-via-email gateway). 'Email-only' alerts get missed during launch day.
If a 5xx incident does happen during launch: fix → wait → in GSC → URL Inspection → request re-crawl for affected key URLs. Google does forgive transient 5xx; persistent 5xx is the demotion case.
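If your CDN's built-in alerting is too coarse, the '>1% of requests over 5 minutes' rule is simple enough to compute from access logs. A minimal sketch under stated assumptions: fiveXxRate and shouldAlert are hypothetical names, and each log record is assumed to have already been parsed into { status, timestampMs }.

```javascript
// Share of requests in the trailing window that returned a 5xx status.
// windowMs defaults to the five minutes used in the alert rule above.
function fiveXxRate(requests, nowMs, windowMs = 5 * 60 * 1000) {
  const recent = requests.filter(
    (r) => r.timestampMs > nowMs - windowMs && r.timestampMs <= nowMs
  );
  if (recent.length === 0) return 0;
  const errors = recent.filter((r) => r.status >= 500 && r.status <= 599);
  return errors.length / recent.length;
}

// Alert when the trailing 5xx rate is strictly above the threshold (1%).
function shouldAlert(requests, nowMs, threshold = 0.01) {
  return fiveXxRate(requests, nowMs) > threshold;
}
```

On very low traffic a single 503 can exceed 1%, so eng may want a minimum request count before the alert fires; that guard is left out here to keep the rule visible.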
Important
Day-one AI-citation baseline captured manually on 5-10 priority queries
AI Overviews appeared on roughly 16% of queries in November 2025 (Semrush's 10M-keyword tracker, having peaked at ~25% in July before pulling back), so most of your queries will not trigger them at all. Without a launch-day baseline you can't tell whether month-1 'no citations' means your launch failed or means the queries you target simply don't trigger AI Overviews yet. A 30-minute manual audit on launch day saves multi-quarter ambiguity.
How to do this
For each of 5-10 high-priority queries (your top commercial intent, your brand name, your category):
Google the query while logged out + in incognito → screenshot the AI Overview if present. Note: is your URL cited? Is a competitor's? None?
Same query in ChatGPT with search enabled (toggle is below the prompt input) → screenshot the Sources panel.
Same query in Perplexity → screenshot citation list.
Same query in Google AI Mode (enable at labs.google.com → Search Labs → AI Mode, if available in your market) → screenshot the answer with citations.
Same query in Gemini's web-search-enabled mode if you have access.
Save the screenshots to a dated folder. Repeat the audit at month 1, 3, 6 — that's your AI-citation movement graph. Without the day-one snapshot, the month-3 data is uninterpretable.
Sites cited as sources inside AI Overviews see 35% more organic clicks than non-cited competitors (Digital Applied composite, 2026) — measuring this delta justifies investment in the content/authority work it takes to earn citations.
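Screenshots are the evidence, but a small structured log next to them makes the month-1 and month-3 comparison mechanical. The sketch below is one possible shape, not a standard: the snapshot fields (query, surface, answerShown, citedDomains) and both function names are assumptions made for illustration.

```javascript
// Reduce one manual observation to the three states that matter:
// no AI answer at all, an answer that does not cite us, or an answer that does.
function interpretSnapshot(snapshot, ourDomain) {
  if (!snapshot.answerShown) return "no-ai-answer";
  return snapshot.citedDomains.includes(ourDomain) ? "cited" : "shown-not-cited";
}

// Pair each baseline observation with the follow-up for the same query and
// surface, so "still no AI answer" is never confused with "lost the citation".
function compareAudits(baseline, followUp, ourDomain) {
  return baseline.map((before) => {
    const after = followUp.find(
      (s) => s.query === before.query && s.surface === before.surface
    );
    return {
      query: before.query,
      surface: before.surface,
      before: interpretSnapshot(before, ourDomain),
      after: after ? interpretSnapshot(after, ourDomain) : "not-rechecked",
    };
  });
}
```

A row that goes from "no-ai-answer" to "no-ai-answer" says nothing about your launch; a row that goes from "shown-not-cited" to "cited" is the movement worth reporting.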

Four things vendor blogs and SEO course-sellers currently sell as pre-launch best practice that don't survive scrutiny against 2025-2026 data. If you've been paying for any of these, you're not behind — you're early enough to redirect budget to the things that actually move the needle.

Important
Don't ship llms.txt for AI citation lift (three independent studies say it does nothing)
SE Ranking's 300,000-domain analysis (2026) found 10.13% adoption and ZERO statistical correlation with AI citations (both classical stats and ML). Kai Spriestersbach's analysis of OtterlyAI's 90-day study: 84 of 62,100 AI bot requests touched llms.txt (0.1%), three times worse than average content pages. Search Engine Land's 10-site experiment (Jan 2026): 8 of 10 saw no change, 1 declined 19.7%. Per Mueller's 2025 public statements, no major AI engine uses llms.txt as a citation signal, and Google itself has confirmed it won't.
How to do this
Don't ship it for AI citation lift. The evidence is now overwhelming that no major AI engine respects it as a citation-ranking signal.
The narrow exception: if you ship developer tooling, Cursor and GitHub Copilot agents do parse llms.txt for context when working in a user's codebase. So a developer-tooling brand has a real (small) reason to ship it. Everyone else: don't.
Budget you would have spent maintaining llms.txt is better spent on:
Brand-mention work — Ahrefs' 2025 75K-brand study found brand-mention frequency correlates with AI citation ~3× more strongly than typical backlinks do (directional, not causal).
Tightening canonical / structured-data consistency (see the structured-data items above).
Real third-party entity profiles (the Wikidata / Crunchbase / LinkedIn item above).
If you've already shipped llms.txt: leaving it doesn't hurt. Just stop selling it internally as a 'we're AI-ready' deliverable.
Important
Don't use the Indexing API to push new URLs (restricted to job postings + livestreams only)
Google's Indexing API quickstart documents the restriction: job postings (JobPosting schema) and livestream content (BroadcastEvent embedded in VideoObject) only. Mueller, Bluesky May 2025: 'We see a lot of spammers misuse the Indexing API like this, so I'd recommend just sticking to the documented & supported use-cases.' 'Fast indexing' services that charge $200-500/month for Indexing-API pushes routinely have their project access revoked, and your submitted URLs go down with them when that happens.
How to do this
Don't pay for fast-indexing services. Don't write your own Indexing API client to push non-job-non-livestream URLs.
What to do instead, when you actually need a URL indexed faster than crawl-discovery:
GSC → URL Inspection → 'Request Indexing' (manual, but legitimate; rate-limited to ~10/day).
Get the URL linked from a strong existing indexed page (your homepage, a hub page, a high-traffic blog post) and listed in your sitemap. Crawl discovery via internal link is usually faster than the URL Inspection queue.
Get the URL linked from a strong EXTERNAL indexed page (your launch press mention, a relevant Reddit/HN post, a partner site).
For sites with hundreds of new URLs per day: news sitemaps for news content, and just rely on crawl discovery for the rest.
Time-to-index for normal new pages on a well-instrumented launched site in 2026: 1-7 days. The Indexing API saves you nothing legitimately.
Important
Don't submit your sitemap on launch day expecting same-day crawl (that hasn't been true since 2023)
Google deprecated the sitemap ping endpoint in June 2023 and shut it down fully by the end of 2023. Even before then, Mueller was explicit (re-cited 2025): 'uploading sitemaps didn't guarantee that all the URLs would be crawled, and there is no set time for when Googlebot would crawl the sitemap URLs.' The 'submit sitemap, wait 24 hours, ranking begins' playbook some agencies still sell is a 2018 model that doesn't match 2026 reality.
How to do this
What to actually do:
Submit the sitemap index in GSC on launch day (one-time, takes 30 seconds).
Reference the sitemap from robots.txt: Sitemap: https://yoursite.com/sitemap-index.xml
Set realistic expectations with stakeholders: meaningful crawl of all sitemap URLs takes days to weeks, not hours. Ranking depends on more than crawl.
What to NOT do:
Don't pay for sitemap-ping-as-a-service. The endpoint is gone; plenty of tools and services still call it, and those pings do nothing.
Don't fake lastmod dates to 'force' re-crawl. Mueller, 2024: 'Faking the lastmod date in your XML sitemaps won't help your SEO' — and the trust signal it costs hurts.
Don't re-submit the sitemap every day expecting that to trigger crawl. GSC accepts re-submission; Googlebot doesn't change its behaviour because of it.
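Since the Sitemap line lives in robots.txt, the same launch-day read of that file can confirm the sitemap reference is present and that no staging-era blanket Disallow survived. The sketch below is a simplified reading of robots.txt, not a full parser: auditRobotsTxt is a hypothetical helper, and it only looks for Sitemap lines and for 'Disallow: /' inside a group that applies to User-agent: *.

```javascript
// Scan robots.txt for Sitemap directives and for a blanket "Disallow: /"
// that applies to all crawlers (User-agent: *).
function auditRobotsTxt(text) {
  const sitemaps = [];
  let blocksEverything = false;
  let currentAgents = [];
  let lastWasAgent = false;

  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.split("#")[0].trim(); // drop comments
    const colon = line.indexOf(":");
    if (!line || colon === -1) continue;

    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();

    if (field === "sitemap") {
      sitemaps.push(value); // Sitemap lines are independent of any group
      continue;
    }
    if (field === "user-agent") {
      // Consecutive User-agent lines share one group; otherwise start anew.
      if (!lastWasAgent) currentAgents = [];
      currentAgents.push(value.toLowerCase());
      lastWasAgent = true;
      continue;
    }
    if (field === "disallow" && value === "/" && currentAgents.includes("*")) {
      blocksEverything = true;
    }
    lastWasAgent = false;
  }
  return { sitemaps, blocksEverything };
}
```

A launch-ready file returns at least one sitemap URL and blocksEverything === false.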
Critical
Don't ship AggregateRating with fake review counts (Google will manual-action your structured data)
Google has cracked down on inflated AggregateRating schema with manual actions in 2025-2026. LinkGuard's own removal of a fake 'aggregateRating 4.9/127' from its landing page in April 2026 was prompted by exactly this risk. Google's structured-data guidelines require the rating to be 'visible to users on the page' and 'reflect a genuine assessment of the item'. Fabricated counts on a fresh launch fail both criteria and put your entire structured-data feature eligibility at risk.
How to do this
Two cases, two answers:
You have real, on-site, user-submitted reviews: ship AggregateRating with the actual count and average. Display it visibly on the page. Update as reviews accumulate.
You don't have reviews yet (typical launch): don't ship AggregateRating at all. Empty is fine. Lying is a manual-action risk that nukes ALL your rich-result eligibility, not just the ratings.
Same logic applies to:
Review schema with fabricated reviewer names.
Recipe ratings on commercial pages.
Product schema with made-up aggregateRating blocks.
When you DO accumulate real reviews, integrate them into the page first (visible to users), THEN add the schema. The schema describes the page; the page describes the truth.
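The safest way to keep fake ratings out is to make the schema a function of the reviews you actually hold, so an empty review list cannot produce an aggregateRating block. A minimal sketch, assuming a hypothetical buildProductSchema helper and review objects shaped like { rating } on a 1-5 scale:

```javascript
// Build Product JSON-LD; aggregateRating is derived from real reviews
// and omitted entirely when there are none.
function buildProductSchema(product, reviews) {
  const schema = {
    "@context": "https://schema.org",
    "@type": "Product",
    name: product.name,
    image: product.image,
    description: product.description,
  };

  if (reviews.length > 0) {
    const total = reviews.reduce((sum, review) => sum + review.rating, 0);
    schema.aggregateRating = {
      "@type": "AggregateRating",
      // One decimal place, computed from the stored reviews, never typed in.
      ratingValue: Number((total / reviews.length).toFixed(1)),
      reviewCount: reviews.length,
    };
  }
  return schema;
}
```

Render the same numbers visibly on the page from the same review list, so the markup and the page cannot drift apart.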

About the Author

Andrei

SEO and digital marketing professional with 13+ years of experience. Started as a website administrator in 2011, transitioned to SEO, and achieved top-3 rankings for competitive keywords. Co-founded a consulting firm specializing in marketing audits for companies in Ukraine and internationally. Built LinkGuard to solve the problem he experienced firsthand: most SEO teams purchase links but never monitor their survival. Based in Kyiv, Ukraine.