The standard technical SEO audit checks crawlability, indexability, site speed, mobile-friendliness, and structured data. That checklist was designed for one client: Googlebot.

That's how it has always been.

In 2026, your website has, at minimum, a dozen additional non-human clients. AI crawlers like GPTBot, ClaudeBot, and PerplexityBot train models and power AI search results. User-triggered agents like the newly announced Google-Agent, or its "siblings" Claude-User and ChatGPT-User, browse websites on behalf of specific people in real time. A Q1 2026 analysis across Cloudflare's network found that 30.6% of all web traffic now comes from bots, with AI crawlers and agents making up a growing share. Your technical audit needs to account for all of them.

Here are the five layers to add to your existing technical SEO audit.

Layer 1: AI Crawler Access

Your robots.txt was probably written for Googlebot, Bingbot, and maybe a few scrapers. AI crawlers need their own robots.txt rules, and they need to be separate from the rules for Googlebot and Bingbot.

What To Check

Review your robots.txt for rules targeting AI-specific user agents: GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Bytespider, Applebot-Extended, CCBot, and ChatGPT-User. If none of these appear, you're running on defaults, and those defaults might not reflect what you actually want. Never accept the defaults unless they're exactly what you want.
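One quick way to audit this is Python's built-in robots.txt parser. A minimal sketch (the example.com URL is a placeholder for your own site) that reports what each AI user agent is currently allowed to fetch at your root path:

    from urllib.robotparser import RobotFileParser

    AI_AGENTS = [
        "GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended",
        "Bytespider", "Applebot-Extended", "CCBot", "ChatGPT-User",
    ]

    # Point this at your own robots.txt; example.com is a placeholder.
    parser = RobotFileParser("https://www.example.com/robots.txt")
    parser.read()

    for agent in AI_AGENTS:
        allowed = parser.can_fetch(agent, "https://www.example.com/")
        print(f"{agent}: {'allowed' if allowed else 'blocked'} at /")

Any agent that falls through to your wildcard rules is running on defaults.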

The key is making a conscious decision per crawler rather than blanket allowing or blocking everything. Not all AI crawlers serve the same purpose. AI crawler traffic can be split into three categories: training crawlers that collect data for model training (89.4% of AI crawler traffic according to Cloudflare data), search crawlers that power AI search results (8%), and user-triggered agents like Google-Agent and ChatGPT-User that browse on behalf of a specific human in real time (2.2%). Each category warrants a different robots.txt decision.
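As an illustration only, a robots.txt sketch that blocks training-only crawlers while allowing AI search crawlers and user-triggered agents could look like this. The user-agent tokens are the documented ones; the allow/block choices shown are assumptions to replace with your own policy:

    # Training crawlers: blocked in this example (little or no referral traffic)
    User-agent: GPTBot
    Disallow: /

    User-agent: CCBot
    Disallow: /

    User-agent: Google-Extended
    Disallow: /

    # AI search crawlers: allowed (they power citations in AI answers)
    User-agent: OAI-SearchBot
    Allow: /

    User-agent: PerplexityBot
    Allow: /

    # User-triggered agents: allowed (a human is behind each request)
    User-agent: ChatGPT-User
    Allow: /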

Cloudflare Radar chart showing traffic volume by crawler purpose (Q1 2026); screenshot by author, April 2026.

The crawl-to-referral ratios from Cloudflare's Radar report can make this an informed decision for you. Anthropic's ClaudeBot crawls 20,600 pages for every single referral it returns. OpenAI's ratio is 1,300:1. Meta sends no referrals. Blocking OpenAI's OAI-SearchBot or PerplexityBot reduces your visibility in ChatGPT Search and Perplexity's AI answers. Blocking training-focused crawlers like CCBot or Meta's crawler prevents data extraction by a provider that sends zero traffic back. The crawl-to-referral ratios tell you who is taking without giving.

There is one crawler that requires special attention. Google added Google-Agent to its official list of user-triggered fetchers on March 20, 2026. Google-Agent identifies requests from AI systems running on Google infrastructure that browse websites on behalf of users. Unlike traditional crawlers, Google-Agent ignores robots.txt. Google's position is that since a human initiated the request, the agent acts as a user proxy rather than an autonomous crawler. Blocking Google-Agent requires server-side authentication, not robots.txt rules. That is both fascinating and important for the future, even if it's outside the scope of this article.

Check the official documentation for each of these crawlers before making blocking decisions.

Layer 2: JavaScript Rendering

Googlebot renders JavaScript using headless Chromium. There is nothing new about that. What's new and different is that almost every major AI crawler does not render JavaScript.

Crawler                  Renders JavaScript
GPTBot (OpenAI)          No
ClaudeBot (Anthropic)    No
PerplexityBot            No
CCBot (Common Crawl)     No
AppleBot                 Yes
Googlebot                Yes

AppleBot (which uses a WebKit-based renderer) and Googlebot are the only major crawlers that render JavaScript. Four of the six major web crawlers (GPTBot, ClaudeBot, PerplexityBot, and CCBot) fetch static HTML only, making server-side rendering a requirement for AI search visibility, not an optimization. If your content lives in client-side JavaScript, it is invisible to the crawlers training OpenAI's, Anthropic's, and Perplexity's models and powering their AI search products.

What To Check

Run curl -s [URL] on your important pages and search the output for key content like product names, prices, or service descriptions. If that content isn't in the curl response, GPTBot, ClaudeBot, and PerplexityBot can't see it either. Alternatively, use View Source in your browser (not Inspect Element, which shows the rendered DOM after JavaScript execution) and check whether the important information is present in the raw HTML.

Curl fetch of the No Hacks homepage (image from author, April 2026).
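To run the same check across many pages, a minimal Python sketch (assuming the requests library; the URLs and expected phrases are hypothetical placeholders) can fetch the raw, pre-JavaScript HTML the way a non-rendering crawler does and flag pages whose key content is missing:

    import requests

    # Placeholder pages and the phrases that must appear in the raw HTML.
    PAGES = {
        "https://www.example.com/pricing": "From $29/month",
        "https://www.example.com/services": "Technical SEO audits",
    }

    # A plain HTTP fetch with no rendering, mirroring what GPTBot,
    # ClaudeBot, and PerplexityBot receive.
    headers = {"User-Agent": "raw-html-audit/0.1"}

    for url, expected in PAGES.items():
        html = requests.get(url, headers=headers, timeout=10).text
        status = "OK" if expected in html else "MISSING from raw HTML"
        print(f"{url}: {status}")

Any page flagged as missing depends on client-side JavaScript for its critical content.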

Single-page applications (SPAs) built with React, Vue, or Angular are particularly at risk unless they use server-side rendering (SSR) or static site generation (SSG). A React SPA that renders product descriptions, pricing, or key claims entirely on the client side is sending AI crawlers a blank page with a link to the JavaScript bundle.

The fix isn't complicated. Server-side rendering (SSR), static site generation (SSG), or pre-rendering solves this for every major framework. Next.js supports SSR and SSG natively for React, Nuxt provides the same for Vue, and Angular Universal handles server rendering for Angular applications. The audit just needs to flag which pages depend on client-side JavaScript for critical content.

Layer 3: Structured Data For AI

Structured data has been part of technical SEO audits for years, but the evaluation criteria need updating. The question is no longer just "does this page have schema markup?" It's "does this markup help AI systems understand and cite this content?"

What To Check

  • JSON-LD implementation (preferred over Microdata and RDFa for AI parsing).
  • Schema types that go beyond the basics: Organization, Article, Product, FAQ, HowTo, Person.
  • Entity relationships: sameAs, author, publisher connections that link your content to known entities (see the sketch after this list).
  • Completeness: are all relevant properties populated, or are you just checking a box with skeleton schemas that contain only a name and URL?
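As a rough illustration of what "complete" looks like, here is a JSON-LD sketch for an Article; the names, dates, and sameAs URLs are placeholders, not recommendations:

    {
      "@context": "https://schema.org",
      "@type": "Article",
      "headline": "Adding AI Crawler Checks to a Technical SEO Audit",
      "author": {
        "@type": "Person",
        "name": "Jane Example",
        "sameAs": "https://www.linkedin.com/in/jane-example"
      },
      "publisher": {
        "@type": "Organization",
        "name": "Example Agency",
        "url": "https://www.example.com",
        "sameAs": [
          "https://en.wikipedia.org/wiki/Example_Agency"
        ]
      },
      "datePublished": "2026-04-01",
      "dateModified": "2026-04-15",
      "mainEntityOfPage": "https://www.example.com/blog/ai-crawler-audit"
    }

A skeleton schema would stop at @type, headline, and a URL; the author, publisher, sameAs, and date properties are what connect the page to known entities.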

Why This Matters Now

Microsoft's Bing principal product manager Fabrice Canel confirmed in March 2025 that schema markup helps LLMs understand content for Copilot. The Google Search team stated in April 2025 that structured data provides an advantage in search results.

No, you can't win with schema alone. Yes, it can help.

The data density angle matters too. The GEO research paper by Princeton, Georgia Tech, the Allen Institute for AI, and IIT Delhi (presented at ACM KDD 2024, and the first to publicly use the term "GEO") found that adding statistics to content improved AI visibility by 41%. Yext's analysis found that data-rich websites earn 4.3x more AI citations than directory-style listings. Structured data contributes to data density by giving AI systems machine-readable facts rather than requiring them to extract meaning from prose.

An important caveat: No peer-reviewed academic studies exist yet on schema's impact on AI citation rates specifically. The industry data is promising and consistent, but treat these numbers as signals rather than guarantees.

W3Techs reports that roughly 53% of the top 10 million websites use JSON-LD as of early 2026. If your website isn't among them, you're missing signals that both traditional and AI search systems use to understand your content.

Duane Forrester, who helped build Bing Webmaster Tools and co-launched Schema.org, argues that schema markup is just the first step. As AI agents keep moving from merely interpreting pages to making decisions, brands will also need to publish operational truth (pricing, policies, constraints) in machine-verifiable formats with versioning and cryptographic signatures. Publishing machine-verifiable source packs is beyond the scope of a standard audit today, but auditing structured data completeness and accuracy is the foundation that verified source packs build on.

Layer 4: Semantic HTML And The Accessibility Tree

The first three layers of the AI-readiness audit cover crawler access (robots.txt), JavaScript rendering, and structured data. The final two address how AI agents actually read your pages and what signals help them discover and evaluate your content.

Most SEOs evaluate HTML for search engine consumption. Agentic browsers like ChatGPT Atlas, Chrome with auto browse, and Perplexity Comet don't parse pages the way Googlebot does. They read the accessibility tree instead.

The accessibility tree is a parallel representation of your page that browsers generate from your HTML. It strips away visual styling, layout, and decoration, keeping only the semantic structure: headings, links, buttons, form fields, labels, and the relationships between them. Screen readers like VoiceOver and NVDA have used the accessibility tree for decades to make websites usable for people with visual impairments. AI agents now use the same tree to understand and interact with web pages.

And the reason is simple: efficiency. Processing screenshots is both more expensive and slower than working with the accessibility tree.

This is what an accessibility tree looks like in Google Chrome (image from author, April 2026).

This matters because the accessibility tree exposes what your HTML actually communicates, not what your CSS (or JavaScript) makes it look like. A <div> styled to look like a button doesn't appear as a button in the accessibility tree. An image without alt text means nothing. A heading hierarchy that skips from H1 to H4 creates a broken structure that both screen readers and AI agents will struggle to navigate.
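A small, hypothetical before-and-after sketch shows the difference these choices make (the addToCart handler and image file are placeholders):

    <!-- Exposed to the accessibility tree only as generic content and an unnamed image -->
    <div class="btn" onclick="addToCart()">Add to cart</div>
    <img src="widget.jpg">

    <!-- Exposed as a named button and a described image -->
    <button type="button" onclick="addToCart()">Add to cart</button>
    <img src="widget.jpg" alt="Blue widget, model X200, front view">

The two versions can look identical on screen; only the second tells an agent there is a button to click and what the image shows.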

Microsoft's Playwright MCP, the standard tool for connecting AI models to browser automation, uses accessibility snapshots rather than raw HTML or screenshots. Playwright MCP's browser_snapshot function returns an accessibility tree representation because it is more compact and semantically meaningful for LLMs. OpenAI's documentation states that ChatGPT Atlas uses ARIA tags to interpret page structure when browsing websites.
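To see roughly what an agent sees, a short sketch using Playwright for Python (assuming a version that exposes aria_snapshot(), 1.49 or later; the URL is a placeholder) can dump a page's accessibility outline:

    from playwright.sync_api import sync_playwright

    # Placeholder URL; point this at a page you want to audit.
    URL = "https://www.example.com/pricing"

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(URL)
        # aria_snapshot() returns a YAML-like outline of the accessibility
        # tree: roles, accessible names, and nesting, with styling stripped.
        print(page.locator("body").aria_snapshot())
        browser.close()

If key content or controls are missing or unnamed in that outline, they are effectively missing for agentic browsers too.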

Web accessibility and AI agent compatibility are now the same discipline. Proper heading hierarchy (H1-H6) creates meaningful sections that AI systems use for content extraction. Semantic elements like <nav>, <main>, <article>, and <footer> label each region's role in the accessibility tree, where a generic <div> tells an agent nothing.