Some developers have been experimenting with bot-specific Markdown delivery as a technique to reduce token usage for AI crawlers.
Google Search Advocate John Mueller pushed back on the idea of serving raw Markdown files to LLM crawlers, raising technical concerns on Reddit and calling the concept "a stupid idea" on Bluesky.
What's Happening
A developer posted on r/TechSEO, describing plans to use Next.js middleware to detect AI user agents such as GPTBot and ClaudeBot. When those bots hit a page, the middleware intercepts the request and serves a raw Markdown file instead of the full React/HTML payload.
The developer claimed early benchmarks showed a 95% reduction in token usage per page, which they argued should improve the site's ingestion capacity for retrieval-augmented generation (RAG) bots.
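In Next.js terms, the approach looks roughly like the sketch below. This is a minimal illustration under stated assumptions, not the developer's actual code: the bot list, the `/md/` path convention, and the existence of pre-generated Markdown files are all placeholders.

```typescript
// middleware.ts — minimal sketch of the approach described above.
// Assumes pre-generated .md versions of pages live under /md/ (illustrative, not from the post).
import { NextRequest, NextResponse } from "next/server";

// User-agent substrings for AI crawlers (illustrative, not an exhaustive list)
const AI_BOTS = ["GPTBot", "ClaudeBot"];

export function middleware(request: NextRequest) {
  const userAgent = request.headers.get("user-agent") ?? "";
  const isAiBot = AI_BOTS.some((bot) => userAgent.includes(bot));

  if (isAiBot) {
    // Rewrite the request to a pre-rendered Markdown file instead of the HTML page
    const url = request.nextUrl.clone();
    url.pathname = `/md${url.pathname === "/" ? "/index" : url.pathname}.md`;
    return NextResponse.rewrite(url);
  }

  // Regular visitors continue to the normal React/HTML page
  return NextResponse.next();
}

export const config = {
  // Skip Next.js internals, API routes, and static assets (illustrative matcher)
  matcher: ["/((?!_next|api|.*\\..*).*)"],
};
```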
Mueller responded with a series of questions:
"Are you sure they'll even recognize MD on a website as anything other than a text file? Can they parse & follow the links? What will happen to your site's internal linking, header, footer, sidebar, navigation? It's one thing to give it a MD file manually, it seems very different to serve it a text file when they're looking for a HTML page."
On Bluesky, Mueller was more direct. Responding to technical SEO consultant Jono Alderson, who argued that flattening pages into Markdown strips out meaning and structure, Mueller wrote:
"Converting pages to markdown is such a stupid idea. Did you know LLMs can read images? WHY NOT TURN YOUR WHOLE SITE INTO AN IMAGE?"
Alderson argued that collapsing a page into Markdown removes important context and structure, and framed Markdown-fetching as a convenience play rather than a lasting strategy.
Other voices in the Reddit thread echoed the concerns. One commenter questioned whether the effort might limit crawling rather than improve it, noting that there's no evidence LLMs are trained to favor documents that are less resource-intensive to parse.
The original poster defended the theory, arguing LLMs are better at parsing Markdown than HTML because they're heavily trained on code repositories. That claim is untested.
Why This Matters
Mueller has been consistent on this. In an earlier exchange, he responded to a question from Lily Ray about creating separate Markdown or JSON pages for LLMs. His position then was the same: focus on clean HTML and structured data rather than building bot-only content copies.
That response followed SE Ranking's analysis of 300,000 domains, which found no connection between having an llms.txt file and how often a domain gets cited in LLM answers. Additionally, Mueller has compared llms.txt to the keywords meta tag, a format major platforms haven't documented as something they use for ranking or citations.
So far, public platform documentation hasn't shown that bot-only formats, such as Markdown versions of pages, improve rankings or citations. Mueller has raised the same objections across multiple discussions, and SE Ranking's data found nothing to suggest otherwise.
Looking Ahead
Until an AI platform publishes a spec requesting Markdown versions of web pages, the best practice remains as it is: keep HTML clean, reduce unnecessary JavaScript that blocks content parsing, and use structured data where platforms have documented schemas.
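For the structured data piece, the documented route is schema.org markup embedded in the HTML itself (commonly as JSON-LD) rather than a separate bot-only file. The sketch below shows one way to emit it from a Next.js page; the component name and field values are hypothetical placeholders, not a documented requirement.

```typescript
// ArticleJsonLd.tsx — illustrative sketch of embedding schema.org Article markup as JSON-LD.
// Component name and field values are placeholders; adapt to the page's real metadata.
export function ArticleJsonLd() {
  const jsonLd = {
    "@context": "https://schema.org",
    "@type": "Article",
    headline: "Example headline",
    datePublished: "2024-01-01",
    author: { "@type": "Person", name: "Example Author" },
  };

  return (
    <script
      type="application/ld+json"
      // JSON-LD must be rendered as a raw string inside the script tag
      dangerouslySetInnerHTML={{ __html: JSON.stringify(jsonLd) }}
    />
  );
}
```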