A new research report published on March 31, 2026, by ProGEO.ai reveals a striking gap in how America's largest companies are preparing – or failing to prepare – for a search landscape increasingly shaped by generative artificial intelligence. The study, titled "Signaling the Shift to Generative Engine Optimization (GEO)," scanned all 500 companies on the Fortune 500 list and measured adoption rates across three technical protocols: robots.txt, JSON-LD, and llms.txt.
The headline finding is blunt. According to the report, only 7.4% of Fortune 500 companies – 37 in total – have implemented llms.txt, a specification introduced in 2024 to help AI platforms process website content more efficiently. By contrast, 92.8% have implemented robots.txt, the 30-year-old crawler control standard. The gap between these two figures maps almost exactly to the distance between a mature, search-engine-era standard and a nascent protocol designed for a different kind of machine reader.
A new optimization discipline takes shape
The report frames these numbers through the lens of what ProGEO.ai calls generative engine optimization (GEO) – a practice distinct from traditional search engine optimization (SEO) that focuses specifically on brand visibility within AI-generated responses. Platforms such as ChatGPT, Claude, and Gemini are changing how buyers find information, and, according to Gartner, "Chief Marketing Officers and their teams need to adjust their web content strategy to adapt to search engines' evolving algorithms and appear in GenAI-powered search results."
That shift matters commercially. Organic search traffic has been falling as AI-generated responses answer queries directly, without sending users to external websites. Zero-click behavior – where a query is resolved entirely within a platform's interface – has intensified steadily since AI-powered search features began scaling in 2024 and 2025. Against that backdrop, the question of how a brand signals its presence to AI systems is no longer purely theoretical.
Clinton Karr, CMO of ProGEO.ai, described the findings in terms of the classic diffusion framework developed by sociologist Everett Rogers. "ProGEO.ai observed that the Fortune 500 adoption rates for robots.txt, JSON-LD, and llms.txt mapped to Rogers' 'Diffusion of Innovations' curve, demonstrating a full spectrum of technical marketing maturity," said Karr. "Early adopters of llms.txt in the Fortune 500 are signaling their experimentation with generative engine optimization."
The adoption rates in the data do follow this curve closely. robots.txt, which dates to 1994 and became an official IETF standard in 2022 as RFC 9309, sits at 92.8% – well past the early majority threshold. JSON-LD, a W3C structured data standard that has existed since 2011, sits at 53.8% – squarely within the late majority. llms.txt, published as a specification in 2024, sits at 7.4% – squarely within the innovators category on Rogers' curve.
Part one: robots.txt at 30, still not built for AI
Robots.txt implements the Robots Exclusion Protocol. According to the ProGEO.ai report, it allows website operators to specify whether they allow or disallow web crawlers from accessing their site. Despite its near-universal adoption – 92.8% of Fortune 500 companies have one – only 55 companies, 11% of the Fortune 500, have named a specific AI user agent anywhere in the file.
This matters because the default behavior of robots.txt is permissive. The absence of an explicit directive is treated as an implied allow. So the 89% of Fortune 500 companies that haven't named an AI user agent are, according to the report, "by default more accessible to AI crawlers than most of the 11% who have."
Among the 55 companies that have named an AI user agent, ProGEO.ai identified 270 total directives across 25 distinct AI user agents. These break down into 105 allow directives, 116 disallow directives, and 49 partial access directives. A clear pattern emerges when the data is split by crawler type. Directives aimed at training crawlers – bots that collect content to build or refine AI models – skew toward disallow. Directives aimed at search crawlers – bots that retrieve content to generate responses – skew toward allow.
GPTBot, OpenAI's training crawler, is the most frequently named AI user agent with 32 total directives, and it leans toward restriction. CCBot (Common Crawl), Google-Extended (Gemini), Meta-ExternalAgent, and Bytespider (ByteDance) follow the same pattern. By contrast, ChatGPT-User (OpenAI's search agent), OAI-SearchBot, and PerplexityBot show predominantly permissive directives. The implication is that Fortune 500 companies making deliberate choices about AI access tend to keep their doors open to AI-generated search while trying to block the use of their content for model training.
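To make that posture concrete, a robots.txt that disallows training crawlers while allowing search crawlers might look like the following sketch. The user agent tokens are real ones named in the report, but the file itself is illustrative and not drawn from any company in the study.

```
# Hypothetical example, not taken from any Fortune 500 robots.txt.
# Training crawlers: blocked from the whole site.
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# Search/answer crawlers: explicitly allowed.
User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

# Every other crawler falls through to the default, which is permissive.
User-agent: *
Disallow:

Sitemap: https://www.example.com/sitemap.xml
```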
The efficacy of this approach is, however, contested. In December 2025, OpenAI announced that its ChatGPT-User agent would not follow robots.txt directives for user-initiated browsing. Several reports suggest Bytespider, operated by ByteDance, also ignores robots.txt directives. Some organizations have responded by moving enforcement to web application firewalls (WAFs), which can block requests at the network level rather than relying on voluntary compliance. Cloudflare introduced Robotcop in December 2024 to automate this process.
WAF-based enforcement introduces its own complication. According to the report, Google uses the same user agent for all of its crawlers – covering both Search and Gemini. Blocking Googlebot at the WAF layer to restrict Gemini access would simultaneously prevent the site from being indexed for traditional search. The separation of training crawlers from search crawlers that robots.txt allows does not translate cleanly to WAF enforcement.
76% of Fortune 500 companies include at least one Sitemap directive in their robots.txt – more than three in four. Sitemaps provide crawlers with a structured list of URLs, relative priority values, and metadata. Together, robots.txt and sitemaps form the foundational layer of information retrieval for both search engines and AI systems.
Part two: JSON-LD – widespread adoption, shallow implementation
JSON-LD (JavaScript Object Notation for Linked Data) is a W3C standard for encoding structured data on web pages. It provides machine-readable semantic signals – explicit declarations that tell search engines and AI systems what is on a page and what it means. Google explicitly stated in 2019 that JSON-LD is its preferred format for structured data.
According to the ProGEO.ai report, 53.8% of Fortune 500 companies have implemented JSON-LD on their homepage – 269 companies – at a median of 5.1 schema types per implementation. The three most common types are Organization (used by 182 companies), WebSite (used by 147), and SearchAction (used by 124). These three types handle traditional SEO tasks: Organization populates knowledge panels, WebSite enables sitelinks, and SearchAction powers the search box within Google results.
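A template-level homepage implementation of those three types might look like the following sketch. The organization name, URLs, and search endpoint are placeholder assumptions, not details from the report.

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "Organization",
      "name": "Example Corp",
      "url": "https://www.example.com/",
      "logo": "https://www.example.com/logo.png"
    },
    {
      "@type": "WebSite",
      "name": "Example Corp",
      "url": "https://www.example.com/",
      "potentialAction": {
        "@type": "SearchAction",
        "target": "https://www.example.com/search?q={search_term_string}",
        "query-input": "required name=search_term_string"
      }
    }
  ]
}
</script>
```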
That baseline, however, conceals a significant maturity gap. ProGEO.ai randomly sampled internal content pages for 189 of the 269 companies with homepage JSON-LD. The remaining 77 blocked internal page scanning, and three returned inconclusive results. Among the 189 successfully sampled, 52.4% had JSON-LD only on the homepage or injected as a site-wide template – doing identical work on every page: declaring Organization, WebSite, and SearchAction. Only 47.6% – 90 companies of the 189 sampled – were adding page-specific structured data to internal pages.
The most common content-specific types on internal pages were Article (found at 146 companies), Person (105), and BreadcrumbList (84). These types do the work that matters specifically for GEO. They identify the author, mark the content as a distinct publishable unit, and establish its place within a site hierarchy – precisely the signals that AI systems use to build entity relationships and identify citable sources.
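By contrast with the homepage template above, a page-specific implementation on an internal article page, of the kind the report associates with GEO maturity, might resemble this sketch. The headline, author, and breadcrumb values are invented for illustration.

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "Article",
      "headline": "Quarterly sustainability report",
      "datePublished": "2026-03-01",
      "author": { "@type": "Person", "name": "Jane Doe", "jobTitle": "Director of Research" }
    },
    {
      "@type": "BreadcrumbList",
      "itemListElement": [
        { "@type": "ListItem", "position": 1, "name": "Home", "item": "https://www.example.com/" },
        { "@type": "ListItem", "position": 2, "name": "Newsroom", "item": "https://www.example.com/news/" },
        { "@type": "ListItem", "position": 3, "name": "Quarterly sustainability report" }
      ]
    }
  ]
}
</script>
```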
In practical terms, the data suggests only about one-quarter of all Fortune 500 companies have deployed JSON-LD at a level of sophistication relevant to AI visibility. The majority have the infrastructure but aren't using it strategically.
Part three: llms.txt – early adopters, contested efficacy
llms.txt is the newest of the three protocols. According to the specification published in 2024, it serves content in Markdown – the format most efficiently processed by large language models – in a file placed at the root of a site. It uses an H1 header to declare the site name, H2 sections to list URLs, and Markdown to provide contextual detail. The report describes sitemaps as maps for search engines and llms.txt as guidebooks for AI systems.
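Following that structure, a minimal llms.txt might look like the sketch below. The company, sections, and URLs are invented for illustration and do not come from the study.

```markdown
# Example Corp

> Example Corp is a hypothetical manufacturer, used here only to illustrate the llms.txt layout.

## Products

- [Product overview](https://www.example.com/products/): summary of the current product line
- [Technical specifications](https://www.example.com/products/specs/): detailed specification sheets

## Company

- [About us](https://www.example.com/about/): corporate history and leadership
- [Newsroom](https://www.example.com/news/): press releases and announcements
```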
ProGEO.ai found that 37 Fortune 500 companies – 7.4% – have implemented llms.txt. Analysis of the file structure and content across these implementations reveals several patterns. Roughly two-thirds of the typical llms.txt file (66.5% by character count) is prose rather than URLs. The average file size is 6,721 characters, with a median of 31 URLs and a median of eight headers. The required structure calls for one H1 header and six H2 headers. Outlier analysis revealed significant variance: one company implemented 976 H1 headers, undermining the specification's hierarchical logic; another published an llms.txt file of 1.3 million characters – roughly 250,000 tokens, which exceeds the context window of some AI models entirely.
The report also notes that 70.3% of Fortune 500 companies that have implemented llms.txt have also implemented JSON-LD – a co-adoption rate that suggests deliberate, multi-layer thinking about AI visibility rather than isolated experimentation. Eight of the 37 llms.txt adopters have additionally named AI user agents in their robots.txt, and all but one of those take a predominantly permissive posture: across 51 AI directives in these eight companies' robots.txt files, 41 are allow and only one is disallow.
Six companies have implemented all three signals – llms.txt, JSON-LD, and explicit allow directives for AI user agents in robots.txt. According to the report, these companies are Nvidia (nvidia.com), Dell Technologies (dell.com), Builders FirstSource (bldr.com), Sonic Automotive (sonicautomotive.com), FM (fmglobal.com), and Concentrix (concentrix.com). A footnote flags that Concentrix explicitly disallows ClaudeBot from its entire site, although other AI bots inherit partial permission through wildcard rules. Across the full Fortune 500, fewer than 1% of companies have implemented all three signals.
A protocol without consensus
Whether llms.txt actually does anything useful for AI visibility is genuinely unresolved. The specification was published two years ago, in 2024, and, as the report notes, "the evidence for its efficacy is early and contested." PPC Land reported in July 2025 that server log analysis found AI crawlers do not request llms.txt files during website visits, indicating zero actual usage at the time.
Google's John Mueller reiterated throughout 2025 that no AI system was using llms.txt. As of March 2026, however, Google's own Gemini documentation has an active llms.txt file – an implicit acknowledgment of the specification's relevance even if the mechanics of how it influences AI responses remain unclear. OpenAI also serves llms.txt as of March 2026. Anthropic's docs.anthropic.com/llms-full.txt, on the other hand, returns a "page not found" result as of the same date, despite Anthropic having asked Mintlify – a documentation platform – to implement llms.txt support in 2024. In February 2026, a 90-day experiment by OtterlyAI found llms.txt provided no meaningful impact on AI crawler behavior.
The mixed signals from the platforms themselves make it difficult to draw firm conclusions. The specification is not yet a formal standard. It has no RFC equivalent. Platform endorsement has historically been the driver of adoption for comparable protocols – Google's explicit support for JSON-LD in 2019 drove its uptake among enterprises, just as Google, Microsoft, and Yahoo! drove adoption of Schema.org structured data types from 2011 onwards. For llms.txt, that decisive moment of platform endorsement has not yet arrived, even if the direction of travel among some platforms appears positive.
What the data means for the marketing community
The broader context for this research is the ongoing disruption to organic search traffic. Ahrefs research published in February 2026 found that Google's AI Overviews now correlate with a 58% reduction in click-through rates for top-ranking pages – well above the 34.5% decline the same team documented in April 2025. The direction is consistent and, for publishers and brands dependent on organic search traffic, severe. Zero-click searches have become the majority outcome for many query types.
In that context, the question of how brands maintain visibility within AI-generated responses – rather than just traditional search result pages – becomes increasingly material. GEO, as ProGEO.ai frames it, is an attempt to answer that question at the technical infrastructure level. The report is careful to note, however, that technical signals are necessary but not sufficient. According to the report, AI systems cite content that is "authoritative, evidence-based, and structured for extraction." Google's E-E-A-T framework – experience, expertise, authoritativeness, and trustworthiness – describes the content qualities that matter alongside the technical plumbing.
The data from the ProGEO.ai study does not suggest that llms.txt is a silver bullet – the authors explicitly acknowledge the contested evidence around its efficacy. What the data does suggest is that the largest enterprises are beginning to treat AI visibility as a distinct discipline requiring dedicated technical attention. Whether that attention ultimately proves well-directed depends on decisions that AI platforms themselves have not yet made clear.
Timeline
- 1994: robots.txt introduced as an informal standard; adopted by Lycos, AltaVista, and Google as a de facto web crawler control mechanism
- 2011: W3C launches JSON-LD Community Group; Google, Microsoft, and Yahoo! launch Schema.org
- 2014: JSON-LD 1.0 receives W3C Recommendation status
- 2019: Google explicitly states JSON-LD is its preferred structured data format
- 2020: JSON-LD 1.1 receives W3C Recommendation status
- 2022: IETF publishes RFC 9309, formalizing robots.txt as an official internet standard; Microsoft launches a robots.txt tester tool for Bing
- June 2024: Cloudflare publishes analysis of AI bot activity, finding AI bots accessed roughly 39% of the top one million web properties
- 2024: llms.txt specification published; Anthropic asks Mintlify to implement llms.txt support
- September 2024: Cloudflare introduces AI Audit tools for publisher content management
- December 2024: Cloudflare launches Robotcop to enforce robots.txt policies at the network level
- March 2025: Google outlines a pathway for the robots.txt protocol to evolve for emerging AI use cases
- December 2025: OpenAI announces ChatGPT-User will not follow robots.txt for user-initiated browsing
- December 2025: Google updates JavaScript SEO documentation, including the interaction between the nosnippet directive and AI-powered search features
- February 2026: OtterlyAI 90-day experiment finds llms.txt provides no meaningful impact on AI crawler behavior; Ahrefs research finds AI Overviews now correlate with a 58% reduction in organic click-through rates
- March 31, 2026: ProGEO.ai publishes "Signaling the Shift to Generative Engine Optimization (GEO)," measuring Fortune 500 adoption rates of robots.txt, JSON-LD, and llms.txt
Summary
Who: ProGEO.ai, a San Francisco-based data-driven generative engine optimization agency, published the research. The report was authored by Clinton Karr, CMO of ProGEO.ai, who has 20 years of background in corporate communications and content marketing.
What: A report titled "Signaling the Shift to Generative Engine Optimization (GEO)" measured adoption rates of three technical protocols – robots.txt, JSON-LD, and llms.txt – across all 500 companies on the Fortune 500 list. Key findings include: 92.8% of Fortune 500 companies have robots.txt, 53.8% have JSON-LD, and 7.4% have llms.txt. Only 11% of Fortune 500 companies name an AI user agent in robots.txt, and fewer than 1% have implemented all three signals.
When: The research was conducted in March 2026, with the report published and announced on March 31, 2026.
Where: ProGEO.ai scanned all Fortune 500 company websites using a Python-based HTTP client (an illustrative sketch of this kind of scan appears after this summary). The scan covered homepage, robots.txt, and llms.txt files, with internal page sampling conducted for structured data analysis. The announcement was issued from San Francisco.
Why: The study addresses the growing gap between traditional SEO infrastructure and the requirements of AI-powered search platforms. As generative AI platforms increasingly answer queries directly – without sending users to external websites – brands face a question of how to maintain visibility within AI-generated responses. ProGEO.ai positioned the research as a baseline measurement of GEO maturity among the largest US companies, enabling enterprises to benchmark their own technical readiness against the Fortune 500 cohort.
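The report does not publish its scanning code. As a rough illustration only, the following minimal Python sketch checks whether a domain serves robots.txt and llms.txt at its root; the sample domains, timeout, and the use of the `requests` library are assumptions for illustration, not details from the study.

```python
import requests

# Hypothetical sample of domains; the study scanned all 500 Fortune 500 sites.
DOMAINS = ["nvidia.com", "dell.com", "concentrix.com"]

def check_signals(domain: str) -> dict:
    """Return whether the domain serves robots.txt and llms.txt at its root."""
    results = {}
    for filename in ("robots.txt", "llms.txt"):
        url = f"https://{domain}/{filename}"
        try:
            response = requests.get(url, timeout=10, headers={"User-Agent": "geo-scan-example"})
            # Treat any 200 response as "present"; a production scan would also
            # validate the content type and filter out soft-404 pages.
            results[filename] = response.status_code == 200
        except requests.RequestException:
            results[filename] = False
    return results

if __name__ == "__main__":
    for domain in DOMAINS:
        print(domain, check_signals(domain))
```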