A compact reference for identifying the major AI bots in production server logs has circulated on LinkedIn. It brings into one place the user agent strings that now determine who gets access to your content, and who gets blocked.
What the post actually shows
Aquib Israr, an independent SEO and GEO consultant, shared a technical post on LinkedIn on June 28, 2026. It lists the full user agent (UA) strings for seven major web crawlers, formatted with operator contact details as they appear in production HTTP request headers. The list covers OpenAI’s GPTBot, OAI-SearchBot, and ChatGPT-User; Anthropic’s ClaudeBot and Claude-SearchBot; PerplexityBot; and Microsoft’s Bingbot. Israr describes himself as working across the EdTech, FinTech, SaaS, and publishing verticals, with clients in MENA, Australia, and India.
The post is not an official announcement from any of these companies. It is a practitioner-compiled reference. References like this circulate when a technical task requires exact string values that would otherwise take time to track down across several companies’ documentation pages. Examples of such tasks include configuring robots.txt rules, writing server-side allow lists, and auditing log files. That this one attracted attention reflects where the industry is right now. Website owners are making high-stakes decisions about AI crawler access, and those decisions depend on being able to correctly identify who is knocking.
According to the post, the strings are as follows.
GPTBot (training): Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)
OAI-SearchBot: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot)
ChatGPT-User: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ChatGPT-User/1.0; +https://openai.com/bot)
ClaudeBot: Mozilla/5.0 (compatible; ClaudeBot/1.0; [email protected])
Claude-SearchBot: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Claude-SearchBot/1.0; +https://www.anthropic.com/claudesearchbot)
PerplexityBot: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)
Bingbot: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36
Why the format details matter
User agent strings look uniform at a glance, but they differ structurally from one operator to the next. The GPTBot, OAI-SearchBot, and ChatGPT-User strings all follow the same pattern. They start from a Mozilla/5.0 and AppleWebKit/537.36 base, a convention shared with most modern browsers, and then declare compatibility and their specific bot identifier. The AppleWebKit/537.36 reference is a legacy artifact. It does not mean these crawlers use the WebKit rendering engine. It exists because many server-side rules were historically written to allow requests carrying these tokens.
ClaudeBot departs noticeably from that convention. Its string includes neither AppleWebKit nor a Chrome-like rendering base. It declares compatibility directly: Mozilla/5.0 (compatible; ClaudeBot/1.0; [email protected]). The operator contact is an email address rather than a URL. OpenAI follows a different convention and links each bot to a dedicated documentation page. Claude-SearchBot, by contrast, does use the AppleWebKit base and links to the Anthropic page at https://www.anthropic.com/claudesearchbot.
Bingbot is the most elaborate string on the list. It appends a Chrome version identifier (Chrome/116.0.1938.76 Safari/537.36) that appears in neither the OpenAI nor the Anthropic strings. As a result, a server-side filter matching on “Chrome” in the user agent would catch Bingbot but not GPTBot. A filter matching on “bingbot/2.0” would separate Bingbot from the AI model training crawlers entirely. These structural differences are what make a compiled reference useful. Pattern-matching across crawlers in log analysis tools or web application firewalls requires knowing how the strings actually differ.
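To make these differences concrete, the Python sketch below does two things. It extracts the declared bot token and version from each string in the list above, and it shows which bots a few naive substring filters would catch. The regular expression and the helper names (`bot_token`, `matching_bots`) are illustrative assumptions. They are not taken from any particular log tool or WAF.

```python
from __future__ import annotations

import re

# The seven strings from the list above (ClaudeBot's contact reproduced as it appears there).
UA_STRINGS = [
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)",
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot)",
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ChatGPT-User/1.0; +https://openai.com/bot)",
    "Mozilla/5.0 (compatible; ClaudeBot/1.0; [email protected])",
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Claude-SearchBot/1.0; +https://www.anthropic.com/claudesearchbot)",
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)",
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36",
]

# In every listed string the bot declares itself as "compatible; Name/version".
BOT_TOKEN = re.compile(r"compatible;\s*([A-Za-z][A-Za-z-]*)/(\d+(?:\.\d+)*)")


def bot_token(ua: str) -> tuple[str, str] | None:
    """Return (name, version) of the declared bot, or None if there is none."""
    m = BOT_TOKEN.search(ua)
    return (m.group(1), m.group(2)) if m else None


def matching_bots(substring: str) -> list[str]:
    """Names of the listed bots whose UA contains `substring` (case-sensitive)."""
    names = []
    for ua in UA_STRINGS:
        token = bot_token(ua)
        if token and substring in ua:
            names.append(token[0])
    return names


if __name__ == "__main__":
    print(matching_bots("Chrome"))       # ['bingbot']: only Bingbot carries a Chrome token
    print(matching_bots("bingbot/2.0"))  # ['bingbot']
    print(matching_bots("AppleWebKit"))  # every listed bot except ClaudeBot
```

A filter keyed on "AppleWebKit" silently misses ClaudeBot. This is the kind of gap a side-by-side reference helps expose.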
Three OpenAI bots, three different jobs
The list has three distinct OpenAI entries, which reflects a crawling infrastructure documented in increasing detail over the past year. GPTBot, version 1.2 in the current string, is the training crawler. It collects content that may be used to build or refine OpenAI’s generative models. OAI-SearchBot works differently. It runs when ChatGPT performs a web search not directly triggered by a user, retrieving live content to ground its responses. ChatGPT-User runs when a person explicitly asks ChatGPT to access or interact with a specific page.
The distinction matters operationally. OpenAI revised its crawler documentation in December 2025 and removed the robots.txt compliance language for ChatGPT-User. The revision positions ChatGPT-User as a proxy for user browsing rather than an autonomous crawler, and that framing justified exempting it from robots.txt controls. A site that blocks OAI-SearchBot in its robots.txt may therefore still receive ChatGPT-User requests, with no recourse through the standard exclusion mechanism. Under current documentation, GPTBot and OAI-SearchBot remain subject to robots.txt.

An analysis of more than 7 billion server log entries, published in April 2026, measured how this activity shifted. OAI-SearchBot activity more than tripled after the GPT-5 launch in August 2025, a 3.5x increase. GPTBot activity grew 2.9x over the same period. ChatGPT-User events, by contrast, fell 28% between December 2025 and mid-March 2026.
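The practical upshot can be modelled as a small lookup table. The sketch below encodes the roles and robots.txt status as this article describes them. It is not an official registry, and operator documentation can change. The `may_still_fetch` helper is a hypothetical name. It reports which bots could still reach a site after a given set of robots.txt blocks.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Bot:
    operator: str
    role: str               # "training", "search" or "user-triggered"
    respects_robots: bool   # robots.txt status as described in this article


# Roles and robots.txt status as summarised in this article, not an official registry.
BOTS = {
    "GPTBot": Bot("OpenAI", "training", True),
    "OAI-SearchBot": Bot("OpenAI", "search", True),
    "ChatGPT-User": Bot("OpenAI", "user-triggered", False),
    "ClaudeBot": Bot("Anthropic", "training", True),
    "Claude-SearchBot": Bot("Anthropic", "search", True),
    "bingbot": Bot("Microsoft", "search", True),
}


def may_still_fetch(disallowed: set[str], operator: str | None = None) -> list[str]:
    """Bots that could still send requests once `disallowed` tokens are blocked in robots.txt.

    A bot gets through if it is not disallowed, or if it does not respect robots.txt at all.
    """
    return [
        name
        for name, bot in BOTS.items()
        if (operator is None or bot.operator == operator)
        and (name not in disallowed or not bot.respects_robots)
    ]


if __name__ == "__main__":
    # Disallowing both OpenAI search-side bots still leaves ChatGPT-User able to fetch pages.
    print(may_still_fetch({"OAI-SearchBot", "ChatGPT-User"}, operator="OpenAI"))
    # ['GPTBot', 'ChatGPT-User']
```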
Two Anthropic bots with very different reach
Anthropic clarified the roles of its three crawlers in February 2026. According to that documentation, ClaudeBot collects web content that could contribute to AI model training, and Claude-SearchBot navigates the web to improve search result quality. A third crawler, Claude-User, handles real-time user queries but does not appear in Israr’s list. The omission is presumably because its string differs from the others in ways that make it less relevant to production testing and robots.txt configuration.
The structural difference between the ClaudeBot and Claude-SearchBot strings is worth examining. ClaudeBot uses an email contact ([email protected]) rather than a documentation URL, while Claude-SearchBot uses a web URL (+https://www.anthropic.com/claudesearchbot). For website owners trying to verify crawler authenticity beyond the user agent string, the verification mechanisms differ accordingly. According to Anthropic’s updated documentation, its bots respect robots.txt directives and the company will not attempt to bypass CAPTCHAs.

That commitment has not been without controversy. Reddit sued Anthropic in June 2025. The lawsuit alleged that ClaudeBot-identified traffic accessed Reddit’s platform more than 100,000 times after Anthropic had publicly stated it had stopped crawling the site.
A practical limitation applies to both Anthropic strings. Cloudflare researchers documented, and PPC Land covered, that no verification protocol existed for ClaudeBot at the time of writing. The string can therefore be spoofed without any cryptographic check. A server that sees a ClaudeBot user agent cannot use standard mechanisms to confirm the request actually came from Anthropic’s infrastructure. The problem is not unique to Anthropic and applies broadly across the ecosystem. It is still a gap worth understanding when writing allow lists based solely on user agent strings.
Bingbot in a changing context
Bingbot appears alongside AI training crawlers in the post, which reflects a structural shift in how Microsoft’s crawler is perceived. Bingbot is a traditional search crawler. It feeds the Bing search index and, by extension, Bing’s AI-powered search features, including those now integrated with Copilot.

Bingbot’s user agent is the only one in the post that appends a specific Chrome version: Chrome/116.0.1938.76 and Safari/537.36. Chrome 116 was released in August 2023, which suggests the string has not been updated to a newer Chrome version since. Whether this matters in practice depends on the server-side rule that matches it: a version-sensitive rule behaves differently from a broader substring match.
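The difference between a version-sensitive rule and a broader substring match is easy to see in code. In this Python sketch, `BINGBOT_UA_NEWER` is hypothetical. It is the same string with a newer Chrome build, and that version number is invented for illustration. The pinned rule stops matching it, while the token-based rule does not. Both rule functions are illustrative and not taken from any specific product.

```python
import re

BINGBOT_UA = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
    "bingbot/2.0; +http://www.bing.com/bingbot.htm) "
    "Chrome/116.0.1938.76 Safari/537.36"
)

# Hypothetical: the same string after a Chrome version bump (version number invented).
BINGBOT_UA_NEWER = BINGBOT_UA.replace("Chrome/116.0.1938.76", "Chrome/130.0.0.0")

BINGBOT_TOKEN = re.compile(r"\bbingbot/\d+(?:\.\d+)*", re.IGNORECASE)


def pinned_rule(ua: str) -> bool:
    """Version-sensitive: requires the exact Chrome build seen in today's string."""
    return "bingbot/2.0" in ua and "Chrome/116.0.1938.76" in ua


def token_rule(ua: str) -> bool:
    """Version-tolerant: keys only on the bingbot product token."""
    return BINGBOT_TOKEN.search(ua) is not None


if __name__ == "__main__":
    for ua in (BINGBOT_UA, BINGBOT_UA_NEWER):
        print(pinned_rule(ua), token_rule(ua))
    # True True
    # False True
```

Keying on the product token trades precision for resilience. Any spoofed "bingbot/…" string also passes, which is one reason to add the IP checks discussed later in this piece.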
Bingbot’s robots.txt behavior follows standard web conventions: it reads and respects disallow rules. Unlike ChatGPT-User, Bingbot has not been documented as bypassing robots.txt for any category of request. That places it in a different category from the AI-specific crawlers, even though they share server log space and appear together in analytics dashboards.
The verification problem behind every string
A user agent string is a self-declaration, and nothing in the HTTP protocol requires a crawler to send an accurate one. Research documented by PPC Land in January 2026 showed that some AI agents used spoofed user agent strings to bypass blocks. In one case, a single query to xAI’s Grok triggered 16 requests from 12 IP addresses impersonating human browsers. Data from DataDome found that 80% of AI agents do not properly declare themselves when visiting websites.
As a result, practitioners who use the strings in Israr’s post for anything beyond log analysis or robots.txt configuration should pair them with IP range verification. Google maintains daily-updated JSON files containing its crawlers’ IP ranges; PPC Land documented the change when Google moved from weekly to daily refresh cycles. OpenAI publishes a similar list. Anthropic’s verification infrastructure has been documented less consistently. The Google-Agent crawler announcement in March 2026 highlighted the same gap. Google was exploring a web-bot-auth protocol that would use cryptographic signatures to verify bot identity, a mechanism that has not yet been widely adopted across the industry.
Most production configurations follow a two-step workflow. First, match the user agent string to identify the crawler type. Then confirm the originating IP against the relevant operator’s published IP range file. The string alone is sufficient for log analysis but insufficient for security-critical access decisions.
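A minimal Python sketch of that two-step workflow follows. It assumes the range file has a top-level "prefixes" list of "ipv4Prefix"/"ipv6Prefix" entries, the same layout as Google's published crawler JSON. Other operators' files may be structured differently. The sample prefixes are reserved documentation ranges, not real crawler addresses. Fetching and caching the live file is left out.

```python
import ipaddress
import json

# Hypothetical range file shaped like Google's published crawler JSON
# (a "prefixes" list of ipv4Prefix/ipv6Prefix entries). The prefixes are
# reserved documentation ranges, not real crawler addresses.
SAMPLE_RANGES_JSON = """
{"prefixes": [{"ipv4Prefix": "192.0.2.0/24"}, {"ipv6Prefix": "2001:db8::/32"}]}
"""


def load_networks(raw_json: str) -> list:
    """Parse a range file into ipaddress network objects."""
    data = json.loads(raw_json)
    networks = []
    for entry in data.get("prefixes", []):
        prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if prefix:
            networks.append(ipaddress.ip_network(prefix))
    return networks


def classify_request(ua: str, remote_ip: str, token: str, networks: list) -> str:
    """Step 1: does the UA claim `token`? Step 2: is the source IP in a published range?"""
    if token.lower() not in ua.lower():  # simple case-insensitive substring check
        return "not-claimed"
    ip = ipaddress.ip_address(remote_ip)
    # Membership across IP versions (v4 address vs v6 network) is simply False.
    if any(ip in net for net in networks):
        return "verified"
    return "claimed-but-unverified"  # spoofed, or the range file is stale


if __name__ == "__main__":
    nets = load_networks(SAMPLE_RANGES_JSON)
    ua = "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)"
    print(classify_request(ua, "192.0.2.10", "GPTBot", nets))   # verified
    print(classify_request(ua, "203.0.113.7", "GPTBot", nets))  # claimed-but-unverified
```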
What the robots.txt consequences actually are
The strings in Israr’s list are most directly actionable for robots.txt configuration. Three common policies map onto them:
- A site that wants to block OpenAI’s training crawler but stay visible in ChatGPT search results would disallow GPTBot and leave OAI-SearchBot unrestricted.
- A site that wants to block all model training by OpenAI and Anthropic would disallow both GPTBot and ClaudeBot.
- A site that also wants to opt out of AI-powered search results across platforms would additionally disallow OAI-SearchBot and Claude-SearchBot.
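Those three policies translate directly into robots.txt groups. The sketch below generates one file per policy and checks each with Python's standard-library `urllib.robotparser`. That parser matches user agent tokens by simple substring, so it is a convenient sanity check rather than a faithful model of how each operator's crawler reads the file. The policy names are illustrative, and example.com is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Product tokens to disallow for each policy described above (policy names are illustrative).
POLICIES = {
    "block_openai_training": ["GPTBot"],
    "block_all_training": ["GPTBot", "ClaudeBot"],
    "also_block_ai_search": ["GPTBot", "ClaudeBot", "OAI-SearchBot", "Claude-SearchBot"],
}


def build_robots_txt(tokens):
    """One group that disallows the whole site for `tokens`; everyone else is allowed."""
    lines = [f"User-agent: {token}" for token in tokens]
    lines += ["Disallow: /", "", "User-agent: *", "Allow: /"]
    return "\n".join(lines) + "\n"


def can_fetch(robots_txt, token, url="https://example.com/article"):
    """Check a product token against the file with the standard-library parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(token, url)


if __name__ == "__main__":
    for name, tokens in POLICIES.items():
        robots = build_robots_txt(tokens)
        checks = {t: can_fetch(robots, t) for t in ("GPTBot", "OAI-SearchBot", "ClaudeBot", "Claude-SearchBot")}
        print(name, checks)
```

ChatGPT-User is deliberately absent from these policies. Under the documentation change described earlier, listing it would not stop its requests.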
Research published in early 2026 by Rutgers Business School and The Wharton School complicated the opt-out calculus. Publishers who blocked AI crawlers via robots.txt saw total traffic fall by 23%, a finding that reflects how closely content blocking and AI-powered discovery are linked. Blocking the training crawler and blocking the search crawler are not equivalent decisions. The training block removes content from model datasets, while the search block removes the site from AI-generated answers. The distinction matters for publishers who care about both data rights and audience reach.
ProGEO.ai published a study on March 31, 2026 that scanned all 500 companies on the Fortune 500 list. Only 55 of those companies named any AI user agent in their robots.txt. Among them, the most frequently targeted agent was GPTBot, with 32 directives in total, and those directives leaned toward restriction. Directives aimed at search crawlers, by contrast, skewed toward allowing access. The pattern follows the same logic as the string list: different bots, different consequences, different policies.
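A tally of that kind can be sketched in a few lines. The Python below counts allow and disallow directives per named AI agent in a single robots.txt file. It is a hypothetical illustration of the counting, not ProGEO.ai's methodology. The agent list is limited to tokens mentioned in this article.

```python
from __future__ import annotations

from collections import Counter

# AI agent tokens mentioned in this article; robots.txt tokens are case-insensitive.
AI_AGENTS = {a.lower(): a for a in (
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "ClaudeBot", "Claude-SearchBot", "PerplexityBot",
)}


def tally_directives(robots_txt: str) -> Counter:
    """Count (agent, directive) pairs for AI agents named in one robots.txt file."""
    counts: Counter = Counter()
    group: list[str] = []  # user agents of the group currently being read
    in_rules = False       # True once the group has seen an allow/disallow line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if in_rules:  # a user-agent line after rules starts a new group
                group, in_rules = [], False
            group.append(value.lower())
        elif key in ("allow", "disallow"):
            in_rules = True
            for agent in group:
                if agent in AI_AGENTS:
                    counts[(AI_AGENTS[agent], key)] += 1
    return counts
```

Counters for many sites can be combined with `sum(counters, Counter())` to get per-agent totals. Note that an empty `Disallow:` line, which permits everything, is counted here as a disallow directive.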
Scale of what these strings are tracking
The volume of requests carrying these user agent strings has reached levels that matter for server infrastructure. In June 2026, Kinsta published an analysis of more than 10 billion requests across its managed WordPress hosting infrastructure. It found a ClaudeBot-identified crawler hitting add-to-cart URLs 3.75 million times in a single 24-hour period. That is roughly one request every 23 milliseconds, sustained around the clock. A separate misbehaving crawler generated 550 million requests in a single calendar month. These are not background noise but infrastructure events, and identifying which user agent string generated them is the first step in any response.
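The per-request figure follows from simple arithmetic, shown here in Python. The monthly rate assumes a 30-day month, because the source does not say which month the 550 million requests fell in.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MS_PER_DAY = SECONDS_PER_DAY * 1000  # 86,400,000 ms


def mean_interval_ms(requests: int, window_ms: int = MS_PER_DAY) -> float:
    """Average gap between requests if they were spread evenly over the window."""
    return window_ms / requests


def mean_rate_per_second(requests: int, days: int) -> float:
    """Average requests per second over a period of `days` days."""
    return requests / (days * SECONDS_PER_DAY)


if __name__ == "__main__":
    print(round(mean_interval_ms(3_750_000), 2))            # 23.04
    print(round(mean_rate_per_second(550_000_000, 30), 1))  # 212.2, assuming a 30-day month
```

Over a 31-day month, the same 550 million requests work out to about 205 per second.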
Cloudflare’s data for the week ending June 5, 2026 showed bots accounting for 57.4% of all HTML web traffic. HUMAN Security’s May 2026 data found that the rate at which sites block agentic traffic climbed to nearly 9%, up from 8.2% the previous month. Over the same period, total agentic traffic volume dipped 4.3% month over month. Against that backdrop, a reference list of user agent strings is more than a convenience for SEO practitioners. It is a precondition for any informed access policy.
Microsoft Clarity introduced a violations detection feature on June 23, 2026. It shows publishers which bots are ignoring robots.txt directives and which content paths they target. The feature requires CDN integration to work. Once connected, it reports violations as a percentage of total bot requests rather than only as a raw count, which makes it possible to distinguish compliant crawl traffic from active policy violations. That kind of per-bot, per-path visibility depends on the same underlying data as the strings in Israr’s post: the user agent values the bots send when they arrive.
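The metric itself is straightforward to reproduce from logs. The sketch below is a hypothetical illustration, not Clarity's implementation. It takes a robots.txt file and a list of (bot token, path) requests and checks each request with Python's standard-library `urllib.robotparser`. It then reports the share of each bot's requests that the file disallowed, along with which paths were hit.

```python
from collections import Counter
from urllib.robotparser import RobotFileParser


def violation_report(robots_txt, requests):
    """requests: iterable of (bot_token, path). Returns (percent_violating_by_bot, violating_paths)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    totals, violations, paths = Counter(), Counter(), Counter()
    for bot, path in requests:
        totals[bot] += 1
        if not parser.can_fetch(bot, path):  # the request hit a disallowed path
            violations[bot] += 1
            paths[(bot, path)] += 1
    share = {bot: 100.0 * violations[bot] / totals[bot] for bot in totals}
    return share, paths


if __name__ == "__main__":
    robots = "User-agent: ClaudeBot\nDisallow: /cart\n\nUser-agent: *\nAllow: /\n"
    log = [("ClaudeBot", "/cart"), ("ClaudeBot", "/blog"), ("ClaudeBot", "/cart"),
           ("ClaudeBot", "/about"), ("GPTBot", "/cart")]
    share, paths = violation_report(robots, log)
    print(share)  # {'ClaudeBot': 50.0, 'GPTBot': 0.0}
    print(paths)  # Counter({('ClaudeBot', '/cart'): 2})
```

A raw count of two violations looks minor. The percentage shows that half of this bot's traffic ignored the rules, which is the distinction the paragraph above describes.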
Why this matters for marketing teams
For advertising and marketing professionals, the stakes of AI crawler identification extend beyond content access. PPC Land reported figures from HUMAN Security’s 2026 data on AI-driven traffic by volume. OpenAI’s bots, including ChatGPT-User, OAI-SearchBot, GPTBot, and ChatGPT Agent, accounted for roughly 69% of all observed AI-driven traffic. Anthropic identities, including ClaudeBot and Claude-SearchBot, made up roughly 11%. The bots a site chooses to allow directly determine which AI platforms can surface its content in their responses.
Google added Google-Agent to its official crawler documentation on March 20, 2026. The addition formalized a new category: user-triggered fetchers that carry out browsing tasks on behalf of users. Unlike autonomous crawlers, user-triggered fetchers bypass robots.txt. A site that has blocked Google’s training and search crawlers can therefore still receive requests from Google-Agent when a user with AI browsing enabled visits. The user agent strings Israr listed do not explicitly mark which bots are autonomous crawlers and which are user-triggered fetchers. Even so, that distinction is increasingly the context in which those strings are deployed.
For a marketing team managing content strategy in 2026, one distinction is foundational operational knowledge. OAI-SearchBot/1.0 and Claude-SearchBot/1.0 are the identifiers that determine AI search visibility. GPTBot/1.2 and ClaudeBot/1.0 are the ones tied to training datasets. The user agent strings are the entry point to every other decision in that stack.
Timeline
- August 2023 – OpenAI launches GPTBot; 5% of the top 1,000 websites block it at launch
- August 2024 – Blocking of GPTBot reaches 35.7% of the top 1,000 websites, a roughly seven-fold increase
- September 2024 – Google revamps its public crawler and fetcher documentation
- November 2025 – Google launches a dedicated crawling infrastructure documentation site; detailed by PPC Land
- December 2025 – OpenAI revises its ChatGPT crawler documentation, removing robots.txt compliance language for ChatGPT-User; covered by PPC Land
- December 2025 – Cloudflare year-in-review data shows AI bots at 4.2% of all HTML requests globally
- January 6, 2026 – Research documents AI agents using spoofed user agents to bypass website defenses; covered by PPC Land
- February 20, 2026 – Anthropic updates its crawler documentation; PPC Land covers the change on February 25
- March 12, 2026 – Google engineers explain the SaaS architecture behind Googlebot; PPC Land coverage
- March 20, 2026 – Google-Agent added to official crawler documentation; PPC Land covers the addition
- March 31, 2026 – ProGEO.ai study finds only 7.4% of Fortune 500 companies have llms.txt; PPC Land coverage
- April 21, 2026 – OAI-AdsBot documented in OpenAI’s official crawler documentation; PPC Land coverage
- April 24, 2026 – Analysis of 7 billion OpenAI log events shows a 3.5x OAI-SearchBot surge after GPT-5; PPC Land coverage
- June 2026 – Kinsta data finds a ClaudeBot-identified crawler hitting WordPress cart pages 3.75 million times in 24 hours; PPC Land coverage
- June 4, 2026 – HUMAN Security’s May 2026 data shows website blocking of agentic traffic climbing to nearly 9%; PPC Land coverage
- June 23, 2026 – Microsoft Clarity launches robots.txt violations detection in its Bot Analytics dashboard; PPC Land coverage
- June 28, 2026 – Aquib Israr publishes full production user agent strings for seven major AI and search crawlers on LinkedIn
Summary
Who: Aquib Israr, an independent SEO and GEO consultant with more than six years of experience across EdTech, FinTech, SaaS, and publishing, serving clients in MENA, Australia, and India. The reference is relevant to webmasters, SEO practitioners, developers, and marketing professionals who manage content access policies for AI crawlers.
What: A LinkedIn post listing the complete user agent strings for seven crawlers: GPTBot/1.2, OAI-SearchBot/1.0, ChatGPT-User/1.0, ClaudeBot/1.0, Claude-SearchBot/1.0, PerplexityBot/1.0, and bingbot/2.0. Each string is shown with its operator URL or contact address, as it appears in production HTTP request headers.
When: The post was published on June 28, 2026.
Where: Published on LinkedIn. The strings are relevant to server log analysis, robots.txt configuration, web application firewall rules, and the bot analytics dashboards that website operators use worldwide.
Why: User agent strings are the primary way web crawlers identify themselves to servers. They are also the first data point in any decision to allow or block a crawler. AI bot traffic accounts for a growing share of web requests; Cloudflare recorded bots at 57.4% of HTML traffic in early June 2026. Different bots also carry different consequences for model training, AI search visibility, and server load. Accurate identification is therefore a prerequisite for an informed access policy. The reference gathers strings from several companies’ documentation into a single, practitioner-ready format.

