Joost de Valk, the Dutch entrepreneur best known as the creator of Yoast SEO, yesterday published the Website Specification – a platform-agnostic reference document that consolidates the technical requirements for building a modern website into a single, openly licensed resource at specification.website. The spec covers 128 topics across 10 categories, with each entry assigned one of four statuses: required, recommended, optional, or avoid.

Why this exists

The premise is simple. According to de Valk in his LinkedIn announcement, the problem he kept running into was having to point at six different sources to back a single recommendation: “WHATWG for HTML. WCAG for accessibility. IETF for headers. schema.org for structured data. MDN, web.dev, Google Search Central for everything else.”

The result is a reference that pulls these layers together. According to the site’s about page, the web is “a layer cake of standards” – WHATWG defines HTML, W3C ratifies WCAG, the IETF publishes the RFCs behind security headers and /.well-known/ URIs, search engines publish their own rules, and browsers add their own quirks. “Almost nobody carries the whole picture,” the page states. This spec is an attempt to carry it.

The intended audience is broad. According to the about page, the spec is meant for engineers auditing or building a website, designers and PMs trying to scope quality, and – especially – AI agents that need a machine-readable description of what to check. That last category is both significant and novel: specification.website itself is structured so that an AI agent can query the entire spec through an MCP server, a topic marketers and publishers have been grappling with as AI crawlers consume a growing proportion of web traffic.

The 10 categories and their scope

The specification is organized into 10 areas, and the breakdown by topic count shows where de Valk placed the most weight.

Foundations covers 14 topics – the HTML, head, and document basics that every page needs. Required items include the HTML doctype declaration, the lang attribute on the html element, the meta charset declaration (which must appear within the first 1,024 bytes of the HTML), the meta viewport tag, and the title element. According to the spec, the title element is used by “browsers, search engines, screen readers, social previews, and AI agents.” Recommended items in this category include canonical URLs, Open Graph protocol tags, feed discovery, and the Popover API as an alternative to ARIA-dependent JavaScript modals.
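A quick way to check the required Foundations items on a single page is to parse the HTML and record each signal. The sketch below uses only Python’s standard-library `html.parser`; the class and function names are hypothetical, it checks presence rather than correctness (it does not validate the viewport content or the lang value), and it reads the 1,024-byte rule literally by re-parsing just that prefix.

```python
from html.parser import HTMLParser


class FoundationsAudit(HTMLParser):
    """Records the document-level signals the Foundations category marks as required."""

    def __init__(self) -> None:
        super().__init__()
        self.has_doctype = False
        self.html_lang = None
        self.has_charset = False
        self.has_viewport = False
        self.title_text = ""
        self._in_title = False

    def handle_decl(self, decl):
        # "<!DOCTYPE html>" is reported here as "DOCTYPE html".
        if decl.lower().startswith("doctype"):
            self.has_doctype = True

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "html":
            self.html_lang = attrs.get("lang")
        elif tag == "meta":
            if "charset" in attrs:
                self.has_charset = True
            if (attrs.get("name") or "").lower() == "viewport":
                self.has_viewport = True
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_text += data


def audit_foundations(raw: bytes) -> dict:
    full = FoundationsAudit()
    full.feed(raw.decode("utf-8", errors="replace"))
    full.close()
    # Second pass over only the first 1,024 bytes. close() is deliberately not
    # called, so a tag cut off at the boundary is never reported.
    prefix = FoundationsAudit()
    prefix.feed(raw[:1024].decode("utf-8", errors="replace"))
    return {
        "doctype": full.has_doctype,
        "html_lang": bool(full.html_lang),
        "charset_in_first_1024_bytes": prefix.has_charset,
        "viewport": full.has_viewport,
        "title": bool(full.title_text.strip()),
    }
```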

SEO carries 13 topics. Required items are HTTP redirects (with a specific note to use 301 or 308 for permanent moves and never to chain more than necessary), meta robots and the X-Robots-Tag, and heading hierarchy. According to the spec, “soft 404s” – pages that display a not-found message while returning HTTP 200 to a crawler – are listed as an “avoid” item: “search engines treat soft 404s as a quality problem and often refuse to index them.” Recommended topics include robots.txt, XML sitemaps, URL structure, internal linking, and JSON-LD structured data. IndexNow is listed as optional, with a note that Google does not participate in that protocol.
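The redirect rule lends itself to a simple trace. This sketch assumes a crawler has already recorded each URL’s status code and Location header into a dictionary (the data below is invented). It follows the chain, counts hops, flags any redirect that is not a 301 or 308, and stops on loops. The `max_hops` cap of 10 is an arbitrary assumption, not a figure from the spec.

```python
from __future__ import annotations

from dataclasses import dataclass

PERMANENT = {301, 308}


@dataclass
class RedirectReport:
    final_url: str
    hops: int                 # number of redirects followed
    non_permanent: list[int]  # redirect statuses other than 301/308


def trace_redirects(start: str,
                    responses: dict[str, tuple[int, str | None]],
                    max_hops: int = 10) -> RedirectReport:
    """Follow recorded (status, Location) pairs from `start` until a non-redirect."""
    url, hops, non_permanent, seen = start, 0, [], {start}
    while True:
        status, location = responses[url]
        if not (300 <= status < 400) or location is None:
            return RedirectReport(url, hops, non_permanent)
        if status not in PERMANENT:
            non_permanent.append(status)
        hops += 1
        if location in seen or hops > max_hops:
            raise ValueError(f"redirect loop or overlong chain at {location}")
        seen.add(location)
        url = location


recorded = {  # hypothetical crawl data
    "http://example.com/old": (302, "http://example.com/mid"),
    "http://example.com/mid": (301, "https://example.com/new"),
    "https://example.com/new": (200, None),
}
report = trace_redirects("http://example.com/old", recorded)
```

Here the report shows a two-hop chain whose first hop is a temporary 302, so both the chain and the status would be worth fixing.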

Accessibility is the largest single category at 20 topics. Among the required items: color contrast, image alt text, form labels, keyboard navigation, visible focus indicators, semantic HTML and landmark elements, descriptive link text, accessible form errors, document language, reduced motion support, captions and transcripts, accessible data tables, and touch target size. WCAG 2.2 sets the enhanced touch target at 44 by 44 CSS pixels, with a minimum of 24 by 24. Accessibility overlays – third-party JavaScript widgets that claim to make a site WCAG-compliant at runtime – are listed under “avoid.” According to the spec, they “don’t work, often hurt screen-reader users, and attract lawsuits.”
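Color contrast is one of the few required items that reduces to arithmetic. The sketch below implements the WCAG 2.x relative-luminance and contrast-ratio formulas for hex colors, plus a hypothetical helper that classifies a touch target against the 24 and 44 CSS pixel figures above. It ignores the spacing and inline exceptions WCAG 2.2 allows for the 24-pixel minimum, and leaves the pass threshold (4.5:1 for normal text at level AA) to the caller.

```python
def _linearize(channel: int) -> float:
    # sRGB channel (0-255) to linear light. Older WCAG text uses 0.03928 instead
    # of 0.04045; for 8-bit values the two thresholds give identical results.
    c = channel / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(fg: str, bg: str) -> float:
    # (lighter + 0.05) / (darker + 0.05), ranging from 1:1 to 21:1.
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def touch_target_level(width_px: float, height_px: float) -> str:
    if width_px >= 44 and height_px >= 44:
        return "enhanced"
    if width_px >= 24 and height_px >= 24:
        return "minimum"
    return "fail"
```

Black on white gives the maximum 21:1, while mid-grey #777777 on white comes out at roughly 4.48:1, just under the 4.5:1 AA threshold for body text.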

Security covers 12 topics and requires HTTPS with TLS 1.2 or 1.3, HSTS with max-age, includeSubDomains, and preload (described as “an irreversible commitment”), X-Content-Type-Options with nosniff, clickjacking protection via CSP frame-ancestors, and cookie attributes. According to the spec, every cookie should be “Secure, HttpOnly where possible, and have an explicit SameSite,” with the __Host- and __Secure- prefixes recommended for session cookies. Content Security Policy and Referrer-Policy are listed as recommended, as is /.well-known/security.txt, which is described as cheap to publish and something that “dramatically lowers the bar for responsible disclosure.”
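The header and cookie rules can be spot-checked from a single response. This is a minimal sketch, assuming headers arrive as a plain dictionary and each Set-Cookie value is checked separately. It looks only for the directives named above and does not judge values such as how long the HSTS max-age is. The `__Host-` test encodes that prefix’s own rules (Secure, Path=/, no Domain attribute).

```python
def check_security_headers(headers: dict[str, str]) -> list[str]:
    """Report which of the required security headers or directives are missing."""
    h = {name.lower(): value for name, value in headers.items()}
    problems = []
    hsts = [d.strip().lower() for d in h.get("strict-transport-security", "").split(";")]
    if not any(d.startswith("max-age=") for d in hsts):
        problems.append("HSTS: missing max-age")
    for directive in ("includesubdomains", "preload"):
        if directive not in hsts:
            problems.append(f"HSTS: missing {directive}")
    if h.get("x-content-type-options", "").strip().lower() != "nosniff":
        problems.append("X-Content-Type-Options: expected nosniff")
    if "frame-ancestors" not in h.get("content-security-policy", "").lower():
        problems.append("CSP: no frame-ancestors directive")
    return problems


def check_cookie(set_cookie: str) -> list[str]:
    """Check one Set-Cookie header value against the spec's cookie attributes."""
    name = set_cookie.split("=", 1)[0].strip()
    attrs = {}
    for part in set_cookie.split(";")[1:]:
        key, _, value = part.strip().partition("=")
        attrs[key.lower()] = value.strip()
    problems = []
    if "secure" not in attrs:
        problems.append("missing Secure")
    if "httponly" not in attrs:
        problems.append("missing HttpOnly (acceptable only if scripts must read it)")
    if "samesite" not in attrs:
        problems.append("missing explicit SameSite")
    if name.startswith("__Host-") and ("domain" in attrs or attrs.get("path") != "/"):
        problems.append("__Host- cookies need Path=/ and no Domain")
    return problems
```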

Well-Known URIs covers 9 topics organized around the /.well-known/ path prefix, standardized in RFC 8615. No items in this category are marked required. Recommended items include /.well-known/api-catalog, defined in RFC 9727, which publishes a machine-readable index of the APIs and resources a host exposes.
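Because RFC 8615 pins well-known resources to the root of an origin, locating them from any page URL is mechanical. The sketch below builds those URLs and emits a minimal API catalog. The catalog shape (a linkset in the RFC 9264 format, anchored at the catalog itself, with one `item` link per API) is an assumption based on RFC 9727 and should be checked against the RFC, which also pairs it with the `application/linkset+json` media type. The API URL in the test data is hypothetical.

```python
import json
from urllib.parse import urlsplit, urlunsplit


def well_known_url(page_url: str, name: str) -> str:
    """Map any URL on an origin to /.well-known/<name> at that origin's root (RFC 8615)."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, f"/.well-known/{name}", "", ""))


def api_catalog(page_url: str, api_urls: list[str]) -> str:
    # Assumed shape: a linkset anchored at the catalog, with one "item" link per API.
    anchor = well_known_url(page_url, "api-catalog")
    catalog = {"linkset": [{"anchor": anchor, "item": [{"href": url} for url in api_urls]}]}
    return json.dumps(catalog, indent=2)
```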

Agent Readiness, at 18 topics, is arguably the most forward-looking category, and the one that intersects most directly with the discussions the marketing community has been having about AI crawler behavior. It lists /llms.txt as recommended – described as “a proposed markdown file at the site root that gives LLMs a curated index of your most important content” and explicitly noted as an “emerging convention, not a ratified standard.” The /llms-full.txt companion file is listed as optional. Both the adoption challenges of llms.txt and the broader question of which signals AI agents actually use have been extensively debated.
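Since llms.txt is a proposal rather than a standard, any generator is a sketch. The one below follows the layout described in the llms.txt proposal: an H1 title, a blockquote summary, then H2 sections of Markdown links with optional notes. The function name and the example content in the test are hypothetical.

```python
def build_llms_txt(title: str, summary: str,
                   sections: dict[str, list[tuple[str, str, str]]]) -> str:
    """Render an llms.txt body; each section maps to (link text, URL, note) tuples."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            entry = f"- [{name}]({url})"
            # The note after the colon is optional in the proposal.
            lines.append(f"{entry}: {note}" if note else entry)
        lines.append("")
    return "\n".join(lines)
```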

Stable URLs are marked required in this category. “URLs are public contracts,” the spec states. “Breaking them invalidates citations, bookmarks, links, and agent caches.”

Among the optional items are MCP and tool discovery – described as “an emerging way for sites to expose queryable tools to agents over JSON-RPC” – along with A2A agent cards, DNS for AI Discovery (DNS-AID) using SVCB/HTTPS records, NLWeb for conversational interface discovery, WebMCP for browser-native agent tools, and a site-specific convention called Schemamap that indexes one JSON-LD endpoint per resource via /schemamap.xml.

Web Bot Auth, which uses RFC 9421 HTTP Message Signatures to let a bot cryptographically prove its identity, is listed as optional. PPC Land has reported on the limited adoption of WebBotAuth among AI operators, with Google experimenting with the web-bot-auth protocol under the identity https://agent.bot.goog, as noted in updated crawler documentation.
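The core of RFC 9421 is the signature base, a canonical string built from the covered request components that is then signed. The sketch below builds that string for the `@authority` and `@path` derived components and produces `Signature-Input` and `Signature` headers. To stay within the standard library it signs with HMAC-SHA256 (registered in RFC 9421 as `hmac-sha256`). Web Bot Auth itself relies on asymmetric Ed25519 keys that verifiers can fetch, and its drafts define further parameters and headers not shown here, so this is not a working Web Bot Auth client. The key ID, secret, and timestamp are placeholders.

```python
import base64
import hashlib
import hmac


def signature_params(components: list[str], created: int, keyid: str) -> str:
    # Serialized as a structured-field inner list with parameters.
    inner = " ".join(f'"{name}"' for name in components)
    return f'({inner});created={created};keyid="{keyid}"'


def signature_base(values: dict[str, str], created: int, keyid: str) -> tuple[str, str]:
    """One line per covered component, then the @signature-params line."""
    params = signature_params(list(values), created, keyid)
    lines = [f'"{name}": {value}' for name, value in values.items()]
    lines.append(f'"@signature-params": {params}')
    return "\n".join(lines), params


def sign_request(authority: str, path: str, key: bytes, created: int, keyid: str) -> dict[str, str]:
    # @authority is normalized to lower case; @path is the request path.
    base, params = signature_base({"@authority": authority.lower(), "@path": path}, created, keyid)
    # HMAC-SHA256 stands in for the Ed25519 signatures Web Bot Auth uses.
    mac = hmac.new(key, base.encode("ascii"), hashlib.sha256).digest()
    return {
        "Signature-Input": f"sig1={params}",
        "Signature": f"sig1=:{base64.b64encode(mac).decode('ascii')}:",
    }


headers = sign_request("Example.com", "/", b"demo-shared-secret", created=1700000000, keyid="demo-key")
```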

Performance spans 19 topics and includes Core Web Vitals as required. The spec sets the targets at LCP at or below 2.5 seconds, INP at or below 200 milliseconds, and CLS at or below 0.1, measured at the 75th percentile of real users. Image optimization (WebP and AVIF formats, appropriate viewport sizing, explicit dimensions) and compression (brotli preferred, gzip as fallback) are also required. Cache-Control is required, with a specific recommended pattern: immutable plus max-age=31536000 for fingerprinted assets and no-cache for HTML.
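The Core Web Vitals targets and the Cache-Control pattern are both easy to encode. This sketch computes a nearest-rank 75th percentile over raw field samples (CrUX and RUM tools may compute percentiles differently) and compares each metric with the thresholds above. The cache helper returns the two header values the spec recommends. Treating unfingerprinted non-HTML assets like HTML is an assumption of this sketch.

```python
import math

# Thresholds from the spec: LCP in seconds, INP in milliseconds, CLS unitless.
THRESHOLDS = {"LCP": 2.5, "INP": 200, "CLS": 0.1}


def p75(samples: list[float]) -> float:
    """Nearest-rank 75th percentile of raw field samples."""
    ordered = sorted(samples)
    return ordered[math.ceil(0.75 * len(ordered)) - 1]


def assess_core_web_vitals(field_data: dict[str, list[float]]) -> dict[str, bool]:
    # A metric passes when its 75th-percentile value is at or below the threshold.
    return {metric: p75(field_data[metric]) <= limit for metric, limit in THRESHOLDS.items()}


def cache_control_for(is_html: bool, fingerprinted: bool) -> str:
    if fingerprinted and not is_html:
        # Content-hashed filenames never change, so browsers may keep them for a year.
        return "max-age=31536000, immutable"
    # HTML (and, as an assumption here, anything unfingerprinted) is revalidated on each use.
    return "no-cache"
```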

Recommended performance items include lazy loading (with a specific warning: “never on the LCP element”), Speculation Rules, View Transitions, Back/Forward Cache eligibility, HTTP/2 at minimum, HTTP/3 where possible, and the No-Vary-Search response header – which tells browsers that some URL query parameters, such as UTM tracking codes, do not change the response content.
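No-Vary-Search and Speculation Rules are both small, declarative payloads. The sketch below shows the effect the header describes (two URLs that differ only in ignored UTM parameters map to the same cache key), the header value that declares it, and a basic prerender rule. The parameter list, the `/*` match pattern, and the `moderate` eagerness are illustrative choices, not recommendations from the spec.

```python
import json
from urllib.parse import parse_qsl, urlencode, urlsplit

IGNORED_PARAMS = ("utm_source", "utm_medium", "utm_campaign")


def no_vary_search_header(params=IGNORED_PARAMS) -> str:
    # Structured-field value: params=("a" "b" ...)
    inner = " ".join(f'"{p}"' for p in params)
    return f"params=({inner})"


def cache_key(url: str, ignored=IGNORED_PARAMS) -> str:
    """Drop ignored query parameters, mimicking what the header asks the browser to do."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k not in ignored]
    return parts._replace(query=urlencode(kept)).geturl()


# Served inside <script type="speculationrules"> on the page.
SPECULATION_RULES = json.dumps(
    {"prerender": [{"where": {"href_matches": "/*"}, "eagerness": "moderate"}]}
)
```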

Privacy covers 6 topics. Required items are a privacy policy and cookie consent, with the spec noting that “in the EU and UK, non-essential cookies require freely given, informed, specific, and unambiguous opt-in consent.” Recommended items include Global Privacy Control, which California and Colorado require sites to honour, and privacy-respecting analytics described as “aggregate, cookieless, EU-hosted analytics tools.”
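Honouring Global Privacy Control server-side comes down to reading one request header, `Sec-GPC: 1`. The gate below is a hypothetical policy that combines it with recorded opt-in consent, so non-essential tracking runs only when consent exists and no GPC signal is present. How GPC interacts with consent under a particular law is a legal question this sketch does not settle.

```python
def gpc_enabled(headers: dict[str, str]) -> bool:
    # The GPC signal is the request header "Sec-GPC: 1"; header names are case-insensitive.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("sec-gpc") == "1"


def may_run_non_essential_tracking(headers: dict[str, str], opted_in: bool) -> bool:
    # Hypothetical policy: require explicit opt-in consent AND no GPC signal.
    return opted_in and not gpc_enabled(headers)
```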

Resilience carries 5 topics. Custom 404 and 500 error pages are required and must “return the correct HTTP status code, explain what went wrong in plain language, and offer the user a way forward without leaking implementation details.”
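The error-page requirement is mostly about status codes. This minimal WSGI sketch (the routes and copy are hypothetical) returns a real 404 for unknown paths, converts unhandled exceptions into a generic 500 page without exposing the exception text, and gives both pages a link back to the homepage. The `fetch` helper calls the app in-process so the behaviour can be checked without running a server.

```python
import html
from wsgiref.util import setup_testing_defaults


def _home() -> str:
    return "<h1>Home</h1>"


def _broken() -> str:
    raise RuntimeError("db timeout on replica-3")  # simulated internal failure


# Hypothetical routes, purely for illustration.
ROUTES = {"/": _home, "/broken": _broken}

NOT_FOUND = ('<h1>Page not found</h1><p>Nothing lives at {path}.</p>'
             '<p><a href="/">Go to the homepage</a></p>')
SERVER_ERROR = ('<h1>Something went wrong on our side</h1>'
                '<p>Please try again in a minute.</p><p><a href="/">Go to the homepage</a></p>')


def app(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    handler = ROUTES.get(path)
    if handler is None:
        # A real 404 status, never a "soft 404" served with 200.
        status, body = "404 Not Found", NOT_FOUND.format(path=html.escape(path))
    else:
        try:
            status, body = "200 OK", handler()
        except Exception:
            # Details belong in server logs; the page itself leaks nothing.
            status, body = "500 Internal Server Error", SERVER_ERROR
    data = body.encode("utf-8")
    start_response(status, [("Content-Type", "text/html; charset=utf-8"),
                            ("Content-Length", str(len(data)))])
    return [data]


def fetch(path: str) -> tuple[str, bytes]:
    """Call the app in-process, as a WSGI server would."""
    environ: dict = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured: dict = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body
```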

Internationalisation covers 12 topics. Automatic IP-based language redirects are listed as “avoid” – they “trap users in the wrong language, break search crawlers, and break shared links.” Required items include the lang attribute on inline content, and recommended items include hreflang, a language switcher that lists each locale in its own language, and RTL support for Arabic, Hebrew, Persian, and Urdu.
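The hreflang and switcher recommendations translate into a few lines of markup generation. The sketch below emits alternate links with an `x-default` fallback, labels each locale with its own name, marks each switcher link with `lang` and `hreflang`, and sets `dir="rtl"` for the four languages named above. The URLs and locale list are hypothetical. There is no IP lookup anywhere: the user chooses the language.

```python
from html import escape

ENDONYMS = {"en": "English", "nl": "Nederlands", "de": "Deutsch", "ar": "العربية"}
RTL_LANGUAGES = {"ar", "he", "fa", "ur"}  # Arabic, Hebrew, Persian, Urdu


def text_direction(lang: str) -> str:
    return "rtl" if lang.split("-")[0].lower() in RTL_LANGUAGES else "ltr"


def hreflang_tags(alternates: dict[str, str], default_url: str) -> str:
    tags = [f'<link rel="alternate" hreflang="{escape(code)}" href="{escape(url)}">'
            for code, url in alternates.items()]
    tags.append(f'<link rel="alternate" hreflang="x-default" href="{escape(default_url)}">')
    return "\n".join(tags)


def language_switcher(alternates: dict[str, str], current: str) -> str:
    items = []
    for code, url in alternates.items():
        attrs = f'href="{escape(url)}" lang="{escape(code)}" hreflang="{escape(code)}"'
        if text_direction(code) == "rtl":
            attrs += ' dir="rtl"'
        if code == current:
            attrs += ' aria-current="page"'
        items.append(f"<li><a {attrs}>{escape(ENDONYMS.get(code, code))}</a></li>")
    return "<ul>" + "".join(items) + "</ul>"


# Hypothetical site with three locales.
ALTERNATES = {
    "en": "https://example.com/en/",
    "nl": "https://example.com/nl/",
    "ar": "https://example.com/ar/",
}
```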

The MCP server

One technically significant aspect of the launch is the publication of an MCP server at mcp.specification.website/mcp. According to the spec’s MCP page, the server uses the Streamable HTTP transport from the MCP 2025-03-26 protocol revision, is stateless, and requires no authentication.

The server exposes five tools: search (returning ranked spec pages with title, status, category, URL, and body excerpts), list_topics (an index filtered by category or status), get_topic (the full canonical Markdown for one page), get_checklist (a tickable Markdown checklist grouped by category), and get_categories (the 10 top-level categories with topic counts). There is also a prompt – audit_url – that generates an audit plan for a target URL against the required spec items, optionally narrowed to a single category such as security.

According to the spec’s design notes, the MCP server and the website both build from the same Markdown source files, and the server bundles a JSON manifest at build time, meaning there is no runtime parsing and no drift between what the site shows and what the agent queries.

To connect from Claude Desktop, users edit the claude_desktop_config.json file and add a single JSON entry with the transport set to “http” and the URL set to https://mcp.specification.website/mcp. According to the spec, any MCP client speaking the 2025-03-26 Streamable HTTP revision connects with that URL alone, with no client SDK to install and no token to manage.
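For clients without built-in MCP support, the Streamable HTTP transport is plain JSON-RPC over HTTP POST. This sketch only constructs the requests. It assumes the standard MCP `initialize`, `tools/list`, and `tools/call` methods; the client name is hypothetical, and calling `get_categories` with an empty arguments object is an assumption. A full client would also send the `notifications/initialized` notification after initializing and be prepared to read either a JSON body or a `text/event-stream` response.

```python
import json
import urllib.request

ENDPOINT = "https://mcp.specification.website/mcp"


def rpc_request(method: str, params: dict, request_id: int) -> urllib.request.Request:
    """Build (but do not send) a JSON-RPC 2.0 POST for the Streamable HTTP transport."""
    payload = {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP servers may answer with plain JSON or an SSE stream.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


initialize = rpc_request("initialize", {
    "protocolVersion": "2025-03-26",
    "capabilities": {},
    "clientInfo": {"name": "spec-audit-sketch", "version": "0.1"},  # hypothetical client name
}, request_id=1)
list_tools = rpc_request("tools/list", {}, request_id=2)
# Tool name from the spec's MCP page; the empty arguments object is an assumption.
get_categories = rpc_request("tools/call", {"name": "get_categories", "arguments": {}}, request_id=3)

# To actually send one: urllib.request.urlopen(initialize).read()
```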

The discovery architecture mirrors what the spec itself recommends: a server card at /.well-known/mcp/server-card.json, a Link header with rel="mcp" on every page, an entry in /.well-known/api-catalog per RFC 9727, and cross-linking from the relevant spec page. MCP as an infrastructure layer has seen rapid adoption across the marketing technology stack throughout 2025, with Google Analytics and marketing data platforms both shipping servers earlier in the year.
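An agent landing on any page can discover the server from the response headers alone. The sketch below pulls targets with a given relation out of a Link header and derives the server-card location from any page URL. The header value in the usage lines is invented for illustration, and the simple regex does not handle quoted parameter values that contain commas or semicolons.

```python
import re
from urllib.parse import urljoin

# One link-value: <target> followed by zero or more ;-separated parameters.
LINK_VALUE = re.compile(r'<([^>]*)>\s*((?:;\s*[^;,]+)*)')


def links_with_rel(link_header: str, rel: str, base_url: str) -> list[str]:
    """Return absolute targets whose rel parameter contains `rel`."""
    found = []
    for target, params in LINK_VALUE.findall(link_header):
        for param in params.split(";"):
            key, _, value = param.strip().partition("=")
            if key.lower() == "rel" and rel in value.strip('"').split():
                found.append(urljoin(base_url, target))
    return found


def server_card_url(page_url: str) -> str:
    # The card sits under /.well-known/ at the origin root, whatever page we started on.
    return urljoin(page_url, "/.well-known/mcp/server-card.json")


# Invented header value, for illustration only.
example_header = ('<https://mcp.specification.website/mcp>; rel="mcp", '
                  '</feed.xml>; rel="alternate"; type="application/rss+xml"')
```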

How it’s constructed and licensed

The site is a static build generated with Astro and deployed to Cloudflare Pages from GitHub. Content lives in plain Markdown files under src/content/spec/. According to the about page, “a page with no source link is not merged” – sources are part of the content schema, not an optional annotation. The spec draws on MDN Web Docs, the Yoast Developer Portal for SEO patterns, Equalize Digital for accessibility, the WP Accessibility Knowledge Base, and the standards bodies themselves: WHATWG, W3C, IETF, WCAG, and IANA.

Content is licensed under CC BY 4.0 and code under MIT. According to the about page, the invitation is explicit: “use it, fork it, ship it.”

The site itself implements every item in the spec. According to the about page, this includes a strict Content Security Policy served as a response header, the full security header set via Cloudflare Pages _headers, /.well-known/security.txt, llms.txt and llms-full.txt for AI agents, robots.txt, a sitemap index, an RSS feed, JSON-LD structured data on every page, Open Graph and Twitter Cards, and WCAG-aligned color contrast, focus indicators, a skip link, and semantic landmarks. If any contradiction appears between what the spec says and what the site does, the about page is explicit: “that is a bug.”

The timing connects to several pressures that marketers and web professionals are navigating simultaneously. AI agents increasingly traverse the web independently of users – retail AI crawlers now access sites 198 times per visit, compared with Google’s ratio of one visit to six crawls. The infrastructure for how sites signal content permissions to those agents is fragmented, with the IAB Tech Lab, the IETF, and individual platforms each proposing different mechanisms.

The question of what constitutes a well-formed site for this environment has no single, authoritative, vendor-neutral answer. Google Search Central documentation is platform-specific. The WCAG guidelines cover only accessibility. The IETF RFCs cover specific protocols. There has been no single document that maps it all.

What the Website Specification offers is a consolidated audit surface. The checklist published at specification.website/checklist lists every item in a flat, tickable, print-friendly format organized by category. A practitioner can work through Foundations (14 items), SEO (13), Accessibility (20), Security (12), Well-Known URIs (9), Agent Readiness (18), Performance (19), Privacy (6), Resilience (5), and Internationalisation (12) – arriving at a total of 128 checkpoints drawn from primary sources.
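The category counts can be checked, and audit progress tracked, with a simple tally. The `progress` helper below is hypothetical; the counts simply restate the ones listed above.

```python
# Topic counts per category, as listed in the checklist.
CHECKLIST = {
    "Foundations": 14, "SEO": 13, "Accessibility": 20, "Security": 12,
    "Well-Known URIs": 9, "Agent Readiness": 18, "Performance": 19,
    "Privacy": 6, "Resilience": 5, "Internationalisation": 12,
}

TOTAL = sum(CHECKLIST.values())


def progress(ticked: dict[str, int]) -> float:
    """Share of all checkpoints ticked; per-category counts are capped at that category's size."""
    done = sum(min(ticked.get(category, 0), size) for category, size in CHECKLIST.items())
    return done / TOTAL
```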

For digital agencies, the spec provides a structured way to define quality standards across client work without inventing those standards from scratch. For in-house teams, it maps the gap between what a site currently does and what current standards require – including standards that have emerged specifically around AI agent interaction, which are not yet reflected in most site auditing tools.

Reactions

The LinkedIn post announcing the launch drew significant engagement within hours. Chudi Nnorukam-Krisdiva, who described building agent tooling for AI operators, wrote that the agent readiness section was where they kept getting stuck when building a tool called AVR, noting that without a single spec to point at, they had been “stitching them manually for citability.dev audits and the seams showed every time.” They described the gap not as whether sites ship llms.txt and JSON-LD – “most don’t” – but as whether sites can tell if AI agents are actually traversing those files. “The spec and the signal are two different things,” they wrote.

Core Web Vitals consultant Erwin Hofman raised a technical concern about the Interaction to Next Paint advice in the spec, specifically around scheduler.postTask, and noted that preloading fonts carries a cost on slow connections, where it can become render-blocking despite inlining critical CSS.

Lily Ray, who describes herself as founder of Algorythmic and VP of SEO and AI Search at Amsive, was also active in the comments, reacting with six likes, which made her engagement among the most visible from the SEO community.


Summary

Who: Joost de Valk, entrepreneur and creator of Yoast SEO, along with open-source contributors.

What: The Website Specification – a platform-agnostic, openly licensed reference covering 128 technical topics across 10 categories, along with an open MCP server that lets AI agents query the full spec without authentication.

When: Published May 30, 2026, with the LinkedIn announcement and site launch on the same day.

Where: Available at specification.website, with the MCP server at mcp.specification.website/mcp and source code on GitHub under a CC BY 4.0 content licence and MIT code licence.

Why: To consolidate web standards from WHATWG, W3C, IETF, WCAG, and search engine documentation into a single, opinionated, source-cited reference that engineers, designers, PMs, and AI agents can use to audit or build sites – addressing the absence of any single platform-agnostic document covering foundations, SEO, accessibility, security, agent readiness, performance, privacy, resilience, and internationalisation together.

