Google yesterday launched the Open Knowledge Format (OKF), a vendor-neutral, open specification built on plain markdown files and YAML frontmatter that formalizes how AI agents store, share, and consume organizational knowledge. Published on June 12, 2026, the format is now available on GitHub alongside three sample bundles and two reference implementations.
The announcement comes from Sam McVeety, Tech Lead for Data Analytics at Google Cloud, and Amir Hormati, Tech Lead for BigQuery at Google Cloud, both writing on the Google Cloud Blog. According to Google, OKF v0.1 addresses a specific and persistent failure point in how organizations build agentic systems: the lack of any agreed-upon format for the knowledge these agents need to operate.
The fragmented knowledge problem
Ask any engineering team how their AI agents find answers to internal questions – how a particular metric is calculated, which database table describes a given concept, or what the correct join path is between two systems – and the answer tends to be the same. The knowledge exists, scattered across wikis, code comments, metadata catalog APIs, shared drives, and the heads of a few senior engineers, but no single format connects them.
According to Google, this fragmentation means that every agent builder is “solving the same context-assembly problem from scratch, every catalog vendor is reinventing the same data models, and the knowledge itself is locked behind whichever surface created it.” The direct consequence for teams building agentic systems is that the agents themselves cannot reliably answer questions about internal structure without first retrieving context from a patchwork of incompatible sources.
OKF is positioned as a format-level answer to that problem, not a new service, platform, or runtime. According to the specification, the format is “intentionally minimal: a directory of markdown files with YAML frontmatter. There is no schema registry, no central authority, and no required tooling. If you can cat a file, you can read OKF; if you can git clone a repo, you can ship it.”
What OKF v0.1 actually specifies
The technical structure is deliberately simple. An OKF bundle is a directory tree of markdown files. Each file represents a concept – a single unit of knowledge that may describe a tangible asset such as a database table or API endpoint, or an abstract idea such as a business metric or an incident playbook. The concept’s file path within the bundle serves as its unique identifier: a file at tables/orders.md carries the concept ID tables/orders.
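The path-to-identifier rule can be illustrated with a few lines of Python. This is a minimal sketch rather than anything taken from the specification or the reference implementations; the function name concept_id is hypothetical, and it assumes paths are given relative to the bundle root with forward slashes.

```python
from pathlib import PurePosixPath


def concept_id(relative_path):
    """Derive a concept ID from a bundle-relative markdown path.

    Hypothetical helper: the ID is assumed to be the path with
    the trailing .md extension removed.
    """
    path = PurePosixPath(relative_path)
    if path.suffix != ".md":
        raise ValueError(f"not a markdown file: {relative_path}")
    return path.with_suffix("").as_posix()


print(concept_id("tables/orders.md"))
```

Because the identifier is just the path, moving or renaming a file changes its ID, which is one reason tolerance for broken links matters in a format like this.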
Each idea file has two elements. The primary is a YAML frontmatter block, delimited by --- on the high of the file and a closing --- by itself line. The second is a normal markdown physique containing free-form content material. In keeping with the specification, just one frontmatter area is strictly required: sort, a brief string figuring out the form of idea, with instance values together with BigQuery Desk, API Endpoint, Metric, and Playbook.
Extra advisable fields – listed in precedence order within the spec – are title, description, useful resource, tags, and timestamp. The useful resource area carries a URI that uniquely identifies the underlying asset, whereas timestamp makes use of ISO 8601 format for the datetime of final significant change. Producers might embrace any further keys past these; customers should tolerate unknown keys with out rejecting the doc.
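To make the two-part layout concrete, the sketch below splits a concept file into frontmatter and body and then reads the top-level fields. It is an illustration under stated assumptions, not the reference parser: the sample document, the resource URI, and the extra owner_team key are invented for the example, and parse_flat_fields is a deliberately naive stand-in that only understands simple key: value lines. A real consumer would hand the frontmatter text to a proper YAML parser.

```python
def split_frontmatter(text):
    """Split a concept file into (frontmatter_text, body_text)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening --- delimiter")
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    raise ValueError("missing closing --- delimiter")


def parse_flat_fields(frontmatter):
    """Naive stand-in for a YAML parser: top-level `key: value` lines only."""
    fields = {}
    for line in frontmatter.splitlines():
        # Skip blanks, comments, nested (indented) lines, and list items.
        if not line.strip() or line[0] in " \t-#":
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip("\"'")
    return fields


# Hypothetical concept document; resource URI and owner_team are made up.
DOC = """---
type: BigQuery Table
title: Orders
resource: bigquery://example-project/sales/orders
timestamp: 2026-05-28T14:30:00Z
owner_team: checkout
---
# Orders

One row per customer order.
"""

frontmatter, body = split_frontmatter(DOC)
fields = parse_flat_fields(frontmatter)

# Only `type` is required; unknown keys such as owner_team are kept, not rejected.
if not fields.get("type"):
    raise ValueError("concept is missing the required type field")
print(fields["type"])
print(sorted(fields))
```

The consumer here never checks the key names against an allow-list, which is the simplest way to honor the requirement that unknown keys be tolerated.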
Two reserved filenames carry defined meaning at any level of the directory hierarchy. An index.md file, when present, enumerates the directory’s contents to support progressive disclosure, letting a human or agent see what is available before opening individual documents. A log.md file, when present, records the history of changes to that scope in a flat list of date-grouped entries, most recent first, using ISO 8601 date headings in YYYY-MM-DD form.
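As an illustration of the log.md convention, the following sketch pulls the date headings out of a log and checks that they run newest first. The exact heading level is not stated above, so the regular expression accepts any markdown heading level; that choice, the sample log text, and the function names are all assumptions made for this example.

```python
import re
from datetime import date

# Assumption: a date heading is any markdown heading whose text is YYYY-MM-DD.
DATE_HEADING = re.compile(r"^#{1,6}\s+(\d{4}-\d{2}-\d{2})\s*$")


def log_dates(log_text):
    """Return the dates found in date headings, in document order."""
    dates = []
    for line in log_text.splitlines():
        match = DATE_HEADING.match(line)
        if match:
            dates.append(date.fromisoformat(match.group(1)))
    return dates


def is_newest_first(dates):
    """True when each date is strictly later than the one that follows it."""
    return all(a > b for a, b in zip(dates, dates[1:]))


# Hypothetical log.md content.
SAMPLE_LOG = """# Log

## 2026-06-11
- Imported reference agent output.

## 2026-05-28
- Drafted table concepts.
"""

dates = log_dates(SAMPLE_LOG)
print(dates)
print(is_newest_first(dates))
```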
Concepts link to each other using standard markdown links. According to the specification, a link from concept A to concept B “asserts a relationship. The specific kind of relationship (parent/child, references, joins-with, depends-on, etc.) is conveyed by the surrounding prose, not by the link itself.” This design means consumers building a graph view treat all links as directed edges of an untyped relationship. Broken links – links whose target does not exist in the bundle – are explicitly permitted; they may represent knowledge not yet written.
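A consumer building a graph view might extract edges along these lines. This is a rough sketch, not the reference visualizer's logic: it assumes links are written as relative paths to other .md files and resolved against the linking file's directory, which the text above does not spell out, and the regular expression covers only simple inline links. The file names and function names are hypothetical.

```python
import posixpath
import re

# Simple inline links only: [text](target). Reference-style links are ignored.
LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def strip_md(path):
    """Drop a trailing .md extension to get a concept ID."""
    return path[:-3] if path.endswith(".md") else path


def edges_for(concept_path, body):
    """Return (source_id, target_id) pairs for links in one concept body."""
    base = posixpath.dirname(concept_path)
    edges = []
    for target in LINK.findall(body):
        if "://" in target or target.startswith("#"):
            continue  # external URL or in-page anchor, not a concept link
        target = target.split("#", 1)[0]
        resolved = posixpath.normpath(posixpath.join(base, target))
        edges.append((strip_md(concept_path), strip_md(resolved)))
    return edges


# Hypothetical concept body.
BODY = (
    "Computed from [orders](../tables/orders.md) and "
    "[refunds](../tables/refunds.md). See [docs](https://example.com)."
)

edges = edges_for("metrics/revenue.md", BODY)
existing = {"metrics/revenue", "tables/orders"}

# Broken links are reported separately but still kept as edges.
broken = [edge for edge in edges if edge[1] not in existing]
print(edges)
print(broken)
```

Every edge carries only a source and a target; any label such as joins-with would have to come from reading the surrounding prose, as the specification intends.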
Conformance criteria
According to the specification, a bundle is conformant with OKF v0.1 if three conditions are met: every non-reserved markdown file in the tree contains a parseable YAML frontmatter block; every frontmatter block contains a non-empty type field; and every reserved filename, where present, follows the structure described in the spec.
Consumers must not reject a bundle because of missing optional frontmatter fields, unknown type values, unknown additional keys, broken cross-links, or missing index files. According to the specification, “this permissive consumption model is intentional: OKF is meant to remain useful as bundles grow, get refactored, and are partially generated by agents.”
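The first two conformance conditions lend themselves to a short check. The sketch below is an assumption-laden illustration, not an official validator: it uses a naive key: value reader instead of a YAML parser, it skips the third condition (the structure of index.md and log.md) entirely, and the function names and sample files are invented for the example.

```python
import tempfile
from pathlib import Path

RESERVED = {"index.md", "log.md"}


def frontmatter_fields(text):
    """Return top-level frontmatter fields, or None if no block is found."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    closing = [i for i in range(1, len(lines)) if lines[i].strip() == "---"]
    if not closing:
        return None
    fields = {}
    for line in lines[1:closing[0]]:
        if not line.strip() or line[0] in " \t-#":
            continue  # blank, nested, list item, or comment
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip("\"'")
    return fields


def check_bundle(root):
    """List problems for conditions one and two; an empty list means none found."""
    problems = []
    for path in sorted(Path(root).rglob("*.md")):
        if path.name in RESERVED:
            continue  # reserved-file structure is not checked in this sketch
        rel = path.relative_to(root).as_posix()
        fields = frontmatter_fields(path.read_text(encoding="utf-8"))
        if fields is None:
            problems.append(f"{rel}: no frontmatter block")
        elif not fields.get("type"):
            problems.append(f"{rel}: missing or empty type")
    return problems


# Demo on a throwaway bundle with one valid and one invalid concept.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "tables").mkdir()
    (root / "tables" / "orders.md").write_text(
        "---\ntype: BigQuery Table\nowner_team: checkout\n---\nBody\n",
        encoding="utf-8",
    )
    (root / "tables" / "draft.md").write_text(
        "---\ntitle: No type yet\n---\nBody\n", encoding="utf-8"
    )
    print(check_bundle(root))
```

Note what the check leaves alone: unknown keys, unfamiliar type values, and absent index files produce no problems, in line with the permissive consumption model.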
The full v0.1 specification, including conformance criteria, cross-linking rules, and the set of reserved filenames, fits in a single page spanning 451 lines and 14.7 kilobytes on GitHub. Version numbering follows a major/minor scheme: a minor version bump introduces backward-compatible additions such as new optional fields or new conventional section headings; a major version bump may introduce breaking changes such as renaming required fields.
The LLM wiki pattern behind the design
The intellectual lineage for OKF draws explicitly on what Google describes as the “LLM wiki” pattern – teams maintaining a shared markdown library that agents can read and update over time, rather than retrieving the same facts from scattered documents on every query.
According to Google, Andrej Karpathy, the AI researcher and educator, articulates the core idea in a GitHub gist. As quoted by Google in the announcement, Karpathy writes: “LLMs don’t get bored, don’t forget to update a cross-reference, and can touch 15 files in one pass.” The bookkeeping that causes humans to abandon personal wikis is, according to Google’s framing, exactly the kind of work language models handle well.
Several naming conventions have emerged from this pattern independently: the AGENTS.md and CLAUDE.md family of convention files that developers drop into repositories; Obsidian vaults wired to coding agents; repos full of index.md and log.md artifacts that agents consult before doing real work; and “metadata as code” approaches that store catalog metadata alongside source code. Each instance is structurally similar but bespoke, and according to Google, none of them is “intentionally designed to cooperate.”
OKF differs from these patterns primarily in being formally specified – pinning down the small set of rules needed for interoperability without dictating tooling or content models.
Three design principles
Google describes three explicit principles behind the format’s design. The first is being minimally opinionated: the spec requires exactly one field (type) of every concept, leaving everything else – what types exist, what additional fields to include, what sections the body has – to the producer. According to Google, the specification “defines the interoperability surface, not the content model.”
The second principle is producer/consumer independence. A bundle hand-authored by a human can be consumed by an AI agent. A bundle generated by a metadata export pipeline can be browsed in a visualizer. A bundle synthesized by one language model can be queried by another. The format is described as the contract; the tooling at each end is independently swappable.
The third principle is that OKF is a format, not a platform. According to Google, the format “is not tied to any particular cloud, database, model provider, or agent framework. It will never require a proprietary account or SDK to read, write, or serve.”
Reference implementations and sample bundles
Google is shipping two reference implementations alongside the specification. The first is an enrichment agent that walks a BigQuery dataset, drafts an OKF concept document for every table and view, then runs a second language model pass that crawls authoritative documentation and enriches each concept with citations, schemas, and join paths. The second is a static HTML visualizer that turns any OKF bundle into an interactive graph view in a single self-contained file, with no backend, no install on the viewing side, and no data leaving the page.
Three ready-to-browse sample bundles are available in the repository: one for GA4 e-commerce data, one for the Stack Overflow public dataset, and one for the Bitcoin public dataset. All three were produced by the reference enrichment agent and committed to the repository as living examples of conformant OKF.
According to Google, these are “proofs of concept, deliberately. The agent demonstrates one way to produce OKF; nothing about the format requires a specific agent framework or LLM. The visualizer demonstrates one way to consume it; nothing about the format requires HTML or a graph view.”
Alongside the GitHub release, Google updated its Cloud Knowledge Catalog so that it can ingest Open Knowledge Format bundles and serve them to agents.
What the spec explicitly does not do
The specification draws clear lines around what it does not attempt to standardize. According to the SPEC.md document on GitHub, OKF is not designed to define a fixed taxonomy of concept types, prescribe storage or serving infrastructure, or replace domain-specific schemas such as Avro, Protobuf, or OpenAPI. OKF references these schemas; it does not subsume them.
This restraint is notable. Other data catalog and metadata management projects have historically tried to standardize more: vocabulary taxonomies, governance workflows, lineage formats, and access control models. OKF avoids all of that, treating those concerns as the producer’s problem while standardizing only the file-level structure that makes bundles interoperable.
Industry context: agentic AI and the context problem
The OKF announcement lands at a moment when the advertising and marketing technology industry is actively grappling with how AI agents access internal knowledge. PPC Land has tracked the rapid build-out of agentic infrastructure across the advertising technology sector since at least late 2025, when the Ad Context Protocol launched and the industry began debating how agents should discover inventory, execute campaigns, and communicate across platforms without bespoke integrations.
The underlying problem OKF addresses – fragmented context that agents must assemble from scattered, incompatible surfaces – maps directly onto challenges the agentic advertising standards community has encountered. IAB Tech Lab’s AAMP initiative, formalized on February 26, 2026, includes interoperability standards designed in part to address how agents share and consume information across organizational boundaries. OKF operates at a lower and more generic layer, not specifically in the advertising domain, but the same structural challenge applies.
Meta, which published its own approach to agentic data warehouse access in August 2025, used a folder-based structure to represent hierarchical data warehouse knowledge as text compatible with language models. According to Meta’s engineers at the time, “LLMs communicate through text so the hierarchical structure of data warehouse nicely mapped to a folder structure.” OKF formalizes that intuition into a versioned, open specification.
Google Cloud’s own agentic AI trajectory has accelerated significantly since mid-2025: the company launched the Google Analytics MCP server in July 2025 for natural language queries against analytics data, published a 54-page agentic AI framework document in November 2025, and introduced Ask Advisor at Google Marketing Live in May 2026. OKF sits upstream of all these surface-layer tools: it addresses what the agents know before they act, not how they act.
Why the format-first bet matters
The choice to publish OKF as a format rather than a service has significant practical consequences. A format can be adopted without a commercial relationship, without a pricing model, and without migration risk. It can live in a git repository alongside code. It can be read by humans in any text editor and parsed by any agent that can read files. It can be diffed, versioned, and reviewed through standard developer tooling.
According to Google, the analogy is to what Karpathy calls the “lingua franca” that allows knowledge produced by one team to be consumed by another – whether that knowledge was written by a human, generated by an enrichment pipeline, or synthesized by a language model. The format is designed to be the medium of exchange, neutral to the parties on either side.
That is also the risk. Formats succeed when enough parties speak them. The specification is currently at v0.1, described explicitly as “a starting point, not a finished standard.” It is published on GitHub under Google’s GoogleCloudPlatform account, which provides credibility but also raises the question of how governance will evolve as external contributors propose changes.
The OKF repository shows two pull requests, 12 issues, and five top-level directories at the time of publication: agents, okf, bundles, samples, and src. The spec itself is at okf/SPEC.md. The enrichment agent code is in the agents directory. The format is versioned: the initial commit from Amir Hormati – the BigQuery Tech Lead credited as co-author – carries the commit hash ee67a5c with the message “Import Open Knowledge Format reference enrichment agent.”
Practical implications for data and marketing teams
For organizations running AI-powered analytics workflows – a description that increasingly applies to large marketing and media buying operations – OKF raises a concrete question: where is the organizational knowledge that agents need, and in what form does it exist?
A media agency running programmatic campaigns at scale may have knowledge distributed across a data warehouse schema, a campaign taxonomy documented in a shared spreadsheet, runbooks maintained in Confluence, and metric definitions embedded in dashboard configuration files. None of these surfaces is natively readable by an AI agent in a consistent way. OKF provides a path to consolidate them: export or author concept documents for each knowledge unit, organize them into a bundle, link them, and then point agents at the bundle rather than at the original scattered surfaces.
The reference enrichment agent demonstrates one automation path for teams with BigQuery data: run the agent against a dataset, get a first-pass OKF bundle with schemas, join paths, and citations, then refine manually. According to Google, “the format is the contribution. The tools we’ve shipped exist to make it real, and to lower the cost of trying it out.”
Timeline
- May 28, 2026 – OKF concept documents carry a sample timestamp of 2026-05-28T14:30:00Z, indicating the reference enrichment agent was run against the sample bundles on that date.
- June 11, 2026 – Amir Hormati commits the reference enrichment agent to the GoogleCloudPlatform/knowledge-catalog repository with commit hash ee67a5c (listed as “yesterday” relative to the June 12 publication date).
- June 12, 2026 – Google today publishes the Open Knowledge Format (OKF) v0.1 specification on GitHub, alongside the blog post “Introducing the Open Knowledge Format” authored by Sam McVeety and Amir Hormati on the Google Cloud Blog. The format is published as open and vendor-neutral; Google simultaneously updates Cloud Knowledge Catalog to ingest OKF bundles.
Summary
Who: Sam McVeety (Tech Lead, Data Analytics, Google Cloud) and Amir Hormati (Tech Lead, BigQuery, Google Cloud) authored the announcement; the format is published by the Google Cloud Data Cloud team under the GoogleCloudPlatform GitHub organization.
What: The Open Knowledge Format (OKF) v0.1 is an open, vendor-neutral specification for representing organizational knowledge as a directory of markdown files with YAML frontmatter. It formalizes the “LLM wiki” pattern into an interoperable format. The only required field per concept document is type. Two reference implementations ship with the spec: a BigQuery enrichment agent and a static HTML visualizer. Three sample bundles for GA4 e-commerce, Stack Overflow, and Bitcoin public datasets are included.
When: Published today, June 12, 2026, on the Google Cloud Blog. The commit introducing the reference enrichment agent was made the day before, on June 11, 2026. The sample bundles carry timestamps from May 28, 2026.
Where: The specification, reference implementations, and sample bundles are available in the GoogleCloudPlatform/knowledge-catalog GitHub repository, with the spec itself at okf/SPEC.md. Google’s Cloud Knowledge Catalog has also been updated to ingest OKF bundles natively.
Why: Organizations building agentic AI systems face a fragmented context landscape in which the internal knowledge that agents need – table schemas, metric definitions, join paths, runbooks, deprecation notices – is scattered across metadata catalogs with proprietary APIs, wikis, shared drives, and code comments. Every agent builder currently solves this context-assembly problem independently. OKF attempts to establish a common format that any producer can write and any consumer can read, without requiring a proprietary runtime, SDK, or commercial relationship.