Gracenote, the content intelligence business unit of Nielsen, this week published research showing that a leading large language model, operating without access to external data, fabricated every measured attribute for nearly one in five of the 2,600 movie and TV titles it was tested against. The study is among the largest structured comparisons yet conducted between grounded and ungrounded AI responses in the entertainment sector.

The report, titled “Plot Holes in AI: Why Ungrounded LLMs Can’t Fix Content Discovery,” was released on June 10, 2026. It covers titles from 13 countries and evaluates six core metadata attributes: title, description, actors, genres, release year, and runtime. These are the fields streaming services most commonly display when presenting a movie or TV show to a viewer – the information a person uses to decide whether to watch.

What the study tested and how

Gracenote used two instances of Claude Sonnet 4.0 to generate responses. Both clients received identical instructions on how to find information; the difference was the data source each was permitted to use. The ungrounded client drew exclusively from its training data. The grounded client queried Gracenote’s global video dataset via a Video MCP Server – an implementation of the Model Context Protocol that connects an LLM to Gracenote’s continuously updated entertainment knowledge graph.

The study sourced the top 100 movies and one episode from each of the top 100 TV shows in every market, using a grounded Gemini Pro 3.1 client to compile the list in March 2026. The 13 countries tested were Australia, Brazil, Canada, France, Germany, Japan, Mexico, the Netherlands, South Korea, Spain, Sweden, the United Kingdom, and the United States. Responses were scored on attribute-level accuracy and a composite factual quality metric, producing four bands: zero quality, low quality, medium quality, and high quality.
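The article does not give Gracenote's scoring formula, so the following is only a minimal sketch of how attribute-level matching could roll up into four bands. The six field names come from the report; the exact-match comparison, the equal weighting, and the band thresholds are assumptions made purely for illustration.

```python
# Hypothetical sketch: attribute-level matching rolled up into four quality bands.
# Field names follow the report; weights and thresholds are illustrative assumptions.
ATTRIBUTES = ["title", "description", "actors", "genres", "release_year", "runtime"]


def attribute_accuracy(ungrounded: dict, grounded: dict) -> float:
    """Share of the six attributes where the ungrounded answer equals the grounded one."""
    matches = sum(
        1
        for name in ATTRIBUTES
        if name in ungrounded and ungrounded[name] == grounded.get(name)
    )
    return matches / len(ATTRIBUTES)


def quality_band(score: float) -> str:
    """Map a 0..1 score to a band (thresholds are assumed, not Gracenote's)."""
    if score == 0:
        return "zero"
    if score < 0.5:
        return "low"
    if score < 0.8:
        return "medium"
    return "high"


# Placeholder records: "x" stands for the grounded value, "y" for a wrong answer.
grounded = {name: "x" for name in ATTRIBUTES}
ungrounded = dict(grounded, description="y", actors="y", genres="y", runtime="y")
print(quality_band(attribute_accuracy(ungrounded, grounded)))  # low (2 of 6 fields match)
```

A real scorer would need fuzzy matching for descriptions and partial overlap for cast lists; exact equality is used here only to keep the sketch short.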

According to Gracenote, less than one-third of all ungrounded LLM responses across the 2,600 titles qualified as high quality. In Mexico and the Netherlands, fewer than 10% did. For the United States specifically, nearly half of all responses scored at low or zero quality.

The hallucination count: 506 titles

The headline figure is stark. According to Gracenote, the ungrounded model hallucinated all measured metadata for 506 of the 2,600 titles tested – that’s 19.5% of the total sample. These were not cases of partial error or minor factual slip; the model generated entirely fabricated content for every measured field.
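The headline percentage follows directly from the two counts Gracenote reports. A quick check in Python, using only those two figures:

```python
# Figures quoted in the report: 506 fully hallucinated titles out of 2,600 tested.
fully_hallucinated = 506
titles_tested = 2600

share = fully_hallucinated / titles_tested * 100
print(f"{share:.1f}%")  # 19.5%
```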

The country-level breakdown shows significant variation. The Netherlands produced the highest rate of total hallucinations, at 28.3% of its titles – 56 out of the country’s 200-title sample. Germany had the lowest, at 9.7%, with 19 titles fully hallucinated. Australia saw 52 titles with 100% hallucinated information, representing 26.5% of its sample. The US figure sat at 21.5%, covering 43 titles.

These numbers matter because streaming services are beginning to integrate LLMs into their search and recommendation interfaces. The question is not whether AI will be used for content discovery – that transition is already under way – but whether the models powering those interfaces will have reliable information to work with.

Cast accuracy: 53% for top U.S. movies

Actor matching produced some of the lowest accuracy scores in the study. According to Gracenote, the ungrounded model correctly matched leading actors for only 53% of the top 100 U.S. movies when compared against the grounded data. For the broader U.S. title set, the match rate was 56%. The lowest actor match rate across all 13 markets was 34%, recorded in the Netherlands. The highest was 71%, in South Korea.

Genre accuracy was generally higher. Match rates ranged from 73% in Spain to 86% in the United Kingdom. Even so, the figures represent a meaningful error rate for any system expected to help viewers navigate a catalog.

The discrepancy between actor and genre accuracy reflects a structural difference in how LLMs process these categories. Genre labels tend to be broader and more stable; the model may identify a film as a thriller or a comedy with reasonable probability even when it has pulled the wrong content. Cast details are specific and individuated – a wrong actor is simply wrong, with no partial credit available.

Why similar titles cause failures

According to Gracenote, recency is not always the primary source of error. Titles with similar names proved equally disruptive.

The report documents a case involving the 2025 thriller “Heel,” about a couple who kidnap a 19-year-old criminal. The ungrounded model matched the title and year correctly, but returned a description, cast, and genre drawn from “Heels,” a drama series that ran on Starz from 2021 to 2023. The composite accuracy score for the Heel response was 50%; the factual assessment score was 10%. The actors attributed to the 2025 film – Stephen Amell, Alexander Ludwig, and Alison Luff – are cast members from the Starz series. The actual cast of the 2025 movie included Stephen Graham, Andrea Riseborough, and Anson Boon.

A second example involved the 2024 horror-thriller “Trucker,” directed by Errol Sack. The film was released 16 years after a James Mottern-directed movie of the same name from 2008. The ungrounded model returned the year correctly but provided a description, cast, genre classification, and runtime derived from the 2008 film. Its cast response named Michelle Monaghan, Nathan Fillion, and Benjamin Bratt – none of whom appear in the 2024 title. The actual cast comprised Katherine Gibson, Dare Taylor, and Chuck Cirino. The composite accuracy was 35%; the factual assessment 20%.

These examples illustrate a specific failure mode: the model recognizes the title but selects the wrong version of the content from its probability distribution. Because LLMs synthesize responses rather than retrieving discrete records, they have no native mechanism to distinguish between two films sharing a name across a 16-year gap.
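The same failure mode can be reproduced with a toy lookup. This is a minimal sketch, not Gracenote's implementation: the `CATALOG` dictionary and both lookup functions are hypothetical, and the cast lists are the ones quoted in the report. Keying a structured record on title and release year separates the two films; matching on title alone cannot.

```python
# Hypothetical toy catalog keyed on (title, release year); cast lists are from the report.
CATALOG = {
    ("Trucker", 2008): ["Michelle Monaghan", "Nathan Fillion", "Benjamin Bratt"],
    ("Trucker", 2024): ["Katherine Gibson", "Dare Taylor", "Chuck Cirino"],
}


def lookup_by_title(title: str) -> list[str]:
    """Title-only match: returns the first record with that title, whatever its year."""
    for (name, _year), cast in CATALOG.items():
        if name == title:
            return cast
    return []


def lookup_by_title_and_year(title: str, year: int) -> list[str]:
    """Exact key match: the year disambiguates films that share a name."""
    return CATALOG.get((title, year), [])


print(lookup_by_title("Trucker"))                 # 2008 cast, wrong for the 2024 film
print(lookup_by_title_and_year("Trucker", 2024))  # 2024 cast
```

A grounded client has access to something like the second function; an ungrounded model behaves more like the first, returning whichever version of the title is most strongly represented.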

Recent releases expose training cutoff limits

The ungrounded model’s blind spots extended to films released close to or after its training cutoff. According to Gracenote, the model was unable to provide any information about “GOAT,” a 2026 animated film starring Caleb McLaughlin and Gabrielle Union that earned nearly 200 million dollars globally before arriving on Netflix on May 14, 2026. The model’s explicit response when queried about the film was: “I don’t have reliable information about a U.S. movie titled ‘GOAT’ from 2026.”

Other titles the model could not address included the 2026 Chris Pratt thriller “Mercy,” the 2026 Rachel McAdams-led horror film “Send Help,” the 2025 Iranian drama “It Was Just an Accident,” the 2025 German science fiction film “Good Luck, Have Fun, Don’t Die,” and the 2025 Canadian horror movie “Whistle.”

The structural explanation, according to Gracenote, is rooted in how frontier models are built and deployed. A new model’s training cutoff is not the date it is released. Data curation, deduplication, synthetic data generation, safety alignment, and post-training procedures introduce a multi-month lag between the latest data a model was exposed to and the moment it becomes available to users. After release, the same model may remain in production deployment for months longer. According to Gracenote, at minimum a six-month delay should be assumed before any recently published content can influence a frontier model’s training weights.
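The six-month rule of thumb can be turned into a simple routing check. This is a hedged sketch only: the function name, the 183-day approximation of six months, and the idea of using it to decide when a query needs grounding are assumptions for illustration, not something the report describes.

```python
from datetime import date

# Assumption: "six months" approximated as 183 days.
MIN_LAG_DAYS = 183


def inside_lag_window(release: date, query_day: date, min_lag_days: int = MIN_LAG_DAYS) -> bool:
    """True if the title is too recent for any deployed model to be assumed to know it."""
    return (query_day - release).days < min_lag_days


# On the report's publication date, every 2026 release falls inside the window.
print(inside_lag_window(date(2026, 1, 1), date(2026, 6, 10)))  # True (160 days)
print(inside_lag_window(date(2025, 1, 1), date(2026, 6, 10)))  # False
```

Because six months is described as a minimum and models stay in production for months after release, the real gap will often be longer than this check assumes.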

Hallucination rates tracked closely with content age in English-speaking countries. According to Gracenote, for titles released before 2025, hallucination rates in Australia, Canada, the United Kingdom, and the United States ranged from 11% to 23%. For titles from 2025, those rates climbed considerably across most markets. For 2026 titles, rates reached 96% in South Korea, 95% in the Netherlands, and 86% in Sweden.

Non-English markets showed a different pattern. Hallucination rates for older titles were higher in countries whose languages are less represented in the foundational training data. Spain, for instance, showed a hallucination rate of 70% even for pre-2025 titles, compared with 12% for the United States.

Quality score breakdown

Gracenote applied a four-tier quality scoring system to the complete output for each of the 2,600 titles: zero quality, low quality, medium quality, and high quality. The scores combined attribute matching accuracy with an overall factual assessment.

Across all markets, the aggregated zero, low, and medium quality results ranged from 77% to 91% of responses, meaning genuinely high-quality outputs were a minority everywhere in the study. The US breakdown showed 37% of responses at zero quality, 11.5% at low quality, 24% at medium quality, and 27.5% at high quality.
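The US band percentages are consistent with the earlier statement that nearly half of US responses scored at low or zero quality. A small check, using only the four percentages quoted for the United States:

```python
# US quality band shares as quoted in the report (percent of responses).
us_bands = {"zero": 37.0, "low": 11.5, "medium": 24.0, "high": 27.5}

total = sum(us_bands.values())
low_or_zero = us_bands["zero"] + us_bands["low"]

print(total)        # 100.0
print(low_or_zero)  # 48.5
```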

Australia placed 25.5% of responses in the zero quality band and 23% in the high quality band. The UK performed relatively better on zero quality at 18.4%, with 27.6% reaching high quality. Brazil and Spain registered some of the lowest high-quality rates at 18.2% and 14.9% respectively. Mexico and the Netherlands were the weakest overall performers, with high-quality results at 8.7% and 7.6%.

These figures matter especially for streaming platforms deploying conversational search or recommendation interfaces. A viewer who asks an AI assistant whether a particular actor appears in a given film, or what genre it belongs to, has a meaningful probability of receiving an incorrect answer from an ungrounded system.

The MCP Server as the proposed fix

Gracenote’s study is also, explicitly, a case for its own product. The company launched its Video MCP Server in September 2025, which PPC Land covered at the time. The server connects an LLM to Gracenote’s entertainment knowledge graph, which covers 40 million titles across 260 streaming catalogs in 70 languages and 80 countries, allowing the model to retrieve factual content data rather than generate probabilistic guesses from training alone.

The grounding approach used in the study is the same architecture Gracenote is offering commercially. According to Gracenote, platforms can access it either through direct data licensing agreements or by connecting their LLMs to the Video MCP Server.

The commercial stakes around this position became clearer in February 2026. On February 10, Gracenote renewed its multi-year strategic partnership with Google to provide entertainment metadata for Google’s products, including AI and Gemini experiences. On February 25, Gracenote signed an agreement with Samsung to power LLM-enabled search, conversational content discovery, and lean-back curation across Samsung’s global smart TV platform. Both deals position Gracenote’s structured content data as the grounding layer for AI-powered entertainment interfaces at scale.

That context sits alongside an earlier Gracenote finding. A report published on April 8, 2026, based on a survey of more than 4,000 U.S. consumers conducted in January and February, found that three in four Americans verify AI chatbot answers before acting on them for TV, movie, or sports content. The June 10 accuracy study provides a quantified explanation for that skepticism.

“Viewers don’t care where a bad answer comes from. If it’s wrong, they blame the service,” said Tyler Bell, senior vice president of product at Gracenote. “That’s why grounding matters. For companies building the next generation of entertainment discovery, generative AI will only deliver on its promise when it is grounded in verified content intelligence that replaces plausible guesses with accurate facts – reducing friction, deepening engagement and strengthening loyalty.”

The connection to advertising is direct. Gracenote has been building out a content targeting infrastructure alongside its AI grounding work. In December 2025, it launched Content Connect, a platform enabling agencies, brands, SSPs, and DSPs to execute program-level CTV ad targeting using standardized show metadata. In September 2025, Index Exchange became the first SSP to embed Gracenote contextual intelligence directly into its platform. A report published by Gracenote on May 14, 2026 and covered by PPC Land found that 86% of U.S. media planners would shift linear TV budgets to CTV if show-level data were available to them.

When AI interfaces become the primary entry point into a streaming catalog – and the early adoption signals suggest that transition is accelerating – the accuracy of the metadata those interfaces return determines not just the viewer experience but the commercial performance of the content itself. A hallucinated cast list or a wrong genre classification affects not only whether a viewer finds the right film, but whether advertisers can reach the intended audience in the right context. According to a 2026 Gracenote survey, 66% of consumers believe AI will be important in providing good entertainment experiences.

The MCP server format that Gracenote uses in this study has become a recognized infrastructure layer across advertising technology more broadly. PPC Land has tracked Amazon’s MCP Server rollout, FreeWheel’s MCP deployment for premium video, and Meta’s own AI connectors as part of a broader shift toward agent-accessible advertising infrastructure.

According to a separate 2025 Gracenote global streaming consumer survey, 84% of streaming video viewers say the overall user experience of a streaming service is important to them. Another 2025 Gracenote survey found that only 64% of streaming video viewers know what they want to watch when they turn on their TVs, placing enormous weight on what the search and discovery layer returns.

Findings scheduled for StreamTV Show presentation

Gracenote will present the report’s findings at the StreamTV Show on June 18, 2026, in Denver. Nandita Arora, senior director of product at Gracenote, will join a panel session titled “Reimagining Content Discovery,” which will address how AI, personalization, unified search, and new user experience approaches are shaping how streaming services connect viewers with content.

Timeline

  • September 3, 2025: Gracenote launches its Video MCP Server, connecting LLMs to its entertainment knowledge graph to mitigate hallucinations in TV platform search
  • September 16, 2025: Index Exchange becomes the first SSP to integrate Gracenote contextual intelligence, embedding brand safety segments and Do-Not-Air controls
  • October 2025: Gracenote publishes research highlighting the contextual targeting gap in CTV advertising, based on a survey of 600 U.S. brand and agency executives
  • December 4, 2025: Nielsen’s Gracenote launches Content Connect, enabling agencies, brands, SSPs, and DSPs to execute program-level CTV ad targeting
  • December 17, 2025: Gracenote expands its On Sports platform, linking sports documentaries and shoulder content across 160 leagues in more than 50 countries
  • February 10, 2026: Gracenote renews its multi-year strategic partnership with Google to support entertainment metadata for AI and Gemini experiences
  • February 25, 2026: Gracenote and Samsung Electronics announce an agreement to power LLM-enabled search and discovery across Samsung’s global smart TV platform
  • March 2026: Gracenote uses a grounded Gemini Pro 3.1 client to compile the list of top titles across 13 markets for the hallucination study
  • April 8, 2026: Gracenote publishes “TV Search and Discovery in the AI Era,” finding three in four U.S. consumers verify AI chatbot answers about entertainment content
  • May 14, 2026: Gracenote publishes “TV Audiences Have Shifted. Ad Dollars Have Not,” with 86% of U.S. media planners saying they would shift linear budgets to CTV with show-level data
  • June 10, 2026: Gracenote releases “Plot Holes in AI: Why Ungrounded LLMs Can’t Fix Content Discovery,” documenting 19.5% full hallucination rates across 2,600 titles in 13 countries
  • June 18, 2026: Nandita Arora, senior director of product at Gracenote, is scheduled to present findings at the StreamTV Show in Denver

Summary

Who: Gracenote, the content intelligence business unit of Nielsen, led by Tyler Bell, senior vice president of product. Nandita Arora, senior director of product, will present findings publicly on June 18, 2026.

What: A study comparing the accuracy of an ungrounded large language model with one grounded in Gracenote’s entertainment data via an MCP server, across 2,600 movie and TV titles in 13 countries. The ungrounded model hallucinated all measured metadata for 506 titles – 19.5% of the total. For the top 100 U.S. movies, actor accuracy stood at 53%. Less than one-third of all responses across the study qualified as high quality.

When: The report was published on June 10, 2026. The title list was compiled in March 2026. The StreamTV Show presentation is scheduled for June 18, 2026, in Denver.

Where: The study covered 13 countries: Australia, Brazil, Canada, France, Germany, Japan, Mexico, the Netherlands, South Korea, Spain, Sweden, the United Kingdom, and the United States. The grounded client accessed Gracenote’s global video data through an MCP server. The ungrounded client was restricted to training data only.

Why: Streaming services are deploying LLMs for conversational search and content recommendations, but ungrounded models do not have reliable access to current or accurate entertainment metadata. The study quantifies the accuracy gap and argues that grounding – connecting LLMs to verified external data sources – is a prerequisite for reliable AI-powered content discovery at scale.

