- Gemini 3 Flash typically invents answers instead of admitting when it doesn't know something
- The problem arises with factual or high-stakes questions
- But it still tests as one of the most accurate and capable AI models
Gemini 3 Flash is fast and clever. But if you ask it something it doesn't actually know – something obscure or tricky or simply outside its training – it will almost always try to bluff its way through, according to a recent analysis from the independent testing group Artificial Analysis.
Gemini 3 Flash hit 91% on the "hallucination rate" portion of the AA-Omniscience benchmark. That means when it didn't have the answer, it gave one anyway almost every time, one that was entirely fictional.
AI chatbots making things up has been a problem since they first debuted. Knowing when to stop and say "I don't know" is just as important as knowing how to answer in the first place. Currently, Google's Gemini 3 Flash doesn't do this very well. That's what the test is for: seeing whether a model can distinguish actual knowledge from a guess.
Lest the number distract from reality, it should be noted that Gemini's high hallucination rate doesn't mean 91% of its total answers are false. Instead, it means that in situations where the correct answer would be "I don't know," it fabricated an answer 91% of the time. That's a subtle but important distinction, and one with real-world implications, especially as Gemini is integrated into more products like Google Search.
"Okay, it's not only me. Gemini 3 Flash has a 91% hallucination rate on the Artificial Analysis Omniscience Hallucination Rate benchmark!? Can you actually use this for anything serious? I wonder if the reason Anthropic models are so good at coding is that they hallucinate so much… https://t.co/b3CZbX9pHw" — December 18, 2025
This result doesn't diminish the power and utility of Gemini 3. The model remains among the highest-performing in general-purpose tests and ranks alongside, or even ahead of, the latest versions of ChatGPT and Claude. It simply errs on the side of confidence when it should be modest.
The overconfidence in answering crops up with Gemini's rivals as well. What makes Gemini's number stand out is how often it happens in these uncertainty scenarios, where there's simply no correct answer in the training data or no definitive public source to point to.
Hallucination Honesty
Part of the issue is simply that generative AI models are largely word-prediction tools, and predicting a new word is not the same as evaluating truth. And that means the default behavior is to come up with a new word, even when saying “I don’t know” would be more honest.
OpenAI has started addressing this and getting its models to recognize what they don’t know and say so clearly. It’s a tough thing to train, because reward models don’t typically value a blank response over a confident (but wrong) one. Still, OpenAI has made it a goal for the development of future models.
And Gemini does usually cite sources when it can. But even then, it doesn’t always pause when it should. That wouldn’t matter much if Gemini were just a research model, but as Gemini becomes the voice behind many Google features, being confidently wrong could affect quite a lot.
There's also a design choice here. Many users expect their AI assistant to respond quickly and smoothly. Saying "I'm not sure" or "Let me check on that" might feel clunky in a chatbot context. But it's probably better than being misled. Generative AI still isn't always reliable, and double-checking any AI response is always a good idea.


