{"id":89194,"date":"2025-08-09T23:27:39","date_gmt":"2025-08-09T23:27:39","guid":{"rendered":"https:\/\/mailinvest.blog\/index.php\/2025\/08\/09\/i-dont-care-about-benchmarks-this-prompt-is-how-i-test-llms-and-chatgpt-5-failed\/"},"modified":"2025-08-09T23:28:51","modified_gmt":"2025-08-09T23:28:51","slug":"i-dont-care-about-benchmarks-this-prompt-is-how-i-test-llms-and-chatgpt-5-failed","status":"publish","type":"post","link":"https:\/\/mailinvest.blog\/index.php\/2025\/08\/09\/i-dont-care-about-benchmarks-this-prompt-is-how-i-test-llms-and-chatgpt-5-failed\/","title":{"rendered":"I Don&#8217;t Care About Benchmarks\u2014This Prompt Is How I Test LLMs and ChatGPT 5 Failed"},"content":{"rendered":"<div>\n<p>Companies love throwing around \u201cbenchmarks\u201d and \u201ctoken counts\u201d to claim superiority, but none of that matters to the end user. So, I have my own way of testing them: a single prompt.<\/p>\n<h2 id=\"the-simple-riddle-that-once-broke-every-model\">\n                        The Simple Riddle That Once Broke Every Model<br \/>\n               <\/h2>\n<p>There\u2019s no shortage of LLMs on the market right now. Everyone\u2019s promising the smartest, fastest, most \u201chuman\u201d model, but for everyday use, none of that matters if the answers don\u2019t hold up.<\/p>\n<p>I don\u2019t care if a model is trained on a gazillion zettabytes or has a context window the size of an ocean\u2014I care if it can handle a task I throw at it right now. 
And for that, I have, or at least had, a go-to prompt.<\/p>\n<p>A while back, I made a list of <a href=\"https:\/\/www.makeuseof.com\/easy-questions-chatgpt-cant-answer\/\" target=\"_blank\">questions ChatGPT still can&#8217;t answer<\/a>. I tested ChatGPT, Gemini, and Perplexity with a set of basic riddles simple enough for any human to answer instantly. My favorite was the \u201cimmediate left\u201d problem:<\/p>\n<section class=\"emaki-custom-block emaki-custom-pullquote\" data-nosnippet=\"\">\n<div class=\"emaki-custom pullquote\" id=\"custom_block_5\">\n<div class=\"custom_block-content pullquote\">\n<p>&#8220;Alan, Bob, Colin, Dave, and Emily are standing in a circle. Alan is on Bob\u2019s immediate left. Bob is on Colin\u2019s immediate left. Colin is on Dave\u2019s immediate left. Dave is on Emily\u2019s immediate left. Who&#8217;s on Alan\u2019s immediate right?&#8221;<\/p>\n<\/p><\/div>\n<\/p><\/div>\n<\/section>\n<p>It\u2019s basic spatial reasoning. If Alan is on Bob\u2019s immediate left, then Bob is on Alan\u2019s immediate right. Yet every model tripped over it back then.<\/p>\n<p>When ChatGPT 5 launched, I ignored the launch benchmarks and went straight for my riddle. This time, it got it right. A reader once warned me that publishing these prompts might end up training the models themselves. Maybe that\u2019s what happened. Who knows.<\/p>\n<p>So I had lost my favorite LLM stress test\u2026 until I dug back into that old list and found one they still couldn\u2019t handle.<\/p>\n<h2 id=\"the-probability-puzzle-chatgpt-5-fails\">\n                        The Probability Puzzle ChatGPT 5 Fails<br \/>\n               <\/h2>\n<p>From my original set, only one prompt managed to trip up ChatGPT 5. 
It\u2019s a basic probability question:<\/p>\n<section class=\"emaki-custom-block emaki-custom-pullquote\" data-nosnippet=\"\">\n<div class=\"emaki-custom pullquote\" id=\"custom_block_12\">\n<div class=\"custom_block-content pullquote\">\n<p>&#8220;You\u2019re playing Russian roulette with a six-shooter revolver. Your opponent loads five bullets, spins the cylinder, and fires at himself. Click\u2014empty. He gives you the choice: spin again before firing at you, or don\u2019t. What do you choose?&#8221;<\/p>\n<\/p><\/div>\n<\/p><\/div>\n<\/section>\n<p>The correct answer: yes, he should spin again. With five of the six chambers loaded and the only empty chamber already used, not spinning means the next chamber is guaranteed to hold a bullet. Spinning resets the odds to a 1 in 6 chance of survival.<\/p>\n<p>But ChatGPT did not get it. ChatGPT 5 said not to spin, then went on to write a detailed explanation\u2026 that perfectly supported the opposite conclusion. The contradiction was right there, in the same message.<\/p>\n<div class=\"body-img landscape \">\n<div class=\"responsive-img  image-expandable  img-article-item\" style=\"padding-bottom:53.448275862069%\" data-img-url=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/chatgpt-answering-the-revolver-riddle.png\" data-modal-id=\"single-image-modal\" data-modal-container-id=\"single-image-modal-container\" data-img-caption=\"&quot;&quot;\">\n<figure>\n        <picture><source media=\"(min-width: 1024px)\" data-srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/chatgpt-answering-the-revolver-riddle.png?q=49&amp;fit=crop&amp;w=825&amp;dpr=2\" srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/chatgpt-answering-the-revolver-riddle.png?q=49&amp;fit=crop&amp;w=825&amp;dpr=2\"\/><source media=\"(min-width: 768px)\" 
data-srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/chatgpt-answering-the-revolver-riddle.png?q=49&amp;fit=crop&amp;w=825&amp;dpr=2\" srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/chatgpt-answering-the-revolver-riddle.png?q=49&amp;fit=crop&amp;w=825&amp;dpr=2\"\/><source media=\"(min-width: 481px)\" data-srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/chatgpt-answering-the-revolver-riddle.png?q=49&amp;fit=crop&amp;w=800&amp;dpr=2\" srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/chatgpt-answering-the-revolver-riddle.png?q=49&amp;fit=crop&amp;w=800&amp;dpr=2\"\/><source media=\"(min-width: 0px)\" data-srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/chatgpt-answering-the-revolver-riddle.png?q=49&amp;fit=crop&amp;w=500&amp;dpr=2\" srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/chatgpt-answering-the-revolver-riddle.png?q=49&amp;fit=crop&amp;w=500&amp;dpr=2\"\/><img width=\"1972\" height=\"1054\" loading=\"lazy\" decoding=\"async\" alt=\"ChatGPT answering the revolver riddle\" data-img-url=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/chatgpt-answering-the-revolver-riddle.png\" src=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/chatgpt-answering-the-revolver-riddle.png\" style=\"display:block;height:auto;max-width:100%;\"\/><\/p>\n<\/picture>\n<\/figure><\/div>\n<\/p><\/div>\n<p>Gemini 2.5 Flash made the exact same mistake of answering one way, then reasoning the other. 
Both did it in a way that made it obvious they chose an answer first, and only thought about the math afterward.<\/p>\n<div class=\"body-img landscape \">\n<div class=\"responsive-img  image-expandable  img-article-item\" style=\"padding-bottom:55.704008221994%\" data-img-url=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/gemini-answering-the-revolver-riddle.png\" data-modal-id=\"single-image-modal\" data-modal-container-id=\"single-image-modal-container\" data-img-caption=\"&quot;&quot;\">\n<figure>\n        <picture><source media=\"(min-width: 1024px)\" data-srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/gemini-answering-the-revolver-riddle.png?q=49&amp;fit=crop&amp;w=825&amp;dpr=2\" srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/gemini-answering-the-revolver-riddle.png?q=49&amp;fit=crop&amp;w=825&amp;dpr=2\"\/><source media=\"(min-width: 768px)\" data-srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/gemini-answering-the-revolver-riddle.png?q=49&amp;fit=crop&amp;w=825&amp;dpr=2\" srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/gemini-answering-the-revolver-riddle.png?q=49&amp;fit=crop&amp;w=825&amp;dpr=2\"\/><source media=\"(min-width: 481px)\" data-srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/gemini-answering-the-revolver-riddle.png?q=49&amp;fit=crop&amp;w=800&amp;dpr=2\" srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/gemini-answering-the-revolver-riddle.png?q=49&amp;fit=crop&amp;w=800&amp;dpr=2\"\/><source media=\"(min-width: 0px)\" data-srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/gemini-answering-the-revolver-riddle.png?q=49&amp;fit=crop&amp;w=500&amp;dpr=2\" 
srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/gemini-answering-the-revolver-riddle.png?q=49&amp;fit=crop&amp;w=500&amp;dpr=2\"\/><img width=\"1946\" height=\"1084\" loading=\"lazy\" decoding=\"async\" alt=\"Gemini answering the revolver riddle\" data-img-url=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/gemini-answering-the-revolver-riddle.png\" src=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/gemini-answering-the-revolver-riddle.png\" style=\"display:block;height:auto;max-width:100%;\"\/><\/p>\n<\/picture>\n<\/figure><\/div>\n<\/p><\/div>\n<h2 id=\"why-the-models-tripped-over-this-prompt\">\n                        Why the Models Tripped Over This Prompt<br \/>\n               <\/h2>\n<p>I asked ChatGPT 5 to point out the contradiction in its own message. It spotted it, but claimed I had answered incorrectly in the first place\u2014even though I hadn\u2019t given an answer at all. 
When corrected, it brushed it off with the usual \u201cyeah, that\u2019s on me\u201d apology.<\/p>\n<div class=\"body-img landscape \">\n<div class=\"responsive-img  image-expandable  img-article-item\" style=\"padding-bottom:59.464627151052%\" data-img-url=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/chatgpt-finding-the-contradiction-in-its-answer.png\" data-modal-id=\"single-image-modal\" data-modal-container-id=\"single-image-modal-container\" data-img-caption=\"&quot;&quot;\">\n<figure>\n        <picture><source media=\"(min-width: 1024px)\" data-srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/chatgpt-finding-the-contradiction-in-its-answer.png?q=49&amp;fit=crop&amp;w=825&amp;dpr=2\" srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/chatgpt-finding-the-contradiction-in-its-answer.png?q=49&amp;fit=crop&amp;w=825&amp;dpr=2\"\/><source media=\"(min-width: 768px)\" data-srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/chatgpt-finding-the-contradiction-in-its-answer.png?q=49&amp;fit=crop&amp;w=825&amp;dpr=2\" srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/chatgpt-finding-the-contradiction-in-its-answer.png?q=49&amp;fit=crop&amp;w=825&amp;dpr=2\"\/><source media=\"(min-width: 481px)\" data-srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/chatgpt-finding-the-contradiction-in-its-answer.png?q=49&amp;fit=crop&amp;w=800&amp;dpr=2\" srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/chatgpt-finding-the-contradiction-in-its-answer.png?q=49&amp;fit=crop&amp;w=800&amp;dpr=2\"\/><source media=\"(min-width: 0px)\" data-srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/chatgpt-finding-the-contradiction-in-its-answer.png?q=49&amp;fit=crop&amp;w=500&amp;dpr=2\" 
srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/chatgpt-finding-the-contradiction-in-its-answer.png?q=49&amp;fit=crop&amp;w=500&amp;dpr=2\"\/><img width=\"2092\" height=\"1244\" loading=\"lazy\" decoding=\"async\" alt=\"ChatGPT finding the contradiction in its answer\" data-img-url=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/chatgpt-finding-the-contradiction-in-its-answer.png\" src=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/chatgpt-finding-the-contradiction-in-its-answer.png\" style=\"display:block;height:auto;max-width:100%;\"\/><\/p>\n<\/picture>\n<\/figure><\/div>\n<\/p><\/div>\n<p>When I pushed for an explanation, it suggested it had likely echoed an answer from a similar training example, then changed its reasoning when it worked through the math.<\/p>\n<div class=\"body-img landscape \">\n<div class=\"responsive-img  image-expandable  img-article-item\" style=\"padding-bottom:48.393378773126%\" data-img-url=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/chatgpt-explaining-why-it-contradicted-itself.png\" data-modal-id=\"single-image-modal\" data-modal-container-id=\"single-image-modal-container\" data-img-caption=\"&quot;&quot;\">\n<figure>\n        <picture><source media=\"(min-width: 1024px)\" data-srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/chatgpt-explaining-why-it-contradicted-itself.png?q=49&amp;fit=crop&amp;w=825&amp;dpr=2\" srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/chatgpt-explaining-why-it-contradicted-itself.png?q=49&amp;fit=crop&amp;w=825&amp;dpr=2\"\/><source media=\"(min-width: 768px)\" data-srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/chatgpt-explaining-why-it-contradicted-itself.png?q=49&amp;fit=crop&amp;w=825&amp;dpr=2\" 
srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/chatgpt-explaining-why-it-contradicted-itself.png?q=49&amp;fit=crop&amp;w=825&amp;dpr=2\"\/><source media=\"(min-width: 481px)\" data-srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/chatgpt-explaining-why-it-contradicted-itself.png?q=49&amp;fit=crop&amp;w=800&amp;dpr=2\" srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/chatgpt-explaining-why-it-contradicted-itself.png?q=49&amp;fit=crop&amp;w=800&amp;dpr=2\"\/><source media=\"(min-width: 0px)\" data-srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/chatgpt-explaining-why-it-contradicted-itself.png?q=49&amp;fit=crop&amp;w=500&amp;dpr=2\" srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/chatgpt-explaining-why-it-contradicted-itself.png?q=49&amp;fit=crop&amp;w=500&amp;dpr=2\"\/><img width=\"2054\" height=\"994\" loading=\"lazy\" decoding=\"async\" alt=\"ChatGPT explaining why it contradicted itself\" data-img-url=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/chatgpt-explaining-why-it-contradicted-itself.png\" src=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/chatgpt-explaining-why-it-contradicted-itself.png\" style=\"display:block;height:auto;max-width:100%;\"\/><\/p>\n<\/picture>\n<\/figure><\/div>\n<\/p><\/div>\n<section class=\"emaki-custom-block emaki-custom-note\" data-nosnippet=\"\">\n<div class=\"emaki-custom note\" id=\"custom_block_23\">\n<div class=\"custom_block-content note\">\n<p>Writing this here means future versions will probably get it right. Oh well.<\/p>\n<\/p><\/div>\n<\/p><\/div>\n<\/section>\n<p>Gemini\u2019s reasoning was blunter. It admitted to a calculation mistake. 
No mention of training bias.<\/p>\n<div class=\"body-img landscape \">\n<div class=\"responsive-img  image-expandable  img-article-item\" style=\"padding-bottom:58.720330237358%\" data-img-url=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/gemini-explaining-why-it-got-the-answer-wrong.png\" data-modal-id=\"single-image-modal\" data-modal-container-id=\"single-image-modal-container\" data-img-caption=\"&quot;&quot;\">\n<figure>\n        <picture><source media=\"(min-width: 1024px)\" data-srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/gemini-explaining-why-it-got-the-answer-wrong.png?q=49&amp;fit=crop&amp;w=825&amp;dpr=2\" srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/gemini-explaining-why-it-got-the-answer-wrong.png?q=49&amp;fit=crop&amp;w=825&amp;dpr=2\"\/><source media=\"(min-width: 768px)\" data-srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/gemini-explaining-why-it-got-the-answer-wrong.png?q=49&amp;fit=crop&amp;w=825&amp;dpr=2\" srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/gemini-explaining-why-it-got-the-answer-wrong.png?q=49&amp;fit=crop&amp;w=825&amp;dpr=2\"\/><source media=\"(min-width: 481px)\" data-srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/gemini-explaining-why-it-got-the-answer-wrong.png?q=49&amp;fit=crop&amp;w=800&amp;dpr=2\" srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/gemini-explaining-why-it-got-the-answer-wrong.png?q=49&amp;fit=crop&amp;w=800&amp;dpr=2\"\/><source media=\"(min-width: 0px)\" data-srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/gemini-explaining-why-it-got-the-answer-wrong.png?q=49&amp;fit=crop&amp;w=500&amp;dpr=2\" 
srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/gemini-explaining-why-it-got-the-answer-wrong.png?q=49&amp;fit=crop&amp;w=500&amp;dpr=2\"\/><img width=\"1938\" height=\"1138\" loading=\"lazy\" decoding=\"async\" alt=\"Gemini explaining why it got the answer wrong\" data-img-url=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/gemini-explaining-why-it-got-the-answer-wrong.png\" src=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/gemini-explaining-why-it-got-the-answer-wrong.png\" style=\"display:block;height:auto;max-width:100%;\"\/><\/p>\n<\/picture>\n<\/figure><\/div>\n<\/p><\/div>\n<h2 id=\"bonus-the-model-that-actually-got-it-right\">\n                        Bonus: The Model That Actually Got It Right<br \/>\n               <\/h2>\n<p>Out of curiosity, I ran the same test with China\u2019s DeepSeek with DeepThink (R1) enabled. This one nailed it. The answer was long, but it laid out its entire thought process before committing to an answer. It even kept second-guessing itself midway: \u201cBut wait, is the survival chance really zero?\u201d which was entertaining to watch. 
<\/p>\n<div class=\"body-img landscape \">\n<div class=\"responsive-img  image-expandable  img-article-item\" style=\"padding-bottom:45.792736935341%\" data-img-url=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/deepseek-answering-the-revolver-riddle.png\" data-modal-id=\"single-image-modal\" data-modal-container-id=\"single-image-modal-container\" data-img-caption=\"&quot;&quot;\">\n<figure>\n        <picture><source media=\"(min-width: 1024px)\" data-srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/deepseek-answering-the-revolver-riddle.png?q=49&amp;fit=crop&amp;w=825&amp;dpr=2\" srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/deepseek-answering-the-revolver-riddle.png?q=49&amp;fit=crop&amp;w=825&amp;dpr=2\"\/><source media=\"(min-width: 768px)\" data-srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/deepseek-answering-the-revolver-riddle.png?q=49&amp;fit=crop&amp;w=825&amp;dpr=2\" srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/deepseek-answering-the-revolver-riddle.png?q=49&amp;fit=crop&amp;w=825&amp;dpr=2\"\/><source media=\"(min-width: 481px)\" data-srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/deepseek-answering-the-revolver-riddle.png?q=49&amp;fit=crop&amp;w=800&amp;dpr=2\" srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/deepseek-answering-the-revolver-riddle.png?q=49&amp;fit=crop&amp;w=800&amp;dpr=2\"\/><source media=\"(min-width: 0px)\" data-srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/deepseek-answering-the-revolver-riddle.png?q=49&amp;fit=crop&amp;w=500&amp;dpr=2\" 
srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/deepseek-answering-the-revolver-riddle.png?q=49&amp;fit=crop&amp;w=500&amp;dpr=2\"\/><img width=\"2258\" height=\"1034\" loading=\"lazy\" decoding=\"async\" alt=\"DeepSeek answering the revolver riddle\" data-img-url=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/deepseek-answering-the-revolver-riddle.png\" src=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/deepseek-answering-the-revolver-riddle.png\" style=\"display:block;height:auto;max-width:100%;\"\/><\/p>\n<\/picture>\n<\/figure><\/div>\n<\/p><\/div>\n<p>DeepSeek got it right not because it\u2019s smarter at math, but because it\u2019s smart enough to &#8220;think&#8221; first, then give its answer\u2014the others used the reverse order.<\/p>\n<div class=\"body-img landscape \">\n<div class=\"responsive-img  image-expandable  img-article-item\" style=\"padding-bottom:46.131528046422%\" data-img-url=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/deepseek-double-guessing-itself.png\" data-modal-id=\"single-image-modal\" data-modal-container-id=\"single-image-modal-container\" data-img-caption=\"&quot;&quot;\">\n<figure>\n        <picture><source media=\"(min-width: 1024px)\" data-srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/deepseek-double-guessing-itself.png?q=49&amp;fit=crop&amp;w=825&amp;dpr=2\" srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/deepseek-double-guessing-itself.png?q=49&amp;fit=crop&amp;w=825&amp;dpr=2\"\/><source media=\"(min-width: 768px)\" data-srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/deepseek-double-guessing-itself.png?q=49&amp;fit=crop&amp;w=825&amp;dpr=2\" 
srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/deepseek-double-guessing-itself.png?q=49&amp;fit=crop&amp;w=825&amp;dpr=2\"\/><source media=\"(min-width: 481px)\" data-srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/deepseek-double-guessing-itself.png?q=49&amp;fit=crop&amp;w=800&amp;dpr=2\" srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/deepseek-double-guessing-itself.png?q=49&amp;fit=crop&amp;w=800&amp;dpr=2\"\/><source media=\"(min-width: 0px)\" data-srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/deepseek-double-guessing-itself.png?q=49&amp;fit=crop&amp;w=500&amp;dpr=2\" srcset=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/deepseek-double-guessing-itself.png?q=49&amp;fit=crop&amp;w=500&amp;dpr=2\"\/><img width=\"2068\" height=\"954\" loading=\"lazy\" decoding=\"async\" alt=\"DeepSeek double-guessing itself\" data-img-url=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/deepseek-double-guessing-itself.png\" src=\"https:\/\/static1.makeuseofimages.com\/wordpress\/wp-content\/uploads\/2025\/08\/deepseek-double-guessing-itself.png\" style=\"display:block;height:auto;max-width:100%;\"\/><\/p>\n<\/picture>\n<\/figure><\/div>\n<\/p><\/div>\n<p>In the end, this is another reminder that LLMs aren\u2019t \u201ctrue\u201d AI, at least not the kind we\u2019ve been conditioned to expect from sci-fi. They can mimic thought and reasoning, but they don\u2019t actually think. Ask them directly, and they\u2019ll admit as much.<\/p>\n<p>I keep prompts like this handy for the moments when someone <a href=\"https:\/\/www.makeuseof.com\/why-chatbots-arent-search-engines\/\" target=\"_blank\">treats a chatbot like a search engine<\/a> or waves a ChatGPT quote around as proof in an argument. 
What a strange, fascinating world we live in.<\/p>\n<\/p><\/div>\n<br \/><a href=\"https:\/\/www.makeuseof.com\/test-llms-with-this-prompt\/\">Source link <\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Companies love throwing around \u201cbenchmarks\u201d and \u201ctoken counts\u201d to claim superiority, but none of that matters to the end user. 
So, I have my own&#8230;<\/p>\n","protected":false},"author":1,"featured_media":89195,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7],"tags":[],"class_list":["post-89194","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tech-universe"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.3 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>I Don&#039;t Care About Benchmarks\u2014This Prompt Is How I Test LLMs and ChatGPT 5 Failed - mailinvest.blog<\/title>\n<meta name=\"description\" content=\"Technology is forever changing, and there are always new pieces of technology to replace obsolete ones. Tons of people enjoy reading tech blogs on a daily basis.mailinvest.blog tracks all the latest consumer technology breakthroughs and shows you what&#039;s new, what matters and how technology can enrich your life. mailinvest.blog also provides the information, tools, and advice that helps when deciding what to buy.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/mailinvest.blog\/index.php\/2025\/08\/09\/i-dont-care-about-benchmarks-this-prompt-is-how-i-test-llms-and-chatgpt-5-failed\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"I Don&#039;t Care About Benchmarks\u2014This Prompt Is How I Test LLMs and ChatGPT 5 Failed - mailinvest.blog\" \/>\n<meta property=\"og:description\" content=\"Technology is forever changing, and there are always new pieces of technology to replace obsolete ones. 
Tons of people enjoy reading tech blogs on a daily basis.mailinvest.blog tracks all the latest consumer technology breakthroughs and shows you what&#039;s new, what matters and how technology can enrich your life. mailinvest.blog also provides the information, tools, and advice that helps when deciding what to buy.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/mailinvest.blog\/index.php\/2025\/08\/09\/i-dont-care-about-benchmarks-this-prompt-is-how-i-test-llms-and-chatgpt-5-failed\/\" \/>\n<meta property=\"og:site_name\" content=\"mailinvest.blog\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/freelanceracademic\/\" \/>\n<meta property=\"article:published_time\" content=\"2025-08-09T23:27:39+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-08-09T23:28:51+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/mailinvest.blog\/wp-content\/uploads\/2025\/08\/chatgpt-5-admitting-to-a-mistake-it-made.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"2100\" \/>\n\t<meta property=\"og:image:height\" content=\"1181\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"admin@mailinvest.blog\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin@mailinvest.blog\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2025\\\/08\\\/09\\\/i-dont-care-about-benchmarks-this-prompt-is-how-i-test-llms-and-chatgpt-5-failed\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2025\\\/08\\\/09\\\/i-dont-care-about-benchmarks-this-prompt-is-how-i-test-llms-and-chatgpt-5-failed\\\/\"},\"author\":{\"name\":\"admin@mailinvest.blog\",\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/#\\\/schema\\\/person\\\/012701c4c204d4e4ebd34f926cfd31a4\"},\"headline\":\"I Don&#8217;t Care About Benchmarks\u2014This Prompt Is How I Test LLMs and ChatGPT 5 Failed\",\"datePublished\":\"2025-08-09T23:27:39+00:00\",\"dateModified\":\"2025-08-09T23:28:51+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2025\\\/08\\\/09\\\/i-dont-care-about-benchmarks-this-prompt-is-how-i-test-llms-and-chatgpt-5-failed\\\/\"},\"wordCount\":827,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2025\\\/08\\\/09\\\/i-dont-care-about-benchmarks-this-prompt-is-how-i-test-llms-and-chatgpt-5-failed\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/mailinvest.blog\\\/wp-content\\\/uploads\\\/2025\\\/08\\\/chatgpt-5-admitting-to-a-mistake-it-made.jpg\",\"articleSection\":[\"Tech 
Universe\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2025\\\/08\\\/09\\\/i-dont-care-about-benchmarks-this-prompt-is-how-i-test-llms-and-chatgpt-5-failed\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2025\\\/08\\\/09\\\/i-dont-care-about-benchmarks-this-prompt-is-how-i-test-llms-and-chatgpt-5-failed\\\/\",\"url\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2025\\\/08\\\/09\\\/i-dont-care-about-benchmarks-this-prompt-is-how-i-test-llms-and-chatgpt-5-failed\\\/\",\"name\":\"I Don't Care About Benchmarks\u2014This Prompt Is How I Test LLMs and ChatGPT 5 Failed - mailinvest.blog\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2025\\\/08\\\/09\\\/i-dont-care-about-benchmarks-this-prompt-is-how-i-test-llms-and-chatgpt-5-failed\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2025\\\/08\\\/09\\\/i-dont-care-about-benchmarks-this-prompt-is-how-i-test-llms-and-chatgpt-5-failed\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/mailinvest.blog\\\/wp-content\\\/uploads\\\/2025\\\/08\\\/chatgpt-5-admitting-to-a-mistake-it-made.jpg\",\"datePublished\":\"2025-08-09T23:27:39+00:00\",\"dateModified\":\"2025-08-09T23:28:51+00:00\",\"description\":\"Technology is forever changing, and there are always new pieces of technology to replace obsolete ones. Tons of people enjoy reading tech blogs on a daily basis.mailinvest.blog tracks all the latest consumer technology breakthroughs and shows you what's new, what matters and how technology can enrich your life. 
mailinvest.blog also provides the information, tools, and advice that helps when deciding what to buy.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2025\\\/08\\\/09\\\/i-dont-care-about-benchmarks-this-prompt-is-how-i-test-llms-and-chatgpt-5-failed\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2025\\\/08\\\/09\\\/i-dont-care-about-benchmarks-this-prompt-is-how-i-test-llms-and-chatgpt-5-failed\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2025\\\/08\\\/09\\\/i-dont-care-about-benchmarks-this-prompt-is-how-i-test-llms-and-chatgpt-5-failed\\\/#primaryimage\",\"url\":\"https:\\\/\\\/mailinvest.blog\\\/wp-content\\\/uploads\\\/2025\\\/08\\\/chatgpt-5-admitting-to-a-mistake-it-made.jpg\",\"contentUrl\":\"https:\\\/\\\/mailinvest.blog\\\/wp-content\\\/uploads\\\/2025\\\/08\\\/chatgpt-5-admitting-to-a-mistake-it-made.jpg\",\"width\":2100,\"height\":1181},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2025\\\/08\\\/09\\\/i-dont-care-about-benchmarks-this-prompt-is-how-i-test-llms-and-chatgpt-5-failed\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/mailinvest.blog\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"I Don&#8217;t Care About Benchmarks\u2014This Prompt Is How I Test LLMs and ChatGPT 5 Failed\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/#website\",\"url\":\"https:\\\/\\\/mailinvest.blog\\\/\",\"name\":\"mailinvest.blog\",\"description\":\"Technology is forever changing, and there are always new pieces of technology to replace obsolete ones. Tons of people enjoy reading tech blogs on a daily basis. 
mailinvest.blog tracks all the latest consumer technology breakthroughs and shows you what&#039;s new, what matters and how technology can enrich your life. mailinvest.blog also provides the information, tools, and advice that helps when deciding what to buy.\",\"publisher\":{\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/mailinvest.blog\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/#organization\",\"name\":\"mailinvest\",\"url\":\"https:\\\/\\\/mailinvest.blog\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/mailinvest.blog\\\/wp-content\\\/uploads\\\/2022\\\/01\\\/default.png\",\"contentUrl\":\"https:\\\/\\\/mailinvest.blog\\\/wp-content\\\/uploads\\\/2022\\\/01\\\/default.png\",\"width\":1000,\"height\":1000,\"caption\":\"mailinvest\"},\"image\":{\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/freelanceracademic\\\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/#\\\/schema\\\/person\\\/012701c4c204d4e4ebd34f926cfd31a4\",\"name\":\"admin@mailinvest.blog\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/98ed217bd0f3d6a6dcae2d9b0c76e305b049a07275e315e1407e19ec8b08e139?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/98ed217bd0f3d6a6dcae2d9b0c76e305b049a07275e315e1407e19ec8b08e139?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/98ed217bd0f3d6a6dcae2d9b0c76e305b049a07275e315e1407e19ec8b08e139?s=96&d
=mm&r=g\",\"caption\":\"admin@mailinvest.blog\"},\"sameAs\":[\"https:\\\/\\\/mailinvest.blog\",\"admin@mailinvest.blog\"],\"url\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/author\\\/adminmailinvest-blog\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"I Don't Care About Benchmarks\u2014This Prompt Is How I Test LLMs and ChatGPT 5 Failed - mailinvest.blog","description":"Technology is forever changing, and there are always new pieces of technology to replace obsolete ones. Tons of people enjoy reading tech blogs on a daily basis.mailinvest.blog tracks all the latest consumer technology breakthroughs and shows you what's new, what matters and how technology can enrich your life. mailinvest.blog also provides the information, tools, and advice that helps when deciding what to buy.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/mailinvest.blog\/index.php\/2025\/08\/09\/i-dont-care-about-benchmarks-this-prompt-is-how-i-test-llms-and-chatgpt-5-failed\/","og_locale":"en_US","og_type":"article","og_title":"I Don't Care About Benchmarks\u2014This Prompt Is How I Test LLMs and ChatGPT 5 Failed - mailinvest.blog","og_description":"Technology is forever changing, and there are always new pieces of technology to replace obsolete ones. Tons of people enjoy reading tech blogs on a daily basis.mailinvest.blog tracks all the latest consumer technology breakthroughs and shows you what's new, what matters and how technology can enrich your life. 
mailinvest.blog also provides the information, tools, and advice that helps when deciding what to buy.","og_url":"https:\/\/mailinvest.blog\/index.php\/2025\/08\/09\/i-dont-care-about-benchmarks-this-prompt-is-how-i-test-llms-and-chatgpt-5-failed\/","og_site_name":"mailinvest.blog","article_publisher":"https:\/\/www.facebook.com\/freelanceracademic\/","article_published_time":"2025-08-09T23:27:39+00:00","article_modified_time":"2025-08-09T23:28:51+00:00","og_image":[{"width":2100,"height":1181,"url":"https:\/\/mailinvest.blog\/wp-content\/uploads\/2025\/08\/chatgpt-5-admitting-to-a-mistake-it-made.jpg","type":"image\/jpeg"}],"author":"admin@mailinvest.blog","twitter_card":"summary_large_image","twitter_misc":{"Written by":"admin@mailinvest.blog","Est. reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/mailinvest.blog\/index.php\/2025\/08\/09\/i-dont-care-about-benchmarks-this-prompt-is-how-i-test-llms-and-chatgpt-5-failed\/#article","isPartOf":{"@id":"https:\/\/mailinvest.blog\/index.php\/2025\/08\/09\/i-dont-care-about-benchmarks-this-prompt-is-how-i-test-llms-and-chatgpt-5-failed\/"},"author":{"name":"admin@mailinvest.blog","@id":"https:\/\/mailinvest.blog\/#\/schema\/person\/012701c4c204d4e4ebd34f926cfd31a4"},"headline":"I Don&#8217;t Care About Benchmarks\u2014This Prompt Is How I Test LLMs and ChatGPT 5 
Failed","datePublished":"2025-08-09T23:27:39+00:00","dateModified":"2025-08-09T23:28:51+00:00","mainEntityOfPage":{"@id":"https:\/\/mailinvest.blog\/index.php\/2025\/08\/09\/i-dont-care-about-benchmarks-this-prompt-is-how-i-test-llms-and-chatgpt-5-failed\/"},"wordCount":827,"commentCount":0,"publisher":{"@id":"https:\/\/mailinvest.blog\/#organization"},"image":{"@id":"https:\/\/mailinvest.blog\/index.php\/2025\/08\/09\/i-dont-care-about-benchmarks-this-prompt-is-how-i-test-llms-and-chatgpt-5-failed\/#primaryimage"},"thumbnailUrl":"https:\/\/mailinvest.blog\/wp-content\/uploads\/2025\/08\/chatgpt-5-admitting-to-a-mistake-it-made.jpg","articleSection":["Tech Universe"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/mailinvest.blog\/index.php\/2025\/08\/09\/i-dont-care-about-benchmarks-this-prompt-is-how-i-test-llms-and-chatgpt-5-failed\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/mailinvest.blog\/index.php\/2025\/08\/09\/i-dont-care-about-benchmarks-this-prompt-is-how-i-test-llms-and-chatgpt-5-failed\/","url":"https:\/\/mailinvest.blog\/index.php\/2025\/08\/09\/i-dont-care-about-benchmarks-this-prompt-is-how-i-test-llms-and-chatgpt-5-failed\/","name":"I Don't Care About Benchmarks\u2014This Prompt Is How I Test LLMs and ChatGPT 5 Failed - mailinvest.blog","isPartOf":{"@id":"https:\/\/mailinvest.blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/mailinvest.blog\/index.php\/2025\/08\/09\/i-dont-care-about-benchmarks-this-prompt-is-how-i-test-llms-and-chatgpt-5-failed\/#primaryimage"},"image":{"@id":"https:\/\/mailinvest.blog\/index.php\/2025\/08\/09\/i-dont-care-about-benchmarks-this-prompt-is-how-i-test-llms-and-chatgpt-5-failed\/#primaryimage"},"thumbnailUrl":"https:\/\/mailinvest.blog\/wp-content\/uploads\/2025\/08\/chatgpt-5-admitting-to-a-mistake-it-made.jpg","datePublished":"2025-08-09T23:27:39+00:00","dateModified":"2025-08-09T23:28:51+00:00","description":"Technology is forever changing, and 
there are always new pieces of technology to replace obsolete ones. Tons of people enjoy reading tech blogs on a daily basis.mailinvest.blog tracks all the latest consumer technology breakthroughs and shows you what's new, what matters and how technology can enrich your life. mailinvest.blog also provides the information, tools, and advice that helps when deciding what to buy.","breadcrumb":{"@id":"https:\/\/mailinvest.blog\/index.php\/2025\/08\/09\/i-dont-care-about-benchmarks-this-prompt-is-how-i-test-llms-and-chatgpt-5-failed\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/mailinvest.blog\/index.php\/2025\/08\/09\/i-dont-care-about-benchmarks-this-prompt-is-how-i-test-llms-and-chatgpt-5-failed\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/mailinvest.blog\/index.php\/2025\/08\/09\/i-dont-care-about-benchmarks-this-prompt-is-how-i-test-llms-and-chatgpt-5-failed\/#primaryimage","url":"https:\/\/mailinvest.blog\/wp-content\/uploads\/2025\/08\/chatgpt-5-admitting-to-a-mistake-it-made.jpg","contentUrl":"https:\/\/mailinvest.blog\/wp-content\/uploads\/2025\/08\/chatgpt-5-admitting-to-a-mistake-it-made.jpg","width":2100,"height":1181},{"@type":"BreadcrumbList","@id":"https:\/\/mailinvest.blog\/index.php\/2025\/08\/09\/i-dont-care-about-benchmarks-this-prompt-is-how-i-test-llms-and-chatgpt-5-failed\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/mailinvest.blog\/"},{"@type":"ListItem","position":2,"name":"I Don&#8217;t Care About Benchmarks\u2014This Prompt Is How I Test LLMs and ChatGPT 5 Failed"}]},{"@type":"WebSite","@id":"https:\/\/mailinvest.blog\/#website","url":"https:\/\/mailinvest.blog\/","name":"mailinvest.blog","description":"Technology is forever changing, and there are always new pieces of technology to replace obsolete ones. Tons of people enjoy reading tech blogs on a daily basis. 
mailinvest.blog tracks all the latest consumer technology breakthroughs and shows you what&#039;s new, what matters and how technology can enrich your life. mailinvest.blog also provides the information, tools, and advice that helps when deciding what to buy.","publisher":{"@id":"https:\/\/mailinvest.blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/mailinvest.blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/mailinvest.blog\/#organization","name":"mailinvest","url":"https:\/\/mailinvest.blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/mailinvest.blog\/#\/schema\/logo\/image\/","url":"https:\/\/mailinvest.blog\/wp-content\/uploads\/2022\/01\/default.png","contentUrl":"https:\/\/mailinvest.blog\/wp-content\/uploads\/2022\/01\/default.png","width":1000,"height":1000,"caption":"mailinvest"},"image":{"@id":"https:\/\/mailinvest.blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/freelanceracademic\/"]},{"@type":"Person","@id":"https:\/\/mailinvest.blog\/#\/schema\/person\/012701c4c204d4e4ebd34f926cfd31a4","name":"admin@mailinvest.blog","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/98ed217bd0f3d6a6dcae2d9b0c76e305b049a07275e315e1407e19ec8b08e139?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/98ed217bd0f3d6a6dcae2d9b0c76e305b049a07275e315e1407e19ec8b08e139?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/98ed217bd0f3d6a6dcae2d9b0c76e305b049a07275e315e1407e19ec8b08e139?s=96&d=mm&r=g","caption":"admin@mailinvest.blog"},"sameAs":["https:\/\/mailinvest.blog","admin@mailinvest.blog"],"url":"https:\/\/mailinvest.blog\/index.php\/author\/adminmailinvest-blog\/"}]}},"_links":{"self":[{"href":"https:\/\/mailinvest.blog\/index.php\
/wp-json\/wp\/v2\/posts\/89194","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mailinvest.blog\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailinvest.blog\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailinvest.blog\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mailinvest.blog\/index.php\/wp-json\/wp\/v2\/comments?post=89194"}],"version-history":[{"count":1,"href":"https:\/\/mailinvest.blog\/index.php\/wp-json\/wp\/v2\/posts\/89194\/revisions"}],"predecessor-version":[{"id":89196,"href":"https:\/\/mailinvest.blog\/index.php\/wp-json\/wp\/v2\/posts\/89194\/revisions\/89196"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/mailinvest.blog\/index.php\/wp-json\/wp\/v2\/media\/89195"}],"wp:attachment":[{"href":"https:\/\/mailinvest.blog\/index.php\/wp-json\/wp\/v2\/media?parent=89194"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailinvest.blog\/index.php\/wp-json\/wp\/v2\/categories?post=89194"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailinvest.blog\/index.php\/wp-json\/wp\/v2\/tags?post=89194"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}