{"id":73859,"date":"2025-04-21T03:30:57","date_gmt":"2025-04-21T03:30:57","guid":{"rendered":"https:\/\/mailinvest.blog\/index.php\/2025\/04\/21\/openais-o3-ai-model-scores-lower-on-a-benchmark-than-the-company-initially-implied\/"},"modified":"2025-04-21T03:32:02","modified_gmt":"2025-04-21T03:32:02","slug":"openais-o3-ai-model-scores-lower-on-a-benchmark-than-the-company-initially-implied","status":"publish","type":"post","link":"https:\/\/mailinvest.blog\/index.php\/2025\/04\/21\/openais-o3-ai-model-scores-lower-on-a-benchmark-than-the-company-initially-implied\/","title":{"rendered":"OpenAI&#8217;s o3 AI model scores lower on a benchmark than the company initially implied"},"content":{"rendered":"<p><img decoding=\"async\" src=\"https:\/\/mailinvest.blog\/wp-content\/themes\/breek\/assets\/images\/transparent.gif\" data-lazy=\"true\" data-src=\"https:\/\/techcrunch.com\/wp-content\/uploads\/2025\/04\/GettyImages-2206295463.jpg?resize=1200,800\" \/><\/p>\n<div>\n<p id=\"speakable-summary\" class=\"wp-block-paragraph\">A discrepancy between first- and third-party benchmark results for OpenAI\u2019s o3 AI model is <a rel=\"nofollow\" href=\"https:\/\/www.reddit.com\/r\/singularity\/comments\/1k2lap5\/epoch_ai_has_released_o3_o4mini_gpt41_gpt41_mini\/\">raising questions about the company\u2019s transparency<\/a> and model testing practices.<\/p>\n<p class=\"wp-block-paragraph\">When OpenAI <a href=\"https:\/\/techcrunch.com\/2024\/12\/20\/openai-announces-new-o3-model\/\">unveiled o3 in December<\/a>, the company claimed the model could answer just over a fourth of questions on FrontierMath, a 
challenging set of math problems. That score blew the competition away: the next-best model managed to answer only around 2% of FrontierMath problems correctly.<\/p>\n<p class=\"wp-block-paragraph\">\u201cToday, all offerings out there have less than 2% [on FrontierMath],\u201d Mark Chen, chief research officer at OpenAI, <a rel=\"nofollow\" href=\"https:\/\/www.youtube.com\/watch?v=SKBG1sqdyIU\">said during a livestream<\/a>. \u201cWe\u2019re seeing [internally], with o3 in aggressive test-time compute settings, we\u2019re able to get over 25%.\u201d<\/p>\n<p class=\"wp-block-paragraph\">As it turns out, that figure was likely an upper bound, achieved by a version of o3 with more computing behind it than the model OpenAI publicly launched last week.<\/p>\n<p class=\"wp-block-paragraph\">Epoch AI, the research institute behind FrontierMath, released results of its independent benchmark tests of o3 on Friday. Epoch found that o3 scored around 10%, well below OpenAI\u2019s highest claimed score.<\/p>\n<blockquote class=\"wp-block-quote twitter-tweet is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\">OpenAI has released o3, their highly anticipated reasoning model, along with o4-mini, a smaller and cheaper model that succeeds o3-mini.<\/p>\n<p class=\"wp-block-paragraph\">We evaluated the new models on our suite of math and science benchmarks. Results in thread! <a rel=\"nofollow\" href=\"https:\/\/t.co\/5gbtzkEy1B\">pic.twitter.com\/5gbtzkEy1B<\/a><\/p>\n<p class=\"wp-block-paragraph\">\u2014 Epoch AI (@EpochAIResearch) <a rel=\"nofollow\" href=\"https:\/\/twitter.com\/EpochAIResearch\/status\/1913379475468833146?ref_src=twsrc%5Etfw\">April 18, 2025<\/a><\/p>\n<\/blockquote>\n<p class=\"wp-block-paragraph\">That doesn\u2019t mean OpenAI lied, per se. 
The benchmark results the company published in December show a lower-bound score that matches the score Epoch observed. Epoch also noted that its testing setup likely differs from OpenAI\u2019s, and that it used an updated release of FrontierMath for its evaluations.<\/p>\n<p class=\"wp-block-paragraph\">\u201cThe difference between our results and OpenAI\u2019s might be due to OpenAI evaluating with a more powerful internal scaffold, using more test-time [computing], or because those results were run on a different subset of FrontierMath (the 180 problems in frontiermath-2024-11-26 vs the 290 problems in frontiermath-2025-02-28-private),\u201d <a rel=\"nofollow\" href=\"https:\/\/epoch.ai\/gradient-updates\/how-much-energy-does-chatgpt-use\">wrote<\/a> Epoch.<\/p>\n<p class=\"wp-block-paragraph\"><a rel=\"nofollow\" href=\"https:\/\/x.com\/arcprize\/status\/1912567067024453926\">According to a post on X<\/a> from the ARC Prize Foundation, an organization that tested a pre-release version of o3, the public o3 model \u201cis a different model [\u2026] tuned for chat\/product use,\u201d corroborating Epoch\u2019s report.<\/p>\n<p class=\"wp-block-paragraph\">\u201cAll released o3 compute tiers are smaller than the version we [benchmarked],\u201d wrote ARC Prize. Generally speaking, bigger compute tiers can be expected to achieve better benchmark scores.<\/p>\n<blockquote class=\"wp-block-quote twitter-tweet is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\">Re-testing released o3 on ARC-AGI-1 will take a day or two. 
Because today\u2019s release is a materially different system, we&#8217;re re-labeling our past reported results as \u201cpreview\u201d:<\/p>\n<p class=\"wp-block-paragraph\">o3-preview (low): 75.7%, $200\/task<br \/>o3-preview (high): 87.5%, $34.4k\/task<\/p>\n<p class=\"wp-block-paragraph\">Above uses o1 pro pricing\u2026<\/p>\n<p class=\"wp-block-paragraph\">\u2014 Mike Knoop (@mikeknoop) <a rel=\"nofollow\" href=\"https:\/\/twitter.com\/mikeknoop\/status\/1912606277257298415?ref_src=twsrc%5Etfw\">April 16, 2025<\/a><\/p>\n<\/blockquote>\n<p class=\"wp-block-paragraph\">OpenAI\u2019s own Wenda Zhou, a member of the technical staff, <a rel=\"nofollow\" href=\"https:\/\/www.youtube.com\/watch?v=sq8GBPUb3rk\">said during a livestream last week<\/a> that the o3 in production is \u201cmore optimized for real-world use cases\u201d and for speed than the version of o3 demoed in December. As a result, it may exhibit benchmark \u201cdisparities,\u201d he added.<\/p>\n<p class=\"wp-block-paragraph\">\u201c[W]e\u2019ve done [optimizations] to make the [model] more cost efficient [and] more useful in general,\u201d Zhou said. 
\u201cWe still hope that \u2014 we still think that \u2014 this is a much better model [\u2026] You won\u2019t have to wait as long when you\u2019re asking for an answer, which is a real thing with these [types of] models.\u201d<\/p>\n<p class=\"wp-block-paragraph\">Granted, the fact that the public release of o3 falls short of OpenAI\u2019s testing promises is a bit of a moot point, since the company\u2019s o3-mini-high and o4-mini models outperform o3 on FrontierMath, and OpenAI plans to debut a more powerful o3 variant, o3-pro, in the coming weeks.<\/p>\n<p class=\"wp-block-paragraph\">It is, however, another reminder that AI benchmarks are best not taken at face value, particularly when the source is a company with services to sell.<\/p>\n<p class=\"wp-block-paragraph\">Benchmarking \u201ccontroversies\u201d are becoming a common occurrence in the AI industry as vendors race to capture headlines and mindshare with new models.<\/p>\n<p class=\"wp-block-paragraph\">In January, Epoch was <a href=\"https:\/\/techcrunch.com\/2025\/01\/19\/ai-benchmarking-organization-criticized-for-waiting-to-disclose-funding-from-openai\/\">criticized<\/a> for waiting to disclose funding from OpenAI until after the company announced o3. Many academics who contributed to FrontierMath weren\u2019t informed of OpenAI\u2019s involvement until it was made public.<\/p>\n<p class=\"wp-block-paragraph\">More recently, Elon Musk\u2019s xAI was <a href=\"https:\/\/techcrunch.com\/2025\/02\/22\/did-xai-lie-about-grok-3s-benchmarks\/\">accused<\/a> of publishing misleading benchmark charts for its latest AI model, Grok 3. 
Just this month, Meta admitted to touting benchmark scores for a version of <a href=\"https:\/\/techcrunch.com\/2025\/04\/11\/metas-vanilla-maverick-ai-model-ranks-below-rivals-on-a-popular-chat-benchmark\/\">a model that differed from the one the company made available to developers<\/a>.<\/p>\n<p class=\"wp-block-paragraph\"><em>Updated 4:21 p.m. Pacific: Added comments from Wenda Zhou, a member of the OpenAI technical staff, from a livestream last week.<\/em><\/p>\n<\/div>\n<p><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><br \/>\n<br \/><a href=\"https:\/\/techcrunch.com\/2025\/04\/20\/openais-o3-ai-model-scores-lower-on-a-benchmark-than-the-company-initially-implied\/\">Source link <\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>A discrepancy between first- and third-party benchmark results for OpenAI\u2019s o3 AI model 
is raising questions about the company\u2019s transparency and model testing practices. When&#8230;<\/p>\n","protected":false},"author":1,"featured_media":73860,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7],"tags":[8309,97],"class_list":["post-73859","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tech-universe","tag-o3","tag-openai"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.3 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>OpenAI&#039;s o3 AI model scores lower on a benchmark than the company initially implied - mailinvest.blog<\/title>\n<meta name=\"description\" content=\"Technology is forever changing, and there are always new pieces of technology to replace obsolete ones. 
Tons of people enjoy reading tech blogs on a daily basis.mailinvest.blog tracks all the latest consumer technology breakthroughs and shows you what&#039;s new, what matters and how technology can enrich your life. mailinvest.blog also provides the information, tools, and advice that helps when deciding what to buy.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/mailinvest.blog\/index.php\/2025\/04\/21\/openais-o3-ai-model-scores-lower-on-a-benchmark-than-the-company-initially-implied\/\" \/>\n<meta property=\"og:site_name\" content=\"mailinvest.blog\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/freelanceracademic\/\" \/>\n<meta property=\"article:published_time\" content=\"2025-04-21T03:30:57+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-04-21T03:32:02+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/mailinvest.blog\/wp-content\/uploads\/2025\/04\/GettyImages-2206295463.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"800\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"admin@mailinvest.blog\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin@mailinvest.blog\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2025\\\/04\\\/21\\\/openais-o3-ai-model-scores-lower-on-a-benchmark-than-the-company-initially-implied\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2025\\\/04\\\/21\\\/openais-o3-ai-model-scores-lower-on-a-benchmark-than-the-company-initially-implied\\\/\"},\"author\":{\"name\":\"admin@mailinvest.blog\",\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/#\\\/schema\\\/person\\\/012701c4c204d4e4ebd34f926cfd31a4\"},\"headline\":\"OpenAI&#8217;s o3 AI model scores lower on a benchmark than the company initially implied\",\"datePublished\":\"2025-04-21T03:30:57+00:00\",\"dateModified\":\"2025-04-21T03:32:02+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2025\\\/04\\\/21\\\/openais-o3-ai-model-scores-lower-on-a-benchmark-than-the-company-initially-implied\\\/\"},\"wordCount\":787,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2025\\\/04\\\/21\\\/openais-o3-ai-model-scores-lower-on-a-benchmark-than-the-company-initially-implied\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/mailinvest.blog\\\/wp-content\\\/uploads\\\/2025\\\/04\\\/GettyImages-2206295463.jpg\",\"keywords\":[\"o3\",\"OpenAI\"],\"articleSection\":[\"Tech 
Universe\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2025\\\/04\\\/21\\\/openais-o3-ai-model-scores-lower-on-a-benchmark-than-the-company-initially-implied\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2025\\\/04\\\/21\\\/openais-o3-ai-model-scores-lower-on-a-benchmark-than-the-company-initially-implied\\\/\",\"url\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2025\\\/04\\\/21\\\/openais-o3-ai-model-scores-lower-on-a-benchmark-than-the-company-initially-implied\\\/\",\"name\":\"OpenAI's o3 AI model scores lower on a benchmark than the company initially implied - mailinvest.blog\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2025\\\/04\\\/21\\\/openais-o3-ai-model-scores-lower-on-a-benchmark-than-the-company-initially-implied\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2025\\\/04\\\/21\\\/openais-o3-ai-model-scores-lower-on-a-benchmark-than-the-company-initially-implied\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/mailinvest.blog\\\/wp-content\\\/uploads\\\/2025\\\/04\\\/GettyImages-2206295463.jpg\",\"datePublished\":\"2025-04-21T03:30:57+00:00\",\"dateModified\":\"2025-04-21T03:32:02+00:00\",\"description\":\"Technology is forever changing, and there are always new pieces of technology to replace obsolete ones. Tons of people enjoy reading tech blogs on a daily basis.mailinvest.blog tracks all the latest consumer technology breakthroughs and shows you what's new, what matters and how technology can enrich your life. 
mailinvest.blog also provides the information, tools, and advice that helps when deciding what to buy.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2025\\\/04\\\/21\\\/openais-o3-ai-model-scores-lower-on-a-benchmark-than-the-company-initially-implied\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2025\\\/04\\\/21\\\/openais-o3-ai-model-scores-lower-on-a-benchmark-than-the-company-initially-implied\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2025\\\/04\\\/21\\\/openais-o3-ai-model-scores-lower-on-a-benchmark-than-the-company-initially-implied\\\/#primaryimage\",\"url\":\"https:\\\/\\\/mailinvest.blog\\\/wp-content\\\/uploads\\\/2025\\\/04\\\/GettyImages-2206295463.jpg\",\"contentUrl\":\"https:\\\/\\\/mailinvest.blog\\\/wp-content\\\/uploads\\\/2025\\\/04\\\/GettyImages-2206295463.jpg\",\"width\":1200,\"height\":800},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/index.php\\\/2025\\\/04\\\/21\\\/openais-o3-ai-model-scores-lower-on-a-benchmark-than-the-company-initially-implied\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/mailinvest.blog\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"OpenAI&#8217;s o3 AI model scores lower on a benchmark than the company initially implied\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/#website\",\"url\":\"https:\\\/\\\/mailinvest.blog\\\/\",\"name\":\"mailinvest.blog\",\"description\":\"Technology is forever changing, and there are always new pieces of technology to replace obsolete ones. Tons of people enjoy reading tech blogs on a daily basis. mailinvest.blog tracks all the latest consumer technology breakthroughs and shows you what&#039;s new, what matters and how technology can enrich your life. 
mailinvest.blog also provides the information, tools, and advice that helps when deciding what to buy.\",\"publisher\":{\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/mailinvest.blog\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/#organization\",\"name\":\"mailinvest\",\"url\":\"https:\\\/\\\/mailinvest.blog\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/mailinvest.blog\\\/wp-content\\\/uploads\\\/2022\\\/01\\\/default.png\",\"contentUrl\":\"https:\\\/\\\/mailinvest.blog\\\/wp-content\\\/uploads\\\/2022\\\/01\\\/default.png\",\"width\":1000,\"height\":1000,\"caption\":\"mailinvest\"},\"image\":{\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/freelanceracademic\\\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/mailinvest.blog\\\/#\\\/schema\\\/person\\\/012701c4c204d4e4ebd34f926cfd31a4\",\"name\":\"admin@mailinvest.blog\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/98ed217bd0f3d6a6dcae2d9b0c76e305b049a07275e315e1407e19ec8b08e139?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/98ed217bd0f3d6a6dcae2d9b0c76e305b049a07275e315e1407e19ec8b08e139?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/98ed217bd0f3d6a6dcae2d9b0c76e305b049a07275e315e1407e19ec8b08e139?s=96&d=mm&r=g\",\"caption\":\"admin@mailinvest.blog\"},\"sameAs\":[\"https:\\\/\\\/mailinvest.blog\",\"admin@mailinvest.blog\"],\"url\":\"https:\\\/\\\/mailinvest.
blog\\\/index.php\\\/author\\\/adminmailinvest-blog\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"OpenAI's o3 AI model scores lower on a benchmark than the company initially implied - mailinvest.blog","description":"Technology is forever changing, and there are always new pieces of technology to replace obsolete ones. Tons of people enjoy reading tech blogs on a daily basis.mailinvest.blog tracks all the latest consumer technology breakthroughs and shows you what's new, what matters and how technology can enrich your life. mailinvest.blog also provides the information, tools, and advice that helps when deciding what to buy.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/mailinvest.blog\/index.php\/2025\/04\/21\/openais-o3-ai-model-scores-lower-on-a-benchmark-than-the-company-initially-implied\/","og_locale":"en_US","og_type":"article","og_title":"OpenAI's o3 AI model scores lower on a benchmark than the company initially implied - mailinvest.blog","og_description":"Technology is forever changing, and there are always new pieces of technology to replace obsolete ones. Tons of people enjoy reading tech blogs on a daily basis.mailinvest.blog tracks all the latest consumer technology breakthroughs and shows you what's new, what matters and how technology can enrich your life. 
mailinvest.blog also provides the information, tools, and advice that helps when deciding what to buy.","og_url":"https:\/\/mailinvest.blog\/index.php\/2025\/04\/21\/openais-o3-ai-model-scores-lower-on-a-benchmark-than-the-company-initially-implied\/","og_site_name":"mailinvest.blog","article_publisher":"https:\/\/www.facebook.com\/freelanceracademic\/","article_published_time":"2025-04-21T03:30:57+00:00","article_modified_time":"2025-04-21T03:32:02+00:00","og_image":[{"width":1200,"height":800,"url":"https:\/\/mailinvest.blog\/wp-content\/uploads\/2025\/04\/GettyImages-2206295463.jpg","type":"image\/jpeg"}],"author":"admin@mailinvest.blog","twitter_card":"summary_large_image","twitter_misc":{"Written by":"admin@mailinvest.blog","Est. reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/mailinvest.blog\/index.php\/2025\/04\/21\/openais-o3-ai-model-scores-lower-on-a-benchmark-than-the-company-initially-implied\/#article","isPartOf":{"@id":"https:\/\/mailinvest.blog\/index.php\/2025\/04\/21\/openais-o3-ai-model-scores-lower-on-a-benchmark-than-the-company-initially-implied\/"},"author":{"name":"admin@mailinvest.blog","@id":"https:\/\/mailinvest.blog\/#\/schema\/person\/012701c4c204d4e4ebd34f926cfd31a4"},"headline":"OpenAI&#8217;s o3 AI model scores lower on a benchmark than the company initially 
implied","datePublished":"2025-04-21T03:30:57+00:00","dateModified":"2025-04-21T03:32:02+00:00","mainEntityOfPage":{"@id":"https:\/\/mailinvest.blog\/index.php\/2025\/04\/21\/openais-o3-ai-model-scores-lower-on-a-benchmark-than-the-company-initially-implied\/"},"wordCount":787,"commentCount":0,"publisher":{"@id":"https:\/\/mailinvest.blog\/#organization"},"image":{"@id":"https:\/\/mailinvest.blog\/index.php\/2025\/04\/21\/openais-o3-ai-model-scores-lower-on-a-benchmark-than-the-company-initially-implied\/#primaryimage"},"thumbnailUrl":"https:\/\/mailinvest.blog\/wp-content\/uploads\/2025\/04\/GettyImages-2206295463.jpg","keywords":["o3","OpenAI"],"articleSection":["Tech Universe"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/mailinvest.blog\/index.php\/2025\/04\/21\/openais-o3-ai-model-scores-lower-on-a-benchmark-than-the-company-initially-implied\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/mailinvest.blog\/index.php\/2025\/04\/21\/openais-o3-ai-model-scores-lower-on-a-benchmark-than-the-company-initially-implied\/","url":"https:\/\/mailinvest.blog\/index.php\/2025\/04\/21\/openais-o3-ai-model-scores-lower-on-a-benchmark-than-the-company-initially-implied\/","name":"OpenAI's o3 AI model scores lower on a benchmark than the company initially implied - mailinvest.blog","isPartOf":{"@id":"https:\/\/mailinvest.blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/mailinvest.blog\/index.php\/2025\/04\/21\/openais-o3-ai-model-scores-lower-on-a-benchmark-than-the-company-initially-implied\/#primaryimage"},"image":{"@id":"https:\/\/mailinvest.blog\/index.php\/2025\/04\/21\/openais-o3-ai-model-scores-lower-on-a-benchmark-than-the-company-initially-implied\/#primaryimage"},"thumbnailUrl":"https:\/\/mailinvest.blog\/wp-content\/uploads\/2025\/04\/GettyImages-2206295463.jpg","datePublished":"2025-04-21T03:30:57+00:00","dateModified":"2025-04-21T03:32:02+00:00","description":"Technology is forever changing, and 
there are always new pieces of technology to replace obsolete ones. Tons of people enjoy reading tech blogs on a daily basis.mailinvest.blog tracks all the latest consumer technology breakthroughs and shows you what's new, what matters and how technology can enrich your life. mailinvest.blog also provides the information, tools, and advice that helps when deciding what to buy.","breadcrumb":{"@id":"https:\/\/mailinvest.blog\/index.php\/2025\/04\/21\/openais-o3-ai-model-scores-lower-on-a-benchmark-than-the-company-initially-implied\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/mailinvest.blog\/index.php\/2025\/04\/21\/openais-o3-ai-model-scores-lower-on-a-benchmark-than-the-company-initially-implied\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/mailinvest.blog\/index.php\/2025\/04\/21\/openais-o3-ai-model-scores-lower-on-a-benchmark-than-the-company-initially-implied\/#primaryimage","url":"https:\/\/mailinvest.blog\/wp-content\/uploads\/2025\/04\/GettyImages-2206295463.jpg","contentUrl":"https:\/\/mailinvest.blog\/wp-content\/uploads\/2025\/04\/GettyImages-2206295463.jpg","width":1200,"height":800},{"@type":"BreadcrumbList","@id":"https:\/\/mailinvest.blog\/index.php\/2025\/04\/21\/openais-o3-ai-model-scores-lower-on-a-benchmark-than-the-company-initially-implied\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/mailinvest.blog\/"},{"@type":"ListItem","position":2,"name":"OpenAI&#8217;s o3 AI model scores lower on a benchmark than the company initially implied"}]},{"@type":"WebSite","@id":"https:\/\/mailinvest.blog\/#website","url":"https:\/\/mailinvest.blog\/","name":"mailinvest.blog","description":"Technology is forever changing, and there are always new pieces of technology to replace obsolete ones. Tons of people enjoy reading tech blogs on a daily basis. 
mailinvest.blog tracks all the latest consumer technology breakthroughs and shows you what&#039;s new, what matters and how technology can enrich your life. mailinvest.blog also provides the information, tools, and advice that helps when deciding what to buy.","publisher":{"@id":"https:\/\/mailinvest.blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/mailinvest.blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/mailinvest.blog\/#organization","name":"mailinvest","url":"https:\/\/mailinvest.blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/mailinvest.blog\/#\/schema\/logo\/image\/","url":"https:\/\/mailinvest.blog\/wp-content\/uploads\/2022\/01\/default.png","contentUrl":"https:\/\/mailinvest.blog\/wp-content\/uploads\/2022\/01\/default.png","width":1000,"height":1000,"caption":"mailinvest"},"image":{"@id":"https:\/\/mailinvest.blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/freelanceracademic\/"]},{"@type":"Person","@id":"https:\/\/mailinvest.blog\/#\/schema\/person\/012701c4c204d4e4ebd34f926cfd31a4","name":"admin@mailinvest.blog","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/98ed217bd0f3d6a6dcae2d9b0c76e305b049a07275e315e1407e19ec8b08e139?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/98ed217bd0f3d6a6dcae2d9b0c76e305b049a07275e315e1407e19ec8b08e139?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/98ed217bd0f3d6a6dcae2d9b0c76e305b049a07275e315e1407e19ec8b08e139?s=96&d=mm&r=g","caption":"admin@mailinvest.blog"},"sameAs":["https:\/\/mailinvest.blog","admin@mailinvest.blog"],"url":"https:\/\/mailinvest.blog\/index.php\/author\/adminmailinvest-blog\/"}]}},"_links":{"self":[{"href":"https:\/\/mailinvest.blog\/index.php\
/wp-json\/wp\/v2\/posts\/73859","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mailinvest.blog\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailinvest.blog\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailinvest.blog\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mailinvest.blog\/index.php\/wp-json\/wp\/v2\/comments?post=73859"}],"version-history":[{"count":1,"href":"https:\/\/mailinvest.blog\/index.php\/wp-json\/wp\/v2\/posts\/73859\/revisions"}],"predecessor-version":[{"id":73861,"href":"https:\/\/mailinvest.blog\/index.php\/wp-json\/wp\/v2\/posts\/73859\/revisions\/73861"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/mailinvest.blog\/index.php\/wp-json\/wp\/v2\/media\/73860"}],"wp:attachment":[{"href":"https:\/\/mailinvest.blog\/index.php\/wp-json\/wp\/v2\/media?parent=73859"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailinvest.blog\/index.php\/wp-json\/wp\/v2\/categories?post=73859"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailinvest.blog\/index.php\/wp-json\/wp\/v2\/tags?post=73859"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}