- OpenAI’s o3 defeated Elon Musk’s Grok 4 at chess
- Magnus Carlsen delivered biting commentary on the standard of Grok’s logic
- Grok 4 made repeated blunders, while o3 played steadily
The AI chess match between OpenAI’s o3 model and xAI’s Grok 4 invited plenty of speculation as a kind of proxy battle between the two companies and their respective CEOs. Any comparison to the days of Deep Blue and Bobby Fischer quickly paled, though, as OpenAI’s o3 repeatedly wiped out Grok 4, winning four games in a row, accompanied by the derisive commentary of former world chess champion Magnus Carlsen and grandmaster David Howell.
The showdown took place on Kaggle’s Game Arena, a digital coliseum where AI models battle in chess and other games. The tournament featured eight of the most prominent LLMs in the business: OpenAI’s o3 and o4-mini, Google’s Gemini 2.5 Pro and Flash, Anthropic’s Claude Opus, DeepSeek, Moonshot’s Kimi, and xAI’s Grok 4. The final came down to Grok and o3, but Grok’s performance in the final round didn’t look much like a battle of champions.
Carlsen and Howell veered between serious commentary and a roast as Grok’s performance came off as somewhat erratic. In the first game, it quickly sacrificed its bishop, then began trading pieces like it was in a hurry to go home. Things didn’t improve in the next game for Grok.
“[Grok] is like that one guy in a club tournament who has learnt theory and literally knows nothing else,” Carlsen said during the second game. “Makes the worst blunders after that.”
Grok’s performance was so off the rails that Carlsen rated it at around 800 Elo, only slightly above a beginner. He gave o3 a modest but respectable 1200, roughly the level of a typical hobby player. Though o3 didn’t play brilliantly, it didn’t have to. It played solid chess. It didn’t blunder pieces. It converted its advantages and executed classic chess technique.
“o3 is fairly ruthless in conversions; it looks like a chess player. Grok looks like it learnt a few opening moves and knows the rules, but not much more,” Carlsen said. “Grok’s moves are chess-related moves. They just came at the wrong time and in weird sequences.”
Chess AI
The chess wasn’t the main point of the tournament, despite its prominence. The real question was how general-purpose AI models handle tasks governed by strict rules, such as chess games. It turns out they’re not great, but o3 was the best of this limited sample. As AI becomes embedded in everything, the ability to follow rules and spot patterns becomes essential, and chess is a uniquely transparent way to observe it. You either make the right move or you don’t. When a model plays well, you can see the logic; otherwise, queens fall like dominoes, and the game becomes as confused as that metaphor.
Chess is a window into how well an AI can plan, evaluate options, avoid catastrophic mistakes, and stay logically consistent. If Grok throws away a queen because it doesn’t grasp long-term consequences, what might it do in a legal document, or when booking travel?
That the final was between OpenAI and xAI did add some drama, given that Sam Altman and Elon Musk are at loggerheads in public. The chess final didn’t resolve the feud between them, but it did give OpenAI a PR win in the court of public perception, along with limited but very real praise from Magnus Carlsen.