OpenAI model finds counterexample to 80-year-old Erdős conjecture in collaboration with mathematicians

An OpenAI model just disproved an 80-year-old math conjecture. It didn’t just assist or suggest a direction. It found the actual counterexample that broke the conjecture.

That sentence deserves more attention than it’s getting.

The conjecture comes from Paul Erdős, one of the most prolific and influential mathematicians of the 20th century. Researchers Alex Wei, Hongxun Wu, and Thomas Bloom were working with an OpenAI model when it surfaced a counterexample that had eluded human mathematicians for eight decades. They shared the full story on the OpenAI Podcast, and I’ve been thinking about it ever since.

Why Math Is Different

We throw around the word “reasoning” a lot in AI circles. Models reason about code. Models reason about documents. But math is the domain where that claim is actually falsifiable. There is no partial credit. No convincing-but-wrong answer that slips through because it sounds plausible. A counterexample either works or it doesn’t. The math community has had eight decades to find this one, and a model found it.
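To make the "works or it doesn’t" point concrete, here is a toy sketch in Python. It has nothing to do with the actual Erdős problem or with how the OpenAI model operated. It uses a classic textbook case, Euler’s polynomial n² + n + 41, which produces primes for a long run of inputs and then fails. The function names (`is_prime`, `euler_poly`, `first_counterexample`) are my own, chosen purely for illustration.

```python
def is_prime(n: int) -> bool:
    """Trial division; slow but easy to audit."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def euler_poly(n: int) -> int:
    """Euler's polynomial n^2 + n + 41."""
    return n * n + n + 41


def first_counterexample(limit: int):
    """Return the smallest n < limit where euler_poly(n) is NOT prime, else None."""
    for n in range(limit):
        if not is_prime(euler_poly(n)):
            return n
    return None


n = first_counterexample(1000)
print(n, euler_poly(n))  # prints: 40 1681  (1681 = 41 * 41)
```

The claim "this polynomial always yields a prime" survives forty checks in a row and then dies on a single input. Nobody has to be persuaded. Anyone can multiply 41 by 41 and see it.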

That is not the same category of thing as autocomplete.

What the Collaboration Actually Looked Like

From what Wei, Wu, and Bloom described on the podcast, this wasn’t the model operating alone. It was a collaboration, with mathematicians steering the problem space and the model doing something that looks more like combinatorial search at a scale humans can’t sustain. That framing matters. The model didn’t wake up and decide to do number theory. But it also didn’t just run a brute-force script someone handed it. It was generating and evaluating candidate structures in a way that contributed meaningfully to the result.

I want to be careful not to overclaim here, but I also don’t want to underclaim. Dismissing this as “just a tool” misses what’s actually happening. The model’s contribution was the find. The mathematicians provided the framework, the verification, and the judgment. That’s a genuine collaboration, not a ghost-writing arrangement.

The Broader Pattern Is Hard to Ignore

This doesn’t sit in isolation. Anthropic’s internal data, published this week, shows Claude achieving a 52x speedup on AI training code optimization, up from roughly 3x in May 2024. Their engineers are shipping 8 times as much code per quarter compared to the 2021-2025 baseline. On open-ended coding problems, Claude’s success rate jumped 50 percentage points in six months, sitting at 76% now. Anthropic has also published data showing that Mythos Preview improved on human researchers’ next-step decisions 64% of the time, up from 22% in 2024.

These are not incremental numbers. Something is accelerating.

Anthropic has been direct about what this might mean. Their internal data shows Claude is already accelerating AI development itself, which they describe as a possible path toward recursive self-improvement. They note it’s not guaranteed, and they’re right to hedge. But they’re also right to say it deserves greater attention than it’s currently getting.

What I Actually Think

The Erdős result is a proof of concept for something the AI field has been speculating about for years: models contributing to original knowledge, not just organizing existing knowledge. Code generation is impressive and economically significant. This is scientifically significant in a different way.

The people who should be paying closest attention are researchers in any field that involves exhaustive search over a large combinatorial space. Think of conjectures that have resisted human effort for decades, not because humans aren’t smart enough to verify a solution, but because the search space is too large to navigate manually. That’s exactly where this class of model is going to keep showing up.
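For a rough sense of how fast a combinatorial space outgrows manual search, here is a small back-of-the-envelope sketch. Graphs are my own illustrative choice, not a statement about the objects in the Erdős problem, and `labeled_graph_count` is a hypothetical helper name. The count itself is a standard fact: a simple graph on n labeled vertices has C(n, 2) possible edges, each either present or absent.

```python
from math import comb


def labeled_graph_count(n: int) -> int:
    """Number of simple graphs on n labeled vertices: 2 ** C(n, 2)."""
    return 2 ** comb(n, 2)


for n in (5, 10, 20):
    print(n, labeled_graph_count(n))
# n = 5 gives 1024; n = 10 already gives 2 ** 45 candidates
```

Checking whether one given graph has some property is usually quick. Walking through every candidate is not, which is why guided search matters more than raw enumeration.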

I’ve spent a lot of time this year watching AI move from “useful for drafts” to “useful for design” to “useful for production code.” The Erdős counterexample tells me the next frontier is genuinely unsolved problems. Not busywork. Problems that actually matter.

The mathematicians still had to ask the right question. That part isn’t going away. But the gap between asking the question and finding the answer just got a lot smaller.

#AI #MachineLearning #Mathematics #OpenAI #ArtificialIntelligence
