DeepMind AI Co-Mathematician hits 48 percent on FrontierMath Tier 4
May 10, 2026
Google DeepMind unveiled a multi-agent system for open mathematical research on May 7, 2026. It solves 48 percent of the hardest FrontierMath tasks and helped an Oxford mathematician settle a 60-year-old open problem.
Google DeepMind opens an AI research workshop for math
On May 7, 2026, Google DeepMind unveiled the AI Co-Mathematician, a multi-agent system that works alongside human researchers on open problems in pure mathematics. The companion paper, "AI Co-Mathematician: Accelerating Mathematicians with Agentic AI," was posted to arXiv the same day.
What the system actually does
The AI Co-Mathematician is a stateful, asynchronous workspace. A "project coordinator" agent orchestrates a hierarchy of specialized agents that pursue different proof strategies in parallel, log failed hypotheses, and produce LaTeX drafts with margin notes and provenance information. The underlying model is Gemini 3.1 Pro.
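DeepMind has not released code for this architecture. As a rough illustration only, here is a minimal Python sketch of the coordination pattern described above: a coordinator launches several proof-strategy agents in parallel, a reviewer checks each draft, and failed attempts are logged in a stateful workspace. All names (project_coordinator, strategy_agent, reviewer_agent, Workspace, Draft) and the toy acceptance rule are hypothetical placeholders, not DeepMind's actual API.

```python
# Hypothetical sketch of the orchestration pattern described in the article.
# Model calls and formal verification are stubbed out with placeholders.
import asyncio
from dataclasses import dataclass, field


@dataclass
class Draft:
    strategy: str
    argument: str
    accepted: bool = False
    review_note: str = ""


@dataclass
class Workspace:
    """Stateful record of attempts, so later rounds can avoid known dead ends."""
    problem: str
    failed_hypotheses: list[str] = field(default_factory=list)
    accepted_drafts: list[Draft] = field(default_factory=list)


async def strategy_agent(name: str, ws: Workspace) -> Draft:
    # Stand-in for a model call that pursues one proof strategy
    # (e.g. induction, an algebraic reduction, a counting argument).
    await asyncio.sleep(0)  # simulate asynchronous model work
    return Draft(strategy=name, argument=f"Attempted proof of '{ws.problem}' via {name}.")


async def reviewer_agent(draft: Draft) -> Draft:
    # Stand-in for a verification pass; here just a toy acceptance rule.
    await asyncio.sleep(0)
    draft.accepted = "induction" in draft.strategy
    draft.review_note = "No gap found." if draft.accepted else "Lemma 2 unproven."
    return draft


async def project_coordinator(problem: str, strategies: list[str]) -> Workspace:
    ws = Workspace(problem=problem)
    # Pursue all strategies in parallel, then review each resulting draft.
    drafts = await asyncio.gather(*(strategy_agent(s, ws) for s in strategies))
    reviewed = await asyncio.gather(*(reviewer_agent(d) for d in drafts))
    for d in reviewed:
        if d.accepted:
            ws.accepted_drafts.append(d)
        else:
            ws.failed_hypotheses.append(f"{d.strategy}: {d.review_note}")
    return ws


if __name__ == "__main__":
    result = asyncio.run(
        project_coordinator(
            "a toy divisibility statement",
            ["induction", "generating functions", "extremal argument"],
        )
    )
    print("accepted:", [d.strategy for d in result.accepted_drafts])
    print("logged dead ends:", result.failed_hypotheses)
```

In the system DeepMind describes, the placeholders would presumably correspond to Gemini-backed agents, persistent project state, and LaTeX output with provenance; the sketch only shows the coordination loop.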
Result on FrontierMath Tier 4
On Epoch AI's FrontierMath Tier 4 benchmark, the system solved 23 of 48 tasks, or 48 percent, according to the DeepMind blog post and a thread by Pushmeet Kohli on X. Epoch AI describes the benchmark as designed so that some of its problems may remain unsolved by AI for decades. For comparison, the DeepMind release puts Gemini 3.1 Pro alone at 19 percent, GPT-5.5 Pro at 39.6 percent, GPT-5.4 Pro at 37.5 percent, and Claude Opus 4.7 and 4.6 both at 22.9 percent.
Real research application
According to DeepMind and an Oxford press notice, Marc Lackenby, a mathematician at the University of Oxford, used the system to resolve Problem 21.10 in the Kourovka Notebook, a long-running collection of open problems in group theory. A reviewer agent flagged a flaw in the AI's first proof attempt, and Lackenby then saw how to close the gap.
Why it matters
Mathematical research is one of the areas where language models have most often failed, because a single logical error invalidates the entire proof. The fact that a multi-agent setup turns a 19 percent base-model success rate into 48 percent shows how much potential lies in orchestration. For industry and government, the direct consequence is less dramatic than the headlines suggest: math stays human. But specific engineering tasks, such as verifying cryptographic constructions or optimizing discrete structures, could benefit measurably from systems like this.
In plain language
Picture a research team trying to crack a very hard puzzle. One person collects ideas, another checks them, a third writes everything up cleanly, a fourth hunts for mistakes. The AI Co-Mathematician is a digital team that plays exactly these roles and works alongside real mathematicians. When the team takes a wrong turn, it asks a human expert and keeps searching.
A practical example
A German research group at a technical university is investigating the security of a new post-quantum encryption scheme in 2026. Traditionally, a doctoral student might spend six months working alone to prove a lower bound on the complexity of a lattice problem. With the AI Co-Mathematician, the work runs in parallel: three agent pairs explore different proof techniques while a reviewer agent continuously checks lemmas. After eight weeks, a draft proof is ready, complete with a LaTeX version and a record of dead ends. The student verifies the result by hand, submits it to a conference such as CRYPTO, and saves four to five months of working time.
Scope and limits
First, FrontierMath is a benchmark with fixed, verifiable answers, not open research. Scoring 48 percent there does not mean the system performs equally well on real research mathematics. Nature reported in a companion piece on May 7, 2026, that human scientists still clearly outperform the best AI agents on more complex research tasks.
Second, verifying the proofs remains the job of humans. The case reported by Lackenby is a good illustration: the first AI proof was flawed, and only human review made it sound.
Third, the system is not publicly available. DeepMind described the announcement as a program for selected research partners, not a general-purpose product.
SEO and GEO keywords
DeepMind, AI Co-Mathematician, Gemini 3.1 Pro, FrontierMath, Tier 4, Epoch AI, Kourovka Notebook, Marc Lackenby, Pushmeet Kohli, Multi-Agent System, Math AI, Proof Automation, 2026.
In plain English
Google DeepMind built an AI team that helps mathematicians on very hard problems. It solves almost half of the toughest tasks of a well-known test and helped a researcher in Oxford crack a 60-year-old puzzle.
Key Takeaways
- Google DeepMind unveiled the AI Co-Mathematician on May 7, 2026, built on Gemini 3.1 Pro.
- The multi-agent system solves 23 of 48 tasks (48 percent) on FrontierMath Tier 4.
- Gemini 3.1 Pro alone reaches only 19 percent according to DeepMind; GPT-5.5 Pro reaches 39.6 percent.
- Mathematician Marc Lackenby (Oxford) used the system to solve Problem 21.10 in the Kourovka Notebook in group theory.
- The companion paper was posted on arXiv on May 7, 2026, and describes a stateful multi-agent architecture.
- The system is currently not publicly available and is offered to selected research partners.
Sources & Context
- Gemini Deep Think: Redefining the Future of Scientific Research (Google DeepMind, May 7, 2026)
- AI Co-Mathematician: Accelerating Mathematicians with Agentic AI (arXiv preprint, May 7, 2026)
- Google Built an AI Mathematician – and It Immediately Solved a 60-Year-Old Open Problem (abit.ee, May 2026)
- Google DeepMind Releases AI Co-Mathematician That Creates New High Score On FrontierMath Benchmark (OfficeChai)
- Human scientists trounce the best AI agents on complex tasks (Nature, May 2026)
- DeepMind AI Co-Mathematician Outperforms GPT-5.5 Pro (Phemex News)