DeepMind AI Co-Mathematician hits 48 percent on FrontierMath Tier 4
May 10, 2026
Google DeepMind unveiled a multi-agent system for open mathematical research on May 7, 2026. It solves 48 percent of the hardest FrontierMath tasks and helped an Oxford mathematician settle a 60-year-old open problem.
Google DeepMind opens an AI research workshop for math
On May 7, 2026, Google DeepMind unveiled the AI Co-Mathematician, a multi-agent system that works alongside human researchers on open problems in pure mathematics. The companion paper, "AI Co-Mathematician: Accelerating Mathematicians with Agentic AI," was posted to arXiv the same day.
What the system actually does
The AI Co-Mathematician is a stateful, asynchronous workspace. A "project coordinator" agent orchestrates a hierarchy of specialized agents that pursue different proof strategies in parallel, log failed hypotheses, and produce LaTeX drafts with margin notes and provenance information. The underlying model is Gemini 3.1 Pro.
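DeepMind has not released code for this architecture. As a rough illustration only, here is a minimal Python sketch of the coordination pattern described above: a coordinator launches several proof-strategy agents in parallel, a reviewer checks each draft, and failed attempts are logged in a stateful workspace. All names (project_coordinator, strategy_agent, reviewer_agent, Workspace, Draft) and the toy acceptance rule are hypothetical placeholders, not DeepMind's actual API.

```python
# Hypothetical sketch of the orchestration pattern described in the article.
# Model calls and formal verification are stubbed out with placeholders.
import asyncio
from dataclasses import dataclass, field


@dataclass
class Draft:
    strategy: str
    argument: str
    accepted: bool = False
    review_note: str = ""


@dataclass
class Workspace:
    """Stateful record of attempts, so later rounds can avoid known dead ends."""
    problem: str
    failed_hypotheses: list[str] = field(default_factory=list)
    accepted_drafts: list[Draft] = field(default_factory=list)


async def strategy_agent(name: str, ws: Workspace) -> Draft:
    # Stand-in for a model call that pursues one proof strategy
    # (e.g. induction, an algebraic reduction, a counting argument).
    await asyncio.sleep(0)  # simulate asynchronous model work
    return Draft(strategy=name, argument=f"Attempted proof of '{ws.problem}' via {name}.")


async def reviewer_agent(draft: Draft) -> Draft:
    # Stand-in for a verification pass; here just a toy acceptance rule.
    await asyncio.sleep(0)
    draft.accepted = "induction" in draft.strategy
    draft.review_note = "No gap found." if draft.accepted else "Lemma 2 unproven."
    return draft


async def project_coordinator(problem: str, strategies: list[str]) -> Workspace:
    ws = Workspace(problem=problem)
    # Pursue all strategies in parallel, then review each resulting draft.
    drafts = await asyncio.gather(*(strategy_agent(s, ws) for s in strategies))
    reviewed = await asyncio.gather(*(reviewer_agent(d) for d in drafts))
    for d in reviewed:
        if d.accepted:
            ws.accepted_drafts.append(d)
        else:
            ws.failed_hypotheses.append(f"{d.strategy}: {d.review_note}")
    return ws


if __name__ == "__main__":
    result = asyncio.run(
        project_coordinator(
            "a toy divisibility statement",
            ["induction", "generating functions", "extremal argument"],
        )
    )
    print("accepted:", [d.strategy for d in result.accepted_drafts])
    print("logged dead ends:", result.failed_hypotheses)
```

In the system DeepMind describes, the placeholders would presumably correspond to Gemini-backed agents, persistent project state, and LaTeX output with provenance; the sketch only shows the coordination loop.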
Result on FrontierMath Tier 4
On Epoch AI's FrontierMath Tier 4 benchmark, the system solved 23 of 48 tasks, or 48 percent, according to the DeepMind blog post and a thread by Pushmeet Kohli on X. Epoch AI describes the benchmark as designed so that some of its problems may remain unsolved by AI for decades. For comparison, the DeepMind release puts Gemini 3.1 Pro alone at 19 percent, GPT-5.5 Pro at 39.6 percent, GPT-5.4 Pro at 37.5 percent, and Claude Opus 4.7 and 4.6 both at 22.9 percent.
Real research application
According to DeepMind and an Oxford press notice, Marc Lackenby, a mathematician at the University of Oxford, used the system to resolve Problem 21.10 in the Kourovka Notebook, a long-running collection of open problems in group theory. A reviewer agent flagged a flaw in the AI's first proof attempt, and Lackenby then saw how to close the gap.
Why it matters
Mathematical research is one of the areas where language models have most often failed, because a single logical error invalidates the entire proof. The fact that a multi-agent setup turns a 19 percent base-model success rate into 48 percent shows how much potential lies in orchestration. For industry and government, the direct consequence is less dramatic than the headlines suggest: math stays human. But specific engineering tasks, such as verifying cryptographic constructions or optimizing discrete structures, could benefit measurably from systems like this.
In plain language
Picture a research team trying to crack a very hard puzzle. One person collects ideas, another checks them, a third writes everything up cleanly, a fourth hunts for mistakes. The AI Co-Mathematician is a digital team that plays exactly these roles and works alongside real mathematicians. When the team takes a wrong turn, it asks a human expert and keeps searching.
A practical example
A German research group at a technical university is investigating the security of a new post-quantum encryption scheme in 2026. Traditionally, a doctoral student might spend six months working alone to prove a lower bound on the complexity of a lattice problem. With the AI Co-Mathematician, the work runs in parallel: three agent pairs explore different proof techniques while a reviewer agent continuously checks lemmas. After eight weeks, a draft proof is ready, complete with a LaTeX version and a record of dead ends. The student verifies the result by hand, submits it to a conference such as CRYPTO, and saves four to five months of working time.
Scope and limits
First, FrontierMath is a benchmark with fixed, verifiable answers, not open research. Scoring 48 percent there does not mean the system performs equally well on real research mathematics. Nature reported in a companion piece on May 7, 2026, that human scientists still clearly outperform the best AI agents on more complex research tasks.
Second, verifying the proofs remains the job of humans. The case reported by Lackenby is a good illustration: the first AI proof was flawed, and only human review made it sound.
Third, the system is not publicly available. DeepMind described the announcement as a program for selected research partners, not a general-purpose product.
SEO and GEO keywords
DeepMind, AI Co-Mathematician, Gemini 3.1 Pro, FrontierMath, Tier 4, Epoch AI, Kourovka Notebook, Marc Lackenby, Pushmeet Kohli, Multi-Agent System, Math AI, Proof Automation, 2026.
In plain English
Google DeepMind built an AI team that helps mathematicians on very hard problems. It solves almost half of the toughest tasks of a well-known test and helped a researcher in Oxford crack a 60-year-old puzzle.
Key Takeaways
- Google DeepMind unveiled the AI Co-Mathematician on May 7, 2026, built on Gemini 3.1 Pro.
- The multi-agent system solves 23 of 48 tasks (48 percent) on FrontierMath Tier 4.
- Gemini 3.1 Pro alone reaches only 19 percent according to DeepMind; GPT-5.5 Pro reaches 39.6 percent.
- Mathematician Marc Lackenby (Oxford) used the system to solve Problem 21.10 in the Kourovka Notebook in group theory.
- The companion paper was posted on arXiv on May 7, 2026, and describes a stateful multi-agent architecture.
- The system is currently not publicly available and is offered to selected research partners.
Sources & Context
- Gemini Deep Think: Redefining the Future of Scientific Research (Google DeepMind, May 7, 2026)
- AI Co-Mathematician: Accelerating Mathematicians with Agentic AI (arXiv preprint, May 7, 2026)
- Google Built an AI Mathematician – and It Immediately Solved a 60-Year-Old Open Problem (abit.ee, May 2026)
- Google DeepMind Releases AI Co-Mathematician That Creates New High Score On FrontierMath Benchmark (OfficeChai)
- Human scientists trounce the best AI agents on complex tasks (Nature, May 2026)
- DeepMind AI Co-Mathematician Outperforms GPT-5.5 Pro (Phemex News)