Mistral Medium 3.5: A 128-Billion-Parameter Model from France Combines Chat, Reasoning, and Coding
May 3, 2026
On May 2, 2026, Mistral unveiled Medium 3.5: a dense 128-billion-parameter model with a 256k context window and configurable reasoning, released under a modified MIT license.
Mistral Medium 3.5: Dense 128-Billion-Parameter Model With a 256k Context Window
French AI company Mistral released its new flagship model, Medium 3.5, on May 2, 2026. Unlike many competitors, Mistral did not pick a Mixture-of-Experts architecture. Instead, it shipped a dense model with 128 billion parameters, all of which are activated for every generated token. The context window is 256,000 tokens, and the model is multimodal, with a vision encoder that handles variable image sizes.
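The dense-versus-MoE trade-off above can be made concrete with a back-of-envelope compute estimate. A common approximation is that a forward pass costs roughly 2 FLOPs per active parameter per token; the MoE figure below is purely illustrative and not a real Mistral configuration.

```python
# Rough per-token compute: ~2 FLOPs per ACTIVE parameter (standard approximation).
# A dense model activates every weight for every token; an MoE activates
# only the routed experts. The MoE numbers here are illustrative.

def flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs per generated token."""
    return 2 * active_params

dense_active = 128e9   # Medium 3.5: all 128B weights active per token
moe_active = 39e9      # hypothetical MoE with ~39B active params of a larger total

print(f"dense: {flops_per_token(dense_active):.2e} FLOPs/token")
print(f"MoE  : {flops_per_token(moe_active):.2e} FLOPs/token")
```

The upshot: a dense 128B model pays the full 128B-parameter compute bill on every token, which is why serving efficiency (quantization, few GPUs) matters so much for this design.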
Token Pricing and Modified MIT License on Hugging Face
Mistral charges 1.50 U.S. dollars per million input tokens and 7.50 U.S. dollars per million output tokens. That is more than many open-weight competitors, but cheaper than the flagships of OpenAI and Anthropic. The weights are available on Hugging Face under a modified MIT license.
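At those rates, per-request cost is simple arithmetic. A minimal sketch, using only the listed prices:

```python
# Cost estimate at the listed Medium 3.5 rates:
# $1.50 per million input tokens, $7.50 per million output tokens.

INPUT_PER_M = 1.50
OUTPUT_PER_M = 7.50

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single API request."""
    return (input_tokens / 1e6) * INPUT_PER_M + (output_tokens / 1e6) * OUTPUT_PER_M

# Example: summarize a 20,000-token document into 1,000 tokens.
cost = request_cost(20_000, 1_000)
print(f"${cost:.4f}")  # → $0.0375
```

At these prices, output tokens dominate only for generation-heavy workloads; for long-document analysis the input side drives the bill.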
SWE-Bench Verified at 77.6 Percent: Benchmark Details
On internal benchmarks, Mistral reports 77.6 percent on SWE-Bench Verified, which tests real GitHub bug fixes, and 91.4 percent on τ³-Telecom, a test of agentic tool use in telecommunications. That places the model in the upper tier, without quite reaching the top scores of GPT-5.5 or Claude.
One Model Replaces Three: Medium 3.1, Magistral, and Devstral 2 Retired
A notable detail is the consolidation. Medium 3.5 replaces three earlier lines: Medium 3.1, Magistral, and Devstral 2. Developers can configure reasoning effort per request. That lowers cost on easy tasks and lifts quality on hard ones.
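Per-request reasoning configuration might look like the sketch below. The field name `reasoning_effort`, its values, and the model identifier are assumptions for illustration; Mistral's actual API parameter may differ.

```python
# Sketch of a per-request reasoning-effort setting. The "reasoning_effort"
# field and its values are ASSUMED for illustration, not Mistral's documented API.

def build_request(prompt: str, effort: str) -> dict:
    """Build a chat payload with an explicit reasoning-effort level."""
    if effort not in {"low", "medium", "high"}:
        raise ValueError(f"unknown effort level: {effort}")
    return {
        "model": "mistral-medium-3.5",
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,  # cheap on easy tasks, thorough on hard ones
    }

easy = build_request("Reformat this date as ISO 8601: 2 May 2026", "low")
hard = build_request("Find the race condition in this mutex code ...", "high")
```

One knob replacing three model lines is the point: instead of routing between a chat model, a reasoning model, and a coding model, the caller dials effort up or down on a single endpoint.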
Why it matters
For European companies, the release matters because it gives them a high-performance open-weight model that can also run on premises. According to Mistral, Medium 3.5 runs on as few as four GPUs. That meaningfully lowers the bar for self-hosting, especially in regulated industries such as pharma, banking, and insurance, where data must not leave the company. Mistral is also the only serious EU contender for frontier-scale LLMs.
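A rough memory estimate shows why the four-GPU claim is plausible. The precision and hardware below are assumptions for the sketch, not Mistral's published deployment setup:

```python
# Back-of-envelope: can 128B parameters fit on four 80 GB GPUs?
# 128B weights at 8-bit (FP8) precision is ~128 GB; at BF16 it is ~256 GB.
# Precision and GPU choice here are ASSUMPTIONS, not Mistral's published setup.

PARAMS = 128e9
GPU_MEM_GB = 80          # e.g. one H100
NUM_GPUS = 4

def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory needed for model weights alone, in GB."""
    return params * bytes_per_param / 1e9

fp8 = weight_memory_gb(PARAMS, 1.0)    # ~128 GB
bf16 = weight_memory_gb(PARAMS, 2.0)   # ~256 GB
available = NUM_GPUS * GPU_MEM_GB      # 320 GB

print(f"FP8: {fp8:.0f} GB, BF16: {bf16:.0f} GB, available: {available} GB")
```

At FP8 the weights fit with ample headroom for the KV cache; even BF16 squeezes in, though a 256k-token context would leave little cache room at that precision.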
Practical example
A German health insurer wants to automatically classify incoming claims and draft follow-up questions to policyholders. Running Medium 3.5 on four H100 GPUs in its own data center allows the model to process patient data without that data going to a U.S. cloud. Monthly costs are mainly electricity and hardware, not API fees. At scale, this pays off quickly, and a large part of the GDPR and EU AI Act debate about data export simply disappears.
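The insurer scenario can be sketched as a classification request against a self-hosted, OpenAI-compatible endpoint. The endpoint URL, category list, and model name below are illustrative assumptions, not details from the article:

```python
# Sketch of the insurer scenario: build a claim-classification payload for a
# self-hosted inference server. Categories, model name, and endpoint are
# illustrative ASSUMPTIONS.

CATEGORIES = ["dental", "hospital", "outpatient", "pharmacy", "other"]

def build_claim_request(claim_text: str) -> dict:
    """Build a chat-completion payload asking for exactly one category label."""
    system = (
        "Classify the insurance claim into exactly one category: "
        + ", ".join(CATEGORIES)
        + ". Reply with the category name only."
    )
    return {
        "model": "mistral-medium-3.5",
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": claim_text},
        ],
        "temperature": 0.0,  # deterministic labels for downstream routing
    }

payload = build_claim_request("Invoice for a root canal treatment, 480 EUR.")
# POST this payload to the in-house cluster's /v1/chat/completions endpoint;
# no patient data leaves the company's data center.
```

The compliance benefit falls out of the architecture: the same payload that would go to a U.S. API provider goes to a local URL instead, and nothing else in the pipeline changes.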
💡 In plain English
Imagine a new writing tool from France that drafts texts, solves tricky problems, and fixes code. It is big enough to be smart, small enough to run on a few computers, and companies are allowed to host it on their own machines.
Key Takeaways
- ✅ Mistral Medium 3.5 launched on May 2, 2026, with 128 billion parameters.
- ✅ The model is dense, multimodal, and has a 256k token context window.
- ✅ Pricing is 1.50 USD per million input and 7.50 USD per million output tokens.
- ✅ SWE-Bench Verified score: 77.6 percent.
- ✅ Open weights under a modified MIT license; runs on four GPUs.
Sources & Context
- Mistral Medium 3.5 Folds Chat, Reasoning, and Code Into One 128B AI Model - WinBuzzer
- Mistral AI Launches Remote Agents in Vibe and Mistral Medium 3.5 with 77.6% SWE-Bench Verified Score - MarkTechPost
- mistralai/Mistral-Medium-3.5-128B - Hugging Face
- Remote agents in Vibe. Powered by Mistral Medium 3.5. - Mistral AI