Is Jalapeño a replacement for Nvidia GPUs?

Not fully. OpenAI describes Jalapeño as an inference chip. Training large models may still need other accelerators.

Why does inference matter so much?

Inference is the live operation: every answer, every Codex action, every API call. At scale, that is where recurring costs accumulate.

Are the performance numbers independently confirmed?

No. OpenAI cites early internal tests and says a technical report will follow later.

When will the chip reach real data centers?

OpenAI says initial deployment is planned by the end of 2026 with partners. The exact scale is still open.

OpenAI Jalapeño: Custom Inference Chip With Broadcom

What this is about

On June 24, 2026, OpenAI and Broadcom introduced Jalapeño, OpenAI's first custom inference chip for large language models. At first glance, that sounds like another piece of Silicon Valley hardware. The more practical point is sharper: companies running AI products for millions of people do not only pay for training. They pay every day for every single answer.

Jalapeño is aimed at that layer. OpenAI describes the chip as the first building block in a multi-year compute platform intended to make ChatGPT, Codex, API products, and future agents faster, more reliable, and cheaper to operate.

What Jalapeño actually does

Jalapeño is not a standard server processor and not a general-purpose graphics chip. It is an ASIC (application-specific integrated circuit), a specialized chip built for one job: LLM inference. In that job, a trained model receives inputs, computes tokens, uses memory and networking, and returns an answer.

OpenAI says the chip was designed from the ground up with Broadcom and Celestica around its own model, kernel, memory, and serving patterns. According to OpenAI, engineering samples are already running ML workloads in the lab, including GPT-5.3-Codex-Spark. The central claimed advantage is better performance per watt. OpenAI has not yet published its technical report, and no independent measurements are available.
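To make "performance per watt" concrete: for inference it can be read as tokens generated per joule of energy, since one watt is one joule per second. The sketch below shows that arithmetic only. The function names and every number in it are made-up placeholders for two unnamed accelerators, not figures from OpenAI, Broadcom, or any benchmark.

```python
def tokens_per_joule(tokens_per_second: float, power_watts: float) -> float:
    """Performance per watt: throughput divided by power draw (1 W = 1 J/s)."""
    if power_watts <= 0:
        raise ValueError("power_watts must be positive")
    return tokens_per_second / power_watts


def energy_kwh(total_tokens: float, efficiency_tokens_per_joule: float) -> float:
    """Energy needed to generate total_tokens, converted from joules to kWh."""
    joules = total_tokens / efficiency_tokens_per_joule
    return joules / 3_600_000  # 1 kWh = 3.6 million joules


# Hypothetical placeholder values, chosen only to illustrate the ratio.
chip_a = tokens_per_joule(tokens_per_second=10_000, power_watts=1_000)
chip_b = tokens_per_joule(tokens_per_second=10_000, power_watts=800)

daily_tokens = 1_000_000_000  # assumed daily output, not a real workload figure
for name, efficiency in (("chip A", chip_a), ("chip B", chip_b)):
    print(f"{name}: {efficiency:.2f} tokens/J, "
          f"{energy_kwh(daily_tokens, efficiency):.1f} kWh per day")
```

The point of the ratio is that the same throughput at lower power draw means less energy per answer, which is why a per-watt gain shows up directly in operating cost.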

Why it matters

The bottleneck in the AI business is shifting. Training remains expensive, but day-to-day costs increasingly come from operation: chat answers, coding agents, search, customer support, and analysis workflows. If inference gets cheaper, providers can allow longer tasks, offer lower prices, or handle demand spikes more reliably.

TechCrunch frames the move as part of OpenAI's effort to reduce dependence on standard GPUs. Tom's Hardware adds the necessary caution: from the outside, it is still hard to judge the internal design and the real size of any advantage. That tension is what makes the story worth watching.

In plain language

Imagine a large restaurant kitchen. Inventing a new recipe needs expert cooks, testing, and time. Serving thousands of meals a day needs a kitchen where potatoes, pans, and delivery are organized perfectly. Jalapeño is not the recipe. Jalapeño is the attempt to make the kitchen for AI answers cheaper and faster.

A practical example

A software team asks Codex to check 2,000 small tasks every workday: read tests, reproduce bugs, suggest patches. If each task takes 30 seconds and burns many tokens, the workflow becomes expensive and slow. If the cost of inference falls by 20 to 30 percent, the same team can run more checks or allow more complex tasks without the monthly bill growing.
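The arithmetic behind this example can be sketched in a few lines. Only the 2,000 tasks per workday and the 20 to 30 percent range come from the text. The workdays per month, tokens per task, and price per million tokens are assumptions picked for illustration, not published prices or measured usage.

```python
# Only the task count and the 20-30 percent range come from the article.
TASKS_PER_WORKDAY = 2_000          # from the example
WORKDAYS_PER_MONTH = 21            # assumption
TOKENS_PER_TASK = 15_000           # assumption
PRICE_PER_MILLION_TOKENS = 2.00    # assumption, in USD; not a real price


def monthly_cost(cost_reduction: float = 0.0) -> float:
    """Return the monthly inference bill in USD after a fractional cost reduction."""
    tokens = TASKS_PER_WORKDAY * WORKDAYS_PER_MONTH * TOKENS_PER_TASK
    baseline = tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS
    return baseline * (1.0 - cost_reduction)


def tasks_for_same_budget(cost_reduction: float) -> int:
    """Return how many tasks per workday fit the original budget at the lower cost."""
    return int(TASKS_PER_WORKDAY / (1.0 - cost_reduction))


for reduction in (0.0, 0.20, 0.30):
    print(f"{reduction:.0%} cheaper: ${monthly_cost(reduction):,.2f} per month, "
          f"or {tasks_for_same_budget(reduction):,} tasks per workday "
          f"on the original budget")
```

Under these assumed inputs, a team can take the saving either as a smaller bill or as extra tasks at the same spend. The absolute dollar figures depend entirely on the placeholder values.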

Scope and limits

OpenAI has not yet published independently verifiable benchmarks. Performance per watt remains a vendor claim until the technical report arrives.
An inference chip does not automatically solve data, safety, or model-quality problems. It mainly makes operation more efficient.
Specialized hardware can create new dependencies. If models, software, or workloads change sharply, the platform still has to remain flexible.

