
Ollama flaw can leak prompts and API keys from memory

May 10, 2026

Image: black Ollama llama logo on a light background (official product image)

Cyera reports a critical Ollama flaw: attackers can steal memory contents without logging in, including prompts, system messages and environment variables. Unpatched internet-facing instances before version 0.17.1 are at risk.

What this is about

Ollama is the quick route many developers and companies use to run language models such as Llama or Mistral locally. That is why the new vulnerability matters: it does not hit an obscure side service, but a common component in local AI stacks.

Cyera describes the issue as Bleeding Llama. In the NVD it is tracked as CVE-2026-7482. According to Cyera, attackers can read memory from vulnerable Ollama servers without authenticating first. SecurityWeek reports that roughly 300,000 publicly reachable Ollama deployments may need to be checked and secured.

What Bleeding Llama actually does

The bug sits in the handling of GGUF model files, the file format used to store many local LLM weights. An attacker can send a manipulated GGUF file to an Ollama server. In simple terms, the file claims: "There is more model data inside me than there really is."

During processing, Ollama then reads beyond the end of the intended memory buffer. That is a classic out-of-bounds read. According to the NVD, the exposed memory can include environment variables, API keys, system prompts and data from concurrent user conversations in the process memory. The second step is what makes it especially serious: the resulting model artifact, which now carries those leaked bytes, can be pushed to an attacker-controlled registry through Ollama's push function.
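The pattern is easy to show in miniature. The following is a minimal Python sketch of the idea, not Ollama's actual parser (Ollama is written in Go and C/C++): a length field from the file header is trusted, so a read of the claimed size also returns whatever neighboring data sits in the same buffer.

```python
# Minimal sketch of an unchecked-length read; NOT Ollama's real code.
# Memory-safe Python can only over-read inside one buffer, but the shape
# of the bug matches the C-style out-of-bounds read described above.

def read_tensor(buf: bytes, offset: int, claimed_len: int) -> bytes:
    # VULNERABLE: trusts the length claimed by the file header.
    return buf[offset : offset + claimed_len]

def read_tensor_checked(buf: bytes, offset: int, claimed_len: int,
                        real_len: int) -> bytes:
    # FIXED: reject headers that claim more data than the file holds.
    if claimed_len > real_len - offset:
        raise ValueError("header claims more tensor data than the file holds")
    return buf[offset : offset + claimed_len]

# Simulated process memory: 11 bytes of real weights, followed by
# unrelated secrets that happen to sit next to them.
memory = b"WEIGHTSDATA" + b"API_KEY=sk-...;SYSTEM_PROMPT=..."

print(read_tensor(memory, offset=0, claimed_len=40))  # leaks the secrets
try:
    read_tensor_checked(memory, offset=0, claimed_len=40, real_len=11)
except ValueError as err:
    print("rejected:", err)
```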

The vulnerability affects Ollama versions before 0.17.1. The upstream distribution does not protect the relevant endpoints with authentication. By default, Ollama binds to 127.0.0.1 according to the NVD, but the documented OLLAMA_HOST=0.0.0.0 configuration is widely used in practice to expose the service inside networks or container setups.
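Whether a given instance is reachable beyond localhost is quick to test. A small sketch, assuming the default port 11434; Ollama's HTTP API answers GET /api/version with a JSON version string:

```python
import json
import urllib.request

def ollama_version(host: str = "127.0.0.1", port: int = 11434) -> str | None:
    """Probe an Ollama instance and return its version string, or None."""
    url = f"http://{host}:{port}/api/version"
    try:
        with urllib.request.urlopen(url, timeout=3) as resp:
            return json.load(resp).get("version")
    except OSError:
        return None  # connection refused, timeout, or no service there

# Run this from ANOTHER machine on the network: any reply means the
# instance is exposed beyond localhost and needs access control.
print(ollama_version("127.0.0.1") or "not reachable")
```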

Why it matters

Local AI is often presented as a safer alternative to cloud AI: data stays in-house, models run on company machines, prompts do not leave the server. That assumption only holds if the local infrastructure is secured as rigorously as any other production service.

This is the risk: many Ollama instances start as developer tools but later become part of internal workflows, agents, chatbots or RAG prototypes. In those environments, memory may hold more than harmless test prompts. It can contain customer data, source code, internal system instructions, database URLs or tokens for other APIs.

The case also shows why AI infrastructure cannot be treated like a toy. A model server is not a notebook with a GPU attached. It is a network service that processes files, keeps memory and can see secrets. That puts it in the same security class as databases, CI runners and internal APIs.

In plain language

Imagine a locked workshop where someone drops off a manipulated box. The label says: "Please take 100 screws from this box." In reality, there are only 10 screws inside. The worker keeps reaching and suddenly picks up items from the neighboring table: keys, notes and someone else's work orders.

That is the core of this issue. The file tells Ollama how much to read. If that number is not checked properly, the server reads too far and grabs data that should never have been part of the model.

A practical example

A mid-sized software team runs Ollama on an internal GPU server. For testing, port 11434 is also reachable through the VPN. Three teams use the service: a support bot processes ticket text, a developer agent reads error messages, and a RAG prototype queries internal documents.

An attacker with network access sends a crafted GGUF file to /api/create. During quantization, the server reads memory areas that recently held a support prompt with customer data and an environment token for a document store. The model artifact is then pushed out through /api/push.

The example is simplified, but the scale is realistic: a single exposed AI server can touch many teams if it is used as a shared inference service. In practice, that means updating to Ollama 0.17.1 or newer, restricting network access, placing an authentication proxy in front of it, and reviewing logs and secrets if the instance was publicly reachable.
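The "0.17.1 or newer" advice can be turned into a check. A small sketch; the version parsing is a simplifying assumption (release suffixes are stripped before comparing):

```python
def is_patched(version: str, fixed: tuple[int, ...] = (0, 17, 1)) -> bool:
    """True if an Ollama version string is at or above the fixed release."""
    try:
        # Strip suffixes such as "-rc1" before comparing numerically.
        parts = tuple(int(p) for p in version.split("-")[0].split("."))
    except ValueError:
        return False  # unparseable version: treat as unpatched, to be safe
    return parts >= fixed

assert is_patched("0.17.1")
assert is_patched("0.18.0")
assert not is_patched("0.16.2")
```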

Scope and limits

  • Not every Ollama installation is automatically exposed to the internet. Instances reachable without a firewall, proxy or access control are the main concern.
  • The publicly described exploit path depends on crafted model files and the affected API endpoints. Teams that already isolate or block upload, create and push flows reduce the risk significantly; a reachability check for those flows is sketched after this list.
  • The sources do not show that all 300,000 referenced instances were compromised. The number describes observed exposure or potentially vulnerable deployments, not confirmed victims.
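For the second point, the block can be verified from outside. A sketch under the assumption that an authentication proxy sits in front of Ollama: a 401 or 403 means the flow is blocked as intended, while any answer from Ollama itself, even a 400 for the deliberately invalid body, proves the endpoint is still reachable. The base URL is a placeholder.

```python
import urllib.error
import urllib.request

BASE = "http://ollama.internal:11434"  # placeholder; adjust to your setup

# Endpoints involved in the described exploit path.
SENSITIVE = ("/api/create", "/api/push")

for path in SENSITIVE:
    req = urllib.request.Request(BASE + path, data=b"{}", method="POST")
    try:
        with urllib.request.urlopen(req, timeout=3) as resp:
            print(f"{path}: reachable (HTTP {resp.status}) - review proxy rules")
    except urllib.error.HTTPError as err:
        blocked = err.code in (401, 403)
        state = "blocked by proxy" if blocked else "reachable"
        print(f"{path}: HTTP {err.code} ({state})")
    except OSError:
        print(f"{path}: not reachable from this host")
```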

💡 In plain English

A manipulated model file can make unpatched Ollama servers read memory they should not touch. That memory may contain prompts, system messages or API keys. Anyone running Ollama on a network should update to 0.17.1 or newer and avoid exposing the service openly.

Key Takeaways

  • CVE-2026-7482 affects Ollama versions before 0.17.1.
  • The bug sits in the GGUF loader and can read memory beyond the file boundary.
  • According to Cyera, prompts, system prompts, environment variables and API keys may be exposed.
  • Ollama instances reachable without a firewall or proxy are especially risky.
  • Teams should patch, restrict access and treat exposed instances as potentially compromised.

FAQ

Which Ollama version is affected?

According to the NVD, Ollama versions before 0.17.1 are affected. The issue was addressed in version 0.17.1.

Does Ollama need to be public on the internet?

No. Internal exposure can also be risky. The highest risk is for instances reachable without a firewall, authentication proxy or access control.

What data could leak?

The sources mention prompts, system prompts, environment variables, API keys and data from concurrent user conversations in process memory.

What should operators do now?

Update to Ollama 0.17.1 or newer, restrict network access, use an authentication proxy and rotate secrets if an instance was exposed.
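Rotation starts with an inventory. A heuristic sketch for the host that ran Ollama: it lists environment variable names that look like secrets so they can be put on a rotation checklist. The name pattern is an assumption, and values are deliberately not printed:

```python
import os
import re

# Names only; never print the values themselves.
SECRET_PATTERN = re.compile(r"KEY|TOKEN|SECRET|PASSWORD|CREDENTIAL", re.I)

candidates = sorted(n for n in os.environ if SECRET_PATTERN.search(n))
print("Candidates to rotate after a possible exposure:")
for name in candidates:
    print(" -", name)
```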

Sources & Context