cyberivy
AI SecurityCiscoLLM SafetyJailbreaksFrontier ModelsRisk ManagementEU AI Act

Cisco: long chats break AI safety filters more often

May 27, 2026

Silbernes Vorhängeschloss mit geschlossenem Bügel auf transparentem Hintergrund.

Cisco tested 15 closed frontier models and found that multi-turn attacks sharply raise success rates compared with single-prompt tests.

What this is about

Cisco reported on May 27, 2026 that classic single-prompt safety tests can give an overly friendly picture of frontier models. In tests on 15 proprietary models from OpenAI, Anthropic, Google, Amazon and xAI, multi-turn attack success rates ranged from 7.9 to 88.3 percent. Single-turn rates on the same models ranged from 2.2 to 64.9 percent.

What the thing actually does

The researchers did not run attacks only as one forbidden question. They ran them as multi-step conversations. Strategies included role-play, ambiguous context, reframing, decomposing and later reassembling information, and gradual escalation. Reporting cites 30,090 single-turn prompts and 6,986 multi-turn attacks across 1,456 conversations.

Why it matters

Organizations often buy models based on model cards, benchmarks and demos. If a model looks safe in a short test but weakens in longer conversations, procurement and compliance risk follows. Cisco argues that configuration choices such as reasoning mode, temperature and guardrail tiers can change safety behavior and should be documented.

In plain language

Imagine a bouncer who immediately recognizes one forbidden request. If someone first makes small talk, assigns roles and splits the request into five harmless-looking parts, the check becomes harder. That is the difference multi-turn attacks test.

A practical example

A news channel uploads 20 Shorts in one day. Three clips show photorealistic scenes that were fully created with a video model. If the channel does not disclose that use properly, YouTube can automatically label those three clips. Nothing changes for the other 17 videos as long as no significant photorealistic AI use is detected.

Scope and limits

  • The label does not automatically say whether a video is false, manipulative or harmless.
  • Detection systems can wrongly label real footage or miss synthetic content.
  • The change does not solve the problem that viewers interpret labels differently.

SEO & GEO keywords

Cisco AI Threat Research, multi-turn jailbreak, frontier models, AI safety benchmark, adversarial robustness, EU AI Act Article 15, NIST AI RMF, LLM security, prompt injection, model procurement

💡 In plain English

Cisco shows that testing an AI model with single questions is not enough. Attackers often work through conversations, and that is where safety filters can become much weaker.

Key Takeaways

  • Cisco treats multi-turn attacks as a much harder test than single prompts.
  • Reported success rates in longer chats reached up to 88.3 percent.
  • Single-turn benchmarks can overstate real-world model robustness.
  • Settings such as reasoning mode and guardrails belong in safety documentation.

FAQ

What is a multi-turn attack?

An attack across several conversation rounds. Instead of one direct prohibited request, it uses context, roles and detours.

Are all models unsafe?

The report does not say all models are equally weak. It shows that longer attacks can strongly change rankings and risk assessments.

What should companies do?

They should include multi-round tests in procurement and release checks and document model configurations.

Sources & Context