Thinking Machines, Mira Murati, AI Research, Multimodal AI, Voice AI, Human AI Collaboration, Real-Time AI

Thinking Machines wants to move AI beyond turn-taking

May 12, 2026

Cyber-Ivy title graphic with a dark background, green plant motif, and light wordmark

Thinking Machines is showing interaction models that take in audio, video, and text while responding. It is still research, but it targets a real weakness in today’s AI assistants.

What this is about

Thinking Machines Lab, the company founded by former OpenAI CTO Mira Murati, introduced a research direction called interaction models on May 11, 2026. The idea is that AI should no longer simply wait until a human has finished speaking or typing. Instead, it should continuously take in audio, video, and text while responding or working in the background.

At first, that may sound like a product demo. It becomes interesting because Thinking Machines is targeting a real weakness of current AI systems: many models are powerful, but the interface is still slow and turn-based. Human collaboration often works through interruption, showing, asking, and thinking in parallel.

What interaction models actually do

According to Thinking Machines, interaction models are trained for interaction from the start, instead of adding real-time behavior through many external components. The system works with multiple streams and small time slices. Audio, video, and text are not treated as a completed user request, but as continuous context.

The company describes an approach with a fast interaction model and an asynchronous background model. The fast model stays present in the conversation while the background model handles longer tasks, tool use, or research. TechCrunch reports that the demonstrated TML-Interaction-Small is supposed to respond in about 0.40 seconds. Thinking Machines plans a limited research preview first and a wider release later in 2026.
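
As a rough illustration of that split, the following minimal Python sketch shows the described pattern: a fast model that answers every small slice of incoming input, and a background model that takes over longer tasks without blocking the conversation. All names here (FastModel, BackgroundModel, respond, run_task) are hypothetical placeholders for the concept, not Thinking Machines' actual API.

```python
import asyncio
from dataclasses import dataclass

# Hypothetical sketch only: FastModel, BackgroundModel and their methods are
# invented placeholders for the pattern described above, not a real API.

@dataclass
class Reply:
    text: str
    background_task: str | None = None    # longer work to hand off, if any

class FastModel:
    """Stays present in the conversation and answers each small time slice."""
    async def respond(self, chunk: str) -> Reply:
        if "research" in chunk:
            return Reply("Let me look into that.", background_task=f"research: {chunk}")
        return Reply(f"(acknowledging) {chunk}")

class BackgroundModel:
    """Handles slower work such as tool use or retrieval, off the critical path."""
    async def run_task(self, task: str) -> str:
        await asyncio.sleep(1.0)           # simulated long-running work
        return f"finished {task}"

async def handle_stream(chunks: list[str]) -> None:
    fast, background = FastModel(), BackgroundModel()
    pending: set[asyncio.Task] = set()

    for chunk in chunks:                   # audio/video/text arrives as continuous context
        reply = await fast.respond(chunk)  # the fast model never waits for background work
        print(reply.text)
        if reply.background_task:
            task = asyncio.create_task(background.run_task(reply.background_task))
            pending.add(task)
            task.add_done_callback(pending.discard)

    for result in await asyncio.gather(*pending):  # collect what finished in the background
        print(result)

if __name__ == "__main__":
    asyncio.run(handle_stream(["hello", "please research streaming latency", "thanks"]))
```

The essential difference from a turn-based assistant is that the fast loop never waits for the background work to finish before it can speak again.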

Why it matters

Most current assistants feel like a chat window: the human speaks, the model waits, the model replies, the human waits. For short questions, that is fine. For shared work on code, design, support, education, or live translation, it often feels too rigid.

If a model can keep thinking while listening, different workflows become possible. A support assistant could listen, look up relevant customer data, and ask for clarification. A tutoring assistant could see that a student is stuck on an equation before a complete question is formed. A developer could show code while the AI is already flagging risky sections. That is not automatically better, but it shifts the interface from prompting to collaboration.

In plain language

The difference is like email versus cooking together. With email, you write everything, wait for the reply, and correct things later. In the kitchen, the other person sees that you are adding too much salt and immediately says: “Stop, use less.”

Interaction models try to bring AI closer to that kitchen situation: listen, see, respond, and still handle longer tasks in the background.

A practical example

A service team handles 800 support calls a day. Today, an employee has to listen, take notes, open a CRM, and search for contract data at the same time. A turn-based AI assistant can only help once the question has been clearly stated.

An interaction model could recognize during the call that the issue is an invoice for 1,240 euros, load contract data in the background, and suggest a clarification: “Do you mean the April invoice or the credit note from May 3?” The human remains responsible, but the AI no longer works only after the end of a sentence.
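
To make that concrete, here is a hypothetical Python sketch of the same idea: the assistant spots the invoice amount while the transcript is still arriving, starts a contract-data lookup in parallel, and proposes a clarifying question once the data is back. The crm_lookup function and the pattern matching are invented for illustration and do not correspond to any real CRM or model API.

```python
import asyncio
import re

# Hypothetical sketch only: crm_lookup and the transcript handling are invented
# for illustration; they do not correspond to any real CRM or model API.

async def crm_lookup(amount: str) -> list[str]:
    """Simulated contract-data lookup that runs while the caller keeps talking."""
    await asyncio.sleep(0.5)
    return [f"April invoice over {amount} euros", "credit note from May 3"]

async def assist_call(transcript_chunks: list[str]) -> None:
    lookup: asyncio.Task | None = None

    for chunk in transcript_chunks:                      # partial transcript, mid-sentence
        match = re.search(r"(\d[\d.,]*)\s*euro", chunk)
        if match and lookup is None:
            # Start the lookup before the caller has finished the question.
            lookup = asyncio.create_task(crm_lookup(match.group(1)))
        print(f"heard: {chunk}")

    if lookup is not None:
        first, second = await lookup
        print(f"Suggested clarification: Do you mean the {first} or the {second}?")

if __name__ == "__main__":
    asyncio.run(assist_call(["the invoice over", "1,240 euros seems", "wrong to me"]))
```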

Scope and limits

  • This is not yet a broadly usable product release. External users cannot yet test quality and latency at scale.
  • Real-time audio and video increase privacy risks. Systems like this need clear consent, storage rules, and visible controls.
  • Fast interaction does not equal reliable truth. A model can interrupt fluently and still draw the wrong conclusion.

SEO & GEO keywords

Thinking Machines Lab, Mira Murati, interaction models, real-time AI, full-duplex AI, multimodal AI, AI voice assistants, TML-Interaction-Small, human AI collaboration, AI research preview

💡 In plain English

Thinking Machines wants AI assistants that do not only take turns listening and replying. They should continuously see, hear, speak, and handle longer work in the background.

Key Takeaways

  • Thinking Machines introduced interaction models as a research direction on May 11, 2026; a limited research preview is planned.
  • The approach is meant to process audio, video, and text in parallel in real time.
  • According to TechCrunch, TML-Interaction-Small is claimed to respond in about 0.40 seconds.
  • The main value is in collaboration, support, education, live translation, and work with visible context.
  • Privacy, reliability, and external testing remain key open questions.

FAQ

Can people use interaction models now?

No, they are not broadly available yet. Thinking Machines describes a limited research preview in the coming months.

What is new compared with voice chatbots?

The claim is to make interaction native to the model: simultaneous listening, seeing, replying, and background work instead of just turn-taking with a voice interface.

Why is privacy especially important here?

Because such systems can continuously process audio, video, and text. Without consent and storage rules, assistance can quickly become surveillance.

Sources & Context