Gemini 3.5 Live Translate Drops Turn-Taking for Real-Time Speech Across 70+ Languages

Continuous streaming speech-to-speech translation arrives with just seconds of latency, preserving tone and pacing across 70+ languages, and immediately lands in Google Translate, Meet, and the Gemini Live API.

Google DeepMind just killed the awkward pause in voice translation. Gemini 3.5 Live Translate streams speech-to-speech across 70+ languages in near real time, without waiting for the speaker to finish a sentence before it starts translating. The model trades off context against latency: it stays just a few seconds behind the speaker, generating continuous audio that preserves intonation, pacing, and pitch.

Continuous Streaming Beats Turn-by-Turn

Most speech translation systems operate in a rigid turn-based loop: listen, wait, translate, speak. That model works for dictation but breaks down in live conversation where rhythm and timing matter. Gemini 3.5 Live Translate processes audio as it streams, generating translated speech without a hard cut. The result sounds like the same person speaking a different language, not a stilted robotic overlay.
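To make the difference concrete, here is a toy timing model of the two approaches. It is only a sketch under stated assumptions: the function names, the fixed processing delay, and the fixed lag are hypothetical illustrations, not a description of how Gemini 3.5 Live Translate works internally.

```python
from typing import List


def turn_based_start(chunk_seconds: List[float], processing_seconds: float) -> float:
    """Seconds until translated audio begins when the system waits for the whole turn.

    Assumption: translation starts only after the speaker finishes, then takes
    a fixed processing delay before playback.
    """
    return sum(chunk_seconds) + processing_seconds


def streaming_start(chunk_seconds: List[float], lag_seconds: float) -> float:
    """Seconds until translated audio begins when output trails input by a fixed lag.

    Assumption: playback starts once the lag has elapsed, or at the end of the
    speech if the whole turn is shorter than the lag.
    """
    return min(lag_seconds, sum(chunk_seconds))


# Hypothetical 12-second utterance split into three 4-second chunks.
utterance = [4.0, 4.0, 4.0]
print(turn_based_start(utterance, processing_seconds=1.0))  # 13.0
print(streaming_start(utterance, lag_seconds=3.0))          # 3.0
```

In this toy model the listener's wait grows with the length of the turn in the turn-based case, while in the streaming case it is capped at the lag.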

The model detects the input language automatically, including mixed-language input, so no manual language configuration is required. It is also built to be robust to noise in loud, unpredictable environments. That's table stakes for any system that hopes to work in a real cafe or street, not just a quiet lab.

70+ Languages, One Model, No Manual Config

The model is rolling out across three product surfaces starting today. Developers get a public preview through the Gemini Live API and Google AI Studio. Google Translate on Android and iOS now includes the model globally: just pair headphones and the app mirrors the speaker’s tone across the same 70+ languages. Android users also get a new 'listening mode' that streams translated audio directly through the phone's earpiece when it is held to the ear, no headphones needed.

For enterprises, Google Meet gets a private preview this month. The upgrade expands support from a previous limit of five languages (and only to or from English) to 70+ languages, with over 2,000 language combinations available in a single meeting.
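The "over 2,000" figure is consistent with counting unordered language pairs. Here is a quick check, assuming exactly 70 languages (the announcement says 70+, so this is a lower bound) and assuming "combinations" means pairs of distinct languages, which the source does not spell out. The function names are illustrative.

```python
from math import comb


def language_pairs(n_languages: int) -> int:
    """Number of unordered pairs of distinct languages."""
    return comb(n_languages, 2)


def translation_directions(n_languages: int) -> int:
    """Number of ordered (source, target) pairs of distinct languages."""
    return n_languages * (n_languages - 1)


print(language_pairs(70))          # 2415
print(translation_directions(70))  # 4830
```

Either way of counting clears 2,000 once the language list reaches 70.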

From Grab to API Partners: Where It Ships

Grab, the Southeast Asian ride-hailing giant, is already testing 3.5 Live Translate for near real-time multilingual communication between drivers and travelers at pickups. That use case sees over 10 million voice calls per month on Grab’s platform. Early feedback from partners including CJ ENM and LiveKit highlights translation quality, accuracy, and low latency.

Developers can build their own voice translation apps using the Gemini Live API with managed media-streaming integrations from Agora, Fishjam, LiveKit, Pipecat, and Vision Agents. The Gemini Cookbook has demo code and examples.

Watermarking Built In

All audio generated by 3.5 Live Translate is watermarked with SynthID, an imperceptible signal woven directly into the waveform. That means AI-generated speech remains detectable, even after it passes through a multi-hop pipeline. Google's model card details the safety and responsibility approach.

With this release, the gap between hearing someone and understanding them in your own language just shrank to a few seconds of delay. That's not a revolution; it's a product that finally ships what we've been told was coming.


Source: Fluid, natural voice translation with Gemini 3.5 Live Translate
Domain: deepmind.google
