"Google" has introduced two significant projects in the field of artificial intelligence aimed at breaking down language barriers and accelerating text generation. The new audio-model system "Gemini 3.5 Live Translate" allows for near real-time speech translation in more than "70 languages," and "Google DeepMind" announced the experimental "DiffusionGemma" – a model that generates text up to "four times faster" than standard autoregressive models thanks to diffusion techniques in the "Gemma 4" architecture.
"Gemini 3.5 Live Translate": translation that follows the speaker
Unlike traditional systems that wait for the speaker to finish a phrase before outputting a translation, Gemini 3.5 Live Translate works as a streaming translator. The model processes and translates speech continuously, with a delay of only a few seconds, while trying to preserve the intonation, pace, and tone of the original speech.
The system launches simultaneously on three main platforms. Globally, it is available in the Google Translate app for Android and iOS. In public preview, it is offered to developers through the Gemini Live API and Google AI Studio. At the same time, in private preview, it has been enabled for selected Google Workspace corporate customers in Google Meet, where speech translation expands from 5 supported languages to more than 70, with more than 2,000 possible language combinations within a single meeting.
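The "more than 2,000 combinations" figure can be sanity-checked with simple counting. The sketch below assumes that a "combination" means a pair of distinct languages and that exactly 70 languages are supported; the announcement says "over 70" and does not define the term, so both are assumptions made for illustration.

```python
from math import comb


def language_pairs(n_languages: int) -> tuple[int, int]:
    """Return (unordered pairs, ordered source->target pairs) for n languages."""
    unordered = comb(n_languages, 2)             # {A, B} counted once
    ordered = n_languages * (n_languages - 1)    # A->B and B->A counted separately
    return unordered, ordered


# Assumption: exactly 70 languages (the article says "over 70").
unordered, ordered = language_pairs(70)
print(f"unordered pairs: {unordered}, ordered pairs: {ordered}")

# The previous Google Meet limit of 5 languages, for comparison.
print(language_pairs(5))
```

Under these assumptions, 70 languages give 2,415 unordered pairs, which is consistent with the stated "more than 2,000."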
New "listen mode" and audio content protection
One practical addition is the so-called listen mode for Android. It lets users hear the translation directly through the phone's speaker without headphones: it is enough to hold the device to the ear, as in a standard phone call. This makes the service more convenient on the move and in situations where discreet listening matters.
All audio generated by Gemini 3.5 Live Translate is marked with a SynthID watermark. The goal is to clearly distinguish AI-generated material from human speech and to enable traceability and verification of origin. This is part of Google's broader effort to introduce transparency standards for the use of generative models in audio and video content.
"DiffusionGemma": text generated by the logic of diffusion
Alongside audio translation, Google DeepMind introduced DiffusionGemma, a model that brings the diffusion approach used in image generation to text. Instead of predicting tokens one at a time, as classical autoregressive models do, DiffusionGemma starts from "noise" and gradually refines entire blocks of up to 256 tokens in parallel, much as diffusion models "clean" an image step by step.
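The difference between the two decoding styles can be illustrated with a toy sketch that only counts model passes. It is not DiffusionGemma's algorithm: the "denoiser" here simply copies tokens from a known target, and the step count of 64 is a hypothetical value chosen for illustration, since the announcement does not say how many refinement steps the model uses.

```python
import random

MASK = None  # placeholder for a position that is still "noise"


def autoregressive_passes(n_tokens: int) -> int:
    """An autoregressive decoder needs one model pass per generated token."""
    return n_tokens


def toy_parallel_refine(target, steps, seed=0):
    """Fill a fully masked block in at most `steps` passes.

    Stand-in for a denoiser: each pass reveals an equal share of the
    still-hidden positions by copying from `target`. A real model would
    predict those tokens rather than copy them.
    """
    rng = random.Random(seed)
    block = [MASK] * len(target)
    hidden = list(range(len(target)))
    rng.shuffle(hidden)  # positions are revealed in arbitrary order
    passes = 0
    for step in range(steps):
        remaining_steps = steps - step
        k = -(-len(hidden) // remaining_steps)  # ceiling division
        for _ in range(k):
            pos = hidden.pop()
            block[pos] = target[pos]
        passes += 1
        if not hidden:
            break
    return block, passes


target = list(range(256))                         # a 256-token block, as in the article
block, passes = toy_parallel_refine(target, 64)   # 64 steps: hypothetical
print(passes, autoregressive_passes(len(target)) / passes)
```

With these illustrative numbers the block is completed in 64 passes instead of 256. The ratio of 4 follows directly from the chosen step count and should not be read as the source of Google's "four times faster" figure.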
The model is based on the Gemma 4 architecture and uses a mixture of experts with a total of 26 billion parameters, but during inference it activates only about 3.8 billion. This allows for significant acceleration: DiffusionGemma reaches over 1,000 tokens per second on a single Nvidia H100 GPU and about 700 tokens per second on a consumer GeForce RTX 5090 graphics card. The model's weights are published on Hugging Face under Apache 2.0, an open license that gives freedom for experiments and integrations.
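To put these figures in perspective, the short calculation below derives the share of parameters active per token and the approximate time to generate a response at the quoted throughputs. The 2,000-token response length is an arbitrary example, and the throughput values are the rounded numbers from the announcement ("over 1,000" and "about 700"), so the results are rough estimates only.

```python
TOTAL_PARAMS = 26e9     # total mixture-of-experts parameters (from the article)
ACTIVE_PARAMS = 3.8e9   # parameters activated during inference (approximate)

# Rounded throughput figures from the article, in tokens per second.
THROUGHPUT = {"H100": 1000.0, "RTX 5090": 700.0}


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters used for each token."""
    return active / total


def seconds_for(n_tokens: int, tokens_per_second: float) -> float:
    """Approximate wall-clock time to generate n_tokens at a steady rate."""
    return n_tokens / tokens_per_second


print(f"active share: {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.1%}")
for gpu, tps in THROUGHPUT.items():
    # 2,000 tokens is an arbitrary example length, not a figure from the article.
    print(f"{gpu}: {seconds_for(2000, tps):.2f} s for 2000 tokens")
```

Roughly 15% of the parameters are active for any given token, which is one reason a 26-billion-parameter model can run at these speeds on a single card.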
Four times faster text, but at a cost in quality
"Google" CEO "Sundar Pichai" paid special attention to "DiffusionGemma" on social media, calling it a "workhorse capable of accelerating inference by 4 times." According to him, the model brings the company's long-standing research in the field of "text diffusion" into the "Gemma 4" family and opens the door to a new type of high-speed text systems.
"Google" itself, however, emphasizes that "DiffusionGemma" is an "experimental model" and in terms of output quality, it "yields" to the standard "Gemma 4." The company recommends using it primarily in scenarios where "speed is critical": inline text editing, fast iterations during development, and short "agent cycles." For production systems requiring maximum quality and reliability, "Google" advises relying on proven autoregressive models.
Infrastructure and ecosystem: optimizations and integrations
"Nvidia" has optimized "DiffusionGemma" for its entire product line – from consumer "GPUs" to "DGX Spark" systems. This means the model can be used both in laboratory conditions and in larger infrastructures without additional integration effort. Support is available from day one within "vLLM," "Hugging Face Transformers," and "Unsloth," which allows developers to embed it quickly into existing projects.
Gemini 3.5 Live Translate and DiffusionGemma are thus two sides of the same trend: pairing pragmatic products such as a real-time translator with experimental models for accelerated text generation. For users, this means easier communication across language barriers; for developers, it means new tools for building faster and more flexible AI-based systems.