On June 10, Google DeepMind released DiffusionGemma, a 26-billion-parameter open-source AI model that generates text at more than 1,000 tokens per second on a single H100 GPU. The speed comes from a fundamentally different approach: in each forward pass the model produces 256 tokens in parallel, rather than generating one token (a word or word fragment) at a time as standard AI language models do.
The traditional approach, known as autoregressive generation, has dominated large language model design since the earliest GPT models. DiffusionGemma tests whether diffusion, a method already dominant in image generation, can replace that process for text. Google DeepMind described this as one of the most significant public experiments in rethinking how text AI works.
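The contrast between one-token-at-a-time decoding and block-parallel generation can be illustrated by counting forward passes. This is a rough sketch, not DiffusionGemma’s code: the block size of 256 comes from the release, while `steps_per_block` is a hypothetical parameter, because diffusion decoders typically refine each block over several passes and the number DiffusionGemma uses is not stated here.

```python
import math


def autoregressive_passes(num_tokens: int) -> int:
    # Autoregressive decoding: one forward pass per generated token.
    return num_tokens


def block_diffusion_passes(num_tokens: int, block_size: int = 256,
                           steps_per_block: int = 1) -> int:
    # Block-parallel generation: tokens are produced `block_size` at a time.
    # steps_per_block is hypothetical; diffusion decoders usually refine
    # each block over several passes rather than a single one.
    num_blocks = math.ceil(num_tokens / block_size)
    return num_blocks * steps_per_block


print(autoregressive_passes(1024))                       # 1024
print(block_diffusion_passes(1024))                      # 4
print(block_diffusion_passes(1024, steps_per_block=16))  # 64
```

Pass counts are only a proxy for speed: a pass over 256 positions does more work than a single-token step, so the ratio of pass counts should not be read as the measured speedup.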
The model is available under an Apache 2.0 license and can be accessed through Hugging Face, Kaggle, and Google Cloud’s Vertex AI Model Garden. The open release means developers and researchers can download, run, and modify the model without paying licensing fees, provided they follow the license’s attribution and notice requirements.
DiffusionGemma is built on the Gemma 4 foundation and incorporates techniques from Google’s internal Gemini Diffusion research. It is one of several Gemma models released in June, alongside smaller versions optimized for mobile devices and a unified multimodal model that handles text and images without a separate encoder.
The claimed 4x speedup is measured against models of similar size running on equivalent hardware. In practical terms, generating a 1,000-word response would take roughly a quarter of the time a comparable conventional model needs. For applications requiring real-time or near-real-time output, that difference is substantial.
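The “quarter of the time” estimate can be checked with a short calculation. It assumes roughly 4/3 tokens per English word (a common rule of thumb, not a DiffusionGemma figure), takes 1,000 tokens per second as DiffusionGemma’s throughput, and models the conventional baseline as a hypothetical 250 tokens per second, one quarter of that, to match the 4x claim. Prompt processing and other fixed overheads are ignored.

```python
TOKENS_PER_WORD = 4 / 3  # assumption: rough rule of thumb for English text


def generation_seconds(words: int, tokens_per_second: float) -> float:
    # Convert words to tokens, then divide by decoding throughput.
    tokens = words * TOKENS_PER_WORD
    return tokens / tokens_per_second


diffusion_tps = 1000.0            # reported lower bound for DiffusionGemma on one H100
baseline_tps = diffusion_tps / 4  # hypothetical baseline implied by the 4x claim

fast = generation_seconds(1000, diffusion_tps)
slow = generation_seconds(1000, baseline_tps)
print(f"{fast:.2f} s vs {slow:.2f} s")  # 1.33 s vs 5.33 s
```

The tokens-per-word assumption changes only the absolute times; the ratio between the two stays at 4 whatever value is used.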
Diffusion models work by starting from noise and progressively refining output toward a target. For images, this means starting from static and shaping it into a picture. For text, the process is more complex because language is sequential and meaning depends heavily on word order. Earlier attempts at diffusion-based text models struggled with coherence. Google DeepMind says DiffusionGemma resolves most of those problems at scale.
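One widely studied way to apply diffusion to text is masked (absorbing-state) diffusion, in which “noise” means hiding tokens behind a mask and each refinement pass fills in the positions the model is most confident about. The sketch below illustrates that general technique only; the release does not confirm that DiffusionGemma works this way, and `toy_denoiser` is a hypothetical stand-in for the neural network that would score every position in parallel.

```python
import random
from typing import Callable, List, Optional, Tuple

MASK: Optional[str] = None  # a masked ("noised") position holds no token yet

# Hypothetical denoiser interface: given the partially filled sequence, return
# a (token, confidence) guess for every position in a single call.
Denoiser = Callable[[List[Optional[str]]], List[Tuple[str, float]]]


def diffusion_decode(denoiser: Denoiser, length: int, steps: int) -> List[Optional[str]]:
    """Fill a fully masked sequence over `steps` refinement passes."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    seq: List[Optional[str]] = [MASK] * length
    per_step = -(-length // steps)  # ceiling division: tokens committed per pass
    for _ in range(steps):
        masked = [i for i, tok in enumerate(seq) if tok is MASK]
        if not masked:
            break
        guesses = denoiser(seq)  # every position is scored in the same pass
        # Commit only the most confident guesses; the rest stay masked for now.
        masked.sort(key=lambda i: guesses[i][1], reverse=True)
        for i in masked[:per_step]:
            seq[i] = guesses[i][0]
    return seq


def toy_denoiser(target: List[str], seed: int = 0) -> Denoiser:
    """Stand-in for a neural network: proposes the target token everywhere,
    with random confidences to mimic uncertainty."""
    rng = random.Random(seed)

    def predict(seq: List[Optional[str]]) -> List[Tuple[str, float]]:
        return [(tok, rng.random()) for tok in target]

    return predict


target = "the model fills in every position over a few steps".split()
out = diffusion_decode(toy_denoiser(target), length=len(target), steps=3)
print(" ".join(tok for tok in out if tok is not MASK))
# prints: the model fills in every position over a few steps
```

Fewer `steps` means more tokens committed per pass and fewer passes overall, which is where the speed advantage comes from. It is also where coherence problems can creep in, since tokens committed in the same pass cannot condition on each other.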
The release comes as competition among AI labs intensifies. OpenAI, Anthropic, Meta, and Mistral have all released model updates so far in June 2026. The open-weights approach Google has taken with DiffusionGemma is partly a response to the strong community adoption of Meta’s Llama series, which demonstrated that open models can build large developer ecosystems.
Researchers are expected to test DiffusionGemma extensively over the coming weeks. Benchmark results comparing it against GPT-5.5 and Anthropic’s latest Claude models on standard evaluations are expected to surface on academic preprint servers and AI leaderboards by the end of June.
According to Google DeepMind’s official site, further details about the model architecture and training process are available in the accompanying research paper released alongside the model weights.