Google has introduced a new artificial intelligence model, Gemma 4, adding to its growing portfolio of AI tools with an emphasis on multimodal capability and flexible deployment across devices.

Announced on April 2, the model is designed to handle both text and image inputs while producing text-based responses. Some of its smaller variants extend into audio processing, reflecting a broader shift toward systems that can interpret and respond across different forms of data within a single framework.
The company has released Gemma 4 with open weights, offering both pre-trained and instruction-tuned versions. This allows developers to modify and adapt the model according to specific needs, a move that may appeal to those working across varied technical environments.
Gemma 4 comes in four sizes—E2B, E4B, 26B A4B, and 31B—each tailored for different levels of computing power. The smaller versions are intended for use on mobile devices and laptops, while the larger models are built for more demanding tasks on high-performance systems.
A notable feature is its context window, which extends up to 256,000 tokens. This enables the model to process significantly larger amounts of information within a single interaction, a capability that can affect how longer documents, conversations, or datasets are handled.
Across all versions, there is a stated focus on reasoning and coding. The model is designed to approach problems step by step, with improvements in generating, completing, and correcting code. It also includes native support for function calling, allowing the model to emit structured calls to external tools, a building block for agent-style applications.
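To make the function-calling idea concrete, here is a minimal sketch of the pattern: the model emits a structured call (shown here as JSON), and the application parses it and dispatches to a registered tool. The `get_weather` tool, its schema, and the JSON shape are all illustrative assumptions, not part of any official Gemma 4 API.

```python
import json

# Hypothetical tool the model might call; name and behavior are
# illustrative only, not an official Gemma 4 interface.
def get_weather(city: str) -> str:
    # Stub implementation for demonstration purposes.
    return f"Sunny in {city}"

# Registry mapping function names the model may emit to real callables.
TOOLS = {"get_weather": get_weather}

def dispatch(model_output: str) -> str:
    """Parse a structured function call emitted by the model and run it."""
    call = json.loads(model_output)  # assumed shape: {"name": ..., "arguments": {...}}
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

# Simulated model output in the assumed structured-call format.
result = dispatch('{"name": "get_weather", "arguments": {"city": "Dhaka"}}')
print(result)  # Sunny in Dhaka
```

The point of the structured format is that the application, not the model, executes the tool, which keeps the interaction auditable and lets the tool's result be fed back for a follow-up response.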
Under the hood, Gemma 4 combines Dense and Mixture-of-Experts architectures. It uses a hybrid attention mechanism that blends local and global processing, an approach intended to balance speed with performance, particularly when working with long inputs.
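The cost argument behind mixing local and global attention can be shown with a toy mask construction: most layers use a sliding window (cheap, linear in window size), while an occasional full-causal layer preserves long-range links. The window size and the one-global-in-four schedule below are assumptions for illustration, not Gemma 4's actual configuration.

```python
def causal_mask(n):
    # Global attention: token i may attend to every earlier token j <= i.
    return [[j <= i for j in range(n)] for i in range(n)]

def sliding_window_mask(n, window):
    # Local attention: token i attends only to the last `window` tokens.
    return [[i - window < j <= i for j in range(n)] for i in range(n)]

def layer_masks(n, num_layers, window, global_every=4):
    # Hypothetical schedule: one global layer after every few local ones,
    # keeping most layers cheap while long-range information still flows.
    return [causal_mask(n) if (layer + 1) % global_every == 0
            else sliding_window_mask(n, window)
            for layer in range(num_layers)]

masks = layer_masks(n=8, num_layers=8, window=3)
# Count allowed attention pairs per layer to see the cost gap:
costs = [sum(map(sum, m)) for m in masks]
print(costs)  # local layers allow far fewer pairs than global ones
```

A global layer here permits 36 attention pairs for 8 tokens, a local layer only 21; at a 256,000-token context the gap between quadratic and windowed cost is what makes the hybrid scheme attractive.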
The model’s multimodal capabilities extend to image and video understanding, including tasks such as object detection, document parsing, and chart analysis. It can also interpret mixed inputs, where text and images are presented together within a single prompt.
In its smaller configurations, Gemma 4 supports audio-related functions such as speech recognition and translation, further widening its range of use.
The release reflects a continued effort to build systems that can operate across different platforms while handling increasingly complex inputs. For developers and users alike, the model’s flexibility and scale suggest a tool designed to fit into a wide range of practical settings without being confined to a single use case.
As with similar systems, its impact will depend largely on how it is applied in real-world environments, where performance, reliability, and adaptability tend to matter more than specifications alone.


