Introduction to Gemma 4
Gemma 4 is a significant advancement in multimodal understanding, offering open-weights models whose performance is competitive with proprietary alternatives. Developers and researchers can access high-performance models without being locked into a specific vendor. The Gemma 4 family comes in four sizes, ranging from models that fit on a phone to models that require multi-GPU servers. Gemma 4 also accepts multimodal input, including text, images, video, and audio, making it a versatile tool for a wide range of applications.
Architecture and Technical Details
The Gemma 4 architecture is based on a mixture-of-experts (MoE) approach, which allows for efficient use of parameters. A router in each MoE layer sends every token to a small subset of 128 compact experts, so the model activates only 3.8B of its 26B parameters per token. This sparse architecture lets the model achieve high performance while reducing compute requirements. The model also employs techniques such as Grouped Query Attention (GQA) and p-RoPE to improve performance.
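To make the routing idea concrete, here is a minimal sketch of top-k expert selection in plain Python. The 128-expert count comes from the paragraph above; the `TOP_K` value and the renormalization step are illustrative assumptions, not Gemma 4's published router configuration.

```python
import math
import random

NUM_EXPERTS = 128   # total experts per MoE layer (from the article)
TOP_K = 4           # experts activated per token (hypothetical value)

def softmax(xs):
    # Numerically stable softmax over raw router scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(router_logits, top_k=TOP_K):
    """Pick the top-k experts for one token and renormalize their weights."""
    probs = softmax(router_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]
    total = sum(probs[i] for i in chosen)
    # Only the chosen experts run; their gate weights sum to 1.
    return {i: probs[i] / total for i in chosen}

random.seed(0)
logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]
weights = route(logits)
print(len(weights))  # only TOP_K experts are active for this token
```

Each token's output is then the weighted sum of the chosen experts' outputs, which is why only a fraction of the total parameters is touched per token.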
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "google/gemma-4-31b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```

Example code for loading the Gemma 4 model.
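Before generating, the conversation has to be rendered into the model's chat format. In practice you would call `tokenizer.apply_chat_template`, but the sketch below shows what that template produces, assuming Gemma 4 keeps the `<start_of_turn>`/`<end_of_turn>` convention used by earlier Gemma releases (an assumption, since the article does not specify the template).

```python
def build_prompt(messages):
    """Render a chat history in the Gemma-style turn format (assumed)."""
    parts = []
    for msg in messages:
        # Gemma uses the role name "model" for assistant turns.
        role = "model" if msg["role"] == "assistant" else msg["role"]
        parts.append(f"<start_of_turn>{role}\n{msg['content']}<end_of_turn>\n")
    parts.append("<start_of_turn>model\n")  # cue the model to respond
    return "".join(parts)

prompt = build_prompt([{"role": "user", "content": "Hello!"}])
print(prompt)
```

Prefer the tokenizer's built-in template in real code; this sketch only makes the wire format visible.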
Running Inference with Gemma 4
To run inference with Gemma 4, you can use the `ollama run` command with the desired model size. For example, to run the 26B model, use `ollama run gemma4:26b`. You can also use the `vllm serve` command to serve the model and make it available to your application. Additionally, you can fine-tune the model on a free Google Colab T4 GPU.
```shell
ollama run gemma4:26b
```

Example command for running the 26B model with Ollama.
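For serving, the commands below sketch a vLLM deployment with its OpenAI-compatible endpoint. The model tag and port are assumptions carried over from the loading example above; adjust them for your deployment, and note that the `curl` call requires the server to be running.

```shell
# Serve the model with vLLM's OpenAI-compatible server (port is an assumption).
vllm serve google/gemma-4-31b-it --port 8000

# In another terminal, query the standard chat completions endpoint:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "google/gemma-4-31b-it", "messages": [{"role": "user", "content": "Hello"}]}'
```

Because the endpoint follows the OpenAI API shape, existing OpenAI client libraries can point at it by changing only the base URL.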

Getting Started with Gemma 4
To get started with Gemma 4, you can access the models on the Kaggle platform or through Google AI Studio. The models are available under an Apache 2.0 license, which permits use in both commercial and non-commercial applications. You can also find documentation and tutorials on the official Gemma 4 GitHub page.
- Model sizes available: 4
- Context size supported: 256K
- License type: Apache 2.0
Gemma 4 vs Proprietary Models
| Component | Open / This Approach | Proprietary Alternative |
|---|---|---|
| Model provider | Any: OpenAI, Anthropic, Ollama | Single vendor lock-in |
| Licensing | Apache 2.0 | Restrictive licensing |
| Customizability | Highly customizable | Limited customizability |
Key Takeaway
Gemma 4 delivers performance competitive with proprietary models, making it an attractive option for developers and researchers who want high-performance models without being locked into a specific vendor. With native function calling and support for up to 256K context, Gemma 4 is a powerful tool for building agents. The model is available under an Apache 2.0 license, which permits commercial and non-commercial use.
Key Links