Gemma is a set of lightweight, generative artificial intelligence (AI) open models. Gemma models are available to run in your applications and on your hardware, mobile devices, or hosted services. You can also customize these models using tuning techniques so that they excel at performing tasks that matter to you and your users. Gemma models are based on Gemini models and are intended for the AI development community to extend and take further.
Fine-tuning can help improve a model's performance in specific tasks. Because models in the Gemma model family are open weight, you can tune any of them using the AI framework of your choice and the Vertex AI SDK. You can open a notebook example to fine-tune the Gemma model using a link available on the Gemma model card in Model Garden.
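For example, the following is a minimal LoRA tuning sketch using KerasNLP, one of the frameworks you can choose. It assumes the keras-nlp package is installed and that Kaggle credentials for the Gemma weights are configured; the training data shown is a placeholder for your own examples.

```python
# Minimal LoRA fine-tuning sketch with KerasNLP (assumptions noted above).
import keras
import keras_nlp

# Load a pretrained Gemma checkpoint by its KerasNLP preset name.
gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma_2b_en")

# Enable LoRA so only a small set of adapter weights is trained.
gemma_lm.backbone.enable_lora(rank=4)

# Keep sequences short to limit memory use during tuning.
gemma_lm.preprocessor.sequence_length = 128

gemma_lm.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(learning_rate=5e-5),
    weighted_metrics=[keras.metrics.SparseCategoricalAccuracy()],
)

# train_examples is a stand-in for your own instruction-tuning data.
train_examples = ["Instruction: ...\nResponse: ..."]
gemma_lm.fit(train_examples, epochs=1, batch_size=1)

print(gemma_lm.generate("Instruction: ...\nResponse:", max_length=64))
```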
The following Gemma models are available to use with Vertex AI. To learn more about and test the Gemma models, see their Model Garden model cards.
Model name | Use cases | Model Garden model card |
---|---|---|
Gemma 2 | Best for text generation, summarization, and extraction. | Go to the Gemma 2 model card |
Gemma | Best for text generation, summarization, and extraction. | Go to the Gemma model card |
CodeGemma | Best for code generation and completion. | Go to the CodeGemma model card |
PaliGemma | Best for image captioning and visual question answering tasks. | Go to the PaliGemma model card |
The following are some options for where you can use Gemma:
Use Gemma with Vertex AI
Vertex AI offers a managed platform for rapidly building and scaling machine learning projects without needing in-house MLOps expertise. You can use Vertex AI as the downstream application that serves the Gemma models. For example, you might port weights from the Keras implementation of Gemma. Next, you can use Vertex AI to serve that version of Gemma to get predictions. We recommend using Vertex AI if you want end-to-end MLOps capabilities, value-added ML features, and a serverless experience for streamlined development.
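For example, after you deploy Gemma to a Vertex AI endpoint, you can request predictions with the Vertex AI SDK for Python. The following is a minimal sketch; the project, region, and endpoint ID are placeholders, and the instance schema depends on the serving container you deployed.

```python
# Minimal sketch: query a Gemma model already deployed to a Vertex AI endpoint.
from google.cloud import aiplatform

# Placeholder project and region; replace with your own values.
aiplatform.init(project="your-project-id", location="us-central1")

# Attach to an existing endpoint that serves Gemma (placeholder resource name).
endpoint = aiplatform.Endpoint(
    "projects/your-project-id/locations/us-central1/endpoints/ENDPOINT_ID"
)

# The instance fields here are illustrative; match your serving container's schema.
response = endpoint.predict(
    instances=[{"prompt": "Why is the sky blue?", "max_tokens": 128}]
)
print(response.predictions)
```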
To get started with Gemma, see the following notebooks (a minimal PEFT sketch follows the list):
- Fine-tune Gemma using PEFT and then deploy to Vertex AI from Vertex AI
- Fine-tune Gemma using PEFT and then deploy to Vertex AI from Hugging Face
- Fine-tune Gemma with Ray on Vertex AI and then deploy to Vertex AI
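For orientation, the following is a minimal sketch of the LoRA setup that PEFT-based tuning uses, built on the Hugging Face transformers and peft libraries. The model ID, target modules, and hyperparameters are illustrative and are not taken from the notebooks.

```python
# Minimal PEFT/LoRA setup sketch (illustrative values, see note above).
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumes you have accepted the Gemma license on Hugging Face.
model_id = "google/gemma-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Wrap the base model with LoRA adapters; only adapter weights are trained.
lora_config = LoraConfig(
    r=8,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Shows how few parameters LoRA actually trains compared to the full model.
model.print_trainable_parameters()
```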
Use Gemma in other Google Cloud products
You can use Gemma with other Google Cloud products, such as Google Kubernetes Engine and Dataflow.
Use Gemma with GKE
Google Kubernetes Engine (GKE) is the Google Cloud solution for managed Kubernetes that provides scalability, security, resilience, and cost effectiveness. We recommend this option if you have existing Kubernetes investments, your organization has in-house MLOps expertise, or if you need granular control over complex AI/ML workloads with unique security, data pipeline, and resource management requirements. To learn more, see the following tutorials in the GKE documentation:
- Serve Gemma with vLLM
- Serve Gemma with TGI
- Serve Gemma with Triton and TensorRT-LLM
- Serve Gemma with JetStream
- Serve Gemma with Saxml
Use Gemma with Dataflow
You can use Gemma models with Dataflow, for example for sentiment analysis, by running inference pipelines that embed the Gemma models. To learn more, see Run inference pipelines with Gemma open models.
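For example, the following is a minimal sketch of an Apache Beam pipeline that sends prompts to a Gemma model deployed on a Vertex AI endpoint through the RunInference transform. The endpoint ID, project, and region are placeholders, and the instance format depends on your serving container.

```python
# Minimal Beam inference pipeline sketch (placeholder endpoint details).
import apache_beam as beam
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.vertex_ai_inference import VertexAIModelHandlerJSON

# Model handler that forwards batches to a Vertex AI endpoint.
model_handler = VertexAIModelHandlerJSON(
    endpoint_id="ENDPOINT_ID",
    project="your-project-id",
    location="us-central1",
)

with beam.Pipeline() as pipeline:
    _ = (
        pipeline
        | "Prompts" >> beam.Create(["This product is great!", "Not what I expected."])
        | "Gemma" >> RunInference(model_handler)
        | "Print" >> beam.Map(print)
    )
```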
Use Gemma with Colab
You can use Gemma with Colaboratory to create your Gemma solution. In Colab, you can use Gemma with framework options such as Keras, PyTorch, and JAX; a minimal PyTorch sketch follows the list. To learn more, see:
- Get started with Gemma using Keras.
- Get started with Gemma using PyTorch.
- Basic tuning with Gemma using Keras.
- Distributed tuning with Gemma using Keras.
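For example, the following is a minimal PyTorch sketch using Hugging Face transformers in a Colab GPU runtime. It assumes you have accepted the Gemma license for the google/gemma-2b-it checkpoint.

```python
# Minimal PyTorch inference sketch for Colab (assumptions noted above).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Tokenize a prompt, generate a short completion, and decode it.
inputs = tokenizer("Write a haiku about mountains.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```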
Gemma model sizes and capabilities
Gemma models are available in several sizes so you can build generative AI solutions based on your available computing resources, the capabilities you need, and where you want to run them. Each model is available in tuned and untuned versions:
- Pretrained: This version of the model wasn't trained on any specific tasks or instructions beyond the Gemma core training data set. We don't recommend using this version without performing some tuning.
- Instruction-tuned: This version of the model was trained with human language interactions so that it can participate in a conversation, similar to a basic chat bot.
- Mix fine-tuned: This version of the model is fine-tuned on a mixture of academic datasets and accepts natural language prompts.
Lower parameter sizes mean lower resource requirements and more deployment flexibility.
Model name | Parameter size | Input | Output | Tuned versions | Intended platforms |
---|---|---|---|---|---|
**Gemma 2** | | | | | |
Gemma 27B | 27 billion | Text | Text | Pretrained, instruction-tuned | Large servers or server clusters |
Gemma 9B | 9 billion | Text | Text | Pretrained, instruction-tuned | Higher-end desktop computers and servers |
Gemma 2B | 2 billion | Text | Text | Pretrained, instruction-tuned | Mobile devices and laptops |
**Gemma** | | | | | |
Gemma 7B | 7 billion | Text | Text | Pretrained, instruction-tuned | Desktop computers and small servers |
Gemma 2B | 2.2 billion | Text | Text | Pretrained, instruction-tuned | Mobile devices and laptops |
**CodeGemma** | | | | | |
CodeGemma 7B | 7 billion | Text | Text | Pretrained, instruction-tuned | Desktop computers and small servers |
CodeGemma 2B | 2 billion | Text | Text | Pretrained | Desktop computers and small servers |
**PaliGemma** | | | | | |
PaliGemma 3B | 3 billion | Text, image | Text | Pretrained, mix fine-tuned | Desktop computers and small servers |
Gemma has been tested using Google's purpose-built v5e TPU hardware and NVIDIA's L4 (G2 Standard), A100 (A2 Standard), and H100 (A3 High) GPU hardware.
What's next
- See Gemma documentation.