Starting April 29, 2025, Gemini 1.5 Pro and Gemini 1.5 Flash models are not available in projects that have no prior usage of these models, including new projects. For details, see Model versions and lifecycle.

About supervised fine-tuning for Gemini models

Supervised fine-tuning helps you adapt a Gemini model to your specific needs. By providing a labeled dataset of examples, you can adjust the model's weights to optimize its performance for a particular task. This method is a good option when you have a well-defined task. It's particularly effective for domain-specific applications where the language or content differs significantly from the data the base model was trained on. You can tune models with text, image, audio, and document data.

This page describes the following topics:

Use cases for supervised fine-tuning: Discover when to use supervised fine-tuning to improve model performance.
Supported models: Learn which Gemini models are available for supervised fine-tuning.
Limitations: Review the specifications and constraints for tuning.
Known issues: Understand current issues with supervised fine-tuning.
Configure a tuning job region: Learn how to set the region for your tuning job.
Quota: Review quota information.
Pricing: Review pricing information.

To learn more about the benefits of tuning, see When to use supervised fine-tuning for Gemini and Hundreds of organizations are fine-tuning Gemini models. Here's their favorite use cases.

Use cases for supervised fine-tuning

Foundation models are a good choice when the expected output or task can be clearly and concisely defined in a prompt, and the prompt consistently produces the expected output. If you want a model to learn something niche or specific that deviates from general patterns, consider tuning that model. For example, you can use model tuning to teach the model the following:

Specific structures or formats for generating output.
Specific behaviors, such as when to provide a terse or verbose output.
Specific customized outputs for specific types of inputs.

The following examples are use cases that are difficult to capture with only prompt instructions:

Classification: The expected response is a specific word or phrase. Tuning the model can help prevent it from generating verbose responses.

Prompt: Classify the following text into one of the following classes: [business, entertainment]. Text: Diversify your investment portfolio

Response: business
Summarization: The summary follows a specific format. For example, you might need to remove personally identifiable information (PII) in a chat summary. This formatting of replacing names with #Person1 and #Person2 is difficult to describe in a prompt, and the foundation model might not naturally produce such a response.

Prompt: Summarize: Jessica: That sounds great! See you in Times Square! Alexander: See you at 10!

Response: #Person1 and #Person2 agree to meet at Times Square at 10:00 AM.

Extractive question answering: The question is about a context and the answer is a substring of the context.

Prompt: Context: There is evidence that there have been significant changes in Amazon rainforest vegetation over the last 21,000 years through the Last Glacial Maximum (LGM) and subsequent deglaciation. Question: What does LGM stand for?

Response: Last Glacial Maximum

Chat: You need to customize model response to follow a persona, role, or character.

Prompt: User: What's the weather like today?

Response: Assistant: As the virtual shopkeeper of Example Organization, I can only help you with the purchases and shipping.

You can also tune a model in the following situations:

Prompts don't consistently produce the expected results.
The task is too complicated to define in a prompt (for example, behavior cloning for a behavior that's hard to articulate).
You have complex intuitions about a task that are difficult to formalize in a prompt.
You want to reduce context length by removing few-shot examples from prompts.

Supported models

The following Gemini models support supervised fine-tuning:

For models that support thinking, set the thinking budget to off or its lowest value. This can improve performance and reduce costs for tuned tasks. During supervised fine-tuning, the model learns from the training data and omits the thinking process. Therefore, the resulting tuned model can perform tuned tasks effectively without a thinking budget.

Limitations

Gemini 2.5 Flash
Gemini 2.5 Flash-Lite

Specification	Value
Maximum input and output training tokens	131,072
Maximum input and output serving tokens	Same as base Gemini model
Maximum validation dataset size	5000 examples
Maximum training dataset file size	1GB for JSONL
Maximum training dataset size	1M text-only examples or 300K multimodal examples
Adapter size	Supported values are 1, 2, 4, 8, and 16

Gemini 2.5 Pro

Specification	Value
Maximum input and output training tokens	131,072
Maximum input and output serving tokens	Same as base Gemini model
Maximum validation dataset size	5000 examples
Maximum training dataset file size	1GB for JSONL
Maximum training dataset size	1M text-only examples or 300K multimodal examples
Adapter size	Supported values are 1, 2, 4, and 8

Gemini 2.0 Flash
Gemini 2.0 Flash-Lite

Specification	Value
Maximum input and output training tokens	131,072
Maximum input and output serving tokens	Same as base Gemini model
Maximum validation dataset size	5000 examples
Maximum training dataset file size	1GB for JSONL
Maximum training dataset size	1M text-only examples or 300K multimodal examples
Adapter size	Supported values are 1, 2, 4, and 8

Known issues

Applying controlled generation when submitting inference requests to tuned Gemini models can result in decreased model quality. During tuning, controlled generation isn't applied, so the tuned model isn't able to handle it well at inference time. Because supervised fine-tuning customizes the model to generate structured output, you don't need to apply controlled generation when making inference requests on tuned models.

Configure a tuning job region

When you run a tuning job, your data, including the transformed dataset and the final tuned model, is stored in the region you specify. To use available hardware accelerators, computation might be offloaded to other regions within the US or EU multi-regions. This process is transparent and doesn't change where your data is stored.

You can specify the region for a tuning job in the following ways:

Vertex AI SDK: Specify the region when you initialize the client.

import vertexai
vertexai.init(project='myproject', location='us-central1')

REST API: If you create a supervised fine-tuning job by sending a POST request to the tuningJobs.create method, use the URL to specify the region. Replace both instances of TUNING_JOB_REGION with the region where the job runs.
```
 https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs
```
Google Cloud console: In the model creation workflow, select the region from the Region drop-down list on the Model details page.

Quota

Quotas limit the number of concurrent tuning jobs that you can run. Each project has a default quota to run at least one tuning job. This is a global quota, shared across all available regions and supported models. To run more jobs concurrently, you need to request additional quota for Global concurrent tuning jobs.

Pricing

For pricing details, see Vertex AI pricing.

Tuning cost: The cost of tuning is calculated by multiplying the number of tokens in your training dataset by the number of epochs.
Inference cost: After tuning, standard inference pricing applies to prediction requests made to your tuned model. Inference pricing is the same for each stable version of Gemini. For more information, see Available Gemini stable model versions.

What's next

Prepare a supervised fine-tuning dataset.
Learn about deploying a tuned Gemini model.