This guide shows you how to tune a Gemini model by using supervised fine-tuning. The following diagram summarizes the overall workflow.

Before you can tune a model, you must prepare a supervised fine-tuning dataset. For instructions, see the documentation for your data modality.

The following Gemini models support supervised tuning:

You can create a supervised fine-tuning job by using the Google Cloud console, the Google Gen AI SDK, the Vertex AI SDK for Python, the REST API, or Colab Enterprise. The following table helps you decide which option is best for your use case.

To tune a text model by using the Google Cloud console, do the following:

1. In the Vertex AI section of the Google Cloud console, go to the Vertex AI Studio page.
2. Click Create tuned model.
3. Under Model details, configure the model settings.
4. Under Tuning setting, configure the tuning settings.
5. Optional: To disable intermediate checkpoints and use only the latest checkpoint, select the Export last checkpoint only toggle.
6. Click Continue. The Tuning dataset page opens.
7. Select your tuning dataset.
8. Optional: To get validation metrics during training, select the Enable model validation toggle.
9. Click Start Tuning.

Your new model appears in the Gemini Pro tuned models section on the Tune and Distill page. When the tuning job is complete, the Status is Succeeded.

To create a model tuning job, send a POST request by using the tuningJobs.create method.
Before using any of the request data,
make the following replacements:
HTTP method and URL:
Request JSON body:
To send your request, choose one of these options:
curl: Save the request body in a file named request.json, and execute the curl command.

PowerShell: Save the request body in a file named request.json, and execute the Invoke-WebRequest command.

You should receive a JSON response similar to the following.
You can create a model tuning job in Vertex AI by using the
side panel in Colab Enterprise. The side panel
adds the relevant code snippets to your notebook. You can then modify
the code snippets and run them to create your tuning job. To learn more
about using the side panel with your Vertex AI tuning jobs,
see Interact with Vertex AI
to tune a model.
In the Google Cloud console, go to
the Colab Enterprise My notebooks page.
In the Region menu, select the region that contains your notebook. Click the notebook that you want to open. If you haven't created a notebook yet,
create a notebook.
To the right of your notebook, in the side panel, click the Tuning button.
The side panel expands the Tuning tab. Click the Tune a Gemini model button.
Colab Enterprise adds code cells to your notebook for
tuning a Gemini model.
In your notebook, find the code cell that stores parameter values.
You'll use these parameters to interact with Vertex AI.
Update the values for the following parameters. In the next code cell, update the model tuning parameters. Then run the code cells that the side panel added to your notebook.
After the last code cell runs, click the
The side panel shows information about your model tuning job.
After the tuning job has completed, you can go directly from
the Tuning details tab to a page where you can test your model.
Click Test.
The Google Cloud console opens to the Vertex AI
Text chat page, where you can test your model.
For your first tuning job, use the default hyperparameters. They are set to recommended values based on benchmarking results. For a discussion of best practices for supervised fine-tuning, see the blog post Supervised Fine Tuning for Gemini: A best practices guide.

You can view a list of your tuning jobs, get the details of a specific job, or cancel a running job.

To view a list of tuning jobs in your project, use the Google Cloud console, the Google Gen AI SDK, the Vertex AI SDK for Python, or send a GET request by using the tuningJobs.list method.

To view your tuning jobs in the Google Cloud console, go to the Vertex AI Studio page. Your Gemini tuning jobs are listed in the Gemini Pro tuned models table.

To view a list of model tuning jobs with the REST API, send a GET request by using the tuningJobs.list method.
Before using any of the request data,
make the following replacements:
HTTP method and URL:
To send your request, choose one of these options:
Execute the following command:
Execute the following command:
You should receive a JSON response similar to the following.

To get the details of a specific tuning job, use the Google Cloud console, the Google Gen AI SDK, the Vertex AI SDK for Python, or send a GET request by using the tuningJobs.get method.

To view details of a tuned model in the Google Cloud console, go to the Vertex AI Studio page. In the Gemini Pro tuned models table, find your model and click Details. The model details page opens.

To get the details of a model tuning job with the REST API, send a GET request by using the tuningJobs.get method.
Before using any of the request data,
make the following replacements:
HTTP method and URL:
To send your request, choose one of these options:
Execute the following command:
Execute the following command:
You should receive a JSON response similar to the following.

To cancel a running tuning job, use the Google Cloud console, the Vertex AI SDK for Python, or send a POST request by using the tuningJobs.cancel method.

To cancel a tuning job in the Google Cloud console, go to the Vertex AI Studio page. In the Gemini Pro tuned models table, find your model and click Cancel.

To cancel a model tuning job with the REST API, send a POST request by using the tuningJobs.cancel method.
Before using any of the request data,
make the following replacements:
HTTP method and URL:
To send your request, choose one of these options:
Execute the following command:
Execute the following command:
You should receive a JSON response similar to the following.

After your model is tuned, you can interact with its endpoint in the same way as a base Gemini model. You can use the Vertex AI SDK for Python, the Google Gen AI SDK, or send a POST request by using the generateContent method.

For models that support reasoning, such as Gemini 2.5 Flash, set the thinking budget to 0 for tuned tasks to optimize performance and cost. During supervised fine-tuning, the model learns to mimic the ground truth in the tuning dataset and omits the thinking process, so the tuned model can handle the task effectively without a thinking budget.

The following examples show how to prompt a tuned model with the question "Why is the sky blue?".

To test a tuned model in the Google Cloud console, go to the Vertex AI Studio page. In the Gemini Pro tuned models table, find your model and click Test. A new page opens where you can create a conversation with your tuned model.

To test a tuned model with a prompt, send a POST request and specify the TUNED_ENDPOINT_ID.
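As a rough sketch of what that request targets, the endpoint URL for a tuned model follows a fixed pattern. The helper below is illustrative only; its name is not part of any official SDK, and the URL mirrors the REST example later on this page:

```python
def tuned_endpoint_url(project_id: str, region: str, endpoint_id: str) -> str:
    # Mirrors the generateContent URL used in the REST example on this page.
    return (f"https://{region}-aiplatform.googleapis.com/v1/"
            f"projects/{project_id}/locations/{region}/"
            f"endpoints/{endpoint_id}:generateContent")

print(tuned_endpoint_url("my-project", "us-central1", "1234567890"))
```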
Before using any of the request data,
make the following replacements:
TEMPERATURE: If the model returns a response that's too generic, too short, or gives a fallback response, try increasing the temperature. Specify a lower value for less random responses and a higher value for more random responses.

TOP_K: For each token selection step, the top-K tokens with the highest probabilities are sampled. Tokens are then further filtered based on top-P, with the final token selected using temperature sampling. Specify a lower value for less random responses and a higher value for more random responses.

MAX_OUTPUT_TOKENS: Specify a lower value for shorter responses and a higher value for potentially longer responses.
HTTP method and URL:
Request JSON body:
To send your request, choose one of these options:
curl: Save the request body in a file named request.json, and execute the curl command.

PowerShell: Save the request body in a file named request.json, and execute the Invoke-WebRequest command.

You should receive a JSON response similar to the following.

To delete a tuned model, call the models.delete method.
Before using any of the request data,
make the following replacements:
HTTP method and URL:
To send your request, choose one of these options:
Execute the following command:
Execute the following command:
You should receive a successful status code (2xx) and an empty response.

You can configure a model tuning job to collect and report tuning and evaluation metrics, which you can then visualize in Vertex AI Studio. To view the metrics for a tuned model:

1. In the Vertex AI section of the Google Cloud console, go to the Vertex AI Studio page.
2. In the Tune and Distill table, click the name of the tuned model that you want to view.

The metrics appear on the Monitor tab. Visualizations are available after the tuning job starts and are updated in real time.

The model tuning job automatically collects the following tuning metrics for Gemini 2.0 Flash. If you provide a validation dataset when you create the tuning job, the following validation metrics are collected for Gemini 2.0 Flash.
Before you begin
Supported models
Create a tuning job
Google Cloud console: A graphical user interface for creating and managing tuning jobs. Best for getting started, visual exploration, or one-off tuning tasks without writing code.

Google Gen AI SDK: A high-level Python SDK focused specifically on generative AI workflows. Ideal for Python developers who want a simplified, generative AI-centric interface.

Vertex AI SDK for Python: The comprehensive Python SDK for all Vertex AI services. Recommended for integrating model tuning into larger MLOps pipelines and automation scripts.

REST API: A language-agnostic interface for making direct HTTP requests to the Vertex AI API. Use it for custom integrations, non-Python environments, or when you need fine-grained control over requests.

Colab Enterprise: An interactive notebook environment with a side panel that generates code snippets for tuning. Excellent for experimentation, iterative development, and documenting your tuning process in a notebook.
Console

gemini-2.5-flash

Google Gen AI SDK

Vertex AI SDK for Python

REST

tuningJobs.create method. Some parameters are not supported by all models. Include only the applicable parameters for the model that you're tuning.

EXPORT_LAST_CHECKPOINT_ONLY: Set to true to use only the latest checkpoint.

KMS_KEY_NAME: For example, projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. For more information, see Customer-managed encryption keys (CMEK).

SERVICE_ACCOUNT: Grant the roles/aiplatform.tuningServiceAgent role to the service account. Also grant the Tuning Service Agent the roles/iam.serviceAccountTokenCreator role to the customer-managed service account.

POST https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs
{
"baseModel": "BASE_MODEL",
"supervisedTuningSpec" : {
"trainingDatasetUri": "TRAINING_DATASET_URI",
"validationDatasetUri": "VALIDATION_DATASET_URI",
"hyperParameters": {
"epochCount": "EPOCH_COUNT",
"adapterSize": "ADAPTER_SIZE",
"learningRateMultiplier": "LEARNING_RATE_MULTIPLIER"
},
"export_last_checkpoint_only": EXPORT_LAST_CHECKPOINT_ONLY,
},
"tunedModelDisplayName": "TUNED_MODEL_DISPLAYNAME",
"encryptionSpec": {
"kmsKeyName": "KMS_KEY_NAME"
},
"serviceAccount": "SERVICE_ACCOUNT"
}
curl
request.json
,
and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs"PowerShell
request.json
,
and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs" | Select-Object -Expand ContentExample curl command
PROJECT_ID=myproject
LOCATION=global
curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
"https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/tuningJobs" \
-d \
$'{
"baseModel": "gemini-2.5-flash",
"supervisedTuningSpec" : {
"training_dataset_uri": "gs://cloud-samples-data/ai-platform/generative_ai/gemini/text/sft_train_data.jsonl",
"validation_dataset_uri": "gs://cloud-samples-data/ai-platform/generative_ai/gemini/text/sft_validation_data.jsonl"
},
"tunedModelDisplayName": "tuned_gemini"
}'
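If you script this call, you can assemble the request URL and JSON body programmatically. The following sketch mirrors the REST example above; build_tuning_request is a hypothetical helper, not part of any Google SDK:

```python
# Illustrative helper for assembling a tuningJobs.create request.
# The URL pattern and field names mirror the REST example above.

def build_tuning_request(project_id, region, base_model, train_uri,
                         validation_uri=None, display_name="tuned_gemini"):
    url = (f"https://{region}-aiplatform.googleapis.com/v1/"
           f"projects/{project_id}/locations/{region}/tuningJobs")
    body = {
        "baseModel": base_model,
        "supervisedTuningSpec": {"trainingDatasetUri": train_uri},
        "tunedModelDisplayName": display_name,
    }
    if validation_uri:
        body["supervisedTuningSpec"]["validationDatasetUri"] = validation_uri
    return url, body

url, body = build_tuning_request(
    "myproject", "global", "gemini-2.5-flash",
    "gs://cloud-samples-data/ai-platform/generative_ai/gemini/text/sft_train_data.jsonl",
)
print(url)
```

You would then POST the body to the URL with an OAuth bearer token, as in the curl example above.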
Colab Enterprise

PROJECT_ID: The ID of the project that your notebook is in.

REGION: The region that your notebook is in.

TUNED_MODEL_DISPLAY_NAME: The name of your tuned model.

source_model: The Gemini model that you want to use, for example, gemini-2.0-flash-001.

train_dataset: The URL of your training dataset.

validation_dataset: The URL of your validation dataset.
Tuning hyperparameters
View and manage tuning jobs
View a list of tuning jobs
tuningJobs method.

Console
Google Gen AI SDK
Vertex AI SDK for Python
REST
tuningJobs.list method.
GET https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs
curl
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs"PowerShell
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs" | Select-Object -Expand ContentGet details of a tuning job
tuningJobs method.

Console
Google Gen AI SDK
Vertex AI SDK for Python
REST
tuningJobs.get method and specify the TUNING_JOB_ID.
GET https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs/TUNING_JOB_ID
curl
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs/TUNING_JOB_ID"PowerShell
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs/TUNING_JOB_ID" | Select-Object -Expand ContentCancel a tuning job
tuningJobs method.

Console
Vertex AI SDK for Python
REST
tuningJobs.cancel method and specify the TUNING_JOB_ID.
POST https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs/TUNING_JOB_ID:cancel
curl
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d "" \
"https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs/TUNING_JOB_ID:cancel"PowerShell
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-Uri "https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs/TUNING_JOB_ID:cancel" | Select-Object -Expand ContentEvaluate the tuned model
generateContent method.

Console
Google Gen AI SDK
Vertex AI SDK for Python
from vertexai.generative_models import GenerativeModel
from vertexai.tuning import sft

# Load the completed tuning job and query the tuned model's endpoint.
sft_tuning_job = sft.SupervisedTuningJob("projects/<PROJECT_ID>/locations/<TUNING_JOB_REGION>/tuningJobs/<TUNING_JOB_ID>")
tuned_model = GenerativeModel(sft_tuning_job.tuned_model_endpoint_name)
print(tuned_model.generate_content("Why is the sky blue?"))
REST
TUNED_ENDPOINT_ID: The endpoint ID of your tuned model.

TEMPERATURE: The temperature is used for sampling during response generation, which occurs when topP and topK are applied. Temperature controls the degree of randomness in token selection. Lower temperatures are good for prompts that require a less open-ended or creative response, while higher temperatures can lead to more diverse or creative results. A temperature of 0 means that the highest probability tokens are always selected. In this case, responses for a given prompt are mostly deterministic, but a small amount of variation is still possible.

TOP_P: Tokens are selected from the most probable to the least probable until the sum of their probabilities equals the top-P value. For example, if tokens A, B, and C have probabilities of 0.3, 0.2, and 0.1 and the top-P value is 0.5, then the model selects either A or B as the next token by using temperature and excludes C as a candidate.

TOP_K: Top-K changes how the model selects tokens for output. A top-K of 1 means the next selected token is the most probable among all tokens in the model's vocabulary (also called greedy decoding), while a top-K of 3 means that the next token is selected from among the three most probable tokens by using temperature.
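The decoding behavior described above, top-K filtering, then top-P filtering, then temperature sampling, can be illustrated with a small self-contained sketch. This is a toy reimplementation for intuition, not the service's actual decoder:

```python
import random

def sample_next_token(probs, top_k, top_p, temperature, rng=random):
    # 1) Keep the top-K most probable tokens.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # 2) Keep the smallest prefix whose cumulative probability reaches top-P.
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # 3) Sample among the survivors using temperature.
    if temperature == 0:
        return kept[0][0]  # greedy: always the most probable survivor
    weights = [p ** (1.0 / temperature) for _, p in kept]
    return rng.choices([t for t, _ in kept], weights=weights)[0]

# The example from the text: A=0.3, B=0.2, C=0.1 with a top-P of 0.5
# leaves only A and B as candidates; a temperature of 0 then picks A.
probs = {"A": 0.3, "B": 0.2, "C": 0.1}
print(sample_next_token(probs, top_k=3, top_p=0.5, temperature=0))  # → A
```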
POST https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/endpoints/ENDPOINT_ID:generateContent
{
"contents": [
{
"role": "USER",
"parts": {
"text" : "Why is sky blue?"
}
}
],
"generation_config": {
"temperature":TEMPERATURE,
"topP": TOP_P,
"topK": TOP_K,
"maxOutputTokens": MAX_OUTPUT_TOKENS
}
}
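For reasoning models, the thinking budget mentioned earlier is set under the generation config. The following sketch assembles that config as a plain dictionary; the thinkingConfig and thinkingBudget field names reflect my reading of the current REST API and should be verified against the API reference:

```python
def generation_config(temperature, top_p, top_k, max_output_tokens,
                      thinking_budget=None):
    # Assemble the generation-config portion of a generateContent request.
    # Setting a thinking budget of 0 disables thinking for the tuned task.
    config = {
        "temperature": temperature,
        "topP": top_p,
        "topK": top_k,
        "maxOutputTokens": max_output_tokens,
    }
    if thinking_budget is not None:
        config["thinkingConfig"] = {"thinkingBudget": thinking_budget}
    return config

print(generation_config(0.2, 0.95, 40, 1024, thinking_budget=0))
```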
curl
request.json
,
and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/endpoints/ENDPOINT_ID:generateContent"PowerShell
request.json
,
and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/endpoints/ENDPOINT_ID:generateContent" | Select-Object -Expand ContentDelete a tuned model
Vertex AI SDK for Python
from google.cloud import aiplatform
aiplatform.init(project=PROJECT_ID, location=LOCATION)
# To find out which models are available in Model Registry
models = aiplatform.Model.list()
model = aiplatform.Model(MODEL_ID)
model.delete()
REST
models.delete method.
DELETE https://REGION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/models/MODEL_ID
curl
curl -X DELETE \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://REGION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/models/MODEL_ID"PowerShell
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method DELETE `
-Headers $headers `
-Uri "https://REGION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/models/MODEL_ID" | Select-Object -Expand ContentTuning and validation metrics
Model tuning metrics
Gemini 2.0 Flash:

/train_total_loss: Loss for the tuning dataset at a training step.

/train_fraction_of_correct_next_step_preds: The token accuracy at a training step. A single prediction consists of a sequence of tokens. This metric measures the accuracy of the predicted tokens when compared to the ground truth in the tuning dataset.

/train_num_predictions: Number of predicted tokens at a training step.

Model validation metrics
Gemini 2.0 Flash. If you don't specify a validation dataset, only the tuning metrics are available.
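The fraction-of-correct-next-step-predictions metrics on this page can be understood as simple token-level accuracy against the ground truth. The following toy sketch illustrates the idea; it is not the service's implementation:

```python
def token_accuracy(predicted_tokens, ground_truth_tokens):
    # Fraction of predicted tokens that match the ground truth,
    # compared position by position.
    pairs = list(zip(predicted_tokens, ground_truth_tokens))
    return sum(p == g for p, g in pairs) / len(pairs)

# Three of four predicted tokens match the ground truth.
print(token_accuracy(["the", "sky", "is", "red"],
                     ["the", "sky", "is", "blue"]))  # → 0.75
```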
/eval_total_loss: Loss for the validation dataset at a validation step.

/eval_fraction_of_correct_next_step_preds: The token accuracy at a validation step. A single prediction consists of a sequence of tokens. This metric measures the accuracy of the predicted tokens when compared to the ground truth in the validation dataset.

/eval_num_predictions: Number of predicted tokens at a validation step.

What's next
Tune Gemini models by using supervised fine-tuning
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-08-21 UTC.