This document describes how to tune a Gemini model by using supervised fine-tuning.
Before you begin
Before you begin, you must prepare a supervised fine-tuning dataset. Depending on your use case, there are different requirements.
- Prepare a text dataset for tuning: Text tuning
- Prepare an image dataset for tuning: Image tuning
- Prepare a document dataset for tuning: Document tuning
- Prepare an audio dataset for tuning: Audio tuning
Supported models
- gemini-1.5-pro-002 (In GA)
- gemini-1.5-flash-002 (In GA)
- gemini-1.0-pro-002 (In preview; supports text tuning only)
Create a tuning job
You can create a supervised fine-tuning job by using the REST API, the Vertex AI SDK for Python, the Google Cloud console, or Colab Enterprise.
REST
To create a model tuning job, send a POST request by using the
tuningJobs.create
method. Some of the parameters are not supported by all of the models. Ensure
that you include only the applicable parameters for the model that you're
tuning.
Before using any of the request data, make the following replacements:
- PROJECT_ID: Your project ID.
- TUNING_JOB_REGION: The region where the tuning job runs. This is also the default region for where the tuned model is uploaded.
- BASE_MODEL: Name of the foundation model to tune. Supported values: gemini-1.5-pro-002, gemini-1.5-flash-002, and gemini-1.0-pro-002.
- TRAINING_DATASET_URI: Cloud Storage URI of your training dataset. The dataset must be formatted as a JSONL file (a minimal example line is shown after this list). For best results, provide at least 100 to 500 examples. For more information, see About supervised tuning datasets.
- VALIDATION_DATASET_URI: Optional. The Cloud Storage URI of your validation dataset file.
- EPOCH_COUNT: Optional. The number of complete passes the model makes over the entire training dataset during training. Leave it unset to use the pre-populated recommended value.
- ADAPTER_SIZE: Optional. The adapter size to use for the tuning job. The adapter size influences the number of trainable parameters for the tuning job. A larger adapter size implies that the model can learn more complex tasks, but it requires a larger training dataset and longer training times.
- LEARNING_RATE_MULTIPLIER: Optional. A multiplier to apply to the recommended learning rate. Leave it unset to use the recommended value.
- TUNED_MODEL_DISPLAYNAME: Optional. A display name for the tuned model. If not set, a random name is generated.
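Each line of the JSONL training file is one example in the Gemini chat format. The following is a minimal, hypothetical text-tuning example; see About supervised tuning datasets for the authoritative schema:
{"contents": [{"role": "user", "parts": [{"text": "Why is sky blue?"}]}, {"role": "model", "parts": [{"text": "Sunlight scatters off molecules in the atmosphere, and blue light scatters the most."}]}]}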
HTTP method and URL:
POST https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs
Request JSON body:
{ "baseModel": "BASE_MODEL", "supervisedTuningSpec" : { "trainingDatasetUri": "TRAINING_DATASET_URI", "validationDatasetUri": "VALIDATION_DATASET_URI", "hyperParameters": { "epochCount": EPOCH_COUNT, "adapterSize": "ADAPTER_SIZE", "learningRateMultiplier": LEARNING_RATE_MULTIPLIER }, }, "tunedModelDisplayName": "TUNED_MODEL_DISPLAYNAME" }
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs" | Select-Object -Expand Content
You should receive a JSON response similar to the following.
Example curl command
PROJECT_ID=myproject
LOCATION=us-central1
curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
"https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/tuningJobs" \
-d \
$'{
"baseModel": "gemini-1.5-pro-002",
"supervisedTuningSpec" : {
"training_dataset_uri": "gs://cloud-samples-data/ai-platform/generative_ai/sft_train_data.jsonl",
"validation_dataset_uri": "gs://cloud-samples-data/ai-platform/generative_ai/sft_validation_data.jsonl"
},
"tunedModelDisplayName": "tuned_gemini_pro"
}'
Python
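With the Vertex AI SDK for Python, you can start a supervised tuning job through the sft module, the same module used in the test example later in this document. The following is a minimal sketch, assuming a recent google-cloud-aiplatform release where the module is available at vertexai.tuning (older releases expose it under vertexai.preview.tuning); the project, region, dataset URIs, and hyperparameter values are placeholders to replace with your own:
import time

import vertexai
from vertexai.tuning import sft

vertexai.init(project="PROJECT_ID", location="TUNING_JOB_REGION")

sft_tuning_job = sft.train(
    source_model="gemini-1.5-pro-002",
    train_dataset="gs://cloud-samples-data/ai-platform/generative_ai/sft_train_data.jsonl",
    # Optional parameters; omit them to use the recommended defaults.
    validation_dataset="gs://cloud-samples-data/ai-platform/generative_ai/sft_validation_data.jsonl",
    epochs=4,
    learning_rate_multiplier=1.0,
    tuned_model_display_name="tuned_gemini_pro",
)

# Poll until the tuning job finishes, then print the tuned model resources.
while not sft_tuning_job.has_ended:
    time.sleep(60)
    sft_tuning_job.refresh()

print(sft_tuning_job.tuned_model_name)
print(sft_tuning_job.tuned_model_endpoint_name)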
Console
To tune a text model with supervised fine-tuning by using the Google Cloud console, perform the following steps:
In the Vertex AI section of the Google Cloud console, go to the Vertex AI Studio page.
Click Create tuned model.
Under Tuning method, select the radio button for Supervised tuning.
Under Model details, configure the following:
- In the Tuned model name field, enter a name for your new tuned model, up to 128 characters.
- In the Base model field, select gemini-1.5-pro-002.
- In the Region drop-down field, select the region where the pipeline tuning job runs and where the tuned model is deployed.
Optional: Expand the Advanced Options drop-down arrow and configure the following:
- In the Number of epochs field, enter the number of complete passes the model makes over the training dataset during tuning.
- In the Adapter Size field, enter the adapter size to use for model tuning.
- In the Learning rate multiplier field, enter the multiplier to apply to the recommended learning rate. The default value is 1.
Click Continue.
The Tuning dataset page opens.
To upload a dataset file, select one of the following:
- If you haven't uploaded a dataset yet, select the radio button for Upload file to Cloud Storage.
- In the Select JSONL file field, click Browse and select your dataset file.
- In the Dataset location field, click Browse and select the Cloud Storage bucket where you want to store your dataset file.
- If your dataset file is already in a Cloud Storage bucket, select the radio button for Existing file on Cloud Storage.
- In the Cloud Storage file path field, click Browse and select the Cloud Storage bucket where your dataset file is located.
(Optional) To get validation metrics during training, click the Enable model validation toggle.
- In the Validation dataset file field, enter the Cloud Storage path of your validation dataset.
Click Start Tuning.
Your new model appears under the Gemini Pro tuned models section on the Tune and Distill page. When the model is finished tuning, the Status says Succeeded.
Colab Enterprise
You can create a model tuning job in Vertex AI by using the side panel in Colab Enterprise. The side panel adds the relevant code snippets to your notebook. Then, you modify the code snippets and run them to create your tuning job. To learn more about using the side panel with your Vertex AI tuning jobs, see Interact with Vertex AI to tune a model.
- In the Google Cloud console, go to the Colab Enterprise Notebooks page.
- In the Region menu, select the region that contains your notebook.
- On the My notebooks tab, click the notebook that you want to open. If you haven't created a notebook yet, create a notebook.
- To the right of your notebook, in the side panel, click the Tuning button. The side panel expands the Tuning tab.
- Click the Tune a Gemini model button. Colab Enterprise adds code cells to your notebook for tuning a Gemini model.
- In your notebook, find the code cell that stores parameter values. You'll use these parameters to interact with Vertex AI.
- Update the values for the following parameters:
  - PROJECT_ID: The ID of the project that your notebook is in.
  - REGION: The region that your notebook is in.
  - TUNED_MODEL_DISPLAY_NAME: The name of your tuned model.
- In the next code cell, update the model tuning parameters:
  - source_model: The Gemini model that you want to use, for example, gemini-1.0-pro-002.
  - train_dataset: The URL of your training dataset.
  - validation_dataset: The URL of your validation dataset.
  - Adjust the remaining parameters as needed.
- Run the code cells that the side panel added to your notebook.
- After the last code cell runs, click the View tuning job button that appears. The side panel shows information about your model tuning job:
  - The Monitor tab shows tuning metrics when the metrics are ready.
  - The Dataset tab shows a summary and metrics about your dataset after the dataset has been processed.
  - The Details tab shows information about your tuning job, such as the tuning method and the base model (source model) that you used.
- After the tuning job has completed, you can go directly from the Tuning details tab to a page where you can test your model. Click Test. The Google Cloud console opens to the Vertex AI Text chat page, where you can test your model.
Tuning hyperparameters
We recommend that you submit your first tuning job without changing the hyperparameters. The default values are based on our benchmarking results and are recommended to yield the best model output quality.
- Epochs: The number of complete passes the model makes over the entire training dataset during training. Vertex AI automatically adjusts the default value to your training dataset size. This value is based on benchmarking results to optimize model output quality.
- Adapter size: The adapter size to use for the tuning job. The adapter size influences the number of trainable parameters for the tuning job. A larger adapter size implies that the model can learn more complex tasks, but it requires a larger training dataset and longer training times.
- Learning Rate Multiplier: A multiplier to apply to the recommended learning rate. You can increase the value to converge faster, or decrease the value to avoid overfitting.
View a list of tuning jobs
You can view a list of tuning jobs in your current project by using the Google Cloud console,
the Vertex AI SDK for Python, or by sending a GET request by using the tuningJobs
method.
REST
To view a list of model tuning jobs, send a GET request by using the
tuningJobs.list
method.
Before using any of the request data, make the following replacements:
- PROJECT_ID: Your project ID.
- TUNING_JOB_REGION: The region where the tuning job runs. This is also the default region for where the tuned model is uploaded.
HTTP method and URL:
GET https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs
To send your request, choose one of these options:
curl
Execute the following command:
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs"
PowerShell
Execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs" | Select-Object -Expand Content
You should receive a JSON response similar to the following.
Python
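With the Vertex AI SDK for Python, a minimal sketch of listing jobs looks like the following, assuming the sft module described earlier; PROJECT_ID and TUNING_JOB_REGION are placeholders:
import vertexai
from vertexai.tuning import sft

vertexai.init(project="PROJECT_ID", location="TUNING_JOB_REGION")

# List the supervised tuning jobs in this project and region.
for tuning_job in sft.SupervisedTuningJob.list():
    print(tuning_job.resource_name)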
Console
To view your tuning jobs in the Google Cloud console, go to the Vertex AI Studio page.
Your Gemini tuning jobs are listed in the table under the Gemini Pro tuned models section.
Get details of a tuning job
You can get the details of a tuning job in your current project
by using the Google Cloud console, the Vertex AI SDK for Python, or by sending a GET
request by using the tuningJobs
method.
REST
To get the details of a model tuning job, send a GET request by using the
tuningJobs.get
method and specify the TUNING_JOB_ID.
Before using any of the request data, make the following replacements:
- PROJECT_ID: Your project ID.
- TUNING_JOB_REGION: The region where the tuning job runs. This is also the default region for where the tuned model is uploaded.
- TUNING_JOB_ID: The ID of the tuning job.
HTTP method and URL:
GET https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs/TUNING_JOB_ID
To send your request, choose one of these options:
curl
Execute the following command:
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs/TUNING_JOB_ID"
PowerShell
Execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs/TUNING_JOB_ID" | Select-Object -Expand Content
You should receive a JSON response similar to the following.
Python
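With the Vertex AI SDK for Python, you can load a tuning job by its full resource name, as the test example later in this document also does. A minimal sketch with placeholder IDs:
import vertexai
from vertexai.tuning import sft

vertexai.init(project="PROJECT_ID", location="TUNING_JOB_REGION")

# Load an existing tuning job by its full resource name and print its details.
sft_tuning_job = sft.SupervisedTuningJob(
    "projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs/TUNING_JOB_ID"
)
print(sft_tuning_job.resource_name)
print(sft_tuning_job.tuned_model_endpoint_name)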
Console
To view details of a tuned model in the Google Cloud console, go to the Vertex AI Studio page.
In the Gemini Pro tuned models table, find your model and click Details.
The details of your model are shown.
Cancel a tuning job
You can cancel a tuning job in your current project by using the Google Cloud console,
the Vertex AI SDK for Python, or by sending a POST request using the tuningJobs
method.
REST
To cancel a model tuning job, send a POST request by using the
tuningJobs.cancel
method and specify the TUNING_JOB_ID.
Before using any of the request data, make the following replacements:
- PROJECT_ID: Your project ID.
- TUNING_JOB_REGION: The region where the tuning job runs. This is also the default region for where the tuned model is uploaded.
- TUNING_JOB_ID: The ID of the tuning job.
HTTP method and URL:
POST https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs/TUNING_JOB_ID:cancel
To send your request, choose one of these options:
curl
Execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d "" \
"https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs/TUNING_JOB_ID:cancel"
PowerShell
Execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-Uri "https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs/TUNING_JOB_ID:cancel" | Select-Object -Expand Content
You should receive a JSON response similar to the following.
Python
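The high-level sft module used elsewhere in this document covers creating and loading jobs; for cancellation, one option is the lower-level GenAiTuningService client that ships in the same google-cloud-aiplatform package. The following sketch assumes that client is exposed as aiplatform_v1.GenAiTuningServiceClient; verify the name against your installed SDK version, and replace the placeholder project, region, and job ID:
from google.cloud import aiplatform_v1

# Assumed regional client for the GenAiTuningService API (placeholder values).
client = aiplatform_v1.GenAiTuningServiceClient(
    client_options={"api_endpoint": "TUNING_JOB_REGION-aiplatform.googleapis.com"}
)
client.cancel_tuning_job(
    name="projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs/TUNING_JOB_ID"
)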
Console
To cancel a tuning job in the Google Cloud console, go to the Vertex AI Studio page.
In the Gemini Pro tuned models table, click Manage run.
Click Cancel.
Test the tuned model with a prompt
You can test a tuned model in your current project by using the
Vertex AI SDK for Python, or by sending a POST request to the tuned
model's endpoint.
The following example prompts a model with the question "Why is sky blue?".
REST
To test a tuned model with a prompt, send a POST request and
specify the ENDPOINT_ID of the tuned model.
Before using any of the request data, make the following replacements:
- PROJECT_ID: Your project ID.
- TUNING_JOB_REGION: The region where the tuning job runs. This is also the default region for where the tuned model is uploaded.
- ENDPOINT_ID: The tuned model endpoint ID from the GET API.
- TEMPERATURE: The temperature is used for sampling during response generation, which occurs when topP and topK are applied. Temperature controls the degree of randomness in token selection. Lower temperatures are good for prompts that require a less open-ended or creative response, while higher temperatures can lead to more diverse or creative results. A temperature of 0 means that the highest probability tokens are always selected. In this case, responses for a given prompt are mostly deterministic, but a small amount of variation is still possible. If the model returns a response that's too generic, too short, or the model gives a fallback response, try increasing the temperature.
- TOP_P: Top-P changes how the model selects tokens for output. Tokens are selected from the most (see top-K) to least probable until the sum of their probabilities equals the top-P value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1 and the top-P value is 0.5, then the model will select either A or B as the next token by using temperature and excludes C as a candidate. Specify a lower value for less random responses and a higher value for more random responses.
- TOP_K: Top-K changes how the model selects tokens for output. A top-K of 1 means the next selected token is the most probable among all tokens in the model's vocabulary (also called greedy decoding), while a top-K of 3 means that the next token is selected from among the three most probable tokens by using temperature. For each token selection step, the top-K tokens with the highest probabilities are sampled. Then tokens are further filtered based on top-P with the final token selected using temperature sampling. Specify a lower value for less random responses and a higher value for more random responses.
- MAX_OUTPUT_TOKENS:
Maximum number of tokens that can be generated in the response. A token is
approximately four characters. 100 tokens correspond to roughly 60-80 words.
Specify a lower value for shorter responses and a higher value for potentially longer responses.
HTTP method and URL:
POST https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/endpoints/ENDPOINT_ID:generateContent
Request JSON body:
{ "contents": [ { "role": "USER", "parts": { "text" : "Why is sky blue?" } } ], "generation_config": { "temperature":TEMPERATURE, "topP": TOP_P, "topK": TOP_K, "maxOutputTokens": MAX_OUTPUT_TOKENS } }
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/endpoints/ENDPOINT_ID:generateContent"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/endpoints/ENDPOINT_ID:generateContent" | Select-Object -Expand Content
You should receive a JSON response similar to the following.
Python
import vertexai
from vertexai.generative_models import GenerativeModel
from vertexai.tuning import sft

vertexai.init(project="<PROJECT_ID>", location="<TUNING_JOB_REGION>")
sft_tuning_job = sft.SupervisedTuningJob("projects/<PROJECT_ID>/locations/<TUNING_JOB_REGION>/tuningJobs/<TUNING_JOB_ID>")
tuned_model = GenerativeModel(sft_tuning_job.tuned_model_endpoint_name)
print(tuned_model.generate_content("Why is sky blue?"))
Console
To test a tuned model in the Google Cloud console, go to the Vertex AI Studio page.
In the Gemini Pro tuned models table, select Test.
A page opens where you can create a conversation with your tuned model.
Tuning and validation metrics
You can configure a model tuning job to collect and report model tuning and model evaluation metrics, which can then be visualized in Vertex AI Studio.
To view the tuning metrics in the Google Cloud console, go to the Vertex AI Studio page.
In the Tune and Distill table, click the name of the tuned model that you want to view metrics for.
The tuning metrics appear under the Monitor tab.
Model tuning metrics
The model tuning job automatically collects the following tuning metrics for gemini-1.5-pro-002:
- /train_total_loss: Loss for the tuning dataset at a training step.
- /train_fraction_of_correct_next_step_preds: The token accuracy at a training step. A single prediction consists of a sequence of tokens. This metric measures the accuracy of the predicted tokens when compared to the ground truth in the tuning dataset.
- /train_num_predictions: Number of predicted tokens at a training step.
Model validation metrics
You can configure a model tuning job to collect the following validation metrics for gemini-1.5-pro-002:
- /eval_total_loss: Loss for the validation dataset at a validation step.
- /eval_fraction_of_correct_next_step_preds: The token accuracy at a validation step. A single prediction consists of a sequence of tokens. This metric measures the accuracy of the predicted tokens when compared to the ground truth in the validation dataset.
- /eval_num_predictions: Number of predicted tokens at a validation step.
The metrics visualizations are available after the tuning job starts running, and they are updated in real time as tuning progresses. If you don't specify a validation dataset when you create the tuning job, only the visualizations for the tuning metrics are available.
What's next
- To learn how supervised fine-tuning can be used in a solution that builds a generative AI knowledge base, see Jump Start Solution: Generative AI knowledge base.