Supervised tuning uses labeled examples to tune a model. Each example demonstrates the output you want from your text model during inference. Supervised tuning is a good option when the output of your model isn't very complex and is easy to define.
- If the output from your model is difficult to define, consider tuning your text model by using Reinforcement learning from human feedback (RLHF) tuning.
- To learn about tuning a code model with supervised tuning, see Tune code models.
Text model supervised tuning step-by-step guidance
The following guided tutorial helps you learn how to use supervised tuning to tune a text foundation model in the Google Cloud console.
To follow step-by-step guidance for this task directly in the Google Cloud console, click Guide me.
Workflow for supervised model tuning
The supervised model tuning workflow on Vertex AI includes the following steps:
- Prepare your model tuning dataset.
- Upload the model tuning dataset to a Cloud Storage bucket.
- Create a supervised model tuning job.
After model tuning completes, the tuned model is deployed to a Vertex AI endpoint. The name of the endpoint is the same as the name of the tuned model. Tuned models are available to select in Vertex AI Studio when you want to create a new prompt.
Supported models
The following text foundation models support supervised tuning:
- Text generation: text-bison@002 and text-bison-32k@002
- Text chat: chat-bison@002 and chat-bison-32k@002
- Code generation: code-bison@002 and code-bison-32k@002
- Code chat: codechat-bison@002 and codechat-bison-32k@002
- Text embeddings: textembedding-gecko@001 (Preview)
Use cases for supervised tuning on text models
Foundation text models work well when the expected output or task can be clearly and concisely defined in a prompt and the prompt consistently produces the expected output. If you want a model to learn something niche or specific that deviates from general language patterns, then you might want to consider tuning that model. For example, you can use model tuning to teach the model the following:
- Specific structures or formats for generating output.
- Specific behaviors such as when to provide a terse or verbose output.
- Specific customized outputs for specific types of inputs.
The following examples are use cases that are difficult to capture with only prompt instructions:

Classification: The expected response is a specific word or phrase. Tuning the model can help prevent the model from generating verbose responses.

Summarization: The summary follows a specific format. For example, you might need to remove personally identifiable information (PII) in a chat summary. Replacing the names of the speakers with #Person1 and #Person2 is difficult to describe, and the foundation model might not naturally produce such a response.

Extractive question answering: The question is about a context and the answer is a substring of the context. For example, the response "Last Glacial Maximum" is a specific phrase from the context.

Chat: You need to customize model responses to follow a persona, role, or character.
You can also tune a model in the following situations:
- Prompts are not producing the expected results consistently enough.
- The task is too complicated to define in a prompt. For example, you want the model to do behavior cloning for a behavior that's hard to articulate in a prompt.
- You have complex intuitions about a task that are easy to elicit but difficult to formalize in a prompt.
- You want to reduce the context length by removing the few-shot examples.
Prepare a supervised tuning dataset
The dataset used to tune a foundation model needs to include examples that align with the task that you want the model to perform. Structure your training dataset in a text-to-text format. Each record, or row, in the dataset contains the input text (also referred to as the prompt) which is paired with its expected output from the model. Supervised tuning uses the dataset to teach the model to mimic a behavior, or task, you need by giving it hundreds of examples that illustrate that behavior.
Your dataset must include a minimum of 10 examples, but we recommend between 100 and 500 examples for good results. The more examples you provide in your dataset, the better the results.
For sample datasets, see Sample datasets on this page.
Dataset format
Your model tuning dataset must be in JSON Lines (JSONL) format, where each line contains a single tuning example. The dataset format used to tune a text generation model is different from the dataset format for tuning a text chat model. Before tuning your model, you must upload your dataset to a Cloud Storage bucket.
Text
Each example is composed of an input_text field that contains the prompt to the model and an output_text field that contains an example response that the tuned model is expected to produce. Additional fields from structured prompts, such as context, are ignored.

The maximum token length for input_text is 8,192 and the maximum token length for output_text is 1,024. If either field exceeds the maximum token length, the excess tokens are truncated.
The maximum number of examples that a dataset for a text generation model can contain is 10,000.
Dataset example
{"input_text": "question: How many people live in Beijing? context: With over 21 million residents, Beijing is the world's most populous national capital city and is China's second largest city after Shanghai. It is located in Northern China, and is governed as a municipality under the direct administration of the State Council with 16 urban, suburban, and rural districts.[14] Beijing is mostly surrounded by Hebei Province with the exception of neighboring Tianjin to the southeast; together, the three divisions form the Jingjinji megalopolis and the national capital region of China.", "output_text": "over 21 million people"}
{"input_text": "question: How many parishes are there in Louisiana? context: The U.S. state of Louisiana is divided into 64 parishes (French: paroisses) in the same manner that 48 other states of the United States are divided into counties, and Alaska is divided into boroughs.", "output_text": "64"}
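As a sketch, a small helper like the following can write and sanity-check a dataset in this format before you upload it. The helpers `write_tuning_dataset` and `validate_tuning_dataset` are hypothetical, not part of the Vertex AI SDK; the example-count limits match the ones described above.

```python
import json

def write_tuning_dataset(examples, path):
    """Write (prompt, expected output) pairs as a JSONL tuning dataset."""
    with open(path, "w", encoding="utf-8") as f:
        for prompt, output in examples:
            record = {"input_text": prompt, "output_text": output}
            f.write(json.dumps(record) + "\n")

def validate_tuning_dataset(path, min_examples=10, max_examples=10000):
    """Check that each line is valid JSON with the two required fields."""
    records = []
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f, start=1):
            record = json.loads(line)  # raises ValueError on malformed JSON
            for field in ("input_text", "output_text"):
                if field not in record:
                    raise ValueError(f"line {i}: missing required field {field!r}")
            records.append(record)
    if not min_examples <= len(records) <= max_examples:
        raise ValueError(f"dataset has {len(records)} examples; "
                         f"expected between {min_examples} and {max_examples}")
    return records
```

Checks like token-length truncation happen server-side during tuning; a local validator only needs to confirm structure and example counts.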
Include instructions in examples
For tasks such as classification, it is possible to create a dataset of examples that don't contain instructions. However, excluding instructions from the examples in the dataset leads to worse performance after tuning than including instructions, especially for smaller datasets.
Excludes instructions:
{"input_text": "5 stocks to buy now", "output_text": "business"}
Includes instructions:
{"input_text": "Classify the following text into one of the following classes: [business, entertainment] Text: 5 stocks to buy now", "output_text": "business"}
Chat
Each conversation example in a chat tuning dataset is composed of a messages field (required) and a context field (optional).

The messages field consists of an array of author-content pairs. The author field refers to the author of the message and is set to either user or assistant in an alternating manner. The content field is the content of the message. Each conversation example should have two to three user-assistant message pairs, which represent a message from the user and a response from the model.

The context field lets you specify a context for the chat. If you specify a context for an example, it overrides the value provided in default_context.

For each conversation example, the maximum token length for context and messages combined is 8,192 tokens. Additionally, each content field for assistant shouldn't exceed 1,024 tokens.

The maximum number of author fields that the examples in the dataset for a text chat model can contain is 10,000. This maximum is the sum of all author fields across all messages in all the examples.
Example
{
  "context": "You are a pirate dog named Captain Barktholomew.",
  "messages": [
    {
      "author": "user",
      "content": "Hi"
    },
    {
      "author": "assistant",
      "content": "Argh! What brings ye to my ship?"
    },
    {
      "author": "user",
      "content": "What's your name?"
    },
    {
      "author": "assistant",
      "content": "I be Captain Barktholomew, the most feared pirate dog of the seven seas."
    }
  ]
}
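The alternating-author rule above is easy to get wrong when generating conversations programmatically. The following sketch checks one chat example for the required structure; `validate_chat_example` is a hypothetical helper, not part of the Vertex AI SDK.

```python
def validate_chat_example(example):
    """Validate one chat tuning example: a required 'messages' array with
    strictly alternating user/assistant authors, starting with 'user'.
    Returns the number of user-assistant pairs."""
    if "messages" not in example:
        raise ValueError("missing required 'messages' field")
    messages = example["messages"]
    expected = "user"
    for i, message in enumerate(messages):
        author = message.get("author")
        if author != expected:
            raise ValueError(f"message {i}: expected author {expected!r}, got {author!r}")
        if "content" not in message:
            raise ValueError(f"message {i}: missing 'content' field")
        # Authors must alternate between user and assistant.
        expected = "assistant" if expected == "user" else "user"
    return len(messages) // 2
```

Running it on the pirate-dog example above returns 2 pairs, which falls inside the recommended two-to-three pairs per example.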
Sample datasets
You can use a sample dataset to get started with tuning the text-bison@002 model. The following is a classification task dataset that contains sample medical transcriptions for various medical specialties. The data is from mtsamples.com as made available on Kaggle.

Sample tuning dataset URI: gs://cloud-samples-data/vertex-ai/model-evaluation/peft_train_sample.jsonl

Sample eval dataset URI: gs://cloud-samples-data/vertex-ai/model-evaluation/peft_eval_sample.jsonl
To use these datasets, specify the URIs in the applicable parameters when creating a text model supervised tuning job.
For example:
...
"dataset_uri": "gs://cloud-samples-data/vertex-ai/model-evaluation/peft_train_sample.jsonl",
...
"evaluation_data_uri": "gs://cloud-samples-data/vertex-ai/model-evaluation/peft_eval_sample.jsonl",
...
Maintain consistency with production data
The examples in your datasets should match your expected production traffic. If your dataset contains specific formatting, keywords, instructions, or information, the production data should be formatted in the same way and contain the same instructions.
For example, if the examples in your dataset include a "question:" and a "context:", production traffic should also be formatted to include a "question:" and a "context:" in the same order as they appear in the dataset examples. If you exclude the context, the model won't recognize the pattern, even if the exact question was in an example in the dataset.
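One low-effort way to keep serving traffic aligned with the dataset is to build both from the same template. This is a minimal sketch; `format_prompt` is a hypothetical helper, and the "question:"/"context:" labels follow the extractive QA examples earlier on this page.

```python
def format_prompt(question, context):
    """Format a request exactly like the tuning examples:
    same labels ('question:', 'context:'), same order."""
    return f"question: {question} context: {context}"

# Use the same function when building dataset records and when
# serving production traffic, so the format can never drift apart.
dataset_input = format_prompt(
    "How many parishes are there in Louisiana?",
    "The U.S. state of Louisiana is divided into 64 parishes...",
)
```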
Upload tuning datasets to Cloud Storage
To run a tuning job, you need to upload one or more datasets to a Cloud Storage bucket. You can either create a new Cloud Storage bucket or use an existing one to store dataset files. The region of the bucket doesn't matter, but we recommend that you use a bucket that's in the same Google Cloud project where you plan to tune your model.
After your bucket is ready, upload your dataset file to the bucket.
Supervised tuning region settings
You can specify three Google Cloud region settings when you configure a supervised tuning job: one region is where the pipeline that tunes your model runs, another is where the tuned model is uploaded, and the third is where the model tuning computation occurs.
Pipeline job region
The pipeline job region is the region where the pipeline job runs. If the optional model upload region isn't specified, then the model is uploaded and deployed to the pipeline job region. Intermediate data, such as the transformed dataset, is stored in the pipeline job region. To learn which regions you can use for the pipeline job region, see Supported pipeline job and model upload regions. You must specify the pipeline job region using one of the following methods:
- If you use the Vertex AI SDK, you can specify the region where the pipeline job runs using the tuning_job_location parameter on the tune_model method of the object that represents the model you're tuning (for example, the TextGenerationModel.tune_model method).
- If you create a supervised tuning job by sending a POST request using the pipelineJobs.create method, then you use the URL to specify the region where the pipeline job runs. In the following URL, replace both instances of PIPELINE_JOB_REGION with the region where the pipeline runs: https://PIPELINE_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/PIPELINE_JOB_REGION/pipelineJobs
- If you use the Google Cloud console to create a supervised model tuning job, then you specify the pipeline job region in the Region control when you create your tuning job. In the Google Cloud console, the Region control specifies both the pipeline job region and the model upload region. When you use the Google Cloud console to create a supervised model tuning job, both regions are always the same.
Model upload region
You use the optional tuned_model_location parameter to specify where your tuned model is uploaded. If the model upload region isn't specified, then the tuned model is uploaded to the pipeline job region. You can use one of the Supported pipeline job and model upload regions for your model upload region. You can specify the model upload region using one of the following methods:
- If you use the Vertex AI SDK, the tuned_model_location parameter is specified on the tune_model method of the object that represents the model you're tuning (for example, the TextGenerationModel.tune_model method).
- If you create a supervised model tuning job by sending a POST request using the pipelineJobs method, then you can use the location parameter to specify the model upload region.
- If you use the Google Cloud console to create a supervised model tuning job, then you specify the model upload region in the Region control when you create your tuning job. In the Google Cloud console, the Region control specifies both the model upload region and the pipeline job region. When you use the Google Cloud console to create a supervised model tuning job, both regions are always the same.
Model tuning region
The model tuning region is where the model tuning computation occurs. This region is determined by the accelerator type you choose. If you specify TPU for your accelerator type, then your model tuning computation happens in europe-west4. If you specify GPU for your accelerator type, then model tuning happens in us-central1.
Supported pipeline job and model upload regions
You can use one of the following regions to specify the model upload region and to specify the pipeline job region:
us-central1
europe-west4
asia-southeast1
us-west1
europe-west3
europe-west2
asia-northeast1
us-east4
us-west4
northamerica-northeast1
europe-west9
europe-west1
asia-northeast3
Create a text model supervised tuning job
You can create a supervised text model tuning job by using the Google Cloud console, API, or the Vertex AI SDK for Python. For guidance on model tuning configurations, see Recommended configurations.
REST
To create a model tuning job, send a POST request by using the pipelineJobs method. Note that some of the parameters are not supported by all of the models. Ensure that you only include the applicable parameters for the model that you're tuning.
Before using any of the request data, make the following replacements:
- PIPELINEJOB_DISPLAYNAME: A display name for the pipelineJob.
- OUTPUT_DIR: The URI of the bucket to output pipeline artifacts to.
- PROJECT_ID: Your project ID.
- MODEL_DISPLAYNAME: A display name for the model uploaded (created) by the pipelineJob.
- DATASET_URI: URI of your dataset file.
- PIPELINE_JOB_REGION: The region where the pipeline tuning job runs. This is also the default region for where the tuned model is uploaded. If you want to upload your model to a different region, then use the location parameter to specify the tuned model upload region. For more information, see Model upload region.
- MODEL_UPLOAD_REGION: (optional) The region where the tuned model is uploaded. If you don't specify a model upload region, then the tuned model uploads to the same region where the pipeline job runs. For more information, see Model upload region.
- ACCELERATOR_TYPE: (optional, default GPU) The type of accelerator to use for model tuning. The valid options are:
  - GPU: Uses eight A100 80 GB GPUs for tuning. Make sure you have enough quota. If you choose GPU, then VPC-SC is supported. CMEK is supported if the tuning location and model upload location are us-central1. For more information, see Supervised tuning region settings. If you choose GPU, then your model tuning computations happen in the us-central1 region.
  - TPU: Uses 64 cores of the TPU v3 pod for tuning. Make sure you have enough quota. CMEK isn't supported, but VPC-SC is supported. If you choose TPU, then your model tuning computations happen in the europe-west4 region.
- LARGE_MODEL_REFERENCE: Name of the foundation model to tune. The options are:
  - text-bison@002
  - chat-bison@002
- DEFAULT_CONTEXT (chat only): The context that applies to all tuning examples in the tuning dataset. Setting the context field in an example overrides the default context.
- STEPS: The number of steps to run for model tuning. The default value is 300. The batch size varies by tuning location and model size. For 8k models, such as text-bison@002, chat-bison@002, code-bison@002, and codechat-bison@002:
  - us-central1 has a batch size of 8.
  - europe-west4 has a batch size of 24.
  For 32k models, such as text-bison-32k, chat-bison-32k, code-bison-32k, and codechat-bison-32k:
  - us-central1 has a batch size of 8.
  - europe-west4 has a batch size of 8.
  For example, if you're training text-bison@002 in europe-west4, there are 240 examples in a training dataset, and you set steps to 20, then the number of training examples processed is the product of 20 steps and the batch size of 24, or 480. In this case, there are two epochs in the training process because it goes through the examples two times. In us-central1, if there are 240 examples in a training dataset and you set steps to 15, then the number of training examples processed is the product of 15 steps and the batch size of 8, or 120. In this case, there are 0.5 epochs because half as many examples are processed as there are in the dataset.
- LEARNING_RATE_MULTIPLIER: A multiplier to apply to the recommended learning rate. To use the recommended learning rate, use 1.0.
- EVAL_DATASET_URI (text only): (optional) The URI of the JSONL file that contains the evaluation dataset for batch prediction and evaluation. Evaluation isn't supported for chat-bison. For more information, see Dataset format for tuning a code model. The evaluation dataset requires between 10 and 250 examples.
- EVAL_INTERVAL (text only): (optional, default 20) The number of tuning steps between each evaluation. An evaluation interval isn't supported for chat models. Because the evaluation runs on the entire evaluation dataset, a smaller evaluation interval results in a longer tuning time. For example, if steps is 200 and EVAL_INTERVAL is 100, then you get only two data points for the evaluation metrics. This parameter requires that evaluation_data_uri is set.
- ENABLE_EARLY_STOPPING (text only): (optional, default true) A boolean that, if set to true, stops tuning before completing all the tuning steps if model performance, as measured by the accuracy of predicted tokens, doesn't improve enough between evaluation runs. If false, tuning continues until all the tuning steps are complete. This parameter requires that evaluation_data_uri is set. Early stopping isn't supported for chat models.
- TENSORBOARD_RESOURCE_ID: (optional) The ID of a Vertex AI TensorBoard instance. The Vertex AI TensorBoard instance is used to create an experiment after the tuning job completes. The Vertex AI TensorBoard instance needs to be in the same region as the tuning pipeline.
- ENCRYPTION_KEY_NAME: (optional) The fully qualified name of a customer-managed encryption key (CMEK) that you want to use for data encryption. A CMEK is available only in us-central1. If you use us-central1 and don't specify a CMEK, then a Google-owned and Google-managed encryption key is used. A Google-owned and Google-managed encryption key is used by default in all other available regions. For more information, see CMEK overview.
- TEMPLATE_URI: The tuning template to use depends on the model that you're tuning:
  - Text model: https://us-kfp.pkg.dev/ml-pipeline/large-language-model-pipelines/tune-large-model/v2.0.0
  - Chat model: https://us-kfp.pkg.dev/ml-pipeline/large-language-model-pipelines/tune-large-chat-model/v3.0.0
- SERVICE_ACCOUNT: (optional) The service account that Vertex AI uses to run your pipeline job. By default, your project's Compute Engine default service account (PROJECT_NUMBER-compute@developer.gserviceaccount.com) is used. Learn more about attaching a custom service account.
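The steps-to-epochs arithmetic in the STEPS description can be sketched as follows. The `epochs_for` helper is hypothetical (not part of any Google SDK); the batch sizes are the ones listed in the STEPS description.

```python
# Batch sizes by (model size, tuning region), per the STEPS description.
BATCH_SIZE = {
    ("8k", "us-central1"): 8,
    ("8k", "europe-west4"): 24,
    ("32k", "us-central1"): 8,
    ("32k", "europe-west4"): 8,
}

def epochs_for(steps, dataset_size, model_size, region):
    """Examples processed = steps * batch size;
    epochs = examples processed / examples in the dataset."""
    batch_size = BATCH_SIZE[(model_size, region)]
    examples_processed = steps * batch_size
    return examples_processed / dataset_size
```

With 240 examples, 20 steps of an 8k model in europe-west4 gives 2.0 epochs, and 15 steps in us-central1 gives 0.5 epochs, matching the worked example above.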
HTTP method and URL:
POST https://PIPELINE_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/PIPELINE_JOB_REGION/pipelineJobs
Request JSON body:
{
  "displayName": "PIPELINEJOB_DISPLAYNAME",
  "runtimeConfig": {
    "gcsOutputDirectory": "gs://OUTPUT_DIR",
    "parameterValues": {
      "project": "PROJECT_ID",
      "model_display_name": "MODEL_DISPLAYNAME",
      "dataset_uri": "gs://DATASET_URI",
      "location": "MODEL_UPLOAD_REGION",
      "accelerator_type": "ACCELERATOR_TYPE",
      "large_model_reference": "LARGE_MODEL_REFERENCE",
      "default_context": "DEFAULT_CONTEXT (chat only)",
      "train_steps": STEPS,
      "learning_rate_multiplier": LEARNING_RATE_MULTIPLIER,
      "evaluation_data_uri": "gs://EVAL_DATASET_URI (text only)",
      "evaluation_interval": EVAL_INTERVAL (text only),
      "enable_early_stopping": ENABLE_EARLY_STOPPING (text only),
      "enable_checkpoint_selection": "ENABLE_CHECKPOINT_SELECTION (text only)",
      "tensorboard_resource_id": "TENSORBOARD_RESOURCE_ID",
      "encryption_spec_key_name": "ENCRYPTION_KEY_NAME"
    }
  },
  "encryptionSpec": {
    "kmsKeyName": "ENCRYPTION_KEY_NAME"
  },
  "serviceAccount": "SERVICE_ACCOUNT",
  "templateUri": "TEMPLATE_URI"
}
To send your request, choose one of these options:
curl
Save the request body in a file named request.json
,
and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://PIPELINE_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/PIPELINE_JOB_REGION/pipelineJobs"
PowerShell
Save the request body in a file named request.json
,
and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://PIPELINE_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/PIPELINE_JOB_REGION/pipelineJobs" | Select-Object -Expand Content
You should receive a JSON response that describes the created pipeline job. The pipelineSpec field is lengthy and is truncated in example responses to save space.
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
Node.js
Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Java
Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Console
To tune a text model with supervised tuning by using the Google Cloud console, perform the following steps:
- In the Vertex AI section of the Google Cloud console, go to the Vertex AI Studio page.
- Click the Tune and distill tab.
- Click Create tuned model.
- Click Supervised tuning.
- Configure model details:
- Tuned model name: Enter a name for your tuned model.
- Base model: Select the model that you want to tune.
- Region: Select the region where the pipeline tuning job runs and where the tuned model is deployed.
- Output directory: Enter the Cloud Storage location where artifacts are stored when your model is tuned.
- Expand Advanced Options to configure advanced settings.
- Train steps: Enter the number of steps to run for model tuning. The default value is 300. The batch size varies by tuning location and model size. For 8k models, such as text-bison@002, chat-bison@002, code-bison@002, and codechat-bison@002:
  - us-central1 has a batch size of 8.
  - europe-west4 has a batch size of 24.
  For 32k models, such as text-bison-32k, chat-bison-32k, code-bison-32k, and codechat-bison-32k:
  - us-central1 has a batch size of 8.
  - europe-west4 has a batch size of 8.
  For example, if you're training text-bison@002 in europe-west4, there are 240 examples in a training dataset, and you set steps to 20, then the number of training examples processed is the product of 20 steps and the batch size of 24, or 480. In this case, there are two epochs in the training process because it goes through the examples two times. In us-central1, if there are 240 examples in a training dataset and you set steps to 15, then the number of training examples processed is the product of 15 steps and the batch size of 8, or 120. In this case, there are 0.5 epochs because half as many examples are processed as there are in the dataset.
- Learning rate multiplier: Enter a multiplier to apply to the recommended learning rate. To use the recommended learning rate, use the default value of 1.
- Accelerator type: (optional) Enter the type of accelerator to use for model tuning. The valid options are:
  - GPU: Uses eight A100 80 GB GPUs for tuning. Make sure you have enough quota. If you choose GPU, then VPC-SC is supported. CMEK is supported if the tuning location and model upload location are us-central1. For more information, see Supervised tuning region settings. If you choose GPU, then your model tuning computations happen in the us-central1 region.
  - TPU: Uses 64 cores of the TPU v3 pod for tuning. Make sure you have enough quota. CMEK isn't supported, but VPC-SC is supported. If you choose TPU, then your model tuning computations happen in the europe-west4 region.
- Add a TensorBoard instance: (optional) The ID of a Vertex AI TensorBoard instance. The Vertex AI TensorBoard instance is used to create an experiment after the tuning job completes. The Vertex AI TensorBoard instance needs to be in the same region as the tuning pipeline.
- Encryption: (optional) Choose to use a Google-owned and Google-managed encryption key or a customer-managed encryption key (CMEK). A CMEK is available for encryption only in the us-central1 region. In all other available regions, a Google-owned and Google-managed encryption key is used. For more information, see CMEK overview.
- Service account: (optional) Choose a user-managed service account. A service account determines which Google Cloud resources your service code can access. If you don't choose a service account, then a service agent is used that includes permissions appropriate for most models.
- Click Continue.
- If you want to upload your dataset file, select Upload JSONL file to Cloud Storage:
  - In Select JSONL file, click Browse and select your dataset file.
  - In Dataset location, click Browse and select the Cloud Storage bucket where you want to store your dataset file.
- If your dataset file is already in a Cloud Storage bucket, select Existing JSONL file on Cloud Storage:
  - In Cloud Storage file path, click Browse and select the Cloud Storage bucket where your dataset file is located.
- (Optional) To evaluate your tuned model, select Enable model evaluation and configure your model evaluation:
  - Evaluation dataset: (optional) The URI of the JSONL file that contains the evaluation dataset for batch prediction and evaluation. Evaluation isn't supported for chat-bison. For more information, see Dataset format for tuning a code model. The evaluation dataset requires between 10 and 250 examples.
  - Evaluation interval: (optional, default 20) The number of tuning steps between each evaluation. An evaluation interval isn't supported for chat models. Because the evaluation runs on the entire evaluation dataset, a smaller evaluation interval results in a longer tuning time. For example, if steps is 200 and the evaluation interval is 100, then you get only two data points for the evaluation metrics. This option requires that an evaluation dataset is set.
  - Enable early stopping: (optional, default true) When enabled, tuning stops before completing all the tuning steps if model performance, as measured by the accuracy of predicted tokens, doesn't improve enough between evaluation runs. When disabled, tuning continues until all the tuning steps are complete. This option requires that an evaluation dataset is set. Early stopping isn't supported for chat models.
  - Enable checkpoint selection: When enabled, Vertex AI selects and returns the checkpoint with the best model evaluation performance from all checkpoints created during the tuning job. When disabled, the final checkpoint created during the tuning job is returned. Each checkpoint refers to a snapshot of the model during a tuning job.
  - TensorBoard instance: (optional) The ID of a Vertex AI TensorBoard instance. The Vertex AI TensorBoard instance is used to create an experiment after the tuning job completes. The Vertex AI TensorBoard instance needs to be in the same region as the tuning pipeline.
- Click Start tuning.
Example curl command
PROJECT_ID=myproject
DATASET_URI=gs://my-gcs-bucket-uri/dataset
OUTPUT_DIR=gs://my-gcs-bucket-uri/output
ACCELERATOR_TYPE=GPU
LOCATION=us-central1
curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
"https://europe-west4-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/europe-west4/pipelineJobs?pipelineJobId=tune-large-model-$(date +%Y%m%d%H%M%S)" -d \
$'{
  "displayName": "tune-llm",
  "runtimeConfig": {
    "gcsOutputDirectory": "'${OUTPUT_DIR}'",
    "parameterValues": {
      "project": "'${PROJECT_ID}'",
      "model_display_name": "The display name for your model in the UI",
      "dataset_uri": "'${DATASET_URI}'",
      "location": "'${LOCATION}'",
      "accelerator_type": "'${ACCELERATOR_TYPE}'",
      "large_model_reference": "text-bison@002",
      "train_steps": 300,
      "learning_rate_multiplier": 1,
      "encryption_spec_key_name": "projects/myproject/locations/us-central1/keyRings/sample-key/cryptoKeys/sample-key"
    }
  },
  "encryptionSpec": {
    "kmsKeyName": "projects/myproject/locations/us-central1/keyRings/sample-key/cryptoKeys/sample-key"
  },
  "templateUri": "https://us-kfp.pkg.dev/ml-pipeline/large-language-model-pipelines/tune-large-model/v2.0.0"
}'
Recommended configurations
The following table shows the recommended configurations for tuning a foundation model by task:
Task | No. of examples in dataset | Train steps |
---|---|---|
Classification | 100+ | 100-500 |
Summarization | 100-500+ | 200-1000 |
Extractive QA | 100+ | 100-500 |
Chat | 200+ | 1,000 |
For train steps, you can try more than one value, for example 100, 200, and 500, to find the best performance on a particular dataset.
View a list of tuned models
You can view a list of models in your current project, including your tuned models, by using the Google Cloud console or the Vertex AI SDK for Python.
Python
Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Console
To view your tuned models in the Google Cloud console, go to the Vertex AI Model Registry page.
Load a tuned text model
The following sample code uses the Vertex AI SDK for Python to load a text generation model that was tuned using supervised tuning:
import vertexai
from vertexai.preview.language_models import TextGenerationModel

model = TextGenerationModel.get_tuned_model(TUNED_MODEL_NAME)
Replace `TUNED_MODEL_NAME` with the qualified resource name of your tuned model. This name is in the format `projects/PROJECT_ID/locations/LOCATION/models/MODEL_ID`. You can find the model ID of your tuned model in Vertex AI Model Registry.
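If you only have the qualified resource name, you can split out the project, location, and model ID before calling `get_tuned_model`. The helper below is illustrative and not part of the SDK:

```python
def parse_model_resource_name(name):
    """Split a Vertex AI model resource name of the form
    projects/PROJECT_ID/locations/LOCATION/models/MODEL_ID
    into its components."""
    parts = name.split("/")
    if (len(parts) != 6 or parts[0] != "projects"
            or parts[2] != "locations" or parts[4] != "models"):
        raise ValueError(f"not a model resource name: {name!r}")
    return {"project": parts[1], "location": parts[3], "model_id": parts[5]}
```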
Tuning and evaluation metrics
You can configure a model tuning job to collect and report model tuning and model evaluation metrics, which can then be visualized by using Vertex AI TensorBoard.
Model tuning metrics
You can configure a model tuning job to collect the following tuning metrics for `chat-bison`, `code-bison`, `codechat-bison`, and `text-bison`:
- `/train_total_loss`: Loss for the tuning dataset at a training step.
- `/train_fraction_of_correct_next_step_preds`: The token accuracy at a training step. A single prediction consists of a sequence of tokens. This metric measures the accuracy of the predicted tokens when compared to the ground truth in the tuning dataset.
- `/train_num_predictions`: Number of predicted tokens at a training step.
Model evaluation metrics
You can configure a model tuning job to collect the following evaluation metrics for `code-bison` and `text-bison`:
- `/eval_total_loss`: Loss for the evaluation dataset at an evaluation step.
- `/eval_fraction_of_correct_next_step_preds`: The token accuracy at an evaluation step. A single prediction consists of a sequence of tokens. This metric measures the accuracy of the predicted tokens when compared to the ground truth in the evaluation dataset.
- `/eval_num_predictions`: Number of predicted tokens at an evaluation step.
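The fraction-of-correct-next-step-predictions metric can be illustrated with a minimal sketch. The function below is an assumption about the general idea (position-by-position token comparison), not how Vertex AI computes the metric internally:

```python
def fraction_correct_next_token(predicted_tokens, ground_truth_tokens):
    """Fraction of positions where the predicted token matches the ground
    truth, compared position by position over the shared length."""
    n = min(len(predicted_tokens), len(ground_truth_tokens))
    if n == 0:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted_tokens, ground_truth_tokens))
    return correct / n
```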
The metrics visualizations are available after the model tuning job completes. If you specify only a Vertex AI TensorBoard instance ID and not an evaluation dataset when you create the tuning job, only the visualizations for the tuning metrics are available.
Troubleshooting
The following topics might help you solve issues with tuning a foundation text model using supervised tuning.
Attempting to tune a model returns a 500 error or Internal error encountered
If you encounter this 500 error when trying to tune a model, try this workaround:
Run the following cURL command to create an empty Vertex AI dataset. Ensure that you configure your project ID in the command.
curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://europe-west4-aiplatform.googleapis.com/ui/projects/$PROJECT_ID/locations/europe-west4/datasets \
-d '{
"display_name": "test-name1",
"metadata_schema_uri": "gs://google-cloud-aiplatform/schema/dataset/metadata/image_1.0.0.yaml",
"saved_queries": [{"display_name": "saved_query_name", "problem_type": "IMAGE_CLASSIFICATION_MULTI_LABEL"}]
}'
After the command completes, wait five minutes and try model tuning again.
Error: Permission 'aiplatform.metadataStores.get' denied on resource '...europe-west4/metadataStores/default'.
Make sure that the Compute Engine API is enabled and that the default Compute Engine service account (`PROJECT_NUM-compute@developer.gserviceaccount.com`) is granted the `aiplatform.admin` and the `storage.objectAdmin` roles.
To grant the `aiplatform.admin` and `storage.objectAdmin` roles to the Compute Engine service account, do the following:
- In the Google Cloud console, activate Cloud Shell.
  At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.
  If you prefer to use a terminal on your machine, install and configure the Google Cloud CLI.
- Attach the `aiplatform.admin` role to your Compute Engine service account using the `gcloud projects add-iam-policy-binding` command. Replace the following:
  - `PROJECT_ID` with your Google Cloud project ID.
  - `PROJECT_NUM` with your Google Cloud project number.

  gcloud projects add-iam-policy-binding PROJECT_ID --member serviceAccount:PROJECT_NUM-compute@developer.gserviceaccount.com --role roles/aiplatform.admin
- Attach the `storage.objectAdmin` role to your Compute Engine service account using the `gcloud projects add-iam-policy-binding` command. Replace the following:
  - `PROJECT_ID` with your Google Cloud project ID.
  - `PROJECT_NUM` with your Google Cloud project number.

  gcloud projects add-iam-policy-binding PROJECT_ID --member serviceAccount:PROJECT_NUM-compute@developer.gserviceaccount.com --role roles/storage.objectAdmin
Error: Vertex AI Service Agent service-{project-number}@gcp-sa-aiplatform.iam.gserviceaccount.com does not have permission to access Artifact Registry repository projects/vertex-ai-restricted/locations/us/repositories/llm.
This permission error is due to a propagation delay. A subsequent retry should resolve this error.
What's next
- Learn how to evaluate a tuned model.
- Learn how to tune a foundation model using RLHF tuning.
- Learn how to tune a code model.