Tune a model

Use model tuning to improve a model's performance on a specific task or behavior by giving the model many examples that illustrate that task or behavior. Tuning is required when you want the model to learn something niche or specific. For more information, see Scenarios to use model tuning.

Prepare your model tuning dataset

Your dataset must include at least 10 examples that align with the task that you want the model to perform. For better results, we recommend at least 100 examples. For more information, see Prepare your model tuning dataset.

You can use the following example as a sample dataset:

{"input_text": "question: How many people live in Beijing? context: With over 21 million residents, Beijing is the world's most populous national capital city and is China's second largest city after Shanghai. It is located in Northern China, and is governed as a municipality under the direct administration of the State Council with 16 urban, suburban, and rural districts.[14] Beijing is mostly surrounded by Hebei Province with the exception of neighboring Tianjin to the southeast; together, the three divisions form the Jingjinji megalopolis and the national capital region of China.", "output_text": "over 21 million people"}
{"input_text": "question: How many parishes are there in Louisiana? context: The U.S. state of Louisiana is divided into 64 parishes (French: paroisses) in the same manner that 48 other states of the United States are divided into counties, and Alaska is divided into boroughs.", "output_text": "64"}
{"input_text": "question: How many churches in Texas? context: In 2010, there were a number of religious congregations in the state of Texas.", "output_text": "27,848"}
{"input_text": "question: How many lakes in North Dakota? context: North Dakota has many lakes and rivers offering exciting action for walleye, northern pike, perch, bass, salmon, catfish and other game fish with seasons for most species open year-round.", "output_text": "400"}
{"input_text": "question: How many rivers in the United States? context: The United States of America has over 250,000 rivers, with a total of about 3,500,000 miles of rivers. The longest river in the USA is the Missouri River (it is a tributary of the Mississippi River and is 2,540 miles long), but the biggest in terms of water volume is the deeper Mississippi River", "output_text": "over 250,000"}
{"input_text": "question: How many mountains in Oregon? context: There are 4760 named mountain ranges in Oregon and approximately 3,764 mountains altogether.", "output_text": "3,764"}
{"input_text": "question: How many small businesses in the state of Vermont? context: Vermont has 78,883 small businesses, most of which are sole proprietors. Vermont small businesses employ 157,131 workers, which is 60.2% of the state's workforce. The top three industries for small business in Vermont are professional, scientific, and technical services; construction; and retail.", "output_text": "78,883"}
{"input_text": "question: How many states grow apples? context: 2,500 varieties of apples are grown in the United States. 7,500 varieties of apples are grown throughout the world. 100 varieties of apples are grown commercially in the United States. Apples are grown commercially in 36 states.", "output_text": "36"}
{"input_text": "question: How many states grow mangos? context: Because mangos need a tropical climate to flourish, only Florida, California, Hawaii, and Puerto Rico grow mangos.", "output_text": "4"}
{"input_text": "question: How often does it storm in Missouri? context: Thunderstorms normally occur between 40 and 50 days per year. During any year, there are usually a few of these thunderstorms that are severe, and produce large hail and damaging winds. Tornadoes have produced extensive damage and loss of life in the St. Louis area.", "output_text": "Thunderstorms normally occur between 40 and 50 days per year."}

Upload the dataset to a Cloud Storage bucket

Follow these steps to upload your dataset:

  1. Create a new Cloud Storage bucket, or use an existing bucket to store your dataset file. The region of the bucket doesn't matter, but we recommend that you use a bucket that's in the same Google Cloud project where you plan to run model tuning.

  2. After your bucket is ready, upload your dataset file to the bucket.
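
You can upload the file with the Google Cloud console, the gcloud CLI, or a client library. As a minimal sketch using the google-cloud-storage Python library (the project, bucket, and object names below are placeholders):

from google.cloud import storage

# Placeholder names; replace with your own project, bucket, and paths.
client = storage.Client(project="my-project")
bucket = client.bucket("my-tuning-bucket")
blob = bucket.blob("datasets/tuning_dataset.jsonl")
blob.upload_from_filename("tuning_dataset.jsonl")
print(f"Uploaded to gs://{bucket.name}/{blob.name}")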

Create a model tuning job

Select a tab, and follow the instructions to run the sample.

REST

To create a model tuning job, send a POST request by using the pipelineJobs.create method.

Before using any of the request data, make the following replacements:

  • PIPELINEJOB_DISPLAYNAME: A display name for the pipelineJob.
  • OUTPUT_DIR: The URI of the bucket to output pipeline artifacts to.
  • PROJECT_ID: Your project ID.
  • MODEL_DISPLAYNAME: A display name for the model uploaded (created) by the pipelineJob.
  • DATASET_URI: URI of your dataset file.
  • PIPELINE_JOB_REGION: The region where the pipeline tuning job runs. This is also the default region for where the tuned model is uploaded. If you want to upload your model to a different region, then use the location parameter to specify the tuned model upload region. For more information, see Model upload region.
  • MODEL_UPLOAD_REGION: (optional) The region where the tuned model is uploaded. If you don't specify a model upload region, then the tuned model uploads to the same region where the pipeline job runs. For more information, see Model upload region.
  • ACCELERATOR_TYPE: (optional, default GPU) The type of accelerator to use for model tuning. The valid options are:
    • GPU: Uses eight A100 80 GB GPUs for tuning. If you choose GPU, your model tuning computations happen in the us-central1 region. Make sure that you have enough quota. VPC‑SC is supported, and CMEK is supported if both the tuning location and the model upload location are us-central1. For more information, see Supervised tuning region settings.
    • TPU: Uses 64 cores of a TPU v3 Pod for tuning. If you choose TPU, your model tuning computations happen in the europe-west4 region. Make sure that you have enough quota. VPC‑SC is supported, but CMEK isn't supported.
  • LARGE_MODEL_REFERENCE: Name of the foundation model to tune. The options are:
    • text-bison@002
    • chat-bison@002
  • DEFAULT_CONTEXT (chat only): The context that applies to all tuning examples in the tuning dataset. Setting the context field in an example overrides the default context.
  • STEPS: The number of steps to run for model tuning. The default value is 300. The batch size varies by tuning location and model size. For 8k models, such as text-bison@002, chat-bison@002, code-bison@002, and codechat-bison@002:
    • us-central1 has a batch size of 8.
    • europe-west4 has a batch size of 24.
    For 32k models, such as text-bison-32k, chat-bison-32k, code-bison-32k, and codechat-bison-32k:
    • us-central1 has a batch size of 8.
    • europe-west4 has a batch size of 8.

    For example, if you're tuning text-bison@002 in europe-west4 with 240 examples in the training dataset and steps set to 20, then the number of examples processed is the product of 20 steps and the batch size of 24, or 480 examples. In this case, the training process runs for two epochs because it goes through the dataset twice. In us-central1, if there are 240 examples in the training dataset and you set steps to 15, then the number of examples processed is the product of 15 steps and the batch size of 8, or 120 examples. In this case, training runs for 0.5 epochs because it processes half as many examples as the dataset contains. This arithmetic is sketched in code after this list.

  • LEARNING_RATE_MULTIPLIER: A multiplier to apply to the recommended learning rate. To use the recommended learning rate, use 1.0.
  • EVAL_DATASET_URI (text only): (optional) The URI of the JSONL file that contains the evaluation dataset for batch prediction and evaluation. Evaluation isn't supported for chat-bison. For more information, see Dataset format for tuning a text model. The evaluation dataset requires between 10 and 250 examples.
  • EVAL_INTERVAL (text only): (optional, default 20) The number of tuning steps between each evaluation. An evaluation interval isn't supported for chat models. Because the evaluation runs on the entire evaluation dataset, a smaller evaluation interval results in a longer tuning time. For example, if steps is 200 and EVAL_INTERVAL is 100, then you get only two data points for the evaluation metrics. This parameter requires that evaluation_data_uri is set.
  • ENABLE_EARLY_STOPPING (text only): (optional, default true) A boolean that, if set to true, stops tuning before completing all the tuning steps if model performance, as measured by the accuracy of predicted tokens, doesn't improve enough between evaluation runs. If false, tuning continues until all the tuning steps are complete. This parameter requires that evaluation_data_uri is set. Early stopping isn't supported for chat models.
  • TENSORBOARD_RESOURCE_ID: (optional) The ID of a Vertex AI TensorBoard instance. The instance is used to create an experiment after the tuning job completes, and it must be in the same region as the tuning pipeline.
  • ENCRYPTION_KEY_NAME: (optional) The fully qualified name of a customer-managed encryption key (CMEK) that you want to use for data encryption. A CMEK is available only in us-central1. If you use us-central1 and don't specify a CMEK, then a Google-managed encryption key is used. A Google-managed encryption key is used by default in all other available regions. For more information, see CMEK overview.
  • TEMPLATE_URI: The tuning template to use depends on the model that you're tuning:
    • Text model: https://us-kfp.pkg.dev/ml-pipeline/large-language-model-pipelines/tune-large-model/v2.0.0
    • Chat model: https://us-kfp.pkg.dev/ml-pipeline/large-language-model-pipelines/tune-large-chat-model/v3.0.0
  • SERVICE_ACCOUNT: (optional) The service account that Vertex AI uses to run your pipeline job. By default, your project's Compute Engine default service account (PROJECT_NUMBER‑compute@developer.gserviceaccount.com) is used. Learn more about attaching a custom service account.
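
Here is the steps-to-epochs arithmetic from the STEPS description above as a short Python sketch. The batch sizes are the documented 8k-model values; the helper itself is illustrative and not part of any SDK:

# Documented batch sizes for 8k models, keyed by tuning region.
BATCH_SIZE = {"us-central1": 8, "europe-west4": 24}

def epochs(dataset_size: int, steps: int, region: str) -> float:
    """Return how many passes over the dataset a tuning run makes."""
    examples_processed = steps * BATCH_SIZE[region]
    return examples_processed / dataset_size

print(epochs(240, 20, "europe-west4"))  # 2.0 (480 examples processed)
print(epochs(240, 15, "us-central1"))   # 0.5 (120 examples processed)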

HTTP method and URL:

POST https://PIPELINE_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/PIPELINE_JOB_REGION/pipelineJobs

Request JSON body:

{
  "displayName": "PIPELINEJOB_DISPLAYNAME",
  "runtimeConfig": {
    "gcsOutputDirectory": "gs://OUTPUT_DIR",
    "parameterValues": {
      "project": "PROJECT_ID",
      "model_display_name": "MODEL_DISPLAYNAME",
      "dataset_uri": "gs://DATASET_URI",
      "location": "MODEL_UPLOAD_REGION",
      "accelerator_type": "ACCELERATOR_TYPE",
      "large_model_reference": "LARGE_MODEL_REFERENCE",
      "default_context": "DEFAULT_CONTEXT (chat only)",
      "train_steps": STEPS,
      "learning_rate_multiplier": LEARNING_RATE_MULTIPLIER,
      "evaluation_data_uri": "gs://EVAL_DATASET_URI (text only)",
      "evaluation_interval": EVAL_INTERVAL (text only),
      "enable_early_stopping": ENABLE_EARLY_STOPPING (text only),
      "enable_checkpoint_selection": "ENABLE_CHECKPOINT_SELECTION (text only)",
      "tensorboard_resource_id": "TENSORBOARD_RESOURCE_ID",
      "encryption_spec_key_name": "ENCRYPTION_KEY_NAME"
    }
  },
  "encryptionSpec": {
    "kmsKeyName": "ENCRYPTION_KEY_NAME"
  },
  "serviceAccount": "SERVICE_ACCOUNT",
  "templateUri": "TEMPLATE_URI"
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://PIPELINE_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/PIPELINE_JOB_REGION/pipelineJobs"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://PIPELINE_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/PIPELINE_JOB_REGION/pipelineJobs" | Select-Object -Expand Content

You should receive a JSON response similar to the following illustrative, abbreviated example. Actual field values vary, and pipelineSpec has been truncated to save space.
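
{
  "name": "projects/PROJECT_ID/locations/PIPELINE_JOB_REGION/pipelineJobs/PIPELINE_JOB_ID",
  "displayName": "PIPELINEJOB_DISPLAYNAME",
  "createTime": "...",
  "updateTime": "...",
  "pipelineSpec": {
    ...
  },
  "state": "PIPELINE_STATE_PENDING",
  "templateUri": "TEMPLATE_URI"
}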

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

from __future__ import annotations


def tuning(
    project_id: str,
) -> TextGenerationModel:

    import vertexai
    from vertexai.language_models import TextGenerationModel
    from google.auth import default

    # Use Application Default Credentials with the cloud-platform scope.
    credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])

    # Initialize the SDK in the region where the tuned model is uploaded.
    vertexai.init(project=project_id, location="us-central1", credentials=credentials)

    model = TextGenerationModel.from_pretrained("text-bison@002")

    # Start the tuning pipeline: tuning runs on TPUs in europe-west4, and the
    # tuned model is uploaded to us-central1.
    tuning_job = model.tune_model(
        training_data="gs://cloud-samples-data/ai-platform/generative_ai/headline_classification.jsonl",
        tuning_job_location="europe-west4",
        tuned_model_location="us-central1",
    )

    # Print the tuning pipeline status.
    print(tuning_job._status)

    return model
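
A minimal usage sketch follows, assuming your project has tuning quota; the project ID is a placeholder, and list_tuned_model_names() lists the tuned versions of the base model:

# Illustrative usage; "my-project" is a placeholder project ID.
model = tuning(project_id="my-project")

# List resource names of tuned versions of the base model.
for name in model.list_tuned_model_names():
    print(name)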

Node.js

Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

/**
 * TODO(developer): Uncomment these variables before running the sample.
 * (Not necessary if passing values as arguments)
 */
// const project = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION';
// const pipelineJobId = 'YOUR_PIPELINE_JOB_ID';
// const gcsOutputDirectory = 'gs://YOUR_BUCKET/YOUR_OUTPUT_DIRECTORY';
// const datasetUri = 'gs://YOUR_BUCKET/YOUR_DATASET.jsonl';
// const modelDisplayName = 'YOUR_MODEL_DISPLAY_NAME';
// const trainSteps = 300;
const aiplatform = require('@google-cloud/aiplatform');
const {PipelineServiceClient} = aiplatform.v1;

// Import the helper module for converting arbitrary protobuf.Value objects.
const {helpers} = aiplatform;

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: 'europe-west4-aiplatform.googleapis.com',
};
const model = 'text-bison@002';

const pipelineClient = new PipelineServiceClient(clientOptions);

async function tuneLLM() {
  // Configure the parent resource
  const parent = `projects/${project}/locations/${location}`;

  const parameters = {
    train_steps: helpers.toValue(trainSteps),
    project: helpers.toValue(project),
    location: helpers.toValue('us-central1'),
    dataset_uri: helpers.toValue(datasetUri),
    large_model_reference: helpers.toValue(model),
    model_display_name: helpers.toValue(modelDisplayName),
    accelerator_type: helpers.toValue('GPU'), // Optional: GPU or TPU
  };

  const runtimeConfig = {
    gcsOutputDirectory,
    parameterValues: parameters,
  };

  const pipelineJob = {
    templateUri:
      'https://us-kfp.pkg.dev/ml-pipeline/large-language-model-pipelines/tune-large-model/v2.0.0',
    displayName: 'my-tuning-job',
    runtimeConfig,
  };

  const createPipelineRequest = {
    parent,
    pipelineJob,
    pipelineJobId,
  };
  // Create the pipeline job and log details from the response.
  const [result] = await pipelineClient.createPipelineJob(createPipelineRequest);
  console.log('Tuning pipeline job:');
  console.log(`\tName: ${result.name}`);
  console.log(
    `\tCreate time: ${new Date(
      Number(result.createTime.seconds) * 1000
    ).toLocaleString()}`
  );
  console.log(`\tState: ${result.state}`);
}

tuneLLM();

Java

Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import com.google.cloud.aiplatform.v1.CreatePipelineJobRequest;
import com.google.cloud.aiplatform.v1.LocationName;
import com.google.cloud.aiplatform.v1.PipelineJob;
import com.google.cloud.aiplatform.v1.PipelineJob.RuntimeConfig;
import com.google.cloud.aiplatform.v1.PipelineServiceClient;
import com.google.cloud.aiplatform.v1.PipelineServiceSettings;
import com.google.protobuf.Value;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class CreatePipelineJobModelTuningSample {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "PROJECT";
    String location = "europe-west4"; // europe-west4 and us-central1 are the supported regions
    String pipelineJobDisplayName = "PIPELINE_JOB_DISPLAY_NAME";
    String modelDisplayName = "MODEL_DISPLAY_NAME";
    String outputDir = "OUTPUT_DIR";
    String datasetUri = "DATASET_URI";
    int trainingSteps = 300;

    createPipelineJobModelTuningSample(
        project,
        location,
        pipelineJobDisplayName,
        modelDisplayName,
        outputDir,
        datasetUri,
        trainingSteps);
  }

  // Create a model tuning job
  public static void createPipelineJobModelTuningSample(
      String project,
      String location,
      String pipelineJobDisplayName,
      String modelDisplayName,
      String outputDir,
      String datasetUri,
      int trainingSteps)
      throws IOException {
    final String endpoint = String.format("%s-aiplatform.googleapis.com:443", location);
    PipelineServiceSettings pipelineServiceSettings =
        PipelineServiceSettings.newBuilder().setEndpoint(endpoint).build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests.
    try (PipelineServiceClient client = PipelineServiceClient.create(pipelineServiceSettings)) {
      Map<String, Value> parameterValues = new HashMap<>();
      parameterValues.put("project", stringToValue(project));
      parameterValues.put("model_display_name", stringToValue(modelDisplayName));
      parameterValues.put("dataset_uri", stringToValue(datasetUri));
      parameterValues.put(
          "location",
          stringToValue("us-central1")); // Region where the tuned model is uploaded
      parameterValues.put("large_model_reference", stringToValue("text-bison@002"));
      parameterValues.put("train_steps", numberToValue(trainingSteps));
      parameterValues.put("accelerator_type", stringToValue("GPU")); // Optional: GPU or TPU

      RuntimeConfig runtimeConfig =
          RuntimeConfig.newBuilder()
              .setGcsOutputDirectory(outputDir)
              .putAllParameterValues(parameterValues)
              .build();

      PipelineJob pipelineJob =
          PipelineJob.newBuilder()
              .setTemplateUri(
                  "https://us-kfp.pkg.dev/ml-pipeline/large-language-model-pipelines/tune-large-model/v2.0.0")
              .setDisplayName(pipelineJobDisplayName)
              .setRuntimeConfig(runtimeConfig)
              .build();

      LocationName parent = LocationName.of(project, location);
      CreatePipelineJobRequest request =
          CreatePipelineJobRequest.newBuilder()
              .setParent(parent.toString())
              .setPipelineJob(pipelineJob)
              .build();

      PipelineJob response = client.createPipelineJob(request);
      System.out.format("response: %s\n", response);
      System.out.format("Name: %s\n", response.getName());
    }
  }

  static Value stringToValue(String str) {
    return Value.newBuilder().setStringValue(str).build();
  }

  static Value numberToValue(int n) {
    return Value.newBuilder().setNumberValue(n).build();
  }
}

Console

To create a model tuning job by using the Google Cloud console, perform the following steps:

  1. In the Vertex AI section of the Google Cloud console, go to the Vertex AI Studio page.

    Go to Vertex AI Studio

  2. Click the Tuning tab.
  3. Click Create tuned model.
  4. Upload or specify your tuning dataset as follows:
    • If you want to upload your dataset file, select Upload JSONL file. If your dataset file is already in a Cloud Storage bucket, select Existing Cloud Storage file.

      Upload JSONL file

      • In the Select JSONL file text box, click Browse and select your dataset file.
      • In the Cloud Storage location text box, click Browse and select the Cloud Storage bucket where you want to store your dataset file.

      Existing Cloud Storage file

      In the Cloud Storage path text box, click Browse, and select the Cloud Storage bucket where your dataset file is located.

  5. Click Continue.
  6. Configure model details as follows:
    • Model name: Enter a name for your tuned model.
    • Base model: Select the foundation model that you want to tune.
    • Train steps: Enter the number of steps to run for model tuning. The batch size varies by tuning location:
      • us-central1 has a batch size of 8
      • europe-west4 has a batch size of 24

      If there are 240 examples in a training dataset, in europe-west4, it takes 240 / 24 = 10 steps to process the entire dataset once. In us-central1, it takes 240 / 8 = 30 steps to process the entire dataset once.

    • Learning rate multiplier: Enter a multiplier to apply to the recommended learning rate. To use the recommended learning rate, enter 1.0.
    • Working directory: Enter the URI of the bucket to store the tuned model.
    • Region: Enter the region where the tuning pipeline job runs. Supported regions are:
      • us-central1: Uses eight A100 80GB GPUs. Make sure you have enough quota.
      • europe-west4: Uses 64 cores of the TPU v3 pod. Make sure you have enough quota.
  7. Click Start tuning.

Example curl command

PROJECT_ID=myproject
DATASET_URI=gs://my-gcs-bucket-uri/dataset.jsonl
OUTPUT_DIR=gs://my-gcs-bucket-uri/output

curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
"https://europe-west4-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/europe-west4/pipelineJobs?pipelineJobId=tune-large-model-$(date +%Y%m%d%H%M%S)" -d \
$'{
  "displayName": "tune-llm",
  "runtimeConfig": {
    "gcsOutputDirectory": "'${OUTPUT_DIR}'",
    "parameterValues": {
      "project": "'${PROJECT_ID}'",
      "model_display_name": "The display name for your model in the UI",
      "dataset_uri": "'${DATASET_URI}'",
      "location": "us-central1",
      "large_model_reference": "text-bison@002",
      "train_steps": 300,
      "learning_rate_multiplier": 1.0
    }
  },
  "templateUri": "https://us-kfp.pkg.dev/ml-pipeline/large-language-model-pipelines/tune-large-model/v2.0.0"
}'

What's next