Tune code models

Code models can be tuned by using supervised tuning. Supervised tuning uses labeled examples that demonstrate the type of output you'd like from your code generation or code chat model during inference. Code models don't support tuning with reinforcement learning from human feedback (RLHF).

Scenarios to use code model tuning

Tuning is required when you want a model to learn something niche or specific that deviates from general language and code patterns. The following are examples of what you can teach the code-bison and codechat-bison models:

  • How to generate code for custom libraries. By training a code model with labeled samples of a custom library, you can generate or chat about code that is specific to that custom library.
  • How to use your code base. By training a model with labeled samples of your code base, you can generate code or chat about code that uses unique qualities in your code base.
  • How to generate code using variants of a programming language. By training a code model with labeled samples of a language variant, you can generate or chat about code that uses that language variant's particular conventions and standards.

These scenarios include code requirements that are difficult to capture through prompt instructions alone. The following are some examples:

Code generation

  • This sample prompt and response helps tune code-bison to work with a specific dataset.

  • This sample prompt and response helps tune code-bison to create a product using Python.

Code chat

  • This sample prompt and response helps tune codechat-bison to help a user learn how to work with a specific dataset.

  • This sample prompt and response helps tune codechat-bison to help a user learn how to create a product using Python during a chat session.

Prepare your code model tuning dataset

The dataset used to tune a code model includes examples that align with the task that you want the model to perform. Your dataset must include a minimum of 10 examples, but we recommend at least 500 examples for good results. The more examples you give, the better the results.

Dataset format for tuning a code model

The tuning dataset for a code generation model and a code chat model must be in JSON Lines (JSONL) format. The following includes details about the dataset format and dataset examples for each model:

Code generation

Your code generation model tuning dataset must be in JSON Lines (JSONL) format where each line contains a single tuning example. Each example is composed of an input_text field that contains the prompt to the model and an output_text field that contains an example response that the tuned model is expected to produce.

The maximum number of examples that a dataset for a code generation model can contain is 10,000.
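
As a minimal sketch of the format described above, the following Python writes prompt and response pairs to a JSONL file and checks the example count against the limits stated in this document (at least 10, at most 10,000). The helper name write_generation_dataset and the output path are illustrative assumptions, not part of any Vertex AI API.

```python
import json

MIN_EXAMPLES = 10      # minimum stated in this document
MAX_EXAMPLES = 10_000  # maximum stated in this document


def write_generation_dataset(pairs, path):
    """Write (prompt, response) pairs as one JSON object per line."""
    pairs = list(pairs)
    if not MIN_EXAMPLES <= len(pairs) <= MAX_EXAMPLES:
        raise ValueError(
            f"expected {MIN_EXAMPLES}-{MAX_EXAMPLES} examples, got {len(pairs)}"
        )
    with open(path, "w", encoding="utf-8") as f:
        for prompt, response in pairs:
            record = {"input_text": prompt, "output_text": response}
            # json.dumps escapes embedded newlines, so each example stays on one line.
            f.write(json.dumps(record) + "\n")
    return len(pairs)
```

Serializing each record with json.dumps avoids hand-escaping the quotes and newlines that code samples usually contain.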

Examples of a code generation model tuning dataset

This topic shows you two examples of code generation model tuning datasets. The following is the first example of a code generation tuning dataset:

{"input_text": "How to create a product in Python with the Cloud Vision API?", "output_text": "Use this
Python code sample to learn how to create a
product with the Cloud Vision API:\n\n```python\n\r\n\r\n\r\n\r\n\r\nfrom google.cloud import
vision\r\nfrom google.protobuf import field_mask_pb2 as field_mask\r\n\r\ndef
create_product(\r\n project_id, location, product_id, product_display_name,\r\n
product_category):\r\n \"\"\"Create one product.\r\n Args:\r\n project_id: Id of
the project.\r\n location: A compute region name.\r\n product_id: Id of the
product.\r\n product_display_name: Display name of the product.\r\n
product_category: Category of the product.\r\n \"\"\"\r\n client =
vision.ProductSearchClient()\r\n\r\n # A resource that represents Google Cloud
Platform location.\r\n location_path =
f\"projects/{project_id}/locations/{location}\"\r\n\r\n # Create a product with
the product specification in the region.\r\n # Set product display name and
product category.\r\n product = vision.Product(\r\n
display_name=product_display_name,\r\n
product_category=product_category)\r\n\r\n # The response is the product with
the `name` field populated.\r\n response = client.create_product(\r\n
parent=location_path,\r\n product=product,\r\n product_id=product_id)\r\n\r\n #
Display the product information.\r\n print('Product name:
{}'.format(response.name))\r\n```"}

The following is the second example of a code generation tuning dataset:

{"input_text": "python to retrieve the labels of a dataset for a given dataset
ID in BigQuery", "output_text": "Here is a Python code sample to retrieve the labels of a
dataset for a given dataset ID in BigQuery.\n\n```python\n\n from google.cloud import
bigquery\n\n # Construct a BigQuery client object.\n client =
bigquery.Client()\n\n # TODO(developer): Set dataset_id to the ID of the dataset
to fetch.\n # dataset_id = \"your-project.your_dataset\"\n\n dataset =
client.get_dataset(dataset_id) # Make an API request.\n\n # View dataset
labels.\n print(\"Dataset ID: {}\".format(dataset_id))\n print(\"Labels:\")\n if
dataset.labels:\n for label, value in dataset.labels.items():\n print(\"\\t{}:
{}\".format(label, value))\n else:\n print(\"\\tDataset has no labels
defined.\")\n```"}

Code chat

Your code chat model tuning dataset must be in JSON Lines (JSONL) format where each line contains a single tuning example. Each example is composed of one messages field that contains example chat prompts and responses that the tuned model is expected to produce.

The maximum number of author fields that the examples in the dataset for a code chat model can contain is 10,000. This maximum is for the sum of all author fields in all messages in all the examples.
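
To make the author-field limit concrete, here is a small sketch that counts the author fields across all messages in a code chat dataset. The function name count_author_fields and the two sample conversations are made up for illustration.

```python
import json

MAX_AUTHOR_FIELDS = 10_000  # limit stated in this document


def count_author_fields(jsonl_text):
    """Return the total number of author fields across all messages."""
    total = 0
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        example = json.loads(line)
        total += sum(1 for message in example["messages"] if "author" in message)
    return total


# Two hypothetical tuning examples, one JSON object per line.
sample = "\n".join([
    json.dumps({"messages": [
        {"author": "user", "content": "How do I reverse a list in Python?"},
        {"author": "assistant", "content": "Use items[::-1] or items.reverse()."},
    ]}),
    json.dumps({"messages": [
        {"author": "user", "content": "How do I sort a list?"},
        {"author": "assistant", "content": "Use sorted(items)."},
    ]}),
])

total = count_author_fields(sample)
print(total, total <= MAX_AUTHOR_FIELDS)  # prints: 4 True
```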

Examples of a code chat model tuning dataset

This topic shows you two examples of code chat model tuning datasets. The following is the first example of a code chat tuning dataset:

{"messages": [{"author":"user", "content": "How to create a product in
Python with Cloud Vision?"}, {"author": "assistant", "content": "Use this Python code sample to
learn how to create a product with Cloud Vision:\n\n```python\n\r\n\r\n\r\n\r\n\r\nfrom
google.cloud import vision\r\nfrom google.protobuf import field_mask_pb2 as
field_mask\r\n\r\ndef create_product(\r\n project_id, location, product_id,
product_display_name,\r\n product_category):\r\n \"\"\"Create one product.\r\n
Args:\r\n project_id: Id of the project.\r\n location: A compute region
name.\r\n product_id: Id of the product.\r\n product_display_name: Display name
of the product.\r\n product_category: Category of the product.\r\n \"\"\"\r\n
client = vision.ProductSearchClient()\r\n\r\n # A resource that represents
Google Cloud Platform location.\r\n location_path =
f\"projects/{project_id}/locations/{location}\"\r\n\r\n # Create a product with
the product specification in the region.\r\n # Set product display name and
product category.\r\n product = vision.Product(\r\n
display_name=product_display_name,\r\n
product_category=product_category)\r\n\r\n # The response is the product with
the `name` field populated.\r\n response = client.create_product(\r\n
parent=location_path,\r\n product=product,\r\n product_id=product_id)\r\n\r\n #
Display the product information.\r\n print('Product name:
{}'.format(response.name))\r\n```"}]}

The following is the second example of a code chat tuning dataset:

{"messages": [{"author":"user", "content": "Python to retrieve the labels of a
dataset for a given dataset ID in BigQuery"}, {"author": "assistant", "content": "Here is a
Python code sample to retrieve the labels of a dataset for a given dataset
ID in BigQuery.\n\n```python\n\n from google.cloud import bigquery\n\n # Construct a
BigQuery client object.\n client = bigquery.Client()\n\n # TODO(developer): Set
dataset_id to the ID of the dataset to fetch.\n # dataset_id =
\"your-project.your_dataset\"\n\n dataset = client.get_dataset(dataset_id) #
Make an API request.\n\n # View dataset labels.\n print(\"Dataset ID:
{}\".format(dataset_id))\n print(\"Labels:\")\n if dataset.labels:\n for label,
value in dataset.labels.items():\n print(\"\\t{}: {}\".format(label, value))\n
else:\n print(\"\\tDataset has no labels defined.\")\n```"}]}

Maintain consistency with production data

The examples in your datasets should match your expected production traffic. If your dataset contains specific formatting, keywords, instructions, or information, the production data should be formatted in the same way and contain the same instructions.

For example, if the examples in your dataset include a "question:" and a "context:", production traffic should also be formatted to include a "question:" and a "context:" in the same order as it appears in the dataset examples. If you exclude the context, the model will not recognize the pattern, even if the exact question was in an example in the dataset.
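
One way to keep the two in step, sketched here under the assumption that your prompts use the "question:" and "context:" layout from the example above, is to build both tuning examples and production prompts with the same helper. The template string and the format_prompt name are illustrative only.

```python
PROMPT_TEMPLATE = "question: {question}\ncontext: {context}"


def format_prompt(question, context):
    """Format a prompt identically for tuning examples and production requests."""
    return PROMPT_TEMPLATE.format(question=question, context=context)


# The same helper builds a tuning example...
training_example = {
    "input_text": format_prompt("How do I open a file?", "Python 3, standard library only"),
    "output_text": "```python\nwith open(path) as f:\n    data = f.read()\n```",
}

# ...and the prompt that is sent at inference time.
production_prompt = format_prompt("How do I list a directory?", "Python 3, standard library only")
```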

Include instructions in examples

For tasks such as code generation, you can create a dataset of examples that don't contain instructions. However, excluding instructions from the examples in the dataset leads to worse performance after tuning than including instructions, especially for smaller datasets.

Excludes instructions:

{
  "input_text": "Calculate the sum of a list of integers.",
  "output_text": "```python\nnums = [1, 2, 3]\ntotal_sum = sum(nums)\n```"
}

Includes instructions:

{
  "input_text": "Write the code in Python: calculate the sum of a list of integers",
  "output_text": "```python\nnums = [1, 2, 3]\ntotal_sum = sum(nums)\n```"
}

Upload tuning datasets to Cloud Storage

To run a tuning job, you need to upload one or more datasets to a Cloud Storage bucket. You can either create a new Cloud Storage bucket or use an existing one to store dataset files. The region of the bucket doesn't matter, but we recommend that you use a bucket that's in the same Google Cloud project where you plan to tune your model.

After your bucket is ready, upload your dataset file to the bucket.
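
If you prefer to script the upload, the following sketch uses the google-cloud-storage client library for Python. It assumes that the library is installed and that Application Default Credentials are configured. The helper names are hypothetical.

```python
def gcs_uri(bucket_name, blob_name):
    """Return the gs:// URI for an object in a bucket."""
    return f"gs://{bucket_name}/{blob_name}"


def upload_dataset(local_path, bucket_name, blob_name):
    """Upload a local JSONL file and return its gs:// URI."""
    # Imported here so that gcs_uri works without the library installed.
    from google.cloud import storage

    client = storage.Client()  # uses Application Default Credentials
    blob = client.bucket(bucket_name).blob(blob_name)
    blob.upload_from_filename(local_path)
    return gcs_uri(bucket_name, blob_name)
```

The returned URI is the value that you later supply as the dataset URI of the tuning job.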

Supervised tuning region settings

Three Google Cloud region settings apply when you configure a supervised tuning job. The pipeline job region is where the pipeline that tunes your model runs. The model upload region is where the tuned model is uploaded. The model tuning region is where the tuning computation occurs, and it's determined by the accelerator type that you choose.

Pipeline job region

The pipeline job region is the region where the pipeline job runs. If the optional model upload region isn't specified, then the model is uploaded and deployed to the pipeline job region. Intermediate data, such as the transformed dataset, is stored in the pipeline job region. To learn which regions you can use for the pipeline job region, see Supported pipeline job and model upload regions. You must specify the pipeline job region using one of the following methods:

  • If you use the Vertex AI SDK, you can specify the region where the pipeline job runs using the tuning_job_location parameter on the tune_model method of the object that represents the model you're tuning (for example, the TextGenerationModel.tune_model method).

  • If you create a supervised tuning job by sending a POST request using the pipelineJobs.create method, then you use the URL to specify the region where the pipeline job runs. In the following URL, replace both instances of PIPELINE_JOB_REGION with the region where the pipeline runs:

     https://PIPELINE_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/PIPELINE_JOB_REGION/pipelineJobs
    
  • If you use the Google Cloud console to create a supervised model tuning job, then you specify the pipeline job region in the Region control when you create your tuning job. In the Google Cloud console, the Region control specifies both the pipeline job region and the model upload region. When you use the Google Cloud console to create a supervised model tuning job, both regions are always the same.

Model upload region

You use the optional tuned_model_location parameter to specify where your tuned model is uploaded. If the model upload region isn't specified, then the tuned model is uploaded to the pipeline job region. You can use one of the Supported pipeline job and model upload regions for your model upload region. You can specify the model upload region using one of the following methods:

  • If you use the Vertex AI SDK, the tuned_model_location parameter is specified on the tune_model method of the object that represents the model you're tuning (for example, the TextGenerationModel.tune_model method).

  • If you create a supervised model tuning job by sending a POST request using the pipelineJobs method, then you can use the location parameter to specify the model upload region.

  • If you use the Google Cloud console to create a supervised model tuning job, then you specify the model upload region in the Region control when you create your tuning job. In the Google Cloud console, the Region control specifies both the model upload region and the pipeline job region. When you use the Google Cloud console to create a supervised model tuning job, both regions are always the same.

Model tuning region

The model tuning region is where the model tuning computation occurs. This region is determined by the accelerator type you choose. If you specify TPU for your accelerator type, then your model tuning computation happens in europe-west4. If you specify GPU for your accelerator type, then model tuning happens in us-central1.
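
The mapping described above can be written as a simple lookup. This is only a sketch of the rule as this document states it. The function name is hypothetical, and the default of GPU follows the ACCELERATOR_TYPE default given later in this document.

```python
# Accelerator type -> region where the tuning computation runs, per this document.
TUNING_REGION_BY_ACCELERATOR = {
    "TPU": "europe-west4",
    "GPU": "us-central1",
}


def model_tuning_region(accelerator_type="GPU"):
    """Return the model tuning region for the given accelerator type."""
    try:
        return TUNING_REGION_BY_ACCELERATOR[accelerator_type]
    except KeyError:
        raise ValueError(f"unsupported accelerator type: {accelerator_type!r}") from None
```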

Supported pipeline job and model upload regions

You can use one of the following regions to specify the model upload region and to specify the pipeline job region:

  • us-central1
  • europe-west4
  • asia-southeast1
  • us-west1
  • europe-west3
  • europe-west2
  • asia-northeast1
  • us-east4
  • us-west4
  • northamerica-northeast1
  • europe-west9
  • europe-west1
  • asia-northeast3
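
As an illustration, the following sketch checks a region against the list above before building the pipelineJobs URL shown in the Pipeline job region section. The function name pipeline_jobs_url is an assumption made for this example.

```python
# Regions listed in this document for pipeline jobs and model uploads.
SUPPORTED_REGIONS = frozenset({
    "us-central1", "europe-west4", "asia-southeast1", "us-west1",
    "europe-west3", "europe-west2", "asia-northeast1", "us-east4",
    "us-west4", "northamerica-northeast1", "europe-west9", "europe-west1",
    "asia-northeast3",
})


def pipeline_jobs_url(project_id, pipeline_job_region):
    """Build the pipelineJobs endpoint URL for a supported region."""
    if pipeline_job_region not in SUPPORTED_REGIONS:
        raise ValueError(f"unsupported region: {pipeline_job_region!r}")
    # The region appears twice: in the host name and in the resource path.
    return (
        f"https://{pipeline_job_region}-aiplatform.googleapis.com/v1/"
        f"projects/{project_id}/locations/{pipeline_job_region}/pipelineJobs"
    )
```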

Create a code model tuning job

You can create a supervised tuning job by using the Google Cloud console, API, or the Vertex AI SDK for Python. For guidance on model tuning configurations, see the Recommended configurations.

Create a code generation model tuning job

The following shows you how to create a code generation model tuning job by using REST API commands, the Google Cloud console, or the Vertex AI SDK for Python.

REST

To create a code generation model tuning job, send a POST request by using the pipelineJobs method.

Before using any of the request data, make the following replacements:

  • PROJECT_ID: Your project ID.
  • PIPELINEJOB_DISPLAYNAME: A display name for the pipelineJob.
  • OUTPUT_DIR: The URI of the bucket to output pipeline artifacts to.
  • MODEL_DISPLAYNAME: A display name for the model uploaded (created) by the pipelineJob.
  • DATASET_URI: URI of your dataset file.
  • EVAL_DATASET_URI: (optional) The URI of the JSONL file that contains the evaluation dataset for batch prediction and evaluation. Evaluation isn't supported for codechat-bison. For more information, see Dataset format for tuning a code model. The evaluation dataset requires between ten and 250 examples.
  • EVAL_INTERVAL: (optional, default 20) The number of tuning steps between each evaluation. An evaluation interval isn't supported for chat models. Because the evaluation runs on the entire evaluation dataset, a smaller evaluation interval results in a longer tuning time. For example, if steps is 200 and EVAL_INTERVAL is 100, then you will get only two data points for the evaluation metrics. This parameter requires that the evaluation_data_uri is set.
  • PIPELINE_JOB_REGION: The region where the pipeline tuning job runs. This is also the default region for where the tuned model is uploaded. If you want to upload your model to a different region, then use the location parameter to specify the tuned model upload region. For more information, see Pipeline job region.
  • MODEL_UPLOAD_REGION: (optional) The region where the tuned model is uploaded. If you don't specify a model upload region, then the tuned model uploads to the same region where the pipeline job runs. For more information, see Model upload region.
  • ACCELERATOR_TYPE: (optional, default GPU) The type of accelerator to use for model tuning. The valid options are:
    • GPU: Uses eight A100 80 GB GPUs for tuning. Make sure you have enough quota. If you choose GPU, then VPC‑SC is supported. CMEK is supported if the tuning location and model upload location are us-central1. For more information, see Supervised tuning region settings. If you choose GPU, then your model tuning computations happen in the us-central1 region.
    • TPU: Uses 64 cores of the TPU v3 pod for tuning. Make sure you have enough quota. CMEK isn't supported, but VPC‑SC is supported. If you choose TPU, then your model tuning computations happen in the europe-west4 region.
  • ENABLE_EARLY_STOPPING: (optional, default true) A boolean that, if set to true, stops tuning before completing all the tuning steps if model performance, as measured by the accuracy of predicted tokens, does not improve enough between evaluation runs. If false, tuning continues until all the tuning steps are complete. This parameter requires that the evaluation_data_uri is set. Early stopping isn't supported for chat models.
  • ENABLE_CHECKPOINT_SELECTION: A string value that can be true, false, or default. When set to `true`, Vertex AI selects and returns the checkpoint with the best model evaluation performance from all checkpoints created during the tuning job. When set to `false`, the final checkpoint created during the tuning job is returned. Each checkpoint refers to a snapshot of the model during a tuning job.
  • TENSORBOARD_RESOURCE_ID: (optional) The ID of a Vertex AI TensorBoard instance. The Vertex AI TensorBoard instance is used to create an experiment after the tuning job completes. The Vertex AI TensorBoard instance needs to be in the same region as the tuning pipeline.
  • ENCRYPTION_KEY_NAME: (optional) The fully qualified name of a customer-managed encryption key (CMEK) that you want to use for data encryption. A CMEK is available only in us-central1. If you use us-central1 and don't specify a CMEK, then a Google-managed encryption key is used. A Google-managed encryption key is used by default in all other available regions. For more information, see CMEK overview.
  • STEPS: The number of steps to run for model tuning. The default value is 300. The batch size varies by tuning location and model size. For 8k models, such as text-bison@002, chat-bison@002, code-bison@002, and codechat-bison@002:
    • us-central1 has a batch size of 8.
    • europe-west4 has a batch size of 24.
    For 32k models, such as text-bison-32k, chat-bison-32k, code-bison-32k, and codechat-bison-32k:
    • us-central1 has a batch size of 8.
    • europe-west4 has a batch size of 8.

    For example, if you're training text-bison@002 in europe-west4, there are 240 examples in a training dataset, and you set steps to 20, then the number of training examples processed is the product of 20 steps and the batch size of 24, or 480 examples. In this case, there are two epochs in the training process because it goes through the 240 examples two times. In us-central1, if there are 240 examples in a training dataset and you set steps to 15, then the number of training examples processed is the product of 15 steps and the batch size of 8, or 120 examples. In this case, there are 0.5 epochs because only half of the examples are processed.

  • LEARNING_RATE_MULTIPLIER: The step size at each iteration. The default value is 1.
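
The arithmetic in the STEPS and EVAL_INTERVAL descriptions above can be checked with a short sketch. The batch sizes are copied from this document. The function names, and the use of integer division for the number of evaluations, are illustrative assumptions.

```python
# (model context size, tuning region) -> batch size, as listed in this document.
BATCH_SIZES = {
    ("8k", "us-central1"): 8,
    ("8k", "europe-west4"): 24,
    ("32k", "us-central1"): 8,
    ("32k", "europe-west4"): 8,
}


def epochs(num_examples, steps, batch_size):
    """Return how many passes over the dataset the tuning job makes."""
    return steps * batch_size / num_examples


def evaluation_points(steps, eval_interval):
    """Return how many evaluations run when one runs every eval_interval steps."""
    return steps // eval_interval


print(epochs(240, 20, BATCH_SIZES[("8k", "europe-west4")]))  # 2.0
print(epochs(240, 15, BATCH_SIZES[("8k", "us-central1")]))   # 0.5
print(evaluation_points(200, 100))                           # 2
```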

HTTP method and URL:

POST https://PIPELINE_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/PIPELINE_JOB_REGION/pipelineJobs

Request JSON body:

{
  "displayName": "PIPELINEJOB_DISPLAYNAME",
  "runtimeConfig": {
    "gcsOutputDirectory": "gs://OUTPUT_DIR",
    "parameterValues": {
      "project": "PROJECT_ID",
      "model_display_name": "MODEL_DISPLAYNAME",
      "dataset_uri": "gs://DATASET_URI",
      "evaluation_data_uri": "EVAL_DATASET_URI",
      "evaluation_interval": "EVAL_INTERVAL",
      "enable_early_stopping": "ENABLE_EARLY_STOPPING",
      "enable_checkpoint_selection": "ENABLE_CHECKPOINT_SELECTION",
      "tensorboard_resource_id": "TENSORBOARD_RESOURCE_ID",
      "location": "MODEL_UPLOAD_REGION",
      "accelerator_type": "ACCELERATOR_TYPE",
      "large_model_reference": "code-bison@002",
      "train_steps": STEPS,
      "learning_rate_multiplier": LEARNING_RATE_MULTIPLIER
    }
  },
  "templateUri": "https://us-kfp.pkg.dev/ml-pipeline/large-language-model-pipelines/tune-large-model/v3.0.0"
}
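
If you generate request.json from a script, a sketch like the following can assemble the body. It mirrors the field names in the request body above. The function name, the example argument values, and the choice to leave out optional parameters that have no value are assumptions made for illustration.

```python
import json

# Template URI copied from the request body above.
TEMPLATE_URI = (
    "https://us-kfp.pkg.dev/ml-pipeline/large-language-model-pipelines/"
    "tune-large-model/v3.0.0"
)


def build_tuning_request(
    display_name,
    output_dir,
    project_id,
    model_display_name,
    dataset_uri,
    train_steps=300,
    learning_rate_multiplier=1,
    **optional_parameters,
):
    """Return the pipelineJobs request body as a dict."""
    parameter_values = {
        "project": project_id,
        "model_display_name": model_display_name,
        "dataset_uri": dataset_uri,
        "large_model_reference": "code-bison@002",
        "train_steps": train_steps,
        "learning_rate_multiplier": learning_rate_multiplier,
    }
    # Include an optional parameter only when the caller supplied a value.
    parameter_values.update(
        {name: value for name, value in optional_parameters.items() if value is not None}
    )
    return {
        "displayName": display_name,
        "runtimeConfig": {
            "gcsOutputDirectory": output_dir,
            "parameterValues": parameter_values,
        },
        "templateUri": TEMPLATE_URI,
    }


# Hypothetical values; replace them with your own.
body = build_tuning_request(
    display_name="my-tuning-pipeline",
    output_dir="gs://my-gcs-bucket-uri/output",
    project_id="myproject",
    model_display_name="my-tuned-code-model",
    dataset_uri="gs://my-gcs-bucket-uri/dataset",
    accelerator_type="GPU",
    location="us-central1",
)
request_json = json.dumps(body, indent=2)
```

Writing request_json to a file named request.json produces the input that the commands in this section expect.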

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://PIPELINE_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/PIPELINE_JOB_REGION/pipelineJobs"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://PIPELINE_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/PIPELINE_JOB_REGION/pipelineJobs" | Select-Object -Expand Content

You should receive a JSON response similar to the following. Note that pipelineSpec has been truncated to save space.

Console

To tune a code generation or code chat model with supervised tuning by using the Google Cloud console, perform the following steps:

  1. In the Vertex AI section of the Google Cloud console, go to the Vertex AI Studio page.

  2. Click the Tune and distill tab.
  3. Click Create tuned model.
  4. Click Supervised tuning.
  5. Configure model details:
    • Tuned model name: Enter a name for your tuned model.
    • Base model: Select the model that you want to tune.
    • Region: Select the region where the pipeline tuning job runs and where the tuned model is deployed.
    • Output directory: Enter the Cloud Storage location where artifacts are stored when your model is tuned.
  6. Expand Advanced Options to configure advanced settings.
    • Train steps: Enter the number of steps to run for model tuning. The default value is 300. The batch size varies by tuning location and model size. For 8k models, such as text-bison@002, chat-bison@002, code-bison@002, and codechat-bison@002:
      • us-central1 has a batch size of 8.
      • europe-west4 has a batch size of 24.
      For 32k models, such as text-bison-32k, chat-bison-32k, code-bison-32k, and codechat-bison-32k:
      • us-central1 has a batch size of 8.
      • europe-west4 has a batch size of 8.

      For example, if you're training text-bison@002 in europe-west4, there are 240 examples in a training dataset, and you set steps to 20, then the number of training examples processed is the product of 20 steps and the batch size of 24, or 480 examples. In this case, there are two epochs in the training process because it goes through the 240 examples two times. In us-central1, if there are 240 examples in a training dataset and you set steps to 15, then the number of training examples processed is the product of 15 steps and the batch size of 8, or 120 examples. In this case, there are 0.5 epochs because only half of the examples are processed.

    • Learning rate multiplier: Enter the step size at each iteration. The default value is 1.
    • Accelerator type: (optional) Enter the type of accelerator to use for model tuning. The valid options are:
      • GPU: Uses eight A100 80 GB GPUs for tuning. Make sure you have enough quota. If you choose GPU, then VPC‑SC is supported. CMEK is supported if the tuning location and model upload location are us-central1. For more information, see Supervised tuning region settings. If you choose GPU, then your model tuning computations happen in the us-central1 region.
      • TPU: Uses 64 cores of the TPU v3 pod for tuning. Make sure you have enough quota. CMEK isn't supported, but VPC‑SC is supported. If you choose TPU, then your model tuning computations happen in the europe-west4 region.
    • Add a TensorBoard instance: (optional) The ID of a Vertex AI TensorBoard instance. The Vertex AI TensorBoard instance is used to create an experiment after the tuning job completes. The Vertex AI TensorBoard instance needs to be in the same region as the tuning pipeline.
    • Encryption: (optional) Choose to use a Google-managed encryption key or a customer-managed encryption key (CMEK). A CMEK is available for encryption only in the us-central1 region. In all other available regions, a Google-managed encryption key is used. For more information, see CMEK overview.
    • Service account: (optional) Choose a user-managed service account. A service account determines which Google Cloud resources your service code can access. If you don't choose a service account, then a Google-managed service account is used that includes permissions appropriate for most models.
  7. Click Continue.
  8. If you want to upload your dataset file, select Upload JSONL file to Cloud Storage. If your dataset file is already in a Cloud Storage bucket, select Existing JSONL file on Cloud Storage.

    Upload a JSONL file

    • In Select JSONL file, click Browse and select your dataset file.
    • In Dataset location, click Browse and select the Cloud Storage bucket where you want to store your dataset file.

    Use an existing JSONL file

    In Cloud Storage file path, click Browse and select the Cloud Storage bucket where your dataset file is located.

  9. (Optional) To evaluate your tuned model, select Enable model evaluation and configure your model evaluation:
    • Evaluation dataset: (optional) The URI of the JSONL file that contains the evaluation dataset for batch prediction and evaluation. Evaluation isn't supported for codechat-bison. For more information, see Dataset format for tuning a code model. The evaluation dataset requires between ten and 250 examples.
    • Evaluation interval: (optional, default 20) The number of tuning steps between each evaluation. An evaluation interval isn't supported for chat models. Because the evaluation runs on the entire evaluation dataset, a smaller evaluation interval results in a longer tuning time. For example, if steps is 200 and EVAL_INTERVAL is 100, then you will get only two data points for the evaluation metrics. This parameter requires that the evaluation_data_uri is set.
    • Enable early stopping: (optional, default true) A boolean that, if set to true, stops tuning before completing all the tuning steps if model performance, as measured by the accuracy of predicted tokens, does not improve enough between evaluation runs. If false, tuning continues until all the tuning steps are complete. This parameter requires that the evaluation_data_uri is set. Early stopping isn't supported for chat models.
    • Enable checkpoint selection: When enabled, Vertex AI selects and returns the checkpoint with the best model evaluation performance from all checkpoints created during the tuning job. When disabled, the final checkpoint created during the tuning job is returned. Each checkpoint refers to a snapshot of the model during a tuning job.
    • TensorBoard Id: (optional) The ID of a Vertex AI TensorBoard instance. The Vertex AI TensorBoard instance is used to create an experiment after the tuning job completes. The Vertex AI TensorBoard instance needs to be in the same region as the tuning pipeline.
  10. Click Start tuning.

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

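from __future__ import annotations

from google.auth import default
import vertexai
from vertexai.language_models import CodeGenerationModel


def tune_code_generation_model(project_id: str) -> CodeGenerationModel:
    """Start a supervised tuning job for code-bison@002.

    Returns the model object that the tuning job was started from.
    """
    # Use Application Default Credentials with the cloud-platform scope.
    credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])

    # TODO(developer): Pass your own project ID as project_id.
    vertexai.init(project=project_id, location="us-central1", credentials=credentials)

    model = CodeGenerationModel.from_pretrained("code-bison@002")

    tuning_job = model.tune_model(
        # TODO(developer): Replace with the Cloud Storage URI of your own dataset.
        training_data="gs://cloud-samples-data/ai-platform/generative_ai/headline_classification.jsonl",
        # Region where the pipeline job runs.
        tuning_job_location="europe-west4",
        # Region where the tuned model is uploaded.
        tuned_model_location="us-central1",
    )

    # _status is a private attribute and might change between SDK versions.
    print(tuning_job._status)

    return model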


Example curl command to tune a code generation model

PROJECT_ID=myproject
DATASET_URI=gs://my-gcs-bucket-uri/dataset
EVAL_DATASET_URI=gs://cloud-samples-data/vertex-ai/model-evaluation/eval_sample.jsonl
OUTPUT_DIR=gs://my-gcs-bucket-uri/output
ACCELERATOR_TYPE=GPU
LOCATION=us-central1
PIPELINE_NAME=my-tuning-pipeline  # example display name; replace with your own

curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
"https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/pipelineJobs?pipelineJobId=tune-large-model-$(date +%Y%m%d%H%M%S)" -d \
$'{
  "displayName": "'${PIPELINE_NAME}'",
  "runtimeConfig": {
    "gcsOutputDirectory": "'${OUTPUT_DIR}'",
    "parameterValues": {
      "project": "'${PROJECT_ID}'",
      "model_display_name": "The display name for your model in the UI",
      "dataset_uri": "'${DATASET_URI}'",
      "evaluation_data_uri": "'${EVAL_DATASET_URI}'",
      "location": "'${LOCATION}'",
      "accelerator_type": "'${ACCELERATOR_TYPE}'",
      "large_model_reference": "code-bison@002",
      "learning_rate_multiplier": 1,
      "train_steps": 300
    }
  },
  "templateUri": "https://us-kfp.pkg.dev/ml-pipeline/large-language-model-pipelines/tune-large-model/v3.0.0"
}'

Create a code chat model tuning job

The following shows you how to create a code chat model tuning job using the Google Cloud console or REST API commands.

REST

To create a code chat model tuning job, send a POST request by using the pipelineJobs method.

Before using any of the request data, make the following replacements:

  • PROJECT_ID: Your project ID.
  • PIPELINEJOB_DISPLAYNAME: A display name for the pipelineJob.
  • OUTPUT_DIR: The URI of the bucket to output pipeline artifacts to.
  • PIPELINE_JOB_REGION: The region where the pipeline tuning job runs. This is also the default region for where the tuned model is uploaded. If you want to upload your model to a different region, then use the location parameter to specify the tuned model upload region. For more information, see Pipeline job region.
  • MODEL_UPLOAD_REGION: (optional) The region where the tuned model is uploaded. If you don't specify a model upload region, then the tuned model uploads to the same region where the pipeline job runs. For more information, see Model upload region.
  • ACCELERATOR_TYPE: (optional, default GPU) The type of accelerator to use for model tuning. The valid options are:
    • GPU: Uses eight A100 80 GB GPUs for tuning. Make sure you have enough quota. If you choose GPU, then VPC‑SC is supported. CMEK is supported if the tuning location and model upload location are us-central1. For more information, see Supervised tuning region settings. If you choose GPU, then your model tuning computations happen in the us-central1 region.
    • TPU: Uses 64 cores of the TPU v3 pod for tuning. Make sure you have enough quota. CMEK isn't supported, but VPC‑SC is supported. If you choose TPU, then your model tuning computations happen in the europe-west4 region.
  • MODEL_DISPLAYNAME: A display name for the model uploaded (created) by the pipelineJob.
  • DATASET_URI: URI of your dataset file.
  • TENSORBOARD_RESOURCE_ID: (optional) The ID of a Vertex AI TensorBoard instance. The Vertex AI TensorBoard instance is used to create an experiment after the tuning job completes. The Vertex AI TensorBoard instance needs to be in the same region as the tuning pipeline.
  • ENCRYPTION_KEY_NAME: (optional) The fully qualified name of a customer-managed encryption key (CMEK) that you want to use for data encryption. A CMEK is available only in us-central1. If you use us-central1 and don't specify a CMEK, then a Google-managed encryption key is used. A Google-managed encryption key is used by default in all other available regions. For more information, see CMEK overview.
  • DEFAULT_CONTEXT: The context that applies to all tuning examples in the tuning dataset. Setting the context field in an example overrides the default context.
  • STEPS: The number of steps to run for model tuning. The default value is 300. The batch size varies by tuning location and model size. For 8k models, such as text-bison@002, chat-bison@002, code-bison@002, and codechat-bison@002:
    • us-central1 has a batch size of 8.
    • europe-west4 has a batch size of 24.
    For 32k models, such as text-bison-32k, chat-bison-32k, code-bison-32k, and codechat-bison-32k:
    • us-central1 has a batch size of 8.
    • europe-west4 has a batch size of 8.

    For example, if you're training text-bison@002 in europe-west4, there are 240 examples in a training dataset, and you set steps to 20, then the number of training examples processed is the product of 20 steps and the batch size of 24, or 480 examples. In this case, there are two epochs in the training process because it goes through the 240 examples two times. In us-central1, if there are 240 examples in a training dataset and you set steps to 15, then the number of training examples processed is the product of 15 steps and the batch size of 8, or 120 examples. In this case, there are 0.5 epochs because the job processes half as many examples as the dataset contains.

  • LEARNING_RATE_MULTIPLIER: The step size at each iteration. The default value is 1.
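
The relationship between steps, batch size, and epochs described for STEPS can be checked with a few lines of Python. This is a minimal sketch, not part of the tuning API: the BATCH_SIZES table copies the values listed above, and the function name epochs_for and the "8k"/"32k" keys are hypothetical names chosen for illustration.

```python
# Batch sizes from the STEPS description, keyed by (model size, tuning region).
BATCH_SIZES = {
    ("8k", "us-central1"): 8,
    ("8k", "europe-west4"): 24,
    ("32k", "us-central1"): 8,
    ("32k", "europe-west4"): 8,
}


def epochs_for(steps: int, dataset_size: int, model_size: str, region: str) -> float:
    """Return how many passes over the dataset a tuning run makes."""
    # Examples processed = number of steps x batch size for this model and region.
    examples_processed = steps * BATCH_SIZES[(model_size, region)]
    return examples_processed / dataset_size


# The two cases from the text: 240 examples in the dataset.
print(epochs_for(20, 240, "8k", "europe-west4"))  # 480 / 240 = 2.0 epochs
print(epochs_for(15, 240, "8k", "us-central1"))   # 120 / 240 = 0.5 epochs
```

Running the calculation before you submit a job is a quick way to see whether a chosen steps value covers your dataset at least once.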

HTTP method and URL:

POST https://PIPELINE_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/PIPELINE_JOB_REGION/pipelineJobs

Request JSON body:

{
  "displayName": "PIPELINEJOB_DISPLAYNAME",
  "runtimeConfig": {
    "gcsOutputDirectory": "gs://OUTPUT_DIR",
    "parameterValues": {
      "project": "PROJECT_ID",
      "model_display_name": "MODEL_DISPLAYNAME",
      "dataset_uri": "gs://DATASET_URI",
      "tensorboard_resource_id": "TENSORBOARD_RESOURCE_ID",
      "location": "MODEL_UPLOAD_REGION",
      "accelerator_type": "ACCELERATOR_TYPE",
      "large_model_reference": "codechat-bison@002",
      "default_context": "DEFAULT_CONTEXT",
      "train_steps": STEPS,
      "learning_rate_multiplier": LEARNING_RATE_MULTIPLIER
    }
  },
  "templateUri": "https://us-kfp.pkg.dev/ml-pipeline/large-language-model-pipelines/tune-large-chat-model/v3.0.0"
}
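
If you prefer to generate the request body in code rather than edit it by hand, the following Python sketch builds the URL and JSON body shown above. The function name build_tuning_request and all sample values (my-project, my-bucket, and so on) are hypothetical. The sketch includes only a subset of the parameters and leaves out the optional ones, such as tensorboard_resource_id, location, and accelerator_type; it doesn't send the request.

```python
import json

# Template URI copied from the request body above.
TEMPLATE_URI = (
    "https://us-kfp.pkg.dev/ml-pipeline/"
    "large-language-model-pipelines/tune-large-chat-model/v3.0.0"
)


def build_tuning_request(project_id, pipeline_region, display_name, output_dir,
                         model_display_name, dataset_uri, steps=300,
                         learning_rate_multiplier=1):
    """Return (url, body) for a codechat-bison@002 tuning pipeline job."""
    url = (
        f"https://{pipeline_region}-aiplatform.googleapis.com/v1/"
        f"projects/{project_id}/locations/{pipeline_region}/pipelineJobs"
    )
    body = {
        "displayName": display_name,
        "runtimeConfig": {
            "gcsOutputDirectory": output_dir,
            "parameterValues": {
                "project": project_id,
                "model_display_name": model_display_name,
                "dataset_uri": dataset_uri,
                "large_model_reference": "codechat-bison@002",
                "train_steps": steps,
                "learning_rate_multiplier": learning_rate_multiplier,
            },
        },
        "templateUri": TEMPLATE_URI,
    }
    return url, body


url, body = build_tuning_request(
    "my-project", "us-central1", "my-tuning-job", "gs://my-bucket/output",
    "my-tuned-model", "gs://my-bucket/dataset.jsonl",
)
print(url)
print(json.dumps(body, indent=2))  # Save this output as request.json.
```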

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://PIPELINE_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/PIPELINE_JOB_REGION/pipelineJobs"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://PIPELINE_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/PIPELINE_JOB_REGION/pipelineJobs" | Select-Object -Expand Content

A successful request returns a JSON response that describes the new pipeline job, including its pipelineSpec.

Console

To tune a code generation or code chat model with supervised tuning by using the Google Cloud console, perform the following steps:

  1. In the Vertex AI section of the Google Cloud console, go to the Vertex AI Studio page.

  2. Click the Tune and distill tab.
  3. Click Create tuned model.
  4. Click Supervised tuning.
  5. Configure model details:
    • Tuned model name: Enter a name for your tuned model.
    • Base model: Select the model that you want to tune.
    • Region: Select the region where the pipeline tuning job runs and where the tuned model is deployed.
    • Output directory: Enter the Cloud Storage location where artifacts are stored when your model is tuned.
  6. Expand Advanced Options to configure advanced settings.
    • Train steps: Enter the number of steps to run for model tuning. The default value is 300. The batch size varies by tuning location and model size. For 8k models, such as text-bison@002, chat-bison@002, code-bison@002, and codechat-bison@002:
      • us-central1 has a batch size of 8.
      • europe-west4 has a batch size of 24.
      For 32k models, such as text-bison-32k, chat-bison-32k, code-bison-32k, and codechat-bison-32k:
      • us-central1 has a batch size of 8.
      • europe-west4 has a batch size of 8.

      For example, if you're training text-bison@002 in europe-west4, there are 240 examples in a training dataset, and you set steps to 20, then the number of training examples processed is the product of 20 steps and the batch size of 24, or 480 examples. In this case, there are two epochs in the training process because it goes through the 240 examples two times. In us-central1, if there are 240 examples in a training dataset and you set steps to 15, then the number of training examples processed is the product of 15 steps and the batch size of 8, or 120 examples. In this case, there are 0.5 epochs because the job processes half as many examples as the dataset contains.

    • Learning rate multiplier: Enter the step size at each iteration. The default value is 1.
    • Accelerator type: (optional) Enter the type of accelerator to use for model tuning. The valid options are:
      • GPU: Uses eight A100 80 GB GPUs for tuning. Make sure you have enough quota. If you choose GPU, then VPC‑SC is supported. CMEK is supported if the tuning location and model upload location are us-central1. For more information, see Supervised tuning region settings. If you choose GPU, then your model tuning computations happen in the us-central1 region.
      • TPU: Uses 64 cores of the TPU v3 pod for tuning. Make sure you have enough quota. CMEK isn't supported, but VPC‑SC is supported. If you choose TPU, then your model tuning computations happen in the europe-west4 region.
    • Add a TensorBoard instance: (optional) The ID of a Vertex AI TensorBoard instance. The Vertex AI TensorBoard instance is used to create an experiment after the tuning job completes. The Vertex AI TensorBoard instance needs to be in the same region as the tuning pipeline.
    • Encryption: (optional) Choose to use a Google-managed encryption key or a customer-managed encryption key (CMEK). A CMEK is available for encryption only in the us-central1 region. In all other available regions, a Google-managed encryption key is used. For more information, see CMEK overview.
    • Service account: (optional) Choose a user-managed service account. A service account determines which Google Cloud resources your service code can access. If you don't choose a service account, then a Google-managed service account is used that includes permissions appropriate for most models.
  7. Click Continue.
  8. If you want to upload your dataset file, select Upload JSONL file to Cloud Storage. If your dataset file is already in a Cloud Storage bucket, select Existing JSONL file on Cloud Storage.

    Upload a JSONL file

    • In Select JSONL file, click Browse and select your dataset file.
    • In Dataset location, click Browse and select the Cloud Storage bucket where you want to store your dataset file.

    Use an existing JSONL file

    In Cloud Storage file path, click Browse and select the Cloud Storage bucket where your dataset file is located.

  9. (Optional) To evaluate your tuned model, select Enable model evaluation and configure your model evaluation:
    • Evaluation dataset: (optional) The URI of the JSONL file that contains the evaluation dataset for batch prediction and evaluation. Evaluation isn't supported for codechat-bison. For more information, see Dataset format for tuning a code model. The evaluation dataset requires between ten and 250 examples.
    • Evaluation interval: (optional, default 20) The number of tuning steps between each evaluation. An evaluation interval isn't supported for chat models. Because the evaluation runs on the entire evaluation dataset, a smaller evaluation interval results in a longer tuning time. For example, if steps is 200 and EVAL_INTERVAL is 100, then you will get only two data points for the evaluation metrics. This parameter requires that the evaluation_data_uri is set.
    • Enable early stopping: (optional, default true) A boolean that, if set to true, stops tuning before completing all the tuning steps if model performance, as measured by the accuracy of predicted tokens, does not improve enough between evaluation runs. If false, tuning continues until all the tuning steps are complete. This parameter requires that the evaluation_data_uri is set. Enable early stopping isn't supported for chat models.
    • Enable checkpoint selection: When enabled, Vertex AI selects and returns the checkpoint with the best model evaluation performance from all checkpoints created during the tuning job. When disabled, the final checkpoint created during the tuning job is returned. Each checkpoint refers to a snapshot of the model during a tuning job.
    • TensorBoard Id: (optional) The ID of a Vertex AI TensorBoard instance. The Vertex AI TensorBoard instance is used to create an experiment after the tuning job completes. The Vertex AI TensorBoard instance needs to be in the same region as the tuning pipeline.
  10. Click Start tuning.
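
The evaluation settings in step 9 involve two small calculations: how many evaluation data points a job produces, and whether an evaluation dataset has an accepted size. The following Python sketch illustrates both. The function names are hypothetical, and the use of integer division is an assumption that matches the example in the text (200 steps with an interval of 100 gives two data points).

```python
def evaluation_points(train_steps: int, eval_interval: int = 20) -> int:
    """Estimate the number of evaluation runs for a tuning job.

    Assumes one evaluation every eval_interval steps; the default of 20
    is the documented default evaluation interval.
    """
    if eval_interval <= 0:
        raise ValueError("eval_interval must be a positive integer")
    return train_steps // eval_interval


def eval_dataset_size_ok(num_examples: int) -> bool:
    """The evaluation dataset requires between 10 and 250 examples."""
    return 10 <= num_examples <= 250


print(evaluation_points(200, 100))  # 2 data points, as in the example above
print(evaluation_points(300))       # 15 data points with the default interval
print(eval_dataset_size_ok(500))    # False: more than 250 examples
```

Because each evaluation runs on the entire evaluation dataset, a larger result from evaluation_points implies a longer tuning time.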

Example curl command to tune a code chat model

PROJECT_ID=myproject
DATASET_URI=gs://my-gcs-bucket-uri/dataset
OUTPUT_DIR=gs://my-gcs-bucket-uri/output
ACCELERATOR_TYPE=GPU
# Region where the pipeline tuning job runs.
REGION=us-central1
# Region where the tuned model is uploaded.
LOCATION=us-central1
# Display name for the pipeline job.
PIPELINE_NAME=tune-large-chat-model

curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
"https://${REGION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${REGION}/pipelineJobs?pipelineJobId=tune-large-chat-model-$(date +%Y%m%d%H%M%S)" -d \
$'{
  "displayName": "'${PIPELINE_NAME}'",
  "runtimeConfig": {
    "gcsOutputDirectory": "'${OUTPUT_DIR}'",
    "parameterValues": {
      "project": "'${PROJECT_ID}'",
      "model_display_name": "your-model-display-name",
      "dataset_uri": "'${DATASET_URI}'",
      "location": "'${LOCATION}'",
      "accelerator_type": "'${ACCELERATOR_TYPE}'",
      "large_model_reference": "codechat-bison@002",
      "train_steps": 300,
      "learning_rate_multiplier": 1,
      "encryption_spec_key_name": "projects/myproject/locations/us-central1/keyRings/sample-key/cryptoKeys/sample-key"
    }
  },
  "encryptionSpec": {
    "kmsKeyName": "projects/myproject/locations/us-central1/keyRings/sample-key/cryptoKeys/sample-key"
  },
  "templateUri": "https://us-kfp.pkg.dev/ml-pipeline/large-language-model-pipelines/tune-large-chat-model/v3.0.0"
}'

The following table shows the recommended configurations for tuning a code model by task:

Task             | No. of examples in dataset | Train steps
Code generation  | 500+                       | 200-1000
Code chat        | 500+                       | 200-1000

For train steps, you can try more than one value to get the best performance on a particular dataset, for example, 100, 200, 500.
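
One way to organize such a comparison is to generate one set of pipeline parameters per candidate value and submit a tuning job for each. The following Python sketch only builds the parameter sets; it doesn't submit anything. The helper name sweep_train_steps and the model display names are hypothetical.

```python
import copy

# Shared parameters; the keys match the parameterValues used in the request body.
base_parameters = {
    "large_model_reference": "code-bison@002",
    "train_steps": 300,
    "learning_rate_multiplier": 1,
}


def sweep_train_steps(base, candidates):
    """Return one parameterValues dict per candidate train_steps value."""
    runs = []
    for steps in candidates:
        params = copy.deepcopy(base)  # Leave the shared base untouched.
        params["train_steps"] = steps
        # Hypothetical naming scheme so the tuned models are easy to tell apart.
        params["model_display_name"] = f"my-tuned-model-steps-{steps}"
        runs.append(params)
    return runs


for params in sweep_train_steps(base_parameters, [100, 200, 500]):
    print(params["model_display_name"], params["train_steps"])
```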

View a list of tuned models

You can use the Google Cloud console or the Vertex AI SDK for Python to view a list of your tuned code models in your current project.

View a list of tuned code models (console)

To view your tuned code chat and code generation models in the Google Cloud console, go to the Vertex AI Model Registry page.

View a list of tuned code generation models (SDK)

The following sample code uses the Vertex AI SDK for Python to list the tuned code generation models in your current project:

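import vertexai
from vertexai.preview.language_models import CodeGenerationModel

# If needed, call vertexai.init(project=..., location=...) before this line.
# list_tuned_model_names() returns the resource names of models tuned from this base model.
tuned_model_names = CodeGenerationModel.from_pretrained("code-bison@002").list_tuned_model_names()
print(tuned_model_names)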

View a list of tuned code chat models (SDK)

The following sample code uses the Vertex AI SDK for Python to list the tuned code chat models in your current project:

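import vertexai
from vertexai.preview.language_models import CodeChatModel

# If needed, call vertexai.init(project=..., location=...) before this line.
# list_tuned_model_names() returns the resource names of models tuned from this base model.
tuned_model_names = CodeChatModel.from_pretrained("codechat-bison@002").list_tuned_model_names()
print(tuned_model_names)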

Load a tuned model

You can use the Vertex AI SDK for Python to load a tuned code model.

Load a tuned code generation model

The following sample code uses the Vertex AI SDK for Python to load a tuned code generation model. In the sample code, replace TUNED_MODEL_NAME with the qualified resource name of your tuned model. This name is in the format projects/PROJECT_ID/locations/LOCATION/models/MODEL_ID. You can find the model ID of your tuned model in Vertex AI Model Registry.

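import vertexai
from vertexai.preview.language_models import CodeGenerationModel

# Replace with the qualified resource name of your tuned model.
TUNED_MODEL_NAME = "projects/PROJECT_ID/locations/LOCATION/models/MODEL_ID"

model = CodeGenerationModel.get_tuned_model(TUNED_MODEL_NAME)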

Load a tuned code chat model

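The following sample code uses the Vertex AI SDK for Python to load a tuned code chat model. As with a tuned code generation model, replace TUNED_MODEL_NAME with the qualified resource name of your tuned model, in the format projects/PROJECT_ID/locations/LOCATION/models/MODEL_ID.

import vertexai
from vertexai.preview.language_models import CodeChatModel

# Replace with the qualified resource name of your tuned model.
TUNED_MODEL_NAME = "projects/PROJECT_ID/locations/LOCATION/models/MODEL_ID"

model = CodeChatModel.get_tuned_model(TUNED_MODEL_NAME)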
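
Because a malformed resource name is an easy mistake to make when loading a tuned model, it can help to check the name's shape first. The following Python sketch validates a name against the projects/PROJECT_ID/locations/LOCATION/models/MODEL_ID format using only the standard library. The helper parse_tuned_model_name and the sample values are hypothetical and aren't part of the Vertex AI SDK.

```python
import re

# Expected shape: projects/PROJECT_ID/locations/LOCATION/models/MODEL_ID
_TUNED_MODEL_PATTERN = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/models/(?P<model_id>[^/]+)$"
)


def parse_tuned_model_name(name: str) -> dict:
    """Split a qualified tuned model name into its parts, or raise ValueError."""
    match = _TUNED_MODEL_PATTERN.match(name)
    if match is None:
        raise ValueError(
            "Expected projects/PROJECT_ID/locations/LOCATION/models/MODEL_ID, "
            f"got: {name!r}"
        )
    return match.groupdict()


parts = parse_tuned_model_name(
    "projects/my-project/locations/us-central1/models/1234567890"
)
print(parts["location"])  # us-central1
```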

Tuning and evaluation metrics

You can configure a model tuning job to collect and report model tuning and model evaluation metrics, which can then be visualized by using Vertex AI TensorBoard.

Model tuning metrics

You can configure a model tuning job to collect the following tuning metrics for chat-bison, code-bison, codechat-bison, and text-bison:
  • /train_total_loss: Loss for the tuning dataset at a training step.
  • /train_fraction_of_correct_next_step_preds: The token accuracy at a training step. A single prediction consists of a sequence of tokens. This metric measures the accuracy of the predicted tokens when compared to the ground truth in the tuning dataset.
  • /train_num_predictions: Number of predicted tokens at a training step.

Model evaluation metrics

You can configure a model tuning job to collect the following evaluation metrics for code-bison and text-bison:

  • /eval_total_loss: Loss for the evaluation dataset at an evaluation step.
  • /eval_fraction_of_correct_next_step_preds: The token accuracy at an evaluation step. A single prediction consists of a sequence of tokens. This metric measures the accuracy of the predicted tokens when compared to the ground truth in the evaluation dataset.
  • /eval_num_predictions: Number of predicted tokens at an evaluation step.

The metrics visualizations are available after the model tuning job completes. If you specify only a Vertex AI TensorBoard instance ID and not an evaluation dataset when you create the tuning job, only the visualizations for the tuning metrics are available.
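
To make the fraction_of_correct_next_step_preds metrics more concrete, the following Python sketch computes a token accuracy for one prediction by comparing it to the ground truth position by position. This is an illustration under simplifying assumptions (tokens already aligned and of equal length), not the computation that Vertex AI performs; the function name and sample tokens are hypothetical.

```python
def token_accuracy(predicted_tokens, reference_tokens):
    """Return the fraction of positions where the predicted token matches."""
    if len(predicted_tokens) != len(reference_tokens):
        raise ValueError("Sequences must have the same number of tokens")
    if not reference_tokens:
        return 0.0
    correct = sum(p == r for p, r in zip(predicted_tokens, reference_tokens))
    return correct / len(reference_tokens)


reference = ["return", "a", "+", "b"]
predicted = ["return", "a", "-", "b"]

# Three of the four predicted tokens match the ground truth.
print(token_accuracy(predicted, reference))  # 0.75
```

In this picture, the num_predictions metrics correspond to the denominator: the number of predicted tokens at a step.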

Quota

Tuning jobs in us-central1 use eight A100 80 GB GPUs.

Tuning jobs in europe-west4 use 64 cores of the TPU v3 pod custom model training resource.

If you don't have enough quota or want to run multiple concurrent tuning jobs in your Google Cloud project, you must request additional quota:

  • For us-central1, submit a request for Restricted image training Nvidia A100 80 GB GPUs per region in the us-central1 region in multiples of eight.

  • For europe-west4, submit a request for Restricted image training TPU V3 pod cores per region in the europe-west4 region in multiples of 64.
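
The quota needed for concurrent tuning jobs can be estimated from the per-job figures above. The following Python sketch assumes that each concurrent job needs its own full allocation, which is consistent with requesting quota in multiples of eight or 64. The names QUOTA_PER_JOB and required_quota are hypothetical.

```python
# Quota name and units consumed by one tuning job, keyed by tuning region.
QUOTA_PER_JOB = {
    "us-central1": (
        "Restricted image training Nvidia A100 80 GB GPUs per region", 8),
    "europe-west4": (
        "Restricted image training TPU V3 pod cores per region", 64),
}


def required_quota(region: str, concurrent_jobs: int):
    """Return (quota name, units to request) for the given number of jobs."""
    if concurrent_jobs < 1:
        raise ValueError("concurrent_jobs must be at least 1")
    quota_name, units_per_job = QUOTA_PER_JOB[region]
    return quota_name, units_per_job * concurrent_jobs


print(required_quota("us-central1", 3))   # 24 A100 80 GB GPUs
print(required_quota("europe-west4", 2))  # 128 TPU v3 pod cores
```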
