Tune text embeddings

This page shows you how to tune the text embedding models textembedding-gecko and textembedding-gecko-multilingual. These foundation models have been trained on a large set of public text data. If you have a unique use case that requires your own training data, you can use model tuning. After you tune a foundation embedding model, the model should be better tailored to your use case. Tuning is supported for stable versions of the text embedding models.

Text embedding models support supervised tuning. Supervised tuning uses labeled examples that demonstrate the type of output you'd like from your text embedding model during inference. Text embedding models don't support tuning by using reinforcement learning from human feedback (RLHF).

To learn more about model tuning, see How model tuning works.

Expected quality improvement

Vertex AI uses a parameter-efficient tuning method for customization. In experiments on public retrieval benchmark datasets, this method showed quality gains of up to 41% (12% on average).

Use case for tuning an embedding model

Tuning a text embeddings model lets you adapt the embeddings to a specific domain or task. This can be useful if the pre-trained embeddings model is not well-suited to your specific needs. For example, you might fine-tune an embeddings model on a dataset of your company's customer support tickets. This could help a chatbot understand the types of support issues your customers typically have and answer their questions more effectively. Without tuning, the model doesn't know the specifics of your customer support tickets or the solutions to specific problems for your product.

Tuning workflow

The model tuning workflow on Vertex AI for textembedding-gecko and textembedding-gecko-multilingual is as follows:

  • Prepare your model tuning dataset.
  • Upload the model tuning dataset to a Cloud Storage bucket.
  • Configure your project for Vertex AI Pipelines.
  • Create a model tuning job.
  • Deploy the tuned model to a Vertex AI endpoint of the same name. Unlike text or Codey model tuning jobs, a text embedding tuning job doesn't automatically deploy your tuned model to a Vertex AI endpoint, so you perform this step yourself.

Prepare your embeddings dataset

The dataset used to tune an embeddings model includes data that align with the task that you want the model to perform.

Dataset format for tuning an embeddings model

The training dataset consists of the following files, which need to be in Cloud Storage. The paths of the files are defined by parameters when you launch the tuning pipeline. The three types of files are the corpus file, the query file, and the label files. Only training labels are required, but you can also provide validation and test labels for greater control.

  • Corpus file: The path is defined by parameter corpus_path. It's a JSONL file where each line has the fields _id, title, and text with string values. _id and text are required, while title is optional. Here is an example corpus.jsonl file:

    {"_id": "doc1", "title": "Get an introduction to generative AI on Vertex AI", "text": "Vertex AI's Generative AI Studio offers a Google Cloud console tool for rapidly prototyping and testing generative AI models. Learn how you can use Generative AI Studio to test models using prompt samples, design and save prompts, tune a foundation model, and convert between speech and text."}
    {"_id": "doc2", "title": "Use gen AI for summarization, classification, and extraction", "text": "Learn how to create text prompts for handling any number of tasks with Vertex AI's generative AI support. Some of the most common tasks are classification, summarization, and extraction. Vertex AI's PaLM API for text lets you design prompts with flexibility in terms of their structure and format."}
    {"_id": "doc3", "title": "Custom ML training overview and documentation", "text": "Get an overview of the custom training workflow in Vertex AI, the benefits of custom training, and the various training options that are available. This page also details every step involved in the ML training workflow from preparing data to predictions."}
    {"_id": "doc4", "text": "Text embeddings are useful for clustering, information retrieval, retrieval-augmented generation (RAG), and more."}
    {"_id": "doc5", "title": "Text embedding tuning", "text": "Google's text embedding models can be tuned on Vertex AI."}
    
  • Query file: The query file contains your example queries. The path is defined by the parameter queries_path. The query file is in JSONL format and has the same fields as the corpus file. Here is an example queries.jsonl file:

    {"_id": "query1", "text": "Does Vertex support generative AI?"}
    {"_id": "query2", "text": "What can I do with Vertex GenAI offerings?"}
    {"_id": "query3", "text": "How do I train my models using Vertex?"}
    {"_id": "query4", "text": "What is a text embedding?"}
    {"_id": "query5", "text": "Can text embedding models be tuned on Vertex?"}
    {"_id": "query6", "text": "embeddings"}
    {"_id": "query7", "text": "embeddings for rag"}
    {"_id": "query8", "text": "custom model training"}
    {"_id": "query9", "text": "Google Cloud PaLM API"}
    
  • Training labels: The path is defined by the parameter train_label_path. The train_label_path is the Cloud Storage URI of the training label data and is specified when you create your tuning job. The labels need to be in a TSV file with a header. Only a subset of the queries and the corpus needs to be included in your training labels file. The file must have the columns query-id, corpus-id, and score. The query-id is a string that matches the _id key from the query file, and the corpus-id is a string that matches the _id key from the corpus file. The score is a non-negative integer value. Any score greater than zero indicates that the document is related to the query. Larger numbers indicate a greater level of relevance. If the score is omitted, the default value is 1. Here is an example train_labels.tsv file:

    query-id    corpus-id    score
    query1      doc1         1
    query2      doc2         1
    query3      doc3         2
    query3      doc5         1
    query4      doc4         1
    query4      doc5         1
    query5      doc5         2
    query6      doc4         1
    query6      doc5         1
    query7      doc4         1
    query8      doc3         1
    query9      doc2         1
    
  • Test labels: Optional. The test labels have the same format as the training labels and are specified by the test_label_path parameter. If no test_label_path is provided, the test labels will be autosplit from the training labels.

  • Validation labels: Optional. The validation labels have the same format as the training labels and are specified by the validation_label_path parameter. If no validation_label_path is provided, the validation labels will be autosplit from the training labels.
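
As a rough illustration of the label format described above, the following Java sketch parses the lines of a labels TSV file and checks each row against known query and corpus IDs. The class LabelFileCheck and its methods are hypothetical helpers, not part of any Vertex AI library. The sketch assumes that the ID sets were already collected from the JSONL files and that the columns appear in the order shown in the example file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical helper that checks the rows of a labels TSV file; not a Vertex AI API. */
final class LabelFileCheck {

  /** One parsed label row. */
  static final class Label {
    final String queryId;
    final String corpusId;
    final int score;

    Label(String queryId, String corpusId, int score) {
      this.queryId = queryId;
      this.corpusId = corpusId;
      this.score = score;
    }
  }

  /**
   * Parses TSV lines (header first) and validates every row against the known IDs.
   * Throws IllegalArgumentException on the first problem found.
   */
  static List<Label> parse(List<String> lines, Set<String> queryIds, Set<String> corpusIds) {
    if (lines.isEmpty()) {
      throw new IllegalArgumentException("A header row is required");
    }
    // Assumption: columns appear in the order shown in the example file.
    String[] header = lines.get(0).split("\t", -1);
    if (header.length < 2
        || !header[0].trim().equals("query-id")
        || !header[1].trim().equals("corpus-id")) {
      throw new IllegalArgumentException("Header must start with query-id and corpus-id");
    }
    List<Label> labels = new ArrayList<>();
    for (int i = 1; i < lines.size(); i++) {
      String line = lines.get(i);
      if (line.isBlank()) {
        continue; // Skip blank lines, such as a trailing newline.
      }
      String[] cols = line.split("\t", -1);
      if (cols.length < 2) {
        throw new IllegalArgumentException("Line " + (i + 1) + ": expected at least 2 columns");
      }
      String queryId = cols[0].trim();
      String corpusId = cols[1].trim();
      // Per the text above, an omitted score defaults to 1. Integer.parseInt throws
      // NumberFormatException, which is a subclass of IllegalArgumentException.
      int score = (cols.length < 3 || cols[2].isBlank()) ? 1 : Integer.parseInt(cols[2].trim());
      if (score < 0) {
        throw new IllegalArgumentException("Line " + (i + 1) + ": score must be non-negative");
      }
      if (!queryIds.contains(queryId)) {
        throw new IllegalArgumentException("Line " + (i + 1) + ": unknown query-id " + queryId);
      }
      if (!corpusIds.contains(corpusId)) {
        throw new IllegalArgumentException("Line " + (i + 1) + ": unknown corpus-id " + corpusId);
      }
      labels.add(new Label(queryId, corpusId, score));
    }
    return labels;
  }

  public static void main(String[] args) {
    List<Label> labels =
        parse(
            List.of("query-id\tcorpus-id\tscore", "query1\tdoc1\t1", "query3\tdoc3\t2"),
            Set.of("query1", "query3"),
            Set.of("doc1", "doc3"));
    System.out.println("Valid label rows: " + labels.size());
  }
}
```

Running a check like this locally before uploading the files can catch ID mismatches early, although the pipeline's own validation remains the authority on what it accepts.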

Dataset size requirements

The provided dataset files must meet the following constraints:

  • The number of queries must be between 9 and 40,000.

  • The number of documents in the corpus must be between 9 and 500,000.

  • Each dataset label file must include at least 3 query IDs, and across all dataset splits there must be at least 9 query IDs.

  • The total number of labels must be less than 500,000.
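
The constraints above can be checked before a tuning job is launched. The following Java sketch is one possible pre-flight check. DatasetSizeCheck is a hypothetical helper, and the sketch assumes that the stated ranges are inclusive and that the nine query IDs required across all splits are counted as distinct IDs.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical pre-flight check for the documented dataset size limits. */
final class DatasetSizeCheck {

  /**
   * Returns a list of human-readable violations; an empty list means all checks passed.
   *
   * @param queryIdsPerLabelFile one set of query IDs per label file (train, test, validation)
   */
  static List<String> violations(
      int queryCount, int corpusCount, List<Set<String>> queryIdsPerLabelFile, int totalLabels) {
    List<String> problems = new ArrayList<>();
    // Assumption: "between" is inclusive at both ends.
    if (queryCount < 9 || queryCount > 40_000) {
      problems.add("Number of queries must be between 9 and 40,000, got " + queryCount);
    }
    if (corpusCount < 9 || corpusCount > 500_000) {
      problems.add("Number of documents must be between 9 and 500,000, got " + corpusCount);
    }
    Set<String> allQueryIds = new HashSet<>();
    for (Set<String> ids : queryIdsPerLabelFile) {
      if (ids.size() < 3) {
        problems.add("Each label file needs at least 3 query IDs, got " + ids.size());
      }
      allQueryIds.addAll(ids);
    }
    // Assumption: the cross-split minimum counts distinct query IDs.
    if (allQueryIds.size() < 9) {
      problems.add("All splits together need at least 9 query IDs, got " + allQueryIds.size());
    }
    if (totalLabels >= 500_000) {
      problems.add("Total number of labels must be less than 500,000, got " + totalLabels);
    }
    return problems;
  }

  public static void main(String[] args) {
    List<Set<String>> splits =
        List.of(Set.of("q1", "q2", "q3"), Set.of("q4", "q5", "q6"), Set.of("q7", "q8", "q9"));
    violations(9, 9, splits, 12).forEach(System.out::println);
  }
}
```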

Configure your project for Vertex AI Pipelines

Tuning is executed within your project using the Vertex AI Pipelines platform.

Configuring permissions

The pipeline executes training code under two Google-managed service accounts. These accounts must be configured with certain permissions.

Compute Engine default service account
PROJECT_NUMBER-compute@developer.gserviceaccount.com

This service account requires:

  • Storage Object Viewer access to each dataset file you created in Cloud Storage.

  • Storage Object User access to the output Cloud Storage directory of your pipeline, PIPELINE_OUTPUT_DIRECTORY.

  • Vertex AI User access to your project.

Instead of the Compute Engine default service account, you can specify a custom service account. For more information, see Configure a service account with granular permissions.

Vertex AI Service Agent
service-PROJECT_NUMBER@gcp-sa-aiplatform.iam.gserviceaccount.com

This service account requires:

  • Storage Object Viewer access to each dataset file you created in Cloud Storage.

  • Storage Object User access to the output Cloud Storage directory of your pipeline, PIPELINE_OUTPUT_DIRECTORY.

For more information about configuring Cloud Storage dataset permissions, see Configure a Cloud Storage bucket for pipeline artifacts.

Using accelerators

Any of the following accelerators can be used for tuning:

  • NVIDIA_L4

  • NVIDIA_TESLA_A100

  • NVIDIA_TESLA_T4

  • NVIDIA_TESLA_V100

  • NVIDIA_TESLA_P100

Launching a tuning job requires adequate Restricted image training GPUs quota for the accelerator type and region you have selected, for example Restricted image training Nvidia V100 GPUs per region. To increase the quota of your project, see request additional quota.

Not all accelerators are available in all regions. See Using accelerators in Vertex AI for more information.

Create an embedding model tuning job

You can create an embedding model tuning job by using the Google Cloud console, the REST API, or the Java client library.

REST

Before using any of the request data, make the following replacements:

  • DISPLAY_NAME: A display name for the pipelineJob.
  • PIPELINE_OUTPUT_DIRECTORY: Path for the pipeline output artifacts, starting with "gs://".
  • PROJECT_ID: Your Google Cloud project ID.
  • LOCATION: Google Cloud project region to run the pipeline. Tuning is supported in any region where your project has adequate GPU quota. See Using accelerators for more information. Because serving resources may be limited in other regions, us-central1 is recommended.
  • QUERIES_PATH: The Cloud Storage URI for the query data, starting with "gs://".
  • CORPUS_PATH: The Cloud Storage URI for the corpus data, starting with "gs://".
  • TRAIN_LABEL_PATH: The Cloud Storage URI of the train label data location, starting with "gs://".
  • TEST_LABEL_PATH: Optional. The Cloud Storage URI of the test label data location, starting with "gs://". Passing an empty string will direct the pipeline to autosplit the test dataset from the training dataset.
  • VALIDATION_LABEL_PATH: Optional. The Cloud Storage URI of the validation label data location, starting with "gs://". Passing an empty string will direct the pipeline to autosplit the validation dataset from the training dataset.
  • ACCELERATOR_TYPE: Optional. The accelerator type to use for training. Defaults to NVIDIA_TESLA_V100. For possible values, see Using accelerators.
  • ACCELERATOR_COUNT: Optional. The number of accelerators to use when training. Using a greater number of accelerators may make training faster, but has no effect on quality. Defaults to 4.
  • MACHINE_TYPE: Optional. The machine type to use for training. Defaults to n1-standard-16. For information about selecting the machine type that matches the accelerator type and count you have selected, see GPU Platforms.
  • BASE_MODEL_VERSION_ID: Optional. Use this to specify what text embedding model to tune. Defaults to textembedding-gecko@001. For possible values, see stable versions.
  • MODEL_DISPLAY_NAME: Optional. The display name for the tuned model when it appears in Model Registry. Defaults to "tuned-text-embedding-model".
  • TASK_TYPE: Optional. Setting this parameter optimizes the tuned model for a specific downstream task. Defaults to DEFAULT. For more information, see Get text embeddings.
  • BATCH_SIZE: Optional. The training batch size. Defaults to 128.
  • ITERATIONS: Optional. The number of steps to perform model tuning. Defaults to 1000, and must be greater than 30.
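
To make the defaults and constraints in this list concrete, the following Java sketch assembles a parameter map for the pipeline. TuningParameters is a hypothetical helper, not a Vertex AI API. The pipeline applies its own defaults when optional parameters are omitted, so spelling them out here is only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical builder for the tuning pipeline's parameterValues map. */
final class TuningParameters {

  /** Optional parameters, set to the defaults documented in the list above. */
  static Map<String, Object> defaults() {
    Map<String, Object> p = new LinkedHashMap<>();
    p.put("accelerator_type", "NVIDIA_TESLA_V100");
    p.put("accelerator_count", 4);
    p.put("machine_type", "n1-standard-16");
    p.put("base_model_version_id", "textembedding-gecko@001");
    p.put("model_display_name", "tuned-text-embedding-model");
    p.put("task_type", "DEFAULT");
    p.put("batch_size", 128);
    p.put("iterations", 1000);
    return p;
  }

  /** Builds the full map from required values, defaults, and caller overrides. */
  static Map<String, Object> build(
      String project,
      String location,
      String queriesPath,
      String corpusPath,
      String trainLabelPath,
      Map<String, ?> overrides) {
    Map<String, Object> p = new LinkedHashMap<>();
    p.put("project", project);
    p.put("location", location);
    p.put("queries_path", requireGcsUri("queries_path", queriesPath));
    p.put("corpus_path", requireGcsUri("corpus_path", corpusPath));
    p.put("train_label_path", requireGcsUri("train_label_path", trainLabelPath));
    p.putAll(defaults());
    p.putAll(overrides); // Caller-supplied values replace the defaults.

    // The list above states that iterations must be greater than 30.
    Object iterations = p.get("iterations");
    if (!(iterations instanceof Integer) || (Integer) iterations <= 30) {
      throw new IllegalArgumentException("iterations must be an integer greater than 30");
    }
    return p;
  }

  private static String requireGcsUri(String name, String value) {
    if (value == null || !value.startsWith("gs://")) {
      throw new IllegalArgumentException(name + " must start with gs://");
    }
    return value;
  }

  public static void main(String[] args) {
    Map<String, Object> params =
        build(
            "my-project",
            "us-central1",
            "gs://my-bucket/queries.jsonl",
            "gs://my-bucket/corpus.jsonl",
            "gs://my-bucket/train_labels.tsv",
            Map.of("iterations", 300));
    params.forEach((key, value) -> System.out.println(key + " = " + value));
  }
}
```

The project ID and bucket paths in main are placeholder values chosen for the example.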

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/pipelineJobs

Request JSON body:

{
  "displayName": "DISPLAY_NAME",
  "runtimeConfig": {
    "gcsOutputDirectory": "PIPELINE_OUTPUT_DIRECTORY",
    "parameterValues": {
      "project": "PROJECT_ID",
      "location": "LOCATION",
      "queries_path": "QUERIES_PATH",
      "corpus_path": "CORPUS_PATH",
      "train_label_path": "TRAIN_LABEL_PATH",
      "test_label_path": "TEST_LABEL_PATH",
      "validation_label_path": "VALIDATION_LABEL_PATH",
      "accelerator_type": "ACCELERATOR_TYPE",
      "accelerator_count": ACCELERATOR_COUNT,
      "machine_type": "MACHINE_TYPE",
      "base_model_version_id": "BASE_MODEL_VERSION_ID",
      "model_display_name": "MODEL_DISPLAY_NAME",
      "task_type": "TASK_TYPE",
      "batch_size": BATCH_SIZE,
      "iterations": ITERATIONS
    }
  },
  "templateUri": "https://us-kfp.pkg.dev/ml-pipeline/llm-text-embedding/tune-text-embedding-model/v1.1.2"
}
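
For callers that prefer plain HTTP over a client library, the request above could be assembled with the JDK's built-in java.net.http package (Java 11 or later). This is a minimal sketch: PipelineJobRequest is a hypothetical helper, and it assumes the caller already holds an OAuth access token, for example one obtained from gcloud auth print-access-token.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper that builds (but does not send) the pipelineJobs POST request. */
final class PipelineJobRequest {

  /** Builds the regional pipelineJobs URL shown above. */
  static String url(String projectId, String location) {
    return String.format(
        "https://%s-aiplatform.googleapis.com/v1/projects/%s/locations/%s/pipelineJobs",
        location, projectId, location);
  }

  /** Builds the POST request with the JSON body and a bearer token. */
  static HttpRequest build(
      String projectId, String location, String accessToken, String jsonBody) {
    return HttpRequest.newBuilder()
        .uri(URI.create(url(projectId, location)))
        .header("Authorization", "Bearer " + accessToken)
        .header("Content-Type", "application/json; charset=utf-8")
        .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
        .build();
  }

  public static void main(String[] args) {
    // Placeholder values; nothing is sent over the network here.
    HttpRequest request = build("my-project", "us-central1", "ACCESS_TOKEN", "{}");
    System.out.println(request.method() + " " + request.uri());
  }
}
```

To send the request, pass it to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) and inspect the returned status code and body.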

After launching the pipeline, follow the progress of your tuning job through the Google Cloud console.

Java

Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

package aiplatform;

import com.google.cloud.aiplatform.v1beta1.CreatePipelineJobRequest;
import com.google.cloud.aiplatform.v1beta1.LocationName;
import com.google.cloud.aiplatform.v1beta1.PipelineJob;
import com.google.cloud.aiplatform.v1beta1.PipelineJob.RuntimeConfig;
import com.google.cloud.aiplatform.v1beta1.PipelineServiceClient;
import com.google.cloud.aiplatform.v1beta1.PipelineServiceSettings;
import com.google.protobuf.Value;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class CreatePipelineJobEmbeddingModelTuningSample {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "PROJECT";
    String baseModelVersionId = "BASE_MODEL_VERSION_ID";
    String taskType = "TASK_TYPE";
    String location = "us-central1";
    String pipelineJobDisplayName = "PIPELINE_JOB_DISPLAY_NAME";
    String modelDisplayName = "MODEL_DISPLAY_NAME";
    String outputDir = "OUTPUT_DIR";
    String queriesPath = "DATASET_URI";
    String corpusPath = "DATASET_URI";
    String trainLabelPath = "DATASET_URI";
    String testLabelPath = "DATASET_URI";
    int batchSize = 50;
    int iterations = 300;

    createPipelineJobEmbeddingModelTuningSample(
        project,
        baseModelVersionId,
        taskType,
        location,
        pipelineJobDisplayName,
        modelDisplayName,
        outputDir,
        queriesPath,
        corpusPath,
        trainLabelPath,
        testLabelPath,
        batchSize,
        iterations);
  }

  // Create a model tuning job
  public static void createPipelineJobEmbeddingModelTuningSample(
      String project,
      String baseModelVersionId,
      String taskType,
      String location,
      String pipelineJobDisplayName,
      String modelDisplayName,
      String outputDir,
      String queriesPath,
      String corpusPath,
      String trainLabelPath,
      String testLabelPath,
      int batchSize,
      int iterations)
      throws IOException {
    final String endpoint = String.format("%s-aiplatform.googleapis.com:443", location);
    PipelineServiceSettings pipelineServiceSettings =
        PipelineServiceSettings.newBuilder().setEndpoint(endpoint).build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests.
    try (PipelineServiceClient client = PipelineServiceClient.create(pipelineServiceSettings)) {
      Map<String, Value> parameterValues = new HashMap<>();
      parameterValues.put("project", stringToValue(project));
      parameterValues.put("base_model_version_id", stringToValue(baseModelVersionId));
      parameterValues.put("task_type", stringToValue(taskType));
      // Deployment is only supported in us-central1.
      parameterValues.put("location", stringToValue("us-central1"));
      parameterValues.put("queries_path", stringToValue(queriesPath));
      parameterValues.put("corpus_path", stringToValue(corpusPath));
      parameterValues.put("train_label_path", stringToValue(trainLabelPath));
      parameterValues.put("test_label_path", stringToValue(testLabelPath));
      parameterValues.put("batch_size", numberToValue(batchSize));
      parameterValues.put("iterations", numberToValue(iterations));

      RuntimeConfig runtimeConfig =
          RuntimeConfig.newBuilder()
              .setGcsOutputDirectory(outputDir)
              .putAllParameterValues(parameterValues)
              .build();

      PipelineJob pipelineJob =
          PipelineJob.newBuilder()
              .setTemplateUri(
                  "https://us-kfp.pkg.dev/ml-pipeline/llm-text-embedding/tune-text-embedding-model/v1.1.2")
              .setDisplayName(pipelineJobDisplayName)
              .setRuntimeConfig(runtimeConfig)
              .build();

      LocationName parent = LocationName.of(project, location);
      CreatePipelineJobRequest request =
          CreatePipelineJobRequest.newBuilder()
              .setParent(parent.toString())
              .setPipelineJob(pipelineJob)
              .build();

      PipelineJob response = client.createPipelineJob(request);
      System.out.format("response: %s\n", response);
      System.out.format("Name: %s\n", response.getName());
    }
  }

  static Value stringToValue(String str) {
    return Value.newBuilder().setStringValue(str).build();
  }

  static Value numberToValue(int n) {
    return Value.newBuilder().setNumberValue(n).build();
  }
}


Console

To tune a text embedding model by using the Google Cloud console, you can launch a customization pipeline using the following steps:

  1. In the Vertex AI section of the Google Cloud console, go to the Vertex AI Pipelines page.

  2. Click Create run to open the Create pipeline run pane.
  3. Click Select from existing pipelines and enter the following details:
    1. Select "ml-pipeline" from the Select a resource drop-down.
    2. Select "llm-text-embedding" from the Repository drop-down.
    3. Select "tune-text-embedding-model" from the Pipeline or component drop-down.
    4. Select the version labeled "v1.1.2" from the Version drop-down.
  4. Specify a Run name to uniquely identify the pipeline run.
  5. In the Region drop-down list, select the region to create the pipeline run. Currently, only us-central1 is supported.
  6. Click Continue. The Runtime configuration pane appears.
  7. Under Cloud storage location, click Browse to select the Cloud Storage bucket for storing the pipeline output artifacts, and then click Select.
  8. Under Pipeline parameters, specify your parameters for the tuning pipeline. For the meaning of each parameter, see the parameter descriptions in the REST section.
  9. Click Submit to create your pipeline run.

Example curl command

PROJECT_ID=PROJECT_ID
LOCATION=LOCATION
BASE_MODEL_VERSION_ID=BASE_MODEL_VERSION_ID
PIPELINE_OUTPUT_DIRECTORY=PIPELINE_OUTPUT_DIRECTORY
QUERIES_PATH=QUERIES_PATH
CORPUS_PATH=CORPUS_PATH
TRAIN_LABEL_PATH=TRAIN_LABEL_PATH


curl -X POST  \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
"https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/pipelineJobs?pipelineJobId=tune-text-embedding-$(date +%Y%m%d%H%M%S)" \
-d '{
  "displayName": "tune-text-embedding-model",
  "runtimeConfig": {
    "gcsOutputDirectory": "'${PIPELINE_OUTPUT_DIRECTORY}'",
    "parameterValues": {
      "project":  "'${PROJECT_ID}'",
      "base_model_version_id":  "'${BASE_MODEL_VERSION_ID}'",
      "location":   "'${LOCATION}'",
      "queries_path":  "'${QUERIES_PATH}'",
      "corpus_path":  "'${CORPUS_PATH}'",
      "train_label_path":  "'${TRAIN_LABEL_PATH}'"
    }
  },
  "templateUri": "https://us-kfp.pkg.dev/ml-pipeline/llm-text-embedding/tune-text-embedding-model/v1.1.2"
}'

Use your tuned model

View tuned models in Model Registry

When your tuning job completes, the tuned model isn't automatically deployed to an endpoint. It will be available as a Model resource in Model Registry. You can view a list of models in your current project, including your tuned models, by using the Google Cloud console.

To view your tuned models in the Google Cloud console, go to the Vertex AI Model Registry page.

Deploy your model

After you've tuned the embeddings model, you need to deploy the Model resource. To deploy your tuned embeddings model, see Deploy a model to an endpoint.

Unlike foundation models, tuned text embedding models are managed by the user. This includes managing serving resources, like machine type and accelerators. To prevent out-of-memory errors during prediction, it's recommended that you deploy using the NVIDIA_TESLA_A100 GPU type, which can support batch sizes up to 5 for any input length.

Similar to the textembedding-gecko foundation model, your tuned model supports up to 3072 tokens and can truncate longer inputs.
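
Because the recommended deployment supports batch sizes up to 5, a client with many inputs needs to split them across several requests. The following Java sketch shows one way to do that. PredictionBatcher is a hypothetical helper, and the limit of 5 is taken from the NVIDIA_TESLA_A100 guidance above; other serving hardware may need a smaller value.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that splits inputs into request-sized batches. */
final class PredictionBatcher {

  /** Batch size quoted above for NVIDIA_TESLA_A100 deployments. */
  static final int MAX_BATCH_SIZE = 5;

  /** Splits items into consecutive batches of at most maxBatchSize elements. */
  static <T> List<List<T>> batches(List<T> items, int maxBatchSize) {
    if (maxBatchSize < 1) {
      throw new IllegalArgumentException("maxBatchSize must be at least 1");
    }
    List<List<T>> out = new ArrayList<>();
    for (int start = 0; start < items.size(); start += maxBatchSize) {
      int end = Math.min(start + maxBatchSize, items.size());
      // Copy the view so each batch is independent of the source list.
      out.add(new ArrayList<>(items.subList(start, end)));
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> texts = List.of("a", "b", "c", "d", "e", "f", "g");
    for (List<String> batch : batches(texts, MAX_BATCH_SIZE)) {
      System.out.println("Send one predict request with " + batch.size() + " instances");
    }
  }
}
```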

Get predictions on a deployed model

Once your tuned model is deployed, you can use one of the following commands to issue requests to the tuned model endpoint.

Example curl commands for tuned textembedding-gecko@001 models

To get predictions from a tuned version of textembedding-gecko@001, use the example curl command below.

PROJECT_ID=PROJECT_ID
LOCATION=LOCATION
ENDPOINT_URI=https://${LOCATION}-aiplatform.googleapis.com
MODEL_ENDPOINT=TUNED_MODEL_ENDPOINT_ID

curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json"  \
    ${ENDPOINT_URI}/v1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/${MODEL_ENDPOINT}:predict \
    -d '{
  "instances": [
    {
      "content": "Dining in New York City"
    },
    {
      "content": "Best resorts on the east coast"
    }
  ]
}'

Example curl commands for models other than textembedding-gecko@001

Tuned versions of other models (for example, textembedding-gecko@003 and textembedding-gecko-multilingual@001) require two additional inputs: task_type and title. For more information about these parameters, see Get text embeddings.

PROJECT_ID=PROJECT_ID
LOCATION=LOCATION
ENDPOINT_URI=https://${LOCATION}-aiplatform.googleapis.com
MODEL_ENDPOINT=TUNED_MODEL_ENDPOINT_ID

curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json"  \
    ${ENDPOINT_URI}/v1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/${MODEL_ENDPOINT}:predict \
    -d '{
  "instances": [
    {
      "content": "Dining in New York City",
      "task_type": "DEFAULT",
      "title": ""
    },
    {
      "content": "There are many resorts to choose from on the East coast...",
      "task_type": "RETRIEVAL_DOCUMENT",
      "title": "East Coast Resorts"
    }
  ]
}'

Example output

This output applies to both textembedding-gecko and textembedding-gecko-multilingual models, regardless of version.

{
 "predictions": [
   [ ... ],
   [ ... ],
   ...
 ],
 "deployedModelId": "...",
 "model": "projects/.../locations/.../models/...",
 "modelDisplayName": "tuned-text-embedding-model",
 "modelVersionId": "1"
}
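
Each entry in predictions is the embedding for the instance at the same position in the request. A common way to compare two embeddings, for example a query and a document, is cosine similarity. The following Java sketch shows that calculation. It assumes each prediction has already been parsed into a double[]. EmbeddingSimilarity is a hypothetical helper, and cosine similarity is an illustrative choice rather than something this page prescribes.

```java
/** Hypothetical helper for comparing two embedding vectors. */
final class EmbeddingSimilarity {

  /** Returns the cosine similarity of a and b, a value in the range [-1, 1]. */
  static double cosine(double[] a, double[] b) {
    if (a.length == 0 || a.length != b.length) {
      throw new IllegalArgumentException("Vectors must be non-empty and of equal length");
    }
    double dot = 0.0;
    double normA = 0.0;
    double normB = 0.0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    if (normA == 0.0 || normB == 0.0) {
      throw new IllegalArgumentException("Cosine similarity is undefined for a zero vector");
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
  }

  public static void main(String[] args) {
    // Toy vectors standing in for two entries of "predictions".
    double[] query = {0.1, 0.3, 0.5};
    double[] document = {0.2, 0.1, 0.4};
    System.out.println("Similarity: " + cosine(query, document));
  }
}
```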

What's next