This page shows you how to tune text embedding models like textembedding-gecko and textembedding-gecko-multilingual.
Foundation embedding models are pre-trained on a massive dataset of text, providing
a strong baseline for many tasks. For scenarios requiring specialized knowledge or
highly tailored performance, model tuning enables you to fine-tune the model's
representations using your own relevant data. Tuning is supported for stable versions of the textembedding-gecko and textembedding-gecko-multilingual models.
Text embedding models support supervised tuning. Supervised tuning uses labeled examples that demonstrate the type of output you'd like from your text embedding model during inference.
To learn more about model tuning, see How model tuning works.
Expected quality improvement
Vertex AI uses a parameter-efficient tuning method for customization. In experiments on public retrieval benchmark datasets, this method improved quality by up to 41% (12% on average).
Use case for tuning an embedding model
Tuning a text embeddings model lets you adapt the embeddings to a specific domain or task. This can be useful if the pre-trained embeddings model is not well-suited to your specific needs. For example, you might fine-tune an embeddings model on a dataset of your company's customer support tickets. This could help a chatbot understand the types of support issues your customers typically have and answer their questions more effectively. Without tuning, the model doesn't know the specifics of your customer support tickets or the solutions to specific problems for your product.
Tuning workflow
The model tuning workflow on Vertex AI for textembedding-gecko and textembedding-gecko-multilingual is as follows:
- Prepare your model tuning dataset.
- Upload the model tuning dataset to a Cloud Storage bucket.
- Configure your project for Vertex AI Pipelines.
- Create a model tuning job.
- Deploy the tuned model to a Vertex AI endpoint of the same name. Unlike text or Codey model tuning jobs, a text embedding tuning job doesn't automatically deploy your tuned model to a Vertex AI endpoint, so you deploy it yourself.
Prepare your embeddings dataset
The dataset used to tune an embeddings model includes data that align with the task that you want the model to perform.
Dataset format for tuning an embeddings model
The training dataset consists of the following files, which need to be in Cloud Storage. The paths of the files are defined by parameters when you launch the tuning pipeline. The three types of files are the corpus file, the query file, and the label files. Only training labels are required, but you can also provide validation and test labels for greater control.
Corpus file: The path is defined by the parameter corpus_path. It's a JSONL file where each line has the fields _id, title, and text with string values. _id and text are required, while title is optional. Here is an example corpus.jsonl file:
{"_id": "doc1", "title": "Get an introduction to generative AI on Vertex AI", "text": "Vertex AI Studio offers a Google Cloud console tool for rapidly prototyping and testing generative AI models. Learn how you can use Vertex AI Studio to test models using prompt samples, design and save prompts, tune a foundation model, and convert between speech and text."}
{"_id": "doc2", "title": "Use gen AI for summarization, classification, and extraction", "text": "Learn how to create text prompts for handling any number of tasks with Vertex AI's generative AI support. Some of the most common tasks are classification, summarization, and extraction. Vertex AI's PaLM API for text lets you design prompts with flexibility in terms of their structure and format."}
{"_id": "doc3", "title": "Custom ML training overview and documentation", "text": "Get an overview of the custom training workflow in Vertex AI, the benefits of custom training, and the various training options that are available. This page also details every step involved in the ML training workflow from preparing data to predictions."}
{"_id": "doc4", "text": "Text embeddings are useful for clustering, information retrieval, retrieval-augmented generation (RAG), and more."}
{"_id": "doc5", "title": "Text embedding tuning", "text": "Google's text embedding models can be tuned on Vertex AI."}
Query file: The query file contains your example queries. The path is defined by the parameter queries_path. The query file is in JSONL format and has the same fields as the corpus file. Here is an example queries.jsonl file:
{"_id": "query1", "text": "Does Vertex support generative AI?"}
{"_id": "query2", "text": "What can I do with Vertex GenAI offerings?"}
{"_id": "query3", "text": "How do I train my models using Vertex?"}
{"_id": "query4", "text": "What is a text embedding?"}
{"_id": "query5", "text": "Can text embedding models be tuned on Vertex?"}
{"_id": "query6", "text": "embeddings"}
{"_id": "query7", "text": "embeddings for rag"}
{"_id": "query8", "text": "custom model training"}
{"_id": "query9", "text": "Google Cloud PaLM API"}
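Because the corpus and query files share one record layout, a single helper can produce both. The following is a minimal sketch, assuming your records are plain Python dictionaries. The helper names validate_record and to_jsonl are illustrative and are not part of any Vertex AI library.

```python
import json


def validate_record(record: dict) -> dict:
    """Check one corpus or query record against the documented fields."""
    for field in ("_id", "text"):  # required string fields
        if not isinstance(record.get(field), str):
            raise ValueError(f"record is missing required string field {field!r}")
    if "title" in record and not isinstance(record["title"], str):
        raise ValueError("optional field 'title' must be a string")
    return record


def to_jsonl(records: list) -> str:
    """Serialize records as JSONL: exactly one JSON object per line."""
    lines = [json.dumps(validate_record(r), ensure_ascii=False) for r in records]
    return "\n".join(lines) + "\n"


corpus = [
    {"_id": "doc4", "text": "Text embeddings are useful for clustering, information retrieval, retrieval-augmented generation (RAG), and more."},
    {"_id": "doc5", "title": "Text embedding tuning", "text": "Google's text embedding models can be tuned on Vertex AI."},
]
queries = [{"_id": "query4", "text": "What is a text embedding?"}]

corpus_jsonl = to_jsonl(corpus)
queries_jsonl = to_jsonl(queries)
```

You can write the resulting strings to local files and then upload them to your Cloud Storage bucket.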
Training labels: The path is defined by the parameter train_label_path. The train_label_path is the Cloud Storage URI of the training label data and is specified when you create your tuning job. The labels must be a TSV file with a header. A subset of the queries and the corpus needs to be included in your training labels file. The file must have the columns query-id, corpus-id, and score. The query-id is a string that matches the _id key from the query file, and the corpus-id is a string that matches the _id in the corpus file. The score is a non-negative integer value. If a query and document pair is unrelated, you can either leave it out of the training labels file or include it with a score of zero. Any score greater than zero indicates that the document is related to the query. Larger numbers indicate a greater level of relevance. If the score is omitted, the default value is 1. Here is an example train_labels.tsv file:
query-id	corpus-id	score
query1	doc1	1
query2	doc2	1
query3	doc3	2
query3	doc5	1
query4	doc4	1
query4	doc5	1
query5	doc5	2
query6	doc4	1
query6	doc5	1
query7	doc4	1
query8	doc3	1
query9	doc2	1
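A label file can be generated from (query ID, corpus ID, score) triples. The sketch below is one possible approach, assuming you already hold the sets of valid query and corpus IDs in memory. The function name labels_to_tsv is hypothetical, and the checks only mirror the rules stated above.

```python
import csv
import io


def labels_to_tsv(labels, query_ids, corpus_ids):
    """Render (query_id, corpus_id, score) triples as a TSV with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["query-id", "corpus-id", "score"])
    for query_id, corpus_id, score in labels:
        if query_id not in query_ids:
            raise ValueError(f"unknown query-id: {query_id}")
        if corpus_id not in corpus_ids:
            raise ValueError(f"unknown corpus-id: {corpus_id}")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(score, bool) or not isinstance(score, int) or score < 0:
            raise ValueError(f"score must be a non-negative integer: {score!r}")
        writer.writerow([query_id, corpus_id, score])
    return out.getvalue()


train_tsv = labels_to_tsv(
    labels=[("query3", "doc3", 2), ("query3", "doc5", 1), ("query4", "doc4", 1)],
    query_ids={"query3", "query4"},
    corpus_ids={"doc3", "doc4", "doc5"},
)
```

The same helper can be reused for optional test and validation label files, because they share the training label format.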
Test labels: Optional. The test labels have the same format as the training labels and are specified by the test_label_path parameter. If no test_label_path is provided, the test labels are autosplit from the training labels.
Validation labels: Optional. The validation labels have the same format as the training labels and are specified by the validation_label_path parameter. If no validation_label_path is provided, the validation labels are autosplit from the training labels.
Dataset size requirements
The provided dataset files must meet the following constraints:
- The number of queries must be between 9 and 10,000.
- The number of documents in the corpus must be between 9 and 500,000.
- Each dataset label file must include at least 3 query IDs, and across all dataset splits there must be at least 9 query IDs.
- The total number of labels must be less than 500,000.
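You can check these constraints locally before uploading anything. The following sketch is illustrative only: the function name check_dataset_sizes is hypothetical, the "between" bounds are assumed to be inclusive, and "query IDs" is assumed to mean distinct IDs.

```python
def check_dataset_sizes(num_queries, num_corpus_docs, query_ids_by_split):
    """Return a list of violated constraints (empty when the dataset passes).

    query_ids_by_split maps a split name such as "train" to the query IDs
    in that label file, one entry per label row.
    """
    problems = []
    if not 9 <= num_queries <= 10_000:  # assumed inclusive bounds
        problems.append("number of queries must be between 9 and 10,000")
    if not 9 <= num_corpus_docs <= 500_000:  # assumed inclusive bounds
        problems.append("number of corpus documents must be between 9 and 500,000")

    all_query_ids = set()
    total_labels = 0
    for split, ids in query_ids_by_split.items():
        if len(set(ids)) < 3:
            problems.append(f"{split} labels must include at least 3 query IDs")
        all_query_ids.update(ids)
        total_labels += len(ids)

    if len(all_query_ids) < 9:
        problems.append("all splits together must include at least 9 query IDs")
    if total_labels >= 500_000:
        problems.append("total number of labels must be less than 500,000")
    return problems


# Nine distinct query IDs across twelve label rows, but only five documents.
train_ids = ["query1", "query2", "query3", "query3", "query4", "query4",
             "query5", "query6", "query6", "query7", "query8", "query9"]
problems = check_dataset_sizes(9, 5, {"train": train_ids})
```

Returning a list of problems instead of raising on the first one lets you report every violation in a single pass.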
Configure your project for Vertex AI Pipelines
Tuning is executed within your project using the Vertex AI Pipelines platform.
Configuring permissions
The pipeline executes training code under two service agents. These service agents must be granted specific roles in order to begin training using your project and dataset.
Compute Engine default service account
PROJECT_NUMBER-compute@developer.gserviceaccount.com
This service account requires:
- Storage Object Viewer access to each dataset file you created in Cloud Storage.
- Storage Object User access to the output Cloud Storage directory of your pipeline, PIPELINE_OUTPUT_DIRECTORY.
- Vertex AI User access to your project.
Instead of the Compute Engine default service account, you can specify a custom service account. For more information, see Configure a service account with granular permissions.
Vertex AI Tuning Service Agent
service-PROJECT_NUMBER@gcp-sa-aiplatform-ft.iam.gserviceaccount.com
This service account requires:
- Storage Object Viewer access to each dataset file you created in Cloud Storage.
- Storage Object User access to the output Cloud Storage directory of your pipeline, PIPELINE_OUTPUT_DIRECTORY.
For more information about configuring Cloud Storage dataset permissions, see Configure a Cloud Storage bucket for pipeline artifacts.
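The role grants described above can be scripted. The sketch below only builds gcloud command strings so that you can review them before running anything. It rests on several assumptions: the role IDs roles/storage.objectViewer, roles/storage.objectUser, and roles/aiplatform.user correspond to the role names listed above, access is granted at the bucket level (which is broader than per-file access), and the bucket names are placeholders. The function name iam_grant_commands is hypothetical.

```python
def iam_grant_commands(project_id, project_number, dataset_bucket, output_bucket):
    """Build gcloud commands that grant the documented roles (nothing is executed)."""
    compute_sa = f"serviceAccount:{project_number}-compute@developer.gserviceaccount.com"
    tuning_sa = (
        f"serviceAccount:service-{project_number}"
        "@gcp-sa-aiplatform-ft.iam.gserviceaccount.com"
    )
    commands = []
    for member in (compute_sa, tuning_sa):
        # Read access to the dataset files (assumed bucket-level grant).
        commands.append(
            f"gcloud storage buckets add-iam-policy-binding gs://{dataset_bucket} "
            f"--member={member} --role=roles/storage.objectViewer"
        )
        # Read and write access to the pipeline output location.
        commands.append(
            f"gcloud storage buckets add-iam-policy-binding gs://{output_bucket} "
            f"--member={member} --role=roles/storage.objectUser"
        )
    # Only the Compute Engine default service account needs Vertex AI User.
    commands.append(
        f"gcloud projects add-iam-policy-binding {project_id} "
        f"--member={compute_sa} --role=roles/aiplatform.user"
    )
    return commands


commands = iam_grant_commands(
    "my-project", "123456789", "my-dataset-bucket", "my-output-bucket"
)
```

Verify the role IDs against the IAM documentation for your organization before applying the commands.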
Using Accelerators
Tuning requires GPU accelerators. Any of the following accelerators can be used for the text embedding tuning pipeline:
NVIDIA_L4
NVIDIA_TESLA_A100
NVIDIA_TESLA_T4
NVIDIA_TESLA_V100
NVIDIA_TESLA_P100
Launching a tuning job requires adequate Restricted image training GPUs quota for the accelerator type and region you have selected, for example Restricted image training Nvidia V100 GPUs per region. To increase the quota of your project, see Request additional quota.
Not all accelerators are available in all regions. See Using accelerators in Vertex AI for more information.
Create an embedding model tuning job
You can create an embedding model tuning job by using the Google Cloud console, REST API, or client libraries.
REST
To create an embedding model tuning job, use the projects.locations.pipelineJobs.create method.
Before using any of the request data, make the following replacements:
- PROJECT_ID: Your Google Cloud project ID.
- PIPELINE_OUTPUT_DIRECTORY: Path for the pipeline output artifacts, starting with "gs://".
HTTP method and URL:
POST https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/pipelineJobs
Request JSON body:
{
  "displayName": "tune_text_embeddings_model_sample",
  "runtimeConfig": {
    "gcsOutputDirectory": "PIPELINE_OUTPUT_DIRECTORY",
    "parameterValues": {
      "corpus_path": "gs://cloud-samples-data/ai-platform/embedding/goog-10k-2024/r11/corpus.jsonl",
      "queries_path": "gs://cloud-samples-data/ai-platform/embedding/goog-10k-2024/r11/queries.jsonl",
      "train_label_path": "gs://cloud-samples-data/ai-platform/embedding/goog-10k-2024/r11/train.tsv",
      "test_label_path": "gs://cloud-samples-data/ai-platform/embedding/goog-10k-2024/r11/test.tsv",
      "base_model_version_id": "text-embedding-004",
      "task_type": "DEFAULT",
      "batch_size": "128",
      "train_steps": "1000",
      "output_dimensionality": "768",
      "learning_rate_multiplier": "1.0"
    }
  },
  "templateUri": "https://us-kfp.pkg.dev/ml-pipeline/llm-text-embedding/tune-text-embedding-model/v1.1.3"
}
After launching the pipeline, follow the progress of your tuning job through the Google Cloud console.
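The REST request can also be assembled and sent from Python using only the standard library. This is a sketch under stated assumptions: the function names build_tuning_request and send_tuning_request are hypothetical, and you supply an access token yourself (for example, the output of gcloud auth print-access-token). The URL, display name, and template URI are the ones used in the request shown above.

```python
import json
import urllib.request

TEMPLATE_URI = (
    "https://us-kfp.pkg.dev/ml-pipeline/llm-text-embedding/"
    "tune-text-embedding-model/v1.1.3"
)


def build_tuning_request(project_id, output_dir, parameter_values):
    """Return the (url, body) pair for a pipelineJobs.create request."""
    url = (
        "https://us-central1-aiplatform.googleapis.com/v1/"
        f"projects/{project_id}/locations/us-central1/pipelineJobs"
    )
    body = {
        "displayName": "tune_text_embeddings_model_sample",
        "runtimeConfig": {
            "gcsOutputDirectory": output_dir,
            "parameterValues": parameter_values,
        },
        "templateUri": TEMPLATE_URI,
    }
    return url, body


def send_tuning_request(url, body, access_token):
    """POST the request and return the decoded JSON response (needs network access)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


url, body = build_tuning_request(
    "my-project",  # placeholder project ID
    "gs://my-bucket/pipeline-output",  # placeholder output directory
    {
        "corpus_path": "gs://cloud-samples-data/ai-platform/embedding/goog-10k-2024/r11/corpus.jsonl",
        "queries_path": "gs://cloud-samples-data/ai-platform/embedding/goog-10k-2024/r11/queries.jsonl",
        "train_label_path": "gs://cloud-samples-data/ai-platform/embedding/goog-10k-2024/r11/train.tsv",
    },
)
```

Separating request construction from sending makes it possible to inspect or log the body before anything is submitted.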
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
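As a rough illustration, the same pipeline can be launched with the PipelineJob class of the Vertex AI SDK for Python. This is a sketch, not the official sample: the helper names tuning_job_kwargs and submit_tuning_job are hypothetical, it assumes that pipeline_root plays the role of the REST field gcsOutputDirectory, and it passes only the three required parameters. The SDK import is kept inside the submitting function so that the argument-building part runs without the google-cloud-aiplatform package installed.

```python
TEMPLATE_URI = (
    "https://us-kfp.pkg.dev/ml-pipeline/llm-text-embedding/"
    "tune-text-embedding-model/v1.1.3"
)


def tuning_job_kwargs(output_dir, corpus_path, queries_path, train_label_path):
    """Build keyword arguments for aiplatform.PipelineJob."""
    return {
        "display_name": "tune_text_embeddings_model_sample",
        "template_path": TEMPLATE_URI,
        "pipeline_root": output_dir,  # assumed equivalent of gcsOutputDirectory
        "parameter_values": {
            "corpus_path": corpus_path,
            "queries_path": queries_path,
            "train_label_path": train_label_path,
        },
    }


def submit_tuning_job(project_id, location, **paths):
    """Create and submit the pipeline run (needs credentials and network access)."""
    from google.cloud import aiplatform  # provided by google-cloud-aiplatform

    aiplatform.init(project=project_id, location=location)
    job = aiplatform.PipelineJob(**tuning_job_kwargs(**paths))
    job.submit()
    return job


kwargs = tuning_job_kwargs(
    output_dir="gs://my-bucket/pipeline-output",  # placeholder
    corpus_path="gs://cloud-samples-data/ai-platform/embedding/goog-10k-2024/r11/corpus.jsonl",
    queries_path="gs://cloud-samples-data/ai-platform/embedding/goog-10k-2024/r11/queries.jsonl",
    train_label_path="gs://cloud-samples-data/ai-platform/embedding/goog-10k-2024/r11/train.tsv",
)
```

To launch a run, call submit_tuning_job with your project ID, a region, and the same four path arguments.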
Java
Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Node.js
Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Console
To tune a text embedding model by using the Google Cloud console, you can launch a customization pipeline using the following steps:
- In the Vertex AI section of the Google Cloud console, go to the Vertex AI Pipelines page.
- Click Create run to open the Create pipeline run pane.
- Click Select from existing pipelines and enter the following details:
- Select "ml-pipeline" from the select a resource drop-down.
- Select "llm-text-embedding" from the Repository drop-down.
- Select "tune-text-embedding-model" from the Pipeline or component drop-down.
- Select the version labeled "v1.1.3" from the Version drop-down.
- Specify a Run name to uniquely identify the pipeline run.
- In the Region drop-down list, select the region to create the pipeline run, which will be the same region in which your tuned model is created.
- Click Continue. The Runtime configuration pane appears.
- Under Cloud storage location, click Browse to select the Cloud Storage bucket for storing the pipeline output artifacts, and then click Select.
- Under Pipeline parameters, specify your parameters for the tuning pipeline. The three required parameters are corpus_path, queries_path, and train_label_path, with formats described in Prepare your embeddings dataset. For more detailed information about each parameter, refer to the REST tab of this section.
- Click Submit to create your pipeline run.
Other supported features
Text embedding tuning supports VPC Service Controls and can be configured to run within a Virtual Private Cloud (VPC) by passing the network parameter when creating the PipelineJob.
To use CMEK (customer-managed encryption keys), pass the key to the parameterValues.encryption_spec_key_name pipeline parameter, as well as the encryptionSpec.kmsKeyName parameter when creating the PipelineJob.
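To make the placement of these fields concrete, the sketch below adds them to a PipelineJob request body expressed as a Python dictionary. It assumes that parameterValues sits under runtimeConfig, as in the REST request body on this page. The function name with_network_and_cmek is hypothetical, and the network and key names are placeholders, not real resources.

```python
import copy


def with_network_and_cmek(pipeline_job, network=None, kms_key_name=None):
    """Return a copy of a PipelineJob body with the VPC network and CMEK key set."""
    job = copy.deepcopy(pipeline_job)  # leave the caller's dictionary untouched
    if network is not None:
        job["network"] = network
    if kms_key_name is not None:
        # The key goes in two places: the job's encryptionSpec and the
        # encryption_spec_key_name pipeline parameter.
        job["encryptionSpec"] = {"kmsKeyName": kms_key_name}
        params = job.setdefault("runtimeConfig", {}).setdefault("parameterValues", {})
        params["encryption_spec_key_name"] = kms_key_name
    return job


base_job = {
    "displayName": "tune_text_embeddings_model_sample",
    "runtimeConfig": {"parameterValues": {"task_type": "DEFAULT"}},
}
secured_job = with_network_and_cmek(
    base_job,
    network="projects/PROJECT_NUMBER/global/networks/NETWORK_NAME",  # placeholder
    kms_key_name="projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY",  # placeholder
)
```

Setting the key in a single helper reduces the chance of updating one of the two locations and forgetting the other.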
Use your tuned model
View tuned models in Model Registry
When your tuning job completes, the tuned model isn't automatically deployed to an endpoint. It will be available as a Model resource in Model Registry. You can view a list of models in your current project, including your tuned models, by using the Google Cloud console.
To view your tuned models in the Google Cloud console, go to the Vertex AI Model Registry page.
Deploy your model
After you've tuned the embeddings model, you need to deploy the Model resource. To deploy your tuned embeddings model, see Deploy a model to an endpoint.
Unlike foundation models, tuned text embedding models are managed by the user. This includes managing serving resources, like machine type and accelerators. To prevent out-of-memory errors during prediction, it's recommended that you deploy using the NVIDIA_TESLA_A100 GPU type, which can support batch sizes up to 5 for any input length.
Similar to the textembedding-gecko foundation model, your tuned model supports up to 3072 tokens and can truncate longer inputs.
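One way to respect the batch size of 5 mentioned above is to split your inputs into chunks before sending prediction requests. The helper below is a minimal sketch. The name batch_instances is hypothetical, and the default of 5 applies only to the recommended NVIDIA_TESLA_A100 deployment described here.

```python
def batch_instances(instances, max_batch_size=5):
    """Split a list of prediction instances into chunks of at most max_batch_size."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    return [
        instances[start:start + max_batch_size]
        for start in range(0, len(instances), max_batch_size)
    ]


texts = [{"content": f"example text {i}"} for i in range(12)]
batches = batch_instances(texts)
batch_sizes = [len(batch) for batch in batches]
```

Each chunk can then be sent as the instances list of a separate prediction request.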
Get predictions on a deployed model
Once your tuned model is deployed, you can use one of the following commands to issue requests to the tuned model endpoint.
Example curl commands for tuned textembedding-gecko@001 models
To get predictions from a tuned version of textembedding-gecko@001, use the example curl command below.
PROJECT_ID=PROJECT_ID
LOCATION=LOCATION
ENDPOINT_URI=https://${LOCATION}-aiplatform.googleapis.com
MODEL_ENDPOINT=TUNED_MODEL_ENDPOINT_ID
curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
${ENDPOINT_URI}/v1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/${MODEL_ENDPOINT}:predict \
-d '{
"instances": [
{
"content": "Dining in New York City"
},
{
"content": "Best resorts on the east coast"
}
]
}'
Example curl commands for non textembedding-gecko@001 models
Tuned versions of other models (for example, textembedding-gecko@003 and textembedding-gecko-multilingual@001) require 2 additional inputs: task_type and title.
More documentation for these parameters can be found at
curl command
PROJECT_ID=PROJECT_ID
LOCATION=LOCATION
ENDPOINT_URI=https://${LOCATION}-aiplatform.googleapis.com
MODEL_ENDPOINT=TUNED_MODEL_ENDPOINT_ID
curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
${ENDPOINT_URI}/v1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/${MODEL_ENDPOINT}:predict \
-d '{
"instances": [
{
"content": "Dining in New York City",
"task_type": "DEFAULT",
"title": ""
},
{
"content": "There are many resorts to choose from on the East coast...",
"task_type": "RETRIEVAL_DOCUMENT",
"title": "East Coast Resorts"
}
]
}'
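The same prediction request can be issued from Python with the standard library. The sketch below mirrors the curl command above. The function names make_instance, build_predict_request, and send_predict_request are hypothetical, the project and endpoint IDs are placeholders, and you supply the access token yourself. For a tuned textembedding-gecko@001 model, each instance would carry only the content field.

```python
import json
import urllib.request


def make_instance(content, task_type="DEFAULT", title=""):
    """Build one instance with the two additional inputs, task_type and title."""
    return {"content": content, "task_type": task_type, "title": title}


def build_predict_request(project_id, location, endpoint_id, instances):
    """Return the (url, body) pair for a predict call on a deployed endpoint."""
    url = (
        f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}"
        f"/locations/{location}/endpoints/{endpoint_id}:predict"
    )
    return url, {"instances": instances}


def send_predict_request(url, body, access_token):
    """POST the request and return the decoded JSON response (needs network access)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


url, body = build_predict_request(
    "my-project",  # placeholder project ID
    "us-central1",
    "1234567890",  # placeholder endpoint ID
    [
        make_instance("Dining in New York City"),
        make_instance(
            "There are many resorts to choose from on the East coast...",
            task_type="RETRIEVAL_DOCUMENT",
            title="East Coast Resorts",
        ),
    ],
)
```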
Example output
This output applies to both textembedding-gecko and textembedding-gecko-multilingual models, regardless of version.
{
"predictions": [
[ ... ],
[ ... ],
...
],
"deployedModelId": "...",
"model": "projects/.../locations/.../models/...",
"modelDisplayName": "tuned-text-embedding-model",
"modelVersionId": "1"
}
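Once you have a response, a common next step is to compare the returned embeddings. The sketch below assumes that each entry in predictions is a flat list of numbers, which the elided example above suggests but doesn't confirm. The vectors in sample_response are made-up toy values, not real model output, and the function names are hypothetical.

```python
import math


def extract_embeddings(response):
    """Return the list of embedding vectors from a predict response."""
    return response["predictions"]


def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


# Toy response with three-dimensional placeholder vectors.
sample_response = {
    "predictions": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    "modelDisplayName": "tuned-text-embedding-model",
}
first, second = extract_embeddings(sample_response)
similarity = cosine_similarity(first, second)
```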
What's next
- To get batch predictions for embeddings, see Get batch text embeddings predictions.
- To learn more about multimodal embeddings, see Get multimodal embeddings.
- For information about text-only use cases (text-based semantic search, clustering, long-form document analysis, and other text retrieval or question-answering use cases), read Get text embeddings.