This page describes your choices of embedding models, lists the embedding models, and shows you how to use your choice of embedding models to create a RAG corpus. The association between your model and the RAG corpus remains fixed for the lifetime of your RAG corpus.
Embeddings are numerical representations of inputs. You can use embeddings in your applications to recognize complex meanings and semantic relationships and to process and produce language.
Embeddings work by converting text, image, and video into arrays of floating point numbers called vectors. The closer two vectors are in their embedding space, the greater the similarity of their inputs.
Embedding models are an important component of semantic retrieval systems. The performance of a retrieval system depends on how well the embedding model maps relationships in your data.
Vertex AI RAG Engine implements retrieval-augmented generation (RAG), and it offers you the choice of the following embedding models to use within a RAG corpus:
- Vertex AI text embedding models: Models trained by the publisher, such as Google. The models are trained on a large dataset of text, and provide a strong baseline for many tasks.
- Fine-tuned Vertex AI text embedding models: Models trained to have a specialized knowledge or highly-tailored performance.
- OSS embedding models: Third-party open-source embedding models in English-only and multilingual variants.
Embedding models
Embedding models are used to create a corpus and for search and retrieval during response generation. This section lists the supported embedding models.
textembedding-gecko@003
textembedding-gecko-multilingual@001
text-embedding-004
(default)text-multilingual-embedding-002
textembedding-gecko@002
(fine-tuned versions only)textembedding-gecko@001
(fine-tuned versions only)
For more information about tuning embedding models, see Tune text embeddings.
The following open embedding models are also supported. You can find them in Model Garden.
e5-base-v2
e5-large-v2
e5-small-v2
multilingual-e5-large
multilingual-e5-small
Use Vertex AI text embedding models
The Vertex AI text embedding API uses the Gecko embedding models, which produces a dense embedding vector with 768 dimensions. Dense embeddings store the meaning of text unlike sparse vectors, which tend to directly map words to numbers. The benefit of using dense vector embeddings in generative AI is that instead of searching for a direct word or syntax matches, you can better search for passages that align to the meaning of the query, even if the passages don't use the same language.
Gecko models are available in English-only and multilingual versions. Unlike fine-tuned Gecko models, publisher Gecko models aren't required to be deployed, which makes them the easiest set of models to use with Vertex AI RAG Engine. The following Gecko models are recommended for use with a RAG corpus:
text-embedding-004
text-multilingual-embedding-002
textembedding-gecko@003
textembedding-gecko-multilingual@001
If you don't specify an embedding model when you create your RAG corpus,
Vertex AI RAG Engine assigns by default the text-embedding-004
model for your
RAG corpus.
The publisher Gecko models might deprecate. If that happens, the publisher Gecko models can't be used with Vertex AI RAG Engine, even for a RAG corpus that was created prior to the deprecation. When your Gecko model deprecates, you must migrate the RAG corpus, which means that you create a RAG corpus and import the data. An alternative is to use a fine-tuned Gecko model or a self-deployed OSS embedding model, which is supported after the model deprecates.
These code samples demonstrate how to create a RAG corpus with a publisher Gecko model.
curl
ENDPOINT=us-central1-aiplatform.googleapis.com
PROJECT_ID=YOUR_PROJECT_ID
// Set this to your choice of publisher Gecko model. Note that the full resource name of the publisher model is required.
// Example: projects/${PROJECT_ID}/locations/us-central1/publishers/google/models/text-embedding-004
ENDPOINT_NAME=YOUR_ENDPOINT_NAME
// Set a display name for your corpus.
// For example, "my test corpus"
CORPUS_DISPLAY_NAME=YOUR_CORPUS_DISPLAY_NAME
// CreateRagCorpus
// Input: ENDPOINT, PROJECT_ID, ENDPOINT_NAME, CORPUS_DISPLAY_NAME
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${ENDPOINT}/v1beta1/projects/${PROJECT_ID}/locations/us-central1/ragCorpora \
-d '{
"display_name" : '\""${CORPUS_DISPLAY_NAME}"\"',
"rag_embedding_model_config" : {
"vertex_prediction_endpoint": {
"endpoint": '\""${ENDPOINT_NAME}"\"'
}
}
}'
Vertex AI SDK for Python
import vertexai
from vertexai import rag
# Set Project
PROJECT_ID = "YOUR_PROJECT_ID"
vertexai.init(project=${PROJECT_ID}, location="us-central1")
# Configure a Google first-party embedding model
embedding_model_config = rag.EmbeddingModelConfig(
publisher_model="publishers/google/models/text-embedding-004"
)
# Name your corpus
DISPLAY_NAME = "YOUR_CORPUS_DISPLAY_NAME"
rag_corpus = rag.create_corpus(
display_name=DISPLAY_NAME, embedding_model_config=embedding_model_config
)
Use fine-tuned Vertex AI text embedding models
Although the foundation publisher models are trained on a large dataset of text and provide a strong baseline for many tasks, there might be scenarios where you might require the models to have a specialized knowledge or highly-tailored performance. In such cases, model tuning lets you fine tune the model's representations using your relevant data. An additional benefit of this approach is that when the model is fine tuned, the resulting image is owned by you and is unaffected by the Gecko model deprecation. All fine-tuned Gecko embedding models produce embeddings with 768-dimensional vectors. To learn more about these models, see Get text embeddings.
To fine tune a Gecko model, see Tune text embeddings.
These code samples demonstrate how to create a RAG corpus with your deployed, fine-tuned Gecko model.
curl
ENDPOINT=us-central1-aiplatform.googleapis.com
PROJECT_ID=YOUR_PROJECT_ID
// Your Vertex AI endpoint resource with the deployed fine-tuned Gecko model
// Example: projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/${ENDPOINT_ID}
ENDPOINT_NAME=YOUR_ENDPOINT_NAME
// Set a display name for your corpus.
// For example, "my test corpus"
CORPUS_DISPLAY_NAME=YOUR_CORPUS_DISPLAY_NAME
// CreateRagCorpus
// Input: ENDPOINT, PROJECT_ID, ENDPOINT_NAME, CORPUS_DISPLAY_NAME
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${ENDPOINT}/v1beta1/projects/${PROJECT_ID}/locations/us-central1/ragCorpora \
-d '{
"display_name" : '\""${CORPUS_DISPLAY_NAME}"\"',
"rag_embedding_model_config" : {
"vertex_prediction_endpoint": {
"endpoint": '\""${ENDPOINT_NAME}"\"'
}
}
}'
Vertex AI SDK for Python
import vertexai
from vertexai import rag
# Set Project
PROJECT_ID = "YOUR_PROJECT_ID"
vertexai.init(project=${PROJECT_ID}, location="us-central1")
# Your Vertex Endpoint resource with the deployed fine-tuned Gecko model
ENDPOINT_ID = "YOUR_MODEL_ENDPOINT_ID"
MODEL_ENDPOINT = "projects/${PROJECT_ID}/locations/us-central1/endpoints/${ENDPOINT_ID}"
embedding_model_config = rag.EmbeddingModelConfig(
endpoint=${MODEL_ENDPOINT},
)
# Name your corpus
DISPLAY_NAME = "YOUR_CORPUS_DISPLAY_NAME"
rag_corpus = rag.create_corpus(
display_name=${DISPLAY_NAME}, embedding_model_config=embedding_model_config
)
Use OSS embedding models
Vertex AI RAG Engine supports third-party open-source embedding models in English-only and multilingual variants. This table lists the supported E5 models.
Model version | Base model | Parameters | embedding dimension | English only |
---|---|---|---|---|
e5-base-v2 |
MiniLM |
109M | 768 | ✔ |
e5-large-v2 |
MiniLM |
335M | 1,024 | ✔ |
e5-small-v2 |
MiniLM |
33M | 384 | ✔ |
multilingual-e5-large |
xlm-roberta-large |
560M | 1,024 | ✗ |
multilingual-e5-small |
microsoft/Multilingual-MiniLM-L12-H384 |
118M | 384 | ✗ |
In order to use E5 models with Vertex AI RAG Engine, the E5 model must be deployed from Model Garden. See E5 Text Embedding in the Google Cloud console to deploy your E5 model.
These code samples demonstrate how to create RAG corpus with your deployed E5 model.
curl
ENDPOINT=us-central1-aiplatform.googleapis.com
PROJECT_ID=YOUR_PROJECT_ID
// Your Vertex Endpoint resource with the deployed E5 model
// Example: projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/${ENDPOINT_ID}
ENDPOINT_NAME=YOUR_ENDPOINT_NAME
// Set a display name for your corpus.
// For example, "my test corpus"
CORPUS_DISPLAY_NAME=YOUR_CORPUS_DISPLAY_NAME
// CreateRagCorpus
// Input: ENDPOINT, PROJECT_ID, ENDPOINT_NAME, CORPUS_DISPLAY_NAME
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${ENDPOINT}/v1beta1/projects/${PROJECT_ID}/locations/us-central1/ragCorpora \
-d '{
"display_name" : '\""${CORPUS_DISPLAY_NAME</var>}"\"',
"rag_embedding_model_config" : {
"vertex_prediction_endpoint": {
"endpoint": '\""${ENDPOINT_NAME}"\"'
}
}
}'
Vertex AI SDK for Python
import vertexai
from vertexai import rag
# Set Project
PROJECT_ID = "YOUR_PROJECT_ID"
vertexai.init(project=PROJECT_ID, location="us-central1")
# Your Vertex Endpoint resource with the deployed E5 model
ENDPOINT_ID = "YOUR_MODEL_ENDPOINT_ID"
MODEL_ENDPOINT = "projects/{PROJECT_ID}/locations/us-central1/endpoints/{ENDPOINT_ID}"
embedding_model_config = rag.EmbeddingModelConfig(
endpoint=MODEL_ENDPOINT,
)
# Name your corpus
DISPLAY_NAME = "YOUR_CORPUS_DISPLAY_NAME"
rag_corpus = rag.create_corpus(
display_name=DISPLAY_NAME, embedding_model_config=embedding_model_config
)