This page shows you how to connect your Vertex AI RAG Engine to Vertex AI Vector Search.
You can also follow along using this notebook Vertex AI RAG Engine with Vertex AI Vector Search.
Vertex AI RAG Engine is a powerful tool that uses a built-in vector database powered by Spanner to store and manage vector representations of text documents. The vector database enables efficient retrieval of relevant documents based on the documents' semantic similarity to a given query. By integrating Vertex AI Vector Search as an additional vector database with Vertex AI RAG Engine, you can use the capabilities of Vector Search to handle data volumes with low latency to improve the performance and scalability of your RAG applications.
Vertex AI Vector Search setup
Vertex AI Vector Search is based on Vector Search technology developed by Google research. With Vector Search you can use the same infrastructure that provides a foundation for Google products such as Google Search, YouTube, and Google Play.
To integrate with Vertex AI RAG Engine, an empty Vector Search index is required.
Set up Vertex AI SDK
To set up Vertex AI SDK, see Setup.
Create Vector Search index
To create a Vector Search index that's compatible with your RAG Corpus, the index has to meet the following criteria:
IndexUpdateMethod
must beSTREAM_UPDATE
, see Create stream index.Distance measure type must be explicitly set to one of the following:
DOT_PRODUCT_DISTANCE
COSINE_DISTANCE
Dimension of the vector must be consistent with the embedding model you plan to use in the RAG corpus. Other parameters can be tuned based on your choices, which determine whether the additional parameters can be tuned.
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
Create Vector Search index endpoint
Public endpoints are supported by Vertex AI RAG Engine.
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
Deploy an index to an index endpoint
Before we do the nearest neighbor search, the index has to be deployed to an index endpoint.
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
If it's the first time that you're deploying an index to an index endpoint, it takes approximately 30 minutes to automatically build and initiate the backend before the index can be stored. After the first deployment, the index is ready in seconds. To see the status of the index deployment, open the Vector Search Console, select the Index endpoints tab, and choose your index endpoint.
Identify the resource name of your index and index endpoint, which have the following formats:
projects/${PROJECT_ID}/locations/${LOCATION_ID}/indexes/${INDEX_ID}
projects/${PROJECT_ID}/locations/${LOCATION_ID}/indexEndpoints/${INDEX_ENDPOINT_ID}
.
Use Vertex AI Vector Search in Vertex AI RAG Engine
After the Vector Search instance is set up, follow the steps in this section to set the Vector Search instance as the vector database for the RAG application.
Set the vector database to create a RAG corpus
When you create the RAG corpus, specify only the full INDEX_ENDPOINT_NAME
and
INDEX_NAME
. Make sure to use the numeric ID for both index and index endpoint
resource names. The RAG corpus is created and automatically associated with the
Vector Search index. Validations are performed on the
criteria. If any of the requirements aren't met,
the request is rejected.
Python
Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
REST
# TODO(developer): Update and un-comment the following lines:
# CORPUS_DISPLAY_NAME = "YOUR_CORPUS_DISPLAY_NAME"
# Full index/indexEndpoint resource name
# Index: projects/${PROJECT_ID}/locations/${LOCATION_ID}/indexes/${INDEX_ID}
# IndexEndpoint: projects/${PROJECT_ID}/locations/${LOCATION_ID}/indexEndpoints/${INDEX_ENDPOINT_ID}
# INDEX_RESOURCE_NAME = "YOUR_INDEX_ENDPOINT_RESOURCE_NAME"
# INDEX_NAME = "YOUR_INDEX_RESOURCE_NAME"
# Call CreateRagCorpus API to create a new RagCorpus
curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" https://${LOCATION_ID}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION_ID}/ragCorpora -d '{
"display_name" : '\""${CORPUS_DISPLAY_NAME}"\"',
"rag_vector_db_config" : {
"vertex_vector_search": {
"index":'\""${INDEX_NAME}"\"'
"index_endpoint":'\""${INDEX_ENDPOINT_NAME}"\"'
}
}
}'
# Call ListRagCorpora API to verify the RagCorpus is created successfully
curl -sS -X GET \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://${LOCATION_ID}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION_ID}/ragCorpora"
Import files using the RAG API
Use the ImportRagFiles
API to import
files from Cloud Storage or
Google Drive into the Vector Search index. The files are embedded and
stored in the Vector Search index.
REST
# TODO(developer): Update and uncomment the following lines:
# RAG_CORPUS_ID = "your-rag-corpus-id"
#
# Google Cloud Storage bucket/file location.
# For example, "gs://rag-fos-test/"
# GCS_URIS= "your-gcs-uris"
# Call ImportRagFiles API to embed files and store in the BigQuery table
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://us-central1-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/us-central1/ragCorpora/${RAG_CORPUS_ID}/ragFiles:import \
-d '{
"import_rag_files_config": {
"gcs_source": {
"uris": '\""${GCS_URIS}"\"'
},
"rag_file_chunking_config": {
"chunk_size": 512
}
}
}'
# Call ListRagFiles API to verify the files are imported successfully
curl -X GET \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
https://us-central1-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/us-central1/ragCorpora/${RAG_CORPUS_ID}/ragFiles
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
Retrieve relevant contexts using the RAG API
After completion of the file imports, the relevant context can be retrieved from
the Vector Search index by using the RetrieveContexts
API.
REST
# TODO(developer): Update and uncomment the following lines:
# RETRIEVAL_QUERY="your-retrieval-query"
#
# Full RAG corpus resource name
# Format:
# "projects/${PROJECT_ID}/locations/us-central1/ragCorpora/${RAG_CORPUS_ID}"
# RAG_CORPUS_RESOURCE="your-rag-corpus-resource"
# Call RetrieveContexts API to retrieve relevant contexts
curl -X POST \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
https://us-central1-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/us-central1:retrieveContexts \
-d '{
"vertex_rag_store": {
"rag_resources": {
"rag_corpus": '\""${RAG_CORPUS_RESOURCE}"\"',
},
},
"query": {
"text": '\""${RETRIEVAL_QUERY}"\"',
"similarity_top_k": 10
}
}'
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
Generate content using Vertex AI Gemini API
To generate content using Gemini models, make a call to the
Vertex AI GenerateContent
API. By specifying the
RAG_CORPUS_RESOURCE
in the request, the API automatically retrieves data from
the Vector Search index.
REST
# TODO(developer): Update and uncomment the following lines:
# MODEL_ID=gemini-1.5-flash-001
# GENERATE_CONTENT_PROMPT="your-generate-content-prompt"
# GenerateContent with contexts retrieved from the FeatureStoreOnline index
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" https://us-central1-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/us-central1/publishers/google/models/${MODEL_ID}:generateContent \
-d '{
"contents": {
"role": "user",
"parts": {
"text": '\""${GENERATE_CONTENT_PROMPT}"\"'
}
},
"tools": {
"retrieval": {
"vertex_rag_store": {
"rag_resources": {
"rag_corpus": '\""${RAG_CORPUS_RESOURCE}"\"',
},
"similarity_top_k": 8,
"vector_distance_threshold": 0.32
}
}
}
}'
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.