This guide provides an API reference for RAG Engine. It covers the following topics:
- Parameters list: Describes the API parameters for managing corpora, files, and projects.
- Corpus management examples: Provides examples for creating, updating, listing, and deleting RAG corpora.
- File management examples: Shows how to upload, import, and manage files within a RAG corpus.
- Retrieval and prediction examples: Includes examples for retrieving contexts and generating grounded responses.
- Project management examples: Explains how to configure project-level settings for the RAG engine.
The Vertex AI RAG Engine is a component of the Vertex AI platform that facilitates Retrieval-Augmented Generation (RAG). RAG Engine enables Large Language Models (LLMs) to access and incorporate data from external knowledge sources, such as documents and databases. By using RAG, LLMs can generate more accurate and informative responses.
Parameters list
Corpus management parameters
For information about a RAG corpus, see Corpus management.
Create a RAG corpus
This table lists the parameters used to create a RAG corpus.
Body Request
Parameter | Description |
---|---|
|
Optional, Immutable. The configuration to specify the corpus type. |
|
Required. The display name of the RAG corpus. |
|
Optional. The description of the RAG corpus. |
|
Optional, Immutable. The CMEK key name used to encrypt at-rest data related to the RAG corpus. This key name is only applicable to the Format: |
|
Optional, Immutable. The configuration for the vector databases. |
|
Optional. The configuration for Vertex AI Search. Format: |
CorpusTypeConfig
Parameter | Description |
---|---|
|
The default value of |
|
If you set this type, the RAG corpus is a For more information, see Use Vertex AI RAG Engine as the memory store. |
|
The LLM parser that's used to parse and store session contexts from the Gemini Live API. You can build memories for indexing. |
RagVectorDbConfig
When you create a RAG corpus, you can choose from several vector database options. The following table provides a comparison to help you select the best option for your use case.
Vector Database Option | Description | Use Case |
---|---|---|
rag_managed_db | A fully managed vector database provided by Google. | For users who want an integrated solution without managing their own database infrastructure. |
weaviate | Connects to a self-managed Weaviate instance. | For users who already have or prefer to use a Weaviate vector database. |
pinecone | Connects to a self-managed Pinecone instance. | For users who already have or prefer to use a Pinecone vector database. |
vertex_feature_store | Uses an existing Vertex AI Feature Store instance. | For integrating RAG with existing machine learning features and data in Vertex AI Feature Store. |
vertex_vector_search | Uses an existing Vector Search index. | For leveraging the advanced scalability and performance of Vector Search for large-scale applications. |
If you choose the rag_managed_db option, you must also select a retrieval strategy. The following table compares the available strategies.
Retrieval Strategy | Description | Pros | Cons |
---|---|---|---|
KNN (k-Nearest Neighbors) | Finds the exact nearest neighbors by performing an exhaustive search across all data points. | Provides the most accurate and relevant results. | Can be slower and more computationally expensive, especially with large datasets. |
ANN (Approximate Nearest Neighbor) | Finds neighbors that are likely to be the closest, trading some accuracy for significant speed improvements. | Much faster query performance and lower latency. | Results are approximate and might not be the absolute most relevant. |
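To make the KNN row concrete, the following is a minimal sketch of an exhaustive nearest-neighbor search. It is an illustration only, not RAG Engine's implementation: the cosine distance metric, the function names, and the toy two-dimensional vectors are all assumptions.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def knn_search(query, vectors, k):
    """Score every stored vector (exhaustive search) and return the k closest IDs."""
    ranked = sorted(vectors.items(), key=lambda item: cosine_distance(query, item[1]))
    return [doc_id for doc_id, _ in ranked[:k]]


# Hypothetical toy corpus: ID -> embedding.
corpus = {
    "doc-a": [1.0, 0.0],
    "doc-b": [0.0, 1.0],
    "doc-c": [0.9, 0.1],
}
nearest = knn_search([1.0, 0.0], corpus, k=2)
```

Because every vector is compared against the query, the cost grows linearly with corpus size, which is the trade-off that an ANN index avoids by searching only part of the data.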
Parameter | Description |
---|---|
|
If no vector database is specified, |
|
This is the default retrieval strategy. Finds the exact nearest neighbors by comparing all data points in your RAG corpus. If you don't specify a strategy when you create your RAG corpus, KNN is used. |
|
tree_depth Determines the number of layers or levels in the tree.
leaf_count Determines the number of leaf nodes in the tree-based structure.
rebuild_ann_index
|
|
Specifies your Weaviate instance. |
|
The Weaviate instance's HTTP endpoint. This value can't be changed after it's set. You can leave this field empty in the
|
|
The Weaviate collection that the RAG corpus maps to. This value can't be changed after it's set. You can leave this field empty in the
|
|
Specifies your Pinecone instance. |
|
The name used to create the Pinecone index that's used with the RAG corpus. This value can't be changed after it's set. You can leave this field empty in the
|
|
Specifies your Vertex AI Feature Store instance. |
|
The Vertex AI Feature Store Format: This value can't be changed after it's set. You can leave this field empty in the
|
|
Specifies your Vector Search instance. |
|
The resource name of the Vector Search index that's used with the RAG corpus. Format: This value can't be changed after it's set. You can leave this field empty in the
|
|
The resource name of the Vector Search index endpoint that's used with the RAG corpus. Format: This value can't be changed after it's set. You can leave this field empty in the
|
|
This is the full resource name of the secret stored in Secret Manager, which contains the API key for your Weaviate or Pinecone vector database. Format: You can leave this field empty in the |
|
Optional, Immutable. The embedding model to use for the RAG corpus. This value can't be changed after it's set. If you leave it empty, RAG Engine uses text-embedding-005 as the embedding model. |
Update a RAG corpus
This table lists the parameters used to update a RAG corpus.
Body Request
Parameter | Description |
---|---|
|
Optional. The display name of the RAG corpus. |
|
Optional. The description of the RAG corpus. |
|
The Weaviate instance's HTTP endpoint. If you created the |
|
The Weaviate collection that the RAG corpus maps to. If you created the |
|
The name used to create the Pinecone index that's used with the RAG corpus. If you created the |
|
The Vertex AI Feature Store Format: If you created the |
|
The resource name of the Vector Search index that's used with the RAG corpus. Format: If you created the |
|
The resource name of the Vector Search index endpoint that's used with the RAG corpus. Format: If you created the |
|
The full resource name of the secret stored in Secret Manager, which contains the API key for your Weaviate or Pinecone vector database. Format: |
List RAG corpora
This table lists the parameters used to list RAG corpora.
Parameter | Description |
---|---|
|
Optional. The standard list page size. |
|
Optional. A page token that you can get from |
Get a RAG corpus
This table lists the parameter used to get a RAG corpus.
Parameter | Description |
---|---|
|
Required. The name of the |
Delete a RAG corpus
This table lists the parameter used to delete a RAG corpus.
Parameter | Description |
---|---|
|
Required. The name of the |
File management parameters
For information about a RAG file, see File management.
Upload a RAG file
This table lists parameters used to upload a RAG file.
Body Request
Parameter | Description |
---|---|
|
Required. The name of the |
|
Required. The file to upload. |
|
Required. The configuration for the |
RagFile |
Description |
---|---|
|
Required. The display name of the RAG file. |
|
Optional. The description of the RAG file. |
UploadRagFileConfig |
Description |
---|---|
|
The number of tokens in each chunk. |
|
The number of tokens to overlap between chunks. |
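As a rough illustration of how a chunk size and a chunk overlap interact, the sketch below splits a token list into overlapping windows. It is not the service's chunker: whitespace-separated words stand in for model tokens, and the function name is hypothetical.

```python
def chunk_tokens(tokens, chunk_size, chunk_overlap):
    """Split tokens into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the input
    return chunks


tokens = "a b c d e f g h i j".split()
chunks = chunk_tokens(tokens, chunk_size=4, chunk_overlap=1)
```

With these values each chunk repeats the last token of the previous one, so context that straddles a chunk boundary appears in both chunks.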
Import RAG files
This table lists parameters used to import a RAG file.
Parameter | Description |
---|---|
|
Required. The name of the Format: |
|
The Cloud Storage location. Supports importing individual files and entire Cloud Storage directories. |
|
The Cloud Storage URI that contains the upload file. |
|
The Google Drive location. Supports importing individual files and Google Drive folders. |
|
The Slack channel where the file is uploaded. |
|
The Jira query where the file is uploaded. |
|
The SharePoint sources where the file is uploaded. |
|
The number of tokens in each chunk. |
|
The number of tokens to overlap between chunks. |
|
Optional. Specifies the parsing configuration for If you don't set this field, RAG uses the default parser. |
|
Optional. The maximum number of queries per minute (QPM) that this job can make to the embedding model. This value is specific to this job and isn't shared with other import jobs. To set an appropriate value, see the Quotas page for your project. If you don't specify a value, a default of 1,000 QPM is used. |
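The effect of a QPM cap on import time can be estimated with simple arithmetic. The sketch below assumes that each embedding request counts once against the limit and that the job runs at the cap the whole time; the function names are hypothetical.

```python
import math

DEFAULT_QPM = 1000  # default stated in the table above


def min_import_minutes(num_requests, qpm=DEFAULT_QPM):
    """Lower bound on wall-clock minutes when requests are capped at qpm."""
    if qpm <= 0:
        raise ValueError("qpm must be positive")
    return math.ceil(num_requests / qpm)


def seconds_between_requests(qpm):
    """Average spacing between requests that keeps a job at or under qpm."""
    if qpm <= 0:
        raise ValueError("qpm must be positive")
    return 60.0 / qpm
```

For example, 2,500 embedding requests need at least 3 minutes at the default cap, and at least 5 minutes if the cap is lowered to 500 QPM.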
GoogleDriveSource |
Description |
---|---|
|
Required. The ID of the Google Drive resource. |
|
Required. The type of the Google Drive resource. |
SlackSource |
Description |
---|---|
|
Repeated. Slack channel information, including ID and time range to import. |
|
Required. The Slack channel ID. |
|
Optional. The starting timestamp for messages to import. |
|
Optional. The ending timestamp for messages to import. |
|
Required. The full resource name of the secret stored in Secret Manager, which contains a Slack channel access token that has access to the specified Slack channel IDs. Format: |
JiraSource |
Description |
---|---|
|
Repeated. A list of Jira projects to import in their entirety. |
|
Repeated. A list of custom Jira queries to import. For more information about Jira Query Language (JQL), see
|
|
Required. The Jira email address. |
|
Required. The Jira server URI. |
|
Required. The full resource name of the secret stored in Secret Manager, which contains a Jira API key. Format: |
SharePointSources |
Description |
---|---|
|
The path of the SharePoint folder to download from. |
|
The ID of the SharePoint folder to download from. |
|
The name of the drive to download from. |
|
The ID of the drive to download from. |
|
The Application (client) ID for the app registered in the Microsoft Azure Portal.
|
|
Required. The full resource name of the secret stored in Secret Manager, which contains the application secret for the app registered in Azure. Format: |
|
Unique identifier of the Azure Active Directory Instance. |
|
The name of the SharePoint site to download from. This can be the site name or the site ID. |
Parser Option | Description | Use Case |
---|---|---|
layout_parser | Uses Document AI to parse files, preserving the structure and layout of the document. | Best for structured or semi-structured documents like PDFs with tables, columns, and complex layouts. |
llm_parser | Uses a Large Language Model (LLM) to parse files, focusing on semantic understanding of the content. | Ideal for unstructured text documents where extracting meaning and context is more important than preserving the original visual layout. |
RagFileParsingConfig |
Description |
---|---|
|
The Layout Parser to use for |
|
The full resource name of a Document AI processor or processor version. Format:
|
|
The maximum number of requests the job is allowed to make to the Document AI processor per minute. Consult https://cloud.google.com/document-ai/quotas and the Quota page for your project to set an appropriate value here. If unspecified, a default value of 120 QPM is used. |
|
The LLM parser to use for |
|
The resource name of an LLM model. Format:
|
|
The maximum number of requests per minute that the job is allowed to make to the LLM model. To set an appropriate value, see the model quota section and the Quota page for your project. If unspecified, a default value of 5000 QPM is used. |
Get a RAG file
This table lists the parameter used to get a RAG file.
Parameter | Description |
---|---|
|
Required. The name of the |
Delete a RAG file
This table lists the parameter used to delete a RAG file.
Parameter | Description |
---|---|
|
Required. The name of the |
Retrieval and prediction parameters
This section lists the retrieval and prediction parameters.
Retrieval parameters
This table lists the parameters for the retrieveContexts method.
Parameter | Description |
---|---|
|
Required. The resource name of the location from which to retrieve Format: |
|
The data source for Vertex RagStore. |
|
Required. Single RAG retrieve query. |
VertexRagStore
VertexRagStore |
Description |
---|---|
|
list: The RAG source. You can specify a single corpus or multiple |
|
Optional.
Format: |
|
list: A list of Format: |
RagQuery |
Description |
---|---|
|
The query in text format to get relevant contexts. |
|
Optional. The retrieval configuration for the query. |
RagRetrievalConfig |
Description |
---|---|
|
Optional. The number of contexts to retrieve. |
|
Optional. Controls the weight between dense and sparse vector search results. The value must be between 0.0 and 1.0. A value of 0.0 means sparse vector search only, and 1.0 means dense vector search only. The default is 0.5. Hybrid search is only available for Weaviate. |
|
Returns only contexts with a vector distance smaller than this threshold. |
|
Returns only contexts with a vector similarity larger than this threshold. |
|
Optional. The model name of the rank service. Example: |
|
Optional. The model name used for ranking. Example: |
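The table states only the two endpoints of the dense/sparse weight (0.0 is sparse only, 1.0 is dense only). One common way to interpolate between them is a linear blend, sketched below together with the distance-threshold filter. The linear formula, the parameter name alpha, and the "distance" key on each context are assumptions made for illustration, not documented behavior.

```python
def hybrid_score(dense_score, sparse_score, alpha=0.5):
    """Blend two scores: alpha=1.0 is dense only, alpha=0.0 is sparse only."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0.0 and 1.0")
    return alpha * dense_score + (1.0 - alpha) * sparse_score


def filter_by_distance(contexts, vector_distance_threshold):
    """Keep only contexts whose distance is smaller than the threshold."""
    return [c for c in contexts if c["distance"] < vector_distance_threshold]


# Hypothetical retrieved contexts with precomputed distances.
contexts = [
    {"text": "close match", "distance": 0.2},
    {"text": "weak match", "distance": 0.7},
]
kept = filter_by_distance(contexts, vector_distance_threshold=0.5)
```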
Prediction parameters
This table lists prediction parameters.
GenerateContentRequest |
Description |
---|---|
|
Set to use a data source powered by the Vertex AI RAG store. |
See VertexRagStore for details.
Project management parameters
This table lists project-level parameters.
When you use RagManagedDb, you can select a tier that best fits your performance and cost requirements. The following table compares the available tiers.
Tier | Description | Use Case |
---|---|---|
scaled | A production-scale tier with autoscaling to handle high and variable query loads. | Recommended for production applications requiring high availability and performance. |
basic | A cost-effective, lower-compute tier for smaller-scale needs. | Suitable for development, testing, or applications with low, predictable traffic. |
unprovisioned | De-provisions the managed database and its underlying resources. | Used to disable the managed database and stop incurring associated costs. |
RagEngineConfig
Parameter | Description |
---|---|
RagManagedDbConfig.scaled | This tier offers production-scale performance along with autoscaling functionality. |
RagManagedDbConfig.basic | This tier offers a cost-effective, low-compute option. |
RagManagedDbConfig.unprovisioned | This tier de-provisions the RagManagedDb and its underlying Spanner instance. |
Corpus management examples
This section provides examples of how to use the API to manage your RAG corpora.
Create a RAG corpus example
This example demonstrates how to create a RAG corpus.
REST
Before using any of the request data, make the following replacements:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process the request.
- CORPUS_DISPLAY_NAME: The display name of the RagCorpus.
- CORPUS_DESCRIPTION: The description of the RagCorpus.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/ragCorpora
Request JSON body:
{ "display_name": "CORPUS_DISPLAY_NAME", "description": "CORPUS_DESCRIPTION" }
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/ragCorpora"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/ragCorpora" | Select-Object -Expand Content
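For readers who script the call rather than use curl, the same create request can be assembled with the Python standard library. This is a sketch under assumptions: the helper name and sample values are hypothetical, the access token is a placeholder you would obtain yourself (for example from gcloud auth print-access-token), and the code only builds the request without sending it.

```python
import json
import urllib.request


def build_create_corpus_request(project_id, location, display_name,
                                description, access_token):
    """Build (but do not send) the POST request shown in the curl example."""
    url = (
        f"https://{location}-aiplatform.googleapis.com/v1beta1/"
        f"projects/{project_id}/locations/{location}/ragCorpora"
    )
    body = json.dumps(
        {"display_name": display_name, "description": description}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
    )


# Placeholder values; replace them before sending anything.
request = build_create_corpus_request(
    "my-project", "us-central1", "My corpus", "Demo corpus", "ACCESS_TOKEN"
)
```

To send it, pass the object to urllib.request.urlopen(request) with a valid token. Serializing the body with json.dumps also avoids hand-written JSON mistakes such as trailing commas.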
Update a RAG corpus example
You can update a RAG corpus's display name, description, and some vector database configurations. However, you can't change the following immutable parameters:
- The vector database type. For example, you can't change the vector database from Weaviate to Vertex AI Feature Store.
- If you're using the managed database option, you can't update the vector database configuration.
This example demonstrates how to update a RAG corpus.
REST
Before using any of the request data, make the following replacements:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process the request.
- CORPUS_ID: The corpus ID of your RAG corpus.
- CORPUS_DISPLAY_NAME: The display name of the RagCorpus.
- CORPUS_DESCRIPTION: The description of the RagCorpus.
- INDEX_NAME: The resource name of the Vector Search index. Format: projects/{project}/locations/{location}/indexes/{index}
- INDEX_ENDPOINT_NAME: The resource name of the Vector Search index endpoint. Format: projects/{project}/locations/{location}/indexEndpoints/{index_endpoint}
HTTP method and URL:
PATCH https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/ragCorpora/CORPUS_ID
Request JSON body:
{ "display_name": "CORPUS_DISPLAY_NAME", "description": "CORPUS_DESCRIPTION", "rag_vector_db_config": { "vertex_vector_search": { "index": "INDEX_NAME", "index_endpoint": "INDEX_ENDPOINT_NAME" } } }
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X PATCH \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/ragCorpora/CORPUS_ID"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method PATCH `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/ragCorpora/CORPUS_ID" | Select-Object -Expand Content
List RAG corpora example
This example shows how to list all RAG corpora in a project.
REST
Before using any of the request data, make the following replacements:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process the request.
- PAGE_SIZE: The standard list page size. You can adjust the number of RagCorpora to return per page by updating the page_size parameter.
- PAGE_TOKEN: The standard list page token. Get this token from ListRagCorporaResponse.next_page_token in a previous VertexRagDataService.ListRagCorpora call.
HTTP method and URL:
GET https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/ragCorpora?page_size=PAGE_SIZE&page_token=PAGE_TOKEN
To send your request, choose one of these options:
curl
Execute the following command:
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/ragCorpora?page_size=PAGE_SIZE&page_token=PAGE_TOKEN"
PowerShell
Execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/ragCorpora?page_size=PAGE_SIZE&page_token=PAGE_TOKEN" | Select-Object -Expand Content
A successful response contains a list of RagCorpora for the specified project.
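Listing every corpus usually means following page tokens until none is returned. The loop below sketches that pattern with an injected fetch_page callable so that it runs without network access. The camelCase response keys (ragCorpora, nextPageToken) are an assumption about the JSON form of the response, and the stub data is invented for illustration.

```python
def list_all_corpora(fetch_page, page_size=100):
    """Follow page tokens until the service stops returning one."""
    corpora = []
    page_token = ""
    while True:
        response = fetch_page(page_size=page_size, page_token=page_token)
        corpora.extend(response.get("ragCorpora", []))
        page_token = response.get("nextPageToken", "")
        if not page_token:
            return corpora


# Stub standing in for the GET request; keyed by the incoming page token.
_PAGES = {
    "": {"ragCorpora": [{"displayName": "a"}, {"displayName": "b"}],
         "nextPageToken": "t1"},
    "t1": {"ragCorpora": [{"displayName": "c"}]},
}


def fake_fetch_page(page_size, page_token):
    return _PAGES[page_token]


all_corpora = list_all_corpora(fake_fetch_page, page_size=2)
```

In real use, fetch_page would issue the GET request with the page_size and page_token query parameters and return the decoded JSON body.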
Get a RAG corpus example
REST
Before using any of the request data, make the following replacements:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process the request.
- RAG_CORPUS_ID: The ID of the RagCorpus resource.
HTTP method and URL:
GET https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/ragCorpora/RAG_CORPUS_ID
To send your request, choose one of these options:
curl
Execute the following command:
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/ragCorpora/RAG_CORPUS_ID"
PowerShell
Execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/ragCorpora/RAG_CORPUS_ID" | Select-Object -Expand Content
A successful response contains the RagCorpus resource.
Delete a RAG corpus example
REST
Before using any of the request data, make the following replacements:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process the request.
- RAG_CORPUS_ID: The ID of the RagCorpus resource.
HTTP method and URL:
DELETE https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/ragCorpora/RAG_CORPUS_ID
To send your request, choose one of these options:
curl
Execute the following command:
curl -X DELETE \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/ragCorpora/RAG_CORPUS_ID"
PowerShell
Execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method DELETE `
-Headers $headers `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/ragCorpora/RAG_CORPUS_ID" | Select-Object -Expand Content
A successful response contains a DeleteOperationMetadata resource.
File management examples
This section provides examples of how to use the API to manage RAG files.
Upload a RAG file example
REST
Before running the command, replace the following variables:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process the request.
- RAG_CORPUS_ID: The corpus ID of your RAG corpus.
- LOCAL_FILE_PATH: The local path to the file to be uploaded.
- DISPLAY_NAME: The display name of the RAG file.
- DESCRIPTION: The description of the RAG file.
To send your request, use the following command:
curl -X POST \
-H "X-Goog-Upload-Protocol: multipart" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-F metadata='{"rag_file": {"display_name": "DISPLAY_NAME", "description": "DESCRIPTION"}}' \
-F file=@LOCAL_FILE_PATH \
"https://LOCATION-aiplatform.googleapis.com/upload/v1beta1/projects/PROJECT_ID/locations/LOCATION/ragCorpora/RAG_CORPUS_ID/ragFiles:upload"
Import RAG files example
You can import files and folders from Drive or Cloud Storage.
The response.skipped_rag_files_count field is the number of files that were skipped during import. A file is skipped if all of the following conditions are met:
- The file has already been imported.
- The file hasn't changed.
- The chunking configuration for the file hasn't changed.
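The skip conditions above amount to a single check: the file was imported before, and neither its content nor its chunking configuration has changed. The sketch below expresses that check. The ImportedFile record and the use of a content hash to detect changes are assumptions for illustration; the source does not describe how the service detects changes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ImportedFile:
    """Hypothetical record of what was imported for one source file."""
    content_hash: str
    chunk_size: int
    chunk_overlap: int


def should_skip(previous: Optional[ImportedFile], candidate: ImportedFile) -> bool:
    """Skip only if the file was imported before and nothing has changed."""
    return previous is not None and previous == candidate


earlier = ImportedFile(content_hash="abc123", chunk_size=512, chunk_overlap=100)
same = ImportedFile(content_hash="abc123", chunk_size=512, chunk_overlap=100)
rechunked = ImportedFile(content_hash="abc123", chunk_size=256, chunk_overlap=100)
```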
REST
Before using any of the request data, make the following replacements:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process the request.
- RAG_CORPUS_ID: The ID of the RagCorpus resource.
- GCS_URIS: A list of Cloud Storage locations. Example: gs://my-bucket1, gs://my-bucket2.
- CHUNK_SIZE: Optional. The number of tokens each chunk should have.
- CHUNK_OVERLAP: Optional. The number of tokens to overlap between chunks.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/ragCorpora/RAG_CORPUS_ID/ragFiles:import
Request JSON body:
{ "import_rag_files_config": { "gcs_source": { "uris": "GCS_URIS" }, "rag_file_chunking_config": { "chunk_size": CHUNK_SIZE, "chunk_overlap": CHUNK_OVERLAP } } }
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/ragCorpora/RAG_CORPUS_ID/ragFiles:import"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/ragCorpora/RAG_CORPUS_ID/ragFiles:import" | Select-Object -Expand Content
A successful response contains an ImportRagFilesOperationMetadata resource.
The following sample demonstrates how to import a file from Cloud Storage. Use the max_embedding_requests_per_min control field to limit the rate at which RAG Engine calls the embedding model during the ImportRagFiles indexing process. The field has a default value of 1000 calls per minute.
- PROJECT_ID: Your project ID.
- LOCATION: The region to process the request.
- RAG_CORPUS_ID: The corpus ID of your RAG corpus.
- GCS_URIS: A list of Cloud Storage locations. Example: gs://my-bucket1.
- CHUNK_SIZE: The number of tokens each chunk should have.
- CHUNK_OVERLAP: The number of tokens to overlap between chunks.
- EMBEDDING_MODEL_QPM_RATE: The QPM rate that limits RAG Engine's access to your embedding model. Example: 1000.
# ImportRagFiles
# Import a single Cloud Storage file or all files in a Cloud Storage bucket.
# Input: LOCATION, PROJECT_ID, RAG_CORPUS_ID, GCS_URIS
# Output: ImportRagFilesOperationMetadataNumber
# Use ListRagFiles to find the server-generated rag_file_id.
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/ragCorpora/RAG_CORPUS_ID/ragFiles:import \
-d '{
"import_rag_files_config": {
"gcs_source": {
"uris": "GCS_URIS"
},
"rag_file_chunking_config": {
"chunk_size": CHUNK_SIZE,
"chunk_overlap": CHUNK_OVERLAP
},
"max_embedding_requests_per_min": EMBEDDING_MODEL_QPM_RATE
}
}'
# Poll the operation status.
# The response contains the number of files imported.
# OPERATION_ID: The operation ID you get from the response of the previous command.
poll_op_wait OPERATION_ID
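The poll_op_wait helper used above is not defined on this page. As a hedged sketch of what such a helper might do, the function below polls a long-running operation until its done field is true. The get_operation callable, the timing defaults, and the importedRagFilesCount key in the stub response are assumptions for illustration; the stub replaces the real HTTP call so that the example runs offline.

```python
import time


def wait_for_operation(get_operation, operation_name, poll_seconds=5.0,
                       timeout_seconds=600.0, sleep=time.sleep):
    """Poll until the operation reports done, then return its response."""
    waited = 0.0
    while True:
        operation = get_operation(operation_name)
        if operation.get("done"):
            if "error" in operation:
                raise RuntimeError(f"Operation failed: {operation['error']}")
            return operation.get("response", {})
        if waited >= timeout_seconds:
            raise TimeoutError(f"{operation_name} not done after {waited} s")
        sleep(poll_seconds)
        waited += poll_seconds


# Stub: two "still running" answers, then a finished operation.
_states = iter([
    {"done": False},
    {"done": False},
    {"done": True, "response": {"importedRagFilesCount": "3"}},
])


def fake_get_operation(name):
    return next(_states)


result = wait_for_operation(fake_get_operation, "OPERATION_ID",
                            sleep=lambda seconds: None)
```

Injecting sleep keeps the loop testable; in real use you would leave the default and have get_operation fetch the operation resource over HTTP.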
The following sample demonstrates how to import a file from Drive. Use the max_embedding_requests_per_min control field to limit the rate at which RAG Engine calls the embedding model during the ImportRagFiles indexing process. The field has a default value of 1000 calls per minute.
- PROJECT_ID: Your project ID.
- LOCATION: The region to process the request.
- RAG_CORPUS_ID: The corpus ID of your RAG corpus.
- FOLDER_RESOURCE_ID: The resource ID of your Google Drive folder.
- CHUNK_SIZE: The number of tokens each chunk should have.
- CHUNK_OVERLAP: The number of tokens to overlap between chunks.
- EMBEDDING_MODEL_QPM_RATE: The QPM rate that limits RAG Engine's access to your embedding model. Example: 1000.
# ImportRagFiles
# Import all files in a Google Drive folder.
# Input: LOCATION, PROJECT_ID, RAG_CORPUS_ID, FOLDER_RESOURCE_ID
# Output: ImportRagFilesOperationMetadataNumber
# Use ListRagFiles to find the server-generated rag_file_id.
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/ragCorpora/RAG_CORPUS_ID/ragFiles:import \
-d '{
"import_rag_files_config": {
"google_drive_source": {
"resource_ids": {
"resource_id": "FOLDER_RESOURCE_ID",
"resource_type": "RESOURCE_TYPE_FOLDER"
}
},
"max_embedding_requests_per_min": EMBEDDING_MODEL_QPM_RATE
}
}'
# Poll the operation status.
# The response contains the number of files imported.
# OPERATION_ID: The operation ID you get from the response of the previous command.
poll_op_wait OPERATION_ID
List RAG files example
This example demonstrates how to list RAG files.
REST
Before using any of the request data, make the following replacements:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process the request.
- RAG_CORPUS_ID: The ID of the RagCorpus resource.
- PAGE_SIZE: The standard list page size. You can adjust the number of RagFiles to return per page by updating the page_size parameter.
- PAGE_TOKEN: The standard list page token. Get this token from ListRagFilesResponse.next_page_token in a previous VertexRagDataService.ListRagFiles call.
HTTP method and URL:
GET https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/ragCorpora/RAG_CORPUS_ID/ragFiles?page_size=PAGE_SIZE&page_token=PAGE_TOKEN
To send your request, choose one of these options:
curl
Execute the following command:
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/ragCorpora/RAG_CORPUS_ID/ragFiles?page_size=PAGE_SIZE&page_token=PAGE_TOKEN"
PowerShell
Execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/ragCorpora/RAG_CORPUS_ID/ragFiles?page_size=PAGE_SIZE&page_token=PAGE_TOKEN" | Select-Object -Expand Content
A successful response contains a list of RagFiles for the specified RAG_CORPUS_ID.
Get a RAG file example
This example demonstrates how to get a RAG file.
REST
Before using any of the request data, make the following replacements:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process the request.
- RAG_CORPUS_ID: The ID of the RagCorpus resource.
- RAG_FILE_ID: The ID of the RagFile resource.
HTTP method and URL:
GET https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/ragCorpora/RAG_CORPUS_ID/ragFiles/RAG_FILE_ID
To send your request, choose one of these options:
curl
Execute the following command:
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/ragCorpora/RAG_CORPUS_ID/ragFiles/RAG_FILE_ID"
PowerShell
Execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/ragCorpora/RAG_CORPUS_ID/ragFiles/RAG_FILE_ID" | Select-Object -Expand Content
A successful response contains the RagFile resource.
Delete a RAG file example
This example demonstrates how to delete a RAG file.
REST
Before using any of the request data, make the following replacements:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process the request.
- RAG_CORPUS_ID: The ID of the RagCorpus resource.
- RAG_FILE_ID: The ID of the RagFile resource. Format: projects/{project}/locations/{location}/ragCorpora/{rag_corpus}/ragFiles/{rag_file_id}.
HTTP method and URL:
DELETE https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/ragCorpora/RAG_CORPUS_ID/ragFiles/RAG_FILE_ID
To send your request, choose one of these options:
curl
Execute the following command:
curl -X DELETE \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/ragCorpora/RAG_CORPUS_ID/ragFiles/RAG_FILE_ID"
PowerShell
Execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method DELETE `
-Headers $headers `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/ragCorpora/RAG_CORPUS_ID/ragFiles/RAG_FILE_ID" | Select-Object -Expand Content
A successful response returns the DeleteOperationMetadata resource.
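The get and delete requests above address a file through the same resource path. As a minimal sketch, that path can be composed and split with plain string handling. The helper names rag_file_name and parse_rag_file_name are hypothetical and aren't part of the Vertex AI SDK, and the IDs are made-up placeholders.

```python
import re

# Pattern for the documented format:
# projects/{project}/locations/{location}/ragCorpora/{rag_corpus}/ragFiles/{rag_file_id}
_RAG_FILE_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/ragCorpora/(?P<rag_corpus>[^/]+)"
    r"/ragFiles/(?P<rag_file_id>[^/]+)$"
)


def rag_file_name(project, location, rag_corpus, rag_file_id):
    """Compose a RagFile resource name (hypothetical helper)."""
    return (
        f"projects/{project}/locations/{location}"
        f"/ragCorpora/{rag_corpus}/ragFiles/{rag_file_id}"
    )


def parse_rag_file_name(name):
    """Split a RagFile resource name into its parts, or raise ValueError."""
    match = _RAG_FILE_NAME.match(name)
    if match is None:
        raise ValueError(f"Not a RagFile resource name: {name!r}")
    return match.groupdict()


# Placeholder IDs for illustration only.
name = rag_file_name("my-project", "us-central1", "1234", "5678")
print(name)
print(parse_rag_file_name(name))
```

Parsing the name before sending a request is one way to catch a corpus name that was passed where a file name was expected.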
Retrieval and prediction examples
Retrieval query example
When you provide a query, the retrieval component in RAG searches its knowledge base to find relevant information.
REST
Before using any of the request data, make the following replacements:
- LOCATION: The region to process the request.
- PROJECT_ID: Your project ID.
- RAG_CORPUS_RESOURCE: The name of the RagCorpus resource. Format: projects/{project}/locations/{location}/ragCorpora/{rag_corpus}.
- VECTOR_DISTANCE_THRESHOLD: Only contexts with a vector distance smaller than the threshold are returned.
- TEXT: The query text to get relevant contexts.
- SIMILARITY_TOP_K: The number of top contexts to retrieve.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION:retrieveContexts
Request JSON body:
{
  "vertex_rag_store": {
    "rag_resources": {
      "rag_corpus": "RAG_CORPUS_RESOURCE"
    },
    "vector_distance_threshold": VECTOR_DISTANCE_THRESHOLD
  },
  "query": {
    "text": "TEXT",
    "similarity_top_k": SIMILARITY_TOP_K
  }
}
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION:retrieveContexts"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION:retrieveContexts" | Select-Object -Expand Content
You should receive a successful status code (2xx) and a list of related RagFiles.
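The same retrieval request can be assembled from Python with only the standard library. The following is a sketch, not an official client: build_retrieve_contexts_request is a hypothetical helper, the project, corpus, and query values are placeholders, and the token is assumed to come from gcloud auth print-access-token. The request is built but not sent.

```python
import json
import urllib.request


def build_retrieve_contexts_request(
    project_id, location, rag_corpus, text, top_k, distance_threshold, token
):
    """Build, but do not send, a retrieveContexts request (hypothetical helper)."""
    url = (
        f"https://{location}-aiplatform.googleapis.com/v1beta1"
        f"/projects/{project_id}/locations/{location}:retrieveContexts"
    )
    # Same shape as the request JSON body shown in the REST example.
    body = {
        "vertex_rag_store": {
            "rag_resources": {"rag_corpus": rag_corpus},
            "vector_distance_threshold": distance_threshold,
        },
        "query": {"text": text, "similarity_top_k": top_k},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


# Placeholder values for illustration only.
request = build_retrieve_contexts_request(
    project_id="my-project",
    location="us-central1",
    rag_corpus="projects/my-project/locations/us-central1/ragCorpora/1234",
    text="What is RAG?",
    top_k=5,
    distance_threshold=0.5,
    token="ACCESS_TOKEN",
)
print(request.get_method(), request.full_url)
print(json.dumps(json.loads(request.data), indent=2))
```

To send the request, pass it to urllib.request.urlopen with a valid access token in place of the placeholder.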
Generation example
The LLM generates a grounded response using the retrieved contexts.
REST
Before using any of the request data, make the following replacements:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process the request.
- MODEL_ID: The LLM model for content generation. Example: gemini-2.5-flash
- GENERATION_METHOD: The LLM method for content generation. Options: generateContent, streamGenerateContent
- INPUT_PROMPT: The text sent to the LLM for content generation. Try to use a prompt relevant to the uploaded RAG files.
- RAG_CORPUS_RESOURCE: The name of the RagCorpus resource. Format: projects/{project}/locations/{location}/ragCorpora/{rag_corpus}.
- SIMILARITY_TOP_K: Optional: The number of top contexts to retrieve.
- VECTOR_DISTANCE_THRESHOLD: Optional: Contexts with a vector distance smaller than the threshold are returned.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:GENERATION_METHOD
Request JSON body:
{
  "contents": {
    "role": "user",
    "parts": {
      "text": "INPUT_PROMPT"
    }
  },
  "tools": {
    "retrieval": {
      "disable_attribution": false,
      "vertex_rag_store": {
        "rag_resources": {
          "rag_corpus": "RAG_CORPUS_RESOURCE"
        },
        "similarity_top_k": SIMILARITY_TOP_K,
        "vector_distance_threshold": VECTOR_DISTANCE_THRESHOLD
      }
    }
  }
}
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:GENERATION_METHOD"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:GENERATION_METHOD" | Select-Object -Expand Content
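Because SIMILARITY_TOP_K and VECTOR_DISTANCE_THRESHOLD are optional, the request body can be generated so that unset values are left out. The following sketch assumes that omitting an optional field is equivalent to not setting it. build_generation_body is a hypothetical helper, and the prompt and corpus name are placeholders.

```python
import json


def build_generation_body(prompt, rag_corpus, top_k=None, distance_threshold=None):
    """Build a grounded-generation request body (hypothetical helper)."""
    rag_store = {"rag_resources": {"rag_corpus": rag_corpus}}
    # Both settings are documented as optional, so include them only when set.
    if top_k is not None:
        rag_store["similarity_top_k"] = top_k
    if distance_threshold is not None:
        rag_store["vector_distance_threshold"] = distance_threshold
    return {
        "contents": {"role": "user", "parts": {"text": prompt}},
        "tools": {
            "retrieval": {
                "disable_attribution": False,
                "vertex_rag_store": rag_store,
            }
        },
    }


# Placeholder values for illustration only.
body = build_generation_body(
    prompt="Summarize the onboarding guide.",
    rag_corpus="projects/my-project/locations/us-central1/ragCorpora/1234",
    top_k=3,
)

# Write the body to request.json so the curl or PowerShell command can use it.
with open("request.json", "w", encoding="utf-8") as handle:
    json.dump(body, handle, indent=2)
print(json.dumps(body, indent=2))
```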
Project management examples
The tier is a project-level setting available under the RagEngineConfig resource and affects RAG corpora that use RagManagedDb. To get the tier configuration, use GetRagEngineConfig. To update the tier configuration, use UpdateRagEngineConfig.
For more information on managing your tier configuration, see Manage your tier.
Get project configuration
The following example demonstrates how to read your RagEngineConfig:
Console
- In the Google Cloud console, go to the RAG Engine page.
- Select the region in which your RAG Engine is running. Your list of RAG corpora is updated.
- Click Configure RAG Engine. The Configure RAG Engine pane appears. You can see the tier that's selected for your RAG Engine.
- Click Cancel.
Python
from vertexai import rag
import vertexai

# Replace with your project ID and the region where RAG Engine runs.
PROJECT_ID = "YOUR_PROJECT_ID"
LOCATION = "YOUR_RAG_ENGINE_LOCATION"

# Initialize Vertex AI API once per session
vertexai.init(project=PROJECT_ID, location=LOCATION)

rag_engine_config = rag.rag_data.get_rag_engine_config(
    name=f"projects/{PROJECT_ID}/locations/{LOCATION}/ragEngineConfig"
)
print(rag_engine_config)
REST
curl -X GET \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/ragEngineConfig
Update project configuration
This section provides examples of how to change your tier.
Update your RagEngineConfig to the Scaled tier
The following examples demonstrate how to set the RagEngineConfig to the Scaled tier:
Console
- In the Google Cloud console, go to the RAG Engine page.
- Select the region in which your RAG Engine is running. Your list of RAG corpora is updated.
- Click Configure RAG Engine. The Configure RAG Engine pane appears.
- Select the tier that you want your RAG Engine to run on.
- Click Save.
Python
from vertexai import rag
import vertexai

# Replace with your project ID and the region where RAG Engine runs.
PROJECT_ID = "YOUR_PROJECT_ID"
LOCATION = "YOUR_RAG_ENGINE_LOCATION"

# Initialize Vertex AI API once per session
vertexai.init(project=PROJECT_ID, location=LOCATION)

rag_engine_config_name = f"projects/{PROJECT_ID}/locations/{LOCATION}/ragEngineConfig"
new_rag_engine_config = rag.RagEngineConfig(
    name=rag_engine_config_name,
    rag_managed_db_config=rag.RagManagedDbConfig(tier=rag.Scaled()),
)
updated_rag_engine_config = rag.rag_data.update_rag_engine_config(
    rag_engine_config=new_rag_engine_config
)
print(updated_rag_engine_config)
REST
curl -X PATCH \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/ragEngineConfig" \
-d '{"ragManagedDbConfig": {"scaled": {}}}'
Update your RagEngineConfig to the Basic tier
The following examples demonstrate how to set the RagEngineConfig to the Basic tier:
If you have a large amount of data in your RagManagedDb across your RAG corpora, downgrading to the Basic tier can fail due to insufficient compute and storage capacity.
Console
- In the Google Cloud console, go to the RAG Engine page.
- Select the region in which your RAG Engine is running. Your list of RAG corpora is updated.
- Click Configure RAG Engine. The Configure RAG Engine pane appears.
- Select the tier that you want your RAG Engine to run on.
- Click Save.
Python
from vertexai import rag
import vertexai

# Replace with your project ID and the region where RAG Engine runs.
PROJECT_ID = "YOUR_PROJECT_ID"
LOCATION = "YOUR_RAG_ENGINE_LOCATION"

# Initialize Vertex AI API once per session
vertexai.init(project=PROJECT_ID, location=LOCATION)

rag_engine_config_name = f"projects/{PROJECT_ID}/locations/{LOCATION}/ragEngineConfig"
new_rag_engine_config = rag.RagEngineConfig(
    name=rag_engine_config_name,
    rag_managed_db_config=rag.RagManagedDbConfig(tier=rag.Basic()),
)
updated_rag_engine_config = rag.rag_data.update_rag_engine_config(
    rag_engine_config=new_rag_engine_config
)
print(updated_rag_engine_config)
REST
curl -X PATCH \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/ragEngineConfig" \
-d '{"ragManagedDbConfig": {"basic": {}}}'
Update your RagEngineConfig to the Unprovisioned tier
The following examples demonstrate how to set the RagEngineConfig to the Unprovisioned tier:
Console
- In the Google Cloud console, go to the RAG Engine page.
- Select the region in which your RAG Engine is running. Your list of RAG corpora is updated.
- Click Configure RAG Engine. The Configure RAG Engine pane appears.
- Click Delete RAG Engine. A confirmation dialog appears.
- Verify that you're about to delete your data in RAG Engine by typing delete, then click Confirm.
- Click Save.
Python
from vertexai import rag
import vertexai

# Replace with your project ID and the region where RAG Engine runs.
PROJECT_ID = "YOUR_PROJECT_ID"
LOCATION = "YOUR_RAG_ENGINE_LOCATION"

# Initialize Vertex AI API once per session
vertexai.init(project=PROJECT_ID, location=LOCATION)

rag_engine_config_name = f"projects/{PROJECT_ID}/locations/{LOCATION}/ragEngineConfig"
new_rag_engine_config = rag.RagEngineConfig(
    name=rag_engine_config_name,
    rag_managed_db_config=rag.RagManagedDbConfig(tier=rag.Unprovisioned()),
)
updated_rag_engine_config = rag.rag_data.update_rag_engine_config(
    rag_engine_config=new_rag_engine_config
)
print(updated_rag_engine_config)
REST
curl -X PATCH \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/ragEngineConfig" \
-d '{"ragManagedDbConfig": {"unprovisioned": {}}}'
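The three REST updates above differ only in the tier key inside ragManagedDbConfig. As a sketch, the body can be generated from a tier name and serialized with json.dumps, which produces strictly valid JSON. tier_patch_body and TIER_KEYS are hypothetical names used only for illustration.

```python
import json

# Tier keys as they appear in the REST request bodies above.
TIER_KEYS = ("scaled", "basic", "unprovisioned")


def tier_patch_body(tier):
    """Return the PATCH body that selects the given tier (hypothetical helper)."""
    key = tier.lower()
    if key not in TIER_KEYS:
        raise ValueError(f"Unknown tier {tier!r}; expected one of {TIER_KEYS}")
    # Each tier is selected by an empty object under its key.
    return {"ragManagedDbConfig": {key: {}}}


for tier in TIER_KEYS:
    print(json.dumps(tier_patch_body(tier)))
```

Rejecting unknown tier names locally avoids sending a request that selects no tier at all, which matters most for the Unprovisioned tier because that update deletes data.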
What's next
- To learn more about supported generation models, see Generative AI models that support RAG.
- To learn more about supported embedding models, see Embedding models.
- To learn more about open models, see Open models.
- To learn more about RAG Engine, see RAG Engine overview.