This guide shows you how to ingest data into Vertex AI RAG Engine from various supported sources.

This page covers the following topics:

- Supported data sources for RAG
- Data deduplication
- Understand import failures
- Import files from Cloud Storage or Google Drive
- Import files from Slack
- Import files from Jira
- Import files from SharePoint
Supported data sources for RAG

The Import RagFiles API provides data connectors for the following data sources. For more information, see the RAG API reference.

| Option | Description | Use case |
| --- | --- | --- |
| Upload a local file | Synchronous, single-file upload directly from your local machine. | Quick testing and importing individual small files (up to 25 MB). |
| Cloud Storage | Asynchronously import one or more files stored in a Cloud Storage bucket. | Batch processing of large files or a large number of files already in cloud storage. |
| Google Drive | Asynchronously import files from a specified Google Drive folder. | Ingesting documents and collaborative files directly from a user's or shared drive. |
| Slack | Ingests conversations and files from specified Slack channels using a data connector. | Building a knowledge base from team communications and shared resources in Slack. |
| Jira | Ingests issues, comments, and attachments from Jira projects or custom JQL queries. | Creating a searchable index of project management data, bug reports, and documentation from Jira. |
| SharePoint | Ingests files and documents from a SharePoint site, drive, or folder. | Integrating enterprise documents, reports, and collaborative content stored in SharePoint. |
Data deduplication

If you import the same file multiple times without any changes, Vertex AI RAG Engine skips the file because it already exists. The response.skipped_rag_files_count field in the response indicates the number of files that were skipped during the import process.

A file is skipped if all of the following conditions are met:

- The file was previously imported.
- The file's path and filename haven't changed.
- The file's version_id, a hash calculated from the file's content, hasn't changed.
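As a minimal sketch, you can check this count after an import. This assumes response comes from rag.import_files, as in the samples later on this page; all resource names are illustrative placeholders:

from vertexai.preview import rag

# Sketch: count deduplicated (skipped) files after an import.
# The corpus name and bucket path are illustrative placeholders.
response = rag.import_files(
    corpus_name="projects/my-project/locations/us-central1/ragCorpora/my-corpus-1",
    paths=["gs://my-bucket/my-folder"],
)
print(f"Skipped files: {response.skipped_rag_files_count}")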
Understand import failures

To investigate import failures, you can review the response metadata or configure an import result sink to store detailed logs.

Response metadata

The response.metadata object in the SDK lets you view the import results, the request time, and the response time.
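A quick sketch of inspecting it, assuming response comes from a rag.import_files call as in the samples on this page:

# Sketch: view import results and timing from the response metadata.
print(response.metadata)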
Import result sink

For detailed results on both successful and failed file imports, specify the optional import_result_sink parameter. This parameter sets a destination for the import logs, which helps you identify which files failed and why. The import_result_sink must be a Cloud Storage path or a BigQuery table:

- Cloud Storage: Specify a path in the format gs://my-bucket/my/object.ndjson. The object must not exist before the import. After the job completes, this file contains one JSON object per line, detailing the operation ID, timestamp, filename, status, and file ID for each imported file.
- BigQuery: Specify a table in the format bq://my-project.my-dataset.my-table. If the table doesn't exist, it is created. If it exists, its schema is verified. You can reuse the same table for multiple imports.
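To triage a Cloud Storage result file, you can parse it line by line. The following is a sketch only; the field names ("status", "fileName") are assumptions based on the description above, not a documented schema:

import json

# Sketch: summarize failures from an import result file downloaded from
# gs://my-bucket/my/object.ndjson. Field names are illustrative assumptions.
failed = []
with open("object.ndjson") as f:
    for line in f:
        record = json.loads(line)
        if record.get("status") != "SUCCESS":  # illustrative status value
            failed.append(record.get("fileName"))
print(f"{len(failed)} file(s) failed: {failed}")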
Import files from Cloud Storage or Google Drive

This section shows you how to import files from Cloud Storage or Google Drive. Before you import files from Google Drive, you must grant the required permissions.

Grant Google Drive permissions

To import files from Google Drive, you must grant the Viewer role to the Vertex AI RAG Data Service Agent service account for the Google Drive folder or file. If you don't grant the correct permissions, the import fails without an error message.

To grant permissions, share the Google Drive folder or file with the service account and grant it the Viewer role. The Google Drive resource ID can be found in the web URL.

For more information on file size limits, see Supported document types.
Import the files

To import files from Cloud Storage or Google Drive:

1. Create a corpus by following the instructions at Create a RAG corpus.
2. Import your files from Cloud Storage or Google Drive by using the import template, as shown in the following sketch.
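This is a minimal sketch of step 2 using the Python SDK. It assumes the paths parameter of rag.import_files accepts both Cloud Storage URIs and Google Drive URLs; all resource names are illustrative placeholders:

from vertexai.preview import rag

# Sketch: import files from Cloud Storage and Google Drive into a corpus.
# The corpus name, bucket path, and Drive URL are illustrative placeholders.
response = rag.import_files(
    corpus_name="projects/my-project/locations/us-central1/ragCorpora/my-corpus-1",
    paths=[
        "gs://my-bucket/my-folder",
        "https://drive.google.com/drive/folders/my-folder-id",
    ],
    chunk_size=512,
    chunk_overlap=100,
)
print(f"Imported {response.imported_rag_files_count} file(s).")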
The system automatically checks your file's path, filename, and version_id. The version_id is a file hash that's calculated using the file's content, which prevents the file from being reindexed. If a file with the same filename and path has a content update, the file is reindexed.
Import files from Slack
To import files from Slack, follow these steps:

1. Create and set up a Slack app:
   a. Add the following permissions:
      - channels:history
      - groups:history
      - im:history
      - mpim:history
   b. Click Install to Workspace to install the app into your Slack workspace.
   c. Copy your API token.
2. Add your API token to Secret Manager.
3. Grant the Secret Manager Secret Accessor role to your project's Vertex AI RAG Engine service account so it can access the secret. A sketch of steps 2 and 3 with gcloud follows this list.
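The following is a sketch of steps 2 and 3 using gcloud. The secret name and the service-agent address are assumptions; verify the actual Vertex AI RAG Engine service account on your project's IAM page:

# Store the Slack API token in Secret Manager (secret name is illustrative).
printf 'SLACK_API_TOKEN' | gcloud secrets create slack-api-token --data-file=-

# Grant the Secret Manager Secret Accessor role to the service agent.
# The member address below is an assumption; confirm it for your project.
gcloud secrets add-iam-policy-binding slack-api-token \
  --member="serviceAccount:service-PROJECT_NUMBER@gcp-sa-vertex-rag.iam.gserviceaccount.com" \
  --role="roles/secretmanager.secretAccessor"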
The following samples show how to import files from your Slack resources.

curl

To get messages from a specific channel, change the CHANNEL_ID to the ID of your Slack channel.
API_KEY_SECRET_VERSION=SLACK_API_KEY_SECRET_VERSION
CHANNEL_ID=SLACK_CHANNEL_ID
PROJECT_ID=PROJECT_ID
REGION=us-central1

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${ENDPOINT}/v1beta1/projects/${PROJECT_ID}/locations/${REGION}/ragCorpora/${RAG_CORPUS_ID}/ragFiles:import \
-d '{
  "import_rag_files_config": {
    "slack_source": {
      "channels": [
        {
          "apiKeyConfig": {
            "apiKeySecretVersion": "'"${API_KEY_SECRET_VERSION}"'"
          },
          "channels": [
            {
              "channel_id": "'"${CHANNEL_ID}"'"
            }
          ]
        }
      ]
    }
  }
}'
Python

To get messages for a given time range or from a specific channel, change any of the following fields: START_TIME, END_TIME, CHANNEL1 or CHANNEL2, and api_key1 or api_key2.

# Slack example
from vertexai.preview import rag
from google.protobuf import timestamp_pb2

start_time = timestamp_pb2.Timestamp()
start_time.GetCurrentTime()
end_time = timestamp_pb2.Timestamp()
end_time.GetCurrentTime()

source = rag.SlackChannelsSource(
    channels=[
        rag.SlackChannel("CHANNEL1", "api_key1"),
        rag.SlackChannel("CHANNEL2", "api_key2", start_time, end_time),
    ],
)

response = rag.import_files(
    corpus_name="projects/my-project/locations/us-central1/ragCorpora/my-corpus-1",
    source=source,
    chunk_size=512,
    chunk_overlap=100,
)
Import files from Jira

To import files from Jira, follow these steps:

1. Create a Jira API key for your Atlassian account.
2. Add the API key to Secret Manager.
3. Grant the Secret Manager Secret Accessor role to your project's Vertex AI RAG Engine service account so it can access the secret.

Specify either projects or customQueries with your request. When you import projects, the value is expanded into a query that gets the entire project (for example, MyProject becomes project = MyProject). To learn more about custom queries, see Use advanced search with Jira Query Language (JQL).
curl

EMAIL=JIRA_EMAIL
API_KEY_SECRET_VERSION=JIRA_API_KEY_SECRET_VERSION
SERVER_URI=JIRA_SERVER_URI
CUSTOM_QUERY=JIRA_CUSTOM_QUERY
JIRA_PROJECT=JIRA_PROJECT
PROJECT_ID=PROJECT_ID
REGION=us-central1

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${ENDPOINT}/v1beta1/projects/${PROJECT_ID}/locations/${REGION}/ragCorpora/${RAG_CORPUS_ID}/ragFiles:import \
-d '{
  "import_rag_files_config": {
    "jiraSource": {
      "jiraQueries": [{
        "projects": ["'"${JIRA_PROJECT}"'"],
        "customQueries": ["'"${CUSTOM_QUERY}"'"],
        "email": "'"${EMAIL}"'",
        "serverUri": "'"${SERVER_URI}"'",
        "apiKeyConfig": {
          "apiKeySecretVersion": "'"${API_KEY_SECRET_VERSION}"'"
        }
      }]
    }
  }
}'
Python

# Jira example
from vertexai.preview import rag

jira_query = rag.JiraQuery(
    email="xxx@yyy.com",
    jira_projects=["project1", "project2"],
    custom_queries=["query1", "query2"],
    api_key="api_key",
    server_uri="server.atlassian.net",
)

source = rag.JiraSource(
    queries=[jira_query],
)

response = rag.import_files(
    corpus_name="projects/my-project/locations/REGION/ragCorpora/my-corpus-1",
    source=source,
    chunk_size=512,
    chunk_overlap=100,
)
Import files from SharePoint

To import files from your SharePoint site, follow these steps:

1. Create an Azure app to access your SharePoint site:
   a. Go to App Registrations and create a new registration.
   b. From the app's Overview section, note the Application (client) ID (used as CLIENT_ID) and the Directory (tenant) ID (used as TENANT_ID).
   c. In the Manage section, configure API permissions:
      - Add the SharePoint Sites.Read.All permission.
      - Add the Files.Read.All and Browser SiteLists.Read.All permissions.
   d. In the Manage section, go to Certificates & secrets to create a new client secret.
2. Add the client secret value to Secret Manager. You will use the secret's resource name as the API_KEY_SECRET_VERSION.
3. Grant the Secret Manager Secret Accessor role to your project's Vertex AI RAG Engine service account.
4. Use {YOUR_ORG_ID}.sharepoint.com as the SHAREPOINT_SITE_NAME.
5. Specify a drive name or drive ID in the SharePoint site in the request.
6. Optional: Specify a folder path or folder ID on the drive. If you don't specify a folder, all folders and files on the drive are imported.
curl

CLIENT_ID=SHAREPOINT_CLIENT_ID
API_KEY_SECRET_VERSION=SHAREPOINT_API_KEY_SECRET_VERSION
TENANT_ID=SHAREPOINT_TENANT_ID
SITE_NAME=SHAREPOINT_SITE_NAME
FOLDER_PATH=SHAREPOINT_FOLDER_PATH
DRIVE_NAME=SHAREPOINT_DRIVE_NAME
PROJECT_ID=PROJECT_ID
REGION=us-central1

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${ENDPOINT}/v1beta1/projects/${PROJECT_ID}/locations/${REGION}/ragCorpora/${RAG_CORPUS_ID}/ragFiles:import \
-d '{
  "import_rag_files_config": {
    "sharePointSources": {
      "sharePointSource": [{
        "clientId": "'"${CLIENT_ID}"'",
        "apiKeyConfig": {
          "apiKeySecretVersion": "'"${API_KEY_SECRET_VERSION}"'"
        },
        "tenantId": "'"${TENANT_ID}"'",
        "sharepointSiteName": "'"${SITE_NAME}"'",
        "sharepointFolderPath": "'"${FOLDER_PATH}"'",
        "driveName": "'"${DRIVE_NAME}"'"
      }]
    }
  }
}'
Python

from vertexai.preview import rag
from vertexai.preview.rag.utils import resources

CLIENT_ID = "SHAREPOINT_CLIENT_ID"
API_KEY_SECRET_VERSION = "SHAREPOINT_API_KEY_SECRET_VERSION"
TENANT_ID = "SHAREPOINT_TENANT_ID"
SITE_NAME = "SHAREPOINT_SITE_NAME"
FOLDER_PATH = "SHAREPOINT_FOLDER_PATH"
DRIVE_NAME = "SHAREPOINT_DRIVE_NAME"

# SharePoint example.
source = resources.SharePointSources(
    share_point_sources=[
        resources.SharePointSource(
            client_id=CLIENT_ID,
            client_secret=API_KEY_SECRET_VERSION,
            tenant_id=TENANT_ID,
            sharepoint_site_name=SITE_NAME,
            folder_path=FOLDER_PATH,
            drive_name=DRIVE_NAME,
        )
    ]
)

response = rag.import_files(
    corpus_name="projects/my-project/locations/REGION/ragCorpora/my-corpus-1",
    source=source,
    chunk_size=512,
    chunk_overlap=100,
)
What's next
Use data ingestion with Vertex AI RAG Engine