This page describes refreshing structured and unstructured data.
To refresh your website apps, see Refresh your web page.
Refresh structured data
You can refresh the data in a structured data store as long as you use a schema that is the same as, or backward compatible with, the schema in the data store. For example, adding only new fields to an existing schema is backward compatible.
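As an illustration, a backward-compatible update keeps every existing field intact and only adds new ones. Here is a minimal sketch of such a check in Python (the schemas and field names are made up for illustration; this is not an Agent Builder API):

```python
# Sketch: a new schema is backward compatible here if every existing
# field is still present with the same type. Field names are hypothetical.
def is_backward_compatible(old_schema: dict, new_schema: dict) -> bool:
    return all(
        name in new_schema and new_schema[name] == field_type
        for name, field_type in old_schema.items()
    )

old = {"title": "string", "price": "number"}
new_ok = {"title": "string", "price": "number", "rating": "number"}  # adds a field
new_bad = {"title": "string"}  # drops a field

print(is_backward_compatible(old, new_ok))   # True
print(is_backward_compatible(old, new_bad))  # False
```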
You can refresh structured data in the Google Cloud console or using the API.
Console
To use the Google Cloud console to refresh structured data from a branch of a data store, follow these steps:
In the Google Cloud console, go to the Agent Builder page.
In the navigation menu, click Data Stores.
In the Name column, click the data store that you want to edit.
On the Documents tab, click Import data.
To refresh from Cloud Storage:
- In the Select a data source pane, select Cloud Storage.
- In the Import data from Cloud Storage pane, click Browse, select the bucket that contains your refreshed data, and then click Select. Alternatively, enter the bucket location directly in the gs:// field.
- Under Data Import Options, select an import option.
- Click Import.
To refresh from BigQuery:
- In the Select a data source pane, select BigQuery.
- In the Import data from BigQuery pane, click Browse, select a table that contains your refreshed data, and then click Select. Alternatively, enter the table location directly in the BigQuery path field.
- Under Data Import Options, select an import option.
- Click Import.
REST
Use the documents.import method to refresh your data, specifying the appropriate reconciliationMode value.
To refresh structured data from BigQuery using the command line, follow these steps:
Find your data store ID. If you already have your data store ID, skip to the next step.
In the Google Cloud console, go to the Agent Builder page and in the navigation menu, click Data Stores.
Click the name of your data store.
On the Data page for your data store, get the data store ID.
Import your structured data using BigQuery.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
  -d '{
    "bigquerySource": {
      "projectId": "PROJECT_ID",
      "datasetId": "DATASET_ID",
      "tableId": "TABLE_ID",
      "dataSchema": "DATA_SCHEMA"
    },
    "reconciliationMode": "RECONCILIATION_MODE",
    "autoGenerateIds": AUTO_GENERATE_IDS,
    "idField": "ID_FIELD",
    "errorConfig": {
      "gcsPrefix": "ERROR_DIRECTORY"
    }
  }'
Replace the following:
- PROJECT_ID: The ID of your project.
- DATA_STORE_ID: The ID of your data store.
- DATASET_ID: The name of your BigQuery dataset.
- TABLE_ID: The name of your BigQuery table.
- DATA_SCHEMA: Optional. Values are document and custom. The default is document.
  - If you specify document, the BigQuery table that you use must conform to the following default BigQuery schema. You can define the ID of each document yourself, while wrapping all of the data in the jsonData string.
  - If you specify custom, any BigQuery table schema is accepted, and Vertex AI Agent Builder automatically generates the IDs for each document that is imported.
- ERROR_DIRECTORY: Optional. A Cloud Storage directory for error information about the import, for example, gs://<your-gcs-bucket>/directory/import_errors. Google recommends leaving this field empty to let Vertex AI Agent Builder automatically create a temporary directory.
- RECONCILIATION_MODE: Optional. Values are FULL and INCREMENTAL. The default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from BigQuery to your data store. This does an upsert operation, which adds new documents and replaces existing documents with updated documents that have the same ID. Specifying FULL causes a full rebase of the documents in your data store. In other words, new and updated documents are added to your data store, and documents that are not in BigQuery are removed from your data store. The FULL mode is helpful if you want to automatically delete documents that you no longer need.
- AUTO_GENERATE_IDS: Optional. Specifies whether to automatically generate document IDs. If set to true, document IDs are generated based on a hash of the payload. Note that generated document IDs might not remain consistent over multiple imports. If you auto-generate IDs over multiple imports, Google highly recommends setting reconciliationMode to FULL to maintain consistent document IDs. Specify autoGenerateIds only when bigquerySource.dataSchema is set to custom; otherwise an INVALID_ARGUMENT error is returned. If you don't specify autoGenerateIds or set it to false, you must specify idField; otherwise the documents fail to import.
- ID_FIELD: Optional. Specifies which fields are the document IDs. For BigQuery source files, idField indicates the name of the column in the BigQuery table that contains the document IDs. Specify idField only when: (1) bigquerySource.dataSchema is set to custom, and (2) autoGenerateIds is set to false or is unspecified. Otherwise an INVALID_ARGUMENT error is returned. Note that the value of the BigQuery column must be of string type, must be between 1 and 63 characters, and must conform to RFC-1034. Otherwise, the documents fail to import.
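The difference between the two reconciliation modes can be sketched with plain Python dictionaries standing in for the data store and the BigQuery source (this is a simulation of the semantics, not the service's implementation):

```python
def refresh(store: dict, source: dict, mode: str) -> dict:
    """Simulate a refresh; documents are keyed by ID."""
    if mode == "INCREMENTAL":
        # Upsert: add new documents, replace existing documents
        # that have the same ID, keep everything else.
        return {**store, **source}
    if mode == "FULL":
        # Rebase: the store ends up containing exactly the source;
        # documents absent from the source are removed.
        return dict(source)
    raise ValueError(f"unknown mode: {mode}")

store = {"doc1": "v1", "doc2": "v1"}
source = {"doc2": "v2", "doc3": "v1"}

print(refresh(store, source, "INCREMENTAL"))  # {'doc1': 'v1', 'doc2': 'v2', 'doc3': 'v1'}
print(refresh(store, source, "FULL"))         # {'doc2': 'v2', 'doc3': 'v1'}
```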
Here is the default BigQuery schema. Your BigQuery table must conform to this schema when you set dataSchema to document.
[
  {
    "name": "id",
    "mode": "REQUIRED",
    "type": "STRING",
    "fields": []
  },
  {
    "name": "jsonData",
    "mode": "NULLABLE",
    "type": "STRING",
    "fields": []
  }
]
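When dataSchema is document, each BigQuery row supplies its own ID and wraps the document's data in the jsonData string. A minimal sketch of preparing such rows before loading them into BigQuery (the document contents are made up for illustration):

```python
import json

def to_document_row(doc_id: str, data: dict) -> dict:
    # Matches the default schema above: a REQUIRED "id" string and a
    # NULLABLE "jsonData" string wrapping all of the document's data.
    return {"id": doc_id, "jsonData": json.dumps(data)}

row = to_document_row("product-001", {"title": "Blue bike", "price": 299.99})
print(row["id"])                             # product-001
print(json.loads(row["jsonData"])["title"])  # Blue bike
```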
Refresh unstructured data
You can refresh unstructured data in the Google Cloud console or using the API.
Console
To use the Google Cloud console to refresh unstructured data from a branch of a data store, follow these steps:
In the Google Cloud console, go to the Agent Builder page.
In the navigation menu, click Data Stores.
In the Name column, click the data store that you want to edit.
On the Documents tab, click Import data.
To ingest from a Cloud Storage bucket (with or without metadata):
- In the Select a data source pane, select Cloud Storage.
- In the Import data from Cloud Storage pane, click Browse, select the bucket that contains your refreshed data, and then click Select. Alternatively, enter the bucket location directly in the gs:// field.
- Under Data Import Options, select an import option.
- Click Import.
To ingest from BigQuery:
- In the Select a data source pane, select BigQuery.
- In the Import data from BigQuery pane, click Browse, select a table that contains your refreshed data, and then click Select. Alternatively, enter the table location directly in the BigQuery path field.
- Under Data Import Options, select an import option.
- Click Import.
REST
To refresh unstructured data using the API, re-import it using the documents.import method, specifying the appropriate reconciliationMode value. For more information about importing unstructured data, see Unstructured data.