This page describes how to create a data store for first-party data sources, such as Cloud Storage or Google Drive.
To import data from a third-party data source instead, see Connect a third-party data source.
To create a data store and ingest data, go to the section for the source you plan to use:
- Import from BigQuery
- Import from Cloud Storage
- Sync from Google Drive
- Sync from Gmail (Public preview)
- Sync from Google Sites (Public preview)
- Sync from Google Calendar (Public preview)
- Sync from Google Groups (Public preview)
- Sync people data (Public preview)
- Import from Cloud SQL
- Import from Spanner (Public preview)
- Import from Firestore
- Import from Bigtable (Public preview)
- Import from AlloyDB for PostgreSQL (Public preview)
- Upload structured JSON data with the API
- Create a data store using Terraform
Limitations
If you have CMEK organization policies, you must create new data stores using the API, not the Google Cloud console. Creating new data stores using the Google Cloud console fails if you have CMEK organization policies enabled. For more information about CMEK support for Agentspace Enterprise, see Customer-managed encryption keys.
Import from BigQuery
You can create data stores from BigQuery tables in two ways:
One-time ingestion: You import data from a BigQuery table into a data store. The data in the data store does not change unless you manually refresh the data.
Periodic ingestion: You import data from one or more BigQuery tables, and you set a sync frequency that determines how often the data stores are updated with the most recent data from the BigQuery dataset.
The following table compares the two ways that you can import BigQuery data into Agentspace Enterprise data stores.
One-time ingestion | Periodic ingestion |
---|---|
Generally available (GA). | Public preview. |
Data must be refreshed manually. | Data updates automatically every 1, 3, or 5 days. Data cannot be manually refreshed. |
Agentspace Enterprise creates a single data store from one table in BigQuery. | Agentspace Enterprise creates a data connector for a BigQuery dataset and a data store (called an entity data store) for each table specified. For each data connector, the tables must have the same data type (for example, structured) and be in the same BigQuery dataset. |
Data from multiple tables can be combined in one data store by first ingesting data from one table and then more data from another source or BigQuery table. | Because manual data import is not supported, the data in an entity data store can only be sourced from one BigQuery table. |
Data source access control is supported. | Data source access control is not supported. The imported data can contain access controls but these controls won't be respected. |
You can create a data store using either the Google Cloud console or the API. | You must use the console to create data connectors and their entity data stores. |
CMEK-compliant. | Not CMEK-compliant. |
Import once from BigQuery
To ingest data from a BigQuery table, use the following steps to create a data store and ingest data using either the Google Cloud console or the API.
Before importing your data, review Prepare data for ingesting.
Console
To use the Google Cloud console to ingest data from BigQuery, follow these steps:
In the Google Cloud console, go to the Agentspace page.
Go to the Data Stores page.
Click New data store.
On the Source page, select BigQuery.
Select what kind of data you are importing.
Click One time.
In the BigQuery path field, click Browse, select a table that you have prepared for ingesting, and then click Select. Alternatively, enter the table location directly in the BigQuery path field.
Click Continue.
If you are doing one-time import of structured data:
Map fields to key properties.
If there are important fields missing from the schema, use Add new field to add them.
For more information, see About auto-detect and edit.
Click Continue.
Choose a region for your data store.
Enter a name for your data store.
Click Create.
To check the status of your ingestion, go to the Data Stores page and click your data store name to see details about it on its Data page. When the status column on the Activity tab changes from In progress to Import completed, the ingestion is complete.
Depending on the size of your data, ingestion can take several minutes to several hours.
REST
To use the command line to create a data store and import data from BigQuery, follow these steps.
Create a data store.
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: PROJECT_ID" \
"https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
-d '{
  "displayName": "DATA_STORE_DISPLAY_NAME",
  "industryVertical": "GENERIC",
  "solutionTypes": ["SOLUTION_TYPE_SEARCH"]
}'
Replace the following:
- PROJECT_ID: the ID of your project.
- DATA_STORE_ID: the ID of the data store that you want to create. This ID can contain only lowercase letters, digits, underscores, and hyphens.
- DATA_STORE_DISPLAY_NAME: the display name of the data store that you want to create.
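The creation request above can also be assembled programmatically. The following is a minimal Python sketch, not an official client: the helper name `build_create_request` and the sample values are hypothetical, and the regular expression encodes only the character rule stated above (lowercase letters, digits, underscores, and hyphens). Any other limits that the service enforces are not checked, and sending the request is left out.

```python
import json
import re

# Character rule stated above: lowercase letters, digits, underscores, hyphens.
# Any additional limits enforced by the service are NOT checked here.
_DATA_STORE_ID_RE = re.compile(r"[a-z0-9_-]+")

BASE_URL = "https://discoveryengine.googleapis.com/v1"


def build_create_request(project_id, data_store_id, display_name):
    """Return (url, json_body) for the data store creation call shown above."""
    if not _DATA_STORE_ID_RE.fullmatch(data_store_id):
        raise ValueError(f"invalid data store ID: {data_store_id!r}")
    url = (
        f"{BASE_URL}/projects/{project_id}/locations/global"
        f"/collections/default_collection/dataStores"
        f"?dataStoreId={data_store_id}"
    )
    body = {
        "displayName": display_name,
        "industryVertical": "GENERIC",
        "solutionTypes": ["SOLUTION_TYPE_SEARCH"],
    }
    return url, json.dumps(body)


# Hypothetical sample values, for illustration only.
url, body = build_create_request("my-project", "my_store-1", "My store")
print(url)
print(body)
```

Validating the ID locally only catches the character rule early; the API remains the authority on whether an ID is accepted.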
Optional: If you're uploading unstructured data and want to configure document parsing or to turn on document chunking for RAG, specify the documentProcessingConfig object and include it in your data store creation request. Configuring an OCR parser for PDFs is recommended if you're ingesting scanned PDFs. For how to configure parsing or chunking options, see Parse and chunk documents.
Import data from BigQuery.
If you defined a schema, make sure the data conforms to that schema.
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
-d '{
  "bigquerySource": {
    "projectId": "PROJECT_ID",
    "datasetId": "DATASET_ID",
    "tableId": "TABLE_ID",
    "dataSchema": "DATA_SCHEMA",
    "aclEnabled": "BOOLEAN"
  },
  "reconciliationMode": "RECONCILIATION_MODE",
  "autoGenerateIds": "AUTO_GENERATE_IDS",
  "idField": "ID_FIELD",
  "errorConfig": {
    "gcsPrefix": "ERROR_DIRECTORY"
  }
}'
Replace the following:
- PROJECT_ID: the ID of your project.
- DATA_STORE_ID: the ID of the data store.
- DATASET_ID: the ID of the BigQuery dataset.
- TABLE_ID: the ID of the BigQuery table.
  - If the BigQuery table is not under PROJECT_ID, you need to give the service account service-<project number>@gcp-sa-discoveryengine.iam.gserviceaccount.com "BigQuery Data Viewer" permission for the BigQuery table. For example, if you are importing a BigQuery table from source project "123" to destination project "456", give service-456@gcp-sa-discoveryengine.iam.gserviceaccount.com permissions for the BigQuery table under project "123".
- DATA_SCHEMA: Optional. Values are document and custom. The default is document.
  - document: the BigQuery table that you use must conform to the default BigQuery schema provided in Prepare data for ingesting. You can define the ID of each document yourself, while wrapping all the data in the jsonData string.
  - custom: Any BigQuery table schema is accepted, and Agentspace Enterprise automatically generates the IDs for each document that is imported.
- ERROR_DIRECTORY: Optional. A Cloud Storage directory for error information about the import, for example, gs://<your-gcs-bucket>/directory/import_errors. Google recommends leaving this field empty to let Agentspace Enterprise automatically create a temporary directory.
- RECONCILIATION_MODE: Optional. Values are FULL and INCREMENTAL. Default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from BigQuery to your data store. This does an upsert operation, which adds new documents and replaces existing documents with updated documents with the same ID. Specifying FULL causes a full rebase of the documents in your data store. In other words, new and updated documents are added to your data store, and documents that are not in BigQuery are removed from your data store. The FULL mode is helpful if you want to automatically delete documents that you no longer need.
- AUTO_GENERATE_IDS: Optional. Specifies whether to automatically generate document IDs. If set to true, document IDs are generated based on a hash of the payload. Note that generated document IDs might not remain consistent over multiple imports. If you auto-generate IDs over multiple imports, Google highly recommends setting reconciliationMode to FULL to maintain consistent document IDs.
  Specify autoGenerateIds only when bigquerySource.dataSchema is set to custom. Otherwise an INVALID_ARGUMENT error is returned. If you don't specify autoGenerateIds or set it to false, you must specify idField. Otherwise the documents fail to import.
- ID_FIELD: Optional. Specifies which fields are the document IDs. For BigQuery source files, idField indicates the name of the column in the BigQuery table that contains the document IDs.
  Specify idField only when: (1) bigquerySource.dataSchema is set to custom, and (2) autoGenerateIds is set to false or is unspecified. Otherwise an INVALID_ARGUMENT error is returned.
  The value of the BigQuery column name must be of string type, must be between 1 and 63 characters, and must conform to RFC-1034. Otherwise, the documents fail to import.
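The difference between the two reconciliation modes can be illustrated with a small in-memory model. This is only a sketch of the semantics described above, not how the service is implemented: the data store and the source table are modeled as plain dictionaries keyed by document ID, and the function name `reconcile` and the sample documents are hypothetical.

```python
def reconcile(store, source, mode="INCREMENTAL"):
    """Return the data store contents after an import.

    store and source map document ID -> document content.
    """
    if mode == "INCREMENTAL":
        # Upsert: add new documents, replace documents with the same ID,
        # and keep documents that are absent from the source.
        result = dict(store)
        result.update(source)
        return result
    if mode == "FULL":
        # Full rebase: the store ends up mirroring the source, so documents
        # that are no longer in the source are removed.
        return dict(source)
    raise ValueError(f"unknown reconciliation mode: {mode!r}")


store = {"a": "old A", "b": "old B"}
source = {"a": "new A", "c": "new C"}

print(reconcile(store, source, "INCREMENTAL"))
# {'a': 'new A', 'b': 'old B', 'c': 'new C'}
print(reconcile(store, source, "FULL"))
# {'a': 'new A', 'c': 'new C'}
```

In the model, document "b" survives an INCREMENTAL import but is dropped by a FULL import, which is the behavior to weigh when choosing a mode.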
Connect to BigQuery with periodic syncing
Before importing your data, review Prepare data for ingesting.
The following procedure describes how to create a data connector that associates a BigQuery dataset with an Agentspace Enterprise data connector and how to specify a table on the dataset for each data store you want to create. Data stores that are children of data connectors are called entity data stores.
Data from the dataset is synced periodically to the entity data stores. You can specify synchronization daily, every three days, or every five days.
Console
To use the Google Cloud console to create a connector that periodically syncs data from a BigQuery dataset to Agentspace Enterprise, follow these steps:
In the Google Cloud console, go to the Agentspace page.
In the navigation menu, click Data Stores.
Click Create data store.
On the Source page, select BigQuery.
Select the kind of data that you are importing.
Click Periodic.
Select the Sync frequency, which is how often you want the Agentspace Enterprise connector to sync with the BigQuery dataset. You can change the frequency later.
In the BigQuery dataset path field, click Browse and select the dataset that contains the tables that you have prepared for ingesting. Alternatively, enter the dataset location directly in the BigQuery dataset path field. The format for the path is projectname.datasetname.
In the Tables to sync field, click Browse, and then select a table that contains the data that you want for your data store.
If there are additional tables in the dataset that you want to use for data stores, click Add table and specify those tables too.
Click Continue.
Choose a region for your data store, enter a name for your data connector, and click Create.
You have now created a data connector, which will periodically sync data with the BigQuery dataset. You have also created one or more entity data stores. The data stores have the same names as the BigQuery tables.
To check the status of your ingestion, go to the Data Stores page and click your data connector name to see details about it on the Data ingestion activity tab of its Data page. When the status column on the Data ingestion activity tab changes from In progress to Succeeded, the first ingestion is complete.
Depending on the size of your data, ingestion can take several minutes to several hours.
After you set up your data source and import data the first time, the data store syncs data from that source at a frequency that you select during setup. About an hour after the data connector is created, the first sync occurs. The next sync then occurs around 24 hours, 72 hours, or 120 hours later.
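As a rough illustration of the timing described above, the following Python sketch estimates when syncs occur. The one-hour offset and the one, three, and five day intervals come from the text and are approximate. The function name, the sample creation time, and the assumption that every later sync repeats at the same interval are illustrative only.

```python
from datetime import datetime, timedelta

# Sync frequency options from the text, in days.
ALLOWED_FREQUENCY_DAYS = (1, 3, 5)


def estimated_sync_times(created_at, frequency_days, count=3):
    """Estimate the first `count` sync times for a data connector."""
    if frequency_days not in ALLOWED_FREQUENCY_DAYS:
        raise ValueError("frequency must be 1, 3, or 5 days")
    first = created_at + timedelta(hours=1)  # about an hour after creation
    interval = timedelta(days=frequency_days)
    # Assumption: later syncs keep repeating at the same interval.
    return [first + i * interval for i in range(count)]


# Hypothetical connector created on 2025-01-01 at 09:00, syncing every 3 days.
for t in estimated_sync_times(datetime(2025, 1, 1, 9, 0), 3):
    print(t.isoformat())
# 2025-01-01T10:00:00
# 2025-01-04T10:00:00
# 2025-01-07T10:00:00
```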
Next steps
To attach your data store to an app, create an app and select your data store following the steps in Create a search app.
To preview how your search results appear after your app and data store are set up, see Preview search results.
Import from Cloud Storage
You can create data stores from Cloud Storage data in two ways:
One-time ingestion: You import data from a Cloud Storage folder or file into a data store. The data in the data store doesn't change unless you manually refresh the data.
Periodic ingestion: You import data from a Cloud Storage folder or file, and you set a sync frequency that determines how often the data store is updated with the most recent data from that Cloud Storage location.
The following table compares the two ways that you can import Cloud Storage data into Agentspace Enterprise data stores.
One-time ingestion | Periodic ingestion |
---|---|
Generally available (GA). | Public preview. |
Data must be refreshed manually. | Data updates automatically every one, three, or five days. Data cannot be manually refreshed. |
Agentspace Enterprise creates a single data store from one folder or file in Cloud Storage. | Agentspace Enterprise creates a data connector, and associates a data store (called an entity data store) with it for the file or folder that is specified. Each Cloud Storage data connector can have a single entity data store. |
Data from multiple files, folders, and buckets can be combined in one data store by first ingesting data from one Cloud Storage location and then more data from another location. | Because manual data import is not supported, the data in an entity data store can only be sourced from one Cloud Storage file or folder. |
Data source access control is supported. For more information, see Data source access control. | Data source access control is not supported. The imported data can contain access controls but these controls won't be respected. |
You can create a data store using either the Google Cloud console or the API. | You must use the console to create data connectors and their entity data stores. |
CMEK-compliant. | Not CMEK-compliant. |
Import once from Cloud Storage
To ingest data from Cloud Storage, use the following steps to create a data store and ingest data using either the Google Cloud console or the API.
Before importing your data, review Prepare data for ingesting.
Console
To use the console to ingest data from a Cloud Storage bucket, follow these steps:
In the Google Cloud console, go to the Agentspace page.
Go to the Data Stores page.
Click New data store.
On the Source page, select Cloud Storage.
In the Select a folder or file you want to import section, select Folder or File.
Click Browse and choose the data you have prepared for ingesting, and then click Select. Alternatively, enter the location directly in the gs:// field.
Select what kind of data you are importing.
Click Continue.
If you are doing one-time import of structured data:
Map fields to key properties.
If there are important fields missing from the schema, use Add new field to add them.
For more information, see About auto-detect and edit.
Click Continue.
Choose a region for your data store.
Enter a name for your data store.
Optional: If you selected unstructured documents, you can select parsing and chunking options for your documents. To compare parsers, see Parse documents. For information about chunking see Chunk documents for RAG.
The OCR parser and layout parser can incur additional costs. See Document AI feature pricing.
To select a parser, expand Document processing options and specify the parser options that you want to use.
Click Create.
To check the status of your ingestion, go to the Data Stores page and click your data store name to see details about it on its Data page. When the status column on the Activity tab changes from In progress to Import completed, the ingestion is complete.
Depending on the size of your data, ingestion can take several minutes to several hours.
REST
To use the command line to create a data store and ingest data from Cloud Storage, follow these steps.
Create a data store.
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: PROJECT_ID" \
"https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
-d '{
  "displayName": "DATA_STORE_DISPLAY_NAME",
  "industryVertical": "GENERIC",
  "solutionTypes": ["SOLUTION_TYPE_SEARCH"],
  "contentConfig": "CONTENT_REQUIRED"
}'
Replace the following:
- PROJECT_ID: the ID of your project.
- DATA_STORE_ID: the ID of the data store that you want to create. This ID can contain only lowercase letters, digits, underscores, and hyphens.
- DATA_STORE_DISPLAY_NAME: the display name of the data store that you want to create.
Optional: To configure document parsing or to turn on document chunking for RAG, specify the documentProcessingConfig object and include it in your data store creation request. Configuring an OCR parser for PDFs is recommended if you're ingesting scanned PDFs. For how to configure parsing or chunking options, see Parse and chunk documents.
Import data from Cloud Storage.
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
-d '{
  "gcsSource": {
    "inputUris": ["INPUT_FILE_PATTERN_1", "INPUT_FILE_PATTERN_2"],
    "dataSchema": "DATA_SCHEMA"
  },
  "reconciliationMode": "RECONCILIATION_MODE",
  "autoGenerateIds": "AUTO_GENERATE_IDS",
  "idField": "ID_FIELD",
  "errorConfig": {
    "gcsPrefix": "ERROR_DIRECTORY"
  }
}'
Replace the following:
- PROJECT_ID: the ID of your project.
- DATA_STORE_ID: the ID of the data store.
- INPUT_FILE_PATTERN: A file pattern in Cloud Storage containing your documents.
  For structured data or for unstructured data with metadata, an example of the input file pattern is gs://<your-gcs-bucket>/directory/object.json and an example of pattern matching one or more files is gs://<your-gcs-bucket>/directory/*.json.
  For unstructured documents, an example is gs://<your-gcs-bucket>/directory/*.pdf. Each file that is matched by the pattern becomes a document.
  If <your-gcs-bucket> is not under PROJECT_ID, you need to give the service account service-<project number>@gcp-sa-discoveryengine.iam.gserviceaccount.com "Storage Object Viewer" permissions for the Cloud Storage bucket. For example, if you are importing a Cloud Storage bucket from source project "123" to destination project "456", give service-456@gcp-sa-discoveryengine.iam.gserviceaccount.com permissions on the Cloud Storage bucket under project "123".
- DATA_SCHEMA: Optional. Values are document, custom, csv, and content. The default is document.
  - document: Upload unstructured data with metadata for unstructured documents. Each line of the file has to follow one of the following formats. You can define the ID of each document:
    { "id": "<your-id>", "jsonData": "<JSON string>", "content": { "mimeType": "<application/pdf or text/html>", "uri": "gs://<your-gcs-bucket>/directory/filename.pdf" } }
    { "id": "<your-id>", "structData": <JSON object>, "content": { "mimeType": "<application/pdf or text/html>", "uri": "gs://<your-gcs-bucket>/directory/filename.pdf" } }
  - custom: Upload JSON for structured documents. The data is organized according to a schema. You can specify the schema; otherwise it is auto-detected. You can put the JSON string of the document in a consistent format directly in each line, and Agentspace Enterprise automatically generates the IDs for each document imported.
  - content: Upload unstructured documents (PDF, HTML, DOC, TXT, PPTX). The ID of each document is automatically generated as the first 128 bits of SHA256(GCS_URI) encoded as a hex string. You can specify multiple input file patterns as long as the matched files don't exceed the 100K files limit.
  - csv: Include a header row in your CSV file, with each header mapped to a document field. Specify the path to the CSV file using the inputUris field.
- ERROR_DIRECTORY: Optional. A Cloud Storage directory for error information about the import, for example, gs://<your-gcs-bucket>/directory/import_errors. Google recommends leaving this field empty to let Agentspace Enterprise automatically create a temporary directory.
- RECONCILIATION_MODE: Optional. Values are FULL and INCREMENTAL. Default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from Cloud Storage to your data store. This does an upsert operation, which adds new documents and replaces existing documents with updated documents with the same ID. Specifying FULL causes a full rebase of the documents in your data store. In other words, new and updated documents are added to your data store, and documents that are not in Cloud Storage are removed from your data store. The FULL mode is helpful if you want to automatically delete documents that you no longer need.
- AUTO_GENERATE_IDS: Optional. Specifies whether to automatically generate document IDs. If set to true, document IDs are generated based on a hash of the payload. Note that generated document IDs might not remain consistent over multiple imports. If you auto-generate IDs over multiple imports, Google highly recommends setting reconciliationMode to FULL to maintain consistent document IDs.
  Specify autoGenerateIds only when gcsSource.dataSchema is set to custom or csv. Otherwise an INVALID_ARGUMENT error is returned. If you don't specify autoGenerateIds or set it to false, you must specify idField. Otherwise the documents fail to import.
- ID_FIELD: Optional. Specifies which fields are the document IDs. For Cloud Storage source documents, idField specifies the name in the JSON fields that are document IDs. For example, if {"my_id":"some_uuid"} is the document ID field in one of your documents, specify "idField":"my_id". This identifies all JSON fields with the name "my_id" as document IDs.
  Specify this field only when: (1) gcsSource.dataSchema is set to custom or csv, and (2) autoGenerateIds is set to false or is unspecified. Otherwise an INVALID_ARGUMENT error is returned.
  Note that the value of the JSON field specified by idField must be of string type, must be between 1 and 63 characters, and must conform to RFC-1034. Otherwise, the documents fail to import.
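Two of the rules above lend themselves to a short illustration: how a document ID is derived for the content schema, and which combinations of autoGenerateIds and idField are accepted. The following Python sketch is an interpretation of the text, not the service's implementation. The function names and the sample bucket path are hypothetical, hashing the URI as UTF-8 bytes is an assumption, and an explicit false for autoGenerateIds is treated the same as leaving it unspecified.

```python
import hashlib


def content_document_id(gcs_uri):
    """First 128 bits of SHA256(gcs_uri) as a hex string (32 hex characters).

    Assumption: the URI is hashed as UTF-8 bytes; the text does not say
    which byte encoding the service uses.
    """
    digest = hashlib.sha256(gcs_uri.encode("utf-8")).digest()
    return digest[:16].hex()  # 16 bytes = 128 bits


def check_id_options(data_schema, auto_generate_ids=False, id_field=None):
    """Raise ValueError for combinations that the text says are rejected."""
    if data_schema not in ("custom", "csv"):
        # autoGenerateIds and idField apply only to custom and csv schemas.
        if auto_generate_ids or id_field is not None:
            raise ValueError("autoGenerateIds and idField need custom or csv")
        return
    if auto_generate_ids and id_field is not None:
        raise ValueError("idField requires autoGenerateIds false or unset")
    if not auto_generate_ids and id_field is None:
        raise ValueError("idField is required when IDs are not auto-generated")


# Hypothetical object path, for illustration only.
doc_id = content_document_id("gs://example-bucket/directory/report.pdf")
print(len(doc_id))  # 32

check_id_options("custom", id_field="my_id")     # accepted
check_id_options("csv", auto_generate_ids=True)  # accepted
```

Running a check like this before sending the import request can surface an invalid combination without waiting for an INVALID_ARGUMENT response.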
Connect to Cloud Storage with periodic syncing
Before importing your data, review Prepare data for ingesting.
The following procedure describes how to create a data connector that associates a Cloud Storage location with an Agentspace Enterprise data connector and how to specify a folder or file in that location for the data store that you want to create. Data stores that are children of data connectors are called entity data stores.
Data is synced periodically to the entity data store. You can specify synchronization daily, every three days, or every five days.
Console
In the Google Cloud console, go to the Agentspace page.
Go to the Data Stores page.
Click Create data store.
On the Source page, select Cloud Storage.
Select what kind of data you are importing.
Click Periodic.
Select the Synchronization frequency, which is how often you want the Agentspace Enterprise connector to sync with the Cloud Storage location. You can change the frequency later.
In the Select a folder or file you want to import section, select Folder or File.
Click Browse and choose the data you have prepared for ingesting, and then click Select. Alternatively, enter the location directly in the gs:// field.
Click Continue.
Choose a region for your data connector.
Enter a name for your data connector.
Optional: If you selected unstructured documents, you can select parsing and chunking options for your documents. To compare parsers, see Parse documents. For information about chunking see Chunk documents for RAG.
The OCR parser and layout parser can incur additional costs. See Document AI feature pricing.
To select a parser, expand Document processing options and specify the parser options that you want to use.
Click Create.
You have now created a data connector, which will periodically sync data with the Cloud Storage location. You have also created an entity data store, which is named gcs_store.
To check the status of your ingestion, go to the Data Stores page and click your data connector name to see details about it on the Data ingestion activity tab of its Data page. When the status column on the Data ingestion activity tab changes from In progress to succeeded, the first ingestion is complete.
Depending on the size of your data, ingestion can take several minutes to several hours.
After you set up your data source and import data the first time, data is synced from that source at a frequency that you select during setup. About an hour after the data connector is created, the first sync occurs. The next sync then occurs around 24 hours, 72 hours, or 120 hours later.
Next steps
To attach your data store to an app, create an app and select your data store following the steps in Create a search app.
To preview how your search results appear after your app and data store are set up, see Preview search results.
Connect to Google Drive
To search data from Google Drive, use the following steps to create a connector using the Google Cloud console.
Before you begin:
You must be signed into the Google Cloud console with the same account that you use for the Google Drive instance that you plan to connect. Agentspace Enterprise uses your Google Workspace customer ID to connect to Google Drive.
Set up access control for Google Drive. For information about setting up access control, see Use data source access control.
Advanced Google Drive search is in Private preview. This feature is a prerequisite for using search summarization and search with follow-ups with a Google Drive data store. To use this feature, follow the steps in Use advanced drive indexing instead.
Console
To use the console to make Google Drive data searchable, follow these steps:
In the Google Cloud console, go to the Agentspace page.
Go to the Data Stores page.
Click New data store.
On the Source page, select Google Drive.
Choose a region for your data store.
Enter a name for your data store.
Click Create.
Use advanced drive indexing (Private preview)
Advanced drive indexing is in Private preview.
Follow this procedure if you plan to use Google Drive with search summarization and search with follow-ups.
Before you begin:
- You must be a Google Workspace super administrator to turn on advanced drive indexing. This is because with advanced drive indexing, Agentspace Enterprise indexes Google Drive data.
- You must be added to the allowlist to use this feature.
Console
To use the console to create a Google Drive data store with advanced Google Drive indexing, follow these steps:
In the Google Cloud console, go to the Agentspace page.
Go to the Data Stores page.
Click New data store.
On the Source page, select Google Drive.
Select Advanced drive indexing.
Enter your Google Workspace email address.
In the Set up domain wide delegation section, review the instructions and take note of the service account client ID provided in step 4 of that section.
Set up domain wide delegation:
- Go to the Domain-wide delegation page of the Google Workspace Admin console and sign in with your super administrator account.
- Click Add new.
- Enter the service account client ID that you took note of. (This ID is provided in the instructions in the Agentspace console in the Set up domain wide delegation section.)
Enter the following OAuth scopes.
https://www.googleapis.com/auth/drive.readonly, https://www.googleapis.com/auth/admin.directory.user.readonly, https://www.googleapis.com/auth/admin.directory.group.readonly, https://www.googleapis.com/auth/admin.directory.domain.readonly, https://www.googleapis.com/auth/admin.reports.audit.readonly
Click Authorize.
In the Agentspace console, click Continue.
Choose a region for your data store.
Enter a name for your data store.
Click Create. Depending on the size of your data, ingestion can take several minutes to several hours. Wait at least an hour before using your data store for searching.
Next steps
To attach your data store to an app, create an app and select your data store following the steps in Create a search app.
To preview how your search results appear after your app and data store are set up, see Preview search results.
Connect to Gmail
To search data from Gmail, use the following steps to create a data store and ingest data using the Google Cloud console.
Before you begin:
You must be signed into the Google Cloud console with the same account that you use for the Google Workspace instance that you plan to connect. Agentspace Enterprise uses your Google Workspace customer ID to connect to Gmail.
Set up access control for Gmail. For information about setting up access control, see Use data source access control.
Console
To use the console to make Gmail data searchable, follow these steps:
In the Google Cloud console, go to the Agentspace page.
Go to the Data Stores page.
Click New data store.
On the Source page, select Google Gmail.
Choose a region for your data store.
Enter a name for your data store.
Click Create.
Next steps
To attach your data store to an app, create an app and select your data store following the steps in Create a search app.
To preview how your search results appear after your app and data store are set up, see Preview search results.
Connect to Google Sites
To search data from Google Sites, use the following steps to create a connector using the Google Cloud console.
Before you begin:
You must be signed into the Google Cloud console with the same account that you use for the Google Workspace instance that you plan to connect. Agentspace Enterprise uses your Google Workspace customer ID to connect to Google Sites.
Set up access control for Google Sites. For information about setting up access control, see Use data source access control.
Console
To use the console to make Google Sites data searchable, follow these steps:
In the Google Cloud console, go to the Agentspace page.
Go to the Data Stores page.
Click New data store.
On the Source page, select Google Sites.
Choose a region for your data store.
Enter a name for your data store.
Click Create.
Next steps
To attach your data store to an app, create an app and select your data store following the steps in Create a search app.
To preview how your search results appear after your app and data store are set up, see Preview search results.
Connect to Google Calendar
To search data from Google Calendar, use the following steps to create a connector using the Google Cloud console.
Before you begin:
You must be signed into the Google Cloud console with the same account that you use for the Google Workspace instance that you plan to connect. Agentspace Enterprise uses your Google Workspace customer ID to connect to Google Calendar.
Set up access control for Google Calendar. For information about setting up access control, see Use data source access control.
Console
To use the console to make Google Calendar data searchable, follow these steps:
In the Google Cloud console, go to the Agentspace page.
Go to the Data Stores page.
Click New data store.
On the Source page, select Google Calendar.
Choose a region for your data store.
Enter a name for your data store.
Click Create.
Next steps
To attach your data store to an app, create an app and select your data store following the steps in Create a search app.
To preview how your search results appear after your app and data store are set up, see Preview search results.
Connect to Google Groups
To search data from Google Groups, use the following steps to create a connector using the Google Cloud console.
Before you begin:
You must be signed in to the Google Cloud console with the same account that you use for the Google Workspace instance that you plan to connect. Agentspace Enterprise uses your Google Workspace customer ID to connect to Google Groups.
Set up access control for Google Groups. For information about setting up access control, see Use data source access control.
Console
To use the console to make Google Groups data searchable, follow these steps:
In the Google Cloud console, go to the Agentspace page.
Go to the Data Stores page.
Click New data store.
On the Source page, select Google Groups.
Choose a region for your data store.
Enter a name for your data store.
Click Create. Depending on the size of your data, ingestion can take several minutes to several hours. Wait at least an hour before using your data store for searching.
Next steps
To attach your data store to an app, create an app and select your data store following the steps in Create a search app.
To preview how your search results appear after your app and data store are set up, see Preview search results.
Sync people data from Google Workspace
You can set up people search for your work teams by syncing people data from Google Workspace. This data continuously syncs to Agentspace Enterprise after you create your data store.
People from your directory are shown in search results as cards that each display a person's available profile information, such as name, email address, organization, and profile picture. You can click a card to see that person's details page.
Prerequisites
Determine which identity provider your users will use to sign in to your app. If you use a third-party identity provider, an administrator must federate your identity provider with Google Workspace. Federation can take a significant amount of time to plan and set up. For more information, see Use data source access control.
A Google Workspace administrator must enable people search on Google Workspace data. To do so:
- Sign in with an administrator account to the Google Admin console.
- In the Admin console, go to Directory > Directory settings.
- Turn on Contact sharing.
Sign in to the Google Cloud console with the same account from which you plan to connect Google Workspace.
Connect to your identity provider using the steps in Connect your identity provider, and specify Google Identity as your provider.
For information about Google Workspace Directory, see Overview: Set up and manage the Directory in the Google Workspace documentation.
Create a people search data store
Console
To use the console to ingest people data, follow these steps:
In the Google Cloud console, go to the Agentspace page.
Go to the Data Stores page.
Click Create data store.
On the Source page, select People Search.
Choose a region for your data store.
Enter a name for your data store.
Click Create. Depending on the size of your data, syncing can take several minutes to several hours.
Next steps
To attach your data store to an app, create an app and select your data store following the steps in Create a search app.
To preview how your search results appear after your app and data store are set up, see Preview search results.
Import from Cloud SQL
To ingest data from Cloud SQL, use the following steps to set up Cloud SQL access, create a data store, and ingest data.
Set up staging bucket access for Cloud SQL instances
When ingesting data from Cloud SQL, data is first staged to a Cloud Storage bucket. Follow these steps to give a Cloud SQL instance access to Cloud Storage buckets.
In the Google Cloud console, go to the SQL page.
Click the Cloud SQL instance that you plan to import from.
Copy the identifier for the instance's service account, which looks like an email address, for example, p9876-abcd33f@gcp-sa-cloud-sql.iam.gserviceaccount.com.
Go to the IAM & Admin page.
Click Grant access.
For New principals, enter the instance's service account identifier and select the Cloud Storage > Storage Admin role.
Click Save.
Next:
If your Cloud SQL data is in the same project as Agentspace Enterprise: Go to Import data from Cloud SQL.
If your Cloud SQL data is in a different project than your Agentspace Enterprise project: Go to Set up Cloud SQL access from a different project.
Set up Cloud SQL access from a different project
To give Agentspace Enterprise access to Cloud SQL data that's in a different project, follow these steps:
Replace the following PROJECT_NUMBER variable with your Agentspace Enterprise project number, and then copy the contents of the code block. This is your Agentspace Enterprise service account identifier:
service-PROJECT_NUMBER@gcp-sa-discoveryengine.iam.gserviceaccount.com
Go to the IAM & Admin page.
Switch to your Cloud SQL project on the IAM & Admin page and click Grant Access.
For New principals, enter the identifier for the service account and select the Cloud SQL > Cloud SQL Viewer role.
Click Save.
Next, go to Import data from Cloud SQL.
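The service account identifier in the first step follows a fixed pattern, so it can be derived from the project number. The following is a minimal Python sketch rather than an official helper: the function name and the sample project number are hypothetical, and the digits-only check is an assumption about how project numbers differ from project IDs.

```python
def agentspace_service_account(project_number: str) -> str:
    """Return the Agentspace Enterprise service account identifier.

    Fills in the pattern shown in the steps above:
    service-PROJECT_NUMBER@gcp-sa-discoveryengine.iam.gserviceaccount.com
    """
    # Assumption: a project number is purely numeric, unlike a project ID.
    if not project_number.isdigit():
        raise ValueError("expected a numeric project number, not a project ID")
    return f"service-{project_number}@gcp-sa-discoveryengine.iam.gserviceaccount.com"


# Hypothetical project number, for illustration only.
print(agentspace_service_account("123456789012"))
```

The result is the value to paste into the New principals field when you grant access in the Cloud SQL project.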
Import data from Cloud SQL
Console
To use the console to ingest data from Cloud SQL, follow these steps:
In the Google Cloud console, go to the Agentspace page.
Go to the Data Stores page.
Click New data store.
On the Source page, select Cloud SQL.
Specify the project ID, instance ID, database ID, and table ID of the data that you plan to import.
Click Browse and choose an intermediate Cloud Storage location to export data to, and then click Select. Alternatively, enter the location directly in the gs:// field.
Select whether to turn on serverless export. Serverless export incurs additional cost. For information about serverless export, see Minimize the performance impact of exports in the Cloud SQL documentation.
Click Continue.
Choose a region for your data store.
Enter a name for your data store.
Click Create.
To check the status of your ingestion, go to the Data Stores page and click your data store name to see details about it on its Data page. When the status column on the Activity tab changes from In progress to Import completed, the ingestion is complete.
Depending on the size of your data, ingestion can take several minutes or several hours.
REST
To use the command line to create a data store and ingest data from Cloud SQL, follow these steps:
Create a data store.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -H "X-Goog-User-Project: PROJECT_ID" \
  "https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
  -d '{
    "displayName": "DISPLAY_NAME",
    "industryVertical": "GENERIC",
    "solutionTypes": ["SOLUTION_TYPE_SEARCH"]
  }'
Replace the following:
- PROJECT_ID: The ID of your project.
- DATA_STORE_ID: The ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
- DISPLAY_NAME: The display name of the data store. This might be displayed in the Google Cloud console.
Import data from Cloud SQL.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
  -d '{
    "cloudSqlSource": {
      "projectId": "SQL_PROJECT_ID",
      "instanceId": "INSTANCE_ID",
      "databaseId": "DATABASE_ID",
      "tableId": "TABLE_ID",
      "gcsStagingDir": "STAGING_DIRECTORY"
    },
    "reconciliationMode": "RECONCILIATION_MODE",
    "autoGenerateIds": "AUTO_GENERATE_IDS",
    "idField": "ID_FIELD"
  }'
Replace the following:
- PROJECT_ID: The ID of your Agentspace Enterprise project.
- DATA_STORE_ID: The ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
- SQL_PROJECT_ID: The ID of your Cloud SQL project.
- INSTANCE_ID: The ID of your Cloud SQL instance.
- DATABASE_ID: The ID of your Cloud SQL database.
- TABLE_ID: The ID of your Cloud SQL table.
- STAGING_DIRECTORY: Optional. A Cloud Storage directory, for example, gs://<your-gcs-bucket>/directory/import_errors.
- RECONCILIATION_MODE: Optional. Values are FULL and INCREMENTAL. Default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from Cloud SQL to your data store. This does an upsert operation, which adds new documents and replaces existing documents with updated documents with the same ID. Specifying FULL causes a full rebase of the documents in your data store. In other words, new and updated documents are added to your data store, and documents that are not in Cloud SQL are removed from your data store. The FULL mode is helpful if you want to automatically delete documents that you no longer need.
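When the import call is scripted, it can help to build the URL and JSON body programmatically so that optional fields are omitted cleanly and the body is always valid JSON. The following Python sketch only assembles the request described above and does not send it. The function name and the sample values are hypothetical; the endpoint and field names are copied from the curl command.

```python
import json
from typing import Optional, Tuple


def cloud_sql_import_request(
    project_id: str,
    data_store_id: str,
    sql_project_id: str,
    instance_id: str,
    database_id: str,
    table_id: str,
    staging_dir: Optional[str] = None,
    reconciliation_mode: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (url, json_body) for the documents:import call shown above."""
    if reconciliation_mode not in (None, "FULL", "INCREMENTAL"):
        raise ValueError("reconciliation_mode must be FULL or INCREMENTAL")

    url = (
        "https://discoveryengine.googleapis.com/v1/"
        f"projects/{project_id}/locations/global/collections/default_collection/"
        f"dataStores/{data_store_id}/branches/0/documents:import"
    )

    source = {
        "projectId": sql_project_id,
        "instanceId": instance_id,
        "databaseId": database_id,
        "tableId": table_id,
    }
    body = {"cloudSqlSource": source}
    # Optional fields are left out entirely rather than sent empty.
    if staging_dir is not None:
        source["gcsStagingDir"] = staging_dir
    if reconciliation_mode is not None:
        body["reconciliationMode"] = reconciliation_mode

    # json.dumps never emits trailing commas, so the body is valid JSON.
    return url, json.dumps(body, indent=2)


# Hypothetical IDs, for illustration only.
url, body = cloud_sql_import_request(
    "my-agentspace-project", "my-data-store",
    "my-sql-project", "my-instance", "my-database", "my-table",
    reconciliation_mode="FULL",
)
print(url)
print(body)
```

The printed body can be passed to the -d option of the curl command in place of the hand-written JSON.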
Next steps
To attach your data store to an app, create an app and select your data store following the steps in Create a search app.
To preview how your search results appear after your app and data store are set up, see Preview search results.
Import from Spanner
To ingest data from Spanner, use the following steps to create a data store and ingest data using either the Google Cloud console or the API.
Set up Spanner access from a different project
If your Spanner data is in the same project as Agentspace Enterprise, skip to Import data from Spanner.
To give Agentspace Enterprise access to Spanner data that is in a different project, follow these steps:
Replace the following PROJECT_NUMBER variable with your Agentspace Enterprise project number, and then copy the contents of this code block. This is your Agentspace Enterprise service account identifier:
service-PROJECT_NUMBER@gcp-sa-discoveryengine.iam.gserviceaccount.com
Go to the IAM & Admin page.
Switch to your Spanner project on the IAM & Admin page and click Grant Access.
For New principals, enter the identifier for the service account and select one of the following:
- If you won't use Data Boost during import, select the Cloud Spanner > Cloud Spanner Database Reader role.
- If you plan to use Data Boost during import, select the Cloud Spanner > Cloud Spanner Database Admin role, or a custom role with the permissions of Cloud Spanner Database Reader and spanner.databases.useDataBoost. For information about Data Boost, see Data Boost overview in the Spanner documentation.
Click Save.
Next, go to Import data from Spanner.
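The role choice above depends only on whether you plan to use Data Boost. The following Python sketch captures that decision and builds, without running it, a gcloud command that is assumed to be equivalent to the console steps. The function names and sample values are hypothetical, and the role IDs roles/spanner.databaseReader and roles/spanner.databaseAdmin are assumed to be the IAM IDs behind the console role names. The custom-role option is not covered.

```python
def spanner_import_role(use_data_boost: bool) -> str:
    """Pick the predefined role described above for the import."""
    # Assumption: these are the IAM role IDs for "Cloud Spanner Database
    # Admin" and "Cloud Spanner Database Reader" as named in the console.
    if use_data_boost:
        return "roles/spanner.databaseAdmin"
    return "roles/spanner.databaseReader"


def grant_access_command(
    spanner_project_id: str, project_number: str, use_data_boost: bool
) -> str:
    """Build (but do not run) a gcloud command for the grant."""
    member = (
        "serviceAccount:"
        f"service-{project_number}@gcp-sa-discoveryengine.iam.gserviceaccount.com"
    )
    role = spanner_import_role(use_data_boost)
    return (
        f"gcloud projects add-iam-policy-binding {spanner_project_id} "
        f"--member={member} --role={role}"
    )


# Hypothetical project ID and project number, for illustration only.
print(grant_access_command("my-spanner-project", "123456789012", use_data_boost=False))
```

Review the printed command before running it, because it changes the IAM policy of the Spanner project.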
Import data from Spanner
Console
To use the console to ingest data from Spanner, follow these steps:
In the Google Cloud console, go to the Agentspace page.
Go to the Data Stores page.
Click New data store.
On the Source page, select Cloud Spanner.
Specify the project ID, instance ID, database ID, and table ID of the data that you plan to import.
Select whether to turn on Data Boost. For information about Data Boost, see Data Boost overview in the Spanner documentation.
Click Continue.
Choose a region for your data store.
Enter a name for your data store.
Click Create.
To check the status of your ingestion, go to the Data Stores page and click your data store name to see details about it on its Data page. When the status column on the Activity tab changes from In progress to Import completed, the ingestion is complete.
Depending on the size of your data, ingestion can take several minutes or several hours.
REST
To use the command line to create a data store and ingest data from Spanner, follow these steps:
Create a data store.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -H "X-Goog-User-Project: PROJECT_ID" \
  "https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
  -d '{
    "displayName": "DISPLAY_NAME",
    "industryVertical": "GENERIC",
    "solutionTypes": ["SOLUTION_TYPE_SEARCH"],
    "contentConfig": "CONTENT_REQUIRED"
  }'
Replace the following:
- PROJECT_ID: The ID of your Agentspace Enterprise project.
- DATA_STORE_ID: The ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
- DISPLAY_NAME: The display name of the data store. This might be displayed in the Google Cloud console.
Import data from Spanner.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
  -d '{
    "cloudSpannerSource": {
      "projectId": "SPANNER_PROJECT_ID",
      "instanceId": "INSTANCE_ID",
      "databaseId": "DATABASE_ID",
      "tableId": "TABLE_ID",
      "enableDataBoost": "DATA_BOOST_BOOLEAN"
    },
    "reconciliationMode": "RECONCILIATION_MODE",
    "autoGenerateIds": "AUTO_GENERATE_IDS",
    "idField": "ID_FIELD"
  }'
Replace the following:
- PROJECT_ID: The ID of your Agentspace Enterprise project.
- DATA_STORE_ID: The ID of the data store.
- SPANNER_PROJECT_ID: The ID of your Spanner project.
- INSTANCE_ID: The ID of your Spanner instance.
- DATABASE_ID: The ID of your Spanner database.
- TABLE_ID: The ID of your Spanner table.
- DATA_BOOST_BOOLEAN: Optional. Whether to turn on Data Boost. For information about Data Boost, see Data Boost overview in the Spanner documentation.
- RECONCILIATION_MODE: Optional. Values are FULL and INCREMENTAL. Default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from Spanner to your data store. This does an upsert operation, which adds new documents and replaces existing documents with updated documents with the same ID. Specifying FULL causes a full rebase of the documents in your data store. In other words, new and updated documents are added to your data store, and documents that are not in Spanner are removed from your data store. The FULL mode is helpful if you want to automatically delete documents that you no longer need.
- AUTO_GENERATE_IDS: Optional. Specifies whether to automatically generate document IDs. If set to true, document IDs are generated based on a hash of the payload. Note that generated document IDs might not remain consistent over multiple imports. If you auto-generate IDs over multiple imports, Google highly recommends setting reconciliationMode to FULL to maintain consistent document IDs.
- ID_FIELD: Optional. Specifies which fields are the document IDs.
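The difference between the two reconciliation modes is easiest to see on a small example. The following Python sketch models the behavior described above with plain dictionaries keyed by document ID. It is an illustration of the documented semantics, not the service's implementation, and the function name and sample documents are hypothetical.

```python
from typing import Dict

Docs = Dict[str, dict]


def reconcile(store: Docs, source: Docs, mode: str = "INCREMENTAL") -> Docs:
    """Return the data store contents after an import from `source`."""
    if mode == "INCREMENTAL":
        # Upsert: add new documents, replace documents with the same ID,
        # and keep documents that no longer exist in the source.
        merged = dict(store)
        merged.update(source)
        return merged
    if mode == "FULL":
        # Full rebase: the store ends up mirroring the source, so
        # documents missing from the source are removed.
        return dict(source)
    raise ValueError("mode must be FULL or INCREMENTAL")


store = {"a": {"v": 1}, "b": {"v": 1}}
source = {"b": {"v": 2}, "c": {"v": 1}}

print(sorted(reconcile(store, source, "INCREMENTAL")))  # ['a', 'b', 'c']
print(sorted(reconcile(store, source, "FULL")))         # ['b', 'c']
```

In both modes document b is replaced by its updated version. Only FULL drops document a, which is why it suits cases where deleted source rows should disappear from search results.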
Next steps
To attach your data store to an app, create an app and select your data store following the steps in Create a search app.
To preview how your search results appear after your app and data store are set up, see Preview search results.
Import from Firestore
To ingest data from Firestore, use the following steps to create a data store and ingest data using either the Google Cloud console or the API.
If your Firestore data is in the same project as Agentspace Enterprise, go to Import data from Firestore.
If your Firestore data is in a different project than your Agentspace Enterprise project, go to Set up Firestore access from a different project.
Set up Firestore access from a different project
To give Agentspace Enterprise access to Firestore data that's in a different project, follow these steps:
Replace the following PROJECT_NUMBER variable with your Agentspace Enterprise project number, and then copy the contents of this code block. This is your Agentspace Enterprise service account identifier:
service-PROJECT_NUMBER@gcp-sa-discoveryengine.iam.gserviceaccount.com
Go to the IAM & Admin page.
Switch to your Firestore project on the IAM & Admin page and click Grant Access.
For New principals, enter the Agentspace Enterprise service account identifier and select the Datastore > Cloud Datastore Import Export Admin role.
Click Save.
Switch back to your Agentspace Enterprise project.
Next, go to Import data from Firestore.
Import data from Firestore
Console
To use the console to ingest data from Firestore, follow these steps:
In the Google Cloud console, go to the Agentspace page.
Go to the Data Stores page.
Click New data store.
On the Source page, select Firestore.
Specify the project ID, database ID, and collection ID of the data that you plan to import.
Click Continue.
Choose a region for your data store.
Enter a name for your data store.
Click Create.
To check the status of your ingestion, go to the Data Stores page and click your data store name to see details about it on its Data page. When the status column on the Activity tab changes from In progress to Import completed, the ingestion is complete.
Depending on the size of your data, ingestion can take several minutes or several hours.
REST
To use the command line to create a data store and ingest data from Firestore, follow these steps:
Create a data store.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -H "X-Goog-User-Project: PROJECT_ID" \
  "https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
  -d '{
    "displayName": "DISPLAY_NAME",
    "industryVertical": "GENERIC",
    "solutionTypes": ["SOLUTION_TYPE_SEARCH"]
  }'
Replace the following:
- PROJECT_ID: The ID of your project.
- DATA_STORE_ID: The ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
- DISPLAY_NAME: The display name of the data store. This might be displayed in the Google Cloud console.
Import data from Firestore.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
  -d '{
    "firestoreSource": {
      "projectId": "FIRESTORE_PROJECT_ID",
      "databaseId": "DATABASE_ID",
      "collectionId": "COLLECTION_ID"
    },
    "reconciliationMode": "RECONCILIATION_MODE",
    "autoGenerateIds": "AUTO_GENERATE_IDS",
    "idField": "ID_FIELD"
  }'
Replace the following:
- PROJECT_ID: The ID of your Agentspace Enterprise project.
- DATA_STORE_ID: The ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
- FIRESTORE_PROJECT_ID: The ID of your Firestore project.
- DATABASE_ID: The ID of your Firestore database.
- COLLECTION_ID: The ID of your Firestore collection.
- RECONCILIATION_MODE: Optional. Values are FULL and INCREMENTAL. Default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from Firestore to your data store. This does an upsert operation, which adds new documents and replaces existing documents with updated documents with the same ID. Specifying FULL causes a full rebase of the documents in your data store. In other words, new and updated documents are added to your data store, and documents that are not in Firestore are removed from your data store. The FULL mode is helpful if you want to automatically delete documents that you no longer need.
- AUTO_GENERATE_IDS: Optional. Specifies whether to automatically generate document IDs. If set to true, document IDs are generated based on a hash of the payload. Note that generated document IDs might not remain consistent over multiple imports. If you auto-generate IDs over multiple imports, Google highly recommends setting reconciliationMode to FULL to maintain consistent document IDs.
- ID_FIELD: Optional. Specifies which fields are the document IDs.
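The recommendation to pair auto-generated IDs with FULL follows from how hash-based IDs behave when a document changes. The following Python sketch illustrates the effect under stated assumptions: SHA-256 over canonical JSON stands in for the service's unspecified hash, and the function names and sample documents are hypothetical.

```python
import hashlib
import json
from typing import Dict, List


def generated_id(payload: dict) -> str:
    """Derive an ID from the payload (stand-in for the service's hash)."""
    # Assumption: the real hash function is not documented; SHA-256 over
    # key-sorted JSON is used here only to show the principle.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def import_docs(store: Dict[str, dict], payloads: List[dict], mode: str) -> Dict[str, dict]:
    """Import payloads with auto-generated IDs using the given mode."""
    incoming = {generated_id(p): p for p in payloads}
    if mode == "FULL":
        return incoming          # store mirrors the latest import
    merged = dict(store)         # INCREMENTAL: upsert by ID
    merged.update(incoming)
    return merged


first = [{"title": "Q1 report", "status": "draft"}]
second = [{"title": "Q1 report", "status": "final"}]  # same record, edited

incremental = import_docs(import_docs({}, first, "INCREMENTAL"), second, "INCREMENTAL")
full = import_docs(import_docs({}, first, "FULL"), second, "FULL")

print(len(incremental))  # 2: the edited record got a new ID, the stale copy remains
print(len(full))         # 1: the stale copy was removed
```

Because the edited record hashes to a different ID, an incremental import cannot match it to the earlier copy, while a full import discards anything that is not in the latest source data.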
Next steps
To attach your data store to an app, create an app and select your data store following the steps in Create a search app.
To preview how your search results appear after your app and data store are set up, see Preview search results.
Import from Bigtable
To ingest data from Bigtable, use the following steps to create a data store and ingest data using the API.
Set up Bigtable access
To give Agentspace Enterprise access to Bigtable data that's in a different project, follow these steps:
Replace the following PROJECT_NUMBER variable with your Agentspace Enterprise project number, and then copy the contents of this code block. This is your Agentspace Enterprise service account identifier:
service-PROJECT_NUMBER@gcp-sa-discoveryengine.iam.gserviceaccount.com
Go to the IAM & Admin page.
Switch to your Bigtable project on the IAM & Admin page and click Grant Access.
For New principals, enter the Agentspace Enterprise service account identifier and select the Bigtable > Bigtable Reader role.
Click Save.
Switch back to your Agentspace Enterprise project.
Next, go to Import data from Bigtable.
Import data from Bigtable
REST
To use the command line to create a data store and ingest data from Bigtable, follow these steps:
Create a data store.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -H "X-Goog-User-Project: PROJECT_ID" \
  "https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
  -d '{
    "displayName": "DISPLAY_NAME",
    "industryVertical": "GENERIC",
    "solutionTypes": ["SOLUTION_TYPE_SEARCH"]
  }'
Replace the following:
- PROJECT_ID: The ID of your project.
- DATA_STORE_ID: The ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
- DISPLAY_NAME: The display name of the data store. This might be displayed in the Google Cloud console.
Import data from Bigtable.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
  -d '{
    "bigtableSource": {
      "projectId": "BIGTABLE_PROJECT_ID",
      "instanceId": "INSTANCE_ID",
      "tableId": "TABLE_ID",
      "bigtableOptions": {
        "keyFieldName": "KEY_FIELD_NAME",
        "families": {
          "key": "KEY",
          "value": {
            "fieldName": "FIELD_NAME",
            "encoding": "ENCODING",
            "type": "TYPE",
            "columns": [
              {
                "qualifier": "QUALIFIER",
                "fieldName": "FIELD_NAME",
                "encoding": "COLUMN_ENCODING",
                "type": "COLUMN_VALUES_TYPE"
              }
            ]
          }
        }
        ...
      }
    },
    "reconciliationMode": "RECONCILIATION_MODE",
    "autoGenerateIds": "AUTO_GENERATE_IDS",
    "idField": "ID_FIELD"
  }'
Replace the following:
- PROJECT_ID: The ID of your Agentspace Enterprise project.
- DATA_STORE_ID: The ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
- BIGTABLE_PROJECT_ID: The ID of your Bigtable project.
- INSTANCE_ID: The ID of your Bigtable instance.
- TABLE_ID: The ID of your Bigtable table.
- KEY_FIELD_NAME: Optional but recommended. The field name to use for the row key value after ingesting to Agentspace Enterprise.
- KEY: Required. A string value for the column family key.
- ENCODING: Optional. The encoding mode of the values when the type is not STRING. This can be overridden for a specific column by listing that column in columns and specifying an encoding for it.
- TYPE: Optional. The type of values in this column family.
- QUALIFIER: Required. Qualifier of the column.
- FIELD_NAME: Optional but recommended. The field name to use for this column after ingesting to Agentspace Enterprise.
- COLUMN_ENCODING: Optional. The encoding mode of the values for a specific column when the type is not STRING.
- RECONCILIATION_MODE: Optional. Values are FULL and INCREMENTAL. Default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from Bigtable to your data store. This does an upsert operation, which adds new documents and replaces existing documents with updated documents with the same ID. Specifying FULL causes a full rebase of the documents in your data store. In other words, new and updated documents are added to your data store, and documents that are not in Bigtable are removed from your data store. The FULL mode is helpful if you want to automatically delete documents that you no longer need.
- AUTO_GENERATE_IDS: Optional. Specifies whether to automatically generate document IDs. If set to true, document IDs are generated based on a hash of the payload. Note that generated document IDs might not remain consistent over multiple imports. If you auto-generate IDs over multiple imports, Google highly recommends setting reconciliationMode to FULL to maintain consistent document IDs. Specify autoGenerateIds only when bigquerySource.dataSchema is set to custom. Otherwise an INVALID_ARGUMENT error is returned. If you don't specify autoGenerateIds or set it to false, you must specify idField. Otherwise the documents fail to import.
- ID_FIELD: Optional. Specifies which fields are the document IDs.
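The rule that idField is required whenever autoGenerateIds is unset or false can be checked before the request is sent. The following Python sketch performs that check and returns only the ID-related fields of the import body. The function name and sample field name are hypothetical, and the sketch deliberately leaves the case where both options are set to the service, because the text above does not describe it.

```python
from typing import Optional


def id_options(
    auto_generate_ids: Optional[bool] = None,
    id_field: Optional[str] = None,
) -> dict:
    """Return the ID-related fields for an import request body."""
    # Per the parameter description: without auto-generated IDs,
    # an idField is required or the documents fail to import.
    if not auto_generate_ids and not id_field:
        raise ValueError("set idField when autoGenerateIds is unset or false")

    options = {}
    if auto_generate_ids is not None:
        options["autoGenerateIds"] = auto_generate_ids
    if id_field:
        options["idField"] = id_field
    return options


# Hypothetical field name, for illustration only.
print(id_options(id_field="row_key"))        # {'idField': 'row_key'}
print(id_options(auto_generate_ids=True))    # {'autoGenerateIds': True}
```

The returned dictionary can be merged into the rest of the request body before it is serialized.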
Next steps
To attach your data store to an app, create an app and select your data store following the steps in Create a search app.
To preview how your search results appear after your app and data store are set up, see Preview search results.
Import from AlloyDB for PostgreSQL
To ingest data from AlloyDB for PostgreSQL, use the following steps to create a data store and ingest data using either the Google Cloud console or the API.
If your AlloyDB for PostgreSQL data is in the same project as Agentspace Enterprise, go to Import data from AlloyDB for PostgreSQL.
If your AlloyDB for PostgreSQL data is in a different project than your Agentspace Enterprise project, go to Set up AlloyDB for PostgreSQL access from a different project.
Set up AlloyDB for PostgreSQL access from a different project
To give Agentspace Enterprise access to AlloyDB for PostgreSQL data that's in a different project, follow these steps:
Replace the following PROJECT_NUMBER variable with your Agentspace Enterprise project number, and then copy the contents of this code block. This is your Agentspace Enterprise service account identifier:
service-PROJECT_NUMBER@gcp-sa-discoveryengine.iam.gserviceaccount.com
Switch to the Google Cloud project where your AlloyDB for PostgreSQL data resides.
Go to the IAM page.
Click Grant Access.
For New principals, enter the Agentspace Enterprise service account identifier and select the Cloud AlloyDB > Cloud AlloyDB Admin role.
Click Save.
Switch back to your Agentspace Enterprise project.
Next, go to Import data from AlloyDB for PostgreSQL.
Import data from AlloyDB for PostgreSQL
Console
To use the console to ingest data from AlloyDB for PostgreSQL, follow these steps:
In the Google Cloud console, go to the Agentspace page.
In the navigation menu, click Data Stores.
Click Create data store.
On the Source page, select AlloyDB.
Specify the project ID, location ID, cluster ID, database ID, and table ID of the data that you plan to import.
Click Continue.
Choose a region for your data store.
Enter a name for your data store.
Click Create.
To check the status of your ingestion, go to the Data Stores page and click your data store name to see details about it on its Data page. When the status column on the Activity tab changes from In progress to Import completed, the ingestion is complete.
Depending on the size of your data, ingestion can take several minutes or several hours.
REST
To use the command line to create a data store and ingest data from AlloyDB for PostgreSQL, follow these steps:
Create a data store.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -H "X-Goog-User-Project: PROJECT_ID" \
  "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
  -d '{
    "displayName": "DISPLAY_NAME",
    "industryVertical": "GENERIC",
    "solutionTypes": ["SOLUTION_TYPE_SEARCH"]
  }'
Replace the following:
- PROJECT_ID: The ID of your project.
- DATA_STORE_ID: The ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
- DISPLAY_NAME: The display name of the data store. This might be displayed in the Google Cloud console.
Import data from AlloyDB for PostgreSQL.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
  -d '{
    "alloydbSource": {
      "projectId": "ALLOYDB_PROJECT_ID",
      "locationId": "LOCATION_ID",
      "clusterId": "CLUSTER_ID",
      "databaseId": "DATABASE_ID",
      "tableId": "TABLE_ID"
    },
    "reconciliationMode": "RECONCILIATION_MODE",
    "autoGenerateIds": "AUTO_GENERATE_IDS",
    "idField": "ID_FIELD"
  }'
Replace the following:
- PROJECT_ID: The ID of your Agentspace Enterprise project.
- DATA_STORE_ID: The ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
- ALLOYDB_PROJECT_ID: The ID of your AlloyDB for PostgreSQL project.
- LOCATION_ID: The ID of your AlloyDB for PostgreSQL location.
- CLUSTER_ID: The ID of your AlloyDB for PostgreSQL cluster.
- DATABASE_ID: The ID of your AlloyDB for PostgreSQL database.
- TABLE_ID: The ID of your AlloyDB for PostgreSQL table.
- RECONCILIATION_MODE: Optional. Values are FULL and INCREMENTAL. Default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from AlloyDB for PostgreSQL to your data store. This does an upsert operation, which adds new documents and replaces existing documents with updated documents with the same ID. Specifying FULL causes a full rebase of the documents in your data store. In other words, new and updated documents are added to your data store, and documents that are not in AlloyDB for PostgreSQL are removed from your data store. The FULL mode is helpful if you want to automatically delete documents that you no longer need.
- AUTO_GENERATE_IDS: Optional. Specifies whether to automatically generate document IDs. If set to true, document IDs are generated based on a hash of the payload. Note that generated document IDs might not remain consistent over multiple imports. If you auto-generate IDs over multiple imports, Google highly recommends setting reconciliationMode to FULL to maintain consistent document IDs.
- ID_FIELD: Optional. Specifies which fields are the document IDs.
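Because DATA_STORE_ID is restricted to lowercase letters, digits, underscores, and hyphens, it can be validated locally before either request is sent. The following Python sketch checks only that character rule. The function name and sample IDs are hypothetical, and any length limit the service may enforce is not checked because it is not stated above.

```python
import re

# Character rule from the DATA_STORE_ID description above.
_DATA_STORE_ID = re.compile(r"[a-z0-9_-]+")


def is_valid_data_store_id(data_store_id: str) -> bool:
    """Return True if the ID uses only the allowed characters."""
    return _DATA_STORE_ID.fullmatch(data_store_id) is not None


# Hypothetical IDs, for illustration only.
for candidate in ("alloydb_orders-2024", "AlloyDB-Orders", "orders.v2"):
    print(candidate, is_valid_data_store_id(candidate))
```

Of the three sample IDs, only the first passes: the second contains uppercase letters and the third contains a period.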
Next steps
To attach your data store to an app, create an app and select your data store following the steps in Create a search app.
To preview how your search results appear after your app and data store are set up, see Preview search results.
Upload structured JSON data with the API
To directly upload a JSON document or object using the API, follow these steps.
Before importing your data, see Prepare data for ingesting.
REST
To use the command line to create a data store and import structured JSON data, follow these steps.
Create a data store.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -H "X-Goog-User-Project: PROJECT_ID" \
  "https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
  -d '{
    "displayName": "DATA_STORE_DISPLAY_NAME",
    "industryVertical": "GENERIC",
    "solutionTypes": ["SOLUTION_TYPE_SEARCH"]
  }'
Replace the following:
- PROJECT_ID: the ID of your project.
- DATA_STORE_ID: the ID of the data store that you want to create. This ID can contain only lowercase letters, digits, underscores, and hyphens.
- DATA_STORE_DISPLAY_NAME: the display name of the data store that you want to create.
Import structured data.
There are a few approaches that you can use to upload data, including:
Upload a JSON document.
curl -X POST \ -H "Authorization: Bearer $(gcloud auth print-access-token)" \ -H "Content-Type: application/json" \ "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents?documentId=DOCUMENT_ID" \ -d '{ "jsonData": "JSON_DOCUMENT_STRING" }'
Replace the following:
DOCUMENT_ID: a unique ID for the document. This ID can be up to 63 characters long and contain only lowercase letters, digits, underscores, and hyphens.
JSON_DOCUMENT_STRING: the JSON document as a single string. This must conform to the JSON schema that you provided in the previous step, for example:
{ \"title\": \"test title\", \"categories\": [\"cat_1\", \"cat_2\"], \"uri\": \"test uri\"}
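The `jsonData` field carries the whole document as a string, so every double quote inside the document must be escaped. The following sketch shows one way to derive that string from a plain JSON object in the shell. It assumes the document contains no newlines or other control characters. The variable names are illustrative, and for arbitrary input a JSON-aware tool is a safer choice than `sed`.

```shell
# Illustrative document; it must still conform to your data store's schema.
json_object='{ "title": "test title", "categories": ["cat_1", "cat_2"], "uri": "test uri" }'

# Escape backslashes first, then double quotes, so the document can be
# embedded inside a JSON string value.
# Assumption: the document has no newlines or other control characters.
json_document_string=$(printf '%s' "$json_object" | sed 's/\\/\\\\/g; s/"/\\"/g')

# Assemble the request body for the jsonData upload.
request_body=$(printf '{ "jsonData": "%s" }' "$json_document_string")
printf '%s\n' "$request_body"
```

The resulting `request_body` can be passed to `curl` with `-d "$request_body"` in place of the inline payload.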
Upload a JSON object.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents?documentId=DOCUMENT_ID" \
  -d '{
    "structData": JSON_DOCUMENT_OBJECT
  }'
Replace JSON_DOCUMENT_OBJECT with the JSON document as a JSON object. This must conform to the JSON schema that you provided in the previous step, for example:
```json
{
  "title": "test title",
  "categories": [
    "cat_1",
    "cat_2"
  ],
  "uri": "test uri"
}
```
Update with a JSON document.
curl -X PATCH \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents/DOCUMENT_ID" \
  -d '{
    "jsonData": "JSON_DOCUMENT_STRING"
  }'
Update with a JSON object.
curl -X PATCH \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents/DOCUMENT_ID" \
  -d '{
    "structData": JSON_DOCUMENT_OBJECT
  }'
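The upload and update calls above differ mainly in where the document ID goes: the `POST` requests pass it as the `documentId` query parameter, while the `PATCH` requests append it to the path. The sketch below makes that difference explicit. The helper names (`documents_base`, `create_document_url`, `update_document_url`) and the sample project and data store values are hypothetical and only build URL strings; they do not call the API.

```shell
# Placeholder values; replace with your own project and data store IDs.
PROJECT_ID="my-project"
DATA_STORE_ID="my-data-store"

# Base path shared by the upload and update requests shown above.
documents_base() {
  printf 'https://discoveryengine.googleapis.com/v1beta/projects/%s/locations/global/collections/default_collection/dataStores/%s/branches/0/documents' \
    "$PROJECT_ID" "$DATA_STORE_ID"
}

# Upload (POST): the document ID is a query parameter.
create_document_url() {
  printf '%s?documentId=%s' "$(documents_base)" "$1"
}

# Update (PATCH): the document ID is the last path segment.
update_document_url() {
  printf '%s/%s' "$(documents_base)" "$1"
}

create_document_url "doc-1"; echo
update_document_url "doc-1"; echo
```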
Next steps
To attach your data store to an app, create an app and select your data store following the steps in Create a search app.
To preview how your search results appear after your app and data store are set up, see Preview search results.
Troubleshoot data ingestion
If you are having problems with data ingestion, review these tips:
If you're using customer-managed encryption keys and data import fails with the error message "The caller does not have permission", make sure that the CryptoKey Encrypter/Decrypter IAM role (roles/cloudkms.cryptoKeyEncrypterDecrypter) on the key has been granted to the Cloud Storage service agent. For more information, see Before you begin in "Customer-managed encryption keys".
If you are using advanced website indexing and the Document usage for the data store is much lower than you expect, review the URL patterns that you specified for indexing. Make sure that the patterns cover the pages that you want to index, and expand them if needed. For example, if you used *.en.example.com/*, you might need to add *.example.com/* to the sites you want indexed.
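To see why a narrow URL pattern can leave pages out, the following sketch tests two sample URLs against both patterns using shell glob matching. This is only an approximation: it assumes that `*` behaves like a shell wildcard, which may not match exactly how advanced website indexing evaluates patterns. The helper name `matches_pattern` and the sample URLs are hypothetical.

```shell
# Hypothetical helper: succeeds if the URL ($2) matches the glob pattern ($1).
# Uses shell glob semantics, which only approximate the product's matching.
matches_pattern() {
  case "$2" in
    $1) return 0 ;;  # $1 is left unquoted so it is treated as a pattern
    *) return 1 ;;
  esac
}

# Compare the narrow and broad patterns against two illustrative URLs.
for url in "docs.en.example.com/guide" "www.example.com/pricing"; do
  for pattern in '*.en.example.com/*' '*.example.com/*'; do
    if matches_pattern "$pattern" "$url"; then
      echo "$url matches $pattern"
    else
      echo "$url does not match $pattern"
    fi
  done
done
```

Under these assumptions, `www.example.com/pricing` is covered only by the broader `*.example.com/*` pattern, which is the kind of gap that can lower the document count.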
Create a data store using Terraform
You can use Terraform to create an empty data store. After the empty data store is created, you can ingest data into the data store using the Google Cloud console or API commands.
To learn how to apply or remove a Terraform configuration, see Basic Terraform commands.
To create an empty data store using Terraform, see google_discovery_engine_data_store.