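Create a search data store

To create a data store and ingest data for search, go to the section for the source you plan to use:

Create a data store using website content
Import from BigQuery
Import from Cloud Storage
Sync from Google Drive
Sync from Gmail (Public preview)
Sync from Google Sites (Public preview)
Sync from Google Calendar (Public preview)
Sync from Google Groups (Public preview)
Import from Cloud SQL
Import from Spanner (Public preview)
Import from Firestore
Import from Bigtable (Public preview)
Import from AlloyDB for PostgreSQL (Public preview)
Upload structured JSON data with the API
Create a data store using Terraform

To sync data from a third-party data source instead, see "Connect a third-party data source".

For troubleshooting information, see "Troubleshoot data ingestion".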

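Create a data store using website content

Use the following procedure to create a data store and index websites.

To use a website data store after you create it, you must attach it to an app that has Enterprise features turned on. You can turn on Enterprise edition for an app when you create it. This incurs additional costs. See "Create a search app" and "About advanced features".

Before you begin

If you use a robots.txt file on your website, update it. For more information, see how to prepare your website's robots.txt file.

Procedure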

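Console

To use the Google Cloud console to create a data store and index websites, follow these steps:

In the Google Cloud console, go to the "AI Applications" page.
In the navigation menu, click "Data Stores".
Click "Create data store".
On the "Source" page, select "Website Content".
Choose whether to turn on "Advanced website indexing" for this data store. This option can't be changed after it is selected.

Advanced website indexing provides additional features such as search summarization, search with follow-ups, and extractive answers. Advanced website indexing incurs additional cost and requires that you verify domain ownership for any website that you index. For more information, see "Advanced website indexing" and "Pricing".
In the "Sites to include" field, enter the URL patterns that match the websites that you want to include in your data store. Enter one URL pattern per line, without comma separators. For example: example.com/docs/*
Optional: In the "Sites to exclude" field, enter the URL patterns that you want to exclude from your data store.

Excluded sites take priority over included sites. So if you include example.com/docs/* but exclude example.com, no websites are indexed. For more information, see "Website data".
Click "Continue".
Select a location for your data store.
- When you create a basic website search data store, this is always set to "global (Global)".
- When you create a data store with advanced website indexing, you can select a location. Because the websites that are indexed must be public, Google strongly recommends that you select "global (Global)" as your location. This ensures maximum availability of all search and answering services and removes the limitations of regional data stores.
Enter a name for your data store.
Click "Create". Vertex AI Search creates your data store and displays it on the "Data Stores" page.
To view information about your data store, click the name of your data store in the "Name" column. Your data store page appears.
- If you turned on "Advanced website indexing", a warning appears that prompts you to verify the domains in your data store.
- If you have a quota shortfall (the number of pages in the websites that you specified exceeds the "Number of documents per project" quota for your project), an additional warning appears that prompts you to upgrade your quota.
To verify the domains for the URL patterns in your data store, follow the instructions on the "Verify website domains" page.
To upgrade your quota, follow these steps:
1. Click "Upgrade quota". The "IAM and Admin" page of the Google Cloud console appears.
2. Follow the instructions at "Request a quota adjustment" in the Google Cloud documentation. The quota to increase is "Number of documents" in the Discovery Engine API service.
3. After you submit your request for a higher quota limit, go back to the "AI Applications" page and click "Data Stores" in the navigation menu.
4. Click the name of your data store in the "Name" column. The "Status" column indicates whether indexing is in progress for the websites that had surpassed the quota. When the "Status" column for a URL shows "Indexed", advanced website indexing features are available for that URL or URL pattern.
For more information, see "Quota for web page indexing" on the "Quotas and limits" page.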
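
The precedence rule for included and excluded URL patterns can be sketched in a few lines of Python. This is only an illustration, not how Vertex AI Search matches URLs internally: the helper names (`parse_patterns`, `is_indexed`) are hypothetical, `*` is handled with the standard-library `fnmatch` module, and a pattern without a wildcard is assumed to cover everything below it.

```python
from fnmatch import fnmatchcase


def parse_patterns(field_text: str) -> list[str]:
    """Split the text of a pattern field: one URL pattern per line, no commas."""
    return [line.strip() for line in field_text.splitlines() if line.strip()]


def _matches(url: str, pattern: str) -> bool:
    """Return True if url falls under pattern.

    Assumption for illustration: a pattern without "*" covers the whole site
    or path below it, so "example.com" also covers "example.com/docs/page".
    """
    if "*" in pattern:
        return fnmatchcase(url, pattern)
    return url == pattern or url.startswith(pattern.rstrip("/") + "/")


def is_indexed(url: str, include: list[str], exclude: list[str]) -> bool:
    """Exclusions win: a URL is indexed only if included and not excluded."""
    if any(_matches(url, p) for p in exclude):
        return False
    return any(_matches(url, p) for p in include)


if __name__ == "__main__":
    include = parse_patterns("example.com/docs/*")
    # Included and not excluded.
    print(is_indexed("example.com/docs/intro", include, []))
    # Excluding example.com overrides the include pattern.
    print(is_indexed("example.com/docs/intro", include, ["example.com"]))
```

Under these assumptions, excluding example.com removes every page that example.com/docs/* would otherwise include, which is the situation the step above warns about.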

Python

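For more information, see the Vertex AI Search Python API reference documentation.

To authenticate to Vertex AI Search, set up Application Default Credentials. For more information, see "Set up authentication for a local development environment".

Create a data store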


from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    #  For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        # A website data store, as described in this section, corresponds to PUBLIC_WEBSITE.
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name

Import websites

from google.api_core.client_options import ClientOptions

from google.cloud import discoveryengine_v1 as discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"
# NOTE: Do not include http or https protocol in the URI pattern
# uri_pattern = "cloud.google.com/generative-ai-app-builder/docs/*"

#  For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.SiteSearchEngineServiceClient(
    client_options=client_options
)

# The full resource name of the data store
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}
site_search_engine = client.site_search_engine_path(
    project=project_id, location=location, data_store=data_store_id
)

# Target Site to index
target_site = discoveryengine.TargetSite(
    provided_uri_pattern=uri_pattern,
    # Options: INCLUDE, EXCLUDE
    type_=discoveryengine.TargetSite.Type.INCLUDE,
    exact_match=False,
)

# Make the request
operation = client.create_target_site(
    parent=site_search_engine,
    target_site=target_site,
)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.CreateTargetSiteMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)
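
The sample above notes that the URI pattern must not include the http or https protocol. A minimal sketch of a cleanup step that could run before the pattern is passed to `TargetSite` follows. The helper name `normalize_uri_pattern` is hypothetical and is not part of the client library.

```python
import re


def normalize_uri_pattern(raw: str) -> str:
    """Drop surrounding whitespace and a leading http:// or https:// scheme."""
    return re.sub(r"^https?://", "", raw.strip(), flags=re.IGNORECASE)


if __name__ == "__main__":
    print(normalize_uri_pattern("https://cloud.google.com/generative-ai-app-builder/docs/*"))
```

The result can then be used as the `uri_pattern` value in the sample.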

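What's next

To attach your website data store to an app, create an app with Enterprise features turned on and select your data store by following the steps in "Create a search app".
If you turned on advanced website indexing, you can use structured data to update your schema.
To preview how your search results appear after your app and data store are set up, see "Get search results".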

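Import from BigQuery

Vertex AI Search supports searching over BigQuery data.

You can create data stores from BigQuery tables in two ways:

One-time ingestion: You import data from a BigQuery table into a data store. The data in the data store doesn't change unless you manually refresh the data.
Periodic ingestion: You import data from one or more BigQuery tables, and you set a sync frequency that determines how often the data stores are updated with the most recent data from the BigQuery dataset.

The following table compares the two ways that you can import BigQuery data into Vertex AI Search data stores.

One-time ingestion	Periodic ingestion
Generally available (GA).	Public preview.
Data must be refreshed manually.	Data updates automatically every 1, 3, or 5 days. Data can't be refreshed manually.
Vertex AI Search creates a single data store from one table in BigQuery.	Vertex AI Search creates a data connector for a BigQuery dataset and a data store (called an entity data store) for each table specified. For each data connector, the tables must have the same data type (for example, structured) and be in the same BigQuery dataset.
Data from multiple tables can be combined in one data store by first ingesting data from one table and then more data from another source or BigQuery table.	Because manual data import isn't supported, the data in an entity data store can only come from one BigQuery table.
Data source access control is supported.	Data source access control isn't supported. The imported data can contain access controls, but these controls aren't respected.
You can create a data store by using either the Google Cloud console or the API.	You must use the console to create data connectors and their entity data stores.
CMEK-compliant.	CMEK-compliant.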

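Import once from BigQuery

To ingest data from a BigQuery table, use the following steps to create a data store and ingest data by using either the Google Cloud console or the API.

Before importing your data, review "Prepare data for ingesting".

Console

To use the Google Cloud console to ingest data from BigQuery, follow these steps:

In the Google Cloud console, go to the "AI Applications" page.
Go to the "Data Stores" page.
Click "Create data store".
On the "Source" page, select "BigQuery".
In the "What kind of data are you importing?" section, select the type of data that you are importing.
In the "Synchronization frequency" section, select "One time".
In the "BigQuery path" field, click "Browse", select a table that you have prepared for ingesting, and then click "Select". Alternatively, enter the table location directly in the "BigQuery path" field.
Click "Continue".
If you are doing a one-time import of structured data:
1. Map fields to key properties.
2. If important fields are missing from the schema, use "Add new field" to add them.
  
  For more information, see "About auto-detect and edit".
3. Click "Continue".
Choose a region for your data store.
Enter a name for your data store.
Click "Create".
To check the status of your ingestion, go to the "Data Stores" page and click your data store name to see its details on the "Data" page. When the status column on the "Activity" tab changes from "In progress" to "Import completed", the ingestion is complete.

Depending on the size of your data, ingestion can take several minutes to several hours.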

REST

To use the command line to create a data store and import data from BigQuery, follow these steps.

Create a data store.
```
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: PROJECT_ID" \
"https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
-d '{
  "displayName": "DATA_STORE_DISPLAY_NAME",
  "industryVertical": "GENERIC",
  "solutionTypes": ["SOLUTION_TYPE_SEARCH"]
}'
```
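Note: The GENERIC industry vertical is used to create structured, unstructured, and website data stores for custom search apps.

Replace the following:
- PROJECT_ID: the ID of your Google Cloud project.
- DATA_STORE_ID: the ID of the Vertex AI Search data store that you want to create. This ID can contain only lowercase letters, digits, underscores, and hyphens.
- DATA_STORE_DISPLAY_NAME: the display name of the Vertex AI Search data store that you want to create.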
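
Before sending the request, the DATA_STORE_ID constraint can be checked locally. The following is a minimal sketch: the function name `is_valid_data_store_id` is hypothetical, the character set comes from the placeholder description above, and the 63-character cap is taken from the `dataStoreId` comment in the Node.js sample later on this page. The service remains the authority on what it accepts.

```python
import re

# Lowercase letters, digits, underscores, and hyphens; 1 to 63 characters.
_DATA_STORE_ID_RE = re.compile(r"[a-z0-9_-]{1,63}")


def is_valid_data_store_id(data_store_id: str) -> bool:
    """Return True if the ID uses only the allowed characters and length."""
    return _DATA_STORE_ID_RE.fullmatch(data_store_id) is not None


if __name__ == "__main__":
    for candidate in ("my-data_store-01", "My-Store", ""):
        print(repr(candidate), is_valid_data_store_id(candidate))
```
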
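Optional: If you're uploading unstructured data and want to configure document parsing or turn on document chunking for RAG, specify the documentProcessingConfig object and include it in your data store creation request. If you're ingesting scanned PDFs, configuring an OCR parser for PDFs is recommended. For how to configure parsing or chunking options, see "Parse and chunk documents".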
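
As a rough sketch of what including documentProcessingConfig in the creation request might look like, the following Python builds the JSON body used by the curl command above. The function name `data_store_body` is hypothetical. The nested field names (`defaultParsingConfig`, `ocrParsingConfig`, `digitalParsingConfig`, `layoutParsingConfig`, `chunkingConfig`, `layoutBasedChunkingConfig`, `chunkSize`) are not shown on this page; they are assumptions based on the Discovery Engine REST reference and should be checked against "Parse and chunk documents" before use.

```python
import json
from typing import Optional


def data_store_body(
    display_name: str,
    parser: str = "ocr",
    chunk_size: Optional[int] = None,
) -> dict:
    """Build a JSON body for the data store creation request (sketch)."""
    # Assumed mapping from a short parser name to its config field.
    parsers = {
        "digital": "digitalParsingConfig",
        "ocr": "ocrParsingConfig",
        "layout": "layoutParsingConfig",
    }
    if parser not in parsers:
        raise ValueError(f"unknown parser: {parser!r}")

    processing_config: dict = {"defaultParsingConfig": {parsers[parser]: {}}}
    if chunk_size is not None:
        # Only add chunking when a chunk size was requested.
        processing_config["chunkingConfig"] = {
            "layoutBasedChunkingConfig": {"chunkSize": chunk_size}
        }

    return {
        "displayName": display_name,
        "industryVertical": "GENERIC",
        "solutionTypes": ["SOLUTION_TYPE_SEARCH"],
        "documentProcessingConfig": processing_config,
    }


if __name__ == "__main__":
    print(json.dumps(data_store_body("DATA_STORE_DISPLAY_NAME"), indent=2))
```

The printed JSON can be passed to curl with `-d` in place of the three-field body shown earlier.
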
Import data from BigQuery.

If you defined a schema, make sure that the data conforms to that schema.
```
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
-d '{
  "bigquerySource": {
    "projectId": "PROJECT_ID",
    "datasetId":"DATASET_ID",
    "tableId": "TABLE_ID",
    "dataSchema": "DATA_SCHEMA",
    "aclEnabled": "BOOLEAN"
  },
  "reconciliationMode": "RECONCILIATION_MODE",
  "autoGenerateIds": "AUTO_GENERATE_IDS",
  "idField": "ID_FIELD",
  "errorConfig": {
    "gcsPrefix": "ERROR_DIRECTORY"
  }
}'
```
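Replace the following:
- PROJECT_ID: the ID of your Google Cloud project.
- DATA_STORE_ID: the ID of the Vertex AI Search data store.
- DATASET_ID: the ID of the BigQuery dataset.
- TABLE_ID: the ID of the BigQuery table.
  - If the BigQuery table isn't under PROJECT_ID, you need to give the service account service-<project number>@gcp-sa-discoveryengine.iam.gserviceaccount.com the "BigQuery Data Viewer" permission for the BigQuery table. For example, if you are importing a BigQuery table from source project "123" to destination project "456", give service-456@gcp-sa-discoveryengine.iam.gserviceaccount.com permissions for the BigQuery table under project "123".
- DATA_SCHEMA: optional. Values are document and custom. The default is document.
  - document: the BigQuery table that you use must conform to the default BigQuery schema provided in "Prepare data for ingesting". You can define the ID of each document yourself, while wrapping all the data in the jsonData string.
  - custom: any BigQuery table schema is accepted, and Vertex AI Search automatically generates the IDs for each document that is imported.
- ERROR_DIRECTORY: optional. A Cloud Storage directory for error information about the import, for example gs://<your-gcs-bucket>/directory/import_errors. Google recommends leaving this field empty to let Vertex AI Search automatically create a temporary directory.
- RECONCILIATION_MODE: optional. Values are FULL and INCREMENTAL. The default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from BigQuery to your data store. This performs an upsert operation, which adds new documents and replaces existing documents with updated documents that have the same ID. Specifying FULL causes a full rebase of the documents in your data store. In other words, new and updated documents are added to your data store, and documents that aren't in BigQuery are removed from your data store. FULL mode is helpful if you want to automatically delete documents that you no longer need.
- AUTO_GENERATE_IDS: optional. Specifies whether to automatically generate document IDs. If set to true, document IDs are generated based on a hash of the payload. Note that generated document IDs might not remain consistent over multiple imports. If you auto-generate IDs over multiple imports, Google highly recommends setting reconciliationMode to FULL to maintain consistent document IDs.
  
  Specify autoGenerateIds only when bigquerySource.dataSchema is set to custom. Otherwise, an INVALID_ARGUMENT error is returned. If you don't specify autoGenerateIds or you set it to false, you must specify idField. Otherwise, the documents fail to import.
- ID_FIELD: optional. Specifies which fields are the document IDs. For BigQuery sources, idField indicates the name of the column in the BigQuery table that contains the document IDs.
  
  Specify idField only when (1) bigquerySource.dataSchema is set to custom, and (2) autoGenerateIds is set to false or is unspecified. Otherwise, an INVALID_ARGUMENT error is returned.
  
  The value of the BigQuery column name must be of string type, must be between 1 and 63 characters long, and must conform to RFC-1034. Otherwise, the documents fail to import.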
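
The rules above for dataSchema, autoGenerateIds, and idField interact, so it can help to check them before calling the API. The following is a minimal sketch that builds the import request body and rejects combinations that the text says return INVALID_ARGUMENT. The function name `build_bigquery_import_body` is hypothetical, and the checks only mirror the rules as described on this page; the service remains the authority.

```python
from typing import Optional


def build_bigquery_import_body(
    project_id: str,
    dataset_id: str,
    table_id: str,
    data_schema: str = "document",
    reconciliation_mode: str = "INCREMENTAL",
    auto_generate_ids: Optional[bool] = None,
    id_field: Optional[str] = None,
) -> dict:
    """Build the JSON body for documents:import from a BigQuery table."""
    if data_schema not in ("document", "custom"):
        raise ValueError("dataSchema must be 'document' or 'custom'")
    if reconciliation_mode not in ("FULL", "INCREMENTAL"):
        raise ValueError("reconciliationMode must be 'FULL' or 'INCREMENTAL'")

    if data_schema != "custom":
        # autoGenerateIds and idField are only allowed with the custom schema.
        if auto_generate_ids is not None or id_field is not None:
            raise ValueError(
                "autoGenerateIds and idField apply only when dataSchema is 'custom'"
            )
    elif auto_generate_ids and id_field:
        raise ValueError("omit idField when autoGenerateIds is true")
    elif not auto_generate_ids and not id_field:
        raise ValueError("idField is required when autoGenerateIds is false or unset")

    body: dict = {
        "bigquerySource": {
            "projectId": project_id,
            "datasetId": dataset_id,
            "tableId": table_id,
            "dataSchema": data_schema,
        },
        "reconciliationMode": reconciliation_mode,
    }
    if auto_generate_ids is not None:
        body["autoGenerateIds"] = auto_generate_ids
    if id_field:
        body["idField"] = id_field
    return body


if __name__ == "__main__":
    print(
        build_bigquery_import_body(
            "PROJECT_ID", "DATASET_ID", "TABLE_ID",
            data_schema="custom", id_field="doc_id",
        )
    )
```

The optional errorConfig and aclEnabled fields from the curl example are left out here for brevity.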

C#

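For more information, see the Vertex AI Search C# API reference documentation.

To authenticate to Vertex AI Search, set up Application Default Credentials. For more information, see "Set up authentication for a local development environment".

Create a data store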

using Google.Cloud.DiscoveryEngine.V1;
using Google.LongRunning;

public sealed partial class GeneratedDataStoreServiceClientSnippets
{
    /// <summary>Snippet for CreateDataStore</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void CreateDataStoreRequestObject()
    {
        // Create client
        DataStoreServiceClient dataStoreServiceClient = DataStoreServiceClient.Create();
        // Initialize request argument(s)
        CreateDataStoreRequest request = new CreateDataStoreRequest
        {
            ParentAsCollectionName = CollectionName.FromProjectLocationCollection("[PROJECT]", "[LOCATION]", "[COLLECTION]"),
            DataStore = new DataStore(),
            DataStoreId = "",
            CreateAdvancedSiteSearch = false,
            CmekConfigNameAsCmekConfigName = CmekConfigName.FromProjectLocation("[PROJECT]", "[LOCATION]"),
            SkipDefaultSchemaCreation = false,
        };
        // Make the request
        Operation<DataStore, CreateDataStoreMetadata> response = dataStoreServiceClient.CreateDataStore(request);

        // Poll until the returned long-running operation is complete
        Operation<DataStore, CreateDataStoreMetadata> completedResponse = response.PollUntilCompleted();
        // Retrieve the operation result
        DataStore result = completedResponse.Result;

        // Or get the name of the operation
        string operationName = response.Name;
        // This name can be stored, then the long-running operation retrieved later by name
        Operation<DataStore, CreateDataStoreMetadata> retrievedResponse = dataStoreServiceClient.PollOnceCreateDataStore(operationName);
        // Check if the retrieved long-running operation has completed
        if (retrievedResponse.IsCompleted)
        {
            // If it has completed, then access the result
            DataStore retrievedResult = retrievedResponse.Result;
        }
    }
}

Import documents

using Google.Cloud.DiscoveryEngine.V1;
using Google.LongRunning;
using Google.Protobuf.WellKnownTypes;

public sealed partial class GeneratedDocumentServiceClientSnippets
{
    /// <summary>Snippet for ImportDocuments</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void ImportDocumentsRequestObject()
    {
        // Create client
        DocumentServiceClient documentServiceClient = DocumentServiceClient.Create();
        // Initialize request argument(s)
        ImportDocumentsRequest request = new ImportDocumentsRequest
        {
            ParentAsBranchName = BranchName.FromProjectLocationDataStoreBranch("[PROJECT]", "[LOCATION]", "[DATA_STORE]", "[BRANCH]"),
            InlineSource = new ImportDocumentsRequest.Types.InlineSource(),
            ErrorConfig = new ImportErrorConfig(),
            ReconciliationMode = ImportDocumentsRequest.Types.ReconciliationMode.Unspecified,
            UpdateMask = new FieldMask(),
            AutoGenerateIds = false,
            IdField = "",
            ForceRefreshContent = false,
        };
        // Make the request
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> response = documentServiceClient.ImportDocuments(request);

        // Poll until the returned long-running operation is complete
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> completedResponse = response.PollUntilCompleted();
        // Retrieve the operation result
        ImportDocumentsResponse result = completedResponse.Result;

        // Or get the name of the operation
        string operationName = response.Name;
        // This name can be stored, then the long-running operation retrieved later by name
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> retrievedResponse = documentServiceClient.PollOnceImportDocuments(operationName);
        // Check if the retrieved long-running operation has completed
        if (retrievedResponse.IsCompleted)
        {
            // If it has completed, then access the result
            ImportDocumentsResponse retrievedResult = retrievedResponse.Result;
        }
    }
}

Go

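For more information, see the Vertex AI Search Go API reference documentation.

To authenticate to Vertex AI Search, set up Application Default Credentials. For more information, see "Set up authentication for a local development environment".

Create a data store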


package main

import (
	"context"

	discoveryengine "cloud.google.com/go/discoveryengine/apiv1"
	discoveryenginepb "cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := discoveryengine.NewDataStoreClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &discoveryenginepb.CreateDataStoreRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb#CreateDataStoreRequest.
	}
	op, err := c.CreateDataStore(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}

Import documents


package main

import (
	"context"

	discoveryengine "cloud.google.com/go/discoveryengine/apiv1"
	discoveryenginepb "cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := discoveryengine.NewDocumentClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &discoveryenginepb.ImportDocumentsRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb#ImportDocumentsRequest.
	}
	op, err := c.ImportDocuments(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}

Java

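For more information, see the Vertex AI Search Java API reference documentation.

To authenticate to Vertex AI Search, set up Application Default Credentials. For more information, see "Set up authentication for a local development environment".

Create a data store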

import com.google.cloud.discoveryengine.v1.CollectionName;
import com.google.cloud.discoveryengine.v1.CreateDataStoreRequest;
import com.google.cloud.discoveryengine.v1.DataStore;
import com.google.cloud.discoveryengine.v1.DataStoreServiceClient;

public class SyncCreateDataStore {

  public static void main(String[] args) throws Exception {
    syncCreateDataStore();
  }

  public static void syncCreateDataStore() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DataStoreServiceClient dataStoreServiceClient = DataStoreServiceClient.create()) {
      CreateDataStoreRequest request =
          CreateDataStoreRequest.newBuilder()
              .setParent(CollectionName.of("[PROJECT]", "[LOCATION]", "[COLLECTION]").toString())
              .setDataStore(DataStore.newBuilder().build())
              .setDataStoreId("dataStoreId929489618")
              .setCreateAdvancedSiteSearch(true)
              .setSkipDefaultSchemaCreation(true)
              .build();
      DataStore response = dataStoreServiceClient.createDataStoreAsync(request).get();
    }
  }
}

Import documents

import com.google.cloud.discoveryengine.v1.BranchName;
import com.google.cloud.discoveryengine.v1.DocumentServiceClient;
import com.google.cloud.discoveryengine.v1.ImportDocumentsRequest;
import com.google.cloud.discoveryengine.v1.ImportDocumentsResponse;
import com.google.cloud.discoveryengine.v1.ImportErrorConfig;
import com.google.protobuf.FieldMask;

public class SyncImportDocuments {

  public static void main(String[] args) throws Exception {
    syncImportDocuments();
  }

  public static void syncImportDocuments() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DocumentServiceClient documentServiceClient = DocumentServiceClient.create()) {
      ImportDocumentsRequest request =
          ImportDocumentsRequest.newBuilder()
              .setParent(
                  BranchName.ofProjectLocationDataStoreBranchName(
                          "[PROJECT]", "[LOCATION]", "[DATA_STORE]", "[BRANCH]")
                      .toString())
              .setErrorConfig(ImportErrorConfig.newBuilder().build())
              .setUpdateMask(FieldMask.newBuilder().build())
              .setAutoGenerateIds(true)
              .setIdField("idField1629396127")
              .setForceRefreshContent(true)
              .build();
      ImportDocumentsResponse response = documentServiceClient.importDocumentsAsync(request).get();
    }
  }
}

Node.js

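For more information, see the Vertex AI Search Node.js API reference documentation.

To authenticate to Vertex AI Search, set up Application Default Credentials. For more information, see "Set up authentication for a local development environment".

Create a data store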

/**
 * This snippet has been automatically generated and should be regarded as a code template only.
 * It will require modifications to work.
 * It may require correct/in-range values for request initialization.
 * TODO(developer): Uncomment these variables before running the sample.
 */
/**
 *  Resource name of the CmekConfig to use for protecting this DataStore.
 */
// const cmekConfigName = 'abc123'
/**
 *  DataStore without CMEK protections. If a default CmekConfig is set for
 *  the project, setting this field will override the default CmekConfig as
 *  well.
 */
// const disableCmek = true
/**
 *  Required. The parent resource name, such as
 *  `projects/{project}/locations/{location}/collections/{collection}`.
 */
// const parent = 'abc123'
/**
 *  Required. The DataStore google.cloud.discoveryengine.v1.DataStore  to
 *  create.
 */
// const dataStore = {}
/**
 *  Required. The ID to use for the
 *  DataStore google.cloud.discoveryengine.v1.DataStore, which will become
 *  the final component of the
 *  DataStore google.cloud.discoveryengine.v1.DataStore's resource name.
 *  This field must conform to RFC-1034 (https://tools.ietf.org/html/rfc1034)
 *  standard with a length limit of 63 characters. Otherwise, an
 *  INVALID_ARGUMENT error is returned.
 */
// const dataStoreId = 'abc123'
/**
 *  A boolean flag indicating whether user want to directly create an advanced
 *  data store for site search.
 *  If the data store is not configured as site
 *  search (GENERIC vertical and PUBLIC_WEBSITE content_config), this flag will
 *  be ignored.
 */
// const createAdvancedSiteSearch = true
/**
 *  A boolean flag indicating whether to skip the default schema creation for
 *  the data store. Only enable this flag if you are certain that the default
 *  schema is incompatible with your use case.
 *  If set to true, you must manually create a schema for the data store before
 *  any documents can be ingested.
 *  This flag cannot be specified if `data_store.starting_schema` is specified.
 */
// const skipDefaultSchemaCreation = true

// Imports the Discoveryengine library
const {DataStoreServiceClient} = require('@google-cloud/discoveryengine').v1;

// Instantiates a client
const discoveryengineClient = new DataStoreServiceClient();

async function callCreateDataStore() {
  // Construct request
  const request = {
    parent,
    dataStore,
    dataStoreId,
  };

  // Run request
  const [operation] = await discoveryengineClient.createDataStore(request);
  const [response] = await operation.promise();
  console.log(response);
}

callCreateDataStore();

Import documents

/**
 * This snippet has been automatically generated and should be regarded as a code template only.
 * It will require modifications to work.
 * It may require correct/in-range values for request initialization.
 * TODO(developer): Uncomment these variables before running the sample.
 */
/**
 *  The Inline source for the input content for documents.
 */
// const inlineSource = {}
/**
 *  Cloud Storage location for the input content.
 */
// const gcsSource = {}
/**
 *  BigQuery input source.
 */
// const bigquerySource = {}
/**
 *  FhirStore input source.
 */
// const fhirStoreSource = {}
/**
 *  Spanner input source.
 */
// const spannerSource = {}
/**
 *  Cloud SQL input source.
 */
// const cloudSqlSource = {}
/**
 *  Firestore input source.
 */
// const firestoreSource = {}
/**
 *  AlloyDB input source.
 */
// const alloyDbSource = {}
/**
 *  Cloud Bigtable input source.
 */
// const bigtableSource = {}
/**
 *  Required. The parent branch resource name, such as
 *  `projects/{project}/locations/{location}/collections/{collection}/dataStores/{data_store}/branches/{branch}`.
 *  Requires create/update permission.
 */
// const parent = 'abc123'
/**
 *  The desired location of errors incurred during the Import.
 */
// const errorConfig = {}
/**
 *  The mode of reconciliation between existing documents and the documents to
 *  be imported. Defaults to
 *  ReconciliationMode.INCREMENTAL google.cloud.discoveryengine.v1.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL.
 */
// const reconciliationMode = {}
/**
 *  Indicates which fields in the provided imported documents to update. If
 *  not set, the default is to update all fields.
 */
// const updateMask = {}
/**
 *  Whether to automatically generate IDs for the documents if absent.
 *  If set to `true`,
 *  Document.id google.cloud.discoveryengine.v1.Document.id s are
 *  automatically generated based on the hash of the payload, where IDs may not
 *  be consistent during multiple imports. In which case
 *  ReconciliationMode.FULL google.cloud.discoveryengine.v1.ImportDocumentsRequest.ReconciliationMode.FULL 
 *  is highly recommended to avoid duplicate contents. If unset or set to
 *  `false`, Document.id google.cloud.discoveryengine.v1.Document.id s have
 *  to be specified using
 *  id_field google.cloud.discoveryengine.v1.ImportDocumentsRequest.id_field,
 *  otherwise, documents without IDs fail to be imported.
 *  Supported data sources:
 *  * GcsSource google.cloud.discoveryengine.v1.GcsSource.
 *  GcsSource.data_schema google.cloud.discoveryengine.v1.GcsSource.data_schema 
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * BigQuerySource google.cloud.discoveryengine.v1.BigQuerySource.
 *  BigQuerySource.data_schema google.cloud.discoveryengine.v1.BigQuerySource.data_schema 
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * SpannerSource google.cloud.discoveryengine.v1.SpannerSource.
 *  * CloudSqlSource google.cloud.discoveryengine.v1.CloudSqlSource.
 *  * FirestoreSource google.cloud.discoveryengine.v1.FirestoreSource.
 *  * BigtableSource google.cloud.discoveryengine.v1.BigtableSource.
 */
// const autoGenerateIds = true
/**
 *  The field indicates the ID field or column to be used as unique IDs of
 *  the documents.
 *  For GcsSource google.cloud.discoveryengine.v1.GcsSource  it is the key of
 *  the JSON field. For instance, `my_id` for JSON `{"my_id": "some_uuid"}`.
 *  For others, it may be the column name of the table where the unique ids are
 *  stored.
 *  The values of the JSON field or the table column are used as the
 *  Document.id google.cloud.discoveryengine.v1.Document.id s. The JSON field
 *  or the table column must be of string type, and the values must be set as
 *  valid strings conform to RFC-1034 (https://tools.ietf.org/html/rfc1034)
 *  with 1-63 characters. Otherwise, documents without valid IDs fail to be
 *  imported.
 *  Only set this field when
 *  auto_generate_ids google.cloud.discoveryengine.v1.ImportDocumentsRequest.auto_generate_ids 
 *  is unset or set as `false`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  If it is unset, a default value `_id` is used when importing from the
 *  allowed data sources.
 *  Supported data sources:
 *  * GcsSource google.cloud.discoveryengine.v1.GcsSource.
 *  GcsSource.data_schema google.cloud.discoveryengine.v1.GcsSource.data_schema 
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * BigQuerySource google.cloud.discoveryengine.v1.BigQuerySource.
 *  BigQuerySource.data_schema google.cloud.discoveryengine.v1.BigQuerySource.data_schema 
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * SpannerSource google.cloud.discoveryengine.v1.SpannerSource.
 *  * CloudSqlSource google.cloud.discoveryengine.v1.CloudSqlSource.
 *  * FirestoreSource google.cloud.discoveryengine.v1.FirestoreSource.
 *  * BigtableSource google.cloud.discoveryengine.v1.BigtableSource.
 */
// const idField = 'abc123'
/**
 *  Optional. Whether to force refresh the unstructured content of the
 *  documents.
 *  If set to `true`, the content part of the documents will be refreshed
 *  regardless of the update status of the referencing content.
 */
// const forceRefreshContent = true

// Imports the Discoveryengine library
const {DocumentServiceClient} = require('@google-cloud/discoveryengine').v1;

// Instantiates a client
const discoveryengineClient = new DocumentServiceClient();

async function callImportDocuments() {
  // Construct request
  const request = {
    parent,
  };

  // Run request
  const [operation] = await discoveryengineClient.importDocuments(request);
  const [response] = await operation.promise();
  console.log(response);
}

callImportDocuments();

Python

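For more information, see the Vertex AI Search Python API reference documentation.

To authenticate to Vertex AI Search, set up Application Default Credentials. For more information, see "Set up authentication for a local development environment".

Create a data store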


from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    #  For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name

Import documents


from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"
# bigquery_dataset = "YOUR_BIGQUERY_DATASET"
# bigquery_table = "YOUR_BIGQUERY_TABLE"

#  For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.DocumentServiceClient(client_options=client_options)

# The full resource name of the search engine branch.
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    bigquery_source=discoveryengine.BigQuerySource(
        project_id=project_id,
        dataset_id=bigquery_dataset,
        table_id=bigquery_table,
        data_schema="custom",
    ),
    # Options: `FULL`, `INCREMENTAL`
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

# Make the request
operation = client.import_documents(request=request)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)

Ruby

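For more information, see the Vertex AI Search Ruby API reference documentation.

To authenticate to Vertex AI Search, set up Application Default Credentials. For more information, see "Set up authentication for a local development environment".

Create a data store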

require "google/cloud/discovery_engine/v1"

##
# Snippet for the create_data_store call in the DataStoreService service
#
# This snippet has been automatically generated and should be regarded as a code
# template only. It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
# client as shown in https://cloud.google.com/ruby/docs/reference.
#
# This is an auto-generated example demonstrating basic usage of
# Google::Cloud::DiscoveryEngine::V1::DataStoreService::Client#create_data_store.
#
def create_data_store
  # Create a client object. The client can be reused for multiple calls.
  client = Google::Cloud::DiscoveryEngine::V1::DataStoreService::Client.new

  # Create a request. To set request fields, pass in keyword arguments.
  request = Google::Cloud::DiscoveryEngine::V1::CreateDataStoreRequest.new

  # Call the create_data_store method.
  result = client.create_data_store request

  # The returned object is of type Gapic::Operation. You can use it to
  # check the status of an operation, cancel it, or wait for results.
  # Here is how to wait for a response.
  result.wait_until_done! timeout: 60
  if result.response?
    p result.response
  else
    puts "No response received."
  end
end

Import documents

require "google/cloud/discovery_engine/v1"

##
# Snippet for the import_documents call in the DocumentService service
#
# This snippet has been automatically generated and should be regarded as a code
# template only. It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
# client as shown in https://cloud.google.com/ruby/docs/reference.
#
# This is an auto-generated example demonstrating basic usage of
# Google::Cloud::DiscoveryEngine::V1::DocumentService::Client#import_documents.
#
def import_documents
  # Create a client object. The client can be reused for multiple calls.
  client = Google::Cloud::DiscoveryEngine::V1::DocumentService::Client.new

  # Create a request. To set request fields, pass in keyword arguments.
  request = Google::Cloud::DiscoveryEngine::V1::ImportDocumentsRequest.new

  # Call the import_documents method.
  result = client.import_documents request

  # The returned object is of type Gapic::Operation. You can use it to
  # check the status of an operation, cancel it, or wait for results.
  # Here is how to wait for a response.
  result.wait_until_done! timeout: 60
  if result.response?
    p result.response
  else
    puts "No response received."
  end
end

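Connect to BigQuery with periodic syncing

Before importing your data, review "Prepare data for ingesting".

The following procedure describes how to create a data connector that associates a BigQuery dataset with Vertex AI Search, and how to specify a table in the dataset for each data store that you want to create. Data stores that are children of a data connector are called entity data stores.

Data from the dataset is synced periodically to the entity data stores. You can set the sync to run daily, every three days, or every five days.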

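Console

To use the Google Cloud console to create a connector that periodically syncs data from a BigQuery dataset to Vertex AI Search, follow these steps:

In the Google Cloud console, go to the "AI Applications" page.

AI Applications
In the navigation menu, click "Data Stores".
Click "Create data store".
On the "Source" page, select "BigQuery".
Select the kind of data that you are importing.
Click "Periodic".
Select the "Synchronization frequency", which is how often the Vertex AI Search connector syncs with the BigQuery dataset. You can change the frequency later.
In the "BigQuery dataset path" field, click "Browse" and select the dataset that contains the tables that you have prepared for ingesting. Alternatively, enter the location directly in the "BigQuery path" field. The path format is projectname.datasetname.
In the "Tables to sync" field, click "Browse", and then select a table that contains the data that you want for your data store.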
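Note:
Make sure that the data in the table matches the data type that you selected in step 5.
If there is a mismatch, you won't find out until one of the following happens:
- The connector returns an error when it tries to import the data.
- You see unexpected results. This happens if the selected type is structured when it should have been unstructured or structured with metadata. The data is imported, but the content URLs or metadata are not recognized and are treated as strings.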
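If the dataset contains more tables that you want to use for data stores, click "Add table" and specify those tables too.
Click "Continue".
Choose a region for your data store, enter a name for your data connector, and then click "Create".

You have now created a data connector that periodically syncs data with the BigQuery dataset, along with one or more entity data stores. The data stores have the same names as the BigQuery tables.
To check the status of your ingestion, go to the "Data Stores" page and click your data connector name to see its details on the "Data" page, and then click the "Data ingestion activity" tab. When the status column on the "Activity" tab changes from "In progress" to "succeeded", the first ingestion is complete.

Depending on the size of your data, ingestion can take several minutes to several hours.

After you set up your data source and import data for the first time, the data store syncs data from that source at the frequency that you selected during setup. The first sync occurs about one hour after you create the data connector. The next sync then occurs about 24, 72, or 120 hours later, depending on the frequency that you selected.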
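
The timing described above can be sketched in a few lines of Python. This is only an illustration, not part of any Vertex AI Search API: the function name `estimate_sync_times` is hypothetical, and the sketch assumes that the first sync starts exactly one hour after the connector is created and that later syncs follow at a fixed interval. Actual sync times are approximate and might drift.

```python
from datetime import datetime, timedelta

# Sync intervals offered during setup: daily, every 3 days, every 5 days.
SYNC_INTERVAL_HOURS = {1: 24, 3: 72, 5: 120}


def estimate_sync_times(created_at, frequency_days, count=3):
    """Return rough estimates of the first `count` sync times.

    Assumption: the first sync starts one hour after creation and each
    later sync follows the previous one by the selected interval.
    """
    if frequency_days not in SYNC_INTERVAL_HOURS:
        raise ValueError("frequency_days must be 1, 3, or 5")
    interval = timedelta(hours=SYNC_INTERVAL_HOURS[frequency_days])
    first_sync = created_at + timedelta(hours=1)
    return [first_sync + i * interval for i in range(count)]


# Example: a connector created on 2025-01-01 at 09:00, syncing every 3 days.
created = datetime(2025, 1, 1, 9, 0)
for when in estimate_sync_times(created, frequency_days=3):
    print(when.isoformat())
```

For a connector created at 09:00 on January 1 with a three-day frequency, the sketch prints 10:00 on January 1, January 4, and January 7.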

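Next steps

To attach your data store to an app, create an app and select your data store by following the steps in "Create a search app".
To preview how your search results appear after your app and data store are set up, see "Get search results".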

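Import from Cloud Storage

You can create data stores from Cloud Storage data in two ways:

One-time ingestion: You import data from a Cloud Storage folder or file into a data store. The data in the data store doesn't change unless you manually refresh it.
Periodic ingestion: You import data from a Cloud Storage folder or file, and you set a sync frequency that determines how often the data store is updated with the most recent data from that Cloud Storage location.

The following table compares the two ways that you can import Cloud Storage data into Vertex AI Search data stores.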

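One-time ingestion	Periodic ingestion
Generally available (GA).	Public preview.
Data must be refreshed manually.	Data updates automatically every one, three, or five days. Data cannot be manually refreshed.
Vertex AI Search creates a single data store from one folder or file in Cloud Storage.	Vertex AI Search creates a data connector and associates a data store (called an entity data store) with the specified file or folder. Each Cloud Storage data connector can have only one entity data store.
Data from multiple files, folders, and buckets can be combined in one data store by first ingesting data from one Cloud Storage location and then ingesting more data from another location.	Because manual data import is not supported, the data in an entity data store can come from only one Cloud Storage file or folder.
Data source access control is supported. For more information, see "Data source access control".	Data source access control is not supported. The imported data can contain access controls, but these controls are not applied.
You can create a data store by using either the Google Cloud console or the API.	You must use the console to create data connectors and their entity data stores.
CMEK-compliant.	CMEK-compliant.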

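Import once from Cloud Storage

To ingest data from Cloud Storage, use the following steps to create a data store and ingest data by using either the Google Cloud console or the API.

Before importing your data, review "Prepare data for ingesting".

Console

To use the console to ingest data from a Cloud Storage bucket, follow these steps:

In the Google Cloud console, go to the "AI Applications" page.

AI Applications
Go to the "Data Stores" page.
Click "Create data store".
On the "Source" page, select "Cloud Storage".
In the "Select a folder or a file you want to import" section, select "Folder" or "File".
Click "Browse", choose the data that you have prepared for ingesting, and then click "Select". Alternatively, enter the location directly in the gs:// field.
Select the kind of data that you are importing.
Click "Continue".
If you are doing a one-time import of structured data:
1. Map the fields to key properties.
2. If important fields are missing from the schema, use "Add new field" to add them.
  
  For more information, see "About auto-detect and edit".
3. Click "Continue".
Choose a region for your data store.
Enter a name for your data store.
Optional: If you selected unstructured documents, you can select parsing and chunking options for your documents. To compare parsers, see "Parse documents". For information about chunking, see "Chunk documents for RAG".

The OCR parser and the layout parser can incur additional costs. See "Document AI feature pricing".

To select a parser, expand "Document processing options" and specify the parser options that you want to use.
Click "Create".
To check the status of your ingestion, go to the "Data Stores" page and click your data store name to see its details on the "Data" page. When the status column on the "Activity" tab changes from "In progress" to "Import completed", the ingestion is complete.

Depending on the size of your data, ingestion can take several minutes to several hours.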

REST

To use the command line to create a data store and ingest data from Cloud Storage, follow these steps.

Create a data store.
```
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: PROJECT_ID" \
"https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
-d '{
  "displayName": "DATA_STORE_DISPLAY_NAME",
  "industryVertical": "GENERIC",
  "solutionTypes": ["SOLUTION_TYPE_SEARCH"]
}'
```
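Note: The GENERIC industry vertical is used to create structured, unstructured, and website data stores for custom search apps.

Replace the following:
- PROJECT_ID: the ID of your Google Cloud project.
- DATA_STORE_ID: the ID of the Vertex AI Search data store that you want to create. This ID can contain only lowercase letters, digits, underscores, and hyphens.
- DATA_STORE_DISPLAY_NAME: the display name of the Vertex AI Search data store that you want to create.
Optional: If you're uploading unstructured data and want to configure document parsing or turn on document chunking for RAG, specify the documentProcessingConfig object and include it in your data store creation request. If you're ingesting scanned PDFs, configuring an OCR parser for PDFs is recommended. To learn how to configure parsing or chunking options, see "Parse and chunk documents".
Import data from Cloud Storage.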
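
As a rough illustration of the create request above, the following Python sketch assembles the same URL and JSON body and checks the data store ID before anything is sent. The function name `build_create_data_store_request` is hypothetical. The character set comes from the DATA_STORE_ID description above; the 63-character limit is an assumption taken from the client library comments on this page rather than from the REST text. The sketch doesn't call the API; it only prepares the values that the curl command would send.

```python
import json
import re

# Assumption: lowercase letters, digits, underscores, and hyphens only,
# with at most 63 characters (limit borrowed from the client library comments).
DATA_STORE_ID_PATTERN = re.compile(r"[a-z0-9_-]{1,63}")


def build_create_data_store_request(project_id, data_store_id, display_name):
    """Return the (url, json_body) pair for the data store create call."""
    if not DATA_STORE_ID_PATTERN.fullmatch(data_store_id):
        raise ValueError(f"invalid data store ID: {data_store_id!r}")
    url = (
        "https://discoveryengine.googleapis.com/v1/"
        f"projects/{project_id}/locations/global/"
        "collections/default_collection/dataStores"
        f"?dataStoreId={data_store_id}"
    )
    body = {
        "displayName": display_name,
        "industryVertical": "GENERIC",
        "solutionTypes": ["SOLUTION_TYPE_SEARCH"],
    }
    return url, json.dumps(body, indent=2)


# Example with placeholder values.
url, body = build_create_data_store_request(
    "my-project", "my-data-store", "My Data Store"
)
print(url)
print(body)
```

Validating the ID locally is a convenience only; the service remains the authority on which IDs it accepts.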
```
  curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
  -d '{
    "gcsSource": {
      "inputUris": ["INPUT_FILE_PATTERN_1", "INPUT_FILE_PATTERN_2"],
      "dataSchema": "DATA_SCHEMA"
    },
    "reconciliationMode": "RECONCILIATION_MODE",
    "autoGenerateIds": "AUTO_GENERATE_IDS",
    "idField": "ID_FIELD",
    "errorConfig": {
      "gcsPrefix": "ERROR_DIRECTORY"
    }
  }'
```
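Replace the following:
- PROJECT_ID: the ID of your Google Cloud project.
- DATA_STORE_ID: the ID of the Vertex AI Search data store.
- INPUT_FILE_PATTERN: a file pattern in Cloud Storage that contains your documents.
  
  For structured data or for unstructured data with metadata, an example of an input file pattern is gs://<your-gcs-bucket>/directory/object.json, and an example of a pattern that matches one or more files is gs://<your-gcs-bucket>/directory/*.json.
  
  For unstructured documents, an example is gs://<your-gcs-bucket>/directory/*.pdf. Each file that matches the pattern becomes a document.
  
  If <your-gcs-bucket> is not under PROJECT_ID, you need to give the service account service-<project number>@gcp-sa-discoveryengine.iam.gserviceaccount.com "Storage Object Viewer" permissions for the Cloud Storage bucket. For example, if you are importing a Cloud Storage bucket from source project "123" into destination project "456", give service-456@gcp-sa-discoveryengine.iam.gserviceaccount.com permissions on the Cloud Storage bucket under project "123".
- DATA_SCHEMA: optional. Values are document, custom, csv, and content. The default is document.
  - document: Upload unstructured data with metadata for unstructured documents. Each line of the file must follow one of the following formats. You can define the ID of each document:
    - { "id": "<your-id>", "jsonData": "<JSON string>", "content": { "mimeType": "<application/pdf or text/html>", "uri": "gs://<your-gcs-bucket>/directory/filename.pdf" } }
    - { "id": "<your-id>", "structData": <JSON object>, "content": { "mimeType": "<application/pdf or text/html>", "uri": "gs://<your-gcs-bucket>/directory/filename.pdf" } }
  - custom: Upload JSON for structured documents. The data is organized according to a schema. You can specify the schema; otherwise, it is auto-detected. You can put the JSON string of each document directly on its own line, in a consistent format, and Vertex AI Search automatically generates an ID for each imported document.
  - content: Upload unstructured documents (PDF, HTML, DOC, TXT, PPTX). The ID of each document is generated automatically as the first 128 bits of SHA256(GCS_URI), encoded as a hexadecimal string. You can specify multiple input file patterns as long as the matched files don't exceed the limit of 100,000 files.
  - csv: Include a header row in your CSV file, with each header mapped to a document field. Specify the path to the CSV file by using the inputUris field.
- ERROR_DIRECTORY: optional. A Cloud Storage directory for error information about the import, for example gs://<your-gcs-bucket>/directory/import_errors. Google recommends leaving this field empty so that Vertex AI Search automatically creates a temporary directory.
- RECONCILIATION_MODE: optional. Values are FULL and INCREMENTAL. The default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from Cloud Storage to your data store. This performs an upsert operation, which adds new documents and replaces existing documents with updated documents that have the same ID. Specifying FULL causes a full rebase of the documents in your data store. In other words, new and updated documents are added to your data store, and documents that are not in Cloud Storage are removed from your data store. FULL mode is helpful if you want to automatically delete documents that you no longer need.
- AUTO_GENERATE_IDS: optional. Specifies whether to automatically generate document IDs. If set to true, document IDs are generated based on a hash of the payload. Note that generated document IDs might not remain consistent across multiple imports. If you auto-generate IDs across multiple imports, Google highly recommends setting reconciliationMode to FULL to keep document IDs consistent.
  
  Specify autoGenerateIds only when gcsSource.dataSchema is set to custom or csv. Otherwise, an INVALID_ARGUMENT error is returned. If you don't specify autoGenerateIds or you set it to false, you must specify idField. Otherwise, the documents fail to import.
- ID_FIELD: optional. Specifies which fields are the document IDs. For Cloud Storage source documents, idField specifies the name of the JSON fields that are document IDs. For example, if {"my_id":"some_uuid"} is the document ID field in one of your documents, specify "idField":"my_id". This identifies all JSON fields with the name "my_id" as document IDs.
  
  Specify this field only when: (1) gcsSource.dataSchema is set to custom or csv, and (2) autoGenerateIds is set to false or is unspecified. Otherwise, an INVALID_ARGUMENT error is returned.
  
  Note that the value of the Cloud Storage JSON field must be of string type, must be between 1 and 63 characters long, and must conform to RFC-1034. Otherwise, the documents fail to import.
  
  Note that the JSON field name specified by idField must be of string type, must be between 1 and 63 characters long, and must conform to RFC-1034. Otherwise, the documents fail to import.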
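
The field rules above interact, so it can help to check a request body before sending it. The following Python sketch builds the JSON body for the documents:import call and rejects combinations that the parameter descriptions say return INVALID_ARGUMENT. The function name `build_import_body` and the gs:// prefix check are assumptions made for illustration; the service itself remains the authority on what is valid.

```python
import json

DATA_SCHEMAS = {"document", "custom", "csv", "content"}
RECONCILIATION_MODES = {"INCREMENTAL", "FULL"}


def build_import_body(input_uris, data_schema="document",
                      reconciliation_mode="INCREMENTAL",
                      auto_generate_ids=False, id_field=None,
                      error_directory=None):
    """Build the documents:import JSON body and check the field rules."""
    if data_schema not in DATA_SCHEMAS:
        raise ValueError(f"unknown dataSchema: {data_schema}")
    if reconciliation_mode not in RECONCILIATION_MODES:
        raise ValueError(f"unknown reconciliationMode: {reconciliation_mode}")
    # Assumption: every input pattern is a Cloud Storage URI.
    if not all(uri.startswith("gs://") for uri in input_uris):
        raise ValueError("input URIs must start with gs://")
    # autoGenerateIds and idField apply only to the custom and csv schemas.
    if (auto_generate_ids or id_field) and data_schema not in {"custom", "csv"}:
        raise ValueError(
            "autoGenerateIds and idField require dataSchema custom or csv"
        )
    # idField is allowed only when autoGenerateIds is false or unset.
    if auto_generate_ids and id_field:
        raise ValueError("set idField only when autoGenerateIds is not true")

    body = {
        "gcsSource": {"inputUris": list(input_uris), "dataSchema": data_schema},
        "reconciliationMode": reconciliation_mode,
    }
    if auto_generate_ids:
        body["autoGenerateIds"] = True
    if id_field:
        body["idField"] = id_field
    if error_directory:
        body["errorConfig"] = {"gcsPrefix": error_directory}
    return body


# Example: structured JSON documents whose IDs are stored in "my_id".
body = build_import_body(
    ["gs://my-bucket/directory/*.json"],
    data_schema="custom",
    id_field="my_id",
)
print(json.dumps(body, indent=2))
```

Optional fields are left out of the body when they are not set, so the service applies its own defaults for them.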

C#

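For more information, see the Vertex AI Search C# API reference documentation.

To authenticate to Vertex AI Search, set up Application Default Credentials. For more information, see "Set up authentication for a local development environment".

Create a data store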

using Google.Cloud.DiscoveryEngine.V1;
using Google.LongRunning;

public sealed partial class GeneratedDataStoreServiceClientSnippets
{
    /// <summary>Snippet for CreateDataStore</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void CreateDataStoreRequestObject()
    {
        // Create client
        DataStoreServiceClient dataStoreServiceClient = DataStoreServiceClient.Create();
        // Initialize request argument(s)
        CreateDataStoreRequest request = new CreateDataStoreRequest
        {
            ParentAsCollectionName = CollectionName.FromProjectLocationCollection("[PROJECT]", "[LOCATION]", "[COLLECTION]"),
            DataStore = new DataStore(),
            DataStoreId = "",
            CreateAdvancedSiteSearch = false,
            CmekConfigNameAsCmekConfigName = CmekConfigName.FromProjectLocation("[PROJECT]", "[LOCATION]"),
            SkipDefaultSchemaCreation = false,
        };
        // Make the request
        Operation<DataStore, CreateDataStoreMetadata> response = dataStoreServiceClient.CreateDataStore(request);

        // Poll until the returned long-running operation is complete
        Operation<DataStore, CreateDataStoreMetadata> completedResponse = response.PollUntilCompleted();
        // Retrieve the operation result
        DataStore result = completedResponse.Result;

        // Or get the name of the operation
        string operationName = response.Name;
        // This name can be stored, then the long-running operation retrieved later by name
        Operation<DataStore, CreateDataStoreMetadata> retrievedResponse = dataStoreServiceClient.PollOnceCreateDataStore(operationName);
        // Check if the retrieved long-running operation has completed
        if (retrievedResponse.IsCompleted)
        {
            // If it has completed, then access the result
            DataStore retrievedResult = retrievedResponse.Result;
        }
    }
}

Import documents

using Google.Cloud.DiscoveryEngine.V1;
using Google.LongRunning;
using Google.Protobuf.WellKnownTypes;

public sealed partial class GeneratedDocumentServiceClientSnippets
{
    /// <summary>Snippet for ImportDocuments</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void ImportDocumentsRequestObject()
    {
        // Create client
        DocumentServiceClient documentServiceClient = DocumentServiceClient.Create();
        // Initialize request argument(s)
        ImportDocumentsRequest request = new ImportDocumentsRequest
        {
            ParentAsBranchName = BranchName.FromProjectLocationDataStoreBranch("[PROJECT]", "[LOCATION]", "[DATA_STORE]", "[BRANCH]"),
            InlineSource = new ImportDocumentsRequest.Types.InlineSource(),
            ErrorConfig = new ImportErrorConfig(),
            ReconciliationMode = ImportDocumentsRequest.Types.ReconciliationMode.Unspecified,
            UpdateMask = new FieldMask(),
            AutoGenerateIds = false,
            IdField = "",
            ForceRefreshContent = false,
        };
        // Make the request
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> response = documentServiceClient.ImportDocuments(request);

        // Poll until the returned long-running operation is complete
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> completedResponse = response.PollUntilCompleted();
        // Retrieve the operation result
        ImportDocumentsResponse result = completedResponse.Result;

        // Or get the name of the operation
        string operationName = response.Name;
        // This name can be stored, then the long-running operation retrieved later by name
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> retrievedResponse = documentServiceClient.PollOnceImportDocuments(operationName);
        // Check if the retrieved long-running operation has completed
        if (retrievedResponse.IsCompleted)
        {
            // If it has completed, then access the result
            ImportDocumentsResponse retrievedResult = retrievedResponse.Result;
        }
    }
}

Go

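For more information, see the Vertex AI Search Go API reference documentation.

To authenticate to Vertex AI Search, set up Application Default Credentials. For more information, see "Set up authentication for a local development environment".

Create a data store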


package main

import (
	"context"

	discoveryengine "cloud.google.com/go/discoveryengine/apiv1"
	discoveryenginepb "cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := discoveryengine.NewDataStoreClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &discoveryenginepb.CreateDataStoreRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb#CreateDataStoreRequest.
	}
	op, err := c.CreateDataStore(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}

Import documents


package main

import (
	"context"

	discoveryengine "cloud.google.com/go/discoveryengine/apiv1"
	discoveryenginepb "cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := discoveryengine.NewDocumentClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &discoveryenginepb.ImportDocumentsRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb#ImportDocumentsRequest.
	}
	op, err := c.ImportDocuments(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}

Java

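For more information, see the Vertex AI Search Java API reference documentation.

To authenticate to Vertex AI Search, set up Application Default Credentials. For more information, see "Set up authentication for a local development environment".

Create a data store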

import com.google.cloud.discoveryengine.v1.CollectionName;
import com.google.cloud.discoveryengine.v1.CreateDataStoreRequest;
import com.google.cloud.discoveryengine.v1.DataStore;
import com.google.cloud.discoveryengine.v1.DataStoreServiceClient;

public class SyncCreateDataStore {

  public static void main(String[] args) throws Exception {
    syncCreateDataStore();
  }

  public static void syncCreateDataStore() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DataStoreServiceClient dataStoreServiceClient = DataStoreServiceClient.create()) {
      CreateDataStoreRequest request =
          CreateDataStoreRequest.newBuilder()
              .setParent(CollectionName.of("[PROJECT]", "[LOCATION]", "[COLLECTION]").toString())
              .setDataStore(DataStore.newBuilder().build())
              .setDataStoreId("dataStoreId929489618")
              .setCreateAdvancedSiteSearch(true)
              .setSkipDefaultSchemaCreation(true)
              .build();
      DataStore response = dataStoreServiceClient.createDataStoreAsync(request).get();
    }
  }
}

Import documents

import com.google.cloud.discoveryengine.v1.BranchName;
import com.google.cloud.discoveryengine.v1.DocumentServiceClient;
import com.google.cloud.discoveryengine.v1.ImportDocumentsRequest;
import com.google.cloud.discoveryengine.v1.ImportDocumentsResponse;
import com.google.cloud.discoveryengine.v1.ImportErrorConfig;
import com.google.protobuf.FieldMask;

public class SyncImportDocuments {

  public static void main(String[] args) throws Exception {
    syncImportDocuments();
  }

  public static void syncImportDocuments() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DocumentServiceClient documentServiceClient = DocumentServiceClient.create()) {
      ImportDocumentsRequest request =
          ImportDocumentsRequest.newBuilder()
              .setParent(
                  BranchName.ofProjectLocationDataStoreBranchName(
                          "[PROJECT]", "[LOCATION]", "[DATA_STORE]", "[BRANCH]")
                      .toString())
              .setErrorConfig(ImportErrorConfig.newBuilder().build())
              .setUpdateMask(FieldMask.newBuilder().build())
              .setAutoGenerateIds(true)
              .setIdField("idField1629396127")
              .setForceRefreshContent(true)
              .build();
      ImportDocumentsResponse response = documentServiceClient.importDocumentsAsync(request).get();
    }
  }
}

Node.js

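For more information, see the Vertex AI Search Node.js API reference documentation.

To authenticate to Vertex AI Search, set up Application Default Credentials. For more information, see "Set up authentication for a local development environment".

Create a data store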

/**
 * This snippet has been automatically generated and should be regarded as a code template only.
 * It will require modifications to work.
 * It may require correct/in-range values for request initialization.
 * TODO(developer): Uncomment these variables before running the sample.
 */
/**
 *  Resource name of the CmekConfig to use for protecting this DataStore.
 */
// const cmekConfigName = 'abc123'
/**
 *  DataStore without CMEK protections. If a default CmekConfig is set for
 *  the project, setting this field will override the default CmekConfig as
 *  well.
 */
// const disableCmek = true
/**
 *  Required. The parent resource name, such as
 *  `projects/{project}/locations/{location}/collections/{collection}`.
 */
// const parent = 'abc123'
/**
 *  Required. The DataStore google.cloud.discoveryengine.v1.DataStore  to
 *  create.
 */
// const dataStore = {}
/**
 *  Required. The ID to use for the
 *  DataStore google.cloud.discoveryengine.v1.DataStore, which will become
 *  the final component of the
 *  DataStore google.cloud.discoveryengine.v1.DataStore's resource name.
 *  This field must conform to RFC-1034 (https://tools.ietf.org/html/rfc1034)
 *  standard with a length limit of 63 characters. Otherwise, an
 *  INVALID_ARGUMENT error is returned.
 */
// const dataStoreId = 'abc123'
/**
 *  A boolean flag indicating whether user want to directly create an advanced
 *  data store for site search.
 *  If the data store is not configured as site
 *  search (GENERIC vertical and PUBLIC_WEBSITE content_config), this flag will
 *  be ignored.
 */
// const createAdvancedSiteSearch = true
/**
 *  A boolean flag indicating whether to skip the default schema creation for
 *  the data store. Only enable this flag if you are certain that the default
 *  schema is incompatible with your use case.
 *  If set to true, you must manually create a schema for the data store before
 *  any documents can be ingested.
 *  This flag cannot be specified if `data_store.starting_schema` is specified.
 */
// const skipDefaultSchemaCreation = true

// Imports the Discoveryengine library
const {DataStoreServiceClient} = require('@google-cloud/discoveryengine').v1;

// Instantiates a client
const discoveryengineClient = new DataStoreServiceClient();

async function callCreateDataStore() {
  // Construct request
  const request = {
    parent,
    dataStore,
    dataStoreId,
  };

  // Run request
  const [operation] = await discoveryengineClient.createDataStore(request);
  const [response] = await operation.promise();
  console.log(response);
}

callCreateDataStore();

Import documents

/**
 * This snippet has been automatically generated and should be regarded as a code template only.
 * It will require modifications to work.
 * It may require correct/in-range values for request initialization.
 * TODO(developer): Uncomment these variables before running the sample.
 */
/**
 *  The Inline source for the input content for documents.
 */
// const inlineSource = {}
/**
 *  Cloud Storage location for the input content.
 */
// const gcsSource = {}
/**
 *  BigQuery input source.
 */
// const bigquerySource = {}
/**
 *  FhirStore input source.
 */
// const fhirStoreSource = {}
/**
 *  Spanner input source.
 */
// const spannerSource = {}
/**
 *  Cloud SQL input source.
 */
// const cloudSqlSource = {}
/**
 *  Firestore input source.
 */
// const firestoreSource = {}
/**
 *  AlloyDB input source.
 */
// const alloyDbSource = {}
/**
 *  Cloud Bigtable input source.
 */
// const bigtableSource = {}
/**
 *  Required. The parent branch resource name, such as
 *  `projects/{project}/locations/{location}/collections/{collection}/dataStores/{data_store}/branches/{branch}`.
 *  Requires create/update permission.
 */
// const parent = 'abc123'
/**
 *  The desired location of errors incurred during the Import.
 */
// const errorConfig = {}
/**
 *  The mode of reconciliation between existing documents and the documents to
 *  be imported. Defaults to
 *  ReconciliationMode.INCREMENTAL google.cloud.discoveryengine.v1.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL.
 */
// const reconciliationMode = {}
/**
 *  Indicates which fields in the provided imported documents to update. If
 *  not set, the default is to update all fields.
 */
// const updateMask = {}
/**
 *  Whether to automatically generate IDs for the documents if absent.
 *  If set to `true`,
 *  Document.id google.cloud.discoveryengine.v1.Document.id s are
 *  automatically generated based on the hash of the payload, where IDs may not
 *  be consistent during multiple imports. In which case
 *  ReconciliationMode.FULL google.cloud.discoveryengine.v1.ImportDocumentsRequest.ReconciliationMode.FULL 
 *  is highly recommended to avoid duplicate contents. If unset or set to
 *  `false`, Document.id google.cloud.discoveryengine.v1.Document.id s have
 *  to be specified using
 *  id_field google.cloud.discoveryengine.v1.ImportDocumentsRequest.id_field,
 *  otherwise, documents without IDs fail to be imported.
 *  Supported data sources:
 *  * GcsSource google.cloud.discoveryengine.v1.GcsSource.
 *  GcsSource.data_schema google.cloud.discoveryengine.v1.GcsSource.data_schema 
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * BigQuerySource google.cloud.discoveryengine.v1.BigQuerySource.
 *  BigQuerySource.data_schema google.cloud.discoveryengine.v1.BigQuerySource.data_schema 
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * SpannerSource google.cloud.discoveryengine.v1.SpannerSource.
 *  * CloudSqlSource google.cloud.discoveryengine.v1.CloudSqlSource.
 *  * FirestoreSource google.cloud.discoveryengine.v1.FirestoreSource.
 *  * BigtableSource google.cloud.discoveryengine.v1.BigtableSource.
 */
// const autoGenerateIds = true
/**
 *  The field indicates the ID field or column to be used as unique IDs of
 *  the documents.
 *  For GcsSource google.cloud.discoveryengine.v1.GcsSource  it is the key of
 *  the JSON field. For instance, `my_id` for JSON `{"my_id": "some_uuid"}`.
 *  For others, it may be the column name of the table where the unique ids are
 *  stored.
 *  The values of the JSON field or the table column are used as the
 *  Document.id google.cloud.discoveryengine.v1.Document.id s. The JSON field
 *  or the table column must be of string type, and the values must be set as
 *  valid strings conform to RFC-1034 (https://tools.ietf.org/html/rfc1034)
 *  with 1-63 characters. Otherwise, documents without valid IDs fail to be
 *  imported.
 *  Only set this field when
 *  auto_generate_ids google.cloud.discoveryengine.v1.ImportDocumentsRequest.auto_generate_ids 
 *  is unset or set as `false`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  If it is unset, a default value `_id` is used when importing from the
 *  allowed data sources.
 *  Supported data sources:
 *  * GcsSource google.cloud.discoveryengine.v1.GcsSource.
 *  GcsSource.data_schema google.cloud.discoveryengine.v1.GcsSource.data_schema 
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * BigQuerySource google.cloud.discoveryengine.v1.BigQuerySource.
 *  BigQuerySource.data_schema google.cloud.discoveryengine.v1.BigQuerySource.data_schema 
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * SpannerSource google.cloud.discoveryengine.v1.SpannerSource.
 *  * CloudSqlSource google.cloud.discoveryengine.v1.CloudSqlSource.
 *  * FirestoreSource google.cloud.discoveryengine.v1.FirestoreSource.
 *  * BigtableSource google.cloud.discoveryengine.v1.BigtableSource.
 */
// const idField = 'abc123'
/**
 *  Optional. Whether to force refresh the unstructured content of the
 *  documents.
 *  If set to `true`, the content part of the documents will be refreshed
 *  regardless of the update status of the referencing content.
 */
// const forceRefreshContent = true

// Imports the Discoveryengine library
const {DocumentServiceClient} = require('@google-cloud/discoveryengine').v1;

// Instantiates a client
const discoveryengineClient = new DocumentServiceClient();

async function callImportDocuments() {
  // Construct request
  const request = {
    parent,
  };

  // Run request
  const [operation] = await discoveryengineClient.importDocuments(request);
  const [response] = await operation.promise();
  console.log(response);
}

callImportDocuments();

Python

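For more information, see the Vertex AI Search Python API reference documentation.

To authenticate to Vertex AI Search, set up Application Default Credentials. For more information, see "Set up authentication for a local development environment".

Create a data store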


from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    #  For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name

Import documents

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"

# Examples:
# - Unstructured documents
#   - `gs://bucket/directory/file.pdf`
#   - `gs://bucket/directory/*.pdf`
# - Unstructured documents with JSONL Metadata
#   - `gs://bucket/directory/file.json`
# - Unstructured documents with CSV Metadata
#   - `gs://bucket/directory/file.csv`
# gcs_uri = "YOUR_GCS_PATH"

#  For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.DocumentServiceClient(client_options=client_options)

# The full resource name of the search engine branch.
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    gcs_source=discoveryengine.GcsSource(
        # Multiple URIs are supported
        input_uris=[gcs_uri],
        # Options:
        # - `content` - Unstructured documents (PDF, HTML, DOC, TXT, PPTX)
        # - `custom` - Unstructured documents with custom JSONL metadata
        # - `document` - Structured documents in the discoveryengine.Document format.
        # - `csv` - Unstructured documents with CSV metadata
        data_schema="content",
    ),
    # Options: `FULL`, `INCREMENTAL`
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

# Make the request
operation = client.import_documents(request=request)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)

Ruby

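For more information, see the Vertex AI Search Ruby API reference documentation.

To authenticate to Vertex AI Search, set up Application Default Credentials. For more information, see "Set up authentication for a local development environment".

Create a data store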

require "google/cloud/discovery_engine/v1"

##
# Snippet for the create_data_store call in the DataStoreService service
#
# This snippet has been automatically generated and should be regarded as a code
# template only. It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
# client as shown in https://cloud.google.com/ruby/docs/reference.
#
# This is an auto-generated example demonstrating basic usage of
# Google::Cloud::DiscoveryEngine::V1::DataStoreService::Client#create_data_store.
#
def create_data_store
  # Create a client object. The client can be reused for multiple calls.
  client = Google::Cloud::DiscoveryEngine::V1::DataStoreService::Client.new

  # Create a request. To set request fields, pass in keyword arguments.
  request = Google::Cloud::DiscoveryEngine::V1::CreateDataStoreRequest.new

  # Call the create_data_store method.
  result = client.create_data_store request

  # The returned object is of type Gapic::Operation. You can use it to
  # check the status of an operation, cancel it, or wait for results.
  # Here is how to wait for a response.
  result.wait_until_done! timeout: 60
  if result.response?
    p result.response
  else
    puts "No response received."
  end
end

Import documents

require "google/cloud/discovery_engine/v1"

##
# Snippet for the import_documents call in the DocumentService service
#
# This snippet has been automatically generated and should be regarded as a code
# template only. It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
# client as shown in https://cloud.google.com/ruby/docs/reference.
#
# This is an auto-generated example demonstrating basic usage of
# Google::Cloud::DiscoveryEngine::V1::DocumentService::Client#import_documents.
#
def import_documents
  # Create a client object. The client can be reused for multiple calls.
  client = Google::Cloud::DiscoveryEngine::V1::DocumentService::Client.new

  # Create a request. To set request fields, pass in keyword arguments.
  request = Google::Cloud::DiscoveryEngine::V1::ImportDocumentsRequest.new

  # Call the import_documents method.
  result = client.import_documents request

  # The returned object is of type Gapic::Operation. You can use it to
  # check the status of an operation, cancel it, or wait for results.
  # Here is how to wait for a response.
  result.wait_until_done! timeout: 60
  if result.response?
    p result.response
  else
    puts "No response received."
  end
end

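Connect to Cloud Storage with periodic syncing

Before you import your data, review "Prepare data for ingesting".

The following procedure describes how to create a data connector that associates a Cloud Storage location with a Vertex AI Search data connector, and how to specify a folder or file in that location for the data store that you want to create. Data stores that are children of a data connector are called "entity" data stores.

Data is synced periodically to the entity data store. You can specify a sync every day, every three days, or every five days.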

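Console

In the Google Cloud console, go to the "AI Applications" page.

AI Applications
Go to the "Data Stores" page.
Click "Create data store".
On the "Source" page, select "Cloud Storage".
Select the type of data that you are importing.
Click "Periodic".
Select the "Synchronization frequency", which sets how often the Vertex AI Search connector syncs with the Cloud Storage location. You can change the frequency later.
In the "Select a folder or a file you want to import" section, select "Folder" or "File".
Click "Browse", choose the data that you have prepared for ingestion, and then click "Select". Alternatively, enter the location directly in the gs:// field.
Click "Continue".
Choose a region for your data connector.
Enter a name for your data connector.
Optional: If you selected unstructured documents, you can select parsing and chunking options for your documents. To compare parsers, see "Parse documents". For information about chunking, see "Chunk documents for RAG".

The OCR parser and the layout parser can incur additional costs. See "Document AI feature pricing".

To select a parser, expand "Document processing options" and specify the parser options that you want to use.
Click "Create".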

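You have now created a data connector, which periodically syncs data with the Cloud Storage location. You have also created an entity data store, which is named gcs_store.
To check the status of your ingestion, go to the "Data Stores" page and click the name of your data connector to see details about it on its "Data" page.

Open the "Data ingestion activity" tab. When the status column on that tab changes from "In progress" to "Succeeded", the first ingestion is complete.

Depending on the size of your data, ingestion can take several minutes to several hours.

After you set up your data source and import data for the first time, data is synced from that source at the frequency that you selected during setup. The first sync occurs about one hour after the data connector is created. The next sync then occurs 24 hours, 72 hours, or 120 hours later, depending on the frequency that you selected.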
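
The schedule described above can be sketched as a small calculation. This is only an illustration, not how the service is implemented: it assumes that the first sync happens exactly one hour after the connector is created and that each later sync follows the previous one by the selected interval. The function name `estimate_sync_times` is hypothetical and is not part of any Google Cloud library.

```python
from datetime import datetime, timedelta
from typing import List

# Sync intervals named in the text: every day, every three days, every five days.
ALLOWED_INTERVAL_DAYS = (1, 3, 5)


def estimate_sync_times(
    created_at: datetime, interval_days: int, count: int = 3
) -> List[datetime]:
    """Estimate the first `count` sync times for a periodic connector.

    Assumption: the first sync is one hour after creation, and each later
    sync is `interval_days` after the previous one.
    """
    if interval_days not in ALLOWED_INTERVAL_DAYS:
        raise ValueError(f"interval_days must be one of {ALLOWED_INTERVAL_DAYS}")
    if count < 1:
        raise ValueError("count must be at least 1")
    first_sync = created_at + timedelta(hours=1)
    return [first_sync + timedelta(days=interval_days * i) for i in range(count)]


if __name__ == "__main__":
    for sync_time in estimate_sync_times(datetime(2024, 1, 1, 9, 0), interval_days=3):
        print(sync_time.isoformat())
```

Actual sync times are approximate, so treat the result as a planning aid only.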

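Next steps

To attach your data store to an app, create an app and select your data store by following the steps in "Create a search app".
To preview how your search results appear after your app and data store are set up, see "Get search results".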

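Connect to Google Drive

Vertex AI Search can search data in Google Drive by using data federation, which retrieves information directly from the specified data source. Because the data isn't copied into the Vertex AI Search index, you don't need to worry about data storage.

Before you begin

You must be signed in to the Google Cloud console with the same account that you use for the Google Drive instance that you plan to connect. Vertex AI Search uses your Google Workspace customer ID to connect to Google Drive.

To enforce data source access control and secure data in Vertex AI Search, make sure that you have configured your identity provider.

Verify that all documents are accessible, for example by placing them in a shared drive that is owned by the domain or by assigning ownership to a user in the domain.
Turn on Google Workspace smart features in other Google products to connect Google Drive data to Vertex AI Search. For instructions, see "Turn Google Workspace smart features on or off".

If you use security controls, be aware of their limitations related to data in Google Drive, as described in the following table:

Security control	Note
Data residency (DRZ)	Vertex AI Search only guarantees data residency in Google Cloud. For information about data residency and Google Drive, see the Google Workspace compliance guidance and documentation, for example "Choose the region where data is stored" and "Digital sovereignty".
Customer-managed encryption keys (CMEK)	Your keys only encrypt data within Google Cloud. Cloud Key Management Service controls don't apply to data stored in Google Drive.
Access Transparency	Access Transparency logs record actions taken by Google personnel on the Google Cloud project. You also need to review the Access Transparency logs created by Google Workspace. For more information, see "Access Transparency log events" in the Google Workspace Admin Help documentation.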

Create a Google Drive data store

Console

To use the console to make Google Drive data searchable, follow these steps:

In the Google Cloud console, go to the "AI Applications" page.

AI Applications
In the navigation menu, click "Data Stores".
Click "Create Data Store".
On the "Select a data source" page, select "Google Drive".
Specify the drive source for your data store.
- "All": Add your entire drive to the data store.
- "Specific shared drive(s)": Add the folder ID of the shared drive.
- "Specific shared folder(s)": Add the ID of the shared folder.
To find the folder ID of a shared drive or the ID of a specific folder, go to the shared drive or folder and copy the ID from the URL. The URL has this format: https://drive.google.com/corp/drive/folders/ID.

For example, https://drive.google.com/corp/drive/folders/123456789012345678901.
Click "Continue".
Choose a region for your data store.
Enter a name for your data store.
Optional: To exclude the data in this data store from being used for generative AI content when you query data with the app, click "Generative AI options" and select "Exclude from generative AI features".
Click "Create".
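
The step about copying the ID from the URL can be illustrated with a short helper. This is a minimal sketch that assumes the URL follows the `.../folders/ID` format shown above; the function name `extract_drive_folder_id` is hypothetical and is not part of any Google library.

```python
from typing import Optional
from urllib.parse import urlparse


def extract_drive_folder_id(url: str) -> Optional[str]:
    """Return the path segment that follows "folders" in a Drive URL, or None."""
    # Split the URL path into non-empty segments; the query string is ignored.
    segments = [segment for segment in urlparse(url).path.split("/") if segment]
    if "folders" in segments:
        index = segments.index("folders")
        if index + 1 < len(segments):
            return segments[index + 1]
    return None


if __name__ == "__main__":
    example = "https://drive.google.com/corp/drive/folders/123456789012345678901"
    print(extract_drive_folder_id(example))
```

If the URL has no `folders` segment, the helper returns `None` rather than guessing an ID.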

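Error messages

The following table describes error messages that you might see when working with this Google data source, along with HTTP error codes and suggested troubleshooting steps.

Error code	Error message	Description	Troubleshooting
403 (Permission Denied)	Search using service account credentials isn't supported for Google Workspace data stores.	The engine being searched has Google Workspace data stores, and the credentials passed belong to a service account. Searching Google Workspace data stores with service account credentials isn't supported.	Call search with user credentials, or remove the Google Workspace data stores from the engine.
403 (Permission Denied)	Consumer accounts aren't supported for Google Workspace data stores.	Search was called with consumer account (@gmail.com) credentials, which Google Workspace data stores don't support.	Remove the Google Workspace data stores from the engine, or use a managed Google Account.
403 (Permission Denied)	Customer ID mismatch for the data store	Only users who belong to the same organization as the Google Workspace data stores can search.	Remove the Google Workspace data stores from the engine, or contact support if the user and the Google Workspace data stores need to be in different organizations.
400 (Invalid Argument)	Engine can't contain both default and shared Google Drive data stores.	You can't connect a data store that contains all of your drives (the default) and a data store that contains specific shared drives to the same app.	To connect a new Google Drive data source to your app, first unlink the data store that you don't need, and then add the data store that you want to use.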

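Troubleshooting

If the file that you are looking for doesn't appear in the search results, the cause might be one of the following search index limitations:

Only 1 MB of text and formatting data can be extracted from a file to make it searchable.
Most file types can't exceed 10 MB. The exceptions are as follows:
- XLSX files (.xlsx) can't exceed 20 MB.
- PDF files (.pdf) can't exceed 30 MB.
- Text files (.txt) can't exceed 100 MB.
Note: Files that exceed the size limit can't be searched and don't appear in search results.
Optical character recognition in PDF files is limited to 80 pages. PDFs that are larger than 50 MB or longer than 80 pages aren't indexed, and keywords beyond the 1 MB index limit can't be searched.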
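
The per-type size limits listed above can be checked locally before you rely on a file being searchable. The following is a rough sketch only: it assumes 1 MB means 1024 × 1024 bytes (the text doesn't say), it covers only the file-size limits and not the page-count or 1 MB extraction limits, and the helper name `is_within_size_limit` is hypothetical.

```python
import os

MB = 1024 * 1024  # Assumption: the limits are expressed in binary megabytes.

# Limits taken from the list above; any other extension uses the default.
DEFAULT_LIMIT_MB = 10
LIMITS_MB = {".xlsx": 20, ".pdf": 30, ".txt": 100}


def is_within_size_limit(filename: str, size_bytes: int) -> bool:
    """Return True if the file size is within the limit for its extension."""
    extension = os.path.splitext(filename)[1].lower()
    limit_mb = LIMITS_MB.get(extension, DEFAULT_LIMIT_MB)
    return size_bytes <= limit_mb * MB


if __name__ == "__main__":
    print(is_within_size_limit("report.pdf", 25 * MB))
    print(is_within_size_limit("notes.docx", 11 * MB))
```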

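Next steps

To attach your data store to an app, create an app and select your data store by following the steps in "Create a search app".
To get search results after your app and data store are set up, see "Get search results".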

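Connect to Gmail

Use the following steps to create a data store that connects to Gmail in the Google Cloud console. After you connect the data store, you can attach it to a search app and search your Gmail data.

Before you begin

You must be signed in to the Google Cloud console with the same account that you use for the Google Workspace instance that you plan to connect. Vertex AI Search uses your Google Workspace customer ID to connect to Gmail.

To enforce data source access control and secure data in Vertex AI Search, make sure that you have configured your identity provider.

Limitations

If you use security controls, be aware of their limitations related to data in Gmail, as described in the following table:

Security control	Note
Data residency (DRZ)	Vertex AI Search only guarantees data residency in Google Cloud. For information about data residency and Gmail, see the Google Workspace compliance guidance and documentation, for example "Choose the region where data is stored" and "Digital sovereignty".
Customer-managed encryption keys (CMEK)	Your keys only encrypt data within Google Cloud. Cloud Key Management Service controls don't apply to data stored in Gmail.
Access Transparency	Access Transparency logs record actions taken by Google personnel on the Google Cloud project. You also need to review the Access Transparency logs created by Google Workspace. For more information, see "Access Transparency log events" in the Google Workspace Admin Help documentation.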

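Create a Gmail data store

Console

To use the console to make Gmail data searchable, follow these steps:

In the Google Cloud console, go to the "AI Applications" page.

AI Applications
In the navigation menu, click "Data Stores".
Click "Create Data Store".
On the "Select a data source" page, select "Gmail".
Choose a region for your data store.
Enter a name for your data store.
Click "Create".
Follow the steps in "Create a search app" to attach the data store that you created to a Vertex AI Search app.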

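Error messages

The following table describes error messages that you might see when working with this Google data source, along with HTTP error codes and suggested troubleshooting steps.

Error code	Error message	Description	Troubleshooting
403 (Permission Denied)	Search using service account credentials isn't supported for Google Workspace data stores.	The engine being searched has Google Workspace data stores, and the credentials passed belong to a service account. Searching Google Workspace data stores with service account credentials isn't supported.	Call search with user credentials, or remove the Google Workspace data stores from the engine.
403 (Permission Denied)	Consumer accounts aren't supported for Google Workspace data stores.	Search was called with consumer account (@gmail.com) credentials, which Google Workspace data stores don't support.	Remove the Google Workspace data stores from the engine, or use a managed Google Account.
403 (Permission Denied)	Customer ID mismatch for the data store	Only users who belong to the same organization as the Google Workspace data stores can search.	Remove the Google Workspace data stores from the engine, or contact support if the user and the Google Workspace data stores need to be in different organizations.
400 (Invalid Argument)	Engine can't contain both default and shared Google Drive data stores.	You can't connect a data store that contains all of your drives (the default) and a data store that contains specific shared drives to the same app.	To connect a new Google Drive data source to your app, first unlink the data store that you don't need, and then add the data store that you want to use.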

Next steps

To preview how your search results appear after your app and data store are set up, see "Preview search results".

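Connect to Google Sites

To search data from Google Sites, use the following steps to create a connector in the Google Cloud console.

Before you begin:

You must be signed in to the Google Cloud console with the same account that you use for the Google Workspace instance that you plan to connect. Vertex AI Search uses your Google Workspace customer ID to connect to Google Sites.
To enforce data source access control and secure data in Vertex AI Search, make sure that you have configured your identity provider.

If you use security controls, be aware of their limitations related to data in Google Sites, as described in the following table:

Security control	Note
Data residency (DRZ)	Vertex AI Search only guarantees data residency in Google Cloud. For information about data residency and Google Sites, see the Google Workspace compliance guidance and documentation, for example "Choose the region where data is stored" and "Digital sovereignty".
Customer-managed encryption keys (CMEK)	Your keys only encrypt data within Google Cloud. Cloud Key Management Service controls don't apply to data stored in Google Sites.
Access Transparency	Access Transparency logs record actions taken by Google personnel on the Google Cloud project. You also need to review the Access Transparency logs created by Google Workspace. For more information, see "Access Transparency log events" in the Google Workspace Admin Help documentation.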

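Console

To use the console to make Google Sites data searchable, follow these steps:

In the Google Cloud console, go to the "AI Applications" page.

AI Applications
Go to the "Data Stores" page.
Click "New data store".
On the "Source" page, select "Google Sites".
Choose a region for your data store.
Enter a name for your data store.
Click "Create".

Next steps

To attach your data store to an app, create an app and select your data store by following the steps in "Create a search app".
To preview how your search results appear after your app and data store are set up, see "Get search results".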

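Connect to Google Calendar

To search data from Google Calendar, use the following steps to create a data store in the Google Cloud console.

Before you begin

You must be signed in to the Google Cloud console with the same account that you use for the Google Workspace instance that you plan to connect. Vertex AI Search uses your Google Workspace customer ID to connect to Google Calendar.

To enforce data source access control and secure data in Vertex AI Search, make sure that you have configured your identity provider.

If you use security controls, be aware of their limitations related to data in Google Calendar, as described in the following table:

Security control	Note
Data residency (DRZ)	Vertex AI Search only guarantees data residency in Google Cloud. For information about data residency and Google Calendar, see the Google Workspace compliance guidance and documentation, for example "Choose the region where data is stored" and "Digital sovereignty".
Customer-managed encryption keys (CMEK)	Your keys only encrypt data within Google Cloud. Cloud Key Management Service controls don't apply to data stored in Google Calendar.
Access Transparency	Access Transparency logs record actions taken by Google personnel on the Google Cloud project. You also need to review the Access Transparency logs created by Google Workspace. For more information, see "Access Transparency log events" in the Google Workspace Admin Help documentation.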

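Create a Google Calendar data store

To use the console to make Google Calendar data searchable, follow these steps:

In the Google Cloud console, go to the "AI Applications" page.

AI Applications
In the navigation menu, click "Data Stores".
Click "Create Data Store".
On the "Select a data source" page, select "Google Calendar".
Choose a region for your data store.
Enter a name for your data store.
Click "Create".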

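Error messages

The following table describes error messages that you might see when working with this Google data source, along with HTTP error codes and suggested troubleshooting steps.

Error code	Error message	Description	Troubleshooting
403 (Permission Denied)	Search using service account credentials isn't supported for Google Workspace data stores.	The engine being searched has Google Workspace data stores, and the credentials passed belong to a service account. Searching Google Workspace data stores with service account credentials isn't supported.	Call search with user credentials, or remove the Google Workspace data stores from the engine.
403 (Permission Denied)	Consumer accounts aren't supported for Google Workspace data stores.	Search was called with consumer account (@gmail.com) credentials, which Google Workspace data stores don't support.	Remove the Google Workspace data stores from the engine, or use a managed Google Account.
403 (Permission Denied)	Customer ID mismatch for the data store	Only users who belong to the same organization as the Google Workspace data stores can search.	Remove the Google Workspace data stores from the engine, or contact support if the user and the Google Workspace data stores need to be in different organizations.
400 (Invalid Argument)	Engine can't contain both default and shared Google Drive data stores.	You can't connect a data store that contains all of your drives (the default) and a data store that contains specific shared drives to the same app.	To connect a new Google Drive data source to your app, first unlink the data store that you don't need, and then add the data store that you want to use.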

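Next steps

To attach your data store to an app, create an app and select your data store by following the instructions in "Create a search app".
To get search results after your app and data store are set up, see "Get search results".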

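Connect to Google Groups

To search data from Google Groups, use the following steps to create a connector in the Google Cloud console.

Before you begin:

You must be signed in to the Google Cloud console with the same account that you use for the Google Workspace instance that you plan to connect. Vertex AI Search uses your Google Workspace customer ID to connect to Google Groups.
To enforce data source access control and secure data in Vertex AI Search, make sure that you have configured your identity provider.

If you use security controls, be aware of their limitations related to data in Google Groups, as described in the following table:

Security control	Note
Data residency (DRZ)	Vertex AI Search only guarantees data residency in Google Cloud. For information about data residency and Google Groups, see the Google Workspace compliance guidance and documentation, for example "Choose the region where data is stored" and "Digital sovereignty".
Customer-managed encryption keys (CMEK)	Your keys only encrypt data within Google Cloud. Cloud Key Management Service controls don't apply to data stored in Google Groups.
Access Transparency	Access Transparency logs record actions taken by Google personnel on the Google Cloud project. You also need to review the Access Transparency logs created by Google Workspace. For more information, see "Access Transparency log events" in the Google Workspace Admin Help documentation.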

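Console

To use the console to make Google Groups data searchable, follow these steps:

In the Google Cloud console, go to the "AI Applications" page.

AI Applications
Go to the "Data Stores" page.
Click "New data store".
On the "Source" page, select "Google Groups".
Choose a region for your data store.
Enter a name for your data store.
Click "Create". Depending on the size of your data, ingestion can take several minutes to several hours. Wait at least one hour before you use the data store for searching.

Next steps

To attach your data store to an app, create an app and select your data store by following the steps in "Create a search app".
To preview how your search results appear after your app and data store are set up, see "Get search results".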

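Import from Cloud SQL

To ingest data from Cloud SQL, use the following steps to set up Cloud SQL access, create a data store, and ingest data.

Set up staging bucket access for Cloud SQL instances

When data is ingested from Cloud SQL, it is first staged in a Cloud Storage bucket. Follow these steps to give a Cloud SQL instance access to Cloud Storage buckets.

In the Google Cloud console, go to the "SQL" page.

SQL
Click the Cloud SQL instance that you plan to import from.
Copy the identifier of the instance's service account, which looks like an email address, for example p9876-abcd33f@gcp-sa-cloud-sql.iam.gserviceaccount.com.
Go to the "IAM & Admin" page.

IAM & Admin
Click "Grant access".
For "New principals", enter the instance's service account identifier and select the "Cloud Storage" > "Storage Admin" role.
Click "Save".

Next:

If your Cloud SQL data is in the same project as Vertex AI Search: Go to "Import data from Cloud SQL".
If your Cloud SQL data is in a different project than your Vertex AI Search project: Go to "Set up Cloud SQL access from a different project".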

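Set up Cloud SQL access from a different project

To give Vertex AI Search access to Cloud SQL data that is in a different project, follow these steps:

Replace the following PROJECT_NUMBER variable with your Vertex AI Search project number, and then copy the contents of the code block. This is your Vertex AI Search service account identifier: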
```
service-PROJECT_NUMBER@gcp-sa-discoveryengine.iam.gserviceaccount.com
```
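Go to the "IAM & Admin" page.

IAM & Admin
On the "IAM & Admin" page, switch to your Cloud SQL project and click "Grant access".
For "New principals", enter the service account identifier and select the "Cloud SQL" > "Cloud SQL Viewer" role.
Click "Save".

Next, go to "Import data from Cloud SQL".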
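
The service account identifier used in the steps above follows a fixed pattern, so it can be built from the project number. This is a small sketch under that assumption; the helper name `discovery_engine_service_account` is hypothetical, and the project number in the example is made up.

```python
def discovery_engine_service_account(project_number: str) -> str:
    """Build the Vertex AI Search service account ID for a project number."""
    number = str(project_number).strip()
    # The identifier expects the numeric project number, not the project ID.
    if not number.isdigit():
        raise ValueError("project_number must contain digits only")
    return f"service-{number}@gcp-sa-discoveryengine.iam.gserviceaccount.com"


if __name__ == "__main__":
    print(discovery_engine_service_account("123456789012"))
```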

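Import data from Cloud SQL

Console

To use the console to ingest data from Cloud SQL, follow these steps:

In the Google Cloud console, go to the "AI Applications" page.

AI Applications
Go to the "Data Stores" page.
Click "New data store".
On the "Source" page, select "Cloud SQL".
Specify the project ID, instance ID, database ID, and table ID of the data that you plan to import.
Click "Browse", choose an intermediate Cloud Storage location to export the data to, and then click "Select". Alternatively, enter the location directly in the "gs://" field.
Select whether to turn on serverless export. Serverless export incurs additional cost. For information about serverless export, see "Minimize the performance impact of exports" in the Cloud SQL documentation.
Click "Continue".
Choose a region for your data store.
Enter a name for your data store.
Click "Create".
To check the status of your ingestion, go to the "Data stores" page and click the name of your data store to see details about it on its "Data" page. When the status column on the "Activity" tab changes from "In progress" to "Import completed", the ingestion is complete.

Depending on the size of your data, ingestion can take several minutes to several hours.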

REST

To use the command line to create a data store and ingest data from Cloud SQL, follow these steps:

Create a data store.

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: PROJECT_ID" \
"https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
-d '{
  "displayName": "DISPLAY_NAME",
  "industryVertical": "GENERIC",
  "solutionTypes": ["SOLUTION_TYPE_SEARCH"]
}'

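Replace the following:

PROJECT_ID: the ID of your project.
DATA_STORE_ID: the ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
DISPLAY_NAME: the display name of the data store. This name might be displayed in the Google Cloud console.

Import data from Cloud SQL.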
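
The character rule for DATA_STORE_ID can be checked before the request is sent. The following is a minimal sketch that enforces only the rule stated above (lowercase letters, digits, underscores, and hyphens); any length limit that the API applies isn't covered here, and the function name `is_valid_data_store_id` is hypothetical.

```python
import re

# Only the character set named in the text; no length rule is assumed.
_DATA_STORE_ID_PATTERN = re.compile(r"[a-z0-9_-]+")


def is_valid_data_store_id(data_store_id: str) -> bool:
    """Return True if the ID uses only the allowed characters."""
    return _DATA_STORE_ID_PATTERN.fullmatch(data_store_id) is not None


if __name__ == "__main__":
    for candidate in ("my-data_store1", "MyDataStore", "store.v2"):
        print(candidate, is_valid_data_store_id(candidate))
```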
```
  curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
  -d '{
    "cloudSqlSource": {
      "projectId": "SQL_PROJECT_ID",
      "instanceId": "INSTANCE_ID",
      "databaseId": "DATABASE_ID",
      "tableId": "TABLE_ID",
      "gcsStagingDir": "STAGING_DIRECTORY"
    },
    "reconciliationMode": "RECONCILIATION_MODE",
    "autoGenerateIds": "AUTO_GENERATE_IDS",
    "idField": "ID_FIELD"
  }'
```
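Replace the following:
- PROJECT_ID: the ID of your Vertex AI Search project.
- DATA_STORE_ID: the ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
- SQL_PROJECT_ID: the ID of your Cloud SQL project.
- INSTANCE_ID: the ID of your Cloud SQL instance.
- DATABASE_ID: the ID of your Cloud SQL database.
- TABLE_ID: the ID of your Cloud SQL table.
- STAGING_DIRECTORY: optional. A Cloud Storage directory, for example gs://<your-gcs-bucket>/directory/import_errors.
- RECONCILIATION_MODE: optional. Values are FULL and INCREMENTAL. The default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from Cloud SQL to your data store. This performs an upsert operation, which adds new documents and replaces existing documents with updated documents that have the same ID. Specifying FULL causes a full rebase of the documents in your data store. In other words, new and updated documents are added to your data store, and documents that aren't in Cloud SQL are removed from your data store. FULL mode is helpful if you want to automatically delete documents that you no longer need.
- AUTO_GENERATE_IDS: optional. Specifies whether document IDs are generated automatically.
- ID_FIELD: optional. Specifies which field contains the document IDs.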
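
The difference between the two reconciliation modes can be modeled with plain dictionaries. This is a conceptual sketch of the behavior described above, not the service's implementation; documents are represented as a mapping from document ID to content, and the function name `reconcile` is hypothetical.

```python
from typing import Dict


def reconcile(
    store: Dict[str, dict], source: Dict[str, dict], mode: str = "INCREMENTAL"
) -> Dict[str, dict]:
    """Return the data store contents after an import from `source`."""
    if mode not in ("INCREMENTAL", "FULL"):
        raise ValueError("mode must be INCREMENTAL or FULL")
    # Upsert: add new documents and replace documents that share an ID.
    result = dict(store)
    result.update(source)
    if mode == "FULL":
        # Full rebase: also drop documents that are no longer in the source.
        result = {doc_id: doc for doc_id, doc in result.items() if doc_id in source}
    return result


if __name__ == "__main__":
    store = {"a": {"v": 1}, "b": {"v": 1}}
    source = {"b": {"v": 2}, "c": {"v": 1}}
    print(reconcile(store, source, "INCREMENTAL"))
    print(reconcile(store, source, "FULL"))
```

In this model, document `a` survives an INCREMENTAL import but is removed by a FULL import because it no longer exists in the source.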

Python

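For more information, see the Vertex AI Search Python API reference documentation.

To authenticate to Vertex AI Search, set up Application Default Credentials. For more information, see "Set up authentication for a local development environment".

Create a data store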


from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    #  For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name

Import documents

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"
# sql_project_id = "YOUR_SQL_PROJECT_ID"
# sql_instance_id = "YOUR_SQL_INSTANCE_ID"
# sql_database_id = "YOUR_SQL_DATABASE_ID"
# sql_table_id = "YOUR_SQL_TABLE_ID"

#  For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.DocumentServiceClient(client_options=client_options)

# The full resource name of the search engine branch.
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    cloud_sql_source=discoveryengine.CloudSqlSource(
        project_id=sql_project_id,
        instance_id=sql_instance_id,
        database_id=sql_database_id,
        table_id=sql_table_id,
    ),
    # Options: `FULL`, `INCREMENTAL`
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

# Make the request
operation = client.import_documents(request=request)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)

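Next steps

To attach your data store to an app, create an app and select your data store by following the steps in "Create a search app".
To preview how your search results appear after your app and data store are set up, see "Get search results".

Import from Spanner

To ingest data from Spanner, use the following steps to create a data store and ingest data by using either the Google Cloud console or the API.

Set up Spanner access from a different project

If your Spanner data is in the same project as Vertex AI Search, skip to "Import data from Spanner".

To give Vertex AI Search access to Spanner data that is in a different project, follow these steps:

Replace the following PROJECT_NUMBER variable with your Vertex AI Search project number, and then copy the contents of this code block. This is your Vertex AI Search service account identifier: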
```
service-PROJECT_NUMBER@gcp-sa-discoveryengine.iam.gserviceaccount.com
```
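Go to the "IAM & Admin" page.

IAM & Admin
On the "IAM & Admin" page, switch to your Spanner project and click "Grant access".
For "New principals", enter the service account identifier and select one of the following:
- If you won't use Data Boost during import, select the "Cloud Spanner" > "Cloud Spanner Database Reader" role.
- If you plan to use Data Boost during import, select the "Cloud Spanner" > "Cloud Spanner Database Admin" role, or a custom role that has the permissions of "Cloud Spanner Database Reader" plus "spanner.databases.useDataBoost". For information about Data Boost, see "Data Boost overview" in the Spanner documentation.
Click "Save".

Next, go to "Import data from Spanner".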

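Import data from Spanner

Console

To use the console to ingest data from Spanner, follow these steps:

In the Google Cloud console, go to the "AI Applications" page.

AI Applications
Go to the "Data Stores" page.
Click "New data store".
On the "Source" page, select "Cloud Spanner".
Specify the project ID, instance ID, database ID, and table ID of the data that you plan to import.
Select whether to turn on Data Boost. For information about Data Boost, see "Data Boost overview" in the Spanner documentation.
Click "Continue".
Choose a region for your data store.
Enter a name for your data store.
Click "Create".
To check the status of your ingestion, go to the "Data stores" page and click the name of your data store to see details about it on its "Data" page. When the status column on the "Activity" tab changes from "In progress" to "Import completed", the ingestion is complete.

Depending on the size of your data, ingestion can take several minutes to several hours.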

REST

To use the command line to create a data store and ingest data from Spanner, follow these steps:

Create a data store.

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: PROJECT_ID" \
"https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
-d '{
  "displayName": "DISPLAY_NAME",
  "industryVertical": "GENERIC",
  "solutionTypes": ["SOLUTION_TYPE_SEARCH"],
  "contentConfig": "CONTENT_REQUIRED"
}'

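Replace the following:

PROJECT_ID: the ID of your Vertex AI Search project.
DATA_STORE_ID: the ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
DISPLAY_NAME: the display name of the data store. This name might be displayed in the Google Cloud console.

Import data from Spanner.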
```
  curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
  -d '{
    "spannerSource": {
      "projectId": "SPANNER_PROJECT_ID",
      "instanceId": "INSTANCE_ID",
      "databaseId": "DATABASE_ID",
      "tableId": "TABLE_ID",
      "enableDataBoost": "DATA_BOOST_BOOLEAN"
    },
    "reconciliationMode": "RECONCILIATION_MODE",
    "autoGenerateIds": "AUTO_GENERATE_IDS",
    "idField": "ID_FIELD"
  }'
```
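Replace the following:
- PROJECT_ID: the ID of your Vertex AI Search project.
- DATA_STORE_ID: the ID of the data store.
- SPANNER_PROJECT_ID: the ID of your Spanner project.
- INSTANCE_ID: the ID of your Spanner instance.
- DATABASE_ID: the ID of your Spanner database.
- TABLE_ID: the ID of your Spanner table.
- DATA_BOOST_BOOLEAN: optional. Whether to turn on Data Boost. For information about Data Boost, see "Data Boost overview" in the Spanner documentation.
- RECONCILIATION_MODE: optional. Values are FULL and INCREMENTAL. The default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from Spanner to your data store. This performs an upsert operation, which adds new documents and replaces existing documents with updated documents that have the same ID. Specifying FULL causes a full rebase of the documents in your data store. In other words, new and updated documents are added to your data store, and documents that aren't in Spanner are removed from your data store. FULL mode is helpful if you want to automatically delete documents that you no longer need.
- AUTO_GENERATE_IDS: optional. Specifies whether to automatically generate document IDs. If set to true, document IDs are generated based on a hash of the payload. Note that generated document IDs might not remain consistent over multiple imports. If you auto-generate IDs over multiple imports, Google highly recommends setting reconciliationMode to FULL to maintain consistent document IDs.
- ID_FIELD: optional. Specifies which field contains the document IDs.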
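
To see why hash-based IDs might change between imports, consider the following sketch. It is only an illustration: the actual hash function and payload encoding used by the service aren't documented here, so SHA-256 over canonical JSON is an assumption, and the function name `payload_hash_id` is hypothetical.

```python
import hashlib
import json


def payload_hash_id(payload: dict) -> str:
    """Derive a document ID from a hash of the payload (illustrative only)."""
    # Canonical JSON so that key order doesn't affect the hash.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:32]


if __name__ == "__main__":
    original = {"title": "Guide", "version": 1}
    edited = {"title": "Guide", "version": 2}
    # The same row yields the same ID; an edited row yields a different ID.
    print(payload_hash_id(original) == payload_hash_id(dict(original)))
    print(payload_hash_id(original) == payload_hash_id(edited))
```

Under this assumption, an edited row produces a new ID, so an INCREMENTAL import would add the edited row as a new document and leave the old one in place. This is one plausible reason for the recommendation to use FULL mode with auto-generated IDs.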

Python

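For more information, see the Vertex AI Search Python API reference documentation.

To authenticate to Vertex AI Search, set up Application Default Credentials. For more information, see "Set up authentication for a local development environment".

Create a data store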


from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    #  For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name

Import documents

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"
# spanner_project_id = "YOUR_SPANNER_PROJECT_ID"
# spanner_instance_id = "YOUR_SPANNER_INSTANCE_ID"
# spanner_database_id = "YOUR_SPANNER_DATABASE_ID"
# spanner_table_id = "YOUR_SPANNER_TABLE_ID"

#  For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.DocumentServiceClient(client_options=client_options)

# The full resource name of the search engine branch.
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    spanner_source=discoveryengine.SpannerSource(
        project_id=spanner_project_id,
        instance_id=spanner_instance_id,
        database_id=spanner_database_id,
        table_id=spanner_table_id,
    ),
    # Options: `FULL`, `INCREMENTAL`
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

# Make the request
operation = client.import_documents(request=request)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)

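Next steps

To attach your data store to an app, create an app and select your data store by following the steps in "Create a search app".
To preview how your search results appear after your app and data store are set up, see "Get search results".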

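Import from Firestore

To ingest data from Firestore, use the following steps to create a data store and ingest data by using either the Google Cloud console or the API.

If your Firestore data is in the same project as Vertex AI Search, go to "Import data from Firestore".

If your Firestore data is in a different project than your Vertex AI Search project, go to "Set up Firestore access from a different project".

Set up Firestore access from a different project

To give Vertex AI Search access to Firestore data that's in a different project, follow these steps:

Replace the following PROJECT_NUMBER variable with your Vertex AI Search project number, and then copy the contents of this code block. This is your Vertex AI Search service account identifier: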
```
service-PROJECT_NUMBER@gcp-sa-discoveryengine.iam.gserviceaccount.com
```
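Go to the IAM & Admin page.

IAM & Admin
On the IAM & Admin page, switch to your Firestore project, and then click Grant access.
For New principals, enter the Vertex AI Search service account identifier that you copied, and then select the Datastore > Cloud Datastore Import Export Admin role.
Click Save.
Switch back to your Vertex AI Search project.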
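
The service account identifier used in the steps above follows a fixed pattern, so it can be derived from the project number. The following is a minimal sketch; the helper name `discovery_engine_service_agent` is hypothetical and not part of any Google library, and the digit check is an assumption added to catch the common mistake of passing a project ID instead of a project number.

```python
def discovery_engine_service_agent(project_number: str) -> str:
    """Build the Vertex AI Search service account identifier.

    The address pattern is the one shown in the code block above.
    The numeric check is an illustrative safeguard, not an API rule.
    """
    if not (project_number.isascii() and project_number.isdigit()):
        raise ValueError(
            "Expected the numeric project number, not the project ID: "
            f"{project_number!r}"
        )
    return (
        f"service-{project_number}"
        "@gcp-sa-discoveryengine.iam.gserviceaccount.com"
    )


# Example with a made-up project number.
print(discovery_engine_service_agent("123456789012"))
```

The printed value is what you would paste into the New principals field.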

Next, go to "Import data from Firestore".

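Import data from Firestore

Console

To use the console to ingest data from Firestore, follow these steps:

Go to the AI Applications page in the Google Cloud console.

AI Applications
Go to the Data Stores page.
Click New data store.
On the Source page, select Firestore.
Specify the project ID, database ID, and collection ID of the data that you plan to import.
Click Continue.
Choose a region for your data store.
Enter a name for your data store.
Click Create.
To check the status of your ingestion, go to the Data Stores page and click your data store name to see details about it on its Data page. When the status column on the Activity tab changes from In progress to Import completed, the ingestion is complete.

Depending on the size of your data, ingestion can take several minutes to several hours.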

REST

To use the command line to create a data store and ingest data from Firestore, follow these steps:

Create a data store.

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: PROJECT_ID" \
"https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
-d '{
  "displayName": "DISPLAY_NAME",
  "industryVertical": "GENERIC",
  "solutionTypes": ["SOLUTION_TYPE_SEARCH"]
}'

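Replace the following:

PROJECT_ID: the ID of your project.
DATA_STORE_ID: the ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
DISPLAY_NAME: the display name of the data store. This name might be displayed in the Google Cloud console.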
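
Because an invalid DATA_STORE_ID only fails once the request reaches the API, it can help to check the ID locally first. The following is a minimal sketch of the character rule stated above; the function name `is_valid_data_store_id` is hypothetical, and the check covers only the allowed characters, not any length limit the service might enforce.

```python
import re

# Allowed characters per the DATA_STORE_ID description above:
# lowercase letters, digits, underscores, and hyphens.
_DATA_STORE_ID_PATTERN = re.compile(r"[a-z0-9_-]+")


def is_valid_data_store_id(data_store_id: str) -> bool:
    """Return True if the ID uses only the allowed characters."""
    return _DATA_STORE_ID_PATTERN.fullmatch(data_store_id) is not None


for candidate in ("my-data_store1", "My-Store", "store.one"):
    print(candidate, is_valid_data_store_id(candidate))
```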

Import data from Firestore.
```
  curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
  -d '{
    "firestoreSource": {
      "projectId": "FIRESTORE_PROJECT_ID",
      "databaseId": "DATABASE_ID",
      "collectionId": "COLLECTION_ID"
    },
    "reconciliationMode": "RECONCILIATION_MODE",
    "autoGenerateIds": "AUTO_GENERATE_IDS",
    "idField": "ID_FIELD"
  }'
```
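Replace the following:
- PROJECT_ID: the ID of your Vertex AI Search project.
- DATA_STORE_ID: the ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
- FIRESTORE_PROJECT_ID: the ID of your Firestore project.
- DATABASE_ID: the ID of your Firestore database.
- COLLECTION_ID: the ID of your Firestore collection.
- RECONCILIATION_MODE: optional. Values are FULL and INCREMENTAL. The default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from Firestore to your data store. This performs an upsert operation, which adds new documents and replaces existing documents with updated documents that have the same ID. Specifying FULL causes a full rebase of the documents in your data store. In other words, new and updated documents are added to your data store, and documents that are not in Firestore are removed from your data store. FULL mode is helpful if you want to automatically delete documents that you no longer need.
- AUTO_GENERATE_IDS: optional. Specifies whether to automatically generate document IDs. If set to true, document IDs are generated based on a hash of the payload. Note that generated document IDs might not remain consistent over multiple imports. If you auto-generate IDs over multiple imports, Google highly recommends setting reconciliationMode to FULL to maintain consistent document IDs.
- ID_FIELD: optional. Specifies which fields are the document IDs.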
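
Hand-edited JSON bodies are easy to get wrong (a trailing comma is enough to make the request fail), so one option is to generate the body with a JSON library. The following is a minimal sketch that uses the field names from the request above; the function name `build_firestore_import_body` is hypothetical, and leaving optional fields out of the body when they are not set is an assumption about how you want to use the defaults.

```python
import json
from typing import Optional


def build_firestore_import_body(
    firestore_project_id: str,
    database_id: str,
    collection_id: str,
    reconciliation_mode: str = "INCREMENTAL",
    auto_generate_ids: Optional[bool] = None,
    id_field: Optional[str] = None,
) -> str:
    """Return a JSON request body for documents:import from Firestore."""
    if reconciliation_mode not in ("FULL", "INCREMENTAL"):
        raise ValueError("reconciliation_mode must be FULL or INCREMENTAL")

    body = {
        "firestoreSource": {
            "projectId": firestore_project_id,
            "databaseId": database_id,
            "collectionId": collection_id,
        },
        "reconciliationMode": reconciliation_mode,
    }
    # Optional fields are only included when the caller sets them.
    if auto_generate_ids is not None:
        body["autoGenerateIds"] = auto_generate_ids
    if id_field is not None:
        body["idField"] = id_field
    return json.dumps(body, indent=2)


# Placeholder values for illustration only.
print(build_firestore_import_body("my-project", "(default)", "products",
                                  auto_generate_ids=True))
```

The resulting string can be passed to curl with `-d @-` on standard input or written to a file and passed with `-d @body.json`.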

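Python

For more information, see the Vertex AI Search Python API reference documentation.

To authenticate to Vertex AI Search, set up Application Default Credentials. For more information, see "Set up authentication for a local development environment".

Create a data store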


from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    #  For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name

Import documents

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"
# firestore_project_id = "YOUR_FIRESTORE_PROJECT_ID"
# firestore_database_id = "YOUR_FIRESTORE_DATABASE_ID"
# firestore_collection_id = "YOUR_FIRESTORE_COLLECTION_ID"

#  For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.DocumentServiceClient(client_options=client_options)

# The full resource name of the search engine branch.
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    firestore_source=discoveryengine.FirestoreSource(
        project_id=firestore_project_id,
        database_id=firestore_database_id,
        collection_id=firestore_collection_id,
    ),
    # Options: `FULL`, `INCREMENTAL`
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

# Make the request
operation = client.import_documents(request=request)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)

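Next steps

To attach your data store to an app, create an app and select your data store by following the steps in "Create a search app".
To preview how your search results appear after your app and data store are set up, see "Get search results".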

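Import from Bigtable

To ingest data from Bigtable, use the following steps to create a data store and ingest data by using the API.

Set up Bigtable access

To give Vertex AI Search access to Bigtable data that's in a different project, follow these steps:

Replace the following PROJECT_NUMBER variable with your Vertex AI Search project number, and then copy the contents of this code block. This is your Vertex AI Search service account identifier: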
```
service-PROJECT_NUMBER@gcp-sa-discoveryengine.iam.gserviceaccount.com
```
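Go to the IAM & Admin page.

IAM & Admin
On the IAM & Admin page, switch to your Bigtable project, and then click Grant access.
For New principals, enter the Vertex AI Search service account identifier that you copied, and then select the Bigtable > Bigtable Reader role.
Click Save.
Switch back to your Vertex AI Search project.

Next, go to "Import data from Bigtable".

Import data from Bigtable

REST

To use the command line to create a data store and ingest data from Bigtable, follow these steps:

Create a data store.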

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: PROJECT_ID" \
"https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
-d '{
  "displayName": "DISPLAY_NAME",
  "industryVertical": "GENERIC",
  "solutionTypes": ["SOLUTION_TYPE_SEARCH"]
}'

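Replace the following:

PROJECT_ID: the ID of your project.
DATA_STORE_ID: the ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
DISPLAY_NAME: the display name of the data store. This name might be displayed in the Google Cloud console.

Import data from Bigtable.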
```
  curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
  -d '{
    "bigtableSource": {
      "projectId": "BIGTABLE_PROJECT_ID",
      "instanceId": "INSTANCE_ID",
      "tableId": "TABLE_ID",
      "bigtableOptions": {
        "keyFieldName": "KEY_FIELD_NAME",
        "families": {
          "key": "KEY",
          "value": {
            "fieldName": "FIELD_NAME",
            "encoding": "ENCODING",
            "type": "TYPE",
            "columns": [
              {
                "qualifier": "QUALIFIER",
                "fieldName": "FIELD_NAME",
                "encoding": "COLUMN_ENCODING",
                "type": "COLUMN_VALUES_TYPE"
              }
            ]
          }
         }
         ...
      }
    },
    "reconciliationMode": "RECONCILIATION_MODE",
    "autoGenerateIds": "AUTO_GENERATE_IDS",
    "idField": "ID_FIELD"
  }'
```
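Replace the following:
- PROJECT_ID: the ID of your Vertex AI Search project.
- DATA_STORE_ID: the ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
- BIGTABLE_PROJECT_ID: the ID of your Bigtable project.
- INSTANCE_ID: the ID of your Bigtable instance.
- TABLE_ID: the ID of your Bigtable table.
- KEY_FIELD_NAME: optional but recommended. The field name to use for the row key value after ingesting to Vertex AI Search.
- KEY: required. A string value for the column family key.
- ENCODING: optional. The encoding mode of the values when the type is not STRING. To override this mode for a specific column, list that column in columns and specify an encoding for it.
- TYPE: optional. The type of values in this column family.
- QUALIFIER: required. The qualifier of the column.
- FIELD_NAME: optional but recommended. The field name to use for this column after ingesting to Vertex AI Search.
- COLUMN_ENCODING: optional. The encoding mode of the values for a specific column when the type is not STRING.
- RECONCILIATION_MODE: optional. Values are FULL and INCREMENTAL. The default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from Bigtable to your data store. This performs an upsert operation, which adds new documents and replaces existing documents with updated documents that have the same ID. Specifying FULL causes a full rebase of the documents in your data store. In other words, new and updated documents are added to your data store, and documents that are not in Bigtable are removed from your data store. FULL mode is helpful if you want to automatically delete documents that you no longer need.
- AUTO_GENERATE_IDS: optional. Specifies whether to automatically generate document IDs. If set to true, document IDs are generated based on a hash of the payload. Note that generated document IDs might not remain consistent over multiple imports. If you auto-generate IDs over multiple imports, Google highly recommends setting reconciliationMode to FULL to maintain consistent document IDs.
  
  Specify autoGenerateIds only when bigquerySource.dataSchema is set to custom. Otherwise, an INVALID_ARGUMENT error is returned. If you don't specify autoGenerateIds or you set it to false, you must specify idField. Otherwise, the documents fail to import.
- ID_FIELD: optional. Specifies which fields are the document IDs.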
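
The difference between the two reconciliation modes is easiest to see on a small example. The following is a toy, in-memory sketch of the behavior described for RECONCILIATION_MODE above; it is not how the service is implemented, and the function name `reconcile` and the sample documents are made up for illustration.

```python
from typing import Dict

Document = Dict[str, str]


def reconcile(
    store: Dict[str, Document],
    source: Dict[str, Document],
    mode: str,
) -> Dict[str, Document]:
    """Return the data store contents after an import, keyed by document ID."""
    if mode == "INCREMENTAL":
        # Upsert: add new documents, replace documents with the same ID,
        # and keep everything else that was already in the data store.
        merged = dict(store)
        merged.update(source)
        return merged
    if mode == "FULL":
        # Rebase: the data store ends up mirroring the source, so documents
        # that are no longer in the source are removed.
        return dict(source)
    raise ValueError("mode must be FULL or INCREMENTAL")


store = {"a": {"title": "old A"}, "b": {"title": "B"}}
source = {"a": {"title": "new A"}, "c": {"title": "C"}}

print(sorted(reconcile(store, source, "INCREMENTAL")))  # ['a', 'b', 'c']
print(sorted(reconcile(store, source, "FULL")))         # ['a', 'c']
```

In both modes document `a` is replaced by its updated version; only FULL drops document `b`, which no longer exists in the source.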

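Python

For more information, see the Vertex AI Search Python API reference documentation.

To authenticate to Vertex AI Search, set up Application Default Credentials. For more information, see "Set up authentication for a local development environment".

Create a data store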


from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    #  For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name

Import documents

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"
# bigtable_project_id = "YOUR_BIGTABLE_PROJECT_ID"
# bigtable_instance_id = "YOUR_BIGTABLE_INSTANCE_ID"
# bigtable_table_id = "YOUR_BIGTABLE_TABLE_ID"

#  For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.DocumentServiceClient(client_options=client_options)

# The full resource name of the search engine branch.
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)

bigtable_options = discoveryengine.BigtableOptions(
    families={
        "family_name_1": discoveryengine.BigtableOptions.BigtableColumnFamily(
            type_=discoveryengine.BigtableOptions.Type.STRING,
            encoding=discoveryengine.BigtableOptions.Encoding.TEXT,
            columns=[
                discoveryengine.BigtableOptions.BigtableColumn(
                    qualifier="qualifier_1".encode("utf-8"),
                    field_name="field_name_1",
                ),
            ],
        ),
        "family_name_2": discoveryengine.BigtableOptions.BigtableColumnFamily(
            type_=discoveryengine.BigtableOptions.Type.INTEGER,
            encoding=discoveryengine.BigtableOptions.Encoding.BINARY,
        ),
    }
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    bigtable_source=discoveryengine.BigtableSource(
        project_id=bigtable_project_id,
        instance_id=bigtable_instance_id,
        table_id=bigtable_table_id,
        bigtable_options=bigtable_options,
    ),
    # Options: `FULL`, `INCREMENTAL`
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

# Make the request
operation = client.import_documents(request=request)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)

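Next steps

To attach your data store to an app, create an app and select your data store by following the steps in "Create a search app".
To preview how your search results appear after your app and data store are set up, see "Get search results".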

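Import from AlloyDB for PostgreSQL

To ingest data from AlloyDB for PostgreSQL, use the following steps to create a data store and ingest data by using either the Google Cloud console or the API.

If your AlloyDB for PostgreSQL data is in the same project as your Vertex AI Search project, go to "Import data from AlloyDB for PostgreSQL".

If your AlloyDB for PostgreSQL data is in a different project than your Vertex AI Search project, go to "Set up AlloyDB for PostgreSQL access from a different project".

Set up AlloyDB for PostgreSQL access from a different project

To give Vertex AI Search access to AlloyDB for PostgreSQL data that's in a different project, follow these steps:

Replace the following PROJECT_NUMBER variable with your Vertex AI Search project number, and then copy the contents of this code block. This is your Vertex AI Search service account identifier: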
```
service-PROJECT_NUMBER@gcp-sa-discoveryengine.iam.gserviceaccount.com
```
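Switch to the Google Cloud project where your AlloyDB for PostgreSQL data resides.
Go to the IAM page.

IAM
Click Grant access.
For New principals, enter the Vertex AI Search service account identifier, and then select the Cloud AlloyDB > Cloud AlloyDB Admin role.
Click Save.
Switch back to your Vertex AI Search project.

Next, go to "Import data from AlloyDB for PostgreSQL".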

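Import data from AlloyDB for PostgreSQL

Console

To use the console to ingest data from AlloyDB for PostgreSQL, follow these steps:

Go to the AI Applications page in the Google Cloud console.

AI Applications
In the navigation menu, click Data Stores.
Click Create data store.
On the Source page, select AlloyDB.
Specify the project ID, location ID, cluster ID, database ID, and table ID of the data that you plan to import.
Click Continue.
Choose a region for your data store.
Enter a name for your data store.
Click Create.
To check the status of your ingestion, go to the Data Stores page and click your data store name to see details about it on its Data page. When the status column on the Activity tab changes from In progress to Import completed, the ingestion is complete.

Depending on the size of your data, ingestion can take several minutes to several hours.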

REST

To use the command line to create a data store and ingest data from AlloyDB for PostgreSQL, follow these steps:

Create a data store.

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: PROJECT_ID" \
"https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
-d '{
  "displayName": "DISPLAY_NAME",
  "industryVertical": "GENERIC",
  "solutionTypes": ["SOLUTION_TYPE_SEARCH"]
}'

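Replace the following:

PROJECT_ID: the ID of your project.
DATA_STORE_ID: the ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
DISPLAY_NAME: the display name of the data store. This name might be displayed in the Google Cloud console.

Import data from AlloyDB for PostgreSQL.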
```
  curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
  -d '{
    "alloyDbSource": {
      "projectId": "ALLOYDB_PROJECT_ID",
      "locationId": "LOCATION_ID",
      "clusterId": "CLUSTER_ID",
      "databaseId": "DATABASE_ID",
      "tableId": "TABLE_ID"
    },
    "reconciliationMode": "RECONCILIATION_MODE",
    "autoGenerateIds": "AUTO_GENERATE_IDS",
    "idField": "ID_FIELD"
  }'
```
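Replace the following:
- PROJECT_ID: the ID of your Vertex AI Search project.
- DATA_STORE_ID: the ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
- ALLOYDB_PROJECT_ID: the ID of your AlloyDB for PostgreSQL project.
- LOCATION_ID: the ID of your AlloyDB for PostgreSQL location.
- CLUSTER_ID: the ID of your AlloyDB for PostgreSQL cluster.
- DATABASE_ID: the ID of your AlloyDB for PostgreSQL database.
- TABLE_ID: the ID of your AlloyDB for PostgreSQL table.
- RECONCILIATION_MODE: optional. Values are FULL and INCREMENTAL. The default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from AlloyDB for PostgreSQL to your data store. This performs an upsert operation, which adds new documents and replaces existing documents with updated documents that have the same ID. Specifying FULL causes a full rebase of the documents in your data store. In other words, new and updated documents are added to your data store, and documents that are not in AlloyDB for PostgreSQL are removed from your data store. FULL mode is helpful if you want to automatically delete documents that you no longer need.
- AUTO_GENERATE_IDS: optional. Specifies whether to automatically generate document IDs. If set to true, document IDs are generated based on a hash of the payload. Note that generated document IDs might not remain consistent over multiple imports. If you auto-generate IDs over multiple imports, Google highly recommends setting reconciliationMode to FULL to maintain consistent document IDs.
- ID_FIELD: optional. Specifies which fields are the document IDs.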
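
The recommendation to pair auto-generated IDs with FULL reconciliation follows from how payload-derived IDs behave when a row changes. The following toy sketch illustrates the idea; the actual hash that the service uses is not documented here, so SHA-256 over canonical JSON is purely an assumption, and the names `payload_id` and `import_rows` are hypothetical.

```python
import hashlib
import json
from typing import Dict, List


def payload_id(payload: dict) -> str:
    """Derive an ID from the payload (assumed scheme, for illustration)."""
    canonical = json.dumps(payload, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def import_rows(store: Dict[str, dict], rows: List[dict], mode: str) -> Dict[str, dict]:
    """Simulate an import where every document ID is generated from its payload."""
    incoming = {payload_id(row): row for row in rows}
    if mode == "FULL":
        return incoming
    if mode == "INCREMENTAL":
        merged = dict(store)
        merged.update(incoming)
        return merged
    raise ValueError("mode must be FULL or INCREMENTAL")


first_import = [{"name": "widget", "price": 10}]
second_import = [{"name": "widget", "price": 12}]  # same row, updated price

store = import_rows({}, first_import, "INCREMENTAL")
after_incremental = import_rows(store, second_import, "INCREMENTAL")
after_full = import_rows(store, second_import, "FULL")

print(len(after_incremental))  # 2: the stale copy survives under a different ID
print(len(after_full))         # 1: only the current row remains
```

Because the updated row hashes to a new ID, an INCREMENTAL import cannot recognize it as a replacement, which is why the stale document lingers until a FULL import removes it.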

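Python

For more information, see the Vertex AI Search Python API reference documentation.

To authenticate to Vertex AI Search, set up Application Default Credentials. For more information, see "Set up authentication for a local development environment".

Create a data store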


from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    #  For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name

Import documents

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine_v1 as discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"
# alloy_db_project_id = "YOUR_ALLOY_DB_PROJECT_ID"
# alloy_db_location_id = "YOUR_ALLOY_DB_LOCATION_ID"
# alloy_db_cluster_id = "YOUR_ALLOY_DB_CLUSTER_ID"
# alloy_db_database_id = "YOUR_ALLOY_DB_DATABASE_ID"
# alloy_db_table_id = "YOUR_ALLOY_DB_TABLE_ID"

# For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.DocumentServiceClient(client_options=client_options)

# The full resource name of the search engine branch.
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    alloy_db_source=discoveryengine.AlloyDbSource(
        project_id=alloy_db_project_id,
        location_id=alloy_db_location_id,
        cluster_id=alloy_db_cluster_id,
        database_id=alloy_db_database_id,
        table_id=alloy_db_table_id,
    ),
    # Options: `FULL`, `INCREMENTAL`
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

# Make the request
operation = client.import_documents(request=request)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)

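Next steps

To attach your data store to an app, create an app and select your data store by following the steps in "Create a search app".
To preview how your search results appear after your app and data store are set up, see "Get search results".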

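Upload structured JSON data with the API

To directly upload a JSON document or object by using the API, follow these steps.

Before importing your data, prepare your data for ingestion.

REST

To use the command line to create a data store and import structured JSON data, follow these steps.

Create a data store.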

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: PROJECT_ID" \
"https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
-d '{
  "displayName": "DATA_STORE_DISPLAY_NAME",
  "industryVertical": "GENERIC",
  "solutionTypes": ["SOLUTION_TYPE_SEARCH"]
}'

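Replace the following:

PROJECT_ID: the ID of your Google Cloud project.
DATA_STORE_ID: the ID of the Vertex AI Search data store that you want to create. This ID can contain only lowercase letters, digits, underscores, and hyphens.
DATA_STORE_DISPLAY_NAME: the display name of the Vertex AI Search data store that you want to create.

Import structured data.

There are a few approaches that you can use to upload data, including:

Upload a JSON document.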

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents?documentId=DOCUMENT_ID" \
-d '{
  "jsonData": "JSON_DOCUMENT_STRING"
}'

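Replace the following:

DOCUMENT_ID: a unique ID for the document. This ID can be up to 63 characters long and can contain only lowercase letters, digits, underscores, and hyphens.
JSON_DOCUMENT_STRING: the JSON document as a single string. It must conform to the JSON schema that you provided in the previous step, for example: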
```
{ \"title\": \"test title\", \"categories\": [\"cat_1\", \"cat_2\"], \"uri\": \"test uri\"}
```
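
Escaping the inner quotation marks by hand, as in the example above, is error-prone. One way to produce JSON_DOCUMENT_STRING is to serialize the document twice: once to turn it into a string, and once more as part of the request body. The following is a minimal sketch; the function name `build_json_data_body` is hypothetical.

```python
import json


def build_json_data_body(document: dict) -> str:
    """Return a request body whose jsonData field holds the document as a string."""
    # The inner dumps produces the document as a single JSON string; the outer
    # dumps escapes that string's quotation marks inside the request body.
    return json.dumps({"jsonData": json.dumps(document)})


document = {
    "title": "test title",
    "categories": ["cat_1", "cat_2"],
    "uri": "test uri",
}

print(build_json_data_body(document))
```

The printed body contains the same backslash-escaped quotation marks as the hand-written example.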

Upload as a JSON object.

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents?documentId=DOCUMENT_ID" \
-d '{
  "structData": JSON_DOCUMENT_OBJECT
}'

Replace JSON_DOCUMENT_OBJECT with the JSON document as a JSON object. It must conform to the JSON schema that you provided in the previous step, for example:

 {
   "title": "test title",
   "categories": [
     "cat_1",
     "cat_2"
   ],
   "uri": "test uri"
 }
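
With structData the document is embedded as a nested object, so no string escaping is needed. The following is a minimal sketch that builds such a body and checks DOCUMENT_ID against the rule given earlier in this section (at most 63 characters; lowercase letters, digits, underscores, and hyphens). The function name `build_struct_data_body` and the sample ID `doc-001` are hypothetical.

```python
import json
import re

# DOCUMENT_ID rule from this section: 1 to 63 characters drawn from
# lowercase letters, digits, underscores, and hyphens.
_DOCUMENT_ID_PATTERN = re.compile(r"[a-z0-9_-]{1,63}")


def build_struct_data_body(document_id: str, document: dict) -> str:
    """Validate the document ID locally and return the structData request body."""
    if _DOCUMENT_ID_PATTERN.fullmatch(document_id) is None:
        raise ValueError(f"invalid document ID: {document_id!r}")
    return json.dumps({"structData": document}, indent=2)


body = build_struct_data_body(
    "doc-001",
    {"title": "test title", "categories": ["cat_1", "cat_2"], "uri": "test uri"},
)
print(body)
```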

Update using a JSON document.

curl -X PATCH \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents/DOCUMENT_ID" \
-d '{
  "jsonData": "JSON_DOCUMENT_STRING"
}'

Update using a JSON object.

curl -X PATCH \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents/DOCUMENT_ID" \
-d '{
  "structData": JSON_DOCUMENT_OBJECT
}'

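Next steps

To attach your data store to an app, create an app and select your data store by following the steps in "Create a search app".
To preview how your search results appear after your app and data store are set up, see "Get search results".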

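Troubleshoot data ingestion

If you are having problems with data ingestion, review these tips:

If you are using customer-managed encryption keys and data import fails (with the error message The caller does not have permission), make sure that the CryptoKey Encrypter/Decrypter IAM role (roles/cloudkms.cryptoKeyEncrypterDecrypter) on the key has been granted to the Cloud Storage service agent. For more information, see "Before you begin" in "Customer-managed encryption keys".
If you are using advanced website indexing and the document usage for the data store is much lower than you expect, review the URL patterns that you specified for indexing, make sure that they cover the pages that you want to index, and expand them if needed. For example, if you used *.en.example.com/*, you might need to add *.example.com/* to the sites that you want indexed.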
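
To see why a narrow URL pattern leads to low document usage, you can test a few representative pages against your patterns locally. The following sketch uses Python's `fnmatch` wildcard matching as a rough stand-in; the matching rules that Vertex AI Search actually applies may differ, and the function name `is_covered` and the sample URLs are made up for illustration.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def is_covered(url: str, patterns: Iterable[str]) -> bool:
    """Return True if any wildcard pattern matches the URL (approximation)."""
    return any(fnmatchcase(url, pattern) for pattern in patterns)


pages = [
    "docs.en.example.com/guide",
    "www.example.com/pricing",
    "blog.example.com/post/1",
]

narrow = ["*.en.example.com/*"]
broad = ["*.example.com/*"]

for page in pages:
    print(f"{page}: narrow={is_covered(page, narrow)} broad={is_covered(page, broad)}")
```

Only the first page matches the narrow pattern, while all three match the broader one, which mirrors the example in the tip above.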

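Create a data store using Terraform

You can use Terraform to create an empty data store. After the empty data store is created, you can ingest data into the data store by using the Google Cloud console or API commands.

To learn how to apply or remove a Terraform configuration, see "Basic Terraform commands".

To create an empty data store using Terraform, see google_discovery_engine_data_store.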

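Connect a third-party data source

Connecting third-party data sources to Vertex AI Search is an allowlist-only feature.

If you are already on the allowlist for this feature, see the instructions in the Gemini Enterprise documentation for how to connect a third-party data source. The procedure is the same whether you create the connector in Vertex AI Search or in Gemini Enterprise.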