Create a generic recommendations data store

To create a data store and ingest your data for generic recommendations, go to the section for the source that you plan to use:

Website URLs

Console

To use the Google Cloud console to create a data store and index data from websites, follow these steps:

  1. In the Google Cloud console, go to the Agent Builder page.

    Agent Builder

  2. In the navigation menu, click Data stores.

  3. Click New data store.

  4. On the Select a data source page, select Website content.

  5. Choose whether to turn on Advanced website indexing for this data store. This option can't be turned off later.

    Advanced website indexing provides additional features such as search summaries, search with follow-ups, and extractive answers. Advanced website indexing incurs additional cost and requires that you verify domain ownership for any website that you index. For more information, see Advanced website indexing and Pricing.

  6. In the Sites to include field, specify the URLs of the websites that you want to index. Enter one URL per line, without comma separators.

  7. Optional: In the Sites to exclude field, enter websites that you want to exclude from your app.

  8. Click Continue.

  9. Enter a name for your data store.

  10. Select a location for your data store. Advanced website indexing must be turned on to select a location.

  11. Click Create. Vertex AI Agent Builder creates your data store and displays your data stores on the Data Stores page.

  12. To view information about your data store, click the name of your data store in the Name column. Your data store page is displayed.

    If you turned on Advanced website indexing, a warning appears prompting you to verify domain ownership. If you have a quota shortfall (the number of pages in the websites that you specified exceeds your project's "Number of documents per project" quota), an additional warning appears prompting you to upgrade your quota. The following steps show you how to verify domain ownership and upgrade your quota.

  13. To verify domain ownership, follow these steps:

    1. Click Verify in Google Search Console. The Welcome to Google Search Console page is displayed.
    2. Follow the onscreen instructions to verify a domain or a URL prefix, depending on whether you're verifying an entire domain or a URL prefix that is part of a domain. For more information, see Verify your site ownership in the Search Console Help.
    3. When you have finished the domain verification workflow, return to the Agent Builder page and click Data stores in the navigation menu.
    4. Click the name of your data store in the Name column. Your data store page is displayed.
    5. Click Refresh status to update the values in the Status column. The Status column for your websites indicates that indexing is in progress.
    6. Repeat the domain verification steps for each website that requires domain verification until all of them start indexing. When the Status column for a URL displays Indexed, advanced website indexing features are available for that URL or URL pattern.
  14. To upgrade your quota, follow these steps:

    1. Click Upgrade quota. The Discovery Engine API pane is displayed, with the Quotas tab selected.
    2. Follow the instructions at Request a higher quota limit in the Google Cloud documentation. The quota to increase is Number of documents.
    3. After submitting your request for a higher quota limit, return to the Agent Builder page and click Data stores in the navigation menu.
    4. Click the name of your data store in the Name column. The Status column indicates that indexing is in progress for the websites that had exceeded the quota. When the Status column for a URL displays Indexed, advanced website indexing features are available for that URL or URL pattern.

Next steps

  • To attach your data store to an app, create an app and select your data store, following the steps in Create a generic recommendations app.

  • To preview how your recommendations appear after your app and data store are set up, see Get recommendations.

BigQuery

To ingest data from BigQuery, use the following steps to create a data store and ingest data using either the Google Cloud console or the API.

Before importing your data, review Prepare data for ingesting.

Console

To use the Google Cloud console to ingest data from BigQuery, follow these steps:

  1. In the Google Cloud console, go to the Agent Builder page.

    Agent Builder

  2. Go to the Data Stores page.

  3. Click New data store.

  4. On the Type page, select BigQuery.

  5. In the BigQuery path field, click Browse, select a table that you have prepared for ingesting, and then click Select. Alternatively, enter the table location directly in the BigQuery path field.

  6. Select what kind of data you are importing.

  7. Click Continue.

  8. If you are doing one-time import of structured data:

    1. Map fields to key properties.

    2. If there are important fields missing from the schema, use Add new field to add them.

      For more information, see About auto-detect and edit.

    3. Click Continue.

  9. Choose a region for your data store.

  10. Enter a name for your data store.

  11. Click Create.

  12. To confirm that your data store was created, go to the Data Stores page and click your data store name to see details about it on its Data page.

  13. To check the status of your ingestion, go to the Data Stores page and click your data store name to see details about it on its Data page. When the status column on the Activity tab changes from In progress to Import completed, the ingestion is complete.

    Depending on the size of your data, ingestion can take several minutes or several hours.

REST

To use the command line to create a data store and import data from BigQuery, follow these steps:

  1. Create a data store.

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -H "X-Goog-User-Project: PROJECT_ID" \
    "https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
    -d '{
      "displayName": "DATA_STORE_DISPLAY_NAME",
      "industryVertical": "GENERIC",
      "solutionTypes": ["SOLUTION_TYPE_RECOMMENDATION"]
    }'
    

    Replace the following:

    • PROJECT_ID: The ID of your Google Cloud project.
    • DATA_STORE_ID: The ID of the recommendations data store that you want to create. This ID can contain only lowercase letters, digits, underscores, and hyphens.
    • DATA_STORE_DISPLAY_NAME: The display name of the recommendations data store that you want to create.
  2. Optional: If you're uploading structured data with your own schema, you can provide the schema. When you provide the schema, you usually get better results. Otherwise, the schema is auto-detected. For more information, see Provide or auto-detect a schema.

    curl -X PATCH \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/schemas/default_schema" \
    -d '{
      "structSchema": JSON_SCHEMA_OBJECT
    }'
    

    Replace the following:

    • PROJECT_ID: The ID of your Google Cloud project.
    • DATA_STORE_ID: The ID of the recommendations data store.
    • JSON_SCHEMA_OBJECT: Your JSON schema as a JSON object, for example:

      {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "type": "object",
        "properties": {
          "title": {
            "type": "string",
            "keyPropertyMapping": "title"
          },
          "categories": {
            "type": "array",
            "items": {
              "type": "string",
              "keyPropertyMapping": "category"
            }
          },
          "uri": {
            "type": "string",
            "keyPropertyMapping": "uri"
          }
        }
      }
      
  3. Import data from BigQuery.

    If you defined a schema, make sure that the data conforms to that schema.

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
    -d '{
      "bigquerySource": {
        "projectId": "PROJECT_ID",
        "datasetId":"DATASET_ID",
        "tableId": "TABLE_ID",
        "dataSchema": "DATA_SCHEMA",
      },
      "reconciliationMode": "RECONCILIATION_MODE",
      "autoGenerateIds": "AUTO_GENERATE_IDS",
      "idField": "ID_FIELD",
      "errorConfig": {
        "gcsPrefix": "ERROR_DIRECTORY"
      }
    }'
    

    Replace the following:

    • PROJECT_ID: The ID of your Google Cloud project.
    • DATA_STORE_ID: The ID of the recommendations data store.
    • DATASET_ID: The ID of the BigQuery dataset.
    • TABLE_ID: The ID of the BigQuery table.
      • If the BigQuery table is not under PROJECT_ID, you need to give the service account service-<project number>@gcp-sa-discoveryengine.iam.gserviceaccount.com "BigQuery Data Viewer" permission for the BigQuery table. For example, if you are importing a BigQuery table from source project "123" to destination project "456", give service-456@gcp-sa-discoveryengine.iam.gserviceaccount.com permissions for the BigQuery table under project "123". One way to grant this is shown in the sketch after this list.
    • DATA_SCHEMA: Optional. Values are document and custom. The default is document.
      • document: The BigQuery table that you use must conform to the default BigQuery schema provided in Prepare data for ingesting. You can define the ID of each document yourself, while wrapping all of the data in the jsonData string.
      • custom: Any BigQuery table schema is accepted, and recommendations automatically generates the IDs for each document that is imported.
    • ERROR_DIRECTORY: Optional. A Cloud Storage directory for error information about the import, for example, gs://<your-gcs-bucket>/directory/import_errors. Google recommends leaving this field empty so that a temporary directory is automatically created.
    • RECONCILIATION_MODE: Optional. Values are FULL and INCREMENTAL. The default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from BigQuery to your data store. This does an upsert operation, which adds new documents and replaces existing documents with updated documents that have the same ID. Specifying FULL causes a full rebase of the documents in your data store. In other words, new and updated documents are added to your data store, and documents that are not in BigQuery are removed from your data store. The FULL mode is helpful if you want to automatically delete documents that you no longer need.
    • AUTO_GENERATE_IDS: Optional. Specifies whether to automatically generate document IDs. If set to true, document IDs are generated based on a hash of the payload. Note that generated document IDs might not remain consistent over multiple imports. If you auto-generate IDs over multiple imports, Google highly recommends setting reconciliationMode to FULL to maintain consistent document IDs.

      Specify autoGenerateIds only when bigquerySource.dataSchema is set to custom. Otherwise an INVALID_ARGUMENT error is returned. If you don't specify autoGenerateIds, or set it to false, you must specify idField. Otherwise the documents fail to import.

    • ID_FIELD: Optional. Specifies which fields are the document IDs. For BigQuery source files, idField indicates the name of the column in the BigQuery table that contains the document IDs.

      Specify idField only when: (1) bigquerySource.dataSchema is set to custom, and (2) auto_generate_ids is set to false or is unspecified. Otherwise, an INVALID_ARGUMENT error is returned.

      The value of the BigQuery column name must be of string type, must be between 1 and 63 characters, and must conform to RFC-1034. Otherwise, the documents fail to import.
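
    The following gcloud sketch shows one way to set up the cross-project grant described for TABLE_ID above, using the example project numbers "123" (source) and "456" (destination). It is a minimal sketch that grants the role at the project level; a narrower grant on just the dataset or table through the BigQuery console also works.

    # Grant project 456's Discovery Engine service account read access to
    # BigQuery data in source project 123 (project-level grant).
    gcloud projects add-iam-policy-binding 123 \
        --member="serviceAccount:service-456@gcp-sa-discoveryengine.iam.gserviceaccount.com" \
        --role="roles/bigquery.dataViewer"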

C#

For more information, see the Vertex AI Agent Builder C# API reference documentation.

To authenticate to Vertex AI Agent Builder, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

This sample imports data from BigQuery or Cloud Storage into an existing data store.

using Google.Cloud.DiscoveryEngine.V1;
using Google.LongRunning;
using Google.Protobuf.WellKnownTypes;

public sealed partial class GeneratedDocumentServiceClientSnippets
{
    /// <summary>Snippet for ImportDocuments</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void ImportDocumentsRequestObject()
    {
        // Create client
        DocumentServiceClient documentServiceClient = DocumentServiceClient.Create();
        // Initialize request argument(s)
        ImportDocumentsRequest request = new ImportDocumentsRequest
        {
            ParentAsBranchName = BranchName.FromProjectLocationDataStoreBranch("[PROJECT]", "[LOCATION]", "[DATA_STORE]", "[BRANCH]"),
            InlineSource = new ImportDocumentsRequest.Types.InlineSource(),
            ErrorConfig = new ImportErrorConfig(),
            ReconciliationMode = ImportDocumentsRequest.Types.ReconciliationMode.Unspecified,
            UpdateMask = new FieldMask(),
            AutoGenerateIds = false,
            IdField = "",
        };
        // Make the request
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> response = documentServiceClient.ImportDocuments(request);

        // Poll until the returned long-running operation is complete
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> completedResponse = response.PollUntilCompleted();
        // Retrieve the operation result
        ImportDocumentsResponse result = completedResponse.Result;

        // Or get the name of the operation
        string operationName = response.Name;
        // This name can be stored, then the long-running operation retrieved later by name
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> retrievedResponse = documentServiceClient.PollOnceImportDocuments(operationName);
        // Check if the retrieved long-running operation has completed
        if (retrievedResponse.IsCompleted)
        {
            // If it has completed, then access the result
            ImportDocumentsResponse retrievedResult = retrievedResponse.Result;
        }
    }
}

Go

For more information, see the Vertex AI Agent Builder Go API reference documentation.

To authenticate to Vertex AI Agent Builder, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

This sample imports data from BigQuery or Cloud Storage into an existing data store.


package main

import (
	"context"

	discoveryengine "cloud.google.com/go/discoveryengine/apiv1"
	discoveryenginepb "cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := discoveryengine.NewDocumentClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &discoveryenginepb.ImportDocumentsRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb#ImportDocumentsRequest.
	}
	op, err := c.ImportDocuments(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}

Java

For more information, see the Vertex AI Agent Builder Java API reference documentation.

To authenticate to Vertex AI Agent Builder, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

This sample imports data from BigQuery or Cloud Storage into an existing data store.

import com.google.cloud.discoveryengine.v1.BranchName;
import com.google.cloud.discoveryengine.v1.DocumentServiceClient;
import com.google.cloud.discoveryengine.v1.ImportDocumentsRequest;
import com.google.cloud.discoveryengine.v1.ImportDocumentsResponse;
import com.google.cloud.discoveryengine.v1.ImportErrorConfig;
import com.google.protobuf.FieldMask;

public class SyncImportDocuments {

  public static void main(String[] args) throws Exception {
    syncImportDocuments();
  }

  public static void syncImportDocuments() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DocumentServiceClient documentServiceClient = DocumentServiceClient.create()) {
      ImportDocumentsRequest request =
          ImportDocumentsRequest.newBuilder()
              .setParent(
                  BranchName.ofProjectLocationDataStoreBranchName(
                          "[PROJECT]", "[LOCATION]", "[DATA_STORE]", "[BRANCH]")
                      .toString())
              .setErrorConfig(ImportErrorConfig.newBuilder().build())
              .setUpdateMask(FieldMask.newBuilder().build())
              .setAutoGenerateIds(true)
              .setIdField("idField1629396127")
              .build();
      ImportDocumentsResponse response = documentServiceClient.importDocumentsAsync(request).get();
    }
  }
}

Node.js

For more information, see the Vertex AI Agent Builder Node.js API reference documentation.

To authenticate to Vertex AI Agent Builder, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

This sample ingests unstructured data from BigQuery or Cloud Storage into an existing data store.

/**
 * This snippet has been automatically generated and should be regarded as a code template only.
 * It will require modifications to work.
 * It may require correct/in-range values for request initialization.
 * TODO(developer): Uncomment these variables before running the sample.
 */
/**
 *  The Inline source for the input content for documents.
 */
// const inlineSource = {}
/**
 *  Cloud Storage location for the input content.
 */
// const gcsSource = {}
/**
 *  BigQuery input source.
 */
// const bigquerySource = {}
/**
 *  FhirStore input source.
 */
// const fhirStoreSource = {}
/**
 *  Spanner input source.
 */
// const spannerSource = {}
/**
 *  Cloud SQL input source.
 */
// const cloudSqlSource = {}
/**
 *  Firestore input source.
 */
// const firestoreSource = {}
/**
 *  AlloyDB input source.
 */
// const alloyDbSource = {}
/**
 *  Cloud Bigtable input source.
 */
// const bigtableSource = {}
/**
 *  Required. The parent branch resource name, such as
 *  `projects/{project}/locations/{location}/collections/{collection}/dataStores/{data_store}/branches/{branch}`.
 *  Requires create/update permission.
 */
// const parent = 'abc123'
/**
 *  The desired location of errors incurred during the Import.
 */
// const errorConfig = {}
/**
 *  The mode of reconciliation between existing documents and the documents to
 *  be imported. Defaults to
 *  ReconciliationMode.INCREMENTAL google.cloud.discoveryengine.v1.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL.
 */
// const reconciliationMode = {}
/**
 *  Indicates which fields in the provided imported documents to update. If
 *  not set, the default is to update all fields.
 */
// const updateMask = {}
/**
 *  Whether to automatically generate IDs for the documents if absent.
 *  If set to `true`,
 *  Document.id google.cloud.discoveryengine.v1.Document.id s are
 *  automatically generated based on the hash of the payload, where IDs may not
 *  be consistent during multiple imports. In which case
 *  ReconciliationMode.FULL google.cloud.discoveryengine.v1.ImportDocumentsRequest.ReconciliationMode.FULL 
 *  is highly recommended to avoid duplicate contents. If unset or set to
 *  `false`, Document.id google.cloud.discoveryengine.v1.Document.id s have
 *  to be specified using
 *  id_field google.cloud.discoveryengine.v1.ImportDocumentsRequest.id_field,
 *  otherwise, documents without IDs fail to be imported.
 *  Supported data sources:
 *  * GcsSource google.cloud.discoveryengine.v1.GcsSource.
 *  GcsSource.data_schema google.cloud.discoveryengine.v1.GcsSource.data_schema 
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * BigQuerySource google.cloud.discoveryengine.v1.BigQuerySource.
 *  BigQuerySource.data_schema google.cloud.discoveryengine.v1.BigQuerySource.data_schema 
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * SpannerSource google.cloud.discoveryengine.v1.SpannerSource.
 *  * CloudSqlSource google.cloud.discoveryengine.v1.CloudSqlSource.
 *  * FirestoreSource google.cloud.discoveryengine.v1.FirestoreSource.
 *  * BigtableSource google.cloud.discoveryengine.v1.BigtableSource.
 */
// const autoGenerateIds = true
/**
 *  The field indicates the ID field or column to be used as unique IDs of
 *  the documents.
 *  For GcsSource google.cloud.discoveryengine.v1.GcsSource  it is the key of
 *  the JSON field. For instance, `my_id` for JSON `{"my_id": "some_uuid"}`.
 *  For others, it may be the column name of the table where the unique ids are
 *  stored.
 *  The values of the JSON field or the table column are used as the
 *  Document.id google.cloud.discoveryengine.v1.Document.id s. The JSON field
 *  or the table column must be of string type, and the values must be set as
 *  valid strings conform to RFC-1034 (https://tools.ietf.org/html/rfc1034)
 *  with 1-63 characters. Otherwise, documents without valid IDs fail to be
 *  imported.
 *  Only set this field when
 *  auto_generate_ids google.cloud.discoveryengine.v1.ImportDocumentsRequest.auto_generate_ids 
 *  is unset or set as `false`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  If it is unset, a default value `_id` is used when importing from the
 *  allowed data sources.
 *  Supported data sources:
 *  * GcsSource google.cloud.discoveryengine.v1.GcsSource.
 *  GcsSource.data_schema google.cloud.discoveryengine.v1.GcsSource.data_schema 
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * BigQuerySource google.cloud.discoveryengine.v1.BigQuerySource.
 *  BigQuerySource.data_schema google.cloud.discoveryengine.v1.BigQuerySource.data_schema 
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * SpannerSource google.cloud.discoveryengine.v1.SpannerSource.
 *  * CloudSqlSource google.cloud.discoveryengine.v1.CloudSqlSource.
 *  * FirestoreSource google.cloud.discoveryengine.v1.FirestoreSource.
 *  * BigtableSource google.cloud.discoveryengine.v1.BigtableSource.
 */
// const idField = 'abc123'

// Imports the Discoveryengine library
const {DocumentServiceClient} = require('@google-cloud/discoveryengine').v1;

// Instantiates a client
const discoveryengineClient = new DocumentServiceClient();

async function callImportDocuments() {
  // Construct request
  const request = {
    parent,
  };

  // Run request
  const [operation] = await discoveryengineClient.importDocuments(request);
  const [response] = await operation.promise();
  console.log(response);
}

callImportDocuments();

Python

For more information, see the Vertex AI Agent Builder Python API reference documentation.

To authenticate to Vertex AI Agent Builder, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

This sample imports data from BigQuery or Cloud Storage into an existing data store.



def import_documents_bigquery_sample(
    project_id: str,
    location: str,
    data_store_id: str,
    bigquery_dataset: str,
    bigquery_table: str,
) -> str:

    from google.api_core.client_options import ClientOptions
    from google.cloud import discoveryengine

    # TODO(developer): Uncomment these variables before running the sample.
    # project_id = "YOUR_PROJECT_ID"
    # location = "YOUR_LOCATION" # Values: "global"
    # data_store_id = "YOUR_DATA_STORE_ID"
    # bigquery_dataset = "YOUR_BIGQUERY_DATASET"
    # bigquery_table = "YOUR_BIGQUERY_TABLE"

    #  For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DocumentServiceClient(client_options=client_options)

    # The full resource name of the search engine branch.
    # e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
    parent = client.branch_path(
        project=project_id,
        location=location,
        data_store=data_store_id,
        branch="default_branch",
    )

    request = discoveryengine.ImportDocumentsRequest(
        parent=parent,
        bigquery_source=discoveryengine.BigQuerySource(
            project_id=project_id,
            dataset_id=bigquery_dataset,
            table_id=bigquery_table,
            data_schema="custom",
        ),
        # Options: `FULL`, `INCREMENTAL`
        reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
    )

    # Make the request
    operation = client.import_documents(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name


def import_documents_gcs_sample(
    project_id: str,
    location: str,
    data_store_id: str,
    gcs_uri: str,
) -> str:
    from google.api_core.client_options import ClientOptions
    from google.cloud import discoveryengine

    # TODO(developer): Uncomment these variables before running the sample.
    # project_id = "YOUR_PROJECT_ID"
    # location = "YOUR_LOCATION" # Values: "global"
    # data_store_id = "YOUR_DATA_STORE_ID"

    # Examples:
    # - Unstructured documents
    #   - `gs://bucket/directory/file.pdf`
    #   - `gs://bucket/directory/*.pdf`
    # - Unstructured documents with JSONL Metadata
    #   - `gs://bucket/directory/file.json`
    # - Unstructured documents with CSV Metadata
    #   - `gs://bucket/directory/file.csv`
    # gcs_uri = "YOUR_GCS_PATH"

    #  For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DocumentServiceClient(client_options=client_options)

    # The full resource name of the search engine branch.
    # e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
    parent = client.branch_path(
        project=project_id,
        location=location,
        data_store=data_store_id,
        branch="default_branch",
    )

    request = discoveryengine.ImportDocumentsRequest(
        parent=parent,
        gcs_source=discoveryengine.GcsSource(
            # Multiple URIs are supported
            input_uris=[gcs_uri],
            # Options:
            # - `content` - Unstructured documents (PDF, HTML, DOC, TXT, PPTX)
            # - `custom` - Unstructured documents with custom JSONL metadata
            # - `document` - Structured documents in the discoveryengine.Document format.
            # - `csv` - Unstructured documents with CSV metadata
            data_schema="content",
        ),
        # Options: `FULL`, `INCREMENTAL`
        reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
    )

    # Make the request
    operation = client.import_documents(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name

Ruby

For more information, see the Vertex AI Agent Builder Ruby API reference documentation.

To authenticate to Vertex AI Agent Builder, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

This sample ingests unstructured data from BigQuery or Cloud Storage into an existing data store.

require "google/cloud/discovery_engine/v1"

##
# Snippet for the import_documents call in the DocumentService service
#
# This snippet has been automatically generated and should be regarded as a code
# template only. It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
# client as shown in https://cloud.google.com/ruby/docs/reference.
#
# This is an auto-generated example demonstrating basic usage of
# Google::Cloud::DiscoveryEngine::V1::DocumentService::Client#import_documents.
#
def import_documents
  # Create a client object. The client can be reused for multiple calls.
  client = Google::Cloud::DiscoveryEngine::V1::DocumentService::Client.new

  # Create a request. To set request fields, pass in keyword arguments.
  request = Google::Cloud::DiscoveryEngine::V1::ImportDocumentsRequest.new

  # Call the import_documents method.
  result = client.import_documents request

  # The returned object is of type Gapic::Operation. You can use it to
  # check the status of an operation, cancel it, or wait for results.
  # Here is how to wait for a response.
  result.wait_until_done! timeout: 60
  if result.response?
    p result.response
  else
    puts "No response received."
  end
end

Next steps

  • To attach your data store to an app, create an app and select your data store, following the steps in Create a generic recommendations app.

  • To preview how your recommendations appear after your app and data store are set up, see Get recommendations.

Cloud Storage

To ingest data from Cloud Storage, use the following steps to create a data store and ingest data using either the Google Cloud console or the API.

Before importing your data, review Prepare data for ingesting.

Console

To use the console to ingest data from a Cloud Storage bucket, follow these steps:

  1. In the Google Cloud console, go to the Agent Builder page.

    Agent Builder

  2. Go to the Data Stores page.

  3. Click New data store.

  4. On the Type page, select Cloud Storage.

  5. In the Select a folder or file you want to import section, select Folder or File.

  6. Click Browse, choose the data that you have prepared for ingesting, and then click Select. Alternatively, enter the location directly in the gs:// field.

  7. Select what kind of data you are importing.

  8. Click Continue.

  9. If you are doing one-time import of structured data:

    1. Map fields to key properties.

    2. If there are important fields missing from the schema, use Add new field to add them.

      For more information, see About auto-detect and edit.

    3. Click Continue.

  10. Choose a region for your data store.

  11. Enter a name for your data store.

  12. Click Create.

  13. To confirm that your data store was created, go to the Data Stores page and click your data store name to see details about it on its Data page.

  14. To check the status of your ingestion, go to the Data Stores page and click your data store name to see details about it on its Data page. When the status column on the Activity tab changes from In progress to Import completed, the ingestion is complete.

    Depending on the size of your data, ingestion can take several minutes or several hours.

REST

To use the command line to create a data store and ingest data from Cloud Storage, follow these steps:

  1. Create a data store.

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -H "X-Goog-User-Project: PROJECT_ID" \
    "https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
    -d '{
      "displayName": "DATA_STORE_DISPLAY_NAME",
      "industryVertical": "GENERIC",
      "solutionTypes": ["SOLUTION_TYPE_RECOMMENDATION"],
      "contentConfig": "CONTENT_REQUIRED"
    }'
    

    Replace the following:

    • PROJECT_ID: The ID of your Google Cloud project.
    • DATA_STORE_ID: The ID of the recommendations data store that you want to create. This ID can contain only lowercase letters, digits, underscores, and hyphens.
    • DATA_STORE_DISPLAY_NAME: The display name of the recommendations data store that you want to create.
  2. Import data from Cloud Storage.

      curl -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
      -d '{
        "gcsSource": {
          "inputUris": ["INPUT_FILE_PATTERN_1", "INPUT_FILE_PATTERN_2"],
          "dataSchema": "DATA_SCHEMA",
        },
        "reconciliationMode": "RECONCILIATION_MODE",
        "autoGenerateIds": "AUTO_GENERATE_IDS",
        "idField": "ID_FIELD",
        "errorConfig": {
          "gcsPrefix": "ERROR_DIRECTORY"
        }
      }'
    

    Replace the following:

    • PROJECT_ID: The ID of your Google Cloud project.
    • DATA_STORE_ID: The ID of the recommendations data store.
    • INPUT_FILE_PATTERN: A file pattern in Cloud Storage containing your documents.

      For structured data, or for unstructured data with metadata for the unstructured documents, an example of the input file pattern is gs://<your-gcs-bucket>/directory/object.json, or a pattern matching one or more files, such as gs://<your-gcs-bucket>/directory/*.json.

      For unstructured documents, an example is gs://<your-gcs-bucket>/directory/*.pdf. Each file that matches the pattern becomes a document.

      If <your-gcs-bucket> is not under PROJECT_ID, you need to give the service account service-<project number>@gcp-sa-discoveryengine.iam.gserviceaccount.com "Storage Object Viewer" permissions for the Cloud Storage bucket. For example, if you are importing a Cloud Storage bucket from source project "123" to destination project "456", give service-456@gcp-sa-discoveryengine.iam.gserviceaccount.com permissions on the Cloud Storage bucket under project "123".

    • DATA_SCHEMA: Optional. Values are document, custom, csv, and content. The default is document.

      • document: Upload unstructured data with metadata for unstructured documents. Each line of the file has to follow one of the following formats. You can define the ID of each document:

        • { "id": "<your-id>", "jsonData": "<JSON string>", "content": { "mimeType": "<application/pdf or text/html>", "uri": "gs://<your-gcs-bucket>/directory/filename.pdf" } }
        • { "id": "<your-id>", "structData": <JSON object>, "content": { "mimeType": "<application/pdf or text/html>", "uri": "gs://<your-gcs-bucket>/directory/filename.pdf" } }
      • custom: Upload JSON for structured documents. The data is organized according to a schema. You can specify the schema; otherwise, it is auto-detected. You can put the JSON string of the document in a consistent format directly in each line, and recommendations automatically generates the IDs for each imported document.

      • content: Upload unstructured documents (PDF, HTML, DOC, TXT, PPTX). The ID of each document is automatically generated as the first 128 bits of SHA256(GCS_URI) encoded as a hex string. You can specify multiple input file patterns as long as the matched files don't exceed the 100,000-file limit.

      • csv: Include a header row in your CSV file, with each header mapped to a document field. Specify the path to the CSV file using the inputUris field.

    • ERROR_DIRECTORY: Optional. A Cloud Storage directory for error information about the import, for example, gs://<your-gcs-bucket>/directory/import_errors. Google recommends leaving this field empty so that a temporary directory is automatically created.

    • RECONCILIATION_MODE: Optional. Values are FULL and INCREMENTAL. The default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from Cloud Storage to your data store. This does an upsert operation, which adds new documents and replaces existing documents with updated documents that have the same ID. Specifying FULL causes a full rebase of the documents in your data store. In other words, new and updated documents are added to your data store, and documents that are not in Cloud Storage are removed from your data store. The FULL mode is helpful if you want to automatically delete documents that you no longer need.

    • AUTO_GENERATE_IDS: Optional. Specifies whether to automatically generate document IDs. If set to true, document IDs are generated based on a hash of the payload. Note that generated document IDs might not remain consistent over multiple imports. If you auto-generate IDs over multiple imports, Google highly recommends setting reconciliationMode to FULL to maintain consistent document IDs.

      Specify autoGenerateIds only when gcsSource.dataSchema is set to custom or csv. Otherwise an INVALID_ARGUMENT error is returned. If you don't specify autoGenerateIds, or set it to false, you must specify idField. Otherwise the documents fail to import.

    • ID_FIELD: Optional. Specifies which fields are the document IDs. For Cloud Storage source documents, idField specifies the name of the JSON field that contains the document IDs. For example, if {"my_id":"some_uuid"} is the JSON field in one of your documents, specify "idField":"my_id". This identifies JSON fields with the name "my_id" as the document IDs.

      Specify this field only when: (1) gcsSource.dataSchema is set to custom or csv, and (2) auto_generate_ids is set to false or is unspecified. Otherwise an INVALID_ARGUMENT error is returned.

      Note that the value of the Cloud Storage JSON field must be of string type, must be between 1 and 63 characters, and must conform to RFC-1034. Otherwise, the documents fail to import.

      Note that the JSON field name specified by id_field must be of string type, must be between 1 and 63 characters, and must conform to RFC-1034. Otherwise, the documents fail to import.
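
    The documents:import request returns a long-running operation, so one way to check ingestion status from the command line is to poll that operation, as in the following sketch. The OPERATION_ID placeholder is an assumption here: copy it from the name field of the import response. When the response body contains "done": true, the import has finished.

    # Poll the long-running import operation returned by documents:import.
    curl -X GET \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/operations/OPERATION_ID"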

C#

For more information, see the Vertex AI Agent Builder C# API reference documentation.

To authenticate to Vertex AI Agent Builder, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

This sample imports data from BigQuery or Cloud Storage into an existing data store.

using Google.Cloud.DiscoveryEngine.V1;
using Google.LongRunning;
using Google.Protobuf.WellKnownTypes;

public sealed partial class GeneratedDocumentServiceClientSnippets
{
    /// <summary>Snippet for ImportDocuments</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void ImportDocumentsRequestObject()
    {
        // Create client
        DocumentServiceClient documentServiceClient = DocumentServiceClient.Create();
        // Initialize request argument(s)
        ImportDocumentsRequest request = new ImportDocumentsRequest
        {
            ParentAsBranchName = BranchName.FromProjectLocationDataStoreBranch("[PROJECT]", "[LOCATION]", "[DATA_STORE]", "[BRANCH]"),
            InlineSource = new ImportDocumentsRequest.Types.InlineSource(),
            ErrorConfig = new ImportErrorConfig(),
            ReconciliationMode = ImportDocumentsRequest.Types.ReconciliationMode.Unspecified,
            UpdateMask = new FieldMask(),
            AutoGenerateIds = false,
            IdField = "",
        };
        // Make the request
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> response = documentServiceClient.ImportDocuments(request);

        // Poll until the returned long-running operation is complete
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> completedResponse = response.PollUntilCompleted();
        // Retrieve the operation result
        ImportDocumentsResponse result = completedResponse.Result;

        // Or get the name of the operation
        string operationName = response.Name;
        // This name can be stored, then the long-running operation retrieved later by name
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> retrievedResponse = documentServiceClient.PollOnceImportDocuments(operationName);
        // Check if the retrieved long-running operation has completed
        if (retrievedResponse.IsCompleted)
        {
            // If it has completed, then access the result
            ImportDocumentsResponse retrievedResult = retrievedResponse.Result;
        }
    }
}

Go

For more information, see the Vertex AI Agent Builder Go API reference documentation.

To authenticate to Vertex AI Agent Builder, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

This sample imports data from BigQuery or Cloud Storage into an existing data store.


package main

import (
	"context"

	discoveryengine "cloud.google.com/go/discoveryengine/apiv1"
	discoveryenginepb "cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := discoveryengine.NewDocumentClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &discoveryenginepb.ImportDocumentsRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb#ImportDocumentsRequest.
	}
	op, err := c.ImportDocuments(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}

Java

For more information, see the Vertex AI Agent Builder Java API reference documentation.

To authenticate to Vertex AI Agent Builder, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

This sample ingests unstructured data from BigQuery or Cloud Storage into an existing data store.

import com.google.cloud.discoveryengine.v1.BranchName;
import com.google.cloud.discoveryengine.v1.DocumentServiceClient;
import com.google.cloud.discoveryengine.v1.ImportDocumentsRequest;
import com.google.cloud.discoveryengine.v1.ImportDocumentsResponse;
import com.google.cloud.discoveryengine.v1.ImportErrorConfig;
import com.google.protobuf.FieldMask;

public class SyncImportDocuments {

  public static void main(String[] args) throws Exception {
    syncImportDocuments();
  }

  public static void syncImportDocuments() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DocumentServiceClient documentServiceClient = DocumentServiceClient.create()) {
      ImportDocumentsRequest request =
          ImportDocumentsRequest.newBuilder()
              .setParent(
                  BranchName.ofProjectLocationDataStoreBranchName(
                          "[PROJECT]", "[LOCATION]", "[DATA_STORE]", "[BRANCH]")
                      .toString())
              .setErrorConfig(ImportErrorConfig.newBuilder().build())
              .setUpdateMask(FieldMask.newBuilder().build())
              .setAutoGenerateIds(true)
              .setIdField("idField1629396127")
              .build();
      ImportDocumentsResponse response = documentServiceClient.importDocumentsAsync(request).get();
    }
  }
}

Node.js

For more information, see the Vertex AI Agent Builder Node.js API reference documentation.

To authenticate to Vertex AI Agent Builder, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

This sample ingests unstructured data from BigQuery or Cloud Storage into an existing data store.

/**
 * This snippet has been automatically generated and should be regarded as a code template only.
 * It will require modifications to work.
 * It may require correct/in-range values for request initialization.
 * TODO(developer): Uncomment these variables before running the sample.
 */
/**
 *  The Inline source for the input content for documents.
 */
// const inlineSource = {}
/**
 *  Cloud Storage location for the input content.
 */
// const gcsSource = {}
/**
 *  BigQuery input source.
 */
// const bigquerySource = {}
/**
 *  FhirStore input source.
 */
// const fhirStoreSource = {}
/**
 *  Spanner input source.
 */
// const spannerSource = {}
/**
 *  Cloud SQL input source.
 */
// const cloudSqlSource = {}
/**
 *  Firestore input source.
 */
// const firestoreSource = {}
/**
 *  AlloyDB input source.
 */
// const alloyDbSource = {}
/**
 *  Cloud Bigtable input source.
 */
// const bigtableSource = {}
/**
 *  Required. The parent branch resource name, such as
 *  `projects/{project}/locations/{location}/collections/{collection}/dataStores/{data_store}/branches/{branch}`.
 *  Requires create/update permission.
 */
// const parent = 'abc123'
/**
 *  The desired location of errors incurred during the Import.
 */
// const errorConfig = {}
/**
 *  The mode of reconciliation between existing documents and the documents to
 *  be imported. Defaults to
 *  ReconciliationMode.INCREMENTAL google.cloud.discoveryengine.v1.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL.
 */
// const reconciliationMode = {}
/**
 *  Indicates which fields in the provided imported documents to update. If
 *  not set, the default is to update all fields.
 */
// const updateMask = {}
/**
 *  Whether to automatically generate IDs for the documents if absent.
 *  If set to `true`,
 *  Document.id google.cloud.discoveryengine.v1.Document.id s are
 *  automatically generated based on the hash of the payload, where IDs may not
 *  be consistent during multiple imports. In which case
 *  ReconciliationMode.FULL google.cloud.discoveryengine.v1.ImportDocumentsRequest.ReconciliationMode.FULL 
 *  is highly recommended to avoid duplicate contents. If unset or set to
 *  `false`, Document.id google.cloud.discoveryengine.v1.Document.id s have
 *  to be specified using
 *  id_field google.cloud.discoveryengine.v1.ImportDocumentsRequest.id_field,
 *  otherwise, documents without IDs fail to be imported.
 *  Supported data sources:
 *  * GcsSource google.cloud.discoveryengine.v1.GcsSource.
 *  GcsSource.data_schema google.cloud.discoveryengine.v1.GcsSource.data_schema 
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * BigQuerySource google.cloud.discoveryengine.v1.BigQuerySource.
 *  BigQuerySource.data_schema google.cloud.discoveryengine.v1.BigQuerySource.data_schema 
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * SpannerSource google.cloud.discoveryengine.v1.SpannerSource.
 *  * CloudSqlSource google.cloud.discoveryengine.v1.CloudSqlSource.
 *  * FirestoreSource google.cloud.discoveryengine.v1.FirestoreSource.
 *  * BigtableSource google.cloud.discoveryengine.v1.BigtableSource.
 */
// const autoGenerateIds = true
/**
 *  The field indicates the ID field or column to be used as unique IDs of
 *  the documents.
 *  For GcsSource google.cloud.discoveryengine.v1.GcsSource  it is the key of
 *  the JSON field. For instance, `my_id` for JSON `{"my_id": "some_uuid"}`.
 *  For others, it may be the column name of the table where the unique ids are
 *  stored.
 *  The values of the JSON field or the table column are used as the
 *  Document.id google.cloud.discoveryengine.v1.Document.id s. The JSON field
 *  or the table column must be of string type, and the values must be set as
 *  valid strings conform to RFC-1034 (https://tools.ietf.org/html/rfc1034)
 *  with 1-63 characters. Otherwise, documents without valid IDs fail to be
 *  imported.
 *  Only set this field when
 *  auto_generate_ids google.cloud.discoveryengine.v1.ImportDocumentsRequest.auto_generate_ids 
 *  is unset or set as `false`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  If it is unset, a default value `_id` is used when importing from the
 *  allowed data sources.
 *  Supported data sources:
 *  * GcsSource google.cloud.discoveryengine.v1.GcsSource.
 *  GcsSource.data_schema google.cloud.discoveryengine.v1.GcsSource.data_schema 
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * BigQuerySource google.cloud.discoveryengine.v1.BigQuerySource.
 *  BigQuerySource.data_schema google.cloud.discoveryengine.v1.BigQuerySource.data_schema 
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * SpannerSource google.cloud.discoveryengine.v1.SpannerSource.
 *  * CloudSqlSource google.cloud.discoveryengine.v1.CloudSqlSource.
 *  * FirestoreSource google.cloud.discoveryengine.v1.FirestoreSource.
 *  * BigtableSource google.cloud.discoveryengine.v1.BigtableSource.
 */
// const idField = 'abc123'

// Imports the Discoveryengine library
const {DocumentServiceClient} = require('@google-cloud/discoveryengine').v1;

// Instantiates a client
const discoveryengineClient = new DocumentServiceClient();

async function callImportDocuments() {
  // Construct request
  const request = {
    parent,
  };

  // Run request
  const [operation] = await discoveryengineClient.importDocuments(request);
  const [response] = await operation.promise();
  console.log(response);
}

callImportDocuments();

Python

For more information, see the Vertex AI Agent Builder Python API reference documentation.

To authenticate to Vertex AI Agent Builder, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

This sample imports data from BigQuery or Cloud Storage into an existing data store.



def import_documents_bigquery_sample(
    project_id: str,
    location: str,
    data_store_id: str,
    bigquery_dataset: str,
    bigquery_table: str,
) -> str:

    from google.api_core.client_options import ClientOptions
    from google.cloud import discoveryengine

    # TODO(developer): Uncomment these variables before running the sample.
    # project_id = "YOUR_PROJECT_ID"
    # location = "YOUR_LOCATION" # Values: "global"
    # data_store_id = "YOUR_DATA_STORE_ID"
    # bigquery_dataset = "YOUR_BIGQUERY_DATASET"
    # bigquery_table = "YOUR_BIGQUERY_TABLE"

    #  For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DocumentServiceClient(client_options=client_options)

    # The full resource name of the search engine branch.
    # e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
    parent = client.branch_path(
        project=project_id,
        location=location,
        data_store=data_store_id,
        branch="default_branch",
    )

    request = discoveryengine.ImportDocumentsRequest(
        parent=parent,
        bigquery_source=discoveryengine.BigQuerySource(
            project_id=project_id,
            dataset_id=bigquery_dataset,
            table_id=bigquery_table,
            data_schema="custom",
        ),
        # Options: `FULL`, `INCREMENTAL`
        reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
    )

    # Make the request
    operation = client.import_documents(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name


def import_documents_gcs_sample(
    project_id: str,
    location: str,
    data_store_id: str,
    gcs_uri: str,
) -> str:
    from google.api_core.client_options import ClientOptions
    from google.cloud import discoveryengine

    # TODO(developer): Uncomment these variables before running the sample.
    # project_id = "YOUR_PROJECT_ID"
    # location = "YOUR_LOCATION" # Values: "global"
    # data_store_id = "YOUR_DATA_STORE_ID"

    # Examples:
    # - Unstructured documents
    #   - `gs://bucket/directory/file.pdf`
    #   - `gs://bucket/directory/*.pdf`
    # - Unstructured documents with JSONL Metadata
    #   - `gs://bucket/directory/file.json`
    # - Unstructured documents with CSV Metadata
    #   - `gs://bucket/directory/file.csv`
    # gcs_uri = "YOUR_GCS_PATH"

    #  For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DocumentServiceClient(client_options=client_options)

    # The full resource name of the search engine branch.
    # e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
    parent = client.branch_path(
        project=project_id,
        location=location,
        data_store=data_store_id,
        branch="default_branch",
    )

    request = discoveryengine.ImportDocumentsRequest(
        parent=parent,
        gcs_source=discoveryengine.GcsSource(
            # Multiple URIs are supported
            input_uris=[gcs_uri],
            # Options:
            # - `content` - Unstructured documents (PDF, HTML, DOC, TXT, PPTX)
            # - `custom` - Unstructured documents with custom JSONL metadata
            # - `document` - Structured documents in the discoveryengine.Document format.
            # - `csv` - Unstructured documents with CSV metadata
            data_schema="content",
        ),
        # Options: `FULL`, `INCREMENTAL`
        reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
    )

    # Make the request
    operation = client.import_documents(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name

Ruby

For more information, see the Vertex AI Agent Builder Ruby API reference documentation.

To authenticate to Vertex AI Agent Builder, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

This sample imports data from BigQuery or Cloud Storage into an existing data store.

require "google/cloud/discovery_engine/v1"

##
# Snippet for the import_documents call in the DocumentService service
#
# This snippet has been automatically generated and should be regarded as a code
# template only. It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
# client as shown in https://cloud.google.com/ruby/docs/reference.
#
# This is an auto-generated example demonstrating basic usage of
# Google::Cloud::DiscoveryEngine::V1::DocumentService::Client#import_documents.
#
def import_documents
  # Create a client object. The client can be reused for multiple calls.
  client = Google::Cloud::DiscoveryEngine::V1::DocumentService::Client.new

  # Create a request. To set request fields, pass in keyword arguments.
  request = Google::Cloud::DiscoveryEngine::V1::ImportDocumentsRequest.new

  # Call the import_documents method.
  result = client.import_documents request

  # The returned object is of type Gapic::Operation. You can use it to
  # check the status of an operation, cancel it, or wait for results.
  # Here is how to wait for a response.
  result.wait_until_done! timeout: 60
  if result.response?
    p result.response
  else
    puts "No response received."
  end
end

Next steps

  • To attach your data store to an app, create an app and select your data store, following the steps in Create a generic recommendations app.

  • To preview how your recommendations appear after your app and data store are set up, see Get recommendations.

Upload structured JSON data with the API

To directly upload a JSON document or object using the API, follow these steps.

Before importing your data, see Prepare data for ingesting.

REST

To use the command line to create a data store and import structured JSON data, follow these steps:

  1. Create a data store.

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -H "X-Goog-User-Project: PROJECT_ID" \
    "https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
    -d '{
      "displayName": "DATA_STORE_DISPLAY_NAME",
      "industryVertical": "GENERIC",
      "solutionTypes": ["SOLUTION_TYPE_RECOMMENDATION"]
    }'
    

    Replace the following:

    • PROJECT_ID: The ID of your Google Cloud project.
    • DATA_STORE_ID: The ID of the recommendations data store that you want to create. This ID can contain only lowercase letters, digits, underscores, and hyphens.
    • DATA_STORE_DISPLAY_NAME: The display name of the recommendations data store that you want to create.
  2. Optional: Provide your own schema. When you provide a schema, you usually get better results. For more information, see Provide or auto-detect a schema.

    curl -X PATCH \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/schemas/default_schema" \
    -d '{
      "structSchema": JSON_SCHEMA_OBJECT
    }'
    

    Replace the following:

    • PROJECT_ID: The ID of your Google Cloud project.
    • DATA_STORE_ID: The ID of the recommendations data store.
    • JSON_SCHEMA_OBJECT: Your JSON schema as a JSON object, for example:

      {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "type": "object",
        "properties": {
          "title": {
            "type": "string",
            "keyPropertyMapping": "title"
          },
          "categories": {
            "type": "array",
            "items": {
              "type": "string",
              "keyPropertyMapping": "category"
            }
          },
          "uri": {
            "type": "string",
            "keyPropertyMapping": "uri"
          }
        }
      }
      
  3. Import structured data that conforms to the defined schema.

    There are a few different approaches that you can use to upload data, including:

    • Upload a JSON document.

      curl -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents?documentId=DOCUMENT_ID" \
      -d '{
        "jsonData": "JSON_DOCUMENT_STRING"
      }'
      

      Replace JSON_DOCUMENT_STRING with the JSON document as a single string. This must conform to the JSON schema that you provided in the previous step, for example:

      ```none
      { \"title\": \"test title\", \"categories\": [\"cat_1\", \"cat_2\"], \"uri\": \"test uri\"}
      ```
      
    • Upload a JSON object.

      curl -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents?documentId=DOCUMENT_ID" \
      -d '{
        "structData": JSON_DOCUMENT_OBJECT
      }'
      

      Replace JSON_DOCUMENT_OBJECT with the JSON document as a JSON object. This must conform to the JSON schema that you provided in the previous step, for example:

      ```json
      {
        "title": "test title",
        "categories": [
          "cat_1",
          "cat_2"
        ],
        "uri": "test uri"
      }
      ```
      
    • Update with a JSON document.

      curl -X PATCH \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents/DOCUMENT_ID" \
      -d '{
        "jsonData": "JSON_DOCUMENT_STRING"
      }'
      
    • Update with a JSON object.

      curl -X PATCH \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents/DOCUMENT_ID" \
      -d '{
        "structData": JSON_DOCUMENT_OBJECT
      }'
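
    To confirm that a document was uploaded or updated, you can fetch it back by ID with the documents.get method, as in the following sketch using the same placeholders as above. The response includes the document's jsonData or structData as stored.

    # Retrieve a previously uploaded document by its ID.
    curl -X GET \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents/DOCUMENT_ID"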
      

Next steps

  • To attach your data store to an app, create an app and select your data store, following the steps in Create a generic recommendations app.

  • To preview how your recommendations appear after your app and data store are set up, see Get recommendations.

Create a data store using Terraform

You can use Terraform to create an empty data store. After the empty data store is created, you can ingest data into the data store using the Google Cloud console or API commands.

To learn how to apply or remove a Terraform configuration, see Basic Terraform commands.

To create an empty data store using Terraform, see google_discovery_engine_data_store.