Halaman ini diterjemahkan oleh Cloud Translation API.

Membuat penyimpanan data rekomendasi kustom

Untuk membuat penyimpanan data dan menyerap data untuk rekomendasi kustom, buka bagian untuk sumber yang ingin Anda gunakan:

BigQuery
Cloud Storage
Mengupload data JSON terstruktur dengan API

BigQuery

Anda dapat membuat penyimpanan data dari tabel BigQuery dengan dua cara:

Penyerapan satu kali: Anda mengimpor data dari tabel BigQuery ke penyimpanan data. Data di penyimpanan data tidak akan berubah kecuali jika Anda memuat ulang data secara manual.
Penyerapan berkala: Anda mengimpor data dari satu atau beberapa tabel BigQuery, dan menetapkan frekuensi sinkronisasi yang menentukan seberapa sering data toko diperbarui dengan data terbaru dari set data BigQuery.

Tabel berikut membandingkan dua cara yang dapat Anda gunakan untuk mengimpor data BigQuery ke penyimpanan data Vertex AI Search.

Penyerapan satu kali	Penyerapan berkala
Tersedia secara umum (GA).	Pratinjau publik.
Data harus diperbarui secara manual.	Data diperbarui secara otomatis setiap 1, 3, atau 5 hari. Data tidak dapat dimuat ulang secara manual.
Vertex AI Search membuat satu penyimpanan data dari satu tabel di BigQuery.	Vertex AI Search membuat konektor data untuk set data BigQuery dan penyimpanan data (disebut penyimpanan data entitas) untuk setiap tabel yang ditentukan. Untuk setiap konektor data, tabel harus memiliki jenis data yang sama (misalnya, terstruktur) dan berada dalam set data BigQuery yang sama.
Data dari beberapa tabel dapat digabungkan dalam satu penyimpanan data dengan terlebih dahulu menyerapan data dari satu tabel, lalu lebih banyak data dari sumber lain atau tabel BigQuery.	Karena impor data manual tidak didukung, data di penyimpanan data entitas hanya dapat bersumber dari satu tabel BigQuery.
Kontrol akses sumber data didukung.	Kontrol akses sumber data tidak didukung. Data yang diimpor dapat berisi kontrol akses, tetapi kontrol ini tidak akan dipatuhi.
Anda dapat membuat penyimpanan data menggunakan konsolGoogle Cloud atau API.	Anda harus menggunakan konsol untuk membuat konektor data dan penyimpanan data entitasnya.
Kompatibel dengan CMEK.	Kompatibel dengan CMEK.

Mengimpor satu kali dari BigQuery

Untuk menyerap data dari tabel BigQuery, gunakan langkah-langkah berikut untuk membuat penyimpanan data dan menyerap data menggunakan konsol atau API. Google Cloud

Sebelum mengimpor data, tinjau artikel Menyiapkan data untuk penyerapan.

Konsol

Untuk menggunakan konsol Google Cloud guna menyerap data dari BigQuery, ikuti langkah-langkah berikut:

Di konsol Google Cloud , buka halaman AI Applications.

Aplikasi AI
Buka halaman Data Stores.
Klik Buat penyimpanan data.
Di halaman Sumber, pilih BigQuery.
Pilih jenis data yang akan Anda impor dari bagian Jenis data apa yang Anda impor.
Pilih Satu kali di bagian Frekuensi sinkronisasi.
Di kolom BigQuery path, klik Browse, pilih tabel yang telah Anda siapkan untuk penyerapan, lalu klik Select. Atau, masukkan lokasi tabel langsung di kolom jalur BigQuery.
Klik Lanjutkan.
Jika Anda melakukan impor data terstruktur satu kali:
1. Petakan kolom ke properti utama.
2. Jika ada kolom penting yang tidak ada dalam skema, gunakan Tambahkan kolom baru untuk menambahkannya.
  
  Untuk mengetahui informasi selengkapnya, lihat Tentang deteksi otomatis dan pengeditan.
3. Klik Lanjutkan.
Pilih region untuk penyimpanan data Anda.
Masukkan nama untuk penyimpanan data Anda.
Klik Buat.
Untuk memeriksa status penyerapan, buka halaman Data Stores dan klik nama penyimpanan data Anda untuk melihat detailnya di halaman Data. Saat kolom status di tab Aktivitas berubah dari Sedang berlangsung menjadi Impor selesai, penyerapan selesai.

Bergantung pada ukuran data Anda, penyerapan data dapat memerlukan waktu beberapa menit hingga beberapa jam.

REST

Untuk menggunakan command line guna membuat penyimpanan data dan mengimpor data dari BigQuery, ikuti langkah-langkah berikut.

Buat penyimpanan data.
```
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: PROJECT_ID" \
"https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
-d '{
  "displayName": "DATA_STORE_DISPLAY_NAME",
  "industryVertical": "GENERIC",
  "solutionTypes": ["SOLUTION_TYPE_RECOMMENDATION"]
}'
```
Catatan: <ahref=" docs="" generative-ai-app-builder="" industryvertical"="" reference="" rest="" v1="">Vertical industri GENERIC digunakan untuk membuat penyimpanan data terstruktur, tidak terstruktur, dan situs untuk aplikasi rekomendasi kustom. </ahref=">

Ganti kode berikut:
- PROJECT_ID: ID Google Cloud project Anda.
- DATA_STORE_ID: ID penyimpanan data Vertex AI Search yang ingin Anda buat. ID ini hanya boleh berisi huruf kecil, angka, garis bawah, dan tanda hubung.
- DATA_STORE_DISPLAY_NAME: nama tampilan penyimpanan data Vertex AI Search yang ingin Anda buat.
Mengimpor data dari BigQuery.

Jika Anda menentukan skema, pastikan data sesuai dengan skema tersebut.
```
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
-d '{
  "bigquerySource": {
    "projectId": "PROJECT_ID",
    "datasetId":"DATASET_ID",
    "tableId": "TABLE_ID",
    "dataSchema": "DATA_SCHEMA",
    "aclEnabled": "BOOLEAN"
  },
  "reconciliationMode": "RECONCILIATION_MODE",
  "autoGenerateIds": "AUTO_GENERATE_IDS",
  "idField": "ID_FIELD",
  "errorConfig": {
    "gcsPrefix": "ERROR_DIRECTORY"
  }
}'
```
Ganti kode berikut:
- PROJECT_ID: ID Google Cloud project Anda.
- DATA_STORE_ID: ID penyimpanan data Vertex AI Search.
- DATASET_ID: ID set data BigQuery.
- TABLE_ID: ID tabel BigQuery.
  - Jika tabel BigQuery tidak berada di bawah PROJECT_ID, Anda harus memberikan izin "BigQuery Data Viewer" kepada akun layanan service-<project number>@gcp-sa-discoveryengine.iam.gserviceaccount.com untuk tabel BigQuery. Misalnya, jika Anda mengimpor tabel BigQuery dari project sumber "123" ke project tujuan "456", berikan izin service-456@gcp-sa-discoveryengine.iam.gserviceaccount.com untuk tabel BigQuery di project "123".
- DATA_SCHEMA: optional. Nilainya adalah document dan custom. Defaultnya adalah document.
  - document: tabel BigQuery yang Anda gunakan harus sesuai dengan skema BigQuery default yang disediakan di Menyiapkan data untuk penyerapan. Anda dapat menentukan sendiri ID setiap dokumen, sambil membungkus semua data dalam string jsonData.
  - custom: Skema tabel BigQuery apa pun diterima, dan Vertex AI Search akan otomatis membuat ID untuk setiap dokumen yang diimpor.
- ERROR_DIRECTORY: optional. Direktori Cloud Storage untuk informasi error tentang impor—misalnya, gs://<your-gcs-bucket>/directory/import_errors. Google merekomendasikan agar Anda mengosongkan kolom ini agar Vertex AI Search dapat membuat direktori sementara secara otomatis.
- RECONCILIATION_MODE: optional. Nilainya adalah FULL dan INCREMENTAL. Default-nya adalah INCREMENTAL. Menentukan INCREMENTAL akan menyebabkan pembaruan data inkremental dari BigQuery ke penyimpanan data Anda. Operasi ini melakukan operasi upsert, yang menambahkan dokumen baru dan mengganti dokumen yang ada dengan dokumen yang telah diupdate dengan ID yang sama. Menentukan FULL akan menyebabkan rebase penuh dokumen di penyimpanan data Anda. Dengan kata lain, dokumen baru dan yang diperbarui ditambahkan ke penyimpanan data Anda, dan dokumen yang tidak ada di BigQuery akan dihapus dari penyimpanan data Anda. Mode FULL berguna jika Anda ingin menghapus dokumen yang tidak lagi diperlukan secara otomatis.
- AUTO_GENERATE_IDS: optional. Menentukan apakah ID dokumen akan dibuat secara otomatis. Jika disetel ke true, ID dokumen dibuat berdasarkan hash payload. Perhatikan bahwa ID dokumen yang dibuat mungkin tidak tetap konsisten selama beberapa kali impor. Jika Anda membuat ID secara otomatis di beberapa impor, Google sangat merekomendasikan agar Anda menyetel reconciliationMode ke FULL untuk mempertahankan ID dokumen yang konsisten.
  
  Tentukan autoGenerateIds hanya jika bigquerySource.dataSchema ditetapkan ke custom. Jika tidak, error INVALID_ARGUMENT akan ditampilkan. Jika Anda tidak menentukan autoGenerateIds atau menyetelnya ke false, Anda harus menentukan idField. Jika tidak, dokumen akan gagal diimpor.
- ID_FIELD: optional. Menentukan kolom mana yang merupakan ID dokumen. Untuk file sumber BigQuery, idField menunjukkan nama kolom dalam tabel BigQuery yang berisi ID dokumen.
  
  Tentukan idField hanya jika: (1) bigquerySource.dataSchema ditetapkan ke custom, dan (2) auto_generate_ids ditetapkan ke false atau tidak ditentukan. Jika tidak, error INVALID_ARGUMENT akan ditampilkan.
  
  Nilai nama kolom BigQuery harus berupa jenis string, harus terdiri dari 1 hingga 63 karakter, dan harus sesuai dengan RFC-1034. Jika tidak, dokumen akan gagal diimpor.

C#

Untuk mengetahui informasi selengkapnya, lihat dokumentasi referensi API C# Vertex AI Search.

Untuk melakukan autentikasi ke Vertex AI Search, siapkan Kredensial Default Aplikasi. Untuk mengetahui informasi selengkapnya, lihat Menyiapkan autentikasi untuk lingkungan pengembangan lokal.

Membuat penyimpanan data

using Google.Cloud.DiscoveryEngine.V1;
using Google.LongRunning;

public sealed partial class GeneratedDataStoreServiceClientSnippets
{
    /// <summary>Snippet for CreateDataStore</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void CreateDataStoreRequestObject()
    {
        // Create client
        DataStoreServiceClient dataStoreServiceClient = DataStoreServiceClient.Create();
        // Initialize request argument(s)
        CreateDataStoreRequest request = new CreateDataStoreRequest
        {
            ParentAsCollectionName = CollectionName.FromProjectLocationCollection("[PROJECT]", "[LOCATION]", "[COLLECTION]"),
            DataStore = new DataStore(),
            DataStoreId = "",
            CreateAdvancedSiteSearch = false,
            CmekConfigNameAsCmekConfigName = CmekConfigName.FromProjectLocation("[PROJECT]", "[LOCATION]"),
            SkipDefaultSchemaCreation = false,
        };
        // Make the request
        Operation<DataStore, CreateDataStoreMetadata> response = dataStoreServiceClient.CreateDataStore(request);

        // Poll until the returned long-running operation is complete
        Operation<DataStore, CreateDataStoreMetadata> completedResponse = response.PollUntilCompleted();
        // Retrieve the operation result
        DataStore result = completedResponse.Result;

        // Or get the name of the operation
        string operationName = response.Name;
        // This name can be stored, then the long-running operation retrieved later by name
        Operation<DataStore, CreateDataStoreMetadata> retrievedResponse = dataStoreServiceClient.PollOnceCreateDataStore(operationName);
        // Check if the retrieved long-running operation has completed
        if (retrievedResponse.IsCompleted)
        {
            // If it has completed, then access the result
            DataStore retrievedResult = retrievedResponse.Result;
        }
    }
}

Mengimpor dokumen

using Google.Cloud.DiscoveryEngine.V1;
using Google.LongRunning;
using Google.Protobuf.WellKnownTypes;

public sealed partial class GeneratedDocumentServiceClientSnippets
{
    /// <summary>Snippet for ImportDocuments</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void ImportDocumentsRequestObject()
    {
        // Create client
        DocumentServiceClient documentServiceClient = DocumentServiceClient.Create();
        // Initialize request argument(s)
        ImportDocumentsRequest request = new ImportDocumentsRequest
        {
            ParentAsBranchName = BranchName.FromProjectLocationDataStoreBranch("[PROJECT]", "[LOCATION]", "[DATA_STORE]", "[BRANCH]"),
            InlineSource = new ImportDocumentsRequest.Types.InlineSource(),
            ErrorConfig = new ImportErrorConfig(),
            ReconciliationMode = ImportDocumentsRequest.Types.ReconciliationMode.Unspecified,
            UpdateMask = new FieldMask(),
            AutoGenerateIds = false,
            IdField = "",
            ForceRefreshContent = false,
        };
        // Make the request
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> response = documentServiceClient.ImportDocuments(request);

        // Poll until the returned long-running operation is complete
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> completedResponse = response.PollUntilCompleted();
        // Retrieve the operation result
        ImportDocumentsResponse result = completedResponse.Result;

        // Or get the name of the operation
        string operationName = response.Name;
        // This name can be stored, then the long-running operation retrieved later by name
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> retrievedResponse = documentServiceClient.PollOnceImportDocuments(operationName);
        // Check if the retrieved long-running operation has completed
        if (retrievedResponse.IsCompleted)
        {
            // If it has completed, then access the result
            ImportDocumentsResponse retrievedResult = retrievedResponse.Result;
        }
    }
}

Go

Untuk mengetahui informasi selengkapnya, lihat dokumentasi referensi API Go Vertex AI Search.

Untuk melakukan autentikasi ke Vertex AI Search, siapkan Kredensial Default Aplikasi. Untuk mengetahui informasi selengkapnya, lihat Menyiapkan autentikasi untuk lingkungan pengembangan lokal.

Membuat penyimpanan data


package main

import (
	"context"

	discoveryengine "cloud.google.com/go/discoveryengine/apiv1"
	discoveryenginepb "cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := discoveryengine.NewDataStoreClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &discoveryenginepb.CreateDataStoreRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb#CreateDataStoreRequest.
	}
	op, err := c.CreateDataStore(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}

Mengimpor dokumen


package main

import (
	"context"

	discoveryengine "cloud.google.com/go/discoveryengine/apiv1"
	discoveryenginepb "cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := discoveryengine.NewDocumentClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &discoveryenginepb.ImportDocumentsRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb#ImportDocumentsRequest.
	}
	op, err := c.ImportDocuments(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}

Java

Untuk mengetahui informasi selengkapnya, lihat dokumentasi referensi API Java Vertex AI Search.

Untuk melakukan autentikasi ke Vertex AI Search, siapkan Kredensial Default Aplikasi. Untuk mengetahui informasi selengkapnya, lihat Menyiapkan autentikasi untuk lingkungan pengembangan lokal.

Membuat penyimpanan data

import com.google.cloud.discoveryengine.v1.CollectionName;
import com.google.cloud.discoveryengine.v1.CreateDataStoreRequest;
import com.google.cloud.discoveryengine.v1.DataStore;
import com.google.cloud.discoveryengine.v1.DataStoreServiceClient;

public class SyncCreateDataStore {

  public static void main(String[] args) throws Exception {
    syncCreateDataStore();
  }

  public static void syncCreateDataStore() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DataStoreServiceClient dataStoreServiceClient = DataStoreServiceClient.create()) {
      CreateDataStoreRequest request =
          CreateDataStoreRequest.newBuilder()
              .setParent(CollectionName.of("[PROJECT]", "[LOCATION]", "[COLLECTION]").toString())
              .setDataStore(DataStore.newBuilder().build())
              .setDataStoreId("dataStoreId929489618")
              .setCreateAdvancedSiteSearch(true)
              .setSkipDefaultSchemaCreation(true)
              .build();
      DataStore response = dataStoreServiceClient.createDataStoreAsync(request).get();
    }
  }
}

Mengimpor dokumen

import com.google.cloud.discoveryengine.v1.BranchName;
import com.google.cloud.discoveryengine.v1.DocumentServiceClient;
import com.google.cloud.discoveryengine.v1.ImportDocumentsRequest;
import com.google.cloud.discoveryengine.v1.ImportDocumentsResponse;
import com.google.cloud.discoveryengine.v1.ImportErrorConfig;
import com.google.protobuf.FieldMask;

public class SyncImportDocuments {

  public static void main(String[] args) throws Exception {
    syncImportDocuments();
  }

  public static void syncImportDocuments() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DocumentServiceClient documentServiceClient = DocumentServiceClient.create()) {
      ImportDocumentsRequest request =
          ImportDocumentsRequest.newBuilder()
              .setParent(
                  BranchName.ofProjectLocationDataStoreBranchName(
                          "[PROJECT]", "[LOCATION]", "[DATA_STORE]", "[BRANCH]")
                      .toString())
              .setErrorConfig(ImportErrorConfig.newBuilder().build())
              .setUpdateMask(FieldMask.newBuilder().build())
              .setAutoGenerateIds(true)
              .setIdField("idField1629396127")
              .setForceRefreshContent(true)
              .build();
      ImportDocumentsResponse response = documentServiceClient.importDocumentsAsync(request).get();
    }
  }
}

Node.js

Untuk mengetahui informasi selengkapnya, lihat dokumentasi referensi API Node.js Vertex AI Search.

Untuk melakukan autentikasi ke Vertex AI Search, siapkan Kredensial Default Aplikasi. Untuk mengetahui informasi selengkapnya, lihat Menyiapkan autentikasi untuk lingkungan pengembangan lokal.

Membuat penyimpanan data

/**
 * This snippet has been automatically generated and should be regarded as a code template only.
 * It will require modifications to work.
 * It may require correct/in-range values for request initialization.
 * TODO(developer): Uncomment these variables before running the sample.
 */
/**
 *  Resource name of the CmekConfig to use for protecting this DataStore.
 */
// const cmekConfigName = 'abc123'
/**
 *  DataStore without CMEK protections. If a default CmekConfig is set for
 *  the project, setting this field will override the default CmekConfig as
 *  well.
 */
// const disableCmek = true
/**
 *  Required. The parent resource name, such as
 *  `projects/{project}/locations/{location}/collections/{collection}`.
 */
// const parent = 'abc123'
/**
 *  Required. The DataStore google.cloud.discoveryengine.v1.DataStore  to
 *  create.
 */
// const dataStore = {}
/**
 *  Required. The ID to use for the
 *  DataStore google.cloud.discoveryengine.v1.DataStore, which will become
 *  the final component of the
 *  DataStore google.cloud.discoveryengine.v1.DataStore's resource name.
 *  This field must conform to RFC-1034 (https://tools.ietf.org/html/rfc1034)
 *  standard with a length limit of 63 characters. Otherwise, an
 *  INVALID_ARGUMENT error is returned.
 */
// const dataStoreId = 'abc123'
/**
 *  A boolean flag indicating whether user want to directly create an advanced
 *  data store for site search.
 *  If the data store is not configured as site
 *  search (GENERIC vertical and PUBLIC_WEBSITE content_config), this flag will
 *  be ignored.
 */
// const createAdvancedSiteSearch = true
/**
 *  A boolean flag indicating whether to skip the default schema creation for
 *  the data store. Only enable this flag if you are certain that the default
 *  schema is incompatible with your use case.
 *  If set to true, you must manually create a schema for the data store before
 *  any documents can be ingested.
 *  This flag cannot be specified if `data_store.starting_schema` is specified.
 */
// const skipDefaultSchemaCreation = true

// Imports the Discoveryengine library
const {DataStoreServiceClient} = require('@google-cloud/discoveryengine').v1;

// Instantiates a client
const discoveryengineClient = new DataStoreServiceClient();

async function callCreateDataStore() {
  // Construct request
  const request = {
    parent,
    dataStore,
    dataStoreId,
  };

  // Run request
  const [operation] = await discoveryengineClient.createDataStore(request);
  const [response] = await operation.promise();
  console.log(response);
}

callCreateDataStore();

Mengimpor dokumen

/**
 * This snippet has been automatically generated and should be regarded as a code template only.
 * It will require modifications to work.
 * It may require correct/in-range values for request initialization.
 * TODO(developer): Uncomment these variables before running the sample.
 */
/**
 *  The Inline source for the input content for documents.
 */
// const inlineSource = {}
/**
 *  Cloud Storage location for the input content.
 */
// const gcsSource = {}
/**
 *  BigQuery input source.
 */
// const bigquerySource = {}
/**
 *  FhirStore input source.
 */
// const fhirStoreSource = {}
/**
 *  Spanner input source.
 */
// const spannerSource = {}
/**
 *  Cloud SQL input source.
 */
// const cloudSqlSource = {}
/**
 *  Firestore input source.
 */
// const firestoreSource = {}
/**
 *  AlloyDB input source.
 */
// const alloyDbSource = {}
/**
 *  Cloud Bigtable input source.
 */
// const bigtableSource = {}
/**
 *  Required. The parent branch resource name, such as
 *  `projects/{project}/locations/{location}/collections/{collection}/dataStores/{data_store}/branches/{branch}`.
 *  Requires create/update permission.
 */
// const parent = 'abc123'
/**
 *  The desired location of errors incurred during the Import.
 */
// const errorConfig = {}
/**
 *  The mode of reconciliation between existing documents and the documents to
 *  be imported. Defaults to
 *  ReconciliationMode.INCREMENTAL google.cloud.discoveryengine.v1.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL.
 */
// const reconciliationMode = {}
/**
 *  Indicates which fields in the provided imported documents to update. If
 *  not set, the default is to update all fields.
 */
// const updateMask = {}
/**
 *  Whether to automatically generate IDs for the documents if absent.
 *  If set to `true`,
 *  Document.id google.cloud.discoveryengine.v1.Document.id s are
 *  automatically generated based on the hash of the payload, where IDs may not
 *  be consistent during multiple imports. In which case
 *  ReconciliationMode.FULL google.cloud.discoveryengine.v1.ImportDocumentsRequest.ReconciliationMode.FULL 
 *  is highly recommended to avoid duplicate contents. If unset or set to
 *  `false`, Document.id google.cloud.discoveryengine.v1.Document.id s have
 *  to be specified using
 *  id_field google.cloud.discoveryengine.v1.ImportDocumentsRequest.id_field,
 *  otherwise, documents without IDs fail to be imported.
 *  Supported data sources:
 *  * GcsSource google.cloud.discoveryengine.v1.GcsSource.
 *  GcsSource.data_schema google.cloud.discoveryengine.v1.GcsSource.data_schema 
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * BigQuerySource google.cloud.discoveryengine.v1.BigQuerySource.
 *  BigQuerySource.data_schema google.cloud.discoveryengine.v1.BigQuerySource.data_schema 
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * SpannerSource google.cloud.discoveryengine.v1.SpannerSource.
 *  * CloudSqlSource google.cloud.discoveryengine.v1.CloudSqlSource.
 *  * FirestoreSource google.cloud.discoveryengine.v1.FirestoreSource.
 *  * BigtableSource google.cloud.discoveryengine.v1.BigtableSource.
 */
// const autoGenerateIds = true
/**
 *  The field indicates the ID field or column to be used as unique IDs of
 *  the documents.
 *  For GcsSource google.cloud.discoveryengine.v1.GcsSource  it is the key of
 *  the JSON field. For instance, `my_id` for JSON `{"my_id": "some_uuid"}`.
 *  For others, it may be the column name of the table where the unique ids are
 *  stored.
 *  The values of the JSON field or the table column are used as the
 *  Document.id google.cloud.discoveryengine.v1.Document.id s. The JSON field
 *  or the table column must be of string type, and the values must be set as
 *  valid strings conform to RFC-1034 (https://tools.ietf.org/html/rfc1034)
 *  with 1-63 characters. Otherwise, documents without valid IDs fail to be
 *  imported.
 *  Only set this field when
 *  auto_generate_ids google.cloud.discoveryengine.v1.ImportDocumentsRequest.auto_generate_ids 
 *  is unset or set as `false`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  If it is unset, a default value `_id` is used when importing from the
 *  allowed data sources.
 *  Supported data sources:
 *  * GcsSource google.cloud.discoveryengine.v1.GcsSource.
 *  GcsSource.data_schema google.cloud.discoveryengine.v1.GcsSource.data_schema 
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * BigQuerySource google.cloud.discoveryengine.v1.BigQuerySource.
 *  BigQuerySource.data_schema google.cloud.discoveryengine.v1.BigQuerySource.data_schema 
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * SpannerSource google.cloud.discoveryengine.v1.SpannerSource.
 *  * CloudSqlSource google.cloud.discoveryengine.v1.CloudSqlSource.
 *  * FirestoreSource google.cloud.discoveryengine.v1.FirestoreSource.
 *  * BigtableSource google.cloud.discoveryengine.v1.BigtableSource.
 */
// const idField = 'abc123'
/**
 *  Optional. Whether to force refresh the unstructured content of the
 *  documents.
 *  If set to `true`, the content part of the documents will be refreshed
 *  regardless of the update status of the referencing content.
 */
// const forceRefreshContent = true

// Imports the Discoveryengine library
const {DocumentServiceClient} = require('@google-cloud/discoveryengine').v1;

// Instantiates a client
const discoveryengineClient = new DocumentServiceClient();

async function callImportDocuments() {
  // Construct request
  const request = {
    parent,
  };

  // Run request
  const [operation] = await discoveryengineClient.importDocuments(request);
  const [response] = await operation.promise();
  console.log(response);
}

callImportDocuments();

Python

Untuk mengetahui informasi selengkapnya, lihat dokumentasi referensi API Python Vertex AI Search.

Untuk melakukan autentikasi ke Vertex AI Search, siapkan Kredensial Default Aplikasi. Untuk mengetahui informasi selengkapnya, lihat Menyiapkan autentikasi untuk lingkungan pengembangan lokal.

Membuat penyimpanan data


from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    #  For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name

Mengimpor dokumen


from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"
# bigquery_dataset = "YOUR_BIGQUERY_DATASET"
# bigquery_table = "YOUR_BIGQUERY_TABLE"

#  For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.DocumentServiceClient(client_options=client_options)

# The full resource name of the search engine branch.
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    bigquery_source=discoveryengine.BigQuerySource(
        project_id=project_id,
        dataset_id=bigquery_dataset,
        table_id=bigquery_table,
        data_schema="custom",
    ),
    # Options: `FULL`, `INCREMENTAL`
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

# Make the request
operation = client.import_documents(request=request)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)

Ruby

Untuk mengetahui informasi selengkapnya, lihat dokumentasi referensi API Ruby Vertex AI Search.

Untuk melakukan autentikasi ke Vertex AI Search, siapkan Kredensial Default Aplikasi. Untuk mengetahui informasi selengkapnya, lihat Menyiapkan autentikasi untuk lingkungan pengembangan lokal.

Membuat penyimpanan data

require "google/cloud/discovery_engine/v1"

##
# Snippet for the create_data_store call in the DataStoreService service
#
# This snippet has been automatically generated and should be regarded as a code
# template only. It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
# client as shown in https://cloud.google.com/ruby/docs/reference.
#
# This is an auto-generated example demonstrating basic usage of
# Google::Cloud::DiscoveryEngine::V1::DataStoreService::Client#create_data_store.
#
def create_data_store
  # Create a client object. The client can be reused for multiple calls.
  client = Google::Cloud::DiscoveryEngine::V1::DataStoreService::Client.new

  # Create a request. To set request fields, pass in keyword arguments.
  request = Google::Cloud::DiscoveryEngine::V1::CreateDataStoreRequest.new

  # Call the create_data_store method.
  result = client.create_data_store request

  # The returned object is of type Gapic::Operation. You can use it to
  # check the status of an operation, cancel it, or wait for results.
  # Here is how to wait for a response.
  result.wait_until_done! timeout: 60
  if result.response?
    p result.response
  else
    puts "No response received."
  end
end

Mengimpor dokumen

require "google/cloud/discovery_engine/v1"

##
# Snippet for the import_documents call in the DocumentService service
#
# This snippet has been automatically generated and should be regarded as a code
# template only. It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
# client as shown in https://cloud.google.com/ruby/docs/reference.
#
# This is an auto-generated example demonstrating basic usage of
# Google::Cloud::DiscoveryEngine::V1::DocumentService::Client#import_documents.
#
def import_documents
  # Create a client object. The client can be reused for multiple calls.
  client = Google::Cloud::DiscoveryEngine::V1::DocumentService::Client.new

  # Create a request. To set request fields, pass in keyword arguments.
  request = Google::Cloud::DiscoveryEngine::V1::ImportDocumentsRequest.new

  # Call the import_documents method.
  result = client.import_documents request

  # The returned object is of type Gapic::Operation. You can use it to
  # check the status of an operation, cancel it, or wait for results.
  # Here is how to wait for a response.
  result.wait_until_done! timeout: 60
  if result.response?
    p result.response
  else
    puts "No response received."
  end
end

Menghubungkan ke BigQuery dengan sinkronisasi berkala

Sebelum mengimpor data, tinjau artikel Menyiapkan data untuk penyerapan.

Prosedur berikut menjelaskan cara membuat konektor data yang mengaitkan set data BigQuery dengan konektor data Vertex AI Search dan cara menentukan tabel pada set data untuk setiap penyimpanan data yang ingin Anda buat. Penyimpanan data yang merupakan turunan dari konektor data disebut penyimpanan data entitas.

Data dari set data disinkronkan secara berkala ke penyimpanan data entity. Anda dapat menentukan sinkronisasi harian, setiap tiga hari, atau setiap lima hari.

Konsol

Untuk menggunakan konsol Google Cloud guna membuat konektor yang secara berkala menyinkronkan data dari set data BigQuery ke Vertex AI Search, ikuti langkah-langkah berikut:

Di konsol Google Cloud , buka halaman AI Applications.

Aplikasi AI
Di menu navigasi, klik Data Stores.
Klik Create data store.
Di halaman Sumber, pilih BigQuery.
Pilih jenis data yang Anda impor.
Klik Berkala.
Pilih Frekuensi sinkronisasi, seberapa sering Anda ingin konektor Vertex AI Search disinkronkan dengan set data BigQuery. Anda dapat mengubah frekuensi nanti.
Di kolom BigQuery dataset path, klik Browse, pilih set data yang berisi tabel yang telah Anda siapkan untuk penyerapan. Atau, masukkan lokasi tabel secara langsung di kolom jalur BigQuery. Format untuk jalur adalah projectname.datasetname.
Di kolom Tabel yang akan disinkronkan, klik Telusuri, lalu pilih tabel yang berisi data yang Anda inginkan untuk penyimpanan data.
Catatan:
Pastikan data dalam tabel cocok dengan jenis data yang Anda pilih pada langkah 5.
Jika ada ketidakcocokan, Anda tidak akan mengetahuinya hingga salah satu hal berikut terjadi:
- Anda mendapatkan error saat konektor mencoba mengimpor data.
- Anda melihat hasil yang tidak terduga. Hal ini terjadi jika jenis yang dipilih terstruktur, tetapi seharusnya tidak terstruktur atau terstruktur dengan metadata. Data diimpor, tetapi URL atau metadata konten tidak dikenali dan diperlakukan sebagai string.
Jika ada tabel tambahan dalam set data yang ingin Anda gunakan untuk penyimpanan data, klik Tambahkan tabel dan tentukan tabel tersebut juga.
Klik Lanjutkan.
Pilih region untuk penyimpanan data Anda, masukkan nama untuk penghubung data Anda, lalu klik Buat.

Anda kini telah membuat konektor data, yang akan menyinkronkan data secara berkala dengan set data BigQuery. Selain itu, Anda telah membuat satu atau beberapa penyimpanan data entitas. Penyimpanan data memiliki nama yang sama dengan tabel BigQuery.
Untuk memeriksa status penyerapan, buka halaman Data Stores, lalu klik nama penghubung data Anda untuk melihat detailnya di halaman Data > tab Data ingestion activity. Saat kolom status di tab Aktivitas berubah dari Sedang berlangsung menjadi berhasil, penyerapan pertama selesai.

Bergantung pada ukuran data Anda, penyerapan data dapat memerlukan waktu beberapa menit hingga beberapa jam.

Setelah Anda menyiapkan sumber data dan mengimpor data untuk pertama kalinya, penyimpanan data akan menyinkronkan data dari sumber tersebut dengan frekuensi yang Anda pilih selama penyiapan. Sekitar satu jam setelah konektor data dibuat, sinkronisasi pertama akan terjadi. Sinkronisasi berikutnya akan terjadi sekitar 24 jam, 72 jam, atau 120 jam kemudian.

Langkah berikutnya

Untuk melampirkan penyimpanan data ke aplikasi, buat aplikasi dan pilih penyimpanan data Anda dengan mengikuti langkah-langkah di Membuat aplikasi rekomendasi kustom.
Untuk melihat pratinjau atau mendapatkan rekomendasi setelah aplikasi dan penyimpanan data Anda disiapkan, lihat Mendapatkan rekomendasi.

Cloud Storage

Anda dapat membuat penyimpanan data dari tabel Cloud Storage dengan dua cara:

Penyerapan satu kali: Anda mengimpor data dari folder atau file Cloud Storage ke dalam penyimpanan data. Data di penyimpanan data tidak berubah kecuali jika Anda memuat ulang data secara manual.
Penyerapan berkala: Anda mengimpor data dari folder atau file Cloud Storage, dan menetapkan frekuensi sinkronisasi yang menentukan seberapa sering penyimpanan data diperbarui dengan data terbaru dari lokasi Cloud Storage tersebut.

Tabel berikut membandingkan dua cara Anda dapat mengimpor data Cloud Storage ke penyimpanan data Vertex AI Search.

Penyerapan satu kali	Penyerapan berkala
Tersedia secara umum (GA).	Pratinjau publik.
Data harus diperbarui secara manual.	Data diperbarui secara otomatis setiap satu, tiga, atau lima hari. Data tidak dapat dimuat ulang secara manual.
Vertex AI Search membuat satu penyimpanan data dari satu folder atau file di Cloud Storage.	Vertex AI Search membuat konektor data, dan mengaitkan penyimpanan data (yang disebut penyimpanan data entitas) dengannya untuk file atau folder yang ditentukan. Setiap konektor data Cloud Storage dapat memiliki satu penyimpanan data entity.
Data dari beberapa file, folder, dan bucket dapat digabungkan dalam satu penyimpanan data dengan terlebih dahulu menyerap data dari satu lokasi Cloud Storage, lalu menyerap lebih banyak data dari lokasi lain.	Karena impor data manual tidak didukung, data di penyimpanan data entitas hanya dapat bersumber dari satu file atau folder Cloud Storage.
Kontrol akses sumber data didukung. Untuk mengetahui informasi selengkapnya, lihat Kontrol akses sumber data.	Kontrol akses sumber data tidak didukung. Data yang diimpor dapat berisi kontrol akses, tetapi kontrol ini tidak akan dipatuhi.
Anda dapat membuat penyimpanan data menggunakan konsolGoogle Cloud atau API.	Anda harus menggunakan konsol untuk membuat konektor data dan penyimpanan data entitasnya.
Kompatibel dengan CMEK.	Kompatibel dengan CMEK.

Mengimpor sekali dari Cloud Storage

Untuk menyerap data dari Cloud Storage, gunakan langkah-langkah berikut untuk membuat penyimpanan data dan menyerap data menggunakan konsol atau API. Google Cloud

Sebelum mengimpor data, tinjau artikel Menyiapkan data untuk penyerapan.

Konsol

Untuk menggunakan konsol guna menyerap data dari bucket Cloud Storage, ikuti langkah-langkah berikut:

Di konsol Google Cloud , buka halaman AI Applications.

Aplikasi AI
Buka halaman Data Stores.
Klik Buat penyimpanan data.
Di halaman Source, pilih Cloud Storage.
Di bagian Pilih folder atau file yang ingin Anda impor, pilih Folder atau File.
Klik Jelajahi, lalu pilih data yang telah Anda siapkan untuk penyerapan, lalu klik Pilih. Atau, masukkan lokasi langsung di kolom gs://.
Pilih jenis data yang Anda impor.
Klik Lanjutkan.
Jika Anda melakukan impor data terstruktur satu kali:
1. Petakan kolom ke properti utama.
2. Jika ada kolom penting yang tidak ada dalam skema, gunakan Tambahkan kolom baru untuk menambahkannya.
  
  Untuk mengetahui informasi selengkapnya, lihat Tentang deteksi otomatis dan pengeditan.
3. Klik Lanjutkan.
Pilih region untuk penyimpanan data Anda.
Masukkan nama untuk penyimpanan data Anda.
Opsional: Jika Anda memilih dokumen tidak terstruktur, Anda dapat memilih opsi penguraian dan pengelompokan untuk dokumen Anda. Untuk membandingkan parser, lihat Mem-parsing dokumen. Untuk mengetahui informasi tentang chunking, lihat Mengelompokkan dokumen untuk RAG.

Parser OCR dan parser tata letak dapat menimbulkan biaya tambahan. Lihat Harga fitur Document AI.

Untuk memilih parser, luaskan Opsi pemrosesan dokumen dan tentukan opsi parser yang ingin Anda gunakan.
Klik Buat.
Untuk memeriksa status penyerapan, buka halaman Data Stores dan klik nama penyimpanan data Anda untuk melihat detailnya di halaman Data. Saat kolom status di tab Aktivitas berubah dari Sedang berlangsung menjadi Impor selesai, penyerapan selesai.

Bergantung pada ukuran data Anda, penyerapan dapat memerlukan waktu beberapa menit atau beberapa jam.

REST

Untuk menggunakan command line guna membuat penyimpanan data dan menyerap data dari Cloud Storage, ikuti langkah-langkah berikut.

Buat penyimpanan data.
```
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: PROJECT_ID" \
"https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
-d '{
  "displayName": "DATA_STORE_DISPLAY_NAME",
  "industryVertical": "GENERIC",
  "solutionTypes": ["SOLUTION_TYPE_RECOMMENDATION"]
}'
```
Catatan: <ahref=" docs="" generative-ai-app-builder="" industryvertical"="" reference="" rest="" v1="">Vertical industri GENERIC digunakan untuk membuat penyimpanan data terstruktur, tidak terstruktur, dan situs untuk aplikasi rekomendasi kustom. </ahref=">

Ganti kode berikut:
- PROJECT_ID: ID Google Cloud project Anda.
- DATA_STORE_ID: ID penyimpanan data Vertex AI Search yang ingin Anda buat. ID ini hanya boleh berisi huruf kecil, angka, garis bawah, dan tanda hubung.
- DATA_STORE_DISPLAY_NAME: nama tampilan penyimpanan data Vertex AI Search yang ingin Anda buat.
Mengimpor data dari Cloud Storage.
```
  curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
  -d '{
    "gcsSource": {
      "inputUris": ["INPUT_FILE_PATTERN_1", "INPUT_FILE_PATTERN_2"],
      "dataSchema": "DATA_SCHEMA",
    },
    "reconciliationMode": "RECONCILIATION_MODE",
    "autoGenerateIds": "AUTO_GENERATE_IDS",
    "idField": "ID_FIELD",
    "errorConfig": {
      "gcsPrefix": "ERROR_DIRECTORY"
    }
  }'
```
Ganti kode berikut:
- PROJECT_ID: ID Google Cloud project Anda.
- DATA_STORE_ID: ID penyimpanan data Vertex AI Search.
- INPUT_FILE_PATTERN: pola file di Cloud Storage yang berisi dokumen Anda.
  
  Untuk data terstruktur atau data tidak terstruktur dengan metadata, contoh pola file input adalah gs://<your-gcs-bucket>/directory/object.jsondan contoh pencocokan pola satu atau beberapa file adalah gs://<your-gcs-bucket>/directory/*.json.
  
  Untuk dokumen tidak terstruktur, contohnya adalah gs://<your-gcs-bucket>/directory/*.pdf. Setiap file yang cocok dengan pola akan menjadi dokumen.
  
  Jika <your-gcs-bucket> tidak berada di bawah PROJECT_ID, Anda harus memberikan izin "Storage Object Viewer" kepada akun layanan service-<project number>@gcp-sa-discoveryengine.iam.gserviceaccount.com untuk bucket Cloud Storage. Misalnya, jika Anda mengimpor bucket Cloud Storage dari project sumber "123" ke project tujuan "456", berikan izin service-456@gcp-sa-discoveryengine.iam.gserviceaccount.com pada bucket Cloud Storage di project "123".
- DATA_SCHEMA: optional. Nilainya adalah document, custom, csv, dan content. Defaultnya adalah document.
  - document: Upload data tidak terstruktur dengan metadata untuk dokumen tidak terstruktur. Setiap baris file harus mengikuti salah satu format berikut. Anda dapat menentukan ID setiap dokumen:
    - { "id": "<your-id>", "jsonData": "<JSON string>", "content": { "mimeType": "<application/pdf or text/html>", "uri": "gs://<your-gcs-bucket>/directory/filename.pdf" } }
    - { "id": "<your-id>", "structData": <JSON object>, "content": { "mimeType": "<application/pdf or text/html>", "uri": "gs://<your-gcs-bucket>/directory/filename.pdf" } }
  - custom: Upload JSON untuk dokumen terstruktur. Data diatur sesuai dengan skema. Anda dapat menentukan skema; jika tidak, skema akan terdeteksi secara otomatis. Anda dapat menempatkan string JSON dokumen dalam format yang konsisten langsung di setiap baris, dan Vertex AI Search akan otomatis membuat ID untuk setiap dokumen yang diimpor.
  - content: Upload dokumen tidak terstruktur (PDF, HTML, DOC, TXT, PPTX). ID setiap dokumen dibuat secara otomatis sebagai 128 bit pertama SHA256(GCS_URI) yang dienkode sebagai string hex. Anda dapat menentukan beberapa pola file input selama file yang cocok tidak melebihi batas 100 ribu file.
  - csv: Sertakan baris header dalam file CSV Anda, dengan setiap header dipetakan ke kolom dokumen. Tentukan jalur ke file CSV menggunakan kolom inputUris.
- ERROR_DIRECTORY: optional. Direktori Cloud Storage untuk informasi error tentang impor—misalnya, gs://<your-gcs-bucket>/directory/import_errors. Google merekomendasikan agar kolom ini dibiarkan kosong agar Vertex AI Search dapat membuat direktori sementara secara otomatis.
- RECONCILIATION_MODE: optional. Nilainya adalah FULL dan INCREMENTAL. Default-nya adalah INCREMENTAL. Menentukan INCREMENTAL akan menyebabkan refresh data inkremental dari Cloud Storage ke penyimpanan data Anda. Operasi ini melakukan operasi upsert, yang menambahkan dokumen baru dan menggantikan dokumen yang ada dengan dokumen yang diperbarui dengan ID yang sama. Menentukan FULL akan menyebabkan rebase penuh dokumen di penyimpanan data Anda. Dengan kata lain, dokumen baru dan yang diperbarui ditambahkan ke penyimpanan data Anda, dan dokumen yang tidak ada di Cloud Storage akan dihapus dari penyimpanan data Anda. Mode FULL berguna jika Anda ingin menghapus dokumen yang tidak lagi diperlukan secara otomatis.
- AUTO_GENERATE_IDS: optional. Menentukan apakah ID dokumen akan dibuat secara otomatis. Jika disetel ke true, ID dokumen dibuat berdasarkan hash payload. Perhatikan bahwa ID dokumen yang dibuat mungkin tidak tetap konsisten selama beberapa kali impor. Jika Anda membuat ID secara otomatis di beberapa impor, Google sangat merekomendasikan agar Anda menyetel reconciliationMode ke FULL untuk mempertahankan ID dokumen yang konsisten.
  
  Tentukan autoGenerateIds hanya jika gcsSource.dataSchema ditetapkan ke custom atau csv. Jika tidak, error INVALID_ARGUMENT akan ditampilkan. Jika Anda tidak menentukan autoGenerateIds atau menyetelnya ke false, Anda harus menentukan idField. Jika tidak, dokumen akan gagal diimpor.
- ID_FIELD: optional. Menentukan kolom mana yang merupakan ID dokumen. Untuk dokumen sumber Cloud Storage, idField menentukan nama di kolom JSON yang merupakan ID dokumen. Misalnya, jika {"my_id":"some_uuid"} adalah kolom ID dokumen di salah satu dokumen Anda, tentukan "idField":"my_id". Mengidentifikasi semua kolom JSON dengan nama "my_id" sebagai ID dokumen.
  
  Tentukan kolom ini hanya jika: (1) gcsSource.dataSchema ditetapkan ke custom atau csv, dan (2) auto_generate_ids ditetapkan ke false atau tidak ditentukan. Jika tidak, error INVALID_ARGUMENT akan ditampilkan.
  
  Perhatikan bahwa nilai kolom JSON Cloud Storage harus berupa jenis string, harus antara 1-63 karakter, dan harus sesuai dengan RFC-1034. Jika tidak, dokumen akan gagal diimpor.
  
  Perhatikan bahwa nama kolom JSON yang ditentukan oleh id_field harus berupa jenis string, harus terdiri dari 1 hingga 63 karakter, dan harus sesuai dengan RFC-1034. Jika tidak, dokumen akan gagal diimpor.

C#

Untuk mengetahui informasi selengkapnya, lihat dokumentasi referensi API C# Vertex AI Search.

Untuk melakukan autentikasi ke Vertex AI Search, siapkan Kredensial Default Aplikasi. Untuk mengetahui informasi selengkapnya, lihat Menyiapkan autentikasi untuk lingkungan pengembangan lokal.

Membuat penyimpanan data

using Google.Cloud.DiscoveryEngine.V1;
using Google.LongRunning;

public sealed partial class GeneratedDataStoreServiceClientSnippets
{
    /// <summary>Snippet for CreateDataStore</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void CreateDataStoreRequestObject()
    {
        // Create client
        DataStoreServiceClient dataStoreServiceClient = DataStoreServiceClient.Create();
        // Initialize request argument(s)
        CreateDataStoreRequest request = new CreateDataStoreRequest
        {
            ParentAsCollectionName = CollectionName.FromProjectLocationCollection("[PROJECT]", "[LOCATION]", "[COLLECTION]"),
            DataStore = new DataStore(),
            DataStoreId = "",
            CreateAdvancedSiteSearch = false,
            CmekConfigNameAsCmekConfigName = CmekConfigName.FromProjectLocation("[PROJECT]", "[LOCATION]"),
            SkipDefaultSchemaCreation = false,
        };
        // Make the request
        Operation<DataStore, CreateDataStoreMetadata> response = dataStoreServiceClient.CreateDataStore(request);

        // Poll until the returned long-running operation is complete
        Operation<DataStore, CreateDataStoreMetadata> completedResponse = response.PollUntilCompleted();
        // Retrieve the operation result
        DataStore result = completedResponse.Result;

        // Or get the name of the operation
        string operationName = response.Name;
        // This name can be stored, then the long-running operation retrieved later by name
        Operation<DataStore, CreateDataStoreMetadata> retrievedResponse = dataStoreServiceClient.PollOnceCreateDataStore(operationName);
        // Check if the retrieved long-running operation has completed
        if (retrievedResponse.IsCompleted)
        {
            // If it has completed, then access the result
            DataStore retrievedResult = retrievedResponse.Result;
        }
    }
}

Mengimpor dokumen

using Google.Cloud.DiscoveryEngine.V1;
using Google.LongRunning;
using Google.Protobuf.WellKnownTypes;

public sealed partial class GeneratedDocumentServiceClientSnippets
{
    /// <summary>Snippet for ImportDocuments</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void ImportDocumentsRequestObject()
    {
        // Create client
        DocumentServiceClient documentServiceClient = DocumentServiceClient.Create();
        // Initialize request argument(s)
        ImportDocumentsRequest request = new ImportDocumentsRequest
        {
            ParentAsBranchName = BranchName.FromProjectLocationDataStoreBranch("[PROJECT]", "[LOCATION]", "[DATA_STORE]", "[BRANCH]"),
            InlineSource = new ImportDocumentsRequest.Types.InlineSource(),
            ErrorConfig = new ImportErrorConfig(),
            ReconciliationMode = ImportDocumentsRequest.Types.ReconciliationMode.Unspecified,
            UpdateMask = new FieldMask(),
            AutoGenerateIds = false,
            IdField = "",
            ForceRefreshContent = false,
        };
        // Make the request
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> response = documentServiceClient.ImportDocuments(request);

        // Poll until the returned long-running operation is complete
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> completedResponse = response.PollUntilCompleted();
        // Retrieve the operation result
        ImportDocumentsResponse result = completedResponse.Result;

        // Or get the name of the operation
        string operationName = response.Name;
        // This name can be stored, then the long-running operation retrieved later by name
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> retrievedResponse = documentServiceClient.PollOnceImportDocuments(operationName);
        // Check if the retrieved long-running operation has completed
        if (retrievedResponse.IsCompleted)
        {
            // If it has completed, then access the result
            ImportDocumentsResponse retrievedResult = retrievedResponse.Result;
        }
    }
}

Go

Untuk mengetahui informasi selengkapnya, lihat dokumentasi referensi API Go Vertex AI Search.

Untuk melakukan autentikasi ke Vertex AI Search, siapkan Kredensial Default Aplikasi. Untuk mengetahui informasi selengkapnya, lihat Menyiapkan autentikasi untuk lingkungan pengembangan lokal.

Membuat penyimpanan data


package main

import (
	"context"

	discoveryengine "cloud.google.com/go/discoveryengine/apiv1"
	discoveryenginepb "cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := discoveryengine.NewDataStoreClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &discoveryenginepb.CreateDataStoreRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb#CreateDataStoreRequest.
	}
	op, err := c.CreateDataStore(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}

Mengimpor dokumen


package main

import (
	"context"

	discoveryengine "cloud.google.com/go/discoveryengine/apiv1"
	discoveryenginepb "cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := discoveryengine.NewDocumentClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &discoveryenginepb.ImportDocumentsRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb#ImportDocumentsRequest.
	}
	op, err := c.ImportDocuments(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}

Java

Untuk mengetahui informasi selengkapnya, lihat dokumentasi referensi API Java Vertex AI Search.

Untuk melakukan autentikasi ke Vertex AI Search, siapkan Kredensial Default Aplikasi. Untuk mengetahui informasi selengkapnya, lihat Menyiapkan autentikasi untuk lingkungan pengembangan lokal.

Membuat penyimpanan data

import com.google.cloud.discoveryengine.v1.CollectionName;
import com.google.cloud.discoveryengine.v1.CreateDataStoreRequest;
import com.google.cloud.discoveryengine.v1.DataStore;
import com.google.cloud.discoveryengine.v1.DataStoreServiceClient;

public class SyncCreateDataStore {

  public static void main(String[] args) throws Exception {
    syncCreateDataStore();
  }

  public static void syncCreateDataStore() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DataStoreServiceClient dataStoreServiceClient = DataStoreServiceClient.create()) {
      CreateDataStoreRequest request =
          CreateDataStoreRequest.newBuilder()
              .setParent(CollectionName.of("[PROJECT]", "[LOCATION]", "[COLLECTION]").toString())
              .setDataStore(DataStore.newBuilder().build())
              .setDataStoreId("dataStoreId929489618")
              .setCreateAdvancedSiteSearch(true)
              .setSkipDefaultSchemaCreation(true)
              .build();
      DataStore response = dataStoreServiceClient.createDataStoreAsync(request).get();
    }
  }
}

Mengimpor dokumen

import com.google.cloud.discoveryengine.v1.BranchName;
import com.google.cloud.discoveryengine.v1.DocumentServiceClient;
import com.google.cloud.discoveryengine.v1.ImportDocumentsRequest;
import com.google.cloud.discoveryengine.v1.ImportDocumentsResponse;
import com.google.cloud.discoveryengine.v1.ImportErrorConfig;
import com.google.protobuf.FieldMask;

public class SyncImportDocuments {

  public static void main(String[] args) throws Exception {
    syncImportDocuments();
  }

  public static void syncImportDocuments() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DocumentServiceClient documentServiceClient = DocumentServiceClient.create()) {
      ImportDocumentsRequest request =
          ImportDocumentsRequest.newBuilder()
              .setParent(
                  BranchName.ofProjectLocationDataStoreBranchName(
                          "[PROJECT]", "[LOCATION]", "[DATA_STORE]", "[BRANCH]")
                      .toString())
              .setErrorConfig(ImportErrorConfig.newBuilder().build())
              .setUpdateMask(FieldMask.newBuilder().build())
              .setAutoGenerateIds(true)
              .setIdField("idField1629396127")
              .setForceRefreshContent(true)
              .build();
      ImportDocumentsResponse response = documentServiceClient.importDocumentsAsync(request).get();
    }
  }
}

Node.js

Untuk mengetahui informasi selengkapnya, lihat dokumentasi referensi API Node.js Vertex AI Search.

Untuk melakukan autentikasi ke Vertex AI Search, siapkan Kredensial Default Aplikasi. Untuk mengetahui informasi selengkapnya, lihat Menyiapkan autentikasi untuk lingkungan pengembangan lokal.

Membuat penyimpanan data

/**
 * This snippet has been automatically generated and should be regarded as a code template only.
 * It will require modifications to work.
 * It may require correct/in-range values for request initialization.
 * TODO(developer): Uncomment these variables before running the sample.
 */
/**
 *  Resource name of the CmekConfig to use for protecting this DataStore.
 */
// const cmekConfigName = 'abc123'
/**
 *  DataStore without CMEK protections. If a default CmekConfig is set for
 *  the project, setting this field will override the default CmekConfig as
 *  well.
 */
// const disableCmek = true
/**
 *  Required. The parent resource name, such as
 *  `projects/{project}/locations/{location}/collections/{collection}`.
 */
// const parent = 'abc123'
/**
 *  Required. The DataStore google.cloud.discoveryengine.v1.DataStore  to
 *  create.
 */
// const dataStore = {}
/**
 *  Required. The ID to use for the
 *  DataStore google.cloud.discoveryengine.v1.DataStore, which will become
 *  the final component of the
 *  DataStore google.cloud.discoveryengine.v1.DataStore's resource name.
 *  This field must conform to RFC-1034 (https://tools.ietf.org/html/rfc1034)
 *  standard with a length limit of 63 characters. Otherwise, an
 *  INVALID_ARGUMENT error is returned.
 */
// const dataStoreId = 'abc123'
/**
 *  A boolean flag indicating whether user want to directly create an advanced
 *  data store for site search.
 *  If the data store is not configured as site
 *  search (GENERIC vertical and PUBLIC_WEBSITE content_config), this flag will
 *  be ignored.
 */
// const createAdvancedSiteSearch = true
/**
 *  A boolean flag indicating whether to skip the default schema creation for
 *  the data store. Only enable this flag if you are certain that the default
 *  schema is incompatible with your use case.
 *  If set to true, you must manually create a schema for the data store before
 *  any documents can be ingested.
 *  This flag cannot be specified if `data_store.starting_schema` is specified.
 */
// const skipDefaultSchemaCreation = true

// Imports the Discoveryengine library
const {DataStoreServiceClient} = require('@google-cloud/discoveryengine').v1;

// Instantiates a client
const discoveryengineClient = new DataStoreServiceClient();

async function callCreateDataStore() {
  // Construct request
  const request = {
    parent,
    dataStore,
    dataStoreId,
  };

  // Run request
  const [operation] = await discoveryengineClient.createDataStore(request);
  const [response] = await operation.promise();
  console.log(response);
}

callCreateDataStore();

Mengimpor dokumen

/**
 * This snippet has been automatically generated and should be regarded as a code template only.
 * It will require modifications to work.
 * It may require correct/in-range values for request initialization.
 * TODO(developer): Uncomment these variables before running the sample.
 */
/**
 *  The Inline source for the input content for documents.
 */
// const inlineSource = {}
/**
 *  Cloud Storage location for the input content.
 */
// const gcsSource = {}
/**
 *  BigQuery input source.
 */
// const bigquerySource = {}
/**
 *  FhirStore input source.
 */
// const fhirStoreSource = {}
/**
 *  Spanner input source.
 */
// const spannerSource = {}
/**
 *  Cloud SQL input source.
 */
// const cloudSqlSource = {}
/**
 *  Firestore input source.
 */
// const firestoreSource = {}
/**
 *  AlloyDB input source.
 */
// const alloyDbSource = {}
/**
 *  Cloud Bigtable input source.
 */
// const bigtableSource = {}
/**
 *  Required. The parent branch resource name, such as
 *  `projects/{project}/locations/{location}/collections/{collection}/dataStores/{data_store}/branches/{branch}`.
 *  Requires create/update permission.
 */
// const parent = 'abc123'
/**
 *  The desired location of errors incurred during the Import.
 */
// const errorConfig = {}
/**
 *  The mode of reconciliation between existing documents and the documents to
 *  be imported. Defaults to
 *  ReconciliationMode.INCREMENTAL google.cloud.discoveryengine.v1.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL.
 */
// const reconciliationMode = {}
/**
 *  Indicates which fields in the provided imported documents to update. If
 *  not set, the default is to update all fields.
 */
// const updateMask = {}
/**
 *  Whether to automatically generate IDs for the documents if absent.
 *  If set to `true`,
 *  Document.id google.cloud.discoveryengine.v1.Document.id s are
 *  automatically generated based on the hash of the payload, where IDs may not
 *  be consistent during multiple imports. In which case
 *  ReconciliationMode.FULL google.cloud.discoveryengine.v1.ImportDocumentsRequest.ReconciliationMode.FULL 
 *  is highly recommended to avoid duplicate contents. If unset or set to
 *  `false`, Document.id google.cloud.discoveryengine.v1.Document.id s have
 *  to be specified using
 *  id_field google.cloud.discoveryengine.v1.ImportDocumentsRequest.id_field,
 *  otherwise, documents without IDs fail to be imported.
 *  Supported data sources:
 *  * GcsSource google.cloud.discoveryengine.v1.GcsSource.
 *  GcsSource.data_schema google.cloud.discoveryengine.v1.GcsSource.data_schema 
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * BigQuerySource google.cloud.discoveryengine.v1.BigQuerySource.
 *  BigQuerySource.data_schema google.cloud.discoveryengine.v1.BigQuerySource.data_schema 
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * SpannerSource google.cloud.discoveryengine.v1.SpannerSource.
 *  * CloudSqlSource google.cloud.discoveryengine.v1.CloudSqlSource.
 *  * FirestoreSource google.cloud.discoveryengine.v1.FirestoreSource.
 *  * BigtableSource google.cloud.discoveryengine.v1.BigtableSource.
 */
// const autoGenerateIds = true
/**
 *  The field indicates the ID field or column to be used as unique IDs of
 *  the documents.
 *  For GcsSource google.cloud.discoveryengine.v1.GcsSource  it is the key of
 *  the JSON field. For instance, `my_id` for JSON `{"my_id": "some_uuid"}`.
 *  For others, it may be the column name of the table where the unique ids are
 *  stored.
 *  The values of the JSON field or the table column are used as the
 *  Document.id google.cloud.discoveryengine.v1.Document.id s. The JSON field
 *  or the table column must be of string type, and the values must be set as
 *  valid strings conform to RFC-1034 (https://tools.ietf.org/html/rfc1034)
 *  with 1-63 characters. Otherwise, documents without valid IDs fail to be
 *  imported.
 *  Only set this field when
 *  auto_generate_ids google.cloud.discoveryengine.v1.ImportDocumentsRequest.auto_generate_ids 
 *  is unset or set as `false`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  If it is unset, a default value `_id` is used when importing from the
 *  allowed data sources.
 *  Supported data sources:
 *  * GcsSource google.cloud.discoveryengine.v1.GcsSource.
 *  GcsSource.data_schema google.cloud.discoveryengine.v1.GcsSource.data_schema 
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * BigQuerySource google.cloud.discoveryengine.v1.BigQuerySource.
 *  BigQuerySource.data_schema google.cloud.discoveryengine.v1.BigQuerySource.data_schema 
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * SpannerSource google.cloud.discoveryengine.v1.SpannerSource.
 *  * CloudSqlSource google.cloud.discoveryengine.v1.CloudSqlSource.
 *  * FirestoreSource google.cloud.discoveryengine.v1.FirestoreSource.
 *  * BigtableSource google.cloud.discoveryengine.v1.BigtableSource.
 */
// const idField = 'abc123'
/**
 *  Optional. Whether to force refresh the unstructured content of the
 *  documents.
 *  If set to `true`, the content part of the documents will be refreshed
 *  regardless of the update status of the referencing content.
 */
// const forceRefreshContent = true

// Imports the Discoveryengine library
const {DocumentServiceClient} = require('@google-cloud/discoveryengine').v1;

// Instantiates a client
const discoveryengineClient = new DocumentServiceClient();

async function callImportDocuments() {
  // Construct request
  const request = {
    parent,
  };

  // Run request
  const [operation] = await discoveryengineClient.importDocuments(request);
  const [response] = await operation.promise();
  console.log(response);
}

callImportDocuments();

Python

Untuk mengetahui informasi selengkapnya, lihat dokumentasi referensi API Python Vertex AI Search.

Untuk melakukan autentikasi ke Vertex AI Search, siapkan Kredensial Default Aplikasi. Untuk mengetahui informasi selengkapnya, lihat Menyiapkan autentikasi untuk lingkungan pengembangan lokal.

Membuat penyimpanan data


from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    #  For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name

Mengimpor dokumen

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"

# Examples:
# - Unstructured documents
#   - `gs://bucket/directory/file.pdf`
#   - `gs://bucket/directory/*.pdf`
# - Unstructured documents with JSONL Metadata
#   - `gs://bucket/directory/file.json`
# - Unstructured documents with CSV Metadata
#   - `gs://bucket/directory/file.csv`
# gcs_uri = "YOUR_GCS_PATH"

#  For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.DocumentServiceClient(client_options=client_options)

# The full resource name of the search engine branch.
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    gcs_source=discoveryengine.GcsSource(
        # Multiple URIs are supported
        input_uris=[gcs_uri],
        # Options:
        # - `content` - Unstructured documents (PDF, HTML, DOC, TXT, PPTX)
        # - `custom` - Unstructured documents with custom JSONL metadata
        # - `document` - Structured documents in the discoveryengine.Document format.
        # - `csv` - Unstructured documents with CSV metadata
        data_schema="content",
    ),
    # Options: `FULL`, `INCREMENTAL`
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

# Make the request
operation = client.import_documents(request=request)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)

Ruby

Untuk mengetahui informasi selengkapnya, lihat dokumentasi referensi API Ruby Vertex AI Search.

Untuk melakukan autentikasi ke Vertex AI Search, siapkan Kredensial Default Aplikasi. Untuk mengetahui informasi selengkapnya, lihat Menyiapkan autentikasi untuk lingkungan pengembangan lokal.

Membuat penyimpanan data

require "google/cloud/discovery_engine/v1"

##
# Snippet for the create_data_store call in the DataStoreService service
#
# This snippet has been automatically generated and should be regarded as a code
# template only. It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
# client as shown in https://cloud.google.com/ruby/docs/reference.
#
# This is an auto-generated example demonstrating basic usage of
# Google::Cloud::DiscoveryEngine::V1::DataStoreService::Client#create_data_store.
#
def create_data_store
  # Create a client object. The client can be reused for multiple calls.
  client = Google::Cloud::DiscoveryEngine::V1::DataStoreService::Client.new

  # Create a request. To set request fields, pass in keyword arguments.
  request = Google::Cloud::DiscoveryEngine::V1::CreateDataStoreRequest.new

  # Call the create_data_store method.
  result = client.create_data_store request

  # The returned object is of type Gapic::Operation. You can use it to
  # check the status of an operation, cancel it, or wait for results.
  # Here is how to wait for a response.
  result.wait_until_done! timeout: 60
  if result.response?
    p result.response
  else
    puts "No response received."
  end
end

Mengimpor dokumen

require "google/cloud/discovery_engine/v1"

##
# Snippet for the import_documents call in the DocumentService service
#
# This snippet has been automatically generated and should be regarded as a code
# template only. It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
# client as shown in https://cloud.google.com/ruby/docs/reference.
#
# This is an auto-generated example demonstrating basic usage of
# Google::Cloud::DiscoveryEngine::V1::DocumentService::Client#import_documents.
#
def import_documents
  # Create a client object. The client can be reused for multiple calls.
  client = Google::Cloud::DiscoveryEngine::V1::DocumentService::Client.new

  # Create a request. To set request fields, pass in keyword arguments.
  request = Google::Cloud::DiscoveryEngine::V1::ImportDocumentsRequest.new

  # Call the import_documents method.
  result = client.import_documents request

  # The returned object is of type Gapic::Operation. You can use it to
  # check the status of an operation, cancel it, or wait for results.
  # Here is how to wait for a response.
  result.wait_until_done! timeout: 60
  if result.response?
    p result.response
  else
    puts "No response received."
  end
end

Menghubungkan ke Cloud Storage dengan sinkronisasi berkala

Sebelum mengimpor data, tinjau artikel Menyiapkan data untuk penyerapan.

Prosedur berikut menjelaskan cara membuat konektor data yang mengaitkan lokasi Cloud Storage dengan konektor data Vertex AI Search dan cara menentukan folder atau file di lokasi tersebut untuk penyimpanan data yang ingin Anda buat. Penyimpanan data yang merupakan turunan dari konektor data disebut penyimpanan data entitas.

Data disinkronkan secara berkala ke penyimpanan data entitas. Anda dapat menentukan sinkronisasi harian, setiap tiga hari, atau setiap lima hari.

Konsol

Di konsol Google Cloud , buka halaman AI Applications.

Aplikasi AI
Buka halaman Data Stores.
Klik Create data store.
Di halaman Source, pilih Cloud Storage.
Pilih jenis data yang Anda impor.
Klik Berkala.
Pilih Frekuensi sinkronisasi, seberapa sering Anda ingin konektor Vertex AI Search disinkronkan dengan lokasi Cloud Storage. Anda dapat mengubah frekuensi nanti.
Di bagian Pilih folder atau file yang ingin Anda impor, pilih Folder atau File.
Klik Jelajahi, lalu pilih data yang telah Anda siapkan untuk penyerapan, lalu klik Pilih. Atau, masukkan lokasi langsung di kolom gs://.
Klik Lanjutkan.
Pilih region untuk konektor data Anda.
Masukkan nama untuk konektor data Anda.
Opsional: Jika Anda memilih dokumen tidak terstruktur, Anda dapat memilih opsi penguraian dan pengelompokan untuk dokumen Anda. Untuk membandingkan parser, lihat Mem-parsing dokumen. Untuk mengetahui informasi tentang chunking, lihat Mengelompokkan dokumen untuk RAG.

Parser OCR dan parser tata letak dapat menimbulkan biaya tambahan. Lihat Harga fitur Document AI.

Untuk memilih parser, luaskan Opsi pemrosesan dokumen dan tentukan opsi parser yang ingin Anda gunakan.
Klik Buat.

Anda kini telah membuat konektor data, yang akan menyinkronkan data secara berkala dengan lokasi Cloud Storage. Anda juga telah membuat penyimpanan data entitas, yang diberi nama gcs_store.
Untuk memeriksa status penyerapan, buka halaman Data Stores dan klik nama konektor data Anda untuk melihat detailnya di halaman Data

Tab Aktivitas penyerapan data. Saat kolom status pada tab Aktivitas penyerapan data berubah dari Sedang berlangsung menjadi berhasil, penyerapan pertama selesai.

Bergantung pada ukuran data Anda, penyerapan data dapat memerlukan waktu beberapa menit hingga beberapa jam.

Setelah Anda menyiapkan sumber data dan mengimpor data untuk pertama kalinya, data akan disinkronkan dari sumber tersebut dengan frekuensi yang Anda pilih selama penyiapan. Sekitar satu jam setelah konektor data dibuat, sinkronisasi pertama akan terjadi. Sinkronisasi berikutnya akan terjadi sekitar 24 jam, 72 jam, atau 120 jam kemudian.

Langkah berikutnya

Untuk melampirkan penyimpanan data ke aplikasi, buat aplikasi dan pilih penyimpanan data Anda dengan mengikuti langkah-langkah di Membuat aplikasi rekomendasi kustom.
Untuk melihat pratinjau atau mendapatkan rekomendasi setelah aplikasi dan penyimpanan data Anda disiapkan, lihat Mendapatkan rekomendasi.

Mengupload data JSON terstruktur dengan API

Untuk mengupload dokumen atau objek JSON secara langsung menggunakan API, ikuti langkah-langkah berikut.

Sebelum mengimpor data Anda, Siapkan data untuk penyerapan.

REST

Untuk menggunakan command line guna membuat penyimpanan data dan mengimpor data JSON terstruktur, ikuti langkah-langkah berikut:

Buat penyimpanan data.
```
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: PROJECT_ID" \
"https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
-d '{
  "displayName": "DATA_STORE_DISPLAY_NAME",
  "industryVertical": "GENERIC",
  "solutionTypes": ["SOLUTION_TYPE_RECOMMENDATION"]
}'
```
Ganti kode berikut:
- PROJECT_ID: ID Google Cloud project Anda.
- DATA_STORE_ID: ID penyimpanan data rekomendasi yang ingin Anda buat. ID ini hanya boleh berisi huruf kecil, angka, garis bawah, dan tanda hubung.
- DATA_STORE_DISPLAY_NAME: nama tampilan penyimpanan data rekomendasi yang ingin Anda buat.
Catatan: <ahref=" docs="" generative-ai-app-builder="" industryvertical"="" reference="" rest="" v1="">Vertical industri GENERIC digunakan untuk membuat penyimpanan data terstruktur, tidak terstruktur, dan situs untuk aplikasi rekomendasi kustom. </ahref=">

Opsional: Berikan skema Anda sendiri. Jika Anda memberikan skema, biasanya Anda akan mendapatkan hasil yang lebih baik. Untuk mengetahui informasi selengkapnya, lihat Menyediakan atau mendeteksi skema secara otomatis.

curl -X PATCH \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/schemas/default_schema" \
-d '{
  "structSchema": JSON_SCHEMA_OBJECT
}'

Ganti kode berikut:

PROJECT_ID: ID Google Cloud project Anda.
DATA_STORE_ID: ID penyimpanan data rekomendasi.

JSON_SCHEMA_OBJECT: skema JSON Anda sebagai objek JSON—misalnya:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "title": {
      "type": "string",
      "keyPropertyMapping": "title"
    },
    "categories": {
      "type": "array",
      "items": {
        "type": "string",
        "keyPropertyMapping": "category"
      }
    },
    "uri": {
      "type": "string",
      "keyPropertyMapping": "uri"
    }
  }
}

Impor data terstruktur yang sesuai dengan skema yang ditentukan.

Ada beberapa pendekatan yang dapat Anda gunakan untuk mengupload data, termasuk:

Upload dokumen JSON.

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents?documentId=DOCUMENT_ID" \
-d '{
  "jsonData": "JSON_DOCUMENT_STRING"
}'

Ganti JSON_DOCUMENT_STRING dengan dokumen JSON sebagai satu string. Hal ini harus sesuai dengan skema JSON yang Anda berikan di langkah sebelumnya—misalnya:

```none
{ \"title\": \"test title\", \"categories\": [\"cat_1\", \"cat_2\"], \"uri\": \"test uri\"}
```

Upload objek JSON.

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents?documentId=DOCUMENT_ID" \
-d '{
  "structData": JSON_DOCUMENT_OBJECT
}'

Ganti JSON_DOCUMENT_OBJECT dengan dokumen JSON sebagai objek JSON. Nilai ini harus sesuai dengan skema JSON yang Anda berikan di langkah sebelumnya—misalnya:

```json
{
  "title": "test title",
  "categories": [
    "cat_1",
    "cat_2"
  ],
  "uri": "test uri"
}
```

Perbarui dengan dokumen JSON.

curl -X PATCH \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents/DOCUMENT_ID" \
-d '{
  "jsonData": "JSON_DOCUMENT_STRING"
}'

Perbarui dengan objek JSON.

curl -X PATCH \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents/DOCUMENT_ID" \
-d '{
  "structData": JSON_DOCUMENT_OBJECT
}'

Langkah berikutnya

Untuk melampirkan penyimpanan data ke aplikasi, buat aplikasi dan pilih penyimpanan data Anda dengan mengikuti langkah-langkah di Membuat aplikasi rekomendasi kustom.
Untuk melihat pratinjau tampilan rekomendasi setelah aplikasi dan penyimpanan data Anda disiapkan, lihat Mendapatkan rekomendasi.

Membuat penyimpanan data menggunakan Terraform

Anda dapat menggunakan Terraform untuk membuat penyimpanan data kosong. Setelah penyimpanan data kosong dibuat, Anda dapat memasukkan data ke dalam penyimpanan data menggunakan perintah Google Cloud konsol atau API.

Untuk mempelajari cara menerapkan atau menghapus konfigurasi Terraform, lihat Perintah dasar Terraform.

Untuk membuat penyimpanan data kosong menggunakan Terraform, lihat google_discovery_engine_data_store.