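Create a search data store

To create a data store and ingest data for search, go to the section for the source that you plan to use:

Create a data store using website content
Import from BigQuery
Import from Cloud Storage
Sync from Google Drive
Sync from Gmail (Public preview)
Sync from Google Sites (Public preview)
Sync from Google Calendar (Public preview)
Sync from Google Groups (Public preview)
Import from Cloud SQL
Import from Spanner (Public preview)
Import from Firestore
Import from Bigtable (Public preview)
Import from AlloyDB for PostgreSQL (Public preview)
Upload structured JSON data with the API
Create a data store using Terraform

To sync data from a third-party data source instead, see Connect a third-party data source.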

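Create a data store using website content

Use the following procedure to create a data store and index websites.

To use a website data store after you create it, you must attach it to an app that has Enterprise features turned on. You can turn on Enterprise edition for an app when you create the app. This incurs additional costs. See Create a search app and About advanced features.

Console

To use the Google Cloud console to create a data store and index websites, follow these steps: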

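In the Google Cloud console, go to the AI Applications page.

AI Applications
In the navigation menu, click Data Stores.
Click Create data store.
On the Source page, select Website Content.
Choose whether to turn on Advanced website indexing for this data store. This option can't be turned on or off later.

Advanced website indexing provides additional features such as search summarization, search with follow-ups, and extractive answers. Advanced website indexing incurs additional cost and requires that you verify domain ownership for any website that you index. For more information, see Advanced website indexing and Pricing.
In the Sites to include field, enter the URL patterns that match the websites you want to include in your data store. Enter one URL pattern per line, without comma separators. For example, example.com/docs/*.
Optional: In the Sites to exclude field, enter the URL patterns that you want to exclude from your data store.

Excluded sites take priority over included sites. So if you include example.com/docs/* but exclude example.com, no websites are indexed. For more information, see Website data.
Click Continue.
Select a location for your data store.
- When you create a basic website search data store, the location is always set to global (Global).
- When you create a data store with advanced website indexing, you can select a location. Because the websites that are indexed must be public, Google strongly recommends that you select global (Global) as the location. This maximizes the availability of all search and answering services and removes the limitations of regional data stores.
Enter a name for your data store.
Click Create. Vertex AI Search creates your data store and displays it on the Data Stores page.
To view information about your data store, click its name in the Name column. Your data store page appears.
- If you turned on Advanced website indexing, a warning appears that prompts you to verify the domains in your data store.
- If you have a quota shortfall (the number of pages in the websites that you specified exceeds the "Number of documents per project" quota for your project), an additional warning appears that prompts you to upgrade your quota.
To verify the domains for the URL patterns in your data store, follow the instructions on the Verify website domains page.
To upgrade your quota, follow these steps:
1. Click Upgrade quota. The IAM and Admin page of the Google Cloud console appears.
2. Follow the instructions at Request a higher quota limit in the Google Cloud documentation. The quota to increase is Number of documents in the Discovery Engine API service.
3. After you submit your request for a higher quota limit, go back to the AI Applications page and click Data Stores in the navigation menu.
4. Click the name of your data store in the Name column. The Status column indicates that indexing is in progress for the websites that had exceeded the quota. When the Status column for a URL shows Indexed, advanced website indexing features are available for that URL or URL pattern.
For more information, see Quota for web page indexing on the "Quotas and limits" page.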
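
The precedence rule for included and excluded sites can be illustrated with a small sketch. This is not the matching logic that Vertex AI Search actually runs; it is a simplified model that assumes `*` behaves like a shell-style wildcard and that a pattern without `*` (such as example.com) covers that address and everything under it. The helper names `matches` and `is_indexed` are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def matches(url: str, pattern: str) -> bool:
    """Return True if the URL falls under the pattern (simplified assumption)."""
    if "*" in pattern:
        # Assumption: '*' matches any run of characters, including '/'.
        return fnmatchcase(url, pattern)
    # Assumption: a pattern without '*' covers itself and all paths below it.
    return url == pattern or url.startswith(pattern.rstrip("/") + "/")


def is_indexed(url: str, include: Iterable[str], exclude: Iterable[str]) -> bool:
    """Apply exclusions first, because excluded sites take priority."""
    if any(matches(url, p) for p in exclude):
        return False
    return any(matches(url, p) for p in include)


if __name__ == "__main__":
    include = ["example.com/docs/*"]
    # Excluding example.com overrides the narrower include pattern.
    print(is_indexed("example.com/docs/intro", include, ["example.com"]))  # False
    print(is_indexed("example.com/docs/intro", include, []))  # True
```

Checking the exclusion list before the inclusion list is what makes the example from the procedure (include example.com/docs/*, exclude example.com) index nothing.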

Python

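For more information, see the AI Applications Python API reference documentation.

To authenticate to AI Applications, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store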


from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    #  For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name
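
The endpoint selection at the top of the sample can be isolated into a small helper, which makes the rule easy to check without calling the API. This is a minimal sketch: the function name `api_endpoint_for` is hypothetical, and the default host is taken from the REST URLs used elsewhere on this page.

```python
DEFAULT_ENDPOINT = "discoveryengine.googleapis.com"


def api_endpoint_for(location: str) -> str:
    """Return the API host for a data store location.

    Mirrors the sample's rule: every location other than "global"
    uses a host prefixed with the location name.
    """
    if location == "global":
        return DEFAULT_ENDPOINT
    return f"{location}-{DEFAULT_ENDPOINT}"


if __name__ == "__main__":
    print(api_endpoint_for("global"))  # discoveryengine.googleapis.com
    print(api_endpoint_for("eu"))  # eu-discoveryengine.googleapis.com
```

The sample itself passes `None` as the client options for "global", which leaves the client library on its default endpoint; the helper only makes that default explicit.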

Import websites

from google.api_core.client_options import ClientOptions

from google.cloud import discoveryengine_v1 as discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"
# NOTE: Do not include http or https protocol in the URI pattern
# uri_pattern = "cloud.google.com/generative-ai-app-builder/docs/*"


def create_target_site(
    project_id: str,
    location: str,
    data_store_id: str,
    uri_pattern: str,
):
    #  For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.SiteSearchEngineServiceClient(
        client_options=client_options
    )

    # The full resource name of the site search engine
    # e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/siteSearchEngine
    site_search_engine = client.site_search_engine_path(
        project=project_id, location=location, data_store=data_store_id
    )

    # Target Site to index
    target_site = discoveryengine.TargetSite(
        provided_uri_pattern=uri_pattern,
        # Options: INCLUDE, EXCLUDE
        type_=discoveryengine.TargetSite.Type.INCLUDE,
        exact_match=False,
    )

    # Make the request
    operation = client.create_target_site(
        parent=site_search_engine,
        target_site=target_site,
    )

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateTargetSiteMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return response

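Next steps

To attach your website data store to an app, create an app with Enterprise features turned on and select your data store by following the steps in Create a search app.
If you turned on Advanced website indexing, you can use structured data to update your schema.
To preview how your search results appear after your app and data store are set up, see Get search results.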

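Import from BigQuery

Vertex AI Search supports searching over BigQuery data.

You can create data stores from BigQuery tables in two ways:

One-time ingestion: You import data from a BigQuery table into a data store. The data in the data store doesn't change unless you manually refresh it.
Periodic ingestion: You import data from one or more BigQuery tables, and you set a sync frequency that determines how often the data stores are updated with the most recent data from the BigQuery dataset.

The following table compares the two ways that you can import BigQuery data into Vertex AI Search data stores.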

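One-time ingestion	Periodic ingestion
Generally available (GA).	Public preview.
Data must be refreshed manually.	Data updates automatically every 1, 3, or 5 days. Data can't be refreshed manually.
Vertex AI Search creates a single data store from one table in BigQuery.	Vertex AI Search creates a data connector for a BigQuery dataset and a data store (called an entity data store) for each table specified. For each data connector, the tables must have the same data type (for example, structured) and be in the same BigQuery dataset.
Data from multiple tables can be combined in one data store by first ingesting data from one table and then ingesting more data from another source or BigQuery table.	Because manual data import isn't supported, the data in an entity data store can come from only one BigQuery table.
Data source access control is supported.	Data source access control isn't supported. The imported data can contain access controls, but these controls aren't enforced.
You can create a data store using either the Google Cloud console or the API.	You must use the console to create data connectors and their entity data stores.
CMEK-compliant.	CMEK-compliant.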

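Import once from BigQuery

To ingest data from a BigQuery table, use the following steps to create a data store and ingest data using either the Google Cloud console or the API.

Before importing your data, review Prepare data for ingesting.

Console

To use the Google Cloud console to ingest data from BigQuery, follow these steps:

In the Google Cloud console, go to the AI Applications page.

AI Applications
Go to the Data Stores page.
Click Create data store.
On the Source page, select BigQuery.
In the What kind of data are you importing? section, select the type of data to import.
In the Synchronization frequency section, select One time.
In the BigQuery path field, click Browse, select a table that you have prepared for ingesting, and then click Select. Alternatively, enter the table location directly in the BigQuery path field.
Click Continue.
If you are doing a one-time import of structured data:
1. Map fields to key properties.
2. If important fields are missing from the schema, use Add new field to add them.
  
  For more information, see About auto-detect and edit.
3. Click Continue.
Choose a region for your data store.
Enter a name for your data store.
Click Create.
To check the status of your ingestion, go to the Data Stores page and click your data store name to see details about it on its Data page. When the status column on the Activity tab changes from In progress to Import completed, the ingestion is complete.

Depending on the size of your data, ingestion can take from several minutes to several hours.

REST

To use the command line to create a data store and import data from BigQuery, follow these steps.

Create a data store.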
```
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: PROJECT_ID" \
"https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
-d '{
  "displayName": "DATA_STORE_DISPLAY_NAME",
  "industryVertical": "GENERIC",
  "solutionTypes": ["SOLUTION_TYPE_SEARCH"]
}'
```
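Note: The GENERIC industry vertical is used to create structured, unstructured, and website data stores for custom search apps.

Replace the following:
- PROJECT_ID: the ID of your Google Cloud project.
- DATA_STORE_ID: the ID of the Vertex AI Search data store that you want to create. This ID can contain only lowercase letters, digits, underscores, and hyphens.
- DATA_STORE_DISPLAY_NAME: the display name of the Vertex AI Search data store that you want to create.
Optional: If you're uploading unstructured data and want to configure document parsing or turn on document chunking for RAG, specify the documentProcessingConfig object and include it in your data store creation request. If you're ingesting scanned PDFs, configuring an OCR parser for PDFs is recommended. For how to configure parsing or chunking options, see Parse and chunk documents.
Import data from BigQuery.

If you defined a schema, make sure that the data conforms to that schema.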
```
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
-d '{
  "bigquerySource": {
    "projectId": "PROJECT_ID",
    "datasetId":"DATASET_ID",
    "tableId": "TABLE_ID",
    "dataSchema": "DATA_SCHEMA",
    "aclEnabled": "BOOLEAN"
  },
  "reconciliationMode": "RECONCILIATION_MODE",
  "autoGenerateIds": "AUTO_GENERATE_IDS",
  "idField": "ID_FIELD",
  "errorConfig": {
    "gcsPrefix": "ERROR_DIRECTORY"
  }
}'
```
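Replace the following:
- PROJECT_ID: the ID of your Google Cloud project.
- DATA_STORE_ID: the ID of the Vertex AI Search data store.
- DATASET_ID: the ID of the BigQuery dataset.
- TABLE_ID: the ID of the BigQuery table.
  - If the BigQuery table isn't under PROJECT_ID, you must give the service account service-<project number>@gcp-sa-discoveryengine.iam.gserviceaccount.com the "BigQuery Data Viewer" permission for the BigQuery table. For example, if you are importing a BigQuery table from source project "123" to destination project "456", give service-456@gcp-sa-discoveryengine.iam.gserviceaccount.com permissions for the BigQuery table under project "123".
- DATA_SCHEMA: optional. Values are document and custom. The default is document.
  - document: the BigQuery table that you use must conform to the default BigQuery schema provided in Prepare data for ingesting. You can define the ID of each document yourself, and all the data must be wrapped in the jsonData string.
  - custom: any BigQuery table schema is accepted, and Vertex AI Search automatically generates the ID for each document that is imported.
- ERROR_DIRECTORY: optional. A Cloud Storage directory for error information about the import, for example, gs://<your-gcs-bucket>/directory/import_errors. Google recommends leaving this field empty so that Vertex AI Search automatically creates a temporary directory.
- RECONCILIATION_MODE: optional. Values are FULL and INCREMENTAL. The default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from BigQuery to your data store. This performs an upsert operation, which adds new documents and replaces existing documents with updated documents that have the same ID. Specifying FULL causes a full rebase of the documents in your data store. In other words, new and updated documents are added to your data store, and documents that aren't in BigQuery are removed from your data store. FULL mode is helpful if you want to automatically delete documents that you no longer need.
- AUTO_GENERATE_IDS: optional. Specifies whether to automatically generate document IDs. If set to true, document IDs are generated based on a hash of the payload. Generated document IDs might not remain consistent over multiple imports. If you auto-generate IDs over multiple imports, Google recommends setting reconciliationMode to FULL to keep document IDs consistent.
  
  Specify autoGenerateIds only when bigquerySource.dataSchema is set to custom. Otherwise, an INVALID_ARGUMENT error is returned. If you don't specify autoGenerateIds or you set it to false, you must specify idField. Otherwise, the documents fail to import.
- ID_FIELD: optional. Specifies which field holds the document IDs. For BigQuery sources, idField is the name of the column in the BigQuery table that contains the document IDs.
  
  Specify idField only when (1) bigquerySource.dataSchema is set to custom and (2) autoGenerateIds is set to false or is unspecified. Otherwise, an INVALID_ARGUMENT error is returned.
  
  The values in the BigQuery column must be of string type, must be between 1 and 63 characters long, and must conform to RFC-1034. Otherwise, the documents fail to import.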
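
The two reconciliation modes and the ID option rules above can be modeled in a few lines of plain Python. This is an illustrative sketch only: it does not call the API, the function names `reconcile` and `check_id_options` are hypothetical, and the checks mirror only the REST parameter descriptions in this section (the service may apply further rules or defaults).

```python
from typing import Dict, Optional


def reconcile(
    store: Dict[str, dict], source: Dict[str, dict], mode: str = "INCREMENTAL"
) -> Dict[str, dict]:
    """Return the data store contents after an import, keyed by document ID."""
    if mode == "INCREMENTAL":
        # Upsert: add new documents, replace documents that share an ID.
        return {**store, **source}
    if mode == "FULL":
        # Full rebase: documents that are missing from the source are removed.
        return dict(source)
    raise ValueError(f"unknown reconciliation mode: {mode}")


def check_id_options(
    data_schema: str = "document",
    auto_generate_ids: Optional[bool] = None,
    id_field: Optional[str] = None,
) -> None:
    """Raise ValueError for combinations that the descriptions above rule out."""
    if data_schema != "custom":
        if auto_generate_ids is not None or id_field is not None:
            raise ValueError("autoGenerateIds and idField require dataSchema=custom")
        return
    if auto_generate_ids and id_field is not None:
        raise ValueError("idField can't be combined with autoGenerateIds=true")
    if not auto_generate_ids and id_field is None:
        raise ValueError("specify idField when autoGenerateIds is false or unset")


if __name__ == "__main__":
    store = {"a": {"v": 1}, "b": {"v": 1}}
    source = {"b": {"v": 2}, "c": {"v": 1}}
    print(sorted(reconcile(store, source, "INCREMENTAL")))  # ['a', 'b', 'c']
    print(sorted(reconcile(store, source, "FULL")))  # ['b', 'c']
    check_id_options("custom", auto_generate_ids=False, id_field="doc_id")
```

In the example, document "a" survives an INCREMENTAL import but is dropped by a FULL import because it no longer exists in the source, which is the behavior that makes FULL useful for cleaning up stale documents.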

C#

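For more information, see the AI Applications C# API reference documentation.

To authenticate to AI Applications, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store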

using Google.Cloud.DiscoveryEngine.V1;
using Google.LongRunning;

public sealed partial class GeneratedDataStoreServiceClientSnippets
{
    /// <summary>Snippet for CreateDataStore</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void CreateDataStoreRequestObject()
    {
        // Create client
        DataStoreServiceClient dataStoreServiceClient = DataStoreServiceClient.Create();
        // Initialize request argument(s)
        CreateDataStoreRequest request = new CreateDataStoreRequest
        {
            ParentAsCollectionName = CollectionName.FromProjectLocationCollection("[PROJECT]", "[LOCATION]", "[COLLECTION]"),
            DataStore = new DataStore(),
            DataStoreId = "",
            CreateAdvancedSiteSearch = false,
            CmekConfigNameAsCmekConfigName = CmekConfigName.FromProjectLocation("[PROJECT]", "[LOCATION]"),
            SkipDefaultSchemaCreation = false,
        };
        // Make the request
        Operation<DataStore, CreateDataStoreMetadata> response = dataStoreServiceClient.CreateDataStore(request);

        // Poll until the returned long-running operation is complete
        Operation<DataStore, CreateDataStoreMetadata> completedResponse = response.PollUntilCompleted();
        // Retrieve the operation result
        DataStore result = completedResponse.Result;

        // Or get the name of the operation
        string operationName = response.Name;
        // This name can be stored, then the long-running operation retrieved later by name
        Operation<DataStore, CreateDataStoreMetadata> retrievedResponse = dataStoreServiceClient.PollOnceCreateDataStore(operationName);
        // Check if the retrieved long-running operation has completed
        if (retrievedResponse.IsCompleted)
        {
            // If it has completed, then access the result
            DataStore retrievedResult = retrievedResponse.Result;
        }
    }
}

Import documents

using Google.Cloud.DiscoveryEngine.V1;
using Google.LongRunning;
using Google.Protobuf.WellKnownTypes;

public sealed partial class GeneratedDocumentServiceClientSnippets
{
    /// <summary>Snippet for ImportDocuments</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void ImportDocumentsRequestObject()
    {
        // Create client
        DocumentServiceClient documentServiceClient = DocumentServiceClient.Create();
        // Initialize request argument(s)
        ImportDocumentsRequest request = new ImportDocumentsRequest
        {
            ParentAsBranchName = BranchName.FromProjectLocationDataStoreBranch("[PROJECT]", "[LOCATION]", "[DATA_STORE]", "[BRANCH]"),
            InlineSource = new ImportDocumentsRequest.Types.InlineSource(),
            ErrorConfig = new ImportErrorConfig(),
            ReconciliationMode = ImportDocumentsRequest.Types.ReconciliationMode.Unspecified,
            UpdateMask = new FieldMask(),
            AutoGenerateIds = false,
            IdField = "",
            ForceRefreshContent = false,
        };
        // Make the request
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> response = documentServiceClient.ImportDocuments(request);

        // Poll until the returned long-running operation is complete
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> completedResponse = response.PollUntilCompleted();
        // Retrieve the operation result
        ImportDocumentsResponse result = completedResponse.Result;

        // Or get the name of the operation
        string operationName = response.Name;
        // This name can be stored, then the long-running operation retrieved later by name
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> retrievedResponse = documentServiceClient.PollOnceImportDocuments(operationName);
        // Check if the retrieved long-running operation has completed
        if (retrievedResponse.IsCompleted)
        {
            // If it has completed, then access the result
            ImportDocumentsResponse retrievedResult = retrievedResponse.Result;
        }
    }
}

Go

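For more information, see the AI Applications Go API reference documentation.

To authenticate to AI Applications, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store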


package main

import (
	"context"

	discoveryengine "cloud.google.com/go/discoveryengine/apiv1"
	discoveryenginepb "cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := discoveryengine.NewDataStoreClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &discoveryenginepb.CreateDataStoreRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb#CreateDataStoreRequest.
	}
	op, err := c.CreateDataStore(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}

Import documents


package main

import (
	"context"

	discoveryengine "cloud.google.com/go/discoveryengine/apiv1"
	discoveryenginepb "cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := discoveryengine.NewDocumentClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &discoveryenginepb.ImportDocumentsRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb#ImportDocumentsRequest.
	}
	op, err := c.ImportDocuments(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}

Java

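For more information, see the AI Applications Java API reference documentation.

To authenticate to AI Applications, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store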

import com.google.cloud.discoveryengine.v1.CollectionName;
import com.google.cloud.discoveryengine.v1.CreateDataStoreRequest;
import com.google.cloud.discoveryengine.v1.DataStore;
import com.google.cloud.discoveryengine.v1.DataStoreServiceClient;

public class SyncCreateDataStore {

  public static void main(String[] args) throws Exception {
    syncCreateDataStore();
  }

  public static void syncCreateDataStore() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DataStoreServiceClient dataStoreServiceClient = DataStoreServiceClient.create()) {
      CreateDataStoreRequest request =
          CreateDataStoreRequest.newBuilder()
              .setParent(CollectionName.of("[PROJECT]", "[LOCATION]", "[COLLECTION]").toString())
              .setDataStore(DataStore.newBuilder().build())
              .setDataStoreId("dataStoreId929489618")
              .setCreateAdvancedSiteSearch(true)
              .setSkipDefaultSchemaCreation(true)
              .build();
      DataStore response = dataStoreServiceClient.createDataStoreAsync(request).get();
    }
  }
}

Import documents

import com.google.cloud.discoveryengine.v1.BranchName;
import com.google.cloud.discoveryengine.v1.DocumentServiceClient;
import com.google.cloud.discoveryengine.v1.ImportDocumentsRequest;
import com.google.cloud.discoveryengine.v1.ImportDocumentsResponse;
import com.google.cloud.discoveryengine.v1.ImportErrorConfig;
import com.google.protobuf.FieldMask;

public class SyncImportDocuments {

  public static void main(String[] args) throws Exception {
    syncImportDocuments();
  }

  public static void syncImportDocuments() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DocumentServiceClient documentServiceClient = DocumentServiceClient.create()) {
      ImportDocumentsRequest request =
          ImportDocumentsRequest.newBuilder()
              .setParent(
                  BranchName.ofProjectLocationDataStoreBranchName(
                          "[PROJECT]", "[LOCATION]", "[DATA_STORE]", "[BRANCH]")
                      .toString())
              .setErrorConfig(ImportErrorConfig.newBuilder().build())
              .setUpdateMask(FieldMask.newBuilder().build())
              .setAutoGenerateIds(true)
              .setIdField("idField1629396127")
              .setForceRefreshContent(true)
              .build();
      ImportDocumentsResponse response = documentServiceClient.importDocumentsAsync(request).get();
    }
  }
}

Node.js

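For more information, see the AI Applications Node.js API reference documentation.

To authenticate to AI Applications, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store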

/**
 * This snippet has been automatically generated and should be regarded as a code template only.
 * It will require modifications to work.
 * It may require correct/in-range values for request initialization.
 * TODO(developer): Uncomment these variables before running the sample.
 */
/**
 *  Resource name of the CmekConfig to use for protecting this DataStore.
 */
// const cmekConfigName = 'abc123'
/**
 *  DataStore without CMEK protections. If a default CmekConfig is set for
 *  the project, setting this field will override the default CmekConfig as
 *  well.
 */
// const disableCmek = true
/**
 *  Required. The parent resource name, such as
 *  `projects/{project}/locations/{location}/collections/{collection}`.
 */
// const parent = 'abc123'
/**
 *  Required. The DataStore google.cloud.discoveryengine.v1.DataStore  to
 *  create.
 */
// const dataStore = {}
/**
 *  Required. The ID to use for the
 *  DataStore google.cloud.discoveryengine.v1.DataStore, which will become
 *  the final component of the
 *  DataStore google.cloud.discoveryengine.v1.DataStore's resource name.
 *  This field must conform to RFC-1034 (https://tools.ietf.org/html/rfc1034)
 *  standard with a length limit of 63 characters. Otherwise, an
 *  INVALID_ARGUMENT error is returned.
 */
// const dataStoreId = 'abc123'
/**
 *  A boolean flag indicating whether user want to directly create an advanced
 *  data store for site search.
 *  If the data store is not configured as site
 *  search (GENERIC vertical and PUBLIC_WEBSITE content_config), this flag will
 *  be ignored.
 */
// const createAdvancedSiteSearch = true
/**
 *  A boolean flag indicating whether to skip the default schema creation for
 *  the data store. Only enable this flag if you are certain that the default
 *  schema is incompatible with your use case.
 *  If set to true, you must manually create a schema for the data store before
 *  any documents can be ingested.
 *  This flag cannot be specified if `data_store.starting_schema` is specified.
 */
// const skipDefaultSchemaCreation = true

// Imports the Discoveryengine library
const {DataStoreServiceClient} = require('@google-cloud/discoveryengine').v1;

// Instantiates a client
const discoveryengineClient = new DataStoreServiceClient();

async function callCreateDataStore() {
  // Construct request
  const request = {
    parent,
    dataStore,
    dataStoreId,
  };

  // Run request
  const [operation] = await discoveryengineClient.createDataStore(request);
  const [response] = await operation.promise();
  console.log(response);
}

callCreateDataStore();

Import documents

/**
 * This snippet has been automatically generated and should be regarded as a code template only.
 * It will require modifications to work.
 * It may require correct/in-range values for request initialization.
 * TODO(developer): Uncomment these variables before running the sample.
 */
/**
 *  The Inline source for the input content for documents.
 */
// const inlineSource = {}
/**
 *  Cloud Storage location for the input content.
 */
// const gcsSource = {}
/**
 *  BigQuery input source.
 */
// const bigquerySource = {}
/**
 *  FhirStore input source.
 */
// const fhirStoreSource = {}
/**
 *  Spanner input source.
 */
// const spannerSource = {}
/**
 *  Cloud SQL input source.
 */
// const cloudSqlSource = {}
/**
 *  Firestore input source.
 */
// const firestoreSource = {}
/**
 *  AlloyDB input source.
 */
// const alloyDbSource = {}
/**
 *  Cloud Bigtable input source.
 */
// const bigtableSource = {}
/**
 *  Required. The parent branch resource name, such as
 *  `projects/{project}/locations/{location}/collections/{collection}/dataStores/{data_store}/branches/{branch}`.
 *  Requires create/update permission.
 */
// const parent = 'abc123'
/**
 *  The desired location of errors incurred during the Import.
 */
// const errorConfig = {}
/**
 *  The mode of reconciliation between existing documents and the documents to
 *  be imported. Defaults to
 *  ReconciliationMode.INCREMENTAL google.cloud.discoveryengine.v1.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL.
 */
// const reconciliationMode = {}
/**
 *  Indicates which fields in the provided imported documents to update. If
 *  not set, the default is to update all fields.
 */
// const updateMask = {}
/**
 *  Whether to automatically generate IDs for the documents if absent.
 *  If set to `true`,
 *  Document.id google.cloud.discoveryengine.v1.Document.id s are
 *  automatically generated based on the hash of the payload, where IDs may not
 *  be consistent during multiple imports. In which case
 *  ReconciliationMode.FULL google.cloud.discoveryengine.v1.ImportDocumentsRequest.ReconciliationMode.FULL 
 *  is highly recommended to avoid duplicate contents. If unset or set to
 *  `false`, Document.id google.cloud.discoveryengine.v1.Document.id s have
 *  to be specified using
 *  id_field google.cloud.discoveryengine.v1.ImportDocumentsRequest.id_field,
 *  otherwise, documents without IDs fail to be imported.
 *  Supported data sources:
 *  * GcsSource google.cloud.discoveryengine.v1.GcsSource.
 *  GcsSource.data_schema google.cloud.discoveryengine.v1.GcsSource.data_schema 
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * BigQuerySource google.cloud.discoveryengine.v1.BigQuerySource.
 *  BigQuerySource.data_schema google.cloud.discoveryengine.v1.BigQuerySource.data_schema 
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * SpannerSource google.cloud.discoveryengine.v1.SpannerSource.
 *  * CloudSqlSource google.cloud.discoveryengine.v1.CloudSqlSource.
 *  * FirestoreSource google.cloud.discoveryengine.v1.FirestoreSource.
 *  * BigtableSource google.cloud.discoveryengine.v1.BigtableSource.
 */
// const autoGenerateIds = true
/**
 *  The field indicates the ID field or column to be used as unique IDs of
 *  the documents.
 *  For GcsSource google.cloud.discoveryengine.v1.GcsSource  it is the key of
 *  the JSON field. For instance, `my_id` for JSON `{"my_id": "some_uuid"}`.
 *  For others, it may be the column name of the table where the unique ids are
 *  stored.
 *  The values of the JSON field or the table column are used as the
 *  Document.id google.cloud.discoveryengine.v1.Document.id s. The JSON field
 *  or the table column must be of string type, and the values must be set as
 *  valid strings conform to RFC-1034 (https://tools.ietf.org/html/rfc1034)
 *  with 1-63 characters. Otherwise, documents without valid IDs fail to be
 *  imported.
 *  Only set this field when
 *  auto_generate_ids google.cloud.discoveryengine.v1.ImportDocumentsRequest.auto_generate_ids 
 *  is unset or set as `false`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  If it is unset, a default value `_id` is used when importing from the
 *  allowed data sources.
 *  Supported data sources:
 *  * GcsSource google.cloud.discoveryengine.v1.GcsSource.
 *  GcsSource.data_schema google.cloud.discoveryengine.v1.GcsSource.data_schema 
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * BigQuerySource google.cloud.discoveryengine.v1.BigQuerySource.
 *  BigQuerySource.data_schema google.cloud.discoveryengine.v1.BigQuerySource.data_schema 
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * SpannerSource google.cloud.discoveryengine.v1.SpannerSource.
 *  * CloudSqlSource google.cloud.discoveryengine.v1.CloudSqlSource.
 *  * FirestoreSource google.cloud.discoveryengine.v1.FirestoreSource.
 *  * BigtableSource google.cloud.discoveryengine.v1.BigtableSource.
 */
// const idField = 'abc123'
/**
 *  Optional. Whether to force refresh the unstructured content of the
 *  documents.
 *  If set to `true`, the content part of the documents will be refreshed
 *  regardless of the update status of the referencing content.
 */
// const forceRefreshContent = true

// Imports the Discoveryengine library
const {DocumentServiceClient} = require('@google-cloud/discoveryengine').v1;

// Instantiates a client
const discoveryengineClient = new DocumentServiceClient();

async function callImportDocuments() {
  // Construct request
  const request = {
    parent,
  };

  // Run request
  const [operation] = await discoveryengineClient.importDocuments(request);
  const [response] = await operation.promise();
  console.log(response);
}

callImportDocuments();

Python

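For more information, see the AI Applications Python API reference documentation.

To authenticate to AI Applications, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store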


from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    #  For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name

Import documents


from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"
# bigquery_dataset = "YOUR_BIGQUERY_DATASET"
# bigquery_table = "YOUR_BIGQUERY_TABLE"

#  For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.DocumentServiceClient(client_options=client_options)

# The full resource name of the search engine branch.
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    bigquery_source=discoveryengine.BigQuerySource(
        project_id=project_id,
        dataset_id=bigquery_dataset,
        table_id=bigquery_table,
        data_schema="custom",
    ),
    # Options: `FULL`, `INCREMENTAL`
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

# Make the request
operation = client.import_documents(request=request)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)
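
The reconciliation_mode option in the request above selects between INCREMENTAL (an upsert: new documents are added and documents with a matching ID are replaced) and FULL (a full rebase: documents that are missing from the source are also removed). The sketch below models that difference with plain dictionaries keyed by document ID. It is only a mental model under that assumption, not the service's implementation, and the function name reconcile is hypothetical.

```python
def reconcile(existing, incoming, mode="INCREMENTAL"):
    """Return the documents in the data store after an import.

    Both arguments are dicts keyed by document ID. This is an illustrative
    model of the two modes, not code from the Discovery Engine client.
    """
    if mode == "INCREMENTAL":
        # Upsert: keep existing documents, add new ones, replace matching IDs.
        result = dict(existing)
        result.update(incoming)
        return result
    if mode == "FULL":
        # Full rebase: only documents present in the source remain.
        return dict(incoming)
    raise ValueError("mode must be 'INCREMENTAL' or 'FULL'")


if __name__ == "__main__":
    store = {"doc-a": "v1", "doc-b": "v1"}
    source = {"doc-b": "v2", "doc-c": "v1"}
    print(reconcile(store, source, "INCREMENTAL"))
    print(reconcile(store, source, "FULL"))
```

In this model, doc-a survives an INCREMENTAL import but is dropped by a FULL import, which is why FULL is the mode to pick when stale documents should disappear automatically.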

Ruby

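For more information, see the AI Applications Ruby API reference documentation.

To authenticate to AI Applications, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store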

require "google/cloud/discovery_engine/v1"

##
# Snippet for the create_data_store call in the DataStoreService service
#
# This snippet has been automatically generated and should be regarded as a code
# template only. It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
# client as shown in https://cloud.google.com/ruby/docs/reference.
#
# This is an auto-generated example demonstrating basic usage of
# Google::Cloud::DiscoveryEngine::V1::DataStoreService::Client#create_data_store.
#
def create_data_store
  # Create a client object. The client can be reused for multiple calls.
  client = Google::Cloud::DiscoveryEngine::V1::DataStoreService::Client.new

  # Create a request. To set request fields, pass in keyword arguments.
  request = Google::Cloud::DiscoveryEngine::V1::CreateDataStoreRequest.new

  # Call the create_data_store method.
  result = client.create_data_store request

  # The returned object is of type Gapic::Operation. You can use it to
  # check the status of an operation, cancel it, or wait for results.
  # Here is how to wait for a response.
  result.wait_until_done! timeout: 60
  if result.response?
    p result.response
  else
    puts "No response received."
  end
end

Import documents

require "google/cloud/discovery_engine/v1"

##
# Snippet for the import_documents call in the DocumentService service
#
# This snippet has been automatically generated and should be regarded as a code
# template only. It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
# client as shown in https://cloud.google.com/ruby/docs/reference.
#
# This is an auto-generated example demonstrating basic usage of
# Google::Cloud::DiscoveryEngine::V1::DocumentService::Client#import_documents.
#
def import_documents
  # Create a client object. The client can be reused for multiple calls.
  client = Google::Cloud::DiscoveryEngine::V1::DocumentService::Client.new

  # Create a request. To set request fields, pass in keyword arguments.
  request = Google::Cloud::DiscoveryEngine::V1::ImportDocumentsRequest.new

  # Call the import_documents method.
  result = client.import_documents request

  # The returned object is of type Gapic::Operation. You can use it to
  # check the status of an operation, cancel it, or wait for results.
  # Here is how to wait for a response.
  result.wait_until_done! timeout: 60
  if result.response?
    p result.response
  else
    puts "No response received."
  end
end

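Connect to BigQuery with periodic syncing

Before importing your data, review Prepare data for ingesting.

The following procedure describes how to create a data connector that associates a BigQuery dataset with Vertex AI Search, and how to specify a table in the dataset for each data store that you want to create. Data stores that are children of a data connector are called entity data stores.

Data from the dataset is synced periodically to the entity data stores. You can set the sync frequency to daily, every three days, or every five days.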

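Console

To use the Google Cloud console to create a connector that periodically syncs data from a BigQuery dataset to Vertex AI Search, follow these steps:

In the Google Cloud console, go to the AI Applications page.

AI Applications
In the navigation menu, click Data Stores.
Click Create data store.
On the Source page, select BigQuery.
Select the kind of data that you are importing.
Click Periodic.
Select the Synchronization frequency, which is how often the Vertex AI Search connector syncs with the BigQuery dataset. You can change the frequency later.
In the BigQuery dataset path field, click Browse and select the dataset that contains the tables that you prepared for ingesting. Alternatively, enter the dataset location directly in the BigQuery path field. The format for the path is projectname.datasetname.
In the Tables to sync field, click Browse, and then select a table that contains the data that you want for your data store.
Note:
Make sure that the data in the table matches the kind of data that you selected earlier (step 5).
If there is a mismatch, you won't find out until one of the following happens:
- The connector reports an error when it tries to import the data.
- You get unexpected results. This happens when you selected structured data but the table holds unstructured data or data with metadata. The data is imported, but the content URLs or metadata are not recognized and are treated as plain strings.
If the dataset has additional tables that you want to use for data stores, click Add table and specify those tables too.
Click Continue.
Choose a region for your data store, enter a name for your data connector, and click Create.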

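You have now created a data connector that periodically syncs data with the BigQuery dataset, along with one or more entity data stores. Each data store has the same name as its BigQuery table.
To check the status of your ingestion, go to the Data Stores page and click your data connector name to see details about it on its Data page > Data ingestion activity tab. When the status column on the Activity tab changes from In progress to Succeeded, the first ingestion is complete.

Depending on the size of your data, ingestion can take from several minutes to several hours.

After you set up your data source and import data the first time, the data store syncs data from that source at the frequency that you selected during setup. The first sync occurs about one hour after the data connector is created. The next sync then occurs about 24, 72, or 120 hours later.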
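
The sync timing described above can be sketched as a small calculation. This is only an illustration: the helper name approximate_sync_times and the frequency table are hypothetical, it assumes each later sync follows the previous one at a fixed interval, and the real times are approximate and decided by the service.

```python
from datetime import datetime, timedelta, timezone

# Sync frequency in days mapped to the approximate interval in hours.
SYNC_INTERVAL_HOURS = {1: 24, 3: 72, 5: 120}


def approximate_sync_times(created_at, frequency_days, count=3):
    """Return rough estimates for the first `count` sync times.

    Hypothetical helper: assumes the first sync lands about one hour after
    the connector is created and later syncs follow at a fixed interval.
    """
    if frequency_days not in SYNC_INTERVAL_HOURS:
        raise ValueError("frequency_days must be 1, 3, or 5")
    first_sync = created_at + timedelta(hours=1)
    interval = timedelta(hours=SYNC_INTERVAL_HOURS[frequency_days])
    return [first_sync + i * interval for i in range(count)]


if __name__ == "__main__":
    created = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
    for estimate in approximate_sync_times(created, frequency_days=3):
        print(estimate.isoformat())
```

Treat the results as estimates for planning only; the Data ingestion activity tab remains the place to confirm when a sync actually ran.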

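Next steps

To attach your data store to an app, create an app and select your data store by following the steps in Create a search app.
To preview how your search results appear after your app and data store are set up, see Get search results.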

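Import from Cloud Storage

You can create data stores from Cloud Storage data in two ways:

One-time ingestion: You import data from a Cloud Storage folder or file into a data store. The data in the data store doesn't change unless you manually refresh it.
Periodic ingestion: You import data from a Cloud Storage folder or file, and you set a sync frequency that determines how often the data store is updated with the most recent data from that Cloud Storage location.

The following table compares the two ways to import Cloud Storage data into Vertex AI Search data stores.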

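One-time ingestion	Periodic ingestion
Generally available (GA).	Public preview.
Data must be refreshed manually.	Data updates automatically every one, three, or five days. Data cannot be refreshed manually.
Vertex AI Search creates a single data store from one folder or file in Cloud Storage.	Vertex AI Search creates a data connector and associates a data store (called an entity data store) with it for the specified file or folder. Each Cloud Storage data connector can have a single entity data store.
Data from multiple files, folders, and buckets can be combined in one data store by first ingesting data from one Cloud Storage location and then ingesting more data from another location.	Because manual data import is not supported, the data in an entity data store can come from only one Cloud Storage file or folder.
Data source access control is supported. For more information, see Data source access control.	Data source access control is not supported. The imported data can contain access controls, but these controls are not respected.
You can create a data store by using either the Google Cloud console or the API.	You must use the console to create data connectors and their entity data stores.
CMEK-compliant.	CMEK-compliant.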

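Import once from Cloud Storage

To ingest data from Cloud Storage, use the following steps to create a data store and ingest data by using either the Google Cloud console or the API.

Before importing your data, review Prepare data for ingesting.

Console

To use the console to ingest data from a Cloud Storage bucket, follow these steps: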

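In the Google Cloud console, go to the AI Applications page.

AI Applications
Go to the Data Stores page.
Click Create data store.
On the Source page, select Cloud Storage.
In the Select a folder or a file you want to import section, select Folder or File.
Click Browse, choose the data that you prepared for ingesting, and then click Select. Alternatively, enter the location directly in the gs:// field.
Select the kind of data that you are importing.
Click Continue.
If you are doing a one-time import of structured data:
1. Map fields to key properties.
2. If important fields are missing from the schema, use Add new field to add them.
  
  For more information, see About auto-detect and edit.
3. Click Continue.
Choose a region for your data store.
Enter a name for your data store.
Optional: If you selected unstructured documents, you can select parsing and chunking options for your documents. To compare parsers, see Parse documents. For information about chunking, see Chunk documents for RAG.

The OCR parser and layout parser can incur additional costs. See Document AI feature pricing.

To select a parser, expand Document processing options and specify the parser options that you want to use.
Click Create.
To check the status of your ingestion, go to the Data Stores page and click your data store name to see details about it on its Data page. When the status column on the Activity tab changes from In progress to Import completed, the ingestion is complete.

Depending on the size of your data, ingestion can take from several minutes to several hours.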

REST

To use the command line to create a data store and ingest data from Cloud Storage, follow these steps:

Create a data store.
```
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: PROJECT_ID" \
"https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
-d '{
  "displayName": "DATA_STORE_DISPLAY_NAME",
  "industryVertical": "GENERIC",
  "solutionTypes": ["SOLUTION_TYPE_SEARCH"]
}'
```
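Note: The GENERIC industry vertical is used to create data stores for structured data, unstructured data, and website data for custom search apps.

Replace the following:
- PROJECT_ID: the ID of your Google Cloud project.
- DATA_STORE_ID: the ID of the Vertex AI Search data store that you want to create. This ID can contain only lowercase letters, digits, underscores, and hyphens.
- DATA_STORE_DISPLAY_NAME: the display name of the Vertex AI Search data store that you want to create.
Optional: To configure document parsing or to turn on document chunking for RAG when you upload unstructured data, specify the documentProcessingConfig object and include it in your data store creation request. Configuring an OCR parser for PDFs is recommended if you are ingesting scanned PDFs. For how to configure parsing or chunking options, see Parse and chunk documents.
Import data from Cloud Storage.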
```
  curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
  -d '{
    "gcsSource": {
      "inputUris": ["INPUT_FILE_PATTERN_1", "INPUT_FILE_PATTERN_2"],
      "dataSchema": "DATA_SCHEMA"
    },
    "reconciliationMode": "RECONCILIATION_MODE",
    "autoGenerateIds": "AUTO_GENERATE_IDS",
    "idField": "ID_FIELD",
    "errorConfig": {
      "gcsPrefix": "ERROR_DIRECTORY"
    }
  }'
```
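Replace the following:
- PROJECT_ID: the ID of your Google Cloud project.
- DATA_STORE_ID: the ID of the Vertex AI Search data store.
- INPUT_FILE_PATTERN_1, INPUT_FILE_PATTERN_2: file patterns in Cloud Storage that contain your documents. Specify one or more patterns.
  
  For structured data, or for unstructured data with metadata, an example of an input file pattern is gs://<your-gcs-bucket>/directory/object.json, and an example of a pattern that matches one or more files is gs://<your-gcs-bucket>/directory/*.json.
  
  For unstructured documents, an example is gs://<your-gcs-bucket>/directory/*.pdf. Each file that matches the pattern becomes a document.
  
  If <your-gcs-bucket> is not in PROJECT_ID, you need to give the service account service-<project number>@gcp-sa-discoveryengine.iam.gserviceaccount.com "Storage Object Viewer" permissions for the Cloud Storage bucket. For example, if you import a Cloud Storage bucket from source project "123" into destination project "456", give service-456@gcp-sa-discoveryengine.iam.gserviceaccount.com permissions on the Cloud Storage bucket under project "123".
- DATA_SCHEMA: optional. Values are document, custom, csv, and content. The default is document.
  - document: upload unstructured data with metadata for unstructured documents. Each line of the file must follow one of the following formats. You can define the ID of each document.
    - { "id": "<your-id>", "jsonData": "<JSON string>", "content": { "mimeType": "<application/pdf or text/html>", "uri": "gs://<your-gcs-bucket>/directory/filename.pdf" } }
    - { "id": "<your-id>", "structData": <JSON object>, "content": { "mimeType": "<application/pdf or text/html>", "uri": "gs://<your-gcs-bucket>/directory/filename.pdf" } }
  - custom: upload JSON for structured documents. The data is organized according to a schema. You can specify the schema; otherwise it is auto-detected. You can put the JSON string of each document directly on its own line in a consistent format, and Vertex AI Search automatically generates the ID for each imported document.
  - content: upload unstructured documents (PDF, HTML, DOC, TXT, PPTX). The ID of each document is automatically generated as the first 128 bits of SHA256(GCS_URI), encoded as a hex string. You can specify multiple input file patterns as long as the matched files don't exceed the limit of 100,000 files.
  - csv: include a header row in your CSV file, with each header mapped to a document field. Specify the path to the CSV file by using the inputUris field.
- ERROR_DIRECTORY: optional. A Cloud Storage directory for error information about the import, for example gs://<your-gcs-bucket>/directory/import_errors. Google recommends leaving this field empty to let Vertex AI Search automatically create a temporary directory.
- RECONCILIATION_MODE: optional. Values are FULL and INCREMENTAL. The default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from Cloud Storage to your data store. This performs an upsert operation, which adds new documents and replaces existing documents with updated documents that have the same ID. Specifying FULL causes a full rebase of the documents in your data store. In other words, new and updated documents are added to your data store, and documents that are not in Cloud Storage are removed from your data store. FULL mode is helpful if you want to automatically delete documents that you no longer need.
- AUTO_GENERATE_IDS: optional. Specifies whether to automatically generate document IDs. If set to true, document IDs are generated based on a hash of the payload. Generated document IDs might not remain consistent over multiple imports. If you auto-generate IDs over multiple imports, Google highly recommends setting reconciliationMode to FULL to maintain consistent document IDs.
  
  Specify autoGenerateIds only when gcsSource.dataSchema is set to custom or csv. Otherwise, an INVALID_ARGUMENT error is returned. If you don't specify autoGenerateIds or you set it to false, you must specify idField. Otherwise, the documents fail to import.
- ID_FIELD: optional. Specifies which field holds the document ID. For Cloud Storage source documents, idField specifies the name of the JSON field that holds the document ID. For example, if {"my_id":"some_uuid"} is the document ID field in one of your documents, specify "idField":"my_id". This identifies all JSON fields with the name "my_id" as document IDs.
  
  Specify this field only when (1) gcsSource.dataSchema is set to custom or csv, and (2) autoGenerateIds is set to false or is unspecified. Otherwise, an INVALID_ARGUMENT error is returned.
  
  The value of the Cloud Storage JSON field must be of string type, must be 1 to 63 characters long, and must conform to RFC-1034. Otherwise, the documents fail to import.
  
  The JSON field name specified by idField must also be of string type, be 1 to 63 characters long, and conform to RFC-1034. Otherwise, the documents fail to import.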
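
The rules above for dataSchema, autoGenerateIds, and idField interact, so it can help to check a request body locally before sending it. The following is a minimal sketch under stated assumptions: the function name build_import_body is hypothetical, the checks only mirror the two INVALID_ARGUMENT rules listed above, and they are not the service's own validation. The bucket path in the usage lines is a placeholder.

```python
import json

ALL_SCHEMAS = {"document", "custom", "csv", "content"}
# Schemas for which autoGenerateIds and idField may be specified.
STRUCTURED_SCHEMAS = {"custom", "csv"}


def build_import_body(input_uris, data_schema="document",
                      reconciliation_mode="INCREMENTAL",
                      auto_generate_ids=False, id_field=None,
                      error_directory=None):
    """Build a documents:import request body as a dict (hypothetical helper).

    Raises ValueError for combinations that the parameter descriptions
    above say are rejected with INVALID_ARGUMENT.
    """
    if data_schema not in ALL_SCHEMAS:
        raise ValueError("unknown dataSchema: %r" % data_schema)
    if reconciliation_mode not in {"FULL", "INCREMENTAL"}:
        raise ValueError("reconciliationMode must be FULL or INCREMENTAL")
    if auto_generate_ids and data_schema not in STRUCTURED_SCHEMAS:
        raise ValueError("autoGenerateIds requires dataSchema custom or csv")
    if id_field is not None:
        if data_schema not in STRUCTURED_SCHEMAS:
            raise ValueError("idField requires dataSchema custom or csv")
        if auto_generate_ids:
            raise ValueError("idField cannot be set with autoGenerateIds")

    body = {
        "gcsSource": {"inputUris": list(input_uris), "dataSchema": data_schema},
        "reconciliationMode": reconciliation_mode,
    }
    # Optional fields are added only when the caller supplied them.
    if auto_generate_ids:
        body["autoGenerateIds"] = True
    if id_field is not None:
        body["idField"] = id_field
    if error_directory:
        body["errorConfig"] = {"gcsPrefix": error_directory}
    return body


if __name__ == "__main__":
    body = build_import_body(
        ["gs://my-bucket/directory/*.json"],  # placeholder bucket
        data_schema="custom",
        id_field="my_id",
    )
    print(json.dumps(body, indent=2))
```

The printed JSON can be passed to the curl command above with -d. Building the body with json.dumps also avoids hand-written syntax slips such as trailing commas.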

C#

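For more information, see the AI Applications C# API reference documentation.

To authenticate to AI Applications, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store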

using Google.Cloud.DiscoveryEngine.V1;
using Google.LongRunning;

public sealed partial class GeneratedDataStoreServiceClientSnippets
{
    /// <summary>Snippet for CreateDataStore</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void CreateDataStoreRequestObject()
    {
        // Create client
        DataStoreServiceClient dataStoreServiceClient = DataStoreServiceClient.Create();
        // Initialize request argument(s)
        CreateDataStoreRequest request = new CreateDataStoreRequest
        {
            ParentAsCollectionName = CollectionName.FromProjectLocationCollection("[PROJECT]", "[LOCATION]", "[COLLECTION]"),
            DataStore = new DataStore(),
            DataStoreId = "",
            CreateAdvancedSiteSearch = false,
            CmekConfigNameAsCmekConfigName = CmekConfigName.FromProjectLocation("[PROJECT]", "[LOCATION]"),
            SkipDefaultSchemaCreation = false,
        };
        // Make the request
        Operation<DataStore, CreateDataStoreMetadata> response = dataStoreServiceClient.CreateDataStore(request);

        // Poll until the returned long-running operation is complete
        Operation<DataStore, CreateDataStoreMetadata> completedResponse = response.PollUntilCompleted();
        // Retrieve the operation result
        DataStore result = completedResponse.Result;

        // Or get the name of the operation
        string operationName = response.Name;
        // This name can be stored, then the long-running operation retrieved later by name
        Operation<DataStore, CreateDataStoreMetadata> retrievedResponse = dataStoreServiceClient.PollOnceCreateDataStore(operationName);
        // Check if the retrieved long-running operation has completed
        if (retrievedResponse.IsCompleted)
        {
            // If it has completed, then access the result
            DataStore retrievedResult = retrievedResponse.Result;
        }
    }
}

Import documents

using Google.Cloud.DiscoveryEngine.V1;
using Google.LongRunning;
using Google.Protobuf.WellKnownTypes;

public sealed partial class GeneratedDocumentServiceClientSnippets
{
    /// <summary>Snippet for ImportDocuments</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void ImportDocumentsRequestObject()
    {
        // Create client
        DocumentServiceClient documentServiceClient = DocumentServiceClient.Create();
        // Initialize request argument(s)
        ImportDocumentsRequest request = new ImportDocumentsRequest
        {
            ParentAsBranchName = BranchName.FromProjectLocationDataStoreBranch("[PROJECT]", "[LOCATION]", "[DATA_STORE]", "[BRANCH]"),
            InlineSource = new ImportDocumentsRequest.Types.InlineSource(),
            ErrorConfig = new ImportErrorConfig(),
            ReconciliationMode = ImportDocumentsRequest.Types.ReconciliationMode.Unspecified,
            UpdateMask = new FieldMask(),
            AutoGenerateIds = false,
            IdField = "",
            ForceRefreshContent = false,
        };
        // Make the request
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> response = documentServiceClient.ImportDocuments(request);

        // Poll until the returned long-running operation is complete
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> completedResponse = response.PollUntilCompleted();
        // Retrieve the operation result
        ImportDocumentsResponse result = completedResponse.Result;

        // Or get the name of the operation
        string operationName = response.Name;
        // This name can be stored, then the long-running operation retrieved later by name
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> retrievedResponse = documentServiceClient.PollOnceImportDocuments(operationName);
        // Check if the retrieved long-running operation has completed
        if (retrievedResponse.IsCompleted)
        {
            // If it has completed, then access the result
            ImportDocumentsResponse retrievedResult = retrievedResponse.Result;
        }
    }
}

Go

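For more information, see the AI Applications Go API reference documentation.

To authenticate to AI Applications, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store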


package main

import (
	"context"

	discoveryengine "cloud.google.com/go/discoveryengine/apiv1"
	discoveryenginepb "cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := discoveryengine.NewDataStoreClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &discoveryenginepb.CreateDataStoreRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb#CreateDataStoreRequest.
	}
	op, err := c.CreateDataStore(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}

Import documents


package main

import (
	"context"

	discoveryengine "cloud.google.com/go/discoveryengine/apiv1"
	discoveryenginepb "cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := discoveryengine.NewDocumentClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &discoveryenginepb.ImportDocumentsRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb#ImportDocumentsRequest.
	}
	op, err := c.ImportDocuments(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}

Java

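For more information, see the AI Applications Java API reference documentation.

To authenticate to AI Applications, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store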

import com.google.cloud.discoveryengine.v1.CollectionName;
import com.google.cloud.discoveryengine.v1.CreateDataStoreRequest;
import com.google.cloud.discoveryengine.v1.DataStore;
import com.google.cloud.discoveryengine.v1.DataStoreServiceClient;

public class SyncCreateDataStore {

  public static void main(String[] args) throws Exception {
    syncCreateDataStore();
  }

  public static void syncCreateDataStore() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DataStoreServiceClient dataStoreServiceClient = DataStoreServiceClient.create()) {
      CreateDataStoreRequest request =
          CreateDataStoreRequest.newBuilder()
              .setParent(CollectionName.of("[PROJECT]", "[LOCATION]", "[COLLECTION]").toString())
              .setDataStore(DataStore.newBuilder().build())
              .setDataStoreId("dataStoreId929489618")
              .setCreateAdvancedSiteSearch(true)
              .setSkipDefaultSchemaCreation(true)
              .build();
      DataStore response = dataStoreServiceClient.createDataStoreAsync(request).get();
    }
  }
}

Import documents

import com.google.cloud.discoveryengine.v1.BranchName;
import com.google.cloud.discoveryengine.v1.DocumentServiceClient;
import com.google.cloud.discoveryengine.v1.ImportDocumentsRequest;
import com.google.cloud.discoveryengine.v1.ImportDocumentsResponse;
import com.google.cloud.discoveryengine.v1.ImportErrorConfig;
import com.google.protobuf.FieldMask;

public class SyncImportDocuments {

  public static void main(String[] args) throws Exception {
    syncImportDocuments();
  }

  public static void syncImportDocuments() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DocumentServiceClient documentServiceClient = DocumentServiceClient.create()) {
      ImportDocumentsRequest request =
          ImportDocumentsRequest.newBuilder()
              .setParent(
                  BranchName.ofProjectLocationDataStoreBranchName(
                          "[PROJECT]", "[LOCATION]", "[DATA_STORE]", "[BRANCH]")
                      .toString())
              .setErrorConfig(ImportErrorConfig.newBuilder().build())
              .setUpdateMask(FieldMask.newBuilder().build())
              .setAutoGenerateIds(true)
              .setIdField("idField1629396127")
              .setForceRefreshContent(true)
              .build();
      ImportDocumentsResponse response = documentServiceClient.importDocumentsAsync(request).get();
    }
  }
}

Node.js

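For more information, see the AI Applications Node.js API reference documentation.

To authenticate to AI Applications, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store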

/**
 * This snippet has been automatically generated and should be regarded as a code template only.
 * It will require modifications to work.
 * It may require correct/in-range values for request initialization.
 * TODO(developer): Uncomment these variables before running the sample.
 */
/**
 *  Resource name of the CmekConfig to use for protecting this DataStore.
 */
// const cmekConfigName = 'abc123'
/**
 *  DataStore without CMEK protections. If a default CmekConfig is set for
 *  the project, setting this field will override the default CmekConfig as
 *  well.
 */
// const disableCmek = true
/**
 *  Required. The parent resource name, such as
 *  `projects/{project}/locations/{location}/collections/{collection}`.
 */
// const parent = 'abc123'
/**
 *  Required. The DataStore google.cloud.discoveryengine.v1.DataStore  to
 *  create.
 */
// const dataStore = {}
/**
 *  Required. The ID to use for the
 *  DataStore google.cloud.discoveryengine.v1.DataStore, which will become
 *  the final component of the
 *  DataStore google.cloud.discoveryengine.v1.DataStore's resource name.
 *  This field must conform to RFC-1034 (https://tools.ietf.org/html/rfc1034)
 *  standard with a length limit of 63 characters. Otherwise, an
 *  INVALID_ARGUMENT error is returned.
 */
// const dataStoreId = 'abc123'
/**
 *  A boolean flag indicating whether user want to directly create an advanced
 *  data store for site search.
 *  If the data store is not configured as site
 *  search (GENERIC vertical and PUBLIC_WEBSITE content_config), this flag will
 *  be ignored.
 */
// const createAdvancedSiteSearch = true
/**
 *  A boolean flag indicating whether to skip the default schema creation for
 *  the data store. Only enable this flag if you are certain that the default
 *  schema is incompatible with your use case.
 *  If set to true, you must manually create a schema for the data store before
 *  any documents can be ingested.
 *  This flag cannot be specified if `data_store.starting_schema` is specified.
 */
// const skipDefaultSchemaCreation = true

// Imports the Discoveryengine library
const {DataStoreServiceClient} = require('@google-cloud/discoveryengine').v1;

// Instantiates a client
const discoveryengineClient = new DataStoreServiceClient();

async function callCreateDataStore() {
  // Construct request
  const request = {
    parent,
    dataStore,
    dataStoreId,
  };

  // Run request
  const [operation] = await discoveryengineClient.createDataStore(request);
  const [response] = await operation.promise();
  console.log(response);
}

callCreateDataStore();

Import documents

/**
 * This snippet has been automatically generated and should be regarded as a code template only.
 * It will require modifications to work.
 * It may require correct/in-range values for request initialization.
 * TODO(developer): Uncomment these variables before running the sample.
 */
/**
 *  The Inline source for the input content for documents.
 */
// const inlineSource = {}
/**
 *  Cloud Storage location for the input content.
 */
// const gcsSource = {}
/**
 *  BigQuery input source.
 */
// const bigquerySource = {}
/**
 *  FhirStore input source.
 */
// const fhirStoreSource = {}
/**
 *  Spanner input source.
 */
// const spannerSource = {}
/**
 *  Cloud SQL input source.
 */
// const cloudSqlSource = {}
/**
 *  Firestore input source.
 */
// const firestoreSource = {}
/**
 *  AlloyDB input source.
 */
// const alloyDbSource = {}
/**
 *  Cloud Bigtable input source.
 */
// const bigtableSource = {}
/**
 *  Required. The parent branch resource name, such as
 *  `projects/{project}/locations/{location}/collections/{collection}/dataStores/{data_store}/branches/{branch}`.
 *  Requires create/update permission.
 */
// const parent = 'abc123'
/**
 *  The desired location of errors incurred during the Import.
 */
// const errorConfig = {}
/**
 *  The mode of reconciliation between existing documents and the documents to
 *  be imported. Defaults to
 *  ReconciliationMode.INCREMENTAL google.cloud.discoveryengine.v1.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL.
 */
// const reconciliationMode = {}
/**
 *  Indicates which fields in the provided imported documents to update. If
 *  not set, the default is to update all fields.
 */
// const updateMask = {}
/**
 *  Whether to automatically generate IDs for the documents if absent.
 *  If set to `true`, Document.id (google.cloud.discoveryengine.v1.Document.id)
 *  values are automatically generated based on the hash of the payload, and
 *  IDs may not be consistent across multiple imports. In that case,
 *  ReconciliationMode.FULL (google.cloud.discoveryengine.v1.ImportDocumentsRequest.ReconciliationMode.FULL)
 *  is highly recommended to avoid duplicate contents. If unset or set to
 *  `false`, Document.id values have to be specified using
 *  id_field (google.cloud.discoveryengine.v1.ImportDocumentsRequest.id_field);
 *  otherwise, documents without IDs fail to be imported.
 *  Supported data sources:
 *  * GcsSource google.cloud.discoveryengine.v1.GcsSource.
 *  GcsSource.data_schema google.cloud.discoveryengine.v1.GcsSource.data_schema 
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * BigQuerySource google.cloud.discoveryengine.v1.BigQuerySource.
 *  BigQuerySource.data_schema google.cloud.discoveryengine.v1.BigQuerySource.data_schema 
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * SpannerSource google.cloud.discoveryengine.v1.SpannerSource.
 *  * CloudSqlSource google.cloud.discoveryengine.v1.CloudSqlSource.
 *  * FirestoreSource google.cloud.discoveryengine.v1.FirestoreSource.
 *  * BigtableSource google.cloud.discoveryengine.v1.BigtableSource.
 */
// const autoGenerateIds = true
/**
 *  The field indicates the ID field or column to be used as unique IDs of
 *  the documents.
 *  For GcsSource google.cloud.discoveryengine.v1.GcsSource  it is the key of
 *  the JSON field. For instance, `my_id` for JSON `{"my_id": "some_uuid"}`.
 *  For others, it may be the column name of the table where the unique ids are
 *  stored.
 *  The values of the JSON field or the table column are used as the
 *  Document.id google.cloud.discoveryengine.v1.Document.id s. The JSON field
 *  or the table column must be of string type, and the values must be set as
 *  valid strings conform to RFC-1034 (https://tools.ietf.org/html/rfc1034)
 *  with 1-63 characters. Otherwise, documents without valid IDs fail to be
 *  imported.
 *  Only set this field when
 *  auto_generate_ids google.cloud.discoveryengine.v1.ImportDocumentsRequest.auto_generate_ids 
 *  is unset or set as `false`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  If it is unset, a default value `_id` is used when importing from the
 *  allowed data sources.
 *  Supported data sources:
 *  * GcsSource google.cloud.discoveryengine.v1.GcsSource.
 *  GcsSource.data_schema google.cloud.discoveryengine.v1.GcsSource.data_schema 
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * BigQuerySource google.cloud.discoveryengine.v1.BigQuerySource.
 *  BigQuerySource.data_schema google.cloud.discoveryengine.v1.BigQuerySource.data_schema 
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * SpannerSource google.cloud.discoveryengine.v1.SpannerSource.
 *  * CloudSqlSource google.cloud.discoveryengine.v1.CloudSqlSource.
 *  * FirestoreSource google.cloud.discoveryengine.v1.FirestoreSource.
 *  * BigtableSource google.cloud.discoveryengine.v1.BigtableSource.
 */
// const idField = 'abc123'
/**
 *  Optional. Whether to force refresh the unstructured content of the
 *  documents.
 *  If set to `true`, the content part of the documents will be refreshed
 *  regardless of the update status of the referencing content.
 */
// const forceRefreshContent = true

// Imports the Discoveryengine library
const {DocumentServiceClient} = require('@google-cloud/discoveryengine').v1;

// Instantiates a client
const discoveryengineClient = new DocumentServiceClient();

async function callImportDocuments() {
  // Construct request
  const request = {
    parent,
  };

  // Run request
  const [operation] = await discoveryengineClient.importDocuments(request);
  const [response] = await operation.promise();
  console.log(response);
}

callImportDocuments();

Python

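For more information, see the AI Applications Python API reference documentation.

To authenticate to AI Applications, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store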


from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    #  For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name

Import documents

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"

# Examples:
# - Unstructured documents
#   - `gs://bucket/directory/file.pdf`
#   - `gs://bucket/directory/*.pdf`
# - Unstructured documents with JSONL Metadata
#   - `gs://bucket/directory/file.json`
# - Unstructured documents with CSV Metadata
#   - `gs://bucket/directory/file.csv`
# gcs_uri = "YOUR_GCS_PATH"

#  For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.DocumentServiceClient(client_options=client_options)

# The full resource name of the search engine branch.
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    gcs_source=discoveryengine.GcsSource(
        # Multiple URIs are supported
        input_uris=[gcs_uri],
        # Options:
        # - `content` - Unstructured documents (PDF, HTML, DOC, TXT, PPTX)
        # - `custom` - Unstructured documents with custom JSONL metadata
        # - `document` - Structured documents in the discoveryengine.Document format.
        # - `csv` - Unstructured documents with CSV metadata
        data_schema="content",
    ),
    # Options: `FULL`, `INCREMENTAL`
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

# Make the request
operation = client.import_documents(request=request)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)
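
Both Python samples above pick the API endpoint from the data store location and build a parent resource name. The following standard-library sketch isolates that logic so it can be checked without calling the API. The helper names `api_endpoint` and `branch_resource_name` are hypothetical, the `eu` location is only an illustrative value, and the path uses the collection-qualified form quoted in the Node.js sample comments.

```python
def api_endpoint(location: str) -> str:
    """Return the Discovery Engine API host for a data store location."""
    # The samples pass no client options for "global", which leaves the
    # client on the default host; other locations get a location prefix.
    if location == "global":
        return "discoveryengine.googleapis.com"
    return f"{location}-discoveryengine.googleapis.com"


def branch_resource_name(
    project_id: str,
    location: str,
    data_store_id: str,
    collection: str = "default_collection",
    branch: str = "default_branch",
) -> str:
    """Build the parent branch name in its collection-qualified form."""
    return (
        f"projects/{project_id}/locations/{location}"
        f"/collections/{collection}/dataStores/{data_store_id}"
        f"/branches/{branch}"
    )


if __name__ == "__main__":
    print(api_endpoint("global"))
    print(api_endpoint("eu"))
    print(branch_resource_name("my-project", "eu", "my-store"))
```

In real code, prefer the client library's own path helpers, as the samples above do; this sketch only makes the naming rules visible.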

Ruby

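For more information, see the AI Applications Ruby API reference documentation.

To authenticate to AI Applications, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store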

require "google/cloud/discovery_engine/v1"

##
# Snippet for the create_data_store call in the DataStoreService service
#
# This snippet has been automatically generated and should be regarded as a code
# template only. It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
# client as shown in https://cloud.google.com/ruby/docs/reference.
#
# This is an auto-generated example demonstrating basic usage of
# Google::Cloud::DiscoveryEngine::V1::DataStoreService::Client#create_data_store.
#
def create_data_store
  # Create a client object. The client can be reused for multiple calls.
  client = Google::Cloud::DiscoveryEngine::V1::DataStoreService::Client.new

  # Create a request. To set request fields, pass in keyword arguments.
  request = Google::Cloud::DiscoveryEngine::V1::CreateDataStoreRequest.new

  # Call the create_data_store method.
  result = client.create_data_store request

  # The returned object is of type Gapic::Operation. You can use it to
  # check the status of an operation, cancel it, or wait for results.
  # Here is how to wait for a response.
  result.wait_until_done! timeout: 60
  if result.response?
    p result.response
  else
    puts "No response received."
  end
end

Import documents

require "google/cloud/discovery_engine/v1"

##
# Snippet for the import_documents call in the DocumentService service
#
# This snippet has been automatically generated and should be regarded as a code
# template only. It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
# client as shown in https://cloud.google.com/ruby/docs/reference.
#
# This is an auto-generated example demonstrating basic usage of
# Google::Cloud::DiscoveryEngine::V1::DocumentService::Client#import_documents.
#
def import_documents
  # Create a client object. The client can be reused for multiple calls.
  client = Google::Cloud::DiscoveryEngine::V1::DocumentService::Client.new

  # Create a request. To set request fields, pass in keyword arguments.
  request = Google::Cloud::DiscoveryEngine::V1::ImportDocumentsRequest.new

  # Call the import_documents method.
  result = client.import_documents request

  # The returned object is of type Gapic::Operation. You can use it to
  # check the status of an operation, cancel it, or wait for results.
  # Here is how to wait for a response.
  result.wait_until_done! timeout: 60
  if result.response?
    p result.response
  else
    puts "No response received."
  end
end

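Connect to Cloud Storage with periodic syncing

Before importing your data, review Prepare data for ingesting.

The following procedure describes how to create a data connector that links a Cloud Storage location to Vertex AI Search, and how to specify a folder or file in that location for the data store that you want to create. Data stores that are children of data connectors are called entity data stores.

Data is synced periodically to the entity data store. You can specify synchronization daily, every three days, or every five days.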

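Console

In the Google Cloud console, go to the AI Applications page.

AI Applications
Go to the Data Stores page.
Click Create data store.
On the Source page, select Cloud Storage.
Select the type of data that you are importing.
Click Periodic.
Select the Synchronization frequency, which is how often the Vertex AI Search connector syncs with the Cloud Storage location. You can change the frequency later.
In the Select a folder or a file you want to import section, select Folder or File.
Click Browse, choose the data that you prepared for ingesting, and then click Select. Alternatively, enter the location directly in the gs:// field.
Click Continue.
Choose a region for your data connector.
Enter a name for your data connector.
Optional: If you selected unstructured documents, you can select parsing and chunking options for your documents. To compare parsers, see Parse documents. For information about chunking, see Chunk documents for RAG.

The OCR parser and layout parser can incur additional costs. See Document AI feature pricing.

To select a parser, expand Document processing options and specify the parser options that you want to use.
Click Create.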

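You have now created a data connector that periodically syncs data with the Cloud Storage location. You have also created an entity data store named gcs_store.
To check the status of your ingestion, go to the Data Stores page, click your data connector name, and view its details on the Data page.

On the Data ingestion activity tab, when the Status column changes from In progress to Succeeded, the first ingestion is complete.

Depending on the size of your data, ingestion can take from several minutes to several hours.

After you set up your data source and import data for the first time, data is synced from that source at the frequency that you selected during setup. The first sync occurs about one hour after the data connector is created. Subsequent syncs then occur at intervals of about 24, 72, or 120 hours.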
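
The timing described above (a first sync about one hour after the connector is created, then one sync every 24, 72, or 120 hours) can be sketched as a small estimate. This is only an approximation of the stated schedule: the function name and the frequency keys are hypothetical labels, not values used by the console or the API, and actual sync times are approximate.

```python
from __future__ import annotations

from datetime import datetime, timedelta

# Hours between periodic syncs for each frequency option (hypothetical keys).
SYNC_INTERVAL_HOURS = {"daily": 24, "every_3_days": 72, "every_5_days": 120}


def estimate_sync_times(
    created_at: datetime, frequency: str, count: int = 3
) -> list[datetime]:
    """Estimate the first `count` sync times for a data connector."""
    if frequency not in SYNC_INTERVAL_HOURS:
        raise ValueError(f"unknown frequency: {frequency!r}")
    # The first sync happens roughly one hour after the connector is created.
    first_sync = created_at + timedelta(hours=1)
    interval = timedelta(hours=SYNC_INTERVAL_HOURS[frequency])
    return [first_sync + i * interval for i in range(count)]


if __name__ == "__main__":
    for moment in estimate_sync_times(datetime(2025, 1, 1, 9, 0), "every_3_days"):
        print(moment.isoformat())
```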

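Next steps

To attach your data store to an app, create an app and select your data store by following the steps in Create a search app.
To preview how your search results appear after your app and data store are set up, see Get search results.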

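Connect to Google Drive

AI Applications can search data from Google Drive by using data federation, which retrieves information directly from the specified data source. Because the data isn't copied into the Vertex AI Search index, you don't need to worry about data storage.

Before you begin

You must be signed in to the Google Cloud console with the same account that you use for the Google Drive instance that you plan to connect. AI Applications uses your Google Workspace customer ID to connect to Google Drive.

To enforce data source access control and secure your data in AI Applications, you must configure your identity provider.

Make sure that all documents are accessible, either by placing them in a shared drive that is owned by the domain or by assigning their ownership to a user in the domain.
To connect Google Drive data to AI Applications, you must turn on Google Workspace smart features. For more information, see Turn Google Workspace smart features on or off.

If you use security controls, be aware of their limitations related to data in Google Drive, as described in the following table:

Security control	Note the following
Data residency (DRZ)	AI Applications guarantees data residency in Google Cloud only. For information about data residency and Google Drive, see the Google Workspace compliance guidance and documentation, for example, Choose the region where data is stored and Digital sovereignty.
Customer-managed encryption keys (CMEK)	Your keys encrypt only data within Google Cloud. Cloud Key Management Service controls don't apply to data stored in Google Drive.
Access Transparency	Access Transparency logs actions taken by Google personnel on the Google Cloud project. You also need to review the Access Transparency logs created by Google Workspace. For more information, see Access Transparency log events in the Google Workspace Admin Help documentation.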

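Create a Google Drive data store

Console

To use the console to make Google Drive data searchable, follow these steps:

In the Google Cloud console, go to the AI Applications page.

AI Applications
In the navigation menu, click Data Stores.
Click Create data store.
On the Select a data source page, select Google Drive.
Specify the drive source for your data store.
- All: Add your entire drive to the data store.
- Specific shared drive(s): Add the shared drive's folder ID.
- Specific shared folder(s): Add the shared folder's ID.
To find a shared drive's folder ID or a specific folder's ID, go to the shared drive or folder and copy the ID from the URL. The URL follows this format: https://drive.google.com/corp/drive/folders/ID.

For example, https://drive.google.com/corp/drive/folders/123456789012345678901.
Click Continue.
Choose a region for your data store.
Enter a name for your data store.
Optional: To exclude the data in this data store from being used for generative AI content when you query data using the app, click Generative AI options and select Exclude from generative AI features.
Click Create.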
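
For the step above that copies a shared drive or folder ID from its URL, the following sketch shows one way to extract the ID programmatically. It assumes only the `.../folders/ID` URL shape given in the steps; the function name `drive_folder_id` is hypothetical.

```python
from urllib.parse import urlparse


def drive_folder_id(url: str) -> str:
    """Extract the ID that follows `/folders/` in a Google Drive URL."""
    # Split the URL path into non-empty segments; query strings are ignored.
    segments = [segment for segment in urlparse(url).path.split("/") if segment]
    if "folders" not in segments:
        raise ValueError("URL does not contain a /folders/ segment")
    index = segments.index("folders")
    if index + 1 >= len(segments):
        raise ValueError("URL has no ID after /folders/")
    return segments[index + 1]


if __name__ == "__main__":
    print(drive_folder_id(
        "https://drive.google.com/corp/drive/folders/123456789012345678901"
    ))
```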

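Error messages

The following table describes error messages that you might encounter when working with this Google data source, and includes HTTP error codes and suggested troubleshooting steps.

Error code	Error message	Description	Troubleshooting
403 (Permission Denied)	Searching using service account credentials isn't supported for Google Workspace data stores.	The engine being searched has Google Workspace data stores, and the credentials passed are those of a service account. Searching using service account credentials on Google Workspace data stores isn't supported.	Call search using user credentials, or remove the Google Workspace data stores from the engine.
403 (Permission Denied)	Consumer accounts aren't supported for Google Workspace data stores.	Search is called using a consumer account (@gmail.com) credential, which isn't supported for Google Workspace data stores.	Remove the Google Workspace data stores from the engine, or use a managed Google Account.
403 (Permission Denied)	The customer ID of the data store doesn't match.	Search is allowed only for users who belong to the same organization as the Google Workspace data stores.	Remove the Google Workspace data stores from the engine, or contact support if the user and the Google Workspace data stores are meant to be in different organizations.
400 (Invalid Argument)	An engine can't contain both default and shared Google Drive data stores.	You can't connect a data store that has all of your drives (the default) and a data store that has specific shared drives to the same app.	To connect a new Google Drive data source to your app, first unlink the unnecessary data store, and then add the new data store that you want to use.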

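Troubleshooting

If a search doesn't return the file that you're looking for, it might be because of the following search index limitations:

Only 1 MB of text and formatting data can be extracted from a file to make it searchable.
For most file types, the file size can't exceed 10 MB. The following are exceptions:
- XLSX files can't exceed 20 MB.
- PDF files can't exceed 30 MB.
- Text files can't exceed 100 MB.
Note: Files that exceed the size limit aren't searchable and don't appear in search results.
Optical character recognition in PDF files is limited to 80 pages. PDFs that are larger than 50 MB or 80 pages aren't indexed, and keywords beyond the 1 MB index limit aren't searchable.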
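
As a rough pre-check against the per-type file size limits listed above, a sketch like the following can flag files that are unlikely to be searchable. It encodes only the 10, 20, 30, and 100 MB figures from the list. Treating 1 MB as 1024 × 1024 bytes, mapping "text files" to the `.txt` extension, and the function name are all assumptions made for illustration.

```python
from pathlib import PurePath

MB = 1024 * 1024  # assumption: the documented limits use binary megabytes

DEFAULT_LIMIT_MB = 10
# Exceptions to the default limit, keyed by file extension (assumed mapping).
LIMITS_MB_BY_EXTENSION = {".xlsx": 20, ".pdf": 30, ".txt": 100}


def within_index_size_limit(filename: str, size_bytes: int) -> bool:
    """Return True if the file size is within the limit for its type."""
    suffix = PurePath(filename).suffix.lower()
    limit_mb = LIMITS_MB_BY_EXTENSION.get(suffix, DEFAULT_LIMIT_MB)
    return size_bytes <= limit_mb * MB


if __name__ == "__main__":
    print(within_index_size_limit("report.pdf", 25 * MB))
    print(within_index_size_limit("report.docx", 25 * MB))
```

This check covers file size only; the page-count limit for optical character recognition and the 1 MB extraction limit are not modeled.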

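Next steps

To attach your data store to an app, create an app and select your data store by following the steps in Create a search app.
To get search results after your app and data store are set up, see Get search results.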

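Connect to Gmail

Note: To connect Gmail data to AI Applications, you must turn on Google Workspace smart features. For more information, see Turn Google Workspace smart features on or off.

Use the following steps to create a data store that connects to Gmail in the Google Cloud console. After the data store is connected, you can attach it to a search app and search your Gmail data.

Before you begin

You must be signed in to the Google Cloud console with the same account that you use for the Google Workspace instance that you plan to connect. Vertex AI Search uses your Google Workspace customer ID to connect to Gmail.

To enforce data source access control and secure your data in AI Applications, you must configure your identity provider.

Limitations

If you use security controls, be aware of their limitations related to data in Gmail, as described in the following table:

Security control	Note the following
Data residency (DRZ)	AI Applications guarantees data residency in Google Cloud only. For information about data residency and Gmail, see the Google Workspace compliance guidance and documentation, for example, Choose the region where data is stored and Digital sovereignty.
Customer-managed encryption keys (CMEK)	Your keys encrypt only data within Google Cloud. Cloud Key Management Service controls don't apply to data stored in Gmail.
Access Transparency	Access Transparency logs actions taken by Google personnel on the Google Cloud project. You also need to review the Access Transparency logs created by Google Workspace. For more information, see Access Transparency log events in the Google Workspace Admin Help documentation.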

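Create a Gmail data store

Console

To use the console to make Gmail data searchable, follow these steps:

In the Google Cloud console, go to the AI Applications page.

AI Applications
In the navigation menu, click Data Stores.
Click Create data store.
On the Select a data source page, select Google Gmail.
Choose a region for your data store.
Enter a name for your data store.
Click Create.
Follow the steps in Create a search app to attach the created data store to a Vertex AI Search app.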

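Error messages

The following table describes error messages that you might encounter when working with this Google data source, and includes HTTP error codes and suggested troubleshooting steps.

Error code	Error message	Description	Troubleshooting
403 (Permission Denied)	Searching using service account credentials isn't supported for Google Workspace data stores.	The engine being searched has Google Workspace data stores, and the credentials passed are those of a service account. Searching using service account credentials on Google Workspace data stores isn't supported.	Call search using user credentials, or remove the Google Workspace data stores from the engine.
403 (Permission Denied)	Consumer accounts aren't supported for Google Workspace data stores.	Search is called using a consumer account (@gmail.com) credential, which isn't supported for Google Workspace data stores.	Remove the Google Workspace data stores from the engine, or use a managed Google Account.
403 (Permission Denied)	The customer ID of the data store doesn't match.	Search is allowed only for users who belong to the same organization as the Google Workspace data stores.	Remove the Google Workspace data stores from the engine, or contact support if the user and the Google Workspace data stores are meant to be in different organizations.
400 (Invalid Argument)	An engine can't contain both default and shared Google Drive data stores.	You can't connect a data store that has all of your drives (the default) and a data store that has specific shared drives to the same app.	To connect a new Google Drive data source to your app, first unlink the unnecessary data store, and then add the new data store that you want to use.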

Next steps

To preview how your search results appear after your app and data store are set up, see Preview search results.

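Connect to Google Sites

To search data from Google Sites, use the following steps to create a connector using the Google Cloud console.

Before you begin, check the following:

You must be signed in to the Google Cloud console with the same account that you use for the Google Workspace instance that you plan to connect. Vertex AI Search uses your Google Workspace customer ID to connect to Google Sites.
To enforce data source access control and secure your data in AI Applications, you must configure your identity provider.

If you use security controls, be aware of their limitations related to data in Google Sites, as described in the following table:

Security control	Note the following
Data residency (DRZ)	AI Applications guarantees data residency in Google Cloud only. For information about data residency and Google Sites, see the Google Workspace compliance guidance and documentation, for example, Choose the region where data is stored and Digital sovereignty.
Customer-managed encryption keys (CMEK)	Your keys encrypt only data within Google Cloud. Cloud Key Management Service controls don't apply to data stored in Google Sites.
Access Transparency	Access Transparency logs actions taken by Google personnel on the Google Cloud project. You also need to review the Access Transparency logs created by Google Workspace. For more information, see Access Transparency log events in the Google Workspace Admin Help documentation.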

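Console

To use the console to make Google Sites data searchable, follow these steps:

In the Google Cloud console, go to the AI Applications page.

AI Applications
Go to the Data Stores page.
Click New data store.
On the Source page, select Google Sites.
Choose a region for your data store.
Enter a name for your data store.
Click Create.

Next steps

To attach your data store to an app, create an app and select your data store by following the steps in Create a search app.
To preview how your search results appear after your app and data store are set up, see Get search results.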

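Connect to Google Calendar

To search data from Google Calendar, use the following steps to create a connector using the Google Cloud console.

Before you begin

You must be signed in to the Google Cloud console with the same account that you use for the Google Workspace instance that you plan to connect. Vertex AI Search uses your Google Workspace customer ID to connect to Google Calendar.

To enforce data source access control and secure your data in AI Applications, you must configure your identity provider.

If you use security controls, be aware of their limitations related to data in Google Calendar, as described in the following table:

Security control	Note the following
Data residency (DRZ)	AI Applications guarantees data residency in Google Cloud only. For information about data residency and Google Calendar, see the Google Workspace compliance guidance and documentation, for example, Choose the region where data is stored and Digital sovereignty.
Customer-managed encryption keys (CMEK)	Your keys encrypt only data within Google Cloud. Cloud Key Management Service controls don't apply to data stored in Google Calendar.
Access Transparency	Access Transparency logs actions taken by Google personnel on the Google Cloud project. You also need to review the Access Transparency logs created by Google Workspace. For more information, see Access Transparency log events in the Google Workspace Admin Help documentation.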

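Create a Google Calendar data store

Console

To use the console to make Google Calendar data searchable, follow these steps:

In the Google Cloud console, go to the AI Applications page.

AI Applications
In the navigation menu, click Data Stores.
Click Create data store.
On the Select a data source page, select Google Calendar.
Choose a region for your data store.
Enter a name for your data store.
Click Create.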

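Error messages

The following table describes error messages that you might encounter when working with this Google data source, and includes HTTP error codes and suggested troubleshooting steps.

Error code	Error message	Description	Troubleshooting
403 (Permission Denied)	Searching using service account credentials isn't supported for Google Workspace data stores.	The engine being searched has Google Workspace data stores, and the credentials passed are those of a service account. Searching using service account credentials on Google Workspace data stores isn't supported.	Call search using user credentials, or remove the Google Workspace data stores from the engine.
403 (Permission Denied)	Consumer accounts aren't supported for Google Workspace data stores.	Search is called using a consumer account (@gmail.com) credential, which isn't supported for Google Workspace data stores.	Remove the Google Workspace data stores from the engine, or use a managed Google Account.
403 (Permission Denied)	The customer ID of the data store doesn't match.	Search is allowed only for users who belong to the same organization as the Google Workspace data stores.	Remove the Google Workspace data stores from the engine, or contact support if the user and the Google Workspace data stores are meant to be in different organizations.
400 (Invalid Argument)	An engine can't contain both default and shared Google Drive data stores.	You can't connect a data store that has all of your drives (the default) and a data store that has specific shared drives to the same app.	To connect a new Google Drive data source to your app, first unlink the unnecessary data store, and then add the new data store that you want to use.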

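Next steps

To attach your data store to an app, create an app and select your data store by following the instructions in Create a search app.
To get search results after your app and data store are set up, see Get search results.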

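Connect to Google Groups

To search data from Google Groups, use the following steps to create a connector using the Google Cloud console.

Before you begin, check the following:

You must be signed in to the Google Cloud console with the same account that you use for the Google Workspace instance that you plan to connect. Vertex AI Search uses your Google Workspace customer ID to connect to Google Groups.
To enforce data source access control and secure your data in AI Applications, you must configure your identity provider.

If you use security controls, be aware of their limitations related to data in Google Groups, as described in the following table:

Security control	Note the following
Data residency (DRZ)	AI Applications guarantees data residency in Google Cloud only. For information about data residency and Google Groups, see the Google Workspace compliance guidance and documentation, for example, Choose the region where data is stored and Digital sovereignty.
Customer-managed encryption keys (CMEK)	Your keys encrypt only data within Google Cloud. Cloud Key Management Service controls don't apply to data stored in Google Groups.
Access Transparency	Access Transparency logs actions taken by Google personnel on the Google Cloud project. You also need to review the Access Transparency logs created by Google Workspace. For more information, see Access Transparency log events in the Google Workspace Admin Help documentation.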

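Console

To use the console to make Google Groups data searchable, follow these steps:

In the Google Cloud console, go to the AI Applications page.

AI Applications
Go to the Data Stores page.
Click New data store.
On the Source page, select Google Groups.
Choose a region for your data store.
Enter a name for your data store.
Click Create. Depending on the size of your data, ingestion can take from several minutes to several hours. Wait at least an hour before using your data store for searching.

Next steps

To attach your data store to an app, create an app and select your data store by following the steps in Create a search app.
To preview how your search results appear after your app and data store are set up, see Get search results.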

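Import from Cloud SQL

To ingest data from Cloud SQL, use the following steps to set up Cloud SQL access, create a data store, and ingest data.

Set up staging bucket access for Cloud SQL instances

When you ingest data from Cloud SQL, the data is first staged to a Cloud Storage bucket. To give a Cloud SQL instance access to Cloud Storage buckets, follow these steps:

In the Google Cloud console, go to the SQL page.

SQL
Click the Cloud SQL instance that you plan to import from.
Copy the identifier for the instance's service account, which looks like an email address, for example, p9876-abcd33f@gcp-sa-cloud-sql.iam.gserviceaccount.com.
Go to the IAM & Admin page.

IAM & Admin
Click Grant access.
For New principals, enter the instance's service account identifier and select the Cloud Storage > Storage Admin role.
Click Save.

If your Cloud SQL data is in the same project as Vertex AI Search, go to Import data from Cloud SQL.
If your Cloud SQL data is in a different project than your Vertex AI Search project, go to Set up Cloud SQL access from a different project.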

Set up Cloud SQL access from a different project

To give Vertex AI Search access to Cloud SQL data that's in a different project, follow these steps:

Replace the following PROJECT_NUMBER variable with your Vertex AI Search project number, and then copy the contents of the code block. This is your Vertex AI Search service account identifier:
```
service-PROJECT_NUMBER@gcp-sa-discoveryengine.iam.gserviceaccount.com
```
Go to the IAM & Admin page.

IAM & Admin
On the IAM & Admin page, switch to your Cloud SQL project and click Grant access.
For New principals, enter the identifier for the service account and select the Cloud SQL > Cloud SQL Viewer role.
Click Save.
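
The service account identifier in the steps above follows a fixed pattern, so it can be derived from the project number. The following is a minimal sketch with a hypothetical helper name. The numeric check reflects that the placeholder is a project number rather than a project ID.

```python
def discovery_engine_service_account(project_number: str) -> str:
    """Build the Vertex AI Search service account identifier."""
    # Project numbers are numeric; a project ID such as "my-project" is a
    # different value and does not belong in this identifier.
    if not project_number.isdigit():
        raise ValueError("expected a numeric project number, not a project ID")
    return (
        f"service-{project_number}"
        "@gcp-sa-discoveryengine.iam.gserviceaccount.com"
    )


if __name__ == "__main__":
    print(discovery_engine_service_account("123456789012"))
```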

Next, go to Import data from Cloud SQL.

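Import data from Cloud SQL

Console

To use the console to ingest data from Cloud SQL, follow these steps:

In the Google Cloud console, go to the AI Applications page.

AI Applications
Go to the Data Stores page.
Click New data store.
On the Source page, select Cloud SQL.
Specify the project ID, instance ID, database ID, and table ID of the data that you plan to import.
Click Browse, choose an intermediate Cloud Storage location to export data to, and then click Select. Alternatively, enter the location directly in the gs:// field.
Select whether to turn on serverless export. Serverless export incurs additional cost. For information about serverless export, see Minimize the performance impact of exports in the Cloud SQL documentation.
Click Continue.
Choose a region for your data store.
Enter a name for your data store.
Click Create.
To check the status of your ingestion, go to the Data Stores page, click your data store name, and view its details on the Data page. When the Status column on the Activity tab changes from In progress to Import completed, the ingestion is complete.

Depending on the size of your data, ingestion can take from several minutes to several hours.

REST

To use the command line to create a data store and ingest data from Cloud SQL, follow these steps:

Create a data store.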
```
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: PROJECT_ID" \
"https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
-d '{
  "displayName": "DISPLAY_NAME",
  "industryVertical": "GENERIC",
  "solutionTypes": ["SOLUTION_TYPE_SEARCH"]
}'
```
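Replace the following:
- PROJECT_ID: the ID of your project.
- DATA_STORE_ID: the ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
- DISPLAY_NAME: the display name of the data store. This might be displayed in the Google Cloud console.
Note: The GENERIC industry vertical is used to create structured, unstructured, and website data stores for custom search apps.
Import data from Cloud SQL.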
```
  curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
  -d '{
    "cloudSqlSource": {
      "projectId": "SQL_PROJECT_ID",
      "instanceId": "INSTANCE_ID",
      "databaseId": "DATABASE_ID",
      "tableId": "TABLE_ID",
      "gcsStagingDir": "STAGING_DIRECTORY"
    },
    "reconciliationMode": "RECONCILIATION_MODE",
    "autoGenerateIds": "AUTO_GENERATE_IDS",
    "idField": "ID_FIELD"
  }'
```
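Replace the following:
- PROJECT_ID: the ID of your Vertex AI Search project.
- DATA_STORE_ID: the ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
- SQL_PROJECT_ID: the ID of your Cloud SQL project.
- INSTANCE_ID: the ID of your Cloud SQL instance.
- DATABASE_ID: the ID of your Cloud SQL database.
- TABLE_ID: the ID of your Cloud SQL table.
- STAGING_DIRECTORY: optional. A Cloud Storage directory, for example, gs://<your-gcs-bucket>/directory/import_errors.
- RECONCILIATION_MODE: optional. Values are FULL and INCREMENTAL. The default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from Cloud SQL to your data store. This performs an upsert operation, which adds new documents and replaces existing documents with updated documents that have the same ID. Specifying FULL causes a full rebase of the documents in your data store. In other words, new and updated documents are added to your data store, and documents that aren't in Cloud SQL are removed from your data store. FULL mode is helpful if you want to automatically delete documents that you no longer need.
- AUTO_GENERATE_IDS: optional. Whether to automatically generate document IDs when they are absent. If set to true, IDs are generated from a hash of the payload and might not be consistent across imports, so FULL reconciliation mode is recommended to avoid duplicate content.
- ID_FIELD: optional. The field or column whose values are used as the document IDs. Set this field only when AUTO_GENERATE_IDS is unset or false; otherwise, an INVALID_ARGUMENT error is returned. If unset, the default value _id is used.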
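
The request in the import step above can also be assembled programmatically. The following standard-library sketch builds the URL and JSON body and applies the rules stated in this section before anything is sent: the data store ID character set, the two reconciliation modes, and the rule from the request field comments that an ID field must not be combined with automatic ID generation. The function name and parameter names are hypothetical, and the sketch does not send the request.

```python
from __future__ import annotations

import json
import re

API_ROOT = "https://discoveryengine.googleapis.com/v1"
# Lowercase letters, digits, underscores, and hyphens only.
_DATA_STORE_ID = re.compile(r"[a-z0-9_-]+")
_MODES = ("INCREMENTAL", "FULL")


def build_cloud_sql_import(
    project_id: str,
    data_store_id: str,
    sql_project_id: str,
    instance_id: str,
    database_id: str,
    table_id: str,
    *,
    staging_dir: str | None = None,
    reconciliation_mode: str = "INCREMENTAL",
    auto_generate_ids: bool = False,
    id_field: str | None = None,
) -> tuple[str, str]:
    """Return the (url, json_body) pair for a documents:import call."""
    if not _DATA_STORE_ID.fullmatch(data_store_id):
        raise ValueError("data store ID has unsupported characters")
    if reconciliation_mode not in _MODES:
        raise ValueError(f"reconciliation mode must be one of {_MODES}")
    if auto_generate_ids and id_field:
        raise ValueError("set id_field only when auto_generate_ids is false")

    source = {
        "projectId": sql_project_id,
        "instanceId": instance_id,
        "databaseId": database_id,
        "tableId": table_id,
    }
    if staging_dir:
        source["gcsStagingDir"] = staging_dir

    body: dict = {"cloudSqlSource": source, "reconciliationMode": reconciliation_mode}
    if auto_generate_ids:
        body["autoGenerateIds"] = True
    elif id_field:
        body["idField"] = id_field

    url = (
        f"{API_ROOT}/projects/{project_id}/locations/global"
        f"/collections/default_collection/dataStores/{data_store_id}"
        "/branches/0/documents:import"
    )
    return url, json.dumps(body, indent=2)


if __name__ == "__main__":
    url, body = build_cloud_sql_import(
        "my-project", "my_store", "sql-project", "inst", "db", "tbl", id_field="my_id"
    )
    print(url)
    print(body)
```

Serializing with `json.dumps` also avoids hand-written JSON mistakes such as trailing commas.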

Python

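For more information, see the AI Applications Python API reference documentation.

To authenticate to AI Applications, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store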


from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    #  For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name

Import documents

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"
# sql_project_id = "YOUR_SQL_PROJECT_ID"
# sql_instance_id = "YOUR_SQL_INSTANCE_ID"
# sql_database_id = "YOUR_SQL_DATABASE_ID"
# sql_table_id = "YOUR_SQL_TABLE_ID"

#  For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.DocumentServiceClient(client_options=client_options)

# The full resource name of the search engine branch.
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    cloud_sql_source=discoveryengine.CloudSqlSource(
        project_id=sql_project_id,
        instance_id=sql_instance_id,
        database_id=sql_database_id,
        table_id=sql_table_id,
    ),
    # Options: `FULL`, `INCREMENTAL`
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

# Make the request
operation = client.import_documents(request=request)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)

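Next steps

To attach your data store to an app, create an app and select your data store by following the steps in Create a search app.
To preview how your search results appear after your app and data store are set up, see Get search results.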

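Import from Spanner

To ingest data from Spanner, use the following steps to create a data store and ingest data using either the Google Cloud console or the API.

Set up Spanner access from a different project

If your Spanner data is in the same project as Vertex AI Search, skip to Import data from Spanner.

To give Vertex AI Search access to Spanner data that is in a different project, follow these steps:

Replace the following PROJECT_NUMBER variable with your Vertex AI Search project number, and then copy the contents of this code block. This is your Vertex AI Search service account identifier: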
```
service-PROJECT_NUMBER@gcp-sa-discoveryengine.iam.gserviceaccount.com
```
Go to the IAM & Admin page.

IAM & Admin
On the IAM & Admin page, switch to your Spanner project and click Grant Access.
For New principals, enter the service account identifier and select one of the following:
- If you don't want to use Data Boost during import, select the Cloud Spanner > Cloud Spanner Database Reader role.
- If you want to use Data Boost during import, select the Cloud Spanner > Cloud Spanner Database Admin role, or a custom role with the permissions of Cloud Spanner Database Reader and spanner.databases.useDataBoost. For information about Data Boost, see Data Boost overview in the Spanner documentation.
Click Save.
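
The role choice in the steps above can be sketched as a small helper. This is a minimal illustration and not part of the official samples: the function names are hypothetical, and the role IDs `roles/spanner.databaseReader` and `roles/spanner.databaseAdmin` are assumed to be the IAM identifiers behind the Cloud Spanner Database Reader and Cloud Spanner Database Admin roles shown in the console.

```python
def service_agent_email(project_number: str) -> str:
    """Build the Vertex AI Search service account identifier for a project number."""
    if not str(project_number).isdigit():
        raise ValueError("project_number must be numeric, for example '123456789012'")
    return f"service-{project_number}@gcp-sa-discoveryengine.iam.gserviceaccount.com"


def spanner_role_for_import(use_data_boost: bool) -> str:
    """Pick a predefined role ID (assumed) based on whether Data Boost is used.

    A custom role with Database Reader permissions plus
    spanner.databases.useDataBoost is an alternative that is not modeled here.
    """
    if use_data_boost:
        return "roles/spanner.databaseAdmin"
    return "roles/spanner.databaseReader"


def iam_binding(project_number: str, use_data_boost: bool) -> dict:
    """Return an IAM-policy-style binding to grant on the Spanner project."""
    return {
        "role": spanner_role_for_import(use_data_boost),
        "members": [f"serviceAccount:{service_agent_email(project_number)}"],
    }


# "123456789012" is a made-up project number used only for illustration.
print(iam_binding("123456789012", use_data_boost=False))
```

The helper only assembles the binding; granting it still happens in the console as described above, or through whatever IAM tooling you already use.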

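Next, go to Import data from Spanner.

Import data from Spanner

Console

To use the console to ingest data from Spanner, follow these steps:

In the Google Cloud console, go to the AI Applications page.

AI Applications
Go to the Data Stores page.
Click New data store.
On the Source page, select Cloud Spanner.
Specify the project ID, instance ID, database ID, and table ID of the data that you plan to import.
Select whether to turn on Data Boost. For information about Data Boost, see Data Boost overview in the Spanner documentation.
Click Continue.
Choose a region for your data store.
Enter a name for your data store.
Click Create.
To check the status of your ingestion, go to the Data Stores page and click your data store name to see details about it on its Data page. When the status column on the Activity tab changes from In progress to Import completed, the ingestion is complete.

Depending on the size of your data, ingestion can take several minutes or several hours.

REST

To use the command line to create a data store and ingest data from Spanner, follow these steps:

Create a data store.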
```
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: PROJECT_ID" \
"https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
-d '{
  "displayName": "DISPLAY_NAME",
  "industryVertical": "GENERIC",
  "solutionTypes": ["SOLUTION_TYPE_SEARCH"],
  "contentConfig": "CONTENT_REQUIRED"
}'
```
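Replace the following:
- PROJECT_ID: the ID of your Vertex AI Search project.
- DATA_STORE_ID: the ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
- DISPLAY_NAME: the display name of the data store. This might be displayed in the Google Cloud console.
Note: The GENERIC industry vertical is used to create structured, unstructured, and website data stores for custom search apps.
Import data from Spanner.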
```
  curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
  -d '{
    "cloudSpannerSource": {
      "projectId": "SPANNER_PROJECT_ID",
      "instanceId": "INSTANCE_ID",
      "databaseId": "DATABASE_ID",
      "tableId": "TABLE_ID",
      "enableDataBoost": "DATA_BOOST_BOOLEAN"
    },
    "reconciliationMode": "RECONCILIATION_MODE",
    "autoGenerateIds": "AUTO_GENERATE_IDS",
    "idField": "ID_FIELD"
  }'
```
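Replace the following:
- PROJECT_ID: the ID of your Vertex AI Search project.
- DATA_STORE_ID: the ID of the data store.
- SPANNER_PROJECT_ID: the ID of your Spanner project.
- INSTANCE_ID: the ID of your Spanner instance.
- DATABASE_ID: the ID of your Spanner database.
- TABLE_ID: the ID of your Spanner table.
- DATA_BOOST_BOOLEAN: Optional. Specifies whether to turn on Data Boost. For information about Data Boost, see Data Boost overview in the Spanner documentation.
- RECONCILIATION_MODE: Optional. Values are FULL and INCREMENTAL. The default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from Spanner to your data store. This performs an upsert operation, which adds new documents and replaces existing documents with updated documents that have the same ID. Specifying FULL causes a full rebase of the documents in your data store. In other words, new and updated documents are added to your data store, and documents that are not in Spanner are removed from your data store. FULL mode is helpful if you want to automatically delete documents that you no longer need.
- AUTO_GENERATE_IDS: Optional. Specifies whether to automatically generate document IDs. If set to true, document IDs are generated based on a hash of the payload. Note that generated document IDs might not remain consistent over multiple imports. If you auto-generate IDs over multiple imports, setting reconciliationMode to FULL is recommended to keep document IDs consistent.
- ID_FIELD: Optional. Specifies which field is the document ID.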
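
The difference between the two reconciliation modes described above can be illustrated with plain dictionaries. This is only an in-memory model of the documented behavior, not the service's implementation; the `reconcile` function and the sample documents are hypothetical.

```python
from typing import Dict

Document = Dict[str, str]


def reconcile(
    store: Dict[str, Document],
    source: Dict[str, Document],
    mode: str = "INCREMENTAL",
) -> Dict[str, Document]:
    """Return the data store contents after importing `source` in the given mode."""
    if mode not in ("FULL", "INCREMENTAL"):
        raise ValueError(f"unknown reconciliation mode: {mode}")

    # Both modes upsert: new IDs are added, existing IDs are replaced.
    result = dict(store)
    result.update(source)

    # FULL additionally removes documents that are absent from the source.
    if mode == "FULL":
        result = {doc_id: doc for doc_id, doc in result.items() if doc_id in source}
    return result


store = {"a": {"title": "old A"}, "b": {"title": "B"}}
source = {"a": {"title": "new A"}, "c": {"title": "C"}}

incremental = reconcile(store, source, "INCREMENTAL")
full = reconcile(store, source, "FULL")

print(sorted(incremental))  # ['a', 'b', 'c']
print(sorted(full))         # ['a', 'c']
```

In both results document "a" carries the updated title; only the FULL result drops "b", which no longer exists in the source.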

Python

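For more information, see the AI Applications Python API reference documentation.

To authenticate to AI Applications, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store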


from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    #  For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name

Import documents

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"
# spanner_project_id = "YOUR_SPANNER_PROJECT_ID"
# spanner_instance_id = "YOUR_SPANNER_INSTANCE_ID"
# spanner_database_id = "YOUR_SPANNER_DATABASE_ID"
# spanner_table_id = "YOUR_SPANNER_TABLE_ID"

#  For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.DocumentServiceClient(client_options=client_options)

# The full resource name of the search engine branch.
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    spanner_source=discoveryengine.SpannerSource(
        project_id=spanner_project_id,
        instance_id=spanner_instance_id,
        database_id=spanner_database_id,
        table_id=spanner_table_id,
    ),
    # Options: `FULL`, `INCREMENTAL`
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

# Make the request
operation = client.import_documents(request=request)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)

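Next steps

To attach your data store to an app, create an app and select your data store by following the steps in Create a search app.
To preview how your search results appear after your app and data store are set up, see Get search results.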

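Import from Firestore

To ingest data from Firestore, use the following steps to create a data store and ingest data using either the Google Cloud console or the API.

If your Firestore data is in the same project as Vertex AI Search, go to Import data from Firestore.

If your Firestore data is in a different project than your Vertex AI Search project, go to Set up Firestore access.

Set up Firestore access from a different project

To give Vertex AI Search access to Firestore data that is in a different project, follow these steps:

Replace the following PROJECT_NUMBER variable with your Vertex AI Search project number, and then copy the contents of this code block. This is your Vertex AI Search service account identifier: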
```
service-PROJECT_NUMBER@gcp-sa-discoveryengine.iam.gserviceaccount.com
```
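Go to the IAM & Admin page.

IAM & Admin
On the IAM & Admin page, switch to your Firestore project and click Grant Access.
For New principals, enter the instance's service account identifier and select the Datastore > Cloud Datastore Import Export Admin role.
Click Save.
Switch back to your Vertex AI Search project.

Next, go to Import data from Firestore.

Import data from Firestore

Console

To use the console to ingest data from Firestore, follow these steps:

In the Google Cloud console, go to the AI Applications page.

AI Applications
Go to the Data Stores page.
Click New data store.
On the Source page, select Firestore.
Specify the project ID, database ID, and collection ID of the data that you plan to import.
Click Continue.
Choose a region for your data store.
Enter a name for your data store.
Click Create.
To check the status of your ingestion, go to the Data Stores page and click your data store name to see details about it on its Data page. When the status column on the Activity tab changes from In progress to Import completed, the ingestion is complete.

Depending on the size of your data, ingestion can take several minutes or several hours.

REST

To use the command line to create a data store and ingest data from Firestore, follow these steps:

Create a data store.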
```
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: PROJECT_ID" \
"https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
-d '{
  "displayName": "DISPLAY_NAME",
  "industryVertical": "GENERIC",
  "solutionTypes": ["SOLUTION_TYPE_SEARCH"]
}'
```
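Replace the following:
- PROJECT_ID: the ID of your project.
- DATA_STORE_ID: the ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
- DISPLAY_NAME: the display name of the data store. This might be displayed in the Google Cloud console.
Note: The GENERIC industry vertical is used to create structured, unstructured, and website data stores for custom search apps.
Import data from Firestore.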
```
  curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
  -d '{
    "firestoreSource": {
      "projectId": "FIRESTORE_PROJECT_ID",
      "databaseId": "DATABASE_ID",
      "collectionId": "COLLECTION_ID"
    },
    "reconciliationMode": "RECONCILIATION_MODE",
    "autoGenerateIds": "AUTO_GENERATE_IDS",
    "idField": "ID_FIELD"
  }'
```
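Replace the following:
- PROJECT_ID: the ID of your Vertex AI Search project.
- DATA_STORE_ID: the ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
- FIRESTORE_PROJECT_ID: the ID of your Firestore project.
- DATABASE_ID: the ID of your Firestore database.
- COLLECTION_ID: the ID of your Firestore collection.
- RECONCILIATION_MODE: Optional. Values are FULL and INCREMENTAL. The default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from Firestore to your data store. This performs an upsert operation, which adds new documents and replaces existing documents with updated documents that have the same ID. Specifying FULL causes a full rebase of the documents in your data store. In other words, new and updated documents are added to your data store, and documents that are not in Firestore are removed from your data store. FULL mode is helpful if you want to automatically delete documents that you no longer need.
- AUTO_GENERATE_IDS: Optional. Specifies whether to automatically generate document IDs. If set to true, document IDs are generated based on a hash of the payload. Note that generated document IDs might not remain consistent over multiple imports. If you auto-generate IDs over multiple imports, setting reconciliationMode to FULL is recommended to keep document IDs consistent.
- ID_FIELD: Optional. Specifies which field is the document ID.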
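
Hand-edited JSON bodies are easy to break, for example with a trailing comma or a quoted boolean. As a hedged sketch, the request body for the import call above could instead be assembled with the standard `json` module. The field names come from the curl example; the function name `firestore_import_body` and the sample values are hypothetical, and optional fields are simply omitted when not provided.

```python
import json
from typing import Optional


def firestore_import_body(
    firestore_project_id: str,
    database_id: str,
    collection_id: str,
    reconciliation_mode: str = "INCREMENTAL",
    auto_generate_ids: Optional[bool] = None,
    id_field: Optional[str] = None,
) -> str:
    """Serialize a documents:import request body for a Firestore source."""
    if reconciliation_mode not in ("FULL", "INCREMENTAL"):
        raise ValueError("reconciliation_mode must be FULL or INCREMENTAL")

    body = {
        "firestoreSource": {
            "projectId": firestore_project_id,
            "databaseId": database_id,
            "collectionId": collection_id,
        },
        "reconciliationMode": reconciliation_mode,
    }
    # Optional fields are included only when the caller sets them.
    if auto_generate_ids is not None:
        body["autoGenerateIds"] = auto_generate_ids  # serialized as a JSON boolean
    if id_field is not None:
        body["idField"] = id_field
    return json.dumps(body, indent=2)


# Sample values below are placeholders for illustration only.
payload = firestore_import_body(
    "my-firestore-project", "(default)", "articles", id_field="doc_id"
)
print(payload)
```

The resulting string could be saved to a file and passed to curl with `-d @FILE`, which avoids shell-quoting issues in the inline body.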

Python

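For more information, see the AI Applications Python API reference documentation.

To authenticate to AI Applications, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store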


from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    #  For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name

Import documents

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"
# firestore_project_id = "YOUR_FIRESTORE_PROJECT_ID"
# firestore_database_id = "YOUR_FIRESTORE_DATABASE_ID"
# firestore_collection_id = "YOUR_FIRESTORE_COLLECTION_ID"

#  For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.DocumentServiceClient(client_options=client_options)

# The full resource name of the search engine branch.
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    firestore_source=discoveryengine.FirestoreSource(
        project_id=firestore_project_id,
        database_id=firestore_database_id,
        collection_id=firestore_collection_id,
    ),
    # Options: `FULL`, `INCREMENTAL`
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

# Make the request
operation = client.import_documents(request=request)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)

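Next steps

To attach your data store to an app, create an app and select your data store by following the steps in Create a search app.
To preview how your search results appear after your app and data store are set up, see Get search results.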

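Import from Bigtable

To ingest data from Bigtable, use the following steps to create a data store and ingest data using the API.

Set up Bigtable access

To give Vertex AI Search access to Bigtable data that is in a different project, follow these steps:

Replace the following PROJECT_NUMBER variable with your Vertex AI Search project number, and then copy the contents of this code block. This is your Vertex AI Search service account identifier: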
```
service-PROJECT_NUMBER@gcp-sa-discoveryengine.iam.gserviceaccount.com
```
Go to the IAM & Admin page.

IAM & Admin
On the IAM & Admin page, switch to your Bigtable project and click Grant Access.
For New principals, enter the instance's service account identifier and select the Bigtable > Bigtable Reader role.
Click Save.
Switch back to your Vertex AI Search project.

Next, go to Import data from Bigtable.

Import data from Bigtable

REST

To use the command line to create a data store and ingest data from Bigtable, follow these steps:

Create a data store.
```
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: PROJECT_ID" \
"https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
-d '{
  "displayName": "DISPLAY_NAME",
  "industryVertical": "GENERIC",
  "solutionTypes": ["SOLUTION_TYPE_SEARCH"]
}'
```
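Replace the following:
- PROJECT_ID: the ID of your project.
- DATA_STORE_ID: the ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
- DISPLAY_NAME: the display name of the data store. This might be displayed in the Google Cloud console.
Note: The GENERIC industry vertical is used to create structured, unstructured, and website data stores for custom search apps.
Import data from Bigtable.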
```
  curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
  -d '{
    "bigtableSource": {
      "projectId": "BIGTABLE_PROJECT_ID",
      "instanceId": "INSTANCE_ID",
      "tableId": "TABLE_ID",
      "bigtableOptions": {
        "keyFieldName": "KEY_FIELD_NAME",
        "families": {
          "key": "KEY",
          "value": {
            "fieldName": "FIELD_NAME",
            "encoding": "ENCODING",
            "type": "TYPE",
            "columns": [
              {
                "qualifier": "QUALIFIER",
                "fieldName": "FIELD_NAME",
                "encoding": "COLUMN_ENCODING",
                "type": "COLUMN_VALUES_TYPE"
              }
            ]
          }
         }
         ...
      }
    },
    "reconciliationMode": "RECONCILIATION_MODE",
    "autoGenerateIds": "AUTO_GENERATE_IDS",
    "idField": "ID_FIELD"
  }'
```
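Replace the following:
- PROJECT_ID: the ID of your Vertex AI Search project.
- DATA_STORE_ID: the ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
- BIGTABLE_PROJECT_ID: the ID of your Bigtable project.
- INSTANCE_ID: the ID of your Bigtable instance.
- TABLE_ID: the ID of your Bigtable table.
- KEY_FIELD_NAME: Optional but recommended. The field name to use for the row key value after ingesting to Vertex AI Search.
- KEY: Required. A string value for the column family key.
- ENCODING: Optional. The encoding mode of the values when the type is not STRING. This can be overridden for a specific column by listing that column in columns and specifying an encoding for it.
- TYPE: Optional. The type of values in this column family.
- QUALIFIER: Required. The qualifier of the column.
- FIELD_NAME: Optional but recommended. The field name to use for this column after ingesting to Vertex AI Search.
- COLUMN_ENCODING: Optional. The encoding mode of the values for a specific column when the type is not STRING.
- RECONCILIATION_MODE: Optional. Values are FULL and INCREMENTAL. The default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from Bigtable to your data store. This performs an upsert operation, which adds new documents and replaces existing documents with updated documents that have the same ID. Specifying FULL causes a full rebase of the documents in your data store. In other words, new and updated documents are added to your data store, and documents that are not in Bigtable are removed from your data store. FULL mode is helpful if you want to automatically delete documents that you no longer need.
- AUTO_GENERATE_IDS: Optional. Specifies whether to automatically generate document IDs. If set to true, document IDs are generated based on a hash of the payload. Note that generated document IDs might not remain consistent over multiple imports. If you auto-generate IDs over multiple imports, setting reconciliationMode to FULL is recommended to keep document IDs consistent.
  
  Specify autoGenerateIds only when bigquerySource.dataSchema is set to custom. Otherwise, an INVALID_ARGUMENT error is returned. If you don't specify autoGenerateIds or you set it to false, you must specify idField. Otherwise, the documents fail to import.
- ID_FIELD: Optional. Specifies which field is the document ID.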
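
The nested bigtableOptions structure is the hardest part of this request to write by hand, so a hedged sketch of assembling it in Python may help. It rests on two assumptions that are not stated in this page: that the standard proto3 JSON mapping applies, so the `families` map is written as a JSON object keyed by column family name, and that the byte-valued `qualifier` is sent as a base64 string. The helper names and the `row_key` field name are hypothetical; the family and column values mirror the Python import sample in this section.

```python
import base64
import json


def bigtable_column(qualifier: str, field_name: str) -> dict:
    """Describe one column; the qualifier is base64-encoded (assumed bytes field)."""
    return {
        "qualifier": base64.b64encode(qualifier.encode("utf-8")).decode("ascii"),
        "fieldName": field_name,
    }


def bigtable_options(key_field_name: str, families: dict) -> dict:
    """Assemble the bigtableOptions object from a mapping of family name to settings."""
    return {"keyFieldName": key_field_name, "families": families}


options = bigtable_options(
    key_field_name="row_key",  # hypothetical field name for the row key
    families={
        "family_name_1": {
            "type": "STRING",
            "encoding": "TEXT",
            "columns": [bigtable_column("qualifier_1", "field_name_1")],
        },
        "family_name_2": {"type": "INTEGER", "encoding": "BINARY"},
    },
)

print(json.dumps(options, indent=2))
```

If the service rejects this shape, check the REST reference for BigtableOptions before relying on it; the sketch is meant to show the nesting, not to define the wire format.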

Python

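For more information, see the AI Applications Python API reference documentation.

To authenticate to AI Applications, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store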


from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    #  For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name

Import documents

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"
# bigtable_project_id = "YOUR_BIGTABLE_PROJECT_ID"
# bigtable_instance_id = "YOUR_BIGTABLE_INSTANCE_ID"
# bigtable_table_id = "YOUR_BIGTABLE_TABLE_ID"

#  For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.DocumentServiceClient(client_options=client_options)

# The full resource name of the search engine branch.
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)

bigtable_options = discoveryengine.BigtableOptions(
    families={
        "family_name_1": discoveryengine.BigtableOptions.BigtableColumnFamily(
            type_=discoveryengine.BigtableOptions.Type.STRING,
            encoding=discoveryengine.BigtableOptions.Encoding.TEXT,
            columns=[
                discoveryengine.BigtableOptions.BigtableColumn(
                    qualifier="qualifier_1".encode("utf-8"),
                    field_name="field_name_1",
                ),
            ],
        ),
        "family_name_2": discoveryengine.BigtableOptions.BigtableColumnFamily(
            type_=discoveryengine.BigtableOptions.Type.INTEGER,
            encoding=discoveryengine.BigtableOptions.Encoding.BINARY,
        ),
    }
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    bigtable_source=discoveryengine.BigtableSource(
        project_id=bigtable_project_id,
        instance_id=bigtable_instance_id,
        table_id=bigtable_table_id,
        bigtable_options=bigtable_options,
    ),
    # Options: `FULL`, `INCREMENTAL`
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

# Make the request
operation = client.import_documents(request=request)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)

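Next steps

To attach your data store to an app, create an app and select your data store by following the steps in Create a search app.
To preview how your search results appear after your app and data store are set up, see Get search results.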

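Import from AlloyDB for PostgreSQL

To ingest data from AlloyDB for PostgreSQL, use the following steps to create a data store and ingest data using either the Google Cloud console or the API.

If your AlloyDB for PostgreSQL data is in the same project as your Vertex AI Search project, go to Import data from AlloyDB for PostgreSQL.

If your AlloyDB for PostgreSQL data is in a different project than your Vertex AI Search project, go to Set up AlloyDB for PostgreSQL access.

Set up AlloyDB for PostgreSQL access from a different project

To give Vertex AI Search access to AlloyDB for PostgreSQL data that is in a different project, follow these steps:

Replace the following PROJECT_NUMBER variable with your Vertex AI Search project number, and then copy the contents of this code block. This is your Vertex AI Search service account identifier: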
```
service-PROJECT_NUMBER@gcp-sa-discoveryengine.iam.gserviceaccount.com
```
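Switch to the Google Cloud project where your AlloyDB for PostgreSQL data resides.
Go to the IAM page.

IAM
Click Grant Access.
For New principals, enter the Vertex AI Search service account identifier and select the Cloud AlloyDB > Cloud AlloyDB Admin role.
Click Save.
Switch back to your Vertex AI Search project.

Next, go to Import data from AlloyDB for PostgreSQL.

Import data from AlloyDB for PostgreSQL

Console

To use the console to ingest data from AlloyDB for PostgreSQL, follow these steps:

In the Google Cloud console, go to the AI Applications page.

AI Applications
In the navigation menu, click Data Stores.
Click Create data store.
On the Source page, select AlloyDB.
Specify the project ID, location ID, cluster ID, database ID, and table ID of the data that you plan to import.
Click Continue.
Choose a region for your data store.
Enter a name for your data store.
Click Create.
To check the status of your ingestion, go to the Data Stores page and click your data store name to see details about it on its Data page. When the status column on the Activity tab changes from In progress to Import completed, the ingestion is complete.

Depending on the size of your data, ingestion can take several minutes or several hours.

REST

To use the command line to create a data store and ingest data from AlloyDB for PostgreSQL, follow these steps:

Create a data store.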
```
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: PROJECT_ID" \
"https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
-d '{
  "displayName": "DISPLAY_NAME",
  "industryVertical": "GENERIC",
  "solutionTypes": ["SOLUTION_TYPE_SEARCH"]
}'
```
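Replace the following:
- PROJECT_ID: the ID of your project.
- DATA_STORE_ID: the ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
- DISPLAY_NAME: the display name of the data store. This might be displayed in the Google Cloud console.
Note: The GENERIC industry vertical is used to create structured, unstructured, and website data stores for custom search apps.
Import data from AlloyDB for PostgreSQL.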
```
  curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
  -d '{
    "alloydbSource": {
      "projectId": "ALLOYDB_PROJECT_ID",
      "locationId": "LOCATION_ID",
      "clusterId": "CLUSTER_ID",
      "databaseId": "DATABASE_ID",
      "tableId": "TABLE_ID"
    },
    "reconciliationMode": "RECONCILIATION_MODE",
    "autoGenerateIds": "AUTO_GENERATE_IDS",
    "idField": "ID_FIELD"
  }'
```
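Replace the following:
- PROJECT_ID: the ID of your Vertex AI Search project.
- DATA_STORE_ID: the ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
- ALLOYDB_PROJECT_ID: the ID of your AlloyDB for PostgreSQL project.
- LOCATION_ID: the ID of your AlloyDB for PostgreSQL location.
- CLUSTER_ID: the ID of your AlloyDB for PostgreSQL cluster.
- DATABASE_ID: the ID of your AlloyDB for PostgreSQL database.
- TABLE_ID: the ID of your AlloyDB for PostgreSQL table.
- RECONCILIATION_MODE: Optional. Values are FULL and INCREMENTAL. The default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from AlloyDB for PostgreSQL to your data store. This performs an upsert operation, which adds new documents and replaces existing documents with updated documents that have the same ID. Specifying FULL causes a full rebase of the documents in your data store. In other words, new and updated documents are added to your data store, and documents that are not in AlloyDB for PostgreSQL are removed from your data store. FULL mode is helpful if you want to automatically delete documents that you no longer need.
- AUTO_GENERATE_IDS: Optional. Specifies whether to automatically generate document IDs. If set to true, document IDs are generated based on a hash of the payload. Note that generated document IDs might not remain consistent over multiple imports. If you auto-generate IDs over multiple imports, setting reconciliationMode to FULL is recommended to keep document IDs consistent.
- ID_FIELD: Optional. Specifies which field is the document ID.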
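
Because DATA_STORE_ID is restricted to lowercase letters, digits, underscores, and hyphens, it can be worth checking the value before substituting it into the request URL. The following is a minimal sketch under that stated rule only; the service may enforce further constraints (such as length limits) that are not covered here, and the function names are hypothetical.

```python
import re

# Character set taken from the DATA_STORE_ID description above.
_DATA_STORE_ID = re.compile(r"[a-z0-9_-]+")


def is_valid_data_store_id(data_store_id: str) -> bool:
    """Return True if the ID uses only lowercase letters, digits, '_' and '-'."""
    return _DATA_STORE_ID.fullmatch(data_store_id) is not None


def import_url(project_id: str, data_store_id: str) -> str:
    """Build the documents:import URL used in the curl example (global location)."""
    if not is_valid_data_store_id(data_store_id):
        raise ValueError(f"invalid data store ID: {data_store_id!r}")
    return (
        "https://discoveryengine.googleapis.com/v1/"
        f"projects/{project_id}/locations/global/collections/default_collection/"
        f"dataStores/{data_store_id}/branches/0/documents:import"
    )


print(is_valid_data_store_id("my-store_01"))  # True
print(is_valid_data_store_id("My Store"))     # False
print(import_url("my-project", "my-store_01"))
```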

Python

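For more information, see the AI Applications Python API reference documentation.

To authenticate to AI Applications, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store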


from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    #  For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name

Import documents

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine_v1 as discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"
# alloy_db_project_id = "YOUR_ALLOY_DB_PROJECT_ID"
# alloy_db_location_id = "YOUR_ALLOY_DB_LOCATION_ID"
# alloy_db_cluster_id = "YOUR_ALLOY_DB_CLUSTER_ID"
# alloy_db_database_id = "YOUR_ALLOY_DB_DATABASE_ID"
# alloy_db_table_id = "YOUR_ALLOY_DB_TABLE_ID"

# For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.DocumentServiceClient(client_options=client_options)

# The full resource name of the search engine branch.
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    alloy_db_source=discoveryengine.AlloyDbSource(
        project_id=alloy_db_project_id,
        location_id=alloy_db_location_id,
        cluster_id=alloy_db_cluster_id,
        database_id=alloy_db_database_id,
        table_id=alloy_db_table_id,
    ),
    # Options: `FULL`, `INCREMENTAL`
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

# Make the request
operation = client.import_documents(request=request)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)

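Next steps

To attach your data store to an app, create an app and select your data store by following the steps in Create a search app.
To preview how your search results appear after your app and data store are set up, see Get search results.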

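Upload structured JSON data with the API

To upload a JSON document or object directly using the API, follow these steps.

Before importing your data, see Prepare data for ingesting.

REST

To use the command line to create a data store and import structured JSON data, follow these steps.

Create a data store.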
```
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: PROJECT_ID" \
"https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
-d '{
  "displayName": "DATA_STORE_DISPLAY_NAME",
  "industryVertical": "GENERIC",
  "solutionTypes": ["SOLUTION_TYPE_SEARCH"]
}'
```
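Replace the following:
- PROJECT_ID: the ID of your Google Cloud project.
- DATA_STORE_ID: the ID of the Vertex AI Search data store that you want to create. This ID can contain only lowercase letters, digits, underscores, and hyphens.
- DATA_STORE_DISPLAY_NAME: the display name of the Vertex AI Search data store that you want to create.
Note: The GENERIC industry vertical is used to create structured, unstructured, and website data stores for custom search apps.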
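
The character rule for DATA_STORE_ID can be checked locally before the request is sent. The following is a minimal sketch: `is_valid_data_store_id` is a hypothetical helper, and because the text above states no length limit for data store IDs, none is enforced here.

```python
import re

# Lowercase letters, digits, underscores, and hyphens only, as stated above.
# No length limit is enforced because the text above does not give one.
_DATA_STORE_ID = re.compile(r"[a-z0-9_-]+")


def is_valid_data_store_id(data_store_id: str) -> bool:
    """Return True if the ID uses only the allowed characters."""
    return _DATA_STORE_ID.fullmatch(data_store_id) is not None


for candidate in ["my-store_01", "My Store"]:
    print(candidate, is_valid_data_store_id(candidate))
```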

Import structured data.

There are a few approaches that you can use to upload data.

Upload a JSON document.

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents?documentId=DOCUMENT_ID" \
-d '{
  "jsonData": "JSON_DOCUMENT_STRING"
}'

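Replace the following:

- DOCUMENT_ID: a unique ID for the document. This ID can be up to 63 characters long and can contain only lowercase letters, digits, underscores, and hyphens.
- JSON_DOCUMENT_STRING: the JSON document as a single string. It must conform to the JSON schema that you provided in the previous step. For example: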
```
{ \"title\": \"test title\", \"categories\": [\"cat_1\", \"cat_2\"], \"uri\": \"test uri\"}
```
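
Escaping the document by hand is error-prone, so the request body can be generated instead. The sketch below assumes only the Python standard library. `build_json_data_body` is a hypothetical helper that serializes the document twice (once into the string that `jsonData` expects, once for the enclosing body) and applies the DOCUMENT_ID rule stated above.

```python
import json
import re

# Up to 63 characters: lowercase letters, digits, underscores, hyphens.
_DOCUMENT_ID = re.compile(r"[a-z0-9_-]{1,63}")


def build_json_data_body(document_id: str, document: dict) -> str:
    """Return the request body for uploading `document` as a JSON string."""
    if _DOCUMENT_ID.fullmatch(document_id) is None:
        raise ValueError(f"invalid document ID: {document_id!r}")
    # The inner dumps produces the escaped string; the outer dumps
    # produces the body that is passed to curl with -d.
    return json.dumps({"jsonData": json.dumps(document)})


document = {
    "title": "test title",
    "categories": ["cat_1", "cat_2"],
    "uri": "test uri",
}
body = build_json_data_body("doc-1", document)
print(body)
```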

Upload a JSON object.

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents?documentId=DOCUMENT_ID" \
-d '{
  "structData": JSON_DOCUMENT_OBJECT
}'

Replace JSON_DOCUMENT_OBJECT with the JSON document formatted as a JSON object. It must conform to the JSON schema that you provided in the previous step. For example:

```json
{
  "title": "test title",
  "categories": [
    "cat_1",
    "cat_2"
  ],
  "uri": "test uri"
}
```

Update with a JSON document.

curl -X PATCH \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents/DOCUMENT_ID" \
-d '{
  "jsonData": "JSON_DOCUMENT_STRING"
}'

Update with a JSON object.

curl -X PATCH \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents/DOCUMENT_ID" \
-d '{
  "structData": JSON_DOCUMENT_OBJECT
}'
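
The upload and update calls above differ only in the HTTP method and in where DOCUMENT_ID appears in the URL. The sketch below builds both requests with the Python standard library without sending them. `build_request` is a hypothetical helper, the access token is a placeholder, and the URL shape is copied from the curl commands above.

```python
import json
import urllib.parse
import urllib.request

BASE = "https://discoveryengine.googleapis.com/v1beta"


def build_request(method, project_id, data_store_id, document_id,
                  document, access_token):
    """Build (but do not send) a create (POST) or update (PATCH) request."""
    documents = (
        f"{BASE}/projects/{project_id}/locations/global"
        f"/collections/default_collection/dataStores/{data_store_id}"
        "/branches/0/documents"
    )
    if method == "POST":
        # Create: the ID is passed as the documentId query parameter.
        query = urllib.parse.urlencode({"documentId": document_id})
        url = f"{documents}?{query}"
    elif method == "PATCH":
        # Update: the ID is the last segment of the resource path.
        url = f"{documents}/{urllib.parse.quote(document_id, safe='')}"
    else:
        raise ValueError(f"unsupported method: {method}")
    return urllib.request.Request(
        url,
        data=json.dumps({"structData": document}).encode("utf-8"),
        method=method,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


doc = {"title": "test title", "categories": ["cat_1", "cat_2"], "uri": "test uri"}
create = build_request("POST", "PROJECT_ID", "DATA_STORE_ID", "doc-1", doc, "TOKEN")
update = build_request("PATCH", "PROJECT_ID", "DATA_STORE_ID", "doc-1", doc, "TOKEN")
print(create.get_method(), create.full_url)
print(update.get_method(), update.full_url)
```

Passing either request to `urllib.request.urlopen` would send it. A real token can be obtained the same way as in the curl commands, with `gcloud auth print-access-token`.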

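Next steps

To attach your data store to an app, create an app and select your data store by following the steps in Create a search app.
To preview how your search results appear after your app and data store are set up, see Get search results.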

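Troubleshoot data ingestion

If you are having problems with data ingestion, review these tips:

- If you are using customer-managed encryption keys and data import fails with the error message The caller does not have permission, make sure that the CryptoKey Encrypter/Decrypter IAM role (roles/cloudkms.cryptoKeyEncrypterDecrypter) on the key has been granted to the Cloud Storage service agent. For more information, see Before you begin in "Customer-managed encryption keys".
- If you are using advanced website indexing and the document usage for the data store is much lower than you expect, review the URL patterns that you specified for indexing, make sure that they cover the pages that you want to index, and expand them if needed. For example, if you used *.en.example.com/*, you might need to add *.example.com/* to the sites that you want indexed.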
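
A quick local check can show which pages a set of URL patterns would miss. The sketch below assumes that the patterns behave like shell-style globs, which only approximates how Vertex AI Search evaluates them. `uncovered_urls` is a hypothetical helper and the URLs are made-up examples.

```python
from fnmatch import fnmatchcase


def uncovered_urls(urls, patterns):
    """Return the URLs that match none of the glob-style patterns."""
    return [
        url for url in urls
        if not any(fnmatchcase(url, pattern) for pattern in patterns)
    ]


urls = [
    "docs.en.example.com/guide",
    "www.example.com/pricing",
    "blog.example.com/post",
]

# Only the English subdomain is covered by the narrow pattern.
narrow = uncovered_urls(urls, ["*.en.example.com/*"])
# Adding the broader pattern covers the remaining pages.
broad = uncovered_urls(urls, ["*.en.example.com/*", "*.example.com/*"])
print(narrow)
print(broad)
```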

Create a data store using Terraform

You can use Terraform to create an empty data store. After the empty data store is created, you can ingest data into it using the Google Cloud console or API commands.

To learn how to apply or remove a Terraform configuration, see Basic Terraform commands.

To create an empty data store using Terraform, see google_discovery_engine_data_store.
