Import documents from Cloud Storage
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
- This content provides a Python code sample for importing documents from Google Cloud Storage into a Vertex AI Agent Builder data store.
- The process involves setting up Application Default Credentials for authentication and configuring client options based on the data store's location.
- The code sample shows how to import both unstructured documents and documents with metadata using different file formats like PDF, JSONL, and CSV, and how to select the right data schema.
- The sample uses the `ImportDocumentsRequest` with `GcsSource` to specify the location of the files in Cloud Storage and the type of the data, then triggers the import operation with the option of `FULL` or `INCREMENTAL` reconciliation mode.
- The documentation also includes instructions for further actions, such as searching for more code samples using the Google Cloud sample browser and links to Vertex AI Agent Builder Python API documentation.

Explore further
---------------

For detailed documentation that includes this code sample, see the following:

- [Create a custom recommendations data store](/generative-ai-app-builder/docs/create-data-store-recommendations)
- [Create a search data store](/generative-ai-app-builder/docs/create-data-store-es)
- [Refresh structured and unstructured data](/agentspace/docs/refresh-data)
- [Refresh structured and unstructured data](/generative-ai-app-builder/docs/refresh-data)

Code sample
-----------

### Python

For more information, see the
[AI Applications Python API reference documentation](/python/docs/reference/discoveryengine/latest).

To authenticate to AI Applications, set up Application Default Credentials.
For more information, see
[Set up authentication for a local development environment](/docs/authentication/set-up-adc-local-dev-environment).

    from google.api_core.client_options import ClientOptions
    from google.cloud import discoveryengine

    # TODO(developer): Uncomment these variables before running the sample.
    # project_id = "YOUR_PROJECT_ID"
    # location = "YOUR_LOCATION" # Values: "global"
    # data_store_id = "YOUR_DATA_STORE_ID"

    # Examples:
    # - Unstructured documents
    #   - `gs://bucket/directory/file.pdf`
    #   - `gs://bucket/directory/*.pdf`
    # - Unstructured documents with JSONL Metadata
    #   - `gs://bucket/directory/file.json`
    # - Unstructured documents with CSV Metadata
    #   - `gs://bucket/directory/file.csv`
    # gcs_uri = "YOUR_GCS_PATH"

    # For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DocumentServiceClient(client_options=client_options)

    # The full resource name of the search engine branch.
    # e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
    parent = client.branch_path(
        project=project_id,
        location=location,
        data_store=data_store_id,
        branch="default_branch",
    )

    request = discoveryengine.ImportDocumentsRequest(
        parent=parent,
        gcs_source=discoveryengine.GcsSource(
            # Multiple URIs are supported
            input_uris=[gcs_uri],
            # Options:
            # - `content` - Unstructured documents (PDF, HTML, DOC, TXT, PPTX)
            # - `custom` - Unstructured documents with custom JSONL metadata
            # - `document` - Structured documents in the discoveryengine.Document format.
            # - `csv` - Unstructured documents with CSV metadata
            data_schema="content",
        ),
        # Options: `FULL`, `INCREMENTAL`
        reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
    )

    # Make the request
    operation = client.import_documents(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

What's next
-----------

To search and filter code samples for other Google Cloud products, see the
[Google Cloud sample browser](/docs/samples?product=genappbuilder).
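The sample leaves two choices to the caller: the API endpoint, which depends on `location`, and the `data_schema` value, which depends on the kind of file at `gcs_uri`. The following is a minimal sketch of how those two choices could be factored into helpers. The function names `api_endpoint_for` and `guess_data_schema` and the extension table are hypothetical and are not part of the `discoveryengine` library. The extension-to-schema mapping is an assumption based only on the example URIs in the sample's comments, so treat it as a heuristic: a `.json` file may also hold structured `document` data, which the file name alone cannot reveal.

```python
from typing import Optional

# Hypothetical lookup table (not part of the client library). It covers a
# subset of the formats named in the sample's comments. Heuristic only: a
# `.json` file could also contain structured `document` data.
_SCHEMA_BY_EXTENSION = {
    ".pdf": "content",
    ".html": "content",
    ".txt": "content",
    ".pptx": "content",
    ".json": "custom",
    ".jsonl": "custom",
    ".csv": "csv",
}


def api_endpoint_for(location: str) -> Optional[str]:
    """Return the regional endpoint, or None so the client default is used."""
    if location == "global":
        return None
    return f"{location}-discoveryengine.googleapis.com"


def guess_data_schema(gcs_uri: str) -> str:
    """Guess a `data_schema` value from the extension of a gs:// URI."""
    if not gcs_uri.startswith("gs://"):
        raise ValueError(f"Expected a gs:// URI, got: {gcs_uri!r}")
    # Take the last path segment so wildcards such as `*.pdf` still work.
    name = gcs_uri.rsplit("/", 1)[-1].lower()
    dot = name.rfind(".")
    extension = name[dot:] if dot != -1 else ""
    if extension not in _SCHEMA_BY_EXTENSION:
        raise ValueError(f"No schema guess for extension {extension!r}")
    return _SCHEMA_BY_EXTENSION[extension]
```

With helpers like these, the sample could pass `api_endpoint_for(location)` into `ClientOptions` only when the result is not `None`, and pass `guess_data_schema(gcs_uri)` as `data_schema` instead of the hard-coded `"content"`. Raising on unknown extensions is a deliberate choice here, so an unexpected file type fails before the import request is sent.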