Import documents from Cloud Storage
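Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.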
- This content provides a Python code sample for importing documents from Google Cloud Storage into a Vertex AI Agent Builder data store.
- The process involves setting up Application Default Credentials for authentication and configuring client options based on the data store's location.
- The code sample shows how to import both unstructured documents and documents with metadata using file formats such as PDF, JSONL, and CSV, and how to select the matching data schema.
- The sample uses `ImportDocumentsRequest` with `GcsSource` to specify the location of the files in Cloud Storage and the type of the data, then triggers the import operation in either `FULL` or `INCREMENTAL` reconciliation mode.
- The documentation also points to further resources: the Google Cloud sample browser for more code samples and the Vertex AI Agent Builder Python API documentation.

Explore further
---------------

For detailed documentation that includes this code sample, see the following:

- [Create a custom recommendations data store](/generative-ai-app-builder/docs/create-data-store-recommendations)
- [Create a search data store](/generative-ai-app-builder/docs/create-data-store-es)
- [Refresh structured and unstructured data](/agentspace/docs/refresh-data)
- [Refresh structured and unstructured data](/generative-ai-app-builder/docs/refresh-data)

Code sample
-----------

### Python

For more information, see the
[AI Applications Python API reference documentation](/python/docs/reference/discoveryengine/latest).

To authenticate to AI Applications, set up Application Default Credentials.
For more information, see
[Set up authentication for a local development environment](/docs/authentication/set-up-adc-local-dev-environment).

    from google.api_core.client_options import ClientOptions
    from google.cloud import discoveryengine

    # TODO(developer): Uncomment these variables before running the sample.
    # project_id = "YOUR_PROJECT_ID"
    # location = "YOUR_LOCATION" # Values: "global"
    # data_store_id = "YOUR_DATA_STORE_ID"

    # Examples:
    # - Unstructured documents
    #   - `gs://bucket/directory/file.pdf`
    #   - `gs://bucket/directory/*.pdf`
    # - Unstructured documents with JSONL Metadata
    #   - `gs://bucket/directory/file.json`
    # - Unstructured documents with CSV Metadata
    #   - `gs://bucket/directory/file.csv`
    # gcs_uri = "YOUR_GCS_PATH"

    # For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DocumentServiceClient(client_options=client_options)

    # The full resource name of the search engine branch.
    # e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
    parent = client.branch_path(
        project=project_id,
        location=location,
        data_store=data_store_id,
        branch="default_branch",
    )

    request = discoveryengine.ImportDocumentsRequest(
        parent=parent,
        gcs_source=discoveryengine.GcsSource(
            # Multiple URIs are supported
            input_uris=[gcs_uri],
            # Options:
            # - `content` - Unstructured documents (PDF, HTML, DOC, TXT, PPTX)
            # - `custom` - Unstructured documents with custom JSONL metadata
            # - `document` - Structured documents in the discoveryengine.Document format.
            # - `csv` - Unstructured documents with CSV metadata
            data_schema="content",
        ),
        # Options: `FULL`, `INCREMENTAL`
        reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
    )

    # Make the request
    operation = client.import_documents(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

What's next
-----------

To search and filter code samples for other Google Cloud products, see the
[Google Cloud sample browser](/docs/samples?product=genappbuilder).
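The code sample on this page relies on three conventions: a regional API endpoint for any location other than `global`, a branch resource name of the form shown in its comments, and a `data_schema` string that matches the files being imported. The following is a minimal, dependency-free sketch of those conventions that could be used to check inputs before calling the API. The helper names (`api_endpoint_for`, `branch_resource_name`, `guess_data_schema`) are hypothetical and are not part of the `discoveryengine` library. The extension-to-schema mapping is an assumption drawn only from the example URIs in the sample's comments; a `.json` file could just as well hold data for the `document` schema, so treat the guess as a starting point.

```python
import posixpath
from typing import Optional

# Extensions that the sample's comments list for the `content` schema.
CONTENT_EXTENSIONS = {".pdf", ".html", ".doc", ".txt", ".pptx"}


def api_endpoint_for(location: str) -> Optional[str]:
    """Return the regional endpoint, or None so the client default is used."""
    if location == "global":
        return None
    return f"{location}-discoveryengine.googleapis.com"


def branch_resource_name(
    project_id: str,
    location: str,
    data_store_id: str,
    branch: str = "default_branch",
) -> str:
    """Build the branch name in the format given in the sample's comment."""
    return (
        f"projects/{project_id}/locations/{location}"
        f"/dataStores/{data_store_id}/branches/{branch}"
    )


def guess_data_schema(gcs_uri: str) -> str:
    """Guess a `data_schema` value from the file extension (assumption)."""
    if not gcs_uri.startswith("gs://"):
        raise ValueError(f"Not a Cloud Storage URI: {gcs_uri!r}")
    extension = posixpath.splitext(gcs_uri)[1].lower()
    if extension in CONTENT_EXTENSIONS:
        return "content"
    if extension == ".json":
        # Assumed to be JSONL metadata for unstructured documents.
        return "custom"
    if extension == ".csv":
        return "csv"
    raise ValueError(f"Cannot guess a data schema for {gcs_uri!r}")


if __name__ == "__main__":
    print(api_endpoint_for("eu"))
    print(branch_resource_name("my-project", "global", "my-data-store"))
    print(guess_data_schema("gs://bucket/directory/*.pdf"))
```

Passing the result of `api_endpoint_for(location)` to `ClientOptions(api_endpoint=...)` only when it is not `None` mirrors the conditional in the sample.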