# Import documents from Cloud Storage
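Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.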
- This page provides a Python code sample for importing documents from Google Cloud Storage into a Vertex AI Agent Builder data store.
- The process involves setting up Application Default Credentials for authentication and configuring client options based on the data store's location.
- The code sample shows how to import both unstructured documents and documents with metadata using file formats such as PDF, JSONL, and CSV, and how to select the right data schema.
- The sample uses `ImportDocumentsRequest` with `GcsSource` to specify the location of the files in Cloud Storage and the type of the data, then triggers the import operation with either the `FULL` or the `INCREMENTAL` reconciliation mode.
- The page also points to further resources: the Google Cloud sample browser for more code samples, and the Vertex AI Agent Builder Python API documentation.

Explore further
---------------

For detailed documentation that includes this code sample, see the following:

- [Create a custom recommendations data store](/generative-ai-app-builder/docs/create-data-store-recommendations)
- [Create a search data store](/generative-ai-app-builder/docs/create-data-store-es)
- [Refresh structured and unstructured data](/agentspace/docs/refresh-data)
- [Refresh structured and unstructured data](/generative-ai-app-builder/docs/refresh-data)

Code sample
-----------

### Python

For more information, see the
[AI Applications Python API reference documentation](/python/docs/reference/discoveryengine/latest).

To authenticate to AI Applications, set up Application Default Credentials.
For more information, see
[Set up authentication for a local development environment](/docs/authentication/set-up-adc-local-dev-environment).

    from google.api_core.client_options import ClientOptions
    from google.cloud import discoveryengine

    # TODO(developer): Uncomment these variables before running the sample.
    # project_id = "YOUR_PROJECT_ID"
    # location = "YOUR_LOCATION" # Values: "global"
    # data_store_id = "YOUR_DATA_STORE_ID"

    # Examples:
    # - Unstructured documents
    #   - `gs://bucket/directory/file.pdf`
    #   - `gs://bucket/directory/*.pdf`
    # - Unstructured documents with JSONL Metadata
    #   - `gs://bucket/directory/file.json`
    # - Unstructured documents with CSV Metadata
    #   - `gs://bucket/directory/file.csv`
    # gcs_uri = "YOUR_GCS_PATH"

    # For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DocumentServiceClient(client_options=client_options)

    # The full resource name of the search engine branch.
    # e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
    parent = client.branch_path(
        project=project_id,
        location=location,
        data_store=data_store_id,
        branch="default_branch",
    )

    request = discoveryengine.ImportDocumentsRequest(
        parent=parent,
        gcs_source=discoveryengine.GcsSource(
            # Multiple URIs are supported
            input_uris=[gcs_uri],
            # Options:
            # - `content` - Unstructured documents (PDF, HTML, DOC, TXT, PPTX)
            # - `custom` - Unstructured documents with custom JSONL metadata
            # - `document` - Structured documents in the discoveryengine.Document format.
            # - `csv` - Unstructured documents with CSV metadata
            data_schema="content",
        ),
        # Options: `FULL`, `INCREMENTAL`
        reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
    )

    # Make the request
    operation = client.import_documents(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

What's next
-----------

To search and filter code samples for other Google Cloud products, see the
[Google Cloud sample browser](/docs/samples?product=genappbuilder).
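The sample above makes three choices before it sends the request: which API endpoint to use, which branch resource name to pass as `parent`, and which `data_schema` matches the files. The following is a minimal sketch of those choices as plain Python helpers, written so they can be checked without the client library installed. The helper names (`api_endpoint_for`, `branch_resource_name`, `guess_data_schema`) are hypothetical and not part of `google-cloud-discoveryengine`. The extension-to-schema mapping is an assumption drawn only from the comments in the sample; in particular it cannot detect the `document` schema, which must still be chosen by hand.

```python
import posixpath
from typing import Optional

# File types listed next to the `content` option in the sample's comments.
# Treating the file extension as the deciding factor is an assumption.
_CONTENT_EXTENSIONS = {".pdf", ".html", ".doc", ".txt", ".pptx"}


def api_endpoint_for(location: str) -> Optional[str]:
    """Return the regional endpoint, or None to keep the default for "global"."""
    if location == "global":
        return None
    return f"{location}-discoveryengine.googleapis.com"


def branch_resource_name(
    project_id: str,
    location: str,
    data_store_id: str,
    branch: str = "default_branch",
) -> str:
    """Build the branch resource name in the format shown in the sample."""
    return (
        f"projects/{project_id}/locations/{location}"
        f"/dataStores/{data_store_id}/branches/{branch}"
    )


def guess_data_schema(gcs_uri: str) -> str:
    """Guess a `data_schema` value from the extension of a Cloud Storage URI.

    Hypothetical heuristic: `.json` is assumed to be JSONL metadata (`custom`),
    so structured `document` imports are never returned by this helper.
    """
    if not gcs_uri.startswith("gs://"):
        raise ValueError(f"Not a Cloud Storage URI: {gcs_uri!r}")
    extension = posixpath.splitext(gcs_uri)[1].lower()
    if extension in _CONTENT_EXTENSIONS:
        return "content"
    if extension == ".json":
        return "custom"
    if extension == ".csv":
        return "csv"
    raise ValueError(f"Cannot infer a data schema from {gcs_uri!r}")
```

Under these assumptions, `api_endpoint_for("global")` returns `None`, `guess_data_schema("gs://bucket/directory/*.pdf")` returns `"content"`, and the results could be passed to `ClientOptions(api_endpoint=...)` and `GcsSource(data_schema=...)` in the sample. Raising `ValueError` for unrecognized extensions is a deliberate choice here: a wrong schema guess would start an import with the wrong settings, so failing early seems safer than defaulting.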