Process data with Cloud Data Fusion
Cloud Data Fusion provides a Dataplex Universal Catalog Source plugin to read data from Dataplex Universal Catalog entities (tables) residing on Cloud Storage or BigQuery assets. The Dataplex Universal Catalog Source plugin lets you treat data in Cloud Storage assets as tables and filter the data with SQL queries.
Before you begin
Create a Cloud Data Fusion instance if you don't have one. This plugin is available in instances that run Cloud Data Fusion version 6.6 or later.
The source data must already be part of a Dataplex Universal Catalog zone and an asset (either a Cloud Storage bucket or a BigQuery dataset).
To use tables from Cloud Storage, you must configure a metastore for your lake.
For data to be read from Cloud Storage entities, a Dataproc Metastore service must be attached to the lake (see the gcloud sketch after these prerequisites).
CSV data in Cloud Storage entities isn't supported.
In the Dataplex Universal Catalog project, enable Private Google Access on the subnetwork, which is usually set to `default`, or set `internal_ip_only` to `false`.
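If you prefer the command line, the following is a minimal gcloud sketch of the last two prerequisites. The project, region, lake, and Dataproc Metastore names are placeholders, and you should verify the flags against your gcloud version:

```bash
# Placeholder project, region, lake, and metastore names; verify the
# flags against the current gcloud reference before running.

# Attach a Dataproc Metastore service to the lake so that
# Cloud Storage entities can be read as tables.
gcloud dataplex lakes update my-lake \
    --project=my-dataplex-project \
    --location=us-central1 \
    --metastore-service=projects/my-dataplex-project/locations/us-central1/services/my-metastore

# Enable Private Google Access on the subnetwork used by the pipeline workers.
gcloud compute networks subnets update default \
    --project=my-dataplex-project \
    --region=us-central1 \
    --enable-private-ip-google-access
```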
Limitations
For Cloud Storage assets, this plugin doesn't support reading from CSV files. It supports reading from JSON, Avro, Parquet, and ORC formats.
For Cloud Storage assets, Partition Start Date and Partition End Date aren't applicable.
Required roles
To get the permissions that you need to manage roles, ask your administrator to grant the following IAM roles to the Dataproc service agent and the Cloud Data Fusion service agent (service-CUSTOMER_PROJECT_NUMBER@gcp-sa-datafusion.iam.gserviceaccount.com):

- Dataplex Developer (`roles/dataplex.developer`)
- Dataplex Data Reader (`roles/dataplex.dataReader`)
- Dataproc Metastore Metadata User (`roles/metastore.metadataUser`)
- Cloud Dataplex Service Agent (`roles/dataplex.serviceAgent`)
- Dataplex Metadata Reader (`roles/dataplex.metadataReader`)

For more information about granting roles, see Manage access to projects, folders, and organizations. You might also be able to get the required permissions through custom roles or other predefined roles.
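As a sketch, granting one of these roles to the Cloud Data Fusion service agent with gcloud might look like the following. The project ID and project number are placeholders; repeat the command for each role listed above and also for the Dataproc service agent:

```bash
# Placeholder project ID and project number. Repeat for each role
# listed above, and grant the same roles to the Dataproc service agent.
gcloud projects add-iam-policy-binding my-dataplex-project \
    --member="serviceAccount:service-1234567890@gcp-sa-datafusion.iam.gserviceaccount.com" \
    --role="roles/dataplex.dataReader"
```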
Add the plugin to your pipeline
1. In the Google Cloud console, go to the Cloud Data Fusion Instances page. This page lets you manage your instances.
2. Click View instance to open your instance in the Cloud Data Fusion UI.
3. Go to the Studio page, expand the Source menu, and click Dataplex.
Configure the plugin
After you add this plugin to your pipeline on the Studio page, click the Dataplex Universal Catalog source to configure its properties.
For more information about configurations, see the Dataplex Source reference.
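To fill in the source properties, you need the names of the lake, zone, and entity (table) to read. Assuming placeholder resource names, one way to look up the entities in a zone is with gcloud; verify the command against your gcloud version:

```bash
# Placeholder lake and zone names; the entity IDs returned are the
# values the source plugin expects for its entity property.
gcloud dataplex entities list \
    --project=my-dataplex-project \
    --location=us-central1 \
    --lake=my-lake \
    --zone=my-zone
```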
Optional: Get started with a sample pipeline
Sample pipelines are available, including an SAP source to Dataplex Universal Catalog sink pipeline and a Dataplex Universal Catalog source to BigQuery sink pipeline.
To use a sample pipeline, open your instance in the Cloud Data Fusion UI, click Hub > Pipelines, and select one of the Dataplex Universal Catalog pipelines. A dialog opens to help you create the pipeline.
[[["Fácil de comprender","easyToUnderstand","thumb-up"],["Resolvió mi problema","solvedMyProblem","thumb-up"],["Otro","otherUp","thumb-up"]],[["Difícil de entender","hardToUnderstand","thumb-down"],["Información o código de muestra incorrectos","incorrectInformationOrSampleCode","thumb-down"],["Faltan la información o los ejemplos que necesito","missingTheInformationSamplesINeed","thumb-down"],["Problema de traducción","translationIssue","thumb-down"],["Otro","otherDown","thumb-down"]],["Última actualización: 2025-09-05 (UTC)"],[[["\u003cp\u003eCloud Data Fusion's Dataplex Source plugin allows reading data from Dataplex entities (tables) located on Cloud Storage or BigQuery assets, treating data in Cloud Storage as tables with SQL filtering capabilities.\u003c/p\u003e\n"],["\u003cp\u003eUsing this plugin requires a Cloud Data Fusion instance version 6.6 or later, and the source data must reside in a Dataplex zone and asset.\u003c/p\u003e\n"],["\u003cp\u003eTo read from Cloud Storage, a metastore must be configured for the lake and the data must be in JSON, Avro, Parquet, or ORC formats, as CSV is not supported.\u003c/p\u003e\n"],["\u003cp\u003eSpecific IAM roles, including Dataplex Developer, Dataplex Data Reader, Dataproc Metastore Metadata User, Cloud Dataplex Service Agent, and Dataplex Metadata Reader, are required to manage roles and utilize this plugin.\u003c/p\u003e\n"],["\u003cp\u003eSample pipelines, such as SAP source to Dataplex sink and Dataplex source to BigQuery sink, are available in the Cloud Data Fusion UI under the Hub section.\u003c/p\u003e\n"]]],[],null,["# Process data with Cloud Data Fusion\n\n[Cloud Data Fusion](/data-fusion) provides a Dataplex Universal Catalog Source plugin\nto read data from Dataplex Universal Catalog entities (tables) residing on\nCloud Storage or BigQuery assets. The Dataplex Universal Catalog Source\nplugin lets you treat data in Cloud Storage assets as tables and filter\nthe data with SQL queries.\n\nBefore you begin\n----------------\n\n- [Create a Cloud Data Fusion instance](/data-fusion/docs/how-to/create-instance),\n if you don't have one. This plugin is available in instances that run in\n Cloud Data Fusion version 6.6 or later.\n\n- The source data must already be part of a Dataplex Universal Catalog\n [zone](/dataplex/docs/add-zone) and an [asset](/dataplex/docs/manage-assets)\n (either a Cloud Storage bucket or a BigQuery dataset).\n\n- To use tables from Cloud Storage, you must configure a metastore\n for your lake.\n\n- For data to be read from Cloud Storage entities,\n Dataproc Metastore must be attached to the lake.\n\n- CSV data in Cloud Storage entities isn't supported.\n\n- In the Dataplex Universal Catalog project, enable Private Google Access on the\n subnetwork, which is usually set to `default`, or set `internal_ip_only` to\n `false`.\n\n### Limitations\n\n- For Cloud Storage assets: this plugin does not support reading from\n CSV files. 
It supports reading from JSON, Avro, Parquet, and ORC formats.\n\n- For Cloud Storage assets: **Partition Start Date** and **Partition\n End Date** aren't applicable.\n\n### Required roles\n\n\nTo get the permissions that\nyou need to manage roles,\n\nask your administrator to grant you the\nfollowing IAM roles on the Dataproc service agent and the Cloud Data Fusion service agent (service-\u003cvar translate=\"no\"\u003eCUSTOMER_PROJECT_NUMBER\u003c/var\u003e@gcp-sa-datafusion.iam.gserviceaccount.com):\n\n- [Dataplex Developer](/iam/docs/roles-permissions/dataplex#dataplex.developer) (`roles/dataplex.developer`)\n- [Dataplex Data Reader](/iam/docs/roles-permissions/dataplex#dataplex.dataReader) (`roles/dataplex.dataReader`)\n- [Dataproc Metastore Metadata User](/iam/docs/roles-permissions/metastore#metastore.metadataUser) (`roles/metastore.metadataUser`)\n- [Cloud Dataplex Service Agent](/iam/docs/roles-permissions/dataplex#dataplex.serviceAgent) (`roles/dataplex.serviceAgent`)\n- [Dataplex Metadata Reader](/iam/docs/roles-permissions/dataplex#dataplex.metadataReader) (`roles/dataplex.metadataReader`)\n\n\nFor more information about granting roles, see [Manage access to projects, folders, and organizations](/iam/docs/granting-changing-revoking-access).\n\n\nYou might also be able to get\nthe required permissions through [custom\nroles](/iam/docs/creating-custom-roles) or other [predefined\nroles](/iam/docs/roles-overview#predefined).\n\nAdd the plugin to your pipeline\n-------------------------------\n\n1. In the Google Cloud console, go to the Cloud Data Fusion **Instances** page.\n\n [Go to Instances](https://console.cloud.google.com/data-fusion/locations/-/instances)\n\n This page lets you manage your instances.\n2. Click **View instance** to open your instance in the Cloud Data Fusion\n UI.\n\n3. Go to the **Studio** page, expand the **Source** menu, and click **Dataplex**.\n\nConfigure the plugin\n--------------------\n\nAfter you add this plugin to your pipeline on the **Studio** page, click\nthe Dataplex Universal Catalog source to configure its properties.\n\nFor more information about configurations, see the\n[Dataplex Source](https://cdap.atlassian.net/wiki/spaces/DOCS/pages/1766817793/Google+Dataplex+Batch+Source) reference.\n\nOptional: Get started with a sample pipeline\n--------------------------------------------\n\nSample pipelines are available, including an SAP source to\nDataplex Universal Catalog sink pipeline and a Dataplex Universal Catalog source to\nBigQuery sink pipeline.\n\nTo use a sample pipeline, open your instance in the Cloud Data Fusion UI,\nclick **Hub \\\u003e Pipelines**, and select one of the\nDataplex Universal Catalog pipelines. A dialog opens to help you create the\npipeline.\n\nWhat's next\n-----------\n\n- [Ingest data with Cloud Data Fusion](/dataplex/docs/ingest-with-data-fusion) using the Dataplex Universal Catalog Sink plugin."]]