Data lineage is a Dataflow feature that lets you track how data moves through your systems: where it comes from, where it is passed to, and what transformations are applied to it.
Each pipeline that you run by using Dataflow has several associated data assets. The lineage of a data asset includes its origin, what happens to it, and where it moves over time. With data lineage, you can track the end-to-end movement of your data assets, from origin to eventual destination.
When you enable data lineage for your Dataflow jobs, Dataflow captures lineage events and publishes them to the Dataplex Universal Catalog Data Lineage API.
To access lineage information through Dataplex Universal Catalog, see Use data lineage with Google Cloud systems.
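Conceptually, lineage is a graph: each link records that data moved from a source asset to a target asset, and links are grouped under the process and run that produced them. The following is a minimal, hypothetical Python model of that idea, not the Data Lineage API's own client types; the class names and the `system:name` asset strings are placeholders chosen for illustration. It shows how "where did this data come from?" becomes a walk backwards along links.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Link:
    """One edge in a lineage graph: data moved from `source` to `target`."""
    source: str  # upstream asset name (placeholder format, not an API format)
    target: str  # downstream asset name (placeholder format, not an API format)


@dataclass
class LineageRun:
    """Hypothetical record of one job run and the links it reported."""
    process_name: str
    links: list[Link] = field(default_factory=list)

    def upstream_of(self, asset: str) -> set[str]:
        """Return every asset that feeds `asset`, directly or transitively."""
        found: set[str] = set()
        frontier = [asset]
        while frontier:
            current = frontier.pop()
            for link in self.links:
                # Follow each edge backwards; `found` also guards against cycles.
                if link.target == current and link.source not in found:
                    found.add(link.source)
                    frontier.append(link.source)
        return found


if __name__ == "__main__":
    run = LineageRun(
        process_name="Dataflow orders-etl",
        links=[
            Link("pubsub:orders-topic", "bigquery:raw.orders"),
            Link("bigquery:raw.orders", "bigquery:mart.daily_orders"),
        ],
    )
    print(sorted(run.upstream_of("bigquery:mart.daily_orders")))
```

In the real service, Dataflow reports these links for you; the sketch only illustrates why a source-to-target link model is enough to answer end-to-end provenance questions.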
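Before you begin
Set up your project:
Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
Verify that billing is enabled for your Google Cloud project.
Enable the Dataplex, BigQuery, and Data Lineage APIs.
Caution: Data lineage is enabled on a per-project basis, not a per-service basis. After you enable the Data Lineage API, lineage information is automatically reported for multiple Google Cloud services in the project, depending on their product-level lineage control. For more details, see Data lineage considerations.
In Dataflow, you also need to enable lineage at the job level. See the Enable data lineage in Dataflow section in this document.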
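The Dataplex, BigQuery, and Data Lineage APIs can also be enabled from the command line with `gcloud services enable`. The sketch below only assembles that command in Python so it can be reviewed or scripted; the service names come from this page's "Enable the APIs" console link, and `enable_apis_command` and the project ID are hypothetical placeholders.

```python
import shlex

# Service names of the Dataplex, BigQuery, and Data Lineage APIs.
REQUIRED_APIS = [
    "dataplex.googleapis.com",
    "bigquery.googleapis.com",
    "datalineage.googleapis.com",
]


def enable_apis_command(project_id: str) -> list[str]:
    """Build the gcloud command that enables the required APIs on one project."""
    return ["gcloud", "services", "enable", *REQUIRED_APIS, f"--project={project_id}"]


if __name__ == "__main__":
    cmd = enable_apis_command("my-project")  # placeholder project ID
    print(shlex.join(cmd))
    # To run it from Python instead of a shell (needs an authenticated gcloud CLI),
    # import subprocess and call subprocess.run(cmd, check=True).
```

Remember the caution that applies here: enabling the Data Lineage API affects lineage reporting for the whole project, not only Dataflow.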
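Required roles
To get the permissions that you need to view lineage visualization graphs, ask your administrator to grant you the following IAM roles:
Dataplex Catalog Viewer (roles/dataplex.catalogViewer) on the Dataplex Universal Catalog resource project
Data Lineage Viewer (roles/datalineage.viewer) on the project where you use Dataflow
Dataflow Viewer (roles/dataflow.viewer) on the project where you use Dataflow
For more information about granting roles, see Manage access to projects, folders, and organizations. You might also be able to get the required permissions through custom roles or other predefined roles. For more information about data lineage roles, see Predefined roles for data lineage.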
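For an administrator who grants the three viewer roles (`roles/dataplex.catalogViewer` on the Dataplex Universal Catalog resource project, `roles/datalineage.viewer` and `roles/dataflow.viewer` on the Dataflow project), a minimal sketch that assembles one `gcloud projects add-iam-policy-binding` command per role might look like this. The member, project IDs, and helper names are hypothetical; the two projects can be the same.

```python
import shlex

# Each viewer role mapped to the project it is granted on.
VIEWER_ROLES = {
    "roles/dataplex.catalogViewer": "catalog",  # Dataplex Universal Catalog resource project
    "roles/datalineage.viewer": "dataflow",     # project where you use Dataflow
    "roles/dataflow.viewer": "dataflow",        # project where you use Dataflow
}


def grant_commands(member: str, dataflow_project: str, catalog_project: str) -> list[list[str]]:
    """Build one `gcloud projects add-iam-policy-binding` command per role."""
    projects = {"catalog": catalog_project, "dataflow": dataflow_project}
    return [
        ["gcloud", "projects", "add-iam-policy-binding", projects[scope],
         f"--member={member}", f"--role={role}"]
        for role, scope in VIEWER_ROLES.items()
    ]


if __name__ == "__main__":
    for cmd in grant_commands("user:analyst@example.com",
                              "my-dataflow-project", "my-catalog-project"):
        print(shlex.join(cmd))
```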
Support and limitations
Data lineage in Dataflow has the following limitations:
Data lineage is supported in Apache Beam SDK versions 2.63.0 and later.
You must enable data lineage on a per-job basis.
Data capture isn't instantaneous. It can take a few minutes for Dataflow job lineage data to appear in Dataplex Universal Catalog.
The following sources and sinks are supported:
Apache Kafka
BigQuery
Bigtable
Cloud Storage
JDBC (Java Database Connectivity)
Pub/Sub
Spanner
Dataflow templates that use these sources and sinks also automatically capture and publish lineage events.
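Two of these limitations can be checked before a job is launched: the Beam SDK version and whether each source and sink is on the supported list. The sketch below is a hypothetical pre-flight check; the helper names and the short system labels are assumptions, and it compares only the leading `MAJOR.MINOR.PATCH` numbers, so a pre-release such as `2.63.0rc1` is treated as 2.63.0, which is a simplification.

```python
import re

MIN_BEAM_VERSION = (2, 63, 0)

# Sources and sinks listed as supported for lineage capture.
SUPPORTED_SYSTEMS = {
    "Apache Kafka", "BigQuery", "Bigtable", "Cloud Storage",
    "JDBC", "Pub/Sub", "Spanner",
}


def beam_version_supports_lineage(version: str) -> bool:
    """Return True if an Apache Beam SDK version string is 2.63.0 or later.

    Only the leading MAJOR.MINOR.PATCH numbers are compared; suffixes such as
    ".dev0" or "rc1" are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups()) >= MIN_BEAM_VERSION


def unsupported_systems(systems: list[str]) -> list[str]:
    """Return the sources or sinks that won't produce lineage events."""
    return [name for name in systems if name not in SUPPORTED_SYSTEMS]


if __name__ == "__main__":
    print(beam_version_supports_lineage("2.63.0"))
    print(unsupported_systems(["Pub/Sub", "BigQuery", "MongoDB"]))
```

With the Python SDK installed, `apache_beam.__version__` can be passed to `beam_version_supports_lineage`.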
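Enable data lineage in Dataflow
You need to enable lineage at the job level. To enable data lineage, use the enable_lineage Dataflow service option as follows:
Java:
    --dataflowServiceOptions=enable_lineage=true
Python:
    --dataflow_service_options=enable_lineage=true
Go:
    --dataflow_service_options=enable_lineage=true
gcloud: use the gcloud dataflow jobs run command with the additional-experiments option. If you're using Flex Templates, use the gcloud dataflow flex-template run command.
    --additional-experiments=enable_lineage=true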
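For a Python pipeline, the service option is just one more command-line flag next to the usual Dataflow flags. A minimal sketch, assuming placeholder project, region, and bucket values and a hypothetical helper name, that assembles the argument list:

```python
def lineage_pipeline_args(project: str, region: str, temp_location: str) -> list[str]:
    """Return flags for a Python Dataflow job that has data lineage enabled."""
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
        # Service option that turns on lineage for this job only.
        "--dataflow_service_options=enable_lineage=true",
    ]


if __name__ == "__main__":
    # Placeholder values; replace with your own project, region, and bucket.
    args = lineage_pipeline_args("my-project", "us-central1", "gs://my-bucket/tmp")
    print(args)
    # With the Apache Beam Python SDK (2.63.0 or later) installed, pass these flags
    # to apache_beam.options.pipeline_options.PipelineOptions(args).
```

Keeping the flag in code rather than in a shell history makes it easier to see, per job, whether lineage was requested.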
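Optionally, you can specify one or both of the following parameters with the service option:
process_id: A unique identifier that Dataplex Universal Catalog uses to group job runs. If not specified, the job name is used.
process_name: A human-readable name for the data lineage process. If not specified, the job name prefixed with "Dataflow " is used.
Specify these options as follows:
Java:
    --dataflowServiceOptions=enable_lineage=process_id=PROCESS_ID;process_name=DISPLAY_NAME
Python:
    --dataflow_service_options=enable_lineage=process_id=PROCESS_ID;process_name=DISPLAY_NAME
Go:
    --dataflow_service_options=enable_lineage=process_id=PROCESS_ID;process_name=DISPLAY_NAME
gcloud:
    --additional-experiments=enable_lineage=process_id=PROCESS_ID;process_name=DISPLAY_NAME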
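When both parameters are set, they are packed into a single service-option value separated by `;`, as in `enable_lineage=process_id=PROCESS_ID;process_name=DISPLAY_NAME`. The sketch below builds that Python SDK flag and applies the documented defaults for omitted parameters. The helper names are hypothetical, and the single-parameter form (for example `enable_lineage=process_id=PROCESS_ID`) is an assumption extrapolated from the two-parameter syntax. Because `;` also separates commands in POSIX shells, the flag needs quoting when typed on a shell command line.

```python
import shlex
from typing import Optional


def enable_lineage_flag(process_id: Optional[str] = None,
                        process_name: Optional[str] = None) -> str:
    """Build the Python SDK flag that enables lineage, with optional parameters."""
    params = []
    if process_id:
        params.append(f"process_id={process_id}")
    if process_name:
        params.append(f"process_name={process_name}")
    value = ";".join(params) if params else "true"
    return f"--dataflow_service_options=enable_lineage={value}"


def effective_process(job_name: str, process_id: Optional[str] = None,
                      process_name: Optional[str] = None) -> tuple[str, str]:
    """Apply the documented defaults: the job name, and "Dataflow " plus the job name."""
    return (process_id or job_name, process_name or f"Dataflow {job_name}")


if __name__ == "__main__":
    flag = enable_lineage_flag("orders-etl", "Orders-ETL")
    # shlex.quote wraps the flag in single quotes so the shell doesn't split at ';'.
    print(shlex.quote(flag))
    print(effective_process("orders-etl-nightly"))
```

Setting a stable `process_id` is what lets runs of differently named jobs be grouped under one process in Dataplex Universal Catalog.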
View lineage in Dataplex Universal Catalog
Data lineage provides information about the relations between your project resources and the processes that created them. You can view data lineage information in the Google Cloud console in the form of a graph or a single table. You can also retrieve data lineage information from the Data Lineage API in the form of JSON data.
For more information, see Use data lineage with Google Cloud systems.
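To retrieve lineage as JSON, one option is the Data Lineage API's `searchLinks` method, which finds links that start or end at a given asset. The sketch below only builds the HTTP request with the standard library and does not send it. The `bigquery:PROJECT.DATASET.TABLE` fully qualified name, the `us` location, and the token are assumptions for illustration; check the Data Lineage API reference for the exact name formats and the location where your lineage is stored.

```python
import json
import urllib.request

API_ROOT = "https://datalineage.googleapis.com/v1"


def search_links_request(project: str, location: str, target_fqn: str,
                         access_token: str) -> urllib.request.Request:
    """Build (but do not send) a searchLinks call for links that end at target_fqn."""
    url = f"{API_ROOT}/projects/{project}/locations/{location}:searchLinks"
    body = {"target": {"fullyQualifiedName": target_fqn}}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = search_links_request("my-project", "us",
                               "bigquery:my-project.mart.daily_orders", "TOKEN")
    print(req.get_method(), req.full_url)
    # To send it with a real token (for example from `gcloud auth print-access-token`):
    # links = json.load(urllib.request.urlopen(req))
```

Because capture isn't instantaneous, a search issued right after a job starts can return no links yet.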
Disable data lineage in Dataflow
If data lineage is enabled for a specific job and you want to disable it, cancel the existing job and run a new version of the job without the enable_lineage service option.
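Disabling lineage is therefore a two-step procedure: cancel the running job, then relaunch with the same flags minus the lineage option. A minimal sketch of both steps, assuming a Python pipeline whose flags use the single-token `--dataflow_service_options=enable_lineage...` form (the helper names, job ID, and region are placeholders):

```python
LINEAGE_FLAG_PREFIX = "--dataflow_service_options=enable_lineage"


def without_lineage(args: list[str]) -> list[str]:
    """Copy the pipeline flags, dropping any enable_lineage service option.

    Only the single-token `--flag=value` form is handled here.
    """
    return [arg for arg in args if not arg.startswith(LINEAGE_FLAG_PREFIX)]


def cancel_command(job_id: str, region: str) -> list[str]:
    """Build the gcloud command that cancels the running job."""
    return ["gcloud", "dataflow", "jobs", "cancel", job_id, f"--region={region}"]


if __name__ == "__main__":
    old_args = [
        "--runner=DataflowRunner",
        "--project=my-project",
        "--dataflow_service_options=enable_lineage=true",
    ]
    print(cancel_command("JOB_ID", "us-central1"))  # step 1: cancel the old job
    print(without_lineage(old_args))                # step 2: flags for the new run
```

Other service options in the argument list are left untouched, so only lineage changes between the old and new runs.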
Billing
Using data lineage in Dataflow doesn't impact your Dataflow bill, but it might incur additional charges on your Dataplex Universal Catalog bill. For more information, see Data lineage considerations and Dataplex Universal Catalog pricing.
What's next
Learn more about data lineage.
Learn how to use data lineage.
Last updated 2025-09-04 UTC.