This page describes how to view the data lineage generated by your Cloud Data Fusion pipelines alongside other data movement on Google Cloud, for discovery and governance purposes. You can view the lineage graphs for supported data sources on the Dataplex page in the console, or use the Data Lineage API to retrieve complete data lineage records.
Plugins that support data lineage in Dataplex
Cloud Data Fusion and Dataplex support asset-level lineage for the following plugins:
- Amazon S3
- BigQuery
- Cloud Spanner
- Cloud Storage
- Cloud SQL for MySQL
- Cloud SQL for PostgreSQL
- Dataplex
- FTP
- Generic Database
- MSSQL/SQL Server
- MySQL
- Oracle
- PostgreSQL
- SAP OData
- SAP ODP
- SAP Table
For more information, see Cloud Data Fusion plugins.
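Because lineage depends on the pipeline using only the plugins listed above, it can help to check a pipeline's plugin names against that list before running it. The following is a minimal sketch, not part of Cloud Data Fusion: `SUPPORTED_PLUGINS` and `unsupported_plugins` are hypothetical names, and the sketch assumes you supply plugin display names exactly as they appear in the list (the identifiers stored in an exported pipeline may differ).

```python
# Display names copied from the list above. This is an assumption: the
# identifiers used inside an exported pipeline definition may differ.
SUPPORTED_PLUGINS = frozenset({
    "Amazon S3", "BigQuery", "Cloud Spanner", "Cloud Storage",
    "Cloud SQL for MySQL", "Cloud SQL for PostgreSQL", "Dataplex", "FTP",
    "Generic Database", "MSSQL/SQL Server", "MySQL", "Oracle", "PostgreSQL",
    "SAP OData", "SAP ODP", "SAP Table",
})


def unsupported_plugins(plugin_names):
    """Return the sorted plugin names that are not in the supported list."""
    return sorted(set(plugin_names) - SUPPORTED_PLUGINS)
```

An empty result means every plugin you passed in appears in the supported list.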
Before you begin
To enable viewing Cloud Data Fusion lineage graphs on the Dataplex page in the console, do the following:
Create a data pipeline that uses only the supported plugins.
Enable the Data Lineage API in the project that contains your Cloud Data Fusion instance.
Grant the Data Lineage Events Producer (roles/datalineage.producer) role to the Cloud Data Fusion-managed service account. For more information, see Data Catalog's predefined lineage roles.
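The API and role steps above can also be done from the command line. The following sketch only builds the two `gcloud` commands as argument lists and prints them; it does not run anything. The function name `setup_commands`, the example project values, and the service account address pattern (`service-PROJECT_NUMBER@gcp-sa-datafusion.iam.gserviceaccount.com`) are assumptions that don't come from this page, so verify the service account for your instance before granting the role.

```python
import shlex

LINEAGE_API = "datalineage.googleapis.com"
PRODUCER_ROLE = "roles/datalineage.producer"


def setup_commands(project_id, project_number):
    """Build the two gcloud commands as argument lists (nothing is executed)."""
    # Assumed address pattern of the Cloud Data Fusion-managed service account.
    member = (
        f"serviceAccount:service-{project_number}"
        "@gcp-sa-datafusion.iam.gserviceaccount.com"
    )
    enable_api = ["gcloud", "services", "enable", LINEAGE_API,
                  f"--project={project_id}"]
    grant_role = ["gcloud", "projects", "add-iam-policy-binding", project_id,
                  f"--member={member}", f"--role={PRODUCER_ROLE}"]
    return [enable_api, grant_role]


if __name__ == "__main__":
    # Hypothetical project ID and project number, for illustration only.
    for argv in setup_commands("my-project", "123456789012"):
        print(shlex.join(argv))
```

Returning argument lists rather than shell strings lets you review the commands first, or pass them to `subprocess.run` unchanged if you decide to execute them.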
When lineage is available
Viewing lineage in Dataplex has the following limitations:
The lineage in Dataplex is only discoverable if there is a BigQuery entity connected to the supported plugins. For more information about when data lineage graphs are available, see About data lineage.
The Data Lineage API doesn't support customer-managed encryption keys (CMEK).
Review the data lineage considerations.
View data lineage graphs
To view lineage graphs for entities across all Google Cloud services, do the following:
Go to your instance in Cloud Data Fusion and run a data pipeline that uses supported plugins.
On the Dataplex page in the console, find the asset for which you want to view lineage information, and then view its lineage graph.
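As an alternative to the console, lineage records can be retrieved through the Data Lineage API. The sketch below builds, but does not send, a request to the API's `searchLinks` REST method for links that lead into a given asset. The helper name `build_search_links_request`, the example values, and the BigQuery fully qualified name format (`bigquery:PROJECT.DATASET.TABLE`) are assumptions for illustration; confirm them against the Data Lineage API reference.

```python
import json
import urllib.request

API_ROOT = "https://datalineage.googleapis.com/v1"


def build_search_links_request(project_id, location, fqn, access_token):
    """Build (but do not send) a searchLinks request for links into `fqn`."""
    url = f"{API_ROOT}/projects/{project_id}/locations/{location}:searchLinks"
    # "target" asks for links whose target is this asset (upstream lineage).
    body = json.dumps({"target": {"fullyQualifiedName": fqn}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Hypothetical values; replace them with your own project, location, and asset.
    req = build_search_links_request(
        "my-project", "us", "bigquery:my-project.my_dataset.my_table", "TOKEN"
    )
    print(req.get_method(), req.full_url)
```

To send the request, pass it to `urllib.request.urlopen` with a valid OAuth access token, for example one obtained with `gcloud auth print-access-token`.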
What's next
- Learn more about data lineage.