This page describes how to view the data lineage generated by your Cloud Data Fusion pipelines alongside other data movement on Google Cloud, for discovery and governance purposes. You can view the lineage graphs for supported data sources on the Dataplex page in the Google Cloud console, or use the Data Lineage API to retrieve complete data lineage records.
Plugins that support Dataplex data lineage
Cloud Data Fusion and Dataplex support asset-level lineage for the following plugins:
- Amazon S3
- BigQuery
- BigQuery Multi Table sink (version 6.9.1 and later)
- Spanner
- Cloud Storage
- Cloud SQL for MySQL
- Cloud SQL for PostgreSQL
- Dataplex
- FTP
- Generic Database
- HTTP
- MSSQL/SQL Server
- Multiple Database Tables source (version 6.9.1 and later)
- MySQL
- Oracle
- PostgreSQL
- SAP OData
- SAP ODP
- SAP Table
For more information, see Cloud Data Fusion plugins.
Before you begin
To enable viewing Cloud Data Fusion lineage graphs on the Dataplex page in the console, do the following:
Create a data pipeline that uses only the supported plugins.
Enable the Data Lineage API in the project that contains your Cloud Data Fusion instance.
Grant the Data Lineage Events Producer role (roles/datalineage.producer) to the Cloud Data Fusion-managed service account, the Cloud Data Fusion API Service Agent. The process varies if your instance runs in an earlier version of Cloud Data Fusion and RBAC is enabled.
6.10+ or no RBAC
If your Cloud Data Fusion instance uses version 6.10.0 or later, or your instance uses an earlier version and RBAC isn't enabled, follow these steps:
In the Google Cloud console, go to the IAM page.
Select the Include Google-provided role grants checkbox.
Select the Cloud Data Fusion API Service Agent service account and click Edit.
Click Add another role and select the Data Lineage Events Producer role.
Click Save.
<6.10 with RBAC
If your Cloud Data Fusion instance uses a version earlier than 6.10.0 and RBAC is enabled, the service account doesn't appear in the list of principals on the IAM page. You must enter the service account name manually.
To grant the required role, follow these steps:
In the Google Cloud console, go to the IAM page.
Click Grant access.
In the New principals field, enter the Cloud Data Fusion API Service Agent service account. Use the following format:
datafusion-system@TENANT_PROJECT_ID.iam.gserviceaccount.com
Replace TENANT_PROJECT_ID with the tenant project ID for your instance. To view the tenant project ID, go to the Instances page and click the instance name to open the instance details.
Select the Data Lineage Events Producer role.
Click Save.
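The console steps above can also be approximated from the command line. The following is a minimal sketch, not an official procedure: it assumes the gcloud CLI is installed and authenticated, and the helper names (tenant_agent_member, service_agent_member, print_lineage_setup) are hypothetical. The service-PROJECT_NUMBER address pattern for version 6.10.0 and later is an assumption that doesn't come from this page, so confirm the actual address on the IAM page before you use it. The helpers only print the commands so that you can review them first.

```shell
# Sketch only: these helper names are hypothetical, not part of gcloud.

# IAM member for instances earlier than 6.10.0 with RBAC enabled,
# using the service account format given above.
tenant_agent_member() {
  printf 'serviceAccount:datafusion-system@%s.iam.gserviceaccount.com\n' "$1"
}

# IAM member for 6.10.0 and later, or earlier versions without RBAC.
# Assumption: the service agent address follows this pattern; verify it
# on the IAM page (Include Google-provided role grants).
service_agent_member() {
  printf 'serviceAccount:service-%s@gcp-sa-datafusion.iam.gserviceaccount.com\n' "$1"
}

# Print (do not run) the commands that enable the Data Lineage API
# and grant the Data Lineage Events Producer role.
print_lineage_setup() {
  local project_id="$1" member="$2"
  printf 'gcloud services enable datalineage.googleapis.com --project=%s\n' "$project_id"
  printf 'gcloud projects add-iam-policy-binding %s --member=%s --role=roles/datalineage.producer\n' \
    "$project_id" "$member"
}
```

For example, print_lineage_setup PROJECT_ID "$(tenant_agent_member TENANT_PROJECT_ID)" prints the two commands for the RBAC case. Run them only after you check the project and member values.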
Enable Dataplex data lineage in Cloud Data Fusion
For new instances in Cloud Data Fusion, Dataplex data lineage is turned off by default. If you created the instance before January 27, 2024 with version 6.8.0 or later, it's turned on by default after you complete the steps in Before you begin.
Enable Dataplex data lineage when you create an instance
Console
To enable Dataplex data lineage when you create an instance, follow these steps:
Go to the Cloud Data Fusion Instances page and click Create an instance.
When you configure the instance, expand the Advanced options section and click Enable integration with Dataplex data lineage. For more information about creating instances, see Create a public instance.
REST API
To enable Dataplex data lineage when you create an instance, set the optional dataplex_data_lineage_integration_enabled property to true:
echo '{ "description": "CDAPinstance", "dataplex_data_lineage_integration_enabled": "true"}' | curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
--data @- \
"https://datafusion.googleapis.com/v1/projects/PROJECT/locations/LOCATION/instances?instanceId=INSTANCE_NAME"
To turn it off, either set the property to false or omit the property, as lineage is turned off by default when you create a new instance.
Enable or disable Dataplex data lineage in an existing instance
Console
To enable or disable Dataplex data lineage in an existing instance in Cloud Data Fusion, follow these steps:
- View the instance details:
In the Google Cloud console, go to the Cloud Data Fusion page.
Click Instances, and then click the instance's name to go to the Instance details page.
- In the Dataplex data lineage integration field, click Edit.
- Enable or disable Dataplex data lineage, and then click Save.
REST API
To enable Dataplex data lineage in an existing instance in Cloud Data Fusion, set the dataplex_data_lineage_integration_enabled property to true and include the updateMask parameter value:
echo '{ "description": "CDAPinstance", "dataplex_data_lineage_integration_enabled": "true"}' | curl -X PATCH \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
--data @- \
"https://datafusion.googleapis.com/v1/projects/PROJECT/locations/LOCATION/instances/INSTANCE_NAME?updateMask=dataplex_data_lineage_integration_enabled"
To disable Dataplex data lineage in an existing instance in Cloud Data Fusion, set the dataplex_data_lineage_integration_enabled property to false and include the updateMask parameter value:
echo '{ "description": "CDAPinstance", "dataplex_data_lineage_integration_enabled": "false"}' | curl -X PATCH \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
--data @- \
"https://datafusion.googleapis.com/v1/projects/PROJECT/locations/LOCATION/instances/INSTANCE_NAME?updateMask=dataplex_data_lineage_integration_enabled"
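After you create or update an instance, it can help to confirm that the setting took effect. The following is a minimal sketch under stated assumptions: it assumes that a GET request on the instance resource returns the flag as dataplexDataLineageIntegrationEnabled (the camelCase JSON form of the property above), and that an absent field means the feature is off. The helper names instance_url and lineage_state are hypothetical. Check the field name against the Cloud Data Fusion API reference before you rely on it.

```shell
# Sketch only: helper names are hypothetical.

# Build the resource URL for a single Cloud Data Fusion instance.
instance_url() {
  printf 'https://datafusion.googleapis.com/v1/projects/%s/locations/%s/instances/%s\n' \
    "$1" "$2" "$3"
}

# Read an instance JSON document on stdin and print "enabled" or "disabled".
# Assumption: the flag appears as "dataplexDataLineageIntegrationEnabled": true
# and is omitted or false when lineage is off.
lineage_state() {
  if grep -Eq '"dataplexDataLineageIntegrationEnabled"[[:space:]]*:[[:space:]]*true'; then
    echo enabled
  else
    echo disabled
  fi
}
```

A possible use is to pipe the response into the check: curl -s -H "Authorization: Bearer $(gcloud auth print-access-token)" "$(instance_url PROJECT LOCATION INSTANCE_NAME)" | lineage_state. The pattern match is deliberately simple; a JSON-aware tool is a better choice for anything beyond a quick check.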
View data lineage graphs
To view lineage graphs for entities across all Google Cloud services, do the following:
Go to your instance in Cloud Data Fusion and run a data pipeline that uses supported plugins.
View the lineage graphs on the Dataplex page in the console and find the asset for which you want to view lineage information.
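As an alternative to the console, lineage records can be retrieved with the Data Lineage API. The following is a minimal sketch, assuming the searchLinks method of the Data Lineage API and the bigquery:PROJECT.DATASET.TABLE form for a BigQuery fully qualified name. The helper names and all argument values are hypothetical placeholders, and the location must be one where your lineage is stored.

```shell
# Sketch only: helper names are hypothetical.

# Build a searchLinks request body that looks for lineage links whose
# target is the given BigQuery table.
search_links_body() {
  local table_project="$1" dataset="$2" table="$3"
  printf '{"target": {"fullyQualifiedName": "bigquery:%s.%s.%s"}}\n' \
    "$table_project" "$dataset" "$table"
}

# Build the searchLinks endpoint for a project and location.
search_links_url() {
  printf 'https://datalineage.googleapis.com/v1/projects/%s/locations/%s:searchLinks\n' \
    "$1" "$2"
}

# Send the request. Requires curl and an authenticated gcloud CLI.
# Arguments: lineage project, location, table project, dataset, table.
search_links() {
  search_links_body "$3" "$4" "$5" | curl -s -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    --data @- \
    "$(search_links_url "$1" "$2")"
}
```

For example, search_links PROJECT LOCATION PROJECT DATASET TABLE requests the links that end at that table. To search by source instead, the body would use a source field in place of target.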
Limitations
Viewing lineage in Dataplex has the following limitations:
The lineage in Dataplex is only discoverable if there is a BigQuery entity connected to the supported plugins. For more information about when data lineage graphs are available, see About data lineage.
The Data Lineage API doesn't support customer-managed encryption keys (CMEK).
Cloud Data Fusion doesn't support this feature in the me-central1 or europe-west12 locations.
Review the data lineage considerations.
What's next
- Learn more about data lineage.