Data lineage considerations

Data lineage is enabled on a per-project basis. Once you enable the Data Lineage API in a project, lineage information can be reported automatically for several systems, depending on each system's product-level lineage control.

Automatic lineage tracking is supported for the following systems:

Product-level lineage controls in Google Cloud supported systems (system: available lineage controls)

  • BigQuery, Cloud Data Fusion: When the Data Lineage API is enabled in a project, you can't restrict lineage tracking to only BigQuery or only Cloud Data Fusion.
  • Cloud Composer: Cloud Composer uses an environment-level data lineage integration control. Data lineage is automatically enabled for all new Cloud Composer environments, provided they meet the requirements. See Data lineage with Dataplex for more information. For existing environments, you can enable or disable data lineage integration in the environment settings.
  • Dataproc: Dataproc Spark jobs can capture lineage events and publish them to the Data Lineage API. See Data lineage Dataproc integration for more information.
  • Vertex AI: Data lineage is automatically enabled for Vertex AI artifacts and parameters, such as models, datasets, pipeline templates, and components. The lineage of a pipeline includes the factors that contributed to its creation, as well as the artifacts and metadata derived from it afterwards. See Track the lineage of pipeline artifacts for more information.
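
As a rough illustration of how the project-level switch and the product-level controls combine, the following Python sketch models the rules for BigQuery, Cloud Data Fusion, and Cloud Composer. It is a simplified model, not a call into any Google Cloud API: the names `ComposerEnvironment`, `lineage_is_reported`, and the system labels are hypothetical and exist only for this example. Dataproc and Vertex AI are left out because this section doesn't describe a specific switch for them.

```python
from dataclasses import dataclass

# Illustrative labels only; these are not identifiers from any Google Cloud API.
PROJECT_WIDE_SYSTEMS = {"bigquery", "cloud_data_fusion"}


@dataclass
class ComposerEnvironment:
    """Hypothetical stand-in for one Cloud Composer environment."""
    name: str
    meets_requirements: bool
    lineage_integration_enabled: bool = True  # on by default for new environments


def lineage_is_reported(system, api_enabled, environment=None):
    """Return True if lineage would be reported under the rules listed above."""
    if not api_enabled:
        # Nothing is reported until the Data Lineage API is enabled in the project.
        return False
    if system in PROJECT_WIDE_SYSTEMS:
        # No per-product switch: enabling the API covers both systems.
        return True
    if system == "cloud_composer":
        if environment is None:
            raise ValueError("Cloud Composer is controlled per environment")
        return (environment.meets_requirements
                and environment.lineage_integration_enabled)
    raise ValueError(f"system not modelled in this sketch: {system}")


# Example: an existing environment with the integration switched off.
legacy_env = ComposerEnvironment("legacy-env", meets_requirements=True,
                                 lineage_integration_enabled=False)
print(lineage_is_reported("bigquery", api_enabled=True))
print(lineage_is_reported("cloud_composer", api_enabled=True,
                          environment=legacy_env))
```

The point of the sketch is that the project-level API setting acts as a gate for every system, while only Cloud Composer adds a finer-grained switch in this model.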

Billing impact

Because the Data Lineage API is enabled on a per-project basis, review the impact on your billing charges before you enable it on a project (see the previous section for details).

For multi-regions, such as European Union (eu), Asia (asia), and United States (us), and for BigQuery Omni, lineage processing is distributed to specific regions, and costs depend on the regions where the processing is performed (see Data Catalog pricing examples).

Data lineage compliance

  • Data lineage records metadata about data movement but doesn't capture the data itself. See the data lineage information model and the Data Lineage API reference for details on which fields the metadata includes.
  • Data lineage, as part of Dataplex, supports VPC Service Controls (VPC-SC).
  • Dataplex doesn't currently support customer-managed encryption keys (CMEK) for protecting the harvested lineage metadata.

Data lineage limitations

When you select a node in the lineage graph, the node details side panel is empty if either of the following is true:

  1. The resource is located in another organization.
  2. The user is not a member of the organization that hosts the resource.
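
The two conditions above can be read as a single predicate. The following Python sketch is only a model of that rule: the function name `node_details_available` and the organization IDs are hypothetical, and the sketch assumes that "another organization" means an organization other than the one you are currently working in.

```python
def node_details_available(resource_org, current_org, user_orgs):
    """Return True if the node details side panel would show content.

    resource_org: ID of the organization that hosts the resource (hypothetical).
    current_org:  ID of the organization you are working in (assumption).
    user_orgs:    IDs of the organizations the user is a member of.
    """
    in_other_org = resource_org != current_org      # condition 1
    not_a_member = resource_org not in user_orgs    # condition 2
    # The panel is empty if either condition holds.
    return not (in_other_org or not_a_member)


# Example with made-up organization IDs.
print(node_details_available("org-a", "org-a", {"org-a"}))
print(node_details_available("org-b", "org-a", {"org-a", "org-b"}))
```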