Last updated: 2025-08-18 (UTC).

- Data lineage in Dataflow tracks how data moves through your systems, including its origin, transformations, and destination, so that you can follow data assets end to end.
- Enabling data lineage for Dataflow jobs captures lineage events and publishes them to the Dataplex Data Lineage API. Lineage is enabled on a per-project basis and, for Dataflow, also at the job level with the `enable_lineage` service option.
- You can view lineage information in Dataplex as a visualization graph or a single table in the Google Cloud console, or retrieve it as JSON data from the Data Lineage API.
- Supported sources and sinks for data lineage in Dataflow include Apache Kafka, BigQuery, Bigtable, Cloud Storage, JDBC, Pub/Sub, and Spanner. The feature requires Apache Beam SDK version 2.63.0 or later.
- Disabling data lineage requires cancelling the current job and running a new version without the `enable_lineage` service option.

Data lineage is a Dataflow feature that lets you track
how data moves through your systems: where it comes from, where it is passed to,
and what transformations are applied to it.

Each pipeline that you run by using Dataflow has several associated
data assets. The lineage of a data asset includes its origin, what happens to
it, and where it moves over time. With data lineage, you can track
the end-to-end movement of your data assets, from origin to eventual destination.

When you enable data lineage for your
Dataflow jobs, Dataflow
captures lineage events and publishes them to the Dataplex Universal Catalog
[Data Lineage API](/dataplex/docs/reference/data-lineage/rest).

To access lineage information through Dataplex Universal Catalog, see
[Use data lineage with Google Cloud systems](/dataplex/docs/use-lineage).

Before you begin

Set up your project:

- Sign in to your Google Cloud account. If you're new to Google Cloud, [create an account](https://console.cloud.google.com/freetrial) to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- [Verify that billing is enabled for your Google Cloud project](/billing/docs/how-to/verify-billing-enabled#confirm_billing_is_enabled_on_a_project).
- Enable the Dataplex, BigQuery, and Data lineage APIs.

  [Enable the APIs](https://console.cloud.google.com/flows/enableapi?apiid=dataplex.googleapis.com,bigquery.googleapis.com,datalineage.googleapis.com)

**Caution:** Data lineage is enabled on a per-project basis, not a per-service basis. After you enable the Data Lineage API, lineage information is automatically reported for multiple Google Cloud services in the project, depending on their product-level lineage control. For more details, see [Data lineage considerations](/dataplex/docs/lineage-considerations).

In Dataflow, you also need to enable lineage at the job level.
See [Enable data lineage in Dataflow](#enable-data-lineage) in
this document.

Required roles

To get the permissions that you need to view lineage visualization graphs,
ask your administrator to grant you the following IAM roles:

- [Dataplex Catalog viewer](/iam/docs/roles-permissions/dataplex#dataplex.catalogViewer) (`roles/dataplex.catalogViewer`) on the Dataplex Universal Catalog resource project
- [Data Lineage Viewer](/iam/docs/roles-permissions/datalineage#datalineage.viewer) (`roles/datalineage.viewer`) on the project where you use Dataflow
- [Dataflow viewer](/iam/docs/roles-permissions/dataflow#dataflow.viewer) (`roles/dataflow.viewer`) on the project where you use Dataflow

For more information about granting roles, see [Manage access to projects, folders, and organizations](/iam/docs/granting-changing-revoking-access).

You might also be able to get
the required permissions through [custom
roles](/iam/docs/creating-custom-roles) or other [predefined
roles](/iam/docs/roles-overview#predefined).

For more information about data lineage roles, see
[Predefined roles for data lineage](/dataplex/docs/iam-roles#lineage-roles).

Support and limitations

Data lineage in Dataflow has the following limitations:

- Data lineage is supported in the Apache Beam SDK versions 2.63.0 and later.
- You must enable data lineage on a per-job basis.
- Data capture isn't instantaneous. It can take a few minutes for Dataflow job lineage data to appear in Dataplex Universal Catalog.
- The following sources and sinks are supported:

  - Apache Kafka
  - BigQuery
  - Bigtable
  - Cloud Storage
  - JDBC (Java Database Connectivity)
  - Pub/Sub
  - Spanner

  [Dataflow templates](/dataflow/docs/guides/templates/provided-templates)
  that use these sources and sinks also automatically capture and publish
  lineage events.

Enable data lineage in Dataflow

You need to enable lineage at the job level. To enable data lineage,
use the `enable_lineage`
[Dataflow service option](/dataflow/docs/reference/service-options)
as follows:

Java

    --dataflowServiceOptions=enable_lineage=true

Python

    --dataflow_service_options=enable_lineage=true

Go

    --dataflow_service_options=enable_lineage=true

gcloud

Use the
[`gcloud dataflow jobs run`](/sdk/gcloud/reference/dataflow/jobs/run) command
with the `additional-experiments` option. If you're using Flex Templates, use
the
[`gcloud dataflow flex-template run`](/sdk/gcloud/reference/dataflow/flex-template/run)
command.

    --additional-experiments=enable_lineage=true

Optionally, you can specify one or both of the following parameters with the
service option:

- `process_id`: A unique identifier that Dataplex Universal Catalog uses to group job runs. If not specified, the job name is used.
- `process_name`: A human-readable name for the data lineage process. If not specified, the job name prefixed with `"Dataflow "` is used.

Specify these options as follows:

Java

    --dataflowServiceOptions=enable_lineage=process_id=PROCESS_ID;process_name=DISPLAY_NAME

Python

    --dataflow_service_options=enable_lineage=process_id=PROCESS_ID;process_name=DISPLAY_NAME

Go

    --dataflow_service_options=enable_lineage=process_id=PROCESS_ID;process_name=DISPLAY_NAME

gcloud

    --additional-experiments=enable_lineage=process_id=PROCESS_ID;process_name=DISPLAY_NAME

View lineage in Dataplex Universal Catalog

Data lineage provides information about the relations between your project
resources and the processes that created them. You can view data lineage
information in the Google Cloud console in the form of a graph or a
single table. You can also retrieve data lineage information from the
Data Lineage API in the form of JSON data.

For more information, see
[Use data lineage with Google Cloud systems](/dataplex/docs/use-lineage).

Disable data lineage in Dataflow

If data lineage is enabled for a specific job and you want to disable
it, cancel the existing job and run a new version of the job without the
`enable_lineage` service option.

Billing

Using data lineage in Dataflow doesn't impact your
Dataflow bill, but it might incur additional charges on your
Dataplex Universal Catalog bill. For more information, see
[Data lineage considerations](/dataplex/docs/lineage-considerations)
and [Dataplex Universal Catalog pricing](/dataplex/pricing).

What's next

- Learn more about [data lineage](/dataplex/docs/about-data-lineage).
- Learn how to [use
  data lineage](/dataplex/docs/use-lineage).
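
Example: building the `enable_lineage` flag

The flag syntax described under "Enable data lineage in Dataflow" can be assembled programmatically. The following is a minimal sketch, not part of any Google or Apache Beam library: the helper names `lineage_service_option` and `python_lineage_flag` are hypothetical and exist only for illustration. It assumes that the optional `process_id` and `process_name` parameters are joined with `;`, and that the value is `true` when neither is given, as in the flag examples in this document.

```python
from typing import Optional


def lineage_service_option(process_id: Optional[str] = None,
                           process_name: Optional[str] = None) -> str:
    """Build the enable_lineage service option value (hypothetical helper)."""
    params = []
    if process_id is not None:
        params.append(f"process_id={process_id}")
    if process_name is not None:
        params.append(f"process_name={process_name}")
    # With no optional parameters, lineage is simply switched on.
    value = ";".join(params) if params else "true"
    return f"enable_lineage={value}"


def python_lineage_flag(process_id: Optional[str] = None,
                        process_name: Optional[str] = None) -> str:
    """Return the flag in the form shown for the Python SDK (hypothetical helper)."""
    option = lineage_service_option(process_id, process_name)
    return f"--dataflow_service_options={option}"


if __name__ == "__main__":
    # Prints: --dataflow_service_options=enable_lineage=true
    print(python_lineage_flag())
    # Prints: --dataflow_service_options=enable_lineage=process_id=orders-etl;process_name=Orders ETL
    print(python_lineage_flag("orders-etl", "Orders ETL"))
```

If you paste the resulting flag into a shell command, quote it: an unquoted `;` ends the command in POSIX shells, and a `process_name` that contains spaces would otherwise be split into separate arguments.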