Troubleshoot data lineage issues

This document describes how to resolve issues with Data Catalog data lineage.

Project types

Data assets can reside in different projects. The following sections summarize the project types and how to find the name of each one.

BigQuery storage project

This project stores your BigQuery data assets. You can find its name in the asset details: it is the part of the Table ID before the first dot.

In the BigQuery UI, the storage project name is shown in the
    Table ID field, before the first dot in the fully qualified table name.
Figure 1. The name of a BigQuery storage project.
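
The rule above (the storage project is the part of the Table ID before the first dot) can be sketched in a few lines of Python. The function name storage_project_of is hypothetical, and the sketch assumes a Table ID of the form project.dataset.table whose project part contains no dots.

```python
def storage_project_of(table_id: str) -> str:
    """Return the part of a fully qualified table ID before the first dot.

    Assumption: the ID looks like "project.dataset.table" and the project
    part itself contains no dots.
    """
    project, sep, rest = table_id.partition(".")
    if not sep or not project or not rest:
        raise ValueError(f"not a fully qualified table ID: {table_id!r}")
    return project


print(storage_project_of("docs-source.dataset.source-001"))
```

For the table used in this document's examples, the helper returns docs-source.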

Compute project

This project stores the data lineage metadata. For BigQuery, this is the project in which you run a job. If you run a job using the Google Cloud console, you can find the compute project name in the project selector:

The BigQuery UI shows a compute project called docs-compute on
    the page where you run SQL queries.
Figure 2. The name of a compute project that runs BigQuery jobs.

When sending requests to the BigQuery API, specify the compute project in the URL, for example:

POST /bigquery/v2/projects/docs-compute/jobs HTTP/1.1
Host: bigquery.googleapis.com
User-Agent: Go-http-client/1.1
Authorization: <REDACTED 1031 BYTES>
Accept-Encoding: gzip

{
  "configuration": {
    "query": {
      "useLegacySql": false,
      "query": "CREATE OR REPLACE TABLE `docs-target.dataset.target-002` AS SELECT * FROM `docs-source.dataset.source-002`;"
    }
  },
  "jobReference": {
    "projectId": "docs-compute",
    "jobId": "docs-compute-job-id",
    "location": "us"
  }
}
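
As a rough illustration of where the compute project appears in such a request, the following Python sketch assembles the URL and JSON body without sending anything. The function name build_insert_job_request is hypothetical; the project, job, and table names are the placeholder values from the request above.

```python
import json

BIGQUERY_HOST = "bigquery.googleapis.com"


def build_insert_job_request(compute_project, job_id, location, sql):
    """Return (url, body) for a request that runs a query job in compute_project.

    The compute project appears twice: in the URL path and in jobReference.
    Nothing is sent over the network; this only builds the strings.
    """
    url = f"https://{BIGQUERY_HOST}/bigquery/v2/projects/{compute_project}/jobs"
    body = {
        "configuration": {"query": {"useLegacySql": False, "query": sql}},
        "jobReference": {
            "projectId": compute_project,
            "jobId": job_id,
            "location": location,
        },
    }
    return url, json.dumps(body, indent=2)


url, body = build_insert_job_request(
    "docs-compute",
    "docs-compute-job-id",
    "us",
    "CREATE OR REPLACE TABLE `docs-target.dataset.target-002` AS "
    "SELECT * FROM `docs-source.dataset.source-002`;",
)
print(url)
print(body)
```

Note that the storage projects (docs-source and docs-target) appear only inside the SQL text, while the compute project is the one named in the URL.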

Active project

This is the project from which you are viewing the data lineage. The Google Cloud console shows the active project in the project selector. If you're using the API, the active project is the project from which you're making API calls.

The BigQuery UI shows the data lineage for a
    dataset called source-001, which is in a project called docs-source.
Figure 3. The active project in the Google Cloud console.

BigQuery data lineage not showing

This issue occurs when no data lineage is displayed after you run a BigQuery job. It can have one of three causes:

  • The Data Lineage API is disabled in the active project or the compute project.
  • You don't have the Data lineage Viewer role (roles/datalineage.viewer) in the active project or the compute project.
  • The data lineage hasn't arrived yet. Depending on the volume and complexity of the data being processed, lineage typically takes about 30 minutes to display, but it can take up to 24 hours.

If you see the message "Fetching lineage failed due to missing permissions." at the bottom of the page, you are missing permissions in the active project. Otherwise, you are missing permissions in the compute project.

An empty lineage graph.
Figure 4. Example of lineage not showing in BigQuery UI.
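
The rule for telling the two permission cases apart can be sketched as a small Python helper. The function name and the idea of passing the page text as a string are assumptions made for illustration, and the result is only meaningful once the other two causes (a disabled API, or lineage that hasn't arrived yet) are ruled out.

```python
MISSING_PERMISSIONS_MESSAGE = "Fetching lineage failed due to missing permissions."


def project_missing_permissions(page_text: str) -> str:
    """Return which project type lacks permissions: "active" or "compute".

    Assumption: missing permissions is already known to be the cause.
    The message at the bottom of the page points at the active project;
    its absence points at the compute project.
    """
    if MISSING_PERMISSIONS_MESSAGE in page_text:
        return "active"
    return "compute"


print(project_missing_permissions(
    "Fetching lineage failed due to missing permissions."
))
```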

To resolve this issue, first check that the Data Lineage API is enabled in the compute project. After you enable the API, you need to run a job before any data lineage appears. Depending on the volume and complexity of the data being processed, lineage typically takes about 30 minutes to display, but it can take up to 24 hours.

Next, check that the Data Lineage API is also enabled in the active project.

When the Data Lineage API is enabled in both projects, grant the Data lineage Viewer role (roles/datalineage.viewer) in both the active project and the compute project.
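
The resolution steps above could be carried out with the gcloud CLI. The following Python sketch only assembles and prints the commands rather than running them. It assumes the Data Lineage API's service name is datalineage.googleapis.com; the function name, the project name docs-active, and the user email are hypothetical placeholders.

```python
import shlex

DATA_LINEAGE_API = "datalineage.googleapis.com"  # assumed service name
VIEWER_ROLE = "roles/datalineage.viewer"


def resolution_commands(compute_project, active_project, member):
    """Build gcloud commands that enable the API and grant the viewer role.

    The compute project comes first, matching the order of the steps above.
    If both arguments name the same project, it is handled only once.
    """
    projects = list(dict.fromkeys([compute_project, active_project]))
    commands = []
    for project in projects:
        commands.append(
            ["gcloud", "services", "enable", DATA_LINEAGE_API,
             f"--project={project}"]
        )
    for project in projects:
        commands.append(
            ["gcloud", "projects", "add-iam-policy-binding", project,
             f"--member={member}", f"--role={VIEWER_ROLE}"]
        )
    return commands


for argv in resolution_commands(
    "docs-compute", "docs-active", "user:alex@example.com"
):
    print(shlex.join(argv))
```

Review the printed commands before running them; granting roles requires sufficient IAM permissions in each project.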

BigQuery process metadata not showing

This issue occurs when you open the process details pane and it doesn't show all the details, such as the SQL statement or the Process type property. This happens even though the data lineage displays properly.

This can happen when you don't have permissions to see metadata in the compute project.

Example:

  • BigQuery source table: docs-source.dataset.source-001
  • BigQuery target table: docs-target.dataset.target-001
  • Data lineage between docs-source.dataset.source-001 and docs-target.dataset.target-001 in compute project docs-compute
  • You have the Data lineage Viewer role in the active project and in the compute project docs-compute.

Clicking the BigQuery process details displays the following message in the Google Cloud console:

You don't have permission to view BigQuery process metadata in project X.
In the BigQuery UI, on the Lineage tab, the Details pane shows
    an error message.
Figure 5. Example of BigQuery process details not showing in BigQuery UI.

To resolve this issue, grant the user the bigquery.jobs.get permission (included, for example, in the BigQuery Resource Viewer role) in the compute project.

BigQuery table details not showing

This issue occurs when you open the table details pane and it shows only the Fully qualified name property. This happens even though the data lineage displays properly. It can happen when you don't have all the required permissions in the table's storage project.

Example:

  • BigQuery table docs-source.dataset.source-001
  • BigQuery table docs-target.dataset.target-001
  • Data lineage between docs-source.dataset.source-001 and docs-target.dataset.target-001 with compute project docs-compute
  • You have the Data lineage Viewer role in the active project and in the compute project docs-compute.

In this case, when you click the BigQuery node details, you see the message "Entry with this fully qualified name is not available in the Data Catalog."

BigQuery table details not showing.
Figure 6. Example of BigQuery table details not showing in BigQuery UI.

To resolve this issue, grant the user the bigquery.tables.get permission (included, for example, in the BigQuery Data Viewer role) in the storage project.
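
As a quick reference, the two details-pane issues and their fixes can be captured in a small lookup table. This is an illustrative Python sketch: the Fix type and the symptom keys are hypothetical, and the role IDs are assumed to be the identifiers of the BigQuery Resource Viewer and BigQuery Data Viewer roles named in this document.

```python
from typing import NamedTuple


class Fix(NamedTuple):
    permission: str    # permission the user is missing
    example_role: str  # one role that includes the permission (assumed ID)
    project: str       # project type in which to grant it


FIXES = {
    "process metadata not showing": Fix(
        "bigquery.jobs.get", "roles/bigquery.resourceViewer", "compute"
    ),
    "table details not showing": Fix(
        "bigquery.tables.get", "roles/bigquery.dataViewer", "storage"
    ),
}

for symptom, fix in FIXES.items():
    print(f"{symptom}: grant {fix.permission} "
          f"(for example via {fix.example_role}) in the {fix.project} project")
```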