Work with Data Catalog

Data Catalog is a feature of Dataplex that integrates with BigQuery by automatically cataloging metadata about BigQuery resources like tables, datasets, views, and models. This document describes how to search these resources, view data lineage, and add tags by using Data Catalog.

Search for BigQuery resources

To use Data Catalog to search for BigQuery datasets, tables, and starred projects, follow these steps:

  1. In the Google Cloud console, go to the Dataplex Search page.

    Go to Search

  2. In the Search field, enter a query, and then click Search.

    Data Catalog search lets you find data across your projects and organizations.

    To refine your search parameters, use the Filters panel. For example, in the Systems section, select the BigQuery checkbox. The results are filtered to BigQuery systems.

You can perform basic searches in Data Catalog through the Google Cloud console. For more information about searching in the Google Cloud console, see Open a public dataset.
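
The same kind of search can also be issued programmatically. The following is a minimal sketch, assuming the `google-cloud-datacatalog` Python client library is installed and application default credentials are configured. The helper names `build_query` and `search_bigquery` are hypothetical and used only for illustration. The `system=bigquery` predicate plays the same role as the BigQuery checkbox in the Filters panel.

```python
def build_query(keyword, system="bigquery", entry_type=None):
    """Compose a Data Catalog search string from a keyword and optional predicates."""
    parts = [keyword] if keyword else []
    if system:
        parts.append(f"system={system}")
    if entry_type:
        parts.append(f"type={entry_type}")
    return " ".join(parts)


def search_bigquery(project_id, keyword, entry_type=None):
    """Run the search against the Data Catalog API (needs credentials)."""
    # Imported lazily so build_query works without the client library.
    from google.cloud import datacatalog_v1

    client = datacatalog_v1.DataCatalogClient()
    # Limit the search scope to a single project (assumed to be yours).
    scope = datacatalog_v1.types.SearchCatalogRequest.Scope()
    scope.include_project_ids.append(project_id)
    query = build_query(keyword, entry_type=entry_type)
    results = client.search_catalog(scope=scope, query=query)
    return [result.relative_resource_name for result in results]
```

For example, `build_query("orders", entry_type="table")` produces the string `orders system=bigquery type=table`, which you could also paste into the Search field.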

Data lineage

Data lineage is a Dataplex feature that lets you track how data moves through your systems: where it comes from, where it is passed to, and what transformations are applied to it. You can access the data lineage feature directly from BigQuery.

Enabling data lineage in your BigQuery project causes Dataplex to automatically record lineage information for tables created by the following operations:

Before you begin

In this section, you enable the Data Lineage API and grant Identity and Access Management (IAM) roles that give users the necessary permissions to perform each task in this document.

Enable data lineage

  1. In the Google Cloud console, on the project selector page, select the project that contains the resources for which you want to track lineage.

    Go to project selector

  2. Enable the Data Lineage and Data Catalog APIs.

    Enable the APIs

Required IAM roles

Lineage information is tracked automatically when you enable the Data Lineage API.

To get the permissions that you need to view lineage visualization graphs, ask your administrator to grant you the following IAM roles:

For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.

For more information, see Data lineage roles.

View lineage graphs in BigQuery

To view the data lineage visualization graph from BigQuery, follow these steps:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the Explorer panel, expand your project and dataset, then select a table.

  3. Click the Lineage tab.

    Your data lineage visualization graph is displayed.

  4. Optional: Select a node to view additional details about the entities or processes involved in constructing lineage information.

For more information about data lineage, see About data lineage.

Tags and tag templates

Tags let organizations create, search, and manage metadata for all their data entries in a unified service.

This section explains two key Data Catalog concepts:

  • Tags let you provide context for a data entry by attaching custom metadata fields.

  • Tag templates are reusable structures that you can use to rapidly create new tags.

Tags

Data Catalog provides two types of tags: private tags and public tags.

Private tags

Private tags provide strict access controls. You can search or view the tags and the data entries associated with the tags only if you are granted the required view permissions on both the private tag template and the data entries.

To search for private tags on the Data Catalog page, you must use the tag: search syntax or the search filters.

Private tags are suitable for scenarios where you need to store some sensitive information in the tag and you want to apply additional access restrictions beyond checking whether the user has the permissions to view the tagged entry.

Public tags

Public tags provide less strict access control for searching and viewing the tag as compared to private tags. Any user who has the required view permissions for a data entry can view all the public tags associated with it. View permissions for public tags are only required when you perform a search in Data Catalog using the tag: syntax or when you view an unattached tag template.

Public tags support both simple search and search with predicates in the Data Catalog search page. When you create a tag template, the option to create a public tag template is the default and recommended option in the Google Cloud console.

For example, let's assume you have a public tag template called employee data that you used to create tags for three data entries called Name, Location, and Salary. Among the three data entries, only members of a specific group called HR can view the Salary data entry. The other two data entries have view permissions for all employees of the company.

If any employee who is not a member of the HR group uses the Data Catalog search page and searches with the word employee, the search result displays only Name and Location data entries with the associated public tags.

Public tags are useful for a broad set of scenarios. Public tags support simple search and search with predicates, while private tags support only search with predicates.
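
The visibility rules for private and public tags can be illustrated with a small toy model. This sketch is not how Data Catalog implements access checks. It is an assumption-laden simplification in which permissions are plain sets of names, and the `Tag` class and `visible_tags` function are hypothetical. It reproduces the employee data example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    entry: str      # name of the tagged data entry
    template: str   # name of the tag template the tag was created from
    public: bool    # True for a public tag, False for a private tag


def visible_tags(tags, viewable_entries, viewable_templates):
    """Return the tags a user can see, given what they are allowed to view."""
    result = []
    for tag in tags:
        if tag.entry not in viewable_entries:
            continue  # no view permission on the entry: the tag is never shown
        # Public tags need only entry access; private tags also need
        # view permission on the tag template.
        if tag.public or tag.template in viewable_templates:
            result.append(tag)
    return result


tags = [Tag(e, "employee data", True) for e in ("Name", "Location", "Salary")]
# An employee outside the HR group can view Name and Location but not Salary.
seen = visible_tags(tags, viewable_entries={"Name", "Location"},
                    viewable_templates=set())
print([t.entry for t in seen])  # prints ['Name', 'Location']
```

If the same three tags were private, the employee would see none of them until they were also granted view permission on the employee data template.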

Tag templates

To start tagging metadata, you first need to create one or more tag templates. A tag template is a group of metadata key-value pairs called fields. Having a set of templates is similar to having a database schema for your metadata. A tag template can be public or private. When you create a tag template in the Google Cloud console, the public option is the default and is recommended.

You can structure your tags by topic. For example:

  • A data governance tag with fields for data governor, retention date, deletion date, PII (yes or no), and data classification (public, confidential, sensitive, regulatory)
  • A data quality tag with fields for quality issues, update frequency, and SLO information
  • A data usage tag with fields for top users, top queries, and average daily users

You can then mix and match tags, using only the tags relevant for each data asset and your business needs.
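
As an illustration, a data governance template along the lines described above could be created with the Python client library. This is a minimal sketch, assuming `google-cloud-datacatalog` is installed and credentials are configured. The field IDs, the template ID `data_governance`, and the helper names are hypothetical choices for this example, not names defined by Data Catalog.

```python
# Hypothetical field layout: a string names a primitive type,
# a list gives the allowed values of an enum field.
GOVERNANCE_FIELDS = {
    "data_governor": "STRING",
    "retention_date": "TIMESTAMP",
    "has_pii": "BOOL",
    "data_classification": ["PUBLIC", "CONFIDENTIAL", "SENSITIVE", "REGULATORY"],
}


def template_parent(project_id, location):
    """Resource name of the location that will hold the tag template."""
    return f"projects/{project_id}/locations/{location}"


def create_governance_template(project_id, location, template_id="data_governance"):
    """Create the template through the Data Catalog API (needs credentials)."""
    from google.cloud import datacatalog_v1

    client = datacatalog_v1.DataCatalogClient()
    template = datacatalog_v1.types.TagTemplate()
    template.display_name = "Data governance"
    for field_id, kind in GOVERNANCE_FIELDS.items():
        field = datacatalog_v1.types.TagTemplateField()
        field.display_name = field_id.replace("_", " ").capitalize()
        if isinstance(kind, list):
            # Enum field: register each allowed value.
            for value in kind:
                field.type_.enum_type.allowed_values.append(
                    datacatalog_v1.types.FieldType.EnumType.EnumValue(
                        display_name=value))
        else:
            field.type_.primitive_type = getattr(
                datacatalog_v1.types.FieldType.PrimitiveType, kind)
        template.fields[field_id] = field
    return client.create_tag_template(
        parent=template_parent(project_id, location),
        tag_template_id=template_id,
        tag_template=template,
    )
```

Keeping the field layout in a plain dictionary makes it easy to review the template design, much like a schema, before anything is created in your project.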

To help you get started, Data Catalog includes a gallery of sample tag templates to illustrate common tagging use cases. Use these examples to learn about the power of tagging, for inspiration, or as a starting point for creating your own tagging infrastructure.

To use the tag template gallery, follow these steps:

  1. In the Google Cloud console, go to the Dataplex Tag templates page.

    Go to Tag templates

  2. Click Create tag template.

    The template gallery is displayed as part of the Create template page.

After you select a template from the gallery, you can use it just like any other tag template. You can add or delete attributes and change anything in the template to suit your business needs. You can then search for the template fields and values using Data Catalog.

For more information about tags and tag templates, see Tags and tag templates.

Regional resources

Every tag template and tag is stored in a particular Google Cloud region. You can use a tag template to create a tag in any region, so you don't need to create copies of your template if you have metadata entries spread across multiple regions.
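
The following sketch shows what reusing one template across regions might look like with the Python client library. It assumes `google-cloud-datacatalog` is installed, credentials are configured, and a template with a boolean field named `has_pii` already exists. That field ID and the helper names are hypothetical. The template's resource name carries its own location, which does not have to match the location of the table being tagged.

```python
def bigquery_linked_resource(project_id, dataset_id, table_id):
    """Full resource name that Data Catalog uses to look up a BigQuery table."""
    return (f"//bigquery.googleapis.com/projects/{project_id}"
            f"/datasets/{dataset_id}/tables/{table_id}")


def template_name(project_id, location, template_id):
    """Resource name of a tag template stored in the given location."""
    return f"projects/{project_id}/locations/{location}/tagTemplates/{template_id}"


def tag_table(table_resource, template):
    """Attach a tag built from `template` to the table's catalog entry."""
    from google.cloud import datacatalog_v1

    client = datacatalog_v1.DataCatalogClient()
    # Find the catalog entry that was created automatically for the table.
    entry = client.lookup_entry(request={"linked_resource": table_resource})
    tag = datacatalog_v1.types.Tag()
    tag.template = template
    tag.fields["has_pii"] = datacatalog_v1.types.TagField()
    tag.fields["has_pii"].bool_value = False
    return client.create_tag(parent=entry.name, tag=tag)
```

For example, a template named with `template_name("my-project", "us-central1", "data_governance")` could be passed to `tag_table` together with the linked resource of a table whose dataset is in another region.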