Documenting data assets at scale is difficult, especially when they are used by different groups in an organization with varying needs. Often each group will create their own set of documentation and metadata to describe the same data, resulting in duplicated effort and incomplete information. Data Catalog solves this problem with tags, which enable organizations to create, search, and manage metadata for all their data assets in a unified service.
This page explains two key Data Catalog concepts: tags, which allow you to provide context for a data asset by attaching custom metadata fields, and tag templates, reusable structures that can be used to rapidly create new tags.
Tags are sometimes called "business metadata". Adding tags to a data asset helps provide meaningful context to anyone who needs to use the asset. For example, a tag could tell you who is responsible for a particular data asset, whether it contains personally identifiable information (PII), what its data retention policy is, or what its data quality score is.
Tags contain one or more fields where information can be stored. The fields in a tag are defined by a tag template, and each field can be used to store one or more values. Every tag is an instance of a tag template and can be attached to an entire data asset, or to particular tables or columns. A tag on a column could tell you, for example, whether that column contains PII, whether it's been deprecated, or what formula was used to calculate a certain value.
The following diagram shows a sample customer table, cust_tbl, with several business metadata tags attached to the table and its columns.
To start tagging data, you first need to create one or more tag templates. A tag template is a group of metadata key-value pairs called fields. Having a set of templates is similar to having a database schema for your metadata.
This allows you to structure your tags by topic. For example:
- A data governance tag with fields for: data governor, retention date, deletion date, PII (yes or no), data classification (public, confidential, sensitive, regulatory)
- A data quality tag with fields for: quality issues, update frequency, SLO information
- A data usage tag with fields for: top users, top queries, average daily users
You can then mix and match tags, using only the tags relevant for each data asset and your business needs.
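As a rough sketch, the topic-based templates listed above can be modeled as plain data. The template IDs, field IDs, the `clickstream_tbl` asset, and the `fields_for` helper are hypothetical names chosen for illustration; none of this is the Data Catalog API.

```python
# Hypothetical topic-based templates: template ID -> list of field IDs.
TEMPLATES = {
    "data_governance": [
        "data_governor", "retention_date", "deletion_date",
        "has_pii", "data_classification",
    ],
    "data_quality": ["quality_issues", "update_frequency", "slo_information"],
    "data_usage": ["top_users", "top_queries", "average_daily_users"],
}

# Each asset picks only the templates that are relevant to it.
ASSET_TEMPLATES = {
    "cust_tbl": ["data_governance", "data_quality"],
    "clickstream_tbl": ["data_usage"],
}


def fields_for(asset):
    """Return the field IDs available on an asset, grouped by template."""
    return {t: TEMPLATES[t] for t in ASSET_TEMPLATES.get(asset, [])}
```

Keeping each topic in its own template means an asset that has no usage statistics, for example, never carries empty usage fields.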
To learn how to create tag templates, see the quickstart Tagging tables.
Each field contains an ID, a display name, and a type. The supported types include enum (enumeration) and datetime. When the type is enum, the template also stores the allowed values for the field.
Here is an example tag template from the quickstart, containing multiple field types:
And here is a tag created from the template, with values provided for each field:
Fields are stored in the template as an ordered set, where the order indicates each field's importance relative to the others.
Fields are optional unless marked as required. A required field must be given a value when the template is used, while an optional field can be left empty.
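The field rules described above (ID, display name, type, allowed values for enums, ordering, and required versus optional) can be illustrated with a small in-memory model. This is a minimal sketch under stated assumptions: the class names, the `validate_tag` helper, and the sample `governance` template are hypothetical, only the enum and datetime types are handled, and the real service may validate tags differently.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List


@dataclass
class TemplateField:
    field_id: str
    display_name: str
    type: str  # only "enum" and "datetime" are handled in this sketch
    allowed_values: List[str] = field(default_factory=list)  # enum only
    required: bool = False


@dataclass
class TagTemplate:
    template_id: str
    fields: List[TemplateField]  # list order stands in for field importance


def validate_tag(template: TagTemplate, values: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the tag is valid."""
    problems = []
    known = {f.field_id for f in template.fields}
    for field_id in values:
        if field_id not in known:
            problems.append(f"unknown field: {field_id}")
    for f in template.fields:
        if f.field_id not in values:
            # Optional fields may be left empty; required ones may not.
            if f.required:
                problems.append(f"missing required field: {f.field_id}")
            continue
        value = values[f.field_id]
        if f.type == "enum" and value not in f.allowed_values:
            problems.append(f"invalid enum value for {f.field_id}: {value}")
        if f.type == "datetime" and not isinstance(value, datetime):
            problems.append(f"expected datetime for {f.field_id}")
    return problems


# Hypothetical template with one required enum field and one optional datetime.
governance = TagTemplate(
    template_id="data_governance",
    fields=[
        TemplateField(
            "data_classification", "Data classification", "enum",
            ["public", "confidential", "sensitive", "regulatory"],
            required=True,
        ),
        TemplateField("retention_date", "Retention date", "datetime"),
    ],
)
```

Calling `validate_tag(governance, {"data_classification": "public"})` returns an empty list, because the only required field has an allowed value and the optional field is left empty.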
Tag template gallery
To help you get started, Data Catalog includes a gallery of sample tag templates to illustrate common tagging use cases. Use these examples to learn about the power of tagging, for inspiration, or as a starting point for creating your own tagging infrastructure.
You can find the tag template gallery by clicking CREATE then selecting Create tag template. The template gallery is displayed at the top of the Create template page.
After you've selected a template from the gallery, you can use it just like any other tag template. You can add and delete attributes, and change anything in the template to suit your business needs. You can then search for the template fields and values using Data Catalog.
Tags and their metadata can contain sensitive information, and data governance teams might want certain tags to be visible only to select groups of users. Data Catalog provides access control on templates, and these settings extend to all tags created using that template.
You can set up templates with many access control configurations, for example:
- A template that only the template creator can use to create tags
- A template that creates tags that are only visible to a select set of users
- A template that a select set of users can use to create tags that are only visible to another (possibly identical) set of users
Access to a tag template is granted or denied with IAM roles. These provide permissions to create, edit, and use the template. For example, the TagTemplate User role grants permission to use a tag template to tag resources.
See Data Catalog Identity and Access Management for more information.
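The idea that roles held on a template decide what a member can do with it can be sketched as a simple lookup. The permission names below (`use_template`, `edit_tags`) are simplified placeholders rather than actual IAM permission strings, and the members and the `can` helper are hypothetical.

```python
# Illustrative role -> permission mapping. The permission names are
# placeholders for this sketch, not real IAM permission strings.
ROLE_PERMISSIONS = {
    "TagTemplate User": {"use_template"},
    "Tag Editor": {"edit_tags"},
}


def can(member, permission, template_grants):
    """Check whether any role the member holds on the template grants the permission."""
    roles = template_grants.get(member, set())
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in roles)


# Grants on a single template: member -> roles held on that template.
grants = {
    "group:governance@example.com": {"TagTemplate User", "Tag Editor"},
    "user:analyst@example.com": {"TagTemplate User"},
}
```

In this sketch the analyst can create tags from the template but cannot edit existing ones, while a member with no grant on the template can do neither.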
Using tag templates in multiple projects
Everything in Google Cloud lives in a project, including your tag templates. However, you can use tag templates from one project to create tags in another, as long as you authorize the other project to use the templates. There are predefined IAM roles to help implement this, such as the TagTemplate User role.
For example, if project A grants the TagTemplate User role to a service account owned by project B, project B can create tags using project A's templates. Project A can also authorize the same service account to modify the created tags using the Tag Editor role.
If project A does not authorize project B, project B cannot tag its own data resources using project A's tag templates; it must create its own templates.
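The cross-project authorization described above amounts to adding a role binding for project B's service account to the policy on project A's template. The sketch below manipulates an IAM-style policy dictionary in memory only; it makes no API calls. The service account address and the helper functions are hypothetical, and the role ID string is written as it is commonly documented, so confirm it against the IAM documentation before relying on it.

```python
def add_binding(policy, role, member):
    """Return a copy of an IAM-style policy with the member added to the role."""
    bindings = [dict(b, members=list(b["members"])) for b in policy.get("bindings", [])]
    for b in bindings:
        if b["role"] == role:
            if member not in b["members"]:
                b["members"].append(member)
            break
    else:
        # No binding exists for this role yet, so create one.
        bindings.append({"role": role, "members": [member]})
    return {**policy, "bindings": bindings}


def has_role(policy, role, member):
    """Check whether the member appears in a binding for the role."""
    return any(
        b["role"] == role and member in b["members"]
        for b in policy.get("bindings", [])
    )


# Hypothetical service account owned by project B.
SA_B = "serviceAccount:tagger@project-b.iam.gserviceaccount.com"
# Assumed role ID for the TagTemplate User role.
TEMPLATE_USER = "roles/datacatalog.tagTemplateUser"

# Policy on a template in project A, before and after authorization.
policy_a = {"bindings": []}
authorized = add_binding(policy_a, TEMPLATE_USER, SA_B)
```

`add_binding` returns a new policy instead of mutating its input, so the "before" and "after" states can be compared directly.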
Best practice: We recommend that templates be created in a central project if they are relevant to more than one project. Also, your data governance team should own the shared tag templates and maintain them on behalf of the organization.
Every tag template and tag is stored in a particular GCP region. You can use a tag template to create a tag in any region, so you don't need to create copies of your template if you have data assets spread across multiple regions.
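To illustrate how one template can serve assets in several regions, the sketch below builds resource names for a single template and for entries in two other regions. The `projects/.../locations/...` name layout is an assumption based on the usual Data Catalog resource name pattern, and the project IDs, regions, entry group, and helper functions are hypothetical.

```python
def template_name(project, location, template_id):
    """Build a tag template resource name (layout assumed, see the note above)."""
    return f"projects/{project}/locations/{location}/tagTemplates/{template_id}"


def entry_name(project, location, entry_group, entry):
    """Build an entry resource name (layout assumed, see the note above)."""
    return (
        f"projects/{project}/locations/{location}"
        f"/entryGroups/{entry_group}/entries/{entry}"
    )


def location_of(resource_name):
    """Extract the segment that follows 'locations/' in a resource name."""
    parts = resource_name.split("/")
    return parts[parts.index("locations") + 1]


# One template, stored once in a hypothetical central project and region.
template = template_name("central-project", "us-central1", "data_governance")

# Tags on entries in two different regions both point at the same template.
tags = [
    {"parent": entry_name("project-b", "europe-west1", "sales", "cust_tbl"),
     "template": template},
    {"parent": entry_name("project-b", "asia-east1", "sales", "cust_tbl"),
     "template": template},
]
```

Each tag lives alongside its entry, while the template it references stays in one place, so no per-region copies of the template are needed.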