This page describes how to automatically apply Dataplex tags to BigQuery tables after Sensitive Data Protection profiles those tables. This page also provides example queries that you can use to find tagged data across your organization and projects.
This feature is useful if you want to enrich your manually curated metadata in Dataplex with insights gathered from Sensitive Data Protection data profiles. The generated tags include the following insights:
- Information types (infoTypes) detected in the columns of the table
- Calculated sensitivity level of the table
- Calculated data risk level of the table
Insights from Sensitive Data Protection data profiles can help you use Dataplex to discover sensitive and high-risk data in your organization. Use these insights to help you make informed decisions about how to manage and govern your data.
If you want to send the results of inspection jobs—not data profiling operations—to Dataplex, see Send Sensitive Data Protection inspection results to Data Catalog instead.
About data profiles
You can configure Sensitive Data Protection to automatically generate profiles about data across an organization, folder, or project. Data profiles contain metrics and metadata about your data and help you determine where sensitive and high-risk data reside. Sensitive Data Protection reports these metrics at various levels of detail. For information about the types of data you can profile, see Supported resources.
About Dataplex and Data Catalog
Dataplex is a Google Cloud service that unifies distributed data and automates data management and governance for that data. Data Catalog is a fully managed, scalable metadata management service within Dataplex.
Data Catalog lets you use tags and tag templates to attach business metadata to your data. You can then search and manage all metadata for your organization or project in a unified service. For more information, see Tags and tag templates.
How it works
If your discovery scan configuration has the Send to Dataplex as tags action enabled, Sensitive Data Protection does the following each time it profiles your data. This action is only applied to new and updated profiles. Existing profiles that aren't updated aren't sent to Dataplex.
Creates a private tag template containing the schema of the tags that will be attached to your BigQuery tables. For information about the name, ID, and location of the tag template, see Tag template details.
Only principals with the proper roles and permissions can view the tag template.
Creates a tag for each BigQuery table that you profile. The tag is based on the newly created tag template.
For example, a resulting tag attached to a table can have the following metadata:
Display name | Value |
---|---|
Column Insights | ccn: CREDIT_CARD_NUMBER<br>first_name: PERSON_NAME<br>last_name: PERSON_NAME<br>ssn: US_SOCIAL_SECURITY_NUMBER<br>email: EMAIL_ADDRESS |
Column Sensitivity | ccn: HIGH<br>first_name: MODERATE<br>last_name: MODERATE<br>favorite_animal: LOW<br>ssn: HIGH<br>email: MODERATE<br>id: LOW |
Data Risk Level | HIGH |
Other InfoTypes | PHONE_NUMBER |
Predicted InfoTypes | CREDIT_CARD_NUMBER,US_SOCIAL_SECURITY_NUMBER,EMAIL_ADDRESS,PERSON_NAME |
Profile Last Generated | DATE at TIME |
Sensitive Data Profile | organizations/ORGANIZATION_ID/locations/REGION/tableDataProfiles/TABLE_DATA_PROFILE_ID |
Sensitivity Score | HIGH |
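If you read these field values programmatically, the Column Insights and Column Sensitivity values can be turned into a lookup table. The following is a minimal sketch, not an official client. It assumes that each value arrives as plain text with one "column: value" pair per line, as in the example above. The function name parse_column_field is hypothetical.

```python
from typing import Dict


def parse_column_field(value: str) -> Dict[str, str]:
    """Parse 'column: value' lines into a dict (assumed format, see the note above)."""
    result = {}
    for line in value.splitlines():
        line = line.strip()
        if not line:
            continue
        column, separator, insight = line.partition(":")
        if not separator:
            # Skip lines that are not 'column: value' pairs.
            continue
        result[column.strip()] = insight.strip()
    return result


# Example values copied from the sample tag above.
column_sensitivity = parse_column_field(
    "ccn: HIGH\nfirst_name: MODERATE\nfavorite_animal: LOW"
)
high_columns = [
    column for column, level in column_sensitivity.items() if level == "HIGH"
]
print(high_columns)
```

A dictionary keeps the column-to-value relationship explicit, which makes it easier to filter columns by sensitivity or infoType.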
A table has two tags if it was profiled through both of the following:
- An organization-level or folder-level scan configuration
- A project-level scan configuration
After the tables are tagged, you can search Dataplex for all data in your organization or project with specific tag values.
Tag template details
The template name, template ID, and the project where the new tag template is stored depend on the resource that the scan configuration pertains to.
- If the scan configuration is an organization-level or folder-level configuration, the tag template is stored in the service agent container. The name of the tag template is Sensitive Data Profile. Its template ID is sensitive_data_profile.
- If the scan configuration is a project-level configuration, the tag template is stored in the project to be profiled. The name of the tag template is Sensitive Data Profile (Project). Its template ID is sensitive_data_profile_project.
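The mapping above can be captured in a small lookup table for use in scripts. This is an illustrative sketch only: the names and IDs come from the list above, but the TagTemplate type, the dictionary keys, and the tag_template_for function are hypothetical and chosen for this example.

```python
from typing import NamedTuple


class TagTemplate(NamedTuple):
    display_name: str
    template_id: str
    stored_in: str


_ORG_OR_FOLDER = TagTemplate(
    "Sensitive Data Profile", "sensitive_data_profile", "service agent container"
)
_PROJECT = TagTemplate(
    "Sensitive Data Profile (Project)",
    "sensitive_data_profile_project",
    "project to be profiled",
)

# Keys are labels chosen for this sketch, not values used by any API.
TAG_TEMPLATES = {
    "organization": _ORG_OR_FOLDER,
    "folder": _ORG_OR_FOLDER,
    "project": _PROJECT,
}


def tag_template_for(scan_level: str) -> TagTemplate:
    """Return the tag template details for a scan configuration level."""
    try:
        return TAG_TEMPLATES[scan_level]
    except KeyError:
        raise ValueError(f"Unknown scan level: {scan_level!r}") from None


print(tag_template_for("folder").template_id)
```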
Pricing
For information about how other Google Cloud services may charge you for exporting data profiles, see Pricing for exporting data profiles.
Automatically tag BigQuery tables based on data profiles
Create a scan configuration. Alternatively, edit an existing scan configuration.
- To create a scan configuration at the organization or folder level, see Profile data in an organization or folder.
- To create a scan configuration at the project level, see Profile data in a single project.
In the Add actions step, make sure Send to Dataplex as tags is turned on.
- If you're creating a scan configuration, this action is enabled by default.
- If you're editing a scan configuration, you must enable this action.
After the data is profiled and tagged, you can start searching for tagged data in Dataplex.
Roles and permissions for viewing tags
Dataplex search results show you only the data that you have access to. You need the following Identity and Access Management (IAM) roles or permissions to search for the tags that are attached to your BigQuery tables.
Purpose | Predefined role | Relevant permissions |
---|---|---|
View the private tag template | Data Catalog TagTemplate Viewer (roles/datacatalog.tagTemplateViewer) | datacatalog.tagTemplates.getTag |
View the tags applied to BigQuery tables | BigQuery Metadata Viewer (roles/bigquery.metadataViewer) | bigquery.datasets.get, bigquery.tables.get |
For more information about Dataplex roles, see Roles to view public and private tags.
For information about granting a predefined role, see Grant a single role. If you want to use a custom role instead of a predefined role, make sure that the custom role has the relevant permissions. For more information, see Create a custom role.
Find the generated tag template
In the Google Cloud console, go to the Dataplex Tag Templates page.
In the list, find the tag template. For information about the name, ID, and location of the tag template, see Tag template details.
Optional: To find the tag template that was generated by a given discovery scan configuration, enter the following in the Filter field:
name:PROJECT_ID.TAG_TEMPLATE_ID
Replace the following:
- PROJECT_ID: the ID of the project that is associated with the scan configuration. If you profiled your data at the organization or folder level, enter the project ID of the service agent container.
- TAG_TEMPLATE_ID: sensitive_data_profile if the scan configuration is for an organization or a folder; sensitive_data_profile_project if the scan configuration is for a project.
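If you build this filter in a script, for example to paste it into the Filter field, a small helper keeps the two template IDs straight. This is a sketch under stated assumptions: the function name tag_template_filter and the project ID my-service-agent-container are hypothetical placeholders.

```python
def tag_template_filter(project_id: str, for_project_scan: bool) -> str:
    """Build the name:PROJECT_ID.TAG_TEMPLATE_ID filter described above."""
    template_id = (
        "sensitive_data_profile_project"
        if for_project_scan
        else "sensitive_data_profile"
    )
    return f"name:{project_id}.{template_id}"


# Hypothetical project ID of a service agent container.
org_filter = tag_template_filter("my-service-agent-container", for_project_scan=False)
print(org_filter)
```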
Find the generated tag for a given table data profile
In the Google Cloud console, go to the Dataplex Search page.
In the Search field, enter the following:
name:TABLE_ID tag:PROJECT_ID.TAG_TEMPLATE_ID
Replace the following:
- TABLE_ID: the ID of the table that was profiled.
- PROJECT_ID: the ID of the project that contains the tag template. If you profiled your data at the organization or folder level, enter the project ID of the service agent container.
- TAG_TEMPLATE_ID: sensitive_data_profile if the scan configuration is for an organization or a folder; sensitive_data_profile_project if the scan configuration is for a project.
In the list that appears, click the table ID. The details of the BigQuery table appear along with any Sensitive Data Profile or Sensitive Data Profile (Project) tags attached to it.
A table has two tags if it was profiled through both of the following:
- An organization-level or folder-level scan configuration
- A project-level scan configuration
For information about how to perform a search through the Data Catalog API, see How to search for data assets.
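The search string from the steps above can also be assembled in code before you paste it into the Search field or pass it to a search request. The following sketch only builds the string; it doesn't call any Google Cloud API. The function name table_tag_query and the sample IDs are hypothetical.

```python
def table_tag_query(table_id: str, project_id: str, template_id: str) -> str:
    """Build the name:TABLE_ID tag:PROJECT_ID.TAG_TEMPLATE_ID search string."""
    parts = {
        "table_id": table_id,
        "project_id": project_id,
        "template_id": template_id,
    }
    for name, value in parts.items():
        # A space would split the value into separate search terms.
        if not value or " " in value:
            raise ValueError(f"{name} must be non-empty and contain no spaces")
    return f"name:{table_id} tag:{project_id}.{template_id}"


# Hypothetical table and project IDs.
query = table_tag_query("customers", "my-project", "sensitive_data_profile_project")
print(query)
```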
Example search queries
This section provides example search queries that you can use in Dataplex to find data in your organization or project with specific tag values.
You can find only the data that you have access to. Data access is controlled through IAM permissions. For more information, see Roles and permissions for viewing tags on this page.
You can enter these queries in the Dataplex Search page in the Google Cloud console.
For information about how to form the queries, see Data Catalog search syntax. For information about how to perform a search through the Data Catalog API, see How to search for data assets.
Find all tables that are tagged using the new tag template
tag:PROJECT_ID.TAG_TEMPLATE_ID
Replace the following:
- PROJECT_ID: the ID of the project that contains the tag template. If you profiled your data at the organization or folder level, enter the project ID of the service agent container.
- TAG_TEMPLATE_ID: sensitive_data_profile if the scan configuration is for an organization or a folder; sensitive_data_profile_project if the scan configuration is for a project.
The remaining examples on this page don't include the project ID, so the results might be associated with more than one discovery scan configuration. To limit your results to a particular scan configuration, add the project ID to the query as shown in this example.
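As a rough illustration of adding the project ID to one of the later queries, the sketch below inserts it directly after the tag: prefix. This assumes that the project ID goes in the same position as in the example above. The function name scope_to_project and the project ID my-project are hypothetical.

```python
def scope_to_project(query: str, project_id: str) -> str:
    """Insert PROJECT_ID after the 'tag:' prefix of an unscoped tag query."""
    prefix = "tag:"
    if not query.startswith(prefix):
        raise ValueError("Expected a query that starts with 'tag:'")
    return f"{prefix}{project_id}.{query[len(prefix):]}"


scoped = scope_to_project(
    "tag:sensitive_data_profile.data_risk_level=HIGH", "my-project"
)
print(scoped)
```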
Find all tables that were last profiled before a given date
tag:TAG_TEMPLATE_ID.profile_last_generated<DATE
Replace the following:
- TAG_TEMPLATE_ID: sensitive_data_profile if the scan configuration is for an organization or a folder; sensitive_data_profile_project if the scan configuration is for a project.
- DATE: a date in the format YYYY-MM-DD, for example, 2023-01-15.
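To avoid formatting mistakes in the date, you can generate the query from a date object. The following minimal sketch uses the standard library, where date.isoformat() produces the YYYY-MM-DD form. The function name profiled_before_query and the 90-day window are assumptions made for illustration.

```python
from datetime import date, timedelta


def profiled_before_query(template_id: str, cutoff: date) -> str:
    """Build a query for tables last profiled before the cutoff date."""
    return f"tag:{template_id}.profile_last_generated<{cutoff.isoformat()}"


# Example: tables whose profile is more than 90 days older than a reference date.
reference_day = date(2023, 4, 15)
stale_query = profiled_before_query(
    "sensitive_data_profile", reference_day - timedelta(days=90)
)
print(stale_query)
```

In practice, you might replace the fixed reference date with date.today().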
Find all tables with a given table-level sensitivity score
tag:TAG_TEMPLATE_ID.sensitivity_score=SENSITIVITY_SCORE
Replace the following:
- TAG_TEMPLATE_ID: sensitive_data_profile if the scan configuration is for an organization or a folder; sensitive_data_profile_project if the scan configuration is for a project.
- SENSITIVITY_SCORE: one of HIGH, MODERATE, or LOW.
For more information, see Data risk and sensitivity levels.
Find all tables with a given data risk level
tag:TAG_TEMPLATE_ID.data_risk_level=DATA_RISK_LEVEL
Replace the following:
- TAG_TEMPLATE_ID: sensitive_data_profile if the scan configuration is for an organization or a folder; sensitive_data_profile_project if the scan configuration is for a project.
- DATA_RISK_LEVEL: one of HIGH, MODERATE, or LOW.
For more information, see Data risk and sensitivity levels.
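The sensitivity score and data risk level queries share the same shape, so one helper can build both and reject values outside HIGH, MODERATE, and LOW. This is a sketch, not part of any Google Cloud library. The function name table_level_query is hypothetical; the field names are taken from the queries above.

```python
LEVELS = ("HIGH", "MODERATE", "LOW")
TABLE_LEVEL_FIELDS = ("sensitivity_score", "data_risk_level")


def table_level_query(template_id: str, field: str, level: str) -> str:
    """Build tag:TAG_TEMPLATE_ID.FIELD=LEVEL for a table-level field."""
    if field not in TABLE_LEVEL_FIELDS:
        raise ValueError(f"field must be one of {TABLE_LEVEL_FIELDS}")
    normalized = level.upper()
    if normalized not in LEVELS:
        raise ValueError(f"level must be one of {LEVELS}")
    return f"tag:{template_id}.{field}={normalized}"


risk_query = table_level_query("sensitive_data_profile", "data_risk_level", "high")
print(risk_query)
```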
Find all tables that contain a given predicted infoType
tag:TAG_TEMPLATE_ID.predicted_info_types:INFOTYPE
Replace the following:
- TAG_TEMPLATE_ID: sensitive_data_profile if the scan configuration is for an organization or a folder; sensitive_data_profile_project if the scan configuration is for a project.
- INFOTYPE: the infoType, for example, PERSON_NAME.
For a list of all built-in infoTypes, see InfoType detector reference.
For more information, see Predicted infoType in the Metrics reference.
Find all tables that partially contain a given infoType
tag:TAG_TEMPLATE_ID.other_info_types:INFOTYPE
Replace the following:
- TAG_TEMPLATE_ID: sensitive_data_profile if the scan configuration is for an organization or a folder; sensitive_data_profile_project if the scan configuration is for a project.
- INFOTYPE: the infoType, for example, PERSON_NAME.
For a list of all built-in infoTypes, see InfoType detector reference.
For more information, see Other infoTypes in the Metrics reference.
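The predicted and other infoType queries differ only in the field name, so a script can build both from one helper. In this sketch, the name check assumes that infoType names use uppercase letters, digits, and underscores, as the built-in examples on this page do; custom infoTypes might not follow that pattern. The function name info_type_query and the keys predicted and other are hypothetical.

```python
import re

# Field names taken from the two queries above; keys are labels for this sketch.
INFOTYPE_FIELDS = {
    "predicted": "predicted_info_types",
    "other": "other_info_types",
}

# Assumed naming pattern, based on examples such as PERSON_NAME.
_INFOTYPE_NAME = re.compile(r"[A-Z][A-Z0-9_]*")


def info_type_query(template_id: str, kind: str, info_type: str) -> str:
    """Build tag:TAG_TEMPLATE_ID.FIELD:INFOTYPE for a table-level infoType field."""
    if kind not in INFOTYPE_FIELDS:
        raise ValueError(f"kind must be one of {sorted(INFOTYPE_FIELDS)}")
    if not _INFOTYPE_NAME.fullmatch(info_type):
        raise ValueError(f"Unexpected infoType name: {info_type!r}")
    return f"tag:{template_id}.{INFOTYPE_FIELDS[kind]}:{info_type}"


for kind in ("predicted", "other"):
    print(info_type_query("sensitive_data_profile", kind, "PERSON_NAME"))
```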
Find all tables that contain a given column with a given predicted infoType
tag:TAG_TEMPLATE_ID.column_insights:COLUMN_NAME:INFOTYPE
Replace the following:
- TAG_TEMPLATE_ID: sensitive_data_profile if the scan configuration is for an organization or a folder; sensitive_data_profile_project if the scan configuration is for a project.
- COLUMN_NAME: the name of the column in the BigQuery table.
- INFOTYPE: the infoType, for example, PERSON_NAME.
For a list of all built-in infoTypes, see InfoType detector reference.
For more information, see Predicted infoType in the Metrics reference.
Find all tables that contain a given column with a given column-level sensitivity score
tag:TAG_TEMPLATE_ID.column_sensitivity:COLUMN_NAME:SENSITIVITY_SCORE
Replace the following:
- TAG_TEMPLATE_ID: sensitive_data_profile if the scan configuration is for an organization or a folder; sensitive_data_profile_project if the scan configuration is for a project.
- COLUMN_NAME: the name of the column in the BigQuery table.
- SENSITIVITY_SCORE: one of HIGH, MODERATE, or LOW.
For more information, see Data risk and sensitivity levels.
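When you need to check several columns, you can generate one column-level query per column. The sketch below covers both column-level fields shown above, column_insights and column_sensitivity. The function name column_queries and the sample column names are assumptions for illustration.

```python
from typing import Dict, List

COLUMN_FIELDS = ("column_insights", "column_sensitivity")


def column_queries(template_id: str, field: str, expected: Dict[str, str]) -> List[str]:
    """Build one tag:TAG_TEMPLATE_ID.FIELD:COLUMN_NAME:VALUE query per column."""
    if field not in COLUMN_FIELDS:
        raise ValueError(f"field must be one of {COLUMN_FIELDS}")
    return [
        f"tag:{template_id}.{field}:{column}:{value}"
        for column, value in expected.items()
    ]


# Hypothetical columns and the sensitivity score to search for in each.
queries = column_queries(
    "sensitive_data_profile", "column_sensitivity", {"ccn": "HIGH", "ssn": "HIGH"}
)
for query in queries:
    print(query)
```

Each generated string is a separate search. Run them one at a time to see which tables match each column.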
Truncated tag values
If the column heading data of a BigQuery table exceeds 10 MB, the resulting tag might show [TRUNCATED] in the Column Insights or Column Sensitivity field. In this case, we recommend that you go to Sensitive Data Protection to review the table data profile and associated column data profiles.
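If you process tag values in a script, it can help to flag truncated fields before relying on them. The following sketch assumes that the tag's fields are already available as a dictionary keyed by display name with string values; that shape and the function name truncated_fields are hypothetical.

```python
from typing import Dict, List

TRUNCATION_MARKER = "[TRUNCATED]"
COLUMN_LEVEL_FIELDS = ("Column Insights", "Column Sensitivity")


def truncated_fields(tag_fields: Dict[str, str]) -> List[str]:
    """Return the column-level fields whose value contains the truncation marker."""
    return [
        name
        for name in COLUMN_LEVEL_FIELDS
        if TRUNCATION_MARKER in tag_fields.get(name, "")
    ]


# Hypothetical tag values for illustration.
example_tag = {
    "Column Insights": "ccn: CREDIT_CARD_NUMBER\n[TRUNCATED]",
    "Column Sensitivity": "ccn: HIGH",
    "Data Risk Level": "HIGH",
}
print(truncated_fields(example_tag))
```

A non-empty result signals that the full column details should be reviewed in the table data profile rather than in the tag.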