Tag a BigQuery table by using Data Catalog
This quickstart helps you complete the following tasks:
- Create a BigQuery dataset and table.
- Create a tag template with a schema that defines five tag fields of distinct types: string, double, boolean, enumerated, and richtext.
- Look up the Data Catalog entry for your table.
- In the Google Cloud console, create business metadata for your entry that includes an overview, data steward, and a tag.
Data Catalog lets you search and tag entries such as BigQuery tables with metadata. Examples of metadata that you can use for tagging include public and private tags, data stewards, and rich text overviews.
Before you begin
- Set up your project.
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
- Enable the Data Catalog and BigQuery APIs.
- Install the Google Cloud CLI.
- To initialize the gcloud CLI, run the following command:
  gcloud init
Add a public data entry to your project
Data Catalog entries include data resources such as a BigQuery dataset or a Pub/Sub topic.
Add a public dataset to your project.
In the Google Cloud console, go to the BigQuery page.
In the Explorer section, click Add data and select Public datasets from the list.
In the Marketplace panel, search for New York taxi trips and click the relevant search result.
Click View Dataset.
Create a dataset and a table
Create a dataset.
In the Google Cloud console, open the BigQuery page.
In the Explorer panel, select the project where you want to create the dataset.
Click the Actions icon and click Create dataset.
In the Create dataset page, fill in the following details:
- For Dataset ID, enter demo_dataset.
- For Data location, select us (multiple regions in United States).
- Enable table expiration and specify the number of days.
- For Encryption, leave the Google-managed encryption key option selected.
Click Create dataset.
Copy a publicly accessible table to demo_dataset.
In the Google Cloud console, open the BigQuery page.
In the Explorer pane, search for tlc_yellow_trips tables (click Broaden search to all projects if required) and select one of them, such as tlc_yellow_trips_2017. Then click Copy.
In the Copy table pane, fill in the following information:
- In the Project name drop-down list, select your project.
- In the Dataset name drop-down list, select demo_dataset.
- For the Table name, enter trips, then click Copy.
In the Explorer pane, confirm that the trips table is listed in demo_dataset.
You add Data Catalog tags to the table in the next section.
Create a public tag template and attach a tag for your entry
You must be the dataset owner to attach a tag to a table in the dataset. For more information about public and private tags, see Public and private tags.
In a tag template, tag fields are optional by default: you don't have to provide a value for a field when attaching a tag to a Data Catalog entry. However, if a template defines a field as required, you must provide a value for that field; otherwise, an error is generated.
You can use lower case letters and underscores to define field names. The tag template fields created in this example are demo fields and aren't auto-updated or synced with BigQuery.
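The two rules above (field naming and required fields) can be checked locally before you attach a tag. The following is a minimal sketch, not part of any Data Catalog client library: `DEMO_TEMPLATE`, `invalid_field_ids`, and `missing_required_fields` are hypothetical names, and the naming pattern only encodes the convention stated above (lowercase letters and underscores). The service itself may accept other characters.

```python
import re

# Assumption: field IDs follow the convention described in the text
# (lowercase letters and underscores only).
FIELD_ID_PATTERN = re.compile(r"^[a-z_]+$")

# Hypothetical local description of the demo template:
# field ID -> whether the template marks the field as required.
DEMO_TEMPLATE = {
    "source": True,
    "num_rows": False,
    "has_pii": False,
    "pii_type": False,
    "context": False,
}


def invalid_field_ids(template):
    """Return the field IDs that do not match the naming convention."""
    return [field_id for field_id in template if not FIELD_ID_PATTERN.match(field_id)]


def missing_required_fields(template, tag_values):
    """Return the required field IDs that have no value in tag_values."""
    return [
        field_id
        for field_id, required in template.items()
        if required and field_id not in tag_values
    ]


# A tag that omits the required "source" field would be rejected.
print(missing_required_fields(DEMO_TEMPLATE, {"num_rows": 1000}))
# A tag that sets only the required field is acceptable.
print(missing_required_fields(DEMO_TEMPLATE, {"source": "BigQuery"}))
```

Optional fields such as `num_rows` can be left out of the tag without being reported.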
Console
Go to the Dataplex > Tag Templates page.
Click Create tag template and, for the template display name, enter Demo Tag Template.
Click Add field to add 5 fields. Use the following table and keep Field description empty.

Field display name | Field ID | Required field | Type
Source of data asset | source | Yes | String
Rows in the asset | num_rows | No | Double
Has PII | has_pii | No | Boolean
PII type | pii_type | No | Enumerated. Add values EMAIL_ADDRESS, US_SOCIAL_SECURITY_NUMBER, and NONE.
Context | context | No | Richtext

Click Create. The Template details page lists all the information about the tag template.
To attach a tag to demo_dataset, go to the Dataplex search page.
For Choose search platform, select Data Catalog as the search mode.
In the search box, enter demo_dataset. In the search result, you see the demo_dataset dataset and the trips table.
Click the trips table. A BigQuery table details page opens.
Click Attach tags.
In the Attach tags panel, enter the following details:
- Target: trips
- Tag template: Demo Tag Template
- Source of data asset: Copied from tlc_yellow_trips_2017
- Rows in the asset: 113496874
- Has PII: FALSE
- PII type: NONE
Click Save. The tag fields are now listed in the Tags section in the BigQuery table details.
gcloud
Run the gcloud data-catalog tag-templates create command shown below to create a tag template with the following four tag fields:
- display_name: Source of data asset, id: source, required: TRUE, type: String
- display_name: Number of rows in the data asset, id: num_rows, required: FALSE, type: Double
- display_name: Has PII, id: has_pii, required: FALSE, type: Boolean
- display_name: PII type, id: pii_type, required: FALSE, type: Enumerated, values: EMAIL_ADDRESS, US_SOCIAL_SECURITY_NUMBER, NONE
# -------------------------------
# Create a Tag Template.
# -------------------------------
gcloud data-catalog tag-templates create demo_template \
--location=us-central1 \
--display-name="Demo Tag Template" \
--field=id=source,display-name="Source of data asset",type=string,required=TRUE \
--field=id=num_rows,display-name="Number of rows in the data asset",type=double \
--field=id=has_pii,display-name="Has PII",type=bool \
--field=id=pii_type,display-name="PII type",type='enum(EMAIL_ADDRESS|US_SOCIAL_SECURITY_NUMBER|NONE)'
# -------------------------------
# Lookup the Data Catalog entry for the table.
# -------------------------------
ENTRY_NAME=$(gcloud data-catalog entries lookup '//bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET/tables/TABLE' --format="value(name)")
# -------------------------------
# Attach a Tag to the table.
# -------------------------------
# Create the Tag file.
cat > tag_file.json << EOF
{
"source": "BigQuery",
"num_rows": 1000,
"has_pii": true,
"pii_type": "EMAIL_ADDRESS"
}
EOF
gcloud data-catalog tags create --entry="${ENTRY_NAME}" \
--tag-template=demo_template --tag-template-location=us-central1 --tag-file=tag_file.json
Go
Before trying this sample, follow the Go setup instructions in the Data Catalog quickstart using client libraries. For more information, see the Data Catalog Go API reference documentation.
To authenticate to Data Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Java
Before trying this sample, follow the Java setup instructions in the Data Catalog quickstart using client libraries. For more information, see the Data Catalog Java API reference documentation.
To authenticate to Data Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Node.js
Before trying this sample, follow the Node.js setup instructions in the Data Catalog quickstart using client libraries. For more information, see the Data Catalog Node.js API reference documentation.
To authenticate to Data Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Python
Before trying this sample, follow the Python setup instructions in the Data Catalog quickstart using client libraries. For more information, see the Data Catalog Python API reference documentation.
To authenticate to Data Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
REST & CMD LINE
If you do not have access to Cloud Client libraries for your language or want to test the API using REST requests, see the following examples and refer to the Data Catalog REST API documentation.
Create a tag template.
HTTP method and URL:
POST https://datacatalog.googleapis.com/v1/projects/project-id/locations/region/tagTemplates?tagTemplateId=demo_tag_template
Request JSON body:
{
  "displayName":"Demo Tag Template",
  "fields":{
    "source":{
      "displayName":"Source of data asset",
      "isRequired": "true",
      "type":{
        "primitiveType":"STRING"
      }
    },
    "num_rows":{
      "displayName":"Number of rows in data asset",
      "isRequired": "false",
      "type":{
        "primitiveType":"DOUBLE"
      }
    },
    "has_pii":{
      "displayName":"Has PII",
      "isRequired": "false",
      "type":{
        "primitiveType":"BOOL"
      }
    },
    "pii_type":{
      "displayName":"PII type",
      "isRequired": "false",
      "type":{
        "enumType":{
          "allowedValues":[
            {
              "displayName":"EMAIL_ADDRESS"
            },
            {
              "displayName":"US_SOCIAL_SECURITY_NUMBER"
            },
            {
              "displayName":"NONE"
            }
          ]
        }
      }
    }
  }
}
You should receive a JSON response similar to the following:
{
  "name":"projects/project-id/locations/us-central1/tagTemplates/demo_tag_template",
  "displayName":"Demo Tag Template",
  "fields":{
    "num_rows":{
      "displayName":"Number of rows in data asset",
      "isRequired": "false",
      "type":{
        "primitiveType":"DOUBLE"
      }
    },
    "has_pii":{
      "displayName":"Has PII",
      "isRequired": "false",
      "type":{
        "primitiveType":"BOOL"
      }
    },
    "pii_type":{
      "displayName":"PII type",
      "isRequired": "false",
      "type":{
        "enumType":{
          "allowedValues":[
            {
              "displayName":"EMAIL_ADDRESS"
            },
            {
              "displayName":"NONE"
            },
            {
              "displayName":"US_SOCIAL_SECURITY_NUMBER"
            }
          ]
        }
      }
    },
    "source":{
      "displayName":"Source of data asset",
      "isRequired":"true",
      "type":{
        "primitiveType":"STRING"
      }
    }
  }
}
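The tag template request body repeats the same nested shape for every field, so it can be generated rather than typed by hand. The following is a rough sketch using only the Python standard library; `primitive_field` and `enum_field` are hypothetical helper names, not part of any Google client library. One assumption to note: the sketch emits `isRequired` as a JSON boolean, whereas the request shown above quotes the value as a string.

```python
import json


def primitive_field(display_name, primitive_type, required=False):
    """Build one template field of a primitive type (STRING, DOUBLE, BOOL)."""
    return {
        "displayName": display_name,
        "isRequired": required,  # assumption: JSON boolean instead of a quoted string
        "type": {"primitiveType": primitive_type},
    }


def enum_field(display_name, allowed_values, required=False):
    """Build one enumerated template field from a list of allowed value names."""
    return {
        "displayName": display_name,
        "isRequired": required,
        "type": {
            "enumType": {
                "allowedValues": [{"displayName": value} for value in allowed_values]
            }
        },
    }


# Field IDs and display names are taken from the request body in this section.
template_body = {
    "displayName": "Demo Tag Template",
    "fields": {
        "source": primitive_field("Source of data asset", "STRING", required=True),
        "num_rows": primitive_field("Number of rows in data asset", "DOUBLE"),
        "has_pii": primitive_field("Has PII", "BOOL"),
        "pii_type": enum_field(
            "PII type", ["EMAIL_ADDRESS", "US_SOCIAL_SECURITY_NUMBER", "NONE"]
        ),
    },
}

print(json.dumps(template_body, indent=2))
```

The printed JSON can be used as the body of the POST request; sending the request and authenticating are outside the scope of this sketch.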
Look up the Data Catalog entry-id for your BigQuery table.
HTTP method and URL:
GET https://datacatalog.googleapis.com/v1/entries:lookup?linkedResource=//bigquery.googleapis.com/projects/project-id/datasets/demo_dataset/tables/trips
Request body is empty.
You should receive a JSON response similar to the following:
{
  "name": "projects/project-id/locations/US/entryGroups/@bigquery/entries/entry-id",
  "type": "TABLE",
  "schema": {
    "columns": [
      {
        "type": "STRING",
        "description": "A code indicating the TPEP provider that provided the record. 1= ",
        "mode": "REQUIRED",
        "column": "vendor_id"
      },
      ...
    ]
  },
  "sourceSystemTimestamps": {
    "createTime": "2019-01-25T01:45:29.959Z",
    "updateTime": "2019-03-19T23:20:26.540Z"
  },
  "linkedResource": "//bigquery.googleapis.com/projects/project-id/datasets/demo_dataset/tables/trips",
  "bigqueryTableSpec": {
    "tableSourceType": "BIGQUERY_TABLE"
  }
}
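The lookup URL is built from the table's linked resource name, and the `name` in the response is what the tag request is addressed to. The following sketch shows one way to assemble both URLs with the Python standard library. The function names are hypothetical, and percent-encoding the `linkedResource` query parameter is a choice made here for safety; the example above shows the value unencoded.

```python
from urllib.parse import urlencode

# Base URL and lookup path as they appear in the requests in this section.
API_ROOT = "https://datacatalog.googleapis.com/v1"


def bigquery_table_resource(project_id, dataset_id, table_id):
    """Return the linked resource name of a BigQuery table."""
    return (
        f"//bigquery.googleapis.com/projects/{project_id}"
        f"/datasets/{dataset_id}/tables/{table_id}"
    )


def lookup_url(project_id, dataset_id, table_id):
    """Return the entries:lookup URL for a BigQuery table (query value percent-encoded)."""
    linked_resource = bigquery_table_resource(project_id, dataset_id, table_id)
    return f"{API_ROOT}/entries:lookup?{urlencode({'linkedResource': linked_resource})}"


def tags_url(entry_name):
    """Return the URL for creating a tag on the entry named in the lookup response."""
    return f"{API_ROOT}/{entry_name}/tags"


print(lookup_url("project-id", "demo_dataset", "trips"))
print(tags_url("projects/project-id/locations/US/entryGroups/@bigquery/entries/entry-id"))
```

Here `project-id` and `entry-id` are the same placeholders used in the requests above; replace them with your own values.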
Create a tag from the template and attach it to your BigQuery table.
HTTP method and URL:
POST https://datacatalog.googleapis.com/v1/projects/project-id/locations/region/entryGroups/@bigquery/entries/entry-id/tags
Request JSON body:
{
  "template":"projects/project-id/locations/us-central1/tagTemplates/demo_tag_template",
  "fields":{
    "source":{
      "stringValue":"Copied from tlc_yellow_trips_2017"
    },
    "num_rows":{
      "doubleValue":113496874
    },
    "has_pii":{
      "boolValue":false
    },
    "pii_type":{
      "enumValue":{
        "displayName":"NONE"
      }
    }
  }
}
You should receive a JSON response similar to the following:
{
  "name":"projects/project-id/locations/US/entryGroups/@bigquery/entries/entry-id/tags/tag-id",
  "template":"projects/project-id/locations/us-central1/tagTemplates/demo_tag_template",
  "fields":{
    "pii_type":{
      "displayName":"PII type",
      "enumValue":{
        "displayName":"NONE"
      }
    },
    "has_pii":{
      "displayName":"Has PII",
      "boolValue":false
    },
    "source":{
      "displayName":"Source of data asset",
      "stringValue":"Copied from tlc_yellow_trips_2017"
    },
    "num_rows":{
      "displayName":"Number of rows in data asset",
      "doubleValue":113496874
    }
  },
  "templateDisplayName":"Demo Tag Template"
}
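Each tag field in the request wraps its value in a key that names the type (`stringValue`, `doubleValue`, `boolValue`, or `enumValue`). The following sketch shows one way to derive that wrapping from plain Python values. `tag_field` and `build_tag_body` are hypothetical helpers, and because an enumerated value looks like an ordinary string in Python, the caller has to say which fields are enumerated.

```python
import json


def tag_field(value, enum=False):
    """Wrap a plain Python value in the typed object used in the tag request body."""
    if enum:
        return {"enumValue": {"displayName": value}}
    # Check bool before numbers: bool is a subclass of int in Python.
    if isinstance(value, bool):
        return {"boolValue": value}
    if isinstance(value, (int, float)):
        return {"doubleValue": value}
    if isinstance(value, str):
        return {"stringValue": value}
    raise TypeError(f"unsupported tag value: {value!r}")


def build_tag_body(template_name, values, enum_fields=()):
    """Build the request body for creating a tag from field ID -> value pairs."""
    return {
        "template": template_name,
        "fields": {
            field_id: tag_field(value, enum=field_id in enum_fields)
            for field_id, value in values.items()
        },
    }


# Values taken from the request body shown in this section.
body = build_tag_body(
    "projects/project-id/locations/us-central1/tagTemplates/demo_tag_template",
    {
        "source": "Copied from tlc_yellow_trips_2017",
        "num_rows": 113496874,
        "has_pii": False,
        "pii_type": "NONE",
    },
    enum_fields={"pii_type"},
)

print(json.dumps(body, indent=2))
```

The boolean check comes first on purpose: without it, `False` would be treated as a number and sent as a `doubleValue`.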
Create an overview for your entry
In the Google Cloud console, you can use rich text to describe an entry in your Data Catalog project.
To create an overview for the trips table, go to the Dataplex search page.
For Choose search platform, select Data Catalog as the search mode.
In the search box, enter demo_dataset.
In the search result, you see the demo_dataset dataset and the trips table.
Click the trips table. A BigQuery table details page opens.
Click Add overview and enter some text. You can additionally include images and rich formatted text.
Click Save.
Add a data steward for your entry
In the Google Cloud console, you can add one or more data stewards to an entry in your Data Catalog project. A data steward for a data entry can be contacted to request more information about the data entry.
To add a data steward for the trips table, open the BigQuery table details page for the trips table as described in the previous section.
Click the Edit Steward icon and add one or more email addresses.
You can add a user with a non-Google email account.
Click Save.
Clean up
To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.
Delete the project
The easiest way to eliminate billing is to delete the project that you created for the tutorial.
To delete the project:
- In the Google Cloud console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
Delete the dataset
If necessary, go to the BigQuery page.
In the Explorer panel, search for the demo_dataset dataset you created.
Click the Actions option and click Delete dataset.
Confirm your delete action.
Delete the tag template
Go to the Data Catalog > Templates page.
Select Demo Tag Template.
In the row, click the Actions option and click Delete this template.
Confirm your delete action.
What's next
Learn more about Data Catalog.
Learn about technical metadata and business metadata.
Learn about tag templates, public tags, and private tags in Tags and tag templates.
Browse the Overview of APIs and Client Libraries.