This document describes how to use the Dataplex Attribute Store.
The Dataplex Attribute Store is an extensible infrastructure that lets you specify policy-related behaviors on the associated resources. Dataplex administrators can use the Attribute Store to define how certain data should be treated, by associating data with attributes.
Using the Attribute Store, you can add multiple attributes to an object, such as a column. The Attribute Store merges the behaviors of all the attributes associated with an object and presents them as a single policy on the underlying resource.
You can set attributes on published datasets. Published datasets are the datasets that Dataplex creates from the discovered tables in a bucket asset.
The following policy behaviors are supported:
- Resource specifications: specifies access to a resource, such as a table
- Column specifications: specifies access to a column in a BigQuery table
You can use the Attribute Store to define an attribute hierarchy called a taxonomy. In a taxonomy, a child attribute inherits specifications from its parent attributes. Specifications from the parent and the child merge into a unified list, which is propagated to the resource.
You can use the Dataplex Attribute Store to perform the following:
- Create taxonomies.
- Create attributes and organize them in a hierarchy.
- Associate one or more attributes to tables.
- Associate one or more attributes to columns.
Terminology
This section describes the terminology used in this document.
Attribute taxonomy
An attribute taxonomy (also called a data taxonomy) is a hierarchy of attributes. In a taxonomy, child attributes inherit the behavior specifications of the attributes in their parent nodes and add them to their own.
For example:
If an attribute named PII has a resource specification group-a@company.com, and a child attribute of PII named Social Security numbers has a resource specification group-b@company.com, then the resource specifications applied to the policies where the attribute Social Security numbers is associated are group-a@company.com and group-b@company.com.
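The inheritance in this example can be sketched in a few lines of Python. This is only an illustrative in-memory model, not a Dataplex API: the `Attribute` class and the `effective_resource_principals` function are hypothetical names introduced here, and the sketch assumes that inheritance is a plain union of principals along the parent chain.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Attribute:
    """Hypothetical in-memory model of an attribute; not a Dataplex API type."""
    name: str
    resource_principals: set = field(default_factory=set)
    parent: Optional["Attribute"] = None


def effective_resource_principals(attribute: Attribute) -> set:
    """Union the attribute's own principals with those of every ancestor."""
    merged = set()
    node = attribute
    while node is not None:
        merged |= node.resource_principals
        node = node.parent
    return merged


pii = Attribute("PII", {"group-a@company.com"})
ssn = Attribute("Social Security numbers", {"group-b@company.com"}, parent=pii)

# The child carries its own principal plus the one inherited from PII.
print(sorted(effective_resource_principals(ssn)))
# ['group-a@company.com', 'group-b@company.com']
```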
When you define an attribute, you can choose whether it's a parent or a child attribute. When defining a child attribute, you must specify its parent attribute.
Column specifications
The behavior specifications for columns. They specify the people or groups who have reader access to columns. If you associate an attribute containing a column specification with a table's column, Dataplex adds a BigQuery column policy tag to that column.
Resource specifications
The permissions for people or groups to access resources (tables). If you associate an attribute that has a resource specification with a table, Dataplex propagates IAM roles to the specified users so that they can access the tables associated with the attribute.
Before you begin
Limitations
Dataplex propagates the column specification policies as BigQuery policy tags. BigQuery has a limitation of one policy tag per column. If a policy tag already exists on a column, Dataplex throws an error in the Governance log on the Manage tab.
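Because of this limitation, it can help to check which columns already carry a policy tag before you associate an attribute with them. The following is a minimal sketch of such a check. It assumes you have already collected the existing tags into a dictionary by some other means; the function name, the dictionary shape, and the `legacy-tag` value are hypothetical and are not part of Dataplex or BigQuery.

```python
from typing import Optional


def columns_with_existing_tag(
    existing_policy_tags: dict,
    target_columns: list,
) -> list:
    """Return the target columns that already carry a policy tag.

    existing_policy_tags maps a column name to its current policy tag,
    or to None when the column has no tag (hypothetical input shape).
    """
    return [col for col in target_columns if existing_policy_tags.get(col)]


# Hypothetical snapshot of a table's columns and their current tags.
existing_policy_tags = {"ID": None, "Name": "legacy-tag", "Address": None}

conflicts = columns_with_existing_tag(existing_policy_tags, ["Name", "Address"])
print(conflicts)  # ['Name']
```

Columns returned by the check are the ones where associating an attribute with a column specification would be expected to produce the error described above.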
Quotas
The following are the quotas and limits that apply to the Dataplex Attribute Store:
Limit | Default |
---|---|
Maximum number of taxonomies in a region | 100 |
Maximum number of attributes in all taxonomies in a region | 10,000 |
Maximum number of attributes that can be associated with a resource (table) | 50 |
Maximum number of attributes that can be associated with a column | 100 |
Maximum depth per data attribute tree in an attribute taxonomy | 4 |
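As an illustration of how these limits constrain a planned taxonomy, the following sketch validates a design before you create it. The limit values come from the table above; everything else is an assumption: the function names and input dictionaries are hypothetical, and the sketch assumes that a root attribute counts as depth 1, which this document does not specify.

```python
# Limits taken from the quota table above.
MAX_ATTRIBUTES_PER_TABLE = 50
MAX_ATTRIBUTES_PER_COLUMN = 100
MAX_TREE_DEPTH = 4


def attribute_depth(attribute_id, parent_of):
    """Count nodes from the attribute up to its root (assumes a root has depth 1)."""
    depth = 1
    seen = {attribute_id}
    current = parent_of.get(attribute_id)
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at attribute {current!r}")
        seen.add(current)
        depth += 1
        current = parent_of.get(current)
    return depth


def find_quota_violations(parent_of, table_bindings, column_bindings):
    """Return human-readable messages for every limit the plan exceeds."""
    violations = []
    for attribute_id in parent_of:
        depth = attribute_depth(attribute_id, parent_of)
        if depth > MAX_TREE_DEPTH:
            violations.append(
                f"attribute {attribute_id!r}: depth {depth} exceeds {MAX_TREE_DEPTH}"
            )
    for table, attrs in table_bindings.items():
        if len(attrs) > MAX_ATTRIBUTES_PER_TABLE:
            violations.append(f"table {table!r}: {len(attrs)} attributes")
    for column, attrs in column_bindings.items():
        if len(attrs) > MAX_ATTRIBUTES_PER_COLUMN:
            violations.append(f"column {column!r}: {len(attrs)} attributes")
    return violations


# Hypothetical plan: each attribute ID maps to its parent ID (None for a root).
parent_of = {
    "pii": None,
    "contact": "pii",
    "email": "contact",
    "work_email": "email",
    "work_email_eu": "work_email",  # fifth level, one too deep
}
violations = find_quota_violations(
    parent_of,
    table_bindings={"customers": ["pii", "contact"]},
    column_bindings={"customers.name": ["email"]},
)
for message in violations:
    print(message)
```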
Required roles
To get the permissions that you need to use the Dataplex Attribute Store, ask your administrator to grant you the following IAM roles on the project:
- Manage taxonomies and attributes: Dataplex Taxonomy Admin (roles/dataplex.taxonomyAdmin)
- View bindings associated to resources and attributes: Dataplex Taxonomy Viewer (roles/dataplex.taxonomyViewer)
- Create and manage binding resources in a project:
  - Dataplex Binding Admin (roles/dataplex.bindingAdmin)
  - Dataplex Admin (roles/dataplex.admin on Zone resource)
- Manage resource and data access specifications: Dataplex Security Admin (roles/dataplex.securityAdmin)
For more information about granting roles, see Manage access to projects, folders, and organizations.
These predefined roles contain the permissions required to use the Dataplex Attribute Store. The exact permissions that are required are listed in the following Required permissions section:
Required permissions
The following permissions are required to use the Dataplex Attribute Store:
- Manage taxonomies and attributes:
  - dataplex.datataxonomies.*
  - dataplex.dataattributes.* (except dataplex.dataattributes.configureResourceAccess and dataplex.dataattributes.configureDataAccess)
- View bindings associated to resources and attributes:
  - dataplex.datataxonomies.get
  - dataplex.datataxonomies.list
  - dataplex.dataattributes.get
  - dataplex.dataattributes.list
  - dataplex.dataattributebindings.get
  - dataplex.dataattributebindings.list
- Create and manage binding resources in a project: dataplex.dataattributebindings.*
- Manage resource and data access specifications:
  - dataplex.datataxonomies.configureResourceAccess
  - dataplex.datataxonomies.configureDataAccess
You might also be able to get these permissions with custom roles or other predefined roles.
Example use cases
Consider a company named ACME that has three kinds of data:
- Red data that is sensitive
- Green data that is restricted, but less sensitive
- Uncategorized data
The Dataplex administrator of ACME creates the following set of attributes:
Attribute: Red
- Column specifications: secrets_team@acme with read permission
- Resource specifications: secrets_team@acme and tenured_employees@acme with read permission
Attribute: Green
- Column specifications: full_time_employees@acme with read permission
- Resource specifications: full_time_employees@acme with edit permission
The attributes Red and Green control the access behavior to the resources (tables) depending on the attributes associated with the tables and their columns.
Consider a table with the following columns:
- ID
- Zip code
- Name
- Address
- $Value
Use case 1: Associate the same attribute with the table and a column
If you associate the attribute Red with the table and its column Name, then Dataplex propagates the following policies:
- Employees in secrets_team@acme and tenured_employees@acme can read the table, see its metadata, and query it.
- Only employees in secrets_team@acme can query the column Name, as it's further protected by column specifications.
Use case 2: Combine attributes
Consider the following associations:
- Associate the attributes Red and Green with the table.
- Associate the attributes Red and Green with the column Name.
- Associate the attribute Red with the column $Value.
In this case, Dataplex propagates the following policies:
- Employees in secrets_team@acme, tenured_employees@acme, and full_time_employees@acme can access the table. This is because Dataplex merges the resource specifications of the attributes Red and Green.
- Employees in secrets_team@acme and full_time_employees@acme can access the column Name. This is because Dataplex merges the column specifications of the attributes Red and Green.
- Only employees in secrets_team@acme can query the column $Value.
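The merge in this use case can be reproduced with a small Python model. This is a sketch under stated assumptions rather than Dataplex's implementation: the `Attribute` class and the two merge functions are hypothetical, merging is modeled as a set union, and the difference between read and edit permission is ignored for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    """Hypothetical stand-in for an attribute; not a Dataplex API type."""
    name: str
    column_principals: frozenset
    resource_principals: frozenset


RED = Attribute(
    "Red",
    column_principals=frozenset({"secrets_team@acme"}),
    resource_principals=frozenset({"secrets_team@acme", "tenured_employees@acme"}),
)
GREEN = Attribute(
    "Green",
    column_principals=frozenset({"full_time_employees@acme"}),
    resource_principals=frozenset({"full_time_employees@acme"}),
)


def merged_resource_principals(attributes):
    """Union the resource specifications of all attributes on a table."""
    return frozenset().union(*(a.resource_principals for a in attributes))


def merged_column_principals(attributes):
    """Union the column specifications of all attributes on a column."""
    return frozenset().union(*(a.column_principals for a in attributes))


# Associations from use case 2.
table_access = merged_resource_principals([RED, GREEN])
name_access = merged_column_principals([RED, GREEN])
value_access = merged_column_principals([RED])

print(sorted(table_access))
print(sorted(name_access))
print(sorted(value_access))
```

The three printed lists correspond to the three policies listed above: the table, the column Name, and the column $Value.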
Use case 3: Organize attributes in a hierarchy
You can organize attributes in a hierarchy by specifying the subtypes of attributes. Consider the following set of attributes:
Parent attribute 1:
Attribute: PII
- Column specifications: secrets_team@acme
- Resource specifications: secrets_team@acme and tenured_employees@acme
Child attribute of PII:
Attribute: Email
- Column specifications: email_comm@acme
- Resource specifications: email_comm@acme
Parent attribute 2:
Attribute: Financial
- Column specifications: full_time_employees@acme
- Resource specifications: full_time_employees@acme
Consider the following associations:
- Associate the attributes Email and Financial with the table.
- Associate the attributes Email and Financial with the column Name.
- Associate the attribute PII with the column $Value.
In this case, Dataplex propagates the following policies:
- Employees in secrets_team@acme, tenured_employees@acme, full_time_employees@acme, and email_comm@acme can access the table. This is because Dataplex merges the resource specifications of the attributes Financial and Email, and the attribute Email inherits the specifications from the attribute PII.
- Employees in secrets_team@acme, email_comm@acme, and full_time_employees@acme can access the column Name. This is because Dataplex merges the column specifications of the attributes Financial and Email, and the attribute Email inherits the column specifications from the attribute PII.
- Only employees in secrets_team@acme can query the column $Value.
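To see how inheritance and merging combine in this use case, consider the following Python sketch. It is an illustrative model only: the `ATTRIBUTES` dictionary layout and the function names are hypothetical, and the sketch assumes that a child's effective specification is the union of its own principals and those of all its ancestors.

```python
# Hypothetical in-memory description of the three attributes above.
ATTRIBUTES = {
    "PII": {
        "parent": None,
        "column": {"secrets_team@acme"},
        "resource": {"secrets_team@acme", "tenured_employees@acme"},
    },
    "Email": {
        "parent": "PII",
        "column": {"email_comm@acme"},
        "resource": {"email_comm@acme"},
    },
    "Financial": {
        "parent": None,
        "column": {"full_time_employees@acme"},
        "resource": {"full_time_employees@acme"},
    },
}


def effective_spec(attribute_name, kind):
    """Collect the 'column' or 'resource' principals of an attribute and its ancestors."""
    merged = set()
    current = attribute_name
    while current is not None:
        merged |= ATTRIBUTES[current][kind]
        current = ATTRIBUTES[current]["parent"]
    return merged


def merged_spec(attribute_names, kind):
    """Union the effective specifications of every associated attribute."""
    merged = set()
    for name in attribute_names:
        merged |= effective_spec(name, kind)
    return merged


# Associations from use case 3.
table_access = merged_spec(["Email", "Financial"], "resource")
name_access = merged_spec(["Email", "Financial"], "column")
value_access = merged_spec(["PII"], "column")

print(sorted(table_access))
print(sorted(name_access))
print(sorted(value_access))
```

Note that email_comm@acme does not appear in the result for $Value: in this model, specifications flow from parent to child, not from child to parent.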
Set up attributes
To create an attribute, you must first create a taxonomy, and then create the parent and child data attributes.
Create a data attribute taxonomy
In the Google Cloud console, go to the Dataplex Attribute Store page.
Click Create Taxonomy.
Enter the Taxonomy name, ID, and Description.
Select a region.
Click Submit.
The new taxonomy appears on the Data Taxonomies page.
Create a parent attribute
In the Google Cloud console, go to the Dataplex Attribute Store page.
On the Data Taxonomies page, click the taxonomy in which you want to create the parent attribute.
On the Taxonomy details page, click Add data attribute.
Select Create parent data attribute.
Enter a name, an ID, and a description for the parent attribute.
Optional: Set up attribute specifications.
Set up resource specifications:
- Click Manage Permissions for Resource.
- Click Add.
- In the New principals field, enter the email address of a person or a group who needs access to the resource.
- Select the required Roles and click Save.
- Click Save.
Set up column specifications:
- Click Manage Permissions for Column.
- Click Add.
- In the New principals field, enter the email address of a person or a group who needs access to the column.
- Select the required Roles and click Save.
- Click Save.
Click Create.
Create a child attribute
In the Google Cloud console, go to the Dataplex Attribute Store page.
On the Data Taxonomies page, click the taxonomy in which you want to create the child attribute.
On the Taxonomy details page, click Add data attribute.
Select Create child data attribute.
Select a Parent data attribute for the child attribute you're creating.
Enter a name, an ID, and a description for the child attribute.
Optional: Set up attribute specifications.
Set up resource specifications:
- Click Manage Permissions for Resource.
- Click Add.
- In the New principals field, enter the email address of a person or a group who needs access to the resource.
- Select the required Roles and click Save.
- Click Save.
Set up column specifications:
- Click Manage Permissions for Column.
- Click Add.
- In the New principals field, enter the email address of a person or a group who needs access to the column.
- Select the required Roles and click Save.
- Click Save.
Click Create.
Update Attribute Store resources
Update taxonomy details
In the Google Cloud console, go to the Dataplex Attribute Store page.
Click the taxonomy you want to update.
Click Edit.
Edit the taxonomy name and its description as needed.
Click Submit.
Update attribute details
In the Google Cloud console, go to the Dataplex Attribute Store page.
Click the taxonomy that contains the attribute you want to update.
Click the attribute you want to update.
To update the attribute name and description, click Edit.
- If you're updating a parent attribute, you have an option to update it to a child attribute, and the other way around. Select the options accordingly.
- Edit the attribute name and its description as needed.
- Click Update.
To update resource specifications for the attribute, click Edit for Resource specifications.
To add a new principal, follow these steps:
- Click Add.
- In the New Principals field, enter the email address of a person or a group who needs access to the resource.
- Select the required Roles.
- Click Save.
To update an existing principal, follow these steps:
- For the principal you want to update, click Edit.
- Select the required Roles.
- Click Save.
To remove an existing principal, follow these steps:
- Select the principal you want to remove.
- Click Remove.
To update column specifications for the attribute, click Edit for Column specifications.
To add a new principal, follow these steps:
- Click Add.
- In the New Principals field, enter the email address of a person or a group who needs access to the column.
- Select the required Roles.
- Click Save.
To update an existing principal, follow these steps:
- For the principal you want to update, click Edit.
- Select the required Roles.
- Click Save.
To remove an existing principal, follow these steps:
- Select the principal you want to remove.
- Click Remove.
Associate attributes with resources
Associate an attribute with a table
In the Google Cloud console, go to the Dataplex Attribute Store page.
Click the taxonomy that contains the attribute.
Click the attribute you want to associate a table with.
Click the Resources tab.
Click Add Resources.
Select a table from the list.
Click Select.
Associate an attribute with a column
In the Google Cloud console, go to the Dataplex Attribute Store page.
Search for and select the table that contains the column you want to associate an attribute with.
Click the Schema and Column Tags tab.
For the column you want to associate an attribute with, in the Policy Tags, click Add.
Select the taxonomy that contains the attribute.
Select the attribute.
Click Attach.
What's next
- Learn more about Dataplex security.
- Learn more about Policy management in Dataplex.
- Learn more about Dataplex IAM roles.