Use Dataplex Attribute Store

This document shows how to use the Dataplex Attribute Store.

The Dataplex Attribute Store is an extensible infrastructure that lets you specify policy-related behaviors on the associated resources. Dataplex administrators can use the Attribute Store to define how certain data should be treated, by associating data with attributes.

The key benefit of the Attribute Store is that you can add multiple attributes to an object, such as a column. The Attribute Store merges the behaviors of all the attributes associated with an object and presents them as a single policy on the underlying resource.

You can set attributes on published datasets. Published datasets are the datasets that Dataplex creates from the tables discovered in a bucket asset.

The following policy behaviors are supported:

  • Resource specifications: Specify access to a resource, such as a table.
  • Column specifications: Specify access to a column in a BigQuery table.

You can use the Attribute Store to define an attribute hierarchy called a taxonomy. In a taxonomy, a child attribute inherits specifications from its parent attributes. Specifications from the parent and the child merge into a unified list, which is propagated to the resource.

You can use the Dataplex Attribute Store to perform the following:

  • Create taxonomies.
  • Create attributes and organize them in a hierarchy.
  • Associate one or more attributes to tables.
  • Associate one or more attributes to columns.

Terminology

The following terminology is used in this document:

Attribute taxonomy

An attribute taxonomy (also called a data taxonomy) is a hierarchy of attributes. In a taxonomy, child attributes inherit the behavior specifications of their parent attributes and add them to their own.

For example, suppose an attribute named PII has the resource specification group-a@company.com, and a child attribute of PII named Social Security numbers has the resource specification group-b@company.com. The resource specifications applied to any resource associated with the attribute Social Security numbers are then both group-a@company.com and group-b@company.com.
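
The inheritance in this example can be sketched in a few lines of Python. This is an illustrative model only, not Dataplex code: the `Attribute` class and the `effective_resource_principals` function are hypothetical names, and the sketch assumes that inheritance is a plain union of principals along the parent chain.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Attribute:
    """Hypothetical in-memory model of a data attribute."""
    name: str
    resource_principals: set = field(default_factory=set)
    parent: Optional["Attribute"] = None


def effective_resource_principals(attribute):
    """Union the attribute's own principals with those of every ancestor."""
    principals = set()
    node = attribute
    while node is not None:
        principals |= node.resource_principals
        node = node.parent
    return principals


pii = Attribute("PII", {"group-a@company.com"})
ssn = Attribute("Social Security numbers", {"group-b@company.com"}, parent=pii)

# The child sees both groups; the parent sees only its own.
print(sorted(effective_resource_principals(ssn)))
print(sorted(effective_resource_principals(pii)))
```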

When you define an attribute, you can choose whether it's a parent or a child attribute. When defining a child attribute, you must specify its parent attribute.

Column specifications

The behavior specifications for columns. They specify the people or groups who have reader access to columns. If you associate an attribute that contains a column specification with a table's column, Dataplex adds a BigQuery column policy tag to that column.

Resource specifications

The permissions for people or groups to access resources (tables). If you associate an attribute that has a resource specification, Dataplex propagates IAM roles to the specified users so that they can access the tables associated with the attribute.

Before you begin

Limitations

Dataplex propagates the column specification policies as BigQuery policy tags. BigQuery allows only one policy tag per column. If a policy tag already exists on a column, Dataplex reports an error in the Governance log on the Manage tab.

Quotas

The following are the quotas and limits that apply to the Dataplex Attribute Store:

Limit                                                                          Default
Maximum number of taxonomies in a region                                       100
Maximum number of attributes in all taxonomies in a region                     10,000
Maximum number of attributes that can be associated with a resource (table)    50
Maximum number of attributes that can be associated with a column              100
Maximum depth per data attribute tree in an attribute taxonomy                 4
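
If you plan a taxonomy before creating it, you can check the plan against these defaults. The following is a minimal sketch, not a Dataplex feature: the function names and the input shapes are hypothetical, and it assumes that depth is counted in levels, so a root attribute with no children has depth 1.

```python
# Defaults copied from the quota table above.
MAX_ATTRIBUTES_PER_TABLE = 50
MAX_ATTRIBUTES_PER_COLUMN = 100
MAX_TREE_DEPTH = 4


def tree_depth(children_by_parent, root):
    """Count the levels under `root` (assumption: a lone root has depth 1)."""
    children = children_by_parent.get(root, [])
    if not children:
        return 1
    return 1 + max(tree_depth(children_by_parent, child) for child in children)


def limit_violations(children_by_parent, roots, table_attributes, column_attributes):
    """Return one message for every limit that the planned setup exceeds."""
    problems = []
    for root in roots:
        depth = tree_depth(children_by_parent, root)
        if depth > MAX_TREE_DEPTH:
            problems.append(f"tree '{root}' has depth {depth}, limit is {MAX_TREE_DEPTH}")
    if len(table_attributes) > MAX_ATTRIBUTES_PER_TABLE:
        problems.append(
            f"table has {len(table_attributes)} attributes, limit is {MAX_ATTRIBUTES_PER_TABLE}"
        )
    for column, attributes in column_attributes.items():
        if len(attributes) > MAX_ATTRIBUTES_PER_COLUMN:
            problems.append(
                f"column '{column}' has {len(attributes)} attributes, "
                f"limit is {MAX_ATTRIBUTES_PER_COLUMN}"
            )
    return problems


# Hypothetical plan with a five-level tree, which is one level too deep.
children = {
    "PII": ["Contact"],
    "Contact": ["Email"],
    "Email": ["Work email"],
    "Work email": ["Alias"],
}
print(limit_violations(children, ["PII"], ["PII"], {"Name": ["Email"]}))
```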

Required roles and permissions

To get the permissions that you need to use the Dataplex Attribute Store, ask your administrator to grant you the following IAM roles on the project:

For more information about granting roles, see Manage access.

These predefined roles contain the permissions required to use the Dataplex Attribute Store. The exact permissions that are required are listed in the following Required permissions section.

Required permissions

The following permissions are required to use the Dataplex Attribute Store:

  • Manage taxonomies and attributes:
    • dataplex.datataxonomies.*
    • dataplex.dataattributes.* (except dataplex.dataattributes.configureResourceAccess and dataplex.dataattributes.configureDataAccess)
  • View bindings associated to resources and attributes:
    • dataplex.datataxonomies.get
    • dataplex.datataxonomies.list
    • dataplex.dataattributes.get
    • dataplex.dataattributes.list
    • dataplex.dataattributebindings.get
    • dataplex.dataattributebindings.list
  • Create and manage binding resources in a project: dataplex.dataattributebindings.*
  • Manage resource and data access specifications:
    • dataplex.datataxonomies.configureResourceAccess
    • dataplex.datataxonomies.configureDataAccess

You might also be able to get these permissions with custom roles or other predefined roles.

Example use cases

Consider a company named ACME that has three kinds of data:

  • Red data that is sensitive.
  • Green data that is restricted, but less sensitive.
  • Uncategorized data.

The Dataplex administrator of ACME creates the following set of attributes:

  • Attribute: Red

    • Column specifications: secrets_team@acme with read permission
    • Resource specifications: secrets_team@acme and tenured_employees@acme with read permission
  • Attribute: Green

    • Column specifications: full_time_employees@acme with read permission
    • Resource specifications: full_time_employees@acme with edit permission

This image contains the column and resource specifications for the attributes Red and Green.

The attributes Red and Green control the access behavior to the resources (tables) depending on the attributes associated with the tables and their columns.

Consider a table with the following columns:

  • ID
  • Zip code
  • Name
  • Address
  • $Value

Use case 1: Associate the same attribute with the table and a column

This image shows the attribute Red being associated with the table and the column Name.

If you associate the attribute Red with the table and its column Name, then Dataplex propagates the following policies:

  • Employees in secrets_team@acme and tenured_employees@acme can read the table, see its metadata, and query it.
  • Only employees in secrets_team@acme can query the column Name, as it's further protected by column specifications.

Use case 2: Combine attributes

Consider the following associations:

  • Associate the attributes Red and Green with the table.
  • Associate the attributes Red and Green with the column Name.
  • Associate the attribute Red with the column $Value.

This image shows the attributes Red and Green being associated with the table and the column Name, and the attribute Red being associated with the column $Value.

In this case, Dataplex propagates the following policies:

  • Employees in secrets_team@acme, tenured_employees@acme, and full_time_employees@acme can access the table. This is because Dataplex merges the resource specifications of the attributes Red and Green.
  • Employees in both secrets_team@acme and full_time_employees@acme can access the column Name. This is because Dataplex merges the column specifications of the attributes Red and Green.
  • Only employees in secrets_team@acme can query the column $Value.
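
The merge in this use case can be sketched as a set union. This is a simplified model under stated assumptions: the dictionaries and the `merge` function are hypothetical, and the sketch tracks only which groups appear in a specification, not whether their permission is read or edit.

```python
# Specifications of the example attributes, reduced to sets of groups.
RED = {
    "column": {"secrets_team@acme"},
    "resource": {"secrets_team@acme", "tenured_employees@acme"},
}
GREEN = {
    "column": {"full_time_employees@acme"},
    "resource": {"full_time_employees@acme"},
}


def merge(attributes, kind):
    """Union one kind of specification ('column' or 'resource') across attributes."""
    merged = set()
    for attribute in attributes:
        merged |= attribute[kind]
    return merged


table_access = merge([RED, GREEN], "resource")  # Red and Green on the table
name_access = merge([RED, GREEN], "column")     # Red and Green on the column Name
value_access = merge([RED], "column")           # Red only on the column $Value

print(sorted(table_access))
print(sorted(name_access))
print(sorted(value_access))
```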

Use case 3: Organize attributes in a hierarchy

You can organize attributes in a hierarchy by specifying the subtypes of attributes. Consider the following set of attributes:

Parent attribute 1:
Attribute: PII

  • Column specifications: secrets_team@acme
  • Resource specifications: secrets_team@acme and tenured_employees@acme

Child attribute of PII:
Attribute: Email

  • Column specifications: email_comm@acme
  • Resource specifications: email_comm@acme

Parent attribute 2:
Attribute: Financial

  • Column specifications: full_time_employees@acme
  • Resource specifications: full_time_employees@acme

This image shows an example of an attribute hierarchy.

Consider the following associations:

  • Associate the attributes Email and Financial with the table.
  • Associate the attributes Email and Financial with the column Name.
  • Associate the attribute PII with the column $Value.

This image shows how attributes in a hierarchy can be associated with the table and columns.

In this case, Dataplex propagates the following policies:

  • Employees in secrets_team@acme, tenured_employees@acme, full_time_employees@acme, and email_comm@acme can access the table. This is because Dataplex merges the resource specifications of the attributes Financial and Email, and the attribute Email inherits the specifications from the attribute PII.
  • Employees in secrets_team@acme, email_comm@acme, and full_time_employees@acme can access the column Name. This is because Dataplex merges the column specifications of the attributes Financial and Email, and the attribute Email inherits the column specifications from the attribute PII.
  • Only employees in secrets_team@acme can query the column $Value.
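
The combination of inheritance and merging in this use case can be sketched as follows. The `ATTRIBUTES` dictionary and both functions are hypothetical names for illustration, and the sketch assumes that a child inherits the full set of principals from every ancestor.

```python
# Example attributes; "parent" names the parent attribute, or None for a root.
ATTRIBUTES = {
    "PII": {
        "parent": None,
        "column": {"secrets_team@acme"},
        "resource": {"secrets_team@acme", "tenured_employees@acme"},
    },
    "Email": {
        "parent": "PII",
        "column": {"email_comm@acme"},
        "resource": {"email_comm@acme"},
    },
    "Financial": {
        "parent": None,
        "column": {"full_time_employees@acme"},
        "resource": {"full_time_employees@acme"},
    },
}


def inherited(name, kind):
    """Collect one kind of specification from an attribute and all its ancestors."""
    principals = set()
    while name is not None:
        principals |= ATTRIBUTES[name][kind]
        name = ATTRIBUTES[name]["parent"]
    return principals


def effective(names, kind):
    """Merge the inherited specifications of every associated attribute."""
    return set().union(*(inherited(name, kind) for name in names))


table_access = effective(["Email", "Financial"], "resource")
name_access = effective(["Email", "Financial"], "column")
value_access = effective(["PII"], "column")

print(sorted(table_access))
print(sorted(name_access))
print(sorted(value_access))
```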

Set up attributes

To create an attribute, you must first create a taxonomy, and then create the parent and child data attributes.

Create a data attribute taxonomy

  1. In the Google Cloud console, go to the Dataplex Attribute Store page.

  2. Click Create Taxonomy.

  3. Enter the Taxonomy name, ID, and Description.

  4. Select a region.

  5. Click Submit.

    The new taxonomy appears on the Data Taxonomies page.
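
If you prefer to script this step, the request can be sketched as follows. Everything here is an assumption to verify against the Dataplex API reference before use: the API root, the `dataTaxonomies` collection path, the `dataTaxonomyId` query parameter, and the `displayName` and `description` body fields. The function name and sample values are hypothetical, and the sketch only builds the request without sending it.

```python
from urllib.parse import urlencode

API_ROOT = "https://dataplex.googleapis.com/v1"  # assumed API root


def build_create_taxonomy_request(project, region, taxonomy_id, display_name, description):
    """Return the (url, body) pair for a hypothetical create-taxonomy POST request."""
    parent = f"projects/{project}/locations/{region}"
    query = urlencode({"dataTaxonomyId": taxonomy_id})  # assumed parameter name
    url = f"{API_ROOT}/{parent}/dataTaxonomies?{query}"
    body = {"displayName": display_name, "description": description}  # assumed field names
    return url, body


url, body = build_create_taxonomy_request(
    "my-project", "us-central1", "sensitivity", "Sensitivity", "Red and Green data"
)
print(url)
print(body)
```

Sending the request also requires an authenticated HTTP client, which this sketch leaves out.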

Create a parent attribute

  1. In the Google Cloud console, go to the Dataplex Attribute Store page.

  2. On the Data Taxonomies page, click the taxonomy in which you want to create the parent attribute.

  3. On the Taxonomy details page, click Add data attribute.

  4. Select Create parent data attribute.

  5. Enter a name, an ID, and a description for the parent attribute.

  6. Optional: Set up attribute specifications.

    1. Set up resource specifications:

      1. Click Manage Permissions for Resource.
      2. Click Add.
      3. In the New principals field, enter the email address of a person or a group who needs access to the resource.
      4. Select the required Roles and click Save.
      5. Click Save.
    2. Set up column specifications:

      1. Click Manage Permissions for Column.
      2. Click Add.
      3. In the New principals field, enter the email address of a person or a group who needs access to the column.
      4. Select the required Roles and click Save.
      5. Click Save.
  7. Click Create.

Create a child attribute

  1. In the Google Cloud console, go to the Dataplex Attribute Store page.

  2. On the Data Taxonomies page, click the taxonomy in which you want to create the child attribute.

  3. On the Taxonomy details page, click Add data attribute.

  4. Select Create child data attribute.

  5. Select a Parent data attribute for the child attribute you're creating.

  6. Enter a name, an ID, and a description for the child attribute.

  7. Optional: Set up attribute specifications.

    1. Set up resource specifications:

      1. Click Manage Permissions for Resource.
      2. Click Add.
      3. In the New principals field, enter the email address of a person or a group who needs access to the resource.
      4. Select the required Roles and click Save.
      5. Click Save.
    2. Set up column specifications:

      1. Click Manage Permissions for Column.
      2. Click Add.
      3. In the New principals field, enter the email address of a person or a group who needs access to the column.
      4. Select the required Roles and click Save.
      5. Click Save.
  8. Click Create.
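
The parent and child attributes described in the preceding steps can also be represented as request bodies. The following sketch is illustrative only. The field names (`displayName`, `parentId`, `resourceAccessSpec`, `dataAccessSpec`, `readers`) and the IAM-style `group:` prefix are assumptions to check against the Dataplex API reference, and `build_attribute_body` is a hypothetical helper.

```python
def build_attribute_body(display_name, description, parent_id=None,
                         resource_readers=(), column_readers=()):
    """Build a request body for a parent attribute, or a child when parent_id is set."""
    body = {"displayName": display_name, "description": description}
    if parent_id is not None:
        body["parentId"] = parent_id  # assumed field: ID of the parent attribute
    if resource_readers:
        # Assumed field for resource specifications (table access).
        body["resourceAccessSpec"] = {"readers": list(resource_readers)}
    if column_readers:
        # Assumed field for column specifications (column access).
        body["dataAccessSpec"] = {"readers": list(column_readers)}
    return body


parent_body = build_attribute_body(
    "PII", "Personally identifiable information",
    resource_readers=["group:secrets_team@acme", "group:tenured_employees@acme"],
    column_readers=["group:secrets_team@acme"],
)
child_body = build_attribute_body(
    "Email", "Email addresses", parent_id="pii",
    resource_readers=["group:email_comm@acme"],
    column_readers=["group:email_comm@acme"],
)
print(parent_body)
print(child_body)
```

Because the specifications are optional in the console steps, the helper omits both specification fields when no readers are passed.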

Update Attribute Store resources

Update taxonomy details

  1. In the Google Cloud console, go to the Dataplex Attribute Store page.

  2. Click the taxonomy you want to update.

  3. Click Edit.

  4. Edit the taxonomy name and its description as needed.

  5. Click Submit.

Update attribute details

  1. In the Google Cloud console, go to the Dataplex Attribute Store page.

  2. Click the taxonomy that contains the attribute you want to update.

  3. Click the attribute you want to update.

  4. To update the attribute name and description, click Edit.

    1. You can change a parent attribute to a child attribute, or a child attribute to a parent attribute. Select the option that applies.
    2. Edit the attribute name and its description as needed.
    3. Click Update.
  5. To update resource specifications for the attribute, click the edit control for Resource specifications.

    1. To add a new principal, follow these steps:

      1. Click Add.
      2. In the New Principals field, enter the email address of a person or a group who needs access to the resource.
      3. Select the required Roles.
      4. Click Save.
    2. To update an existing principal, follow these steps:

      1. Click the edit control for the principal you want to update.
      2. Select the required Roles.
      3. Click Save.
    3. To remove an existing principal, follow these steps:

      1. Select the principal you want to remove.
      2. Click Remove.
  6. To update column specifications for the attribute, click the edit control for Column specifications.

    1. To add a new principal, follow these steps:

      1. Click Add.
      2. In the New Principals field, enter the email address of a person or a group who needs access to the column.
      3. Select the required Roles.
      4. Click Save.
    2. To update an existing principal, follow these steps:

      1. Click the edit control for the principal you want to update.
      2. Select the required Roles.
      3. Click Save.
    3. To remove an existing principal, follow these steps:

      1. Select the principal you want to remove.
      2. Click Remove.

Associate attributes with resources

Associate an attribute with a table

  1. In the Google Cloud console, go to the Dataplex Attribute Store page.

  2. Click the taxonomy that contains the attribute.

  3. Click the attribute you want to associate a table with.

  4. Click the Resources tab.

  5. Click Add Resources.

  6. Select a table from the list.

  7. Click Select.

Associate an attribute with a column

  1. In the Google Cloud console, go to the Dataplex Attribute Store page.

  2. Search for and select the table that contains the column you want to associate an attribute with.

  3. Click the Schema and Column Tags tab.

  4. Click in the Policy Tags for the column you want to associate an attribute with.

  5. Select the taxonomy that contains the attribute.

  6. Select the attribute.

  7. Click Attach.
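
The table and column associations can be pictured as a single binding that lists attributes for the table and for each column. The sketch below is a hypothetical illustration. The attribute name pattern, the `resource`, `attributes`, and `paths` fields, and the sample resource string are assumptions to verify against the Dataplex API reference, and neither helper function is part of any Google library.

```python
def attribute_name(project, region, taxonomy_id, attribute_id):
    """Build an attribute resource name (assumed pattern)."""
    return (
        f"projects/{project}/locations/{region}"
        f"/dataTaxonomies/{taxonomy_id}/attributes/{attribute_id}"
    )


def build_binding_body(resource, table_attributes, column_attributes):
    """Describe which attributes apply to a table and to each of its columns."""
    return {
        "resource": resource,                    # the table being bound
        "attributes": list(table_attributes),    # attributes on the table itself
        "paths": [                               # attributes on individual columns
            {"name": column, "attributes": list(attributes)}
            for column, attributes in column_attributes.items()
        ],
    }


red = attribute_name("my-project", "us-central1", "sensitivity", "red")
green = attribute_name("my-project", "us-central1", "sensitivity", "green")

# Hypothetical table identifier; use the format that the API expects.
table = "example-table-resource"

binding = build_binding_body(table, [red, green], {"Name": [red, green], "$Value": [red]})
print(binding)
```

The sample values mirror use case 2: Red and Green on the table and on the column Name, and Red alone on the column $Value.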

What's next