Introduction to dynamic data masking

BigQuery supports dynamic data masking at the column level. You can use data masking to selectively obscure column data for groups of users, while still allowing access to the column.

When you use data masking in combination with column-level access control, you can configure a range of access to column data, from full access to no access, based on the requirements of different groups of users. For example, for tax ID data, you might want to grant your accounting group full access, your analyst group masked access, and your sales group no access.

Benefits

Data masking provides the following benefits:

  • It makes sharing data easier. You can mask sensitive columns to make it possible to share tables with larger groups.
  • There's no need to modify existing queries by excluding the columns that the user cannot access, unlike column-level access control. When you configure data masking, existing queries automatically mask column data based on the roles the user has been granted.
  • It makes it easy to apply data access policies at scale. You can write a data policy once, associate it with a policy tag, and apply the policy tag to any number of columns.
  • It enables attribute-based access control. A policy tag attached to a column provides contextual data access, which is determined by the data policy and the principals associated with that policy tag.

Data masking workflow

Figure 1 shows the workflow for configuring data masking:

To enable data masking, you must create a taxonomy, create data policies for the policy tags in the taxonomy, and then associate the policy tags with table columns. Figure 1. Data masking components.

You configure data masking with the following steps:

  1. Set up a taxonomy and one or more policy tags in Data Catalog.
  2. Configure data policies for the policy tags. A data policy maps a data masking rule and one or more principals, which represent users or groups, to the policy tag.

    When creating a data policy by using the Google Cloud console, you can create the data masking rule and specify the principals in one step. When creating a data policy by using the BigQuery Data Policy API, you create the data policy and data masking rule in one step, and specify the principals for the data policy in a second step.

  3. Assign the policy tags to columns in BigQuery tables to apply the data policies.

The policy tag that is associated with a data policy can also be associated with one or more principals and the Data Catalog Fine-Grained Reader role in order to enable access to the original, unmasked column data.

Figure 2 shows how these components work together:

Data Catalog policy tags are associated with data policies to configure data masking, and then associated with BigQuery table columns to enable masking. Figure 2. Data masking components.

For more information about role interaction, see How Masked Reader and Fine-Grained Reader roles interact. For more information about policy tag inheritance, see Roles and policy tag hierarchy.

Data masking rules

When you use data masking, a data masking rule is applied to a column at query runtime, based on the role of the user running the query. The data masking rule determines the type of data masking applied to the column data.

You can use the following data masking rules:

  • Nullify. Returns NULL instead of the column value. Use this when you want to hide both the value and the data type of the column. When this data masking rule is applied to a column, it makes it less useful in query JOIN operations for users with Masked Reader access. This is because a NULL value isn't sufficiently unique to be useful when joining tables.
  • Default Masking Value. Returns a default masking value for the column based on the column's data type. Use this when you want to hide the value of the column but reveal the data type. When this data masking rule is applied to a column, it makes it less useful in query JOIN operations for users with Masked Reader access. This is because a default value isn't sufficiently unique to be useful when joining tables.

    The following table shows the default masking value for each data type:

    Data type Default masking value
    STRING ""
    BYTES b''
    INTEGER 0
    FLOAT 0.0
    NUMERIC 0
    BOOLEAN FALSE
    TIMESTAMP 0001-01-01 00:00:00 UTC
    DATE 0001-01-01
    TIME 00:00:00
    DATETIME 0001-01-01T00:00:00
    GEOGRAPHY POINT(0 0)
    BIGNUMERIC 0
    ARRAY []
    STRUCT

    NOT_APPLICABLE

    Policy tags can't be applied to columns that use the STRUCT data type, but they can be associated with the leaf fields of such columns.

    JSON NULL
  • Hash (SHA256). Returns the column's value after it has been run through the SHA256 hash function. Use this when you want the end user to be able to use this column in a JOIN operation for a query. You can only use this rule with columns that use the STRING or BYTES data types.

    The SHA256 function used in data masking is type preserving, so the hash value it returns has the same data type as the column value. For example, the hash value for a STRING column value will also have a STRING data type.

Roles and permissions

Roles for managing taxonomies and policy tags

You need the Data Catalog Policy Tag Admin role to create and manage taxonomies and policy tags.

Role/ID Permissions Description
Data Catalog Policy Tag Admin/datacatalog.policyTagAdmin datacatalog.categories.getIamPolicy
datacatalog.categories.setIamPolicy
datacatalog.taxonomies.create
datacatalog.taxonomies.delete
datacatalog.taxonomies.get
datacatalog.taxonomies.getIamPolicy
datacatalog.taxonomies.list
datacatalog.taxonomies.setIamPolicy
datacatalog.taxonomies.update
resourcemanager.projects.get
resourcemanager.projects.list

Applies at the project level.

This role grants the ability to do the following:

  • Create, read, update, and delete taxonomies and policy tags.
  • Get and set IAM policies on policy tags.

Roles for creating and managing data policies

You need one of the following BigQuery roles to create and manage data policies.

Role/ID Permissions Description
BigQuery Admin/bigquery.admin

BigQuery Data Owner/bigquery.dataOwner
bigquery.dataPolicies.create
bigquery.dataPolicies.delete
bigquery.dataPolicies.get
bigquery.dataPolicies.getIamPolicy
bigquery.dataPolicies.list
bigquery.dataPolicies.setIamPolicy
bigquery.dataPolicies.update

The bigquery.dataPolicies.create and bigquery.dataPolicies.list permissions apply at the project level. The other permissions apply at the data policy level.

This role grants the ability to do the following:

  • Create, read, update, and delete data policies.
  • Get and set IAM policies on data policies.

Roles for attaching policy tags to columns

You need the datacatalog.taxonomies.get and bigquery.tables.setCategory permissions to attach policy tags to columns. datacatalog.taxonomies.get is included in the Data Catalog Policy Tags Admin and Project Viewer roles. bigquery.tables.setCategory is included in the Data Catalog Admin and Data Owner roles.

Roles for querying masked data

You need the BigQuery Masked Reader role to query the data from a column that has data masking applied.

Role/ID Permissions Description
Masked Reader/bigquerydatapolicy.maskedReader bigquery.dataPolicies.maskedGet

Applies at the data policy level.

This role grants the ability to view the masked data of a column that is associated with a data policy.

Additionally, a user must have appropriate permissions to query the table. For more information, see Required permissions.

How Masked Reader and Fine-Grained Reader roles interact

Data masking builds on top of column-level access control. For a given column, it is possible to have some users with the BigQuery Masked Reader role that allows them to read masked data, some users with the Data Catalog Fine-Grained Reader role that allows them to read unmasked data, some users with both, and some users with neither. These roles interact as follows:

  • User with both Fine-Grained Reader and Masked Reader roles: what the user sees depends on where in the policy tag hierarchy each role is granted. For more information, see Authorization inheritance in a policy tag hierarchy.
  • User with Fine-Grained Reader role: can see unmasked column data.
  • User with Masked Reader role: can see masked column data.
  • User with neither role: permission denied.

In the case where a table has several columns that are secured and/or masked, in order to run a SELECT * FROM statement on that table, a user must be a member of appropriate groups such that they are granted Masked Reader or Fine-Grained Reader roles on all of these columns.

A user who is not granted these roles must instead specify only columns that they have access to in the SELECT statement, or use SELECT * EXCEPT (restricted_columns) FROM to exclude the secured or masked columns.

Authorization inheritance in a policy tag hierarchy

Roles are evaluated starting at the policy tag associated with a column, and then checked at each ascending level of the taxonomy, until the user either is determined to have appropriate permissions or the top of the policy tag hierarchy is reached.

For example, take the policy tag and data policy configuration shown in Figure 3:

Evaluating user access when Masked Reader is granted at a higher level of the taxonomy and Fine-Grained Reader is granted at a lower level of the taxonomy.

Figure 3. Policy tag and data policy configuration.

You have a table column that is annotated with the Financial policy tag, and a user who is a member of both the ftes@example.com and analysts@example.com groups. When this user runs a query that includes the annotated column, their access is determined by the hierarchy defined in the Data Catalog taxonomy. Because the user is granted the Data Catalog Fine-Grained Reader role by the Financial policy tag, the query returns unmasked column data.

If another user who is only a member of the ftes@example.com role runs a query that includes the annotated column, the query returns column data that has been hashed using the SHA256 algorithm, because the user is granted the BigQuery Masked Reader role by the Confidential policy tag, which is the parent of the Financial policy tag.

A user who is not a member of either of those roles gets an access denied error if they try to query the annotated column.

In contrast with the preceding scenario, take the policy tag and data policy configuration shown in Figure 4:

Evaluating user access when Fine-Grained Reader is granted at a higher level of the taxonomy and Masked Reader is granted at a lower level of the taxonomy.

Figure 4. Policy tag and data policy configuration.

You have the same situation as shown in Figure 3, but the user is granted the Fine-Grained Reader role at a higher level of the policy tag hierarchy, and the Masked Reader role at a lower level of the policy tag hierarchy. Because of this, the query returns masked column data for this user. This happens even though the user is granted the Fine-Grained Reader role further up the tag hierarchy, because the service uses the first assigned role it encounters as it ascends the policy tag hierarchy to check for user access.

If you want to create a single data policy and have it apply to several levels of a policy tag hierarchy, you can set the data policy on the policy tag that represents the topmost hierarchy level to which it should apply. For example, take a taxonomy with the following structure:

  • Policy tag 1
    • Policy tag 1a
      • Policy tag 1ai
    • Policy tag 1b
      • Policy tag 1bi
      • Policy tag 1bii

If you want a data policy to apply to all of these policy tags, set the data policy on policy tag 1. If you want a data policy to apply to policy tag 1b and its children, set the data policy on policy tag 1b.

Using data masking with incompatible features

When you use BigQuery features that aren't compatible with data masking, the service treats the masked column as a secured column, and only grants access to users who have the Data Catalog Fine-Grained Reader role.

For example, take the policy tag and data policy configuration shown in Figure 5:

The policy tag associated with the column is evaluated to determine if the user has permission to access unmasked data.

Figure 5. Policy tag and data policy configuration.

You have a table column that is annotated with the Financial policy tag, and a user who is a member of the analysts@example.com group. When this user tries to access the annotated column through one of the incompatible features, they get an access denied error. This is because they are granted the BigQuery Masked Reader by Financial policy tag, but in this case, they must have the Data Catalog Fine-Grained Reader role. Because the service has already determined an applicable role for the user, it does not continue to check farther up the policy tag hierarchy for additional permissions.

Data masking example with output

To see how tags, principals, and roles work together, consider this example.

At example.com, basic access is granted through the data-users@example.com group. All employees who need regular access to BigQuery data are members of this group, which is assigned all the necessary permissions to read from tables as well as the BigQuery Masked Reader role.

Employees are assigned to additional groups that provide access to secured or masked columns where that is required for their work. All members of these additional groups are also members of data-users@example.com. You can see how these groups are associated with appropriate roles in Figure 6:

Policy tags and data policies for example.com.

Figure 6. Policy tags and data policies for example.com.

The policy tags are then associated with table columns, as shown in Figure 7:

Example.com policy tags associated with table columns.

Figure 7. Example.com policy tags associated with table columns.

Given the tags that are associated with the columns, running SELECT * FROM Accounts; leads to the following results for the different groups:

  • data-users@example.com: This group has been granted the BigQuery Masked Reader role on both the PII and Confidential policy tags. The following results are returned:

    SSN Priority Lifetime value Creation date Email
    NULL "" 0 March 8, 1983 NULL
    NULL "" 0 December 29, 2009 NULL
    NULL "" 0 July 14, 2021 NULL
    NULL "" 0 May 5, 1997 NULL
  • accounting@example.com: This group has been granted the Data Catalog Fine-Grained Reader role on the SSN policy tag. The following results are returned:

    SSN Priority Lifetime value Creation date NULL
    123-45-6789 "" 0 March 8, 1983 NULL
    234-56-7891 "" 0 December 29, 2009 NULL
    345-67-8912 "" 0 July 14, 2021 NULL
    456-78-9123 "" 0 May 5, 1997 NULL
  • sales-exec@example.com: This group has been granted the Data Catalog Fine-Grained Reader role on the Confidential policy tag. The following results are returned:

    SSN Priority Lifetime value Creation date Email
    NULL High 90,000 March 8, 1983 NULL
    NULL High 84,875 December 29, 2009 NULL
    NULL Medium 38,000 July 14, 2021 NULL
    NULL Low 245 May 5, 1997 NULL
  • fin-dev@example.com: This group has been granted the BigQuery Masked Reader role on the Financial policy tag. The following results are returned:

    SSN Priority Lifetime value Creation date Email
    NULL "" Zmy9vydG5q= March 8, 1983 NULL
    NULL "" GhwTwq6Ynm= December 29, 2009 NULL
    NULL "" B6y7dsgaT9= July 14, 2021 NULL
    NULL "" Uh02hnR1sg= May 5, 1997 NULL
  • All other users: Any user who does not belong to one of the listed groups gets an access denied error, because they haven't been granted the Data Catalog Fine-Grained Reader or BigQuery Masked Reader roles. To query the Accounts table, they must instead specify only columns that they have access to in the SELECT * EXCEPT (restricted_columns) FROM Accounts to exclude the secured or masked columns.

Billing considerations

Data masking might indirectly affect the number of bytes processed, and therefore affect the billing of the query. If a user queries a column that is masked for them with the Nullify or Default Masking Value rules, then that column isn't scanned at all, resulting in fewer bytes processed.

Restrictions and limitations

Regions

Data masking is limited to the US and EU multi-regions.

Data policy management

  • You can only create one data policy per policy tag.
  • All data policies must reside in the same project as their associated policy tags.

Using the Hash (SHA256) data masking rule

  • Only the Hash (SHA256) data masking rule allows a masked column to still be used in the JOIN operation for a query.
  • You can only apply the Hash (SHA256) data masking rule to columns that use the STRING or BYTES data types.

Setting access control

Once a taxonomy has a data policy associated with at least one of its policy tags, access control is automatically enforced. If you want to turn off access control, you must first delete all of the data policies associated with the taxonomy.

SQL dialects

Legacy SQL is not supported.

Compatibility with other BigQuery features

BigQuery API

Not compatible with the tabledata.list method. To call tabledata.list, you need full access to all of the columns returned by this method. The Data Catalog Fine-Grained Reader role grants appropriate access.

BigQuery Storage Read API

Not compatible. To instantiate a read session, you need full access to all of the columns returned by the CreateReadSessionRequest call. The Data Catalog Fine-Grained Reader role grants appropriate access.

BigQuery BI Engine

Not compatible. You need full access to all columns that you reference when using BI Engine. The Data Catalog Fine-Grained Reader role grants appropriate access.

Copy jobs

Not compatible. To copy a table from source to the destination, you need to to have full access to all of the columns on the source table. The Data Catalog Fine-Grained Reader role grants appropriate access.

Data export

Compatible. If you have the BigQuery Masked Reader role, then the exported data is masked. If you have the Data Catalog Fine-Grained Reader role, then the exported data is not masked.

Row-level security

Compatible. Data masking is applied on top of row-level security. For example, if there is a row access policy applied on location = "US" and location is masked, then users are able to see rows where location = "US" but the location field is masked.

Searching in BigQuery

Partly compatible. You can call the SEARCH function on indexed or unindexed columns that have data masking applied.

When you call the SEARCH function on columns that have data masking applied, you must use search criteria compatible with your level of access. For example, if you have Masked Reader access with a Hash (SHA256) data masking rule, you would use the hash value in your SEARCH clause, similar to the following:

SELECT * from myDataset.Customers WHERE SEARCH(Email, "sg172y34shw94fujaweu");

If you have Fine-Grained Reader access, you would use the actual column value in your SEARCH clause, similar to the following:

SELECT * from myDataset.Customers WHERE SEARCH(Email, "jane.doe@example.com");

Searching is less likely to be useful if you have Masked Reader access to a column where the data masking rule used is Nullify or Default Masking Value. This is because the masked results you would use as search criteria, such as NULL or "", aren't sufficiently unique to be useful.

When searching on an indexed column that has data masking applied, the search index is only used if you have Fine-Grained Reader access to the column.

Snapshots

Not compatible. To create a snapshot of a table, you need full access to all of the columns on the source table. The Data Catalog Fine-Grained Reader role grants appropriate access.

Table renaming

Compatible. Table renaming is not affected by data masking.

Time travel

Compatible with both time decorators and the FOR SYSTEM_TIME AS OF option in SELECT statements. The policy tags for the current dataset schema are applied to the retrieved data.

Query caching

Partly compatible. BigQuery caches query results for approximately 24 hours, although the cache is invalidated if changes are made to the table data or schema before that. In the following circumstance, it is possible that a user who does not currently have the Data Catalog Fine-Grained Reader role granted on a column can still see the column data when they run a query:

  1. A user has been granted the Data Catalog Fine-Grained Reader role on a column.
  2. The user runs a query that includes the restricted column and the data is cached.
  3. Within 24 hours of Step 2, the user is granted the BigQuery Masked Reader role, and has the Data Catalog Fine-Grained Reader role revoked.
  4. Within 24 hours of Step 2, the user runs that same query, and the cached data is returned.

Wildcard table queries

Not compatible. You need full access to all of the referenced columns on all of the tables matching the wildcard query. The Data Catalog Fine-Grained Reader role grants appropriate access.

What's next