Introduction to BigQuery Column-level security

BigQuery provides fine-grained access to sensitive columns using policy tags, or type-based classification, of data. Using BigQuery Column-level security, you can create policies that check, at query time, whether a user has proper access. For example, a policy can enforce access checks such as:

  • You must be in group:high-access to see the columns containing TYPE_SSN.

Column-level security workflow

Workflow

To restrict data access at the column level:

  1. Define a taxonomy and data policy tags. Use Data Catalog to create and manage a taxonomy and policy tags for your data. For guidelines, see Best practices for policy tags.

  2. Assign policy tags to your BigQuery columns. In BigQuery, use schema annotations to assign a policy tag to each column where you want to restrict access.

  3. Manage access on the policy tags. Use Identity and Access Management (IAM) policies to restrict access to each policy tag. The policy is in effect for each column that belongs to the policy tag.

When a user tries to access column data at query time, BigQuery checks the column policy tag and its policy to see if the user is authorized to access the data.

Example use case

Consider an organization that needs to classify sensitive data into two categories: High and Medium.

Policy tags

To set up column level security, a data steward, who has the appropriate permissions, would perform the following steps to set up a hierarchy of data classification.

  1. The data steward creates a taxonomy named "Business criticality" in Data Catalog. The taxonomy includes the nodes, or policy tags High and Medium.

  2. The data steward decides the policy for the High node will include access for a group named high-tier-access.

  3. The data steward creates more levels of nodes in the taxonomy, under High and Medium. The lowest level node is a leaf node, such as the employee_ssn leaf node. The data steward can create a different access policy for the employee_ssn leaf node, or not.

  4. The data steward assigns a policy tag to specific table columns. In this example, the data steward assigns the High access policy to the employee_ssn column in a table.

  5. In the Current schema page of the console, the data steward can see the the policy tag that governs a particular column. In this example, the employee_ssn column is under the High policy tag, so when viewing the the schema for employee_ssn, the console displays the taxonomy name and the policy tag in the Policy tags field: Business criticality:High.

    Policy tag UI

    For details on using the console to set a policy tag, see Set a policy tag on a column.

    Alternatively, you can set the policy tag using the bq update command. The names field of policyTags includes the ID of the High policy tag, projects/project-id/locations/location/taxonomies/taxonomy-id/policyTags/policytag-id:

    [
     ...
     {
       "name": "ssn",
       "type": "STRING",
       "mode": "REQUIRED",
       "policyTags": {
         "names": ["projects/project-id/locations/location/taxonomies/taxonomy-id/policyTags/policytag-id",]
       }
     },
     ...
    ]
    

    For details on using the bq update command to set a policy tag, see Set a policy tag on a column.

  6. The admin performs similar steps for the Medium policy tag.

With this fine-grained access, you can manage access to many columns by controlling only a small number of data classification policy tags.

For details about these steps, see Restricting access with BigQuery column-level security.

Roles used with column-level security

The following Data Catalog roles are used for BigQuery column-level security.

Role name and ID Permissions included Applies to Description
Policy Tag Admin
datacatalog.categoryAdmin datacatalog.categories.getIamPolicy
datacatalog.categories.setIamPolicy
datacatalog.taxonomies.create
datacatalog.taxonomies.delete
datacatalog.taxonomies.get
datacatalog.taxonomies.getIamPolicy
datacatalog.taxonomies.list
datacatalog.taxonomies.setIamPolicy
datacatalog.taxonomies.update
resourcemanager.projects.get
resourcemanager.projects.list
Project This role has permission to do the following:
  • Create, read, update, and delete taxonomies.
  • Set IAM policies on policy tags.
Typically, a user with this role is in charge of setting data governance policies for the organization.

Fine-Grained Reader
datacatalog.categoryFineGrainedReader datacatalog.categories.fineGrainedGet Policy tag This role has permission to access the content of BigQuery columns restricted by a policy tag.

Typically, a user with this role is querying the data.

To learn more about Data Catalog roles, see Data Catalog Identity and Access Management (IAM).

Impact of writes

To read data from a column that is protected by Column-level security, the user is always required to have read permission, via the fine-grained read access, on the policy tags for the column.

This applies to:

  • Tables, including wildcard tables
  • Views
  • Copying tables

To write data to a row for a column that is protected by Column-level security, the user requirement depends on the type of write.

If the write operation is an insert, fine-grained read access is not required. However, the user won't have access to read the data that was inserted, unless the user has fine-grained read access.

If the write operation is an update, delete, or merge, the user can't perform the operation unless the user has fine-grained read access.

A user can't load data from local files or from Cloud Storage unless the user has fine-grained read access. However, a user can load data from streaming, because streaming loads do not check policy tags. The user won't have access to read the data that was loaded from a stream, unless the user has fine-grained read access.

For more information, see Impact on writes with BigQuery Column-level security.

Querying tables

If a user has dataset access and has the Data Catalog Fine-Grained Reader role, the column data is available to the user. The user runs a query as normal.

If a user has dataset access but does not have the Data Catalog Fine-Grained Reader role, the column data is not available to the user. If such a user runs SELECT *, the user will receive an error which lists the column(s) that the user cannot access. To resolve the error, you can either:

  • Modify the query to exclude the column(s) that the user cannot access. For example, if the user does not have access to the ssn column, but does have access to the remaining columns, the user can run the following query:

    SELECT * EXCEPT (ssn) FROM ...
    

    In the example above, the EXCEPT clause excludes the ssn column.

  • Ask a Data Catalog Administrator to add the user as a Data Catalog Fine-Grained Reader to the relevant data class. The error message provides the full name of the policy tag for which the user would need access.

Querying views

The impact of column-level security on views is independent of whether or not the view is an authorized view. In both cases, column-level security is enforced transparently.

If the view is not an authorized view:

If the user has column-level access to the columns in the view's underlying tables, then the user can query the columns in the view. Otherwise, the user cannot query the columns in the view.

If the view is an authorized view:

The access is controlled by only the column-level security on the columns in the view's underlying tables. Table-level and dataset-level IAM policies, if any, are not used to check access. If the user has access to the policy tags used in the authorized view's underlying tables, the user can query the columns in the authorized view.

The following diagram shows how access to a view is evaluated.

Accessing views

Impact of snapshots

BigQuery allows users to query a table in an earlier state. This capability allows you to query the rows from a previous snapshot of the table. It also allows you to roll the table back to the date the snapshot was taken.

In legacy SQL, you query a snapshot by using snapshot decorators on the table name. In standard SQL, you query a snapshot by using the FOR SYSTEM_TIME AS OF clause on the table.

The time limit to query a snapshot is the most recent of:

  • the last 7 days
  • the table creation time

You can't query a previous table if the table was deleted and recreated with the same table name.

To see how snapshots work with column-level security, assume you have the following in place:

  • you have an origin point, which is an existing table at the current time.

  • you have the snapshot of the same table.

  • the schema of the snapshot is identical to, or a subset of, the schema of the origin. That is, all columns of the snapshot are included within the origin.

With these assumptions, the permissions checked are against the latest column-level security on the origin (current) table. If the user is allowed to read the current columns, then the user will be able to query a snapshot of those columns.

If the schema of the origin and the snapshot differ for the columns in the query, a request to retrieve a snapshot will fail.

Location considerations

When you choose a location for your taxonomy, consider the following limitations.

Policy tags

Taxonomies are regional resources, like BigQuery datasets and tables. When you create a taxonomy, you specify the region, or location, for the taxonomy.

You can create a taxonomy and apply policy tags to tables in all regions where BigQuery is available. However, to apply policy tags from a taxonomy to a table column, the taxonomy itself and the table must exist in the same regional location.

Although you cannot apply a policy tag to a table column that exists in a different location, you can copy the taxonomy to another location by explicitly replicating it there.

If you want to use the same taxonomy and policy tags across multiple regional locations, learn more about replicating taxonomies in Managing policy tags across locations.

Organizations

You can't use references across organizations. A table and any policy tags that you want to apply to its columns must exist within the same organization.

Limitations

  • If you overwrite to a destination table, any existing policy tags are removed from the table, unless you use the --destination_schema flag to specify a schema with policy tags. The following example shows how to use --destination_schema.

    bq query --destination_table mydataset.mytable2 \
      --use_legacy_sql=false --destination_schema=schema.json \
      'SELECT * FROM mydataset.mytable1'
    
  • A column can have only one policy tag.

  • A table can have at most 1,000 unique policy tags.

  • You can't use legacy SQL if you enabled column-level security. Any legacy SQL queries are rejected if there are any policy tags on the target tables.

Pricing

Column-level security requires the use of both BigQuery and Data Catalog.For pricing information about these products, see the following topics:

Audit logging

When table data with policy tags is read, we save the referenced policy tags in Cloud Logging. However, the policy tag check is not associated with the query that triggered the check.

Through Cloud Logging, auditors can understand who has what kind of access to which categories of sensitive data. For more information, see Auditing policy tags.

For more information about logging in BigQuery, see Introduction to BigQuery monitoring.

For more information about logging in Google Cloud, see Cloud Logging.

What's next