Introduction to BigQuery Column-level security

BigQuery provides fine-grained access to sensitive columns using policy tags, or type-based classification, of data. Using BigQuery Column-level security, you can create policies that check, at query time, whether a user has proper access. For example, a policy can enforce access checks such as:

  • You must be in group:high-access to see the columns containing TYPE_SSN.

Column-level security workflow

Workflow

To restrict data access at the column level:

  1. Define a taxonomy and data policy tags. Use Data Catalog to create and manage a taxonomy and policy tags for your data. For guidelines, see Best practices for policy tags.

  2. Assign policy tags to your BigQuery columns. In BigQuery, use schema annotations to assign a policy tag to each column where you want to restrict access.

  3. Manage access on the policy tags. Use Identity and Access Management (IAM) policies to restrict access to each policy tag. The policy is in effect for each column that belongs to the policy tag.

When a user tries to access column data at query time, BigQuery checks the column policy tag and its policy to see if the user is authorized to access the data.

Example use case

Consider an organization that needs to classify sensitive data into three categories: High, Medium, and Low.

Policy tags

To set up column level security:

  1. A data governance admin defines the Medium and High policy tags in Data Catalog.

  2. For the High access policy tag, the admin grants access to the high-tier-access group.

  3. The admin assigns columns to the High policy tags. For example, if the ssn column is in the High policy tag, the schema for ssn contains a policyTags field. The following shows how a policy tag appears in the Current schema page.

    Policy tag UI

    For details on using the console to set a policy tag, see Set a policy tag on a column.

    Alternatively, you can set the policy tag using the bq update command. The names field of policyTags includes the ID of the High policy tag, projects/project- id/locations/location/taxonomies/taxonomy-id/policyTags/policytag-id:

    [
     ...
     {
       "name": "ssn",
       "type": "STRING",
       "mode": "REQUIRED",
       "policyTags": {
         "names": ["projects/project-id/locations/location/taxonomies/taxonomy-id/policyTags/policytag-id",]
       }
     },
     ...
    ]
    

    For details on using the bq update command to set a policy tag, see Set a policy tag on a column.

  4. The admin performs similar steps for the Medium policy tag.

With this fine-grained access, you can manage access to many columns by controlling only a small number of data classification policy tags.

For details about these steps, see Restricting access with BigQuery column-level security.

Roles used with column-level security

The following Data Catalog roles are used for BigQuery column-level security.

Role Title Applies to Description
roles/
datacatalog.categoryAdmin
Policy Tag Admin Project Has permission to:
  • create, read, update, and delete taxonomies
  • set IAM policies on policy tags
This role does not have access to read column content.

Typically a user with this role is in charge of setting data governance policies for the organization.
roles/
datacatalog.categoryFineGrainedReader
Fine-Grained Reader Policy tag Has permission to access BigQuery columns tagged with the corresponding policy tag.

Typically a user with this role is querying the data.

Impact of writes

To read data from a column that is protected by Column-level security, the user is always required to have read permission, via the fine-grained read access, on the policy tags for the column.

This applies to:

  • Tables, including wildcard tables
  • Views
  • Copying tables

To write data to a row for a column that is protected by Column-level security, the user requirement depends on the type of write.

If the write operation is an insert, fine-grained read access is not required. However, the user won't have access to read the data that was inserted, unless the user has fine-grained read access.

If the write operation is an update, delete, or merge, the user can't perform the operation unless the user has fine-grained read access.

A user can't load data from local files or from Cloud Storage unless the user has fine-grained read access. However, a user can load data from streaming, because streaming loads do not check policy tags. The user won't have access to read the data that was loaded from a stream, unless the user has fine-grained read access.

For more information, see Impact on writes with BigQuery Column-level security.

Querying tables

If a user has dataset access and has the Data Catalog Fine-Grained Reader role, the column data is available to the user. The user runs a query as normal.

If a user has dataset access but does not have the Data Catalog Fine-Grained Reader role, the column data is not available to the user. If such a user runs SELECT *, the user will receive an error which lists the column(s) that the user cannot access. To resolve the error, you can either:

  • Modify the query to exclude the column(s) that the user cannot access. For example, if the user does not have access to the ssn column, but does have access to the remaining columns, the user can run the following query:

    SELECT * EXCEPT (ssn) FROM ...
    

    In the example above, the EXCEPT clause excludes the ssn column.

  • Ask a Data Catalog Administrator to add the user as a Data Catalog Fine-Grained Reader to the relevant data class. The error message provides the full name of the policy tag for which the user would need access.

Querying views

The impact of column-level security on views is independent of whether or not the view is an authorized view. In both cases, column-level security is enforced transparently.

If the view is not an authorized view:

If the user has column-level access to the columns in the view's underlying tables, then the user can query the columns in the view. Otherwise, the user cannot query the columns in the view.

If the view is an authorized view:

The access is controlled by only the column-level security on the columns in the view's underlying tables. Table-level and dataset-level IAM policies, if any, are not used to check access. If the user has access to the policy tags used in the authorized view's underlying tables, the user can query the columns in the authorized view.

The following diagram shows how access to a view is evaluated.

Accessing views

Impact of snapshots

BigQuery allows users to query a table in an earlier state. This capability allows you to query the rows from a previous snapshot of the table. It also allows you to roll the table back to the date the snapshot was taken.

In legacy SQL, you query a snapshot by using snapshot decorators on the table name. In standard SQL, you query a snapshot by using the FOR SYSTEM_TIME AS OF clause on the table.

The time limit to query a snapshot is the most recent of:

  • the last 7 days
  • the table creation time

You can't query a previous table if the table was deleted and recreated with the same table name.

To see how snapshots work with column-level security, assume you have the following in place:

  • you have an origin point, which is an existing table at the current time.

  • you have the snapshot of the same table.

  • the schema of the snapshot is identical to, or a subset of, the schema of the origin. That is, all columns of the snapshot are included within the origin.

With these assumptions, the permissions checked are against the latest column-level security on the origin (current) table. If the user is allowed to read the current columns, then the user will be able to query a snapshot of those columns.

If the schema of the origin and the snapshot differ for the columns in the query, a request to retrieve a snapshot will fail.

Limitations

  • If you overwrite to a destination table, any existing policy tags are removed from the table, unless you use the --destination_schema flag to specify a schema with policy tags. The following example shows how to use --destination_schema.

    bq query --destination_table mydataset.mytable2 \
      --use_legacy_sql=false --destination_schema=schema.json \
      'SELECT * FROM mydataset.mytable1'
    
  • A column can have only one policy tag.

  • A table can have at most 1,000 unique policy tags.

  • You can't use legacy SQL if you enabled column-level security. Any legacy SQL queries are rejected if there are any policy tags on the target tables.

  • You can't use references across organizations. That is, a table and any policy tags used for the table must exist within the same organization.

What's next