Data Analytics

Introducing BigQuery column-level security: new fine-grained access controls

We’re announcing a key capability to help organizations govern their data in Google Cloud. Our new BigQuery column-level security controls are an important step toward placing policies on data that differentiate between classes. This allows for compliance with regulations that mandate such distinction, such as GDPR or CCPA. 

BigQuery already lets organizations provide controls to data containers, satisfying the principle of “least privilege.” But there is a growing need to separate access to certain classes of data—for example, PHI (patient health information) and PII (personally identifiable information)—so that even if you have access to a table, you are still barred from seeing any sensitive data in that table. 

This is where column-level security can help. With column-level security, you can define the data classes used by your organization. BigQuery column-level security is available as a new policy tag applied to columns in the BigQuery schema pane, and managed in a hierarchical taxonomy in Data Catalog. 

The taxonomy is usually composed of two levels: 

  • A root node, where a data class is defined, and 
  • Leaf nodes, where the policy tag is descriptive of the data type (for example, phone number or mailing address).

The aforementioned abstraction layer lets you manage policies at the root nodes, where the recommended practice is to use those nodes as data classes; and manage/tag individual columns via leaf nodes, where the policy tag is actually the meaning of the content of the column. 

Organizations and teams working in highly regulated industries need to be especially diligent with sensitive data. “BigQuery’s column-level security allows us to simplify sharing data and queries while giving us comfort that highly secure data is only available to those who truly need it,” says Ben Campbell, data architect at Prosper Marketplace.

Here’s how column-level security looks in BigQuery:

bq example.jpg

In the above example, the organization has three broad categories of data sensitivity: restricted, sensitive, and unrestricted. For this specific organization, both PHI and PII are highly restricted, while financial data is sensitive. You will notice that individual info types, such as the ones detectable by Google Cloud Data Loss Prevention (DLP), are in the leaf nodes. This allows you to move a leaf node (or an intermediate node) from a restricted data class to a less sensitive one. If you manage policies on the root nodes, you will not need to re-tag columns to change the policy applied to them. This allows you to reflect changes in regulations or compliance requirements by moving leaf nodes. For example, you can take "Zipcode" from "Unrestricted Data," move it to "PII," and immediately restrict access to such data.

Learn more about BigQuery column-level security

You’ll be able to see the relevant policies that are applied to BigQuery’s columns within the BigQuery schema pane. If attempting to query a column you do not have access to (which is clearly indicated by the banner notice as well as the grayed-out nature of the field), the access will be securely denied. Access control applies to every method used to access BigQuery data (API, Views, etc.). 

Here’s what that looks like:

drug_and_treatments.jpg
Schema of BigQuery table. All but the first two columns have policy tags imposing column-level access restrictions. This user does not have access to them.

We’re always working to enhance BigQuery’s (and Google Cloud’s) data governance capabilities to provide more controls around access, on-access data transformation, and data retention, and provide a holistic view of your data governance across Google Cloud’s various storage systems. You can try the capability out now.