Jump to Content
Security & Identity

Announcing new BigQuery capabilities to help secure sensitive data

June 21, 2022
Deepti Mandava

Product Manager, Google Cloud

In order to better serve their customers and users, digital applications and platforms continue to store and use sensitive data such as Personally Identifiable Information (PII), genetic and biometric information, and credit card information. Many organizations that provide data for analytics use cases face evolving regulatory and privacy mandates, ongoing risks from data breaches and data leakage, and a growing need to control data access. 

Data access control and masking of sensitive information is even more complex for large enterprises that are building massive data ecosystems. Copies of datasets often are created to manage access to different groups. Sometimes, copies of data are obfuscated while other copies aren’t. This creates an inconsistent approach to protecting data, which can be expensive to manage. To fully address these concerns, sensitive data needs to be protected with the right defense mechanism at the base table itself so that data can be kept secure throughout its entire lifecycle.

Today, we’re excited to introduce two new capabilities in BigQuery that add a second layer of defense on top of access controls to help secure and manage sensitive data. 

1. General availability of BigQuery column-level encryption functions

BigQuery column-level encryption SQL functions enable you to encrypt and decrypt data at the column level in BigQuery. These functions unlock use cases where data is natively encrypted in BigQuery and must be decrypted when accessed. It also supports use cases where data is externally encrypted, stored in BigQuery, and must then be decrypted when accessed. SQL functions support industry standard encryption algorithms AES-GCM (non-deterministic) and AES-SIV (deterministic).  Functions supporting AES-SIV allow for grouping, aggregation, and joins on encrypted data. 

In addition to these SQL functions, we also integrated BigQuery with Cloud Key Management Service (Cloud KMS). This gives you additional control, and allows you to manage your encryption keys in KMS and enables on-access secure key retrieval as well as detailed logging. An additional layer of envelope encryption enables generations of wrapped key sets to decrypt data. Only users with permission to access the Cloud KMS key and the wrapped keyset can unwrap the keyset and decrypt the ciphertext. 

“Enabling dynamic field level encryption is paramount for our data fabric platform to manage highly secure, regulated assets with rigorous security policies complying with several regulations including FedRAMP, PCI, GDPR, CCPA and more. BigQuery column-level encryption capability provides us with a secure path for decrypting externally encrypted data in BigQuery unblocking analytical use cases across more than 800+ analysts,” said Kumar Menon, CTO of Equifax.

Users can also leverage available SQL functions to support both non-deterministic encryption and deterministic encryption to enable joins and grouping of encrypted data columns.

The following query sample uses non-deterministic SQL functions to decrypt ciphertext.

Loading...

The following query sample uses deterministic SQL functions to decrypt ciphertext.

Loading...

2. Preview of dynamic data masking in BigQuery

Extending BigQuery’s column-level security, dynamic data masking allows you to obfuscate sensitive data and control user access while mitigating the risk of data leakage. This capability selectively masks column level data at query time based on the defined masking rules, user roles and privileges. Masking eliminates the need to duplicate data and allows you to define different masking rules on a single copy of data to desensitize data, simplify user access to sensitive data, and protect against compliance, privacy regulations, or confidentiality issues. 

Dynamic data masking allows for different transformations of underlying sensitive data to obfuscate data at query time. Masking rules can be defined on the policy tag in the taxonomy to grant varying levels of access based on the role and function of the user and the type of sensitive data. Masking adds to the existing access controls to allow customers a wide gamut of options around controlling access. An administrator can grant a user full access, no access or partial access with a particular masked value based on data sharing use case.

https://storage.googleapis.com/gweb-cloudblog-publish/images/BigQuery_XhjE3xI.max-2000x2000.jpg

For the preview of data masking, three different masking policies are being supported: 

  1. ALWAYS_NULL. Nullifies the content regardless of column data types.

  2. SHA256. Applies SHA256 to STRING or BYTES data types. Note that the same restrictions apply to the SHA256 function.

  3. Default_VALUE. Returns the default value based on the data type.

A user must first have all of the permissions necessary to run a query job against a BigQuery table to query it. In addition, for users to view the masked data of a column tagged with a policy tag they need to have a MaskedReader role.

When to use dynamic data masking vs encryption functions?

Common scenarios for using data masking or column level encryption are: 

  • protect against unauthorized data leakage 

  • access control management 

  • compliance against data privacy laws for PII, PHI, PCI data

  • create safe test datasets

Specifically, masking can be used for real-time transactions whereas encryption provides additional security for data at rest or in motion where real-time usability is not required.  

Any masking policies or encryption applied on the base tables are carried over to authorized views and materialized views, and masking or encryption is compatible with other security features such as row-level security. 

These newly added BQ security features along with automatic DLP can help to scan your data across your entire organization, give you visibility into where sensitive data is stored, and enable you to manage access and usability of data for different use cases across your user base. We’re always working to enhance BigQuery’s (and Google Cloud’s) data governance capabilities, to enable end to end management of your sensitive data. With the new releases, we are adding deeper protections for your data in BigQuery.

Posted in