Jump to Content
Data Analytics

How to deploy Tink for BigQuery encryption on-prem and in the cloud

January 18, 2023
Daryus Medora

Strategic Cloud Engineer, Data & Analytics

Oscar Pulido

Strategic Cloud Engineer, Data & Analytics

Try Google Cloud

Start building on Google Cloud with $300 in free credits and 20+ always free products.

Free trial

Data security is a key focus for organizations moving their data warehouses from on-premises to cloud-first systems, such as BigQuery. In addition to storage-level encryption, whether using Google-managed or customer-managed keys, BigQuery also provides column-level encryption. Using BigQuery's SQL AEAD functions, organizations can enforce a more granular level of encryption to help protect sensitive customer data, such as government identity or credit card numbers, and help comply with security requirements.

While BigQuery provides column-level encryption in the cloud, many organizations operate in hybrid-cloud environments. To prevent a scenario where data needs to be decrypted and re-encrypted each time it moves between locations, Google Cloud offers a consistent and interoperable encryption mechanism. This enables deterministically-encrypted data (which maintains referential integrity) to be immediately joined with on-prem tables for anonymized analytics.

To achieve a BigQuery-compatible encryption on-prem, customers can use Tink, a Google-developed open-source cryptography library. BigQuery uses Tink to implement its SQL AEAD functions. We can use the Tink library directly to encrypt data on-prem in a way that can later be decrypted using BigQuery SQL in the cloud, and decrypt BigQuery's column level-encrypted data outside of BigQuery. 

For our customers who want to use Tink with BigQuery, we have put together a few helpful Python utilities and samples in the BigQuery Tink Toolkit GitHub repo. Let's first walk through an example of how to use Tink directly to encrypt or decrypt on-prem data using the same keyset used for BigQuery, followed by how the BigQuery Tink Toolkit can help simplify working with Tink. 

To start, we need to retrieve the Tink keyset. We'll assume that KMS-wrapped keysets are being used. These keysets need to be stored in BigQuery to use with BigQuery SQL.  If needed, they can also be replicated to a secondary store on-prem.

lang-py
Loading...

Now that we have the encrypted keyset, we need to unwrap it to retrieve the usable Tink keyset. If Cloud KMS is not accessible from on-prem, the unwrapped keyset will need to be maintained in a secure keystore on-prem.

lang-py
Loading...

We can now use the keyset to generate a Tink primitive. This can be used to encrypt or decrypt data with the associated keyset. Note that different primitives should be used depending on whether the keyset is for a deterministic or nondeterministic key.

lang-py
Loading...

Once we have our cipher, we can use it to encrypt or decrypt data as needed.

lang-py
Loading...

We have provided the CipherManager class to help simplify this process, which handles four actions:

  1. Retrieving the required keysets from a BigQuery table

  2. Unwrapping those keysets

  3. Creating a Tink cipher for each column

  4. Providing a consistent interface to call encrypt and decrypt. 

We have also included a sample Spark job that shows how to use CipherManager to encrypt or decrypt columns for a given table. We hope these come in handy - happy Tinkering.

Posted in