Column-level encryption with Cloud KMS

You can use Cloud Key Management Service (Cloud KMS) to encrypt the keys that in turn encrypt the values within BigQuery tables. You can use the AEAD encryption functions with Cloud KMS keysets or wrapped keysets to provide a second layer of protection at the column level.

Introduction

To provide an extra layer of protection, Cloud KMS encrypts your data encryption key (DEK) with a second key encryption key (KEK). In BigQuery, referencing an encrypted keyset instead of a plaintext keyset helps reduce the risk of key exposure. The KEK is a symmetric encryption keyset that is stored securely in Cloud KMS and managed using Identity and Access Management (IAM) roles and permissions.

BigQuery supports deterministic and non-deterministic encryption functions. With deterministic encryption, if both the pieces of stored data and the additional authenticated data (optional) are identical, then the ciphertext is identical. This allows for support of aggregation and joins based on the encrypted column. With non-deterministic encryption, the stored ciphertext is unique regardless of the encrypted data, which prevents clustering, aggregation, and joins.

At query execution time, you provide the Cloud KMS resource path of the KEK and the ciphertext from the wrapped DEK. BigQuery calls Cloud KMS to unwrap the DEK, and then uses that key to decrypt the data in your query. The non-wrapped version of the DEK is only stored in memory for the duration of the query, and then destroyed.

If you use Cloud KMS in a region where Cloud External Key Manager is supported, you can use Cloud EKM based keys in Cloud KMS.

Use cases

Use cases for encryption with Cloud KMS keys include the following:

  • Externally encrypted data that needs to be stored in BigQuery without storing the keyset in plaintext. Your data can then be exported from the table or decrypted with a SQL query.
  • "Double access control" on encrypted data in BigQuery. A user must be granted permission to both the table and the encryption key to read data in cleartext.
User Permission Matrix
Permission on Table No Permission on Table
Permissions on Key Read and decrypt encrypted data. No access.
No Permissions on Key Read encrypted data. No access.

If a user has permission to access the KMS key and has access to the wrapped keyset, SQL functions can unwrap the keyset and decrypt the ciphertext. Users can also use the Cloud KMS REST API or CLI to unwrap the keyset.
The following query sample uses KMS SQL functions to decrypt non-deterministic ciphertext:

SELECT
  AEAD.DECRYPT_STRING(
    KEYS.KEYSET_CHAIN(@kms_resource_name, @first_level_keyset),
    ciphertext,
    additional_authenticated_data)
FROM
  ciphertext_table
WHERE
  ...

Use-case example

Assume an implementation where zip codes are considered sensitive information. Zip code data can be inserted into the BigQuery table using the AEAD encrypt function, thereby encrypting the Zipcode column. In this example, we use the AEAD.ENCRYPT function with the wrapped keyset management function. The KEYS.KEYSET_CHAIN function encrypts the digital encryption key with the KEK, and the AEAD.ENCRYPT function passes the information to KMS.

The keyset chain for encryption and decryption ensures that the data encryption key (DEK) is encrypted or wrapped with a KEK and passed with that KEK. The wrapped DEK is decrypted or unwrapped within the SQL function, and then used to encrypt or decrypt data.

The AEAD non-deterministic function can decrypt data when it is accessed by using the function in the query that is being run on the table.

image

The AEAD deterministic function can decrypt data when it is accessed by using the function in the query that is being run on the table and supports aggregation and joins using the encrypted data.

image

Non-deterministic function syntax

The supported syntax for using non-deterministic functions includes the following:

AEAD.ENCRYPT(
  KEYS.KEYSET_CHAIN(kms_resource_name, first_level_keyset),
  plaintext,
  additional_authenticated_data)
AEAD.DECRYPT_STRING(
  KEYS.KEYSET_CHAIN(kms_resource_name, first_level_keyset),
  ciphertext,
  additional_authenticated_data)
AEAD.DECRYPT_BYTES(
  KEYS.KEYSET_CHAIN(kms_resource_name, first_level_keyset),
  ciphertext,
  additional_authenticated_data)

See AEAD.DECRYPT_BYTES, AEAD.ENCRYPT, AEAD.DECRYPT_STRING, and KEYS.KEYSET_CHAIN function syntax.

Deterministic function syntax

The supported syntax for using deterministic functions includes the following:

DETERMINISTIC_ENCRYPT(
  KEYS.KEYSET_CHAIN(kms_resource_name, first_level_keyset),
  plaintext,
  additional_data)
DETERMINISTIC_DECRYPT_STRING(
  KEYS.KEYSET_CHAIN(kms_resource_name, first_level_keyset),
  ciphertext,
  additional_data)
DETERMINISTIC_DECRYPT_BYTES(
  KEYS.KEYSET_CHAIN(kms_resource_name, first_level_keyset),
  ciphertext,
  additional_data)

See DETERMINISTIC_DECRYPT_BYTES, DETERMINISTIC_ENCRYPT, DETERMINISTIC_DECRYPT_STRING, and KEYS.KEYSET_CHAIN function syntax.

Roles and permissions

For a list of roles for Cloud KMS, see Cloud KMS permissions and roles.

Limitations

Encryption with Cloud KMS has the following limitations and restrictions:

  • The Cloud KMS keys are restricted to the same region or multi-region as the query. Using global Cloud KMS keys is disallowed for reliability reasons.

  • It is not possible to rotate a wrapped keyset using the KEYS.ROTATE_KEYSET function.

  • The constant parameters in a BigQuery query are visible to users in the diagnostic query plan. This factor can affect the kms_resource_name and first_level_keyset parameters of the KEYSET_CHAIN function. Keys are never exposed in plaintext, and permission to the Cloud KMS key is required to decrypt the wrapped keyset. This approach ensures that keys are not exposed through the diagnostic query plan unless the user has permission to decrypt the keyset.

  • Column-level encryption has the following limitations when used with type-based security classifications:

    • Column-level security: Users can only decrypt or encrypt data on columns that they are allowed to access.

    • Row-level security: Users can only decrypt data on rows that they are allowed to access.

  • Column-level SQL functions have no significant impact on performance when compared to the performance of raw encryption functions where the key data is sent in plaintext.

Before you begin

To work with Cloud KMS keys, keysets, encrypted tables, deterministic, and non-deterministic functions, you need to do the following if you have not already done so:

  1. Create a Google Cloud project.

  2. Create a BigQuery dataset.

  3. Create a Cloud KMS key ring.

  4. Create a Cloud KMS key for an encrypted column with the software or Hardware Security Module (HSM) protection level.

  5. Grant user permissions to work with Cloud KMS keys, encryption, and decryption.

Take note of the following concepts, as they are referenced in the next sections:

  • PROJECT_ID: The name of the Google Cloud project.

  • DATASET_NAME: The name of the BigQuery dataset.

  • LOCATION_ID: The location of the BigQuery dataset.

  • TABLE_NAME: The name of the BigQuery table.

  • KEY_RING_ID: The name of the Cloud KMS key ring.

  • KEY_ID: The name of the Cloud KMS key.

  • KMS_KEY: Cloud KMS key (KEK) in this format:

    'gcp-kms://projects/PROJECT_ID/locations/LOCATION_ID/keyRings/KEY_RING_ID/cryptoKeys/KEY_ID'
    

    Here is an example of a Cloud KMS key:

    'gcp-kms://projects/myProject/locations/us/keyRings/myKeyRing/cryptoKeys/myKeyName'
    
  • KMS_KEY_SHORT: Similar to KMS_KEY but in this format:

    projects/PROJECT_ID/locations/LOCATION_ID/keyRings/KEY_RING_ID/cryptoKeys/KEY_ID
    
  • KEYSET_DECODED: A decoded keyset as a BYTES sequence. The output looks similar to that for a decoded wrapped keyset.

    Although keyset functions return keysets as bytes, the user output is displayed as an encoded string. To convert an encoded keyset to a decoded keyset, see Decode a Cloud KMS keyset

  • KEYSET_ENCODED: An encoded keyset as a STRING. The output looks similar to that for an encoded wrapped keyset.

    To convert an encoded keyset to a decoded keyset, see Decode a Cloud KMS keyset

  • WRAPPED_KEYSET_DECODED: A decoded wrapped keyset as a BYTES sequence. Here is an example of what the output for this looks like:

    b'\x0a$\x00\xa6\xee\x12Y\x8d|l"\xf7\xfa\xc6\xeafM\xdeefy\xe9\x7f\xf2z\xb3M\
    xf6"\xd0\xe0Le\xa8\x8e\x0fR\xed\x12\xb7\x01\x00\xf0\xa80\xbd\xc1\x07Z\\
    \xd0L<\x80A0\x9ae\xfd(9\x1e\xfa\xc8\x93\xc7\xe8\...'
    

    Although wrapped keyset functions return wrapped keysets as bytes, the user output is displayed as an encoded string. To convert an encoded wrapped keyset to a decoded wrapped keyset, see Decode a Cloud KMS keyset

  • WRAPPED_KEYSET_ENCODED: An encoded wrapped keyset as a STRING. Here is an example of what the output for this looks like:

    'CiQApu4SWTozQ7lNwITxpEvGlo5sT2rv1tyuSv3UAMtoTq/lhDwStwEA8KgwvX7CpVVzhWWMkRw
    WZNr3pf8uBIlzHeunCy8ZsQ6CofQYFpiBRBB6k/QqATbiFV+3opnDk/6dBL/S8OO1WoDC+DdD9
    uzEFwqt5D20lTXCkGWFv1...'
    

    To convert an encoded wrapped keyset to a decoded wrapped keyset, see Decode a Cloud KMS keyset

Key management

The following sections contain common tasks you can perform with Cloud KMS keys.

Create a keyset

You can create wrapped keysets or raw keysets. To do this, complete the steps in the following sections.

Create a raw keyset

Run the following query to create a keyset with a key of type DETERMINISTIC_AEAD_AES_SIV_CMAC_256.

SELECT KEYS.NEW_KEYSET('DETERMINISTIC_AEAD_AES_SIV_CMAC_256') AS raw_keyset

Create a wrapped keyset

Run the following query to create a Cloud KMS wrapped keyset with a key of type DETERMINISTIC_AEAD_AES_SIV_CMAC_256.

SELECT KEYS.NEW_WRAPPED_KEYSET(
  KMS_KEY,
  'DETERMINISTIC_AEAD_AES_SIV_CMAC_256')

Decode a keyset

Although SQL functions that return keysets produce the keysets in BYTES format, the user-displayed result is encoded and displayed in STRING format. If you would like to convert this encoded string to a decoded bytes sequence that you can use as literal key encryption functions, use the following query.

Decode a wrapped keyset

Run the following query to decode a Cloud KMS wrapped keyset.

SELECT FORMAT('%T', FROM_BASE64(WRAPPED_KEYSET_ENCODED'))

Decode a raw keyset

Run the following query to decode a raw keyset.

SELECT FORMAT('%T', FROM_BASE64(KEYSET_ENCODED'))

Rewrap a wrapped keyset

Run the following query to rewrap a Cloud KMS wrapped keyset with a new Cloud KMS key. KMS_KEY_CURRENT represents the new KMS_KEY that is used to encrypt the keyset. KMS_KEY_NEW represents the new KMS_KEY that is used to encrypt the keyset.

SELECT KEYS.REWRAP_KEYSET(
  KMS_KEY_CURRENT,
  KMS_KEY_NEW,
  WRAPPED_KEYSET_DECODED)

Rotate a wrapped keyset

Run the following query to rotate a Cloud KMS wrapped keyset with a key of type DETERMINISTIC_AEAD_AES_SIV_CMAC_256.

SELECT KEYS.ROTATE_WRAPPED_KEYSET(
  KMS_KEY,
  WRAPPED_KEYSET_DECODED,
  'DETERMINISTIC_AEAD_AES_SIV_CMAC_256')

Generate a raw keyset from a wrapped keyset

Some encryption functions require a raw keyset. To decrypt a Cloud KMS wrapped keyset to produce a raw keyset, complete the following steps.

  1. Create a wrapped keyset.

  2. In the bq command-line tool enter the following commands to save a wrapped keyset in a file called keyset_to_unwrap, decrypt they wrapped keyset, and produce the output in the KEYSET_DECODED format:

    echo WRAPPED_KEYSET_ENCODED | base64 -d > /tmp/decoded_wrapped_key
    
    gcloud kms decrypt \
    --ciphertext-file=/tmp/decoded_wrapped_key \
    --key=KMS_KEY_SHORT \
    --plaintext-file=/tmp/keyset_to_unwrap.dec \
    --project=PROJECT_ID
    
    od -An --format=o1 /tmp/keyset_to_unwrap.dec | tr ' ' '\'
    

Generate a wrapped keyset from a raw keyset

Some encryption functions require a Cloud KMS wrapped keyset. To encrypt a raw keyset to produce a wrapped keyset, complete the following steps.

  1. Create a raw keyset.

  2. In the bq command-line tool enter the following commands to save a raw keyset in a file called keyset_to_wrap, encrypt they raw keyset, and produce the output in the WRAPPED_KEYSET_DECODED format:

    echo KEYSET_ENCODED | base64 -d > /tmp/decoded_key
    
    gcloud kms encrypt \
    --plaintext-file=/tmp/decoded_key \
    --key=KMS_KEY_SHORT \
    --ciphertext-file=/tmp/keyset_to_wrap.dec \
    --project=PROJECT_ID
    
    od -An --format=o1 /tmp/keyset_to_wrap.dec | tr ' ' '\'
    

Get the number of keys in a keyset

Run the following query to get the number of keys in a raw keyset.

  1. If you are working with a wrapped keyset, first generate a raw keyset.

  2. Run this query with the raw keyset:

    SELECT KEYS.KEYSET_LENGTH(KEYSET_DECODED) as key_count;
    

Get a JSON representation of a keyset

Run the following query to view a JSON representation of a raw keyset.

  1. If you are working with a wrapped keyset, first generate a raw keyset.

  2. Run this query with the raw keyset:

    SELECT KEYS.KEYSET_TO_JSON(KEYSET_DECODED);
    

Encryption and decryption

You can use raw keysets or wrapped keysets to encrypt a column in a table. You can also choose to use deterministic or non-deterministic encryption on your columns. The examples in this section use wrapped keysets but you could replace the wrapped keysets with raw keysets.

Deterministically encrypt a column with a wrapped keyset

Run the following query to create a table and store a Cloud KMS wrapped keyset with deterministic encryption in a column called encrypted_content.

  1. Create a wrapped keyset.

  2. Encrypt a column with the wrapped keyset.

    CREATE OR REPLACE TABLE DATASET_NAME.TABLE_NAME AS
      SELECT DETERMINISTIC_ENCRYPT(
        KEYS.KEYSET_CHAIN(KMS_KEY, WRAPPED_KEYSET_DECODED),
        'plaintext',
        '') AS encrypted_content
    

Deterministically decrypt a column with a wrapped keyset

Run the following query to deterministically decrypt a column that contains encrypted content, using a Cloud KMS wrapped keyset. This query assumes you are referencing a table with a column called encrypted_content.

SELECT DETERMINISTIC_DECRYPT_STRING(
  KEYS.KEYSET_CHAIN(KMS_KEY, WRAPPED_KEYSET_DECODED),
  encrypted_content,
  '')
FROM DATASET_NAME.TABLE_NAME

Non-deterministically encrypt a column with a wrapped keyset

See Deterministically encrypt a column with a wrapped keyset, but replace DETERMINISTIC_ENCRYPT with AEAD.ENCRYPT. Make sure that your keyset is of type AEAD_AES_GCM_256.

Non-deterministically decrypt a column with a wrapped keyset

See Deterministically decrypt a column with a wrapped keyset, but replace DETERMINISTIC_DECRYPT_STRING with AEAD.DECRYPT_STRING. Make sure that your keyset is of type AEAD_AES_GCM_256.

What's next

  • Learn more about Cloud KMS. This topic includes conceptual information about column-level encryption for Google Cloud.
  • Learn more about AEAD encryption for BigQuery. This topic includes conceptual information about column-level encryption specifically for BigQuery.
  • Learn more about AEAD encryption functions for BigQuery. This topic contains all of the SQL functions that you can use for column-level encryption in BigQuery.