Stay organized with collections Save and categorize content based on your preferences.

SQL column-level encryption with a Cloud KMS key

You can use Cloud Key Management Service (Cloud KMS) to encrypt the keys that in turn encrypt the values within BigQuery tables. You can use the AEAD encryption functions to provide a second layer of protection at the column level.

Overview

To provide an extra layer of protection, Cloud KMS encrypts your data encryption key (DEK) with a second key encryption key (KEK). In BigQuery, referencing an encrypted keyset instead of a plaintext keyset helps reduce the risk of key exposure. The KEK is a symmetric encryption keyset that is stored securely in Cloud KMS and managed using Identity and Access Management (IAM) roles and permissions.

BigQuery supports deterministic and non-deterministic encryption functions. With deterministic encryption, if both the pieces of stored data and the additional authenticated data (optional) are identical, then the ciphertext is identical. This allows for support of aggregation and joins based on the encrypted column. With non-deterministic encryption, the stored ciphertext is unique regardless of the encrypted data, which prevents clustering, aggregation, and joins.

At query execution time, you provide the Cloud KMS resource path of the KEK and the ciphertext from the wrapped DEK. BigQuery calls Cloud KMS to unwrap the DEK, and then uses that key to decrypt the data in your query. The unwrapped version of the DEK is only stored in memory for the duration of the query, and then destroyed.

Use cases

Use cases for encryption with Cloud KMS keys include the following:

  • Externally encrypted data that needs to be stored in BigQuery without storing the keyset in plaintext. Your data can then be exported from the table or decrypted with a SQL query.
  • "Double access control" on encrypted data in BigQuery. A user must be granted permission to both the table and the encryption key to read data in cleartext.
User Permission Matrix
Permission on Table No Permission on Table
Permissions on Key Read and decrypt encrypted data. No access.
No Permissions on Key Read encrypted data. No access.

If a user has permission to access the KMS key and has access to the wrapped keyset, SQL functions can unwrap the keyset and decrypt the ciphertext. Users can also use the Cloud KMS REST API or CLI to unwrap the keyset.
The following query sample uses KMS SQL functions to decrypt non-deterministic ciphertext:

SELECT
  AEAD.DECRYPT_STRING(KEYS.KEYSET_CHAIN(
    @kms_resource_name,
    @first_level_keyset),
    ciphertext,
    additional_authenticated_data)
FROM
  ciphertext_table
WHERE
  ...

Use-case example

Assume an implementation where zip codes are considered sensitive information. Zip code data can be inserted into the BigQuery table using the AEAD encrypt function, thereby encrypting the Zipcode column. In this example, we use the AEAD.ENCRYPT function with the wrapped keyset management function. The KEYS.KEYSET_CHAIN function encrypts the digital encryption key with the KEK, and the AEAD.ENCRYPT function passes the information to KMS.

The keyset chain for encryption and decryption ensures that the data encryption key (DEK) is encrypted or wrapped with a KEK and passed with that KEK. The wrapped DEK is decrypted or unwrapped within the SQL function, and then used to encrypt or decrypt data.

The AEAD non-deterministic function can decrypt data when it is accessed by using the function in the query that is being run on the table.

image

The AEAD deterministic function can decrypt data when it is accessed by using the function in the query that is being run on the table and supports aggregation and joins using the encrypted data.

image

Non-deterministic function syntax

The supported syntax for using non-deterministic functions includes the following:

AEAD.ENCRYPT(KEYS.KEYSET_CHAIN(kms_resource_name, first_level_keyset),
    plaintext, additional_authenticated_data)
AEAD.DECRYPT_STRING(KEYS.KEYSET_CHAIN(kms_resource_name,
    first_level_keyset), ciphertext, additional_authenticated_data)
AEAD.DECRYPT_BYTES(KEYS.KEYSET_CHAIN(kms_resource_name,
    first_level_keyset), ciphertext, additional_authenticated_data)

See AEAD.DECRYPT_BYTES, AEAD.ENCRYPT, AEAD.DECRYPT_STRING, and KEYS.KEYSET_CHAIN function syntax.

Deterministic function syntax

The supported syntax for using deterministic functions includes the following:

DETERMINISTIC_ENCRYPT(
  KEYS.KEYSET_CHAIN(kms_resource_name, first_level_keyset),
  plaintext,
  additional_data)
DETERMINISTIC_DECRYPT_STRING(
  KEYS.KEYSET_CHAIN(kms_resource_name, first_level_keyset),
  ciphertext,
  additional_data)
DETERMINISTIC_DECRYPT_BYTES(
  KEYS.KEYSET_CHAIN(kms_resource_name, first_level_keyset),
  ciphertext,
  additional_data)

See DETERMINISTIC_DECRYPT_BYTES, DETERMINISTIC_ENCRYPT, DETERMINISTIC_DECRYPT_STRING, and KEYS.KEYSET_CHAIN function syntax.

Roles and permissions

Grant IAM roles that give users the necessary permissions to perform each task in this document.

Encryption with Cloud KMS keys requires the following roles and permissions:

  • Administrators must have the roles/cloudkms.cryptoKeyEncrypter role to create keyset chains.
  • Users must have the roles/cloudkms.cryptoKeyDecrypterViaDelegation role to use SQL encryption functions. For example:

    gcloud projects
    add-iam-policy-binding ${PROJECT} --member=user:someone@foo.com --role=roles/cloudkms.cryptoKeyDecrypterViaDelegation
    

Limitations

Encryption with Cloud KMS has the following limitations and restrictions:

  • The Cloud KMS keys are restricted to the same region or multi-region as the query. Using global Cloud KMS keys is disallowed for reliability reasons.

  • It is not possible to rotate a wrapped keyset using the KEYS.ROTATE_KEYSET function.

  • The constant parameters in a BigQuery query are visible to users in the diagnostic query plan. This factor can affect the kms_resource_name and first_level_keyset parameters of the KEYSET_CHAIN function. Keys are never exposed in plaintext, and permission to the Cloud KMS key is required to decrypt the wrapped keyset. This approach ensures that keys are not exposed through the diagnostic query plan unless the user has permission to decrypt the keyset.

  • Column-level encryption has the following limitations when used with type-based security classifications:

    • Column-level security: Users can only decrypt or encrypt data on columns that they are allowed to access.

    • Row-level security: Users can only decrypt data on rows that they are allowed to access.

  • Column-level SQL functions have no significant impact on performance when compared to the performance of raw encryption functions where the key data is sent in plaintext.

Encrypt a column in an encrypted table

The following steps create an encrypted table and then encrypt a column in that table using SQL functions.

The first-level keyset is the data encryption keyset wrapped by Cloud KMS. It is enforced as a constant and is in the BYTES format.

  1. Create a dataset.

  2. In the dataset, create an encrypted table.

    CREATE TABLE IF NOT EXISTS DATASET_NAME.TABLE_NAME(
     plaintext STRING,
     encrypted_data BYTES);
    
  3. Create a key ring.

  4. Create Cloud KMS keys for an encrypted column with the software or Hardware Security Module (HSM) protection level.

  5. Grant user permissions for the Cloud KMS keys. Administrators must have the roles/cloudkms.cryptoKeyEncrypter role in order to create keyset chains in the next step. Users must have the roles/cloudkms.cryptoKeyDecrypterViaDelegation role to use SQL encryption functions.

    gcloud projects add-iam-policy-binding  PROJECT_ID \
        --member=user:user_name@example.com \
        --role=roles/cloudkms.cryptoKeyDecrypterViaDelegation
    
  6. Create a raw keyset. This generates a keyset of 256 bits.

    bq --project_id=PROJECT_ID query --use_legacy_sql=false \
     "SELECT KEYS.NEW_KEYSET('AEAD_AES_GCM_256') AS raw_keyset"
    
  7. Create and format a wrapped keyset (that is, encrypt a raw keyset with a KMS key) with an encryption utility. The following sample runs on a terminal running on a macOS or Linux computer. Windows users should generate a keyset using a third-party utility.

    echo "CPPpiNMFEmQKWAowdHlwZS5nb29…" |base64 --decode > /tmp/decoded_key
    gcloud kms encrypt --plaintext-file=/tmp/decoded_key \
       --key=projects/PROJECT_ID/locations/LOCATION_NAME/keyRings/KEY_RING_ID/cryptoKeys/KEY_ID \
       --ciphertext-file=/tmp/bankaccounts_wrapped
    od -An --format=o1 /tmp/bankaccounts_wrapped | tr -d '\n'|tr ' ' '\'
    
  8. Encrypt a column with a given keyset chain using SQL encrypt functions.

    DECLARE KMS_RESOURCE_NAME STRING;
    DECLARE FIRST_LEVEL_KEYSET BYTES;
    SET KMS_RESOURCE_NAME = "gcp-kms://projects/PROJECT_ID/locations/LOCATION_NAME/keyRings/KEY_RING_ID/cryptoKeys/KEY_ID";
    # Set the first level keyset equal to the output from the previous step
    SET FIRST_LEVEL_KEYSET = b"\012\044\000\031\261\001\270\114......";
    INSERT INTO DATASET_NAME.TABLE_NAME
    SELECT "example_plaintext",
     AEAD.ENCRYPT(
       KEYS.KEYSET_CHAIN(KMS_RESOURCE_NAME, FIRST_LEVEL_KEYSET),
       "example_plaintext",
       "additional_data");
    

Decrypt a column in an encrypted table

The following steps decrypt columns in an encrypted table by using SQL functions.

The first-level keyset is the data encryption keyset wrapped by the Cloud KMS. It is enforced as a constant and is in the BYTES format.

  1. Encrypt a column in an encrypted table.

  2. Now that the column is encrypted, decrypt the column with the same keyset chain using SQL decrypt functions.

    DECLARE KMS_RESOURCE_NAME STRING;
    DECLARE FIRST_LEVEL_KEYSET BYTES;
    SET KMS_RESOURCE_NAME = "gcp-kms://projects/PROJECT_ID/locations/LOCATION_NAME/keyRings/KEY_RING_ID/cryptoKeys/KEY_ID";
    SET FIRST_LEVEL_KEYSET = b"\012\044\000\031\261\001\270\114......";
    SELECT
     AEAD.DECRYPT_STRING(
       KEYS.KEYSET_CHAIN(KMS_RESOURCE_NAME, FIRST_LEVEL_KEYSET),
       encrypted_data,
       "additional_data"),
     CAST(AEAD.DECRYPT_BYTES(
       KEYS.KEYSET_CHAIN(KMS_RESOURCE_NAME, FIRST_LEVEL_KEYSET),
       encrypted_data,
       CAST("additional_data" AS BYTES)) AS STRING)
    FROM DATASET_NAME.TABLE_NAME;