SQL column-level encryption with Cloud KMS keys

You can use Cloud Key Management Service (Cloud KMS) to encrypt the keys that in turn encrypt the values within BigQuery tables. You can use the AEAD encryption functions to provide a second layer of protection at the column level.

This additional layer of protection encrypts your data encryption key (DEK) with a second key encryption key (KEK). Referencing an encrypted keyset in BigQuery instead of a plaintext keyset reduces the risk of key exposure. The KEK is a symmetric encryption keyset that is stored securely in Cloud KMS and managed using Identity and Access Management (IAM) roles and permissions.

At query execution time, you provide the Cloud KMS resource path of the KEK and the ciphertext from the wrapped DEK. BigQuery calls Cloud KMS to unwrap the DEK, and then uses that key to decrypt the data in your query. The unwrapped version of the DEK is only stored in memory for the duration of the query, and then destroyed.

Use cases

Use cases for encryption with Cloud KMS keys include the following:

  • Externally encrypted data that needs to be stored in BigQuery without storing the keyset in plaintext. Your data can then be exported from the table or decrypted with a SQL query.
  • "Double access control" on encrypted data in BigQuery. A user must be granted permission to both the table and the encryption key to read data in clear text.
User Permission Matrix
Permission on Table No Permission on Table
Permissions on Key Read and decrypt encrypted data. No access.
No Permissions on Key Read encrypted data. No access.

If a user has permission to access the KMS key and has access to the wrapped keyset, SQL functions can unwrap the keyset and decrypt the ciphertext. Users can also use the Cloud KMS REST API or CLI to unwrap the keyset.
The following query sample uses KMS SQL functions to decrypt cipher text

SELECT
  AEAD.DECRYPT_STRING(KEYS.KEYSET_CHAIN(
    @kms_resource_name,
    @first_level_keyset),
    ciphertext,
    additional_data)
FROM
  ciphertext_table
WHERE
  ...

Use-case example

Assume an implementation where zip codes are considered sensitive information. Zip code data can be inserted into the BigQuery table using the AEAD encrypt function, thereby encrypting the Zipcode column. In this example, we use the AEAD.ENCRYPT function with the wrapped keyset management function. The KEYS.KEYSET_CHAIN function encrypts the digital encryption key with the KEK, and the AEAD.ENCRYPT function passes the information to KMS.

The keyset chain for encryption and decryption ensures that the data encryption key (DEK) is encrypted or wrapped with a KEK and passed with that KEK. The wrapped DEK is decrypted or unwrapped within the SQL function, and then used to encrypt or decrypt data.

The AEAD decrypt function can decrypt data when it is accessed by using the function in the query that is being run on the table.

image

Use column-level encryption

A keyset chain for encryption and decryption ensures that the data encryption key (DEK) is encrypted or wrapped with a KEK and passed with that KEK. The wrapped DEK is decrypted or unwrapped within the SQL function, and then used to encrypt or decrypt data.

Before you begin

Grant Identity and Access Management (IAM) roles that give users the necessary permissions to perform each task in this document.

Required permissions

Encryption with Cloud KMS keys requires the following roles and permissions:

  • Administrators must have the cloudkms.cryptoKeyEncrypter role to create keyset chains.
  • Users must have the cloudkms.cryptoKeyDecrypterViaDelegation role to use SQL encryption functions. For example:

    gcloud projects
    add-iam-policy-binding ${PROJECT} --member=user:someone@foo.com --role=roles/cloudkms.cryptoKeyDecrypterViaDelegation
    

Function syntax

The syntax of the available functions include the following:

  • AEAD.ENCRYPT(KEYS.KEYSET_CHAIN(@kms_resource_name, @first_level_keyset), plaintext, additional_data) - See AEAD.ENCRYPT_STRING function syntax
  • AEAD.DECRYPT_STRING(KEYS.KEYSET_CHAIN(@kms_resource_name, @first_level_keyset), ciphertext, additional_data) - See AEAD.DECRYPT_STRING function syntax
  • AEAD.DECRYPT_BYTES(KEYS.KEYSET_CHAIN(@kms_resource_name, @first_level_keyset), ciphertext, additional_data) - See AEAD.DECRYPT_BYTES function syntax

Where:

Name Description
kms_resource_name The function accepts kms_resource_name as a string parameter in the Tink format:
gcp-kms://projects/{ID}/locations/{ID}/...
first_level_keyset The first-level key set is the data encryption keyset wrapped by the KMS key. It is enforced as a constant and is in BYTES format.

Example

The following steps create an encrypted table and encrypt and decrypt columns using SQL functions. This example assumes that the table is encrypted.

  1. Create an encrypted table in a dataset.

    CREATE TABLE IF NOT EXISTS DATASET_NAME.TABLE_NAME(
    plaintext STRING,
    encrypted_data BYTES);
    

  2. Create Cloud KMS keys for an encrypted column with software protection level or hardware security model (HSM) protection level.

    projects/PROJECT_ID/locations/LOCATION_NAME/keyRings/KEY_RING_ID/cryptoKeys/KEY_ID
    
  3. Grant user permissions for the Cloud KMS keys. Administrators must have the cloudkms.cryptoKeyEncrypter role in order to create keyset chains in the next step. Users must have the cloudkms.cryptoKeyDecrypterViaDelegation role to use SQL encryption functions.

    $ gcloud projects add-iam-policy-binding  PROJECT_ID --member=user:user_name@example.com --role=roles/cloudkms.cryptoKeyDecrypterViaDelegation
    
  4. Create a raw keyset. This generates a keyset of 256 bits.

    $ bq --project_id=PROJECT_ID query --use_legacy_sql=false
    "SELECT KEYS.NEW_KEYSET('AEAD_AES_GCM_256') AS raw_keyset"
    
  5. Create and format a wrapped keyset (i.e. encrypt a raw keyset with a KMS key) with an encryption utility. The following sample runs on a terminal running on a macOS or Linux computer. Windows users should generate a keyset using a third-party utility.

    $ echo "CPPpiNMFEmQKWAowdHlwZS5nb29…" |base64 --decode > /tmp/decoded_key
    $ gcloud kms encrypt --plaintext-file=/tmp/decoded_key --key=projects/PROJECT_ID/locations/LOCATION_NAME/keyRings/KEY_RING_ID/cryptoKeys/KEY_ID --ciphertext-file=/tmp/bankaccounts_wrapped
    $ od -An --format=o1 /tmp/bankaccounts_wrapped |tr -d '\n'|tr ' ' '\'
    
  6. Encrypt a column with a given keyset chain using SQL encrypt functions.

    DECLARE KMS_RESOURCE_NAME STRING;
    DECLARE FIRST_LEVEL_KEYSET BYTES;
    SET KMS_RESOURCE_NAME =  "gcp-kms://projects/PROJECT_ID/locations/LOCATION_NAME/keyRings/KEY_RING_ID/cryptoKeys/KEY_ID";
    # Set the first level keyset equal to the output from the previous step
    SET FIRST_LEVEL_KEYSET = b"\012\044\000\031\261\001\270\114\176\037\055\330\......"
    INSERT INTO DATASET_NAME.TABLE_NAME
    SELECT "example_plaintext", AEAD.ENCRYPT(KEYS.KEYSET_CHAIN(@KMS_RESOURCE_NAME, @FIRST_LEVEL_KEYSET), "example_plaintext", "additional_data")
    

Now that the column is encrypted, we can decrypt the column with the same keyset chain using SQL decrypt functions.

   DECLARE KMS_RESOURCE_NAME STRING;
   DECLARE FIRST_LEVEL_KEYSET BYTES;
   SET KMS_RESOURCE_NAME =  "gcp-kms://projects/PROJECT_ID/locations/LOCATION_NAME/keyRings/KEY_RING_ID/cryptoKeys/KEY_ID";
   SET FIRST_LEVEL_KEYSET = b"\012\044\000\031\261\001\270\114\176\037\055\330......"
   SELECT AEAD.DECRYPT_STRING(KEYS.KEYSET_CHAIN(@KMS_RESOURCE_NAME, @FIRST_LEVEL_KEYSET), encrypted_data, "additional_data"),
   CAST(AEAD.DECRYPT_BYTES(KEYS.KEYSET_CHAIN(KMS_RESOURCE_NAME, FIRST_LEVEL_KEYSET), encrypted_data, CAST("additional_data" AS BYTES)) AS STRING)
   FROM DATASET_NAME.TABLE_NAME;
   

Limitations

Encryption with Cloud KMS has the following limitations and restrictions:

Locations

The Cloud KMS keys are restricted to the same region or multi-region as the query. Using global Cloud KMS keys is disallowed for reliability reasons.

Key rotation

It is not possible to rotate a wrapped keyset using the KEYS.ROTATE_KEYSET function at this time.

Security and privacy concerns

Today, the constant parameters in a BigQuery query are visible to users in the diagnostic query plan. This can affect the kms_resource_name and first_level_keyset parameters of the KEYSET_CHAIN function. Keys are never exposed in plain text, and permission to the Cloud KMS key is required to decrypt the wrapped keyset. This ensures that keys are not exposed through the diagnostic query plan unless the user has permission to decrypt the keyset.

Interoperability with other security features

Column-level encryption has the following limitations when used with type-based security classifications:

  • Column-level security: Using encrypt and decrypt functions with column-level security is complementary. Users must have access to the column in order to encrypt or decrypt data in that column.
  • Row-level Security: Using encrypt and decrypt functions with row-level security is complementary. Users can only access the rows that they are allowed to see to decrypt the data.

Performance

Column-level SQL functions have no significant impact on performance when compared to the performance of raw encryption functions where the key data is sent in plain text.