Use customer-managed encryption keys

By default, Google Cloud automatically encrypts data at rest using encryption keys managed by Google. If you have specific compliance or regulatory requirements related to the keys that protect your data, you can use customer-managed encryption keys (CMEK) for Dataform repositories.

This guide describes how Dataform uses CMEK and walks you through enabling CMEK encryption of Dataform repositories.

For more information about CMEK in general, including when and why to enable it, see the CMEK documentation.

CMEK encryption of repository data

When you apply CMEK encryption to a Dataform repository, all Dataform-managed customer data in that repository is encrypted at rest using the CMEK protection key set for the repository. This data includes the following:

  • Git repository content of the Dataform repository and its workspaces
  • Compiled SQL queries and compilation errors
  • Stored SQL queries of workflow actions
  • Error details of executed workflow actions

Dataform uses CMEK protection keys in the following scenarios:

  • During every operation that requires decryption of customer data stored at rest. These operations include, but are not limited to, the following:
  • During every operation that requires storing customer data at rest. These operations include, but are not limited to, the following:

Dataform manages the encryption of customer data associated only with Dataform resources. Dataform does not manage encryption of customer data that is created in BigQuery through execution of Dataform workflows. To encrypt data created and stored in BigQuery, configure CMEK for BigQuery.
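A minimal sketch of the BigQuery side, assuming the `bq` command-line tool is installed and authenticated; the function name `create_cmek_dataset` and its arguments are hypothetical. It sets a default CMEK key on a new dataset so that tables created there without an explicit key, for example by Dataform workflows, are encrypted with that key. BigQuery uses its own service account, which also needs the CryptoKey Encrypter/Decrypter role on the key; that step isn't shown.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a BigQuery dataset whose new tables default to a CMEK key.
# Assumes `bq` is installed and authenticated. Granting the BigQuery service
# account access to the key is a separate step (not shown).
#   $1: project ID  $2: dataset name  $3: dataset location
#   $4: full key name, projects/.../locations/.../keyRings/.../cryptoKeys/...
create_cmek_dataset() {
  local project_id="$1" dataset="$2" location="$3" kms_key_name="$4"
  bq --location="${location}" mk --dataset \
    --default_kms_key="${kms_key_name}" \
    "${project_id}:${dataset}"
}
```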

Supported keys

Dataform supports the following types of CMEK keys:

Key availability varies by key type and region. For more information about the geographical availability of CMEK keys, see Cloud KMS locations.

Restrictions

Dataform supports CMEK with the following restrictions:

  • You can't apply a CMEK protection key to a repository after the repository has been created. You can apply CMEK encryption during repository creation only.
  • You can't remove a CMEK protection key from a repository.
  • You can't change a CMEK protection key for a repository.
  • CMEK organization policies are not available.
  • Using Cloud HSM keys is subject to availability. For more information about availability of keys across locations, see Cloud KMS locations.

Cloud KMS quotas and Dataform

You can use Cloud HSM keys with Dataform. When you use CMEK in Dataform, your projects can consume Cloud KMS cryptographic request quotas. For example, CMEK-encrypted Dataform repositories can consume these quotas each time the repository contents change. Encryption and decryption operations that use CMEK keys affect Cloud KMS quotas only if you use hardware (Cloud HSM) or external (Cloud EKM) keys. For more information, see Cloud KMS quotas.
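To check which case applies to a given key, you can read its protection level. The following sketch assumes the gcloud CLI; `quota_impact` and `key_quota_impact` are hypothetical helper names, and the classification only restates the rule above (software keys don't affect quotas, HSM and external keys do).

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a Cloud KMS protection level by its quota impact,
# following the rule stated in this section.
quota_impact() {
  case "$1" in
    SOFTWARE) echo "no-quota-impact" ;;
    HSM|EXTERNAL|EXTERNAL_VPC) echo "consumes-quota" ;;
    *) echo "unknown" ;;
  esac
}

# Look up the protection level of a key's version template, then classify it.
#   $1: KMS project ID  $2: key location  $3: key ring  $4: key
key_quota_impact() {
  local project="$1" location="$2" keyring="$3" key="$4" level
  level="$(gcloud kms keys describe "${key}" \
    --project="${project}" --location="${location}" --keyring="${keyring}" \
    --format='value(versionTemplate.protectionLevel)')" || return 1
  quota_impact "${level}"
}
```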

Managing keys

Use Cloud KMS for all key-management operations. Dataform cannot detect or act upon any key changes until they are propagated by Cloud KMS. Some operations, such as disabling or destroying a key, can take up to three hours to propagate. Changes to permissions usually propagate much faster.

After the repository is created, Dataform calls Cloud KMS to make sure that the key is still valid during each operation on encrypted repository data.

If Dataform detects that your Cloud KMS key has been disabled or destroyed, all data stored in the corresponding repository becomes inaccessible.

If calls by Dataform to Cloud KMS detect that a formerly disabled key has been re-enabled, Dataform restores access automatically.
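In Cloud KMS, disabling and re-enabling are done on individual key versions. The helpers below are a sketch that wraps real `gcloud kms keys versions` commands; the function names and the positional arguments (project, location, key ring, key, version) are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around gcloud kms key version commands.
#   $1: KMS project ID  $2: key location  $3: key ring  $4: key  $5: version number

# Print each key version with its state, for example ENABLED or DISABLED.
list_key_versions() {
  gcloud kms keys versions list \
    --project="$1" --location="$2" --keyring="$3" --key="$4" \
    --format='value(name,state)'
}

# Disable one key version. Dataform can take up to three hours to notice.
disable_key_version() {
  gcloud kms keys versions disable "$5" \
    --project="$1" --location="$2" --keyring="$3" --key="$4"
}

# Re-enable a disabled key version; Dataform restores access once it sees the change.
enable_key_version() {
  gcloud kms keys versions enable "$5" \
    --project="$1" --location="$2" --keyring="$3" --key="$4"
}
```

For example, `list_key_versions KMS_PROJECT_ID KMS_KEY_LOCATION KMS_KEY_RING KMS_KEY` shows whether the key versions are still enabled before you look into why repository data is inaccessible.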

How an unavailable key status is handled

In rare scenarios, such as during periods when Cloud KMS is unavailable, Dataform might be unable to retrieve the status of your key from Cloud KMS.

If your Dataform repository is protected by a key that is enabled at the time when Dataform is first unable to communicate with Cloud KMS, the encrypted repository data becomes inaccessible.

The encrypted repository data remains inaccessible until Dataform can reconnect with Cloud KMS and Cloud KMS responds that the key is active.

Conversely, if your Dataform repository is protected by a key that is disabled at the time when Dataform is first unable to communicate with Cloud KMS, the encrypted repository data remains inaccessible until Dataform can reconnect to Cloud KMS and you have re-enabled your key.
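The rules in this section reduce to a small decision table. The function below is only an illustrative model of that behavior, not a Dataform or Cloud KMS API; its name and the `reachable`/`unreachable` input are hypothetical.

```shell
#!/usr/bin/env bash
# Illustrative model only: is encrypted repository data accessible?
#   $1: "reachable" or "unreachable" (can Dataform reach Cloud KMS?)
#   $2: key state as Cloud KMS would report it, for example ENABLED or DISABLED
repository_data_access() {
  local kms="$1" key_state="$2"
  if [[ "${kms}" != "reachable" ]]; then
    # Without a response from Cloud KMS, data is inaccessible even if the key is enabled.
    echo "inaccessible"
  elif [[ "${key_state}" == "ENABLED" ]]; then
    echo "accessible"
  else
    # Disabled or destroyed keys block access; re-enabling a disabled key restores it.
    echo "inaccessible"
  fi
}
```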

Logging

If you have enabled audit logging for the Cloud KMS API in your project, you can audit the requests that Dataform sends to Cloud KMS on your behalf in Cloud Logging. For more information, see View logs.
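As a sketch, assuming Data Access audit logs are enabled for the Cloud KMS API in the KMS project, you can filter for entries made by the default Dataform service account. The helper name `read_dataform_kms_logs` is hypothetical; `gcloud logging read` and the `protoPayload` fields are standard Cloud Audit Logs features.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read recent Cloud KMS audit log entries made by the
# default Dataform service account.
#   $1: KMS project ID  $2: project number of the project that runs Dataform
read_dataform_kms_logs() {
  local kms_project="$1" dataform_project_number="$2"
  local sa="service-${dataform_project_number}@gcp-sa-dataform.iam.gserviceaccount.com"
  gcloud logging read \
    "protoPayload.serviceName=\"cloudkms.googleapis.com\" AND protoPayload.authenticationInfo.principalEmail=\"${sa}\"" \
    --project="${kms_project}" --freshness=1d --limit=20
}
```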

Before you begin

  • Decide whether you are going to run Dataform and Cloud KMS in different projects, or in the same project. We recommend using separate projects for greater control over permissions. For information about Google Cloud project IDs and project numbers, see Identifying projects.

  • For the Google Cloud project that runs Cloud KMS:

    1. Enable the Cloud Key Management Service API.
    2. Create a key ring and a key as described in Creating key rings and keys. Create the key ring in a location that matches the location of your Dataform repository:
      • Repositories must use matching regional keys. For example, a repository in region asia-northeast3 must be protected with a key from a key ring located in asia-northeast3.
      • The global region can't be used with Dataform.
      For more information about the supported locations for Dataform and Cloud KMS, see Cloud locations.
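The setup steps above can be scripted with the gcloud CLI. This is a minimal sketch that assumes a software-protected key; the function name `create_dataform_key` is hypothetical, and it refuses the `global` location because Dataform can't use it. Adding `--protection-level=hsm` to the key creation command creates a Cloud HSM key instead, where that is available.

```shell
#!/usr/bin/env bash
# Hypothetical helper: enable Cloud KMS, then create a key ring and key in the
# same region as the Dataform repository.
#   $1: KMS project ID  $2: repository region (for example, asia-northeast3)
#   $3: key ring name   $4: key name
create_dataform_key() {
  local project="$1" region="$2" keyring="$3" key="$4"
  # Dataform needs a regional key that matches the repository region.
  if [[ -z "${region}" || "${region}" == "global" ]]; then
    echo "error: key ring location must be the repository region, got '${region}'" >&2
    return 1
  fi
  gcloud services enable cloudkms.googleapis.com --project="${project}" || return 1
  gcloud kms keyrings create "${keyring}" \
    --project="${project}" --location="${region}" || return 1
  gcloud kms keys create "${key}" \
    --project="${project}" --location="${region}" --keyring="${keyring}" \
    --purpose=encryption
}
```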

Enable CMEK

Dataform can access the key on your behalf after you grant the Cloud KMS CryptoKey Encrypter/Decrypter (roles/cloudkms.cryptoKeyEncrypterDecrypter) role to the default Dataform service account.

Your default Dataform service account ID is in the following format:

service-YOUR_PROJECT_NUMBER@gcp-sa-dataform.iam.gserviceaccount.com
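If you don't have the project number at hand, you can look it up and build the service account email from it. A sketch with hypothetical helper names; the project you pass to `project_number` is the project where Dataform runs, which might differ from the Cloud KMS project.

```shell
#!/usr/bin/env bash
# Build the default Dataform service account email from a project number.
dataform_service_account() {
  echo "service-$1@gcp-sa-dataform.iam.gserviceaccount.com"
}

# Look up the project number of the project that runs Dataform.
project_number() {
  gcloud projects describe "$1" --format='value(projectNumber)'
}
```

For example: `SERVICE_ACCOUNT="$(dataform_service_account "$(project_number DATAFORM_PROJECT_ID)")"`, where `DATAFORM_PROJECT_ID` is a placeholder for your Dataform project ID.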

To grant the CryptoKey Encrypter/Decrypter role to the default Dataform service account, follow these steps:

Console

  1. Open the Key Management page in the Google Cloud console.

    Open the Key Management page

  2. Click the name of the key ring that contains the key.

  3. Click the checkbox for the encryption key to which you want to add the role. The Permissions tab opens.

  4. Click Add member.

  5. Enter the email address of the service account.

    • If the service account is already on the members list, it has existing roles. Click the current role drop-down list for the service account.
  6. Click the drop-down list for Select a role, click Cloud KMS, and then click the Cloud KMS CryptoKey Encrypter/Decrypter role.

  7. Click Save to apply the role to the service account.

gcloud

You can use the Google Cloud CLI to assign the role:

gcloud kms keys add-iam-policy-binding \
    --project=KMS_PROJECT_ID \
    --member serviceAccount:SERVICE_ACCOUNT \
    --role roles/cloudkms.cryptoKeyEncrypterDecrypter \
    --location=KMS_KEY_LOCATION \
    --keyring=KMS_KEY_RING \
    KMS_KEY

Replace the following:

  • KMS_PROJECT_ID: the ID of your Google Cloud project that is running Cloud KMS
  • SERVICE_ACCOUNT: the email address of your default Dataform service account
  • KMS_KEY_LOCATION: the location name of your Cloud KMS key
  • KMS_KEY_RING: the key ring name of your Cloud KMS key
  • KMS_KEY: the key name of your Cloud KMS key
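To confirm that the `add-iam-policy-binding` command took effect, you can list the roles that the service account holds on the key. A sketch; the helper name `key_roles_for_member` is hypothetical, while `get-iam-policy` and the `--flatten`, `--filter`, and `--format` flags are standard gcloud features.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the roles that a service account holds on a key.
#   $1: KMS project ID  $2: key location  $3: key ring  $4: key
#   $5: service account email
key_roles_for_member() {
  local project="$1" location="$2" keyring="$3" key="$4" member="$5"
  gcloud kms keys get-iam-policy "${key}" \
    --project="${project}" --location="${location}" --keyring="${keyring}" \
    --flatten='bindings[].members' \
    --filter="bindings.members:${member}" \
    --format='value(bindings.role)'
}
```

The output should include `roles/cloudkms.cryptoKeyEncrypterDecrypter`.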

Apply CMEK to a repository

You can apply CMEK protection to a Dataform repository only during repository creation.

To apply CMEK encryption to a Dataform repository, specify a Cloud KMS key when you create the repository. For instructions, see Create a repository.

You can't change the encryption mechanism of a Dataform repository after the repository is created.

For more information, see Restrictions.
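If you create repositories programmatically rather than in the console, the key is passed in the repository's `kmsKeyName` field. The sketch below assumes the Dataform REST API `v1beta1` endpoint, `curl`, and the gcloud CLI for an access token; the helper names are hypothetical, so check the Dataform API reference for the API version you use.

```shell
#!/usr/bin/env bash
# Full resource name of a Cloud KMS key, as expected in kmsKeyName.
#   $1: KMS project ID  $2: key location  $3: key ring  $4: key
kms_key_resource_name() {
  echo "projects/$1/locations/$2/keyRings/$3/cryptoKeys/$4"
}

# Hypothetical helper: create a CMEK-protected repository (v1beta1 endpoint assumed).
# The key must be in the repository's region and grant Encrypter/Decrypter
# to the default Dataform service account.
#   $1: Dataform project ID  $2: region  $3: repository ID  $4: full key name
create_cmek_repository() {
  local project="$1" region="$2" repo_id="$3" kms_key_name="$4"
  curl -sS -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -d "{\"kmsKeyName\": \"${kms_key_name}\"}" \
    "https://dataform.googleapis.com/v1beta1/projects/${project}/locations/${region}/repositories?repositoryId=${repo_id}"
}
```

Because a key can't be added, changed, or removed after the repository is created, verify the key name before you send the request.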

What's next