By default, Google Cloud automatically encrypts data when it is at rest using encryption keys managed by Google. If you have specific compliance or regulatory requirements related to the keys that protect your data, you can use customer-managed encryption keys (CMEK) for Dataform repositories.
This guide describes using CMEK for Dataform and walks you through how to enable CMEK encryption of Dataform repositories.
For more information about CMEK in general, including when and why to enable it, see the CMEK documentation.
CMEK encryption of repository data
When you apply CMEK encryption to a Dataform repository, all Dataform-managed customer data in that repository is encrypted at rest using the CMEK protection key set for the repository. This data includes the following:
- Git repository content of the Dataform repository and its workspaces
- Compiled SQL queries and compilation errors
- Stored SQL queries of workflow actions
- Error details of executed workflow actions
Dataform uses CMEK protection keys in the following scenarios:
- During every operation that requires decryption of customer data stored at rest. These operations include, but are not limited to, the following:
  - Responses to a user query, for example, compilationResults.query.
  - Creation of Dataform resources that require previously created encrypted repository data, for example, workflow invocations.
  - Git operations to update the remote repository, for example, pushing a Git commit.
- During every operation that requires storing customer data at rest. These operations include, but are not limited to, the following:
  - Responses to a user query, for example, compilationResults.create.
  - Git operations to a workspace, for example, pulling a Git commit.
Dataform manages the encryption of customer data associated only with Dataform resources. Dataform does not manage encryption of customer data that is created in BigQuery through execution of Dataform workflows. To encrypt data created and stored in BigQuery, configure CMEK for BigQuery.
Supported keys
Dataform supports the following types of CMEK keys:
- Cloud KMS software keys
- Cloud Hardware Security Module (HSM) keys
- Cloud External Key Manager (Cloud EKM) keys
Key availability varies by key type and region. For more information about the geographical availability of CMEK keys, see Cloud KMS locations.
Restrictions
Dataform supports CMEK with the following restrictions:
- The maximum size of a CMEK-encrypted repository is 512 MB.
- The maximum size of a workspace in a CMEK-encrypted repository is 512 MB.
- You can't apply a CMEK protection key to a repository after the repository has been created. You can apply CMEK encryption during repository creation only.
- You can't remove a CMEK protection key from a repository.
- You can't change a CMEK protection key for a repository.
- If you set a default Dataform CMEK key for your Google Cloud project, all new repositories created in the Google Cloud project location must be encrypted with CMEK. When you create a new repository in the Google Cloud project location, you can apply the default Dataform CMEK key or a different CMEK key, but you cannot apply default encryption at rest.
- If you change the value of a default Dataform CMEK key, the previous value applies to pre-existing repositories, and the updated value applies to repositories created after the change.
- You can set only one default Dataform CMEK key per location of Google Cloud project repositories.
- CMEK organization policies are not available.
- Using Cloud HSM and Cloud EKM keys is subject to availability. For more information about availability of keys across locations, see Cloud KMS locations.
Cloud KMS quotas and Dataform
You can use Cloud HSM and Cloud EKM keys with Dataform. When you use CMEK in Dataform, your projects can consume Cloud KMS cryptographic requests quotas. For example, CMEK-encrypted Dataform repositories can consume these quotas for each change to repository contents. Encryption and decryption operations using CMEK keys affect Cloud KMS quotas only if you use hardware (Cloud HSM) or external (Cloud EKM) keys. For more information, see Cloud KMS quotas.
Managing keys
Use Cloud KMS for all key-management operations. Dataform cannot detect or act upon any key changes until they are propagated by Cloud KMS. Some operations, such as disabling or destroying a key, can take up to three hours to propagate. Changes to permissions usually propagate much faster.
After the repository is created, Dataform calls Cloud KMS to make sure that the key is still valid during each operation on encrypted repository data.
If Dataform detects that your Cloud KMS key has been disabled or destroyed, all data stored in the corresponding repository becomes inaccessible.
If calls by Dataform to Cloud KMS detect that a formerly disabled key has been re-enabled, Dataform restores access automatically.
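To see what Cloud KMS currently reports for a key, you can query it with the Google Cloud CLI. The following is a minimal sketch, assuming that the primary version of the key is the one you care about. The function name kms_primary_key_state is hypothetical and is not part of any Google tool. Because key changes can take time to propagate, the value that Dataform observes might lag behind this output.

```shell
# Print the state of the primary version of a Cloud KMS key,
# for example ENABLED, DISABLED, or DESTROYED.
# Hypothetical helper; arguments:
#   $1 KMS project ID, $2 key location, $3 key ring name, $4 key name
kms_primary_key_state() {
  gcloud kms keys describe "$4" \
    --project="$1" \
    --location="$2" \
    --keyring="$3" \
    --format="value(primary.state)"
}

# Example call (placeholder values):
#   kms_primary_key_state KMS_PROJECT_ID KMS_KEY_LOCATION KMS_KEY_RING KMS_KEY
```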
Use external keys with Cloud EKM
As an alternative to using keys that reside on Cloud KMS, you can use keys that reside with a supported external key management partner. To do this, use Cloud External Key Manager (Cloud EKM) to create and manage external keys, which are pointers to keys that reside outside of Google Cloud. For more information, see Cloud External Key Manager.
After you create an external key with Cloud EKM, you can apply it to a new Dataform repository by providing the ID of that key when creating the repository. This procedure is the same as applying a Cloud KMS key to a new repository.
Use Dataform default CMEK keys
To encrypt multiple Dataform repositories with the same CMEK key, you can set a default Dataform CMEK key for your Google Cloud project. You must specify the location of the Google Cloud project for the default Dataform CMEK key. You can set only one default CMEK key per Google Cloud project location.
After you set a default Dataform CMEK key, Dataform applies the key to all new repositories created in the Google Cloud project location by default. When you create a repository, you can use the default key, or select a different CMEK key.
How an unavailable key status is handled
In rare scenarios, such as during periods when Cloud KMS is unavailable, Dataform might be unable to retrieve the status of your key from Cloud KMS.
If your Dataform repository is protected by a key that is enabled at the time at which Dataform is unable to communicate with Cloud KMS, the encrypted repository data becomes inaccessible.
The encrypted repository data remains inaccessible until Dataform can reconnect with Cloud KMS and Cloud KMS responds that the key is active.
Conversely, if your Dataform repository is protected by a key that is disabled at the time at which Dataform is first unable to communicate with Cloud KMS, the encrypted repository data remains inaccessible until Dataform can reconnect to Cloud KMS and you have re-enabled your key.
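The access rules described in this section can be summarized as a small decision function. This is only an illustrative model of the documented behavior, not Dataform's implementation. The function name repo_data_access and its argument values are hypothetical.

```shell
# Model of the documented access rules; not Dataform's actual code.
#   $1 "reachable" or "unreachable": can Dataform reach Cloud KMS?
#   $2 key state reported by Cloud KMS, for example ENABLED or DISABLED
repo_data_access() {
  if [ "$1" != "reachable" ]; then
    # The key status can't be confirmed, so data is inaccessible
    # regardless of the key state before the outage.
    echo "inaccessible"
  elif [ "$2" = "ENABLED" ]; then
    echo "accessible"
  else
    # Disabled or destroyed key.
    echo "inaccessible"
  fi
}
```

In this model, access returns only when both conditions hold: Cloud KMS is reachable and it reports the key as enabled.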
Logging
If you have enabled audit logging for the Cloud KMS API in your project, you can audit the requests that Dataform sends to Cloud KMS on your behalf in Cloud Logging. For more information, see View logs.
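As a sketch of how you might narrow the log view to Dataform's requests, the following helper builds a Cloud Logging filter for Cloud KMS audit entries whose caller is the default Dataform service account. The function names are hypothetical, and the sketch assumes that audit logging is enabled and that the entries record the service account in the standard audit log field protoPayload.authenticationInfo.principalEmail.

```shell
# Build a Cloud Logging filter for Cloud KMS audit entries caused by the
# default Dataform service account. Hypothetical helper.
#   $1 project number of the project that runs Dataform
kms_audit_filter() {
  printf 'protoPayload.serviceName="cloudkms.googleapis.com" AND protoPayload.authenticationInfo.principalEmail="service-%s@gcp-sa-dataform.iam.gserviceaccount.com"' "$1"
}

# Read the most recent matching entries from the project that runs Cloud KMS.
#   $1 KMS project ID, $2 project number of the Dataform project
read_dataform_kms_logs() {
  gcloud logging read "$(kms_audit_filter "$2")" \
    --project="$1" \
    --limit=20
}
```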
Before you begin
Decide whether you are going to run Dataform and Cloud KMS in different projects, or in the same project. We recommend using separate projects for greater control over permissions. For information about Google Cloud project IDs and project numbers, see Identifying projects.
For the Google Cloud project that runs Cloud KMS:
- Enable the Cloud Key Management Service API.
- Create a key ring and a key as described in Creating key rings and keys. Create the key ring in a location that matches the location of your Dataform repository:
  - Repositories must use matching regional keys. For example, a repository in region asia-northeast3 must be protected with a key from a key ring located in asia-northeast3.
  - The global region can't be used with Dataform.
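The location rules above can be checked before you create a repository. The following is a minimal sketch, assuming the key is given as a full Cloud KMS resource name of the form projects/.../locations/.../keyRings/.../cryptoKeys/.... Both function names are hypothetical.

```shell
# Extract the location segment from a Cloud KMS key resource name of the form
# projects/P/locations/LOCATION/keyRings/R/cryptoKeys/K.
# Prints nothing if the name does not have that form.
kms_key_location() {
  printf '%s\n' "$1" |
    sed -n 's|^projects/[^/]*/locations/\([^/]*\)/keyRings/[^/]*/cryptoKeys/[^/]*$|\1|p'
}

# Succeed only if the key location equals the repository location and is
# not "global".
#   $1 repository location, $2 key resource name
key_matches_repository() {
  key_location="$(kms_key_location "$2")"
  [ -n "$key_location" ] &&
    [ "$key_location" != "global" ] &&
    [ "$key_location" = "$1" ]
}

# Example call (placeholder key name):
#   key_matches_repository asia-northeast3 \
#     projects/KMS_PROJECT_ID/locations/asia-northeast3/keyRings/KMS_KEY_RING/cryptoKeys/KMS_KEY
```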
Enable CMEK
Dataform can access the key on your behalf after you grant the Cloud KMS CryptoKey Encrypter/Decrypter (roles/cloudkms.cryptoKeyEncrypterDecrypter) role to the default Dataform service account.
Your default Dataform service account ID is in the following format:
service-YOUR_PROJECT_NUMBER@gcp-sa-dataform.iam.gserviceaccount.com
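If you know only the project ID, you can look up the project number and build the service account ID from the format above. The following is a small sketch. The function names are hypothetical, and the sketch assumes the gcloud CLI is installed and authenticated.

```shell
# Build the default Dataform service account ID from a project number.
dataform_service_account() {
  printf 'service-%s@gcp-sa-dataform.iam.gserviceaccount.com\n' "$1"
}

# Look up the project number for a project ID, then build the account ID.
#   $1 ID of the project that runs Dataform
dataform_service_account_for_project() {
  project_number="$(gcloud projects describe "$1" --format="value(projectNumber)")" || return 1
  dataform_service_account "$project_number"
}
```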
To grant the CryptoKey Encrypter/Decrypter role to the default Dataform service account, follow these steps:
Console
Open the Key Management page in the Google Cloud console.
Click the name of the key ring that contains the key.
Click the checkbox for the encryption key to which you want to add the role. The Permissions tab opens.
Click Add member.
Enter the email address of the service account.
- If the service account is already on the members list, it has existing roles. Click the current role drop-down list for the service account.
Click the drop-down list for Select a role, click Cloud KMS, and then click the Cloud KMS CryptoKey Encrypter/Decrypter role.
Click Save to apply the role to the service account.
gcloud
You can use the Google Cloud CLI to assign the role:
gcloud kms keys add-iam-policy-binding \
--project=KMS_PROJECT_ID \
--member serviceAccount:SERVICE_ACCOUNT \
--role roles/cloudkms.cryptoKeyEncrypterDecrypter \
--location=KMS_KEY_LOCATION \
--keyring=KMS_KEY_RING \
KMS_KEY
Replace the following:
- KMS_PROJECT_ID: the ID of your Google Cloud project that is running Cloud KMS.
- SERVICE_ACCOUNT: the email address of your default Dataform service account.
- KMS_KEY_LOCATION: the location name of your Cloud KMS key.
- KMS_KEY_RING: the key ring name of your Cloud KMS key.
- KMS_KEY: the key name of your Cloud KMS key.
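After granting the role, you might want to confirm that the binding exists. The following sketch reads the IAM policy of the key and checks for the CryptoKey Encrypter/Decrypter role. The function names are hypothetical, and the --flatten and --filter expressions are one common way to query a policy with gcloud, not the only one.

```shell
# List the roles that a member holds directly on a Cloud KMS key.
#   $1 KMS project ID, $2 key location, $3 key ring, $4 key,
#   $5 email address of the service account
kms_roles_for_member() {
  gcloud kms keys get-iam-policy "$4" \
    --project="$1" \
    --location="$2" \
    --keyring="$3" \
    --flatten="bindings[].members" \
    --filter="bindings.members:$5" \
    --format="value(bindings.role)"
}

# Succeed if the member has the CryptoKey Encrypter/Decrypter role on the key.
# Takes the same arguments as kms_roles_for_member.
has_cmek_role() {
  kms_roles_for_member "$@" |
    grep -qxF 'roles/cloudkms.cryptoKeyEncrypterDecrypter'
}
```

This check covers only bindings set on the key itself. A role inherited from the key ring, project, or organization does not appear in the key's own policy.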
Set a default Dataform CMEK key
Setting a default Dataform CMEK key for your Google Cloud project lets you encrypt multiple repositories with the same CMEK key. For more information, see Use a default key for Dataform repositories.
To set or edit a default CMEK key, send the following request to the Dataform API:
curl -X PATCH \
-H "Content-Type: application/json" \
-d '{"defaultKmsKeyName":"projects/PROJECT_ID/locations/PROJECT_LOCATION/keyRings/KMS_KEY_RING/cryptoKeys/KMS_KEY"}' \
https://dataform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/PROJECT_LOCATION/config
Replace the following:
- KMS_KEY_RING: the key ring name of your Cloud KMS key.
- KMS_KEY: the name of your Cloud KMS key.
- PROJECT_ID: the ID of your Google Cloud project.
- PROJECT_LOCATION: the location name of your Google Cloud project.
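The request can be wrapped in a small function so that the key name and URL are built from their parts. This is a sketch under two assumptions that are not stated in the request above: the call is authenticated with an access token from gcloud auth print-access-token, and the key can live in a separate Cloud KMS project, so the function accepts a full key resource name. The function names are hypothetical.

```shell
# Full resource name of a Cloud KMS key.
#   $1 KMS project ID, $2 key location, $3 key ring, $4 key
kms_key_name() {
  printf 'projects/%s/locations/%s/keyRings/%s/cryptoKeys/%s\n' "$1" "$2" "$3" "$4"
}

# Set the default Dataform CMEK key for a project location.
#   $1 Dataform project ID, $2 project location, $3 key resource name
set_default_cmek_key() {
  # Assumption: an OAuth access token from gcloud authorizes the request.
  curl -X PATCH \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -d "{\"defaultKmsKeyName\":\"$3\"}" \
    "https://dataform.googleapis.com/v1beta1/projects/$1/locations/$2/config"
}

# Example call (placeholder values):
#   set_default_cmek_key PROJECT_ID PROJECT_LOCATION \
#     "$(kms_key_name KMS_PROJECT_ID PROJECT_LOCATION KMS_KEY_RING KMS_KEY)"
```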
Remove a default Dataform CMEK key
To remove a default Dataform CMEK key from your Google Cloud project, send the following request to the Dataform API:
curl -X PATCH \
-H "Content-Type: application/json" \
-d '{"defaultKmsKeyName":""}' \
https://dataform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/PROJECT_LOCATION/config
Replace the following:
- PROJECT_ID: the ID of your Google Cloud project.
- PROJECT_LOCATION: the location name of your Google Cloud project where you want to unset default CMEK.
Check if a default Dataform CMEK key is set
To check if a default Dataform CMEK key is set for your Google Cloud project, send the following request to the Dataform API:
curl -X GET \
-H "Content-Type: application/json" \
https://dataform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/PROJECT_LOCATION/config
Replace the following:
- PROJECT_ID: the ID of your Google Cloud project.
- PROJECT_LOCATION: the location name of your Google Cloud project.
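To turn the response into a yes-or-no answer, you can extract the defaultKmsKeyName field from the returned JSON. The sketch below assumes that the response is a JSON object that carries the key in a defaultKmsKeyName field, the same field that is used when setting the key, and that the request is authenticated with a gcloud access token. The function names are hypothetical, and the sed expression is a simple text match rather than a full JSON parser.

```shell
# Fetch the Dataform config for a project location as JSON.
#   $1 project ID, $2 project location
get_dataform_config() {
  # Assumption: an OAuth access token from gcloud authorizes the request.
  curl -s -X GET \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://dataform.googleapis.com/v1beta1/projects/$1/locations/$2/config"
}

# Read config JSON on stdin and print the defaultKmsKeyName value.
# Prints nothing if the field is absent.
default_kms_key_from_json() {
  sed -n 's/.*"defaultKmsKeyName"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Example (placeholder values):
#   key="$(get_dataform_config PROJECT_ID PROJECT_LOCATION | default_kms_key_from_json)"
#   if [ -n "$key" ]; then echo "Default key: $key"; else echo "No default key set"; fi
```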
Apply CMEK to a repository
You can apply CMEK protection to a Dataform repository during repository creation.
To apply CMEK encryption to a Dataform repository, select encryption with the default Dataform CMEK key or specify a unique Cloud KMS key when you create the repository. For instructions, see Create a repository.
You can't change the encryption mechanism of a Dataform repository after the repository is created.
For more information, see Restrictions.
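For illustration only, the following sketch shows how a repository creation request with a CMEK key might look when sent directly to the Dataform API. It assumes that the repository resource accepts the key in a kmsKeyName field, that the new repository ID is passed as a repositoryId query parameter, and that a gcloud access token authorizes the call. Verify these details against Create a repository before relying on them. The function names are hypothetical.

```shell
# JSON body for a new repository protected by a Cloud KMS key.
# Assumption: the repository resource takes the key in "kmsKeyName".
#   $1 key resource name
repository_body() {
  printf '{"kmsKeyName":"%s"}\n' "$1"
}

# Create a CMEK-encrypted repository.
#   $1 project ID, $2 project location, $3 new repository ID,
#   $4 key resource name (its location must match $2)
create_cmek_repository() {
  curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -d "$(repository_body "$4")" \
    "https://dataform.googleapis.com/v1beta1/projects/$1/locations/$2/repositories?repositoryId=$3"
}
```

Because the key can't be changed or removed later, it is worth confirming the key name and location before sending a request like this.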
What's next
- To learn more about CMEK, see CMEK overview.
- To learn more about Cloud KMS quotas, see Cloud KMS Quotas.
- To learn more about Cloud KMS pricing, see Cloud KMS Pricing.
- To learn more about Dataform repositories, see Introduction to repositories.