When you use Cloud Dataproc, cluster and job data is stored on the Persistent Disks (PDs) attached to the Compute Engine VMs in your cluster and in a Cloud Storage bucket. By default, this PD and bucket data is encrypted using a Google-generated data encryption key (DEK) and key encryption key (KEK). The customer-managed encryption keys (CMEK) feature lets you create, use, and revoke the KEK yourself. Google still controls the DEK. For more information on Google data encryption keys, see Encryption at Rest in Google Cloud Platform→Key Management.
You can use CMEK to encrypt the data on the PDs attached to the VMs in your Cloud Dataproc cluster, the cluster and job data in the Cloud Storage bucket used by Cloud Dataproc, or both. Follow Steps 1 and 2, then follow Step 3, 4, or 5 to use CMEK with your cluster's PDs, Cloud Storage bucket, or both, respectively.
- Create a key using the Cloud Key Management Service (Cloud KMS).
Copy the key's resource name, which you will use in the next steps. The resource
name is constructed as follows:
projects/project-id/locations/region/keyRings/key-ring-name/cryptoKeys/key-name
- To enable the Compute Engine and Cloud Storage service accounts to use your key:
  - Follow Item #5 in Compute Engine→Protecting Resources with Cloud KMS Keys→Before you begin to assign the Cloud KMS CryptoKey Encrypter/Decrypter role to the Compute Engine service account.
  - Assign the Cloud KMS CryptoKey Encrypter/Decrypter role to the Cloud Storage service account.
- You can use the gcloud command-line tool to set the key you created in Step 1 on the PDs associated with the VMs in the Cloud Dataproc cluster.
Pass the Cloud KMS resource name obtained in Step 1 to the --gce-pd-kms-key flag when you create the cluster with the gcloud beta dataproc clusters create command.
    gcloud beta dataproc clusters create my-cluster-name \
        --gce-pd-kms-key='projects/project-id/locations/region/keyRings/key-ring-name/cryptoKeys/key-name' \
        other args ...
- To use CMEK on the Cloud Storage bucket that Cloud Dataproc uses to read and write cluster and job data,
first create a bucket with CMEK. Note: Use the key created in Step 1 when enabling the key on the bucket.
Then pass the bucket name with the --bucket flag to the gcloud beta dataproc clusters create
command when you create the cluster.
    gcloud beta dataproc clusters create my-cluster \
        --bucket name-of-CMEK-bucket \
        other args ...
- To use CMEK on both the PDs in your cluster and the Cloud Storage bucket
used by Cloud Dataproc, pass both the --gce-pd-kms-key and --bucket flags to the
gcloud beta dataproc clusters create command as explained in Steps 3 and 4. You can create and use a separate key for PD data and bucket data.
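
The five steps above can be sketched end to end as one shell script. This is a minimal sketch under stated assumptions, not a definitive procedure. The project number, location, and resource names are hypothetical placeholders. The two service-account addresses follow the usual naming of Google-managed service agents and should be checked against your own project. The `gcloud kms` and `gsutil` commands are not part of the original steps, which link to other pages for those tasks. The `run` helper and the `DRY_RUN` variable are illustrative only. With `DRY_RUN=1` (the default) the script prints the commands instead of running them.

```shell
set -eu

# Hypothetical placeholder values (assumptions); replace with your own.
PROJECT_ID="project-id"
PROJECT_NUMBER="123456789012"
LOCATION="us-central1"
KEY_RING="key-ring-name"
KEY="key-name"
BUCKET="name-of-CMEK-bucket"
CLUSTER="my-cluster-name"

# Illustrative helper: in dry-run mode, print the command instead of running it.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Step 1: create a key ring and key, and build the key's resource name.
KMS_KEY="projects/${PROJECT_ID}/locations/${LOCATION}/keyRings/${KEY_RING}/cryptoKeys/${KEY}"
run gcloud kms keyrings create "$KEY_RING" \
  --project "$PROJECT_ID" --location "$LOCATION"
run gcloud kms keys create "$KEY" \
  --project "$PROJECT_ID" --location "$LOCATION" \
  --keyring "$KEY_RING" --purpose encryption

# Step 2: grant the CryptoKey Encrypter/Decrypter role to both service accounts.
# Assumption: these are the Compute Engine and Cloud Storage service agents.
ROLE="roles/cloudkms.cryptoKeyEncrypterDecrypter"
for SA in \
  "service-${PROJECT_NUMBER}@compute-system.iam.gserviceaccount.com" \
  "service-${PROJECT_NUMBER}@gs-project-accounts.iam.gserviceaccount.com"
do
  run gcloud kms keys add-iam-policy-binding "$KEY" \
    --project "$PROJECT_ID" --location "$LOCATION" --keyring "$KEY_RING" \
    --member "serviceAccount:${SA}" --role "$ROLE"
done

# Step 4 prerequisite: create a bucket and set the key as its default CMEK.
# The bucket is created in the same location as the key.
run gsutil mb -p "$PROJECT_ID" -l "$LOCATION" "gs://${BUCKET}"
run gsutil kms encryption -k "$KMS_KEY" "gs://${BUCKET}"

# Step 5: create the cluster with CMEK on both the PDs and the bucket.
# Append any other arguments your cluster needs; the original elides them.
run gcloud beta dataproc clusters create "$CLUSTER" \
  --gce-pd-kms-key="$KMS_KEY" \
  --bucket "$BUCKET"
```

Review the printed commands first, then rerun with `DRY_RUN=0` to execute them. To use a separate key for bucket data, create a second key the same way and pass its resource name to `gsutil kms encryption` instead of `$KMS_KEY`.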