Customer-Managed Encryption Keys (CMEK)

When you use Cloud Dataproc, cluster and job data is stored in two places: on the Persistent Disks (PDs) attached to the Compute Engine VMs in your cluster, and in a Cloud Storage bucket. By default, this PD and bucket data is encrypted with a Google-generated data encryption key (DEK) and key encryption key (KEK). The CMEK feature lets you create, use, and revoke the KEK, while Google still controls the DEK. For more information about Google data encryption keys, see the Key Management section of Encryption at Rest in Google Cloud Platform.

Using CMEK

You can use CMEK to encrypt the data on the PDs attached to the VMs in your Cloud Dataproc cluster, the cluster and job data in the Cloud Storage bucket that Cloud Dataproc uses, or both. Complete Steps 1 and 2, then follow Step 3, 4, or 5 to use CMEK with your cluster's PDs, its Cloud Storage bucket, or both, respectively.

  1. Create a key using the Cloud Key Management Service (Cloud KMS). Copy the key's resource name; you will use it in the next steps. The resource name has the following format:
    projects/project-id/locations/region/keyRings/key-ring-name/cryptoKeys/key-name
    
  2. Grant the Compute Engine and Cloud Storage service accounts permission to use your key:
    1. Follow item 5 in the "Before you begin" section of the Compute Engine guide Protecting Resources with Cloud KMS Keys to assign the Cloud KMS CryptoKey Encrypter/Decrypter role to the Compute Engine service account.
    2. Assign the Cloud KMS CryptoKey Encrypter/Decrypter role to the Cloud Storage service account.
  3. To use CMEK on the PDs attached to the VMs in your Cloud Dataproc cluster, use the gcloud command-line tool to set the key you created in Step 1 on those PDs.

    Pass the Cloud KMS key resource name from Step 1 to the --gce-pd-kms-key flag when you create the cluster with the gcloud beta dataproc clusters create command.

    Example:

    gcloud beta dataproc clusters create my-cluster-name \
      --gce-pd-kms-key='projects/project-id/locations/region/keyRings/key-ring-name/cryptoKeys/key-name' \
      other args ...
    
  4. To use CMEK on the Cloud Storage bucket that Cloud Dataproc uses to read and write cluster and job data, create a bucket with CMEK, using the key you created in Step 1 as the bucket's key. Then pass the bucket name to the --bucket flag when you create the cluster with the gcloud beta dataproc clusters create command.

    Example:

    gcloud beta dataproc clusters create my-cluster \
      --bucket name-of-CMEK-bucket \
      other args ...

  5. To use CMEK on both the PDs in your cluster and the Cloud Storage bucket used by Cloud Dataproc, pass both the --gce-pd-kms-key and --bucket flags to the gcloud beta dataproc clusters create command, as explained in Steps 3 and 4. You can use the key from Step 1 for both, or create and use a separate key for PD data and for bucket data.
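Step 1 can be scripted. The following is a minimal Bash sketch, assuming the gcloud CLI is installed and authenticated. The function names create_cmek_key and kms_key_resource_name are hypothetical helpers, not part of gcloud; the gcloud kms keyrings create and gcloud kms keys create commands are real, but check their flags against your gcloud version.

```shell
# Hypothetical helper: build a Cloud KMS key resource name from its parts,
# in the format shown in Step 1.
kms_key_resource_name() {
  local project="$1" location="$2" ring="$3" key="$4"
  printf 'projects/%s/locations/%s/keyRings/%s/cryptoKeys/%s\n' \
    "$project" "$location" "$ring" "$key"
}

# Hypothetical helper: create a key ring and a symmetric encryption key,
# then print the key's resource name on stdout.
# Usage: create_cmek_key PROJECT_ID LOCATION KEY_RING KEY_NAME
create_cmek_key() {
  local project="$1" location="$2" ring="$3" key="$4"
  # Create the key ring (this fails if the ring already exists).
  gcloud kms keyrings create "$ring" \
    --project="$project" --location="$location" || return
  # Create a key whose purpose is symmetric encryption/decryption.
  gcloud kms keys create "$key" \
    --project="$project" --location="$location" \
    --keyring="$ring" --purpose=encryption || return
  kms_key_resource_name "$project" "$location" "$ring" "$key"
}
```

For example, KEY_NAME="$(create_cmek_key my-project us-central1 my-key-ring my-key)" captures the resource name so later commands can reuse it. The project, location, and ring values here are placeholders.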
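Step 2 grants the Cloud KMS CryptoKey Encrypter/Decrypter role (roles/cloudkms.cryptoKeyEncrypterDecrypter) to two service accounts. The Bash sketch below binds that role on the key itself with gcloud kms keys add-iam-policy-binding. It assumes the service-agent address formats service-PROJECT_NUMBER@compute-system.iam.gserviceaccount.com (Compute Engine) and service-PROJECT_NUMBER@gs-project-accounts.iam.gserviceaccount.com (Cloud Storage); confirm both on your project's IAM page before relying on it. The function name grant_cmek_access is hypothetical.

```shell
# Role named in Step 2.
ENCRYPTER_DECRYPTER_ROLE="roles/cloudkms.cryptoKeyEncrypterDecrypter"

# Hypothetical helper: grant the role on one key to the Compute Engine and
# Cloud Storage service agents of the project that runs the cluster.
# ASSUMPTION: the service-agent address formats below; verify them in IAM.
# Usage: grant_cmek_access KEY_PROJECT LOCATION KEY_RING KEY_NAME PROJECT_NUMBER
grant_cmek_access() {
  local key_project="$1" location="$2" ring="$3" key="$4" project_number="$5"
  local compute_sa="service-${project_number}@compute-system.iam.gserviceaccount.com"
  local storage_sa="service-${project_number}@gs-project-accounts.iam.gserviceaccount.com"
  local sa
  for sa in "$compute_sa" "$storage_sa"; do
    # Add an IAM binding on the key itself, not on the whole project.
    gcloud kms keys add-iam-policy-binding "$key" \
      --project="$key_project" --location="$location" --keyring="$ring" \
      --member="serviceAccount:${sa}" \
      --role="$ENCRYPTER_DECRYPTER_ROLE" || return
  done
}
```

The project number can be looked up with gcloud projects describe PROJECT_ID --format='value(projectNumber)'. Binding on the key rather than the project limits the grant to this one key, which is a design choice of this sketch.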
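For Step 4, one way to create a bucket with CMEK is to create the bucket with gsutil mb and then set its default key with gsutil kms encryption -k. The Bash sketch below assumes gsutil is installed and that the Step 2 grants are already in place; the function name create_cmek_bucket is hypothetical.

```shell
# Hypothetical helper for Step 4: create a bucket and set a Cloud KMS key
# as its default encryption key.
# Usage: create_cmek_bucket PROJECT_ID LOCATION BUCKET_NAME KEY_RESOURCE_NAME
create_cmek_bucket() {
  local project="$1" location="$2" bucket="$3" key_name="$4"
  # Create the bucket in the given project and location.
  gsutil mb -p "$project" -l "$location" "gs://${bucket}" || return
  # Set the key as the bucket's default key for newly written objects.
  gsutil kms encryption -k "$key_name" "gs://${bucket}" || return
}
```

The bucket name passed to --bucket afterwards is the bare name (for example, my-bucket), not the gs:// URL.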
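Steps 3, 4, and 5 differ only in which flags are passed to gcloud beta dataproc clusters create. The Bash sketch below adds --gce-pd-kms-key, --bucket, or both, and forwards any other arguments (for example, --region) unchanged. The function name create_cmek_cluster is hypothetical; it uses a Bash array, so it needs Bash rather than a plain POSIX shell.

```shell
# Hypothetical helper: create a Dataproc cluster that uses CMEK on its PDs
# (Step 3), its Cloud Storage bucket (Step 4), or both (Step 5).
# Pass "" for PD_KEY or BUCKET to omit that flag; extra args go to gcloud.
# Usage: create_cmek_cluster CLUSTER PD_KEY BUCKET [other gcloud args...]
create_cmek_cluster() {
  local cluster="$1" pd_key="$2" bucket="$3"
  shift 3
  local flags=()
  # Step 3: encrypt the cluster VMs' persistent disks with the KMS key.
  if [ -n "$pd_key" ]; then
    flags+=("--gce-pd-kms-key=${pd_key}")
  fi
  # Step 4: use a bucket that was created with a CMEK default key.
  if [ -n "$bucket" ]; then
    flags+=("--bucket=${bucket}")
  fi
  gcloud beta dataproc clusters create "$cluster" "${flags[@]}" "$@"
}
```

For example, create_cmek_cluster my-cluster "$KEY_NAME" name-of-CMEK-bucket --region=us-central1 corresponds to Step 5, where KEY_NAME holds a resource name in the Step 1 format.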