Steps to recreate and update a cluster
You can use the gcloud command-line tool or the Dataproc API to copy configuration from an existing cluster, update the copied configuration, and then create a new cluster with the updated configuration.
gcloud CLI
The example instructions show updating the image version setting in a cluster configuration. You can change the example to update different cluster configuration settings.
- Set variables.
export PROJECT=project-id
export REGION=region
export OLD_CLUSTER=old-cluster-name
export NEW_CLUSTER=new-cluster-name
# Replace image-version with the target image version, for example 2.2-debian12.
export NEW_IMAGE_VERSION=image-version
- Export the existing (old) cluster configuration to a YAML file.
gcloud dataproc clusters export $OLD_CLUSTER \
  --project=$PROJECT \
  --region=$REGION > "${OLD_CLUSTER}-config.yaml"
- Update the configuration. The following example uses sed to update the image version and to remove the imageUri lines from the exported file.
sed -E "s|(^[[:blank:]]+)imageVersion: .+|\1imageVersion: ${NEW_IMAGE_VERSION}|g" "${OLD_CLUSTER}-config.yaml" \
  | sed -E '/^[[:blank:]]+imageUri: /d' > "${NEW_CLUSTER}-config-updated.yaml"
- Create a new cluster with a new name and the updated configuration.
gcloud dataproc clusters import $NEW_CLUSTER \
  --project=$PROJECT \
  --region=$REGION \
  --source="${NEW_CLUSTER}-config-updated.yaml"
- After confirming your workloads run in the new cluster without issues, delete the existing (old) cluster. IMPORTANT: This step deletes all data stored in HDFS and on local disk in your cluster.
gcloud dataproc clusters delete $OLD_CLUSTER \
  --project=$PROJECT \
  --region=$REGION
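Before running the update step against a real export, it can help to try the sed pipeline on a small stand-in file. The following is a minimal sketch under stated assumptions: the sample YAML is hand-written to resemble the shape of an exported configuration (it is not actual export output), and the workdir variable, the sample file names, and the example.invalid URLs are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical values, used only for this dry run.
NEW_IMAGE_VERSION="2.2-debian12"
workdir="$(mktemp -d)"

# A hand-written fragment that imitates the shape of an exported config.
# It is illustrative only, not real export output.
cat > "${workdir}/sample-config.yaml" <<'EOF'
config:
  masterConfig:
    imageUri: https://example.invalid/images/old-image
    numInstances: 1
  softwareConfig:
    imageVersion: 2.1-debian11
  workerConfig:
    imageUri: https://example.invalid/images/old-image
    numInstances: 2
EOF

# Same two sed passes as the update step: rewrite imageVersion, drop imageUri.
sed -E "s|(^[[:blank:]]+)imageVersion: .+|\1imageVersion: ${NEW_IMAGE_VERSION}|g" \
  "${workdir}/sample-config.yaml" \
  | sed -E '/^[[:blank:]]+imageUri: /d' \
  > "${workdir}/sample-config-updated.yaml"

# Show what changed; diff exits with status 1 when the files differ, so tolerate that.
diff "${workdir}/sample-config.yaml" "${workdir}/sample-config-updated.yaml" || true
```

The diff output should list only the imageVersion change and the removed imageUri lines. If other lines appear, the pattern is matching more than intended and should be adjusted before it is used on the real configuration file.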
REST API
The example instructions show updating the cluster name and the image version settings in a cluster configuration. You can change the example variables to update different cluster configuration settings.
- Set variables.
export PROJECT=project-id
export REGION=region
export OLD_CLUSTER=old-cluster-name
export NEW_CLUSTER=new-cluster-name
# Replace image-version with the target image version, for example 2.2-debian12.
export NEW_IMAGE_VERSION=image-version
- Export the existing (old) cluster configuration to a JSON file.
curl -X GET \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://dataproc.googleapis.com/v1/projects/${PROJECT}/regions/${REGION}/clusters/${OLD_CLUSTER}?alt=json" > "${OLD_CLUSTER}-config.json"
- Update the configuration. The following example uses jq to update the cluster name and the image version, and to remove the imageUri fields from the master and worker configurations.
jq ".clusterName = \"${NEW_CLUSTER}\" | .config.softwareConfig.imageVersion=\"${NEW_IMAGE_VERSION}\" | del(.config.workerConfig.imageUri) | del(.config.masterConfig.imageUri)" \
  "${OLD_CLUSTER}-config.json" > "${NEW_CLUSTER}-config-updated.json"
- Create a new cluster from the updated configuration file.
curl -i -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d "@${NEW_CLUSTER}-config-updated.json" \
  "https://dataproc.googleapis.com/v1/projects/${PROJECT}/regions/${REGION}/clusters?alt=json"
- After confirming your workloads run in the new cluster without issues, delete the existing (old) cluster. IMPORTANT: This step deletes all data stored in HDFS and on local disk in your cluster.
curl -X DELETE \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://dataproc.googleapis.com/v1/projects/${PROJECT}/regions/${REGION}/clusters/${OLD_CLUSTER}"
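The GET, POST, and DELETE requests in the steps above all build on the same clusters endpoint. As a sketch, a small helper can compose that URL in one place so a typo in the project or region cannot differ between calls. The cluster_url function is a hypothetical convenience, not part of the Dataproc API or the gcloud CLI, and the variable values are the same placeholders used in the "Set variables" step.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder values; replace with your own, as in the "Set variables" step.
PROJECT="project-id"
REGION="region"
OLD_CLUSTER="old-cluster-name"

# Hypothetical helper: prints the clusters collection URL, or the URL of one
# cluster when a cluster name is passed as the first argument.
cluster_url() {
  local base="https://dataproc.googleapis.com/v1/projects/${PROJECT}/regions/${REGION}/clusters"
  if [ "$#" -gt 0 ]; then
    printf '%s/%s\n' "${base}" "$1"
  else
    printf '%s\n' "${base}"
  fi
}

# Print the endpoints that the GET/DELETE and POST steps would call.
echo "GET/DELETE: $(cluster_url "${OLD_CLUSTER}")"
echo "POST:       $(cluster_url)"
```

A request can then pass "$(cluster_url "${OLD_CLUSTER}")" to curl in place of the literal URL, appending ?alt=json where the steps above use it.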
Console
The console does not support recreating a cluster by importing a cluster configuration.