Recreate and update a cluster

Steps to recreate and update a cluster

You can use the gcloud command-line tool or the Dataproc API to copy configuration from an existing cluster, update the copied configuration, and then create a new cluster using the updated configuration.

gcloud command

  1. Set variables.
    export PROJECT=project-id
    export REGION=region
    export CLUSTER=cluster-name
    export NEW_IMAGE_VERSION=image-version  # for example, 2.0-debian10
    
  2. Export the cluster configuration to a YAML file.
    gcloud dataproc clusters export $CLUSTER \
        --project=$PROJECT \
        --region=$REGION > "${CLUSTER}-config.yaml"
  3. Update the configuration. The following example uses `sed` to update the image version.
    sed -E "s|(^[[:blank:]]+)imageVersion: .+|\1imageVersion: ${NEW_IMAGE_VERSION}|g" "${CLUSTER}-config.yaml" | sed -E '/^[[:blank:]]+imageUri: /d' > "${CLUSTER}-config-updated.yaml"
  4. Delete the existing cluster. IMPORTANT: This step deletes all data stored in HDFS and on local disk in your cluster.
    gcloud dataproc clusters delete $CLUSTER \
        --project=$PROJECT \
        --region=$REGION
  5. Import the updated cluster configuration to create a new cluster with the previous cluster's name and the updated settings.
    gcloud dataproc clusters import $CLUSTER \
        --project=$PROJECT \
        --region=$REGION \
        --source="${CLUSTER}-config-updated.yaml"
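Because step 4 is destructive, it can help to try the `sed` edit from step 3 on a throwaway file first. The following is a minimal sketch, not part of the official procedure: the sample YAML is a hypothetical, heavily abbreviated stand-in for a real exported configuration, and the file names and image URL are placeholders chosen for illustration.

```shell
# Sample value taken from the example in step 1.
NEW_IMAGE_VERSION="2.0-debian10"
workdir="$(mktemp -d)"

# Hypothetical, abbreviated stand-in for an exported cluster configuration.
cat > "${workdir}/sample-config.yaml" <<'EOF'
config:
  masterConfig:
    imageUri: https://example.invalid/hypothetical-image
    numInstances: 1
  softwareConfig:
    imageVersion: 1.5-debian10
  workerConfig:
    imageUri: https://example.invalid/hypothetical-image
    numInstances: 2
EOF

# Same two sed expressions as step 3: rewrite imageVersion, drop imageUri lines.
sed -E "s|(^[[:blank:]]+)imageVersion: .+|\1imageVersion: ${NEW_IMAGE_VERSION}|g" "${workdir}/sample-config.yaml" \
  | sed -E '/^[[:blank:]]+imageUri: /d' > "${workdir}/sample-config-updated.yaml"

# Show what changed; diff exits non-zero when the files differ, so ignore its status.
diff "${workdir}/sample-config.yaml" "${workdir}/sample-config-updated.yaml" || true
```

Running the same `diff` against your real `${CLUSTER}-config.yaml` and `${CLUSTER}-config-updated.yaml` before deleting the cluster shows whether only the intended lines changed.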

REST API

  1. Set variables.
    export PROJECT=project-id
    export REGION=region
    export CLUSTER=cluster-name
    export NEW_IMAGE_VERSION=image-version  # for example, 2.0-debian10
    
  2. Export the cluster configuration to a JSON file.
    curl -X GET -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        "https://dataproc.googleapis.com/v1/projects/${PROJECT}/regions/${REGION}/clusters/${CLUSTER}?alt=json" > "${CLUSTER}-config.json"
  3. Update the configuration. The following example uses `jq` to update the image version.
    jq ".config.softwareConfig.imageVersion=\"${NEW_IMAGE_VERSION}\" | del(.config.workerConfig.imageUri) | del(.config.masterConfig.imageUri)" "${CLUSTER}-config.json" > "${CLUSTER}-config-updated.json"
  4. Delete the existing cluster. IMPORTANT: This step deletes all data stored in HDFS and on local disk in your cluster.
    curl -X DELETE -H "Authorization: Bearer $(gcloud auth print-access-token)" "https://dataproc.googleapis.com/v1/projects/${PROJECT}/regions/${REGION}/clusters/${CLUSTER}"
    
  5. Wait for the delete operation returned by the previous call to finish, and then post the updated cluster configuration to create a new cluster with the previous cluster's name and the updated settings.
    curl -i -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        -H "Content-Type: application/json; charset=utf-8" \
        -d "@${CLUSTER}-config-updated.json" \
        "https://dataproc.googleapis.com/v1/projects/${PROJECT}/regions/${REGION}/clusters?alt=json"
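The wait in step 5 can be scripted. The following is one possible sketch, under the assumption that a GET request for the cluster returns HTTP 404 once the deletion has completed; polling the operation resource returned by the DELETE call is an alternative. The function names `wait_for_cluster_deletion` and `cluster_http_status`, and the `POLL_INTERVAL` and `MAX_ATTEMPTS` variables, are hypothetical helpers introduced here for illustration.

```shell
# Hypothetical helper: runs the given command repeatedly until it prints 404.
# Gives up after MAX_ATTEMPTS tries (default 60), sleeping POLL_INTERVAL
# seconds (default 10) between tries.
wait_for_cluster_deletion() {
  attempts=0
  while [ "$attempts" -lt "${MAX_ATTEMPTS:-60}" ]; do
    status="$("$@")"
    if [ "$status" = "404" ]; then
      return 0
    fi
    attempts=$((attempts + 1))
    sleep "${POLL_INTERVAL:-10}"
  done
  echo "Timed out waiting for cluster deletion" >&2
  return 1
}

# Hypothetical helper: prints only the HTTP status code of a GET on the cluster.
# Assumes PROJECT, REGION, and CLUSTER are set as in step 1.
cluster_http_status() {
  curl -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://dataproc.googleapis.com/v1/projects/${PROJECT}/regions/${REGION}/clusters/${CLUSTER}"
}

# Usage, between the DELETE and POST calls:
#   wait_for_cluster_deletion cluster_http_status
```

Passing the status command as an argument keeps the polling loop separate from the request, so the loop can be exercised with a stub before it is pointed at a real cluster.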

Console

The console does not support recreating a cluster by importing a cluster's configuration.