You can copy an existing Dataproc on GKE virtual cluster's configuration, update the copied configuration, and then create a new Dataproc on GKE cluster using the updated configuration.
Steps to recreate and update a Dataproc on GKE cluster
gcloud
1. Set environment variables:
CLUSTER=existing Dataproc on GKE cluster name \
  REGION=region
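For example, using hypothetical values (substitute your own cluster name and region):
# Hypothetical values for illustration only.
CLUSTER=my-gke-cluster \
  REGION=us-central1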
2. Export the existing Dataproc on GKE cluster configuration to a YAML file.
gcloud dataproc clusters export $CLUSTER \
    --region=$REGION > "${CLUSTER}-config.yaml"
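Optionally, confirm that the exported file contains the kubernetesNamespace field before you remove it in the next step:
# Show the namespace line that the sed command in the next step removes.
grep "kubernetesNamespace" "${CLUSTER}-config.yaml"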
3. Update the configuration:
- Remove the kubernetesNamespace field. Removing this field is necessary to avoid a namespace conflict when you create the updated cluster. Sample sed command to remove the kubernetesNamespace field in place:
sed -i -E "s/kubernetesNamespace: .+$//g" "${CLUSTER}-config.yaml"
- Make additional changes to update Dataproc on GKE virtual cluster configuration settings, such as changing the Spark componentVersion.
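For example, a minimal sed sketch that changes the Spark component version; the SPARK key and the 3.1-dataproc-14 version string are assumptions, so check the exported YAML and the Dataproc on GKE release notes for the values that apply to your cluster:
# Hedged sketch: the SPARK key and version string are assumptions; verify against the exported YAML.
sed -i -E "s/SPARK: .+$/SPARK: 3.1-dataproc-14/" "${CLUSTER}-config.yaml"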
4. If the new cluster will use the same name as the existing cluster (that is, if you are replacing the original cluster), delete the existing Dataproc on GKE virtual cluster.
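For example, using the gcloud dataproc clusters delete command:
gcloud dataproc clusters delete $CLUSTER \
    --region=$REGION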
5. Wait for the delete operation to finish, and then import the updated cluster configuration to create a new Dataproc on GKE virtual cluster with the updated settings.
gcloud dataproc clusters import $CLUSTER \
    --region=$REGION \
    --source="${CLUSTER}-config.yaml"
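Once the import operation finishes, you can verify the new cluster's configuration with gcloud dataproc clusters describe:
gcloud dataproc clusters describe $CLUSTER \
    --region=$REGION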
API
1. Set environment variables:
PROJECT=project ID \
  CLUSTER=existing Dataproc on GKE cluster name \
  REGION=region
2. Export the existing Dataproc on GKE cluster configuration to a JSON file.
curl -X GET \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://dataproc.googleapis.com/v1/projects/${PROJECT}/regions/${REGION}/clusters/${CLUSTER}?alt=json" \
    > "${CLUSTER}-config.json"
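Optionally, inspect the exported JSON to confirm the namespace value you are about to remove; the jq path below matches the del() expression used in the next step:
jq '.virtualClusterConfig.kubernetesClusterConfig.kubernetesNamespace' "${CLUSTER}-config.json"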
3. Remove the kubernetesNamespace field. Removal of this field is necessary to avoid a namespace conflict when you create the updated cluster. Sample jq command to remove the kubernetesNamespace field and write the result back to the configuration file:
jq 'del(.virtualClusterConfig.kubernetesClusterConfig.kubernetesNamespace)' \
    "${CLUSTER}-config.json" > "${CLUSTER}-config.json.tmp" && \
    mv "${CLUSTER}-config.json.tmp" "${CLUSTER}-config.json"
4. Make additional changes to update Dataproc on GKE virtual cluster configuration settings, such as changing the Spark componentVersion.
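For example, a minimal jq sketch of a componentVersion change; the .kubernetesSoftwareConfig.componentVersion.SPARK path and the 3.1-dataproc-14 version string are assumptions, so verify both against your exported file and the Dataproc on GKE release notes:
# Hedged sketch: field path and version string are assumptions; verify against the exported JSON.
jq '.virtualClusterConfig.kubernetesClusterConfig.kubernetesSoftwareConfig.componentVersion.SPARK = "3.1-dataproc-14"' \
    "${CLUSTER}-config.json" > "${CLUSTER}-config.json.tmp" && \
    mv "${CLUSTER}-config.json.tmp" "${CLUSTER}-config.json"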
5. If the new cluster will use the same name as the existing cluster (that is, if you are replacing the original cluster), delete the existing Dataproc on GKE virtual cluster.
curl -X DELETE \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://dataproc.googleapis.com/v1/projects/${PROJECT}/regions/${REGION}/clusters/${CLUSTER}"
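The next step requires the delete operation to finish. If you are scripting these calls, one minimal way to wait (a sketch, assuming Bash) is to poll the cluster resource until the API returns HTTP 404:
# Poll every 10 seconds until the cluster resource no longer exists (HTTP 404).
until [[ "$(curl -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://dataproc.googleapis.com/v1/projects/${PROJECT}/regions/${REGION}/clusters/${CLUSTER}")" == "404" ]]; do
  sleep 10
done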
6. Wait for the delete operation to finish, and then import the updated cluster configuration to create a new Dataproc on GKE virtual cluster with the updated settings.
curl -i -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json; charset=utf-8" \
    -d "@${CLUSTER}-config.json" \
    "https://dataproc.googleapis.com/v1/projects/${PROJECT}/regions/${REGION}/clusters?alt=json"
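After the create operation completes, you can fetch the new cluster and check that it reaches the RUNNING state; the .status.state path assumes the standard cluster resource shape:
curl -s \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://dataproc.googleapis.com/v1/projects/${PROJECT}/regions/${REGION}/clusters/${CLUSTER}" \
    | jq -r '.status.state'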
Console
The Google Cloud console does not support recreating a Dataproc on GKE virtual cluster by importing an existing cluster's configuration.