Using cluster selectors with workflows

As an alternative to running a workflow on a managed cluster, you can use a cluster selector to choose an existing cluster for your workflow. At the conclusion of the workflow, the selected cluster is not deleted.

Selectors specify one or more Cloud Dataproc user labels. Clusters in the same region as the workflow whose labels match all of the selector labels are eligible to run workflow jobs. If multiple clusters match the selector, Cloud Dataproc chooses the cluster with the most free YARN memory.
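The matching rule described above can be illustrated with a minimal Python sketch. It is not Cloud Dataproc's implementation: the `Cluster` class, the `select_cluster` function, and the `free_yarn_memory_mb` field are hypothetical names introduced here only to model the rule (same region, all selector labels match, most free YARN memory wins).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Cluster:
    # Hypothetical stand-in for a cluster record; not a Dataproc API type.
    name: str
    region: str
    labels: Dict[str, str] = field(default_factory=dict)
    free_yarn_memory_mb: int = 0  # assumed unit, for illustration only


def select_cluster(
    clusters: List[Cluster],
    workflow_region: str,
    selector_labels: Dict[str, str],
) -> Optional[Cluster]:
    """Return the eligible cluster with the most free YARN memory, or None."""
    eligible = [
        c
        for c in clusters
        # The cluster must be in the same region as the workflow ...
        if c.region == workflow_region
        # ... and must carry every selector label with the same value.
        and all(c.labels.get(k) == v for k, v in selector_labels.items())
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda c: c.free_yarn_memory_mb)


clusters = [
    Cluster("cluster-1", "us-central1", {"cluster-pool": "pool-1"}, 4096),
    Cluster("cluster-2", "us-central1", {"cluster-pool": "pool-1", "env": "dev"}, 8192),
    Cluster("cluster-3", "us-central1", {"cluster-pool": "pool-2"}, 16384),
    Cluster("cluster-4", "europe-west1", {"cluster-pool": "pool-1"}, 32768),
]

chosen = select_cluster(clusters, "us-central1", {"cluster-pool": "pool-1"})
print(chosen.name if chosen else "no eligible cluster")
```

In this sketch, cluster-3 is skipped because its label value differs, and cluster-4 is skipped because it is in another region, even though both report more free memory than the cluster that is chosen. Extra labels on a cluster (such as `env` here) do not prevent a match.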

Adding a cluster selector to a template

You can add a cluster selector to a workflow template using the gcloud command-line tool or the Cloud Dataproc API.

gcloud command

gcloud dataproc workflow-templates set-cluster-selector template-id \
    --cluster-labels name=value[[,name=value]...]

REST API

See WorkflowTemplatePlacement.ClusterSelector. This field is provided as part of a completed WorkflowTemplate submitted with a workflowTemplates.create or workflowTemplates.update request.
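As a rough sketch of where the selector sits in a request body, the following Python snippet builds a WorkflowTemplate as a plain dictionary and prints it as JSON. The `build_template` helper, the template ID, the step ID, and the jar URI are hypothetical placeholders; the snippet does not send a request, and the API reference remains the authority on the full schema.

```python
import json
from typing import Dict, List


def build_template(
    template_id: str,
    cluster_labels: Dict[str, str],
    jobs: List[dict],
) -> dict:
    """Assemble a WorkflowTemplate body that uses a cluster selector.

    Hypothetical helper for illustration; it only builds a dictionary.
    """
    return {
        "id": template_id,
        "placement": {
            # clusterSelector is used in place of managedCluster.
            "clusterSelector": {"clusterLabels": dict(cluster_labels)},
        },
        "jobs": list(jobs),
    }


template = build_template(
    "my-template",
    {"cluster-pool": "pool-1"},
    [
        {
            "stepId": "example-step",  # placeholder step name
            "hadoopJob": {
                # Placeholder URI; substitute a real jar location.
                "mainJarFileUri": "gs://my-bucket/my-job.jar",
            },
        }
    ],
)

print(json.dumps(template, indent=2))
```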

Console

You can view existing workflow templates and instantiated workflows from the Cloud Dataproc Workflows page in the GCP Console.

Using automatically applied labels

You can point a cluster selector to a specific existing cluster by using one of the following automatically applied cluster labels:

  • goog-dataproc-cluster-name
  • goog-dataproc-cluster-uuid

Example:

gcloud dataproc workflow-templates set-cluster-selector template-id \
    --cluster-labels goog-dataproc-cluster-name=my-cluster

Selecting from a cluster pool

You can let Cloud Dataproc choose a cluster from a pool of clusters. To define a pool, give each cluster in it the same label.

Example:

gcloud dataproc clusters create cluster-1 --labels cluster-pool=pool-1
gcloud dataproc clusters create cluster-2 --labels cluster-pool=pool-1
gcloud dataproc clusters create cluster-3 --labels cluster-pool=pool-2

After the clusters are created, create a workflow template and set its cluster selector:

gcloud dataproc workflow-templates create my-template
gcloud dataproc workflow-templates set-cluster-selector my-template \
  --cluster-labels cluster-pool=pool-1

The workflow runs on either cluster-1 or cluster-2, but not on cluster-3, because only cluster-1 and cluster-2 carry the cluster-pool=pool-1 label.
