Cluster metadata

Dataproc sets the following metadata keys on the VM instances in your cluster:

Metadata key                Value
dataproc-bucket             Name of the cluster's staging bucket
dataproc-region             Region of the cluster's endpoint
dataproc-worker-count       Number of worker nodes in the cluster. The value is 0 for single-node clusters.
dataproc-cluster-name       Name of the cluster
dataproc-cluster-uuid       UUID of the cluster
dataproc-role               The instance's role, either Master or Worker
dataproc-master             Hostname of the first master node. The value is either [CLUSTER_NAME]-m in a standard or single-node cluster, or [CLUSTER_NAME]-m-0 in a high-availability cluster, where [CLUSTER_NAME] is the name of your cluster.
dataproc-master-additional  Comma-separated list of hostnames for the additional master nodes in a high-availability cluster, for example, [CLUSTER_NAME]-m-1,[CLUSTER_NAME]-m-2 in a cluster that has 3 master nodes.
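
The values above are instance attributes, so a process on a cluster VM can read them from the Compute Engine metadata server. The following is a minimal POSIX shell sketch, assuming the standard metadata endpoint and that curl is installed; the function names get_metadata and list_master_hosts are hypothetical. The second function shows one way to combine dataproc-master and dataproc-master-additional into a full list of master hostnames.

```shell
#!/bin/sh
# Minimal sketch for reading Dataproc metadata on a cluster VM.
# Assumes the standard Compute Engine metadata server endpoint.

METADATA_ATTRS_URL="http://metadata.google.internal/computeMetadata/v1/instance/attributes"

# Print one instance attribute, for example: get_metadata dataproc-role
# Returns non-zero if the key is unset or the server is unreachable.
get_metadata() {
  curl --fail --silent --show-error \
    -H "Metadata-Flavor: Google" \
    "${METADATA_ATTRS_URL}/$1"
}

# Print every master hostname, one per line.
# $1: value of dataproc-master
# $2: value of dataproc-master-additional (optional; pass "" if the key is absent)
list_master_hosts() {
  printf '%s\n' "$1"
  if [ -n "${2:-}" ]; then
    # Split the comma-separated list into one hostname per line.
    printf '%s\n' "$2" | tr ',' '\n'
  fi
}
```

On a cluster node you might call `list_master_hosts "$(get_metadata dataproc-master)" "$(get_metadata dataproc-master-additional 2>/dev/null || true)"`. The `|| true` keeps the call working on clusters where the additional-masters key is not set.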

Initialization actions can read these values from the instance metadata to customize their behavior, for example to run a setup step only on master nodes.
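
A minimal sketch of role-based branching in an initialization action, written as a POSIX shell function. It assumes that the first master's short hostname (for example, the output of `hostname -s`) matches the value of dataproc-master; the function name classify_node and its output labels are hypothetical. Taking the values as arguments keeps the decision logic separate from how they are fetched.

```shell
#!/bin/sh
# Classify this node from Dataproc metadata.
# $1: value of dataproc-role (Master or Worker)
# $2: value of dataproc-master (hostname of the first master)
# $3: this node's short hostname (assumption: e.g. from `hostname -s`)
classify_node() {
  case "$1" in
    Master)
      if [ "$3" = "$2" ]; then
        echo "first-master"
      else
        # Only high-availability clusters have masters other than the first.
        echo "additional-master"
      fi
      ;;
    Worker)
      echo "worker"
      ;;
    *)
      echo "classify_node: unexpected dataproc-role '$1'" >&2
      return 1
      ;;
  esac
}
```

In an initialization action, the inputs could come from the `/usr/share/google/get_metadata_value` helper that published Dataproc initialization actions commonly use, for example `classify_node "$(/usr/share/google/get_metadata_value attributes/dataproc-role)" "$(/usr/share/google/get_metadata_value attributes/dataproc-master)" "$(hostname -s)"`. A script could then run one-time, cluster-wide setup only when the result is `first-master`.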

You can also use the --metadata flag of the gcloud dataproc clusters create command in the gcloud CLI to provide your own custom metadata:

gcloud dataproc clusters create cluster-name \
    --region=region \
    --metadata=name1=value1,name2=value2... \
    ... other flags ...
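
Because --metadata takes a single comma-separated list, a script that assembles the flag from separate KEY=VALUE pairs has to keep commas out of the values. The POSIX shell function below is a hypothetical helper (build_metadata_flag is not part of the gcloud CLI) that joins its arguments and rejects pairs it cannot pass safely. For values that genuinely need commas, `gcloud topic escaping` describes how to choose a different delimiter.

```shell
#!/bin/sh
# Hypothetical helper: join KEY=VALUE pairs into one --metadata flag.
# Rejects empty input, pairs without a key, and pairs containing commas,
# because a comma would be read as the start of a new pair.
build_metadata_flag() {
  if [ "$#" -eq 0 ]; then
    echo "build_metadata_flag: no KEY=VALUE pairs given" >&2
    return 1
  fi
  joined=""
  for pair in "$@"; do
    case "$pair" in
      =*|*,*)
        echo "build_metadata_flag: unsupported pair '$pair'" >&2
        return 1
        ;;
      *=*) ;;
      *)
        echo "build_metadata_flag: expected KEY=VALUE, got '$pair'" >&2
        return 1
        ;;
    esac
    if [ -z "$joined" ]; then
      joined="$pair"
    else
      joined="$joined,$pair"
    fi
  done
  # Print the complete flag; '%s' avoids printf treating the leading dashes as options.
  printf '%s\n' "--metadata=$joined"
}
```

For example, with hypothetical keys env and team: `gcloud dataproc clusters create cluster-name --region=region "$(build_metadata_flag env=dev team=data)"`. On the cluster VMs, custom keys are read the same way as the Dataproc-set keys, for example `curl -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/attributes/env`.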