Cloud Dataproc User Labels

You can apply user labels to Cloud Dataproc cluster and job resources to group resources and related operations for later filtering and listing. You associate labels with a resource when it is created (at cluster creation or job submission) using the Google Cloud SDK gcloud command-line tool, the Google Cloud Platform Console, or the Cloud Dataproc REST API. Once a resource is associated with a label, the label is propagated to operations performed on the resource (cluster create, update, patch, or delete; job submit, update, cancel, or delete), allowing you to filter and list clusters, jobs, and operations by label.

Label Semantics and Requirements

Labels are string key:value pairs. Cloud Dataproc labels share the characteristics of other Google Cloud Platform resource labels (exceptions* noted below):

  • Label keys and values can be no longer than 63 characters.
  • Label keys and values can contain only lowercase letters, numbers, underscores, hyphens, and international characters.
  • Label keys and values cannot exceed 128 bytes in size.
  • Label keys must begin with a letter.
  • Label keys must be unique within a resource type (cluster, job, or operation).
  • Each Cloud Dataproc resource can have up to 32 labels (*other Google Cloud Platform resources can be associated with up to 64 labels).
  • Cloud Dataproc resources do not have default labels.
  • Cloud Dataproc automatically manages some system labels using the prefix goog-dataproc-.
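
The requirements above can be checked on the client before a request is sent. The following Python sketch is an illustration only, not part of any Cloud Dataproc library: the function names are hypothetical, empty label values are assumed to be acceptable, and "international characters" is approximated with Python's Unicode letter test.

```python
MAX_CHARS = 63    # maximum characters per key or value
MAX_BYTES = 128   # maximum UTF-8 encoded size per key or value
MAX_LABELS = 32   # maximum labels per Cloud Dataproc resource


def _is_lower_letter(c: str) -> bool:
    """True for letters (including non-ASCII ones) that are not uppercase."""
    return c.isalpha() and not c.isupper()


def _char_ok(c: str) -> bool:
    """Allow lowercase letters, digits, underscores, and hyphens."""
    return c in "_-" or c.isdigit() or _is_lower_letter(c)


def _check_part(kind: str, text: str) -> list[str]:
    """Apply the rules shared by keys and values."""
    errors = []
    if len(text) > MAX_CHARS:
        errors.append(f"{kind} {text!r} is longer than {MAX_CHARS} characters")
    if len(text.encode("utf-8")) > MAX_BYTES:
        errors.append(f"{kind} {text!r} is larger than {MAX_BYTES} bytes")
    if not all(_char_ok(c) for c in text):
        errors.append(f"{kind} {text!r} contains a disallowed character")
    return errors


def validate_labels(labels: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the labels look valid."""
    errors = []
    if len(labels) > MAX_LABELS:
        errors.append(f"{len(labels)} labels supplied; the limit is {MAX_LABELS}")
    for key, value in labels.items():
        errors += _check_part("key", key)
        errors += _check_part("value", value)
        if not key or not _is_lower_letter(key[0]):
            errors.append(f"key {key!r} must begin with a lowercase letter")
    return errors


print(validate_labels({"env": "prod", "customer": "acme"}))
print(validate_labels({"1env": "prod"}))
```

Key uniqueness needs no separate check here because a Python dict cannot hold duplicate keys. The service remains the authority on what it accepts; a local check like this only catches obvious mistakes early.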

Automatically Applied Labels

When creating or updating a cluster, Cloud Dataproc automatically applies several labels to the cluster and cluster resources. For example, Cloud Dataproc applies labels to the virtual machines, persistent disks, and accelerators. Automatically applied labels have a special goog-dataproc prefix.

You can use these labels in many ways, including:

The following goog-dataproc labels are automatically applied to Cloud Dataproc resources. Any values you supply for the reserved goog-dataproc labels at cluster creation will override the automatically-supplied value. For this reason, supplying your own values for these labels is not recommended.

Label                        Description
goog-dataproc-cluster-name   User-specified cluster name.
goog-dataproc-cluster-uuid   Unique cluster ID.
goog-dataproc-location       Cloud Dataproc regional cluster endpoint.
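
Because user-supplied values for reserved labels override the automatic ones, it can help to detect reserved keys before creating a cluster. The Python sketch below is one possible way to do that; the helper name split_reserved is hypothetical, and the only fact taken from this page is the goog-dataproc- prefix.

```python
RESERVED_PREFIX = "goog-dataproc-"


def split_reserved(labels: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Split labels into (user labels, labels that use the reserved prefix)."""
    user = {}
    reserved = {}
    for key, value in labels.items():
        if key.startswith(RESERVED_PREFIX):
            reserved[key] = value
        else:
            user[key] = value
    return user, reserved


requested = {"env": "prod", "goog-dataproc-cluster-name": "my-name"}
user_labels, reserved_labels = split_reserved(requested)
if reserved_labels:
    # Supplying these would override the automatically supplied values.
    print("Reserved labels found:", sorted(reserved_labels))
```

A caller could drop the reserved entries and pass only user_labels at cluster creation, or stop and report the problem, depending on how strict it needs to be.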

Creating and Using Cloud Dataproc Labels

gcloud Command

You can specify one or more labels to be applied to a Cloud Dataproc cluster or job at creation or submit time using the gcloud command-line tool.

gcloud dataproc clusters create args --labels env=prod,customer=acme
gcloud dataproc jobs submit args --labels env=prod,customer=acme

Once a Cloud Dataproc cluster or job has been created, you can update the labels associated with that resource using the gcloud command-line tool.

gcloud dataproc clusters update args --update-labels env=prod,customer=acme
gcloud dataproc jobs update args --update-labels env=prod,customer=acme

Similarly, you can use the gcloud command-line tool to filter Cloud Dataproc resources by label using a filter expression of the following format: labels.<key>=<value>.

gcloud dataproc clusters list --filter "status.state=ACTIVE AND labels.env=prod"
gcloud dataproc jobs list --filter "status.state=ACTIVE AND labels.customer=acme"

See the clusters.list and jobs.list Cloud Dataproc API documentation for more information on writing a filter expression.

REST API

Labels can be attached to Cloud Dataproc resources through the Cloud Dataproc REST API. The clusters.create and jobs.submit APIs can be used to attach labels to a cluster or job at creation or submit time. The clusters.patch and jobs.patch APIs can be used to edit labels after the resource has been created. Here is the JSON body of a clusters.create request that attaches a key1:value1 label to the cluster.

{
  "clusterName": "cluster-1",
  "projectId": "my-project",
  "config": {
    "configBucket": "",
    "gceClusterConfig": {
      "networkUri": ".../networks/default",
      "zoneUri": ".../zones/us-central1-f"
    },
    "masterConfig": {
      "numInstances": 1,
      "machineTypeUri": "..../machineTypes/n1-standard-4",
      "diskConfig": {
        "bootDiskSizeGb": 500,
        "numLocalSsds": 0
      }
    },
    "workerConfig": {
      "numInstances": 2,
      "machineTypeUri": "...machineTypes/n1-standard-4",
      "diskConfig": {
        "bootDiskSizeGb": 500,
        "numLocalSsds": 0
      }
    }
  },
  "labels": {
    "key1": "value1"
  }
}
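
A request body like the one above can also be assembled programmatically. The Python sketch below adds a labels map to a body without modifying the original; the with_labels helper is hypothetical, and the body is trimmed to clusterName and projectId for brevity, so it is not a complete clusters.create request.

```python
import copy
import json


def with_labels(body: dict, labels: dict[str, str]) -> dict:
    """Return a copy of a request body with the given labels merged in."""
    result = copy.deepcopy(body)
    result.setdefault("labels", {}).update(labels)
    return result


# Trimmed body for illustration; a real request also carries "config".
body = {"clusterName": "cluster-1", "projectId": "my-project"}
labeled = with_labels(body, {"key1": "value1"})
print(json.dumps(labeled, indent=2))
```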

The clusters.list and jobs.list APIs can be used to list resources that match a specified filter, using the following format: labels.<key>=<value>. Here is a sample Cloud Dataproc API clusters.list HTTPS GET request that specifies a key=value label filter. The caller inserts project, region, a filter label-key and label-value, and an api-key. Note that this sample request is broken into two lines for readability.

GET https://dataproc.googleapis.com/v1/projects/project/regions/region/clusters?
filter=labels.label-key=label-value&key=api-key

See the clusters.list and jobs.list Cloud Dataproc API documentation for more information on writing a filter expression.
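
The clusters.list request URL shown in this section can be constructed with the Python standard library. This is a sketch that only builds the URL and sends nothing; the function name and the sample project, region, and key values are placeholders. Note that urlencode percent-encodes the "=" inside the filter value, which is standard for query parameters.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://dataproc.googleapis.com/v1"


def clusters_list_url(project: str, region: str, label_key: str,
                      label_value: str, api_key: str) -> str:
    """Build a clusters.list URL that filters on a single label."""
    path = (
        f"{BASE_URL}/projects/{quote(project, safe='')}"
        f"/regions/{quote(region, safe='')}/clusters"
    )
    query = urlencode({
        "filter": f"labels.{label_key}={label_value}",
        "key": api_key,
    })
    return f"{path}?{query}"


# Placeholder values; substitute your own project, region, and API key.
print(clusters_list_url("my-project", "us-central1", "env", "prod", "API_KEY"))
```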

Console

You can specify a set of labels to be applied to a Cloud Dataproc resource at creation or submit time using the GCP Console. To associate a label with a Cloud Dataproc cluster, add it on the Cloud Dataproc→Create a cluster page. To associate a label with a Cloud Dataproc job, add it on the Cloud Dataproc→Submit a job page.

Once a Cloud Dataproc resource has been created, you can update the labels associated with that resource. To update labels, first click SHOW INFO PANEL in the top-left of the page, for example on the Cloud Dataproc→List clusters page.

Once the info panel is displayed, you can update the labels for your Cloud Dataproc resources, such as a cluster. It is also possible to update labels for multiple items in one operation, for example for several Cloud Dataproc jobs at the same time.

Labels allow you to filter the Cloud Dataproc resources shown on the Cloud Dataproc→List clusters and Cloud Dataproc→List jobs pages. At the top of the page, you can use the search pattern labels.<labelname>=<value> to filter resources by a label.
