You can also add labels to Compute Engine resources associated with cluster resources, such as Virtual Machine instances and disks.
What are labels?
A label is a key-value pair that helps you organize your Google Cloud Dataproc clusters and jobs. You can attach a label to each resource, then filter the resources based on their labels. Information about labels is forwarded to the billing system, so you can break down your billing charges by label.
Common uses of labels
We do not recommend creating large numbers of unique labels, such as for timestamps or individual values for every API call. Here are some common use cases for labels:
- Team or cost center labels: Add labels based on team or cost center to distinguish Dataproc clusters and jobs owned by different teams (for example, team:research and team:analytics). You can use this type of label for cost accounting or budgeting.
- Component labels: For example, component:redis, component:frontend, component:ingest, and component:dashboard.
- Environment or stage labels: For example, environment:production and environment:test.
- State labels: For example, state:active, state:readytodelete, and state:archive.
Requirements for labels
The labels applied to a resource must meet the following requirements:
- Each resource can have multiple labels, up to a maximum of 64.
- Each label must be a key-value pair.
- Keys have a minimum length of 1 character and a maximum length of 63 characters, and cannot be empty. Values can be empty, and have a maximum length of 63 characters.
- Keys and values can contain only lowercase letters, numeric characters, underscores, and dashes. All characters must use UTF-8 encoding, and international characters are allowed.
- The key portion of a label must be unique. However, you can use the same key with multiple resources.
- Keys must start with a lowercase letter or international character.
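The requirements above can be checked before a label set is sent to Dataproc. The following is a minimal sketch, not an official validator: the function name label_errors is hypothetical, and it assumes that "lowercase letter or international character" maps to the Unicode categories Ll and Lo and that "numeric characters" means Unicode decimal digits. The service itself remains the authority on what it accepts.

```python
import unicodedata

MAX_LABELS_PER_RESOURCE = 64  # limit stated in the requirements list
MAX_LENGTH = 63               # maximum characters for a key or a value


def _is_letter(ch: str) -> bool:
    # Assumption: lowercase letters are category Ll, and "international
    # characters" without case (for example CJK) are category Lo.
    return unicodedata.category(ch) in ("Ll", "Lo")


def _is_allowed(ch: str) -> bool:
    # Assumption: "numeric characters" is read as Unicode decimal digits (Nd).
    return _is_letter(ch) or unicodedata.category(ch) == "Nd" or ch in "_-"


def label_errors(labels: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the labels pass."""
    errors = []
    if len(labels) > MAX_LABELS_PER_RESOURCE:
        errors.append(f"too many labels: {len(labels)}")
    # A dict already guarantees that each key appears only once.
    for key, value in labels.items():
        if not key:
            errors.append("key must not be empty")
        elif not _is_letter(key[0]):
            errors.append(f"key {key!r} must start with a lowercase letter")
        if len(key) > MAX_LENGTH:
            errors.append(f"key {key!r} is longer than {MAX_LENGTH} characters")
        if len(value) > MAX_LENGTH:
            errors.append(f"value for {key!r} is longer than {MAX_LENGTH} characters")
        if not all(_is_allowed(ch) for ch in key + value):
            errors.append(f"label {key!r} contains a disallowed character")
    return errors
```

Calling label_errors({"env": "prod", "customer": "acme"}) returns an empty list, while a key such as Env or 1env produces at least one error message.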
Creating and using Dataproc labels
gcloud Command
You can specify one or more labels to be applied to a Dataproc cluster or job at creation or submit time using the gcloud command-line tool.
gcloud dataproc clusters create args --labels env=prod,customer=acme
gcloud dataproc jobs submit args --labels env=prod,customer=acme
Once a Dataproc cluster or job has been created, you can update the labels associated with that resource using the gcloud command-line tool.
gcloud dataproc clusters update args --update-labels env=prod,customer=acme
gcloud dataproc jobs update args --update-labels env=prod,customer=acme
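When labels live in a script or configuration file, the comma-separated key=value string passed to --labels or --update-labels can be generated rather than typed by hand. This is a small sketch under that assumption; the helper names format_labels and update_labels_args are hypothetical, and extra_args stands in for the args placeholder in the commands above.

```python
def format_labels(labels: dict[str, str]) -> str:
    """Join labels into the comma-separated key=value form the flags expect."""
    return ",".join(f"{key}={value}" for key, value in labels.items())


def update_labels_args(resource: str, labels: dict[str, str], *extra_args: str) -> list[str]:
    """Build an argument list for `gcloud dataproc <resource> update`.

    resource is "clusters" or "jobs"; extra_args carries whatever the
    `args` placeholder stands for (for example the resource name).
    """
    return [
        "gcloud", "dataproc", resource, "update", *extra_args,
        "--update-labels", format_labels(labels),
    ]
```

The resulting list can be passed to subprocess.run without shell quoting concerns, since each argument is a separate element.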
Similarly, you can use the gcloud command-line tool to filter Dataproc resources by label using a filter expression of the following format: labels.<key>=<value>.
gcloud dataproc clusters list \
    --region=region \
    --filter="status.state=ACTIVE AND labels.env=prod"
gcloud dataproc jobs list \
    --region=region \
    --filter="status.state=ACTIVE AND labels.customer=acme"
See the clusters.list and jobs.list Dataproc API documentation for more information on writing a filter expression.
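The filter expressions in the list commands above follow a regular pattern, so they can be assembled from a dictionary of labels. The sketch below only builds strings and argument lists and does not call gcloud; label_filter and list_args are hypothetical helper names, and joining several label terms with AND is an assumption based on the AND shown in the examples.

```python
from typing import Optional


def label_filter(labels: dict[str, str], state: Optional[str] = None) -> str:
    """Build a filter such as 'status.state=ACTIVE AND labels.env=prod'."""
    terms = []
    if state is not None:
        terms.append(f"status.state={state}")
    # One labels.<key>=<value> term per label, in insertion order.
    terms.extend(f"labels.{key}={value}" for key, value in labels.items())
    return " AND ".join(terms)


def list_args(resource: str, region: str, filter_expr: str) -> list[str]:
    """Argument list for `gcloud dataproc <resource> list` with a filter."""
    return [
        "gcloud", "dataproc", resource, "list",
        f"--region={region}",
        f"--filter={filter_expr}",
    ]
```

For example, label_filter({"env": "prod"}, state="ACTIVE") produces the same expression used in the clusters list command above.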
REST API
Labels can be attached to Dataproc resources through the Dataproc REST API. The clusters.create and jobs.submit APIs can be used to attach labels to a cluster or job at creation or submit time. The clusters.patch and jobs.patch APIs can be used to edit labels after the resource has been created. Here is the JSON body of a clusters.create request that attaches a key1:value1 label to the cluster.
    {
      "clusterName":"cluster-1",
      "projectId":"my-project",
      "config":{
        "configBucket":"",
        "gceClusterConfig":{
          "networkUri":".../networks/default",
          "zoneUri":".../zones/us-central1-f"
        },
        "masterConfig":{
          "numInstances":1,
          "machineTypeUri":"..../machineTypes/n1-standard-4",
          "diskConfig":{
            "bootDiskSizeGb":500,
            "numLocalSsds":0
          }
        },
        "workerConfig":{
          "numInstances":2,
          "machineTypeUri":"...machineTypes/n1-standard-4",
          "diskConfig":{
            "bootDiskSizeGb":500,
            "numLocalSsds":0
          }
        }
      },
      "labels":{
        "key1":"value1"
      }
    }
The clusters.list and jobs.list APIs can be used to list resources that match a specified filter, using the following format: labels.<key>=<value>.
Here is a sample Dataproc API clusters.list HTTPS GET request that specifies a key=value label filter. The caller inserts project, region, a filter label-key and label-value, and an api-key. Note that this sample request is broken into two lines for readability.
GET https://dataproc.googleapis.com/v1/projects/project/regions/region/clusters?
    filter=labels.label-key=label-value&key=api-key
See the clusters.list and jobs.list Dataproc API documentation for more information on writing a filter expression.
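The sample GET request above can also be assembled programmatically, which takes care of URL-encoding the filter. This is a sketch that only constructs the URL string and sends nothing; the function name clusters_list_url is hypothetical, and the project, region, label, and API key values are supplied by the caller as in the sample.

```python
from urllib.parse import quote, urlencode


def clusters_list_url(project: str, region: str,
                      label_key: str, label_value: str, api_key: str) -> str:
    """Build a clusters.list URL that filters on a single label."""
    base = (
        "https://dataproc.googleapis.com/v1/"
        f"projects/{quote(project, safe='')}/"
        f"regions/{quote(region, safe='')}/clusters"
    )
    # urlencode percent-encodes the '=' inside the filter value, so the
    # filter travels as one query parameter.
    query = urlencode({
        "filter": f"labels.{label_key}={label_value}",
        "key": api_key,
    })
    return f"{base}?{query}"
```

Because the filter contains its own equals sign, the encoded URL shows it as %3D; the server decodes it back to labels.label-key=label-value.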
Console
You can specify a set of labels to add to a Dataproc resource at creation or submit time using the Cloud Console.
- Add labels to a cluster from the Labels section of the Customize cluster panel of the Dataproc Create a cluster page.
- Add labels to a job from the Dataproc Submit a job page.
Once a Dataproc resource has been created, you can update the labels associated with that resource. To update labels, you must first click SHOW INFO PANEL in the top-left of the page. This is an example from the Dataproc→List clusters page.

Once the info panel is displayed, you can update the labels for your Dataproc resources. Below is an example of updating labels for a Dataproc cluster.

It is also possible to update labels for multiple items in one operation. In this example, labels are being updated for multiple Dataproc jobs at the same time.

Labels allow you to filter the Dataproc resources shown on the Dataproc→List clusters and Dataproc→List jobs pages. At the top of the page, you can use the search pattern labels.<labelname>=<value> to filter resources by a label.

Automatically applied labels
When creating or updating a cluster, Dataproc automatically applies several labels to the cluster and cluster resources. For example, Dataproc applies labels to virtual machines, persistent disks, and accelerators when a cluster is created. Automatically applied labels have a special goog-dataproc prefix.
The following goog-dataproc labels are automatically applied to Dataproc resources. Any values you supply for the reserved goog-dataproc labels at cluster creation will override automatically supplied values. For this reason, supplying your own values for these labels is not recommended.
Label | Description |
---|---|
goog-dataproc-cluster-name | User-specified cluster name |
goog-dataproc-cluster-uuid | Unique cluster ID |
goog-dataproc-location | Dataproc regional cluster endpoint |
You can use these automatically applied labels in many ways, including:
- Searching and filtering for Dataproc resources
- Filtering billing data to calculate Dataproc costs
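Because user-supplied values for the reserved goog-dataproc labels override the automatically applied ones, it can help to detect such keys before creating a cluster. The following is a minimal sketch of that check; the function name reserved_label_keys is hypothetical, and treating any key that starts with goog-dataproc as reserved is an assumption drawn from the prefix described above.

```python
RESERVED_PREFIX = "goog-dataproc"  # prefix of automatically applied labels


def reserved_label_keys(labels: dict[str, str]) -> list[str]:
    """Return, in sorted order, the user-supplied keys that use the reserved prefix."""
    return sorted(key for key in labels if key.startswith(RESERVED_PREFIX))
```

A caller might log a warning or drop these keys when the returned list is not empty, so that the automatically supplied values stay in place.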