Labeling Datasets

To help organize your projects, you can add arbitrary labels to your datasets. After labeling your datasets, you can search for datasets based on label values. For example, you can use labels to group datasets by purpose, environment, department, and so on.

Before you begin

Labeling a dataset

Labels are key:value pairs that you can attach to a dataset. BigQuery labels have the same characteristics as labels in Compute Engine and Cloud Resource Manager:

  • Label keys and values can be no longer than 63 characters.
  • Label keys and values can contain only lowercase letters, numbers, underscores, hyphens, and international characters.
  • Label keys and values cannot exceed 128 bytes in size.
  • Label keys must begin with a letter.
  • Label keys must be unique within a dataset.

Each of your datasets can have up to 64 labels. Datasets do not have any default labels.

Labels must have a key, but can have an empty value. Labels with no value can be used as tags.

Creating or updating a label

To create or update a label for a dataset:

Command-line

Use the bq update command with the set_label flag.

  • bq update --set_label [KEY1:VALUE1] --set_label [KEY2:VALUE2] --set_label [KEYN:VALUEN] [DATASET_ID]

Where [KEY1:VALUE1], [KEY2:VALUE2], and [KEYN:VALUEN] are label key:value pairs that you want to create, and [DATASET_ID] is a valid dataset ID.

For example, to add a label to track departments, you can create a label with a department key:

  • bq update --set_label department:shipping [DATASET_ID]

API

  1. Populate the labels field in the dataset's configuration object.

      {
        "kind": "bigquery#dataset",
        ...
        "friendlyName": string,
        "description": string,
        "labels": {
          "[KEY1]": "[VALUE1]",
          "[KEY2]": "[VALUE2]"
        }
         ...
      }
    

    Where [KEY1] and [KEY2] are label keys, and [VALUE1] and [VALUE2] are the corresponding label values that you want to create.

  2. To replace all labels in the dataset, call Datasets.update().
  3. Alternately, to replace only the labels specified and otherwise preserve existing ones, call Datasets.patch().

  4. If there are concurrent calls to update() and delete(), BigQuery executes the last one received.

Java

This sample uses the Google HTTP Client Library for Java to send a request to the BigQuery API.
static final HttpTransport HTTP_TRANSPORT = new NetHttpTransport();
static final JsonFactory JSON_FACTORY = new JacksonFactory();

public static class Dataset {
  @Key private Map<String, String> labels;

  public Map<String, String> getLabels() {
    return this.labels;
  }

  public Dataset addLabel(String key, String value) {
    if (this.labels == null) {
      this.labels = new HashMap<>();
    }
    this.labels.put(key, value);
    return this;
  }
}

/**
 * Add or modify a label on a dataset.
 *
 * <p>See <a href="https://cloud.google.com/bigquery/docs/labeling-datasets">the BigQuery
 * documentation</a>.
 */
public static void labelDataset(
    String projectId, String datasetId, String labelKey, String labelValue) throws IOException {

  // Authenticate requests using Google Application Default credentials.
  GoogleCredential credential = GoogleCredential.getApplicationDefault();
  credential = credential.createScoped(Arrays.asList("https://www.googleapis.com/auth/bigquery"));

  // Get a new access token.
  // Note that access tokens have an expiration. You can reuse a token rather than requesting a
  // new one if it is not yet expired.
  credential.refreshToken();
  String accessToken = credential.getAccessToken();

  // Set the content of the request.
  Dataset dataset = new Dataset();
  dataset.addLabel(labelKey, labelValue);
  HttpContent content = new JsonHttpContent(JSON_FACTORY, dataset);

  // Send the request to the BigQuery API.
  String urlFormat =
      "https://www.googleapis.com/bigquery/v2/projects/%s/datasets/%s"
          + "?fields=labels&access_token=%s";
  GenericUrl url = new GenericUrl(String.format(urlFormat, projectId, datasetId, accessToken));
  HttpRequestFactory requestFactory = HTTP_TRANSPORT.createRequestFactory();
  HttpRequest request = requestFactory.buildPostRequest(url, content);
  request.setParser(JSON_FACTORY.createJsonObjectParser());

  // Workaround for transports which do not support PATCH requests.
  // See: http://stackoverflow.com/a/32503192/101923
  request.setHeaders(new HttpHeaders().set("X-HTTP-Method-Override", "PATCH"));
  HttpResponse response = request.execute();

  // Check for errors.
  if (response.getStatusCode() != 200) {
    throw new RuntimeException(response.getStatusMessage());
  }

  Dataset responseDataset = response.parseAs(Dataset.class);
  System.out.printf(
      "Updated label \"%s\" with value \"%s\"\n",
      labelKey, responseDataset.getLabels().get(labelKey));
}

Python

This sample uses the Google Auth Library for Python to send a request to the BigQuery API with an AuthorizedSession which is compatible with a Requests session.
def label_dataset(dataset_id, label_key, label_value, project_id=None):
    """Add or modify a label on a dataset."""
    # Authenticate requests using Google Application Default credentials.
    credentials, default_project_id = google.auth.default(
            scopes=['https://www.googleapis.com/auth/bigquery'])
    session = google.auth.transport.requests.AuthorizedSession(credentials)

    if project_id is None:
        project_id = default_project_id

    # Send a PATCH request to add or modify a label.
    url_format = (
        'https://www.googleapis.com/bigquery/v2/'
        'projects/{project_id}/datasets/{dataset_id}')
    response = session.patch(
        url_format.format(project_id=project_id, dataset_id=dataset_id),
        params={'fields': 'labels'},
        json={
            'labels': {
                label_key: label_value,
            }
        })

    # Check the response for errors.
    response.raise_for_status()

    # Print the new label value from the response.
    labels = response.json()['labels']
    print(
        'Updated label "{}" with value "{}"'.format(
            label_key,
            labels[label_key]))

Deleting a label

To delete a label from a dataset:

Command-line

Use the bq update command with the clear_label flag.

  • bq update --clear_label [KEY1] --clear_label [KEY2] --clear_label [KEYN] [DATASET_ID]

Where [KEY1], [KEY2], and [KEYN] are label keys that you want to delete, and [DATASET_ID] is a valid dataset ID.

For example, to delete a label that tracks departments, specify the department key:

  • bq update --clear_label department [DATASET_ID]

API

To remove all labels from a dataset:

  1. Remove the labels field from the dataset's configuration object.

    {
      "kind": "bigquery#dataset",
      ...
      "friendlyName": string,
      "description": string,
       ...
    }
    
  2. Call Datasets.update().

To delete specific labels from a dataset and otherwise preserve the other existing labels:

  1. Change the value of the labels you want to delete to null:

    {
      "kind": "bigquery#dataset",
      ...
      "friendlyName": string,
      "description": string,
      "labels": {
        "[KEY1]": null,
        "[KEY2]": null
      }
       ...
    }
    

    Where [KEY1] and [KEY2] are label keys that you want to delete.

  2. Call Datasets.patch().

If there are concurrent calls to update() and delete(), BigQuery executes the last one received.

Deleting a label's value

A label that has a key but an empty value can be used as a tag. To delete a label's value, specify a label with an empty value:

Command-line

Use the bq update command with the set_label flag. Specify the key, followed by a colon, but leave the value unspecified.

  • bq update --set_label [KEY1]: --set_label [KEY2]: --set_label [KEYN]: [DATASET_ID]

Where [KEY1], [KEY2], and [KEYN] are label keys that you want to keep without values, and [DATASET_ID] is a valid dataset ID.

For example, to delete the value associated with the department key, specify the department key with an empty value:

  • bq update --set_label department: [DATASET_ID]

API

To delete a label's value, but retain the label's key:

  1. Change the value of the labels you want to delete to the empty string (""):

    {
      "kind": "bigquery#dataset",
      ...
      "friendlyName": string,
      "description": string,
      "labels": {
        "[KEY1]": "",
        "[KEY2]": ""
      }
       ...
    }
    

    Where [KEY1] and [KEY2] are label keys that you want to keep, but with no values attached to them.

  2. Call Datasets.patch().

If there are concurrent calls to update() and delete(), BigQuery executes the last one received.

Viewing a dataset's labels

To view a dataset's labels:

Command-line

Use the bq show command with the dataset ID.

  • bq show [DATASET_ID]

The command-line tool returns the labels associated with that dataset.

API

Call the Datasets.get() command. The response includes all labels associated with that dataset.

Alternately, you can use Datasets.list() to view the labels for multiple datasets. You can also use filters to return a group of datasets that share common labels.

Filtering datasets using labels

To filter for a set of datasets based on labels, create a filter specification for use in the bq command-line tool or the BigQuery API that uses the following syntax:

field[:value][ field[:value]...

Where:

  • field is expressed as labels.[KEY] where [KEY] is a label key.
  • value is an optional label value.

The filter specification has the following characteristics:

  • Only the AND logical operator is supported. Space-separated comparisons are treated as having implicit AND operators.
  • The only field currently eligible for filtering is "labels.key" where "key" is the name of a label.
  • The filter can include up to ten expressions.
  • Filtering is case-sensitive.

Filter specification examples

To search for all datasets that have a department key with a value of shipping, use the following specification:

labels.department:shipping

To search for multiple key:value pairs, separate them with a space. For example, to search for all datasets where the value of the department key is shipping and the value of the location key is usa, use the following specification:

labels.department:shipping labels.location:usa

You can filter on the presence of a key alone, rather than matching against a key:value pair. The following filter specification returns all datasets with labels named department, without regard to the values that might be associated with the keys:

labels.department

An equivalent specification uses an asterisk to represent all possible values associated with the department key:

labels.department:*

Generating filtered lists of datasets

To generate a filtered list of datasets:

Command-line

Use the bq ls command with the filter flag where [FILTER_SPECIFICATION] is a valid filter specification:

  • bq ls --filter "[FILTER_SPECIFICATION]"

The command-line tool returns a list of datasets that meet the filter requirements.

For example, the following command returns a list of datasets that have a department key with a value of shipping:

  • bq ls --filter "labels.department:shipping"

API

  1. Add a filter property that specifies the filter you want to use:

    {
      "kind": "bigquery#dataset",
      ...
      "friendlyName": string,
      "description": string,
      "filter": "[FILTER_SPECIFICATION]",
       ...
    }
    

    Where [FILTER_SPECIFICATION] is a valid filter specification.

  2. Call the Datasets.list() command. The response includes all labels associated with that dataset.

What's next

Send feedback about...

BigQuery Documentation