Stay organized with collections Save and categorize content based on your preferences.

Getting information about datasets

This document describes how to get information or metadata about datasets in BigQuery.

You can get dataset information in the following ways:

  • Using the Google Cloud console.
  • Using the bq show command in the bq command-line tool.
  • Calling the datasets.get API method.
  • Querying the INFORMATION_SCHEMA views (beta).
  • Using the client libraries.

Before you begin

Grant Identity and Access Management (IAM) roles that give users the necessary permissions to perform each task in this document.

Required permissions

To get information or metadata about a dataset, you need the bigquery.datasets.get IAM permission.

Each of the following predefined IAM roles includes the permissions that you need in order to get information or metadata about a dataset:

  • roles/bigquery.user
  • roles/bigquery.metadataViewer
  • roles/bigquery.dataViewer
  • roles/bigquery.dataOwner
  • roles/bigquery.dataEditor
  • roles/bigquery.admin

For more information on IAM roles and permissions in BigQuery, see Access control.

Getting dataset information

To get information about datasets in a project:

Console

  1. In the Explorer panel, expand your project and select a dataset.

  2. Expand the Actions option and click Open. The description and details appear in the details panel. The tables for a dataset are listed with the dataset name in the Explorer panel.

By default, anonymous datasets are hidden from the Google Cloud console. To show information about anonymous datasets, use the bq command-line tool or the API.

bq

Issue the bq show command. The --format flag can be used to control the output. If you are getting information about a dataset in a project other than your default project, add the project ID to the dataset name in the following format: project_id:dataset.

To show information about an anonymous dataset, use the bq ls --all command to list all datasets and then use the name of the anonymous dataset in the bq show command.

bq show --format=prettyjson project_id:dataset

Replace the following:

  • project_id is the name of your project.
  • dataset is the name of the dataset.

Examples:

Enter the following command to display information about mydataset in your default project.

bq show --format=prettyjson mydataset

Enter the following command to display information about mydataset in myotherproject.

bq show --format=prettyjson myotherproject:mydataset

Enter the following command to display information about an anonymous dataset, _1234abcd56efgh78ijkl1234 in your default project.

bq show --format=prettyjson _1234abcd56efgh78ijkl1234

API

Call the datasets.get API method and provide any relevant parameters.

Go

Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation.

import (
	"context"
	"fmt"
	"io"

	"cloud.google.com/go/bigquery"
	"google.golang.org/api/iterator"
)

// printDatasetInfo demonstrates fetching dataset metadata and printing some of it to an io.Writer.
func printDatasetInfo(w io.Writer, projectID, datasetID string) error {
	// projectID := "my-project-id"
	// datasetID := "mydataset"
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, projectID)
	if err != nil {
		return fmt.Errorf("bigquery.NewClient: %v", err)
	}
	defer client.Close()

	meta, err := client.Dataset(datasetID).Metadata(ctx)
	if err != nil {
		return err
	}

	fmt.Fprintf(w, "Dataset ID: %s\n", datasetID)
	fmt.Fprintf(w, "Description: %s\n", meta.Description)
	fmt.Fprintln(w, "Labels:")
	for k, v := range meta.Labels {
		fmt.Fprintf(w, "\t%s: %s", k, v)
	}
	fmt.Fprintln(w, "Tables:")
	it := client.Dataset(datasetID).Tables(ctx)

	cnt := 0
	for {
		t, err := it.Next()
		if err == iterator.Done {
			break
		}
		cnt++
		fmt.Fprintf(w, "\t%s\n", t.TableID)
	}
	if cnt == 0 {
		fmt.Fprintln(w, "\tThis dataset does not contain any tables.")
	}
	return nil
}

Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

import com.google.api.gax.paging.Page;
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQuery.TableListOption;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Dataset;
import com.google.cloud.bigquery.DatasetId;
import com.google.cloud.bigquery.Table;

public class GetDatasetInfo {

  public static void runGetDatasetInfo() {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "MY_PROJECT_ID";
    String datasetName = "MY_DATASET_NAME";
    getDatasetInfo(projectId, datasetName);
  }

  public static void getDatasetInfo(String projectId, String datasetName) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
      DatasetId datasetId = DatasetId.of(projectId, datasetName);
      Dataset dataset = bigquery.getDataset(datasetId);

      // View dataset properties
      String description = dataset.getDescription();
      System.out.println(description);

      // View tables in the dataset
      // For more information on listing tables see:
      // https://javadoc.io/static/com.google.cloud/google-cloud-bigquery/0.22.0-beta/com/google/cloud/bigquery/BigQuery.html
      Page<Table> tables = bigquery.listTables(datasetName, TableListOption.pageSize(100));

      tables.iterateAll().forEach(table -> System.out.print(table.getTableId().getTable() + "\n"));

      System.out.println("Dataset info retrieved successfully.");
    } catch (BigQueryException e) {
      System.out.println("Dataset info not retrieved. \n" + e.toString());
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Node.js API reference documentation.

// Import the Google Cloud client library
const {BigQuery} = require('@google-cloud/bigquery');
const bigquery = new BigQuery();

async function getDataset() {
  // Retrieves dataset named "my_dataset".

  /**
   * TODO(developer): Uncomment the following lines before running the sample
   */
  // const datasetId = "my_dataset";

  // Retrieve dataset reference
  const [dataset] = await bigquery.dataset(datasetId).get();

  console.log('Dataset:');
  console.log(dataset.metadata.datasetReference);
}
getDataset();

Python

Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.


from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set dataset_id to the ID of the dataset to fetch.
# dataset_id = 'your-project.your_dataset'

dataset = client.get_dataset(dataset_id)  # Make an API request.

full_dataset_id = "{}.{}".format(dataset.project, dataset.dataset_id)
friendly_name = dataset.friendly_name
print(
    "Got dataset '{}' with friendly_name '{}'.".format(
        full_dataset_id, friendly_name
    )
)

# View dataset properties.
print("Description: {}".format(dataset.description))
print("Labels:")
labels = dataset.labels
if labels:
    for label, value in labels.items():
        print("\t{}: {}".format(label, value))
else:
    print("\tDataset has no labels defined.")

# View tables in dataset.
print("Tables:")
tables = list(client.list_tables(dataset))  # Make an API request(s).
if tables:
    for table in tables:
        print("\t{}".format(table.table_id))
else:
    print("\tThis dataset does not contain any tables.")

INFORMATION_SCHEMA

INFORMATION_SCHEMA is a series of views that provide access to metadata about datasets, routines, tables, views, jobs, reservations, and streaming data.

SCHEMATA view

When you query the INFORMATION_SCHEMA.SCHEMATA view, the query results contain one row for each dataset in a project to which the current user has access.

The INFORMATION_SCHEMA.SCHEMATA view has the following schema:

Column name Data type Value
CATALOG_NAME STRING The name of the project that contains the dataset
SCHEMA_NAME STRING The dataset's name also referred to as the datasetId
SCHEMA_OWNER STRING The value is always NULL
CREATION_TIME TIMESTAMP The dataset's creation time
LAST_MODIFIED_TIME TIMESTAMP The dataset's last modified time
LOCATION STRING The dataset's geographic location
DDL STRING The CREATE SCHEMA DDL statement that can be used to create the dataset
DEFAULT_COLLATION_NAME STRING The name of the default collation specification if it exists; otherwise, NULL.

Examples

To run the query against a project other than your default project, add the project ID to the dataset in the following format:

`PROJECT_ID`.INFORMATION_SCHEMA.SCHEMATA
for example, `myproject`.INFORMATION_SCHEMA.SCHEMATA.

SELECT
  * EXCEPT (schema_owner)
FROM
  INFORMATION_SCHEMA.SCHEMATA;

The result is similar to the following. For readability, some columns are excluded from the result.

+----------------+---------------+---------------------+---------------------+------------+------------------------------------------+
|  catalog_name  |  schema_name  |    creation_time    | last_modified_time  |  location  |                   ddl                    |
+----------------+---------------+---------------------+---------------------+------------+------------------------------------------+
| myproject      | mydataset1    | 2018-11-07 19:50:24 | 2018-11-07 19:50:24 | US         | CREATE SCHEMA `myproject.mydataset1`     |
|                |               |                     |                     |            | OPTIONS(                                 |
|                |               |                     |                     |            |   location="us"                          |
|                |               |                     |                     |            | );                                       |
+----------------+---------------+---------------------+---------------------+------------+------------------------------------------+
| myproject      | mydataset2    | 2018-07-16 04:24:22 | 2018-07-16 04:24:22 | US         | CREATE SCHEMA `myproject.mydataset2`     |
|                |               |                     |                     |            | OPTIONS(                                 |
|                |               |                     |                     |            |   default_partition_expiration_days=3.0, |
|                |               |                     |                     |            |   location="us"                          |
|                |               |                     |                     |            | );                                       |
+----------------+---------------+---------------------+---------------------+------------+------------------------------------------+
| myproject      | mydataset3    | 2018-02-07 21:08:45 | 2018-05-01 23:32:53 | US         | CREATE SCHEMA `myproject.mydataset3`     |
|                |               |                     |                     |            | OPTIONS(                                 |
|                |               |                     |                     |            |   description="My dataset",              |
|                |               |                     |                     |            |   location="us"                          |
|                |               |                     |                     |            | );                                       |
+----------------+---------------+---------------------+---------------------+------------+------------------------------------------+

SCHEMATA_OPTIONS view

When you query the INFORMATION_SCHEMA.SCHEMATA_OPTIONS view, the query results contain one row for each option of each dataset in a project.

The INFORMATION_SCHEMA.SCHEMATA_OPTIONS view has the following schema:

Column name Data type Value
CATALOG_NAME STRING The name of the project that contains the dataset
SCHEMA_NAME STRING The dataset's name also referred to as the datasetId
OPTION_NAME STRING One of the name values in the options table
OPTION_TYPE STRING One of the data type values in the options table
OPTION_VALUE STRING One of the value options in the options table

Options table
OPTION_NAME OPTION_TYPE OPTION_VALUE
default_table_expiration_days FLOAT64 The default lifetime, in days, of all tables in the dataset
default_partition_expiration_days FLOAT64 The default lifetime, in days, of all partitioned tables in the dataset
friendly_name STRING The dataset's descriptive name
description STRING A description of the dataset
labels ARRAY<STRUCT<STRING, STRING>> An array of STRUCT's that represent the labels on the dataset

Examples

Retrieve the default table expiration time for all datasets in your project

To run the query against a project other than your default project, add the project ID to the dataset in the following format:

`PROJECT_ID`.INFORMATION_SCHEMA.SCHEMATA_OPTIONS
for example, `myproject`.INFORMATION_SCHEMA.SCHEMATA_OPTIONS.

SELECT
  *
FROM
  INFORMATION_SCHEMA.SCHEMATA_OPTIONS
WHERE
  option_name = 'default_table_expiration_days';

The result is similar to the following:

  +----------------+---------------+-------------------------------+-------------+---------------------+
  |  catalog_name  |  schema_name  |          option_name          | option_type |    option_value     |
  +----------------+---------------+-------------------------------+-------------+---------------------+
  | myproject      | mydataset3    | default_table_expiration_days | FLOAT64     | 0.08333333333333333 |
  | myproject      | mydataset2    | default_table_expiration_days | FLOAT64     | 90.0                |
  | myproject      | mydataset1    | default_table_expiration_days | FLOAT64     | 30.0                |
  +----------------+---------------+-------------------------------+-------------+---------------------+
  

Retrieve labels for all datasets in your project

To run the query against a project other than your default project, add the project ID to the dataset in the following format:

`PROJECT_ID`.INFORMATION_SCHEMA.SCHEMATA_OPTIONS
; for example, `myproject`.INFORMATION_SCHEMA.SCHEMATA_OPTIONS.

SELECT
  *
FROM
  INFORMATION_SCHEMA.SCHEMATA_OPTIONS
WHERE
  option_name = 'labels';

The result is similar to the following:

  +----------------+---------------+-------------+---------------------------------+------------------------+
  |  catalog_name  |  schema_name  | option_name |          option_type            |      option_value      |
  +----------------+---------------+-------------+---------------------------------+------------------------+
  | myproject      | mydataset1    | labels      | ARRAY<STRUCT<STRING, STRING>>   | [STRUCT("org", "dev")] |
  | myproject      | mydataset2    | labels      | ARRAY<STRUCT<STRING, STRING>>   | [STRUCT("org", "dev")] |
  +----------------+---------------+-------------+---------------------------------+------------------------+
  

Dataset security

To control access to datasets in BigQuery, see Controlling access to datasets. For information about data encryption, see Encryption at rest.

Next steps