Getting information about datasets

This document describes how to get information or metadata about datasets in BigQuery.

You can get dataset information by:

  • Using the GCP Console or the classic BigQuery web UI
  • Using the bq show CLI command
  • Calling the datasets.get API method
  • Querying the INFORMATION_SCHEMA views (beta)
  • Using the client libraries

Required permissions

At a minimum, to get information or metadata about a dataset, you must be granted bigquery.datasets.get permissions. The following predefined Cloud IAM roles include bigquery.datasets.get permissions:

  • bigquery.user
  • bigquery.metadataViewer
  • bigquery.dataViewer
  • bigquery.dataOwner
  • bigquery.dataEditor
  • bigquery.admin

For more information on Cloud IAM roles and permissions in BigQuery, see Access control.

Getting dataset information

To get information about datasets in a project:

Console

Click the dataset name in the Resources panel. Below the Query editor you should see dataset's description and details. The tables for a dataset are nested below it in the Resources panel.

View dataset

By default, anonymous datasets are hidden from the BigQuery web UI. To show information about anonymous datasets, use the CLI or API.

Classic UI

Click the dataset name. The Dataset Details page displays the dataset's description, details, and tables.

View dataset

By default, anonymous datasets are hidden from the BigQuery web UI. To show information about anonymous datasets, use the CLI or API.

CLI

Issue the bq show command. The --format flag can be used to control the output. If you are getting information about a dataset in a project other than your default project, add the project ID to the dataset name in the following format: project_id:dataset.

To show information about an anonymous dataset, use the bq ls --all command to list all datasets and then use the name of the anonymous dataset in the bq show command.

bq show --format=prettyjson project_id:dataset

Where:

  • project_id is the name of your project.
  • dataset is the name of the dataset.

Examples:

Enter the following command to display information about mydataset in your default project.

bq show --format=prettyjson mydataset

Enter the following command to display information about mydataset in myotherproject.

bq show --format=prettyjson myotherproject:mydataset

Enter the following command to display information about an anonymous dataset, _1234abcd56efgh78ijkl1234 in your default project.

bq show --format=prettyjson _1234abcd56efgh78ijkl1234

API

Call the datasets.get API method and provide any relevant parameters.

Go

Before trying this sample, follow the Go setup instructions in the BigQuery Quickstart Using Client Libraries . For more information, see the BigQuery Go API reference documentation .

// To run this sample, you will need to create (or reuse) a context and
// an instance of the bigquery client.  For example:
// import "cloud.google.com/go/bigquery"
// ctx := context.Background()
// client, err := bigquery.NewClient(ctx, "your-project-id")
meta, err := client.Dataset(datasetID).Metadata(ctx)
if err != nil {
	return err
}

fmt.Printf("Dataset ID: %s\n", datasetID)
fmt.Printf("Description: %s\n", meta.Description)
fmt.Println("Labels:")
for k, v := range meta.Labels {
	fmt.Printf("\t%s: %s", k, v)
}
fmt.Println("Tables:")
it := client.Dataset(datasetID).Tables(ctx)

cnt := 0
for {
	t, err := it.Next()
	if err == iterator.Done {
		break
	}
	cnt++
	fmt.Printf("\t%s\n", t.TableID)
}
if cnt == 0 {
	fmt.Println("\tThis dataset does not contain any tables.")
}

Java

Before trying this sample, follow the Java setup instructions in the BigQuery Quickstart Using Client Libraries . For more information, see the BigQuery Java API reference documentation .

DatasetId datasetId = DatasetId.of(projectId, datasetName);
Dataset dataset = bigquery.getDataset(datasetId);

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries . For more information, see the BigQuery Python API reference documentation .

from google.cloud import bigquery

# TODO(developer): Construct a BigQuery client object.
# client = bigquery.Client()

# TODO(developer): Set dataset_id to the ID of the dataset to fetch.
# dataset_id = 'your-project.your_dataset'

dataset = client.get_dataset(dataset_id)

full_dataset_id = "{}.{}".format(dataset.project, dataset.dataset_id)
friendly_name = dataset.friendly_name
print(
    "Got dataset '{}' with friendly_name '{}'.".format(
        full_dataset_id, friendly_name
    )
)

# View dataset properties
print("Description: {}".format(dataset.description))
print("Labels:")
labels = dataset.labels
if labels:
    for label, value in labels.items():
        print("\t{}: {}".format(label, value))
else:
    print("\tDataset has no labels defined.")

# View tables in dataset
print("Tables:")
tables = list(client.list_tables(dataset))  # API request(s)
if tables:
    for table in tables:
        print("\t{}".format(table.table_id))
else:
    print("\tThis dataset does not contain any tables.")

INFORMATION_SCHEMA (beta)

INFORMATION_SCHEMA is a series of views that provide access to metadata about datasets, tables, and views.

SCHEMATA view

When you query the INFORMATION_SCHEMA.SCHEMATA view, the query results contain one row for each dataset in a project to which the current user has access.

The INFORMATION_SCHEMA.SCHEMATA view has the following schema:

Column name Data type Value
CATALOG_NAME STRING The name of the project that contains the dataset
SCHEMA_NAME STRING The dataset's name also referred to as the datasetId
SCHEMA_OWNER STRING The value is always NULL
CREATION_TIME TIMESTAMP The dataset's creation time
LAST_MODIFIED_TIME TIMESTAMP The dataset's last modified time
LOCATION STRING The dataset's geographic location

For more information on dataset properties, see the dataset resource page in the REST API documentation.

Examples

The following example retrieves all columns from the INFORMATION_SCHEMA.SCHEMATA view except for schema_owner which is reserved for future use. The metadata returned is for all datasets in the default project — myproject.

To run the query against a project other than your default project, add the project ID to the dataset in the following format: `project_id`.INFORMATION_SCHEMA.view for example, `myproject`.INFORMATION_SCHEMA.SCHEMATA.

To run the query:

Console

  1. Open the BigQuery web UI in the GCP Console.

    Go to the GCP Console

  2. Enter the following standard SQL query in the Query editor box. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the GCP Console.

    SELECT
     * EXCEPT(schema_owner)
    FROM
     INFORMATION_SCHEMA.SCHEMATA
    
  3. Click Run.

CLI

Use the query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag. Standard SQL syntax is required for INFORMATION_SCHEMA queries.

To run the query, enter:

bq query --nouse_legacy_sql \
'SELECT
   * EXCEPT(schema_owner)
 FROM
   INFORMATION_SCHEMA.SCHEMATA'

The results should look like the following:

  +----------------+---------------+---------------------+---------------------+-----------------+
  |  catalog_name  |  schema_name  |    creation_time    | last_modified_time  |    location     |
  +----------------+---------------+---------------------+---------------------+-----------------+
  | myproject      | mydataset1    | 2018-11-07 19:50:24 | 2018-11-07 19:50:24 | US              |
  | myproject      | mydataset2    | 2018-07-16 04:24:22 | 2018-07-16 04:24:22 | US              |
  | myproject      | mydataset3    | 2018-02-07 21:08:45 | 2018-05-01 23:32:53 | asia-northeast1 |
  +----------------+---------------+---------------------+---------------------+-----------------+
  

SCHEMATA_OPTIONS view

When you query the INFORMATION_SCHEMA.SCHEMATA_OPTIONS view, the query results contain one row for each dataset in a project to which the current user has access.

The INFORMATION_SCHEMA.SCHEMATA_OPTIONS view has the following schema:

Column name Data type Value
CATALOG_NAME STRING The name of the project that contains the dataset
SCHEMA_NAME STRING The dataset's name also referred to as the datasetId
OPTION_NAME STRING One of the name values in the options table
OPTION_TYPE STRING One of the data type values in the options table
OPTION_VALUE STRING One of the value options in the options table

Options table
OPTION_NAME OPTION_TYPE OPTION_VALUE
default_table_expiration_days FLOAT64 The default lifetime, in days, of all tables in the dataset
friendly_name STRING The dataset's descriptive name
description STRING A description of the dataset
labels ARRAY<STRUCT<STRING, STRING>> An array of STRUCT's that represent the labels on the dataset

For more information on dataset properties, see the dataset resource page in the REST API documentation.

Examples

Example 1:

The following example retrieves the default table expiration times for all datasets in your default project (myproject) by querying the INFORMATION_SCHEMATA.SCHEMATA_OPTIONS view.

To run the query against a project other than your default project, add the project ID to the dataset in the following format: `project_id`.INFORMATION_SCHEMA.view for example, `myproject`.INFORMATION_SCHEMA.SCHEMATA_OPTIONS.

To run the query:

Console

  1. Open the BigQuery web UI in the GCP Console.

    Go to the GCP Console

  2. Enter the following standard SQL query in the Query editor box. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the GCP Console.

    SELECT
     *
    FROM
     INFORMATION_SCHEMA.SCHEMATA_OPTIONS
    WHERE
     option_name="default_table_expiration_days"
    
  3. Click Run.

CLI

Use the query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag. Standard SQL syntax is required for INFORMATION_SCHEMA queries.

To run the query, enter:

bq query --nouse_legacy_sql \
'SELECT
   *
 FROM
   INFORMATION_SCHEMA.SCHEMATA_OPTIONS
 WHERE
   option_name="default_table_expiration_days"'

The results should look like the following:

  +----------------+---------------+-------------------------------+-------------+---------------------+
  |  catalog_name  |  schema_name  |          option_name          | option_type |    option_value     |
  +----------------+---------------+-------------------------------+-------------+---------------------+
  | myproject      | mydataset3    | default_table_expiration_days | FLOAT64     | 0.08333333333333333 |
  | myproject      | mydataset2    | default_table_expiration_days | FLOAT64     | 90.0                |
  | myproject      | mydataset1    | default_table_expiration_days | FLOAT64     | 30.0                |
  +----------------+---------------+-------------------------------+-------------+---------------------+
  

Example 2:

The following example retrieves the labels for all datasets in your default project (myproject) by querying the INFORMATION_SCHEMATA.SCHEMATA_OPTIONS view.

To run the query against a project other than your default project, add the project ID to the dataset in the following format: `project_id`.INFORMATION_SCHEMA.view; for example, `myproject`.INFORMATION_SCHEMA.SCHEMATA_OPTIONS.

To run the query:

Console

  1. Open the BigQuery web UI in the GCP Console.

    Go to the GCP Console

  2. Enter the following standard SQL query in the Query editor box. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the GCP Console.

    SELECT
     *
    FROM
     INFORMATION_SCHEMA.SCHEMATA_OPTIONS
    WHERE
     option_name="labels"
    
  3. Click Run.

CLI

Use the query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag. Standard SQL syntax is required for INFORMATION_SCHEMA queries.

To run the query, enter:

bq query --nouse_legacy_sql \
'SELECT
   *
 FROM
   INFORMATION_SCHEMA.SCHEMATA_OPTIONS
 WHERE
   option_name="labels"'

The results should look like the following:

  +----------------+---------------+-------------+---------------------------------+------------------------+
  |  catalog_name  |  schema_name  | option_name |          option_type            |      option_value      |
  +----------------+---------------+-------------+---------------------------------+------------------------+
  | myproject      | mydataset1    | labels      | ARRAY<STRUCT<STRING, STRING>>   | [STRUCT("org", "dev")] |
  | myproject      | mydataset2    | labels      | ARRAY<STRUCT<STRING, STRING>>   | [STRUCT("org", "dev")] |
  +----------------+---------------+-------------+---------------------------------+------------------------+
  

Next steps

Trang này có hữu ích không? Hãy cho chúng tôi biết đánh giá của bạn:

Gửi phản hồi về...

Bạn cần trợ giúp? Truy cập trang hỗ trợ của chúng tôi.