Getting information about datasets
This document describes how to get information or metadata about datasets in BigQuery.
You can get dataset information in the following ways:
- Using the Cloud console.
- Using the
bq show
command in thebq
command-line tool. - Calling the
datasets.get
API method. - Querying the
INFORMATION_SCHEMA
views (beta). - Using the client libraries.
Before you begin
Grant Identity and Access Management (IAM) roles that give users the necessary permissions to perform each task in this document.
Required permissions
To get information or metadata about a dataset, you need the bigquery.datasets.get
IAM permission.
Each of the following predefined IAM roles includes the permissions that you need in order to get information or metadata about a dataset:
roles/bigquery.user
roles/bigquery.metadataViewer
roles/bigquery.dataViewer
roles/bigquery.dataOwner
roles/bigquery.dataEditor
roles/bigquery.admin
For more information on IAM roles and permissions in BigQuery, see Access control.
Getting dataset information
To get information about datasets in a project:
Console
In the Explorer panel, expand your project and select a dataset.
Expand the
Actions option and click Open. The description and details appear in the details panel. The tables for a dataset are listed with the dataset name in the Explorer panel.
By default, anonymous datasets
are hidden from the Cloud console. To show information about
anonymous datasets, use the bq
command-line tool or the API.
bq
Issue the bq show
command. The --format
flag can be used to control the
output. If you are getting information about a dataset in a project other
than your default project, add the project ID to the dataset name in the
following format: project_id:dataset
.
To show information about an anonymous dataset,
use the bq ls --all
command to list all datasets and then use the name of the anonymous dataset
in the bq show
command.
bq show --format=prettyjson project_id:dataset
Replace the following:
- project_id is the name of your project.
- dataset is the name of the dataset.
Examples:
Enter the following command to display information about mydataset
in your
default project.
bq show --format=prettyjson mydataset
Enter the following command to display information about mydataset
in
myotherproject
.
bq show --format=prettyjson myotherproject:mydataset
Enter the following command to display information about an anonymous dataset,
_1234abcd56efgh78ijkl1234
in your default project.
bq show --format=prettyjson _1234abcd56efgh78ijkl1234
API
Call the datasets.get
API method and provide any relevant parameters.
Go
Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation.
Java
Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.
Node.js
Before trying this sample, follow the Node.js setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Node.js API reference documentation.
Python
Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.
INFORMATION_SCHEMA
INFORMATION_SCHEMA
is a series of views that provide access to metadata
about datasets, routines, tables, views, jobs, reservations, and streaming data.
SCHEMATA
view
When you query the INFORMATION_SCHEMA.SCHEMATA
view, the query results contain
one row for each dataset in a project to which the current user has access.
The INFORMATION_SCHEMA.SCHEMATA
view has the following schema:
Column name | Data type | Value |
---|---|---|
CATALOG_NAME |
STRING |
The name of the project that contains the dataset |
SCHEMA_NAME |
STRING |
The dataset's name also referred to as the datasetId |
SCHEMA_OWNER |
STRING |
The value is always NULL |
CREATION_TIME |
TIMESTAMP |
The dataset's creation time |
LAST_MODIFIED_TIME |
TIMESTAMP |
The dataset's last modified time |
LOCATION |
STRING |
The dataset's geographic location |
DDL |
STRING |
The CREATE SCHEMA
DDL statement that can be used to create the dataset |
DEFAULT_COLLATION_NAME |
STRING |
The name of the default collation specification
if it exists; otherwise, NULL .
|
Examples
The following example retrieves all columns from the
INFORMATION_SCHEMA.SCHEMATA
view except for schema_owner
which is reserved
for future use. The metadata returned is for all datasets in the default project
— myproject
.
To run the query against a project other than your default project, add the
project ID to the dataset in the following format:
`project_id`.INFORMATION_SCHEMA.view
for example, `myproject`.INFORMATION_SCHEMA.SCHEMATA
.
To run the query:
Console
Open the BigQuery page in the Cloud console.
Enter the following standard SQL query in the Query editor box.
INFORMATION_SCHEMA
requires standard SQL syntax. Standard SQL is the default syntax in the Cloud console.SELECT * EXCEPT(schema_owner) FROM INFORMATION_SCHEMA.SCHEMATA
Click Run.
bq
Use the query
command and specify standard SQL syntax by using the
--nouse_legacy_sql
or --use_legacy_sql=false
flag. Standard SQL syntax
is required for INFORMATION_SCHEMA
queries.
To run the query, enter:
bq query --nouse_legacy_sql \ 'SELECT * EXCEPT(schema_owner) FROM INFORMATION_SCHEMA.SCHEMATA'
The results should look like the following. For readability, some columns are excluded from the results.
+----------------+---------------+---------------------+---------------------+------------+------------------------------------------+ | catalog_name | schema_name | creation_time | last_modified_time | location | ddl | +----------------+---------------+---------------------+---------------------+------------+------------------------------------------+ | myproject | mydataset1 | 2018-11-07 19:50:24 | 2018-11-07 19:50:24 | US | CREATE SCHEMA `myproject.mydataset1` | | | | | | | OPTIONS( | | | | | | | location="us" | | | | | | | ); | +----------------+---------------+---------------------+---------------------+------------+------------------------------------------+ | myproject | mydataset2 | 2018-07-16 04:24:22 | 2018-07-16 04:24:22 | US | CREATE SCHEMA `myproject.mydataset2` | | | | | | | OPTIONS( | | | | | | | default_partition_expiration_days=3.0, | | | | | | | location="us" | | | | | | | ); | +----------------+---------------+---------------------+---------------------+------------+------------------------------------------+ | myproject | mydataset3 | 2018-02-07 21:08:45 | 2018-05-01 23:32:53 | US | CREATE SCHEMA `myproject.mydataset3` | | | | | | | OPTIONS( | | | | | | | description="My dataset", | | | | | | | location="us" | | | | | | | ); | +----------------+---------------+---------------------+---------------------+------------+------------------------------------------+
SCHEMATA_OPTIONS
view
When you query the INFORMATION_SCHEMA.SCHEMATA_OPTIONS
view, the query results
contain one row for each option of each dataset in a project.
The INFORMATION_SCHEMA.SCHEMATA_OPTIONS
view has the following schema:
Column name | Data type | Value |
---|---|---|
CATALOG_NAME |
STRING |
The name of the project that contains the dataset |
SCHEMA_NAME |
STRING |
The dataset's name also referred to as the datasetId |
OPTION_NAME |
STRING |
One of the name values in the options table |
OPTION_TYPE |
STRING |
One of the data type values in the options table |
OPTION_VALUE |
STRING |
One of the value options in the options table |
Options table
OPTION_NAME |
OPTION_TYPE |
OPTION_VALUE |
---|---|---|
default_table_expiration_days |
FLOAT64 |
The default lifetime, in days, of all tables in the dataset |
friendly_name |
STRING |
The dataset's descriptive name |
description |
STRING |
A description of the dataset |
labels |
ARRAY<STRUCT<STRING, STRING>> |
An array of STRUCT 's that represent the labels on the
dataset |
Examples
Example 1:
The following example retrieves the default table expiration times for all
datasets in your default project (myproject
) by querying the
INFORMATION_SCHEMA.SCHEMATA_OPTIONS
view.
To run the query against a project other than your default project, add the
project ID to the dataset in the following format:
`project_id`.INFORMATION_SCHEMA.view
for example, `myproject`.INFORMATION_SCHEMA.SCHEMATA_OPTIONS
.
To run the query:
Console
Open the BigQuery page in the Cloud console.
Enter the following standard SQL query in the Query editor box.
INFORMATION_SCHEMA
requires standard SQL syntax. Standard SQL is the default syntax in the Cloud console.SELECT * FROM INFORMATION_SCHEMA.SCHEMATA_OPTIONS WHERE option_name="default_table_expiration_days"
Click Run.
bq
Use the query
command and specify standard SQL syntax by using the
--nouse_legacy_sql
or --use_legacy_sql=false
flag. Standard SQL syntax
is required for INFORMATION_SCHEMA
queries.
To run the query, enter:
bq query --nouse_legacy_sql \ 'SELECT * FROM INFORMATION_SCHEMA.SCHEMATA_OPTIONS WHERE option_name="default_table_expiration_days"'
The results should look like the following:
+----------------+---------------+-------------------------------+-------------+---------------------+ | catalog_name | schema_name | option_name | option_type | option_value | +----------------+---------------+-------------------------------+-------------+---------------------+ | myproject | mydataset3 | default_table_expiration_days | FLOAT64 | 0.08333333333333333 | | myproject | mydataset2 | default_table_expiration_days | FLOAT64 | 90.0 | | myproject | mydataset1 | default_table_expiration_days | FLOAT64 | 30.0 | +----------------+---------------+-------------------------------+-------------+---------------------+
Example 2:
The following example retrieves the labels for all
datasets in your default project (myproject
) by querying the
INFORMATION_SCHEMA.SCHEMATA_OPTIONS
view.
To run the query against a project other than your default project, add the
project ID to the dataset in the following format:
`project_id`.INFORMATION_SCHEMA.view
;
for example, `myproject`.INFORMATION_SCHEMA.SCHEMATA_OPTIONS
.
To run the query:
Console
Open the BigQuery page in the Cloud console.
Enter the following standard SQL query in the Query editor box.
INFORMATION_SCHEMA
requires standard SQL syntax. Standard SQL is the default syntax in the Cloud console.SELECT * FROM INFORMATION_SCHEMA.SCHEMATA_OPTIONS WHERE option_name="labels"
Click Run.
bq
Use the query
command and specify standard SQL syntax by using the
--nouse_legacy_sql
or --use_legacy_sql=false
flag. Standard SQL syntax
is required for INFORMATION_SCHEMA
queries.
To run the query, enter:
bq query --nouse_legacy_sql \ 'SELECT * FROM INFORMATION_SCHEMA.SCHEMATA_OPTIONS WHERE option_name="labels"'
The results should look like the following:
+----------------+---------------+-------------+---------------------------------+------------------------+ | catalog_name | schema_name | option_name | option_type | option_value | +----------------+---------------+-------------+---------------------------------+------------------------+ | myproject | mydataset1 | labels | ARRAY<STRUCT<STRING, STRING>> | [STRUCT("org", "dev")] | | myproject | mydataset2 | labels | ARRAY<STRUCT<STRING, STRING>> | [STRUCT("org", "dev")] | +----------------+---------------+-------------+---------------------------------+------------------------+
Dataset security
To control access to datasets in BigQuery, see Controlling access to datasets. For information about data encryption, see Encryption at rest.
Next steps
- For more information on creating datasets, see Creating datasets.
- For more information on assigning access controls to datasets, see Controlling access to datasets.
- For more information on listing datasets in a project, see Listing datasets.
- For more information on changing dataset properties, see Updating datasets.
- For more information on creating and managing labels, see Creating and managing labels.
- To see an overview of
INFORMATION_SCHEMA
, go to Introduction to BigQueryINFORMATION_SCHEMA
.