Getting dataset metadata using INFORMATION_SCHEMA

INFORMATION_SCHEMA contains these views for dataset metadata:

  • SCHEMATA
  • SCHEMATA_OPTIONS

Before you begin

Grant Identity and Access Management (IAM) roles that give users the necessary permissions to perform each task in this document.

Required permissions

To get SCHEMATA and SCHEMATA_OPTIONS views for dataset metadata, you need the bigquery.datasets.get IAM permission at the project level.

Each of the following predefined IAM roles includes the permissions that you need in order to get SCHEMATA and SCHEMATA_OPTIONS views:

  • roles/bigquery.admin
  • roles/bigquery.dataEditor
  • roles/bigquery.dataOwner
  • roles/bigquery.dataViewer

For more information about granular BigQuery permissions, see roles and permissions.

Syntax

Queries against any of these views should include a region qualifier.

If you do not specify a regional qualifier, metadata is retrieved from the us multi-region.

-- Returns metadata for datasets in a region.
SELECT * FROM region-us.INFORMATION_SCHEMA.SCHEMATA;

SCHEMATA view

When you query the INFORMATION_SCHEMA.SCHEMATA view, the query results contain one row for each dataset in a project to which the current user has access.

The INFORMATION_SCHEMA.SCHEMATA view has the following schema:

Column name Data type Value
CATALOG_NAME STRING The name of the project that contains the dataset
SCHEMA_NAME STRING The dataset's name also referred to as the datasetId
SCHEMA_OWNER STRING The value is always NULL
CREATION_TIME TIMESTAMP The dataset's creation time
LAST_MODIFIED_TIME TIMESTAMP The dataset's last modified time
LOCATION STRING The dataset's geographic location

Examples

The following example retrieves all columns from the INFORMATION_SCHEMA.SCHEMATA view except for schema_owner which is reserved for future use. The metadata returned is for all datasets in the default project — myproject.

To run the query against a project other than your default project, add the project ID to the dataset in the following format: `project_id`.INFORMATION_SCHEMA.view for example, `myproject`.INFORMATION_SCHEMA.SCHEMATA.

To run the query:

Console

  1. Open the BigQuery page in the Cloud Console.

    Go to the BigQuery page

  2. Enter the following standard SQL query in the Query editor box. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the Cloud Console.

    SELECT
     * EXCEPT(schema_owner)
    FROM
     INFORMATION_SCHEMA.SCHEMATA
    
  3. Click Run.

bq

Use the query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag. Standard SQL syntax is required for INFORMATION_SCHEMA queries.

To run the query, enter:

bq query --nouse_legacy_sql \
'SELECT
   * EXCEPT(schema_owner)
 FROM
   INFORMATION_SCHEMA.SCHEMATA'

The results should look like the following:

  +----------------+---------------+---------------------+---------------------+------------+
  |  catalog_name  |  schema_name  |    creation_time    | last_modified_time  |  location  |
  +----------------+---------------+---------------------+---------------------+------------+
  | myproject      | mydataset1    | 2018-11-07 19:50:24 | 2018-11-07 19:50:24 | US         |
  | myproject      | mydataset2    | 2018-07-16 04:24:22 | 2018-07-16 04:24:22 | US         |
  | myproject      | mydataset3    | 2018-02-07 21:08:45 | 2018-05-01 23:32:53 | US         |
  +----------------+---------------+---------------------+---------------------+------------+
  

SCHEMATA_OPTIONS view

When you query the INFORMATION_SCHEMA.SCHEMATA_OPTIONS view, the query results contain one row for each option of each dataset in a project.

The INFORMATION_SCHEMA.SCHEMATA_OPTIONS view has the following schema:

Column name Data type Value
CATALOG_NAME STRING The name of the project that contains the dataset
SCHEMA_NAME STRING The dataset's name also referred to as the datasetId
OPTION_NAME STRING One of the name values in the options table
OPTION_TYPE STRING One of the data type values in the options table
OPTION_VALUE STRING One of the value options in the options table

Options table
OPTION_NAME OPTION_TYPE OPTION_VALUE
default_table_expiration_days FLOAT64 The default lifetime, in days, of all tables in the dataset
friendly_name STRING The dataset's descriptive name
description STRING A description of the dataset
labels ARRAY<STRUCT<STRING, STRING>> An array of STRUCT's that represent the labels on the dataset

Examples

Example 1:

The following example retrieves the default table expiration times for all datasets in your default project (myproject) by querying the INFORMATION_SCHEMA.SCHEMATA_OPTIONS view.

To run the query against a project other than your default project, add the project ID to the dataset in the following format: `project_id`.INFORMATION_SCHEMA.view for example, `myproject`.INFORMATION_SCHEMA.SCHEMATA_OPTIONS.

To run the query:

Console

  1. Open the BigQuery page in the Cloud Console.

    Go to the BigQuery page

  2. Enter the following standard SQL query in the Query editor box. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the Cloud Console.

    SELECT
     *
    FROM
     INFORMATION_SCHEMA.SCHEMATA_OPTIONS
    WHERE
     option_name="default_table_expiration_days"
    
  3. Click Run.

bq

Use the query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag. Standard SQL syntax is required for INFORMATION_SCHEMA queries.

To run the query, enter:

bq query --nouse_legacy_sql \
'SELECT
   *
 FROM
   INFORMATION_SCHEMA.SCHEMATA_OPTIONS
 WHERE
   option_name="default_table_expiration_days"'

The results should look like the following:

  +----------------+---------------+-------------------------------+-------------+---------------------+
  |  catalog_name  |  schema_name  |          option_name          | option_type |    option_value     |
  +----------------+---------------+-------------------------------+-------------+---------------------+
  | myproject      | mydataset3    | default_table_expiration_days | FLOAT64     | 0.08333333333333333 |
  | myproject      | mydataset2    | default_table_expiration_days | FLOAT64     | 90.0                |
  | myproject      | mydataset1    | default_table_expiration_days | FLOAT64     | 30.0                |
  +----------------+---------------+-------------------------------+-------------+---------------------+
  

Example 2:

The following example retrieves the labels for all datasets in your default project (myproject) by querying the INFORMATION_SCHEMA.SCHEMATA_OPTIONS view.

To run the query against a project other than your default project, add the project ID to the dataset in the following format: `project_id`.INFORMATION_SCHEMA.view; for example, `myproject`.INFORMATION_SCHEMA.SCHEMATA_OPTIONS.

To run the query:

Console

  1. Open the BigQuery page in the Cloud Console.

    Go to the BigQuery page

  2. Enter the following standard SQL query in the Query editor box. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the Cloud Console.

    SELECT
     *
    FROM
     INFORMATION_SCHEMA.SCHEMATA_OPTIONS
    WHERE
     option_name="labels"
    
  3. Click Run.

bq

Use the query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag. Standard SQL syntax is required for INFORMATION_SCHEMA queries.

To run the query, enter:

bq query --nouse_legacy_sql \
'SELECT
   *
 FROM
   INFORMATION_SCHEMA.SCHEMATA_OPTIONS
 WHERE
   option_name="labels"'

The results should look like the following:

  +----------------+---------------+-------------+---------------------------------+------------------------+
  |  catalog_name  |  schema_name  | option_name |          option_type            |      option_value      |
  +----------------+---------------+-------------+---------------------------------+------------------------+
  | myproject      | mydataset1    | labels      | ARRAY<STRUCT<STRING, STRING>>   | [STRUCT("org", "dev")] |
  | myproject      | mydataset2    | labels      | ARRAY<STRUCT<STRING, STRING>>   | [STRUCT("org", "dev")] |
  +----------------+---------------+-------------+---------------------------------+------------------------+