Creating and using clustered tables

This document describes how to create and use clustered tables in BigQuery. For an overview of clustered table support in BigQuery, see Introduction to clustered tables.

Limitations

Clustered tables in BigQuery are subject to the following limitations:

  • Currently, clustering is supported only for partitioned tables.
  • Only standard SQL is supported for querying clustered tables and for writing query results to clustered tables.
  • You can only specify clustering columns when a table is created.
  • After a clustered table is created, clustering columns cannot be modified.
  • Clustering columns must be top-level, non-repeated columns of one of the following types: INT64, STRING, DATE, TIMESTAMP, BOOL, NUMERIC, or GEOGRAPHY. For more information on data types, see Standard SQL data types.
  • You can specify up to four clustering columns.

Creating clustered tables

Currently, you can only cluster a partitioned table. This includes both ingestion-time partitioned tables and tables partitioned by a TIMESTAMP or DATE column.

You can create a clustered table in BigQuery:

  • By creating an empty clustered table with a schema definition
  • From a query result
  • When you load data

Each of these methods is described in the sections that follow.

Table naming

When you create a table in BigQuery, the table name must be unique per dataset. The table name can:

  • Contain up to 1,024 characters
  • Contain letters (upper or lower case), numbers, and underscores

Required permissions

At a minimum, to create a table, you must be granted the following permissions:

  • bigquery.tables.create permissions to create the table
  • bigquery.tables.updateData to write data to the table by using a load job, a query job, or a copy job
  • bigquery.jobs.create to run a query job, load job, or copy job that writes data to the table

Additional permissions such as bigquery.tables.getData may be required to access the data you're writing to the table.

The following predefined Cloud IAM roles include both bigquery.tables.create and bigquery.tables.updateData permissions:

  • bigquery.dataEditor
  • bigquery.dataOwner
  • bigquery.admin

The following predefined Cloud IAM roles include bigquery.jobs.create permissions:

  • bigquery.user
  • bigquery.jobUser
  • bigquery.admin

In addition, if a user has bigquery.datasets.create permissions, when that user creates a dataset, they are granted bigquery.dataOwner access to it. bigquery.dataOwner access gives the user the ability to create and update tables in the dataset.

For more information on Cloud IAM roles and permissions in BigQuery, see Predefined roles and permissions.

Creating an empty clustered table with a schema definition

You specify clustering columns when you create a table in BigQuery. After the table is created, you cannot modify the clustering columns. Currently, you can only specify clustering columns for a partitioned table.

Clustering columns must be top-level, non-repeated columns, and they must be one of the following simple data types: INT64, STRING, DATE, TIMESTAMP, BOOL, NUMERIC, or GEOGRAPHY.

You can specify up to four clustering columns. When you specify multiple columns, the order of the columns determines how the data is sorted. For example, if the table is clustered by columns a, b, and c, the data is sorted in that order: first by column a, then by column b, and then by column c. As a best practice, the most frequently filtered or aggregated column should appear first.

The order of your clustering columns also affects query performance and pricing. For more information on query best practices for clustered tables, see Querying clustered tables.
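For example, the following CREATE TABLE DDL statement is a minimal sketch of how a clustering order is declared. It reuses the sample schema from the Go examples later on this page, and the table name is illustrative. Because origin appears first in the CLUSTER BY clause, the data in each partition is sorted first by origin and then by destination:

CREATE TABLE mydataset.myclusteredtable
(
  timestamp TIMESTAMP,
  origin STRING,
  destination STRING,
  amount NUMERIC
)
PARTITION BY
  DATE(timestamp)
CLUSTER BY
  origin, destination

With this ordering, queries that filter on origin (alone or together with destination) follow the best practices referenced above.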

To create an empty clustered table with a schema definition:

Console

  1. Open the BigQuery web UI in the GCP Console.
    Go to the GCP Console

  2. In the navigation panel, in the Resources section, expand your project and select a dataset.

  3. On the right side of the window, in the details panel, click Create table.

    Create table button

  4. On the Create table page, in the Source section, for Create table from, select Empty table.

    Create table from option

  5. In the Destination section:

    • For Dataset name, choose the appropriate dataset, and in the Table name field, enter the name of the table you're creating.
    • Verify that Table type is set to Native table.
  6. In the Schema section, enter the schema definition.

    • Enter schema information manually by:

      • Enabling Edit as text and entering the table schema as a JSON array.

      • Using Add field to manually input the schema.

  7. For Partition and cluster settings, click No partitioning, select Partition by field, and choose the DATE or TIMESTAMP column. This option is not available if the schema does not contain a DATE or TIMESTAMP column.

    To create an ingestion-time partitioned table, click No partitioning and select Partition by ingestion time.

  8. (Optional) For Partitioning filter, click the Require partition filter box to require users to include a WHERE clause that specifies the partitions to query. Requiring a partition filter can reduce cost and improve performance. For more information, see Querying partitioned tables. An example of a query that includes a partition filter appears after these steps.

  9. For Clustering order, enter between one and four comma-separated column names.

  10. (Optional) Click Advanced options and for Encryption, click Customer-managed key to use a Cloud Key Management Service key. If you leave the Google-managed key setting, BigQuery encrypts the data at rest.

  11. Click Create table.
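If you enabled Require partition filter in step 8, every query against the table must include a WHERE clause that filters on the partitioning column. The following query is a minimal sketch of such a filter; it assumes the timestamp, customer_id, and transaction_amount sample schema used in the CLI examples later on this page, and the filter values are hypothetical:

SELECT
  customer_id,
  transaction_amount
FROM
  mydataset.myclusteredtable
WHERE
  -- Hypothetical values; the filter on the partitioning column is required.
  timestamp >= TIMESTAMP("2019-05-01")
  AND timestamp < TIMESTAMP("2019-05-02")
  AND customer_id = "customer_a"

Filtering on the clustering column (customer_id) as well follows the query best practices described in Querying clustered tables.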

Classic UI

  1. Go to the classic BigQuery web UI.

    Go to the classic BigQuery web UI

  2. Click the down arrow icon next to your dataset name in the navigation and click Create new table.

  3. On the Create Table page, in the Source Data section, click Create empty table.

  4. On the Create Table page, in the Destination Table section:

    • For Table name, choose the appropriate dataset, and in the table name field, enter the name of the table you're creating.
    • Verify that Table type is set to Native table.
  5. In the Schema section, manually enter the schema definition.

    • You can enter schema information manually by:

      • Clicking Edit as text and entering the table schema as a JSON array.

      • Using Add Field to input the schema.

  6. In the Options section:

    • For Partitioning Type, click None and choose Day.
    • For Partitioning Field, choose one of the following:
      • Choose timestamp to create a table partitioned by a DATE or TIMESTAMP column.
      • Choose _PARTITIONTIME to create an ingestion-time partitioned table.
    • For Clustering columns, enter between one and four field names.
    • For Destination encryption, leave the Default option. This property is for customer-managed encryption keys. By default, BigQuery encrypts customer content stored at rest.

      Partitioned table details

  7. Click Create Table.

After the table is created, you can update the clustered table's table expiration, description, and labels. Using the BigQuery web UI, you cannot add a partition expiration after the table is created.

CLI

Use the mk command with the following flags:

  • --table (or the -t shortcut).
  • --schema: You can supply the table's schema definition inline or in a JSON schema file.
  • Either --time_partitioning_type (for ingestion-time partitioned tables) or --time_partitioning_field (for tables partitioned by a DATE or TIMESTAMP column). Currently, DAY is the only supported value for --time_partitioning_type.
  • --clustering_fields to specify up to four clustering columns.

Optional parameters include --expiration, --description, --time_partitioning_expiration, --destination_kms_key, and --label.

If you are creating a table in a project other than your default project, add the project ID to the dataset in the following format: project_id:dataset.

--destination_kms_key is not demonstrated here. For more information on using this flag, see customer-managed encryption keys.

Enter the following command to create an empty clustered table with a schema definition:

bq mk \
--table \
--expiration integer1 \
--schema schema \
--time_partitioning_type=DAY \
--time_partitioning_field partition_column \
--clustering_fields cluster_columns \
--time_partitioning_expiration integer2 \
--description "description" \
--label key:value,key:value \
project_id:dataset.table

Where:

  • integer1 is the default lifetime (in seconds) for the table. The minimum value is 3600 seconds (one hour). The expiration time evaluates to the current UTC time plus the integer value. If you set the table's expiration time when you create a time-partitioned table, the dataset's default table expiration setting is ignored. Setting this value deletes the table and all partitions after the specified time.
  • schema is an inline schema definition in the format column:data_type,column:data_type or the path to the JSON schema file on your local machine.
  • partition_column is the name of the TIMESTAMP or DATE column used to create a partitioned table. If you specify --time_partitioning_field, you do not need to specify the --time_partitioning_type=DAY flag.
  • cluster_columns is a comma-separated list of up to four clustering columns.
  • integer2 is the default lifetime (in seconds) for the table's partitions. There is no minimum value. The expiration time evaluates to the partition's date plus the integer value. The partition expiration is independent of the table's expiration but does not override it. If you set a partition expiration that is longer than the table's expiration, the table expiration takes precedence.
  • description is a description of the table in quotes.
  • key:value is the key:value pair that represents a label. You can enter multiple labels using a comma-separated list.
  • project_id is your project ID.
  • dataset is a dataset in your project.
  • table is the name of the partitioned table you're creating.

When you specify the schema on the command line, you cannot include a RECORD (STRUCT) type, you cannot include a column description, and you cannot specify the column's mode. All modes default to NULLABLE. To include descriptions, modes, and RECORD types, supply a JSON schema file instead.

Examples:

Enter the following command to create a clustered table named myclusteredtable in mydataset in your default project. The table is a partitioned table (partitioned by a TIMESTAMP column). The partitioning expiration is set to 86,400 seconds (1 day), the table's expiration is set to 2,592,000 seconds (30 days), the description is set to This is my clustered table, and the label is set to organization:development. The command uses the -t shortcut instead of --table.

The schema is specified inline as: timestamp:timestamp,customer_id:string,transaction_amount:float. The specified clustering field customer_id is used to cluster the partitions.

bq mk -t \
--expiration 2592000 \
--schema 'timestamp:timestamp,customer_id:string,transaction_amount:float' \
--time_partitioning_field timestamp \
--clustering_fields customer_id \
--time_partitioning_expiration 86400  \
--description "This is my clustered table" \
--label organization:development \
mydataset.myclusteredtable

Enter the following command to create a clustered table named myclusteredtable in myotherproject, not your default project. The table is an ingestion-time partitioned table. The partitioning expiration is set to 259,200 seconds (3 days), the description is set to This is my partitioned table, and the label is set to organization:development. The command uses the -t shortcut instead of --table. This command does not specify a table expiration. If the dataset has a default table expiration, it is applied. If the dataset has no default table expiration, the table will never expire, but the partitions will expire in 3 days.

The schema is specified in a local JSON file: /tmp/myschema.json. The customer_id field is used to cluster the partitions.

bq mk -t \
--schema /tmp/myschema.json \
--time_partitioning_type=DAY \
--clustering_fields=customer_id \
--time_partitioning_expiration 259200 \
--description "This is my partitioned table" \
--label organization:development \
myotherproject:mydataset.myclusteredtable

After the table is created, you can update the partitioned table's table expiration, partition expiration, description, and labels.

API

Call the tables.insert method with a defined table resource that specifies the timePartitioning property, the clustering.fields property, and the schema property.

Go

Before trying this sample, follow the Go setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Go API reference documentation.

// To run this sample, you will need to create (or reuse) a context and
// an instance of the bigquery client. For example:
// import "cloud.google.com/go/bigquery"
// import "time" // used for the partition expiration below
// ctx := context.Background()
// client, err := bigquery.NewClient(ctx, "your-project-id")
sampleSchema := bigquery.Schema{
	{Name: "timestamp", Type: bigquery.TimestampFieldType},
	{Name: "origin", Type: bigquery.StringFieldType},
	{Name: "destination", Type: bigquery.StringFieldType},
	{Name: "amount", Type: bigquery.NumericFieldType},
}
metaData := &bigquery.TableMetadata{
	Schema: sampleSchema,
	TimePartitioning: &bigquery.TimePartitioning{
		Field:      "timestamp",
		Expiration: 90 * 24 * time.Hour,
	},
	Clustering: &bigquery.Clustering{
		Fields: []string{"origin", "destination"},
	},
}
tableRef := client.Dataset(datasetID).Table(tableID)
if err := tableRef.Create(ctx, metaData); err != nil {
	return err
}

Creating a clustered table from a query result

You can create a clustered table from a query result by running a query that contains a CREATE TABLE DDL statement with a CLUSTER BY clause. You can query either a partitioned table or a non-partitioned table; you cannot change an existing table to a clustered table by using query results.

When you create a clustered table from a query result, you must use standard SQL. Currently, legacy SQL is not supported for querying clustered tables or for writing query results to clustered tables.

Console

You cannot specify clustering options for a destination table when you query data using the BigQuery web UI in the GCP Console unless you use a DDL statement. For more information, see Using Data Definition Language statements.

Classic UI

You cannot specify clustering options for a destination table when you query data using the classic BigQuery web UI unless you use a DDL statement. For more information, see Using Data Definition Language statements.

CLI

Enter the following command to create a new, clustered destination table from a query result:

bq --location=location query \
--use_legacy_sql=false 'query'

Where:

  • location is the name of your location. The --location flag is optional. For example, if you are using BigQuery in the Tokyo region, you can set the flag's value to asia-northeast1. You can set a default value for the location using the .bigqueryrc file.
  • query is a query in standard SQL syntax. Currently, you cannot use legacy SQL to query clustered tables or to write query results to clustered tables. The query can contain a CREATE TABLE DDL statement that specifies the options for creating your clustered table. You can use DDL rather than specifying the individual command-line flags.

Examples:

Enter the following command to write query results to a clustered destination table named myclusteredtable in mydataset. mydataset is in your default project. The query retrieves data from a non-partitioned table named mytable. The destination table is partitioned by the timestamp column and clustered by the customer_id column.

bq query --use_legacy_sql=false \
'CREATE TABLE
   mydataset.myclusteredtable
 PARTITION BY
   DATE(timestamp)
 CLUSTER BY
   customer_id AS
 SELECT
   *
 FROM
   `mydataset.mytable`'

API

To save query results to a clustered table, call the jobs.insert method, configure a query job, and include a CREATE TABLE DDL statement that creates your clustered table.

Specify your location in the location property in the jobReference section of the job resource.

Creating a clustered table when you load data

You can create a clustered table by specifying clustering columns when you load data into a new table. You do not need to create an empty table before loading data into it. You can create the clustered table and load your data at the same time.

For more information on loading data, see Introduction to loading data into BigQuery.

To specify clustering when you define a load job:

API

To define clustering configuration when creating a table via a load job, you can populate the Clustering properties for the table.

Go

Before trying this sample, follow the Go setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Go API reference documentation.

// To run this sample, you will need to create (or reuse) a context and
// an instance of the bigquery client. For example:
// import "cloud.google.com/go/bigquery"
// import "fmt" // used to wrap the job status error below
// ctx := context.Background()
// client, err := bigquery.NewClient(ctx, "your-project-id")
gcsRef := bigquery.NewGCSReference("gs://cloud-samples-data/bigquery/sample-transactions/transactions.csv")
gcsRef.SkipLeadingRows = 1
gcsRef.Schema = bigquery.Schema{
	{Name: "timestamp", Type: bigquery.TimestampFieldType},
	{Name: "origin", Type: bigquery.StringFieldType},
	{Name: "destination", Type: bigquery.StringFieldType},
	{Name: "amount", Type: bigquery.NumericFieldType},
}
loader := client.Dataset(destDatasetID).Table(destTableID).LoaderFrom(gcsRef)
loader.TimePartitioning = &bigquery.TimePartitioning{
	Field: "timestamp",
}
loader.Clustering = &bigquery.Clustering{
	Fields: []string{"origin", "destination"},
}
loader.WriteDisposition = bigquery.WriteEmpty

job, err := loader.Run(ctx)
if err != nil {
	return err
}
status, err := job.Wait(ctx)
if err != nil {
	return err
}

if status.Err() != nil {
	return fmt.Errorf("Job completed with error: %v", status.Err())
}

Controlling access to clustered tables

You cannot assign access controls directly to tables or views. The lowest level of BigQuery resources to which you are able to grant access is the dataset level. To configure access to tables and views, you grant a Cloud IAM role to an entity at the dataset level or higher.

Granting a role at the dataset level specifies the operations an entity is allowed to perform on tables and views in that specific dataset. For information on configuring dataset-level access controls, see Controlling access to datasets.

You can also grant Cloud IAM roles at a higher level in the Google Cloud Platform resource hierarchy such as the project, folder, or organization level. Granting roles at a higher level gives the entity access to a broader set of resources. For example, granting a role to an entity at the project level gives that entity permissions that apply to all datasets throughout the project. For more information on granting access to resources, see Granting, changing, and revoking access to resources in the Cloud IAM documentation.

You can also create Cloud IAM custom roles. If you create a custom role, the permissions you grant depend on the specific operations you want the entity to be able to perform.

For more information on roles and permissions, see Predefined roles and permissions.

Using clustered tables

Getting information about clustered tables

You can get information about tables by:

  • Using the GCP Console or the classic BigQuery web UI
  • Using the bq show CLI command
  • Calling the tables.get API method
  • Querying the INFORMATION_SCHEMA views (beta)

Required permissions

At a minimum, to get information about tables, you must be granted bigquery.tables.get permissions. The following predefined Cloud IAM roles include bigquery.tables.get permissions:

  • bigquery.metadataViewer
  • bigquery.dataViewer
  • bigquery.dataOwner
  • bigquery.dataEditor
  • bigquery.admin

In addition, if a user has bigquery.datasets.create permissions, when that user creates a dataset, they are granted bigquery.dataOwner access to it. bigquery.dataOwner access gives the user the ability to get information about tables in a dataset.

For more information on Cloud IAM roles and permissions in BigQuery, see Predefined roles and permissions.

Getting clustered table information

To view information about a clustered table:

Console

  1. In the Resources pane, click your dataset name to expand it, then click the table name you want to view.

  2. Click Details. This page displays the table's details including the clustering columns.

    Table details

Classic UI

  1. In the navigation pane, click the down arrow icon to the left of your dataset to expand it, or double-click the dataset name. This displays the tables and views in the dataset.

  2. Click the table name.

  3. Click Details. The Table Details page displays the table's details including the clustering columns.

    Clustered table details

CLI

Issue the bq show command to display all table information. Use the --schema flag to display only table schema information. The --format flag can be used to control the output.

If you are getting information about a table in a project other than your default project, add the project ID to the dataset in the following format: project_id:dataset.

bq show \
--schema \
--format=prettyjson \
project_id:dataset.table

Where:

  • project_id is your project ID.
  • dataset is the name of the dataset.
  • table is the name of the table.

Examples:

Enter the following command to display all information about myclusteredtable in mydataset. mydataset is in your default project.

bq show --format=prettyjson mydataset.myclusteredtable

The output should look like the following:

{
  "clustering": {
    "fields": [
      "customer_id"
    ]
  },
...
}

API

Call the tables.get method and provide any relevant parameters.

Getting clustered table information using INFORMATION_SCHEMA (beta)

INFORMATION_SCHEMA is a series of views that provide access to metadata about datasets, tables, and views.

You can query the INFORMATION_SCHEMA.TABLES and INFORMATION_SCHEMA.TABLE_OPTIONS views to retrieve metadata about tables and views in a project. You can also query the INFORMATION_SCHEMA.COLUMNS and INFORMATION_SCHEMA.COLUMN_FIELD_PATHS views to retrieve metadata about the columns (fields) in a table.

For clustered tables, you can query the CLUSTERING_ORDINAL_POSITION column in the INFORMATION_SCHEMA.COLUMNS view to retrieve information about your clustering columns.
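For example, the following standard SQL query is a sketch of how to retrieve the clustering columns, in clustering order, for the myclusteredtable table created earlier. The view and column names are described in the sections that follow:

SELECT
  column_name,
  clustering_ordinal_position
FROM
  mydataset.INFORMATION_SCHEMA.COLUMNS
WHERE
  table_name="myclusteredtable"
  AND clustering_ordinal_position IS NOT NULL
ORDER BY
  clustering_ordinal_position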

TABLES view

When you query the INFORMATION_SCHEMA.TABLES view, the query results contain one row for each table or view in a dataset.

Queries against the INFORMATION_SCHEMA.TABLES view must have a dataset qualifier. The user submitting the query must have access to the dataset that contains the tables or views.

The INFORMATION_SCHEMA.TABLES view has the following schema:

  • TABLE_CATALOG (STRING): The name of the project that contains the dataset
  • TABLE_SCHEMA (STRING): The name of the dataset that contains the table or view, also referred to as the datasetId
  • TABLE_NAME (STRING): The name of the table or view, also referred to as the tableId
  • TABLE_TYPE (STRING): The table type: BASE TABLE for a table, VIEW for a view
  • IS_INSERTABLE_INTO (STRING): YES or NO depending on whether the table supports DML INSERT statements
  • IS_TYPED (STRING): The value is always NO
  • CREATION_TIME (TIMESTAMP): The table's creation time

For more information on dataset properties, see the dataset resource page in the REST API documentation. For more information on table and view properties, see the table resource page in the REST API documentation.

Examples

Example 1:

The following example retrieves all columns from the INFORMATION_SCHEMA.TABLES view except for is_typed, which is reserved for future use. The metadata returned is for all tables in mydataset in your default project, myproject.

mydataset contains the following tables:

  • mytable1: a standard BigQuery table
  • myview1: a BigQuery view

Queries against the INFORMATION_SCHEMA.TABLES view must have a dataset qualifier. The user submitting the query must have access to the dataset that contains the tables.

To run the query against a project other than your default project, add the project ID to the dataset in the following format: `project_id`.dataset.INFORMATION_SCHEMA.view; for example, `myproject`.mydataset.INFORMATION_SCHEMA.TABLES.

To run the query:

Console

  1. Open the BigQuery web UI in the GCP Console.

    Go to the GCP Console

  2. Enter the following standard SQL query in the Query editor box. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the GCP Console.

    SELECT
     * EXCEPT(is_typed)
    FROM
     mydataset.INFORMATION_SCHEMA.TABLES
    
  3. Click Run.

CLI

Use the query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag. Standard SQL syntax is required for INFORMATION_SCHEMA queries.

To run the query, enter:

bq query --nouse_legacy_sql \
'SELECT
   * EXCEPT(is_typed)
 FROM
   mydataset.INFORMATION_SCHEMA.TABLES'

The results should look like the following:

  +----------------+---------------+----------------+------------+--------------------+---------------------+
  | table_catalog  | table_schema  |   table_name   | table_type | is_insertable_into |    creation_time    |
  +----------------+---------------+----------------+------------+--------------------+---------------------+
  | myproject      | mydataset     | mytable1       | BASE TABLE | YES                | 2018-10-29 20:34:44 |
  | myproject      | mydataset     | myview1        | VIEW       | NO                 | 2018-12-29 00:19:20 |
  +----------------+---------------+----------------+------------+--------------------+---------------------+
  

Example 2:

The following example retrieves all tables of type BASE TABLE from the INFORMATION_SCHEMA.TABLES view. The is_typed column is excluded. The metadata returned is for tables in mydataset in your default project — myproject.

Queries against the INFORMATION_SCHEMA.TABLES view must have a dataset qualifier. The user submitting the query must have access to the dataset that contains the tables.

To run the query against a project other than your default project, add the project ID to the dataset in the following format: `project_id`.dataset.INFORMATION_SCHEMA.view; for example, `myproject`.mydataset.INFORMATION_SCHEMA.TABLES.

To run the query:

Console

  1. Open the BigQuery web UI in the GCP Console.

    Go to the GCP Console

  2. Enter the following standard SQL query in the Query editor box. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the GCP Console.

    SELECT
     * EXCEPT(is_typed)
    FROM
     mydataset.INFORMATION_SCHEMA.TABLES
    WHERE
     table_type="BASE TABLE"
    
  3. Click Run.

CLI

Use the query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag. Standard SQL syntax is required for INFORMATION_SCHEMA queries.

To run the query, enter:

bq query --nouse_legacy_sql \
'SELECT
   * EXCEPT(is_typed)
 FROM
   mydataset.INFORMATION_SCHEMA.TABLES
 WHERE
   table_type="BASE TABLE"'

The results should look like the following:

  +----------------+---------------+----------------+------------+--------------------+---------------------+
  | table_catalog  | table_schema  |   table_name   | table_type | is_insertable_into |    creation_time    |
  +----------------+---------------+----------------+------------+--------------------+---------------------+
  | myproject      | mydataset     | mytable1       | BASE TABLE | NO                 | 2018-10-31 22:40:05 |
  +----------------+---------------+----------------+------------+--------------------+---------------------+
  

TABLE_OPTIONS view

When you query the INFORMATION_SCHEMA.TABLE_OPTIONS view, the query results contain one row for each table or view in a dataset.

Queries against the INFORMATION_SCHEMA.TABLE_OPTIONS view must have a dataset qualifier. The user submitting the query must have access to the dataset that contains the tables or views.

The INFORMATION_SCHEMA.TABLE_OPTIONS view has the following schema:

  • TABLE_CATALOG (STRING): The name of the project that contains the dataset
  • TABLE_SCHEMA (STRING): The name of the dataset that contains the table or view, also referred to as the datasetId
  • TABLE_NAME (STRING): The name of the table or view, also referred to as the tableId
  • OPTION_NAME (STRING): One of the name values in the options table
  • OPTION_TYPE (STRING): One of the data type values in the options table
  • OPTION_VALUE (STRING): One of the value options in the options table

Options table

  • partition_expiration_days (FLOAT64): The default lifetime, in days, of all partitions in a partitioned table
  • expiration_timestamp (TIMESTAMP): The time when this table expires
  • kms_key_name (STRING): The name of the Cloud KMS key used to encrypt the table
  • friendly_name (STRING): The table's descriptive name
  • description (STRING): A description of the table
  • labels (ARRAY<STRUCT<STRING, STRING>>): An array of STRUCTs that represent the labels on the table

For more information on dataset properties, see the dataset resource page in the REST API documentation. For more information on table and view properties, see the table resource page in the REST API documentation.

Examples

Example 1:

The following example retrieves the expiration times for all tables in mydataset in your default project (myproject) by querying the INFORMATION_SCHEMA.TABLE_OPTIONS view.

Queries against the INFORMATION_SCHEMA.TABLE_OPTIONS view must have a dataset qualifier. The user submitting the query must have access to the dataset that contains the tables.

To run the query against a project other than your default project, add the project ID to the dataset in the following format: `project_id`.dataset.INFORMATION_SCHEMA.view; for example, `myproject`.mydataset.INFORMATION_SCHEMA.TABLE_OPTIONS.

To run the query:

Console

  1. Open the BigQuery web UI in the GCP Console.

    Go to the GCP Console

  2. Enter the following standard SQL query in the Query editor box. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the GCP Console.

    SELECT
     *
    FROM
     mydataset.INFORMATION_SCHEMA.TABLE_OPTIONS
    WHERE
     option_name="expiration_timestamp"
    
  3. Click Run.

CLI

Use the query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag. Standard SQL syntax is required for INFORMATION_SCHEMA queries.

To run the query, enter:

bq query --nouse_legacy_sql \
'SELECT
   *
 FROM
   mydataset.INFORMATION_SCHEMA.TABLE_OPTIONS
 WHERE
   option_name="expiration_timestamp"'

The results should look like the following:

  +----------------+---------------+------------+----------------------+-------------+--------------------------------------+
  | table_catalog  | table_schema  | table_name |     option_name      | option_type |             option_value             |
  +----------------+---------------+------------+----------------------+-------------+--------------------------------------+
  | myproject      | mydataset     | mytable1   | expiration_timestamp | TIMESTAMP   | TIMESTAMP "2020-01-16T21:12:28.000Z" |
  | myproject      | mydataset     | mytable2   | expiration_timestamp | TIMESTAMP   | TIMESTAMP "2021-01-01T21:12:28.000Z" |
  +----------------+---------------+------------+----------------------+-------------+--------------------------------------+
  

Example 2:

The following example retrieves metadata about all tables in mydataset that contain test data. The query uses the values in the description option to find tables that contain "test" anywhere in the description. mydataset is in your default project — myproject.

To run the query against a project other than your default project, add the project ID to the dataset in the following format: `project_id`.dataset.INFORMATION_SCHEMA.view; for example, `myproject`.mydataset.INFORMATION_SCHEMA.TABLE_OPTIONS.

To run the query:

Console

  1. Open the BigQuery web UI in the GCP Console.

    Go to the GCP Console

  2. Enter the following standard SQL query in the Query editor box. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the GCP Console.

    SELECT
     *
    FROM
     mydataset.INFORMATION_SCHEMA.TABLE_OPTIONS
    WHERE
     option_name="description" AND option_value LIKE "%test%"
    
  3. Click Run.

CLI

Use the query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag. Standard SQL syntax is required for INFORMATION_SCHEMA queries.

To run the query, enter:

bq query --nouse_legacy_sql \
'SELECT
   *
 FROM
   mydataset.INFORMATION_SCHEMA.TABLE_OPTIONS
 WHERE
   option_name="description" AND option_value LIKE "%test%"'

The results should look like the following:

  +----------------+---------------+------------+-------------+-------------+--------------+
  | table_catalog  | table_schema  | table_name | option_name | option_type | option_value |
  +----------------+---------------+------------+-------------+-------------+--------------+
  | myproject      | mydataset     | mytable1   | description | STRING      | "test data"  |
  | myproject      | mydataset     | mytable2   | description | STRING      | "test data"  |
  +----------------+---------------+------------+-------------+-------------+--------------+
  

COLUMNS view

When you query the INFORMATION_SCHEMA.COLUMNS view, the query results contain one row for each column (field) in a table.

Queries against the INFORMATION_SCHEMA.COLUMNS view must have a dataset qualifier. The user submitting the query must have access to the dataset that contains the tables.

The INFORMATION_SCHEMA.COLUMNS view has the following schema:

  • TABLE_CATALOG (STRING): The name of the project that contains the dataset
  • TABLE_SCHEMA (STRING): The name of the dataset that contains the table, also referred to as the datasetId
  • TABLE_NAME (STRING): The name of the table or view, also referred to as the tableId
  • COLUMN_NAME (STRING): The name of the column
  • ORDINAL_POSITION (INT64): The 1-indexed offset of the column within the table; the value is NULL for a pseudo column such as _PARTITIONTIME or _PARTITIONDATE
  • IS_NULLABLE (STRING): YES or NO depending on whether the column's mode allows NULL values
  • DATA_TYPE (STRING): The column's standard SQL data type
  • IS_GENERATED (STRING): The value is always NEVER
  • GENERATION_EXPRESSION (STRING): The value is always NULL
  • IS_STORED (STRING): The value is always NULL
  • IS_HIDDEN (STRING): YES or NO depending on whether the column is a pseudo column such as _PARTITIONTIME or _PARTITIONDATE
  • IS_UPDATABLE (STRING): The value is always NULL
  • IS_SYSTEM_DEFINED (STRING): YES or NO depending on whether the column is a pseudo column such as _PARTITIONTIME or _PARTITIONDATE
  • IS_PARTITIONING_COLUMN (STRING): YES or NO depending on whether the column is a partitioning column
  • CLUSTERING_ORDINAL_POSITION (INT64): The 1-indexed offset of the column within the table's clustering columns; the value is NULL if the table is not a clustered table

For more information on dataset properties, see the dataset resource page in the REST API documentation. For more information on table and view properties, see the table resource page in the REST API documentation.

Examples

The following example retrieves metadata from the INFORMATION_SCHEMA.COLUMNS view for the population_by_zip_2010 table in the census_bureau_usa dataset. This dataset is part of the BigQuery public dataset program.

Because the table you're querying is in another project, the bigquery-public-data project, you add the project ID to the dataset in the following format: `project_id`.dataset.INFORMATION_SCHEMA.view; for example, `bigquery-public-data`.census_bureau_usa.INFORMATION_SCHEMA.COLUMNS.

The following columns are excluded from the query results because they are currently reserved for future use:

  • IS_GENERATED
  • GENERATION_EXPRESSION
  • IS_STORED
  • IS_UPDATABLE

Queries against the INFORMATION_SCHEMA.COLUMNS view must have a dataset qualifier. The user submitting the query must have access to the dataset that contains the tables.

To run the query:

Console

  1. Open the BigQuery web UI in the GCP Console.

    Go to the GCP Console

  2. Enter the following standard SQL query in the Query editor box. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the GCP Console.

    SELECT
     * EXCEPT(is_generated, generation_expression, is_stored, is_updatable)
    FROM
     `bigquery-public-data`.census_bureau_usa.INFORMATION_SCHEMA.COLUMNS
    WHERE
     table_name="population_by_zip_2010"
    
  3. Click Run.

CLI

Use the query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag. Standard SQL syntax is required for INFORMATION_SCHEMA queries.

To run the query, enter:

bq query --nouse_legacy_sql \
'SELECT
   * EXCEPT(is_generated, generation_expression, is_stored, is_updatable)
 FROM
   `bigquery-public-data`.census_bureau_usa.INFORMATION_SCHEMA.COLUMNS
 WHERE
   table_name="population_by_zip_2010"'

The results should look like the following. For readability, table_catalog and table_schema are excluded from the results:

+------------------------+-------------+------------------+-------------+-----------+-----------+-------------------+------------------------+-----------------------------+
|       table_name       | column_name | ordinal_position | is_nullable | data_type | is_hidden | is_system_defined | is_partitioning_column | clustering_ordinal_position |
+------------------------+-------------+------------------+-------------+-----------+-----------+-------------------+------------------------+-----------------------------+
| population_by_zip_2010 | zipcode     |                1 | NO          | STRING    | NO        | NO                | NO                     |                        NULL |
| population_by_zip_2010 | geo_id      |                2 | YES         | STRING    | NO        | NO                | NO                     |                        NULL |
| population_by_zip_2010 | minimum_age |                3 | YES         | INT64     | NO        | NO                | NO                     |                        NULL |
| population_by_zip_2010 | maximum_age |                4 | YES         | INT64     | NO        | NO                | NO                     |                        NULL |
| population_by_zip_2010 | gender      |                5 | YES         | STRING    | NO        | NO                | NO                     |                        NULL |
| population_by_zip_2010 | population  |                6 | YES         | INT64     | NO        | NO                | NO                     |                        NULL |
+------------------------+-------------+------------------+-------------+-----------+-----------+-------------------+------------------------+-----------------------------+
  

COLUMN_FIELD_PATHS view

When you query the INFORMATION_SCHEMA.COLUMN_FIELD_PATHS view, the query results contain one row for each column nested within a RECORD (or STRUCT) column.

Queries against the INFORMATION_SCHEMA.COLUMN_FIELD_PATHS view must have a dataset qualifier. The user submitting the query must have access to the dataset that contains the tables.

The INFORMATION_SCHEMA.COLUMN_FIELD_PATHS view has the following schema:

  • TABLE_CATALOG (STRING): The name of the project that contains the dataset
  • TABLE_SCHEMA (STRING): The name of the dataset that contains the table, also referred to as the datasetId
  • TABLE_NAME (STRING): The name of the table or view, also referred to as the tableId
  • COLUMN_NAME (STRING): The name of the column
  • FIELD_PATH (STRING): The path to a column nested within a RECORD or STRUCT column
  • DATA_TYPE (STRING): The column's standard SQL data type
  • DESCRIPTION (STRING): The column's description

For more information on dataset properties, see the dataset resource page in the REST API documentation. For more information on table and view properties, see the table resource page in the REST API documentation.

Examples

The following example retrieves metadata from the INFORMATION_SCHEMA.COLUMN_FIELD_PATHS view for the commits table in the github_repos dataset. This dataset is part of the BigQuery public dataset program.

Because the table you're querying is in another project, the bigquery-public-data project, you add the project ID to the dataset in the following format: `project_id`.dataset.INFORMATION_SCHEMA.view; for example, `bigquery-public-data`.github_repos.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS.

The commits table contains the following nested columns, as well as nested and repeated columns:

  • author: nested RECORD column
  • committer: nested RECORD column
  • trailer: nested and repeated RECORD column
  • difference: nested and repeated RECORD column

Your query will retrieve metadata about the author and difference columns.

Queries against the INFORMATION_SCHEMA.COLUMN_FIELD_PATHS view must have a dataset qualifier. The user submitting the query must have access to the dataset that contains the tables.

To run the query:

Console

  1. Open the BigQuery web UI in the GCP Console.

    Go to the GCP Console

  2. Enter the following standard SQL query in the Query editor box. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the GCP Console.

    SELECT
     *
    FROM
     `bigquery-public-data`.github_repos.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS
    WHERE
     table_name="commits"
     AND column_name="author"
     OR column_name="difference"
    
  3. Click Run.

CLI

Use the query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag. Standard SQL syntax is required for INFORMATION_SCHEMA queries.

To run the query, enter:

bq query --nouse_legacy_sql \
'SELECT
   *
 FROM
   `bigquery-public-data`.github_repos.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS
 WHERE
   table_name="commits"
   AND column_name="author"
   OR column_name="difference"'

The results should look like the following. For readability, table_catalog and table_schema are excluded from the results.

  +------------+-------------+---------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
  | table_name | column_name |     field_path      |                                                                      data_type                                                                      | description |
  +------------+-------------+---------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
  | commits    | author      | author              | STRUCT<name STRING, email STRING, time_sec INT64, tz_offset INT64, date TIMESTAMP>                                                                  | NULL        |
  | commits    | author      | author.name         | STRING                                                                                                                                              | NULL        |
  | commits    | author      | author.email        | STRING                                                                                                                                              | NULL        |
  | commits    | author      | author.time_sec     | INT64                                                                                                                                               | NULL        |
  | commits    | author      | author.tz_offset    | INT64                                                                                                                                               | NULL        |
  | commits    | author      | author.date         | TIMESTAMP                                                                                                                                           | NULL        |
  | commits    | difference  | difference          | ARRAY<STRUCT<old_mode INT64, new_mode INT64, old_path STRING, new_path STRING, old_sha1 STRING, new_sha1 STRING, old_repo STRING, new_repo STRING>> | NULL        |
  | commits    | difference  | difference.old_mode | INT64                                                                                                                                               | NULL        |
  | commits    | difference  | difference.new_mode | INT64                                                                                                                                               | NULL        |
  | commits    | difference  | difference.old_path | STRING                                                                                                                                              | NULL        |
  | commits    | difference  | difference.new_path | STRING                                                                                                                                              | NULL        |
  | commits    | difference  | difference.old_sha1 | STRING                                                                                                                                              | NULL        |
  | commits    | difference  | difference.new_sha1 | STRING                                                                                                                                              | NULL        |
  | commits    | difference  | difference.old_repo | STRING                                                                                                                                              | NULL        |
  | commits    | difference  | difference.new_repo | STRING                                                                                                                                              | NULL        |
  +------------+-------------+---------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
  

Listing clustered tables in a dataset

You can list clustered tables in datasets by:

  • Using the GCP Console or the classic BigQuery web UI
  • Using the bq ls CLI command
  • Calling the tables.list API method
  • Using the client libraries

The permissions required to list clustered tables, and the steps to list them are the same as for partitioned tables. For more information on listing tables, see Listing partitioned tables in a dataset.
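Alternatively, as a sketch, you can list the clustered tables in a dataset by querying the CLUSTERING_ORDINAL_POSITION column of the INFORMATION_SCHEMA.COLUMNS view described earlier; a table is clustered if at least one of its columns has a non-NULL value in that column:

SELECT DISTINCT
  table_name
FROM
  mydataset.INFORMATION_SCHEMA.COLUMNS
WHERE
  clustering_ordinal_position IS NOT NULL

INFORMATION_SCHEMA is in beta, as noted in the previous section.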

Features under development

The following features are being developed but are not currently available in the alpha release:

  • Support for clustering native (non-partitioned) tables.
  • Reducing the costs for particular types of queries that use filters on clustering columns.

Next steps

  • For an overview of clustered table support in BigQuery, see Introduction to clustered tables.
  • For more information on querying clustered tables, see Querying clustered tables.
