Creating and using time-unit column-partitioned tables

This document describes how to create and use tables partitioned by a DATE, TIMESTAMP, or DATETIME column. For information on ingestion-time partitioned tables, see Creating and using ingestion-time partitioned tables. For information on integer range partitioned tables, see Creating and using integer range partitioned tables.

After creating a partitioned table, you can:

  • Control access to your table data
  • Get information about your partitioned tables
  • List the partitioned tables in a dataset
  • Get partitioned table metadata using meta-tables

For more information on managing partitioned tables including updating partitioned table properties, copying a partitioned table, and deleting a partitioned table, see Managing partitioned tables.

Limitations

Partitioned tables are subject to the following limitations:

  • The partitioning column must be either a scalar DATE, TIMESTAMP, or DATETIME column. While the mode of the column can be REQUIRED or NULLABLE, it cannot be REPEATED (array-based).
  • The partitioning column must be a top-level field. You cannot use a leaf field from a RECORD (STRUCT) as the partitioning column.
  • You cannot use legacy SQL to query partitioned tables or to write query results to partitioned tables.

Hourly, monthly, and yearly partitioned tables are subject to further limitations:

  • The bq partition command in the bq command-line tool is unsupported.

Creating partitioned tables

You can create a partitioned table in the following ways:

  • By using the Cloud Console.
  • By using a DDL CREATE TABLE statement with a PARTITION BY clause containing a partition expression.
  • By using the bq command-line tool's bq mk command.
  • Programmatically by calling the tables.insert API method.
  • From query results.
  • When you load data.

Table naming

When you create a table in BigQuery, the table name must be unique per dataset. The table name can:

  • Contain up to 1,024 characters.
  • Contain Unicode characters in category L (letter), M (mark), N (number), Pc (connector, including underscore), Pd (dash), Zs (space). For more information, see General Category.

For example, the following are all valid table names: table-01, ग्राहक, 00_お客様, étudiant.
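The naming rules above can be checked mechanically with Unicode general categories. The following is a minimal sketch (the helper function is illustrative, not part of any BigQuery tooling):

```python
import unicodedata

def is_valid_table_name(name: str) -> bool:
    """Check a candidate table name against the length and category rules above."""
    if not 0 < len(name) <= 1024:
        return False
    for ch in name:
        cat = unicodedata.category(ch)
        # Letters (L*), marks (M*), numbers (N*), connectors (Pc),
        # dashes (Pd), and spaces (Zs) are allowed.
        if not (cat[0] in "LMN" or cat in ("Pc", "Pd", "Zs")):
            return False
    return True

print(is_valid_table_name("table-01"))   # a valid name from the examples above
print(is_valid_table_name("bad/table"))  # '/' is category Po, so this is rejected
```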

Required permissions

At a minimum, to create a table, you must be granted the following permissions:

  • bigquery.tables.create permissions to create the table
  • bigquery.tables.updateData to write data to the table by using a load job, a query job, or a copy job
  • bigquery.jobs.create to run a query job, load job, or copy job that writes data to the table

Additional permissions such as bigquery.tables.getData might be required to access the data you're writing to the table.

The following predefined IAM roles include both bigquery.tables.create and bigquery.tables.updateData permissions:

  • bigquery.dataEditor
  • bigquery.dataOwner
  • bigquery.admin

The following predefined IAM roles include bigquery.jobs.create permissions:

  • bigquery.user
  • bigquery.jobUser
  • bigquery.admin

In addition, if a user has bigquery.datasets.create permissions, when that user creates a dataset, they are granted bigquery.dataOwner access to it. bigquery.dataOwner access gives the user the ability to create and update tables in the dataset.

For more information on IAM roles and permissions in BigQuery, see Predefined roles and permissions.

Daily partitioning versus hourly, monthly, or yearly partitioning

When using a TIMESTAMP, DATETIME, or DATE column to partition data, you can create partitions with daily, hourly, monthly, or yearly granularity, depending on your data and needs.

Daily partitioning is the default partitioning type and, when used with clustering, serves the majority of BigQuery use cases. In particular, daily partitioning is the better choice when your data is spread out over a wide range of dates, or if data is continuously added over time, because it helps you remain under your table's partition limits.

Choose hourly partitioning instead if your tables have a high volume of data that spans a short date range (typically less than six months of timestamp values). With hourly partitioning, you can address data at hour-level granularity; for example, when appending, truncating, or deleting data from a particular partition.

Choose monthly or yearly partitioning if your tables have a relatively small amount of data for each day, but span a wide date range. This partitioning option is also recommended if your workflow requires frequently updating or adding rows that span a wide date range (for example, more than 500 dates). Use monthly or yearly partitioning along with clustering on your timestamp, date or datetime partitioning column to achieve the best performance in these scenarios. See Time-unit partitioning with clustering for more details and examples.

Creating an empty partitioned table with a schema definition

You cannot create an empty partitioned table without a schema definition; the schema is required to identify the column used to create the partitions.

When you create an empty partitioned table with a schema definition, you can do the following:

  • Provide the schema inline using the bq command-line tool.
  • Specify a JSON schema file using the bq command-line tool.
  • Provide the schema in a table resource when calling the API's tables.insert method.

For more information about specifying a table schema, see Specifying a schema.

After the partitioned table is created, you can do the following:

  • Load data into it.
  • Write query results to it.
  • Copy data into it.

To create an empty partitioned table with a schema definition:

Console

  1. In the navigation panel, in the Resources section, expand your project and select your dataset.

  2. On the right side of the window, in the details panel, click Create table.

  3. In the Create table panel, in the Source section:

    • For Create table from, select Empty table.
  4. In the Destination section:

    • For Dataset name, choose the appropriate dataset, and in the Table name field, enter the name of the table you're creating.
    • Verify that Table type is set to Native table.
  5. In the Schema section, enter the schema definition.

    • Enter schema information manually by:

      • Enabling Edit as text and entering the table schema as a JSON array.

      • Using Add field to manually input the schema.

  6. For Partition and cluster settings, click No partition, select Partition by field, and choose the DATE, TIMESTAMP, or DATETIME column. This option is not available if the schema does not contain a DATE, TIMESTAMP, or DATETIME column.

  7. (Optional) For Partitioning filter, click the Require partition filter box to require users to include a WHERE clause that specifies the partitions to query. Requiring a partition filter might reduce cost and improve performance. For more information, see Querying partitioned tables.

  8. (Optional) Click Advanced options and for Encryption, click Customer-managed key to use a Cloud Key Management Service key. If you leave the Google-managed key setting, BigQuery encrypts the data at rest.

  9. Click Create table.

SQL

Data definition language (DDL) statements allow you to create and modify tables and views using standard SQL query syntax.

See more on Using Data Definition Language statements.

To create a partitioned table by using a DDL statement in the Cloud Console:

  1. Open the BigQuery page in the Cloud Console.

    Go to the BigQuery page

  2. Click Compose new query.

  3. Type your CREATE TABLE DDL statement into the Query editor text area.

    The following query creates a table named newtable that is daily partitioned by the transaction_date DATE column and has a partition expiration of three days.

     CREATE TABLE
       mydataset.newtable (transaction_id INT64,
         transaction_date DATE)
     PARTITION BY
       transaction_date
     OPTIONS
       ( partition_expiration_days=3,
         description="a table partitioned by transaction_date" )

    The following query creates the same table, but is hourly partitioned instead. Note the use of TIMESTAMP_TRUNC to delineate the timestamp at the hour mark:

     CREATE TABLE
       mydataset.newtable (transaction_id INT64,
         transaction_ts TIMESTAMP)
     PARTITION BY
       TIMESTAMP_TRUNC(transaction_ts, HOUR)
     OPTIONS
       ( partition_expiration_days=3,
         description="a table partitioned by transaction_ts" )

    The following query creates the same table, but is monthly partitioned instead. Note the use of TIMESTAMP_TRUNC to delineate the timestamp at the month mark:

     CREATE TABLE
       mydataset.newtable (transaction_id INT64,
         transaction_ts TIMESTAMP)
     PARTITION BY
       TIMESTAMP_TRUNC(transaction_ts, MONTH)
     OPTIONS
       ( partition_expiration_days=3,
         description="a table partitioned by transaction_ts" )

    The following query creates the same table, but is yearly partitioned instead. Note the use of TIMESTAMP_TRUNC to delineate the timestamp at the year mark:

     CREATE TABLE
       mydataset.newtable (transaction_id INT64,
         transaction_ts TIMESTAMP)
     PARTITION BY
       TIMESTAMP_TRUNC(transaction_ts, YEAR)
     OPTIONS
       ( partition_expiration_days=3,
         description="a table partitioned by transaction_ts" )

  4. Click Run. When the query completes, the table will appear in the Resources pane.

bq

Use the bq mk command with the --table flag (or -t shortcut), the --schema flag, and the --time_partitioning_field flag. You can supply the table's schema definition inline or in a JSON schema file.

Optional parameters include --expiration, --description, --time_partitioning_expiration, --destination_kms_key, --require_partition_filter, --time_partitioning_type, and --label.

If you are creating a table in a project other than your default project, add the project ID to the dataset in the following format: project_id:dataset.

--destination_kms_key is not demonstrated here. For more information on using this flag, see Protecting data with Cloud Key Management Service keys.

Enter the following command to create an empty partitioned table with a schema definition:

bq mk --table \
--expiration integer1 \
--schema schema \
--time_partitioning_field column \
--time_partitioning_type unit_time \
--time_partitioning_expiration integer2 \
--[no]require_partition_filter \
--description "description" \
--label key:value, key:value \
project_id:dataset.table

Replace the following:

  • integer1 is the default lifetime (in seconds) for the table. The minimum value is 3600 seconds (one hour). The expiration time evaluates to the current UTC time plus the integer value. If you set the table's expiration time when you create a time-unit partitioned table, the dataset's default table expiration setting is ignored. Setting this value deletes the table and all partitions after the specified time.
  • schema is an inline schema definition in the format field:data_type, field:data_type or the path to the JSON schema file on your local machine.
  • column is the name of the TIMESTAMP, DATETIME, or DATE column used to create the partitions.
  • unit_time is either DAY, HOUR, MONTH, or YEAR, based on the desired time-unit partitioning granularity. This defaults to DAY if time_partitioning_type is unspecified.
  • integer2 is the default lifetime (in seconds) for the table's partitions. There is no minimum value. The expiration time evaluates to the partition's date plus the integer value. The partition expiration is independent of the table's expiration but does not override it. If you set a partition expiration that is longer than the table's expiration, the table expiration takes precedence.
  • description is a description of the table in quotes.
  • key:value is the key:value pair that represents a label. You can enter multiple labels using a comma-separated list.
  • project_id is your project ID.
  • dataset is a dataset in your project.
  • table is the name of the partitioned table you're creating.
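The two expirations are computed from different reference points: the table expiration counts from the creation time, while each partition's expiration counts from that partition's date. A quick sketch of the arithmetic, using illustrative values:

```python
from datetime import datetime, timedelta, timezone

def table_expiration(created_at: datetime, expiration_secs: int) -> datetime:
    # --expiration: current UTC time at creation plus the integer value.
    return created_at + timedelta(seconds=expiration_secs)

def partition_expiration(partition_date: datetime, expiration_secs: int) -> datetime:
    # --time_partitioning_expiration: the partition's date plus the integer value.
    return partition_date + timedelta(seconds=expiration_secs)

created = datetime(2024, 5, 1, tzinfo=timezone.utc)  # illustrative creation time
print(table_expiration(created, 2592000))     # 30 days after creation
print(partition_expiration(created, 259200))  # 3 days after the partition's date
```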

When you specify the schema using the bq command-line tool, you cannot include a RECORD (STRUCT) type, you cannot include a column description, and you cannot specify the column's mode. All modes default to NULLABLE. To include descriptions, modes, and RECORD types, supply a JSON schema file instead.
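A JSON schema file contains an array of field objects, each with an optional mode and description. For example, a schema file for a table partitioned on a ts column might look like the following (the column names and descriptions are illustrative):

```json
[
  {
    "name": "ts",
    "type": "TIMESTAMP",
    "mode": "REQUIRED",
    "description": "Partitioning column"
  },
  {
    "name": "column1",
    "type": "STRING",
    "mode": "NULLABLE",
    "description": "An example string column"
  }
]
```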

Examples:

Enter the following command to create an hourly partitioned table named mypartitionedtable in mydataset in your default project. The partitioning expiration is set to 86,400 seconds (1 day), the table's expiration is set to 2,592,000 seconds (30 days), the description is set to This is my partitioned table, and the label is set to org:dev. The command uses the -t shortcut instead of --table.

The --require_partition_filter flag is used to require users to include a WHERE clause that specifies the partitions to query. Requiring a partition filter might reduce cost and improve performance. For more information, see Querying partitioned tables.

The schema is specified inline as: ts:TIMESTAMP,column1:STRING,column2:INTEGER,column4:STRING. The specified TIMESTAMP field ts is used to partition the data by hour. Note that hourly partitioning requires a TIMESTAMP or DATETIME column rather than a DATE column.

bq mk -t \
--expiration 2592000 \
--schema 'ts:TIMESTAMP,column1:STRING,column2:INTEGER,column4:STRING' \
--time_partitioning_field ts \
--time_partitioning_type HOUR \
--time_partitioning_expiration 86400 \
--require_partition_filter \
--description "This is my partitioned table" \
--label org:dev \
mydataset.mypartitionedtable

Enter the following command to create a daily partitioned table named mypartitionedtable in myotherproject, not your default project. The partitioning expiration is set to 259,200 seconds (3 days), the description is set to This is my partitioned table, and the label is set to org:dev. The command uses the -t shortcut instead of --table. This command does not specify a table expiration. If the dataset has a default table expiration, it is applied. If the dataset has no default table expiration, the table never expires, but the partitions expire after 3 days.

The schema is specified in a local JSON file: /tmp/myschema.json. The schema definition includes a TIMESTAMP field named ts that is used to partition the data by day.

bq mk -t \
--schema /tmp/myschema.json \
--time_partitioning_field ts \
--time_partitioning_type DAY \
--time_partitioning_expiration 259200 \
--description "This is my partitioned table" \
--label org:dev \
myotherproject:mydataset.mypartitionedtable

After the table is created, you can use the bq command-line tool to update the partitioned table's table expiration, partition expiration, description, and labels.

API

Call the tables.insert method with a defined table resource that specifies the timePartitioning property and the schema property.
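As a sketch, the table resource in the request body might look like the following (the project, dataset, and table IDs are placeholders; field names follow the BigQuery v2 REST table resource):

```python
import json

# Sketch of a tables.insert request body for a daily partitioned table.
# Project, dataset, and table IDs below are placeholders.
table_resource = {
    "tableReference": {
        "projectId": "my-project",
        "datasetId": "mydataset",
        "tableId": "mypartitionedtable",
    },
    "schema": {
        "fields": [
            {"name": "transaction_id", "type": "INTEGER", "mode": "NULLABLE"},
            {"name": "transaction_date", "type": "DATE", "mode": "NULLABLE"},
        ]
    },
    "timePartitioning": {
        "type": "DAY",
        "field": "transaction_date",
        "expirationMs": str(3 * 24 * 60 * 60 * 1000),  # 3-day partition expiration
    },
}
print(json.dumps(table_resource, indent=2))
```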

Go

Before trying this sample, follow the Go setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Go API reference documentation.

import (
	"context"
	"fmt"
	"time"

	"cloud.google.com/go/bigquery"
)

// createTablePartitioned demonstrates creating a table and specifying a time partitioning configuration.
func createTablePartitioned(projectID, datasetID, tableID string) error {
	// projectID := "my-project-id"
	// datasetID := "mydatasetid"
	// tableID := "mytableid"
	ctx := context.Background()

	client, err := bigquery.NewClient(ctx, projectID)
	if err != nil {
		return fmt.Errorf("bigquery.NewClient: %v", err)
	}
	defer client.Close()

	sampleSchema := bigquery.Schema{
		{Name: "name", Type: bigquery.StringFieldType},
		{Name: "post_abbr", Type: bigquery.StringFieldType},
		{Name: "date", Type: bigquery.DateFieldType},
	}
	metadata := &bigquery.TableMetadata{
		TimePartitioning: &bigquery.TimePartitioning{
			Field:      "date",
			Expiration: 90 * 24 * time.Hour,
		},
		Schema: sampleSchema,
	}
	tableRef := client.Dataset(datasetID).Table(tableID)
	if err := tableRef.Create(ctx, metadata); err != nil {
		return err
	}
	return nil
}

Java

Before trying this sample, follow the Java setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Java API reference documentation.

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Field;
import com.google.cloud.bigquery.Schema;
import com.google.cloud.bigquery.StandardSQLTypeName;
import com.google.cloud.bigquery.StandardTableDefinition;
import com.google.cloud.bigquery.TableId;
import com.google.cloud.bigquery.TableInfo;
import com.google.cloud.bigquery.TimePartitioning;

public class CreatePartitionedTable {

  public static void runCreatePartitionedTable() {
    // TODO(developer): Replace these variables before running the sample.
    String datasetName = "MY_DATASET_NAME";
    String tableName = "MY_TABLE_NAME";
    createPartitionedTable(datasetName, tableName);
  }

  public static void createPartitionedTable(String datasetName, String tableName) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      TableId tableId = TableId.of(datasetName, tableName);

      TimePartitioning partitioning = TimePartitioning.of(TimePartitioning.Type.DAY);

      Schema schema =
          Schema.of(
              Field.of("stringField", StandardSQLTypeName.STRING),
              Field.of("booleanField", StandardSQLTypeName.BOOL),
              Field.of("dateField", StandardSQLTypeName.DATE));

      StandardTableDefinition tableDefinition =
          StandardTableDefinition.newBuilder()
              .setSchema(schema)
              .setTimePartitioning(partitioning)
              .build();
      TableInfo tableInfo = TableInfo.newBuilder(tableId, tableDefinition).build();

      bigquery.create(tableInfo);
      System.out.println("Partitioned table created successfully");
    } catch (BigQueryException e) {
      System.out.println("Partitioned table was not created. \n" + e.toString());
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Node.js API reference documentation.

// Import the Google Cloud client library
const {BigQuery} = require('@google-cloud/bigquery');
const bigquery = new BigQuery();

async function createTablePartitioned() {
  // Creates a new partitioned table named "my_table" in "my_dataset".

  /**
   * TODO(developer): Uncomment the following lines before running the sample.
   */
  // const datasetId = "my_dataset";
  // const tableId = "my_table";
  const schema = 'name:string, post_abbr:string, date:date';

  // For all options, see https://cloud.google.com/bigquery/docs/reference/v2/tables#resource
  const options = {
    schema: schema,
    location: 'US',
    timePartitioning: {
      type: 'DAY',
      expirationMS: '7776000000',
      field: 'date',
    },
  };

  // Create a new table in the dataset
  const [table] = await bigquery
    .dataset(datasetId)
    .createTable(tableId, options);
  console.log(`Table ${table.id} created with partitioning: `);
  console.log(table.metadata.timePartitioning);
}

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.

# from google.cloud import bigquery
# client = bigquery.Client()
# project = client.project
# dataset_ref = bigquery.DatasetReference(project, 'my_dataset')

table_ref = dataset_ref.table("my_partitioned_table")
schema = [
    bigquery.SchemaField("name", "STRING"),
    bigquery.SchemaField("post_abbr", "STRING"),
    bigquery.SchemaField("date", "DATE"),
]
table = bigquery.Table(table_ref, schema=schema)
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="date",  # name of column to use for partitioning
    expiration_ms=7776000000,
)  # 90 days

table = client.create_table(table)

print(
    "Created table {}, partitioned on column {}".format(
        table.table_id, table.time_partitioning.field
    )
)

Creating partitioned tables from query results

To create a partitioned table from a query result, write the results to a new destination table. You can create a partitioned table by querying either a partitioned table or a non-partitioned table. You cannot change an existing standard table to a partitioned table using query results.

When you create a partitioned table from a query result, you must use standard SQL. Currently, legacy SQL is not supported for querying partitioned tables or for writing query results to partitioned tables.

Partition decorators enable you to write the query results to a specific partition. For example, to write the results to the May 1, 2016, partition, use the following partition decorator:

table_name$20160501

When writing query results to a specific partition using a partition decorator, the data that is being written to the partition must conform to the table's partitioning scheme. All rows written to the partition should have values that fall within the partition's date.
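The decorator suffix format depends on the partitioning granularity: yyyymmdd for daily, yyyymmddhh for hourly, yyyymm for monthly, and yyyy for yearly partitions. A minimal sketch (the helper below is hypothetical, not part of the bq tool):

```python
from datetime import datetime

# Decorator suffix formats per time-unit partitioning granularity.
_FORMATS = {"DAY": "%Y%m%d", "HOUR": "%Y%m%d%H", "MONTH": "%Y%m", "YEAR": "%Y"}

def partition_decorator(table_name: str, ts: datetime, granularity: str = "DAY") -> str:
    """Build a partition decorator for the given table and timestamp."""
    return f"{table_name}${ts.strftime(_FORMATS[granularity])}"

print(partition_decorator("mytable", datetime(2016, 5, 1)))  # mytable$20160501
```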

For example:

The following query retrieves data from February 1, 2018 and writes the data to the $20180201 partition of table mytable. The table has two columns — a TIMESTAMP column named TS and an INT64 column named a.

bq query \
--nouse_legacy_sql  \
--destination_table='mytable$20180201' \
'SELECT
   TIMESTAMP("2018-02-01") AS TS,
   2 AS a'

The following query retrieves data from January 31, 2018, and attempts to write the data to the $20180201 partition of mytable. This query fails because the data you're attempting to write doesn't fall within the partition's date.

bq query \
--nouse_legacy_sql  \
--destination_table='mytable$20180201' \
'SELECT
   TIMESTAMP("2018-01-31") as TS,
   2 as a'

For information on appending to or restating (replacing) data in partitioned tables, see Appending to and overwriting partitioned table data. For more information on querying partitioned tables, see Querying partitioned tables.

Creating a partitioned table from a query result

To create a partitioned table from a query result:

Console

You cannot specify partitioning options for a destination table when you query data using the Cloud Console.

bq

Enter the bq query command, specify the --destination_table flag to create a permanent table based on the query results, and specify the --time_partitioning_field flag to create a partitioned destination table.

Specify the use_legacy_sql=false flag to use standard SQL syntax. To write the query results to a table that is not in your default project, add the project ID to the dataset name in the following format: project_id:dataset.

(Optional) Supply the --location flag and set the value to your location.

Enter the following command to create a new, partitioned destination table from a query result:

bq --location=location query \
--destination_table project_id:dataset.table \
--time_partitioning_field column \
--time_partitioning_type unit_time \
--use_legacy_sql=false \
'query'

Replace the following:

  • location is the name of your location. The --location flag is optional. For example, if you are using BigQuery in the Tokyo region, set the flag's value to asia-northeast1. You can set a default value for the location using the .bigqueryrc file.
  • project_id is your project ID.
  • dataset is the name of the dataset that will contain the new partitioned table.
  • table is the name of the partitioned table that you're creating using the query results.
  • column is the name of the TIMESTAMP or DATE column that's used to create the partitions.
  • unit_time is either DAY, HOUR, MONTH, or YEAR, based on the desired time-unit partitioning granularity. This defaults to DAY if time_partitioning_type is unspecified.
  • query is a query in standard SQL syntax. You cannot use legacy SQL to query partitioned tables or to write query results to partitioned tables.

Examples:

Enter the following command to write query results to a partitioned destination table named mypartitionedtable in mydataset. mydataset is in your default project. The query retrieves data from a non-partitioned table: the NHTSA Traffic Fatality public dataset. The table's timestamp_of_crash TIMESTAMP column is used to create the partitions.

bq query \
--destination_table mydataset.mypartitionedtable \
--time_partitioning_field timestamp_of_crash \
--use_legacy_sql=false \
'SELECT
   state_number,
   state_name,
   day_of_crash,
   month_of_crash,
   year_of_crash,
   latitude,
   longitude,
   manner_of_collision,
   number_of_fatalities,
   timestamp_of_crash
 FROM
   `bigquery-public-data`.nhtsa_traffic_fatalities.accident_2016
 LIMIT
   100'

Enter the following command to write query results to a partitioned destination table named mypartitionedtable in mydataset. mydataset is in myotherproject, not your default project. The query retrieves data from a non-partitioned table: the NHTSA Traffic Fatality public dataset. The table's timestamp_of_crash TIMESTAMP column is used to create the partitions.

bq query \
--destination_table myotherproject:mydataset.mypartitionedtable \
--time_partitioning_field timestamp_of_crash \
--use_legacy_sql=false \
'SELECT
   state_number,
   state_name,
   day_of_crash,
   month_of_crash,
   year_of_crash,
   latitude,
   longitude,
   manner_of_collision,
   number_of_fatalities,
   timestamp_of_crash
 FROM
   `bigquery-public-data`.nhtsa_traffic_fatalities.accident_2016
 LIMIT
   100'

API

To save query results to a permanent partitioned table, call the jobs.insert method, configure a query job, and include a value for the destinationTable and timePartitioning properties.

Specify your location in the location property in the jobReference section of the job resource.
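As a sketch, the relevant parts of the job resource might look like the following (the project and dataset IDs, location, and query are placeholders):

```python
# Sketch of a jobs.insert request body that writes query results to a
# partitioned destination table. IDs, location, and the query are placeholders.
job_body = {
    "jobReference": {"projectId": "my-project", "location": "US"},
    "configuration": {
        "query": {
            "query": "SELECT timestamp_of_crash FROM `mydataset.source_table`",
            "useLegacySql": False,
            "destinationTable": {
                "projectId": "my-project",
                "datasetId": "mydataset",
                "tableId": "mypartitionedtable",
            },
            "timePartitioning": {"type": "DAY", "field": "timestamp_of_crash"},
        }
    },
}
print(job_body["configuration"]["query"]["timePartitioning"])
```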

Creating a partitioned table when loading data

You can create a partitioned table by specifying partitioning options when you load data into a new table. You do not need to create an empty partitioned table before loading data into it. You can create the partitioned table and load your data at the same time.

When you load data into BigQuery, you can supply the table schema, or for supported data formats, you can use schema auto-detection.

Partition decorators enable you to load data into a specific partition. For example, to load all data that is generated on May 1, 2016, into the 20160501 partition, use the following partition decorator:

table_name$20160501

When loading data into a specific partition using a partition decorator, the data that is being loaded into the partition must conform to the table's partitioning scheme. All rows written to the partition should have values that fall within the partition's date.

For more information on loading data, see Introduction to loading data into BigQuery.

Time-unit partitioning with clustering

Time-unit partitioning can be used with clustering. A time-unit partitioned table with clustering first partitions its data by the time-unit boundaries (day, hour, month, or year) of the partitioning column; within each partition, data is further clustered by the clustering columns.

As an example, this command creates a table that is partitioned daily on ts_column and clustered by column1 and column2.

    bq mk --time_partitioning_type=DAY \
    --time_partitioning_field=ts_column \
    --clustering_fields=column1,column2 \
    mydataset.mytable2 "ts_column:TIMESTAMP,column1:INTEGER,column2:STRING"

When you retrieve the table's metadata, you can see that both daily timestamp partitioning and clustering are in effect:

    bq show --format=prettyjson mydataset.mytable2
    ...
      "clustering": {
        "fields": [
          "column1",
          "column2"
        ]
      },
    ...
      "timePartitioning": {
        "field": "ts_column",
        "type": "DAY"
      },
    ...

If you are exceeding the limit on the number of partitions per table, or if you have too little data spread across many partitions and mutate it too often, consider using a larger time-unit partitioning interval with clustering on the same partitioning column instead. This is the recommended way to use partitioned tables while staying within the partition limits.

For instance, this command creates a daily partitioned and clustered table on the same column:

    bq mk --time_partitioning_type=DAY \
    --time_partitioning_field=ts_column \
    --clustering_fields=ts_column,column1 \
    mydataset.mytable2 "ts_column:TIMESTAMP,column1:INTEGER,column2:STRING"

Here's another example for the table above, but with a larger time-unit partitioning interval:

    bq mk --time_partitioning_type=MONTH \
    --time_partitioning_field=ts_column \
    --clustering_fields=ts_column,column1 \
    mydataset.mytable2 "ts_column:TIMESTAMP,column1:INTEGER,column2:STRING"

Controlling access to partitioned tables

To configure access to tables and views, you can grant an IAM role to an entity at the following levels, listed in order of range of resources allowed (largest to smallest):

  • a high level in the Google Cloud resource hierarchy, such as the project, folder, or organization level
  • the dataset level
  • the table or view level

Access to any resource protected by IAM is additive. For example, if an entity does not have access at a high level, such as a project, you can grant the entity access at the dataset level, and then the entity has access to the tables and views in the dataset. Similarly, if the entity does not have access at the high level or the dataset level, you can grant the entity access at the table or view level.

Granting IAM roles at a higher level in the Google Cloud resource hierarchy such as the project, folder, or organization level gives the entity access to a broad set of resources. For example, granting a role to an entity at the project level gives that entity permissions that apply to all datasets throughout the project.

Granting a role at the dataset level specifies the operations an entity is allowed to perform on tables and views in that specific dataset, even if the entity does not have access at a higher level. For information on configuring dataset-level access controls, see Controlling access to datasets.

Granting a role at the table or view level specifies the operations an entity is allowed to perform on specific tables and views, even if the entity does not have access at a higher level. For information on configuring table-level access controls, see Controlling access to tables and views.

You can also create IAM custom roles. If you create a custom role, the permissions you grant depend on the specific operations you want the entity to be able to perform.

You can't set a "deny" permission on any resource protected by IAM.

For more information on roles and permissions, see Predefined roles and permissions.

Using partitioned tables

Getting information about partitioned tables

You can get information about tables in the following ways:

  • Using the Cloud Console.
  • Using the bq show command in the bq command-line tool.
  • Calling the tables.get API method.
  • Using the client libraries.

Required permissions

At a minimum, to get information about tables, you must be granted bigquery.tables.get permissions. The following predefined IAM roles include bigquery.tables.get permissions:

  • bigquery.metadataViewer
  • bigquery.dataViewer
  • bigquery.dataOwner
  • bigquery.dataEditor
  • bigquery.admin

In addition, if a user has bigquery.datasets.create permissions, when that user creates a dataset, they are granted bigquery.dataOwner access to it. bigquery.dataOwner access gives the user the ability to retrieve table metadata.

For more information on IAM roles and permissions in BigQuery, see Access control.

Getting partitioned table information

To view information about a partitioned table:

Console

  1. Open the BigQuery page in the Cloud Console.

    Go to the BigQuery page

  2. In the navigation panel, in the Resources section, expand your project and dataset, then click the table name in the list.

  3. Click Details below the Query editor. This tab displays the table's description and table information.

    Table details

  4. Click the Schema tab to view the table's schema definition. Notice that column-partitioned tables, unlike ingestion-time partitioned tables, do not include the _PARTITIONTIME pseudo column; the partitioning column is part of the schema itself.

bq

Issue the bq show command to display all table information. Use the --schema flag to display only table schema information. The --format flag can be used to control the output.

If you are getting information about a table in a project other than your default project, add the project ID to the dataset in the following format: project_id:dataset.

bq show --schema --format=prettyjson project_id:dataset.table

Replace the following:

  • project_id is your project ID.
  • dataset is the name of the dataset.
  • table is the name of the table.

Examples:

Enter the following command to display all information about mytable in mydataset. mydataset is in your default project.

bq show --format=prettyjson mydataset.mytable

Enter the following command to display all information about mytable in mydataset. mydataset is in myotherproject, not your default project.

bq show --format=prettyjson myotherproject:mydataset.mytable

The output should look like the following:

{
  "creationTime": "1563236533535",
  "description": "This is my partitioned table",
  "etag": "/ABcDEo7f8GHijKL2mnOpQr==",
  "expirationTime": "1565828533000",
  "id": "myproject:mydataset.mypartitionedtable",
  "kind": "bigquery#table",
  "labels": {
    "org": "dev"
  },
  "lastModifiedTime": "1563236533576",
  "location": "US",
  "numBytes": "0",
  "numLongTermBytes": "0",
  "numRows": "0",
  "requirePartitionFilter": true,
  "schema": {
    "fields": [
      {
        "name": "ts",
        "type": "TIMESTAMP"
      },
      {
        "name": "column1",
        "type": "STRING"
      },
      {
        "name": "column2",
        "type": "INTEGER"
      },
      {
        "name": "column3",
        "type": "STRING"
      }
    ]
  },
  "selfLink": "https://bigquery.googleapis.com/bigquery/v2/projects/myproject/datasets/mydataset/tables/mypartitionedtable",
  "tableReference": {
    "datasetId": "mydataset",
    "projectId": "myproject",
    "tableId": "mypartitionedtable"
  },
  "timePartitioning": {
    "expirationMs": "86400000",
    "field": "ts",
    "requirePartitionFilter": true,
    "type": "DAY"
  },
  "type": "TABLE"
}
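The JSON that bq show --format=prettyjson prints can be inspected programmatically. The following is a minimal sketch that parses the partitioning-related fields from the sample output above; it operates on an embedded copy of that JSON and makes no BigQuery calls:

```python
import json

# Abbreviated bq show output; only the partitioning-related fields are kept.
table_info = json.loads("""
{
  "id": "myproject:mydataset.mypartitionedtable",
  "timePartitioning": {
    "expirationMs": "86400000",
    "field": "ts",
    "requirePartitionFilter": true,
    "type": "DAY"
  }
}
""")

tp = table_info.get("timePartitioning", {})
print("Partitioning column:", tp.get("field"))   # ts
print("Granularity:", tp.get("type"))            # DAY

# Note that expirationMs is serialized as a string, so convert it
# before doing arithmetic on it.
expiration_days = int(tp["expirationMs"]) / (1000 * 60 * 60 * 24)
print("Partition expiration in days:", expiration_days)  # 1.0
```

Note that numeric fields such as expirationMs arrive as JSON strings, so they need an explicit conversion before use.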

Enter the following command to display only schema information about mytable in mydataset. mydataset is in myotherproject, not your default project.

bq show --schema --format=prettyjson myotherproject:mydataset.mytable

The output should look like the following:

[
  {
    "name": "ts",
    "type": "TIMESTAMP"
  },
  {
    "name": "column1",
    "type": "STRING"
  },
  {
    "name": "column2",
    "type": "INTEGER"
  },
  {
    "name": "column3",
    "type": "STRING"
  }
]
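The schema array can also be checked for columns that are eligible to serve as a partitioning column (per the limitations above: a top-level, non-REPEATED DATE, TIMESTAMP, or DATETIME column). A sketch using the sample schema above, with no BigQuery call involved:

```python
import json

# The schema array as printed by bq show --schema.
schema = json.loads("""
[
  {"name": "ts", "type": "TIMESTAMP"},
  {"name": "column1", "type": "STRING"},
  {"name": "column2", "type": "INTEGER"},
  {"name": "column3", "type": "STRING"}
]
""")

# Types that can serve as a time-unit partitioning column.
PARTITIONABLE_TYPES = {"DATE", "TIMESTAMP", "DATETIME"}

candidates = [
    field["name"]
    for field in schema
    if field["type"] in PARTITIONABLE_TYPES and field.get("mode") != "REPEATED"
]
print(candidates)  # ['ts']
```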

API

Call the bigquery.tables.get method and provide any relevant parameters.

Listing partitioned tables in a dataset

You can list tables in datasets (including partitioned tables) in the following ways:

  • Using the Cloud Console.
  • Using the bq ls command in the bq command-line tool.
  • Calling the tables.list API method.
  • Using the client libraries.

Required permissions

At a minimum, to list tables in a dataset, you must be granted bigquery.tables.list permissions. The following predefined IAM roles include bigquery.tables.list permissions:

  • bigquery.user
  • bigquery.metadataViewer
  • bigquery.dataViewer
  • bigquery.dataEditor
  • bigquery.dataOwner
  • bigquery.admin

For more information on IAM roles and permissions in BigQuery, see Access control.

Listing partitioned tables

To list the tables in a dataset (including partitioned tables):

Console

  1. Open the BigQuery page in the Cloud Console.

    Go to the BigQuery page

  2. In the navigation panel, in the Resources section, expand your project and click on your dataset.

  3. Scroll through the list to see the tables in the dataset. Tables, partitioned tables, models, and views are identified by different icons.

bq

Issue the bq ls command. The --format flag can be used to control the output. If you are listing tables in a project other than your default project, add the project ID to the dataset in the following format: project_id:dataset.

bq ls --format=pretty project_id:dataset

Replace the following:

  • project_id is your project ID.
  • dataset is the name of the dataset.

When you run the command, the Type field displays either TABLE or VIEW. For partitioned tables, the Time Partitioning field displays the partitioning granularity (for example, DAY), the column used to create the partitions, and, if a partition expiration is specified, the expiration time in milliseconds.

For example:

+-------------------------+-------+----------------------+---------------------------------------------------+
|         tableId         | Type  |        Labels        | Time Partitioning                                 |
+-------------------------+-------+----------------------+---------------------------------------------------+
| mytable                 | TABLE | department:shipping  |  DAY (field: source_date, expirationMs: 86400000) |
| myview                  | VIEW  |                      |                                                   |
+-------------------------+-------+----------------------+---------------------------------------------------+

Examples:

Enter the following command to list tables in dataset mydataset in your default project.

bq ls --format=pretty mydataset

Enter the following command to list tables in dataset mydataset in myotherproject.

bq ls --format=pretty myotherproject:mydataset
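If you need the pieces of the Time Partitioning column in a script, they can be pulled out of the listing text with a regular expression. A sketch against the sample value shown above (the text format is an assumption based on the example output, not a stable interface, so verify it against your bq version):

```python
import re

# Sample Time Partitioning value from the bq ls output above.
time_partitioning = "DAY (field: source_date, expirationMs: 86400000)"

# Capture the granularity, partitioning column, and expiration.
match = re.match(
    r"(?P<type>\w+) \(field: (?P<field>\w+), expirationMs: (?P<ms>\d+)\)",
    time_partitioning,
)
if match:
    print(match["type"])     # DAY
    print(match["field"])    # source_date
    print(int(match["ms"]))  # 86400000
```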

API

To list tables using the API, call the tables.list method.

Listing partitions in partitioned tables

You can list the partitions in a partitioned table by querying the __PARTITIONS_SUMMARY__ meta-table using legacy SQL.

You can run the query by using the Cloud Console, by using the bq query command, or by calling the jobs.insert method and configuring a query job.

Required permissions

At a minimum, to run a query job that uses the __PARTITIONS_SUMMARY__ meta-table, you must be granted bigquery.jobs.create permissions. The following predefined IAM roles include bigquery.jobs.create permissions:

  • bigquery.user
  • bigquery.jobUser
  • bigquery.admin

You must also be granted bigquery.tables.getData permissions. The following predefined IAM roles include bigquery.tables.getData permissions:

  • bigquery.dataViewer
  • bigquery.dataEditor
  • bigquery.dataOwner
  • bigquery.admin

For more information on IAM roles in BigQuery, see Access control.

Listing partitions in a partitioned table

You can list partitions in a partitioned table using legacy SQL. To list partitions in a partitioned table:

Console

  1. Open the BigQuery page in the Cloud Console.

    Go to the BigQuery page

  2. Click the Compose new query button.

  3. Enter the following text into the Query editor box to query the __PARTITIONS_SUMMARY__ meta-table:

    #legacySQL
    SELECT
      partition_id
    FROM
      [dataset.table$__PARTITIONS_SUMMARY__]
    

    Replace the following:

    • dataset is the dataset that contains the table.
    • table is the name of the table.
  4. Click Run.

bq

Enter the following query using the bq query command:

bq --location=location query \
--use_legacy_sql=true \
'SELECT
  partition_id
FROM
  [dataset.table$__PARTITIONS_SUMMARY__]'

Replace the following:

  • location is the name of your location. The --location flag is optional. For example, if you are using BigQuery in the Tokyo region, set the flag's value to asia-northeast1. You can set a default value for the location using the .bigqueryrc file.
  • dataset is the dataset that contains the table.
  • table is the name of the table.
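The .bigqueryrc file referenced above is a plain flag file: default values for global settings appear one per line, and command-specific flags go under a bracketed section named for the command. A sketch of entries that default the location to the Tokyo region and make bq query use legacy SQL (treat the exact layout as an assumption to check against your bq CLI version):

```
location = asia-northeast1

[query]
--use_legacy_sql=true
```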

API

Call the jobs.insert method and configure a query job that queries the table's __PARTITIONS_SUMMARY__ meta-table.

Getting partitioned table metadata using meta-tables

You can get information about partitioned tables by using special tables called meta-tables. Meta-tables contain metadata such as the list of tables and views in a dataset. Meta-tables are read-only.

Currently, you cannot use the INFORMATION_SCHEMA service to get partitioned table metadata.

Getting partition metadata using meta-tables

The __PARTITIONS_SUMMARY__ meta-table is a special table whose contents represent metadata about partitions in a time-partitioned table. The __PARTITIONS_SUMMARY__ meta-table is read-only.

To access metadata about the partitions in a time-partitioned table, use the __PARTITIONS_SUMMARY__ meta-table in a query's SELECT statement. You can run the query by:

  • Using the Cloud Console or the classic BigQuery web UI
  • Using the bq command-line tool's bq query command
  • Calling the jobs.insert API method and configuring a query job
  • Using the client libraries

Currently, standard SQL does not support the partition decorator separator ($), so you cannot query __PARTITIONS_SUMMARY__ in standard SQL. A legacy SQL query that uses the __PARTITIONS_SUMMARY__ meta-table looks like the following:

#legacySQL
SELECT
  column
FROM
  [dataset.table$__PARTITIONS_SUMMARY__]

Where:

  • dataset is the name of your dataset.
  • table is the name of the time-partitioned table.
  • column is one of the following:
Value                Description
project_id           Name of the project.
dataset_id           Name of the dataset.
table_id             Name of the time-partitioned table.
partition_id         Name (date) of the partition.
creation_time        The time at which the partition was created, in milliseconds since January 1, 1970 UTC.
last_modified_time   The time at which the partition was last modified, in milliseconds since January 1, 1970 UTC.
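Because creation_time and last_modified_time are expressed in milliseconds since the Unix epoch, converting them to a readable UTC timestamp takes only the standard library. A minimal sketch in Python, using a sample value of the same form as the outputs in this document:

```python
from datetime import datetime, timezone

# A creation_time value in milliseconds since January 1, 1970 UTC.
creation_time_ms = 1517190224120

# Split off the milliseconds first to avoid floating-point rounding.
seconds, ms = divmod(creation_time_ms, 1000)
created = datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=ms * 1000)
print(created.isoformat())  # 2018-01-29T01:43:44.120000+00:00
```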

Partition meta-table permissions

At a minimum, to run a query job that uses the __PARTITIONS_SUMMARY__ meta-table, you must be granted bigquery.jobs.create permissions. The following predefined IAM roles include bigquery.jobs.create permissions:

  • bigquery.user
  • bigquery.jobUser
  • bigquery.admin

You must also be granted bigquery.tables.getData permissions. The following predefined IAM roles include bigquery.tables.getData permissions:

  • bigquery.dataViewer
  • bigquery.dataEditor
  • bigquery.dataOwner
  • bigquery.admin

For more information on IAM roles in BigQuery, see Access control.

Partition meta-table examples

The following query retrieves all partition metadata for a time-partitioned table named mydataset.mytable.

Console or classic UI

#legacySQL
SELECT
  *
FROM
  [mydataset.mytable$__PARTITIONS_SUMMARY__]

bq

bq query --use_legacy_sql=true '
SELECT
  *
FROM
  [mydataset.mytable$__PARTITIONS_SUMMARY__]'

The output looks like the following:

+----------------+------------+----------------+--------------+---------------+--------------------+
|   project_id   | dataset_id |    table_id    | partition_id | creation_time | last_modified_time |
+----------------+------------+----------------+--------------+---------------+--------------------+
| myproject      | mydataset  | mytable        | 20160314     | 1517190224120 | 1517190224997      |
| myproject      | mydataset  | mytable        | 20160315     | 1517190224120 | 1517190224997      |
+----------------+------------+----------------+--------------+---------------+--------------------+
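Since both timestamp columns are in milliseconds, arithmetic on them is straightforward; for example, the gap between creation and last modification for the sample rows above works out to 877 ms. A sketch with values copied from the example output:

```python
rows = [
    # (partition_id, creation_time, last_modified_time) from the sample output.
    ("20160314", 1517190224120, 1517190224997),
    ("20160315", 1517190224120, 1517190224997),
]

for partition_id, created_ms, modified_ms in rows:
    lag_ms = modified_ms - created_ms
    print(f"{partition_id}: modified {lag_ms} ms after creation")
```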

The following query lists the times when the partitions in mydataset.mytable were last modified.

Console or classic UI

#legacySQL
SELECT
  partition_id,
  last_modified_time
FROM
  [mydataset.mytable$__PARTITIONS_SUMMARY__]

bq

bq query --use_legacy_sql=true '
SELECT
  partition_id,
  last_modified_time
FROM
  [mydataset.mytable$__PARTITIONS_SUMMARY__]'

The output looks like the following:

+--------------+--------------------+
| partition_id | last_modified_time |
+--------------+--------------------+
| 20160102     |      1471632556179 |
| 20160101     |      1471632538142 |
| 20160103     |      1471632570463 |
+--------------+--------------------+

To display the last_modified_time field in human-readable format, use the FORMAT_UTC_USEC function. Because the function expects microseconds, multiply the millisecond value by 1000. For example:

Console or classic UI

#legacySQL
SELECT
  partition_id,
  FORMAT_UTC_USEC(last_modified_time*1000) AS last_modified
FROM
  [mydataset.mytable$__PARTITIONS_SUMMARY__]

bq

bq query --use_legacy_sql=true '
SELECT
  partition_id,
  FORMAT_UTC_USEC(last_modified_time*1000) AS last_modified
FROM
  [mydataset.mytable$__PARTITIONS_SUMMARY__]'

The output looks like the following:

+--------------+----------------------------+
| partition_id |       last_modified        |
+--------------+----------------------------+
| 20160103     | 2016-08-19 18:49:30.463000 |
| 20160102     | 2016-08-19 18:49:16.179000 |
| 20160101     | 2016-08-19 18:48:58.142000 |
+--------------+----------------------------+

Next steps