Managing Partitioned Tables

This document describes how to manage partitioned tables in BigQuery. Both ingestion-time partitioned and partitioned tables are managed the same way. You can perform the following management tasks for partitioned tables:

  • Update a time-partitioned table's:
    • Table expiration time
    • Partition expiration time
    • Description
    • Schema definition
    • Labels
  • Rename (copy) a time-partitioned table
  • Copy a time-partitioned table
  • Copy partitions
  • Delete a time-partitioned table
  • Delete partitions in a time-partitioned table

For more information on creating and using partitioned tables including getting table information, listing tables, and controlling access to table data, see Creating and Using Ingestion-Time Partitioned Tables or Creating and Using Partitioned Tables.

Updating partitioned table properties

You can update a partitioned table's:

Required permissions

To update table properties, you must have WRITER access at the dataset level, or you must be assigned a project-level IAM role that includes requires bigquery.tables.update permissions. The following predefined, project-level IAM roles include bigquery.tables.update permissions:

In addition, because the bigquery.user role has bigquery.datasets.create permissions, a user assigned to the bigquery.user role can update any table that user creates in the dataset. When a user assigned to the bigquery.user role creates a dataset, that user is given OWNER access to the dataset. OWNER access to a dataset gives the user full control over it and all the tables and views in it.

For more information on IAM roles and permissions in BigQuery, see: access control.

Updating a partitioned table's description

The process for updating a partitioned table's description is the same as updating a standard table's description. For information on adding or changing a table's description, see Updating a table's description.

Currently, you cannot create descriptions for individual partitions.

Updating the table expiration

The process for updating a partitioned table's expiration is the same as updating a standard table's expiration. For information on adding or changing a table's expiration, see Updating a table's expiration time.

Updating the partition expiration

You can specify partition expiration for a partitioned table when the table is created by using the command-line tool's bq mk command or by calling the tables.insert API method. Specifying partition expiration is not currently supported by the BigQuery web UI.

If you specify partition expiration when you create a table, all partitions are subject to the expiration. You cannot apply different expiration times to individual partitions.

At any point after the table is created, you can update the table's partition expiration time using the CLI's bq update command, or the API's tables.patch method. Updating partition expiration is not currently supported by the BigQuery web UI.

When you update a table's partition expiration, the setting applies to all partitions, regardless of when they were created.

When you update a table's partition expiration time, you must calculate the partition expiration based on the partition's date. For example, if the partition's date is January 3, 2018, and you set the partition expiration time at 5 days, the partition expires in on January 8, 2018 regardless of when it was last updated.

If the partitioned table also has a table expiration configured, the table and all the partitions in it are deleted according to the table expiration settings. The table expiration takes precedence over the partition expiration.

For example, if a partitioned table's expiration is set to 5 days, and the partition expiration time is set to 7 days, the table and all partitions in it is deleted after 5 days.

For projects that have partitioned tables that were created before December 13, 2016, the partition expiration is based on the last date that the partition was modified. This behavior also applies to new tables created in these projects. To migrate your project to the new behavior, open a request in the BigQuery issue tracker.

To update a partitioned table's partition expiration:

Web UI

You cannot add a partition expiration or update a partition expiration using the BigQuery web UI.

Command-line

Issue the bq update command with the --time_partitioning_expiration flag. If you are updating a partitioned table in a project other than your default project, add the project ID to the dataset name in the following format: [PROJECT_ID]:[DATASET].

bq update --time_partitioning_expiration [INTEGER] [PROJECT_ID]:[DATASET].[TABLE]

Where:

  • [INTEGER] is the default lifetime (in seconds) for the table's partitions. There is no minimum value. The expiration time evaluates to the partition's date plus the integer value. If you specify 0 or a negative number, the partition expiration is removed, and the partition never expires. Partitions with no expiration must be manually deleted.
  • [PROJECT_ID] is your project ID.
  • [DATASET] is the name of the dataset that contains the table you're updating.
  • [TABLE] is the name of the table you're updating.

Examples:

Enter the following command to update the expiration time of partitions in mydataset.mytable to 5 days (432000 seconds). mydataset is in your default project.

bq update --time_partitioning_expiration 432000 mydataset.mytable

Enter the following command to update the expiration time of partitions in mydataset.mytable to 5 days (432000 seconds). mydataset is in myotherproject, not your default project.

bq update --time_partitioning_expiration 432000 myotherproject:mydataset.mytable

API

Call the tables.patch method and use the timePartitioning.expirationMs property to update the partition expiration in milliseconds. Because the tables.update method replaces the entire table resource, the tables.patch method is preferred.

Updating the schema definition

The process for updating the schema definition for a partitioned table is the same as the process for updating the schema definition of a standard table. For more information, see Modifying Table Schemas.

Renaming a partitioned table

Currently, you cannot change the name of an existing partitioned table. If you need to change the table name, follow the steps to copy the table. When you specify the destination table in the copy operation, use the new table name.

Copying partitioned tables

Copying a single partitioned table

The process for copying a partitioned table is the same as the process for copying a standard table. For more information, see Copying a table.

When you copy a partitioned table, note the following:

  • The source and destination tables must be in datasets in the same location.

  • Copying a partitioned table into a new, partitioned destination table

    If you copy a time-partitioned table to a new table, all of the partitioning information is copied with the table. The new table and the old table will have identical partitions.

  • Copying a non-partitioned table into a partitioned table

    If you copy a non-partitioned table into a partitioned table, BigQuery copies the source data into the partition that represents the current date.

  • Copying a partitioned table to another partitioned table

    To copy a partitioned table into another partitioned table, the partition specifications for the source and destination tables must match. You can specify whether to append or overwrite the destination table.

  • Copying a partitioned table into a non-partitioned table

    If you copy a partitioned table into a non-partitioned table, the destination table remains unpartitioned. The data is either appended to the non-partitioned table or used to overwrite the non-partitioned table, depending on your settings.

Copying multiple partitioned tables

The process for copying multiple partitioned tables is the same as the process for copying multiple standard tables. For more information, see Copying multiple source tables.

When you copy multiple partitioned tables, note the following:

  • If you copy multiple source tables into a partitioned table in the same job, the source tables can't contain a mixture of partitioned and non-partitioned tables.
  • If all of the source tables are partitioned tables, the partition specifications for all source tables must match the destination table's partition specification. Your settings determine whether the destination table is appended or overwritten.
  • The source and destination tables must be in datasets in the same location.

Copying partitions

You can copy one or more partitions by using the command-line tool's bq cp command or by calling the jobs.insert API method and configuring a copy job. Currently, copying partitions is not supported by the BigQuery web UI.

Required permissions

At the dataset level, copying partitions requires READER access to the source dataset that contains the partitions being copied, and WRITER access to the destination dataset.

Instead of using dataset-level permissions, you can leverage a project-level IAM role that includes bigquery.tables.create and bigquery.tables.getData permissions. bigquery.tables.create permissions are required to create the copy of the partition in the destination dataset (if the destination table is new). bigquery.tables.getData permissions are required to read the data in the partitions being copied.

The following predefined, project-level IAM roles include both bigquery.tables.create and bigquery.tables.getData permissions for every dataset in the project:

You must also be granted bigquery.jobs.create permissions to run copy jobs. The following predefined, project-level IAM roles include bigquery.jobs.create permissions:

In addition, because the bigquery.user role has bigquery.datasets.create permissions, a user assigned to the bigquery.user role can read or copy the data in any table or partition that user creates. When a user assigned to the bigquery.user role creates a dataset, that user is given OWNER access to the dataset. OWNER access to a dataset gives the user full control over it and all the tables and partitions in it.

For more information on IAM roles and permissions in BigQuery, see access control. For more information on dataset-level roles, see Primitive roles for datasets.

Copying a single partition

Web UI

Copying partitions is not supported by the BigQuery web UI.

Command-line

To copy a partition, use the command-line tool's bq cp (copy) command with a partition decorator ($[DATE]) such as $20160201.

Optional flags can be used to control the write disposition of the destination partition:

  • -a or --append_table appends the data from the source partition to an existing table or partition in the destination dataset.
  • -f or --force overwrites an existing table or partition in the destination dataset and doesn't prompt you for confirmation.
  • -n or --no_clobber returns the following error message if the table or partition exists in the destination dataset: Table '[PROJECT_ID]:[DATASET].[TABLE] or [TABLE]$[DATE]' already exists, skipping. If -n is not specified, the default behavior is to prompt you to choose whether to replace the destination table or partition.
  • --destination_kms_key is the customer-managed Cloud KMS key used to encrypt the destination table or partition.

The cp command does not support the --time_partitioning_field or --time_partitioning_type flags. You cannot use a copy job to convert an ingestion-time partitioned table into a partitioned table.

--destination_kms_key is not demonstrated here. See Protecting Data with Cloud KMS Keys for more information.

If the source or destination dataset is in a project other than your default project, add the project ID to the dataset names in the following format: [PROJECT_ID]:[DATASET].

Supply the --location flag and set the value to your location.

bq --location=[LOCATION] cp -a -f -n [PROJECT_ID]:[DATASET].[SOURCE_TABLE]$[SOURCE_PARTITION] [PROJECT_ID]:[DATASET].[DESTINATION_TABLE]$[DESTINATION_PARTITION]

Where:

  • [LOCATION] is the name of your location. The --location flag is optional if your data is in the US or the EU multi-region location. For example, if you are using BigQuery in the Tokyo region, set the flag's value to asia-northeast1. You can set a default value for the location using the .bigqueryrc file.
  • [PROJECT_ID] is your project ID.
  • [DATASET] is the name of the source or destination dataset.
  • [SOURCE_TABLE] is the table you're copying.
  • [SOURCE_PARTITION] is the partition decorator of the source partition.
  • [DESTINATION_TABLE] is the name of the table in the destination dataset.
  • [DESTINATION_PARTITION] is the partition decorator of the destination partition.

Examples:

Copying a partition to a new table

Enter the following command to copy the January 30, 2018 partition from mydataset.mytable to a new table — mydataset.mytable2. mydataset is in your default project. mydataset was created in the US multi-region location.

bq --location=US cp -a 'mydataset.mytable$20180130' mydataset.mytable2

Copying a partition to a non-partitioned table

Enter the following command to copy the January 30, 2018 partition from mydataset.mytable to a non-partitioned table — mydataset2.mytable2. The -a shortcut is used to append the partition's data to the non-partitioned destination table. Both datasets are in your default project. Both datasets were created in the asia-northeast1 region.

bq --location=asia-northeast1 cp -a 'mydataset.mytable$20180130' mydataset2.mytable2

Enter the following command to copy the January 30, 2018 partition from mydataset.mytable to a non-partitioned table — mydataset2.mytable2. The -f shortcut is used to overwrite the non-partitioned destination table without prompting. Both datasets are in your default project and were created in the US multi-region location.

bq --location=US cp -f 'mydataset.mytable$20180130' mydataset2.mytable2

Copying a partition to another partitioned table

Enter the following command to copy the January 30, 2018 partition from mydataset.mytable to another partitioned table — mydataset2.mytable2. The -a shortcut is used to append the partition's data to the destination table. Since no partition decorator is specified on the destination table, the source partition key is preserved and the data is copied to the January 30, 2018 partition in the destination table. You can also specify a partition decorator on the destination table to copy data to a specific partition. mydataset is in your default project. mydataset2 is in myotherproject, not your default project. Both datasets were created in the US multi-region location.

bq --location=US cp -a 'mydataset.mytable$20180130' myotherproject:mydataset2.mytable2

Enter the following command to copy the January 30, 2018 partition from mydataset.mytable to the February 20, 2018 partition of another partitioned table — mydataset2.mytable2. The -f shortcut is used to overwrite the February 20, 2018 partition in the destination table without prompting. If no partition decorator is used, all data in the destination table is overwritten. mydataset is in your default project. mydataset2 is in myotherproject, not your default project. Both datasets were created in the US multi-region location.

bq --location=US cp -f 'mydataset.mytable$20180130' 'myotherproject:mydataset2.mytable2$20180220'

Enter the following command to copy the January 30, 2018 partition from mydataset.mytable to another partitioned table — mydataset2.mytable2. mydataset is in your default project. mydataset2 is in myotherproject, not your default project. If there is data in the destination table, the default behavior is to prompt you to overwrite. Both datasets were created in the asia-northeast1 region.

bq --location=asia-northeast1 cp 'mydataset.mytable$20180130' myotherproject:mydataset2.mytable2

API

Call the jobs.insert method, and configure a copy job. Specify your region in the location property in the jobReference section of the job resource.

Specify the following properties in your job configuration:

  • Enter the source dataset, table, and partition in the sourceTable property.
  • Enter the destination dataset and table in the destinationTable property.
  • Use the writeDisposition property to specify whether to append or overwrite the destination table or partition.

Copying multiple partitions

To copy multiple partitions:

Web UI

Currently, copying partitions is not supported by the BigQuery web UI.

Command-line

Copying multiple partitions is the same process as copying a single partition, but you specify multiple source partitions as a comma-separated list:

bq --location=[LOCATION] cp 'mydataset.mytable$20180130,mydataset.mytable$20180131' myotherproject:mydataset.mytable2

API

Call the jobs.insert method, and configure a copy job. Specify your region in the location property in the jobReference section of the job resource.

Specify the following properties in your job configuration:

  • Enter multiple source partitions (including the dataset and table names) in the sourceTables property.
  • Enter the destination dataset and table in the destinationTable property.
  • Use the writeDisposition property to specify whether to append or overwrite the destination table or partition.

Deleting a partitioned table

The process for deleting a time-partitioned table and all the partitions in it is the same as the process for deleting a standard table. For information on deleting a table, see Deleting tables.

Deleting partitions in partitioned tables

You can delete partitions in partitioned tables using the command-line tool's bq rm command, or by calling the tables.delete API method. Deleting partitions is not currently supported by the BigQuery web UI.

You can use the partition decorator to delete a specific partition. For example, the partition for March 1, 2016 ($20160301) in a partitioned table named mydataset.mytable can be deleted using the command:

bq rm 'mydataset.mytable$20160301'

To retrieve a list of partitions in a partitioned table, see Listing partitions in ingestion-time partitioned tables or Listing partitions in partitioned tables.

Currently, you can only delete one partition at a time.

Required permissions

To delete a partition, you must have OWNER access at the dataset level, or you must be assigned a project-level IAM role that includes bigquery.tables.delete permissions. The following predefined, project-level IAM roles include bigquery.tables.delete permissions:

Users assigned a predefined, project-level role can delete partitions in any partitioned table in the project. Users assigned OWNER permissions at the dataset level can delete partitions in tables only in that dataset.

In addition, because the bigquery.user role has bigquery.datasets.create permissions, a user assigned to the bigquery.user role can delete tables and partitions in any dataset that user creates. When a user assigned to the bigquery.user role creates a dataset, that user is given OWNER access to the dataset. OWNER access to a dataset gives the user full control over the dataset, all the tables in it, and all the table partitions.

For more information on IAM roles and permissions in BigQuery, see Access Control. For more information on dataset-level roles, see Primitive roles for datasets.

Deleting a partition in a partitioned table

You can delete a partition by specifying the partition's decorator unless it is one of the two special partitions. Currently, you cannot delete the __NULL__ or __UNPARTITIONED__ partitions.

To delete a partition in a partitioned table:

Web UI

Deleting partitions is not supported by the BigQuery web UI.

CLI

Use the bq rm command with the --table flag (or -t shortcut) and reference the partition decorator ($[DATE]) to delete a specific partition in a partitioned table. When you use the CLI to remove a partition, you must confirm the action. You can use the --force flag (or -f shortcut) to skip confirmation.

If the partitioned table is in a dataset in a project other than your default project, add the project ID to the dataset name in the following format: [PROJECT_ID]:[DATASET].

bq rm -f -t [PROJECT_ID]:[DATASET].[TABLE]$[DATE]

Where:

  • [PROJECT_ID] is your project ID.
  • [DATASET] is the name of the dataset that contains the table.
  • [TABLE] is the name of the table.
  • $[DATE] is the partition decorator of the partition you're deleting.

Examples:

Enter the following command to delete the partition for March 1, 2016 ($20160301) in a partitioned table named mydataset.mytable. mydataset is in your default project.

bq rm 'mydataset.mytable$20160301'

Enter the following command to delete the partition for January 1, 2017 ($20170101) in a partitioned table named mydataset.mytable. mydataset is in myotherproject, not your default project.

bq rm 'myotherproject:mydataset.mytable$20170101'

Enter the following command to delete the partition for January 18, 2018 ($20180118) in a partitioned table named mydataset.mytable. mydataset is in myotherproject, not your default project. The -f shortcut is used to skip confirmation.

bq rm -f 'myotherproject:mydataset.mytable$20180118'

API

Call the tables.delete method and specify the table and partition decorator using the tableId parameter.

Cette page vous a-t-elle été utile ? Évaluez-la :

Envoyer des commentaires concernant…