This document describes how to manage partitioned tables in BigQuery. Ingestion-time partitioned tables and tables partitioned on a date or timestamp column are managed in the same way. You can perform the following management tasks for partitioned tables:
- Update a time-partitioned table's:
  - Description
  - Table expiration time
  - Partition expiration time
  - Partition filter requirements
  - Schema definition
  - Labels
- Rename (copy) a time-partitioned table
- Copy a time-partitioned table
- Copy partitions
- Delete a time-partitioned table
- Delete partitions in a time-partitioned table
For more information on creating and using partitioned tables, including getting table information, listing tables, and controlling access to table data, see Creating and using ingestion-time partitioned tables or Creating and using partitioned tables.
Updating partitioned table properties
You can update a partitioned table's:
- Description
- Table expiration time
- Partition expiration time
- Schema definition
- Labels
Required permissions
At a minimum, to update table properties, you must be granted `bigquery.tables.update` and `bigquery.tables.get` permissions. The following predefined IAM roles include `bigquery.tables.update` and `bigquery.tables.get` permissions:

- `bigquery.dataEditor`
- `bigquery.dataOwner`
- `bigquery.admin`
In addition, if a user has `bigquery.datasets.create` permissions, when that user creates a dataset, they are granted `bigquery.dataOwner` access to it. `bigquery.dataOwner` access gives the user the ability to update the properties of tables in that dataset.
For more information on IAM roles and permissions in BigQuery, see Predefined roles and permissions.
Updating a partitioned table's description
The process for updating a partitioned table's description is the same as updating a standard table's description. For information on adding or changing a table's description, see Updating a table's description.
Currently, you cannot create descriptions for individual partitions.
Updating the table expiration
The process for updating a partitioned table's expiration is the same as updating a standard table's expiration. For information on adding or changing a table's expiration, see Updating a table's expiration time.
Updating the partition expiration
When you create a table partitioned by ingestion time or date/timestamp column, you can specify a partition expiration. This setting specifies how long BigQuery keeps the data in each partition. The setting applies to all partitions in the table. When a partition expires, BigQuery deletes the data in that partition.
A partition's expiration time is calculated from the partition boundary in UTC. For example, with daily partitioning, the partition boundary is at midnight (00:00:00 UTC). If the table's partition expiration is 6 hours, then each partition expires at 06:00:00 UTC the following day.
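To make the boundary arithmetic concrete, here is a minimal sketch in plain Python (not a BigQuery API call); the partition date and the 6-hour expiration are illustrative values.

```python
from datetime import date, datetime, timedelta, timezone

def daily_partition_expiration(partition_day, expiration_hours):
    """Return the UTC instant at which a daily partition expires.

    Expiration is measured from the partition boundary, which for the
    daily partition covering partition_day falls at midnight UTC at the
    start of the following day.
    """
    boundary = datetime(partition_day.year, partition_day.month,
                        partition_day.day, tzinfo=timezone.utc) + timedelta(days=1)
    return boundary + timedelta(hours=expiration_hours)

# With a 6-hour expiration, the 2018-01-30 partition expires at
# 06:00:00 UTC on 2018-01-31.
print(daily_partition_expiration(date(2018, 1, 30), 6).isoformat())
```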
You can also specify a default partition expiration at the dataset level. If you set the partition expiration on a table, then the value overrides the default partition expiration. If you don't specify any partition expiration (on the table or dataset), then partitions never expire.
If you set a table expiration, that value takes precedence over the partition expiration. For example, if the table expiration is set to 5 days, and the partition expiration is set to 7 days, then the table and all partitions in it are deleted after 5 days.
At any point after a table is created, you can update the table's partition expiration. The new setting applies to all partitions in that table, regardless of when they were created. Existing partitions expire immediately if they are older than the new expiration time.
For projects that were created before December 13, 2016, partition expiration is based on the last date that the partition was modified. This behavior applies to both existing and new tables created in the project. To migrate your project to the newer behavior, open a request in the BigQuery issue tracker.
To update a partitioned table's partition expiration:
Console
You cannot update the partition expiration in the Cloud Console.
SQL
Data definition language (DDL) statements allow you to create and modify tables and views using standard SQL query syntax. For more information, see Using data definition language statements.
To update a partitioned table's partition expiration by using a DDL statement:
Open the BigQuery page in the Cloud Console.
Click Compose new query.
Type your DDL statement into the Query editor text area.
```sql
ALTER TABLE mydataset.mytable
SET OPTIONS (
  -- Sets partition expiration to 5 days
  partition_expiration_days = 5
)
```
Click Run.
bq
Issue the `bq update` command with the `--time_partitioning_expiration` flag. If you are updating a partitioned table in a project other than your default project, add the project ID to the dataset name in the following format: `project_id:dataset`.
```
bq update \
--time_partitioning_expiration integer \
--time_partitioning_type unit_time \
project_id:dataset.table
```
Where:

- integer is the default lifetime (in seconds) for the table's partitions. There is no minimum value. The expiration time evaluates to the partition's date plus the integer value. If you specify `0`, the partition expiration is removed, and the partition never expires. Partitions with no expiration must be manually deleted.
- unit_time is either `DAY`, `HOUR`, `MONTH`, or `YEAR`, based on the table's partitioning granularity. This value must match the granularity that you set when you created the table.
- project_id is your project ID.
- dataset is the name of the dataset that contains the table you're updating.
- table is the name of the table you're updating.
Examples:
Enter the following command to update the expiration time of partitions in `mydataset.mytable` to 5 days (432000 seconds). `mydataset` is in your default project.

```
bq update --time_partitioning_expiration 432000 mydataset.mytable
```
Enter the following command to update the expiration time of partitions in `mydataset.mytable` to 5 days (432000 seconds). `mydataset` is in `myotherproject`, not your default project.

```
bq update \
--time_partitioning_expiration 432000 \
myotherproject:mydataset.mytable
```
API
Call the `tables.patch` method and use the `timePartitioning.expirationMs` property to update the partition expiration in milliseconds. Because the `tables.update` method replaces the entire table resource, the `tables.patch` method is preferred.
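As a sketch of what such a patch carries, the request body contains only the field being changed; the five-day value below is illustrative, and `expirationMs` is an int64 carried as a string in the REST representation.

```python
import json

# 5 days expressed in milliseconds, as timePartitioning.expirationMs expects.
five_days_ms = 5 * 24 * 60 * 60 * 1000

# Minimal tables.patch request body: only the field being updated is sent,
# which is why patch is preferred over update here.
patch_body = {"timePartitioning": {"expirationMs": str(five_days_ms)}}

print(json.dumps(patch_body))
```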
Updating partition filter requirements
When you create a partitioned table, you can require the use of predicate filters by enabling the Require partition filter option. When this option is applied, attempts to query the partitioned table without specifying a `WHERE` clause produce the following error:

`Cannot query over table 'project_id.dataset.table' without a filter that can be used for partition elimination`
For more information on adding the Require partition filter option when you create a partitioned table, see Creating partitioned tables.
If you do not enable the Require partition filter option when you create a partitioned table, you can update the table to add the option.
Updating the require partition filter option
To update a partitioned table to require queries that contain a `WHERE` clause that prunes partitions:
Console
You cannot use the Cloud Console to require partition filters after a partitioned table is created.
bq
To update a partitioned table to require partition filters by using the `bq` command-line tool, enter the `bq update` command and supply the `--require_partition_filter` flag.

To update a partitioned table in a project other than your default project, add the project ID to the dataset in the following format: `project_id:dataset`.
For example:
To update `mytable` in `mydataset` in your default project, enter:

```
bq update --require_partition_filter mydataset.mytable
```
To update `mytable` in `mydataset` in `myotherproject`, enter:

```
bq update --require_partition_filter myotherproject:mydataset.mytable
```
API
Call the `tables.patch` method and set the `requirePartitionFilter` property to `true` to require partition filters. Because the `tables.update` method replaces the entire table resource, the `tables.patch` method is preferred.
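A minimal sketch of the corresponding request, assuming illustrative project, dataset, and table names; the endpoint path follows the BigQuery v2 REST layout, and the body carries only the property being changed.

```python
from urllib.parse import quote

project, dataset, table = "myproject", "mydataset", "mytable"  # illustrative names

# tables.patch endpoint for this table (sent with the HTTP PATCH verb).
url = (f"https://bigquery.googleapis.com/bigquery/v2/projects/{quote(project)}"
       f"/datasets/{quote(dataset)}/tables/{quote(table)}")

# Only the field being updated goes in the body.
patch_body = {"requirePartitionFilter": True}
print(url)
```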
Java
Before trying this sample, follow the Java setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Java API reference documentation.
Updating the schema definition
The process for updating the schema definition for a partitioned table is the same as the process for updating the schema definition of a standard table. For more information, see Modifying table schemas.
Renaming a partitioned table
Currently, you cannot change the name of an existing partitioned table. If you need to change the table name, follow the steps to copy the table. When you specify the destination table in the copy operation, use the new table name.
Copying partitioned tables
Copying a single partitioned table
The process for copying a partitioned table is the same as the process for copying a standard table. For more information, see Copying a table.
When you copy a partitioned table, note the following:

- The source and destination tables must be in datasets in the same location.
- Copying a partitioned table into a new, partitioned destination table: If you copy a time-partitioned table to a new table, all of the partitioning information is copied with the table. The new table and the old table have identical partitions.
- Copying a non-partitioned table into a partitioned table: If you copy a non-partitioned table into a partitioned table, BigQuery copies the source data into the partition that represents the current date.
- Copying a partitioned table into another partitioned table: To copy a partitioned table into another partitioned table, the partition specifications for the source and destination tables must match. You can specify whether to append to or overwrite the destination table.
- Copying a partitioned table into a non-partitioned table: If you copy a partitioned table into a non-partitioned table, the destination table remains unpartitioned. The data is either appended to the non-partitioned table or used to overwrite it, depending on your settings.
Copying multiple partitioned tables
The process for copying multiple partitioned tables is the same as the process for copying multiple standard tables. For more information, see Copying multiple source tables.
When you copy multiple partitioned tables, note the following:
- If you copy multiple source tables into a partitioned table in the same job, the source tables can't contain a mixture of partitioned and non-partitioned tables.
- If all of the source tables are partitioned tables, the partition specifications for all source tables must match the destination table's partition specification. Your settings determine whether the destination table is appended or overwritten.
- The source and destination tables must be in datasets in the same location.
Copying partitions
You can copy one or more partitions by:

- Using the `bq` command-line tool's `bq cp` command
- Calling the `jobs.insert` API method and configuring a copy job
- Using the client libraries
Currently, copying partitions is not supported by the Cloud Console.
Required permissions
At a minimum, to copy tables and partitions, you must be granted the following permissions.

On the source dataset:

- `bigquery.tables.get`
- `bigquery.tables.getData`

On the destination dataset:

- `bigquery.tables.create`, to create the copy of the table or partition in the destination dataset

The following predefined IAM roles include `bigquery.tables.create`, `bigquery.tables.get`, and `bigquery.tables.getData` permissions:

- `bigquery.dataEditor`
- `bigquery.dataOwner`
- `bigquery.admin`
Also, to run the copy job, you must be granted `bigquery.jobs.create` permissions. The following predefined IAM roles include `bigquery.jobs.create` permissions:

- `bigquery.user`
- `bigquery.jobUser`
- `bigquery.admin`
In addition, if a user has `bigquery.datasets.create` permissions, when that user creates a dataset, they are granted `bigquery.dataOwner` access to it. `bigquery.dataOwner` access gives the user the ability to copy tables and partitions in the dataset, but access to the destination dataset is required unless the user also created the destination dataset.
For more information on IAM roles and permissions in BigQuery, see Predefined roles and permissions.
Copying a single partition
Console
Copying partitions is not supported by the Cloud Console.
bq
To copy a partition, use the `bq` command-line tool's `bq cp` (copy) command with a partition decorator (`$date`) such as `$20160201`.
Optional flags can be used to control the write disposition of the destination partition:

- `-a` or `--append_table` appends the data from the source partition to an existing table or partition in the destination dataset.
- `-f` or `--force` overwrites an existing table or partition in the destination dataset and doesn't prompt you for confirmation.
- `-n` or `--no_clobber` returns the following error message if the table or partition exists in the destination dataset: `Table 'project_id:dataset.table or table$date' already exists, skipping.` If `-n` is not specified, the default behavior is to prompt you to choose whether to replace the destination table or partition.
- `--destination_kms_key` is the customer-managed Cloud KMS key used to encrypt the destination table or partition.
The `cp` command does not support the `--time_partitioning_field` or `--time_partitioning_type` flags. You cannot use a copy job to convert an ingestion-time partitioned table into a partitioned table.

`--destination_kms_key` is not demonstrated here. For more information, see Protecting data with Cloud KMS keys.
If the source or destination dataset is in a project other than your default project, add the project ID to the dataset names in the following format: `project_id:dataset`.

(Optional) Supply the `--location` flag and set the value to your location.

```
bq --location=location cp \
-a -f -n \
project_id:dataset.source_table$source_partition \
project_id:dataset.destination_table$destination_partition
```
Where:
- location is the name of your location. The `--location` flag is optional. For example, if you are using BigQuery in the Tokyo region, you can set the flag's value to `asia-northeast1`. You can set a default value for the location using the .bigqueryrc file.
- project_id is your project ID.
- dataset is the name of the source or destination dataset.
- source_table is the table you're copying.
- source_partition is the partition decorator of the source partition.
- destination_table is the name of the table in the destination dataset.
- destination_partition is the partition decorator of the destination partition.
Examples:
Copying a partition to a new table
Enter the following command to copy the January 30, 2018 partition from `mydataset.mytable` to a new table, `mydataset.mytable2`. `mydataset` is in your default project.

```
bq cp -a 'mydataset.mytable$20180130' mydataset.mytable2
```
Copying a partition to a non-partitioned table
Enter the following command to copy the January 30, 2018 partition from `mydataset.mytable` to a non-partitioned table, `mydataset2.mytable2`. The `-a` shortcut is used to append the partition's data to the non-partitioned destination table. Both datasets are in your default project.

```
bq cp -a 'mydataset.mytable$20180130' mydataset2.mytable2
```
Enter the following command to copy the January 30, 2018 partition from `mydataset.mytable` to a non-partitioned table, `mydataset2.mytable2`. The `-f` shortcut is used to overwrite the non-partitioned destination table without prompting.

```
bq --location=US cp -f 'mydataset.mytable$20180130' mydataset2.mytable2
```
Copying a partition to another partitioned table
Enter the following command to copy the January 30, 2018 partition from `mydataset.mytable` to another partitioned table, `mydataset2.mytable2`. The `-a` shortcut is used to append the partition's data to the destination table. Because no partition decorator is specified on the destination table, the source partition key is preserved and the data is copied to the January 30, 2018 partition in the destination table. You can also specify a partition decorator on the destination table to copy data to a specific partition. `mydataset` is in your default project. `mydataset2` is in `myotherproject`, not your default project.

```
bq --location=US cp \
-a \
'mydataset.mytable$20180130' \
myotherproject:mydataset2.mytable2
```
Enter the following command to copy the January 30, 2018 partition from `mydataset.mytable` to the February 20, 2018 partition of another partitioned table, `mydataset2.mytable2`. The `-f` shortcut is used to overwrite the February 20, 2018 partition in the destination table without prompting. If no partition decorator is used, all data in the destination table is overwritten. `mydataset` is in your default project. `mydataset2` is in `myotherproject`, not your default project.

```
bq cp \
-f \
'mydataset.mytable$20180130' \
'myotherproject:mydataset2.mytable2$20180220'
```
Enter the following command to copy the January 30, 2018 partition from `mydataset.mytable` to another partitioned table, `mydataset2.mytable2`. `mydataset` is in your default project. `mydataset2` is in `myotherproject`, not your default project. If there is data in the destination table, the default behavior is to prompt you to overwrite.

```
bq cp \
'mydataset.mytable$20180130' \
myotherproject:mydataset2.mytable2
```
API
Call the `jobs.insert` method, and configure a copy job. (Optional) Specify your region in the `location` property in the `jobReference` section of the job resource.
Specify the following properties in your job configuration:
- Enter the source dataset, table, and partition in the `sourceTables` property.
- Enter the destination dataset and table in the `destinationTable` property.
- Use the `writeDisposition` property to specify whether to append to or overwrite the destination table or partition.
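The configuration described above can be sketched as a job resource; all project, dataset, and table names here are illustrative, and the partition decorator travels inside `tableId`.

```python
import json

# Illustrative copy-job resource: append the 2018-01-30 source partition
# onto the matching partition of the destination table.
copy_job = {
    "jobReference": {"projectId": "myproject", "location": "US"},
    "configuration": {
        "copy": {
            "sourceTables": [{
                "projectId": "myproject",
                "datasetId": "mydataset",
                "tableId": "mytable$20180130",   # partition decorator in tableId
            }],
            "destinationTable": {
                "projectId": "myotherproject",
                "datasetId": "mydataset2",
                "tableId": "mytable2$20180130",
            },
            # WRITE_APPEND appends; WRITE_TRUNCATE overwrites the destination.
            "writeDisposition": "WRITE_APPEND",
        }
    },
}
print(json.dumps(copy_job, indent=2))
```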
Copying multiple partitions
To copy multiple partitions:
Console
Currently, copying partitions is not supported by the Cloud Console.
bq
Copying multiple partitions is the same process as copying a single partition, but you specify multiple source partitions as a comma-separated list:

```
bq cp \
'mydataset.mytable$20180130,mydataset.mytable$20180131' \
myotherproject:mydataset.mytable2
```
API
Call the `jobs.insert` method, and configure a copy job. Specify your region in the `location` property in the `jobReference` section of the job resource.
Specify the following properties in your job configuration:
- Enter multiple source partitions (including the dataset and table names) in the `sourceTables` property.
- Enter the destination dataset and table in the `destinationTable` property.
- Use the `writeDisposition` property to specify whether to append to or overwrite the destination table or partition.
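For the multi-partition case, only the `sourceTables` list changes: one entry per source partition. A sketch with illustrative names:

```python
# Each source partition is its own table reference, with the partition
# decorator carried in tableId.
source_tables = [
    {"projectId": "myproject", "datasetId": "mydataset", "tableId": "mytable$20180130"},
    {"projectId": "myproject", "datasetId": "mydataset", "tableId": "mytable$20180131"},
]

copy_configuration = {
    "copy": {
        "sourceTables": source_tables,
        "destinationTable": {
            "projectId": "myotherproject",
            "datasetId": "mydataset",
            "tableId": "mytable2",
        },
        "writeDisposition": "WRITE_APPEND",  # or WRITE_TRUNCATE to overwrite
    }
}
```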
Deleting a partitioned table
The process for deleting a time-partitioned table and all the partitions in it is the same as the process for deleting a standard table. For information on deleting a table, see Deleting tables.
Deleting partitions in partitioned tables
You can delete partitions in partitioned tables by using the `bq` command-line tool's `bq rm` command, or by calling the `tables.delete` API method.

You can use the partition decorator to delete a specific partition. For example, the partition for March 1, 2016 (`$20160301`) in a partitioned table named `mydataset.mytable` can be deleted by using the following command:

```
bq rm 'mydataset.mytable$20160301'
```
To retrieve a list of partitions in a partitioned table, see Listing partitions in ingestion-time partitioned tables or Listing partitions in partitioned tables.
Currently, you can only delete one partition at a time.
Required permissions
At a minimum, to delete a partition, you must be granted `bigquery.tables.delete` and `bigquery.tables.get` permissions. The following predefined IAM roles include `bigquery.tables.delete` and `bigquery.tables.get` permissions:

- `bigquery.dataOwner`
- `bigquery.dataEditor`
- `bigquery.admin`
In addition, if a user has `bigquery.datasets.create` permissions, when that user creates a dataset, they are granted `bigquery.dataOwner` access to it. `bigquery.dataOwner` access gives the user the ability to delete tables and partitions in the dataset.
For more information on IAM roles and permissions in BigQuery, see Predefined roles and permissions.
Deleting a partition in a partitioned table
You can delete a partition by specifying the partition's decorator, unless it is one of the two special partitions. Currently, you cannot delete the `__NULL__` or `__UNPARTITIONED__` partitions.
To delete a partition in a partitioned table:
Console
Deleting partitions is not supported by the Cloud Console.
bq
Use the `bq rm` command with the `--table` flag (or `-t` shortcut) and reference the partition decorator (`$date`) to delete a specific partition in a partitioned table. When you use the `bq` command-line tool to remove a partition, you must confirm the action. You can use the `--force` flag (or `-f` shortcut) to skip confirmation.

If the partitioned table is in a dataset in a project other than your default project, add the project ID to the dataset name in the following format: `project_id:dataset`.

```
bq rm -f -t project_id:dataset.table$date
```
Where:
- project_id is your project ID.
- dataset is the name of the dataset that contains the table.
- table is the name of the table.
- $date is the partition decorator of the partition you're deleting.
Examples:
Enter the following command to delete the partition for March 1, 2016 (`$20160301`) in a partitioned table named `mydataset.mytable`. `mydataset` is in your default project.

```
bq rm 'mydataset.mytable$20160301'
```
Enter the following command to delete the partition for January 1, 2017 (`$20170101`) in a partitioned table named `mydataset.mytable`. `mydataset` is in `myotherproject`, not your default project.

```
bq rm 'myotherproject:mydataset.mytable$20170101'
```
Enter the following command to delete the partition for January 18, 2018 (`$20180118`) in a partitioned table named `mydataset.mytable`. `mydataset` is in `myotherproject`, not your default project. The `-f` shortcut is used to skip confirmation.

```
bq rm -f 'myotherproject:mydataset.mytable$20180118'
```
API
Call the `tables.delete` method and specify the table and partition decorator by using the `tableId` parameter.
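When building the `tables.delete` request URL by hand, note that the `tableId` includes the decorator, and percent-encoding the `$` keeps the URL path unambiguous. A sketch with illustrative project and dataset names:

```python
from urllib.parse import quote

table_id = "mytable$20160301"  # partition decorator is part of tableId

# Percent-encode the tableId path segment; "$" becomes %24.
url = ("https://bigquery.googleapis.com/bigquery/v2/projects/myproject"
       f"/datasets/mydataset/tables/{quote(table_id, safe='')}")
print(url)  # send an HTTP DELETE to this URL to remove the partition
```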