A partitioned table is a special table that is divided into segments, called partitions, that make it easier to manage and query your data. By dividing a large table into smaller partitions, you can improve query performance and reduce the number of bytes that are billed by restricting the amount of data that is scanned. BigQuery offers date-partitioned tables, which means that the table is divided into a separate partition for each date.
Schema of partitions
A partition does not have a schema by itself. All partitions in a partitioned table share the schema of the table. If you need to load data into a partition with a schema that is not the same as the schema of the table, you need to update the schema of the table before loading the data. Alternatively, you can use schema update options to update the schema of the table as part of a load or query job. See configuration.load.schemaUpdateOptions.
The _PARTITIONTIME pseudo column
Partitioned tables include a pseudo column named
_PARTITIONTIME that contains
a date-based timestamp for data that is loaded into the table. The timestamp is
based on UTC time and represents the number of microseconds since the unix epoch.
For example, if data is appended to a table on April 15, 2016, all of the rows
of data that are appended on that day contain the value
TIMESTAMP("2016-04-15") in the
_PARTITIONTIME column name is reserved, which means that you cannot
create a column by that name in any of your tables.
To select the value of
_PARTITIONTIME, you must use an alias. For example,
the following query selects
_PARTITIONTIME by assigning the alias
the pseudo column:
SELECT _PARTITIONTIME AS pt, field1 FROM mydataset.table1
Data in the streaming buffer has
a NULL value for the
The __UNPARTITIONED__ partition
__UNPARTITIONED__ partition temporarily holds data that is streamed to a
partitioned table while it is in the streaming buffer. Data that is streamed
directly to a specific partition of a partitioned table does not use the
__UNPARTITIONED__ partition. Instead, the data is streamed directly to the
partition. See streaming into partitioned tables.
To query data in the
__UNPARTITIONED__ partition, use the
pseudo column with the NULL value.
SELECT field1 FROM [DATASET_ID].[PARTITIONED_TABLE] WHERE _PARTITIONTIME IS NULL
[DATASET_ID] is the name of the dataset that stores the table and
[PARTITIONED_TABLE] is the name of the partitioned table.
Addressing table partitions
With legacy SQL, you can use partition decorators to reference data from a specific partition. Partition decorators take the form:
where [TABLE_NAME] is the name of a partitioned table, and
a date. For example, the partition for May 19, 2016 in a table named
mydataset.table can be referenced using:
The partition decorator separator ($) is a special variable in the unix shell. You might have to escape the decorator when you use the command-line tool. For example, the following commands escape the partition decorator:
bq query 'SELECT * from mydataset.table$20160519'
bq query "SELECT * from mydataset.table\$20160519"
BigQuery treats a table name with a partition decorator as a table, which means that a table name with a partition decorator can be used as a table in any api that receives a table name.
Partition expiration is based on the elapsed time since the partition date. Partitions are deleted a set amount of time after the partition date, and the expiration happens regardless of when the partition was updated.
For example, if the partition expiration is set to 60 days and data is written to partition 20170101 on January 30, 2017, then the data expires on January 1, 2017 + 60 days.
For projects that have partitioned tables that were created before December 13, 2016, the partition expiration is based on the last date that the partition was modified. This behavior also applies to new tables created in these projects. To migrate your project to the new behavior, open a request in the BigQuery issue tracker.
For information on setting the expiration date, see Setting the expiration of a partition.
Partitioned table limits
Each partitioned table can have up to 2,500 partitions.
Daily limit: 2,000 partition updates per table, per day.
Rate limit: 50 partition updates every 10 seconds.
Long-term storage pricing
Each partition of a partitioned table is considered separately for long-term storage pricing. If a partition hasn't been modified in the last 90 days, the data in that partition is considered long term storage and is charged at the discounted price.
For more information on partitioned table pricing, see long term storage pricing.