Introduction to partitioned tables

This page provides an overview of partitioned table support in BigQuery.

A partitioned table is a special table that is divided into segments, called partitions, that make it easier to manage and query your data. By dividing a large table into smaller partitions, you can improve query performance, and you can control costs by reducing the number of bytes read by a query.

You can partition BigQuery tables by:

Ingestion-time partitioned tables

When you create a table partitioned by ingestion time, BigQuery automatically loads data into daily, date-based partitions that reflect the data's ingestion or arrival date. Pseudo column and suffix identifiers allow you to restate (replace) and redirect data to partitions for a specific day.

Ingestion-time partitioned tables include a pseudo column named _PARTITIONTIME that contains a date-based timestamp for data that is loaded into the table. Queries against time-partitioned tables can restrict the data read by supplying _PARTITIONTIME filters that represent a partition's location. All the data in the specified partition is read by the query, but the _PARTITIONTIME predicate filter restricts the number of partitions scanned.

When you create ingestion-time partitioned tables, the partitions have the same schema definition as the table. If you need to load data into a partition with a schema that is not the same as the schema of the table, you must update the schema of the table before loading the data. Alternatively, you can use schema update options to update the schema of the table in a load job or query job.

Date or timestamp partitioned tables

BigQuery also allows partitioned tables based on a specific DATE or TIMESTAMP column. Data written to a date/timestamp partitioned table is automatically delivered to the appropriate partition based on the date value (expressed in UTC) in the partitioning column.

Each partition in a date/timestamp partitioned table can be thought of as a range where the start of the range is the beginning of a day, and the interval of the range is one day.

Date/timestamp partitioned tables do not need a _PARTITIONTIME pseudo column. Queries against date/timestamp partitioned tables can specify predicate filters based on the partitioning column to reduce the amount of data scanned.

When you create date/timestamp partitioned tables, two special partitions are created:

  • The __NULL__ partition: represents rows with NULL values in the partitioning column
  • The __UNPARTITIONED__ partition: represents data that exists outside the allowed range of dates

With the exception of the __NULL__ and __UNPARTITIONED__ partitions, all data in the partitioning column matches the date of the partition identifier. This allows a query to determine which partitions contain no data that satisfies the filter conditions. Queries that filter data on the partitioning column can restrict values and completely prune unnecessary partitions.

Date/timestamp partitioning versus sharding

As an alternative to date/timestamp partitioned tables, you can shard tables using a time-based naming approach such as [PREFIX]_YYYYMMDD. This is referred to as creating date-sharded tables. Using either standard SQL or legacy SQL, you can specify a query with a UNION operator to limit the tables scanned by the query.

Date/timestamp partitioned tables perform better than tables sharded by date. When you create date-named tables, BigQuery must maintain a copy of the schema and metadata for each date-named table. Also, when date-named tables are used, BigQuery might be required to verify permissions for each queried table. This practice also adds to query overhead and impacts query performance. The recommended best practice is to use date/timestamp partitioned tables instead of date-sharded tables.

Comparing partitioning options

The following table compares sharded tables and date/timestamp partitioned tables.

Capability Sharded tables Ingestion-time partitioned tables Partitioned tables
Partitioning method None: Sharding tables and querying them using a UNION operator can simulate partitioning. Partitioned based on the ingestion or arrival date of the data. Partition information can be referenced using a pseudo column. Partitioned based on data in a specified TIMESTAMP or DATE column.
Partition identifiers None You can use any valid date between 0001-01-01 and 9999-12-31, but DML statements cannot reference dates prior to 1970-01-01 or after 2159-12-31. A valid entry from the bound DATE or TIMESTAMP column. Currently, date values prior to 1960-01-01 and later than 2159-12-31 are placed in a shared UNPARTITIONED partition. NULL values reside in an explicit NULL partition.
Limiting scanned data Reference only the shards you need and limit the data by excluding unnecessary columns from the query. Use the _PARTITIONTIME pseudo column to prune partitions. Use predicate filters on the partitioning column.
Number of partitions The number of tables is unrestricted, but queries can only reference up to 1,000 tables. Up to 4,000 partitions. Up to 4,000 partitions.
Update operations You are limited to 1,000 updates per day. An individual operation can commit to a single partition. The latest partition (by default), or one specified using a partition decorator such as [TABLE]$[DATE]. An individual operation can commit data into up to 2,000 distinct partitions.
Streaming inserts One global buffer for the table. Using partition suffixes, you can stream to partitions within the last 31 days in the past and 16 days in the future relative to the current date, based on current UTC time. You can stream data between 1 year in the past and 6 months in the future. Data outside of this range is rejected. When the data is streamed, data between 7 days in the past and 3 days in the future is placed in the streaming buffer, and then it is extracted to the corresponding partitions. Data outside of this window (but inside the 1 year, 6 month range) is placed in the UNPARTITIONED partition. When there's enough unpartitioned data, it is loaded to the corresponding partitions.
Time zone evaluation Defined by user semantics UTC UTC

Integer range partitioned tables

BigQuery allows partitioned tables based on a specific INTEGER column, with your choice of start, end, and interval values. Queries against integer range partitioned tables can specify predicate filters based on the partitioning column to reduce the amount of data scanned.

To create an integer range partitioned table, you provide:

  • the column used to create the integer range partitions
  • the start of range partitioning (inclusive)
  • the end of range partitioning (exclusive)
  • the interval of each range within the partition

Values that are outside of the range of the table go into the UNPARTITIONED partition.

For example, if you create an integer range partition with the following values:

Argument Value
column name customer_id
start 0
end 100
interval 10

The table will be partitioned on the customer_id column into ranges of interval 10. The values 0 to 9 will be in one partition, values 10 to 19 in another partition, ..., and finally values 90 to 99 will be in another partition. Values outside of 0 to 99 (such as -1 or 100) will be in the UNPARTITIONED partition. Null values will be in the NULL partition.

When you create integer range partitioned tables, two special partitions are created:

  • The __NULL__ partition: represents rows with NULL values in the partitioning column
  • The __UNPARTITIONED__ partition: represents data that exists outside the allowed range of the integer start and interval

With the exception of the __NULL__ and __UNPARTITIONED__ partitions, all data in the partitioning column is within the range of the integer start and interval. This allows a query to determine which partitions contain no data that satisfies the filter conditions. Queries that filter data on the partitioning column can restrict values and completely prune unnecessary partitions.

The limit on the number of possible ranges between the start and end values is 100,000. However, the number of ranges with data is limited to 4,000 partitions per table, since each range is a partition.

Integer range partitioning versus clustering

Both integer range partitioning and clustering can improve performance and reduce query cost. They have key differences and use cases.

Use integer range partitioning if you want to:

  • Explicitly define the ranges used to partition the table. You specify how the data will be partitioned, and what data is in each partition.

  • Know query cost before a query runs. Partition pruning is done before the query runs, so you can get the query cost after partitioning pruning through a dry run. Cluster pruning is done when the query runs, so the cost is known only after the query finishes.

  • Address a partition, such as when you want to load data to a specific partition, or wipe out data for a specific partition.

Use clustering if you:

  • Do not care how the data will be clustered, as long as you get the potential performance improvement and cost reduction. BigQuery will automatically figure out how the data should be clustered for optimal performance and cost.

  • Need more than 100,000 partitions. BigQuery has a limit of 100,000 partitions for a partitioned table. There is no limit for the number of clusters in a table.

Note that you can partition and cluster on the same integer column, to get the benefits of both. Data will first partition according to the specified integer ranges. Within each range, if the volume of data is large enough, the data will also be clustered. When the table is queried, partitioning sets an upper bound of the query cost based on partition pruning. There might be other query cost savings when the query actually runs.

Using require_partitioning_filter

With the release of integer range partitioning, BigQuery now supports multiple types of partitioning types:

  • Ingestion time
  • Date/timestamp
  • Integer range

To simplify the BigQuery API, we moved the require_partitioning_filter parameter out of the partitioning type level, and into the table level. For backwards compatibility of date/timestamp partitioning, the require_partitioning_filter is still supported at the partition level. It can also be specified at the table level. For integer range partitioning, you can specify require_partitioning_filter only at the table level. The bq command-line tool already uses the table level option, so there is no change to how you use the bq command. If you use the BigQuery API, you need to use the require_partitioning_filter option at the table level.

Partitioned table quotas and limits

Partitioned tables have defined limits in BigQuery.

Quotas and limits also apply to the different types of jobs you can run against partitioned tables, including:

For more information on all quotas and limits, see Quotas and limits.

Partitioned table pricing

When you create and use partitioned tables in BigQuery, your charges are based on how much data is stored in the partitions and on the queries you run against the data:

Many partitioned table operations are free, including loading data into partitions, copying partitions, and exporting data from partitions. Though free, these operations are subject to BigQuery's Quotas and limits. For information on all free operations, see Free operations on the pricing page.

Next steps

Was this page helpful? Let us know how we did:

Send feedback about...

Need help? Visit our support page.