Introduction to partitioned tables

This page provides an overview of partitioned table support in BigQuery.

A partitioned table is a special table that is divided into segments, called partitions, that make it easier to manage and query your data. By dividing a large table into smaller partitions, you can improve query performance, and you can control costs by reducing the number of bytes read by a query.

You can partition BigQuery tables by:

Ingestion-time partitioned tables

When you create a table partitioned by ingestion time, BigQuery automatically loads data into daily, date-based partitions that reflect the data's ingestion or arrival time. Pseudo column and suffix identifiers allow you to restate (replace) and redirect data to partitions for a specific day.

Ingestion-time partitioned tables include a pseudo column named _PARTITIONTIME that contains a date-based timestamp for data that is loaded into the table. Queries against time partitioned tables can restrict the data read by supplying _PARTITIONTIME filters that represent a partition's location. All the data in the specified partition is read by the query, but the _PARTITIONTIME predicate filter restricts the number of partitions scanned.

When you create ingestion-time partitioned tables, the partitions have the same schema definition as the table. If you need to load data into a partition with a schema that is not the same as the schema of the table, you must update the schema of the table before loading the data. Alternatively, you can use schema update options to update the schema of the table in a load job or query job.

Date, timestamp, or datetime partitioned tables

BigQuery also allows partitioned tables based on a specific DATE,TIMESTAMP, or DATETIME column. Data written to a date/timestamp/datetime partitioned table is automatically delivered to the appropriate partition based on the time-unit value (expressed in UTC for TIMESTAMP) specified in the partitioning column.

If the table is partitioned on a DATE column, you can create partitions with either daily, monthly, or yearly granularity. Each partition contains a range of values where the start of the range is the beginning of a day, month, or year, and the interval of the range is one day, month, or year based on the partitioning granularity. If the table is partitioned on a TIMESTAMP or DATETIME column, you can create partitions with any time-unit granularity type, including HOUR.

Date/timestamp/datetime partitioned tables do not need a _PARTITIONTIME pseudo column. Queries against date/timestamp/datetime partitioned tables can specify predicate filters based on the partitioning column to reduce the amount of data scanned.

When you create date/timestamp/datetime partitioned tables, two special partitions are created:

  • The __NULL__ partition: represents rows with NULL values in the partitioning column
  • The __UNPARTITIONED__ partition: represents data that exists outside the allowed range of dates

With the exception of the __NULL__ and __UNPARTITIONED__ partitions, all data in the partitioning column matches the date of the partition identifier. This allows a query to determine which partitions contain no data that satisfies the filter conditions. Queries that filter data on the partitioning column can restrict values and completely prune unnecessary partitions.

Date/timestamp/datetime partitioning versus sharding

As an alternative to date/timestamp/datetime partitioned tables, you can shard tables using a time-based naming approach such as [PREFIX]_YYYYMMDD. This is referred to as creating date-sharded tables. Using either standard SQL or legacy SQL, you can specify a query with a UNION operator to limit the tables scanned by the query.

Date/timestamp/datetime partitioned tables perform better than tables sharded by date. When you create date-named tables, BigQuery must maintain a copy of the schema and metadata for each date-named table. Also, when date-named tables are used, BigQuery might be required to verify permissions for each queried table. This practice also adds to query overhead and impacts query performance. The recommended best practice is to use date/timestamp/datetime partitioned tables instead of date-sharded tables.

Comparing partitioning options

The following table compares sharded tables and date/timestamp/datetime partitioned tables.

Capability Sharded tables Ingestion-time partitioned tables Partitioned tables
Partitioning method None: Sharding tables and querying them using a UNION operator can simulate partitioning. Partitioned based on the ingestion or arrival date of the data. Partition information can be referenced using a pseudo column. Partitioned based on data in a specified TIMESTAMP, DATE or DATETIME column.

For hourly partitioned tables, only TIMESTAMP and DATETIME columns are supported for partitioning.
Partition identifiers None You can use any valid date between 0001-01-01 and 9999-12-31, but DML statements cannot reference dates prior to 1970-01-01 or after 2159-12-31. A valid entry from the bound DATE, TIMESTAMP or DATETIME column. Currently, date values prior to 1960-01-01 and later than 2159-12-31 are placed in a shared UNPARTITIONED partition. NULL values reside in an explicit NULL partition.

Partitioning identifiers must follow the following formats:
  • yyyyMMddHH for hourly partitioning.
  • yyyyMMdd for daily partitioning.
  • yyyyMM for monthly partitioning.
  • yyyy for yearly partitioning.
Limiting scanned data Reference only the shards you need and limit the data by excluding unnecessary columns from the query. Use the _PARTITIONTIME pseudo column to prune partitions. Use predicate filters on the partitioning column.
Number of partitions The number of tables is unrestricted, but queries can only reference up to 1,000 tables. Up to 4,000 partitions. Up to 4,000 partitions.
Update operations You are limited to 1,000 updates per day. An individual operation can commit to a single partition. The latest partition (by default), or one specified using a partition decorator such as [TABLE]$[DATE]. An individual operation can commit data into up to 2,000 distinct partitions.
Streaming inserts You can stream to any shard table, but have to manually specify the shard. Data is initially placed in the UNPARTITIONED partition, and then extracted to the current date by default. Using partition suffixes, you can also stream directly to partitions within the last 31 days in the past and 16 days in the future, relative to the current date, based on current UTC time. You can stream data between 5 years in the past and 1 year in the future. Data outside of this range is rejected. Data within this range is initially placed in the UNPARTITIONED partition. When there's enough unpartitioned data, BigQuery automatically repartitions the data.
Time zone evaluation Defined by user semantics UTC UTC

Integer range partitioned tables

BigQuery allows partitioned tables based on a specific INTEGER column, with your choice of start, end, and interval values. Queries against integer range partitioned tables can specify predicate filters based on the partitioning column to reduce the amount of data scanned.

To create an integer range partitioned table, you provide:

  • the column used to create the integer range partitions
  • the start of range partitioning (inclusive)
  • the end of range partitioning (exclusive)
  • the interval of each range within the partition

Values that are outside of the range of the table go into the UNPARTITIONED partition.

For example, if you create an integer range partition with the following values:

Argument Value
column name customer_id
start 0
end 100
interval 10

The table will be partitioned on the customer_id column into ranges of interval 10. The values 0 to 9 will be in one partition, values 10 to 19 in another partition, ..., and finally values 90 to 99 will be in another partition. Values outside of 0 to 99 (such as -1 or 100) will be in the UNPARTITIONED partition. Null values will be in the NULL partition.

When you create integer range partitioned tables, two special partitions are created:

  • The __NULL__ partition: represents rows with NULL values in the partitioning column
  • The __UNPARTITIONED__ partition: represents data that exists outside the allowed range of the integer start and interval

With the exception of the __NULL__ and __UNPARTITIONED__ partitions, all data in the partitioning column is within the range of the integer start and interval. This allows a query to determine which partitions contain no data that satisfies the filter conditions. Queries that filter data on the partitioning column can restrict values and completely prune unnecessary partitions.

The limit on the number of possible ranges between the start and end values is 10,000. However, the number of ranges with data is limited to 4,000 partitions per table, since each range is a partition.

Integer range partitioning versus clustering

Both integer range partitioning and clustering can improve performance and reduce query cost. They have key differences and use cases.

Use integer range partitioning if you want to:

  • Explicitly define the ranges used to partition the table. You specify how the data will be partitioned, and what data is in each partition.

  • Know query cost before a query runs. Partition pruning is done before the query runs, so you can get the query cost after partitioning pruning through a dry run. Cluster pruning is done when the query runs, so the cost is known only after the query finishes.

  • Address a partition, such as when you want to load data to a specific partition, or wipe out data for a specific partition.

Use clustering if you:

  • Do not care how the data will be clustered, as long as you get the potential performance improvement and cost reduction. BigQuery will automatically figure out how the data should be clustered for optimal performance and cost.

  • Need more than 4,000 partitions. BigQuery has a limit of 4,000 partitions for a partitioned table. There is no limit for the number of clusters in a table.

Note that you can partition and cluster on the same integer column, to get the benefits of both. Data will first partition according to the specified integer ranges. Within each range, if the volume of data is large enough, the data will also be clustered. When the table is queried, partitioning sets an upper bound of the query cost based on partition pruning. There might be other query cost savings when the query actually runs.

Using require_partitioning_filter

With the release of integer range partitioning, BigQuery now supports multiple types of partitioning types:

  • Ingestion time
  • Date/timestamp/datetime
  • Integer range

To simplify the BigQuery API, we moved the require_partitioning_filter parameter out of the partitioning type level, and into the table level. For backwards compatibility of date/timestamp/datetime partitioning, the require_partitioning_filter is still supported at the partition level. It can also be specified at the table level. For integer range partitioning, you can specify require_partitioning_filter only at the table level. The bq command-line tool already uses the table level option, so there is no change to how you use the bq command. If you use the BigQuery API, you need to use the require_partitioning_filter option at the table level.

Partitioned table quotas and limits

Partitioned tables have defined limits in BigQuery.

Quotas and limits also apply to the different types of jobs you can run against partitioned tables, including:

For more information on all quotas and limits, see Quotas and limits.

Partitioned table pricing

When you create and use partitioned tables in BigQuery, your charges are based on how much data is stored in the partitions and on the queries you run against the data:

Many partitioned table operations are free, including loading data into partitions, copying partitions, and exporting data from partitions. Though free, these operations are subject to BigQuery's Quotas and limits. For information on all free operations, see Free operations on the pricing page.

Next steps