Introduction to partitioned tables

This page provides an overview of partitioned tables in BigQuery.

A partitioned table is a special table that is divided into segments, called partitions, that make it easier to manage and query your data. By dividing a large table into smaller partitions, you can improve query performance, and you can control costs by reducing the number of bytes read by a query.

You can partition BigQuery tables by:

  • Time-unit column: Tables are partitioned based on a TIMESTAMP, DATE, or DATETIME column in the table.

  • Ingestion time: Tables are partitioned based on the timestamp when BigQuery ingests the data.

  • Integer range: Tables are partitioned based on an integer column.

If a query filters on the value of the partitioning column, BigQuery can scan the partitions that match the filter and skip the remaining partitions. This process is called pruning.

Time-unit column partitioning

You can partition a table on a DATE,TIMESTAMP, or DATETIME column in the table. When you write data to the table, BigQuery automatically puts the data into the correct partition, based on the values in the column.

For TIMESTAMP and DATETIME columns, the partitions can have either hourly, daily, monthly, or yearly granularity. For DATE columns, the partitions can have daily, monthly, or yearly granularity. Partitions boundaries are based on UTC time.

For example, suppose that you partition a table on a DATETIME column with monthly partitioning. If you insert the following values into the table, the rows are written to the following partitions:

Column value Partition (monthly)
DATETIME("2019-01-01") 201901
DATETIME("2019-01-15") 201901
DATETIME("2019-04-30") 201904

In addition, two special partitions are created:

  • __NULL__: Contains rows with NULL values in the partitioning column.
  • __UNPARTITIONED__: Contains rows where the value of the partitioning column is earlier than 1960-01-01 or later than 2159-12-31.

Ingestion time partitioning

When you create a table partitioned by ingestion time, BigQuery automatically assigns rows to partitions based on the time when BigQuery ingests the data. You can choose hourly, daily, monthly, or yearly granularity for the partitions. Partitions boundaries are based on UTC time.

An ingestion-time partitioned table has a pseudocolumn named _PARTITIONTIME. The value of this column is the ingestion time for each row, truncated to the partition boundary (such as hourly or daily). For example, suppose that you create an ingestion-time partitioned table with hourly partitioning and send data at the following times:

Ingestion time _PARTITIONTIME Partition (hourly)
2021-05-07 17:22:00 2021-05-07 17:00:00 2021050717
2021-05-07 17:40:00 2021-05-07 17:00:00 2021050717
2021-05-07 18:31:00 2021-05-07 18:00:00 2021050718

Because the table in this example uses hourly partitioning, the value of _PARTITIONTIME is truncated to an hour boundary. BigQuery uses this value to determine the correct partition for the data.

You can also write data to a specific partition. For example, you might want to load historical data or adjust for time zones. You can use any valid date between 0001-01-01 and 9999-12-31. However, DML statements cannot reference dates prior to 1970-01-01 or after 2159-12-31. For more information, see Write data to a specific partition.

Instead of using _PARTITIONTIME, you can also use _PARTITIONDATE. The _PARTITIONDATE pseudocolumn contains the UTC date corresponding to the value in the _PARTITIONTIME pseudocolumn.

Integer range partitioning

You can partition a table based on ranges of values in a specific INTEGER column. To create an integer-range partitioned table, you provide:

  • The partitioning column.
  • The starting value for range partitioning (inclusive).
  • The ending value for range partitioning (exclusive).
  • The interval of each range within the partition.

For example, suppose you create an integer range partition with the following specification:

Argument Value
column name customer_id
start 0
end 100
interval 10

The table is partitioned on the customer_id column into ranges of interval 10. The values 0 to 9 go into one partition, values 10 to 19 go into the next partition, etc., up to 99. Values outside this range go into a partition named __UNPARTITIONED__. Any rows where customer_id is NULL go into a partition named __NULL__.

Choosing daily, hourly, monthly, or yearly partitioning.

When you partition a table by time-unit column or ingestion time, you choose whether the partitions have daily, hourly, monthly, or yearly granularity.

  • Daily partitioning is the default partitioning type. Daily partitioning is a good choice when your data is spread out over a wide range of dates, or if data is continuously added over time.

  • Choose hourly partitioning if your tables have a high volume of data that spans a short date range — typically less than six months of timestamp values. If you choose hourly partitioning, make sure the partition count stays within the partition limits.

  • Choose monthly or yearly partitioning if your tables have a relatively small amount of data for each day, but span a wide date range. This option is also recommended if your workflow requires frequently updating or adding rows that span a wide date range (for example, more than 500 dates). In these scenarios, use monthly or yearly partitioning along with clustering on the partitioning column to achieve the best performance. For more information, see Partitioning versus clustering on this page.

Partitioning versus clustering

Both partitioning and clustering can improve performance and reduce query cost.

Use clustering under the following circumstances:

  • You don't need strict cost guarantees before running the query.
  • You need more granularity than partitioning alone allows. To get clustering benefits in addition to partitioning benefits, you can use the same column for both partitioning and clustering.
  • Your queries commonly use filters or aggregation against multiple particular columns.
  • The cardinality of the number of values in a column or group of columns is large.

Use partitioning under the following circumstances:

  • You want to know query costs before a query runs. Partition pruning is done before the query runs, so you can get the query cost after partitioning pruning through a dry run. Cluster pruning is done when the query runs, so the cost is known only after the query finishes.

  • You need partition-level management. For example, you want to set a partition expiration time, load data to a specific partition, or delete partitions.

  • You want to specify how the data is partitioned and what data is in each partition. For example, you want to define time granularity or define the ranges used to partition the table for integer range partitioning.

Prefer clustering over partitioning under the following circumstances:

  • Partitioning results in a small amount of data per partition (approximately less than 1 GB).
  • Partitioning results in a large number of partitions beyond the limits on partitioned tables.
  • Partitioning results in your mutation operations modifying most partitions in the table frequently (for example, every few minutes).

You can also combine partitioning with clustering. Data is first partitioned and then data in each partition is clustered by the clustering columns.

When the table is queried, partitioning sets an upper bound of the query cost based on partition pruning. There might be other query cost savings when the query actually runs, because of cluster pruning.

Partitioning versus sharding

Table sharding is the practice of storing data in multiple tables, using a naming prefix such as [PREFIX]_YYYYMMDD.

Partitioning is recommended over table sharding, because partitioned tables perform better. With sharded tables, BigQuery must maintain a copy of the schema and metadata for each table. BigQuery might also need to verify permissions for each queried table. This practice also adds to query overhead and affects query performance.

If you previously created date-sharded tables, you can convert them into an ingestion-time partitioned table. For more information, see Convert date-sharded tables into ingestion-time partitioned tables.

Limitations

You cannot use legacy SQL to query partitioned tables or to write query results to partitioned tables.

Time-unit column-partitioned tables are subject to the following limitations:

  • The partitioning column must be either a scalar DATE, TIMESTAMP, or DATETIME column. While the mode of the column can be REQUIRED or NULLABLE, it cannot be REPEATED (array-based).
  • The partitioning column must be a top-level field. You cannot use a leaf field from a RECORD (STRUCT) as the partitioning column.

Integer-range partitioned tables are subject to the following limitations:

  • The partitioning column must be an INTEGER column. While the mode of the column may be REQUIRED or NULLABLE, it cannot beREPEATED (array-based).
  • The partitioning column must be a top-level field. You cannot use a leaf field from a RECORD (STRUCT) as the partitioning column.

Partitioned table quotas and limits

Partitioned tables have defined limits in BigQuery.

Quotas and limits also apply to the different types of jobs you can run against partitioned tables, including:

For more information on all quotas and limits, see Quotas and limits.

Partitioned table pricing

When you create and use partitioned tables in BigQuery, your charges are based on how much data is stored in the partitions and on the queries you run against the data:

Many partitioned table operations are free, including loading data into partitions, copying partitions, and exporting data from partitions. Though free, these operations are subject to BigQuery's Quotas and limits. For information on all free operations, see Free operations on the pricing page.

For best practices for controlling costs in BigQuery, see Controlling costs in BigQuery

Partitioned table security

Access control for partitioned tables is the same as access control for standard tables. For more information, see Introduction to table access controls.

What's next