This page provides an overview of partitioned table support in BigQuery.
A partitioned table is a special table that is divided into segments, called partitions, that make it easier to manage and query your data. By dividing a large table into smaller partitions, you can improve query performance, and you can control costs by reducing the number of bytes read by a query.
You can partition BigQuery tables by:
Ingestion time: Tables are partitioned based on the data's ingestion (load) time or arrival time.
Date/timestamp/datetime: Tables are partitioned based on a
TIMESTAMP
,DATE
, orDATETIME
column.Integer range: Tables are partitioned based on an integer column.
Ingestion-time partitioned tables
When you create a table partitioned by ingestion time, BigQuery automatically loads data into daily, date-based partitions that reflect the data's ingestion or arrival time. Pseudo column and suffix identifiers allow you to restate (replace) and redirect data to partitions for a specific day.
Ingestion-time partitioned tables include a pseudo column named _PARTITIONTIME
that contains a date-based timestamp for data that is loaded into the table.
Queries against time partitioned tables can restrict the data read by supplying
_PARTITIONTIME
filters that represent a partition's location. All the data in
the specified partition is read by the query, but the _PARTITIONTIME
predicate
filter restricts the number of partitions scanned.
When you create ingestion-time partitioned tables, the partitions have the same schema definition as the table. If you need to load data into a partition with a schema that is not the same as the schema of the table, you must update the schema of the table before loading the data. Alternatively, you can use schema update options to update the schema of the table in a load job or query job.
Date, timestamp, or datetime partitioned tables
BigQuery also allows partitioned tables based on a specific
DATE
,TIMESTAMP
, or DATETIME
column. Data written to a
date/timestamp/datetime partitioned table is automatically delivered to the
appropriate partition based on the time-unit value (expressed in UTC for
TIMESTAMP
) specified in the partitioning column.
If the table is partitioned on a DATE
column, you can create partitions with
either daily, monthly, or yearly granularity. Each partition contains a range
of values where the start of the range is the beginning of a day, month, or
year, and the interval of the range is one day, month, or year based on the
partitioning granularity. If the table is partitioned on a TIMESTAMP
or
DATETIME
column, you can create partitions with any time-unit granularity
type, including HOUR
.
Date/timestamp/datetime partitioned tables do not need a _PARTITIONTIME
pseudo
column. Queries against date/timestamp/datetime partitioned tables can specify
predicate filters based on the partitioning column to reduce the amount of data
scanned.
When you create date/timestamp/datetime partitioned tables, two special partitions are created:
- The
__NULL__
partition: represents rows withNULL
values in the partitioning column - The
__UNPARTITIONED__
partition: represents data that exists outside the allowed range of dates
With the exception of the __NULL__
and __UNPARTITIONED__
partitions, all
data in the partitioning column matches the date of the partition identifier.
This allows a query to determine which partitions contain no data that satisfies
the filter conditions. Queries that filter data on the partitioning column can
restrict values and completely prune unnecessary partitions.
Date/timestamp/datetime partitioning versus sharding
As an alternative to date/timestamp/datetime partitioned tables, you can shard
tables using a time-based naming approach such as [PREFIX]_YYYYMMDD
. This is
referred to as creating date-sharded tables. Using either standard SQL or legacy
SQL, you can specify a query with a UNION
operator to limit the tables scanned
by the query.
Date/timestamp/datetime partitioned tables perform better than tables sharded by date. When you create date-named tables, BigQuery must maintain a copy of the schema and metadata for each date-named table. Also, when date-named tables are used, BigQuery might be required to verify permissions for each queried table. This practice also adds to query overhead and impacts query performance. The recommended best practice is to use date/timestamp/datetime partitioned tables instead of date-sharded tables.
Comparing partitioning options
The following table compares sharded tables and date/timestamp/datetime partitioned tables.
Capability | Sharded tables | Ingestion-time partitioned tables | Partitioned tables |
---|---|---|---|
Partitioning method | None: Sharding tables and querying them using a
UNION
operator can simulate partitioning. |
Partitioned based on the ingestion or arrival date of the data. Partition information can be referenced using a pseudo column. | Partitioned based on data in a specified
TIMESTAMP ,
DATE
or DATETIME
column.For hourly partitioned tables, only TIMESTAMP
and DATETIME columns are supported for partitioning. |
Partition identifiers | None | You can use any valid date between 0001-01-01 and 9999-12-31, but
DML
statements cannot reference dates prior to 1970-01-01 or after 2159-12-31. |
A valid entry from the bound DATE , TIMESTAMP
or DATETIME column. Currently, date values prior to
1960-01-01 and later than 2159-12-31 are placed in a shared
UNPARTITIONED partition.
NULL values reside in an explicit NULL
partition.Partitioning identifiers must follow the following formats:
|
Limiting scanned data | Reference only the shards you need and limit the data by excluding unnecessary columns from the query. | Use the _PARTITIONTIME pseudo column to prune partitions. |
Use predicate filters on the partitioning column. |
Number of partitions | The number of tables is unrestricted, but queries can only reference up to 1,000 tables. | Up to 4,000 partitions. | Up to 4,000 partitions. |
Update operations | You are limited to 1,000 updates per day. | An individual operation can commit to a single partition. The latest
partition (by default), or one specified using a partition decorator such
as [TABLE]$[DATE] . |
An individual operation can commit data into up to 2,000 distinct partitions. |
Streaming inserts | You can stream to any shard table, but have to manually specify the shard. | Data is initially placed in the UNPARTITIONED partition,
and then extracted to the current date by default. Using partition
suffixes, you can also stream directly to partitions within the last
31 days in the past and 16 days in the future, relative to the current
date, based on current UTC time. |
You can stream data between 5 years in the past and 1 year in the
future. Data outside of this range is rejected. Data within this range is
initially placed in the UNPARTITIONED partition. When there's
enough unpartitioned data, BigQuery automatically
repartitions the data.
|
Time zone evaluation | Defined by user semantics | UTC | UTC |
Integer range partitioned tables
BigQuery allows partitioned tables based on a specific INTEGER
column, with your choice of start, end, and interval values. Queries against
integer range partitioned tables can specify predicate filters based on the
partitioning column to reduce the amount of data scanned.
To create an integer range partitioned table, you provide:
- the column used to create the integer range partitions
- the start of range partitioning (inclusive)
- the end of range partitioning (exclusive)
- the interval of each range within the partition
Values that are outside of the range of the table go into the UNPARTITIONED partition.
For example, if you create an integer range partition with the following values:
Argument | Value |
---|---|
column name | customer_id |
start | 0 |
end | 100 |
interval | 10 |
The table will be partitioned on the customer_id column into ranges of interval 10. The values 0 to 9 will be in one partition, values 10 to 19 in another partition, ..., and finally values 90 to 99 will be in another partition. Values outside of 0 to 99 (such as -1 or 100) will be in the UNPARTITIONED partition. Null values will be in the NULL partition.
When you create integer range partitioned tables, two special partitions are created:
- The
__NULL__
partition: represents rows withNULL
values in the partitioning column - The
__UNPARTITIONED__
partition: represents data that exists outside the allowed range of the integer start and interval
With the exception of the __NULL__
and __UNPARTITIONED__
partitions, all
data in the partitioning column is within the range of the integer start and
interval. This allows a query to determine which partitions contain no data that
satisfies the filter conditions. Queries that filter data on the partitioning
column can restrict values and completely prune unnecessary partitions.
The limit on the number of possible ranges between the start and end values is 10,000. However, the number of ranges with data is limited to 4,000 partitions per table, since each range is a partition.
Integer range partitioning versus clustering
Both integer range partitioning and clustering can improve performance and reduce query cost. They have key differences and use cases.
Use integer range partitioning if you want to:
Explicitly define the ranges used to partition the table. You specify how the data will be partitioned, and what data is in each partition.
Know query cost before a query runs. Partition pruning is done before the query runs, so you can get the query cost after partitioning pruning through a dry run. Cluster pruning is done when the query runs, so the cost is known only after the query finishes.
Address a partition, such as when you want to load data to a specific partition, or wipe out data for a specific partition.
Use clustering if you:
Do not care how the data will be clustered, as long as you get the potential performance improvement and cost reduction. BigQuery will automatically figure out how the data should be clustered for optimal performance and cost.
Need more than 4,000 partitions. BigQuery has a limit of 4,000 partitions for a partitioned table. There is no limit for the number of clusters in a table.
Note that you can partition and cluster on the same integer column, to get the benefits of both. Data will first partition according to the specified integer ranges. Within each range, if the volume of data is large enough, the data will also be clustered. When the table is queried, partitioning sets an upper bound of the query cost based on partition pruning. There might be other query cost savings when the query actually runs.
Using require_partitioning_filter
With the release of integer range partitioning, BigQuery now supports multiple types of partitioning types:
- Ingestion time
- Date/timestamp/datetime
- Integer range
To simplify the BigQuery API, we moved the require_partitioning_filter
parameter out of the partitioning type level, and into the table level. For
backwards compatibility of date/timestamp/datetime partitioning, the
require_partitioning_filter
is still supported at the partition level. It can
also be specified at the table level. For integer range partitioning, you can
specify require_partitioning_filter
only at the table level. The bq
command-line tool
already uses the table level option, so there is no change to
how you use the bq
command. If you use the BigQuery API, you need to use the
require_partitioning_filter
option at the table level.
Partitioned table quotas and limits
Partitioned tables have defined limits in BigQuery.
Quotas and limits also apply to the different types of jobs you can run against partitioned tables, including:
- Loading data (load jobs)
- Exporting data (export jobs)
- Querying data (query jobs)
- Copying tables (copy jobs)
For more information on all quotas and limits, see Quotas and limits.
Partitioned table pricing
When you create and use partitioned tables in BigQuery, your charges are based on how much data is stored in the partitions and on the queries you run against the data:
- For information on storage pricing, see Storage pricing.
- For information on query pricing, see Query pricing.
Many partitioned table operations are free, including loading data into partitions, copying partitions, and exporting data from partitions. Though free, these operations are subject to BigQuery's Quotas and limits. For information on all free operations, see Free operations on the pricing page.
Next steps
- To learn how to create and use ingestion-time partitioned tables, see Creating and using ingestion-time partitioned tables.
- To learn how to create and use column-based time partitioned tables, see Creating and using date/timestamp/datetime partitioned tables.
- To learn how to create and use integer range partitioned tables, see Creating and using integer-range partitioned tables.
- To learn how to manage and update partitioned tables, see Managing partitioned tables.
- For information on querying partitioned tables, see Querying partitioned tables.