This document describes how to create and use tables partitioned by an integer column. For information on other types of partitioned tables, see Creating and using date/timestamp partitioned tables or Creating and using ingestion-time partitioned tables.
For additional information about integer range partitions, see Integer range partitioned tables.
After creating an integer range partitioned table, you can:
- Control access to your table data
- Get information about your partitioned tables
- List the partitioned tables in a dataset
- Get partitioned table metadata using meta-tables
For more information on managing partitioned tables including updating partitioned table properties, copying a partitioned table, and deleting a partitioned table, see Managing partitioned tables.
Limitations
Integer range partitioned tables are subject to the following limitations:
- The partitioning column must be an INTEGER column. While the mode of the column may be REQUIRED or NULLABLE, it cannot be REPEATED (array-based).
- The partitioning column must be a top-level field. You cannot use a leaf field from a RECORD (STRUCT) as the partitioning column.
- You cannot use legacy SQL to query partitioned tables or to write query results to partitioned tables.
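These checks can be expressed mechanically. The following is a minimal sketch that validates a candidate partitioning column against the limitations above, assuming a schema represented as a list of JSON-schema-style field dictionaries; `validate_partitioning_column` is a hypothetical helper, not part of any BigQuery SDK:

```python
def validate_partitioning_column(schema, column_name):
    """Check a candidate partitioning column against the documented limits:
    it must exist as a top-level field, be INTEGER, and not be REPEATED."""
    field = next((f for f in schema if f["name"] == column_name), None)
    if field is None:
        return False, "must be a top-level field (a leaf of a RECORD/STRUCT is not allowed)"
    if field["type"].upper() not in ("INTEGER", "INT64"):
        return False, "must be an INTEGER column"
    if field.get("mode", "NULLABLE").upper() == "REPEATED":
        return False, "mode may be REQUIRED or NULLABLE, but not REPEATED"
    return True, "ok"

schema = [
    {"name": "customer_id", "type": "INTEGER", "mode": "REQUIRED"},
    {"name": "tags", "type": "INTEGER", "mode": "REPEATED"},
    {"name": "date1", "type": "DATE"},
]
print(validate_partitioning_column(schema, "customer_id"))  # (True, 'ok')
print(validate_partitioning_column(schema, "tags")[0])      # False
```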
Creating partitioned tables
You can create an integer range partitioned table in BigQuery in any of the following ways:
- Using the Cloud Console
- Using a DDL CREATE TABLE statement with a PARTITION BY RANGE_BUCKET clause that contains a partition expression
- Using the bq command-line tool's bq mk command
- Programmatically by calling the tables.insert API method
- From query results
- When you load data
Table naming
When you create a table in BigQuery, the table name must be unique per dataset. The table name can:
- Contain up to 1,024 characters.
- Contain Unicode characters in category L (letter), M (mark), N (number), Pc (connector, including underscore), Pd (dash), Zs (space). For more information, see General Category.
For example, the following are all valid table names: table-01, ग्राहक, 00_お客様, étudiant.
Some table names and table name prefixes are reserved. If you receive an error saying that your table name or prefix is reserved, then select a different name and try again.
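As a rough illustration, the character rules above can be checked with Python's unicodedata module. This is a sketch of the documented rule only, not BigQuery's actual validation (reserved names, for example, are not covered):

```python
import unicodedata

def is_valid_table_name(name):
    """Allowed Unicode general categories for BigQuery table names:
    L* (letters), M* (marks), N* (numbers), Pc (connector, incl. underscore),
    Pd (dash), Zs (space); up to 1,024 characters."""
    if not 0 < len(name) <= 1024:
        return False
    allowed_punct = {"Pc", "Pd", "Zs"}
    for ch in name:
        cat = unicodedata.category(ch)
        if cat[0] not in "LMN" and cat not in allowed_punct:
            return False
    return True

for name in ["table-01", "ग्राहक", "00_お客様", "étudiant"]:
    print(name, is_valid_table_name(name))  # all True
print(is_valid_table_name("bad/name"))      # False: '/' is category Po
```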
Required permissions
At a minimum, to create a table, you must be granted the following permissions:
- bigquery.tables.create permissions to create the table
- bigquery.tables.updateData to write data to the table by using a load job, a query job, or a copy job
- bigquery.jobs.create to run a query job, load job, or copy job that writes data to the table
Additional permissions such as bigquery.tables.getData
might be required to
access the data you're writing to the table.
The following predefined IAM roles include both bigquery.tables.create and bigquery.tables.updateData permissions:

- bigquery.dataEditor
- bigquery.dataOwner
- bigquery.admin
The following predefined IAM roles include bigquery.jobs.create permissions:

- bigquery.user
- bigquery.jobUser
- bigquery.admin
In addition, if a user has bigquery.datasets.create
permissions, when that
user creates a dataset, they are granted bigquery.dataOwner
access to it.
bigquery.dataOwner
access gives the user the ability to create and
update tables in the dataset.
For more information on IAM roles and permissions in BigQuery, see Predefined roles and permissions.
Creating an empty partitioned table with a schema definition
You cannot create an empty partitioned table that does not have a schema definition. The schema is required in order to identify the column used to create the partitions.
When you create an empty partitioned table with a schema definition, you can:
- Provide the schema inline using the bq command-line tool
- Specify a JSON schema file using the bq command-line tool
- Provide the schema in a table resource when calling the API's tables.insert method
For more information on specifying a table schema, see Specifying a schema.
After the partitioned table is created, you can:
- Load data into it
- Write query results to it
- Copy data into it
To create an empty partitioned table with a schema definition:
Console
Open the BigQuery page in the Cloud Console.
In the Explorer panel, expand your project and select a dataset.
In the details panel, click Create table.
In the Create table panel, in the Source section:
- For Create table from, select Empty table.
In the Destination section:
- For Dataset name, choose the appropriate dataset, and in the Table name field, enter the name of the table you're creating.
- Verify that Table type is set to Native table.
In the Schema section, enter the schema definition. Enter schema information manually by doing one of the following:
- Enable Edit as text and enter the table schema as a JSON array.
- Click Add field and enter the schema information.
Under Partition and cluster settings, from the Partition dropdown list, choose a column of type INTEGER.

Provide values for start, end, and interval:
- start is the start of range partitioning, inclusive
- end is the end of range partitioning, exclusive
- interval is the width of each range within the partition
(Optional) For Partitioning filter, click the Require partition filter box to require users to include a WHERE clause that specifies the partitions to query. Requiring a partition filter may reduce cost and improve performance. For more information, see Querying partitioned tables.

(Optional) Click Advanced options and for Encryption, click Customer-managed key to use a Cloud Key Management Service key. If you leave the Google-managed key setting, BigQuery encrypts the data at rest.
Click Create table.
SQL
Data definition language (DDL) statements allow you to create and modify tables and views using standard SQL query syntax.
For more information, see Using data definition language statements.
To create a partitioned table by using a DDL statement in the Cloud Console:
Open the BigQuery page in the Cloud Console.
Click Compose new query.
Type your CREATE TABLE DDL statement into the Query editor text area.

The following query creates a table named newtable with an integer range partition on the customer_id column with start 0, end 100, and interval 10.

CREATE TABLE mydataset.newtable
PARTITION BY RANGE_BUCKET(customer_id, GENERATE_ARRAY(0, 100, 10))
AS SELECT 1 AS customer_id, DATE "2019-10-01" AS date1
Click Run. When the query completes, the table appears in the list of tables for that dataset.
bq
Use the mk command with the --range_partitioning flag:

bq mk \
--range_partitioning=column_name,start,end,interval \
project_id:dataset.table \
"column_name:integer,value:integer"
Where:
- column_name is the column used to create the integer range partitions.
- start is the start of range partitioning, inclusive
- end is the end of range partitioning, exclusive
- interval is the width of each range within the partition
- project_id is your project ID.
- dataset is a dataset in your project.
- table is the name of the partitioned table you're creating.
Examples:
Enter the following command to create an integer range partitioned table
named mypartitionedtable
in mydataset
in your default project. The
partitioning is based on start of 0, end of 100, and interval of 10.
The --require_partition_filter
flag is used to require users
to include a WHERE
clause that specifies the partitions to query.
Requiring a partition filter may reduce cost and improve performance.
For more information, see Querying partitioned tables.
bq mk \
--require_partition_filter \
--range_partitioning=customer_id,0,100,10 \
mydataset.mypartitionedtable \
"customer_id:integer,value:integer"
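To build intuition for the flag, here is a sketch of how a value maps to a partition under --range_partitioning=customer_id,0,100,10: each range is half-open [start, start + interval), and a partition is identified by the inclusive start of its range. The handling of NULL and out-of-range values (the __NULL__ and __UNPARTITIONED__ partitions) is an assumption based on BigQuery's broader partitioning behavior, not something this command configures:

```python
def range_partition(value, start=0, end=100, interval=10):
    """Return the id of the partition a value lands in: the inclusive
    start of its half-open range [s, s + interval). NULLs and values
    outside [start, end) go to special partitions (assumed behavior)."""
    if value is None:
        return "__NULL__"
    if value < start or value >= end:
        return "__UNPARTITIONED__"
    return str(start + ((value - start) // interval) * interval)

print(range_partition(42))    # '40'
print(range_partition(100))   # '__UNPARTITIONED__' (end is exclusive)
print(range_partition(None))  # '__NULL__'
```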
After the table is created, you can use the bq
command-line tool to update the partitioned
table's
table expiration,
partition expiration,
description,
and labels.
API
Call the tables.insert
method with a defined table resource
that specifies the rangePartitioning
property and the schema
property.
Java
Before trying this sample, follow the Java setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Java API reference documentation.
Node.js
Before trying this sample, follow the Node.js setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Node.js API reference documentation.
Python
Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.
To verify a table is partitioned on an integer column, examine the schema. For example, run the following command:
bq show --format=prettyjson mydataset.mytable
If the table is partitioned on an integer column, the output from bq show contains rangePartitioning data:
...
"rangePartitioning": {
"field": "customer_id",
"range": {
"end": "100",
"interval": "10",
"start": "0"
}
},
...
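A quick way to check this programmatically is to parse the JSON that bq show emits. In this sketch, table_json stands in for the captured command output (an assumption; in practice you would read it from subprocess output or a file):

```python
import json

# Stand-in for the stdout of: bq show --format=prettyjson mydataset.mytable
table_json = '''
{
  "rangePartitioning": {
    "field": "customer_id",
    "range": {"start": "0", "end": "100", "interval": "10"}
  }
}
'''
meta = json.loads(table_json)
rp = meta.get("rangePartitioning")
if rp:
    r = rp["range"]
    print(f"partitioned on {rp['field']}: [{r['start']}, {r['end']}) step {r['interval']}")
else:
    print("not integer range partitioned")
```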
Writing to an integer range partitioned table
Data written to an integer range partitioned table will automatically be partitioned. This includes writing to the table via load jobs, queries, and streaming.
For streaming, data is initially placed in the UNPARTITIONED partition while it is in the streaming buffer. When enough unpartitioned data accumulates, it is repartitioned into the appropriate range partitions.
Here's an example that saves query results to an integer range partitioned table:
bq query --nouse_legacy_sql \
--destination_table=mydataset.mytable \
'SELECT value AS customer_id, value+1 AS value FROM UNNEST(GENERATE_ARRAY(-5, 110, 5)) AS value'
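The generated values run from -5 to 110, so some rows fall outside the table's [0, 100) range. Assuming out-of-range rows land in the __UNPARTITIONED__ partition (consistent with the streaming behavior described above), here is a sketch of where each row ends up:

```python
from collections import Counter

values = range(-5, 111, 5)   # mirrors GENERATE_ARRAY(-5, 110, 5)
start, end, interval = 0, 100, 10

def partition_of(v):
    # Out-of-range values assumed to land in __UNPARTITIONED__.
    if v < start or v >= end:
        return "__UNPARTITIONED__"
    return str((v // interval) * interval)

counts = Counter(partition_of(v) for v in values)
print(counts["__UNPARTITIONED__"])  # 4 rows: -5, 100, 105, 110
print(counts["30"])                 # 2 rows per in-range partition: 30, 35
```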
Querying an integer range partitioned table
Integer range partitioned tables can be queried only through standard SQL. When an integer range partitioned table is queried, any filters on the integer partitioning column are used to prune partitions and reduce the query cost.
The following query scans only the three partitions that start at 30, 40, and 50.
bq query --nouse_legacy_sql \
'SELECT * FROM mydataset.mytable WHERE customer_id BETWEEN 30 AND 50'
Here's example output from the bq query
command:
Waiting on bqjob_r4fce65fa3381528e_000001670994aeb6_1 ... (0s) Current status: DONE
+-------------+-------+
| customer_id | value |
+-------------+-------+
|          40 |    41 |
|          45 |    46 |
|          30 |    31 |
|          35 |    36 |
|          50 |    51 |
+-------------+-------+
For this example, each partition has 2 rows, and each row has 2 integer columns, so the query should scan 3 * 2 * 2 * 8 = 96 bytes. You can examine the job information:
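That arithmetic can be spelled out. A sketch, assuming the filter prunes to exactly the partitions whose ranges overlap [30, 50]:

```python
# Partitions whose half-open range [s, s + interval) overlaps the filter
# customer_id BETWEEN 30 AND 50, under a 0/100/10 partitioning scheme.
start, end, interval = 0, 100, 10
lo, hi = 30, 50
partitions = [s for s in range(start, end, interval) if s <= hi and s + interval > lo]
print(partitions)  # [30, 40, 50]

# Each partition holds 2 rows; each row has 2 INT64 columns of 8 bytes.
rows_per_partition, int64_cols, bytes_per_int64 = 2, 2, 8
print(len(partitions) * rows_per_partition * int64_cols * bytes_per_int64)  # 96
```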
bq show -j bqjob_r4fce65fa3381528e_000001670994aeb6_1
Here's example output from the bq show
command:
Job myproject:bqjob_r4fce65fa3381528e_000001670994aeb6_1
Job Type State Start Time Duration User Email Bytes Processed Bytes Billed Billing Tier Labels
---------- --------- ----------------- ---------- --------------------- ----------------- -------------- -------------- --------
query SUCCESS 24 Sep 12:19:58 0:00:01 joe@google.com 96 10485760 1
DML statements are supported. For example:
bq query --nouse_legacy_sql \
'DELETE FROM mydataset.mytable WHERE customer_id = 30'
Currently, partition pruning is not supported for functions over an integer range partitioned column. As an example, the following query will scan the entire table.
bq query --nouse_legacy_sql \
'SELECT * FROM mydataset.mytable WHERE customer_id+1 BETWEEN 30 AND 50'
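One workaround, assuming the rewrite is acceptable for your query, is to move the arithmetic onto the constants so the filter applies to the bare column: customer_id+1 BETWEEN 30 AND 50 is equivalent to customer_id BETWEEN 29 AND 49, which can be pruned. A sketch contrasting the two:

```python
# Under a 0/100/10 scheme, a filter on the bare column prunes to the
# partitions whose ranges overlap it; a filter wrapped in an expression
# cannot be pruned and must scan every partition.
start, end, interval = 0, 100, 10
all_partitions = list(range(start, end, interval))

def pruned(lo, hi):
    """Partitions overlapping the half-open ranges touched by BETWEEN lo AND hi."""
    return [s for s in all_partitions if s <= hi and s + interval > lo]

print(pruned(29, 49))       # [20, 30, 40] -- rewritten filter: 3 partitions
print(len(all_partitions))  # 10 -- what the original expression must scan
```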
Table decorators on integer range partitioned tables
Legacy SQL supports using table decorators to address a partition in an integer range partitioned table. The key to address a range partition is the start of the range.
The following example queries the range partition that starts with 0. The
partitioning column is customer_id
and the partition has two rows.
bq query --use_legacy_sql=true 'SELECT * FROM mydataset.mytable$0'
+-------------+-------+
| customer_id | value |
+-------------+-------+
| 0 | 1 |
| 5 | 6 |
+-------------+-------+
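Since the decorator key is the start of the range, you can compute the decorator for the partition containing any in-range value. A sketch using a hypothetical helper:

```python
def partition_decorator(table, value, start=0, end=100, interval=10):
    """Return the table decorator addressing the partition that contains
    `value`: the table name plus '$' and the start of the value's range."""
    if value < start or value >= end:
        raise ValueError("value falls outside the partitioned range")
    return f"{table}${start + ((value - start) // interval) * interval}"

print(partition_decorator("mydataset.mytable", 5))   # mydataset.mytable$0
print(partition_decorator("mydataset.mytable", 42))  # mydataset.mytable$40
```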
Integer range partitioning with clustering
You can use integer range partitioning with clustering. Data is first partitioned on the integer range partitioned column, and then data in each partition is clustered by the clustering columns.
As an example, this command creates a table that is integer range partitioned on customer_id and clustered on value.
bq mk \
--range_partitioning=customer_id,0,100,10 \
--clustering_fields=value \
mydataset.mytable_2 \
"customer_id:integer,value:integer"
When you retrieve the table metadata, you can see that both range partitioning and clustering are in effect:
...
"clustering": {
"fields": [
"value"
]
},
...
"rangePartitioning": {
"field": "customer_id",
"range": {
"end": "100",
"interval": "10",
"start": "0"
}
},
...
Creating a partitioned table from a query result
To create a partitioned table from a query result:
Console
You cannot specify partitioning options for a destination table when you query data using the Cloud Console.
bq
Enter the bq query
command, specify the --destination_table
flag to
create a permanent table based on the query results, and specify the
--range_partitioning
flag to create a partitioned
destination table.
Specify the use_legacy_sql=false
flag to use standard SQL syntax.
To write the query results to a table that is not in your default project,
add the project ID to the dataset name in the following format:
project_id:dataset
.
(Optional) Supply the --location
flag and set the value to your
location.
Enter the following command to create a new, partitioned destination table from a query result:
bq --location=location query \
--destination_table project_id:dataset.table \
--range_partitioning column,start,end,interval \
--use_legacy_sql=false \
'query'
Where:
- location is the name of your location. The --location flag is optional. For example, if you are using BigQuery in the Tokyo location, set the flag's value to asia-northeast1. You can set a default value for the location using the .bigqueryrc file.
- project_id is your project ID.
- dataset is the name of the dataset that will contain the new partitioned table.
- table is the name of the partitioned table you're creating using the query results.
- column is the INTEGER column to use for the partition.
- start is the start of range partitioning, inclusive
- end is the end of range partitioning, exclusive
- interval is the width of each range within the partition
- query is a query in standard SQL syntax. Currently, you cannot use legacy SQL to query partitioned tables or to write query results to partitioned tables.
Example:
Enter the following command to write query results to a partitioned
destination table named mypartitionedtable
in mydataset
. mydataset
is
in your default project. This query retrieves data from a non-partitioned
table: the
Medicare public dataset.
The table's provider_id
INTEGER
column is used to create the partitions.
bq query \
--destination_table mydataset.mypartitionedtable \
--use_legacy_sql=false \
--range_partitioning provider_id,10000,700000,10000 \
'SELECT provider_id, total_discharges, average_covered_charges FROM `bigquery-public-data`.medicare.inpatient_charges_2011 LIMIT 300'
API
To save query results to a permanent partitioned table, call the
jobs.insert
method,
configure a query
job, and include a value for the destinationTable
and
rangePartitioning
properties.
Specify your location in the location
property in the
jobReference
section of the job resource.
Creating a partitioned table when loading data
You can create a partitioned table by specifying partitioning options when you load data into a new table. You do not need to create an empty partitioned table before loading data into it. You can create the partitioned table and load your data at the same time.
When you load data into BigQuery, you can supply the table schema, or for supported data formats, you can use schema auto-detection.
Partition decorators enable you to load data into a specific partition. For
example, consider a table that is integer range partitioned on the customer_id
column, with a start of 0, end of 100, and interval of 10. To load all data for
customer IDs that are in the range of 0 to 9, use the following $0
partition
decorator:
table_name$0
When loading data into a specific partition using a partition decorator, the data being loaded must conform to the table's partitioning scheme. All rows written to the partition must have values in the partitioning column that fall within the partition's range.
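A pre-load check along these lines can catch nonconforming rows early. This is a hypothetical client-side sketch, not something the bq tool provides:

```python
def rows_conform(rows, partition_start, interval=10, key="customer_id"):
    """Return True if every row's partitioning value falls inside the
    half-open range [partition_start, partition_start + interval) of the
    partition addressed by the decorator (e.g. $0 for [0, 10))."""
    lo, hi = partition_start, partition_start + interval
    return all(lo <= row[key] < hi for row in rows)

batch = [{"customer_id": 0, "value": 1}, {"customer_id": 9, "value": 2}]
print(rows_conform(batch, 0))   # True: both ids are in [0, 10)
print(rows_conform(batch + [{"customer_id": 10, "value": 3}], 0))  # False
```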
For more information on loading data, see Introduction to loading data into BigQuery.
Controlling access to partitioned tables
To configure access to tables and views, you can grant an IAM role to an entity at the following levels, listed in order of range of resources allowed (largest to smallest):
- a high level in the Google Cloud resource hierarchy such as the project, folder, or organization level
- the dataset level
- the table/view level
Access to any resource protected by IAM is additive. For example, if an entity does not have access at a high level such as the project, you could grant the entity access at the dataset level, and then the entity has access to the tables and views in the dataset. Similarly, if the entity does not have access at the high level or the dataset level, you could grant the entity access at the table or view level.
Granting IAM roles at a higher level in the Google Cloud resource hierarchy such as the project, folder, or organization level gives the entity access to a broad set of resources. For example, granting a role to an entity at the project level gives that entity permissions that apply to all datasets throughout the project.
Granting a role at the dataset level specifies the operations an entity is allowed to perform on tables and views in that specific dataset, even if the entity does not have access at a higher level. For information on configuring dataset-level access controls, see Controlling access to datasets.
Granting a role at the table or view level specifies the operations an entity is allowed to perform on specific tables and views, even if the entity does not have access at a higher level. For information on configuring table-level access controls, see Controlling access to tables and views.
You can also create IAM custom roles. If you create a custom role, the permissions you grant depend on the specific operations you want the entity to be able to perform.
You can't set a "deny" permission on any resource protected by IAM.
For more information on roles and permissions, see:
- Understanding roles in the IAM documentation
- BigQuery Predefined roles and permissions
- Controlling access to datasets
- Controlling access to tables and views
- Restricting access with BigQuery Column-level security
Using partitioned tables
Getting information about partitioned tables
You can get information about tables by:
- Using the Cloud Console
- Using the bq show command in the bq command-line tool
- Calling the tables.get API method
- Using the client libraries
Required permissions
At a minimum, to get information about tables, you must be granted
bigquery.tables.get
permissions. The following predefined IAM
roles include bigquery.tables.get
permissions:
- bigquery.metadataViewer
- bigquery.dataViewer
- bigquery.dataOwner
- bigquery.dataEditor
- bigquery.admin
In addition, if a user has bigquery.datasets.create
permissions, when that
user creates a dataset, they are granted bigquery.dataOwner
access to it.
bigquery.dataOwner
access gives the user the ability to retrieve table
metadata.
For more information on IAM roles and permissions in BigQuery, see Access control.
Getting partitioned table information
To view information about a partitioned table:
Console
Open the BigQuery page in the Cloud Console.
In the Explorer panel, expand your project and dataset, then select the table.
In the details panel, click Details. This tab displays the table's description and table information.
Click the Schema tab to view the table's schema definition.
bq
Issue the bq show
command to display all table information. Use the
--schema
flag to display only table schema information. The --format
flag can be used to control the output.
If you are getting information about a table in a project other than
your default project, add the project ID to the dataset in the following
format: project_id:dataset
.
bq show --schema --format=prettyjson project_id:dataset.table
Where:
- project_id is your project ID.
- dataset is the name of the dataset.
- table is the name of the table.
Examples:
Enter the following command to display all information about mytable
in
mydataset
. mydataset
is in your default project.
bq show --format=prettyjson mydataset.mytable
Enter the following command to display all information about mytable
in
mydataset
. mydataset
is in myotherproject
, not your default project.
bq show --format=prettyjson myotherproject:mydataset.mytable
The output should look like the following:
{
  "creationTime": "1569429717657",
  "etag": "AcSEmWFQdbYEGT0auTE9NA==",
  "id": "myproject:mydataset.newtable",
  "kind": "bigquery#table",
  "lastModifiedTime": "1569429717657",
  "location": "US",
  "numBytes": "16",
  "numLongTermBytes": "0",
  "numRows": "1",
  "rangePartitioning": {
    "field": "customer_id",
    "range": {
      "end": "100",
      "interval": "10",
      "start": "0"
    }
  },
  "schema": {
    "fields": [
      {
        "name": "customer_id",
        "type": "INTEGER"
      },
      {
        "name": "date1",
        "type": "DATE"
      }
    ]
  },
  "selfLink": "https://bigquery.googleapis.com/bigquery/v2/projects/myproject/datasets/mydataset/tables/newtable",
  "tableReference": {
    "datasetId": "mydataset",
    "projectId": "myproject",
    "tableId": "newtable"
  },
  "type": "TABLE"
}
API
Call the tables.get
method and provide any relevant parameters.
Listing partitioned tables in a dataset
You can list tables in datasets (including partitioned tables) by:
- Using the Cloud Console
- Using the bq ls command in the bq command-line tool
- Calling the tables.list API method
- Using the client libraries
Required permissions
At a minimum, to list tables in a dataset, you must be granted
bigquery.tables.list
permissions. The following predefined IAM
roles include bigquery.tables.list
permissions:
- bigquery.user
- bigquery.metadataViewer
- bigquery.dataViewer
- bigquery.dataEditor
- bigquery.dataOwner
- bigquery.admin
For more information on IAM roles and permissions in BigQuery, see Access control.
Listing partitioned tables
To list the tables in a dataset (including partitioned tables):
Console
Open the BigQuery page in the Cloud Console.
In the Explorer panel, expand your project and select a dataset.
Scroll through the list to see the tables in the dataset. Tables, partitioned tables, models, and views are identified by different icons.
API
To list tables using the API, call the tables.list
method.
Getting partition metadata
You can get information about partitioned tables in the following ways:
- Use the INFORMATION_SCHEMA.PARTITIONS view (Preview).
- Use the __PARTITIONS_SUMMARY__ meta-table (legacy SQL only).
Getting partition metadata using INFORMATION_SCHEMA views
When you query the INFORMATION_SCHEMA.PARTITIONS
view, the query results
contain one row for each partition. For example, the following query lists all
of the table partitions in the dataset named mydataset
:
SELECT table_name, partition_id, total_rows
FROM `mydataset.INFORMATION_SCHEMA.PARTITIONS`
WHERE partition_id IS NOT NULL
For more information, see INFORMATION_SCHEMA.PARTITIONS.
Getting partition metadata using meta-tables
In legacy SQL, you can get metadata about table partitions by querying the
__PARTITIONS_SUMMARY__
meta-table. Meta-tables are read-only tables that
contain metadata.
Query the __PARTITIONS_SUMMARY__
meta-table as follows:
#legacySQL
SELECT column
FROM [dataset.table$__PARTITIONS_SUMMARY__]
The __PARTITIONS_SUMMARY__
meta-table has the following columns:
| Value | Description |
|---|---|
| project_id | Name of the project. |
| dataset_id | Name of the dataset. |
| table_id | Name of the partitioned table. |
| partition_id | Name of the partition. For an integer range partitioned table, this is the start of the partition's range. |
| creation_time | The time at which the partition was created, in milliseconds since January 1, 1970 UTC. |
| last_modified_time | The time at which the partition was last modified, in milliseconds since January 1, 1970 UTC. |
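For example, a creation_time value can be converted to a readable UTC timestamp; the sample value below is the creationTime shown in the bq show output earlier on this page:

```python
from datetime import datetime, timedelta, timezone

# creation_time / last_modified_time are milliseconds since the Unix epoch.
creation_time_ms = 1569429717657
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
created = epoch + timedelta(milliseconds=creation_time_ms)
print(created.isoformat())  # 2019-09-25T16:41:57.657000+00:00
```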
At a minimum, to run a query job that uses the __PARTITIONS_SUMMARY__
meta-
table, you must be granted bigquery.jobs.create
permissions and
bigquery.tables.getData
permissions.
For more information on IAM roles in BigQuery, see Access control.
Next steps
- For an overview of partitioned table support in BigQuery, see Introduction to partitioned tables.
- To learn how to create and use date/timestamp partitioned tables, see Creating and using date/timestamp partitioned tables.
- To learn how to create and use ingestion-time partitioned tables, see Creating and using ingestion-time partitioned tables.
- To learn how to manage and update partitioned tables, see Managing partitioned tables.
- For information on querying partitioned tables, see Querying partitioned tables.