Creating and using integer range partitioned tables

This document describes how to create and use tables partitioned by an integer column. For information on other types of partitioned tables, see Creating and using date/timestamp partitioned tables or Creating and using ingestion-time partitioned tables.

For additional information about integer range partitions, see Integer range partitioned tables.

After creating an integer range partitioned table, you can:

  • Control access to your table data
  • Get information about your partitioned tables
  • List the partitioned tables in a dataset
  • Get partitioned table metadata using meta-tables

For more information on managing partitioned tables including updating partitioned table properties, copying a partitioned table, and deleting a partitioned table, see Managing partitioned tables.

Limitations

Integer range partitioned tables are subject to the following limitations:

  • The partitioning column must be an INTEGER column. While the mode of the column may be REQUIRED or NULLABLE, it cannot be REPEATED (array-based).
  • The partitioning column must be a top-level field. You cannot use a leaf field from a RECORD (STRUCT) as the partitioning column.
  • You cannot use legacy SQL to query partitioned tables or to write query results to partitioned tables.

Creating partitioned tables

You can create an integer range partitioned table in BigQuery by using a DDL statement, the command-line tool, or the API, as described in the following sections.

Table naming

When you create a table in BigQuery, the table name must be unique per dataset. The table name can:

  • Contain up to 1,024 characters
  • Contain letters (upper or lower case), numbers, and underscores

Required permissions

At a minimum, to create a table, you must be granted the following permissions:

  • bigquery.tables.create permissions to create the table
  • bigquery.tables.updateData to write data to the table by using a load job, a query job, or a copy job
  • bigquery.jobs.create to run a query job, load job, or copy job that writes data to the table

Additional permissions such as bigquery.tables.getData may be required to access the data you're writing to the table.

The following predefined Cloud IAM roles include both bigquery.tables.create and bigquery.tables.updateData permissions:

  • bigquery.dataEditor
  • bigquery.dataOwner
  • bigquery.admin

The following predefined Cloud IAM roles include bigquery.jobs.create permissions:

  • bigquery.user
  • bigquery.jobUser
  • bigquery.admin

In addition, if a user has bigquery.datasets.create permissions, when that user creates a dataset, they are granted bigquery.dataOwner access to it. bigquery.dataOwner access gives the user the ability to create and update tables in the dataset.

For more information on Cloud IAM roles and permissions in BigQuery, see Predefined roles and permissions.

Creating an empty partitioned table with a schema definition

You cannot create an empty partitioned table that does not have a schema definition. The schema is required in order to identify the column used to create the partitions.

When you create an empty partitioned table with a schema definition, you can:

  • Provide the schema inline using the CLI
  • Specify a JSON schema file using the CLI
  • Provide the schema in a table resource when calling the API's tables.insert method

For more information on specifying a table schema, see Specifying a schema.

After the partitioned table is created, you can:

  • Load data into it
  • Write query results to it
  • Copy data into it

To create an empty partitioned table with a schema definition:

DDL

Data definition language (DDL) statements allow you to create and modify tables and views using standard SQL query syntax.

See more on Using Data Definition Language statements.

To create a partitioned table by using a DDL statement in the Cloud Console:

  1. Open the BigQuery web UI in the Cloud Console.
    Go to the Cloud Console

  2. Click Compose new query.

  3. Type your CREATE TABLE DDL statement into the Query editor text area.

    The following query creates a table named newtable with an integer range partition on the customer_id column with start 0, end 100, and interval 10.

     CREATE TABLE
       mydataset.newtable
     PARTITION BY
       RANGE_BUCKET(customer_id, GENERATE_ARRAY(0, 100, 10))
     AS SELECT 1 AS customer_id, DATE "2019-10-01" AS date1
     

  4. Click Run. When the query completes, the table will appear in the Resources pane.
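The RANGE_BUCKET call in the statement above determines which partition each row belongs to. As a rough local model (illustrative Python, not a BigQuery client API), RANGE_BUCKET returns the number of array boundaries that are less than or equal to the point, which matches `bisect.bisect_right` on a sorted array:

```python
from bisect import bisect_right

def generate_array(start, stop, step):
    # Mirrors GENERATE_ARRAY(0, 100, 10): includes stop when it falls on a step.
    return list(range(start, stop + 1, step))

def range_bucket(point, boundaries):
    # RANGE_BUCKET returns the number of boundaries <= point,
    # which matches bisect_right on a sorted array.
    return bisect_right(boundaries, point)

boundaries = generate_array(0, 100, 10)  # [0, 10, 20, ..., 100]
print(range_bucket(25, boundaries))      # 3: values 20-29 share a bucket
print(range_bucket(-5, boundaries))      # 0: below the first boundary
```

Values of customer_id below 0 or at or above 100 fall outside every bucket; BigQuery stores such rows in the special UNPARTITIONED partition.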

CLI

Use the bq mk command with the --range_partitioning flag:

bq mk \
--range_partitioning=column_name,start,end,interval \
project_id:dataset.table \
"column_name:integer,value:integer"

Where:

  • column_name is the column used to create the integer range partitions.
  • start is the start of range partitioning, inclusive.
  • end is the end of range partitioning, exclusive.
  • interval is the width of each range within the partition.
  • project_id is your project ID.
  • dataset is a dataset in your project.
  • table is the name of the partitioned table you're creating.
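Together, column_name, start, end, and interval determine which partition, if any, holds a given row. A minimal sketch of that mapping (the partition_start helper is hypothetical, written for illustration):

```python
def partition_start(value, start, end, interval):
    # Returns the start of the range partition that would hold `value`,
    # or None for NULL or out-of-range values -- BigQuery stores those
    # in the special UNPARTITIONED partition.
    if value is None or value < start or value >= end:
        return None
    return start + ((value - start) // interval) * interval

# With --range_partitioning=customer_id,0,100,10:
print(partition_start(42, 0, 100, 10))   # 40
print(partition_start(100, 0, 100, 10))  # None: the end bound is exclusive
```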

Examples:

Enter the following command to create an integer range partitioned table named mypartitionedtable in mydataset in your default project. The partitioning is based on start of 0, end of 100, and interval of 10.

The --require_partition_filter flag is used to require users to include a WHERE clause that specifies the partitions to query. Requiring a partition filter may reduce cost and improve performance. For more information, see Querying partitioned tables.

bq mk \
--require_partition_filter \
--range_partitioning=customer_id,0,100,10 \
mydataset.mypartitionedtable \
"customer_id:integer,value:integer"

After the table is created, you can use the CLI to update the partitioned table's table expiration, partition expiration, description, and labels.

API

Call the tables.insert method with a defined table resource that specifies the rangePartitioning property and the schema property.
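A sketch of such a table resource in Python (placeholder project, dataset, and table names; the property names match those shown in the bq show output elsewhere on this page — note that start, end, and interval are serialized as strings):

```python
import json

# Hypothetical table resource body for tables.insert; start, end, and
# interval are strings in the REST API's JSON representation.
table_resource = {
    "tableReference": {
        "projectId": "myproject",
        "datasetId": "mydataset",
        "tableId": "mypartitionedtable",
    },
    "rangePartitioning": {
        "field": "customer_id",
        "range": {"start": "0", "end": "100", "interval": "10"},
    },
    "schema": {
        "fields": [
            {"name": "customer_id", "type": "INTEGER"},
            {"name": "value", "type": "INTEGER"},
        ]
    },
}
print(json.dumps(table_resource, indent=2))
```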

To verify a table is partitioned on an integer column, examine the schema. For example, run the following command:

bq show --format=prettyjson my_dataset.my_table

If the table is partitioned on an integer column, the output from bq show contains a rangePartitioning section:

...
  "rangePartitioning": {
    "field": "customer_id",
    "range": {
      "end": "100",
      "interval": "10",
      "start": "0"
    }
  },
...

Writing to an integer range partitioned table

Data written to an integer range partitioned table will automatically be partitioned. This includes writing to the table via load jobs, queries, and streaming.

For streaming, data first lands in the streaming buffer, in the special UNPARTITIONED partition. When it is extracted from the buffer, it initially stays in the UNPARTITIONED partition; once enough unpartitioned data accumulates, it is repartitioned into the appropriate range partitions.

Here's an example to save the query results to an integer range partitioned table:

bq query --nouse_legacy_sql \
--destination_table=my_dataset.my_table \
'SELECT value AS customer_id, value+1 AS value FROM UNNEST(GENERATE_ARRAY(-5, 110, 5)) AS value'

Querying an integer range partitioned table

Integer range partitioned tables can only be queried through standard SQL. When you query an integer range partitioned table with a filter on the integer partitioning column, the filter is used to prune partitions and reduce the query cost.

The following query scans only the three partitions that start at 30, 40, and 50:

bq query --nouse_legacy_sql \
'SELECT * FROM my_dataset.my_table WHERE customer_id BETWEEN 30 AND 50'

Here's example output from the bq query command:

Waiting on bqjob_r4fce65fa3381528e_000001670994aeb6_1 ... (0s) Current status: DONE
+-------------+-------+
| customer_id | value |
+-------------+-------+
|          40 |    41 |
|          45 |    46 |
|          30 |    31 |
|          35 |    36 |
|          50 |    51 |
+-------------+-------+

For this example, each partition has 2 rows, and each row has 2 integer columns, so the query should scan 3 * 2 * 2 * 8 = 96 bytes. You can examine the job information:

bq show -j bqjob_r4fce65fa3381528e_000001670994aeb6_1

Here's example output from the bq show command:

Job myproject:bqjob_r4fce65fa3381528e_000001670994aeb6_1

  Job Type    State      Start Time      Duration       User Email        Bytes Processed   Bytes Billed   Billing Tier   Labels
 ---------- --------- ----------------- ---------- --------------------- ----------------- -------------- -------------- --------
  query      SUCCESS   24 Sep 12:19:58   0:00:01    joe@google.com       96                10485760       1
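The 96 bytes processed can be reproduced by hand: only the three pruned-in partitions are read, each partition holds 2 rows, each row has 2 INT64 columns, and each INT64 value is 8 bytes. A quick sanity check (illustrative arithmetic, not an official cost calculator):

```python
INT64_BYTES = 8  # each INT64 value is accounted as 8 bytes

def estimated_bytes(partitions_scanned, rows_per_partition, int64_columns):
    # Only the pruned-in partitions contribute to bytes processed.
    return partitions_scanned * rows_per_partition * int64_columns * INT64_BYTES

# 3 partitions (30, 40, 50) x 2 rows x 2 INT64 columns x 8 bytes:
print(estimated_bytes(3, 2, 2))  # 96
```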

DML statements are supported. For example:

bq query --nouse_legacy_sql \
'DELETE FROM my_dataset.my_table WHERE customer_id = 30'

Currently, partition pruning is not supported for functions or expressions over an integer range partitioned column. For example, the following query scans the entire table because the filter is applied to customer_id+1 rather than to the column itself:

bq query --nouse_legacy_sql \
'SELECT * FROM my_dataset.my_table WHERE customer_id+1 BETWEEN 30 AND 50'
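Pruning can be sketched as follows: a filter that compares the partitioning column itself to constants narrows the scan to the overlapping ranges, while an expression such as customer_id+1 cannot be mapped back to a set of partitions, so every partition is read. An illustrative model of the BETWEEN case (the helper name is hypothetical):

```python
def partitions_scanned(filter_low, filter_high, start, end, interval):
    # Partitions are identified by their range start. A BETWEEN filter on
    # the raw partitioning column keeps only the overlapping ranges.
    starts = range(start, end, interval)
    return [s for s in starts if s + interval > filter_low and s <= filter_high]

# WHERE customer_id BETWEEN 30 AND 50 touches the partitions starting at 30, 40, 50:
print(partitions_scanned(30, 50, 0, 100, 10))  # [30, 40, 50]
```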

Table decorators on integer range partitioned tables

Similar to date/timestamp partitioning, table decorators can be used to address a partition in an integer range partitioned table. The key to address a range partition is the start of the range.

The following example queries the range partition that starts with 0. It has 2 values, 0 and 5.

bq query 'SELECT * FROM my_dataset.my_table$0'

+-------------+-------+
| customer_id | value |
+-------------+-------+
|           0 |     1 |
|           5 |     6 |
+-------------+-------+

Integer range partitioning with clustering

You can use integer range partitioning with clustering. Data is first partitioned on the integer range partitioned column, and then data in each partition is clustered by the clustering columns.

As an example, this command creates a table that uses both integer range partitioning and clustering:

bq mk \
--range_partitioning=customer_id,0,100,10 \
--clustering_fields=value \
my_dataset.my_table_2 \
"customer_id:integer,value:integer"

When you retrieve the table's metadata, you can see that both range partitioning and clustering are in effect:

...
  "clustering": {
    "fields": [
      "value"
    ]
  },
...
  "rangePartitioning": {
    "field": "customer_id",
    "range": {
      "end": "100",
      "interval": "10",
      "start": "0"
    }
  },
...

Creating a partitioned table from a query result

To create a partitioned table from a query result:

Console

You cannot specify partitioning options for a destination table when you query data using the Cloud Console.

CLI

Enter the bq query command, specify the --destination_table flag to create a permanent table based on the query results, and specify the --range_partitioning flag to create a partitioned destination table.

Specify the --use_legacy_sql=false flag to use standard SQL syntax. To write the query results to a table that is not in your default project, add the project ID to the dataset name in the following format: project_id:dataset.

(Optional) Supply the --location flag and set the value to your location.

Enter the following command to create a new, partitioned destination table from a query result:

bq --location=location query \
--destination_table project_id:dataset.table \
--range_partitioning column,start,end,interval \
--use_legacy_sql=false \
'query'

Where:

  • location is the name of your location. The --location flag is optional. For example, if you are using BigQuery in the Tokyo location, set the flag's value to asia-northeast1. You can set a default value for the location using the .bigqueryrc file.
  • project_id is your project ID.
  • dataset is the name of the dataset that will contain the new partitioned table.
  • table is the name of the partitioned table you're creating using the query results.
  • column is the INTEGER column to use for the partition.
  • start is the start of range partitioning, inclusive.
  • end is the end of range partitioning, exclusive.
  • interval is the width of each range within the partition.
  • query is a query in standard SQL syntax. Currently, you cannot use legacy SQL to query partitioned tables or to write query results to partitioned tables.

Example:

Enter the following command to write query results to a partitioned destination table named mypartitionedtable in mydataset. mydataset is in your default project. This query retrieves data from a non-partitioned table in the Medicare public dataset. The table's provider_id INTEGER column is used to create the partitions.

bq query \
--destination_table mydataset.mypartitionedtable \
--use_legacy_sql=false \
--range_partitioning provider_id,10000,700000,10000 \
'SELECT
   provider_id, total_discharges, average_covered_charges
 FROM
   `bigquery-public-data`.medicare.inpatient_charges_2011
 LIMIT
   300'

API

To save query results to a permanent partitioned table, call the jobs.insert method, configure a query job, and include a value for the destinationTable and rangePartitioning properties.

Specify your location in the location property in the jobReference section of the job resource.
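A sketch of the relevant parts of such a query job configuration (property names follow the REST API; the project, dataset, and query text are placeholders):

```python
# Hypothetical query job configuration for jobs.insert; the project,
# dataset, and query text are placeholders.
job = {
    "jobReference": {"projectId": "myproject", "location": "US"},
    "configuration": {
        "query": {
            "query": "SELECT customer_id, value FROM mydataset.source_table",
            "useLegacySql": False,
            "destinationTable": {
                "projectId": "myproject",
                "datasetId": "mydataset",
                "tableId": "mypartitionedtable",
            },
            "rangePartitioning": {
                "field": "customer_id",
                "range": {"start": "0", "end": "100", "interval": "10"},
            },
        }
    },
}
```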

Creating a partitioned table when loading data

You can create a partitioned table by specifying partitioning options when you load data into a new table. You do not need to create an empty partitioned table before loading data into it. You can create the partitioned table and load your data at the same time.

When you load data into BigQuery, you can supply the table schema, or for supported data formats, you can use schema auto-detection.

Partition decorators enable you to load data into a specific partition. For example, consider a table that is integer range partitioned on the customer_id column, with a start of 0, end of 100, and interval of 10. To load all data for customer IDs that are in the range of 0 to 9, use the following $0 partition decorator:

table_name$0

When loading data into a specific partition by using a partition decorator, the data that is being loaded must conform to the table's partitioning scheme: all rows written to the partition must have values that fall within the partition's range.
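For example, a load into table_name$0 on the table above must contain only customer_id values in [0, 10). A small pre-flight check along these lines can catch nonconforming rows before the load job fails (the helper function is illustrative, not part of any BigQuery client):

```python
def rows_conform(values, partition_start, interval):
    # True only if every value lies in [partition_start, partition_start + interval).
    return all(
        v is not None and partition_start <= v < partition_start + interval
        for v in values
    )

print(rows_conform([0, 3, 9], 0, 10))  # True
print(rows_conform([0, 10], 0, 10))    # False: 10 belongs to the next partition
```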

For more information on loading data, see Introduction to loading data into BigQuery.

Controlling access to partitioned tables

You cannot assign access controls directly to tables or views. The lowest level of BigQuery resources to which you are able to grant access is the dataset level. To configure access to tables and views, you grant a Cloud IAM role to an entity at the dataset level or higher.

Granting a role at the dataset-level specifies the operations an entity is allowed to perform on tables and views in that specific dataset. For information on configuring dataset-level access controls, see Controlling access to datasets.

You can also grant Cloud IAM roles at a higher level in the Google Cloud resource hierarchy such as the project, folder, or organization level. Granting roles at a higher level gives the entity access to a broader set of resources. For example, granting a role to an entity at the project level gives that entity permissions that apply to all datasets throughout the project. For more information on granting access to resources, see Granting, changing, and revoking access to resources in the Cloud IAM documentation.

You can also create Cloud IAM custom roles. If you create a custom role, the permissions you grant depend on the specific operations you want the entity to be able to perform.

For more information on roles and permissions, see Predefined roles and permissions.

Using partitioned tables

Getting information about partitioned tables

You can get information about tables by:

  • Using the Cloud Console or the classic BigQuery web UI
  • Using the bq show CLI command
  • Calling the tables.get API method
  • Using the client libraries

Required permissions

At a minimum, to get information about tables, you must be granted bigquery.tables.get permissions. The following predefined Cloud IAM roles include bigquery.tables.get permissions:

  • bigquery.metadataViewer
  • bigquery.dataViewer
  • bigquery.dataOwner
  • bigquery.dataEditor
  • bigquery.admin

In addition, if a user has bigquery.datasets.create permissions, when that user creates a dataset, they are granted bigquery.dataOwner access to it. bigquery.dataOwner access gives the user the ability to retrieve table metadata.

For more information on Cloud IAM roles and permissions in BigQuery, see Access control.

Getting partitioned table information

To view information about a partitioned table:

Console

  1. Open the BigQuery web UI in the Cloud Console.
    Go to the BigQuery web UI

  2. In the navigation panel, in the Resources section, expand your project and dataset, then click the table name in the list.

  3. Click Details below the Query editor. This tab displays the table's description and table information.

    Table details

  4. Click the Schema tab to view the table's schema definition. Note that integer range partitioned tables do not include a _PARTITIONTIME pseudo column.

CLI

Issue the bq show command to display all table information. Use the --schema flag to display only table schema information. The --format flag can be used to control the output.

If you are getting information about a table in a project other than your default project, add the project ID to the dataset in the following format: project_id:dataset.

bq show --schema --format=prettyjson project_id:dataset.table

Where:

  • project_id is your project ID.
  • dataset is the name of the dataset.
  • table is the name of the table.

Examples:

Enter the following command to display all information about mytable in mydataset. mydataset is in your default project.

bq show --format=prettyjson mydataset.mytable

Enter the following command to display all information about mytable in mydataset. mydataset is in myotherproject, not your default project.

bq show --format=prettyjson myotherproject:mydataset.mytable

The output should look like the following:

{
  "creationTime": "1569429717657",
  "etag": "AcSEmWFQdbYEGT0auTE9NA==",
  "id": "myproject:mydataset.newtable",
  "kind": "bigquery#table",
  "lastModifiedTime": "1569429717657",
  "location": "US",
  "numBytes": "16",
  "numLongTermBytes": "0",
  "numRows": "1",
  "rangePartitioning": {
    "field": "customer_id",
    "range": {
      "end": "100",
      "interval": "10",
      "start": "0"
    }
  },
  "schema": {
    "fields": [
      {
        "name": "customer_id",
        "type": "INTEGER"
      },
      {
        "name": "date1",
        "type": "DATE"
      }
    ]
  },
  "selfLink": "https://bigquery.googleapis.com/bigquery/v2/projects/myproject/datasets/mydataset/tables/newtable",
  "tableReference": {
    "datasetId": "mydataset,
    "projectId": "myproject",
    "tableId": "newtable"
  },
  "type": "TABLE"
}

API

Call the tables.get method and provide any relevant parameters.

Listing partitioned tables in a dataset

You can list tables in datasets (including partitioned tables) by:

  • Using the Cloud Console or the classic BigQuery web UI
  • Using the bq ls CLI command
  • Calling the tables.list API method
  • Using the client libraries

Required permissions

At a minimum, to list tables in a dataset, you must be granted bigquery.tables.list permissions. The following predefined Cloud IAM roles include bigquery.tables.list permissions:

  • bigquery.user
  • bigquery.metadataViewer
  • bigquery.dataViewer
  • bigquery.dataEditor
  • bigquery.dataOwner
  • bigquery.admin

For more information on Cloud IAM roles and permissions in BigQuery, see Access control.

Listing partitioned tables

To list the tables in a dataset (including partitioned tables):

Console

  1. Open the BigQuery web UI in the Cloud Console.
    Go to the BigQuery web UI

  2. In the navigation panel, in the Resources section, expand your project and click on your dataset.

  3. Scroll through the list to see the tables in the dataset. Tables, partitioned tables, models, and views are identified by different icons.

API

To list tables using the API, call the tables.list method.

Listing partitions in partitioned tables

You can list the partitions in a partitioned table by querying the __PARTITIONS_SUMMARY__ meta table using legacy SQL.

You can run the query in the Cloud Console, the classic BigQuery web UI, using the bq query command, or by calling the jobs.insert method and configuring a query job.

Required permissions

At a minimum, to run a query job that uses the __PARTITIONS_SUMMARY__ meta-table, you must be granted bigquery.jobs.create permissions. The following predefined Cloud IAM roles include bigquery.jobs.create permissions:

  • bigquery.user
  • bigquery.jobUser
  • bigquery.admin

You must also be granted bigquery.tables.getData permissions. The following predefined Cloud IAM roles include bigquery.tables.getData permissions:

  • bigquery.dataViewer
  • bigquery.dataEditor
  • bigquery.dataOwner
  • bigquery.admin

For more information on Cloud IAM roles in BigQuery, see Access control.

Listing partitions in a partitioned table

You can list partitions in a partitioned table using legacy SQL. To list partitions in a partitioned table:

Console

  1. Open the BigQuery web UI in the Cloud Console.
    Go to the Cloud Console

  2. Click the Compose new query button.

  3. Enter the following text into the Query editor box to query the __PARTITIONS_SUMMARY__ meta-table:

    #legacySQL
    SELECT
      partition_id
    FROM
      [dataset.table$__PARTITIONS_SUMMARY__]
    

    Where:

    • dataset is the dataset that contains the table.
    • table is the name of the table.
  4. Click Run.

CLI

Enter the following query using the bq query command:

bq --location=location query \
--use_legacy_sql=true \
'SELECT
  partition_id
FROM
  [dataset.table$__PARTITIONS_SUMMARY__]'

Where:

  • location is the name of your location. The --location flag is optional. For example, if you are using BigQuery in the Tokyo location, set the flag's value to asia-northeast1. You can set a default value for the location using the .bigqueryrc file.
  • dataset is the dataset that contains the table.
  • table is the name of the table.

API

Call the jobs.insert method and configure a query job that queries the table's __PARTITIONS_SUMMARY__ meta table.

Getting partitioned table metadata using meta tables

You can get information about partitioned tables by using special tables called meta tables. Meta tables contain metadata such as the list of tables and views in a dataset. The meta tables are read only.

Currently, you cannot use the INFORMATION_SCHEMA service to get partitioned table metadata.

Getting partition metadata using meta tables

The __PARTITIONS_SUMMARY__ meta table is a special table whose contents represent metadata about the partitions in a partitioned table. The __PARTITIONS_SUMMARY__ meta table is read-only.

To access metadata about the partitions in a partitioned table, use the __PARTITIONS_SUMMARY__ meta-table in a query's SELECT statement. You can run the query by:

  • Using the Cloud Console or the classic BigQuery web UI
  • Using the command-line tool's bq query command
  • Calling the jobs.insert API method and configuring a query job
  • Using the client libraries

Currently, standard SQL does not support the partition decorator separator ($) so you cannot query __PARTITIONS_SUMMARY__ in standard SQL. A legacy SQL query that uses the __PARTITIONS_SUMMARY__ meta-table looks like the following:

SELECT
  column
FROM
  [dataset.table$__PARTITIONS_SUMMARY__]

Where:

  • dataset is the name of your dataset.
  • table is the name of the partitioned table.
  • column is one of the following:

    Value              Description
    project_id         Name of the project.
    dataset_id         Name of the dataset.
    table_id           Name of the partitioned table.
    partition_id       Name of the partition. For an integer range partitioned table, this is the start of the partition's range.
    creation_time      The time at which the partition was created, in milliseconds since January 1, 1970 UTC.
    last_modified_time The time at which the partition was last modified, in milliseconds since January 1, 1970 UTC.
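Both creation_time and last_modified_time are expressed in milliseconds since the Unix epoch, so they can be converted locally with an illustrative helper like this:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def from_millis(ms):
    # creation_time / last_modified_time are milliseconds since 1970-01-01 UTC.
    return EPOCH + timedelta(milliseconds=ms)

print(from_millis(1471632556179).strftime("%Y-%m-%d %H:%M:%S.%f"))
# 2016-08-19 18:49:16.179000
```

The legacy SQL FORMAT_UTC_USEC function shown later in this section performs the equivalent conversion server-side.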

Partition meta table permissions

At a minimum, to run a query job that uses the __PARTITIONS_SUMMARY__ meta-table, you must be granted bigquery.jobs.create permissions. The following predefined Cloud IAM roles include bigquery.jobs.create permissions:

  • bigquery.user
  • bigquery.jobUser
  • bigquery.admin

You must also be granted bigquery.tables.getData permissions. The following predefined Cloud IAM roles include bigquery.tables.getData permissions:

  • bigquery.dataViewer
  • bigquery.dataEditor
  • bigquery.dataOwner
  • bigquery.admin

For more information on Cloud IAM roles in BigQuery, see Access control.

Partition meta table examples

The following query retrieves all partition metadata for a range-partitioned table named mydataset.mytable.

Console

#legacySQL
SELECT
  *
FROM
  [mydataset.mytable$__PARTITIONS_SUMMARY__]

CLI

bq query --use_legacy_sql=true '
SELECT
  *
FROM
  [mydataset.mytable$__PARTITIONS_SUMMARY__]'

The output looks like the following:

+----------------+------------+----------------+--------------+---------------+--------------------+
|   project_id   | dataset_id |    table_id    | partition_id | creation_time | last_modified_time |
+----------------+------------+----------------+--------------+---------------+--------------------+
| myproject      | mydataset  | mytable        | 10000        | 1517190224120 | 1517190224997      |
| myproject      | mydataset  | mytable        | 20000        | 1517190224120 | 1517190224997      |
+----------------+------------+----------------+--------------+---------------+--------------------+

The following query lists the times when the partitions in mydataset.mytable were last modified.

Console

#legacySQL
SELECT
  partition_id,
  last_modified_time
FROM
  [mydataset.mytable$__PARTITIONS_SUMMARY__]

CLI

bq query --use_legacy_sql=true '
SELECT
  partition_id,
  last_modified_time
FROM
  [mydataset.mytable$__PARTITIONS_SUMMARY__]'

The output looks like the following:

+--------------+--------------------+
| partition_id | last_modified_time |
+--------------+--------------------+
| 10000        |      1471632556179 |
| 20000        |      1471632538142 |
| 30000        |      1471632570463 |
+--------------+--------------------+

To display the last_modified_time field in human-readable format, use the FORMAT_UTC_USEC function. For example:

Console

#legacySQL
SELECT
  partition_id,
  FORMAT_UTC_USEC(last_modified_time*1000) AS last_modified
FROM
  [mydataset.mytable$__PARTITIONS_SUMMARY__]

CLI

bq query --use_legacy_sql=true '
SELECT
  partition_id,
  FORMAT_UTC_USEC(last_modified_time*1000) AS last_modified
FROM
  [mydataset.mytable$__PARTITIONS_SUMMARY__]'

The output looks like the following:

+--------------+----------------------------+
| partition_id |       last_modified        |
+--------------+----------------------------+
| 10000        | 2016-08-19 18:49:16.179000 |
| 20000        | 2016-08-19 18:48:58.142000 |
| 30000        | 2016-08-19 18:49:30.463000 |
+--------------+----------------------------+
