Creating and Updating Date-Partitioned Tables

BigQuery users typically split large tables into many smaller partitions, where each partition contains a day's worth of data. Tables that are divided into date-based partitions are called date-partitioned tables.

If you create a date-partitioned table, BigQuery automatically loads data into a date-based partition of the table. You can then reduce the number of bytes processed by restricting your queries to specific partitions in the table.

For conceptual information about partitioned tables, see Partitioned Tables. For a simple example of creating and populating a date-partitioned table, see the Example section.

Creating a partitioned table

To create a partitioned table, you must declare the table as partitioned at creation time. You do not need to specify a schema, as the schema can be specified when data is subsequently loaded or copied into the table.

To create a partitioned table:

Command-line


Use the bq mk command with the --time_partitioning_type flag. For example, the following command creates a partitioned table named table1 in the mydataset dataset:

bq mk --time_partitioning_type=DAY mydataset.table1

To create a partitioned table and specify an expiration time for data in the partitions, use the time_partitioning_expiration flag. For example, the following command creates a partitioned table that keeps data in a partition for three days, or 259,200 seconds, before the data expires:

bq mk --time_partitioning_type=DAY --time_partitioning_expiration=259200 mydataset.table2

Note that the command-line time_partitioning_expiration flag uses seconds, whereas the equivalent API command uses milliseconds.

API


Specify the partitioning configuration in the Tables::insert request. For example, the following configuration settings create a partitioned table named table1 in the mydataset dataset in the myProject project:

{
  "tableReference": {
    "projectId": "myProject",
    "tableId": "table1",
    "datasetId": "mydataset"
  },
  "timePartitioning": {
    "type": "DAY"
  }
}

To create a partitioned table and specify an expiration time for data in the partitions, use the expirationMs configuration setting. For example, the following configuration settings create a partitioned table that keeps data in a partition for three days, or 259,200,000 milliseconds, before the data expires:

{
  "tableReference": {
    "projectId": "myProject",
    "tableId": "table2",
    "datasetId": "mydataset"
  },
  "timePartitioning": {
    "type": "DAY",
    "expirationMs": 259200000
  }
}

Viewing the partition settings

To retrieve the partition configuration of a partitioned table, use the bq show command. For example, the following command retrieves the partition configuration for a table named table2 in the mydataset dataset:

bq show --format=prettyjson mydataset.table2

The following JSON is an excerpt of the output from the bq show command:

{
  ...
  "tableReference": {
    "datasetId": "mydataset",
    "projectId": "myproject",
    "tableId": "table2"
  },
  "timePartitioning": {
    "expirationMs": "2592000000",
    "type": "DAY"
  },
  "type": "TABLE"
}

Restating data in a partition

To update data in a specific partition, append a partition decorator to the name of the partitioned table when loading data into the table. A partition decorator represents a specific date and takes the form:

$YYYYMMDD

For example, the following command replaces the data in the entire partition for the date January 1, 2016 (20160101) in a partitioned table named mydataset.table1 with content loaded from a Cloud Storage bucket:

bq load  --replace --source_format=NEWLINE_DELIMITED_JSON 'mydataset.table1$20160101' gs://[MY_BUCKET]/replacement_json.json

Copying tables into partitions

Partition decorators also work when copying tables. For example, the following command copies an unpartitioned table, tableA, into the partition of table1 for the date February 1, 2016 (20160201):

bq cp --append_table mydataset.tableA 'mydataset.table1$20160101'

Updating partitions using query results

To use the output of a query to restate data for a partition, use a partition decorator when specifying the destination table. For example, the following command restates the data for the March 1, 2016 (20160301) partition of table1 using the output of the query.

bq query --allow_large_results --replace --noflatten_results --destination_table 'mydataset.table1$20160301' 'SELECT field1 + 10, field2 FROM mydataset.table1$20160301'

Listing partitions in a table

To list partitions in a table, query the table's summary partition by using the partition decorator separator ($) followed by __PARTITIONS_SUMMARY__. For example, the following command retrieves the partition IDs for table1:

SELECT partition_id from [mydataset.table1$__PARTITIONS_SUMMARY__];

Converting dated tables into a partitioned table

If you have previously created date-sharded tables, you can convert the entire set of related tables into a single partitioned table by using the bq partition command. The date-sharded tables must follow the dated table pattern:

[TABLE_NAME]_YYYYMMDD

For example, the following command converts a series of date-sharded tables, sharded_20160101, ... , sharded_20160331, to a single partitioned table named partitioned:

bq partition mydataset.sharded_ mydataset.partitioned

Example

In this section, you create a partitioned table using the command-line tool and add data to three of its partitions. The table will contain weather data, partitioned by date, for the first three days in 2016. To add data to the partitions, use the bq query command as described in Updating partitions using query results to write query results from a public weather dataset to the partitions.

Step 1. Create an empty date-partitioned table named temps in a dataset named mydataset.

bq mk --time_partitioning_type=DAY mydataset.temps

To view the configuration settings, use the bq show command:

bq show --format=prettyjson mydataset.temps

Look for the timePartitioning entry in the bq show command output:

{
  ...
  "timePartitioning": {
    "type": "DAY"
  },
  "type": "TABLE"
}

Step 2. Query the NOAA GSOD weather dataset for temperatures for the first three days of 2016 and write the results to the respective partitions in the temps table. The following queries use the --destination_table option to write 100 rows of query results to a partition.

legacy SQL

January 1, 2016 temperatures:

bq query --allow_large_results --replace --noflatten_results \
--destination_table 'mydataset.temps$20160101' \
'SELECT stn,temp from [bigquery-public-data:noaa_gsod.gsod2016] WHERE mo="01" AND da="01" limit 100'

January 2, 2016 temperatures:

bq query --allow_large_results --replace --noflatten_results \
--destination_table 'mydataset.temps$20160102' \
'SELECT stn,temp from [bigquery-public-data:noaa_gsod.gsod2016] WHERE mo="01" AND da="02" limit 100'

January 3, 2016 temperatures:

bq query --allow_large_results --replace --noflatten_results \
--destination_table 'mydataset.temps$20160103' \
'SELECT stn,temp from [bigquery-public-data:noaa_gsod.gsod2016] WHERE mo="01" AND da="03" limit 100'

standard SQL

January 1, 2016 temperatures:

bq query --use_legacy_sql=false --allow_large_results --replace \
--noflatten_results --destination_table 'mydataset.temps$20160101' \
'SELECT stn,temp from `bigquery-public-data.noaa_gsod.gsod2016` WHERE mo="01" AND da="01" limit 100'

January 2, 2016 temperatures:

bq query --use_legacy_sql=false --allow_large_results --replace \
--noflatten_results --destination_table 'mydataset.temps$20160102' \
'SELECT stn,temp from `bigquery-public-data.noaa_gsod.gsod2016` WHERE mo="01" AND da="02" limit 100'

January 3, 2016 temperatures:

bq query --use_legacy_sql=false --allow_large_results --replace \
--noflatten_results --destination_table 'mydataset.temps$20160103' \
'SELECT stn,temp from `bigquery-public-data.noaa_gsod.gsod2016` WHERE mo="01" AND da="03" limit 100'

Step 3. Confirm that you have 300 rows in your table using the bq show command.

bq show mydataset.temps

The results show the schema and Total Rows.

Last modified        Schema       Total Rows   Total Bytes   Expiration
----------------- ---------------- ------------ ------------- ------------
 30 Sep 11:50:20   |- stn: string   300          4800
                   |- temp: float

Next steps

Send feedback about...

BigQuery Documentation