Creating and Updating Date-Partitioned Tables

BigQuery users typically split large tables into many smaller partitions, where each partition contains a day's worth of data. Tables that are divided into date-based partitions are called date-partitioned tables.

If you create a date-partitioned table, BigQuery automatically loads data into a date-based partition of the table. You can then reduce the number of bytes processed by restricting your queries to specific partitions in the table.

For conceptual information about partitioned tables, see Partitioned Tables. For a simple example of creating and populating a date-partitioned table, see the Example section.

Creating a partitioned table

To create a partitioned table, you must declare the table as partitioned at creation time. You do not need to specify a schema, as the schema can be specified when data is subsequently loaded or copied into the table.

To create a partitioned table:

Command-line


Use the bq mk command with the --time_partitioning_type flag. For example, the following command creates a partitioned table named table1 in the mydataset dataset:

bq mk --time_partitioning_type=DAY mydataset.table1

To create a partitioned table and specify an expiration time for data in the partitions, use the time_partitioning_expiration flag. For example, the following command creates a partitioned table that keeps data in a partition for three days, or 259,200 seconds, before the data expires:

bq mk --time_partitioning_type=DAY --time_partitioning_expiration=259200 mydataset.table2

Note that the command-line time_partitioning_expiration flag uses seconds, whereas the equivalent API command uses milliseconds.

API


Specify the partitioning configuration in the Tables::insert request. For example, the following configuration settings create a partitioned table named table1 in the mydataset dataset in the myProject project:

{
  "tableReference": {
    "projectId": "myProject",
    "tableId": "table1",
    "datasetId": "mydataset"
  },
  "timePartitioning": {
    "type": "DAY"
  }
}

To create a partitioned table and specify an expiration time for data in the partitions, use the expirationMs configuration setting. For example, the following configuration settings create a partitioned table that keeps data in a partition for three days, or 259,200,000 milliseconds, before the data expires:

{
  "tableReference": {
    "projectId": "myProject",
    "tableId": "table2",
    "datasetId": "mydataset"
  },
  "timePartitioning": {
    "type": "DAY",
    "expirationMs": 259200000
  }
}

Viewing the partition settings

To retrieve the partition configuration of a partitioned table, use the bq show command. For example, the following command retrieves the partition configuration for a table named table2 in the mydataset dataset:

bq show --format=prettyjson mydataset.table2

The following JSON is an excerpt of the output from the bq show command:

{
  ...
  "tableReference": {
    "datasetId": "mydataset",
    "projectId": "myproject",
    "tableId": "table2"
  },
  "timePartitioning": {
    "expirationMs": "2592000000",
    "type": "DAY"
  },
  "type": "TABLE"
}

Restating data in a partition

To update data in a specific partition, append a partition decorator to the name of the partitioned table when loading data into the table. A partition decorator represents a specific date and takes the form:

$YYYYMMDD

For example, the following command replaces the data in the entire partition for the date January 1, 2016 (20160101) in a partitioned table named mydataset.table1 with content loaded from a Cloud Storage bucket:

bq load  --replace --source_format=NEWLINE_DELIMITED_JSON 'mydataset.table1$20160101' gs://[MY_BUCKET]/replacement_json.json

Copying tables into partitions

Partition decorators also work when copying tables. For example, the following command copies an unpartitioned table, tableA, into the partition of table1 for the date February 1, 2016 (20160201):

bq cp --append_table mydataset.tableA 'mydataset.table1$20160101'

Updating partitions using query results

To use the output of a query to restate data for a partition, use a partition decorator when specifying the destination table. For example, the following command restates the data for the March 1, 2016 (20160301) partition of table1 using the output of the query.

bq query --allow_large_results --replace --noflatten_results --destination_table 'mydataset.table1$20160301' 'SELECT field1 + 10, field2 FROM mydataset.table1$20160301'

Streaming data into partitions

To stream data to a specific partition, use the partition decorator when specifying the tableId of the table to which you are streaming. For example, the following command streams a single row to a partition for the date January 1, 2017(20170101) in a partitioned table named mydataset.table1:

echo '{"a":1, "b":2}' | bq insert 'mydataset.table1$20170101'

When streaming using a partition decorator, you can stream to partitions within the last 30 days in the past and 5 days in the future relative to the current date, based on current UTC time. To write to partitions for dates outside these allowed bounds, you can use load or query jobs, as described above.

Listing partitions in a table

To list partitions in a table, query the table's summary partition by using the partition decorator separator ($) followed by __PARTITIONS_SUMMARY__. For example, the following command retrieves the partition IDs for table1:

#legacySQL
SELECT
  partition_id
FROM
  [mydataset.table1$__PARTITIONS_SUMMARY__]

Checking when a partition was last modified

To list the times when the partitions in a table were last modified, use legacy SQL to select the last_modified_time field from the table's summary partition. You must use legacy SQL because standard SQL does not support the partition decorator separator ($). For example, the following command lists the partition ID and last modified time for all partitions in table1 that is in a dataset named mydataset:

#legacySQL
SELECT
  partition_id,
  last_modified_time
FROM
  [mydataset.table1$__PARTITIONS_SUMMARY__]

Sample output:

+--------------+--------------------+
| partition_id | last_modified_time |
+--------------+--------------------+
| 20160102     |      1471632556179 |
| 20160101     |      1471632538142 |
| 20160103     |      1471632570463 |
+--------------+--------------------+

To display the last_modified_time field in human-readable format, use the FORMAT_UTC_USEC function:

#legacySQL
SELECT
  partition_id,
  FORMAT_UTC_USEC(last_modified_time*1000) AS last_modified
FROM
  [mydataset.table1$__PARTITIONS_SUMMARY__]

Sample output:

+--------------+----------------------------+
| partition_id |       last_modified        |
+--------------+----------------------------+
| 20160103     | 2016-08-19 18:49:30.463000 |
| 20160102     | 2016-08-19 18:49:16.179000 |
| 20160101     | 2016-08-19 18:48:58.142000 |
+--------------+----------------------------+

Converting dated tables into a partitioned table

If you have previously created date-sharded tables, you can convert the entire set of related tables into a single partitioned table by using the bq partition command. The date-sharded tables must follow the dated table pattern:

[TABLE_NAME]_YYYYMMDD

For example, the following command converts a series of date-sharded tables, sharded_20160101, ... , sharded_20160331, to a single partitioned table named partitioned:

bq partition mydataset.sharded_ mydataset.partitioned

Example

In this section, you create a partitioned table using the command-line tool and add data to three of its partitions. The table will contain weather data, partitioned by date, for the first three days in 2016. To add data to the partitions, use the bq query command as described in Updating partitions using query results to write query results from a public weather dataset to the partitions.

Step 1. Create an empty date-partitioned table named temps in a dataset named mydataset.

bq mk --time_partitioning_type=DAY mydataset.temps

To view the configuration settings, use the bq show command:

bq show --format=prettyjson mydataset.temps

Look for the timePartitioning entry in the bq show command output:

{
  ...
  "timePartitioning": {
    "type": "DAY"
  },
  "type": "TABLE"
}

Step 2. Query the NOAA GSOD weather dataset for temperatures for the first three days of 2016 and write the results to the respective partitions in the temps table. The following queries use the --destination_table option to write 100 rows of query results to a partition.

legacy SQL

January 1, 2016 temperatures:

bq query --allow_large_results --replace --noflatten_results \
--destination_table 'mydataset.temps$20160101' \
'SELECT stn,temp from [bigquery-public-data:noaa_gsod.gsod2016] WHERE mo="01" AND da="01" limit 100'

January 2, 2016 temperatures:

bq query --allow_large_results --replace --noflatten_results \
--destination_table 'mydataset.temps$20160102' \
'SELECT stn,temp from [bigquery-public-data:noaa_gsod.gsod2016] WHERE mo="01" AND da="02" limit 100'

January 3, 2016 temperatures:

bq query --allow_large_results --replace --noflatten_results \
--destination_table 'mydataset.temps$20160103' \
'SELECT stn,temp from [bigquery-public-data:noaa_gsod.gsod2016] WHERE mo="01" AND da="03" limit 100'

standard SQL

January 1, 2016 temperatures:

bq query --use_legacy_sql=false --allow_large_results --replace \
--noflatten_results --destination_table 'mydataset.temps$20160101' \
'SELECT stn,temp from `bigquery-public-data.noaa_gsod.gsod2016` WHERE mo="01" AND da="01" limit 100'

January 2, 2016 temperatures:

bq query --use_legacy_sql=false --allow_large_results --replace \
--noflatten_results --destination_table 'mydataset.temps$20160102' \
'SELECT stn,temp from `bigquery-public-data.noaa_gsod.gsod2016` WHERE mo="01" AND da="02" limit 100'

January 3, 2016 temperatures:

bq query --use_legacy_sql=false --allow_large_results --replace \
--noflatten_results --destination_table 'mydataset.temps$20160103' \
'SELECT stn,temp from `bigquery-public-data.noaa_gsod.gsod2016` WHERE mo="01" AND da="03" limit 100'

Step 3. Confirm that you have 300 rows in your table using the bq show command.

bq show mydataset.temps

The results show the schema and Total Rows.

Last modified        Schema       Total Rows   Total Bytes   Expiration
----------------- ---------------- ------------ ------------- ------------
 30 Sep 11:50:20   |- stn: string   300          4800
                   |- temp: float

Next steps

Send feedback about...

BigQuery Documentation