Creating Partitioned Tables

You can split large tables into many smaller partitions, where each partition contains a day's worth of data. Tables that are divided into date-based partitions are called date-partitioned tables.

When you load data into a date-partitioned table, BigQuery automatically routes the data to the appropriate date-based partition. You can then reduce the number of bytes that are processed by restricting your queries to specific partitions in the table.
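
For example, a query can restrict scanning to a single partition by filtering on the _PARTITIONTIME pseudo column. The following is a minimal sketch in legacy SQL, assuming a hypothetical date-partitioned table named mydataset.mytable with a stn column:

bq query 'SELECT stn FROM [mydataset.mytable] WHERE _PARTITIONTIME = TIMESTAMP("2016-01-01")'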

Creating a partitioned table

To create a partitioned table, declare the table as partitioned at creation time. You do not need to specify a schema at that point; you can specify one when you load or copy data into the table.
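
As a sketch of supplying the schema at load time rather than at creation (the column names and CSV source here are hypothetical):

bq load --source_format=CSV --schema=stn:STRING,temp:FLOAT [DATASET].[TABLE] gs://[BUCKET]/my_file.csv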

To create a partitioned table:

Web UI

  1. Click the down arrow icon next to your dataset name in the navigation, and then click Create new table.

  2. On the Create table page:

    • For Source Data click Create empty table.
    • For Destination Table choose your dataset and enter the table name in the Destination table name field.
    • In the Options section, for Partitioning choose Day.
    • Click Create Table.

Command-line

Use the mk command with the --time_partitioning_type flag.

To create an empty table in an existing dataset:

bq mk --time_partitioning_type=DAY [DATASET].[TABLE]

Where:

  • [DATASET] is a dataset in your project.
  • [TABLE] is the name of the table you're creating.

API

Specify the partitioning configuration in the Tables::insert request. For example, the following configuration settings create a partitioned table named table1 in the mydataset dataset in the myProject project:

{
  "tableReference": {
    "projectId": "myProject",
    "tableId": "table1",
    "datasetId": "mydataset"
  },
  "timePartitioning": {
    "type": "DAY"
  }
}
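
As a rough sketch, you could send this configuration to the Tables::insert REST endpoint with curl, assuming the JSON above is saved as table1.json and that you authenticate with a gcloud access token:

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -d @table1.json \
  "https://www.googleapis.com/bigquery/v2/projects/myProject/datasets/mydataset/tables"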

Creating a partitioned table when loading data

You can create a partitioned table by specifying partitioning options when you load data into a new destination table.

If the destination table already exists and it's not partitioned, specifying partitioning options results in an error.

Web UI

  1. Click the down arrow icon next to your dataset name in the navigation, and then click Create new table.

  2. In the Source Data section:

    • Select Create from source.
    • For Location choose Google Cloud Storage and specify the location of the source data by using the following path: gs://[BUCKET]/[OBJECT]
  3. In the Destination Table section, choose your dataset and enter the table name in the Destination table name field.

  4. In the Options section, for Partitioning choose Day.

  5. Click Create Table.

Command-line

Use the load command with the --time_partitioning_type flag.

To load a JSON file into a new partitioned table:

bq load --source_format=NEWLINE_DELIMITED_JSON --time_partitioning_type=DAY [DATASET].[TABLE] gs://[BUCKET]/my_file.json

Where:

  • [DATASET] is a dataset in your project.
  • [TABLE] is the name of the table you're creating.
  • [BUCKET] is the name of a storage bucket.

API

Specify the partitioning configuration and the destination table in the Jobs::insert request, under the configuration.load property:

{ // Job resource
  ...
  "configuration" : {
    ...
    "load" : {
      ...
      "destinationTable": {
        "projectId": "myproject",
        "datasetId": "mydataset",
        "tableId": "new_table"
      },
      "timePartitioning": {
        "type": "DAY"
      }
    }
  }
}
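
As with table creation, a hedged curl sketch for submitting this load job to the Jobs::insert endpoint, assuming the job resource above is saved as load_job.json:

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -d @load_job.json \
  "https://www.googleapis.com/bigquery/v2/projects/myproject/jobs"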

Writing query results to a partitioned table

You can create a partitioned table by specifying partitioning options when you write query results to a new destination table.

If the destination table already exists and it's not partitioned, specifying partitioning options results in an error.

Command-line

Use the query command with the --time_partitioning_type flag.

To run a query and write the results into a new partitioned table:

bq query --destination_table [DATASET].[TABLE] --time_partitioning_type=DAY 'SELECT * FROM [[DATASET].[TABLE]_latest]'

Where:

  • [DATASET] is a dataset in your project.
  • [TABLE] is the name of the table you're creating.

API

Specify the partitioning configuration and the destination table in the Jobs::insert request, under the configuration.query property:

{ // Job resource
  ...
  "configuration" : {
    ...
    "query" : {
      ...
      "destinationTable": {
        "projectId": "myproject",
        "datasetId": "mydataset",
        "tableId": "new_table"
      },
      "timePartitioning": {
        "type": "DAY"
      }
    }
  }
}

Setting the expiration of a partition

To create a partitioned table and specify an expiration time for data in partitions:

Command-line

Use the mk command with the --time_partitioning_expiration flag.

To create a partitioned table that keeps data in a partition for three days, or 259,200 seconds, before the data expires:

bq mk --time_partitioning_type=DAY --time_partitioning_expiration=259200 [DATASET].[TABLE]

Where:

  • [DATASET] is a dataset in your project.
  • [TABLE] is the name of the table you're creating.

Notice that the command-line --time_partitioning_expiration flag uses seconds, whereas the equivalent API setting uses milliseconds.

API

Use the expirationMs configuration setting. For example, the following configuration settings create a partitioned table that keeps data in a partition for three days, or 259,200,000 milliseconds, before the data expires:

{
  "tableReference": {
    "projectId": "myProject",
    "tableId": "table2",
    "datasetId": "mydataset"
  },
  "timePartitioning": {
    "type": "DAY",
    "expirationMs": 259200000
  }
}
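
If the partitioned table already exists, the bq update command accepts the same seconds-based flag. A minimal sketch that sets the same three-day expiration on an existing table:

bq update --time_partitioning_expiration=259200 [DATASET].[TABLE]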

Converting dated tables into a partitioned table

If you have previously created date-sharded tables, you can convert the entire set of related tables into a single partitioned table by using the bq partition command. The date-sharded tables must follow this naming pattern:

[TABLE_NAME]_YYYYMMDD

For example, your date-sharded tables might be named sharded_20160101, ... , sharded_20160331.

To convert a series of date-sharded tables to a single partitioned table:

bq partition [DATASET].[TABLE_NAME]_ [DATASET].[PARTITIONED]

Where:

  • [DATASET] is a dataset in your project.
  • [TABLE_NAME]_ is the prefix of your date-sharded tables.
  • [PARTITIONED] is the name of the partitioned table.
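
For instance, to combine the sharded_20160101 through sharded_20160331 tables from the example above into a single partitioned table (sharded_partitioned is a hypothetical name):

bq partition mydataset.sharded_ mydataset.sharded_partitioned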

Viewing the partition settings

You can view the partition settings of a partitioned table.

Web UI

  1. In the navigation pane, click the blue arrow to the left of your dataset to expand it, or double-click the dataset name. This displays the tables and views in the dataset.

  2. Click the table name.

  3. Click Details. The Table Details page displays the table's description and table information.

Command-line

To retrieve the partition configuration of a partitioned table, use the show command:

bq show --format=prettyjson [DATASET].[TABLE]

Where:

  • [DATASET] is a dataset in your project.
  • [TABLE] is the name of your table.

The following JSON is an excerpt of the output from the show command:

{
  ...
  "tableReference": {
    "datasetId": "mydataset",
    "projectId": "myproject",
    "tableId": "table2"
  },
  "timePartitioning": {
    "expirationMs": "2592000000",
    "type": "DAY"
  },
  "type": "TABLE"
}

API

Call the bigquery.tables.get API method and provide any relevant parameters.
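
As a sketch, the same call made with curl against the v2 REST endpoint, reusing the table from the command-line example and a gcloud access token for authentication:

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://www.googleapis.com/bigquery/v2/projects/myproject/datasets/mydataset/tables/table2"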

Copying partitioned tables

If you copy a partitioned table to a new table, all of the partitioning information is copied with the table. The new table and the old table will have identical partitions.
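
For example, using the command-line tool. This sketch assumes mydataset.mytable is an existing partitioned table and mytable_copy does not yet exist:

bq cp mydataset.mytable mydataset.mytable_copy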

Example: Creating a partitioned table and adding data to its partitions

In this example, you create a partitioned table by using the command-line tool and add data to three of its partitions. The table will contain weather data, partitioned by date, for the first three days of 2016. To add data to the partitions, use the bq query command, as described in Updating partitions using query results, to write query results from a public weather dataset to the partitions.

Step 1. Create an empty date-partitioned table named temps in a dataset named mydataset.

bq mk --time_partitioning_type=DAY mydataset.temps

To view the configuration settings, use the bq show command:

bq show --format=prettyjson mydataset.temps

Look for the timePartitioning entry in the bq show command output:

{
  ...
  "timePartitioning": {
    "type": "DAY"
  },
  "type": "TABLE"
}

Step 2. Query the NOAA GSOD weather dataset for temperatures for the first three days of 2016 and write the results to the respective partitions in the temps table. The following queries use the --destination_table option to write 100 rows of query results to a partition.

Legacy SQL

January 1, 2016 temperatures:

bq query --allow_large_results --replace --noflatten_results \
--destination_table 'mydataset.temps$20160101' \
'SELECT stn, temp FROM [bigquery-public-data:noaa_gsod.gsod2016] WHERE mo="01" AND da="01" LIMIT 100'

January 2, 2016 temperatures:

bq query --allow_large_results --replace --noflatten_results \
--destination_table 'mydataset.temps$20160102' \
'SELECT stn, temp FROM [bigquery-public-data:noaa_gsod.gsod2016] WHERE mo="01" AND da="02" LIMIT 100'

January 3, 2016 temperatures:

bq query --allow_large_results --replace --noflatten_results \
--destination_table 'mydataset.temps$20160103' \
'SELECT stn, temp FROM [bigquery-public-data:noaa_gsod.gsod2016] WHERE mo="01" AND da="03" LIMIT 100'

Standard SQL

January 1, 2016 temperatures:

bq query --use_legacy_sql=false --allow_large_results --replace \
--noflatten_results --destination_table 'mydataset.temps$20160101' \
'SELECT stn, temp FROM `bigquery-public-data.noaa_gsod.gsod2016` WHERE mo="01" AND da="01" LIMIT 100'

January 2, 2016 temperatures:

bq query --use_legacy_sql=false --allow_large_results --replace \
--noflatten_results --destination_table 'mydataset.temps$20160102' \
'SELECT stn, temp FROM `bigquery-public-data.noaa_gsod.gsod2016` WHERE mo="01" AND da="02" LIMIT 100'

January 3, 2016 temperatures:

bq query --use_legacy_sql=false --allow_large_results --replace \
--noflatten_results --destination_table 'mydataset.temps$20160103' \
'SELECT stn, temp FROM `bigquery-public-data.noaa_gsod.gsod2016` WHERE mo="01" AND da="03" LIMIT 100'

Step 3. Confirm that you have 300 rows in your table by using the bq show command.

bq show mydataset.temps

The results show the schema and Total Rows.

Last modified        Schema       Total Rows   Total Bytes   Expiration
----------------- ---------------- ------------ ------------- ------------
 30 Sep 11:50:20   |- stn: string   300          4800
                   |- temp: float
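
You can also count the rows in a single partition by querying its partition decorator. A sketch in legacy SQL:

bq query 'SELECT COUNT(*) FROM [mydataset.temps$20160101]'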
