Managing Partitioned Table Data

This document describes how to manage partitioned table data in BigQuery. You can work with partitioned table data in the following ways:

  • Load data into a partitioned table
  • Browse (or preview) partitioned table data
  • Query partitioned table data
  • Append to or overwrite partitioned table data
  • Modify partitioned table data using Data Manipulation Language statements
  • Copy partitioned table data
  • Stream data into partitioned tables
  • Export partitioned table data

For information on managing table schemas, see Modifying Table Schemas.

Loading data into a partitioned table

You can create a partitioned table when you load data, or you can create an empty partitioned table and load the data later. When you load data into a partitioned table, you can use schema auto-detect for supported data formats, or you can specify the schema.

For more information on loading data, see the documentation for your source data's format and location.

Browsing table data

You can browse partitioned table data using the BigQuery web UI, the command-line tool's bq head command, or by calling the tabledata.list API method.

Required permissions

At the dataset level, browsing table data requires READER access to the dataset that contains the partitioned table being browsed.

Instead of using dataset-level permissions, you can use an IAM role that includes bigquery.tables.getData permissions. All predefined, project-level IAM roles include bigquery.tables.getData permissions except bigquery.user, bigquery.jobUser, and bigquery.metadataViewer.

In addition, because the bigquery.user role has bigquery.datasets.create permissions, a user assigned to the bigquery.user role can read the data in any table in any dataset that user creates. When a user assigned to the bigquery.user role creates a dataset, that user is given OWNER access to the dataset if no other owner is set. OWNER access to a dataset gives the user full control over it and all the tables in it.

For more information on IAM roles and permissions in BigQuery, see Access Control. For more information on dataset-level roles, see Primitive roles for datasets.

Browsing partitioned table data

To browse partitioned table data:

Web UI

  1. In the BigQuery web UI, in the navigation pane, click the blue arrow to the left of your dataset to expand it, or double-click the dataset name. This displays the tables and views in the dataset.

  2. Click a partitioned table in the list. You cannot preview individual partitions by using the BigQuery web UI.

  3. Click Details and note the value in Number of Rows. You may need this value to control the starting point for your results when using the CLI or API.

  4. Click Preview. A sample set of data is displayed.

    Table preview

Command-line

Issue the bq head command with the --max_rows flag to list all fields in a particular number of table rows. If --max_rows is not specified, the default is 100. Specify a partition to browse using the partition decorator, for example, $20180224.

Because the bq head command does not create a query job, bq head commands do not appear in your query history, and you are not charged for them.

To browse a subset of fields in the table (including nested and repeated fields), use the --selected_fields flag and enter the fields as a comma-separated list.

To specify the number of rows to skip before displaying table data, use the --start_row=[INTEGER] flag (or the -s shortcut). The default value is 0. You can retrieve the number of rows in a table by using the bq show command to retrieve table information.

If the table you're browsing is in a project other than your default project, add the project ID to the command in the following format: [PROJECT_ID]:[DATASET].[TABLE].

bq head --max_rows [INTEGER1] --start_row [INTEGER2] --selected_fields "[FIELDS]" [PROJECT_ID]:[DATASET].[TABLE]$[PARTITION]

Where:

  • [INTEGER1] is the number of rows to display.
  • [INTEGER2] is the number of rows to skip before displaying data.
  • [FIELDS] is a comma-separated list of fields.
  • [PROJECT_ID] is your project ID.
  • [DATASET] is the name of the dataset containing the table.
  • [TABLE] is the name of the table to browse.
  • [PARTITION] is the partition decorator.

Examples:

Enter the following command to list all fields in the first 10 rows in mydataset.mytable in the "2018-02-24" partition. mydataset is in your default project.

bq head --max_rows=10 'mydataset.mytable$20180224'

Enter the following command to list all fields in the first 100 rows in mydataset.mytable in the "2016-09-01" partition. mydataset is in myotherproject, not your default project.

bq head --max_rows=100 'myotherproject:mydataset.mytable$20160901'

Enter the following command to display only field1 and field2 in mydataset.mytable in the "2016-09-01" partition. The command uses the --start_row flag to skip to row 100. mydataset.mytable is in your default project.

bq head --start_row=100 --selected_fields "field1,field2" 'mydataset.mytable$20160901'

API

Browse through a table's data by calling tabledata.list. Specify the name of the table and partition decorator in the tableId parameter.

Configure these optional parameters to control the output:

  • maxResults — Maximum number of results to return
  • selectedFields — Comma-separated list of fields to return; if unspecified, all fields are returned
  • startIndex — Zero-based index of the starting row to read

Values are returned wrapped in a JSON object that you must parse, as described in the tabledata.list reference documentation.
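As a minimal sketch, the REST request can be assembled as follows. The project, dataset, and table names are hypothetical placeholders, and the partition decorator is appended to the tableId path segment (shown here with `$` percent-encoded as `%24` in case your HTTP client requires it):

```shell
# Hypothetical sketch: browse 10 rows of one partition via tabledata.list.
# PROJECT, DATASET, and TABLE are placeholder names.
PROJECT="my-project"; DATASET="mydataset"; TABLE="mytable"
DECORATOR="%2420180224"   # URL-encoded form of the $20180224 decorator
URL="https://bigquery.googleapis.com/bigquery/v2/projects/${PROJECT}/datasets/${DATASET}/tables/${TABLE}${DECORATOR}/data?maxResults=10&startIndex=0"
echo "${URL}"
# To execute the request, pass an OAuth access token, for example:
# curl -H "Authorization: Bearer $(gcloud auth print-access-token)" "${URL}"
```

The maxResults, selectedFields, and startIndex parameters map directly onto the query string.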

Querying partitioned table data

After you load your data into BigQuery, you can query the data in your tables. BigQuery supports two types of queries: interactive and batch.

By default, BigQuery runs interactive queries, which means that the query is executed as soon as possible.

BigQuery also offers batch queries. BigQuery queues each batch query on your behalf and starts the query as soon as idle resources are available, usually within a few minutes.

You can run interactive and batch queries by using the web UI, the command-line tool, or the API.

For more information, see Querying Partitioned Tables.

Appending to and overwriting partitioned table data

You can overwrite partitioned table data using a load or query operation. You can append additional data to an existing partitioned table by performing a load-append operation or by appending query results.

Required permissions

To overwrite an existing partition or to append data to an existing partition, you must have WRITER access at the dataset level, or you must be assigned a project-level IAM role that includes bigquery.tables.updateData permissions. The following predefined, project-level IAM roles include bigquery.tables.updateData permissions:

  • bigquery.dataEditor
  • bigquery.dataOwner
  • bigquery.admin

In addition, because the bigquery.user role has bigquery.datasets.create permissions, a user assigned to the bigquery.user role can overwrite or append data in any partition of any table that user creates in the dataset. When a user assigned to the bigquery.user role creates a dataset, that user is given OWNER access to the dataset if no other owner is specified. OWNER access to a dataset gives the user full control over it, all the tables in it, and all the table partitions.

You must also be granted bigquery.jobs.create permissions to run load jobs or query jobs. The following predefined, project-level IAM roles include bigquery.jobs.create permissions:

  • bigquery.user
  • bigquery.jobUser
  • bigquery.admin

For more information on IAM roles and permissions in BigQuery, see Access Control. For more information on dataset-level roles, see Primitive roles for datasets.

Using a load job

You can append to or overwrite partitions by using the bq load command or by calling the jobs.insert method and configuring a load job. The BigQuery web UI does not support appending to or overwriting a partition in a load job.

When you append to or overwrite data in a specific partition by using a load job, note the following:

  • When you load data from Cloud Storage, the bucket must be in the same location as the BigQuery dataset.
  • The data you're loading must conform to the table's partitioning scheme. All rows written to the partition should have values that fall within the partition's date.
  • Because partitions in a partitioned table share the table schema, replacing data in a partition will not replace the schema of the table. Instead, the schema of the new data must be compatible with the table schema. For information on updating the table's schema in the load job, see Managing Table Schemas.
  • When you append data to an ingestion-time partitioned table, if you do not specify a partition decorator, the current partition is used.

To append to or overwrite partitioned table data using a load job, specify a destination table and partition decorator and set the write disposition flag to either:

  • --noreplace (CLI) or WRITE_APPEND (API) — Appends the data to an existing partition. If no write disposition option is specified, the default action is to append the data to the partition.
  • --replace (CLI) or WRITE_TRUNCATE (API) — Overwrites (restates) a partition.

A partition decorator represents a specific date and takes the form:

$YYYYMMDD

For example, the following command replaces the data in the entire partition for the date January 1, 2016 (20160101) in a partitioned table named mydataset.table1. The JSON data is loaded from a Cloud Storage bucket. The Cloud Storage bucket and the BigQuery dataset were created in the asia-northeast1 region.

bq --location=asia-northeast1 load --replace --source_format=NEWLINE_DELIMITED_JSON 'mydataset.table1$20160101' gs://mybucket/myfile.json
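Loads like this are often scripted once per day. As a minimal sketch, assuming GNU date and a hypothetical per-day file naming convention, the decorator for yesterday's UTC date can be computed and spliced into the command; the leading echo prints the command instead of submitting it:

```shell
# Sketch: build yesterday's partition decorator (UTC) and print the load command.
# The bucket path and table name are hypothetical placeholders.
P=$(date -u -d "yesterday" +%Y%m%d)
echo bq load --replace --source_format=NEWLINE_DELIMITED_JSON \
  "mydataset.table1\$${P}" "gs://mybucket/myfile_${P}.json"
# Remove the leading `echo` to actually submit the load job.
```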

For more information on appending to or overwriting data by using a load job, see the documentation for your source data format.

Using a query job

You can append to or overwrite partitions by using the bq query command or by calling the jobs.insert method and configuring a query job. The BigQuery web UI does not support appending to or overwriting a partition in a query job.

When you use a query job to append to or overwrite a partition, note the following:

  • The tables you're querying must be in the same location as the table you're appending or overwriting.
  • When you append to or overwrite a partition in an ingestion-time partitioned table, you can use either legacy SQL or standard SQL syntax.
  • When you append to or overwrite a partition in a partitioned table, your query must use standard SQL syntax. Currently, legacy SQL is not supported for querying partitioned tables or for writing query results to partitioned tables.
  • When writing query results to a partition, the data that is being written to the partition must conform to the table's partitioning scheme. All rows written to the partition should have values that fall within the partition's date.
  • When you append data to an ingestion-time partitioned table, if you do not specify a partition decorator, the current partition is used.

To append to or overwrite a partition using query results, specify a destination table with a partition decorator and set the write disposition to either:

  • --append_table (CLI) or WRITE_APPEND (API) — Appends the query results to an existing partition.
  • --replace (CLI) or WRITE_TRUNCATE (API) — Overwrites (restates) a partition using the query results.

For example, the following command restates the data for the March 1, 2016 (20160301) partition of table1 using query results. The destination table and the queried table are both in the US multi-region location:

bq --location=US query --use_legacy_sql=false --replace --destination_table 'mydataset.table1$20160301' 'SELECT column1, column2 FROM mydataset.mytable'
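The append case looks the same with the disposition flag swapped. As a sketch (the leading echo prints the command rather than submitting it):

```shell
# Sketch: append query results to the 2016-03-01 partition instead of replacing it.
echo bq --location=US query --use_legacy_sql=false --append_table \
  --destination_table 'mydataset.table1$20160301' \
  'SELECT column1, column2 FROM mydataset.mytable'
# Remove the leading `echo` to actually run the query job.
```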

If the destination table exists and is not partitioned, the following error is returned:

BigQuery error in query operation: Error processing job '[PROJECT_ID]:[JOB_ID]' Incompatible table partitioning specification. Expects partitioning specification interval (type:day), but input partitioning specification is none.

For more information on using query results to append to or overwrite data, see Writing Query Results.

Modifying partitioned table data using DML statements

You can modify data in a partitioned table using DML statements in the standard SQL dialect. DML statements allow you to perform bulk row updates, insertions, and deletions. For examples on using DML with partitioned tables, see Updating Partitioned Table Data Using DML Statements.
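As a sketch of the shape such a statement takes for an ingestion-time partitioned table (the table name and date are hypothetical; the leading echo prints the command instead of running it), the _PARTITIONTIME pseudo column restricts the DML to a single partition:

```shell
# Sketch: delete one day's rows from an ingestion-time partitioned table with DML.
echo bq query --use_legacy_sql=false \
  'DELETE FROM mydataset.mytable WHERE _PARTITIONTIME = TIMESTAMP("2018-02-24")'
# Remove the leading `echo` to actually run the DML statement.
```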

The legacy SQL dialect does not support DML statements. To update or delete data using legacy SQL, you must delete the partitioned table and then recreate it with new data. Alternatively, you can write a query that modifies the data and write the query results to a new, partitioned table.

Copying partitioned table data

You can copy a partitioned table by using the BigQuery web UI, the command-line tool's bq cp command, or by calling the jobs.insert API method and configuring a copy job.

For more information on copying tables, see Copying a table.

You can copy one or more partitions by using the command-line tool's bq cp command, or by calling the jobs.insert API method and configuring a copy job. Copying partitions is not currently supported by the BigQuery web UI.

For more information on copying partitions, see Copying partitions.
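For example, a single-partition copy takes a source partition decorator and a destination table, which may itself carry a decorator (names are hypothetical; the leading echo prints the command instead of running it):

```shell
# Sketch: copy the 2018-02-24 partition of mytable into the same-day
# partition of mytable2.
echo bq cp 'mydataset.mytable$20180224' 'mydataset.mytable2$20180224'
# Remove the leading `echo` to actually run the copy job.
```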

Streaming data into partitioned tables

To stream data to a specific partition, use the partition decorator when specifying the tableId of the table to which you are streaming. For example, the following command streams a single row to a partition for the date January 1, 2017 ($20170101) into a partitioned table named mydataset.mytable:

echo '{"a":1, "b":2}' | bq insert 'mydataset.mytable$20170101'

This command demonstrates the partition decorator; the bq insert command is meant for testing only. To stream data into BigQuery, use the API's tabledata.insertAll method. For more information on streaming data into partitions, see Streaming into partitioned tables.

When streaming using a partition decorator, you can stream to partitions within 30 days in the past and 5 days in the future relative to the current date, based on current UTC time. To write to partitions for dates outside these bounds, use load or query jobs instead.
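That window can be computed from the current UTC time; a minimal sketch, assuming GNU date:

```shell
# Sketch: the earliest and latest partition decorators writable via streaming,
# based on the 30-days-past / 5-days-future window relative to current UTC time.
EARLIEST=$(date -u -d "30 days ago" +%Y%m%d)
LATEST=$(date -u -d "5 days" +%Y%m%d)
echo "streamable decorators: \$${EARLIEST} through \$${LATEST}"
```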

When you specify a time-partitioned table as the destination of a streaming insert, each partition has a streaming buffer. The streaming buffer is retained when you perform a load, query, or copy job that overwrites a partition by setting the writeDisposition property to WRITE_TRUNCATE. If you want to remove the streaming buffer, verify that it is empty by calling tables.get on the partition and checking the streamingBuffer section of the response.

Exporting table data

Exporting data from a partitioned table is the same process as exporting data from a non-partitioned table. Currently, it is not possible to export individual partitions. For more information, see Exporting Table Data.

You can export partitioned table data in CSV, JSON, or Avro format. Currently, you must export data into a Cloud Storage bucket. Exporting to your local machine is not supported; however, you can download and save query results using the BigQuery web UI.
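As a sketch, exporting the whole table to a Cloud Storage bucket uses bq extract; the bucket and table names are hypothetical, the wildcard shards the output, and the leading echo prints the command instead of running it:

```shell
# Sketch: export a partitioned table to Cloud Storage as gzipped CSV shards.
echo bq extract --destination_format=CSV --compression=GZIP \
  'mydataset.mytable' 'gs://mybucket/exports/mytable_*.csv.gz'
# Remove the leading `echo` to actually run the extract job.
```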

Next steps

To learn more about working with partitioned tables, see Querying Partitioned Tables.
