Managing partitioned table data

This document describes how to manage partitioned table data in BigQuery. You can work with partitioned table data in the following ways:

  • Load data into a partitioned table
  • Browse (or preview) partitioned table data
  • Query partitioned table data
  • Append to or overwrite partitioned table data
  • Modify partitioned table data using Data Manipulation Language statements
  • Copy partitioned table data
  • Stream data into partitioned tables
  • Export partitioned table data

For information on managing table schemas, see Modifying table schemas.

Loading data into a partitioned table

You can create a partitioned table when you load data, or you can create an empty partitioned table and load the data later. When you load data into a partitioned table, you can use schema auto-detect for supported data formats, or you can specify the schema.

For more information on loading data, see the documentation for your source data's format and location.
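
For example, the following is a minimal sketch that loads a CSV file from Cloud Storage into a partitioned table by using the Python client library (google-cloud-bigquery). The project, dataset, table, bucket, and partitioning field names are placeholders, not values from this document.

from google.cloud import bigquery

client = bigquery.Client()

# Placeholder destination table.
table_id = "myproject.mydataset.mytable"

job_config = bigquery.LoadJobConfig(
    autodetect=True,  # Use schema auto-detect for supported formats.
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    # Partition the table by a hypothetical DATE column.
    time_partitioning=bigquery.TimePartitioning(field="transaction_date"),
)

load_job = client.load_table_from_uri(
    "gs://mybucket/myfile.csv", table_id, job_config=job_config
)
load_job.result()  # Wait for the load job to complete.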

Browsing table data

You can browse partitioned table data by:

  • Using the Cloud Console or the classic BigQuery web UI
  • Using the command-line tool's bq head command
  • Calling the tabledata.list API method
  • Using the client libraries

Required permissions

At a minimum, to browse table and partition data, you must be granted bigquery.tables.getData permissions. The following predefined Cloud IAM roles include bigquery.tables.getData permissions:

  • bigquery.dataViewer
  • bigquery.dataEditor
  • bigquery.dataOwner
  • bigquery.admin

In addition, if a user has bigquery.datasets.create permissions, when that user creates a dataset, they are granted bigquery.dataOwner access to it. bigquery.dataOwner access gives the user the ability to browse data in tables and partitions in the dataset.

For more information on Cloud IAM roles and permissions in BigQuery, see Predefined roles and permissions.

Browsing partitioned table data

To browse partitioned table data:

Console

  1. In the Cloud Console, in the navigation pane, click a dataset to list its tables and views.

  2. Click a partitioned table in the list.

  3. Click the Details tab.

  4. Note the value in Number of Rows. You may need this value to control the starting point for your results using the CLI or API.

  5. Click the Preview tab. A sample set of data is displayed. Note that you cannot preview individual partitions by using the Cloud Console.

Classic UI

  1. In the BigQuery web UI, in the navigation pane, click the blue arrow to the left of your dataset to expand it, or double-click the dataset name. This displays the tables and views in the dataset.

  2. Click a partitioned table in the list.

  3. Click Details and note the value in Number of Rows. You may need this value to control the starting point for your results using the CLI or API.

  4. Click Preview. A sample set of data is displayed. Note that you cannot preview individual partitions by using the BigQuery web UI.

CLI

Issue the bq head command with the --max_rows flag to list all fields in a particular number of table rows. If --max_rows is not specified, the default is 100. Specify a partition to browse using the partition decorator, for example, $20180224.

Because the bq head command does not create a query job, bq head commands do not appear in your query history, and you are not charged for them.

To browse a subset of fields in the table (including nested and repeated fields), use the --selected_fields flag and enter the fields as a comma-separated list.

To specify the number of rows to skip before displaying table data, use the --start_row=integer flag (or the -s shortcut). The default value is 0. You can retrieve the number of rows in a table by using the bq show command to retrieve table information.

If the table you're browsing is in a project other than your default project, add the project ID to the command in the following format: project_id:dataset.table.

bq head \
--max_rows integer1 \
--start_row integer2 \
--selected_fields "fields" \
'project_id:dataset.table$partition'

Where:

  • integer1 is the number of rows to display.
  • integer2 is the number of rows to skip before displaying data.
  • fields is a comma-separated list of fields.
  • project_id is your project ID.
  • dataset is the name of the dataset containing the table.
  • table is the name of the table to browse.
  • $partition is the partition decorator.

Examples:

Enter the following command to list all fields in the first 10 rows in mydataset.mytable in the "2018-02-24" partition. mydataset is in your default project.

bq head --max_rows=10 'mydataset.mytable$20180224'

Enter the following command to list all fields in the first 100 rows in mydataset.mytable in the "2016-09-01" partition. mydataset is in myotherproject, not your default project.

bq head 'myotherproject:mydataset.mytable$20160901'

Enter the following command to display only field1 and field2 in mydataset.mytable in the "2016-09-01" partition. The command uses the --start_row flag to skip to row 100. mydataset.mytable is in your default project.

bq head \
--start_row 100 \
--selected_fields "field1,field2" \
'mydataset.mytable$20160901'

API

Browse through a table's data by calling tabledata.list. Specify the name of the table and partition decorator in the tableId parameter.

Configure these optional parameters to control the output:

  • maxResults — Maximum number of results to return.
  • selectedFields — Comma-separated list of fields to return. If unspecified, all fields are returned.
  • startIndex — Zero-based index of the starting row to read.

Values are returned wrapped in a JSON object that you must parse, as described in the tabledata.list reference documentation.
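
As an illustration, the following sketch exercises the equivalent parameters through the Python client library's list_rows method. The table names are placeholders, and it assumes the client accepts a partition decorator in the table ID, just as tables.get does.

from google.cloud import bigquery

client = bigquery.Client()

# Fetch the table schema so we can select a subset of fields.
table = client.get_table("myproject.mydataset.mytable")

# max_results, start_index, and selected_fields correspond to the
# maxResults, startIndex, and selectedFields API parameters.
rows = client.list_rows(
    "myproject.mydataset.mytable$20180224",  # Partition decorator.
    selected_fields=table.schema[:2],        # First two fields only.
    max_results=10,
    start_index=0,
)
for row in rows:
    print(row)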

Querying partitioned table data

After you load your data into BigQuery, you can query the data in your tables. BigQuery supports two types of queries: interactive and batch.

By default, BigQuery runs interactive queries, which means that the query is executed as soon as possible.

BigQuery also offers batch queries. BigQuery queues each batch query on your behalf and starts the query as soon as idle resources are available, usually within a few minutes.
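
For example, here is a minimal sketch that submits a batch query with the Python client library; the query text and table name are illustrative only.

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.QueryJobConfig(
    # INTERACTIVE is the default priority; BATCH queues the query
    # until idle resources are available.
    priority=bigquery.QueryPriority.BATCH
)

query_job = client.query(
    "SELECT COUNT(*) FROM mydataset.mytable",
    job_config=job_config,
)
rows = query_job.result()  # Blocks until the batch query finishes.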

You can run interactive and batch queries by using the Cloud Console, the bq command-line tool, the jobs.insert API method, or the client libraries.

For more information, see Querying partitioned tables.

Appending to and overwriting partitioned table data

You can overwrite partitioned table data using a load or query operation. You can append additional data to an existing partitioned table by performing a load-append operation or by appending query results.

Required permissions

At a minimum, to overwrite an existing partition or to append data to an existing partition, you must be granted the following permissions:

  • bigquery.tables.create
  • bigquery.tables.updateData
  • bigquery.jobs.create

Additional permissions such as bigquery.tables.getData may be required to access the data you're using to append or overwrite.

The following predefined Cloud IAM roles include bigquery.tables.updateData and bigquery.tables.create permissions:

  • bigquery.dataEditor
  • bigquery.dataOwner
  • bigquery.admin

The following predefined Cloud IAM roles include bigquery.jobs.create permissions:

  • bigquery.user
  • bigquery.jobUser
  • bigquery.admin

In addition, if a user has bigquery.datasets.create permissions, when that user creates a dataset, they are granted bigquery.dataOwner access to it. bigquery.dataOwner access gives the user the ability to append to and overwrite tables and partitions in the dataset.

For more information on Cloud IAM roles and permissions in BigQuery, see Predefined roles and permissions.

Using a load job

You can append to or overwrite partitions by using the bq load command or by calling the jobs.insert method and configuring a load job. The Cloud Console and the classic BigQuery web UI do not support appending to or overwriting a partition in a load job.

When you append to or overwrite data in a specific partition by using a load job, note the following:

  • When you load data from Cloud Storage, the bucket must be in the same location as the BigQuery dataset.
  • The data you're loading must conform to the table's partitioning scheme. All rows written to the partition should have values that fall within the partition's date.
  • Because partitions in a partitioned table share the table schema, replacing data in a partition will not replace the schema of the table. Instead, the schema of the new data must be compatible with the table schema. For information on updating the table's schema in the load job, see Managing table schemas.
  • When you append data to an ingestion-time partitioned table, if you do not specify a partition decorator, the current partition is used.

To append to or overwrite partitioned table data using a load job, specify a destination table and partition decorator and set the write disposition flag to either:

CLI option     API option        Description
--noreplace    WRITE_APPEND      Appends the data to an existing partition. If no write
                                 disposition option is specified, the default action is
                                 to append the data to the partition.
--replace      WRITE_TRUNCATE    Overwrites (restates) a partition.

A partition decorator represents a specific date and takes the form:

$YYYYMMDD

For example, the following command replaces the data in the entire partition for the date January 1, 2016 (20160101) in a partitioned table named mydataset.table1. The JSON data is loaded from a Cloud Storage bucket.

bq load \
--replace \
--source_format=NEWLINE_DELIMITED_JSON \
'mydataset.table1$20160101' \
gs://mybucket/myfile.json
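
A roughly equivalent sketch using the Python client library follows; it assumes a recent client version that accepts a partition decorator in the destination table ID, and the names are the same placeholders as above.

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    # WRITE_TRUNCATE overwrites the partition; WRITE_APPEND appends to it.
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

load_job = client.load_table_from_uri(
    "gs://mybucket/myfile.json",
    "mydataset.table1$20160101",  # The decorator targets the partition.
    job_config=job_config,
)
load_job.result()  # Wait for the load job to complete.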

Using a query job

You can append to or overwrite partitions by using the bq query command or by calling the jobs.insert method and configuring a query job. The Cloud Console and the classic BigQuery web UI do not support appending to or overwriting a partition in a query job.

When you use a query job to append to or overwrite a partition, note the following:

  • The tables you're querying must be in the same location as the table you're appending to or overwriting.
  • When you append to or overwrite a partition in an ingestion-time partitioned table, you can use either legacy SQL or standard SQL syntax.
  • When you append to or overwrite a partition in a partitioned table, your query must use standard SQL syntax. Currently, legacy SQL is not supported for querying partitioned tables or for writing query results to partitioned tables.
  • When writing query results to a partition, the data that is being written to the partition must conform to the table's partitioning scheme. All rows written to the partition should have values that fall within the partition's date.
  • When you append data to an ingestion-time partitioned table, if you do not specify a partition decorator, the current partition is used.

To append to or overwrite a partition using query results, specify a destination table with a partition decorator and set the write disposition to either:

CLI option       API option        Description
--append_table   WRITE_APPEND      Appends the query results to an existing partition.
--replace        WRITE_TRUNCATE    Overwrites (restates) a partition using the query results.

For example, the following command restates the data for the March 1, 2016 (20160301) partition of table1 using query results.

bq query \
--use_legacy_sql=false \
--replace \
--destination_table 'mydataset.table1$20160301' \
'SELECT
  column1,
  column2
FROM
  mydataset.mytable'

If the destination table exists and is not partitioned, the following error is returned: BigQuery error in query operation: Error processing job 'project_id:job_id': Incompatible table partitioning specification. Expects partitioning specification interval(type:day), but input partitioning specification is none.
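
A comparable sketch with the Python client library is shown below; it assumes a recent client version that accepts a table ID string, including a partition decorator, as the query destination.

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.QueryJobConfig(
    # Placeholder destination; the decorator selects the partition.
    destination="myproject.mydataset.table1$20160301",
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

query_job = client.query(
    "SELECT column1, column2 FROM mydataset.mytable",
    job_config=job_config,
)
query_job.result()  # Wait for the query; the partition is restated.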

For more information on using query results to append to or overwrite data, see Writing query results.

Modifying partitioned table data using DML statements

You can modify data in a partitioned table using DML statements in the standard SQL dialect. DML statements allow you to perform bulk row updates, insertions, and deletions. For examples on using DML with partitioned tables, see Updating partitioned table data using DML statements.

The legacy SQL dialect does not support DML statements. To update or delete data using legacy SQL, you must delete the partitioned table and then recreate it with new data. Alternatively, you can write a query that modifies the data and write the query results to a new, partitioned table.
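
As an illustrative sketch, the following standard SQL UPDATE limits a DML statement to a single ingestion-time partition by filtering on the _PARTITIONTIME pseudo column; the table and column names are hypothetical.

from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical table and column; the _PARTITIONTIME filter restricts
# the statement to the January 1, 2016 ingestion-time partition.
dml = """
UPDATE mydataset.mytable
SET column1 = 'updated'
WHERE _PARTITIONTIME = TIMESTAMP('2016-01-01')
"""

query_job = client.query(dml)  # DML runs as a standard SQL query job.
query_job.result()
print("Modified {} rows".format(query_job.num_dml_affected_rows))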

Copying partitioned table data

You can copy a partitioned table by:

  • Using the Cloud Console or the classic BigQuery web UI
  • Using the command-line tool's bq cp command
  • Calling the jobs.insert API method and configuring a copy job
  • Using the client libraries

For more information on copying tables, see Copying a table.

You can copy one or more partitions by using the command-line tool's bq cp command, or by calling the jobs.insert API method and configuring a copy job. Copying partitions is not currently supported by the Cloud Console or the classic BigQuery web UI.
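
For example, here is a minimal sketch with the Python client library; the table names are placeholders, and the decorator on both sides addresses a single partition.

from google.cloud import bigquery

client = bigquery.Client()

# Copy one partition into the same partition of another table.
copy_job = client.copy_table(
    "mydataset.mytable$20180224",
    "mydataset.mytable2$20180224",
)
copy_job.result()  # Wait for the copy job to complete.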

For more information on copying partitions, see Copying partitions.

Streaming data into partitioned tables

To stream data to a specific partition, use the partition decorator when specifying the tableId of the table to which you are streaming. For example, the following command streams a single row to a partition for the date January 1, 2017 ($20170101) into a partitioned table named mydataset.mytable:

echo '{"a":1, "b":2}' | bq insert 'mydataset.mytable$20170101'

This command demonstrates using the partition decorator. The bq insert command is meant for testing only. To stream data into BigQuery, use the API's tabledata.insertAll method. For more information on streaming data into partitions, see Streaming into partitioned tables.
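
As a sketch, the equivalent call through the Python client library, which wraps tabledata.insertAll, might look like the following; the table name is a placeholder, and it assumes a recent client version that accepts a partition decorator in the table ID.

from google.cloud import bigquery

client = bigquery.Client()

# The $20170101 decorator streams the row into the
# January 1, 2017 partition.
errors = client.insert_rows_json(
    "mydataset.mytable$20170101",
    [{"a": 1, "b": 2}],
)
if errors:
    print("Encountered errors:", errors)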

When streaming using a partition decorator, you can stream to partitions within 30 days in the past and 5 days in the future relative to the current date, based on current UTC time. To write to partitions for dates outside these allowed bounds, use load or query jobs instead.

When you specify a time-partitioned table as the destination table when you stream data, each partition has a streaming buffer. The streaming buffer is retained when you perform a load, query, or copy job that overwrites a partition by setting the writeDisposition property to WRITE_TRUNCATE. If you want to remove the streaming buffer, verify that the streaming buffer is empty by calling tables.get on the partition.

Exporting table data

Exporting all data from a partitioned table is the same process as exporting data from a non-partitioned table. For more information, see Exporting table data. To export data from an individual partition, append the partition decorator, $date, to the table name. For example: mytable$20160201.

You can also export data from the __NULL__ and __UNPARTITIONED__ partitions by appending the partition names to the table name. For example, mytable$__NULL__ or mytable$__UNPARTITIONED__.
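
For example, the following sketch extracts a single partition to Cloud Storage with the Python client library; the table and bucket names are placeholders.

from google.cloud import bigquery

client = bigquery.Client()

# The $20160201 decorator exports only the February 1, 2016 partition.
extract_job = client.extract_table(
    "myproject.mydataset.mytable$20160201",
    "gs://mybucket/mytable_20160201.csv",
)
extract_job.result()  # Wait for the export to complete.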

You can export partitioned table data in CSV, JSON, or Avro format. Currently, you must export data into a Cloud Storage bucket. Exporting to your local machine is not supported; however, you can download and save query results using the Cloud Console or the classic BigQuery web UI.

Next steps

To learn more about working with partitioned tables, see:
