This document describes how to manage partitioned table data in BigQuery. You can work with partitioned table data in the following ways:
- Load data into a partitioned table
- Browse (or preview) partitioned table data
- Query partitioned table data
- Append to or overwrite partitioned table data
- Modify partitioned table data using Data Manipulation Language statements
- Copy partitioned table data
- Stream data into partitioned tables
- Export partitioned table data
For information on managing table schemas, see Modifying table schemas.
Loading data into a partitioned table
You can create a partitioned table when you load data, or you can create an empty partitioned table and load the data later. When you load data into a partitioned table, you can use schema auto-detect for supported data formats, or you can specify the schema.
To use schema auto-detection when you load data into a partitioned table, use the Cloud Console, the `bq` command-line tool, or the API.
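For example, the following is a minimal sketch of loading a CSV file into a specific partition with schema auto-detection enabled; the table, partition, and bucket names are placeholders, and it assumes the ingestion-time partitioned table already exists:

bq load \
--autodetect \
--source_format=CSV \
'mydataset.mytable$20180224' \
gs://mybucket/mydata.csv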
For more information on loading data, see the documentation for your source data's format and location:

- For information on loading data from Cloud Storage into a partitioned table, see the loading documentation for your data format.
- For information on loading data from a local source into a partitioned table, see Loading data from local files.
Browsing table data
You can browse partitioned table data by:
- Using the Cloud Console
- Using the `bq` command-line tool's `bq head` command
- Calling the `tabledata.list` API method
- Using the client libraries
Required permissions
At a minimum, to browse table and partition data, you must be granted `bigquery.tables.getData` permissions. The following predefined IAM roles include `bigquery.tables.getData` permissions:

- `bigquery.dataViewer`
- `bigquery.dataEditor`
- `bigquery.dataOwner`
- `bigquery.admin`

In addition, if a user has `bigquery.datasets.create` permissions, when that user creates a dataset, they are granted `bigquery.dataOwner` access to it. `bigquery.dataOwner` access gives the user the ability to browse data in tables and partitions in the dataset.
For more information on IAM roles and permissions in BigQuery, see Predefined roles and permissions.
Browsing partitioned table data
To browse partitioned table data:
Console
1. In the Cloud Console, in the navigation pane, click a dataset to list its tables and views.
2. Click a partitioned table in the list.
3. Click the Details tab.
4. Note the value in Number of Rows. You may need this value to control the starting point for your results using the API or `bq` command-line tool.
5. Click the Preview tab. A sample set of data is displayed. Note that you cannot preview individual partitions by using the Cloud Console.
bq
Issue the `bq head` command with the `--max_rows` flag to list all fields in a particular number of table rows. If `--max_rows` is not specified, the default is 100. Specify a partition to browse using the partition decorator, for example, `$20180224`.

Because the `bq head` command does not create a query job, `bq head` commands do not appear in your query history, and you are not charged for them.

To browse a subset of fields in the table (including nested and repeated fields), use the `--selected_fields` flag and enter the fields as a comma-separated list.

To specify the number of rows to skip before displaying table data, use the `--start_row=integer` flag (or the `-s` shortcut). The default value is `0`. You can retrieve the number of rows in a table by using the `bq show` command to retrieve table information.

If the table you're browsing is in a project other than your default project, add the project ID to the command in the following format: `project_id:dataset.table`.
bq head \
--max_rows integer1 \
--start_row integer2 \
--selected_fields "fields" \
project_id:dataset.table$partition
Where:
- integer1 is the number of rows to display.
- integer2 is the number of rows to skip before displaying data.
- fields is a comma-separated list of fields.
- project_id is your project ID.
- dataset is the name of the dataset containing the table.
- table is the name of the table to browse.
- $partition is the partition decorator.
Examples:
Enter the following command to list all fields in the first 10 rows in `mydataset.mytable` in the `2018-02-24` partition. `mydataset` is in your default project.
bq head --max_rows=10 'mydataset.mytable$20180224'
Enter the following command to list all fields in the first 100 rows in `mydataset.mytable` in the `2016-09-01` partition. The output is written in JSON format. This example specifies `myotherproject` as the project, instead of using the default project.
bq head --format=prettyjson 'myotherproject:mydataset.mytable$20160901'
Enter the following command to display only `field1` and `field2` in `mydataset.mytable` in the `2016-09-01` partition. The command uses the `--start_row` flag to skip to row 100. `mydataset.mytable` is in your default project.
bq head \
--format=prettyjson \
--start_row 100 \
--selected_fields "field1,field2" \
'mydataset.mytable$20160901'
API
Browse through a table's data by calling `tabledata.list`. Specify the name of the table and partition decorator in the `tableId` parameter.

Configure these optional parameters to control the output:

- `maxResults`: Maximum number of results to return.
- `selectedFields`: Comma-separated list of fields to return. If unspecified, all fields are returned.
- `startIndex`: Zero-based index of the starting row to read.

Values are returned wrapped in a JSON object that you must parse, as described in the `tabledata.list` reference documentation.
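As an illustrative sketch (not part of the original reference), the following curl call reads three rows from a single partition through `tabledata.list`; the project, dataset, and table names are placeholders, and the partition decorator's `$` is URL-encoded as `%24`:

# Read 3 rows from the 2018-02-24 partition, starting at row 100.
curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://bigquery.googleapis.com/bigquery/v2/projects/myproject/datasets/mydataset/tables/mytable%2420180224/data?maxResults=3&startIndex=100"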
Querying partitioned table data
After you load your data into BigQuery, you can query the data in your tables. BigQuery supports two types of queries:
- Interactive queries
- Batch queries
By default, BigQuery runs interactive queries, which means that the query is executed as soon as possible.
BigQuery also offers batch queries. BigQuery queues each batch query on your behalf and starts the query as soon as idle resources are available, usually within a few minutes.
You can run interactive and batch queries by using the:
- Compose new query option in the Cloud Console
- The `bq` command-line tool's `bq query` command
- The BigQuery REST API to programmatically call the jobs.query or query-type jobs.insert methods
- The BigQuery client libraries
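For example, here is a hedged sketch of running the same query at interactive and at batch priority with the `bq` tool, assuming an ingestion-time partitioned table; the table name and date are placeholders:

# Interactive (default) priority: the query runs as soon as possible.
bq query --use_legacy_sql=false \
'SELECT COUNT(*) FROM mydataset.mytable WHERE _PARTITIONTIME = TIMESTAMP("2018-02-24")'

# Batch priority: the query is queued until idle resources are available.
bq query --batch --use_legacy_sql=false \
'SELECT COUNT(*) FROM mydataset.mytable WHERE _PARTITIONTIME = TIMESTAMP("2018-02-24")'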
For more information, see Querying partitioned tables.
Appending to and overwriting partitioned table data
You can overwrite partitioned table data using a load or query operation. You can append additional data to an existing partitioned table by performing a load-append operation or by appending query results. The operations in this section are supported for both daily- and hourly-partitioned tables.
Required permissions
At a minimum, to overwrite an existing partition or to append data to an existing partition, you must be granted the following permissions:

- `bigquery.tables.create`
- `bigquery.tables.updateData`
- `bigquery.jobs.create`

Additional permissions such as `bigquery.tables.getData` may be required to access the data you're using to append or overwrite.

The following predefined IAM roles include `bigquery.tables.updateData` and `bigquery.tables.create` permissions:

- `bigquery.dataEditor`
- `bigquery.dataOwner`
- `bigquery.admin`

The following predefined IAM roles include `bigquery.jobs.create` permissions:

- `bigquery.user`
- `bigquery.jobUser`
- `bigquery.admin`

In addition, if a user has `bigquery.datasets.create` permissions, when that user creates a dataset, they are granted `bigquery.dataOwner` access to it. `bigquery.dataOwner` access gives the user the ability to append to and overwrite tables and partitions in the dataset.
For more information on IAM roles and permissions in BigQuery, see Predefined roles and permissions.
Using a load job
You can append to or overwrite partitions by using the `bq load` command or by calling the `jobs.insert` method and configuring a load job. The Cloud Console does not support appending to or overwriting a partition in a load job.
When you append to or overwrite data in a specific partition by using a load job, note the following:
- When you load data from Cloud Storage, the bucket must be in the same location as the BigQuery dataset.
- The data you're loading must conform to the table's partitioning scheme. All rows written to the partition should have values that fall within the partition's date.
- Because partitions in a partitioned table share the table schema, replacing data in a partition will not replace the schema of the table. Instead, the schema of the new data must be compatible with the table schema. For information on updating the table's schema in the load job, see Managing table schemas.
- When you append data to an ingestion-time partitioned table, if you do not specify a partition decorator, the current partition is used.
To append to or overwrite partitioned table data using a load job, specify a destination table and partition decorator and set the write disposition flag to either:
| Command-line option | API option | Description |
|---|---|---|
| `--noreplace` | `WRITE_APPEND` | Appends the data to an existing partition. If no write disposition option is specified, the default action is to append the data to the partition. |
| `--replace` | `WRITE_TRUNCATE` | Overwrites (restates) a partition. |
A partition decorator represents a specific date and takes the form `$YYYYMMDD`.
For example, the following command replaces the data in the entire partition for the date January 1, 2016 (`20160101`) in a partitioned table named `mydataset.table1`. The JSON data is loaded from a Cloud Storage bucket.
bq load \
--replace \
--source_format=NEWLINE_DELIMITED_JSON \
'mydataset.table1$20160101' \
gs://mybucket/myfile.json
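As a counterpart sketch for the append case (the date and file name here are placeholders), the same load with `--noreplace` appends rows to the target partition instead of replacing it:

bq load \
--noreplace \
--source_format=NEWLINE_DELIMITED_JSON \
'mydataset.table1$20160102' \
gs://mybucket/myfile2.json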
Using a query job
You can append to or overwrite partitions by using the `bq query` command or by calling the `jobs.insert` method and configuring a query job. The Cloud Console does not support appending to or overwriting a partition in a query job.
When you use a query job to append to or overwrite a partition, note the following:
- The tables you're querying must be in the same location as the table you're appending or overwriting.
- When you append to or overwrite a partition in an ingestion-time partitioned table, you can use either legacy SQL or standard SQL syntax.
- When you append to or overwrite a partition in a partitioned table, your query must use standard SQL syntax. Currently, legacy SQL is not supported for querying partitioned tables or for writing query results to partitioned tables.
- When writing query results to a partition, the data that is being written to the partition must conform to the table's partitioning scheme. All rows written to the partition should have values that fall within the partition's date.
- When you append data to an ingestion-time partitioned table, if you do not specify a partition decorator, the current partition is used.
To append to or overwrite a partition using query results, specify a destination table with a partition decorator and set the write disposition to either:
| Command-line option | API option | Description |
|---|---|---|
| `--append_table` | `WRITE_APPEND` | Appends the query results to an existing partition. |
| `--replace` | `WRITE_TRUNCATE` | Overwrites (restates) a partition using the query results. |
For example, the following command restates the data for the March 1, 2016 (`20160301`) partition of `table1` using query results.
bq query \
--use_legacy_sql=false \
--replace \
--destination_table 'mydataset.table1$20160301' \
'SELECT
column1,
column2
FROM
mydataset.mytable'
If the destination table exists and is not partitioned, the following error is returned: `BigQuery error in query operation: Error processing job 'project_id:job_id' Incompatible table partitioning specification. Expects partitioning specification interval (type:day), but input partitioning specification is none`.
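Similarly, a hedged sketch of appending query results to a different partition with `--append_table`; the date and column names are illustrative:

bq query \
--use_legacy_sql=false \
--append_table \
--destination_table 'mydataset.table1$20160302' \
'SELECT column1, column2 FROM mydataset.mytable'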
For more information on using query results to append to or overwrite data, see Writing query results.
Modifying partitioned table data using DML statements
You can modify data in a partitioned table using DML statements in the standard SQL dialect. DML statements allow you to perform bulk row updates, insertions, and deletions. For examples on using DML with partitioned tables, see Updating partitioned table data using DML statements.
The legacy SQL dialect does not support DML statements. To update or delete data using legacy SQL, you must delete the partitioned table and then recreate it with new data. Alternatively, you can write a query that modifies the data and write the query results to a new, partitioned table.
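As an illustrative sketch (assuming an ingestion-time partitioned table; the table name and date are placeholders), the following standard SQL DML statement deletes all rows in a single partition:

# Delete every row in the 2016-01-01 partition.
bq query --use_legacy_sql=false \
'DELETE FROM mydataset.mytable WHERE _PARTITIONTIME = TIMESTAMP("2016-01-01")'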
Copying partitioned table data
You can copy a partitioned table by:
- Using the Cloud Console
- Using the `bq` command-line tool's `bq cp` command
- Calling the jobs.insert API method and configuring a copy job
- Using the client libraries
For more information on copying tables, see Copying a table.
You can copy one or more partitions by using the `bq` command-line tool's `bq cp` command, or by calling the jobs.insert API method and configuring a copy job. Copying partitions is not currently supported by the Cloud Console.
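For example, a minimal sketch that copies one partition to the same partition of another table; both table names are placeholders:

bq cp \
'mydataset.mytable$20180224' \
'mydataset.mytable2$20180224'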
For more information on copying partitions, see Copying partitions.
Streaming data into partitioned tables
To stream data to a specific partition, use the partition decorator when specifying the `tableId` of the table to which you are streaming. For example, the following command streams a single row to a partition for the date January 1, 2017 (`$20170101`) into a partitioned table named `mydataset.mytable`:
echo '{"a":1, "b":2}' | bq insert 'mydataset.mytable$20170101'
This command is used to demonstrate using the partition decorator. The `bq insert` command is meant for testing only. To stream data into BigQuery, use the API's `tabledata.insertAll` method. For more information on streaming data into partitions, see Streaming into partitioned tables.
When streaming to a daily partitioned table using a partition decorator, you can stream to partitions from 30 days in the past through 5 days in the future relative to the current date, based on current UTC time. To write to partitions for dates outside these allowed bounds, use load or query jobs instead. For hourly partitioned tables, streaming with a partition decorator is not supported, so you must stream to the base table instead.
When you specify a time-partitioned table as the destination table when you stream data, each partition has a streaming buffer. The streaming buffer is retained when you perform a load, query, or copy job that overwrites a partition by setting the `writeDisposition` property to `WRITE_TRUNCATE`. If you want to remove the streaming buffer, verify that the streaming buffer is empty by calling `tables.get` on the partition. For details about the streaming buffer, see Comparing partitioning options.
Exporting table data
Exporting all data from a partitioned table is the same process as exporting data from a non-partitioned table. For more information, see Exporting table data. To export data from an individual partition, append the partition decorator, `$date`, to the table name. For example: `mytable$20160201`.
You can also export data from the `__NULL__` and `__UNPARTITIONED__` partitions by appending the partition names to the table name. For example, `mytable$__NULL__` or `mytable$__UNPARTITIONED__`.
You can export partitioned table data in CSV, JSON, or Avro format. Currently, you must export data into a Cloud Storage bucket. Exporting to your local machine is not supported; however, you can download and save query results using the Cloud Console.
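For example, a minimal sketch that exports one partition to a CSV file in Cloud Storage; the bucket name and path are placeholders:

bq extract \
--destination_format=CSV \
'mydataset.mytable$20160201' \
gs://mybucket/export/partition_20160201.csv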
Next steps
To learn more about working with partitioned tables, see:
- Creating and using ingestion-time partitioned tables
- Creating and using partitioned tables
- Managing partitioned tables
- Querying partitioned tables