Exporting Data From BigQuery

Once you've loaded your data into BigQuery, you can export the data in several formats. BigQuery can export up to 1 GB of data per file, and supports exporting to multiple files.

To export data, create a job and populate the configuration.extract object.

Access control

Exporting data requires the following access levels.

  • BigQuery: Dataset-level READER access. For more information, see access control.

  • Google Cloud Storage: WRITE access for the Google Cloud Storage buckets. For more information, see Access Control - Google Cloud Storage.

Quota policy

The following limits apply when exporting data from BigQuery.

  • Daily Limit: 1,000 exports per day, up to 10 TB
  • Multiple Wildcard URI Limit: 500 URIs per export

Configuration options

You can configure two aspects of the exported data: the format and the compression type.

  • destinationFormat controls the format. BigQuery supports the CSV, JSON, and Avro formats. The Avro format can't be used in combination with GZIP compression.
  • compression controls the compression type. BigQuery supports GZIP compression or NONE.

Avro format

BigQuery expresses Avro-formatted data in the following ways (see the sample schema sketch after this list):

  • The resulting export files are Avro container files.
  • Each BigQuery row is represented as an Avro Record. Nested data is represented by nested Record objects.
  • REQUIRED fields are represented as the corresponding Avro types. For example, a BigQuery INTEGER type maps to an Avro LONG type.
  • NULLABLE fields are represented as an Avro Union of the corresponding type and "null".
  • REPEATED fields are represented as Avro arrays.
  • TIMESTAMP data types are represented as Avro LONG types.
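
To illustrate these rules, the sketch below writes out, as a Python dictionary, the kind of Avro schema they would produce for a hypothetical table with a REQUIRED INTEGER, a NULLABLE STRING, a REPEATED STRING, and a NULLABLE TIMESTAMP field. The record and field names are made up for this example.

# Hypothetical Avro schema, as a Python dict, for a table with fields:
# id (REQUIRED INTEGER), name (NULLABLE STRING),
# tags (REPEATED STRING), created_at (NULLABLE TIMESTAMP).
example_avro_schema = {
    "type": "record",
    "name": "Root",
    "fields": [
        # REQUIRED INTEGER -> Avro long
        {"name": "id", "type": "long"},
        # NULLABLE STRING -> union of string and null
        {"name": "name", "type": ["string", "null"]},
        # REPEATED STRING -> Avro array of strings
        {"name": "tags", "type": {"type": "array", "items": "string"}},
        # NULLABLE TIMESTAMP -> union of long and null
        {"name": "created_at", "type": ["long", "null"]},
    ],
}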

Extract configuration example

The following code example shows the configuration of a job that exports data to a single newline-delimited JSON file.

The destinationUris property indicates the location(s) and file name(s) where BigQuery should export your files. Usually, you'll pass a single value so that BigQuery exports to a single file, but you can alternatively pass one or more wildcard URIs. For more information on what to specify for the destinationUris property, see exporting data into one or more files.
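
A minimal sketch of such a job body follows, written as a Python dictionary; the project, dataset, table, and bucket names are placeholders for this example.

# Sketch of a job body whose configuration.extract section exports a table
# to a single newline-delimited JSON file in Google Cloud Storage.
# 'my-project', 'my_dataset', 'my_table', and 'my-bucket' are placeholders.
job_data = {
    "configuration": {
        "extract": {
            "sourceTable": {
                "projectId": "my-project",
                "datasetId": "my_dataset",
                "tableId": "my_table",
            },
            "destinationUris": ["gs://my-bucket/file-name.json"],
            "destinationFormat": "NEWLINE_DELIMITED_JSON",
            "compression": "NONE",
        }
    }
}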

Exporting data into one or more files

The destinationUris property indicates the location(s) and file name(s) where BigQuery should export your files.

BigQuery supports a single wildcard operator (*) in each URI. Using the wildcard operator instructs BigQuery to create multiple sharded files based on the supplied pattern. The wildcard operator is replaced with a file number (starting at 0), left-padded to 12 digits. For example, 000000000001 for the second file.
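
For illustration only, the following sketch mimics that numbering in Python; BigQuery performs the sharding itself, so the snippet exists only to show how the padded file names are formed.

# Illustrative only: show how the wildcard (*) expands into
# zero-padded, 12-digit shard numbers.
pattern = "gs://my-bucket/file-name-*.json"
for shard in range(3):
    print(pattern.replace("*", "{:012d}".format(shard)))
# gs://my-bucket/file-name-000000000000.json
# gs://my-bucket/file-name-000000000001.json
# gs://my-bucket/file-name-000000000002.json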

The following options are available for the destinationUris property:

Single URI

Use a single URI if you want BigQuery to export your data to a single file. This is the most common use case, as exported data is generally smaller than BigQuery's 1 GB per file maximum.

Property definition:

['gs://my-bucket/file-name.json']

Creates:

gs://my-bucket/file-name.json

Single wildcard URI

Use a single wildcard URI if you think your exported data will be larger than BigQuery's 1 GB per file maximum. BigQuery shards your data into multiple files based on the provided pattern.

Property definition:

['gs://my-bucket/file-name-*.json']

Creates:

gs://my-bucket/file-name-000000000000.json
gs://my-bucket/file-name-000000000001.json
gs://my-bucket/file-name-000000000002.json
...

Multiple wildcard URIs

Use multiple wildcard URIs if you want to partition the export output. You would use this option if you're running a parallel processing job with a service like Hadoop on Google Cloud Platform. Determine how many workers are available to process the job, and create one URI per worker. BigQuery treats each URI location as a partition, and uses parallel processing to shard your data into multiple files in each location. You can use whatever pattern you'd like in your file names, provided there is a single wildcard operator in each URI, each URI is unique, and the number of URIs does not exceed the quota policy.

When you pass more than one wildcard URI, BigQuery creates a special file at the end of each partition that indicates the "final" file in the set. This file name tells you how many shards BigQuery created.

For example, if your wildcard URI is gs://my-bucket/file-name-<worker number>-*.json, and BigQuery creates 80 sharded files, the zero-record file name is gs://my-bucket/file-name-<worker number>-000000000080.json. You can use this file name to determine that BigQuery created 80 sharded files (named 000000000000 through 000000000079).

Note that a zero-record file might contain more than 0 bytes depending on the data format, such as when exporting data in CSV format with a column header.

String pattern:

gs://my-bucket/file-name-<worker number>-*.json

Property definition:

['gs://my-bucket/file-name-1-*.json',
 'gs://my-bucket/file-name-2-*.json',
 'gs://my-bucket/file-name-3-*.json']

Creates:

This example assumes that BigQuery creates 80 sharded files in each partition.

gs://my-bucket/file-name-1-000000000000.json
gs://my-bucket/file-name-1-000000000001.json
...
gs://my-bucket/file-name-1-000000000080.json
gs://my-bucket/file-name-2-000000000000.json
gs://my-bucket/file-name-2-000000000001.json
...
gs://my-bucket/file-name-2-000000000080.json
gs://my-bucket/file-name-3-000000000000.json
gs://my-bucket/file-name-3-000000000001.json
...
gs://my-bucket/file-name-3-000000000080.json
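
As a rough sketch of how you might use the zero-record file described above, the following snippet parses the shard count out of its file name. The file name is a placeholder, and the helper function is hypothetical, not part of any BigQuery library.

import re

def shard_count_from_final_file(final_file_name):
    """Hypothetical helper: return the number of data shards indicated by
    the zero-record 'final' file BigQuery writes for each partition."""
    match = re.search(r"-(\d{12})\.json$", final_file_name)
    if match is None:
        raise ValueError("file name does not end in a 12-digit shard number")
    # The zero-record file's number equals the count of data files before it.
    return int(match.group(1))

# With 80 sharded files, the zero-record file is numbered 000000000080.
print(shard_count_from_final_file("gs://my-bucket/file-name-1-000000000080.json"))  # 80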

Python code example

The following code example shows how to export data in CSV format using Python.
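
A minimal sketch follows; it assumes the Google API Python client (googleapiclient) with Application Default Credentials, and the project, dataset, table, and bucket names are placeholders.

# Minimal sketch: export a table to CSV in Google Cloud Storage using the
# BigQuery v2 API via the Google API Python client. Assumes Application
# Default Credentials; project, dataset, table, and bucket are placeholders.
import google.auth
from googleapiclient.discovery import build

credentials, _ = google.auth.default()
bigquery = build("bigquery", "v2", credentials=credentials)

project_id = "my-project"
job_data = {
    "configuration": {
        "extract": {
            "sourceTable": {
                "projectId": project_id,
                "datasetId": "my_dataset",
                "tableId": "my_table",
            },
            "destinationUris": ["gs://my-bucket/my_table.csv"],
            "destinationFormat": "CSV",
            "compression": "NONE",
        }
    }
}

# Insert the extract job; poll jobs().get() on the returned jobReference
# to find out when the export completes.
response = bigquery.jobs().insert(projectId=project_id, body=job_data).execute()
print(response["jobReference"])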
