You can also use Google Cloud Dataflow to read data from BigQuery. For more information about using Cloud Dataflow to read from, and write to, BigQuery, see BigQuery I/O in the Cloud Dataflow documentation.
- Access control
- Quota policy
- Configuration options
- Extract configuration example
- Exporting data into one or more files
Exporting data requires the following access levels.
Google Cloud Storage
The following limits apply for exporting data from BigQuery.
- Daily Limit: 1,000 exports per day, up to 10 TB
- Multiple Wildcard URI Limit: 500 URIs per export
You can configure two aspects of the exported data: the format, and the compression type.
- destinationFormat controls the format. BigQuery supports CSV, JSON and Avro format. Nested or repeated data cannot be exported to CSV, but can be exported to JSON or Avro format.
- compression controls the compression type. BigQuery supports GZIP compression or
BigQuery expresses Avro formatted data in the following ways:
- The resulting export files are Avro container files.
- Each BigQuery row is represented as an Avro Record. Nested data is represented by nested Record objects.
REQUIREDfields are represented as the corresponding Avro types. For example, a BigQuery
INTEGERtype maps to an Avro
NULLABLEfields are represented as an Avro Union of the corresponding type and "null".
REPEATEDfields are represented as Avro arrays.
TIMESTAMPdata types are represented as Avro
The Avro format can't be used in combination with GZIP compression.
Extract configuration example
The following code example shows the configuration of a job that exports data to a comma-separated values (CSV) file.
The destinationUris property indicates the location(s) and file name(s) where BigQuery should export your files to. Usually, you'll pass a single value so that BigQuery exports to a single file, but you can alternately pass one or more wildcard URIs. For more information on what to specify for the
destinationUris property, see exporting data into one or more files.
Exporting data into one or more files
The destinationUris property indicates the location(s) and file name(s) where BigQuery should export your files to.
BigQuery supports a single wildcard operator (*) in each URI. Using the wildcard operator instructs BigQuery to create multiple sharded files based on the supplied pattern. The wildcard operator is replaced with a file number (starting at 0), left-padded to 12 digits. For example,
000000000001 for the second file.
The following table describes several possible options for the
|Single URI||Use a single URI if you want BigQuery to export your data to a single file. This option is the most common use case, as exported data is generally less than BigQuery's 1 GB per file maximum value.||
|Single wildcard URI||Use a single wildcard URI if you think your exported data will be larger than BigQuery's 1 GB per file maximum value. BigQuery shards your data into multiple files based on the provided pattern.||
gs://my-bucket/file-name-000000000000.json gs://my-bucket/file-name-000000000001.json gs://my-bucket/file-name-000000000002.json ...
|Multiple wildcard URIs||
Use multiple wildcard URIs if you want to partition the export output. You would use this option if you're running a parallel processing job with a service like Hadoop on Google Cloud Platform. Determine how many workers that are available to process the job, and create one URI per worker. BigQuery treats each URI location as a partition, and uses parallel processing to shard your data into multiple files in each location. You can use whatever pattern you'd like in your file name, assuming there is a single wildcard operator in each URI, each URI is unique, and the number of URIs does not exceed the quota policy.
When you pass more than one wildcard URI, BigQuery creates a special file at the end of each partition that indicates the "final" file in the set. This file name indicates how many shards BigQuery created.
For example, if your wildcard URI is
Note that a zero record file might contain more than 0 bytes depending on the data format, such as when exporting data in CSV format with a column header.
['gs://my-bucket/file-name-1-*.json', 'gs://my-bucket/file-name-2-*.json', 'gs://my-bucket/file-name-3-*.json']
This example assumes that BigQuery creates 80 sharded files in each partition.
gs://my-bucket/file-name-1-000000000000.json gs://my-bucket/file-name-1-000000000001.json ... gs://my-bucket/file-name-1-000000000080.json gs://my-bucket/file-name-2-000000000000.json gs://my-bucket/file-name-2-000000000001.json ... gs://my-bucket/file-name-2-000000000080.json gs://my-bucket/file-name-3-000000000000.json gs://my-bucket/file-name-3-000000000001.json ... gs://my-bucket/file-name-3-000000000080.json