This page describes how to export or extract data from BigQuery tables.
After you've loaded your data into BigQuery, you can export the data in several formats. BigQuery can export up to 1 GB of data to a single file. If you are exporting more than 1 GB of data, you must export your data to multiple files. When you export your data to multiple files, the size of the files will vary.
You can use a service such as Dataflow to read data from BigQuery instead of manually exporting it. For more information about using Dataflow to read from and write to BigQuery, see BigQuery I/O in the Apache Beam documentation.
You can also export the results of a query by using the `EXPORT DATA` statement.
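As an illustration of that approach, the sketch below runs an `EXPORT DATA` statement as an ordinary query job through the Python client library. The project, dataset, table, and bucket names are placeholders, and the statement assumes a CSV export with a wildcard URI.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Placeholder project, dataset, table, and bucket names; replace with your own.
sql = """
    EXPORT DATA OPTIONS (
      uri = 'gs://example-bucket/query-results/*.csv',
      format = 'CSV',
      overwrite = true,
      header = true
    ) AS
    SELECT name, number
    FROM `my-project.mydataset.mytable`
    ORDER BY number
"""

# EXPORT DATA runs as a regular query job.
query_job = client.query(sql)
query_job.result()  # Wait for the export to finish.
```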
Required permissions
To export data to Cloud Storage, you need permissions to access the BigQuery table that contains the data, permissions to run an export job, and permissions to write the data to the Cloud Storage bucket.
BigQuery permissions
At a minimum, to export data, you must be granted `bigquery.tables.export` permissions. The following predefined IAM roles are granted `bigquery.tables.export` permissions:

- `bigquery.dataViewer`
- `bigquery.dataOwner`
- `bigquery.dataEditor`
- `bigquery.admin`

At a minimum, to run an export job, you must be granted `bigquery.jobs.create` permissions. The following predefined IAM roles are granted `bigquery.jobs.create` permissions:

- `bigquery.user`
- `bigquery.jobUser`
- `bigquery.admin`
Cloud Storage permissions
To write the data to an existing Cloud Storage bucket, you must be granted `storage.objects.create` and `storage.objects.delete` permissions. The following predefined IAM roles grant both permissions:

- `storage.objectAdmin`
- `storage.admin`
Export limitations
When you export data from BigQuery, note the following:
- You cannot export table data to a local file, to Sheets, or to Drive. The only supported export location is Cloud Storage. For information on saving query results, see Downloading and saving query results.
- You can export up to 1 GB of table data to a single file. If you are exporting more than 1 GB of data, use a wildcard to export the data into multiple files. When you export data to multiple files, the size of the files will vary.
- You cannot export nested and repeated data in CSV format. Nested and repeated data is supported for Avro and JSON exports.
- When you export data in JSON format, INT64 (integer) data types are encoded as JSON strings to preserve 64-bit precision when the data is read by other systems.
- You cannot export data from multiple tables in a single export job.
- You cannot choose a compression type other than `GZIP` when you export data using the Cloud Console.
- When you export data to a Cloud Storage bucket configured with a retention policy, BigQuery might fail to write the files to the bucket. Consider relaxing the retention policy for the duration of the export jobs.
Location considerations
When you choose a location for your data, consider the following:
- Colocate your Cloud Storage buckets for exporting data.
- When you export data, the regional or multi-regional Cloud Storage bucket must be in the same location as the BigQuery dataset. For example, if your BigQuery dataset is in the EU multi-regional location, the Cloud Storage bucket containing the data you're exporting must be in a regional or multi-regional location in the EU.
- If your dataset is in a regional location, your Cloud Storage bucket must be a regional bucket in the same location. For example, if your dataset is in the Tokyo region, your Cloud Storage bucket must be a regional bucket in Tokyo.
- Exception: If your dataset is in the US multi-regional location, you can export data into a Cloud Storage bucket in any regional or multi-regional location.
- Develop a data management plan.
- If you choose a regional storage resource such as a BigQuery dataset or a Cloud Storage bucket, develop a plan for geographically managing your data.
For more information on Cloud Storage locations, see Bucket Locations in the Cloud Storage documentation.
Moving BigQuery data between locations
You cannot change the location of a dataset after it is created, and you cannot move a dataset from one location to another. However, you can copy a dataset, or you can manually move (recreate) a dataset in a new location.
Copying datasets
To see steps for copying a dataset, including across regions, see Copying datasets.
Moving a dataset
To manually move a dataset from one location to another, follow this process:
1. Export the data from your BigQuery tables to a regional or multi-region Cloud Storage bucket in the same location as your dataset. For example, if your dataset is in the EU multi-region location, export your data into a regional or multi-region bucket in the EU.

   There are no charges for exporting data from BigQuery, but you do incur charges for storing the exported data in Cloud Storage. BigQuery exports are subject to the limits on export jobs.

2. Copy or move the data from your Cloud Storage bucket to a regional or multi-region bucket in the new location. For example, if you are moving your data from the US multi-region location to the Tokyo regional location, you would transfer the data to a regional bucket in Tokyo. For information on transferring Cloud Storage objects, see Copying, renaming, and moving objects in the Cloud Storage documentation.

   Note that transferring data between regions incurs network egress charges in Cloud Storage.

3. After you transfer the data to a Cloud Storage bucket in the new location, create a new BigQuery dataset in the new location. Then, load your data from the Cloud Storage bucket into BigQuery.

   You are not charged for loading the data into BigQuery, but you will incur charges for storing the data in Cloud Storage until you delete the data or the bucket. You are also charged for storing the data in BigQuery after it is loaded. Loading data into BigQuery is subject to the limits on load jobs.
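The following sketch shows steps 1 and 3 of this procedure with the Python client library. The project, dataset, bucket, and location names are placeholders, and the cross-region copy in step 2 is assumed to be done separately with a tool such as gsutil or Storage Transfer Service.

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project

# Step 1: export the table to a bucket colocated with the source dataset (EU).
extract_job = client.extract_table(
    "my-project.eu_dataset.mytable",
    "gs://my-eu-bucket/mytable-*.avro",
    location="EU",
    job_config=bigquery.ExtractJobConfig(
        destination_format=bigquery.DestinationFormat.AVRO
    ),
)
extract_job.result()  # Wait for the export job to complete.

# Step 2 (not shown): copy gs://my-eu-bucket/mytable-*.avro to a bucket in the
# new location, for example gs://my-tokyo-bucket/, using gsutil or
# Storage Transfer Service.

# Step 3: create a dataset in the new location and load the exported files.
dataset = bigquery.Dataset(bigquery.DatasetReference("my-project", "tokyo_dataset"))
dataset.location = "asia-northeast1"
client.create_dataset(dataset, exists_ok=True)

load_job = client.load_table_from_uri(
    "gs://my-tokyo-bucket/mytable-*.avro",
    "my-project.tokyo_dataset.mytable",
    location="asia-northeast1",
    job_config=bigquery.LoadJobConfig(source_format=bigquery.SourceFormat.AVRO),
)
load_job.result()  # Wait for the load job to complete.
```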
You can also use Cloud Composer to move and copy large datasets programmatically.
For more information on using Cloud Storage to store and move large datasets, see Using Cloud Storage with big data.
Export formats and compression types
BigQuery supports the following data formats and compression types for exported data.
Data format | Supported compression types | Details |
---|---|---|
CSV | GZIP | You can control the CSV delimiter in your exported data by using the `--field_delimiter` bq command-line tool flag or the `configuration.extract.fieldDelimiter` extract job property. Nested and repeated data is not supported. |
JSON | GZIP | Nested and repeated data is supported. |
Avro | DEFLATE, SNAPPY | GZIP is not supported for Avro exports. Nested and repeated data is supported. |
Exporting data stored in BigQuery
You can export table data by:

- Using the Cloud Console
- Using the `bq extract` command in the `bq` command-line tool
- Submitting an `extract` job via the API or client libraries
Exporting table data
To export data from a BigQuery table:
Console
Open the BigQuery page in the Cloud Console.
In the Explorer panel, expand your project and dataset, then select the table.
In the details panel, click Export and select Export to Cloud Storage.
In the Export table to Google Cloud Storage dialog:
- For Select Google Cloud Storage location, browse for the bucket, folder, or file where you want to export the data.
- For Export format, choose the format for your exported data: CSV, JSON (Newline Delimited), or Avro.
- For Compression, accept the default value, `None`, or choose `GZIP`. The Avro format can't be used in combination with GZIP compression. To compress Avro data, use the `bq` command-line tool or the API and specify one of the supported compression types for Avro data: `DEFLATE` or `SNAPPY`.
- Click Export to export the table.
To check on the progress of the job, look near the top of the navigation for Job history for an Export job.
bq
Use the `bq extract` command with the `--destination_format` flag.

(Optional) Supply the `--location` flag and set the value to your location.

Other optional flags include:

- `--compression`: The compression type to use for exported files.
- `--field_delimiter`: The character that indicates the boundary between columns in the output file for CSV exports. Both `\t` and `tab` are allowed for tab delimiters.
- `--print_header`: When specified, print header rows for formats that have headers such as CSV.
    bq --location=location extract \
    --destination_format format \
    --compression compression_type \
    --field_delimiter delimiter \
    --print_header=boolean \
    project_id:dataset.table \
    gs://bucket/filename.ext
Where:
- location is the name of your location. The `--location` flag is optional. For example, if you are using BigQuery in the Tokyo region, you can set the flag's value to `asia-northeast1`. You can set a default value for the location using the .bigqueryrc file.
- format is the format for the exported data: `CSV`, `NEWLINE_DELIMITED_JSON`, or `AVRO`.
- compression_type is a supported compression type for your data format. `CSV` and `NEWLINE_DELIMITED_JSON` support `GZIP`. `AVRO` supports `DEFLATE` and `SNAPPY`.
- delimiter is the character that indicates the boundary between columns in CSV exports. `\t` and `tab` are accepted names for tab.
- boolean is `true` or `false`. When set to `true`, header rows are printed to the exported data if the data format supports headers. The default value is `true`.
- project_id is your project ID.
- dataset is the name of the source dataset.
- table is the table you're exporting.
- bucket is the name of the Cloud Storage bucket to which you're exporting the data. The BigQuery dataset and the Cloud Storage bucket must be in the same location.
- filename.ext is the name and extension of the exported data file. You can export to multiple files using a wildcard.
Examples:

The following command exports `mydataset.mytable` into a gzip-compressed file named `myfile.csv`. `myfile.csv` is stored in a Cloud Storage bucket named `example-bucket`.
    bq extract \
    --compression GZIP \
    'mydataset.mytable' \
    gs://example-bucket/myfile.csv
The default destination format is CSV. To export into JSON or Avro, use the `--destination_format` flag and set it to either `NEWLINE_DELIMITED_JSON` or `AVRO`. For example:
    bq extract \
    --destination_format NEWLINE_DELIMITED_JSON \
    'mydataset.mytable' \
    gs://example-bucket/myfile.json
The following command exports `mydataset.mytable` into an Avro file that is compressed using Snappy. The file is named `myfile.avro`. `myfile.avro` is exported to a Cloud Storage bucket named `example-bucket`.
    bq extract \
    --destination_format AVRO \
    --compression SNAPPY \
    'mydataset.mytable' \
    gs://example-bucket/myfile.avro
API
To export data, create an `extract` job and populate the job configuration.

(Optional) Specify your location in the `location` property in the `jobReference` section of the job resource.
Create an extract job that points to the BigQuery source data and the Cloud Storage destination.
Specify the source table by using the `sourceTable` configuration object that contains the project ID, dataset ID, and table ID.

The `destinationUris` property must be fully-qualified, in the format `gs://bucket/filename.ext`. Each URI can contain one '*' wildcard character and it must come after the bucket name.

Specify the data format by setting the `configuration.extract.destinationFormat` property. For example, to export a JSON file, set this property to the value `NEWLINE_DELIMITED_JSON`.

To check the job status, call `jobs.get(job_id)` with the ID of the job returned by the initial request.

- If `status.state = DONE`, the job completed successfully.
- If the `status.errorResult` property is present, the request failed, and that object will include information describing what went wrong.
- If `status.errorResult` is absent, the job finished successfully, although there might have been some non-fatal errors. Non-fatal errors are listed in the returned job object's `status.errors` property.
API notes:
- As a best practice, generate a unique ID and pass it as `jobReference.jobId` when calling `jobs.insert` to create a job. This approach is more robust to network failure because the client can poll or retry on the known job ID.
- Calling `jobs.insert` on a given job ID is idempotent; in other words, you can retry as many times as you like on the same job ID, and at most one of those operations will succeed.
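For example, with the Python client library you can supply your own job ID when creating the extract job and then poll that ID; the project, table, and bucket names below are placeholders.

```python
import uuid

from google.cloud import bigquery

client = bigquery.Client()

# Generate a unique job ID up front so the request can be retried safely.
job_id = f"extract_mytable_{uuid.uuid4().hex}"

client.extract_table(
    "my-project.mydataset.mytable",        # placeholder source table
    "gs://example-bucket/mytable-*.json",  # placeholder destination URI
    job_id=job_id,
    location="US",
    job_config=bigquery.ExtractJobConfig(
        destination_format=bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON
    ),
)

# Poll the known job ID; this is equivalent to calling jobs.get(job_id).
job = client.get_job(job_id, location="US")
job.result()  # Block until the job finishes, raising on fatal errors.
if job.errors:
    print("Non-fatal errors:", job.errors)
```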
C#
Before trying this sample, follow the C# setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery C# API reference documentation.
Go
Before trying this sample, follow the Go setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Go API reference documentation.
Java
Before trying this sample, follow the Java setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Java API reference documentation.
Node.js
Before trying this sample, follow the Node.js setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Node.js API reference documentation.
PHP
Before trying this sample, follow the PHP setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery PHP API reference documentation.
Python
Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.
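For reference, a minimal sketch with the google-cloud-bigquery Python client looks like the following; the project, dataset, table, and bucket names are placeholders.

```python
from google.cloud import bigquery

client = bigquery.Client()

destination_uri = "gs://example-bucket/mytable.csv"  # placeholder bucket

extract_job = client.extract_table(
    "my-project.mydataset.mytable",  # placeholder source table
    destination_uri,
    # The location must match that of the source table.
    location="US",
)
extract_job.result()  # Waits for the job to complete.

print(f"Exported my-project.mydataset.mytable to {destination_uri}")
```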
Ruby
Before trying this sample, follow the Ruby setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Ruby API reference documentation.
Avro export details
BigQuery expresses Avro formatted data in the following ways:
- The resulting export files are Avro container files.
- Each BigQuery row is represented as an Avro record. Nested data is represented by nested record objects.
- `REQUIRED` fields are represented as the corresponding Avro types. For example, a BigQuery `INTEGER` type maps to an Avro `LONG` type.
- `NULLABLE` fields are represented as an Avro union of the corresponding type and "null".
- `REPEATED` fields are represented as Avro arrays.
- `TIMESTAMP` data types are represented as Avro `timestamp-micros` logical types.
- `DATE` data types are represented as Avro `INT` types by default, or Avro `date` logical types if the `--use_avro_logical_types` flag is specified.
- `TIME` data types are represented as Avro `LONG` types by default, or Avro `time-micros` logical types if the `--use_avro_logical_types` flag is specified.
- `DATETIME` data types are represented as Avro `STRING` types. The encoding follows the Internet Engineering Task Force RFC 3339 spec.
The Avro format can't be used in combination with GZIP compression. To compress Avro data, use the `bq` command-line tool or the API and specify one of the supported compression types for Avro data: `DEFLATE` or `SNAPPY`.
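As an example of these options, the following Python sketch exports Avro with Snappy compression and writes DATE and TIME columns as Avro logical types; the table and bucket names are placeholders.

```python
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.ExtractJobConfig()
job_config.destination_format = bigquery.DestinationFormat.AVRO
job_config.compression = bigquery.Compression.SNAPPY  # or DEFLATE
# Equivalent to the --use_avro_logical_types flag: DATE and TIME columns are
# written as Avro date and time-micros logical types.
job_config.use_avro_logical_types = True

extract_job = client.extract_table(
    "my-project.mydataset.mytable",        # placeholder source table
    "gs://example-bucket/mytable-*.avro",  # placeholder destination URI
    job_config=job_config,
)
extract_job.result()  # Wait for the export to complete.
```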
Exporting data into one or more files
The `destinationUris` property indicates the one or more locations and filenames where BigQuery should export your files.

BigQuery supports a single wildcard operator (*) in each URI. The wildcard can appear anywhere in the URI except as part of the bucket name. Using the wildcard operator instructs BigQuery to create multiple sharded files based on the supplied pattern. The wildcard operator is replaced with a number (starting at 0), left-padded to 12 digits. For example, a URI with a wildcard at the end of the filename would create files with `000000000000` appended to the first file, `000000000001` appended to the second file, and so on.
The following options are available for the `destinationUris` property:

Single URI

Use a single URI if you are exporting table data that is 1 GB or less. This option is the most common use case, as exported data is generally less than the 1 GB maximum value.

Property definition:

    ['gs://my-bucket/file-name.json']

Creates:

    gs://my-bucket/file-name.json

Single wildcard URI

Use a single wildcard URI if you think your exported data will be larger than the 1 GB maximum value. BigQuery shards your data into multiple files based on the provided pattern. The size of the exported files will vary.

If you use a wildcard in a URI component other than the filename, be sure the path component does not exist before exporting your data.

Property definition:

    ['gs://my-bucket/file-name-*.json']

Creates:

    gs://my-bucket/file-name-000000000000.json
    gs://my-bucket/file-name-000000000001.json
    gs://my-bucket/file-name-000000000002.json
    ...

Multiple wildcard URIs

Use multiple wildcard URIs if you want to partition the export output. You would use this option if you're running a parallel processing job with a service like Dataproc. Determine how many workers are available to process the job, and create one URI per worker. BigQuery treats each URI location as a partition, and uses parallel processing to shard your data into multiple files in each location. You can use whatever pattern you'd like in your filename, assuming there is a single wildcard operator in each URI, each URI is unique, and the number of URIs does not exceed the quota policy.

When you pass more than one wildcard URI, BigQuery creates a special file at the end of each partition that indicates the final file in the set. The filename of this zero record file indicates how many shards BigQuery created. Note that a zero record file might contain more than 0 bytes depending on the data format, such as when exporting data in CSV format with a column header.

String pattern:

    gs://my-bucket/file-name-<worker number>-*.json

Property definition:

    ['gs://my-bucket/file-name-1-*.json',
     'gs://my-bucket/file-name-2-*.json',
     'gs://my-bucket/file-name-3-*.json']

Creates:

This example assumes that BigQuery creates 80 sharded files in each partition.

    gs://my-bucket/file-name-1-000000000000.json
    gs://my-bucket/file-name-1-000000000001.json
    ...
    gs://my-bucket/file-name-1-000000000080.json
    gs://my-bucket/file-name-2-000000000000.json
    gs://my-bucket/file-name-2-000000000001.json
    ...
    gs://my-bucket/file-name-2-000000000080.json
    gs://my-bucket/file-name-3-000000000000.json
    gs://my-bucket/file-name-3-000000000001.json
    ...
    gs://my-bucket/file-name-3-000000000080.json
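In the Python client library, these options correspond to the destination URIs argument of `extract_table`, which accepts either a single URI string or a list of wildcard URIs. The sketch below uses the multiple-wildcard-URI pattern; the bucket and table names are placeholders.

```python
from google.cloud import bigquery

client = bigquery.Client()

# One wildcard URI per worker/partition (placeholder names).
destination_uris = [
    "gs://my-bucket/file-name-1-*.json",
    "gs://my-bucket/file-name-2-*.json",
    "gs://my-bucket/file-name-3-*.json",
]

extract_job = client.extract_table(
    "my-project.mydataset.mytable",  # placeholder source table
    destination_uris,
    job_config=bigquery.ExtractJobConfig(
        destination_format=bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON
    ),
)
extract_job.result()

# Number of files written for each destination URI pattern.
print(extract_job.destination_uri_file_counts)
```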
Extract compressed table
Go
Before trying this sample, follow the Go setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Go API reference documentation.
Java
Before trying this sample, follow the Java setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Java API reference documentation.
Node.js
Before trying this sample, follow the Node.js setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Node.js API reference documentation.
Python
Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.
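A minimal Python sketch of a GZIP-compressed CSV export, with placeholder table and bucket names:

```python
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.ExtractJobConfig()
job_config.compression = bigquery.Compression.GZIP  # supported for CSV and JSON

extract_job = client.extract_table(
    "my-project.mydataset.mytable",        # placeholder source table
    "gs://example-bucket/mytable.csv.gz",  # placeholder destination URI
    job_config=job_config,
)
extract_job.result()  # Wait for the compressed export to finish.
```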
Quota policy
For information on export job quotas, see Export jobs on the Quotas and limits page.
Pricing
There is currently no charge for exporting data from BigQuery, but exports are subject to BigQuery's Quotas and limits. For more information on BigQuery pricing, see the Pricing page.
Once the data is exported, you are charged for storing the data in Cloud Storage. For more information on Cloud Storage pricing, see the Cloud Storage Pricing page.
What's next
- To learn more about the Cloud Console, see Using the Cloud Console.
- To learn more about the `bq` command-line tool, see Using the `bq` command-line tool.
- To learn how to create an application using the BigQuery API client libraries, see Client library quickstart.