Exporting Table Data

This page describes how to export data from BigQuery tables.

After you've loaded your data into BigQuery, you can export the data in several formats. BigQuery can export up to 1 GB of data to a single file. If you are exporting more than 1 GB of data, you must export your data to multiple files. When you export your data to multiple files, the size of the files will vary.

You can use a service such as Google Cloud Dataflow to read data from BigQuery instead of manually exporting it. For more information about using Cloud Dataflow to read from and write to BigQuery, see BigQuery I/O in the Cloud Dataflow documentation.

Required permissions

  • Ensure that you have READER access to the dataset containing the data you want to export. Alternatively, you can use any predefined, project-level BigQuery IAM role that provides bigquery.tables.export permissions, such as bigquery.dataViewer, bigquery.dataOwner, bigquery.dataEditor, or bigquery.admin. When you assign an IAM role at the project level, the user or group has bigquery.tables.export permissions for every table in the project.
  • Ensure that you have WRITER access to a Google Cloud Storage bucket, or that you have been assigned a predefined IAM role that allows you to write objects to a bucket. You can export data from BigQuery only to a Cloud Storage bucket. For information on Cloud Storage IAM roles, see Cloud Storage IAM Roles. A sketch after this list shows one way to verify bucket write access programmatically.
  • Running an export job requires bigquery.jobs.create permissions. You can set bigquery.jobs.create permissions at the project level by granting one of the following predefined IAM roles:

    • bigquery.user
    • bigquery.jobUser
    • bigquery.admin
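
To check the Cloud Storage requirement programmatically, one option is the bucket testIamPermissions method. The following is a minimal Python sketch, assuming the google-cloud-storage client library; the bucket name is a placeholder.

from google.cloud import storage

storage_client = storage.Client()
bucket = storage_client.bucket('example-bucket')  # placeholder bucket name

# test_iam_permissions returns the subset of the requested permissions
# that the caller actually holds on the bucket.
granted = bucket.test_iam_permissions(['storage.objects.create'])
if 'storage.objects.create' not in granted:
    print('You cannot write export files to this bucket.')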

Export limitations

When you export data from BigQuery, note the following:

  • You cannot export data to a local file or to Google Drive, but you can save query results to a local file. The only supported export location is Google Cloud Storage.
  • You can export up to 1 GB of table data to a single file. If you are exporting more than 1 GB of data, use a wildcard to export the data into multiple files. When you export data to multiple files, the size of the files will vary.
  • You cannot export nested and repeated data in CSV format. Nested and repeated data is supported for Avro and JSON exports.
  • When you export data in JSON format, INT64 (integer) data types are encoded as JSON strings to preserve 64-bit precision when the data is read by other systems.
  • You cannot export data from multiple tables in a single export job.
  • When you export data from partitioned tables, you cannot export individual partitions.
  • You cannot choose a compression type other than GZIP when you export data using the classic BigQuery web UI.

Location considerations

When you choose a location for your data, consider the following:
  • Colocate your BigQuery dataset and your external data source.
    • When you query data in an external data source such as Cloud Storage, the data you're querying must be in the same location as your BigQuery dataset. For example, if your BigQuery dataset is in the EU multi-regional location, the Cloud Storage bucket containing the data you're querying must be in a multi-regional bucket in the EU. If your dataset is in the US multi-regional location, your Cloud Storage bucket must be in a multi-regional bucket in the US.
    • If your dataset is in a regional location, the Cloud Storage bucket containing the data you're querying must be in a regional bucket in the same location. For example, if your dataset is in the Tokyo region, your Cloud Storage bucket must be a regional bucket in Tokyo.
    • If your external dataset is in Cloud Bigtable, your dataset must be in the US or the EU multi-regional location. Your Cloud Bigtable data must be in one of the supported Cloud Bigtable locations.
    • Location considerations do not apply to Google Drive external data sources.
  • Colocate your Cloud Storage buckets for loading data.
    • If your BigQuery dataset is in a multi-regional location, the Cloud Storage bucket containing the data you're loading must be in a regional or multi-regional bucket in the same location. For example, if your BigQuery dataset is in the EU, the Cloud Storage bucket must be in a regional or multi-regional bucket in the EU.
    • If your dataset is in a regional location, your Cloud Storage bucket must be a regional bucket in the same location. For example, if your dataset is in the Tokyo region, your Cloud Storage bucket must be a regional bucket in Tokyo.
    • Exception: If your dataset is in the US multi-regional location, you can load data from a Cloud Storage bucket in any regional or multi-regional location.
  • Colocate your Cloud Storage buckets for exporting data.
    • When you export data, the regional or multi-regional Cloud Storage bucket must be in the same location as the BigQuery dataset. For example, if your BigQuery dataset is in the EU multi-regional location, the Cloud Storage bucket containing the data you're exporting must be in a regional or multi-regional location in the EU.
    • If your dataset is in a regional location, your Cloud Storage bucket must be a regional bucket in the same location. For example, if your dataset is in the Tokyo region, your Cloud Storage bucket must be a regional bucket in Tokyo.
    • Exception: If your dataset is in the US multi-regional location, you can export data into a Cloud Storage bucket in any regional or multi-regional location.
  • Develop a data management plan.
For more information on Cloud Storage locations, see Bucket Locations in the Cloud Storage documentation.
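
Before you run an export, you can verify colocation programmatically. The following is a hedged Python sketch, assuming the google-cloud-bigquery and google-cloud-storage client libraries; the dataset and bucket names are placeholders, and the sketch does not account for the US multi-region exception described above.

from google.cloud import bigquery, storage

bq_client = bigquery.Client()
gcs_client = storage.Client()

dataset = bq_client.get_dataset(bq_client.dataset('mydataset'))  # placeholder dataset
bucket = gcs_client.get_bucket('example-bucket')                 # placeholder bucket

# BigQuery reports locations such as 'US', 'EU', or 'asia-northeast1', while
# Cloud Storage reports bucket locations in upper case, so compare case-insensitively.
if dataset.location.upper() != bucket.location.upper():
    print('Dataset location {} does not match bucket location {}.'.format(
        dataset.location, bucket.location))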

Moving BigQuery data between locations

You cannot change the location of a dataset after it is created, and you cannot move a dataset from one location to another. If you need to move a dataset, follow this process:

  1. Export the data from your BigQuery tables to a regional or multi-region Cloud Storage bucket in the same location as your dataset. For example, if your dataset is in the EU multi-region location, export your data into a regional or multi-region bucket in the EU.

    There are no charges for exporting data from BigQuery, but you do incur charges for storing the exported data in Cloud Storage. BigQuery exports are subject to the limits on export jobs.

  2. Copy or move the data from your Cloud Storage bucket to a regional or multi-region bucket in the new location. For example, if you are moving your data from the US multi-region location to the Tokyo regional location, you would transfer the data to a regional bucket in Tokyo. For information on transferring Cloud Storage objects, see Renaming, Copying, and Moving Objects in the Cloud Storage documentation.

    Note that transferring data between regions incurs network egress charges in Cloud Storage.

  3. After you transfer the data to a Cloud Storage bucket in the new location, create a new BigQuery dataset (in the new location). Then, load your data from the Cloud Storage bucket into BigQuery.

    You are not charged for loading the data into BigQuery, but you will incur charges for storing the data in Cloud Storage until you delete the data or the bucket. You are also charged for storing the data in BigQuery after it is loaded. Loading data into BigQuery is subject to the limits on load jobs.

For more information on using Cloud Storage to store and move large datasets, see Using Google Cloud Storage with Big Data.
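
The three steps above can be scripted with the client libraries. The following is a hedged Python sketch, assuming the google-cloud-bigquery and google-cloud-storage packages; all dataset, table, and bucket names are placeholders, and the example moves data from the EU multi-region to the Tokyo region.

from google.cloud import bigquery, storage

client = bigquery.Client()
storage_client = storage.Client()

# Step 1: export the table to a bucket colocated with the source dataset (EU).
extract_config = bigquery.ExtractJobConfig()
extract_config.destination_format = 'AVRO'
extract_job = client.extract_table(
    client.dataset('mydataset_eu').table('mytable'),
    'gs://example-bucket-eu/mytable-*.avro',
    job_config=extract_config,
    location='EU')
extract_job.result()  # waits for the export to complete

# Step 2: copy the exported files to a bucket in the new location (Tokyo).
src_bucket = storage_client.bucket('example-bucket-eu')
dst_bucket = storage_client.bucket('example-bucket-tokyo')
for blob in src_bucket.list_blobs(prefix='mytable-'):
    src_bucket.copy_blob(blob, dst_bucket, blob.name)

# Step 3: create a dataset in the new location and load the files into it.
dataset = bigquery.Dataset(client.dataset('mydataset_tokyo'))
dataset.location = 'asia-northeast1'
client.create_dataset(dataset)

load_config = bigquery.LoadJobConfig()
load_config.source_format = 'AVRO'
load_job = client.load_table_from_uri(
    'gs://example-bucket-tokyo/mytable-*.avro',
    dataset.table('mytable'),
    job_config=load_config,
    location='asia-northeast1')
load_job.result()  # waits for the load to complete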

Export formats and compression types

BigQuery supports the following data formats and compression types for exported data.

  • CSV. Supported compression type: GZIP. You can control the CSV delimiter in your exported data by using the --field_delimiter CLI flag or the configuration.extract.fieldDelimiter extract job property. Nested and repeated data is not supported.
  • JSON (newline delimited). Supported compression type: GZIP. Nested and repeated data is supported.
  • Avro. Supported compression types: DEFLATE and SNAPPY. GZIP is not supported for Avro exports. Nested and repeated data is supported.
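
As an illustration, the following minimal Python sketch sets the export format, compression type, and CSV delimiter through ExtractJobConfig; the dataset, table, and bucket names are placeholders.

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.ExtractJobConfig()
job_config.destination_format = 'CSV'  # or 'NEWLINE_DELIMITED_JSON', 'AVRO'
job_config.compression = 'GZIP'        # CSV and JSON support GZIP; Avro supports DEFLATE and SNAPPY
job_config.field_delimiter = '\t'      # applies only to CSV exports

extract_job = client.extract_table(
    client.dataset('mydataset').table('mytable'),
    'gs://example-bucket/myfile.csv.gz',
    job_config=job_config,
    location='US')  # must match the dataset's location
extract_job.result()  # waits for the job to complete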

Exporting data stored in BigQuery

You can export table data by using the BigQuery web UI, by using the bq extract CLI command, or by submitting an extract job via the API or Client Libraries.

Exporting table data

To export data from a BigQuery table:

Web UI

  1. Go to the BigQuery web UI.

  2. In the navigation, find and click the expansion icon next to your dataset to display its contents.

  3. Find and click the down arrow icon next to the table that contains the data you're exporting.

  4. Select Export table to display the Export to Google Cloud Storage dialog.

  5. In the Export to Google Cloud Storage dialog:

    • For Export format, choose the format for your exported data: CSV, JSON (Newline Delimited), or Avro.
    • For Compression, accept the default value, None, or choose GZIP. The Avro format can't be used in combination with GZIP compression. To compress Avro data, use the bq command-line tool or the API and specify one of the supported compression types for Avro data: DEFLATE or SNAPPY.
    • In the Google Cloud Storage URI textbox, enter a valid URI in the format gs://[BUCKET_NAME]/[FILENAME.CSV], where [BUCKET_NAME] is your Cloud Storage bucket name, and [FILENAME] is the name of your destination file. The BigQuery dataset and the Cloud Storage bucket must be in the same location.
    • Click OK to export the table.

While the job is running, (extracting) appears next to the table's name in the navigation. To check the job's progress, look for an Extract job in the Job History near the top of the navigation.

Command-line


Use the bq extract command.

  • bq --location=[LOCATION] extract --destination_format [FORMAT] --compression [COMPRESSION_TYPE] --field_delimiter [DELIMITER] --print_header [BOOLEAN] [PROJECT_ID]:[DATASET].[TABLE] gs://[BUCKET]/[FILENAME]

Where:

  • [LOCATION] is the name of your location. The --location flag is optional. For example, if you are using BigQuery in the Tokyo region, you can set the flag's value to asia-northeast1. You can set a default value for the location using the .bigqueryrc file.
  • [FORMAT] is the format for the exported data: CSV, NEWLINE_DELIMITED_JSON, or AVRO.
  • [COMPRESSION_TYPE] is a supported compression type for your data format. CSV and NEWLINE_DELIMITED_JSON support GZIP. AVRO supports DEFLATE and SNAPPY.
  • [DELIMITER] is the character that indicates the boundary between columns in CSV exports. \t and tab are accepted names for tab.
  • [BOOLEAN] is true or false. When set to true, header rows are printed to the exported data if the data format supports headers. The default value is true.
  • [PROJECT_ID] is your project ID.
  • [DATASET] is the name of the source dataset.
  • [TABLE] is the table you're exporting.
  • [BUCKET] is the name of the Cloud Storage bucket to which you're exporting the data. The BigQuery dataset and the Cloud Storage bucket must be in the same location.
  • [FILENAME] is the name of the exported data file. You can export to multiple files using a wildcard.

Examples:

The following command exports mydataset.mytable into a gzip-compressed file named myfile.csv, stored in a Cloud Storage bucket named example-bucket.

  • bq --location=US extract --compression GZIP 'mydataset.mytable' gs://example-bucket/myfile.csv

The default destination format is CSV. To export to JSON or Avro, use the --destination_format flag and set it to either NEWLINE_DELIMITED_JSON or AVRO. For example:

  • bq --location=US extract --destination_format NEWLINE_DELIMITED_JSON 'mydataset.mytable' gs://example-bucket/myfile.json

The following command exports mydataset.mytable into an Avro file named myfile.avro, compressed using Snappy and exported to a Cloud Storage bucket named example-bucket.

  • bq --location=US extract --destination_format AVRO --compression SNAPPY 'mydataset.mytable' gs://example-bucket/myfile.avro

API


To export data, create a job and populate the configuration.extract object.

Specify your location in the location property in the jobReference section of the job resource.

  1. Create an extract job that points to the BigQuery source data and the Cloud Storage destination. For information about creating jobs, see Managing Jobs, Datasets, and Projects.

  2. Specify the source table by using the sourceTable configuration object, which comprises the project ID, dataset ID, and table ID.

  3. The destination URI(s) must be fully-qualified, in the format gs://[BUCKET_NAME]/[FILENAME.CSV]. Each URI can contain one '*' wildcard character and it must come after the bucket name.

  4. Specify the data format by setting the configuration.extract.destinationFormat property. For example, to export a JSON file, set this property to the value NEWLINE_DELIMITED_JSON.

  5. To check the job status, call jobs.get([JOB_ID]) with the ID of the job returned by the initial request.

    • If status.state = DONE, the job has finished running.
    • If the status.errorResult property is present, the request failed, and that object will include information describing what went wrong.
    • If status.errorResult is absent, the job finished successfully, although there might have been some non-fatal errors. Non-fatal errors are listed in the returned job object's status.errors property.

API notes:

  • As a best practice, generate a unique ID and pass it as jobReference.jobId when calling jobs.insert() to create a job. This approach is more robust to network failure because the client can poll or retry on the known job ID.

  • Calling jobs.insert() on a given job ID is idempotent; in other words, you can retry as many times as you like on the same job ID, and at most one of those operations will succeed.
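
To make the flow above concrete, the following is a hedged Python sketch that builds the configuration.extract object, sets a client-generated jobReference.jobId, and polls jobs.get. It assumes the google-api-python-client package and application default credentials; the project, dataset, table, and bucket names are placeholders.

import time
import uuid

from googleapiclient import discovery

bq_service = discovery.build('bigquery', 'v2')

project_id = 'my-project'  # placeholder project ID
job_id = 'export-mytable-{}'.format(uuid.uuid4())  # client-generated ID for safe retries

job_body = {
    'jobReference': {'projectId': project_id, 'jobId': job_id, 'location': 'US'},
    'configuration': {
        'extract': {
            'sourceTable': {
                'projectId': project_id,
                'datasetId': 'mydataset',
                'tableId': 'mytable',
            },
            'destinationUris': ['gs://example-bucket/myfile-*.json'],
            'destinationFormat': 'NEWLINE_DELIMITED_JSON',
        }
    },
}
bq_service.jobs().insert(projectId=project_id, body=job_body).execute()

# Poll jobs.get until the job is DONE, then inspect status.errorResult.
while True:
    job = bq_service.jobs().get(
        projectId=project_id, jobId=job_id, location='US').execute()
    if job['status']['state'] == 'DONE':
        break
    time.sleep(1)
if 'errorResult' in job['status']:
    raise RuntimeError(job['status']['errorResult'])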

C#

Before trying this sample, follow the C# setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery C# API reference documentation.

public void ExportJsonToGcs(
    string datasetId, string tableId, string bucketName, string fileName,
    BigQueryClient client)
{
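    // Note: this sample streams the table rows with ListRows and writes them to
    // Cloud Storage as newline-delimited JSON; it does not run a BigQuery extract job.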
    StorageClient gcsClient = StorageClient.Create();
    string contentType = "application/json";
    // Get Table and append results into StringBuilder.
    PagedEnumerable<TableDataList, BigQueryRow> result = client.ListRows(datasetId, tableId);
    StringBuilder sb = new StringBuilder();
    foreach (var row in result)
    {
        sb.Append($"{{\"title\" : \"{row["title"]}\", \"unique_words\":\"{row["unique_words"]}\"}}{Environment.NewLine}");
    }
    // Save stream to Google Cloud Storage.
    using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(sb.ToString())))
    {
        var obj = gcsClient.UploadObject(bucketName, fileName, contentType, stream);
    }
}

Go

Before trying this sample, follow the Go setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Go API reference documentation.

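// This snippet assumes an existing *bigquery.Client (client), a context.Context (ctx),
// and a destination URI string (gcsURI) defined elsewhere in the program.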
srcProject := "bigquery-public-data"
srcDataset := "samples"
srcTable := "shakespeare"

// For example, gcsURI = "gs://mybucket/shakespeare.csv"
gcsRef := bigquery.NewGCSReference(gcsURI)
gcsRef.FieldDelimiter = ","

extractor := client.DatasetInProject(srcProject, srcDataset).Table(srcTable).ExtractorTo(gcsRef)
extractor.DisableHeader = true
// You can choose to run the job in a specific location for more complex data locality scenarios.
// In this example, the source dataset and GCS bucket are in the US.
extractor.Location = "US"

job, err := extractor.Run(ctx)
if err != nil {
	return err
}
status, err := job.Wait(ctx)
if err != nil {
	return err
}
if err := status.Err(); err != nil {
	return err
}

Java

Before trying this sample, follow the Java setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Java API reference documentation.

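// Assumes an existing com.google.cloud.bigquery.Table instance (table), an export
// format string such as "CSV" (format), and a Cloud Storage destination URI (gcsUrl).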
Job job = table.extract(format, gcsUrl);
// Wait for the job to complete
try {
  Job completedJob =
      job.waitFor(
          RetryOption.initialRetryDelay(Duration.ofSeconds(1)),
          RetryOption.totalTimeout(Duration.ofMinutes(3)));
  if (completedJob != null && completedJob.getStatus().getError() == null) {
    // Job completed successfully
  } else {
    // Handle error case
  }
} catch (InterruptedException e) {
  // Handle interrupted wait
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Node.js API reference documentation.

// Imports the Google Cloud client libraries and the Node.js assert module
const BigQuery = require('@google-cloud/bigquery');
const Storage = require('@google-cloud/storage');
const assert = require('assert');

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const projectId = "your-project-id";
// const datasetId = "my_dataset";
// const tableId = "my_table";
// const bucketName = "my-bucket";
// const filename = "file.csv";

// Instantiates clients
const bigquery = new BigQuery({
  projectId: projectId,
});

const storage = new Storage({
  projectId: projectId,
});

// Exports data from the table into a Google Cloud Storage file
bigquery
  .dataset(datasetId)
  .table(tableId)
  .extract(storage.bucket(bucketName).file(filename))
  .then(results => {
    const job = results[0];

    // extract() waits for the job to finish
    assert.equal(job.status.state, 'DONE');
    console.log(`Job ${job.id} completed.`);

    // Check the job's status for errors
    const errors = job.status.errors;
    if (errors && errors.length > 0) {
      throw errors;
    }
  })
  .catch(err => {
    console.error('ERROR:', err);
  });

PHP

Before trying this sample, follow the PHP setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery PHP API reference documentation.

use Google\Cloud\BigQuery\BigQueryClient;
use Google\Cloud\Storage\StorageClient;
use Google\Cloud\Core\ExponentialBackoff;

/** Uncomment and populate these variables in your code */
// $projectId  = 'The Google project ID';
// $datasetId  = 'The BigQuery dataset ID';
// $tableId    = 'The BigQuery table ID';
// $bucketName = 'The Cloud Storage bucket Name';
// $objectName = 'The Cloud Storage object Name';
// $format     = 'The extract format, either "csv" or "json"';

$bigQuery = new BigQueryClient([
    'projectId' => $projectId,
]);
$dataset = $bigQuery->dataset($datasetId);
$table = $dataset->table($tableId);
// load the storage object
$storage = new StorageClient([
    'projectId' => $projectId,
]);
$destinationObject = $storage->bucket($bucketName)->object($objectName);
// create the extract job
$options = ['destinationFormat' => $format];
$extractConfig = $table->extract($destinationObject, $options);
$job = $table->runJob($extractConfig);
// poll the job until it is complete
$backoff = new ExponentialBackoff(10);
$backoff->execute(function () use ($job) {
    print('Waiting for job to complete' . PHP_EOL);
    $job->reload();
    if (!$job->isComplete()) {
        throw new Exception('Job has not yet completed', 500);
    }
});
// check if the job has errors
if (isset($job->info()['status']['errorResult'])) {
    $error = $job->info()['status']['errorResult']['message'];
    printf('Error running job: %s' . PHP_EOL, $error);
} else {
    print('Data extracted successfully' . PHP_EOL);
}

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.

# from google.cloud import bigquery
# client = bigquery.Client()
# bucket_name = 'my-bucket'
project = 'bigquery-public-data'
dataset_id = 'samples'
table_id = 'shakespeare'

destination_uri = 'gs://{}/{}'.format(bucket_name, 'shakespeare.csv')
dataset_ref = client.dataset(dataset_id, project=project)
table_ref = dataset_ref.table(table_id)

extract_job = client.extract_table(
    table_ref,
    destination_uri,
    # Location must match that of the source table.
    location='US')  # API request
extract_job.result()  # Waits for job to complete.

print('Exported {}:{}.{} to {}'.format(
    project, dataset_id, table_id, destination_uri))

Ruby

Before trying this sample, follow the Ruby setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Ruby API reference documentation.

# project_id   = "Your Google Cloud project ID"
# dataset_id   = "ID of the dataset containing table"
# table_id     = "ID of the table to export file data from"
# storage_path = "Storage path to export to, eg. gs://bucket/file.csv"

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new project: project_id
dataset  = bigquery.dataset dataset_id
table    = dataset.table table_id

puts "Exporting data to Cloud Storage file: #{storage_path}"
extract_job = table.extract_job storage_path

puts "Waiting for extract job to complete: #{extract_job.job_id}"
extract_job.wait_until_done!

puts "Data exported"

Avro export details

BigQuery expresses Avro formatted data in the following ways:

  • The resulting export files are Avro container files.
  • Each BigQuery row is represented as an Avro Record. Nested data is represented by nested Record objects.
  • REQUIRED fields are represented as the corresponding Avro types. For example, a BigQuery INTEGER type maps to an Avro LONG type.
  • NULLABLE fields are represented as an Avro Union of the corresponding type and "null".
  • REPEATED fields are represented as Avro arrays.
  • TIMESTAMP data types are represented as Avro LONG types.

The Avro format can't be used in combination with GZIP compression. To compress Avro data, use the bq command-line tool or the API and specify one of the supported compression types for Avro data: DEFLATE or SNAPPY.

Exporting data into one or more files

The destinationUris property indicates the location(s) and file name(s) where BigQuery should export your files.

BigQuery supports a single wildcard operator (*) in each URI. The wildcard can appear anywhere in the URI except as part of the bucket name. Using the wildcard operator instructs BigQuery to create multiple sharded files based on the supplied pattern. The wildcard operator is replaced with a number (starting at 0), left-padded to 12 digits. For example, a URI with a wildcard at the end of the file name would create files with 000000000000 appended to the first file, 000000000001 appended to the second file, and so on.

The following options are available for the destinationUris property. A Python sketch after these options shows how to pass a single wildcard URI, or a list of wildcard URIs, to an extract job.

Single URI

Use a single URI if you are exporting table data that is 1 GB or less. This option is the most common use case, as exported data is generally less than the 1 GB maximum value.

Property definition:

['gs://[YOUR_BUCKET]/file-name.json']

Creates:

gs://my-bucket/file-name.json
Single wildcard URI

Use a single wildcard URI if you think your exported data will be larger than the 1 GB maximum value. BigQuery shards your data into multiple files based on the provided pattern. The size of the exported files will vary.

If you use a wildcard in a URI component other than the file name, be sure the path component does not exist before exporting your data.

Property definition:

['gs://[YOUR_BUCKET]/file-name-*.json']

Creates:

gs://my-bucket/file-name-000000000000.json
gs://my-bucket/file-name-000000000001.json
gs://my-bucket/file-name-000000000002.json
...

Property definition:

['gs://[YOUR_BUCKET]/path-component-*/file-name.json']

Creates:

gs://my-bucket/path-component-000000000000/file-name.json
gs://my-bucket/path-component-000000000001/file-name.json
gs://my-bucket/path-component-000000000002/file-name.json
...
Multiple wildcard URIs

Use multiple wildcard URIs if you want to partition the export output. You would use this option if you're running a parallel processing job with a service like Cloud Dataproc. Determine how many workers are available to process the job, and create one URI per worker. BigQuery treats each URI location as a partition, and uses parallel processing to shard your data into multiple files in each location. You can use whatever pattern you'd like in your file name, assuming there is a single wildcard operator in each URI, each URI is unique, and the number of URIs does not exceed the quota policy.

When you pass more than one wildcard URI, BigQuery creates a special file at the end of each partition that indicates the "final" file in the set. This file name indicates how many shards BigQuery created.

For example, if your wildcard URI is gs://[YOUR_BUCKET]/file-name-<worker number>-*.json, and BigQuery creates 80 sharded files, the zero record file name is gs://my-bucket/file-name-<worker number>-000000000080.json. You can use this file name to determine that BigQuery created 80 sharded files (named 000000000000-000000000079).

Note that a zero record file might contain more than 0 bytes depending on the data format, such as when exporting data in CSV format with a column header.

String pattern:

gs://[YOUR_BUCKET]/file-name-<worker number>-*.json

Property definition:

['gs://my-bucket/file-name-1-*.json',
'gs://my-bucket/file-name-2-*.json',
'gs://my-bucket/file-name-3-*.json']

Creates:

This example assumes that BigQuery creates 80 sharded files in each partition.

gs://my-bucket/file-name-1-000000000000.json
gs://my-bucket/file-name-1-000000000001.json
...
gs://my-bucket/file-name-1-000000000080.json
gs://my-bucket/file-name-2-000000000000.json
gs://my-bucket/file-name-2-000000000001.json
...
gs://my-bucket/file-name-2-000000000080.json
gs://my-bucket/file-name-3-000000000000.json
gs://my-bucket/file-name-3-000000000001.json
...
gs://my-bucket/file-name-3-000000000080.json
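
As referenced above, the following minimal Python sketch shows both forms: a single wildcard URI and a list of wildcard URIs. The dataset, table, and bucket names are placeholders.

from google.cloud import bigquery

client = bigquery.Client()
table_ref = client.dataset('mydataset').table('mytable')

job_config = bigquery.ExtractJobConfig()
job_config.destination_format = 'NEWLINE_DELIMITED_JSON'

# Single wildcard URI: BigQuery shards the export into numbered files.
extract_job = client.extract_table(
    table_ref, 'gs://my-bucket/file-name-*.json',
    job_config=job_config, location='US')
extract_job.result()

# Multiple wildcard URIs: each URI becomes a partition of the output.
extract_job = client.extract_table(
    table_ref,
    ['gs://my-bucket/file-name-1-*.json',
     'gs://my-bucket/file-name-2-*.json',
     'gs://my-bucket/file-name-3-*.json'],
    job_config=job_config, location='US')
extract_job.result()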

Quota policy

For information on export job quotas, see Export jobs on the Quotas and Limits page.

Pricing

There is currently no charge for exporting data from BigQuery, but exports are subject to BigQuery's Quotas and Limits.

For more information on BigQuery pricing, see the Pricing page.

Once the data is exported, you are charged for storing the data in Google Cloud Storage.

For more information on Cloud Storage pricing, see Cloud Storage Pricing.
