Export table data to Cloud Storage

This page describes how to export or extract data from BigQuery tables to Cloud Storage.

After you've loaded your data into BigQuery, you can export the data in several formats. BigQuery can export up to 1 GB of data to a single file. If you are exporting more than 1 GB of data, you must export your data to multiple files. When you export your data to multiple files, the size of the files will vary.

You can use a service such as Dataflow to read data from BigQuery instead of manually exporting it. For more information about using Dataflow to read from and write to BigQuery, see BigQuery I/O in the Apache Beam documentation.

You can also export the results of a query by using the EXPORT DATA statement. You can use EXPORT DATA OPTIONS to export views to {storage_name}.

Export limitations

When you export data from BigQuery, note the following:

  • You cannot export table data to a local file, to Google Sheets, or to Google Drive. The only supported export location is Cloud Storage. For information on saving query results, see Downloading and saving query results.
  • You can export up to 1 GB of table data to a single file. If you are exporting more than 1 GB of data, use a wildcard to export the data into multiple files. When you export data to multiple files, the size of the files will vary. To limit the exported file size, you can partition your data and export each partition.
  • The generated file size when using the EXPORT DATA statement is not guaranteed.
  • The number of files generated by an export job can vary.
  • You cannot export nested and repeated data in CSV format. Nested and repeated data are supported for Avro, JSON, and Parquet exports.
  • When you export data in JSON format, INT64 (integer) data types are encoded as JSON strings to preserve 64-bit precision when the data is read by other systems.
  • You cannot export data from multiple tables in a single export job.
  • You cannot choose a compression type other than GZIP when you export data using the Google Cloud console.
  • When you export data to a Cloud Storage bucket configured with a retention policy, BigQuery might fail to write the files to the bucket. Configure the retention period to be longer than the duration of the export jobs.
  • When you export a table in JSON format, the symbols <, >, and & are converted by using the unicode notation \uNNNN, where N is a hexadecimal digit. For example, profit&loss becomes profit\u0026loss. This unicode conversion is done to avoid security vulnerabilities.
  • The order of exported table data is not guaranteed unless you use the EXPORT DATA statement and specify an ORDER BY clause in the query_statement.
  • BigQuery does not support Cloud Storage resource paths that include multiple consecutive slashes after the initial double slash. Cloud Storage object names can contain multiple consecutive slash ("/") characters. However, BigQuery converts multiple consecutive slashes into a single slash. For example, the following resource path, though valid in Cloud Storage, does not work in BigQuery: gs://bucket/my//object//name.

Before you begin

Grant Identity and Access Management (IAM) roles that give users the necessary permissions to perform each task in this document.

Required permissions

To perform the tasks in this document, you need the following permissions.

Permissions to export data from a BigQuery table

To export data from a BigQuery table, you need the bigquery.tables.export IAM permission.

Each of the following predefined IAM roles includes the bigquery.tables.export permission:

  • roles/bigquery.dataViewer
  • roles/bigquery.dataOwner
  • roles/bigquery.dataEditor
  • roles/bigquery.admin

Permissions to run an export job

To run an export job, you need the bigquery.jobs.create IAM permission.

Each of the following predefined IAM roles includes the permissions that you need in order to run an export job:

  • roles/bigquery.user
  • roles/bigquery.jobUser
  • roles/bigquery.admin

Permissions to write the data to the Cloud Storage bucket

To write the data to an existing Cloud Storage bucket, you need the following IAM permissions:

  • storage.objects.create
  • storage.objects.delete

Each of the following predefined IAM roles includes the permissions that you need in order to write the data to an existing Cloud Storage bucket:

  • roles/storage.objectAdmin
  • roles/storage.admin

For more information about IAM roles and permissions in BigQuery, see Predefined roles and permissions.

Location considerations

Colocate your Cloud Storage buckets for exporting data:
  • If your BigQuery dataset is in the EU multi-region, the Cloud Storage bucket containing the data that you export must be in the same multi-region or in a location that is contained within the multi-region. For example, if your BigQuery dataset is in the EU multi-region, the Cloud Storage bucket can be located in the europe-west1 Belgium region, which is within the EU.

    If your dataset is in the US multi-region, you can export data into a Cloud Storage bucket in any location.

  • If your dataset is in a region, your Cloud Storage bucket must be in the same region. For example, if your dataset is in the asia-northeast1 Tokyo region, your Cloud Storage bucket cannot be in the ASIA multi-region.
Develop a data management plan:

For more information about Cloud Storage locations, see Bucket Locations in the Cloud Storage documentation.

Move BigQuery data between locations

You cannot change the location of a dataset after it is created, but you can make a copy of the dataset. You cannot move a dataset from one location to another, but you can manually move (recreate) a dataset.

Export formats and compression types

BigQuery supports the following data formats and compression types for exported data.

Data format Supported compression types Details
CSV GZIP

You can control the CSV delimiter in your exported data by using the --field_delimiter bq command-line tool flag or the configuration.extract.fieldDelimiter extract job property.

Nested and repeated data is not supported.

JSON GZIP Nested and repeated data are supported.
Avro DEFLATE, SNAPPY

GZIP is not supported for Avro exports.

Nested and repeated data are supported. See Avro export details.

Parquet SNAPPY, GZIP, ZSTD

Nested and repeated data are supported. See Parquet export details.

Export data

You can export table data by:

  • Using the Google Cloud console
  • Using the bq extract command in the bq command-line tool
  • Submitting an extract job via the API or client libraries

Export table data

To export data from a BigQuery table:

Console

  1. Open the BigQuery page in the Google Cloud console.

    Go to the BigQuery page

  2. In the Explorer panel, expand your project and dataset, then select the table.

  3. In the details panel, click Export and select Export to Cloud Storage.

  4. In the Export table to Google Cloud Storage dialog:

    • For Select Google Cloud Storage location, browse for the bucket, folder, or file where you want to export the data.
    • For Export format, choose the format for your exported data: CSV, JSON (Newline Delimited), Avro, or Parquet.
    • For Compression, select a compression format or select None for no compression.
    • Click Export to export the table.

To check on the progress of the job, look near the top of the navigation for Job history for an Export job.

To export views to Cloud Storage, use EXPORT DATA OPTIONS statement.

SQL

Use the EXPORT DATA statement. The following example exports selected fields from a table named mydataset.table1:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, enter the following statement:

    EXPORT DATA
      OPTIONS (
        uri = 'gs://bucket/folder/*.csv',
        format = 'CSV',
        overwrite = true,
        header = true,
        field_delimiter = ';')
    AS (
      SELECT field1, field2
      FROM mydataset.table1
      ORDER BY field1
    );
    

  3. Click Run.

For more information about how to run queries, see Run an interactive query.

bq

Use the bq extract command with the --destination_format flag.

(Optional) Supply the --location flag and set the value to your location.

Other optional flags include:

  • --compression: The compression type to use for exported files.
  • --field_delimiter: The character that indicates the boundary between columns in the output file for CSV exports. Both \t and tab are allowed for tab delimiters.
  • --print_header: When specified, print header rows for formats that have headers such as CSV.
bq extract --location=location \
--destination_format format \
--compression compression_type \
--field_delimiter delimiter \
--print_header=boolean \
project_id:dataset.table \
gs://bucket/filename.ext

Where:

  • location is the name of your location. The --location flag is optional. For example, if you are using BigQuery in the Tokyo region, you can set the flag's value to asia-northeast1. You can set a default value for the location using the .bigqueryrc file.
  • format is the format for the exported data: CSV, NEWLINE_DELIMITED_JSON, AVRO, or PARQUET.
  • compression_type is a supported compression type for your data format. See Export formats and compression types.
  • delimiter is the character that indicates the boundary between columns in CSV exports. \t and tab are accepted names for tab.
  • boolean is true or false. When set to true, header rows are printed to the exported data if the data format supports headers. The default value is true.
  • project_id is your project ID.
  • dataset is the name of the source dataset.
  • table is the table you're exporting. If you use a partition decorator, then you must surround the table path with single quotation marks or escape the $ character.
  • bucket is the name of the Cloud Storage bucket to which you're exporting the data. The BigQuery dataset and the Cloud Storage bucket must be in the same location.
  • filename.ext is the name and extension of the exported data file. You can export to multiple files using a wildcard.

Examples:

For example, the following command exports mydataset.mytable into a gzip compressed file named myfile.csv. myfile.csv is stored in a Cloud Storage bucket named example-bucket.

bq extract \
--compression GZIP \
'mydataset.mytable' \
gs://example-bucket/myfile.csv

The default destination format is CSV. To export into JSON or Avro, use the destination_format flag and set it to either NEWLINE_DELIMITED_JSON or AVRO. For example:

bq extract \
--destination_format NEWLINE_DELIMITED_JSON \
'mydataset.mytable' \
gs://example-bucket/myfile.json

The following command exports mydataset.mytable into an Avro file that is compressed using Snappy. The file is named myfile.avro. myfile.avro is exported to a Cloud Storage bucket named example-bucket.

bq extract \
--destination_format AVRO \
--compression SNAPPY \
'mydataset.mytable' \
gs://example-bucket/myfile.avro

The following command exports a single partition of mydataset.my_partitioned_table into a CSV file in Cloud Storage:

bq extract \
--destination_format CSV \
'mydataset.my_partitioned_table$0' \
gs://example-bucket/single_partition.csv

API

To export data, create an extract job and populate the job configuration.

(Optional) Specify your location in the location property in the jobReference section of the job resource.

  1. Create an extract job that points to the BigQuery source data and the Cloud Storage destination.

  2. Specify the source table by using the sourceTable configuration object that contains the project ID, dataset ID, and table ID.

  3. The destination URI(s) property must be fully-qualified, in the format gs://bucket/filename.ext. Each URI can contain one '*' wildcard character and it must come after the bucket name.

  4. Specify the data format by setting the configuration.extract.destinationFormat property. For example, to export a JSON file, set this property to the value NEWLINE_DELIMITED_JSON.

  5. To check the job status, call jobs.get(job_id) with the ID of the job returned by the initial request.

    • If status.state = DONE, the job completed successfully.
    • If the status.errorResult property is present, the request failed, and that object will include information describing what went wrong.
    • If status.errorResult is absent, the job finished successfully, although there might have been some non-fatal errors. Non-fatal errors are listed in the returned job object's status.errors property.

API notes:

  • As a best practice, generate a unique ID and pass it as jobReference.jobId when calling jobs.insert to create a job. This approach is more robust to network failure because the client can poll or retry on the known job ID.

  • Calling jobs.insert on a given job ID is idempotent; in other words, you can retry as many times as you like on the same job ID, and at most one of those operations will succeed.

C#

Before trying this sample, follow the C# setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery C# API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.


using Google.Cloud.BigQuery.V2;
using System;

public class BigQueryExtractTable
{
    public void ExtractTable(
        string projectId = "your-project-id",
        string bucketName = "your-bucket-name")
    {
        BigQueryClient client = BigQueryClient.Create(projectId);
        // Define a destination URI. Use a single wildcard URI if you think
        // your exported data will be larger than the 1 GB maximum value.
        string destinationUri = $"gs://{bucketName}/shakespeare-*.csv";
        BigQueryJob job = client.CreateExtractJob(
            projectId: "bigquery-public-data",
            datasetId: "samples",
            tableId: "shakespeare",
            destinationUri: destinationUri
        );
        job = job.PollUntilCompleted().ThrowOnAnyError();  // Waits for the job to complete.
        Console.Write($"Exported table to {destinationUri}.");
    }
}

Go

Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

import (
	"context"
	"fmt"

	"cloud.google.com/go/bigquery"
)

// exportTableAsCompressedCSV demonstrates using an export job to
// write the contents of a table into Cloud Storage as CSV.
func exportTableAsCSV(projectID, gcsURI string) error {
	// projectID := "my-project-id"
	// gcsUri := "gs://mybucket/shakespeare.csv"
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, projectID)
	if err != nil {
		return fmt.Errorf("bigquery.NewClient: %v", err)
	}
	defer client.Close()

	srcProject := "bigquery-public-data"
	srcDataset := "samples"
	srcTable := "shakespeare"

	gcsRef := bigquery.NewGCSReference(gcsURI)
	gcsRef.FieldDelimiter = ","

	extractor := client.DatasetInProject(srcProject, srcDataset).Table(srcTable).ExtractorTo(gcsRef)
	extractor.DisableHeader = true
	// You can choose to run the job in a specific location for more complex data locality scenarios.
	// Ex: In this example, source dataset and GCS bucket are in the US.
	extractor.Location = "US"

	job, err := extractor.Run(ctx)
	if err != nil {
		return err
	}
	status, err := job.Wait(ctx)
	if err != nil {
		return err
	}
	if err := status.Err(); err != nil {
		return err
	}
	return nil
}

Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

import com.google.cloud.RetryOption;
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.Table;
import com.google.cloud.bigquery.TableId;
import org.threeten.bp.Duration;

public class ExtractTableToCsv {

  public static void runExtractTableToCsv() {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "bigquery-public-data";
    String datasetName = "samples";
    String tableName = "shakespeare";
    String bucketName = "my-bucket";
    String destinationUri = "gs://" + bucketName + "/path/to/file";
    // For more information on export formats available see:
    // https://cloud.google.com/bigquery/docs/exporting-data#export_formats_and_compression_types
    // For more information on Job see:
    // https://googleapis.dev/java/google-cloud-clients/latest/index.html?com/google/cloud/bigquery/package-summary.html

    String dataFormat = "CSV";
    extractTableToCsv(projectId, datasetName, tableName, destinationUri, dataFormat);
  }

  // Exports datasetName:tableName to destinationUri as raw CSV
  public static void extractTableToCsv(
      String projectId,
      String datasetName,
      String tableName,
      String destinationUri,
      String dataFormat) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      TableId tableId = TableId.of(projectId, datasetName, tableName);
      Table table = bigquery.getTable(tableId);

      Job job = table.extract(dataFormat, destinationUri);

      // Blocks until this job completes its execution, either failing or succeeding.
      Job completedJob =
          job.waitFor(
              RetryOption.initialRetryDelay(Duration.ofSeconds(1)),
              RetryOption.totalTimeout(Duration.ofMinutes(3)));
      if (completedJob == null) {
        System.out.println("Job not executed since it no longer exists.");
        return;
      } else if (completedJob.getStatus().getError() != null) {
        System.out.println(
            "BigQuery was unable to extract due to an error: \n" + job.getStatus().getError());
        return;
      }
      System.out.println(
          "Table export successful. Check in GCS bucket for the " + dataFormat + " file.");
    } catch (BigQueryException | InterruptedException e) {
      System.out.println("Table extraction job was interrupted. \n" + e.toString());
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Node.js API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

// Import the Google Cloud client libraries
const {BigQuery} = require('@google-cloud/bigquery');
const {Storage} = require('@google-cloud/storage');

const bigquery = new BigQuery();
const storage = new Storage();

async function extractTableToGCS() {
  // Exports my_dataset:my_table to gcs://my-bucket/my-file as raw CSV.

  /**
   * TODO(developer): Uncomment the following lines before running the sample.
   */
  // const datasetId = "my_dataset";
  // const tableId = "my_table";
  // const bucketName = "my-bucket";
  // const filename = "file.csv";

  // Location must match that of the source table.
  const options = {
    location: 'US',
  };

  // Export data from the table into a Google Cloud Storage file
  const [job] = await bigquery
    .dataset(datasetId)
    .table(tableId)
    .extract(storage.bucket(bucketName).file(filename), options);

  console.log(`Job ${job.id} created.`);

  // Check the job's status for errors
  const errors = job.status.errors;
  if (errors && errors.length > 0) {
    throw errors;
  }
}

PHP

Before trying this sample, follow the PHP setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery PHP API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

use Google\Cloud\BigQuery\BigQueryClient;

/**
 * Extracts the given table as json to given GCS bucket.
 *
 * @param string $projectId The project Id of your Google Cloud Project.
 * @param string $datasetId The BigQuery dataset ID.
 * @param string $tableId The BigQuery table ID.
 * @param string $bucketName Bucket name in Google Cloud Storage
 */
function extract_table(
    string $projectId,
    string $datasetId,
    string $tableId,
    string $bucketName
): void {
    $bigQuery = new BigQueryClient([
      'projectId' => $projectId,
    ]);
    $dataset = $bigQuery->dataset($datasetId);
    $table = $dataset->table($tableId);
    $destinationUri = "gs://{$bucketName}/{$tableId}.json";
    // Define the format to use. If the format is not specified, 'CSV' will be used.
    $format = 'NEWLINE_DELIMITED_JSON';
    // Create the extract job
    $extractConfig = $table->extract($destinationUri)->destinationFormat($format);
    // Run the job
    $job = $table->runJob($extractConfig);  // Waits for the job to complete
    printf('Exported %s to %s' . PHP_EOL, $table->id(), $destinationUri);
}

Python

Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

# from google.cloud import bigquery
# client = bigquery.Client()
# bucket_name = 'my-bucket'
project = "bigquery-public-data"
dataset_id = "samples"
table_id = "shakespeare"

destination_uri = "gs://{}/{}".format(bucket_name, "shakespeare.csv")
dataset_ref = bigquery.DatasetReference(project, dataset_id)
table_ref = dataset_ref.table(table_id)

extract_job = client.extract_table(
    table_ref,
    destination_uri,
    # Location must match that of the source table.
    location="US",
)  # API request
extract_job.result()  # Waits for job to complete.

print(
    "Exported {}:{}.{} to {}".format(project, dataset_id, table_id, destination_uri)
)

Ruby

Before trying this sample, follow the Ruby setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Ruby API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

require "google/cloud/bigquery"

def extract_table bucket_name = "my-bucket",
                  dataset_id  = "my_dataset_id",
                  table_id    = "my_table_id"

  bigquery = Google::Cloud::Bigquery.new
  dataset  = bigquery.dataset dataset_id
  table    = dataset.table    table_id

  # Define a destination URI. Use a single wildcard URI if you think
  # your exported data will be larger than the 1 GB maximum value.
  destination_uri = "gs://#{bucket_name}/output-*.csv"

  extract_job = table.extract_job destination_uri do |config|
    # Location must match that of the source table.
    config.location = "US"
  end
  extract_job.wait_until_done! # Waits for the job to complete

  puts "Exported #{table.id} to #{destination_uri}"
end

Avro export details

BigQuery expresses Avro formatted data in the following ways:

  • The resulting export files are Avro container files.
  • Each BigQuery row is represented as an Avro record. Nested data is represented by nested record objects.
  • REQUIRED fields are represented as the corresponding Avro types. For example, a BigQuery INTEGER type maps to an Avro LONG type.
  • NULLABLE fields are represented as an Avro union of the corresponding type and "null".
  • REPEATED fields are represented as Avro arrays.
  • TIMESTAMP data types are represented as timestamp-micros logical type (it annotates an Avro LONG type) by default in both Extract jobs and Export Data SQL. (Caution: you can add use_avro_logical_types=False to Export Data Options to disable the logical type so it uses string type instead on timestamp column, but in Extract Jobs, it always uses the Avro logical type.)
  • DATE data types are represented as date logical type (it annotates an Avro INT types) by default in Export Data SQL, but are represented as string type by default in Extract jobs. (Note: you can add use_avro_logical_types=False to Export Data Options to disable the logical type, or use the flag --use_avro_logical_types=True to enable the logical type in Extract jobs.)
  • TIME data types are represented as timestamp-micro logical type (it annotates an Avro LONG types) by default in Export Data SQL, but are represented as string type by default in Extract jobs. (Note: you can add use_avro_logical_types=False to Export Data Options to disable the logical type, or use the flag --use_avro_logical_types=True to enable the logical type in Extract jobs.)
  • DATETIME data types are represented as Avro STRING types (a string type with custom named logical type datetime) by default in Export Data SQL, but are represented as string type by default in Extract jobs. (Note: you can add use_avro_logical_types=False to Export Data Options to disable the logical type, or use the flag --use_avro_logical_types=True to enable logical type in Extract jobs.)
Parameterized NUMERIC(P[, S]) and BIGNUMERIC(P[, S]) data types transfer their precision and scale type parameters to the Avro decimal logical type.

The Avro format can't be used in combination with GZIP compression. To compress Avro data, use the bq command-line tool or the API and specify one of the supported compression types for Avro data: DEFLATE or SNAPPY.

Parquet export details

BigQuery converts GoogleSQL data types to the following Parquet data types:

BigQuery data type Parquet primitive type Parquet logical type
Integer INT64 NONE
Numeric FIXED_LEN_BYTE_ARRAY DECIMAL (precision = 38, scale = 9)
Numeric(P[, S]) FIXED_LEN_BYTE_ARRAY DECIMAL (precision = P, scale = S)
BigNumeric FIXED_LEN_BYTE_ARRAY DECIMAL (precision = 76, scale = 38)
BigNumeric(P[, S]) FIXED_LEN_BYTE_ARRAY DECIMAL (precision = P, scale = S)
Floating point FLOAT NONE
Boolean BOOLEAN NONE
String BYTE_ARRAY STRING (UTF8)
Bytes BYTE_ARRAY NONE
Date INT32 DATE
Datetime INT64 TIMESTAMP (isAdjustedToUTC = false, unit = MICROS)
Time INT64 TIME (isAdjustedToUTC = true, unit = MICROS)
Timestamp INT64 TIMESTAMP (isAdjustedToUTC = false, unit = MICROS)

The Parquet schema represents nested data as a group and repeated records as repeated groups. For more information about using nested and repeated data in BigQuery, see Specifying nested and repeated columns.

You can use the following workarounds for DATETIME types:

  • Load the file into a staging table. Then use a SQL query to cast the field to a DATETIME and save the result to a new table. For more information, see Changing a column's data type.
  • Provide a schema for the table by using the --schema flag in the load job. Define the datetime column as col:DATETIME.

Exporting data into one or more files

The destinationUris property indicates the one or more locations and filenames where BigQuery should export your files.

BigQuery supports a single wildcard operator (*) in each URI. The wildcard can appear anywhere in the URI except as part of the bucket name. Using the wildcard operator instructs BigQuery to create multiple sharded files based on the supplied pattern. The wildcard operator is replaced with a number (starting at 0), left-padded to 12 digits. For example, a URI with a wildcard at the end of the filename would create files with000000000000 appended to the first file, 000000000001 appended to the second file, and so on.

The following table describes several possible options for the destinationUris property:

destinationUris options
Single URI

Use a single URI if you are exporting table data that is 1 GB or less. This option is the most common use case, as exported data is generally less than the 1 GB maximum value. This option is not supported for the EXPORT DATA statement; you must use a single wildcard URI.

Property definition:

['gs://my-bucket/file-name.json']

Creates:

gs://my-bucket/file-name.json
Single wildcard URI

Use a single wildcard URI if you think your exported data will be larger than the 1 GB maximum value. BigQuery shards your data into multiple files based on the provided pattern. The size of the exported files will vary.

If you use a wildcard in a URI component other than the filename, be sure the path component does not exist before exporting your data.

Property definition:

['gs://my-bucket/file-name-*.json']

Creates:

gs://my-bucket/file-name-000000000000.json
gs://my-bucket/file-name-000000000001.json
gs://my-bucket/file-name-000000000002.json
...

Limit the exported file size

When you export more than 1 GB of data in a single export, you must use a wildcard to export the data into multiple files and the size of the files varies. If you need to limit the maximum size of each exported file, one option is to randomly partition your data and then export each partition to a file:

  1. Determine the number of partitions you need, which is equal to the total size of your data divided by the desired exported file size. For example, if you have 8,000 MB of data and you want each exported file to be approximately 20 MB, then you need 400 partitions.
  2. Create a new table that is partitioned and clustered by a new randomly generated column called export_id. The following example shows how to create a new processed_table from an existing table called source_table which requires n partitions to achieve the desired file size:

    CREATE TABLE my_dataset.processed_table
    PARTITION BY RANGE_BUCKET(export_id, GENERATE_ARRAY(0, n, 1))
    CLUSTER BY export_id
    AS (
      SELECT *, CAST(FLOOR(n*RAND()) AS INT64) AS export_id
      FROM my_dataset.source_table
    );
    
  3. For each integer i between 0 and n-1, run an EXPORT DATA statement on the following query:

    SELECT * EXCEPT(export_id)
    FROM my_dataset.processed_table
    WHERE export_id = i;
    

Extract compressed table

Go

Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

import (
	"context"
	"fmt"

	"cloud.google.com/go/bigquery"
)

// exportTableAsCompressedCSV demonstrates using an export job to
// write the contents of a table into Cloud Storage as compressed CSV.
func exportTableAsCompressedCSV(projectID, gcsURI string) error {
	// projectID := "my-project-id"
	// gcsURI := "gs://mybucket/shakespeare.csv"
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, projectID)
	if err != nil {
		return fmt.Errorf("bigquery.NewClient: %w", err)
	}
	defer client.Close()

	srcProject := "bigquery-public-data"
	srcDataset := "samples"
	srcTable := "shakespeare"

	gcsRef := bigquery.NewGCSReference(gcsURI)
	gcsRef.Compression = bigquery.Gzip

	extractor := client.DatasetInProject(srcProject, srcDataset).Table(srcTable).ExtractorTo(gcsRef)
	extractor.DisableHeader = true
	// You can choose to run the job in a specific location for more complex data locality scenarios.
	// Ex: In this example, source dataset and GCS bucket are in the US.
	extractor.Location = "US"

	job, err := extractor.Run(ctx)
	if err != nil {
		return err
	}
	status, err := job.Wait(ctx)
	if err != nil {
		return err
	}
	if err := status.Err(); err != nil {
		return err
	}
	return nil
}

Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.ExtractJobConfiguration;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.TableId;

// Sample to extract a compressed table
public class ExtractTableCompressed {

  public static void main(String[] args) {
    // TODO(developer): Replace these variables before running the sample.
    String projectName = "MY_PROJECT_NAME";
    String datasetName = "MY_DATASET_NAME";
    String tableName = "MY_TABLE_NAME";
    String bucketName = "MY-BUCKET-NAME";
    String destinationUri = "gs://" + bucketName + "/path/to/file";
    // For more information on export formats available see:
    // https://cloud.google.com/bigquery/docs/exporting-data#export_formats_and_compression_types
    String compressed = "gzip";
    // For more information on Job see:
    // https://googleapis.dev/java/google-cloud-clients/latest/index.html?com/google/cloud/bigquery/package-summary.html
    String dataFormat = "CSV";

    extractTableCompressed(
        projectName, datasetName, tableName, destinationUri, dataFormat, compressed);
  }

  public static void extractTableCompressed(
      String projectName,
      String datasetName,
      String tableName,
      String destinationUri,
      String dataFormat,
      String compressed) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      TableId tableId = TableId.of(projectName, datasetName, tableName);

      ExtractJobConfiguration extractConfig =
          ExtractJobConfiguration.newBuilder(tableId, destinationUri)
              .setCompression(compressed)
              .setFormat(dataFormat)
              .build();

      Job job = bigquery.create(JobInfo.of(extractConfig));

      // Blocks until this job completes its execution, either failing or succeeding.
      Job completedJob = job.waitFor();
      if (completedJob == null) {
        System.out.println("Job not executed since it no longer exists.");
        return;
      } else if (completedJob.getStatus().getError() != null) {
        System.out.println(
            "BigQuery was unable to extract due to an error: \n" + job.getStatus().getError());
        return;
      }
      System.out.println("Table extract compressed successful");
    } catch (BigQueryException | InterruptedException e) {
      System.out.println("Table extraction job was interrupted. \n" + e.toString());
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Node.js API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

// Import the Google Cloud client libraries
const {BigQuery} = require('@google-cloud/bigquery');
const {Storage} = require('@google-cloud/storage');

const bigquery = new BigQuery();
const storage = new Storage();

async function extractTableCompressed() {
  // Exports my_dataset:my_table to gcs://my-bucket/my-file as a compressed file.

  /**
   * TODO(developer): Uncomment the following lines before running the sample.
   */
  // const datasetId = "my_dataset";
  // const tableId = "my_table";
  // const bucketName = "my-bucket";
  // const filename = "file.csv";

  // Location must match that of the source table.
  const options = {
    location: 'US',
    gzip: true,
  };

  // Export data from the table into a Google Cloud Storage file
  const [job] = await bigquery
    .dataset(datasetId)
    .table(tableId)
    .extract(storage.bucket(bucketName).file(filename), options);

  console.log(`Job ${job.id} created.`);

  // Check the job's status for errors
  const errors = job.status.errors;
  if (errors && errors.length > 0) {
    throw errors;
  }
}

Python

Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

# from google.cloud import bigquery
# client = bigquery.Client()
# bucket_name = 'my-bucket'

destination_uri = "gs://{}/{}".format(bucket_name, "shakespeare.csv.gz")
dataset_ref = bigquery.DatasetReference(project, dataset_id)
table_ref = dataset_ref.table("shakespeare")
job_config = bigquery.job.ExtractJobConfig()
job_config.compression = bigquery.Compression.GZIP

extract_job = client.extract_table(
    table_ref,
    destination_uri,
    # Location must match that of the source table.
    location="US",
    job_config=job_config,
)  # API request
extract_job.result()  # Waits for job to complete.

Example use case

This example shows how you can export data to Cloud Storage.

Suppose you are streaming data to Cloud Storage from endpoint logs continuously. A daily snapshot is to be exported to Cloud Storage for backup and archival purposes. The best choice is an extract job subject to certain quotas and limitations.

Submit an extract job with the API or client libraries, passing in a unique ID as jobReference.jobId. Extract Jobs are asynchronous. Check the job status using the unique job ID used to create the job. The job completed successfully if status.status is DONE. If status.errorResult is present, the job failed and needs to be retried.

Batch data processing

Suppose a nightly batch job is used to load data by a fixed deadline. After this load job completes, a table with statistics is materialized from a query as described in the preceding section. Data from this table is retrieved and compiled into a PDF report and sent to a regulator.

Since the amount of data that needs to be read is small, use the tabledata.list API to retrieve all rows of the table in JSON dictionary format. If there is more than one page of data, the results have the pageToken property set. To retrieve the next page of results, make another tabledata.list call and include the token value as the pageToken parameter. If the API call fails with a 5xx error, retry with exponential backoff. Most 4xx errors are not retryable. For better decoupling of BigQuery export and report generation, results should be persisted to disk.

Quota policy

For information on export job quotas, see Export jobs on the Quotas and limits page.

Usage for export jobs are available in the INFORMATION_SCHEMA. The job entry in the JOBS_BY_* system tables for the export job contains a total_processed_bytes value that can be used to monitor the aggregate usage to ensure that it stays below 50 TB per-day. To learn how to query the INFORMATION_SCHEMA.JOBS view to get the total_processed_bytes value, see Get bytes processed by export jobs

View current quota usage

You can view your current usage of query, load, extract, or copy jobs by running an INFORMATION_SCHEMA query to view metadata about the jobs ran over a specified time period. You can compare your current usage against the quota limit to determine your quota usage for a particular type of job. The following example query uses the INFORMATION_SCHEMA.JOBS view to list the number of query, load, extract, and copy jobs by project:

SELECT
  sum(case  when job_type="QUERY" then 1 else 0 end) as QRY_CNT,
  sum(case  when job_type="LOAD" then 1 else 0 end) as LOAD_CNT,
  sum(case  when job_type="EXTRACT" then 1 else 0 end) as EXT_CNT,
  sum(case  when job_type="COPY" then 1 else 0 end) as CPY_CNT
FROM `region-eu`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE date(creation_time)= CURRENT_DATE()

You can set up a Cloud Monitoring alert policy that provides notification of the number of bytes exported.

  1. In the Google Cloud console, go to the Monitoring page.

    Go to Monitoring

  2. In the navigation pane, select Metrics explorer.

  3. In the MQL query editor, set up an alert to monitor your exported bytes per day, as seen in the following example:

    fetch consumer_quota
      | filter resource.service == 'bigquery.googleapis.com'
      | { metric serviceruntime.googleapis.com/quota/rate/net_usage
          | align delta_gauge(1m)
          | group_by [resource.project_id, metric.quota_metric, resource.location],
              sum(value.net_usage)
        ; metric serviceruntime.googleapis.com/quota/limit
          | filter metric.limit_name == 'ExtractBytesPerDay'
          | group_by [resource.project_id, metric.quota_metric, resource.location],
              sliding(1m), max(val()) }
      | ratio
      | every 1m
      | condition gt(val(), 0.01 '1')
    
  4. To set up your alert, click Run query.

For more information, see Alerting policies with MQL.

Pricing

For information on data export pricing, see the BigQuery pricing page.

Once the data is exported, you are charged for storing the data in Cloud Storage. For more information, see Cloud Storage pricing.

Table security

To control access to tables in BigQuery, see Introduction to table access controls.

What's next