
Copy datasets

You can copy a dataset within a region or from one region to another, without having to extract, move, and reload data into BigQuery. You can copy the dataset one time or on a customized recurring schedule.

Quotas and limitations

BigQuery dataset copies are subject to the following quotas and limitations.

Copy quotas

Copying a dataset requires one copy job for each table in the dataset.

The following limits apply to BigQuery jobs for copying tables, including jobs that create a copy, clone, or snapshot of a standard table, table clone, or table snapshot. The limits apply to jobs created by using the Google Cloud console, the bq command-line tool, or the copy-type jobs.insert method. Copy jobs count toward these limits whether they succeed or fail.

Limit | Default | Notes
Copy jobs per destination table per day | See Table operations per day. |
Copy jobs per day | 100,000 jobs | Your project can run up to 100,000 copy jobs per day.
Cross-region copy jobs per destination table per day | 100 jobs | Your project can run up to 100 cross-region copy jobs for a destination table per day.
Cross-region copy jobs per day | 2,000 jobs | Your project can run up to 2,000 cross-region copy jobs per day.
Number of source tables to copy | 1,200 source tables | You can copy from up to 1,200 source tables per copy job.

For information on viewing your current copy job usage, see Copy jobs - View current quota usage.
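
If you want a quick programmatic check of your recent copy-job activity against these limits, one option is to query the INFORMATION_SCHEMA jobs view. The following is a minimal sketch, assuming the google-cloud-bigquery Python client and a project whose jobs run in the us region; the region qualifier is an assumption that you should change to match your location.

from google.cloud import bigquery

client = bigquery.Client()  # uses your default project and credentials

# Count copy jobs created in this project over the last 24 hours.
# The `region-us` qualifier is an assumption; replace it with your region.
query = """
    SELECT COUNT(*) AS copy_jobs_last_24_hours
    FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
    WHERE job_type = 'COPY'
      AND creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
"""
for row in client.query(query).result():
    print(f"Copy jobs in the last 24 hours: {row.copy_jobs_last_24_hours}")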

The following limits apply to copying datasets:

Limit | Default | Notes
Maximum number of tables in the source dataset | 20,000 tables | A source dataset can have up to 20,000 tables.
Maximum number of tables that can be copied per run to a destination dataset in the same region | 20,000 tables | Your project can copy 20,000 tables per run to a destination dataset that is in the same region.
Maximum number of tables that can be copied per run to a destination dataset in a different region | 1,000 tables | Your project can copy 1,000 tables per run to a destination dataset that is in a different region. For example, if you configure a cross-region copy of a dataset with 8,000 tables in it, then BigQuery Data Transfer Service automatically creates eight runs in a sequential manner. The first run copies 1,000 tables. Twenty-four hours later, the second run copies 1,000 tables. This process continues until all tables in the dataset are copied, up to the maximum of 20,000 tables per dataset.

For more information, see Copy jobs.

General limitations

  • You must create the destination dataset before you can create a transfer configuration for the dataset copy.

  • For each dataset copy configuration, you can have one active copy at a time. Additional transfer runs are queued.

  • You cannot schedule the copy to recur more frequently than once every 12 hours.

Data type limitations

  • Copying views is not supported.
  • Copying external tables is not supported.
  • Copying data in write-optimized storage (the streaming buffer) is not supported.
  • Support for copying tables encrypted with customer-managed encryption keys (CMEK) depends on whether you are copying within the same region or across regions, as described in the following sections.

In-region

Copying encrypted tables, including tables encrypted with customer-managed encryption keys (CMEK), is supported when you copy a dataset within the same region.

Cross-region

Copying tables that use default encryption across regions is supported. Copying tables encrypted with customer-managed encryption keys (CMEK) across regions is not currently supported; CMEK-encrypted tables are skipped when tables are copied to the destination dataset.
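
If you want to know in advance which tables a cross-region copy would skip because of this limitation, you can inspect each table's encryption configuration. The following is a minimal sketch using the google-cloud-bigquery Python client; the source project and dataset IDs are placeholders.

from google.cloud import bigquery

client = bigquery.Client()

# Placeholder: replace with your source project and dataset.
source_dataset = "my-source-project.my_source_dataset"

# Tables protected with a customer-managed key are skipped by cross-region dataset copies.
for item in client.list_tables(source_dataset):
    table = client.get_table(item.reference)
    if table.encryption_configuration and table.encryption_configuration.kms_key_name:
        print(f"Would be skipped in a cross-region copy (CMEK): {table.full_table_id}")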

Destination dataset limitations

Datasets protected with customer-managed encryption keys (CMEK) cannot be used as a destination when copying across regions. However, a table with CMEK is allowed as a destination when copying within the same region.

Table limitations

  • Copying partitioned tables is currently supported. However, appending data to a partitioned table is not supported.

  • If a table exists in both the source dataset and the destination dataset, and the source table has not changed since the last successful copy, it is skipped. The table is skipped even if the Overwrite destination tables checkbox is selected.

  • When truncating tables, the dataset copy does not detect changes made to the destination dataset before the copy begins. All data in the destination dataset is refreshed (overwritten); both tables and schema are overwritten.

    • If you intend to overwrite the destination table when copying by using the overwrite_destination_table flag, both tables must have the same partitioning schema.
  • While a transfer is in progress, changes to the source tables might not be reflected in the destination tables if the corresponding table copy jobs have already started.

Before you begin

Before you create a dataset copy:

  • Dataset copying uses features of the BigQuery Data Transfer Service. In the Google Cloud project for your destination dataset, verify that you have completed all actions required to enable the BigQuery Data Transfer Service.
  • Create a BigQuery dataset as the destination dataset, in the same region as or a different region from your source dataset. You specify a location for storing your BigQuery data when you create a dataset. Not all regions are supported yet for dataset copying (see Supported regions). The dataset name must be unique per project. A minimal example of creating the destination dataset follows this list.
  • Locate the ID of your source dataset that you want to copy, and the source project's ID.
  • If you intend to set up transfer run notifications for Pub/Sub, you must have pubsub.topics.setIamPolicy permissions. Pub/Sub permissions are not required if you just set up email notifications. For more information, see BigQuery Data Transfer Service run notifications.
  • If you intend to overwrite the destination table when copying by using the overwrite_destination_table flag, both tables must have the same partitioning schema.
  • Grant Identity and Access Management (IAM) roles that give users the necessary permissions to perform each task in this document.
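
As noted in the list above, the destination dataset must exist before you set up the copy. The following is a minimal sketch of that step using the google-cloud-bigquery Python client; the project ID, dataset ID, and location are placeholders that you should replace with your own values.

from google.cloud import bigquery

client = bigquery.Client(project="my-destination-project")  # placeholder project ID

# The destination dataset must exist before you create the copy configuration.
dataset = bigquery.Dataset("my-destination-project.my_destination_dataset")  # placeholder IDs
dataset.location = "EU"  # placeholder; use any supported region or multi-region

dataset = client.create_dataset(dataset, exists_ok=True)
print(f"Destination dataset {dataset.full_dataset_id} is in {dataset.location}")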

Required permissions

To create a dataset copy, you need the following IAM permissions:

  • To create the copy transfer, you need the following on the project:

    • bigquery.transfers.update
    • bigquery.jobs.create
  • On the source dataset, you need the following:

    • bigquery.datasets.get
    • bigquery.tables.list
  • On the destination dataset, you need the following:

    • bigquery.datasets.get
    • bigquery.datasets.update
    • bigquery.tables.create
    • bigquery.tables.list

The predefined IAM role roles/bigquery.admin includes the permissions that you need in order to create a dataset copy.

For more information on IAM roles in BigQuery, see Predefined roles and permissions or the IAM permissions reference.

Setting up a dataset copy

To create a dataset copy:

Console

Option 1: Use the Copy button.

  1. Go to the BigQuery page in the Google Cloud console.

    Go to the BigQuery page

  2. In the Explorer panel, expand your project and select a dataset.

  3. Expand the Actions option and click Open.

  4. Click Copy. In the Copy dataset dialog that appears, do the following:

    1. In the Dataset field, either create a new dataset or select an existing dataset ID from the list.

      Dataset names within a project must be unique. The project and dataset can be in different regions, but not all regions are supported for cross-region dataset copying.

    2. In the Location field, the location of the source dataset is displayed.

    3. Optional: To overwrite both data and schema of the destination tables with the source tables, select the Overwrite destination tables checkbox.

    4. To copy the dataset, click Copy.

To avoid additional storage costs, consider deleting the old dataset.

Option 2: Use the BigQuery Data Transfer Service.

  1. Enable the BigQuery Data Transfer Service.
  2. Create a transfer for your data source.

bq

Enter the bq mk command and supply the transfer creation flag --transfer_config. The following flags are also required:

  • --project_id
  • --data_source
  • --target_dataset
  • --display_name
  • --params

    bq mk --transfer_config --project_id=PROJECT_ID --data_source=cross_region_copy --target_dataset=DATASET --display_name=NAME --params='PARAMETERS'
    

The --data_source must be set to cross_region_copy for dataset copy commands. For a complete list of valid values for the --data_source flag, see the transfer-config flags in the bq command-line tool reference.

Replace the following:

  • PROJECT_ID: your Google Cloud project ID. If --project_id isn't specified, the default project is used.
  • DATASET: the BigQuery target dataset for the transfer configuration.
  • NAME: the display name for the copy job or transfer configuration. The transfer name can be any value that lets you easily identify the transfer if you need to modify it later.
  • PARAMETERS: contains the parameters for the created transfer configuration in JSON format.

    Parameters for a dataset copy configuration are:

    • source_dataset_id: the ID of the source dataset you want to copy.
    • source_project_id: the ID of the project your source dataset is in.
    • (Optional) overwrite_destination_table: include this flag if you want to truncate the tables of a previous copy and refresh all the data.

    Appropriate formatting of the parameters varies, depending on your environment:

    • Linux: use single quotes to enclose the JSON string. For example:

      '{"source_dataset_id":"mydataset","source_project_id":"mysourceproject","overwrite_destination_table":"true"}'
      
    • Windows command line: use double quotes to enclose the JSON string, and escape double quotes in the string with a backslash. For example:

      "{\"source_dataset_id\":\"mydataset\",\"source_project_id\":\"mysourceproject\",\"overwrite_destination_table\":\"true\"}"
      
    • PowerShell: use single quotes to enclose the JSON string, and escape double quotes in the string with a backslash. For example:

      '{\"source_dataset_id\":\"mydataset\",\"source_project_id\":\"mysourceproject\",\"overwrite_destination_table\":\"true\"}'
      

For example, the following command creates a dataset copy configuration named My Dataset Copy with a target dataset named mydataset in the project with the ID myproject.

bq mk --transfer_config --project_id=myproject --data_source=cross_region_copy --target_dataset=mydataset --display_name='My Dataset Copy' --params='{"source_dataset_id":"123_demo_eu","source_project_id":"mysourceproject","overwrite_destination_table":"true"}'

For additional information on the bq mk command, see Command-line tool reference.

API

Use the projects.locations.transferConfigs.create method and supply an instance of the TransferConfig resource.

Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

import com.google.api.gax.rpc.ApiException;
import com.google.cloud.bigquery.datatransfer.v1.CreateTransferConfigRequest;
import com.google.cloud.bigquery.datatransfer.v1.DataTransferServiceClient;
import com.google.cloud.bigquery.datatransfer.v1.ProjectName;
import com.google.cloud.bigquery.datatransfer.v1.TransferConfig;
import com.google.protobuf.Struct;
import com.google.protobuf.Value;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Sample to copy dataset from another gcp project
public class CopyDataset {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    final String destinationProjectId = "MY_DESTINATION_PROJECT_ID";
    final String destinationDatasetId = "MY_DESTINATION_DATASET_ID";
    final String sourceProjectId = "MY_SOURCE_PROJECT_ID";
    final String sourceDatasetId = "MY_SOURCE_DATASET_ID";
    Map<String, Value> params = new HashMap<>();
    params.put("source_project_id", Value.newBuilder().setStringValue(sourceProjectId).build());
    params.put("source_dataset_id", Value.newBuilder().setStringValue(sourceDatasetId).build());
    TransferConfig transferConfig =
        TransferConfig.newBuilder()
            .setDestinationDatasetId(destinationDatasetId)
            .setDisplayName("Your Dataset Copy Name")
            .setDataSourceId("cross_region_copy")
            .setParams(Struct.newBuilder().putAllFields(params).build())
            .setSchedule("every 24 hours")
            .build();
    copyDataset(destinationProjectId, transferConfig);
  }

  public static void copyDataset(String projectId, TransferConfig transferConfig)
      throws IOException {
    try (DataTransferServiceClient dataTransferServiceClient = DataTransferServiceClient.create()) {
      ProjectName parent = ProjectName.of(projectId);
      CreateTransferConfigRequest request =
          CreateTransferConfigRequest.newBuilder()
              .setParent(parent.toString())
              .setTransferConfig(transferConfig)
              .build();
      TransferConfig config = dataTransferServiceClient.createTransferConfig(request);
      System.out.println("Copy dataset created successfully :" + config.getName());
    } catch (ApiException ex) {
      System.out.print("Copy dataset was not created." + ex.toString());
    }
  }
}

Python

Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.

Install the Python client for the BigQuery Data Transfer API with pip install google-cloud-bigquery-datatransfer. Then create a transfer configuration to copy the dataset.
from google.cloud import bigquery_datatransfer

transfer_client = bigquery_datatransfer.DataTransferServiceClient()

destination_project_id = "my-destination-project"
destination_dataset_id = "my_destination_dataset"
source_project_id = "my-source-project"
source_dataset_id = "my_source_dataset"
transfer_config = bigquery_datatransfer.TransferConfig(
    destination_dataset_id=destination_dataset_id,
    display_name="Your Dataset Copy Name",
    data_source_id="cross_region_copy",
    params={
        "source_project_id": source_project_id,
        "source_dataset_id": source_dataset_id,
    },
    schedule="every 24 hours",
)
transfer_config = transfer_client.create_transfer_config(
    parent=transfer_client.common_project_path(destination_project_id),
    transfer_config=transfer_config,
)
print(f"Created transfer config: {transfer_config.name}")

Viewing and refreshing dataset copy jobs

To see progress and view details of a dataset copy in Data transfers, do the following (a programmatic alternative is sketched after these steps):

  1. In the Google Cloud console, go to the BigQuery page.

    Go to the BigQuery page

  2. Click Data transfers.

  3. Select the transfer whose details you want to view.

    1. On the Transfer details page, select a transfer run.

    2. To refresh, click More > Refresh transfer.
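
You can also inspect and refresh a copy configuration programmatically. The following is a minimal sketch using the bigquery_datatransfer Python client; the configuration name is a placeholder, and in practice you would take it from the name field returned by create_transfer_config.

import datetime

from google.cloud import bigquery_datatransfer

transfer_client = bigquery_datatransfer.DataTransferServiceClient()

# Placeholder: the full resource name of your dataset copy configuration.
config_name = "projects/1234567890/locations/us/transferConfigs/abcdef"

# List recent transfer runs and their states.
for run in transfer_client.list_transfer_runs(parent=config_name):
    print(run.run_time, run.state.name)

# Start a run now, which is comparable to clicking Refresh transfer in the console.
response = transfer_client.start_manual_transfer_runs(
    bigquery_datatransfer.StartManualTransferRunsRequest(
        parent=config_name,
        requested_run_time=datetime.datetime.now(datetime.timezone.utc),
    )
)
for run in response.runs:
    print(f"Started run: {run.name}")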

Pricing

There is no charge for dataset copying during the beta period.

At general availability, data copied between regions or multi-regions incurs egress charges, billed at the same rates as BigQuery data extraction pricing. There is no charge for datasets copied within a single region or multi-region.

If you copy a dataset within a single region (or within a multi-region), there are no egress charges. When you copy a dataset between regions, or between a region and a multi-region in either direction, you are charged for egress.

BigQuery sends compressed data for copying across regions, so the GB billed might be less than the size of your dataset.

For more information, see BigQuery pricing.

Supported regions

There are two types of locations:

  • A region is a specific geographic place, such as London.

  • A multi-region is a large geographic area, such as the United States, that contains two or more geographic places.

Datasets can be copied from region to region, single region to multi-region, multi-region to single region, or multi-region to multi-region.

Not all regions are currently supported for dataset copying. You can create dataset copies in regions where the BigQuery Data Transfer Service is currently supported, as shown below.

Regions

The following table lists the regions in the Americas where BigQuery is available.

Region description | Region name | Details
Columbus, Ohio | us-east5 |
Iowa | us-central1 | Low CO2
Las Vegas | us-west4 |
Los Angeles | us-west2 |
Montréal | northamerica-northeast1 | Low CO2
Northern Virginia | us-east4 |
Oregon | us-west1 | Low CO2
Salt Lake City | us-west3 |
São Paulo | southamerica-east1 | Low CO2
Santiago | southamerica-west1 |
South Carolina | us-east1 |
Toronto | northamerica-northeast2 |

The following table lists the regions in Asia Pacific where BigQuery is available.

Region description | Region name | Details
Delhi | asia-south2 |
Hong Kong | asia-east2 |
Jakarta | asia-southeast2 |
Melbourne | australia-southeast2 |
Mumbai | asia-south1 |
Osaka | asia-northeast2 |
Seoul | asia-northeast3 |
Singapore | asia-southeast1 |
Sydney | australia-southeast1 |
Taiwan | asia-east1 |
Tokyo | asia-northeast1 |

The following table lists the regions in Europe where BigQuery is available.

Region description | Region name | Details
Belgium | europe-west1 | Low CO2
Finland | europe-north1 | Low CO2
Frankfurt | europe-west3 |
London | europe-west2 |
Madrid | europe-southwest1 | Low CO2
Milan | europe-west8 |
Netherlands | europe-west4 |
Paris | europe-west9 | Low CO2
Warsaw | europe-central2 |
Zürich | europe-west6 | Low CO2

Multi-regions

The following table lists the multi-regions where BigQuery is available.

Multi-region description | Multi-region name
Data centers within member states of the European Union¹ | EU
Data centers in the United States | US

¹ Data located in the EU multi-region is not stored in the europe-west2 (London) or europe-west6 (Zürich) data centers.

Dataset security

To control access to datasets in BigQuery, see Controlling access to datasets. For information about data encryption, see Encryption at rest.

What's next

  • For more information about using transfers, including getting information about a transfer configuration, listing transfer configurations, and viewing a transfer's run history, see Working with transfers.