Copy datasets
You can copy a dataset within a region or from one region to another, without having to extract, move, and reload data into BigQuery. You can copy the dataset one time or on a customized recurring schedule.
Quotas and limitations
BigQuery dataset copies are subject to the following quotas and limitations.
Copy quotas
Copying a dataset requires one copy job for each table in the dataset.
The following limits apply to BigQuery jobs for copying tables, including jobs that create a copy, clone, or snapshot of a standard table, table clone, or table snapshot. The limits apply to jobs created by using the Google Cloud console, the bq command-line tool, or the copy-type jobs.insert method. Copy jobs count toward these limits whether they succeed or fail.
Limit | Default | Notes |
---|---|---|
Copy jobs per destination table per day | See Table operations per day. | |
Copy jobs per day | 100,000 jobs | Your project can run up to 100,000 copy jobs per day. |
Cross-region copy jobs per destination table per day | 100 jobs | Your project can run up to 100 cross-region copy jobs for a destination table per day. |
Cross-region copy jobs per day | 2,000 jobs | Your project can run up to 2,000 cross-region copy jobs per day. |
Number of source tables to copy | 1,200 source tables | You can copy from up to 1,200 source tables per copy job. |
For information on viewing your current copy job usage, see Copy jobs - View current quota usage.
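One way to approximate your current copy job usage is to query the INFORMATION_SCHEMA jobs view with the BigQuery Python client. The following is a minimal sketch, not the official quota page: the project ID and the us region qualifier are assumptions, and the quota usage page remains the authoritative source.

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # hypothetical project ID

# Count today's copy jobs in the us region; adjust the region qualifier as needed.
query = """
    SELECT COUNT(*) AS copy_jobs_today
    FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
    WHERE job_type = 'COPY'
      AND creation_time >= TIMESTAMP(CURRENT_DATE())
"""
for row in client.query(query).result():
    print(f"Copy jobs so far today: {row.copy_jobs_today}")
```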
The following limits apply to copying datasets:
Limit | Default | Notes |
---|---|---|
Maximum number of tables in the source dataset | 20,000 tables | A source dataset can have up to 20,000 tables. |
Maximum number of tables that can be copied per run to a destination dataset in the same region | 20,000 tables | Your project can copy 20,000 tables per run to a destination dataset that is in the same region. |
Maximum number of tables that can be copied per run to a destination dataset in a different region | 1,000 tables | Your project can copy 1,000 tables per run to a destination dataset that is in a different region. For example, if you configure a cross-region copy of a dataset with 8,000 tables in it, then BigQuery Data Transfer Service automatically creates eight runs in a sequential manner. The first run copies 1,000 tables. Twenty-four hours later, the second run copies 1,000 tables. This process continues until all tables in the dataset are copied, up to the maximum of 20,000 tables per dataset. |
For more information, see Copy jobs.
General limitations
- You must create the destination dataset before you can create a transfer configuration for the dataset copy.
- For each dataset copy configuration, you can have one active copy at a time. Additional transfer runs are queued.
- Recurring copies cannot be scheduled more frequently than once every 12 hours.
Data type limitations
- Copying views is not supported.
- Copying external tables is not supported.
- Copying data in the write-optimized storage is not supported.
Support for copying tables encrypted with customer-managed encryption keys (CMEK) varies depending on whether you are copying within the same region or across regions.
In-region
Copying encrypted tables, including tables encrypted with CMEK, is supported for dataset copying within the same region.
Cross-region
Copying tables with default encryption across regions is supported. Copying CMEK-encrypted tables across regions is not currently supported; such tables are skipped when copying tables to the destination dataset.
Destination dataset limitations
Datasets protected with customer-managed encryption keys (CMEK) cannot be used as a destination when copying across regions. However, a CMEK-protected table is allowed as a destination when copying within a region.
Table limitations
- Copying partitioned tables is currently supported. However, appending data to a partitioned table is not supported.
- If a table exists in both the source dataset and the destination dataset, and the source table has not changed since the last successful copy, it is skipped. This is true even if the Overwrite destination tables checkbox is selected.
- When truncating tables, the dataset copy does not detect any changes made to the destination dataset before beginning the copy. All of the data in the destination dataset is refreshed (overwritten). Both the tables and the schema are overwritten.
- If you intend to overwrite the destination table when copying by using the overwrite_destination_table flag, both tables must have the same partitioning schema.
- While a data transfer is in progress, changes to the source tables might not be reflected in the destination tables if the table copy jobs have already started.
Before you begin
Before you create a dataset copy:
- Dataset copying uses features of the BigQuery Data Transfer Service. In the Google Cloud project for your destination dataset, verify that you have completed all actions required to enable the BigQuery Data Transfer Service.
- Create a BigQuery dataset as the destination dataset, in the same region or a different region from your source dataset. You specify a location for storing your BigQuery data when you create a dataset. Not all regions are supported yet for dataset copying (see Supported regions). The dataset name must be unique per project. A minimal Python sketch for creating the destination dataset appears after this list.
- Locate the ID of your source dataset that you want to copy, and the source project's ID.
- If you intend to set up transfer run notifications for Pub/Sub, you must have pubsub.topics.setIamPolicy permissions. Pub/Sub permissions are not required if you only set up email notifications. For more information, see BigQuery Data Transfer Service run notifications.
- If you intend to overwrite the destination table when copying by using the overwrite_destination_table flag, both tables must have the same partitioning schema.
- Grant Identity and Access Management (IAM) roles that give users the necessary permissions to perform each task in this document.
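For the destination dataset item in the preceding list, the following is a minimal sketch using the google-cloud-bigquery Python client. The project ID, dataset ID, and location are placeholders; choose a location that is supported for dataset copying.

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-destination-project")  # hypothetical project ID

# Create the destination dataset in the location (region or multi-region)
# where you want the copied data to live.
dataset = bigquery.Dataset("my-destination-project.my_destination_dataset")
dataset.location = "EU"  # assumed location; must be supported for dataset copying

dataset = client.create_dataset(dataset, exists_ok=True)
print(f"Created dataset {dataset.full_dataset_id} in {dataset.location}")
```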
Required permissions
To create a dataset copy, you need the following IAM permissions:
To create the copy transfer, you need the following on the project:
- bigquery.transfers.update
- bigquery.jobs.create
On the source dataset, you need the following:
- bigquery.datasets.get
- bigquery.tables.list
On the destination dataset, you need the following:
- bigquery.datasets.get
- bigquery.datasets.update
- bigquery.tables.create
- bigquery.tables.list
The predefined IAM role roles/bigquery.admin includes the permissions that you need to create a dataset copy.
For more information on IAM roles in BigQuery, see Predefined roles and permissions or the IAM permissions reference.
Setting up a dataset copy
To create a dataset copy:
Console
Option 1: Use the Copy button.
Go to the BigQuery page in the Google Cloud console.
In the Explorer panel, expand your project and select a dataset.
Expand the Actions option and click Open.
Click Copy. In the Copy dataset dialog that appears, do the following:
In the Dataset field, either create a new dataset or select an existing dataset ID from the list.
Dataset names within a project must be unique. The project and dataset can be in different regions, but not all regions are supported for cross-region dataset copying.
In the Location field, the location of the source dataset is displayed.
Optional: To overwrite both data and schema of the destination tables with the source tables, select the Overwrite destination tables checkbox.
To copy the dataset, click Copy.
To avoid additional storage costs, consider deleting the old dataset.
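If you prefer to remove the old dataset programmatically rather than through the console, a minimal sketch with the google-cloud-bigquery Python client might look like the following; the project and dataset IDs are placeholders.

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-source-project")  # hypothetical project ID

# Delete the old dataset and all tables in it; not_found_ok avoids an error
# if the dataset has already been removed.
client.delete_dataset(
    "my-source-project.old_dataset",  # hypothetical dataset ID
    delete_contents=True,
    not_found_ok=True,
)
print("Deleted old dataset.")
```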
Option 2: Use the BigQuery Data Transfer Service.
bq
Enter the bq mk command and supply the transfer creation flag --transfer_config. The following flags are also required:
- --project_id
- --data_source
- --target_dataset
- --display_name
- --params
bq mk --transfer_config --project_id=PROJECT_ID --data_source=cross_region_copy --target_dataset=DATASET --display_name=NAME --params='PARAMETERS'
The --data_source flag must be set to cross_region_copy for dataset copy commands. For a complete list of valid values for the --data_source flag, see the transfer-config flags in the bq command-line tool reference.
Replace the following:
- PROJECT_ID: your Google Cloud project ID. If --project_id isn't specified, the default project is used.
- DATASET: the BigQuery target dataset for the transfer configuration.
- NAME: the display name for the copy job or transfer configuration. The transfer name can be any value that lets you easily identify the transfer if you need to modify it later.
- PARAMETERS: the parameters for the created transfer configuration in JSON format.
Parameters for a dataset copy configuration are:
- source_dataset_id: the ID of the source dataset that you want to copy.
- source_project_id: the ID of the project that your source dataset is in.
- (Optional) overwrite_destination_table: include this flag if you want to truncate the tables of a previous copy and refresh all the data.
Appropriate formatting of the parameters varies, depending on your environment:
Linux: use single quotes to enclose the JSON string. For example:
'{"source_dataset_id":"mydataset","source_project_id":"mysourceproject","overwrite_destination_table":"true"}'
Windows command line: use double quotes to enclose the JSON string, and escape double quotes in the string with a backslash. For example:
"{\"source_dataset_id\":\"mydataset\",\"source_project_id\":\"mysourceproject\",\"overwrite_destination_table\":\"true\"}"
PowerShell: use single quotes to enclose the JSON string, and escape double quotes in the string with a backslash. For example:
'{\"source_dataset_id\":\"mydataset\",\"source_project_id\":\"mysourceproject\",\"overwrite_destination_table\":\"true\"}'
For example, the following command creates a dataset copy configuration named My Dataset Copy with a target dataset named mydataset and a project with the ID of myproject.
bq mk --transfer_config --project_id=myproject --data_source=cross_region_copy --target_dataset=mydataset --display_name='My Dataset Copy' --params='{"source_dataset_id":"123_demo_eu","source_project_id":"mysourceproject","overwrite_destination_table":"true"}'
For additional information on the bq mk command, see Command-line tool reference.
API
Use the projects.locations.transferConfigs.create method and supply an instance of the TransferConfig resource.
Java
Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.
Python
Before trying this sample, follow the Python setup instructions in the
BigQuery quickstart using
client libraries.
For more information, see the
BigQuery Python API
reference documentation.
pip install google-cloud-bigquery-datatransfer
. Then create a transfer configuration to copy the dataset.
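As a minimal sketch of that step, the following uses the DataTransferServiceClient. The project and dataset IDs are placeholders, and the schedule and the overwrite parameter are optional choices rather than requirements.

```python
from google.cloud import bigquery_datatransfer

transfer_client = bigquery_datatransfer.DataTransferServiceClient()

# Placeholder IDs; replace with your own projects and datasets.
destination_project_id = "my-destination-project"
destination_dataset_id = "my_destination_dataset"
source_project_id = "my-source-project"
source_dataset_id = "my_source_dataset"

transfer_config = bigquery_datatransfer.TransferConfig(
    destination_dataset_id=destination_dataset_id,
    display_name="My Dataset Copy",
    data_source_id="cross_region_copy",
    params={
        "source_project_id": source_project_id,
        "source_dataset_id": source_dataset_id,
        # Optional: truncate and refresh destination tables on each run.
        # "overwrite_destination_table": "true",
    },
    schedule="every 24 hours",
)

transfer_config = transfer_client.create_transfer_config(
    parent=transfer_client.common_project_path(destination_project_id),
    transfer_config=transfer_config,
)
print(f"Created transfer config: {transfer_config.name}")
```

The configuration is created in the project that contains the destination dataset, which is why the parent path uses the destination project ID.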
Viewing and refreshing dataset copy jobs
To see progress and view details of a dataset copy in Data transfers, do the following:
In the Google Cloud console, go to the BigQuery page.
Click Data transfers.
Select a transfer for which you want to view the transfer details.
On the Transfer details page, select a transfer run.
To refresh, click More > Refresh transfer.
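The same information is available programmatically. The following sketch uses the BigQuery Data Transfer Service Python client to list recent runs and request an on-demand refresh; the transfer configuration resource name is a placeholder for the name returned when you created the configuration.

```python
import datetime

from google.cloud import bigquery_datatransfer

transfer_client = bigquery_datatransfer.DataTransferServiceClient()

# Placeholder: the full resource name of your copy configuration.
transfer_config_name = "projects/1234567890/locations/us/transferConfigs/abcd1234"

# View progress: list recent runs for this copy configuration and their states.
for run in transfer_client.list_transfer_runs(parent=transfer_config_name):
    print(f"{run.name}: {run.state.name}")

# Refresh: request an on-demand run of the copy configuration.
response = transfer_client.start_manual_transfer_runs(
    bigquery_datatransfer.StartManualTransferRunsRequest(
        parent=transfer_config_name,
        requested_run_time=datetime.datetime.now(datetime.timezone.utc),
    )
)
for run in response.runs:
    print(f"Started run: {run.name}")
```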
Pricing
There is no charge for dataset copying during the beta period.
At general availability, data egress charges for data copied between regions or multi-regions are billed at the same rates as BigQuery data extraction pricing. There is no charge for datasets copied within a single region or multi-region.
If you copy a dataset within a single region (or within a multi-region), you do not incur egress charges. When you copy a dataset between regions (or between a region and a multi-region, or vice versa), you are charged for egress.
BigQuery sends compressed data for copying across regions, so the GB billed might be less than the size of your dataset.
For more information, see BigQuery pricing.
Supported regions
There are two types of locations:
A region is a specific geographic place, such as London.
A multi-region is a large geographic area, such as the United States, that contains two or more geographic places.
Datasets can be copied from region to region, single region to multi-region, multi-region to single region, or multi-region to multi-region.
Not all regions are currently supported for dataset copying. You can create dataset copies in regions where the BigQuery Data Transfer Service is currently supported, as shown below.
Regions
The following table lists the regions in the Americas where BigQuery is available.
Region description | Region name |
---|---|
Columbus, Ohio | us-east5 |
Iowa | us-central1 |
Las Vegas | us-west4 |
Los Angeles | us-west2 |
Montréal | northamerica-northeast1 |
Northern Virginia | us-east4 |
Oregon | us-west1 |
Salt Lake City | us-west3 |
São Paulo | southamerica-east1 |
Santiago | southamerica-west1 |
South Carolina | us-east1 |
Toronto | northamerica-northeast2 |
The following table lists the regions in Asia Pacific where BigQuery is available.
Region description | Region name |
---|---|
Delhi | asia-south2 |
Hong Kong | asia-east2 |
Jakarta | asia-southeast2 |
Melbourne | australia-southeast2 |
Mumbai | asia-south1 |
Osaka | asia-northeast2 |
Seoul | asia-northeast3 |
Singapore | asia-southeast1 |
Sydney | australia-southeast1 |
Taiwan | asia-east1 |
Tokyo | asia-northeast1 |
The following table lists the regions in Europe where BigQuery is available.
Region description | Region name |
---|---|
Belgium | europe-west1 |
Finland | europe-north1 |
Frankfurt | europe-west3 |
London | europe-west2 |
Madrid | europe-southwest1 |
Milan | europe-west8 |
Netherlands | europe-west4 |
Paris | europe-west9 |
Warsaw | europe-central2 |
Zürich | europe-west6 |
Multi-regions
The following table lists the multi-regions where BigQuery is available.
Multi-region description | Multi-region name |
---|---|
Data centers within member states of the European Union¹ | EU |
Data centers in the United States | US |
¹ Data located in the EU multi-region is not stored in the europe-west2 (London) or europe-west6 (Zürich) data centers.
Dataset security
To control access to datasets in BigQuery, see Controlling access to datasets. For information about data encryption, see Encryption at rest.
What's next
- For more information about using transfers, including getting information about a transfer configuration, listing transfer configurations, and viewing a transfer's run history, see Working with transfers.