Copying datasets

You can copy a dataset within a region or from one region to another, without extracting, moving, and reloading data into BigQuery. You can copy the dataset one time or on a customized recurring schedule.

Before you begin

Before you create a dataset copy:

  • Dataset copying uses features of the BigQuery Data Transfer Service. In the Google Cloud project for your destination dataset, verify that you have completed all actions required to enable the BigQuery Data Transfer Service.
  • If you are creating the transfer by using the classic BigQuery web UI, allow pop-ups in your browser from bigquery.cloud.google.com so that you can view the permissions window. You must grant the BigQuery Data Transfer Service permission to manage your dataset copy, which is a type of transfer.
  • Create a BigQuery dataset as the destination dataset, in the same region as or a different region from your source dataset. You specify a location for storing your BigQuery data when you create a dataset. Not all regions are supported yet for dataset copying (see Location limitations). The dataset name must be unique per project. For a programmatic sketch of this step, see the example after this list.
  • Locate the ID of your source dataset that you wish to copy, and the source project's ID.
  • If you intend to set up transfer run notifications for Pub/Sub, you must have pubsub.topics.setIamPolicy permissions. Pub/Sub permissions are not required if you just set up email notifications. For more information, see BigQuery Data Transfer Service run notifications.
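
As noted above, you must create the destination dataset before configuring the copy. The following is a minimal sketch using the google-cloud-bigquery Python client library; the project ID, dataset ID, and region shown are placeholders, not values from this guide:

    from google.cloud import bigquery

    # Placeholder project and dataset IDs; substitute your own values.
    client = bigquery.Client(project="my-destination-project")
    dataset = bigquery.Dataset("my-destination-project.my_destination_dataset")

    # The location is fixed when the dataset is created and cannot be changed later.
    dataset.location = "asia-northeast1"  # Tokyo; see Location limitations

    dataset = client.create_dataset(dataset)  # raises Conflict if it already exists
    print("Created dataset {} in {}".format(dataset.full_dataset_id, dataset.location))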

Required permissions

Before creating a dataset copy, ensure that the person creating the dataset copy has the following required permissions in BigQuery:

  1. The bigquery.transfers.update permission to create the copy transfer.

  2. The bigquery.tables.list permission on the source dataset.

    The following predefined, project-level IAM roles contain the bigquery.tables.list permission:

    • bigquery.dataViewer
    • bigquery.dataEditor
    • bigquery.dataOwner
    • bigquery.metadataViewer
    • bigquery.user
  3. The bigquery.tables.list and bigquery.tables.create permissions on the destination dataset.

    The following predefined, project-level IAM roles contain the bigquery.tables.list and bigquery.tables.create permissions:

    • bigquery.dataEditor
    • bigquery.dataOwner
  4. Alternatively, the bigquery.admin predefined, project-level IAM role includes all the permissions required for a dataset copy. For more information on IAM roles in BigQuery, see Access control.
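
As an optional sanity check before you create the transfer, you can confirm the bigquery.tables.list permission on both datasets by attempting to list their tables. This is a minimal sketch using the google-cloud-bigquery Python client library; the project and dataset IDs are placeholders:

    from google.cloud import bigquery

    client = bigquery.Client(project="my-destination-project")  # placeholder ID

    # Each call exercises bigquery.tables.list on the named dataset;
    # a 403 Forbidden error means the caller lacks that permission.
    for dataset_id in ("my-source-project.my_source_dataset",
                       "my-destination-project.my_destination_dataset"):
        tables = list(client.list_tables(dataset_id))
        print("{}: listed {} tables".format(dataset_id, len(tables)))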

Setting up a dataset copy

To create a dataset copy:

Console

Option 1: Use the Copy Dataset icon.

  1. Go to the BigQuery web UI.

  2. Select the name of the source dataset that you wish to copy.

  3. Click the Copy Dataset icon.

  4. In the Copy dataset dialog box, select the destination project ID and dataset ID. The project and dataset can be in different regions, but not all regions are supported for cross-region dataset copying (see Location limitations). Dataset names within a project must be unique.

  5. Optionally, check the Overwrite destination table box if you wish to refresh (overwrite) all the data in the destination dataset. Both tables and schema will be overwritten.

  6. Consider deleting the old dataset to avoid additional storage costs.

Option 2: Use the Transfers UI.

  1. Go to the BigQuery web UI.

  2. Click Transfers.

  3. Click + CREATE A TRANSFER.

  4. On the Create Transfer page:

    • In the Source type section, for Source, choose Dataset Copy.

    • In the Transfer config name section, for Display name, enter a name for the transfer such as My Transfer. The transfer name can be any value that allows you to easily identify the transfer if you need to modify it later.

    • In the Schedule options section, for Schedule, leave the default value (Start now) or click Start at a set time.

      • For Repeats, choose an option for how often to run the transfer. Options include:

        • Daily (default)
        • Weekly
        • Monthly
        • Custom
        • On-demand

        If you choose an option other than Daily, additional options are available. For example, if you choose Weekly, an option appears for you to select the day of the week.

      • For Start date and run time, enter the date and time to start the transfer. If you choose Start now, this option is disabled.

    • For Destination dataset, choose the dataset you created to store your data.

    • For Source dataset, enter the name of the dataset you wish to copy.

    • For Source project, enter the ID of the project your source dataset is in.

    • (Optional) Check the Overwrite destination table box if you wish to refresh (overwrite) all data in the destination dataset. Both tables and schema will be overwritten.

    • (Optional) In the Notification options section:

      • Click the toggle to enable email notifications. When you enable this option, the transfer administrator receives an email notification when a transfer run fails.
      • For Select a Pub/Sub topic, choose your topic name or click Create a topic to create one. This option configures Pub/Sub run notifications for your transfer.

  5. Click Save.

  6. Consider deleting the old dataset to avoid additional storage costs.

Classic UI

  1. Go to the classic BigQuery web UI.

  2. Click Transfers.

  3. Click Add Transfer.

  4. On the New Transfer page:

    • For Source, choose Dataset Copy.
    • For Display name, enter a name for the transfer such as My Dataset Copy. The display name can be any value that allows you to easily identify the transfer if you need to modify it later.
    • (Optional) For Schedule, you can leave the default value of Daily (every 24 hours, based on creation time), or click Edit to change the time. You can also change the interval to Weekly, Monthly, or Custom. When selecting Custom, a Cron-like time specification is expected, for example every 12 hours. The shortest allowed period is 12 hours.

    • For Destination dataset, choose the appropriate dataset.

    • For Source dataset, enter the name of your source dataset.

    • For Source project, select the project your source dataset is in.

    • (Optional) Expand the Advanced section and configure run notifications for your transfer.

    • For Cloud Pub/Sub topic, enter your Cloud Pub/Sub topic name, for example, projects/myproject/topics/mytopic.

    • Check Send email notifications to allow email notifications of transfer run failures.

  5. Click Add.

  6. Consider deleting the old dataset to avoid additional storage costs.

CLI

Enter the bq mk command and supply the transfer creation flag --transfer_config. The following flags are also required:

  • --project_id
  • --data_source
  • --target_dataset
  • --display_name
  • --params

    bq mk --transfer_config --project_id=[PROJECT_ID] --data_source=[DATA_SOURCE] --target_dataset=[DATASET] --display_name=[NAME] --params='[PARAMETERS]'
    

Where:

  • --project_id is your Google Cloud project ID. If --project_id isn't specified, the default project is used.
  • --data_source is the data source: cross_region_copy.
  • --target_dataset is the BigQuery target dataset for the transfer configuration.
  • --display_name is the display name for the copy job, or transfer configuration. The transfer name can be any value that allows you to easily identify the transfer if you need to modify it later.
  • --params contains the parameters for the created transfer configuration in JSON format. For example: --params='{"param":"param_value"}'. For dataset copying, you must supply the source_dataset_id, the source_project_id, and optionally the overwrite_destination_table parameters.

Parameters for a dataset copy configuration are:

  • source_dataset_id: The ID of the source dataset you wish to copy.
  • source_project_id: The ID of the project your source dataset is in.
  • (Optional) overwrite_destination_table: Include this parameter if you wish to truncate the tables of a previous copy and refresh all the data.

For example, the following command creates a dataset copy configuration named My Dataset Copy with a target dataset named 123_demo_tokyo in a project with the ID of myproject.

bq mk --transfer_config --project_id=myproject --data_source=cross_region_copy --target_dataset=123_demo_tokyo --display_name='My Dataset Copy' --params='{"source_dataset_id":"123_demo_eu","source_project_id":"mysourceproject","overwrite_destination_table":"true"}'

API

Use the projects.locations.transferConfigs.create method and supply an instance of the TransferConfig resource.
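
For example, the following is a minimal sketch using the google-cloud-bigquery-datatransfer Python client library; the project and dataset IDs are placeholders:

    from google.cloud import bigquery_datatransfer

    transfer_client = bigquery_datatransfer.DataTransferServiceClient()

    # Placeholder IDs; substitute your own projects and datasets.
    destination_project_id = "my-destination-project"

    transfer_config = bigquery_datatransfer.TransferConfig(
        destination_dataset_id="my_destination_dataset",
        display_name="My Dataset Copy",
        data_source_id="cross_region_copy",
        params={
            "source_project_id": "my-source-project",
            "source_dataset_id": "my_source_dataset",
        },
    )

    # The transfer config is created in the project that owns the destination dataset.
    transfer_config = transfer_client.create_transfer_config(
        parent=transfer_client.common_project_path(destination_project_id),
        transfer_config=transfer_config,
    )
    print("Created transfer config: {}".format(transfer_config.name))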

Viewing and refreshing dataset copy jobs

You can see progress and view details of a dataset copy under Transfers.

Console

Dataset copy refresh runs occur on the schedule you specify when you create the transfer; the default is every 24 hours. To request an immediate run, click the More menu and then click Refresh Now.

Classic UI

Dataset copy refresh runs occur on the schedule you specify; the default is every 24 hours. To request an immediate run, click Refresh Now.

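You can also request an immediate refresh run programmatically. The following is a minimal sketch using the google-cloud-bigquery-datatransfer Python client library; the transfer configuration name is a placeholder:

    import datetime

    from google.cloud import bigquery_datatransfer

    transfer_client = bigquery_datatransfer.DataTransferServiceClient()

    # Placeholder; use the name of your dataset copy configuration, in the form
    # projects/PROJECT_NUMBER/locations/LOCATION/transferConfigs/CONFIG_ID.
    transfer_config_name = "projects/1234/locations/us/transferConfigs/abc123"

    # Requesting a run time of "now" triggers an immediate refresh run.
    response = transfer_client.start_manual_transfer_runs(
        bigquery_datatransfer.StartManualTransferRunsRequest(
            parent=transfer_config_name,
            requested_run_time=datetime.datetime.now(datetime.timezone.utc),
        )
    )
    for run in response.runs:
        print("Started run: {}".format(run.name))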

Pricing

There is no charge for dataset copying during the beta period.

At general availability, data copied between regions will be billed at the same rates as pricing for Compute Engine network egress between regions.

BigQuery sends compressed data for copying across regions, so the GB billed may be less than the size of your dataset.

Quotas and limitations

The following limits apply to copying datasets in BigQuery. The limits apply to copy jobs created by using the command-line tool (CLI), the console, or the classic BigQuery web UI. The limits also apply to copy jobs submitted programmatically by using the copy-type jobs.insert API method.

All standard BigQuery usage charges for storage and querying apply on copied data. See BigQuery pricing for more information.

BigQuery dataset copies are subject to the following limitations:

Quotas

In-region

Copying datasets within the same region follows the same quota for all BigQuery copy jobs.

The following limits apply to copying tables in BigQuery. The limits apply to jobs created automatically by copying data using the command-line tool, the console, or the classic BigQuery web UI. The limits also apply to copy jobs submitted programmatically by using the copy-type jobs.insert API method.

  • Copy jobs per destination table per day — 1,000 (including failures)
  • Copy jobs per project per day — 100,000 (including failures)

Cross-region

  • Copy jobs per destination table per day — 100 (including failures)
  • Copy jobs per project per day — 2,000 (including failures)

General limitations

  • You must create the destination dataset before you can create a transfer configuration for the dataset copy.

  • For each dataset copy configuration, you can have one active copy at a time. Additional transfer runs will be queued.

Data type limitations

  • Copying views is not supported.
  • Copying external tables is not supported.
  • Copying storage in a streaming buffer is not supported.
  • Support for copying tables encrypted with customer-managed keys depends on whether you are copying within the same region or across regions.

In-region

Copying encrypted tables, including tables encrypted with customer-managed keys (CMEK), is supported for dataset copying within the same region.

Cross-region

Copying tables with the default encryption across regions is supported. Copying tables encrypted with customer-managed keys (CMEK) across regions is not currently supported. CMEK-encrypted tables will be skipped when copying tables to the destination dataset.

Source dataset limitations

The source dataset can contain a maximum of 20,000 tables. The number of tables that can be copied per run depends on whether you are copying within the same region or across regions.

In-region

  • The source dataset can contain a maximum of 20,000 tables.

  • A maximum of 20,000 tables can be copied per run to the destination dataset.

Cross-region

  • The source dataset can contain a maximum of 20,000 tables.

  • A maximum of 1,000 tables can be copied per run to the destination dataset.

Example

If you configure a cross-region copy of a dataset with 8,000 tables in it, the BigQuery Data Transfer Service automatically creates eight sequential runs. The first run copies 1,000 tables; 24 hours later, another run copies 1,000 more tables, and so on, until all tables in the dataset are copied, up to the maximum of 20,000 tables per dataset.

Table limitations

  • Copying partitioned tables is currently supported. However, appending data to a partitioned table is not supported.

  • If a table exists in the source dataset and in the destination dataset, and it has not changed since the last successful copy, it is skipped. This is true even if the Overwrite destination table box is checked.

  • When truncating tables, the dataset copy does not detect any changes made to the destination dataset before beginning the copy. All the data in the destination dataset will be refreshed (overwritten). Both tables and schema will be overwritten.

Location limitations

Datasets can be copied from region to region, single region to multi-region, multi-region to single region, or multi-region to multi-region.

Not all regions are currently supported for dataset copying. You can create dataset copies in regions where the BigQuery Data Transfer Service is currently supported, as shown below.

Regional locations

Region Name            Region Description
Europe
europe-west2           London
Asia Pacific
asia-northeast1        Tokyo
asia-southeast1        Singapore
australia-southeast1   Sydney

Multi-regional locations

Multi-Region Name      Multi-Region Description
EU                     European Union¹
US                     United States

¹ Data located in the EU multi-region is not stored in the europe-west2 (London) data center.

What's next

  • For information on using transfers, including getting information about a transfer configuration, listing transfer configurations, and viewing a transfer's run history, see Working with transfers.