Overview
This document describes the process of migrating data from Amazon Redshift to BigQuery.
The BigQuery Data Transfer Service allows you to copy your data from an Amazon Redshift data warehouse to BigQuery. The service engages migration agents in GKE and triggers an unload operation from Amazon Redshift to a staging area in an Amazon S3 bucket. The BigQuery Data Transfer Service then transfers your data from the Amazon S3 bucket to BigQuery.
This diagram shows the overall flow of data between an Amazon Redshift data warehouse and BigQuery during a migration.

Before you begin
This section describes the step-by-step process of setting up a data migration from Amazon Redshift to BigQuery. The steps are:
- Meet the prerequisites and set permissions on Google Cloud.
- Grant access to your Amazon Redshift cluster.
- Grant access to your Amazon S3 staging bucket.
- Set up the migration with the BigQuery Data Transfer Service. You will need:
  - The Amazon Redshift JDBC URL. Follow these instructions to obtain the JDBC URL.
  - The username and password of your Amazon Redshift database.
  - The AWS access key pair that you obtain in the step Grant access to your Amazon S3 bucket.
  - The URI of the Amazon S3 bucket you'll use to temporarily stage data. It's recommended that you set up a lifecycle policy for this bucket to avoid unnecessary charges; the recommended expiration time is 24 hours, which allows sufficient time to transfer all data to BigQuery. A sample lifecycle configuration follows this list.
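As a sketch of that lifecycle policy, the following AWS CLI call expires staged objects one day after they are created. The bucket name my-redshift-staging and the rule ID are placeholders; adjust the prefix filter if you stage data under a specific prefix.

# Expire staged objects one day after creation (bucket name is a placeholder).
aws s3api put-bucket-lifecycle-configuration \
--bucket my-redshift-staging \
--lifecycle-configuration '{
  "Rules": [
    {
      "ID": "expire-staging-objects",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "Expiration": {"Days": 1}
    }
  ]
}'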
Google Cloud requirements
To ensure a successful Amazon Redshift data warehouse migration, make sure you have met the following prerequisites on Google Cloud.
- Choose or create a Google Cloud project to store your migration data:
  - In the Cloud Console, go to the project selector page.
  - Select or create a Google Cloud project.
- Enable the BigQuery Data Transfer Service API: in the Google Cloud Console, click the Enable button on the BigQuery Data Transfer Service API page. BigQuery is automatically enabled in new projects; for an existing project, you may need to enable the BigQuery API. A green checkmark indicates that you've already enabled the API.
- Create a BigQuery dataset to store your data. You do not need to create any tables. (A sample bq command follows this list.)
- Allow pop-ups in your browser from bigquery.cloud.google.com so that you can view the permissions window when you set up the transfer. You must grant the BigQuery Data Transfer Service permission to manage your transfer.
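If you prefer the command line, the dataset can be created with a single bq call. This is a minimal sketch; the project ID myproject, dataset name mydataset, and location US are placeholders.

# Create an empty dataset to receive the migrated tables (names and location are placeholders).
bq --location=US mk --dataset myproject:mydataset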
Grant access to your Amazon Redshift cluster
Follow the instructions from Amazon to whitelist the following IP addresses. You can whitelist the IP addresses that correspond to your dataset's location, or you can whitelist all of the IP addresses in the table below. These Google-owned IP addresses are reserved for Amazon Redshift data migrations. An example security group rule follows the table.
| Region | IP addresses |
|---|---|
| US (us multi-region) | 35.185.196.212, 35.197.102.120, 35.185.224.10, 35.185.228.170, 35.197.5.235, 35.185.206.139, 35.197.67.234, 35.197.38.65, 35.185.202.229, 35.185.200.120 |
| Tokyo (asia-northeast1) | 34.85.11.246, 34.85.30.58, 34.85.8.125, 34.85.38.59, 34.85.31.67, 34.85.36.143, 34.85.32.222, 34.85.18.128, 34.85.23.202, 34.85.35.192 |
| EU (eu multi-region) | 34.76.156.158, 34.76.156.172, 34.76.136.146, 34.76.1.29, 34.76.156.232, 34.76.156.81, 34.76.156.246, 34.76.102.206, 34.76.129.246, 34.76.121.168 |
| London (europe-west2) | 35.189.119.113, 35.189.101.107, 35.189.69.131, 35.197.205.93, 35.189.121.178, 35.189.121.41, 35.189.85.30, 35.197.195.192 |
| Australia (australia-southeast1) | 35.189.33.150, 35.189.38.5, 35.189.29.88, 35.189.22.179, 35.189.20.163, 35.189.29.83, 35.189.31.141, 35.189.14.219 |
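How you open these addresses depends on how your cluster is networked. As one sketch, if your cluster is reachable through a VPC security group, you can add an ingress rule per address on the Redshift port. The security group ID, the port 5439, and the single address shown are assumptions; repeat the call for each address in your region's row.

# Allow one Google-owned migration IP to reach the cluster on the default Redshift port.
aws ec2 authorize-security-group-ingress \
--group-id sg-0123456789abcdef0 \
--protocol tcp \
--port 5439 \
--cidr 35.185.196.212/32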
Grant access to your Amazon S3 bucket
You need an Amazon S3 bucket to use as a staging area for transferring the Amazon Redshift data to BigQuery. Detailed instructions from Amazon can be found here.
We recommend that you create a dedicated Amazon IAM user and grant that user only read access to Amazon Redshift and read and write access to Amazon S3. This can be achieved by applying existing AWS managed policies; one possible setup is sketched below.
Then create an access key pair for that Amazon IAM user.
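A minimal sketch of this setup with the AWS CLI follows. The user name redshift-migration is a placeholder, and the two AWS managed policies shown are one way to satisfy the read-only Redshift plus read/write S3 requirement; substitute narrower custom policies if you need tighter scoping.

# Create a dedicated IAM user for the migration (user name is a placeholder).
aws iam create-user --user-name redshift-migration

# Read-only access to Amazon Redshift and read/write access to Amazon S3 via managed policies.
aws iam attach-user-policy \
--user-name redshift-migration \
--policy-arn arn:aws:iam::aws:policy/AmazonRedshiftReadOnlyAccess
aws iam attach-user-policy \
--user-name redshift-migration \
--policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess

# Generate the access key pair that you enter when setting up the transfer.
aws iam create-access-key --user-name redshift-migration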
Optional: workload control with a separate migration queue
You can define an Amazon Redshift queue for migration purposes to limit and separate the resources used for migration. This migration queue can be configured with a maximum query concurrency. You can then associate a certain migration user group with the queue, and use those credentials when setting up the migration to transfer data to BigQuery. The transfer service then only has access to the migration queue.
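As a rough sketch, such a queue can be defined through the cluster's wlm_json_configuration parameter. The parameter group name migration-params, the user group migration, and the concurrency value of 2 are all assumptions; the exact WLM JSON depends on your existing queue layout.

# Add a dedicated queue (concurrency 2) for a "migration" user group alongside a default queue.
aws redshift modify-cluster-parameter-group \
--parameter-group-name migration-params \
--parameters '[{"ParameterName":"wlm_json_configuration","ParameterValue":"[{\"user_group\":[\"migration\"],\"query_concurrency\":2},{\"query_concurrency\":5}]"}]'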
Setting up an Amazon Redshift migration
To set up an Amazon Redshift transfer:
Console
Go to the BigQuery web UI in the Cloud Console.
Click Transfers.
Click Add Transfer.
On the New Transfer page:
- For Source, choose Migration: Amazon Redshift.
- For Display name, enter a name for the transfer such as My migration. The display name can be any value that allows you to easily identify the transfer if you need to modify it later.
- For Destination dataset, choose the appropriate dataset.
Under Data Source Details, continue with specific details for your Amazon Redshift transfer.
- For JDBC connection url for Amazon Redshift, provide the JDBC url to access your Amazon Redshift cluster.
- For Username of your database, enter the username for the Amazon Redshift database you'd like to migrate.
- For Password of your database, enter the database password.
- For Access key ID and Secret access key, enter the access key pair you obtained from Grant access to your S3 bucket.
- For Amazon S3 URI, enter the URI of the S3 bucket you'll use as a staging area.
- For Amazon Redshift Schema, enter the Amazon Redshift Schema you're migrating.
- For Table name patterns, specify a name or a pattern for matching the table names in the schema. You can use regular expressions to specify the pattern in the form <table1Regex>;<table2Regex>. The pattern should follow Java regular expression syntax.

(Optional) In the Notification options section:
- Click the toggle to enable email notifications. When you enable this option, the transfer administrator receives an email notification when a transfer run fails.
- For Select a Pub/Sub topic, choose your topic name or click Create a topic. This option configures Pub/Sub run notifications for your transfer.
Click Save.
The Cloud Console will display all the transfer setup details, including a Resource name for this transfer.
Classic UI
Go to the classic BigQuery web UI.
Click Transfers.
Click Add Transfer.
On the New Transfer page:
- For Source, choose Migration: Amazon Redshift.
- For Display name, enter a name for the transfer such as My Migration. The display name can be any value that allows you to easily identify the transfer if you need to modify it later.
- For Destination dataset, choose the appropriate dataset.
- For JDBC connection url for Amazon Redshift, provide the JDBC url to access your Amazon Redshift cluster.
- For Username of your database, enter the username for the Amazon Redshift database you'd like to migrate.
- For Password of your database, enter the database password.
- For Access key ID and Secret access key, enter the access key pair you obtained from Grant access to your S3 bucket.
- For Amazon S3 URI, enter the URI of the S3 bucket you'll use as a staging area.
- For Amazon Redshift Schema, enter the Amazon Redshift Schema you would like to migrate tables from.
- For Table name patterns, specify a name or pattern for matching the table names in the database schema. You can use regular expressions to specify the pattern, in the form <table1Regex>;<table2Regex>. The pattern should follow Java regular expression syntax.
(Optional) Expand the Advanced section and configure run notifications for your transfer.
- For Pub/Sub topic, enter your topic name, for example, projects/myproject/topics/mytopic.
- Check Send email notifications to allow email notifications of transfer run failures.
- Don't check Disabled when setting up a transfer. See Working with transfers to disable existing transfers.
Click Add.
If prompted, click Allow to give the BigQuery Data Transfer Service permission to manage your transfer. You must allow browser pop-ups from bigquery.cloud.google.com to view the permissions window.

The web UI will display all the transfer setup details, including a Resource name for this transfer.
CLI
Enter the bq mk command and supply the transfer creation flag --transfer_config. The following flags are also required:

- --project_id
- --data_source
- --target_dataset
- --display_name
- --params

bq mk \
--transfer_config \
--project_id=project_id \
--data_source=data_source \
--target_dataset=dataset \
--display_name=name \
--params='parameters'
Where:

- project_id is your Google Cloud project ID. If --project_id isn't specified, the default project is used.
- data_source is the data source: redshift.
- dataset is the BigQuery target dataset for the transfer configuration.
- name is the display name for the transfer configuration. The transfer name can be any value that allows you to easily identify the transfer if you need to modify it later.
- parameters contains the parameters for the created transfer configuration in JSON format. For example: --params='{"param":"param_value"}'.
Parameters required for an Amazon Redshift transfer configuration are:

- jdbc_url: The JDBC connection URL, used to locate the Amazon Redshift cluster.
- database_username: The username to access your database to unload specified tables.
- database_password: The password used with the username to access your database to unload specified tables.
- access_key_id: The access key ID to sign requests made to AWS.
- secret_access_key: The secret access key used with the access key ID to sign requests made to AWS.
- s3_bucket: The Amazon S3 URI beginning with "s3://", specifying a prefix for temporary files to be used.
- redshift_schema: The Amazon Redshift schema that contains all the tables to be migrated.
- table_name_patterns: Table name patterns separated by a semicolon (;). The table pattern is a regular expression for table(s) to migrate. If not provided, all tables under the database schema are migrated.
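For instance, a hypothetical value such as ^orders$;^line_items_.* would migrate the table orders plus every table whose name starts with line_items_, and nothing else in the schema.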
For example, the following command creates an Amazon Redshift transfer named My Transfer with a target dataset named mydataset and a project with the ID of myproject.
bq mk \
--transfer_config \
--project_id=myproject \
--data_source=redshift \
--target_dataset=mydataset \
--display_name='My Transfer' \
--params='{"jdbc_url":"jdbc:postgresql://test-example-instance.sample.us-west-1.redshift.amazonaws.com:5439/dbname","database_username":"my_username","database_password":"1234567890","access_key_id":"A1B2C3D4E5F6G7H8I9J0","secret_access_key":"1234567890123456789012345678901234567890","s3_bucket":"s3://bucket/prefix","redshift_schema":"public","table_name_patterns":"table_name"}'
API
Use the projects.locations.transferConfigs.create method and supply an instance of the TransferConfig resource.
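For illustration, a REST call equivalent to the CLI example above might look like the following. The project ID myproject, the location us, and all parameter values are placeholders, and depending on your setup the request may also require an authorization code or service account.

# Create the transfer configuration through the REST API (all values are placeholders).
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://bigquerydatatransfer.googleapis.com/v1/projects/myproject/locations/us/transferConfigs" \
-d '{
  "displayName": "My Transfer",
  "dataSourceId": "redshift",
  "destinationDatasetId": "mydataset",
  "params": {
    "jdbc_url": "jdbc:postgresql://test-example-instance.sample.us-west-1.redshift.amazonaws.com:5439/dbname",
    "database_username": "my_username",
    "database_password": "1234567890",
    "access_key_id": "A1B2C3D4E5F6G7H8I9J0",
    "secret_access_key": "1234567890123456789012345678901234567890",
    "s3_bucket": "s3://bucket/prefix",
    "redshift_schema": "public",
    "table_name_patterns": "table_name"
  }
}'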
Quotas and limits
BigQuery has a load quota of 15 TB, per load job, per table. Internally, Amazon Redshift compresses the table data, so the exported table size will be larger than the table size reported by Amazon Redshift. If you are planning to migrate a table larger than 15 TB, please reach out to bq-dts-support@google.com first.
Note that costs can be incurred outside of Google by using this service. Please review the Amazon Redshift and Amazon S3 pricing pages for details.
Because of Amazon S3's consistency model, it's possible for some files to not be included in the transfer to BigQuery.
What's next
- Learn more about the BigQuery Data Transfer Service