Migrating data from Amazon Redshift

Overview

This document describes the process of migrating data from Amazon Redshift to BigQuery through public IPs.

If you'd like to transfer data from your Redshift instance through a virtual private cloud (VPC), on private IP addresses, see Migrating Amazon Redshift data with VPC.

The BigQuery Data Transfer Service allows you to copy your data from an Amazon Redshift data warehouse to BigQuery. The service engages migration agents in GKE and triggers an unload operation from Amazon Redshift to a staging area in an Amazon S3 bucket. The BigQuery Data Transfer Service then transfers your data from the Amazon S3 bucket to BigQuery.

This diagram shows the overall flow of data between an Amazon Redshift data warehouse and BigQuery during a migration.

Before you begin

This section outlines the step-by-step process of setting up a data migration from Amazon Redshift to BigQuery. The steps are:

  • Google Cloud requirements: meet the prerequisites and set permissions on Google Cloud.
  • Grant access to your Amazon Redshift cluster.
  • Grant access to the Amazon S3 bucket you'll use to temporarily stage data. Take note of the access key pair for use in a later step.
  • Set up the migration with the BigQuery Data Transfer Service. You will need:
    • The Amazon Redshift JDBC URL. Follow these instructions to obtain the JDBC URL.
    • The username and password of your Amazon Redshift database.
    • The AWS access key pair you obtain in the step Grant access to your S3 bucket.
    • The URI of the Amazon S3 bucket. We recommend that you set up a lifecycle policy for this bucket to avoid unnecessary charges. The recommended expiration time is 24 hours, which allows sufficient time to transfer all data to BigQuery. An example lifecycle configuration follows this list.
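
As an example, the following AWS CLI command applies such a lifecycle rule; the bucket name and rule ID are placeholders. S3 lifecycle expiration is expressed in whole days, so one day is the closest match for the recommended 24 hours.

# Expire staged migration files after 1 day (24 hours).
# "my-staging-bucket" is a placeholder bucket name.
aws s3api put-bucket-lifecycle-configuration \
--bucket my-staging-bucket \
--lifecycle-configuration '{
  "Rules": [
    {
      "ID": "expire-migration-staging",
      "Filter": {"Prefix": ""},
      "Status": "Enabled",
      "Expiration": {"Days": 1}
    }
  ]
}'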

Required permissions

Before creating an Amazon Redshift transfer:

  1. Ensure that the person creating the transfer has the following required permissions in BigQuery:

    • bigquery.transfers.update permissions to create the transfer
    • bigquery.datasets.update permissions on the target dataset

    The bigquery.admin predefined IAM role includes bigquery.transfers.update and bigquery.datasets.update permissions; a gcloud sketch for granting this role appears after this list. For more information on IAM roles in BigQuery Data Transfer Service, see Access control reference.

  2. Consult the documentation for Amazon S3 to ensure you have configured any permissions necessary to enable the transfer. At a minimum, the Amazon S3 source data must have the AWS managed policy AmazonS3ReadOnlyAccess applied to it.
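
For example, a minimal gcloud sketch for granting the predefined role; the project ID and user email are placeholders:

# Grant the bigquery.admin role, which includes both required permissions.
gcloud projects add-iam-policy-binding myproject \
--member='user:migration-admin@example.com' \
--role='roles/bigquery.admin'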

Google Cloud requirements

To ensure a successful Amazon Redshift data warehouse migration, make sure you have met the following prerequisites on Google Cloud.

  1. Choose or create a Google Cloud project to store your migration data.

    • In the Cloud Console, go to the project selector page.

    • Select or create a Cloud project.

  2. Enable the BigQuery Data Transfer Service API.

    In the Google Cloud Console, click the Enable button on the BigQuery Data Transfer Service API page.

    BigQuery is automatically enabled in new projects. For an existing project, you may need to enable the BigQuery API. A green checkmark indicates that you've already enabled the API.

  3. Create a BigQuery dataset to store your data. You do not need to create any tables.
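
If you prefer the command line for steps 2 and 3, the following sketch uses the gcloud and bq tools; the project and dataset names are examples:

# Enable the BigQuery Data Transfer Service API.
gcloud services enable bigquerydatatransfer.googleapis.com --project=myproject

# Create the destination dataset; you do not need to create any tables.
bq --location=US mk --dataset myproject:mydataset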

Grant access to your Amazon Redshift cluster

Follow the instructions from Amazon to allowlist the following IP addresses. You can allowlist only the IP addresses that correspond to your dataset's location, or you can allowlist all of the IP addresses in the tables below. These Google-owned IP addresses are reserved for Amazon Redshift data migrations.
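
For example, if your cluster is reachable through an EC2 security group, you can add an ingress rule per address with the AWS CLI. This is a minimal sketch: the security group ID is a placeholder, and it assumes the default Redshift port 5439.

# Allow one Google-owned migration IP address to reach the cluster;
# repeat for each address that you allowlist.
aws ec2 authorize-security-group-ingress \
--group-id sg-0123456789abcdef0 \
--protocol tcp \
--port 5439 \
--cidr 34.125.53.201/32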

Regional locations

Americas

  • Las Vegas (us-west4): 34.125.53.201, 34.125.69.174, 34.125.159.85, 34.125.152.1, 34.125.195.166, 34.125.50.249, 34.125.68.55, 34.125.91.116
  • Los Angeles (us-west2): 35.236.59.167, 34.94.132.139, 34.94.207.21, 34.94.81.187, 34.94.88.122, 35.235.101.187, 34.94.238.66, 34.94.195.77
  • Montréal (northamerica-northeast1): 34.95.20.253, 35.203.31.219, 34.95.22.233, 34.95.27.99, 35.203.12.23, 35.203.39.46, 35.203.116.49, 35.203.104.223
  • Northern Virginia (us-east4): 35.245.95.250, 35.245.126.228, 35.236.225.172, 35.245.86.140, 35.199.31.35, 35.199.19.115, 35.230.167.48, 35.245.128.132, 35.245.111.126, 35.236.209.21
  • Oregon (us-west1): 35.197.117.207, 35.197.86.233, 34.82.155.140, 35.247.28.48, 35.247.31.246, 35.247.106.13, 34.105.85.54
  • Salt Lake City (us-west3): 34.106.37.58, 34.106.85.113, 34.106.28.153, 34.106.64.121, 34.106.246.131, 34.106.56.150, 34.106.41.31, 34.106.182.92
  • São Paulo (southamerica-east1): 35.199.88.228, 34.95.169.140, 35.198.53.30, 34.95.144.215, 35.247.250.120, 35.247.255.158, 34.95.231.121, 35.198.8.157
  • South Carolina (us-east1): 35.196.207.183, 35.237.231.98, 104.196.102.222, 35.231.13.201, 34.75.129.215, 34.75.127.9, 35.229.36.137, 35.237.91.139

Europe

  • Belgium (europe-west1): 35.240.36.149, 35.205.171.56, 34.76.234.4, 35.205.38.234, 34.77.237.73, 35.195.107.238, 35.195.52.87, 34.76.102.189
  • Finland (europe-north1): 35.228.35.94, 35.228.183.156, 35.228.211.18, 35.228.146.84, 35.228.103.114, 35.228.53.184, 35.228.203.85, 35.228.183.138
  • Frankfurt (europe-west3): 35.246.153.144, 35.198.80.78, 35.246.181.106, 35.246.211.135, 34.89.165.108, 35.198.68.187, 35.242.223.6, 34.89.137.180
  • London (europe-west2): 35.189.119.113, 35.189.101.107, 35.189.69.131, 35.197.205.93, 35.189.121.178, 35.189.121.41, 35.189.85.30, 35.197.195.192
  • Netherlands (europe-west4): 35.204.237.173, 35.204.18.163, 34.91.86.224, 34.90.184.136, 34.91.115.67, 34.90.218.6, 34.91.147.143, 34.91.253.1
  • Zürich (europe-west6): 34.65.205.160, 34.65.121.140, 34.65.196.143, 34.65.9.133, 34.65.156.193, 34.65.216.124, 34.65.233.83, 34.65.168.250

Asia Pacific

  • Hong Kong (asia-east2): 34.92.245.180, 35.241.116.105, 35.220.240.216, 35.220.188.244, 34.92.196.78, 34.92.165.209, 35.220.193.228, 34.96.153.178
  • Jakarta (asia-southeast2): 34.101.79.105, 34.101.129.32, 34.101.244.197, 34.101.100.180, 34.101.109.205, 34.101.185.189, 34.101.179.27, 34.101.197.251
  • Mumbai (asia-south1): 34.93.67.112, 35.244.0.1, 35.200.245.13, 35.200.203.161, 34.93.209.130, 34.93.120.224, 35.244.10.12, 35.200.186.100
  • Osaka (asia-northeast2): 34.97.94.51, 34.97.118.176, 34.97.63.76, 34.97.159.156, 34.97.113.218, 34.97.4.108, 34.97.119.140, 34.97.30.191
  • Seoul (asia-northeast3): 34.64.152.215, 34.64.140.241, 34.64.133.199, 34.64.174.192, 34.64.145.219, 34.64.136.56, 34.64.247.158, 34.64.135.220
  • Singapore (asia-southeast1): 34.87.12.235, 34.87.63.5, 34.87.91.51, 35.198.197.191, 35.240.253.175, 35.247.165.193, 35.247.181.82, 35.247.189.103
  • Sydney (australia-southeast1): 35.189.33.150, 35.189.38.5, 35.189.29.88, 35.189.22.179, 35.189.20.163, 35.189.29.83, 35.189.31.141, 35.189.14.219
  • Taiwan (asia-east1): 35.221.201.20, 35.194.177.253, 34.80.17.79, 34.80.178.20, 34.80.174.198, 35.201.132.11, 35.201.223.177, 35.229.251.28, 35.185.155.147, 35.194.232.172
  • Tokyo (asia-northeast1): 34.85.11.246, 34.85.30.58, 34.85.8.125, 34.85.38.59, 34.85.31.67, 34.85.36.143, 34.85.32.222, 34.85.18.128, 34.85.23.202, 34.85.35.192

Multi-regional locations

  • EU (data centers within member states of the European Union¹): 34.76.156.158, 34.76.156.172, 34.76.136.146, 34.76.1.29, 34.76.156.232, 34.76.156.81, 34.76.156.246, 34.76.102.206, 34.76.129.246, 34.76.121.168
  • US (data centers in the United States): 35.185.196.212, 35.197.102.120, 35.185.224.10, 35.185.228.170, 35.197.5.235, 35.185.206.139, 35.197.67.234, 35.197.38.65, 35.185.202.229, 35.185.200.120

¹ Data located in the EU multi-region is not stored in the europe-west2 (London) or europe-west6 (Zürich) data centers.

Grant access to your Amazon S3 bucket

You will need an S3 bucket to use as a staging area for transferring the Amazon Redshift data to BigQuery. Detailed instructions from Amazon can be found here.

  1. We recommend that you create a dedicated Amazon IAM user, and grant that user only Read access to Redshift and Read and Write access to S3. This can be achieved by applying existing AWS managed policies, for example AmazonRedshiftReadOnlyAccess and AmazonS3FullAccess. A CLI sketch follows this list.

  2. Create an Amazon IAM user access key pair.
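
For illustration, the following AWS CLI sketch performs both steps; the user name redshift-migration is an example, and the two managed policies correspond to the access described above:

# Create a dedicated user for the migration.
aws iam create-user --user-name redshift-migration

# Read-only access to Amazon Redshift.
aws iam attach-user-policy \
--user-name redshift-migration \
--policy-arn arn:aws:iam::aws:policy/AmazonRedshiftReadOnlyAccess

# Read and write access to Amazon S3.
aws iam attach-user-policy \
--user-name redshift-migration \
--policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess

# Create the access key pair; record the AccessKeyId and SecretAccessKey
# from the output for use when setting up the transfer.
aws iam create-access-key --user-name redshift-migration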

Optional: workload control with a separate migration queue

You can define an Amazon Redshift queue for migration purposes to limit and separate the resources used for migration. You can configure this migration queue with a maximum concurrent query count, associate a certain migration user group with the queue, and then use those credentials when setting up the migration to transfer data to BigQuery. The transfer service then only has access to the migration queue. A sketch of this configuration follows.
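
As a minimal sketch with the AWS CLI, assuming an existing parameter group named migration-params and a Redshift user group named migration_group (both are example names): the first WLM queue limits migration_group to two concurrent queries, and all other queries fall through to the default queue.

# Define a two-queue WLM configuration: a migration queue with
# concurrency 2 for migration_group, and a default queue.
aws redshift modify-cluster-parameter-group \
--parameter-group-name migration-params \
--parameters '[{"ParameterName":"wlm_json_configuration","ParameterValue":"[{\"user_group\":[\"migration_group\"],\"query_concurrency\":2},{\"query_concurrency\":5}]"}]'

Because wlm_json_configuration is a static parameter, the cluster must be rebooted for the change to take effect.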

Setting up an Amazon Redshift transfer

To set up an Amazon Redshift transfer:

Console

  1. Go to the BigQuery page in the Cloud Console.

  2. Click Transfers.

  3. Click Add Transfer.

  4. On the New Transfer page:

    • For Source, choose Migration: Amazon Redshift.
    • For Display name, enter a name for the transfer such as My migration. The display name can be any value that allows you to easily identify the transfer if you need to modify it later.
    • For Destination dataset, choose the appropriate dataset.

  5. Under Data Source Details, continue with specific details for your Amazon Redshift transfer.

    • For JDBC connection url for Amazon Redshift, provide the JDBC url to access your Amazon Redshift cluster.
    • For Username of your database, enter the username for the Amazon Redshift database you'd like to migrate.
    • For Password of your database, enter the database password.
    • For Access key ID and Secret access key, enter the access key pair you obtained from Grant access to your S3 bucket.
    • For Amazon S3 URI, enter the URI of the S3 bucket you'll use as a staging area.
    • For Amazon Redshift Schema, enter the Amazon Redshift Schema you're migrating.
    • For Table name patterns, specify a name or a pattern for matching the table names in the schema. You can use regular expressions to specify the pattern in the form: <table1Regex>;<table2Regex>. The pattern should follow Java regular expression syntax. For example, the pattern sales.*;orders selects tables matching sales.* as well as the table orders.

    • (Optional) In the Notification options section:

      • Click the toggle to enable email notifications. When you enable this option, the transfer administrator receives an email notification when a transfer run fails.
      • For Select a Pub/Sub topic, choose your topic name or click Create a topic. This option configures Pub/Sub run notifications for your transfer.

  6. Click Save.

  7. The Cloud Console will display all the transfer setup details, including a Resource name for this transfer.

bq

Enter the bq mk command and supply the transfer creation flag --transfer_config. The following flags are also required:

  • --project_id
  • --data_source
  • --target_dataset
  • --display_name
  • --params
bq mk \
--transfer_config \
--project_id=project_id \
--data_source=data_source \
--target_dataset=dataset \
--display_name=name \
--params='parameters'

Where:

  • project_id is your Google Cloud project ID. If --project_id isn't specified, the default project is used.
  • data_source is the data source: redshift.
  • dataset is the BigQuery target dataset for the transfer configuration.
  • name is the display name for the transfer configuration. The transfer name can be any value that allows you to easily identify the transfer if you need to modify it later.
  • parameters contains the parameters for the created transfer configuration in JSON format. For example: --params='{"param":"param_value"}'.

Parameters required for an Amazon Redshift transfer configuration are:

  • jdbc_url: The JDBC connection URL, used to locate the Amazon Redshift cluster.
  • database_username: The username to access your database to unload specified tables.
  • database_password: The password used with the username to access your database to unload specified tables.
  • access_key_id: The access key ID to sign requests made to AWS.
  • secret_access_key: The secret access key used with the access key ID to sign requests made to AWS.
  • s3_bucket: The Amazon S3 URI, beginning with "s3://", that specifies a prefix for the temporary staging files.
  • redshift_schema: The Amazon Redshift schema that contains all the tables to be migrated.
  • table_name_patterns: Table name patterns separated by a semicolon (;). The table pattern is a regular expression for table(s) to migrate. If not provided, all tables under the database schema will be migrated.

For example, the following command creates an Amazon Redshift transfer named My Transfer with a target dataset named mydataset and a project with the ID of google.com:myproject.

bq mk \
--transfer_config \
--project_id=myproject \
--data_source=redshift \
--target_dataset=mydataset \
--display_name='My Transfer' \
--params='{"jdbc_url":"jdbc:postgresql://test-example-instance.sample.us-west-1.redshift.amazonaws.com:5439/dbname","database_username":"my_username","database_password":"1234567890","access_key_id":"A1B2C3D4E5F6G7H8I9J0","secret_access_key":"1234567890123456789012345678901234567890","s3_bucket":"s3://bucket/prefix","redshift_schema":"public","table_name_patterns":"table_name"}'
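
After the configuration is created, you can confirm it with bq ls; the location value is an example:

bq ls \
--transfer_config \
--transfer_location=us \
--project_id=myproject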

API

Use the projects.locations.transferConfigs.create method and supply an instance of the TransferConfig resource.
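
For example, a minimal curl sketch that posts the same configuration as the bq example above; the project ID, location, and credential values are placeholders:

# Create the transfer configuration via the REST API.
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://bigquerydatatransfer.googleapis.com/v1/projects/myproject/locations/us/transferConfigs" \
-d '{
  "displayName": "My Transfer",
  "dataSourceId": "redshift",
  "destinationDatasetId": "mydataset",
  "params": {
    "jdbc_url": "jdbc:postgresql://test-example-instance.sample.us-west-1.redshift.amazonaws.com:5439/dbname",
    "database_username": "my_username",
    "database_password": "1234567890",
    "access_key_id": "A1B2C3D4E5F6G7H8I9J0",
    "secret_access_key": "1234567890123456789012345678901234567890",
    "s3_bucket": "s3://bucket/prefix",
    "redshift_schema": "public",
    "table_name_patterns": "table_name"
  }
}'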

Quotas and limits

BigQuery has a load quota of 15 TB, per load job, per table. Internally, Amazon Redshift compresses the table data, so the exported table size will be larger than the table size reported by Amazon Redshift. If you are planning to migrate a table larger than 15 TB, please reach out to bq-dts-support@google.com first.
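
To see which tables approach this limit, you can query Redshift's SVV_TABLE_INFO system view, which reports each table's compressed size in 1 MB blocks. A sketch using psql, with placeholder connection values:

# List tables by compressed size (MB) as reported by Amazon Redshift.
psql "host=test-example-instance.sample.us-west-1.redshift.amazonaws.com port=5439 dbname=dbname user=my_username" \
-c 'SELECT "table", size AS size_mb FROM svv_table_info ORDER BY size DESC;'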

Note that costs can be incurred outside of Google by using this service. Please review the Amazon Redshift and Amazon S3 pricing pages for details.

Because of Amazon S3's consistency model, it's possible that some files will not be included in the transfer to BigQuery.

What's next