Migrating Amazon Redshift data with VPC

Overview

This document explains how to migrate data from Amazon Redshift to BigQuery using a virtual private cloud (VPC) network.

If you'd like to transfer data from your Amazon Redshift instance through public IPs, follow the standard Amazon Redshift migration instructions instead.

If you have a private Amazon Redshift instance in AWS, you can migrate that data to BigQuery by using VPC peering. To enable this feature, you will specify the VPC and reserved IP range when setting up the migration.

  • You will need to set up a virtual private network (VPN) between the Amazon Redshift VPC network and the Google Cloud VPC network.
  • Through the VPN, the migration agent running in the Google Cloud VPC will trigger an unload operation from Amazon Redshift to a staging area in an Amazon S3 bucket.
  • Then the BigQuery Data Transfer Service transfers your data from the Amazon S3 bucket to BigQuery.

The following diagram shows the VPC communications and the overall flow of data between a private Amazon Redshift instance and BigQuery during a migration.

Before you begin

This section outlines the step-by-step process of setting up a data migration from a private Amazon Redshift instance to BigQuery. The steps are:

  • Google Cloud requirements: Meet the prerequisites and set permissions on Google Cloud.
  • Set up a VPN between Google Cloud and Amazon Redshift.
  • Grant access to your Amazon Redshift cluster.
  • Grant access to the Amazon S3 bucket you'll use to temporarily stage data. Take note of the access key pair for use in a later step.
  • Set up the migration with the BigQuery Data Transfer Service. You will need:
    • The VPC and reserved IP range in Amazon Redshift.
    • The Amazon Redshift JDBC URL. Follow these instructions to obtain the JDBC URL.
    • The username and password of your Amazon Redshift database.
    • The AWS access key pair you obtained in the step: Grant access to your S3 bucket.
    • The URI of the Amazon S3 bucket. We recommend that you set up a Lifecycle policy for this bucket to avoid unnecessary charges. The recommended expiration time is 24 hours to allow sufficient time to transfer all data to BigQuery.
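
The lifecycle recommendation above can be sketched as a small Python helper. This is a minimal illustration, not the documented procedure: the rule ID, the prefix, and the bucket name are hypothetical, and because Amazon S3 lifecycle expiration is expressed in whole days, the 24-hour recommendation is mapped to 1 day. The commented-out boto3 call assumes boto3 is installed and AWS credentials are configured.

```python
import json


def staging_lifecycle_rules(prefix: str = "", days: int = 1) -> dict:
    """Build a lifecycle configuration that expires staged objects after `days` days."""
    if days < 1:
        raise ValueError("S3 lifecycle expiration is expressed in whole days (minimum 1)")
    return {
        "Rules": [
            {
                "ID": "expire-redshift-staging-data",  # hypothetical rule name
                "Filter": {"Prefix": prefix},  # empty prefix applies to the whole bucket
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


config = staging_lifecycle_rules(prefix="redshift-staging/")  # hypothetical prefix
print(json.dumps(config, indent=2))

# With boto3 available, the rules could be applied roughly like this:
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-staging-bucket",  # hypothetical bucket name
#       LifecycleConfiguration=config,
#   )
```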

Required permissions

Before creating an Amazon Redshift transfer:

  1. Ensure that the person creating the transfer has the following required permissions in BigQuery:

    • bigquery.transfers.update permissions to create the transfer
    • bigquery.datasets.update permissions on the target dataset

    The bigquery.admin predefined Cloud IAM role includes bigquery.transfers.update and bigquery.datasets.update permissions. For more information on Cloud IAM roles in BigQuery Data Transfer Service, see Access control reference.

  2. Consult the documentation for Amazon S3 to ensure you have configured any permissions necessary to enable the transfer. At a minimum, the Amazon S3 source data must have the AWS managed policy AmazonS3ReadOnlyAccess applied to it.

  3. To build VPC peering, the service will use the Google Cloud user credentials of the individual setting up the transfer. Ensure that the person creating the transfer has the necessary permissions to create the VPC peering connection by granting the appropriate IAM permissions for creating and deleting VPC Network Peering.

    • Permissions to create VPC peering: compute.networks.addPeering
    • Permissions to delete VPC peering: compute.networks.removePeering

    The project.owner, project.editor and network.admin predefined Cloud IAM roles include the compute.networks.addPeering and compute.networks.removePeering permissions by default.
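
As a quick pre-flight check, the permissions listed in this section can be compared against whatever a user has actually been granted. The sketch below is illustrative only: the `granted` list is a hypothetical input that you would obtain from your own IAM tooling, and the helper name is not part of any Google Cloud library.

```python
# Permissions named in this section for creating a private Amazon Redshift transfer.
REQUIRED_PERMISSIONS = frozenset({
    "bigquery.transfers.update",
    "bigquery.datasets.update",
    "compute.networks.addPeering",
    "compute.networks.removePeering",
})


def missing_permissions(granted):
    """Return the required permissions that are absent from `granted`, sorted by name."""
    return sorted(REQUIRED_PERMISSIONS - set(granted))


# Hypothetical example: a user who cannot delete VPC peering connections.
granted = [
    "bigquery.transfers.update",
    "bigquery.datasets.update",
    "compute.networks.addPeering",
]
print(missing_permissions(granted))
```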

Google Cloud requirements

Follow the standard Amazon Redshift migration instructions to meet the Google Cloud requirements.

Set up the VPN

  1. Set up a Google Cloud VPC network in your Google Cloud project.

  2. Set up the VPN. Follow the instructions in this guide to set up a VPN between your Google Cloud project's VPC network and the Amazon Redshift VPC.

    Caution: The service uses your VPC network's name as the VPC peering connection name, so ensure there aren't any existing VPC peering connections already using that name.

  3. Grant permissions to do VPC Peering on Google Cloud. Ensure that you have the necessary permissions to create the VPC peering connection. See Required permissions.

  4. Before continuing, ensure that your Google Cloud VPC network exists in your Google Cloud project and that it is already connected to the Amazon Redshift VPC through the VPN.

Grant access to your Amazon Redshift cluster

Follow the instructions from Amazon to allowlist the IP ranges of your private Amazon Redshift cluster. In a later step, when you set up the transfer, you will define the private IP range in this VPC network.

Grant access to your Amazon S3 bucket

Follow the standard Amazon Redshift migration instructions to grant access to your Amazon S3 bucket.

Optional: workload control with a separate migration queue

You can define an Amazon Redshift queue for migration purposes to limit and isolate the resources used for migration. This migration queue can be configured with a maximum number of concurrent queries. You can then associate a dedicated migration user group with the queue, and use the credentials of a user in that group when setting up the migration to transfer data to BigQuery. The transfer service will only have access to the migration queue.
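
One way to express such a queue is through the Amazon Redshift `wlm_json_configuration` parameter of the cluster's parameter group. The following sketch only builds the JSON value; the user group name `migration_users` and the concurrency numbers are hypothetical, and applying the value to a parameter group is left to your usual AWS tooling.

```python
import json


def migration_wlm_config(user_group: str, migration_concurrency: int,
                         default_concurrency: int = 5) -> str:
    """Build a manual WLM configuration with a dedicated migration queue."""
    for value in (migration_concurrency, default_concurrency):
        if not 1 <= value <= 50:
            raise ValueError("queue concurrency must be between 1 and 50")
    queues = [
        # Queue reserved for members of the migration user group.
        {"user_group": [user_group], "query_concurrency": migration_concurrency},
        # Final entry without a user group acts as the default queue.
        {"query_concurrency": default_concurrency},
    ]
    return json.dumps(queues)


# Hypothetical values: at most 3 concurrent migration queries.
print(migration_wlm_config("migration_users", migration_concurrency=3))
```

Capping the migration queue separately keeps unload queries from competing with regular workloads for slots in the default queue.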

Setting up an Amazon Redshift transfer

Follow the standard Amazon Redshift migration instructions to set up an Amazon Redshift transfer, with the following difference for private Amazon Redshift instances:

  • In the transfer setup, in addition to the JDBC connection URL, fill in the VPC and reserved IP range field for the private Amazon Redshift instance.
  • If you don't provide these, the transfer configuration will fall back to a standard Amazon Redshift migration.

To enter the VPC and reserved IP range:

  1. In the VPC and reserved IP range field, specify your VPC network name and the expected private IP range as a CIDR block for provisioning migration infrastructure.

    • The form is VPC_network_name:CIDR, for example: my_vpc:10.251.1.0/24.
    • Use standard private VPC network address ranges in CIDR notation, starting with 10.x.x.x.
    • The private IP range is for provisioning migration infrastructure, so ensure that:
      • The IP range is wide enough (has more than 10 IP addresses).
      • The IP range does not overlap with any subnet in your Google Cloud VPC network or the Amazon Redshift VPC network.
    • If you have multiple transfers configured for the same Amazon Redshift instance, make sure to use the same VPC_network_name:CIDR value in each, so that multiple transfers can reuse the same migration infrastructure.
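
The constraints above can be checked before submitting the transfer configuration. The sketch below uses Python's standard `ipaddress` module; the function name and the `existing_subnets` argument are hypothetical, and "starting with 10.x.x.x" is interpreted here as "contained in 10.0.0.0/8", which is an assumption rather than a documented rule.

```python
import ipaddress

PRIVATE_10_BLOCK = ipaddress.ip_network("10.0.0.0/8")


def parse_vpc_cidr(value, existing_subnets=()):
    """Validate a VPC_network_name:CIDR string and return (name, network)."""
    name, sep, cidr = value.partition(":")
    if not sep or not name or not cidr:
        raise ValueError("expected the form VPC_network_name:CIDR")
    # strict=True rejects CIDR blocks with host bits set, e.g. 10.251.1.5/24.
    network = ipaddress.ip_network(cidr, strict=True)
    if network.version != 4 or not network.subnet_of(PRIVATE_10_BLOCK):
        raise ValueError("range must be an IPv4 block within 10.0.0.0/8")
    if network.num_addresses <= 10:
        raise ValueError("range must contain more than 10 IP addresses")
    for subnet in existing_subnets:
        if network.overlaps(ipaddress.ip_network(subnet)):
            raise ValueError(f"range overlaps existing subnet {subnet}")
    return name, network


# Hypothetical existing subnet in the Google Cloud VPC network.
name, network = parse_vpc_cidr("my_vpc:10.251.1.0/24", ["10.128.0.0/20"])
print(name, network, network.num_addresses)
```

Pass the subnets of both the Google Cloud VPC network and the Amazon Redshift VPC network in `existing_subnets` so that overlaps on either side are caught.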

Quotas and limits

Migrating Amazon Redshift private instances with VPC runs migration agents on single-tenant infrastructure. Due to compute resource limits, at most 5 concurrent transfer runs are allowed.

The same quotas and limits as for standard migrations from Amazon Redshift also apply.

What's next