Transfer data from AWS to GCP using Storage Transfer Service

January 13, 2023
Amrutha Singh

Cloud Infrastructure Engineer, Google Cloud

Overview

Storage Transfer Service enables users to quickly and securely transfer data to, from, and between object and file storage systems, including Google’s Cloud Storage, Amazon S3, Azure Blob Storage, and on-premises data. See the matrix of supported sources and sinks in the Storage Transfer Service documentation.

This post walks you through transferring data from AWS S3 to Google’s Cloud Storage securely using identity federation.

Identity federation creates a trust relationship between Google Cloud and AWS. It lets you access resources directly using a short-lived access token, and it eliminates the maintenance and security burden of long-term credentials such as service account keys. With identity federation, you do not have to rotate keys or explicitly revoke them when Storage Transfer Service is not in use.

Steps to configure a storage transfer job to transfer data from AWS S3 to GCS

This section walks you through setting up the infrastructure to transfer data from Amazon Web Services to Google Cloud securely.

Configurations on Google Cloud 

Enable the Storage Transfer API under APIs and Services.

https://storage.googleapis.com/gweb-cloudblog-publish/images/1._Enable_STS.max-800x800.png

Open Cloud Shell in the Google Cloud project where you want to configure the transfer job.

Run gcloud auth print-access-token to generate the access token that is passed in the Authorization: Bearer header in the next step.

Run the following command in Cloud Shell to generate the service account:

Loading...

Replace the project number, project ID, and token with your own values.

The output of this command has the following format:

Loading...

NOTE: Make a note of the subjectId, because it is used in the AWS IAM role trust relationship policy.
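The lookup above can also be sketched in Python. This is a minimal illustration, not the command from the original post: it assumes the Storage Transfer REST method googleServiceAccounts.get with the project ID in the path, and the function names and sample values are hypothetical. The request is only built here, not sent.

```python
import json
import urllib.request
from typing import Tuple

STS_API = "https://storagetransfer.googleapis.com/v1"


def build_service_account_request(project_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the googleServiceAccounts.get request."""
    return urllib.request.Request(
        f"{STS_API}/googleServiceAccounts/{project_id}",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def parse_service_account(body: str) -> Tuple[str, str]:
    """Return (accountEmail, subjectId) from the JSON response body."""
    data = json.loads(body)
    return data["accountEmail"], data["subjectId"]


# Made-up response in the expected shape; real values come from the API.
sample = '{"accountEmail": "sts-agent@example.iam.gserviceaccount.com", "subjectId": "1234567890"}'
email, subject_id = parse_service_account(sample)
print(email, subject_id)
```

To send the request, pass it to urllib.request.urlopen with the token printed by gcloud auth print-access-token, then feed the response body to parse_service_account.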

Create a Cloud Storage bucket in Google Cloud

Give the service account “accountEmail” that you generated in the previous step the following IAM permissions: 

Configurations on Amazon Web Services

In the AWS IAM console, create an IAM policy with the following:

Loading...

NOTE: This policy can be further restricted to a single S3 bucket. The s3:GetBucketLocation permission is needed to determine the location of the bucket.
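As a rough sketch of what such a policy can look like when restricted to a single bucket, the snippet below builds the JSON document in Python. The action list (s3:ListBucket, s3:GetBucketLocation, s3:GetObject) is an assumption based on the read-only needs described here, and the function and bucket names are hypothetical.

```python
import json


def s3_read_policy(bucket: str) -> dict:
    """Read-only IAM policy scoped to one S3 bucket (illustrative)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Object-level actions apply to the keys inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


# "my-source-bucket" is a placeholder name.
print(json.dumps(s3_read_policy("my-source-bucket"), indent=2))
```

The printed JSON can be pasted into the JSON tab of the IAM policy editor.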

After the policy is created, go to the IAM Roles tab and follow the steps below:

https://storage.googleapis.com/gweb-cloudblog-publish/images/2._AWS_Web_Identity.max-2200x2200.png

In the next step, under Add permissions, select the IAM policy that you created in step 1

Enter a role name and create the role

In the IAM Roles console, select the role you created and click the Trust relationships tab

Click Edit Trust Policy, replace the contents with the following, and save the policy:

Loading...

https://storage.googleapis.com/gweb-cloudblog-publish/images/3._Edit_Trust_Policy.max-1600x1600.png
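A trust policy for this kind of web identity federation could look like the sketch below. It assumes the role trusts accounts.google.com as the federated provider and pins the accounts.google.com:sub condition to the subjectId noted earlier. The function name and the sample subjectId are hypothetical.

```python
import json


def trust_policy(subject_id: str) -> dict:
    """Trust policy letting one Google identity assume the AWS role (illustrative)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": "accounts.google.com"},
                "Action": "sts:AssumeRoleWithWebIdentity",
                # Only the service account with this subjectId may assume the role.
                "Condition": {
                    "StringEquals": {"accounts.google.com:sub": subject_id}
                },
            }
        ],
    }


# "1234567890" stands in for the subjectId returned on the Google Cloud side.
print(json.dumps(trust_policy("1234567890"), indent=2))
```

Pinning the condition to a single subjectId means no other Google identity can assume the role.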

Create an S3 bucket on AWS with objects to be transferred to Google Cloud

Storage transfer job configuration

Go to Storage Transfer Service (STS) in your Google Cloud project

Select Create a transfer job

Choose Amazon S3 as the source and Google’s Cloud Storage as the destination

In the next step, enter the S3 bucket name and AWS IAM role ARN

https://storage.googleapis.com/gweb-cloudblog-publish/images/4._Create_STS_transfer_job.max-1400x1400.png

Next, select the Cloud Storage bucket

Next, choose the settings that work best for your use case

https://storage.googleapis.com/gweb-cloudblog-publish/images/5._Scheduling_options.max-1100x1100.png

Select the schedule and create the STS job.

Notice that the objects from the S3 bucket are being copied over to the GCS bucket. When you run it again, Storage Transfer Service does incremental transfers, skipping the data that was already copied.

https://storage.googleapis.com/gweb-cloudblog-publish/images/6._STS_run.max-1500x1500.png
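The same job can be described as a request body for the Storage Transfer REST API (transferJobs.create) instead of the console. The sketch below only builds that body. The field names follow the TransferJob resource as I understand it, and the function name, bucket names, and role ARN are placeholders.

```python
import datetime
import json


def transfer_job_body(project_id: str, s3_bucket: str, role_arn: str,
                      gcs_bucket: str, start: datetime.date,
                      repeat_seconds: int = 3600) -> dict:
    """Request body for a scheduled S3-to-Cloud Storage transfer job (illustrative)."""
    return {
        "projectId": project_id,
        "description": "S3 to Cloud Storage transfer using a federated role",
        "status": "ENABLED",
        "schedule": {
            "scheduleStartDate": {
                "year": start.year, "month": start.month, "day": start.day,
            },
            # Duration string; the post notes a minimum schedule of 1 hour.
            "repeatInterval": f"{repeat_seconds}s",
        },
        "transferSpec": {
            # roleArn replaces long-lived AWS access keys.
            "awsS3DataSource": {"bucketName": s3_bucket, "roleArn": role_arn},
            "gcsDataSink": {"bucketName": gcs_bucket},
        },
    }


body = transfer_job_body(
    "my-project", "my-source-bucket",
    "arn:aws:iam::111122223333:role/sts-transfer-role",
    "my-destination-bucket", datetime.date(2023, 1, 13),
)
print(json.dumps(body, indent=2))
```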

Custom scheduler for Storage Transfer Service

STS currently supports a minimum sync schedule of 1 hour. As a workaround, you can use Cloud Scheduler together with Cloud Functions to start the transfer job on a custom schedule, reducing the sync interval to minutes.

https://storage.googleapis.com/gweb-cloudblog-publish/images/unnamed_25_Z2ukcik.max-600x600.png
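One way to picture the function's job is a single call to the transferJobs.run REST method each time the schedule fires. The sketch below only builds that request. It assumes the job name has the form transferJobs/ID and that an access token is supplied by the caller. The function name and sample values are hypothetical, and this is not the author's function code.

```python
import json
import urllib.request

STS_API = "https://storagetransfer.googleapis.com/v1"


def build_run_request(job_name: str, project_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a transferJobs.run request for an existing job."""
    payload = json.dumps({"projectId": project_id}).encode("utf-8")
    return urllib.request.Request(
        f"{STS_API}/{job_name}:run",
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder job name, project, and token.
req = build_run_request("transferJobs/1234567890", "my-project", "ACCESS_TOKEN")
print(req.get_method(), req.full_url)
```

A function invoked every few minutes would send this request with urllib.request.urlopen, so each invocation starts one run of the job.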

Event-driven STS for Cloud Storage

Storage Transfer Service now offers event-driven transfer, a serverless and easy-to-use replication service. STS can listen to event notifications in AWS or Google Cloud to automatically transfer data that has been added or updated in the source location. Event-driven transfers are supported from AWS S3 or Cloud Storage to Cloud Storage.

This feature is a good fit for use cases where you have a changing data source (e.g., new object insertion) that needs to be replicated to the destination in a matter of minutes.

You can trigger an event-driven replication from AWS S3 to Google Cloud for ongoing data analytics and/or machine learning.

Event-driven configuration on Storage Transfer Service:

https://storage.googleapis.com/gweb-cloudblog-publish/images/7._Event_driven_STS.max-800x800.png

The image below shows a new file being transferred from AWS S3 to Cloud Storage via event-driven STS:

https://storage.googleapis.com/gweb-cloudblog-publish/images/8._STS_event_driven_transfer.max-1700x1700.png

Enabling AWS Event Notifications for SQS

On AWS S3: 

Go to the bucket “Properties” tab and create “Event Notification”

Select “All object create events”

Enter the SQS queue ARN as the destination

Add SQS and S3 permissions to the IAM role being configured in the Storage Transfer Job

https://storage.googleapis.com/gweb-cloudblog-publish/images/9._S3_event_notification.max-1400x1400.png
https://storage.googleapis.com/gweb-cloudblog-publish/images/Event_notification_high_res_2.max-1100x1100.png

Once the setup is complete, the S3 bucket should deliver notifications to the configured SQS queue, and the configured role should be able to access both the SQS queue and the S3 bucket for event-driven transfers.
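The two pieces of this setup can be sketched as plain data structures. The notification configuration uses the shape accepted by the S3 bucket notification API, and the role statements assume that the transfer role needs to receive and delete queue messages, change their visibility, and read objects. The exact permission list is an assumption to verify against the Storage Transfer Service documentation, and all names and ARNs are placeholders.

```python
import json


def queue_notification_config(queue_arn: str) -> dict:
    """S3 notification config sending all object-create events to an SQS queue."""
    return {
        "QueueConfigurations": [
            {"QueueArn": queue_arn, "Events": ["s3:ObjectCreated:*"]}
        ]
    }


def event_driven_role_statements(queue_arn: str, bucket: str) -> list:
    """Extra IAM statements for the transfer role (assumed permission set)."""
    return [
        {
            "Effect": "Allow",
            "Action": [
                "sqs:ReceiveMessage",
                "sqs:DeleteMessage",
                "sqs:ChangeMessageVisibility",
            ],
            "Resource": queue_arn,
        },
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": f"arn:aws:s3:::{bucket}/*",
        },
    ]


arn = "arn:aws:sqs:us-east-1:111122223333:sts-events"  # placeholder ARN
print(json.dumps(queue_notification_config(arn), indent=2))
print(json.dumps(event_driven_role_statements(arn, "my-source-bucket"), indent=2))
```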

On AWS SQS:

Create an SQS queue for event-driven transfers

In Access Policy, select “Advanced” and add the sample policy below. This grants Amazon S3 permission to publish messages to the SQS queue.

Loading...

Note: Replace the SQS ARN, source account number, and S3 bucket ARN with your own values.
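For orientation, an access policy of this kind typically follows the pattern AWS documents for letting S3 publish to SQS. The sketch below builds it from the three values named in the note. It is an illustration rather than the post's original policy, and the function name and sample ARNs are placeholders.

```python
import json


def sqs_access_policy(queue_arn: str, source_account: str, bucket_arn: str) -> dict:
    """SQS access policy allowing one S3 bucket to send messages (illustrative)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "s3.amazonaws.com"},
                "Action": "SQS:SendMessage",
                "Resource": queue_arn,
                # Accept messages only from this bucket in this account.
                "Condition": {
                    "ArnLike": {"aws:SourceArn": bucket_arn},
                    "StringEquals": {"aws:SourceAccount": source_account},
                },
            }
        ],
    }


policy = sqs_access_policy(
    "arn:aws:sqs:us-east-1:111122223333:sts-events",
    "111122223333",
    "arn:aws:s3:::my-source-bucket",
)
print(json.dumps(policy, indent=2))
```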

Make a note of the SQS ARN to configure it in the Storage Transfer Service Event Driven tab.

https://storage.googleapis.com/gweb-cloudblog-publish/images/11._S3_event_notification_config.max-1300x1300.png

Create a storage transfer job and observe your objects being replicated from the AWS S3 bucket to the GCS bucket.

This completes the event-driven setup for Storage Transfer Service.
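Expressed as a REST request body, an event-driven job differs from a scheduled one mainly by carrying an event stream instead of a schedule. The sketch below assumes the TransferJob eventStream field takes the SQS queue ARN as its name. The function name and all values are placeholders.

```python
import json


def event_driven_job_body(project_id: str, s3_bucket: str, role_arn: str,
                          gcs_bucket: str, sqs_arn: str) -> dict:
    """Request body for an event-driven S3-to-Cloud Storage job (illustrative)."""
    return {
        "projectId": project_id,
        "status": "ENABLED",
        "transferSpec": {
            "awsS3DataSource": {"bucketName": s3_bucket, "roleArn": role_arn},
            "gcsDataSink": {"bucketName": gcs_bucket},
        },
        # The queue that receives the S3 object-create notifications.
        "eventStream": {"name": sqs_arn},
    }


body = event_driven_job_body(
    "my-project", "my-source-bucket",
    "arn:aws:iam::111122223333:role/sts-transfer-role",
    "my-destination-bucket",
    "arn:aws:sqs:us-east-1:111122223333:sts-events",
)
print(json.dumps(body, indent=2))
```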

Summary

In this blog, we used Storage Transfer Service to securely transfer data from AWS S3 to Google’s Cloud Storage. We also discussed the event-driven STS feature that can listen to event notifications in AWS to automatically transfer data that has been added or updated in the source location.

If you would like to learn more, check out the Storage Transfer Service documentation.
