Storage Transfer Service supports transfers from cloud or on-premises object storage systems that are compatible with the Amazon S3 API.
Storage Transfer Service accesses your data in S3-compatible storage using transfer agents deployed on VMs close to the data source. These agents run in a Docker container and belong to an agent pool, which is a collection of agents that use the same configuration and collectively move your data in parallel.
This feature allows you to migrate from on-premises or cloud object storage to Cloud Storage, archive data to free up on-premises storage capacity, replicate data to Google Cloud for business continuity, or transfer data to Google Cloud for analysis and processing. For customers migrating from AWS S3 to Cloud Storage, this feature gives an option to control network routes to Google Cloud, resulting in considerably lower outbound data transfer charges.
Before you begin
Before configuring your transfers, complete the following steps:
- Install the gcloud CLI.
- Satisfy the requirements for file system transfers, including installing Docker on the transfer agent machine.
Obtain source credentials
Transferring from S3-compatible storage requires an access key ID and a secret access key.
The steps to obtain these depend on your storage provider.
The account from which the ID and key are generated requires one of the following permissions:
- Read-only permission on source objects, if you don't want to delete objects at source.
- Full access to source objects, if you choose to delete objects at source as part of your transfer.
Once you've created the account, added permissions, and downloaded the access key ID and secret access key, store the ID and key in a safe place.
Configure Google Cloud permissions
Before creating a transfer, you must configure permissions for the following entities:
- The user account being used to create the transfer. This is the account that is signed in to the Google Cloud console, or the account that is specified when authenticating to the `gcloud` CLI. The user account can be a regular user account, or a user-managed service account.
- The Google-managed service account, also known as the service agent, used by Storage Transfer Service. This account is generally identified by its email address, which uses the format project-PROJECT_NUMBER@storage-transfer-service.iam.gserviceaccount.com.
- The transfer agent account that provides Google Cloud permissions for transfer agents. Transfer agent accounts use the credentials of the user installing them, or the credentials of a user-managed service account, to authenticate.
See Agent-based transfer permissions for instructions.
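Because the service agent address follows a fixed pattern, it can be derived from the project number when you script permission grants. The following is a minimal sketch; the function name `service_agent_email` and the sample project number are made up for illustration.

```python
def service_agent_email(project_number: int) -> str:
    """Build the Storage Transfer Service service agent address.

    Follows the documented pattern
    project-PROJECT_NUMBER@storage-transfer-service.iam.gserviceaccount.com.
    """
    if project_number <= 0:
        raise ValueError("project_number must be a positive integer")
    return (
        f"project-{project_number}"
        "@storage-transfer-service.iam.gserviceaccount.com"
    )


# Hypothetical project number, used only as an example.
print(service_agent_email(123456789012))
```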
Transfer options
The following Storage Transfer Service features are available for transfers from S3-compatible storage to Cloud Storage:
- Transfer specific files using a manifest
- You can pass a list of files for Storage Transfer Service to act on. See Transfer specific files or objects using a manifest for details.
- Specify storage class
- You can specify the Cloud Storage storage class to use for your data in the destination bucket. See the StorageClass options for REST details, or use the --custom-storage-class flag with Google Cloud CLI. Note that any storage class settings are ignored if the destination bucket has Autoclass enabled. If Autoclass is enabled, objects transferred into the bucket are initially set to Standard storage.
- Metadata preservation
- When transferring files from S3-compatible storage, Storage Transfer Service can optionally preserve certain attributes as custom metadata. See the Amazon S3 or S3-compatible storage to Cloud Storage section of Metadata preservation for details on which metadata can be preserved, and how to configure your transfer.
- Logging and monitoring
- Transfers from S3-compatible storage can be viewed in Cloud Logging and Cloud Monitoring. See Cloud Logging for Storage Transfer Service and Monitor transfer jobs for details. You can also configure Pub/Sub notifications.
Create an agent pool
To create an agent pool:
Google Cloud console
In the Google Cloud console, go to the Agent pools page.
The Agent pools page is displayed, listing your existing agent pools.
Click Create another pool.
Name your pool, and optionally describe it.
You may choose to set a bandwidth limit that will apply to the pool as a whole. The specified bandwidth in MB/s will be split amongst all of the agents in the pool. See Manage network bandwidth for more information.
Click Create.
REST API
Use projects.agentPools.create:
POST https://storagetransfer.googleapis.com/v1/projects/PROJECT_ID/agentPools?agent_pool_id=AGENT_POOL_ID
Where:
PROJECT_ID: The project ID that you're creating the agent pool in.
AGENT_POOL_ID: The agent pool ID that you are creating.
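As an illustration of how the two placeholders fit into the request URL, the sketch below assembles it in Python. It only builds the URL string; sending the POST request, including authentication and any request body, is left out. The helper name `agent_pool_create_url` and the sample IDs are hypothetical.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://storagetransfer.googleapis.com/v1"


def agent_pool_create_url(project_id: str, agent_pool_id: str) -> str:
    """Return the projects.agentPools.create URL for the given IDs."""
    # Percent-encode the project ID so it is safe as a path segment.
    path = f"{BASE_URL}/projects/{quote(project_id, safe='')}/agentPools"
    # The pool ID travels as the agent_pool_id query parameter.
    return f"{path}?{urlencode({'agent_pool_id': agent_pool_id})}"


# Hypothetical IDs, used only as an example.
print(agent_pool_create_url("my-project", "my-pool"))
```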
If an agent pool is stuck in the Creating state for more than 30 minutes, we recommend deleting the agent pool and creating it again.
Revoking required Storage Transfer Service permissions from a project while an agent pool is in the Creating state leads to incorrect service behavior.
gcloud CLI
To create an agent pool with the gcloud command line tool, run gcloud transfer agent-pools create:
gcloud transfer agent-pools create AGENT_POOL
Where the following options are available:
AGENT_POOL is a unique, permanent identifier for this pool.
--no-async blocks other tasks in your terminal until the pool has been created. If not included, pool creation runs asynchronously.
--bandwidth-limit defines how much of your bandwidth in MB/s to make available to this pool's agents. A bandwidth limit applies to all agents in a pool and can help prevent the pool's transfer workload from disrupting other operations that share your bandwidth. For example, enter '50' to set a bandwidth limit of 50 MB/s. If you leave this flag unspecified, this pool's agents use all bandwidth available to them.
--display-name is a modifiable name to help you identify this pool. You can include details that might not fit in the pool's unique full resource name.
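If you create pools from a script, the options above can be assembled programmatically. The sketch below builds the argument list without running it. The function name `agent_pools_create_argv`, its parameter names, and the sample values are hypothetical, and treating the bandwidth limit as a whole number of MB/s is an assumption.

```python
import shlex
from typing import List, Optional


def agent_pools_create_argv(
    agent_pool: str,
    display_name: Optional[str] = None,
    bandwidth_limit_mbps: Optional[int] = None,
    wait: bool = False,
) -> List[str]:
    """Build the argument list for `gcloud transfer agent-pools create`."""
    argv = ["gcloud", "transfer", "agent-pools", "create", agent_pool]
    if wait:
        # --no-async blocks until the pool has been created.
        argv.append("--no-async")
    if bandwidth_limit_mbps is not None:
        if bandwidth_limit_mbps <= 0:
            raise ValueError("bandwidth limit must be positive")
        argv.append(f"--bandwidth-limit={bandwidth_limit_mbps}")
    if display_name is not None:
        argv.append(f"--display-name={display_name}")
    return argv


# Print a shell-quoted command line for review before running it.
print(shlex.join(agent_pools_create_argv("my-pool", "Lab pool", 50, wait=True)))
```

Passing the list to a process launcher instead of a shell string avoids quoting problems with display names that contain spaces.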
Install transfer agents
Transfer agents are software agents that coordinate transfer activities from your source through Storage Transfer Service. They must be installed on a system with access to your source data.
gcloud CLI
To install agents to use with an S3-compatible source using the gcloud CLI, use the transfer agents install command.
You must provide access credentials either as the values of the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, or as default credentials stored in your system's configuration files.
export AWS_ACCESS_KEY_ID=ID
export AWS_SECRET_ACCESS_KEY=SECRET
gcloud transfer agents install --pool=POOL_NAME
To run agents using a service account key, use the --creds-file option:
gcloud transfer agents install --pool=POOL_NAME \
--creds-file=/relative/path/to/service-account-key.json
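Before installing agents from a script, it can help to check whether the two environment variables are set. The sketch below is one possible preflight check; the helper names are hypothetical. A missing variable is not necessarily an error, because credentials can also come from your system's configuration files.

```python
import os
from typing import List, Mapping, Optional

REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_source_credentials(env: Mapping[str, str]) -> List[str]:
    """Return the credential variables that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def agents_install_argv(pool_name: str, creds_file: Optional[str] = None) -> List[str]:
    """Build the argument list for `gcloud transfer agents install`."""
    argv = ["gcloud", "transfer", "agents", "install", f"--pool={pool_name}"]
    if creds_file:
        # Optional service account key, as described above.
        argv.append(f"--creds-file={creds_file}")
    return argv


missing = missing_source_credentials(os.environ)
if missing:
    print("Not set in the environment:", ", ".join(missing))
else:
    print("Source credentials found in the environment.")
```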
Create a transfer job
Google Cloud console
Follow these steps to create a transfer from an S3-compatible source to a Cloud Storage bucket.
Go to the Storage Transfer Service page in the Google Cloud console.
Click Create transfer job. The Create a transfer job page is displayed.
Select S3-compatible object storage as the Source type. The destination must be Google Cloud Storage.
Click Next step.
Configure your source
Specify the required information for this transfer:
Select the agent pool you configured for this transfer.
Enter the Bucket name relative to the endpoint. For example, if your data resides at:
https://example.com/bucket_a
Enter:
bucket_a
Enter the Endpoint. Do not include the protocol (http:// or https://) or the bucket name. For example:
example.com
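The split between bucket name and endpoint can be sketched as follows for a path-style URL such as the one in the example above. This is an illustration only; the function name `split_path_style_url` is hypothetical, and it does not handle virtual hosted-style URLs, where the bucket name is part of the host name.

```python
from typing import Tuple
from urllib.parse import urlsplit


def split_path_style_url(url: str) -> Tuple[str, str]:
    """Split a path-style URL into (endpoint, bucket_name).

    The endpoint is the host (and port, if any) without the protocol;
    the bucket name is the first path segment.
    """
    parts = urlsplit(url)
    if not parts.scheme or not parts.netloc:
        raise ValueError("expected a full URL such as https://example.com/bucket_a")
    segments = [segment for segment in parts.path.split("/") if segment]
    if not segments:
        raise ValueError("URL does not contain a bucket name")
    return parts.netloc, segments[0]


endpoint, bucket = split_path_style_url("https://example.com/bucket_a")
print("Endpoint:", endpoint)
print("Bucket name:", bucket)
```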
Specify any optional attributes for this transfer:
Enter the Signing region to use for signing of requests.
Choose the Signing process for this request.
Select the Addressing style. This determines whether the bucket name is provided in path-style (e.g., https://example.com/bucket-name/key-name) or virtual hosted-style (e.g., https://bucket-name.example.com/key-name). Read Virtual hosting of buckets in the Amazon documentation for more information.
Select the Network protocol.
Select the listing API version to use. Refer to the ListObjectsV2 and ListObjects documentation for more information.
Click Next step.
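To make the difference between the two addressing styles concrete, the sketch below builds an object URL in each style. The function name `object_url` is hypothetical; the style and protocol names are borrowed from the gcloud flag values used elsewhere in this document. It illustrates the URL shapes only and is not how the transfer agents issue requests.

```python
def object_url(
    endpoint: str,
    bucket: str,
    key: str,
    style: str = "PATH_STYLE",
    protocol: str = "HTTPS",
) -> str:
    """Return the URL of an object in the given addressing style."""
    if protocol not in ("HTTP", "HTTPS"):
        raise ValueError("protocol must be HTTP or HTTPS")
    scheme = protocol.lower()
    if style == "PATH_STYLE":
        # The bucket name is the first path segment.
        return f"{scheme}://{endpoint}/{bucket}/{key}"
    if style == "VIRTUAL_HOSTED_STYLE":
        # The bucket name is a subdomain of the endpoint.
        return f"{scheme}://{bucket}.{endpoint}/{key}"
    raise ValueError("style must be PATH_STYLE or VIRTUAL_HOSTED_STYLE")


print(object_url("example.com", "bucket-name", "key-name"))
print(object_url("example.com", "bucket-name", "key-name", "VIRTUAL_HOSTED_STYLE"))
```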
Configure your sink
In the Bucket or folder field, enter the destination bucket and (optionally) folder name, or click Browse to select a bucket from a list of existing buckets in your current project. To create a new bucket, click Create new bucket.
Click Next step.
Choose transfer settings
In the Description field, enter a description of the transfer. As a best practice, enter a description that is meaningful and unique so that you can tell jobs apart.
Under Metadata options, choose to use the default options, or click View and select options to specify values for all supported metadata. See Metadata preservation for details.
Under When to overwrite, select one of the following:
If different: Overwrites destination files if the source file with the same name has a different ETag or checksum value.
Always: Always overwrites destination files when the source file has the same name, even if they're identical.
Under When to delete, select one of the following:
Never: Never delete files from either the source or destination.
Delete files from source after they're transferred: Delete files from the source after they're transferred to the destination.
Delete files from destination if they're not also at source: If files in the destination Cloud Storage bucket aren't also in the source, then delete the files from the Cloud Storage bucket.
This option ensures that the destination Cloud Storage bucket exactly matches your source.
Under Notification options, select your Pub/Sub topic and which events to notify for. See Pub/Sub notifications for more details.
Click Next step.
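The overwrite and delete settings in this step can be reasoned about with a small model. The sketch below is a toy illustration, not the service's implementation: each bucket is represented as a dictionary from object name to checksum, and the function name `plan_transfer` and its parameters are hypothetical.

```python
from typing import Dict, List, Tuple


def plan_transfer(
    source: Dict[str, str],
    destination: Dict[str, str],
    overwrite: str = "different",
    delete_from_destination: bool = False,
) -> Tuple[List[str], List[str]]:
    """Return (objects to copy, objects to delete from the destination)."""
    if overwrite not in ("different", "always"):
        raise ValueError("overwrite must be 'different' or 'always'")
    to_copy = []
    for name, checksum in sorted(source.items()):
        if name not in destination:
            to_copy.append(name)  # new object
        elif overwrite == "always" or destination[name] != checksum:
            to_copy.append(name)  # existing object that gets overwritten
    to_delete = []
    if delete_from_destination:
        # Objects present only at the destination are removed so that
        # the destination matches the source.
        to_delete = sorted(name for name in destination if name not in source)
    return to_copy, to_delete


src = {"a.txt": "111", "b.txt": "222", "c.txt": "333"}
dst = {"a.txt": "111", "b.txt": "999", "d.txt": "444"}
print(plan_transfer(src, dst))
print(plan_transfer(src, dst, overwrite="always", delete_from_destination=True))
```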
Schedule the transfer
You can schedule your transfer to run one time only, or configure a recurring transfer.
Click Create to create the transfer job.
gcloud CLI
Before using the gcloud CLI to create a transfer, follow the instructions in Configure access to a Cloud Storage sink.
To use the gcloud CLI to create a transfer from an S3-compatible source to a Cloud Storage bucket, use the following command.
gcloud transfer jobs create s3://SOURCE_BUCKET_NAME gs://SINK_BUCKET_NAME \
--source-agent-pool=POOL_NAME \
--source-endpoint=ENDPOINT \
--source-signing-region=REGION \
--source-auth-method=AWS_SIGNATURE_V2 | AWS_SIGNATURE_V4 \
--source-request-model=PATH_STYLE | VIRTUAL_HOSTED_STYLE \
--source-network-protocol=HTTP | HTTPS \
--source-list-api=LIST_OBJECTS | LIST_OBJECTS_V2
The following flags are required:
--source-agent-pool is the name of the agent pool to use for this transfer.
--source-endpoint specifies your storage system's endpoint. For example, s3.us-east.example.com. Check with your provider for the correct formatting. Don't include the protocol (e.g., https://) or the bucket name.
The remaining flags are optional:
--source-signing-region specifies a region for signing requests. Omit this flag if your storage provider doesn't require a signing region.
--source-auth-method specifies the authentication method to use. Valid values are AWS_SIGNATURE_V2 or AWS_SIGNATURE_V4. Refer to Amazon's SigV4 and SigV2 documentation for more information.
--source-request-model specifies the addressing style to use. Valid values are PATH_STYLE or VIRTUAL_HOSTED_STYLE. Path style uses the format https://s3.REGION.example.com/BUCKET_NAME/KEY_NAME. Virtual hosted style uses the format https://BUCKET_NAME.s3.REGION.example.com/KEY_NAME.
--source-network-protocol specifies the network protocol that agents should use for this job. Valid values are HTTP or HTTPS.
--source-list-api specifies the version of the S3 listing API for returning objects from the bucket. Valid values are LIST_OBJECTS or LIST_OBJECTS_V2. Refer to Amazon's ListObjectsV2 and ListObjects documentation for more information.
For additional transfer job options, run gcloud transfer jobs create --help or refer to the gcloud reference documentation.
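In the command template above, the vertical bars separate alternative values; each flag takes exactly one of them. The sketch below shows one way to assemble and validate the command from a script, using only the flags and values described in this section. The function name `jobs_create_argv` and its parameter names are hypothetical, and the command is built but not executed.

```python
from typing import List, Optional

AUTH_METHODS = {"AWS_SIGNATURE_V2", "AWS_SIGNATURE_V4"}
REQUEST_MODELS = {"PATH_STYLE", "VIRTUAL_HOSTED_STYLE"}
NETWORK_PROTOCOLS = {"HTTP", "HTTPS"}
LIST_APIS = {"LIST_OBJECTS", "LIST_OBJECTS_V2"}


def jobs_create_argv(
    source_bucket: str,
    sink_bucket: str,
    agent_pool: str,
    endpoint: str,
    signing_region: Optional[str] = None,
    auth_method: Optional[str] = None,
    request_model: Optional[str] = None,
    network_protocol: Optional[str] = None,
    list_api: Optional[str] = None,
) -> List[str]:
    """Build the argument list for `gcloud transfer jobs create`."""
    if "://" in endpoint:
        raise ValueError("endpoint must not include the protocol")
    argv = [
        "gcloud", "transfer", "jobs", "create",
        f"s3://{source_bucket}", f"gs://{sink_bucket}",
        f"--source-agent-pool={agent_pool}",   # required
        f"--source-endpoint={endpoint}",       # required
    ]
    optional = [
        ("--source-signing-region", signing_region, None),
        ("--source-auth-method", auth_method, AUTH_METHODS),
        ("--source-request-model", request_model, REQUEST_MODELS),
        ("--source-network-protocol", network_protocol, NETWORK_PROTOCOLS),
        ("--source-list-api", list_api, LIST_APIS),
    ]
    for flag, value, allowed in optional:
        if value is None:
            continue  # omitted flags are left to the CLI's defaults
        if allowed is not None and value not in allowed:
            raise ValueError(f"{flag} must be one of {sorted(allowed)}")
        argv.append(f"{flag}={value}")
    return argv


print(" ".join(jobs_create_argv(
    "SOURCE_BUCKET_NAME", "SINK_BUCKET_NAME", "POOL_NAME",
    "s3.us-east.example.com", auth_method="AWS_SIGNATURE_V4",
)))
```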
REST API
Before using the REST API to create a transfer, follow the instructions in Configure access to a Cloud Storage sink.
To create a transfer from an S3-compatible source using the REST API, create a JSON object similar to the following example.
POST https://storagetransfer.googleapis.com/v1/transferJobs
{
  ...
  "transferSpec": {
    "sourceAgentPoolName": "POOL_NAME",
    "awsS3CompatibleDataSource": {
      "region": "us-east-1",
      "s3Metadata": {
        "protocol": "NETWORK_PROTOCOL_HTTPS",
        "requestModel": "REQUEST_MODEL_VIRTUAL_HOSTED_STYLE",
        "authMethod": "AUTH_METHOD_AWS_SIGNATURE_V4"
      },
      "endpoint": "example.com",
      "bucketName": "BUCKET_NAME",
      "path": "PATH"
    },
    "gcsDataSink": {
      "bucketName": "SINK_NAME",
      "path": "SINK_PATH"
    },
    "transferOptions": {
      "deleteObjectsFromSourceAfterTransfer": false
    }
  }
}
See the AwsS3CompatibleData API reference for field descriptions.
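Generating the request body with a JSON library, rather than writing it by hand, avoids syntax slips such as trailing commas. The sketch below builds only the source description object with the fields shown in the example above. The function name `aws_s3_compatible_data` is hypothetical, and deriving the enum strings by prefixing the gcloud-style values (for example, `NETWORK_PROTOCOL_` plus `HTTPS`) is an assumption based on the three values in the example; check the API reference for the full set.

```python
import json
from typing import Any, Dict, Optional


def aws_s3_compatible_data(
    endpoint: str,
    bucket_name: str,
    path: str = "",
    region: Optional[str] = None,
    protocol: str = "HTTPS",
    request_model: str = "VIRTUAL_HOSTED_STYLE",
    auth_method: str = "AWS_SIGNATURE_V4",
) -> Dict[str, Any]:
    """Build the S3-compatible source object for a transfer job request."""
    data: Dict[str, Any] = {
        "s3Metadata": {
            "protocol": f"NETWORK_PROTOCOL_{protocol}",
            "requestModel": f"REQUEST_MODEL_{request_model}",
            "authMethod": f"AUTH_METHOD_{auth_method}",
        },
        "endpoint": endpoint,
        "bucketName": bucket_name,
    }
    # Optional fields are added only when a value is supplied.
    if path:
        data["path"] = path
    if region:
        data["region"] = region
    return data


source = aws_s3_compatible_data("example.com", "BUCKET_NAME", region="us-east-1")
print(json.dumps(source, indent=2))
```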
Client libraries
Before using the client libraries to create a transfer, follow the instructions in Configure access to a Cloud Storage sink.
Go
To learn how to install and use the client library for Storage Transfer Service, see Storage Transfer Service client libraries. For more information, see the Storage Transfer Service Go API reference documentation.
To authenticate to Storage Transfer Service, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Java
To learn how to install and use the client library for Storage Transfer Service, see Storage Transfer Service client libraries. For more information, see the Storage Transfer Service Java API reference documentation.
To authenticate to Storage Transfer Service, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Node.js
To learn how to install and use the client library for Storage Transfer Service, see Storage Transfer Service client libraries. For more information, see the Storage Transfer Service Node.js API reference documentation.
To authenticate to Storage Transfer Service, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Python
To learn how to install and use the client library for Storage Transfer Service, see Storage Transfer Service client libraries. For more information, see the Storage Transfer Service Python API reference documentation.
To authenticate to Storage Transfer Service, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Frequently asked questions
Is there a cost to transfer from S3-compatible storage?
Transfers from S3-compatible storage do not incur the "Storage Transfer Service transfers requiring agents" fee. See Pricing for any other fees that may be incurred. You may also incur outbound data transfer and operational charges from your source cloud provider.
Is Cloud Logging supported for S3-compatible storage transfers?
Yes, you can enable Cloud Logging for your transfers by following the instructions in Cloud Logging for Storage Transfer Service.
Are transfers using a manifest supported?
Yes, manifest files are supported for S3-compatible transfers.
If I add an object to the source bucket after the job has started, is that object transferred?
Storage Transfer Service performs a list operation on the source bucket to compute the diff from the destination. If the list operation has already completed when the new object is added, that object is skipped until the next transfer.
Does Storage Transfer Service perform checksum matching on S3-compatible sources?
Storage Transfer Service relies on checksum data being returned by the source. For S3-compatible storage, Storage Transfer Service expects the object's ETag to be the MD5 hash of the object.
However, any objects that were transferred to S3-compatible storage using S3 multipart upload do not have MD5 ETags. In this case, Storage Transfer Service uses the file size to validate the transferred object.
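The fallback described above can be sketched as follows. This is an illustration of the logic, not the service's code: the function name `validate_object` is hypothetical, and it assumes that a plain MD5 ETag is 32 hexadecimal characters while a multipart ETag carries a `-N` part-count suffix, as is the convention in Amazon S3.

```python
import hashlib
import re

_MD5_HEX = re.compile(r"^[0-9a-fA-F]{32}$")


def validate_object(data: bytes, etag: str, expected_size: int) -> str:
    """Validate transferred data; return which check was used.

    Returns "md5" when the ETag looks like an MD5 hash and matches,
    or "size" when only the size could be compared. Raises ValueError
    on a mismatch.
    """
    tag = etag.strip('"')  # ETags are often returned wrapped in quotes
    if _MD5_HEX.match(tag):
        if hashlib.md5(data).hexdigest() != tag.lower():
            raise ValueError("MD5 mismatch")
        return "md5"
    # Multipart ETags are not MD5 hashes of the object; compare sizes.
    if len(data) != expected_size:
        raise ValueError("size mismatch")
    return "size"


payload = b"hello"
print(validate_object(payload, hashlib.md5(payload).hexdigest(), len(payload)))
print(validate_object(payload, "d41d8cd98f00b204e9800998ecf8427e-2", len(payload)))
```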
What throughput can be achieved for transfers from S3-compatible storage?
Your transfer throughput can be scaled by adding more transfer agents. We recommend using 3 agents for fault tolerance and to fill a pipe of less than 10 Gbps. To scale further, add more agents. Agents can be added and removed while a transfer is in progress.
Where should transfer agents be deployed to transfer data from Amazon S3 to Cloud Storage?
You can install agents in Amazon EC2 or EKS within the same region as your bucket. You can also run agents on Google Cloud in the nearest region.