Cloud Storage Transfer Service enables you to import large amounts of online data into Google Cloud Storage quickly and cost-effectively. To use Cloud Storage Transfer Service, you set up a transfer from a data source to a data sink. A data source can be an Amazon Simple Storage Service (Amazon S3) bucket, an HTTP/HTTPS location, or another Google Cloud Storage bucket. The data sink is always a Google Cloud Storage bucket.
Example uses of Cloud Storage Transfer Service include:
Backing up data to a Google Cloud Storage bucket from other storage providers.
Moving data from a Standard Storage bucket to a Nearline Storage bucket to lower your storage costs.
Cloud Storage Transfer Service has options that make data transfers and synchronization between data sources and data sinks easier. For example, you can:
Schedule one-time transfers or recurring transfers.
Schedule periodic synchronization from data source to data sink with advanced filters based on file creation dates, file-name filters, and the times of day you prefer to import data.
Delete source objects after transferring them.
There are a number of ways you can work with Cloud Storage Transfer Service:
Use the Google Developers Console UI to create and manage transfers. This is often the easiest and quickest way to get started with Cloud Storage Transfer Service. For more information, see Using the Developers Console.
Use a Google APIs Client Library in a language of your choice. See the Developer's Guide.
If you are comfortable with REST interfaces, you can work directly with the Storage Transfer Service API. See Creating A Client for information about enabling the API and getting authentication tokens to use in your requests.
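Whether you call the REST API directly or through a client library, the core of a transfer request is a JSON job description passed to transferJobs.create. The sketch below builds such a body in Python; the project ID, bucket names, AWS key values, and schedule are placeholder assumptions, and the actual API call is shown commented out because it requires enabled APIs and credentials.

```python
# Sketch of a Storage Transfer Service job body for a recurring
# S3-to-Cloud-Storage transfer. All names and dates are placeholders.

def make_transfer_job_body(project_id, s3_bucket, gcs_bucket,
                           aws_access_key, aws_secret_key):
    """Build the request body for storagetransfer.transferJobs().create()."""
    return {
        'description': 'Nightly S3 -> GCS transfer',
        'status': 'ENABLED',
        'projectId': project_id,
        'schedule': {
            'scheduleStartDate': {'year': 2015, 'month': 9, 'day': 1},
            'startTimeOfDay': {'hours': 2, 'minutes': 0},
        },
        'transferSpec': {
            'awsS3DataSource': {
                'bucketName': s3_bucket,
                'awsAccessKey': {
                    'accessKeyId': aws_access_key,
                    'secretAccessKey': aws_secret_key,
                },
            },
            'gcsDataSink': {'bucketName': gcs_bucket},
        },
    }

body = make_transfer_job_body('my-project', 'my-s3-bucket', 'my-gcs-bucket',
                              'AWS_KEY_ID', 'AWS_SECRET')

# With credentials configured, the job could be submitted like this:
# from googleapiclient import discovery
# client = discovery.build('storagetransfer', 'v1')
# client.transferJobs().create(body=body).execute()
```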
You can use gsutil, a flexible command-line tool for working with Google Cloud Storage. With gsutil you can also work with Amazon S3 buckets, and transfer data from Amazon S3 to Google Cloud Storage using a daisy-chain approach. For more information, see the help for gsutil's -D flag.
For less than 1 TB of data to transfer, you should use gsutil. For greater than 10 TB of data to transfer, you should use Storage Transfer Service. For values between 1 TB and 10 TB, either tool is appropriate. Use this guidance as a starting point, but keep in mind that the specifics of your transfer scenario will determine which tool is more appropriate.
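For example, a daisy-chain copy from Amazon S3 to Google Cloud Storage with gsutil might look like the following; the bucket names are placeholders:

```shell
# Copy all objects from an S3 bucket into a Cloud Storage bucket,
# routing the data through the local machine (daisy chain).
gsutil -m cp -r s3://my-s3-bucket/* gs://my-gcs-bucket
```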
The rest of this page discusses concepts that are independent of how you choose to work with Cloud Storage Transfer Service.
Cloud Storage Transfer Service uses a Google service account to access Google Cloud Storage buckets in your Google Developers Console project. The service account name contains a <unique-ID> that is specific to your project.
The service account must be granted write access to any bucket that is designated as a data sink in a transfer. For information about how to grant the service account access to your bucket, see Setting Bucket Permissions.
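For example, assuming a placeholder service-account address and bucket name, you could grant write access to a sink bucket with gsutil's acl ch command:

```shell
# Grant WRITE (W) access on the sink bucket to the transfer service account.
# Replace the email address with your project's actual service account.
gsutil acl ch -u transfer-service-account@example.com:W gs://my-sink-bucket
```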
Depending on your transfer data source, take one of the following actions.
- Google Cloud Storage bucket
If your source data is another Google Cloud Storage bucket and the transfer does not delete from the source, then grant the service account the Can View role. If the transfer deletes from the source, then grant the service account the Can Edit role. For more information, see Project Members and Permissions.
- Amazon S3 bucket
If your source data is an Amazon S3 bucket, then set up an AWS Identity and Access Management (IAM) user so that you give the user the ability to list the Amazon S3 bucket, get the location of the bucket, and read the objects in the bucket. You must configure the AWS IAM user using Amazon S3 tools such as the AWS Management Console. For more information, see Creating an IAM User in Your AWS Account and Bucket Policy Examples.
The IAM user you create is distinct from the service account described above, which is used only with Google Cloud Storage. We recommend that you choose an IAM user name you'll recognize, such as "transfer-user", and ensure the name follows the IAM user name guidelines (see Limitations on IAM Entities and Objects).
These AWS IAM user credentials (access/secret key) must be entered into Storage Transfer Service when you configure a transfer from Amazon S3. We recommend that you create an access/secret key pair for each transfer or one for a group of transfers at most. Avoid using an access/secret key pair that can access all resources of the AWS account.
- HTTP URLs
If your source data is a list of HTTP URLs, then the objects that the URLs point to must allow public access.
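For the Amazon S3 source case above, the minimum permissions (list the bucket, get its location, and read its objects) can be captured in an IAM policy along these lines; the bucket name is a placeholder:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::my-source-bucket"
    },
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::my-source-bucket/*"
    }
  ]
}
```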
After setting up a transfer, if you change the default ACLs on sink or source objects, or you change the service account's project role, then Storage Transfer Service may not be able to perform some job operations.
Transferring data from URLs
You can use Storage Transfer Service to transfer data from public data locations to a Google Cloud Storage bucket. Each location represents an object that can be identified by a URL using either the http or https scheme. All locations to be used for a transfer are combined into a list that itself can be referenced by a URL. When you configure your transfer, specify the URL path that points to the list.
The URL list must be a tab-separated values (TSV) file with the following format:
The first line is the format specifier, "TsvHttpData-1.0".
The rest of the file consists of one or more lines, one line for each object to transfer. Each line must have the following tab-separated fields:
- HTTP/HTTPS URL of a source object.
- The size of the object in bytes.
- The base64-encoded MD5 hash of the object.
An example TSV file you can use for a transfer is:
TsvHttpData-1.0
https://example.com/buckets/obj1	1357	wHENa08V36iPYAsOa2JAdw==
https://example.com/buckets/obj2	2468	R9acAaveoPd2y8nniLUYbw==
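A URL list like the one above can be generated programmatically. The sketch below, assuming you have local copies of the files you will serve over HTTP, builds a TsvHttpData-1.0 list with the required size and base64-encoded MD5 fields; the URLs and paths are placeholders.

```python
# Build a TsvHttpData-1.0 URL list for Storage Transfer Service.
import base64
import hashlib
import os

def md5_base64(path):
    """Return the base64-encoded MD5 hash of a file's contents."""
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(8192), b''):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode('ascii')

def make_url_list(entries):
    """entries: iterable of (url, local_path) pairs -> TSV text."""
    lines = ['TsvHttpData-1.0']
    for url, path in entries:
        size = os.path.getsize(path)
        lines.append('%s\t%d\t%s' % (url, size, md5_base64(path)))
    return '\n'.join(lines) + '\n'
```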
When transferring data based on a URL list, keep the following in mind:
When an object located at http(s)://hostname:port/<URL-path> is transferred to a data sink, the name of the object at the data sink is hostname/<URL-path>.
If the specified size of an object does not match the actual size of the object fetched, the object will not be transferred.
If the specified MD5 does not match the MD5 computed from the transferred bytes, the object transfer will fail. For more information, see Generating MD5 hashes.
Ensure that each URL you specify is publicly accessible. For example, in Google Cloud Storage you can share an object publicly and get a link to it.
Storage Transfer Service obeys robots.txt rules and requires the source HTTP server to support Range requests and to return a Content-Length header in each response.
Object conditions (prefixes and modification times) have no effect when filtering objects to transfer.
Generating MD5 hashes
When transferring data from public locations defined by URLs, you must provide an MD5 hash for each object transferred. The following public object is a reference object you can use to test your MD5 algorithm:
This object has a base64-encoded MD5 hash of "BfnRTwvHpofMOn2Pq7EVyQ==".
Assuming you copy the object above to a local file named md5-test, you can calculate the hash, for example, using OpenSSL:
openssl md5 -binary md5-test | openssl enc -base64
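The same hash can also be computed with Python's standard library, as a sketch equivalent to the openssl pipeline above; the file name md5-test matches the example:

```python
# base64-encoded MD5 of a local file, equivalent to:
#   openssl md5 -binary <file> | openssl enc -base64
import base64
import hashlib

def b64_md5(path):
    """Return the base64-encoded MD5 hash of the file at `path`."""
    with open(path, 'rb') as f:
        return base64.b64encode(hashlib.md5(f.read()).digest()).decode('ascii')

# Usage: b64_md5('md5-test')
```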
Charges related to using Storage Transfer Service include:
- When transferring data from an external source into Google Cloud Storage, you may incur egress and operation charges based on the pricing policy of the source provider. For example, when moving data from Amazon Simple Storage Service (Amazon S3) to Google Cloud Storage, the pricing on the Amazon S3 Pricing page applies for requests and data transferred out.
- When transferring data from one Google Cloud Storage bucket to another, you may incur transfer charges for transferring between buckets in different locations. For more information, see Network Pricing. In addition, early deletion from Google Cloud Storage Nearline incurs a cost. For more information, see Google Cloud Storage Nearline pricing.
- When you use Storage Transfer Service, operation charges apply for managing objects in buckets both in Google Cloud Storage and storage providers external to Google. For example, a transfer operation from an external provider into Google Cloud Storage might need to list bucket contents in both the source and destination locations. For more information, see Operation Pricing and the appropriate pricing page for the source provider.