Transfer between Cloud Storage buckets

Storage Transfer Service can be used to transfer large amounts of data between Cloud Storage buckets, either within the same Google Cloud project, or between different projects.

Bucket migrations are useful in a number of scenarios. They can be used to consolidate data from separate projects, to move data into a backup location, or to change the location of your data.

When to use Storage Transfer Service

Google Cloud offers multiple options to transfer data between Cloud Storage buckets. We recommend the following guidelines:

  • Transferring less than 1 TB: Use gsutil or gcloud. For instructions, refer to Move and rename buckets.

  • Transferring more than 1 TB: Use Storage Transfer Service. Storage Transfer Service is a managed transfer option that provides out-of-the-box security, reliability, and performance. It eliminates the need to optimize and maintain scripts and to handle retries.

This guide discusses best practices when transferring data between Cloud Storage buckets using Storage Transfer Service.

Create the destination bucket

Before beginning the transfer, create a storage bucket. See Location considerations for help choosing an appropriate bucket location.

You may wish to copy over some of the bucket metadata when you create the new bucket. See Get bucket information to learn how to display the source bucket's metadata, so that you can apply the same settings to your new bucket.
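One way to inspect the source bucket's configuration from the command line is gsutil's long-form bucket listing. This is a minimal sketch; the bucket name is hypothetical:

gsutil ls -L -b gs://source-bucket

The output includes settings such as location, storage class, labels, and versioning, which you can mirror when creating the destination bucket.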

Preserve the bucket name

Cloud Storage bucket names must be globally unique; this means that you cannot create a new bucket with the same name as your old bucket while the old bucket still exists. If you'd like your data to end up in a bucket with the same name as the source bucket, you'll need to do two transfers (a command-line sketch follows these steps):

  1. Transfer data from your source bucket (bucketA) to a temporary bucket (bucketTemp).
  2. Delete the original bucketA, then immediately create a new bucket named bucketA.
  3. Transfer data from bucketTemp to the new bucketA.
  4. Delete bucketTemp.

Tip: Create the new bucket immediately after deleting the original bucket to ensure that the name remains available to you.
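The sequence might look like the following with the gcloud CLI. This is a minimal sketch using the placeholder bucket names from the steps above; the --delete-from=source-after-transfer flag removes each object from the source once it has been copied:

gcloud transfer jobs create gs://bucketA gs://bucketTemp --delete-from=source-after-transfer
gsutil rb gs://bucketA
gsutil mb gs://bucketA
gcloud transfer jobs create gs://bucketTemp gs://bucketA --delete-from=source-after-transfer
gsutil rb gs://bucketTemp

Wait for each transfer job to finish before moving to the next step; deleting a bucket fails while it still contains objects.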

Copy the objects to the destination

To copy objects from the source bucket to the destination bucket, select an interface from the tabs below.

Google Cloud console

Use the Cloud Storage Transfer Service from within Google Cloud console:

  1. If you don't have a destination bucket yet, create the bucket.
  2. Open the Transfer page in the Google Cloud console.


  3. Click Create transfer job.
  4. Follow the step-by-step walkthrough, clicking Next step as you complete each step:

    • Choose a source: Use Google Cloud Storage bucket as your source type, and either enter the name of the bucket directly, or click Browse to find and select the bucket you want.

    • Choose a destination: Either enter the name of the destination bucket directly, or click Browse to find and select the bucket you want.

    • Choose settings: Select the option Delete files from source after they're transferred.

    • Scheduling options: You can ignore this section.

  5. After you complete the step-by-step walkthrough, click Create.

    This begins the process of copying objects from your old bucket into your new one. This process may take some time; however, after you click Create, you can navigate away from the Google Cloud console.

    To view the transfer's progress: Open the Transfer page in the Google Cloud console.


    To learn how to get detailed error information about failed operations in the Storage Transfer Service browser, see Troubleshooting.

  6. If you selected the Delete files from source after they're transferred option during setup, the objects in your old bucket are deleted automatically once the transfer completes; you don't need to do anything else. You may, however, also want to delete the old bucket itself, which you must do separately.

gcloud CLI

Install the gcloud CLI

If you haven't already, install the gcloud command-line tool.

Then, call gcloud init to initialize the tool and to specify your project ID and user account. See Initializing Cloud SDK for more details.

gcloud init

Add the service account to your destination bucket

You must grant the Storage Transfer Service service account access to your destination bucket before creating a transfer. One way to do so is with gsutil iam ch; replace PROJECT_NUMBER with your project number, and choose a role that allows writing to the bucket, for example roles/storage.admin:

gsutil iam ch serviceAccount:project-PROJECT_NUMBER@storage-transfer-service.iam.gserviceaccount.com:roles/storage.admin gs://bucket_name

For instructions using the Google Cloud console or API, refer to Use IAM permissions in the Cloud Storage documentation.

Create the transfer job

To create a new transfer job, use the gcloud transfer jobs create command. Creating a new job initiates the specified transfer, unless a schedule or --do-not-run is specified.

gcloud transfer jobs create SOURCE DESTINATION


  • SOURCE is the data source for this transfer, in the format gs://BUCKET_NAME.

  • DESTINATION is your new bucket, in the form gs://BUCKET_NAME.

Additional options include:

  • Job information: You can specify --name and --description.

  • Schedule: Specify --schedule-starts, --schedule-repeats-every, and --schedule-repeats-until, or --do-not-run.

  • Object conditions: Use conditions to determine which objects are transferred. These include --include-prefixes and --exclude-prefixes, and the time-based conditions in --include-modified-[before | after]-[absolute | relative].

  • Transfer options: Specify whether to overwrite destination files (--overwrite-when=different or always) and whether to delete certain files during or after the transfer (--delete-from=destination-if-unique or source-after-transfer); specify which metadata values to preserve (--preserve-metadata); and optionally set a storage class on transferred objects (--custom-storage-class).

  • Notifications: Configure Pub/Sub notifications for transfers with --notification-pubsub-topic, --notification-event-types, and --notification-payload-format.

To view all options, run gcloud transfer jobs create --help.

For example, to transfer all objects with the prefix folder1:

gcloud transfer jobs create gs://old-bucket gs://new-bucket \
  --include-prefixes="folder1/"
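Flags can be combined in a single command. The following sketch is illustrative only; the job name, description, and bucket names are hypothetical:

gcloud transfer jobs create gs://old-bucket gs://new-bucket \
  --name=my-bucket-migration \
  --description="One-time migration to new-bucket" \
  --overwrite-when=different \
  --delete-from=source-after-transfer

With these options, objects that already exist in the destination are re-copied only when their content differs, and each source object is deleted after it transfers.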

REST API

In this example, you'll learn how to move files from one Cloud Storage bucket to another. For example, you can move data to a bucket in another location.

Request using transferJobs create:

  "description": "YOUR DESCRIPTION",
  "status": "ENABLED",
  "projectId": "PROJECT_ID",
  "schedule": {
      "scheduleStartDate": {
          "day": 1,
          "month": 1,
          "year": 2025
      "startTimeOfDay": {
          "hours": 1,
          "minutes": 1
      "scheduleEndDate": {
          "day": 1,
          "month": 1,
          "year": 2025
  "transferSpec": {
      "gcsDataSource": {
          "bucketName": "GCS_SOURCE_NAME"
      "gcsDataSink": {
          "bucketName": "GCS_SINK_NAME"
      "transferOptions": {
          "deleteObjectsFromSourceAfterTransfer": true
200 OK
  "transferJob": [
          "creationTime": "2015-01-01T01:01:00.000000000Z",
          "description": "YOUR DESCRIPTION",
          "name": "transferJobs/JOB_ID",
          "status": "ENABLED",
          "lastModificationTime": "2015-01-01T01:01:00.000000000Z",
          "projectId": "PROJECT_ID",
          "schedule": {
              "scheduleStartDate": {
                  "day": 1,
                  "month": 1,
                  "year": 2015
              "startTimeOfDay": {
                  "hours": 1,
                  "minutes": 1
          "transferSpec": {
              "gcsDataSource": {
                  "bucketName": "GCS_SOURCE_NAME",
              "gcsDataSink": {
                  "bucketName": "GCS_NEARLINE_SINK_NAME"
              "objectConditions": {
                  "minTimeElapsedSinceLastModification": "2592000.000s"
              "transferOptions": {
                  "deleteObjectsFromSourceAfterTransfer": true

Client libraries

In this example, you'll learn how to move files from one Cloud Storage bucket to another. For example, you can replicate data to a bucket in another location.

For more information about the Storage Transfer Service client libraries, see Getting started with Storage Transfer Service client libraries.


Java

Looking for older samples? See the Storage Transfer Service Migration Guide.

import;
import com.google.protobuf.Duration;
import com.google.storagetransfer.v1.proto.StorageTransferServiceClient;
import com.google.storagetransfer.v1.proto.TransferProto.CreateTransferJobRequest;
import com.google.storagetransfer.v1.proto.TransferTypes.GcsData;
import com.google.storagetransfer.v1.proto.TransferTypes.ObjectConditions;
import com.google.storagetransfer.v1.proto.TransferTypes.Schedule;
import com.google.storagetransfer.v1.proto.TransferTypes.TransferJob;
import com.google.storagetransfer.v1.proto.TransferTypes.TransferJob.Status;
import com.google.storagetransfer.v1.proto.TransferTypes.TransferOptions;
import com.google.storagetransfer.v1.proto.TransferTypes.TransferSpec;
import com.google.type.Date;
import com.google.type.TimeOfDay;
import java.util.Calendar;

public class TransferToNearline {
  /**
   * Creates a one-off transfer job that transfers objects in a standard GCS bucket that are more
   * than 30 days old to a Nearline GCS bucket.
   */
  public static void transferToNearline(
      String projectId,
      String jobDescription,
      String gcsSourceBucket,
      String gcsNearlineSinkBucket,
      long startDateTime)
      throws IOException {

    // Your Google Cloud Project ID
    // String projectId = "your-project-id";

    // A short description of this job
    // String jobDescription = "Sample transfer job of old objects to a Nearline GCS bucket.";

    // The name of the source GCS bucket to transfer data from
    // String gcsSourceBucket = "your-gcs-source-bucket";

    // The name of the Nearline GCS bucket to transfer old objects to
    // String gcsNearlineSinkBucket = "your-nearline-gcs-bucket";

    // What day and time in UTC to start the transfer, expressed as an epoch date timestamp.
    // If this is in the past relative to when the job is created, it will run the next day.
    // long startDateTime =
    //     new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").parse("2000-01-01 00:00:00").getTime();

    // Parse epoch timestamp into the model classes
    Calendar startCalendar = Calendar.getInstance();
    startCalendar.setTimeInMillis(startDateTime);
    // Note that this is a Date from the model class package, not a java.util.Date
    Date date =
        Date.newBuilder()
            .setYear(startCalendar.get(Calendar.YEAR))
            .setMonth(startCalendar.get(Calendar.MONTH) + 1)
            .setDay(startCalendar.get(Calendar.DAY_OF_MONTH))
            .build();
    TimeOfDay time =
        TimeOfDay.newBuilder()
            .setHours(startCalendar.get(Calendar.HOUR_OF_DAY))
            .setMinutes(startCalendar.get(Calendar.MINUTE))
            .build();

    TransferJob transferJob =
        TransferJob.newBuilder()
            .setDescription(jobDescription)
            .setProjectId(projectId)
            .setStatus(Status.ENABLED)
            .setSchedule(Schedule.newBuilder().setScheduleStartDate(date).setStartTimeOfDay(time))
            .setTransferSpec(
                TransferSpec.newBuilder()
                    .setGcsDataSource(GcsData.newBuilder().setBucketName(gcsSourceBucket))
                    .setGcsDataSink(GcsData.newBuilder().setBucketName(gcsNearlineSinkBucket))
                    .setObjectConditions(
                        ObjectConditions.newBuilder()
                            .setMinTimeElapsedSinceLastModification(
                                Duration.newBuilder().setSeconds(2592000 /* 30 days */)))
                    .setTransferOptions(
                        TransferOptions.newBuilder()
                            .setDeleteObjectsFromSourceAfterTransfer(true)))
            .build();

    // Create a Transfer Service client
    StorageTransferServiceClient storageTransfer = StorageTransferServiceClient.create();

    // Create the transfer job
    TransferJob response =
        storageTransfer.createTransferJob(
            CreateTransferJobRequest.newBuilder().setTransferJob(transferJob).build());

    System.out.println(
        "Created transfer job from standard bucket to Nearline bucket: " + response.getName());
  }
}

Python

Looking for older samples? See the Storage Transfer Service Migration Guide.

from datetime import datetime

from import storage_transfer
from google.protobuf.duration_pb2 import Duration


def create_daily_nearline_30_day_migration(
        project_id: str, description: str, source_bucket: str,
        sink_bucket: str, start_date: datetime):
    """Create a daily migration from a GCS bucket to a Nearline GCS bucket
    for objects untouched for 30 days."""

    client = storage_transfer.StorageTransferServiceClient()

    # The ID of the Google Cloud Platform Project that owns the job
    # project_id = 'my-project-id'

    # A useful description for your transfer job
    # description = 'My transfer job'

    # Google Cloud Storage source bucket name
    # source_bucket = 'my-gcs-source-bucket'

    # Google Cloud Storage destination bucket name
    # sink_bucket = 'my-gcs-destination-bucket'

    transfer_job_request = storage_transfer.CreateTransferJobRequest({
        'transfer_job': {
            'project_id': project_id,
            'description': description,
            'status': storage_transfer.TransferJob.Status.ENABLED,
            'schedule': {
                'schedule_start_date': {
                    'day':,
                    'month': start_date.month,
                    'year': start_date.year
                }
            },
            'transfer_spec': {
                'gcs_data_source': {
                    'bucket_name': source_bucket,
                },
                'gcs_data_sink': {
                    'bucket_name': sink_bucket,
                },
                'object_conditions': {
                    'min_time_elapsed_since_last_modification': Duration(
                        seconds=2592000  # 30 days
                    )
                },
                'transfer_options': {
                    'delete_objects_from_source_after_transfer': True
                }
            }
        }
    })

    result = client.create_transfer_job(transfer_job_request)
    print(f'Created transferJob: {}')


Some of the options available to you when setting up your transfer are listed below.

  • Logging: Cloud Logging provides detailed logs of individual objects, allowing you to verify transfer status and to perform additional data integrity checks.

  • Filtering: You can use include and exclude prefixes to limit which objects Storage Transfer Service operates on. This option can be used to split a transfer into multiple transfer jobs so that they can run in parallel, as shown in the sketch after this list. See Optimize transfer speed for more information.

  • Transfer options: You can configure your transfer to overwrite existing items in the destination bucket; to delete objects in the destination that don't exist in the transfer set; or to delete transferred objects from the source.
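For instance, splitting a migration by prefix might look like the following. This is a minimal sketch; the prefix logs/ and the bucket names are hypothetical:

gcloud transfer jobs create gs://old-bucket gs://new-bucket --include-prefixes="logs/"
gcloud transfer jobs create gs://old-bucket gs://new-bucket --exclude-prefixes="logs/"

The two jobs operate on disjoint sets of objects, so they can run at the same time without overwriting each other's work.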

Metadata preservation

The following object metadata is preserved when transferring between Cloud Storage buckets with Storage Transfer Service:

  • User-created custom metadata.
  • Cloud Storage fixed-key metadata fields, such as Cache-Control, Content-Disposition, Content-Type, and Custom-Time.
  • Size.

The following metadata can optionally be preserved when transferring using the API:

  • ACLs
  • Storage class
  • CMEK
  • Temporary hold

See the TransferSpec reference for details.
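With the gcloud CLI, the corresponding knob is the --preserve-metadata flag. The value names below are an assumption based on the list above; run gcloud transfer jobs create --help to confirm the exact spellings supported by your CLI version:

gcloud transfer jobs create gs://old-bucket gs://new-bucket \
  --preserve-metadata=acl,storage-class,temporary-hold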

Timestamp metadata from the source is not preserved. As a result, the object's time spent in its storage class before the transfer is reset. For example, an object that was in Coldline Storage must exist for another 90 days at the destination after the transfer to avoid early deletion charges.

Storage Transfer Service offers an option to preserve createTime as the value of the customTime field, so that you can apply your createTime-based lifecycle policies using customTime. Note that any value already saved in customTime is overwritten in this case and not preserved.
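For example, a lifecycle rule keyed off customTime could downgrade the storage class of objects 30 days after their original creation time. This is a minimal sketch; the file name and bucket are hypothetical, and it assumes customTime was populated with createTime during the transfer:

cat > lifecycle.json <<'EOF'
{
  "rule": [
    {
      "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
      "condition": {"daysSinceCustomTime": 30}
    }
  ]
}
EOF
gsutil lifecycle set lifecycle.json gs://new-bucket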

Generation number is not preserved during the transfer.

Refer to Metadata preservation for more details.

Minimize downtime

Storage Transfer Service does not lock reads or writes on the source or destination buckets during a transfer.

If you choose to manually lock reads and writes on your bucket, you can minimize downtime by transferring your data in two steps: seed and sync (a command-line sketch follows this list).

  • Seed transfer: Perform a bulk transfer without locking read/write on the source.

  • Sync transfer: Once the first run is complete, lock reads and writes on the source bucket and perform another transfer. Storage Transfer Service transfers are incremental by default, so this second transfer only transfers data that changed during the seed transfer.
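With the gcloud CLI, the two passes can simply be two jobs with the same source and destination. This is a minimal sketch; the bucket and job names are hypothetical:

gcloud transfer jobs create gs://source-bucket gs://destination-bucket --name=seed-transfer

Then, after locking reads and writes on the source:

gcloud transfer jobs create gs://source-bucket gs://destination-bucket --name=sync-transfer

Because transfers are incremental by default, the second job copies only objects that were added or changed during the seed run.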

Migrate versioned objects

Storage Transfer Service's manifest feature allows you to specify versions of objects you need to move.

  1. List the bucket objects and copy them into a JSON file:

    gcloud alpha storage ls --all-versions --recursive --json [SOURCE_BUCKET] > object-listing.json

    This command typically lists approximately 1,000 objects per second.

  2. Split the JSON file into two CSV files: one file with non-current versions, and another with the live versions:

    jq -r '.[] | select( .type=="cloud_object" and (.metadata | has("timeDeleted") | not)) | [.metadata.name, .metadata.generation] | @csv' object-listing.json > live-object-manifest.csv
    jq -r '.[] | select( .type=="cloud_object" and (.metadata | has("timeDeleted"))) | [.metadata.name, .metadata.generation] | @csv' object-listing.json > non-current-object-manifest.csv
  3. Transfer the non-current versions first by passing the non-current-object-manifest.csv manifest file as the value of the transferManifest field (see the sketch after this list).

  4. Then, transfer the live versions in the same way, specifying live-object-manifest.csv as the manifest file.
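With the gcloud CLI, the manifest is passed via the --manifest-file flag and must be stored in a Cloud Storage location. This is a minimal sketch; the staging bucket is hypothetical:

gsutil cp non-current-object-manifest.csv gs://my-config-bucket/
gcloud transfer jobs create gs://source-bucket gs://destination-bucket \
  --manifest-file=gs://my-config-bucket/non-current-object-manifest.csv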

Optimize the transfer speed

When estimating how long a transfer job will take, consider what the possible bottleneck will be. For example, if the source has billions of small files, then your transfer speed will be QPS bound. If object sizes are large, bandwidth might be the bottleneck.

Bandwidth limits are set at the region level and are fairly allocated across all projects. If sufficient bandwidth is available, Storage Transfer Service can complete around 1000 tasks per transfer job per second. You can accelerate a transfer in this case by splitting your job into multiple small transfer jobs, for example by using include and exclude prefixes to transfer certain files.

In cases where the location, storage class, and encryption key are the same, Storage Transfer Service does not create a new copy of the bytes; it instead creates a new metadata entry that points to the source blob. As a result, same-location / same-class copies of a large corpus happen very quickly and are only QPS bound.

Deletes are also metadata-only operations. Splitting such transfers into multiple small jobs likewise increases their speed.

Verify that objects were copied

After your transfer is complete, we recommend performing additional data integrity checks.

  • Validate that objects were copied correctly by verifying object metadata, such as checksums and size; a spot-check sketch follows this list.

  • Verify that the correct versions of the objects were copied. Storage Transfer Service offers an out-of-the-box option to verify that objects are identical copies. If you've enabled logging, view logs to confirm whether all objects were successfully copied, including their corresponding metadata fields.
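A quick spot check is to compare checksums on a sample of objects in the source and destination. This is a minimal sketch with hypothetical object paths; gsutil ls -L prints the CRC32C and MD5 hashes along with the size:

gsutil ls -L gs://old-bucket/path/to/object.txt
gsutil ls -L gs://new-bucket/path/to/object.txt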

Start using the destination bucket

Once the migration is complete and verified, update any existing applications or workloads so that they use the target bucket name. Check data access logs in Cloud Audit Logs to ensure that your operations are correctly modifying and reading objects.

Delete the original bucket

Once everything is working well, delete the original bucket.

Storage Transfer Service offers the option of deleting objects after they have been transferred by specifying deleteObjectsFromSourceAfterTransfer: true in the job configuration, or selecting the option in the Google Cloud console.

Schedule object deletion

To schedule the deletion of your objects at a later date, use a combination of a scheduled transfer job and the deleteObjectsUniqueInSink = true option.

The transfer job should be set up to transfer an empty bucket into the bucket containing your objects. This will cause Storage Transfer Service to list the objects and begin deleting them. As deletions are a metadata-only operation, the transfer job will only be QPS bound. To speed up the process, split the transfer into multiple jobs, each acting on a distinct set of prefixes.
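A hedged sketch of such a cleanup job follows; the empty staging bucket and target bucket names are hypothetical, and the start time is an arbitrary future date:

gcloud transfer jobs create gs://empty-bucket gs://bucket-to-purge \
  --delete-from=destination-if-unique \
  --schedule-starts=2025-06-01T00:00:00Z

Because the source is empty, every object in the destination is unique in the sink and is therefore deleted when the job runs.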

Alternatively, Google Cloud offers a managed cron job scheduler. For details, read Schedule Google Cloud STS Transfer Job with Cloud Scheduler, written by a Google Cloud Customer Engineer.