Transfer between Cloud Storage buckets

Storage Transfer Service can be used to transfer large amounts of data between Cloud Storage buckets, either within the same Google Cloud project, or between different projects.

Bucket migrations are useful in a number of scenarios. They can be used to consolidate data from separate projects, to move data into a backup location, or to change the location of your data.

When to use Storage Transfer Service

Google Cloud offers multiple options to transfer data between Cloud Storage buckets. We recommend the following guidelines:

  • Transferring less than 1 TB: Use gsutil or gcloud. For instructions, refer to Move and rename buckets.

  • Transferring more than 1 TB: Use Storage Transfer Service. Storage Transfer Service is a managed transfer option that provides out-of-the-box security, reliability, and performance. It eliminates the need to optimize and maintain scripts, and to handle retries.

This guide discusses best practices when transferring data between Cloud Storage buckets using Storage Transfer Service.

Create the destination bucket

Before beginning the transfer, create a storage bucket. See Location considerations for help choosing an appropriate bucket location.

You may wish to copy over some of the bucket metadata when you create the new bucket. See Get bucket information to learn how to display the source bucket's metadata, so that you can apply the same settings to your new bucket.

Preserve the bucket name

Cloud Storage bucket names must be globally unique, which means that you cannot create a new bucket with the same name as your old bucket while the old bucket still exists. If you'd like your data to end up in a bucket with the same name as the source bucket, you'll need to do two transfers:

  1. Transfer data from your source bucket (bucketA) to a temporary bucket (bucketTemp).
  2. Delete the original bucketA, then immediately create a new bucket named bucketA.
  3. Transfer data from bucketTemp to the new bucketA.
  4. Delete bucketTemp.

Tip: Create the new bucket immediately after deleting the original bucket to ensure that the name remains available to you.
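
The sequence above can be sketched as an ordered list of commands. This is a minimal illustration, not an official procedure: the helper name_preserving_plan is hypothetical, the location us-east1 is a placeholder, and the use of gcloud storage rm --recursive to remove a bucket together with its objects is an assumption you should confirm against your gcloud version. Creating a transfer job only starts the transfer, so confirm that each job has finished before running the command that follows it.

```python
from typing import List


def name_preserving_plan(bucket: str, temp_bucket: str, location: str) -> List[str]:
    """Return, in order, the commands for a two-hop move that keeps the bucket name.

    Hypothetical helper: it only builds command strings and runs nothing.
    Each transfer job must be confirmed complete before the next command is run.
    """
    original = f"gs://{bucket}"
    temporary = f"gs://{temp_bucket}"
    return [
        # Create the temporary bucket that holds the data during the move.
        f"gcloud storage buckets create {temporary} --location={location}",
        # First transfer: original -> temporary.
        f"gcloud transfer jobs create {original} {temporary}",
        # Delete the original bucket (assumed to remove its objects too).
        f"gcloud storage rm --recursive {original}",
        # Immediately re-create the bucket under the same name.
        f"gcloud storage buckets create {original} --location={location}",
        # Second transfer: temporary -> re-created original.
        f"gcloud transfer jobs create {temporary} {original}",
        # Clean up the temporary bucket.
        f"gcloud storage rm --recursive {temporary}",
    ]


for step, command in enumerate(name_preserving_plan("bucketA", "bucketTemp", "us-east1"), 1):
    print(f"{step}. {command}")
```

Keeping the plan as data makes it easy to review the order of operations before anything destructive is run.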

Copy the objects to destination

To copy objects from the source bucket to the destination bucket, use one of the following interfaces.

Google Cloud console

Use Storage Transfer Service from within the Google Cloud console:

  1. If you don't have a destination bucket yet, create the bucket.
  2. Open the Transfer page in the Google Cloud console.

  3. Click Create transfer job.
  4. Follow the step-by-step walkthrough, clicking Next step as you complete each step:

    • Choose a source: Use Google Cloud Storage bucket as your source type, and either enter the name of the source bucket directly, or click Browse to find and select it.

    • Choose a destination: Either enter the name of the destination bucket directly, or click Browse to find and select it.

    • Choose settings: Select the option Delete files from source after they're transferred.

    • Scheduling options: You can ignore this section.

  5. After you complete the step-by-step walkthrough, click Create.

    This begins the process of copying objects from your old bucket into your new one. This process may take some time; however, after you click Create, you can navigate away from the Google Cloud console.

    To view the transfer's progress: Open the Transfer page in the Google Cloud console.

    To learn how to get detailed error information about failed operations in the Storage Transfer Service browser, see Troubleshooting.

  6. Once the transfer completes, you don't need to do anything to delete the objects from your old bucket if you selected the Delete files from source after they're transferred option during setup. You may, however, also want to delete your old bucket, which you must do separately.

gcloud CLI

Install the gcloud CLI

If you haven't already, install the gcloud command-line tool.

Then, call gcloud init to initialize the tool and to specify your project ID and user account. See Initializing Cloud SDK for more details.

gcloud init

Add the service account to your destination bucket

You must add the Storage Transfer Service service account to your destination bucket before creating a transfer. To do so, use gsutil iam ch:

gsutil iam ch serviceAccount:project-12345678@storage-transfer-service.iam.gserviceaccount.com:roles/storage.admin gs://bucket_name

For instructions using the Google Cloud console or API, refer to Use IAM permissions in the Cloud Storage documentation.

Create the transfer job

To create a new transfer job, use the gcloud transfer jobs create command. Creating a new job initiates the specified transfer, unless a schedule or --do-not-run is specified.

gcloud transfer jobs create SOURCE DESTINATION

Where:

  • SOURCE is the data source for this transfer, in the format gs://BUCKET_NAME.

  • DESTINATION is your new bucket, in the form gs://BUCKET_NAME.

Additional options include:

  • Job information: You can specify --name and --description.

  • Schedule: Specify --schedule-starts, --schedule-repeats-every, and --schedule-repeats-until, or --do-not-run.

  • Object conditions: Use conditions to determine which objects are transferred. These include --include-prefixes and --exclude-prefixes, and the time-based conditions in --include-modified-[before | after]-[absolute | relative].

  • Transfer options: Specify whether to overwrite destination files (--overwrite-when=different or always) and whether to delete certain files during or after the transfer (--delete-from=destination-if-unique or source-after-transfer); specify which metadata values to preserve (--preserve-metadata); and optionally set a storage class on transferred objects (--custom-storage-class).

  • Notifications: Configure Pub/Sub notifications for transfers with --notification-pubsub-topic, --notification-event-types, and --notification-payload-format.

To view all options, run gcloud transfer jobs create --help.

For example, to transfer all objects with the prefix folder1:

gcloud transfer jobs create gs://old-bucket gs://new-bucket \
  --include-prefixes="folder1/"
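
If you script job creation, one option is to assemble the command as an argument list and hand it to a process runner. The sketch below is illustrative only: build_transfer_command is a hypothetical helper, and it covers just a few of the flags listed above (--include-prefixes, --delete-from, and --do-not-run). It builds the command without executing it.

```python
import shlex
from typing import List, Optional, Sequence

# Values for --delete-from as listed in the options above.
DELETE_FROM_CHOICES = ("destination-if-unique", "source-after-transfer")


def build_transfer_command(
    source_bucket: str,
    destination_bucket: str,
    include_prefixes: Sequence[str] = (),
    delete_from: Optional[str] = None,
    do_not_run: bool = False,
) -> List[str]:
    """Build (but do not run) a `gcloud transfer jobs create` argument list."""
    command = [
        "gcloud", "transfer", "jobs", "create",
        f"gs://{source_bucket}", f"gs://{destination_bucket}",
    ]
    if include_prefixes:
        command.append("--include-prefixes=" + ",".join(include_prefixes))
    if delete_from is not None:
        if delete_from not in DELETE_FROM_CHOICES:
            raise ValueError(f"delete_from must be one of {DELETE_FROM_CHOICES}")
        command.append(f"--delete-from={delete_from}")
    if do_not_run:
        command.append("--do-not-run")
    return command


command = build_transfer_command("old-bucket", "new-bucket", include_prefixes=["folder1/"])
print(shlex.join(command))
```

Passing the resulting list to subprocess.run(command, check=True) would execute it. Building a list rather than a single string avoids shell quoting problems with unusual prefixes.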

REST

In this example, you'll learn how to move files from one Cloud Storage bucket to another. For example, you can move data to a bucket in another location.

Request using transferJobs create:

POST https://storagetransfer.googleapis.com/v1/transferJobs
{
  "description": "YOUR DESCRIPTION",
  "status": "ENABLED",
  "projectId": "PROJECT_ID",
  "schedule": {
      "scheduleStartDate": {
          "day": 1,
          "month": 1,
          "year": 2025
      },
      "startTimeOfDay": {
          "hours": 1,
          "minutes": 1
      },
      "scheduleEndDate": {
          "day": 1,
          "month": 1,
          "year": 2025
      }
  },
  "transferSpec": {
      "gcsDataSource": {
          "bucketName": "GCS_SOURCE_NAME"
      },
      "gcsDataSink": {
          "bucketName": "GCS_SINK_NAME"
      },
      "transferOptions": {
          "deleteObjectsFromSourceAfterTransfer": true
      }
  }
}
Response:
200 OK
{
  "transferJob": [
      {
          "creationTime": "2015-01-01T01:01:00.000000000Z",
          "description": "YOUR DESCRIPTION",
          "name": "transferJobs/JOB_ID",
          "status": "ENABLED",
          "lastModificationTime": "2015-01-01T01:01:00.000000000Z",
          "projectId": "PROJECT_ID",
          "schedule": {
              "scheduleStartDate": {
                  "day": 1,
                  "month": 1,
                  "year": 2025
              },
              "startTimeOfDay": {
                  "hours": 1,
                  "minutes": 1
              },
              "scheduleEndDate": {
                  "day": 1,
                  "month": 1,
                  "year": 2025
              }
          },
          "transferSpec": {
              "gcsDataSource": {
                  "bucketName": "GCS_SOURCE_NAME"
              },
              "gcsDataSink": {
                  "bucketName": "GCS_SINK_NAME"
              },
              "transferOptions": {
                  "deleteObjectsFromSourceAfterTransfer": true
              }
          }
      }
  ]
}
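
As a rough sketch, the request body shown above can be assembled programmatically before it is sent. The helper build_move_job_body is hypothetical, and the fixed start time of 01:01 simply mirrors the sample request. Like that request, it uses the same start and end date. Sending the body requires an authenticated HTTP client, which is out of scope here.

```python
import json
from datetime import date


def build_move_job_body(
    project_id: str,
    source_bucket: str,
    sink_bucket: str,
    run_date: date,
    description: str = "",
) -> dict:
    """Build a transferJobs create body that moves objects between buckets."""
    day = {"day": run_date.day, "month": run_date.month, "year": run_date.year}
    return {
        "description": description,
        "status": "ENABLED",
        "projectId": project_id,
        "schedule": {
            "scheduleStartDate": day,
            "startTimeOfDay": {"hours": 1, "minutes": 1},
            # Same start and end date, as in the sample request.
            "scheduleEndDate": dict(day),
        },
        "transferSpec": {
            "gcsDataSource": {"bucketName": source_bucket},
            "gcsDataSink": {"bucketName": sink_bucket},
            # Deleting from the source turns the copy into a move.
            "transferOptions": {"deleteObjectsFromSourceAfterTransfer": True},
        },
    }


body = build_move_job_body("PROJECT_ID", "GCS_SOURCE_NAME", "GCS_SINK_NAME", date(2025, 1, 1))
print(json.dumps(body, indent=2))
```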

Client libraries

In this example, you'll learn how to move files from one Cloud Storage bucket to another. The samples below move objects that have not been modified for 30 days to a Nearline bucket and delete them from the source after the transfer.

For more information about the Storage Transfer Service client libraries, see Getting started with Storage Transfer Service client libraries.

Java

Looking for older samples? See the Storage Transfer Service Migration Guide.

import com.google.protobuf.Duration;
import com.google.storagetransfer.v1.proto.StorageTransferServiceClient;
import com.google.storagetransfer.v1.proto.TransferProto.CreateTransferJobRequest;
import com.google.storagetransfer.v1.proto.TransferTypes.GcsData;
import com.google.storagetransfer.v1.proto.TransferTypes.ObjectConditions;
import com.google.storagetransfer.v1.proto.TransferTypes.Schedule;
import com.google.storagetransfer.v1.proto.TransferTypes.TransferJob;
import com.google.storagetransfer.v1.proto.TransferTypes.TransferJob.Status;
import com.google.storagetransfer.v1.proto.TransferTypes.TransferOptions;
import com.google.storagetransfer.v1.proto.TransferTypes.TransferSpec;
import com.google.type.Date;
import com.google.type.TimeOfDay;
import java.io.IOException;
import java.util.Calendar;

public class TransferToNearline {
  /**
   * Creates a one-off transfer job that transfers objects in a standard GCS bucket that are more
   * than 30 days old to a Nearline GCS bucket.
   */
  public static void transferToNearline(
      String projectId,
      String jobDescription,
      String gcsSourceBucket,
      String gcsNearlineSinkBucket,
      long startDateTime)
      throws IOException {

    // Your Google Cloud Project ID
    // String projectId = "your-project-id";

    // A short description of this job
    // String jobDescription = "Sample transfer job of old objects to a Nearline GCS bucket.";

    // The name of the source GCS bucket to transfer data from
    // String gcsSourceBucket = "your-gcs-source-bucket";

    // The name of the Nearline GCS bucket to transfer old objects to
    // String gcsNearlineSinkBucket = "your-nearline-gcs-bucket";

    // What day and time in UTC to start the transfer, expressed as an epoch date timestamp.
    // If this is in the past relative to when the job is created, it will run the next day.
    // long startDateTime =
    //     new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").parse("2000-01-01 00:00:00").getTime();

    // Parse epoch timestamp into the model classes
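    // Use a UTC calendar so the extracted date and time fields match the UTC schedule
    Calendar startCalendar = Calendar.getInstance(java.util.TimeZone.getTimeZone("UTC"));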
    startCalendar.setTimeInMillis(startDateTime);
    // Note that this is a Date from the model class package, not a java.util.Date
    Date date =
        Date.newBuilder()
            .setYear(startCalendar.get(Calendar.YEAR))
            .setMonth(startCalendar.get(Calendar.MONTH) + 1)
            .setDay(startCalendar.get(Calendar.DAY_OF_MONTH))
            .build();
    TimeOfDay time =
        TimeOfDay.newBuilder()
            .setHours(startCalendar.get(Calendar.HOUR_OF_DAY))
            .setMinutes(startCalendar.get(Calendar.MINUTE))
            .setSeconds(startCalendar.get(Calendar.SECOND))
            .build();

    TransferJob transferJob =
        TransferJob.newBuilder()
            .setDescription(jobDescription)
            .setProjectId(projectId)
            .setTransferSpec(
                TransferSpec.newBuilder()
                    .setGcsDataSource(GcsData.newBuilder().setBucketName(gcsSourceBucket))
                    .setGcsDataSink(GcsData.newBuilder().setBucketName(gcsNearlineSinkBucket))
                    .setObjectConditions(
                        ObjectConditions.newBuilder()
                            .setMinTimeElapsedSinceLastModification(
                                Duration.newBuilder().setSeconds(2592000 /* 30 days */)))
                    .setTransferOptions(
                        TransferOptions.newBuilder().setDeleteObjectsFromSourceAfterTransfer(true)))
            .setSchedule(Schedule.newBuilder().setScheduleStartDate(date).setStartTimeOfDay(time))
            .setStatus(Status.ENABLED)
            .build();

    // Create a Transfer Service client; try-with-resources closes it when done
    try (StorageTransferServiceClient storageTransfer = StorageTransferServiceClient.create()) {
      // Create the transfer job
      TransferJob response =
          storageTransfer.createTransferJob(
              CreateTransferJobRequest.newBuilder().setTransferJob(transferJob).build());

      System.out.println("Created transfer job from standard bucket to Nearline bucket:");
      System.out.println(response.toString());
    }
  }
}

Python

Looking for older samples? See the Storage Transfer Service Migration Guide.

from datetime import datetime

from google.cloud import storage_transfer
from google.protobuf.duration_pb2 import Duration


def create_daily_nearline_30_day_migration(
        project_id: str, description: str, source_bucket: str,
        sink_bucket: str, start_date: datetime):
    """Create a daily migration from a GCS bucket to a Nearline GCS bucket
    for objects untouched for 30 days."""

    client = storage_transfer.StorageTransferServiceClient()

    # The ID of the Google Cloud Platform Project that owns the job
    # project_id = 'my-project-id'

    # A useful description for your transfer job
    # description = 'My transfer job'

    # Google Cloud Storage source bucket name
    # source_bucket = 'my-gcs-source-bucket'

    # Google Cloud Storage destination bucket name
    # sink_bucket = 'my-gcs-destination-bucket'

    transfer_job_request = storage_transfer.CreateTransferJobRequest({
        'transfer_job': {
            'project_id': project_id,
            'description': description,
            'status': storage_transfer.TransferJob.Status.ENABLED,
            'schedule': {
                'schedule_start_date': {
                    'day': start_date.day,
                    'month': start_date.month,
                    'year': start_date.year
                }
            },
            'transfer_spec': {
                'gcs_data_source': {
                    'bucket_name': source_bucket,
                },
                'gcs_data_sink': {
                    'bucket_name': sink_bucket,
                },
                'object_conditions': {
                    'min_time_elapsed_since_last_modification': Duration(
                        seconds=2592000  # 30 days
                    )
                },
                'transfer_options': {
                    'delete_objects_from_source_after_transfer': True
                }
            }
        }
    })

    result = client.create_transfer_job(transfer_job_request)
    print(f'Created transferJob: {result.name}')

Options

Some of the options available to you when setting up your transfer are listed below.

  • Logging: Cloud Logging provides detailed logs of individual objects, allowing you to verify transfer status and to perform additional data integrity checks.

  • Filtering: You can use include and exclude prefixes to limit which objects Storage Transfer Service operates on. This option can be used to split a transfer into multiple transfer jobs so that they can run in parallel. See Optimize transfer speed for more information.

  • Transfer options: You can configure your transfer to overwrite existing items in the destination bucket; to delete objects in the destination that don't exist in the transfer set; or to delete transferred objects from the source.

Metadata preservation

The following object metadata is preserved when transferring between Cloud Storage buckets with Storage Transfer Service:

  • User-created custom metadata.
  • Cloud Storage fixed-key metadata fields, such as Cache-Control, Content-Disposition, Content-Type, and Custom-Time.
  • Size.

The following metadata can optionally be preserved when transferring using the API:

  • ACLs
  • Storage class
  • CMEK
  • Temporary hold

See the TransferSpec reference for details.

Timestamp metadata from the source is not preserved. As a result, the time an object has spent in its storage class is reset by the transfer. For example, an object in Coldline Storage must exist for another 90 days at the destination after the transfer to avoid early deletion charges.
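
To make the reset concrete, the following sketch computes the first date on which a transferred object can be deleted without an early deletion charge. The helper name is hypothetical. The 90-day Coldline figure comes from the paragraph above; the 30-day Nearline and 365-day Archive figures are the minimum storage durations as understood at the time of writing, so check current pricing documentation before relying on them.

```python
from datetime import date, timedelta

# Minimum storage durations in days. Only the Coldline value is stated in
# this guide; the other two are assumptions to verify against current pricing.
MIN_STORAGE_DAYS = {"NEARLINE": 30, "COLDLINE": 90, "ARCHIVE": 365}


def early_deletion_free_from(transfer_date: date, storage_class: str) -> date:
    """Return the first date on which deletion no longer incurs an early deletion charge.

    The clock starts at the transfer date, because the transfer resets the
    object's time in its storage class. Classes without a minimum duration
    (for example STANDARD) return the transfer date itself.
    """
    days = MIN_STORAGE_DAYS.get(storage_class.upper(), 0)
    return transfer_date + timedelta(days=days)


print(early_deletion_free_from(date(2025, 1, 1), "COLDLINE"))  # 2025-04-01
```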

Storage Transfer Service offers an option to preserve createTime as the value of the customTime field, so that you can apply your createTime-based lifecycle policies using customTime. When you use this option, any existing customTime values are overwritten rather than preserved.

Generation number is not preserved during the transfer.

Refer to Metadata preservation for more details.
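
When using the API, the optional metadata listed in this section is requested through a metadataOptions object inside transferOptions. The sketch below shows one plausible shape for that object. The enum strings are written as understood from the TransferSpec reference and should be verified there before use; the helper name is hypothetical.

```python
def build_metadata_options(preserve_create_time_as_custom_time: bool = False) -> dict:
    """Build a metadataOptions fragment that preserves the optional metadata.

    The enum values are assumptions based on the TransferSpec reference;
    confirm them against the current API documentation.
    """
    options = {
        "acl": "ACL_PRESERVE",
        "storageClass": "STORAGE_CLASS_PRESERVE",
        "kmsKey": "KMS_KEY_PRESERVE",
        "temporaryHold": "TEMPORARY_HOLD_PRESERVE",
    }
    if preserve_create_time_as_custom_time:
        # Overwrites any existing customTime value on the transferred objects.
        options["timeCreated"] = "TIME_CREATED_PRESERVE_AS_CUSTOM_TIME"
    return options


transfer_options = {"metadataOptions": build_metadata_options(True)}
print(transfer_options)
```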

Minimize downtime

Storage Transfer Service does not lock read or write on the source or destination buckets during a transfer.

If you choose to manually lock reads and writes on your bucket, consider transferring your data in two steps, seed and sync, to minimize downtime.

  • Seed transfer: Perform a bulk transfer without locking read/write on the source.

  • Sync transfer: Once the first run is complete, lock the read/write on the source bucket and perform another transfer. Storage Transfer Service transfers are incremental by default, so this second transfer will only transfer data that changed during the seed transfer.

Migrate versioned objects

Storage Transfer Service's manifest feature allows you to specify versions of objects you need to move.

  1. List the bucket objects and copy them into a JSON file:

    gcloud alpha storage ls --all-versions --recursive --json [SOURCE_BUCKET] > object-listing.json
    

    This command typically lists around 1,000 objects per second.

  2. Split the JSON file into two CSV files: one file with non-current versions, and another with the live versions:

    jq -r '.[] | select( .type=="cloud_object" and (.metadata | has("timeDeleted") | not)) | [.metadata.name, .metadata.generation] | @csv' object-listing.json > live-object-manifest.csv
    jq -r '.[] | select( .type=="cloud_object" and (.metadata | has("timeDeleted"))) | [.metadata.name, .metadata.generation] | @csv' object-listing.json > non-current-object-manifest.csv
    
  3. Transfer the non-current versions first by passing the non-current-object-manifest.csv manifest file as the value of the transferManifest field.

  4. Then, transfer the live versions in the same way, specifying live-object-manifest.csv as the manifest file.
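
If jq is not available, the split in step 2 can be approximated in Python. This sketch assumes the listing has the same shape the jq filters rely on: a JSON array of entries with a type field and a metadata object containing name, generation, and, for non-current versions, timeDeleted. The function name split_manifests is hypothetical.

```python
import csv
import io
from typing import Iterable, Tuple


def split_manifests(listing: Iterable[dict]) -> Tuple[str, str]:
    """Split an object listing into (live_csv, non_current_csv) manifest text.

    Mirrors the jq filters above: an entry carrying `timeDeleted` in its
    metadata is treated as a non-current version, everything else as live.
    """
    live, non_current = io.StringIO(), io.StringIO()
    # QUOTE_NONNUMERIC quotes strings, similar to jq's @csv output.
    live_writer = csv.writer(live, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    old_writer = csv.writer(non_current, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    for entry in listing:
        if entry.get("type") != "cloud_object":
            continue  # Skip prefixes and any other non-object entries.
        metadata = entry.get("metadata", {})
        row = [metadata["name"], metadata["generation"]]
        writer = old_writer if "timeDeleted" in metadata else live_writer
        writer.writerow(row)
    return live.getvalue(), non_current.getvalue()


# Small in-memory example; in practice, load object-listing.json with json.load.
sample = [
    {"type": "cloud_object", "metadata": {"name": "a.txt", "generation": "2"}},
    {"type": "cloud_object",
     "metadata": {"name": "a.txt", "generation": "1", "timeDeleted": "2024-01-01T00:00:00Z"}},
    {"type": "prefix", "url": "gs://example-bucket/folder/"},
]
live_csv, non_current_csv = split_manifests(sample)
print(live_csv, non_current_csv, sep="")
```

Write the two returned strings to live-object-manifest.csv and non-current-object-manifest.csv to use them in steps 3 and 4.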

Optimize the transfer speed

When estimating how long a transfer job will take, consider what the possible bottleneck will be. For example, if the source has billions of small files, then your transfer speed will be QPS bound. If object sizes are large, bandwidth might be the bottleneck.

Bandwidth limits are set at the region level and are fairly allocated across all projects. If sufficient bandwidth is available, Storage Transfer Service can complete around 1000 tasks per transfer job per second. You can accelerate a transfer in this case by splitting your job into multiple small transfer jobs, for example by using include and exclude prefixes to transfer certain files.
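
For a QPS-bound transfer, a back-of-the-envelope estimate follows from the task rate quoted above. The sketch below assumes one task per object and assumes that splitting into several jobs scales throughput linearly. Neither is guaranteed, so treat the result as a rough lower bound rather than a prediction.

```python
import math


def estimate_qps_bound_seconds(
    object_count: int,
    jobs: int = 1,
    tasks_per_second_per_job: int = 1000,
) -> int:
    """Estimate the duration, in seconds, of a QPS-bound transfer.

    Assumes one task per object and linear scaling across jobs; both are
    simplifications made for illustration.
    """
    if jobs < 1 or tasks_per_second_per_job < 1:
        raise ValueError("jobs and tasks_per_second_per_job must be positive")
    return math.ceil(object_count / (jobs * tasks_per_second_per_job))


single = estimate_qps_bound_seconds(2_000_000_000)
split = estimate_qps_bound_seconds(2_000_000_000, jobs=10)
print(f"1 job: {single / 86400:.1f} days, 10 jobs: {split / 86400:.1f} days")
```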

In cases where the location, storage class, and encryption key are the same, Storage Transfer Service does not create a new copy of the bytes; it instead creates a new metadata entry that points to the source blob. As a result, same-location / same-class copies of a large corpus happen very quickly and are only QPS bound.

Deletes are also metadata-only operations. For these transfers, parallelizing the transfer by splitting it into multiple small jobs will increase the speed.
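
One simple way to split a transfer is to distribute the top-level prefixes across a fixed number of jobs. The round-robin helper below is a hypothetical sketch: it only produces the objectConditions fragment for each job and does not balance jobs by object count or size, which a real split would likely need to consider.

```python
from typing import Iterable, List


def split_prefixes_into_jobs(prefixes: Iterable[str], job_count: int) -> List[dict]:
    """Distribute prefixes round-robin and return one objectConditions dict per job."""
    if job_count < 1:
        raise ValueError("job_count must be at least 1")
    groups: List[List[str]] = [[] for _ in range(job_count)]
    # Sort first so the assignment is deterministic across runs.
    for index, prefix in enumerate(sorted(prefixes)):
        groups[index % job_count].append(prefix)
    # Drop empty groups when there are fewer prefixes than jobs.
    return [{"includePrefixes": group} for group in groups if group]


for conditions in split_prefixes_into_jobs(["logs/", "images/", "video/", "docs/", "tmp/"], 2):
    print(conditions)
```

Each returned dict can be used as the objectConditions of a separate transfer job. The prefixes must not overlap, otherwise two jobs would operate on the same objects.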

Verify that objects were copied

After your transfer is complete, we recommend performing additional data integrity checks.

  • Validate that objects were copied correctly, by verifying the metadata on the objects, such as checksums and size.

  • Verify that the correct versions of the objects were copied. Storage Transfer Service offers an out-of-the-box option to verify that objects were copied. If you've enabled logging, view the logs to confirm that all objects, along with their corresponding metadata fields, were copied successfully.
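
A size-and-checksum comparison like the one described above can be sketched over two object listings. The function below is a hypothetical example that assumes you have already collected, for each bucket, a mapping from object name to a (size, checksum) pair, for instance from bucket listings. It does not call Cloud Storage itself, and the checksum strings in the example are made up.

```python
from typing import Dict, List, Tuple

ObjectInfo = Tuple[int, str]  # (size in bytes, checksum such as CRC32C)


def find_copy_problems(
    source: Dict[str, ObjectInfo],
    destination: Dict[str, ObjectInfo],
) -> Dict[str, List[str]]:
    """Report source objects that are missing or different at the destination."""
    missing = sorted(name for name in source if name not in destination)
    different = sorted(
        name for name, info in source.items()
        if name in destination and destination[name] != info
    )
    return {"missing": missing, "different": different}


# Placeholder listings with made-up checksum values.
source_listing = {"a.txt": (10, "crc-a"), "b.txt": (20, "crc-b"), "c.txt": (30, "crc-c")}
destination_listing = {"a.txt": (10, "crc-a"), "b.txt": (20, "crc-x")}
print(find_copy_problems(source_listing, destination_listing))
```

If the job deletes objects from the source after transfer, capture the source listing before the transfer starts, because the source objects will no longer be there to compare afterwards.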

Start using the destination bucket

Once the migration is complete and verified, update any existing applications or workloads so that they use the target bucket name. Check data access logs in Cloud Audit Logs to ensure that your operations are correctly modifying and reading objects.

Delete the original bucket

Once everything is working well, delete the original bucket.

Storage Transfer Service offers the option of deleting objects after they have been transferred by specifying deleteObjectsFromSourceAfterTransfer: true in the job configuration, or selecting the option in the Google Cloud console.

Schedule object deletion

To schedule the deletion of your objects at a later date, combine a scheduled transfer job with the deleteObjectsUniqueInSink: true option.

The transfer job should be set up to transfer an empty bucket into the bucket containing your objects. This will cause Storage Transfer Service to list the objects and begin deleting them. As deletions are a metadata-only operation, the transfer job will only be QPS bound. To speed up the process, split the transfer into multiple jobs, each acting on a distinct set of prefixes.
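
A job body for this pattern might look like the sketch below. The helper name and bucket names are hypothetical, the start and end dates are set to the same day as in the REST sample earlier in this guide, and the optional prefix list illustrates how the work could be divided among several jobs.

```python
from datetime import date
from typing import Optional, Sequence


def build_scheduled_deletion_job(
    project_id: str,
    empty_bucket: str,
    target_bucket: str,
    run_date: date,
    include_prefixes: Optional[Sequence[str]] = None,
) -> dict:
    """Build a job body that empties target_bucket by syncing an empty bucket into it."""
    day = {"day": run_date.day, "month": run_date.month, "year": run_date.year}
    transfer_spec = {
        # The source must be an empty bucket so that every sink object is "unique".
        "gcsDataSource": {"bucketName": empty_bucket},
        "gcsDataSink": {"bucketName": target_bucket},
        "transferOptions": {"deleteObjectsUniqueInSink": True},
    }
    if include_prefixes:
        # Restrict this job to a distinct set of prefixes so jobs can run in parallel.
        transfer_spec["objectConditions"] = {"includePrefixes": list(include_prefixes)}
    return {
        "description": f"Scheduled deletion of objects in {target_bucket}",
        "status": "ENABLED",
        "projectId": project_id,
        "schedule": {"scheduleStartDate": day, "scheduleEndDate": dict(day)},
        "transferSpec": transfer_spec,
    }


job = build_scheduled_deletion_job(
    "PROJECT_ID", "my-empty-bucket", "bucket-to-clear", date(2025, 6, 1), ["logs/"]
)
print(job["transferSpec"])
```

Because this option deletes data, double-check the sink bucket name and the prefixes before enabling the job.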

Alternatively, Google Cloud offers a managed cron job scheduler. Read Schedule Google Cloud STS Transfer Job with Cloud Scheduler on medium.com, written by a Google Cloud Customer Engineer, for details.