Transfer data between file systems

This page shows you how to transfer data between two POSIX file systems. Common use cases include:

  • Burst to cloud and Hybrid HPC: Quickly transfer large data sets from on-premises to the cloud for processing.
  • Migration and sync to Filestore: Migrate or sync data from an on-premises file system to Filestore.
  • Managed file transfer: Securely and reliably transfer data between data centers or between two in-cloud file systems.

Before you begin

Before you can perform the tasks described on this page, complete the prerequisite steps.

Create agent pools and install agents

For file system to file system transfers, you need to create agent pools and agents for both the source and destination file systems. Agents for the source agent pool need to be installed on machines or VMs that have access to the source file system. Agents for the destination agent pool need to be installed on machines or VMs that have access to the destination file system.

Create a source agent pool

Create a source agent pool using one of the following methods:

gcloud CLI

Create a source agent pool by running:

gcloud transfer agent-pools create SOURCE_AGENT_POOL

Replace SOURCE_AGENT_POOL with the name that you want to give to the source agent pool.
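
If you want to confirm that the pool was created and check its state, you can describe it (an optional quick check):

gcloud transfer agent-pools describe SOURCE_AGENT_POOL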

Cloud console

  1. In the Cloud console, go to the Agent pools page.

    Go to Agent pools

    The Agent pools page is displayed, listing your existing agent pools.

  2. Click Create another pool.

  3. Enter a name for the pool.

  4. Click Create.

Install agents for the source agent pool

Install agents for the source agent pool on a machine or VM that has access to the source file system:

gcloud CLI

Install agents for the source agent pool by running:

gcloud transfer agents install --pool=SOURCE_AGENT_POOL --count=NUMBER_OF_AGENTS

Replace the following:

  • SOURCE_AGENT_POOL with the name of the source agent pool.
  • NUMBER_OF_AGENTS with the number of agents that you want to install for the source agent pool.

To determine the optimal number of agents for your environment, see Agent requirements and best practices.
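
For example, a sketch that installs three agents for the source pool and, optionally, limits the file system paths the agents can access; the path shown is a placeholder:

# Installs 3 agents; --mount-directories restricts access to the listed paths.
gcloud transfer agents install --pool=SOURCE_AGENT_POOL --count=3 \
    --mount-directories=/mnt/source_fs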

Cloud console

  1. In the Cloud console, go to the Agent pools page.

    Go to Agent pools

    The Agent pools page is displayed, listing your existing agent pools.

  2. Click the name of the source agent pool that you just created.

  3. Under the Agents tab, click Install agent.

  4. Follow the instructions in the Google Cloud console to create the Pub/Sub resource, install Docker, and start the agent.

Create a destination agent pool and install agents

Repeat the preceding steps to create a destination agent pool and install agents.
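
In short, the destination side mirrors the source setup, with the agents installed on machines or VMs that can reach the destination file system:

gcloud transfer agent-pools create DESTINATION_AGENT_POOL
gcloud transfer agents install --pool=DESTINATION_AGENT_POOL --count=NUMBER_OF_AGENTS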

Create a Cloud Storage bucket as an intermediary

File system to file system transfers require a Cloud Storage bucket as an intermediary for the data transfer.

  1. Create a Cloud Storage Standard class bucket with the following settings:

    • Encryption: You can specify a customer-managed encryption key (CMEK). Otherwise, a Google-managed encryption key is used.
    • Object Versioning, Bucket Lock, and default object holds: Keep these features disabled.
    • Lifecycle policy for object deletion: Use the age condition to control the retention period of the transferred data. We recommend specifying a period longer than the longest transfer job that uses the bucket as an intermediary.
  2. Grant permissions and roles using one of the following methods:

    • Grant the Storage Transfer Service service account the Storage Admin role (roles/storage.admin) for the bucket; see the example after this list.
    • Use gcloud transfer authorize to authorize your account for all Storage Transfer Service features. This command grants project-wide Storage Admin permissions:

      gcloud transfer authorize --add-missing
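
For bucket-level access (the first option in the list above), one way to make the grant is with gsutil; PROJECT_NUMBER and BUCKET are placeholders, and the service account shown follows the standard Storage Transfer Service naming:

# Grant the Storage Transfer Service service account Storage Admin on the bucket.
gsutil iam ch \
    serviceAccount:project-PROJECT_NUMBER@storage-transfer-service.iam.gserviceaccount.com:roles/storage.admin \
    gs://BUCKET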
      

Manage intermediary buckets

After a transfer job completes, data in the intermediary bucket is cleaned up and a transfer log is written to the bucket. If the cleanup process does not delete all of the data in the bucket, you can either remove the leftover data by setting a lifecycle policy on the intermediary bucket, or clean up the bucket manually:

Lifecycle policy

Use the age condition to control the retention period of bucket data. Specify a period longer than the longest transfer job that uses the bucket as an intermediary. If the specified age condition is shorter than the time required to download the file from the intermediary bucket to the destination, the file transfer fails.
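
For example, a minimal sketch that deletes intermediary objects 30 days after they are created; adjust the age so that it exceeds your longest-running transfer job, and note that the lifecycle.json file name is arbitrary:

# Write a lifecycle rule that deletes objects older than 30 days.
cat > lifecycle.json << 'EOF'
{
  "rule": [
    {
      "action": {"type": "Delete"},
      "condition": {"age": 30}
    }
  ]
}
EOF
# Apply the rule to the intermediary bucket.
gsutil lifecycle set lifecycle.json gs://BUCKET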

Manual clean up

Delete bucket data by running:

gsutil -m rm gs://BUCKET/PREFIX**

Replace the following:

  • BUCKET with the name of the Cloud Storage bucket.
  • PREFIX with the prefix of the objects to delete.
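
For example, to remove all remaining objects from a hypothetical intermediary bucket named my-intermediary-bucket (the ** wildcard matches objects at any depth):

gsutil -m rm gs://my-intermediary-bucket/**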

Create a transfer job

To create a transfer from the source file system to the destination file system, run the following command:

  gcloud transfer jobs create SOURCE_DIRECTORY DESTINATION_DIRECTORY \
      --source-agent-pool=SOURCE_AGENT_POOL \
      --destination-agent-pool=DESTINATION_AGENT_POOL \
      --intermediate-storage-path=gs://STORAGE_BUCKET

Replace the following:

  • SOURCE_DIRECTORY with the path of the source directory.
  • DESTINATION_DIRECTORY with the path of the destination directory.
  • SOURCE_AGENT_POOL with the name of the source agent pool.
  • DESTINATION_AGENT_POOL with the name of the destination agent pool.
  • STORAGE_BUCKET with the name of the Cloud Storage bucket.

When you start a transfer job, the system first scans the source and destination to determine which source data is new or has been updated since the previous transfer. Only that data is transferred.
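
To see what a given run actually copied (for example, to confirm that a repeat run skipped unchanged files), you can inspect the job and its operations. A sketch, assuming JOB_NAME is the job name printed when the job was created; the --job-names filter on the operations listing is an assumption, so check gcloud transfer operations list --help in your environment:

# Show the job's configuration and latest status.
gcloud transfer jobs describe JOB_NAME
# List the individual transfer operations (runs) for the job.
gcloud transfer operations list --job-names=JOB_NAME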

Preserving file metadata

To preserve file metadata, including numeric UID, GID, MODE, and symbolic links:

gcloud CLI

Use the --preserve-metadata flag to specify the preservation behavior for this transfer. Options that apply to file system transfers are: acl, gid, mode, symlink, uid.
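
For example, a sketch of a file system transfer that preserves POSIX ownership, permissions, and symbolic links; the paths and pool names are placeholders for illustration:

gcloud transfer jobs create posix:///tmp/source posix:///tmp/destination \
    --source-agent-pool=SOURCE_AGENT_POOL \
    --destination-agent-pool=DESTINATION_AGENT_POOL \
    --intermediate-storage-path=gs://STORAGE_BUCKET \
    --preserve-metadata=uid,gid,mode,symlink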

REST API

Specify the appropriate options in a metadataOptions object.

See Preserving optional POSIX attributes for more information.

Example transfer using the gcloud CLI

This example transfers data from the /tmp/source directory on VM1 to the /tmp/destination directory on VM2.

  1. Set up the source of the transfer.

    1. Create the source agent pool:

      gcloud transfer agent-pools create source_agent_pool
      
    2. On VM1, install agents for source_agent_pool by running:

      gcloud transfer agents install --pool=source_agent_pool \
          --count=1
      
  2. Set up the destination of the transfer.

    1. Create the destination agent pool:

      gcloud transfer agent-pools create destination_agent_pool
      
    2. On VM2, install agents for destination_agent_pool by running:

      gcloud transfer agents install --pool=destination_agent_pool \
          --count=3
      
  3. Create an intermediary Cloud Storage bucket.

    1. Create a bucket named my-intermediary-bucket:

      gsutil mb gs://my-intermediary-bucket
      
    2. Authorize your account for all Storage Transfer Service features by running:

      gcloud transfer authorize --add-missing
      
  4. Create a transfer job by running:

    gcloud transfer jobs create posix:///tmp/source posix:///tmp/destination \
        --source-agent-pool=source_agent_pool \
        --destination-agent-pool=destination_agent_pool \
        --intermediate-storage-path=gs://my-intermediary-bucket
    

What's next