This page shows you how to transfer data between two POSIX file systems. Common use cases include:
- Burst to cloud and Hybrid HPC: Quickly transfer large data sets from on-premises to the cloud for processing.
- Migration and sync to Filestore: Migrate or sync data from an on-premises file system to Filestore.
- Managed file transfer: Securely and reliably transfer data between data centers or between two in-cloud file systems.
Transfer performance guidelines
The following guidelines can help to maximize performance during file system to file system transfers.
Agent deployment
In general, we recommend using three agents in each of the source and destination agent pools. Monitor the transfer and add more agents if required. Each agent needs 4 vCPUs and 8 GiB of RAM.
If you are migrating to a Filestore instance, Filestore recommends using the n2-standard-8 instance type for each agent. Specify nconnect=2 when you mount the instance to a Compute Engine VM. Refer to Filestore's performance guidelines for more information on how to optimize and test instance performance.
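As a rough aid for capacity planning, the per-agent figures above (4 vCPUs and 8 GiB of RAM) can be turned into a quick estimate of how many agents a single host can carry. The sketch below is only an illustration: the function name agents_that_fit is hypothetical, and it assumes the per-agent figures are a strict floor with nothing else running on the host.

```shell
#!/bin/sh
# Hypothetical helper, not part of the gcloud CLI.
# Usage: agents_that_fit VCPUS MEM_GIB
# Assumes each agent needs 4 vCPUs and 8 GiB of RAM, and that the host
# runs nothing else of significance.
agents_that_fit() {
  vcpus=$1
  mem_gib=$2
  by_cpu=$((vcpus / 4))     # agents the CPU count allows
  by_mem=$((mem_gib / 8))   # agents the memory allows
  # The scarcer resource sets the limit.
  if [ "$by_cpu" -lt "$by_mem" ]; then
    echo "$by_cpu"
  else
    echo "$by_mem"
  fi
}

# A host with 16 vCPUs and 64 GiB of RAM is CPU-bound.
agents_that_fit 16 64   # prints 4
```

Treat the result as an upper bound to start from, and keep monitoring the transfer as recommended above.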
Transferring large numbers of small files
For better performance when transferring a large number of small files, we recommend splitting the files into multiple directories and avoiding a single directory with millions of files.
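One way to follow this recommendation is to shard a flat directory into a fixed number of subdirectories before the transfer. The following is a minimal sketch, not a Storage Transfer Service feature: the function name shard_dir is hypothetical, and hashing each file name with cksum is an assumption chosen only because it spreads files evenly without extra tools.

```shell
#!/bin/sh
# Hypothetical helper: move every regular file directly under SRC_DIR
# into one of SHARDS numbered subdirectories (000, 001, ...).
# Usage: shard_dir SRC_DIR SHARDS
shard_dir() {
  src=$1
  shards=$2
  for f in "$src"/*; do
    # Skip subdirectories, including shard directories created below.
    [ -f "$f" ] || continue
    name=$(basename "$f")
    # cksum prints "<checksum> <size>"; keep only the checksum.
    sum=$(printf '%s' "$name" | cksum | cut -d ' ' -f 1)
    shard=$(printf '%03d' $((sum % shards)))
    mkdir -p "$src/$shard"
    mv "$f" "$src/$shard/$name"
  done
}

# Example (hypothetical path): shard_dir /mnt/source/flat 256
```

Because the shard is derived from the file name, running the function again on newly arrived files places them consistently. Hidden files (names starting with a dot) are not matched by the glob and stay where they are.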
Before you begin
Before you can perform the tasks described on this page, complete the prerequisite steps.
Create agent pools and install agents
For file system to file system transfers, you need to create agent pools and agents for both the source and destination file systems. Agents for the source agent pool need to be installed on machines or VMs that have access to the source file system. Agents for the destination agent pool need to be installed on machines or VMs that have access to the destination file system.
Don't include sensitive information such as personally identifiable information (PII) or security data in your agent ID prefix or agent pool name. Resource names may be propagated to the names of other Google Cloud resources and may be exposed to Google-internal systems outside of your project.
Create a source agent pool
Create a source agent pool using one of the following methods:
gcloud CLI
Create a source agent pool by running:
gcloud transfer agent-pools create SOURCE_AGENT_POOL
Replace SOURCE_AGENT_POOL with the name that you want to give to the source agent pool.
Google Cloud console
In the Google Cloud console, go to the Agent pools page.
The Agent pools page is displayed, listing your existing agent pools.
Click Create another pool.
Enter a name for the pool.
Click Create.
Install agents for the source agent pool
Install agents for the source agent pool on a machine or VM that has access to the source file system:
gcloud CLI
Install agents for the source agent pool by running:
gcloud transfer agents install --pool=SOURCE_AGENT_POOL --count=NUMBER_OF_AGENTS \
--mount-directories=MOUNT_DIRECTORIES
Replace the following:
- SOURCE_AGENT_POOL with the name of the source agent pool.
- NUMBER_OF_AGENTS with the number of agents that you want to install for the source agent pool. To determine the optimal number of agents for your environment, see Agent requirements and best practices.
- MOUNT_DIRECTORIES with a comma-separated list of directories on the source file system from which to copy. Omitting this flag mounts the entire file system, which could present a security risk.
Google Cloud console
In the Google Cloud console, go to the Agent pools page.
The Agent pools page is displayed, listing your existing agent pools.
Click the name of the source agent pool that you just created.
Under the Agents tab, click Install agent.
Follow the instructions in the Google Cloud console to install Docker and start the agent.
Create a destination agent pool and install agents
Repeat the preceding steps to create a destination agent pool and install agents.
Create a Cloud Storage bucket as an intermediary
File system to file system transfers require a Cloud Storage bucket as an intermediary for the data transfer.
Create a Cloud Storage Standard class bucket with the following settings:
- Encryption: You can specify a customer-managed encryption key (CMEK). Otherwise, a Google-owned and Google-managed key is used.
- Object Versioning, Bucket Lock, and default object holds: Keep these features disabled.
Grant permissions and roles using one of the following methods:
- Grant the Storage Transfer Service service account the Storage Admin role (roles/storage.admin) for the bucket.
- Use gcloud transfer authorize to authorize your account for all Storage Transfer Service features. This command grants project-wide Storage Admin permissions:
gcloud transfer authorize --add-missing
Create a transfer job
gcloud CLI
To create a transfer from the source file system to the destination file system, run:
gcloud transfer jobs create SOURCE_DIRECTORY DESTINATION_DIRECTORY \
--source-agent-pool=SOURCE_AGENT_POOL \
--destination-agent-pool=DESTINATION_AGENT_POOL \
--intermediate-storage-path=gs://STORAGE_BUCKET/FOLDER/
Replace the following variables:
- SOURCE_DIRECTORY with the path of the source directory.
- DESTINATION_DIRECTORY with the path of the destination directory.
- SOURCE_AGENT_POOL with the name of the source agent pool.
- DESTINATION_AGENT_POOL with the name of the destination agent pool.
- STORAGE_BUCKET with the name of the Cloud Storage bucket.
- FOLDER with the name of the folder you would like the data to be transferred into.
When you start a transfer job, the system first examines the data in the source and destination to determine which source data is new or updated since the previous transfer. Only the new or updated data is transferred.
Manage intermediary buckets
After a transfer job completes, Storage Transfer Service saves transfer logs listing transferred and failed-to-transfer data in the bucket. After the transfer, clean-up tasks initiate automatically to delete the intermediate data. In some instances, clean-up tasks fail to delete all of the data in the bucket. To delete data that isn't erased during clean-up, use the instructions below to either delete the data manually, or to set a lifecycle rule to delete the data automatically.
Manual clean-up
Delete data from the intermediary bucket by running the following commands based on the type of data you want to delete.
To erase data in the intermediary bucket that wasn't deleted during clean-up, run the following command:
gcloud storage rm gs://STORAGE_BUCKET/PREFIX**
To delete all data, including transfer logs, specify the root of the bucket using the matches-all (*) wildcard.
gcloud storage rm gs://STORAGE_BUCKET/*
To delete the bucket, run the following command:
gcloud storage rm gs://STORAGE_BUCKET
Replace the following variables:
- STORAGE_BUCKET with the name of the intermediary bucket.
- PREFIX with the name of the folder the data was transferred into within the intermediary bucket.
Set a lifecycle rule
To delete data that isn't erased by the automatic clean-up cycle, set a lifecycle rule for the Cloud Storage bucket. Use the age condition to erase intermediate data in the bucket by specifying a period longer than the longest transfer job that uses the bucket as an intermediary. If the specified age condition is shorter than the time required to download the file from the intermediary bucket to the destination, the file transfer fails.
Optionally, use the matchesPrefix condition to erase only the data in the folder you specified for the intermediate bucket. To delete transfer logs along with the data in the bucket, omit the matchesPrefix condition.
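A lifecycle configuration of this kind might look like the sketch below, which writes a JSON file combining the age and matchesPrefix conditions. The 14-day age and the FOLDER/ prefix are placeholder assumptions, not recommended values. Choose an age longer than your longest transfer job.

```shell
#!/bin/sh
# Assumption: 14 days is longer than the longest transfer job that uses
# this bucket. Adjust to your own workload.
AGE_DAYS=14
# Placeholder: the folder given in --intermediate-storage-path.
PREFIX="FOLDER/"

# Write the lifecycle configuration to the current directory.
cat > lifecycle.json <<EOF
{
  "rule": [
    {
      "action": {"type": "Delete"},
      "condition": {"age": ${AGE_DAYS}, "matchesPrefix": ["${PREFIX}"]}
    }
  ]
}
EOF

# To apply the rule (requires the gcloud CLI and access to the bucket):
# gcloud storage buckets update gs://STORAGE_BUCKET --lifecycle-file=lifecycle.json
```

Removing the matchesPrefix entry makes the rule apply to every object in the bucket, including transfer logs.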
Preserving file metadata
To preserve file metadata, including numeric UID, GID, MODE, and symbolic links:
gcloud CLI
Use the --preserve-metadata flag to specify the preservation behavior for this transfer. Options that apply to file system transfers are: gid, mode, symlink, uid.
REST API
Specify the appropriate options in a metadataOptions object.
See Preserving optional POSIX attributes for more information.
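For the gcloud CLI, a job that preserves all four attributes could be created along the lines of the sketch below. It is written as a dry run that only prints the command for review. The paths and pool names are borrowed from the example on this page, STORAGE_BUCKET and FOLDER are placeholders, and the function name print_job_command is hypothetical.

```shell
#!/bin/sh
# Dry run: print the job-creation command instead of executing it.
# Hypothetical helper; replace the paths, pool names, and bucket
# placeholder with your own values.
print_job_command() {
  echo gcloud transfer jobs create \
    posix:///tmp/datasource posix:///tmp/destination \
    --source-agent-pool=source_agent_pool \
    --destination-agent-pool=destination_agent_pool \
    --intermediate-storage-path=gs://STORAGE_BUCKET/FOLDER/ \
    --preserve-metadata=gid,mode,symlink,uid
}

print_job_command
```

After checking the printed command, run it directly in a shell where the gcloud CLI is installed and authorized.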
Example transfer using the gcloud CLI
In this example, we transfer data from the /tmp/datasource directory on VM1 to /tmp/destination on VM2.
Set up the source of the transfer.
Create the source agent pool:
gcloud transfer agent-pools create source_agent_pool
On VM1, install agents for source_agent_pool by running:
gcloud transfer agents install --pool=source_agent_pool \
--count=1 \
--mount-directories="/tmp/datasource"
Set up the destination of the transfer.
Create the destination agent pool:
gcloud transfer agent-pools create destination_agent_pool
On VM2, install agents for destination_agent_pool by running:
gcloud transfer agents install --pool=destination_agent_pool \
--count=3 \
--mount-directories="/tmp/destination"
Create an intermediary Cloud Storage bucket.
Create a bucket named my-intermediary-bucket:
gcloud storage buckets create gs://my-intermediary-bucket
Authorize your account for all Storage Transfer Service features by running:
gcloud transfer authorize --add-missing
Create a transfer job by running:
gcloud transfer jobs create posix:///tmp/datasource posix:///tmp/destination \
--source-agent-pool=source_agent_pool \
--destination-agent-pool=destination_agent_pool \
--intermediate-storage-path=gs://my-intermediary-bucket
What's next
- Learn more about managing jobs.