This page describes Transfer service for on-premises data, its requirements, and its features.
About Transfer service for on-premises data
Transfer service for on-premises data is a software service that enables you to transfer large amounts of data from your data center to a Cloud Storage bucket. It is well suited for customers who are moving billions of files and hundreds of terabytes of data in a single transfer, and it can scale to network connections in the tens of Gbps.
Benefits of Transfer service for on-premises data
Transfer service for on-premises data is a scalable, reliable, and managed service that enables you to transfer large volumes of data without investing in engineering teams or buying costly off-the-shelf solutions. You install a Docker container containing the on-premises agent for Linux on your data center's computers, and Transfer for on-premises coordinates the agents to transfer your data securely to Cloud Storage.
Using Transfer service for on-premises data with limited bandwidth
If you have limited bandwidth, you can still use Transfer service for on-premises data. You can set a bandwidth limit for your Google Cloud project, which limits the rate that on-premises agents copy data to Google Cloud. The bandwidth limit is shared across all transfer jobs and their associated on-premises agents within your Google Cloud project.
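To illustrate how a shared limit behaves, the sketch below shows the rough per-agent rate when a project-level cap is split evenly across running agents. The numbers and the even split are illustrative assumptions, not defaults or guarantees of the service:

```python
# Illustrative sketch: a project-level bandwidth cap shared by agents.
# The cap value and agent counts below are example numbers only.

def per_agent_rate(project_limit_mbps: float, active_agents: int) -> float:
    """Roughly the rate each agent can copy at when the project-wide
    limit is split evenly across all running agents."""
    if active_agents <= 0:
        raise ValueError("need at least one running agent")
    return project_limit_mbps / active_agents

# A 400 Mbps project cap shared by 4 agents leaves ~100 Mbps each.
print(per_agent_rate(400, 4))  # 100.0
```

Because the limit is project-wide, adding agents or jobs does not increase total throughput past the cap; it only changes how the capped bandwidth is divided.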
How Transfer service for on-premises data works
The following is a high-level overview of how Transfer service for on-premises data works:
Complete Transfer for on-premises setup. This includes granting access to resources used by Storage Transfer Service, such as Pub/Sub and Cloud Storage, and installing on-premises agents on machines in your data center. See Installing and running the on-premises agent for more information.
Start a Transfer service for on-premises data transfer from the Google Cloud Console. You provide the NFS directory to transfer data from and the destination Cloud Storage bucket to transfer it to.
See Creating a transfer job for more information.
When the transfer starts, the agents recursively traverse the given NFS directory and copy the data they find to your Cloud Storage bucket.
Transferred data is checksummed, files with errors are retried, and data is sent over a secure connection. A record of the transfer's progress is written to log objects within your destination Cloud Storage bucket, and you can track the transfer's progress within the Cloud Console.
When the transfer completes, you can view error samples within the Cloud Console. You can also review the transfer log for a catalog of files transferred and any errors.
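The traverse-checksum-retry loop described above can be sketched in Python. This is a simplified model under stated assumptions: the `upload` callable, MD5 checksums, and a fixed retry count stand in for the service's actual upload path and integrity checks, which are not shown here:

```python
# Sketch of the transfer loop: walk a source directory recursively,
# checksum each file, and retry failed uploads. The upload callable
# and retry count are illustrative placeholders.
import hashlib
import os

def checksum(path: str) -> str:
    """MD5 of a file, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def transfer_tree(source_dir: str, upload, retries: int = 3) -> list:
    """Upload every file under source_dir, retrying failures.
    Returns the paths that still failed after all retries."""
    failed = []
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            path = os.path.join(root, name)
            for _attempt in range(retries):
                try:
                    upload(path, checksum(path))
                    break
                except OSError:
                    continue
            else:
                failed.append(path)
    return failed
```

Files that exhaust their retries end up in the returned list, mirroring how the real service surfaces per-file errors in its transfer log.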
How Transfer service for on-premises data agents work
The following describes Transfer service for on-premises data agent processes:
Agent processes are dynamic. While you are running a transfer, you can add agents to increase performance. Newly started agents join the assigned agent pool and perform work from existing transfers. You can use this to adjust how many agents are running, or to adapt transfer performance to changing transfer demand.
Agent processes are a fault-tolerant collective. If one agent stops running, the remaining agents continue to do work. If all of your agents stop, when you restart the agents the transfer resumes where the agents stopped. This enables you to avoid monitoring agents, retrying transfers, or implementing recovery logic. You can patch, move, and dynamically scale your agent pools without transfer downtime by coordinating agents with Google Kubernetes Engine.
For example, you submit two transfers while two agents are running. If one of the agents stops due to a machine reboot or operating system patch, the remaining agent continues working. The two transfers are still running, but slower since a single agent is moving data. If the remaining agent also stops, then all transfers stop making progress, since there are no agents running. When you restart the agent processes, the transfers resume where they left off.
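The fault tolerance in this example can be modeled as agents pulling work items from a shared queue: if one agent stops, the remaining agents drain what is left. The code below is a local simulation of that idea only; real agents coordinate through the service, not a local queue:

```python
# Local simulation: two "agents" consume a shared queue of 10 items.
# One agent stops after 2 items (a simulated reboot); the survivor
# finishes the rest, so no work is lost.
import queue
import threading

tasks = queue.Queue()
for i in range(10):
    tasks.put("file-%d" % i)

done = []
lock = threading.Lock()

def agent(stop_after=None):
    handled = 0
    while True:
        try:
            item = tasks.get_nowait()
        except queue.Empty:
            return
        with lock:
            done.append(item)
        handled += 1
        if stop_after is not None and handled >= stop_after:
            return  # simulate this agent stopping mid-transfer

a = threading.Thread(target=agent, kwargs={"stop_after": 2})
b = threading.Thread(target=agent)
a.start(); b.start()
a.join(); b.join()
print(len(done))  # 10: the surviving agent drained the queue
```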
Agent processes belong to a pool. They collectively move your data in parallel. Because of this, all agents must have the same access to all data sources that you want to transfer.
For example, if you are transferring data from a particular file system, you must mount the file system to every machine that you've installed agents on. If some agents can reach a data source and others can't, transfers from that data source won't succeed.
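As an illustration, an operator could run a pre-flight check like the following on each agent machine to confirm the shared file system is mounted at the expected path before starting agents. The check itself and the example paths are assumptions, not part of the service:

```python
# Pre-flight check: is the expected mount point actually mounted?
# The mount point passed in is site-specific; "/" is used below only
# because it is a mount point on every POSIX system.
import os

def mount_ready(mount_point: str) -> bool:
    """True if mount_point exists and is an actual mount boundary."""
    return os.path.isdir(mount_point) and os.path.ismount(mount_point)

# Every agent host should report True for the same shared path.
print(mount_ready("/"))
```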
How agent pools work
An agent pool is a collection of agents that use the same configuration, with uniform access and visibility to your source and destination. For example, if you have two data centers with separate file systems as a source, you'd create a separate agent pool for each. This is because a single agent pool could not have uniform access or visibility to the source to effectively transfer the data.
Each agent pool has its own transfer configuration. This enables you to run agents against different sources and destinations, and to manage transfer resources, such as bandwidth limits, for each pool.
Transfer service for on-premises data requirements
To use Transfer for on-premises, you need:
A POSIX-compliant source.
A network connection of 300 Mbps or faster.
A Docker-supported 64-bit Linux server or virtual machine that can access the data you plan to transfer.
Docker Community Edition, which supports the CentOS, Debian, Fedora, and Ubuntu operating systems.
To use other Linux operating systems, see Docker Enterprise.
A Cloud Storage bucket without a retention policy.
To transfer to a bucket with a retention policy, we recommend the following process:
Create a Cloud Storage bucket within the same region as the final bucket. Ensure that this temporary bucket does not have a retention policy.
For more information about regions, see Bucket locations.
Use Transfer service for on-premises data to transfer your data to the temporary bucket you created without a retention policy.
Perform a bucket-to-bucket transfer to transfer the data to the bucket with a retention policy.
Delete the Cloud Storage bucket that you created to temporarily store your data.
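The steps above could look roughly like the following gsutil commands. The bucket names and region are placeholders, and the middle step is performed by Transfer service for on-premises data (from the Cloud Console), not by gsutil; the bucket-to-bucket copy is shown with gsutil for concreteness, though a Storage Transfer Service bucket-to-bucket job works as well:

```shell
# 1. Create a temporary bucket (no retention policy) in the same
#    region as the final bucket. Names and region are placeholders.
gsutil mb -l US-CENTRAL1 gs://example-temp-bucket

# 2. Run your Transfer service for on-premises data transfer into
#    gs://example-temp-bucket (done from the Cloud Console; not shown).

# 3. Copy the data to the final bucket that has the retention policy.
gsutil -m cp -r gs://example-temp-bucket/* gs://example-final-bucket

# 4. Remove the temporary bucket and its contents.
gsutil rm -r gs://example-temp-bucket
```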
Before you start a transfer, verify that:
TCP ports 80 (HTTP) and 443 (HTTPS) are open for outbound connections.
All agent processes within a single Google Cloud project have the same file system mounted at the same mount point.
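A quick way to verify outbound connectivity on those ports is a sketch like the one below. The target host is an example; check whichever endpoints your environment must reach, and note that the result depends on your network, firewall, and proxy configuration:

```python
# Connectivity pre-check for outbound TCP ports 80 and 443.
# storage.googleapis.com is used as an example target only.
import socket

def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """True if an outbound TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for port in (80, 443):
    print(port, can_reach("storage.googleapis.com", port))
```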