Transfer service for on-premises data overview

This page describes Transfer service for on-premises data, its requirements, and its features.

About Transfer service for on-premises data

Transfer service for on-premises data is a software service that enables you to transfer large amounts of data from your data center to a Cloud Storage bucket. It is well suited to customers who are moving billions of files and hundreds of terabytes of data in a single transfer, and it can scale to network connections in the tens of Gbps.

Benefits of Transfer service for on-premises data

Transfer service for on-premises data enables you to transfer large volumes of data without writing custom scripts or buying off-the-shelf solutions. Custom scripts can be:

  • Unreliable
  • Slow
  • Insecure
  • Difficult to maintain and troubleshoot

Off-the-shelf solutions can be costly to deploy.

Transfer service for on-premises data is a scalable, reliable, and managed service that enables you to move your data without investing in engineering teams or buying transfer solutions. You install the on-premises agent for Linux, packaged as a Docker container, on computers in your data center, and Transfer service for on-premises data coordinates the agents to transfer your data securely to Cloud Storage.

Using Transfer service for on-premises data with limited bandwidth

If you have limited bandwidth, you can still use Transfer service for on-premises data. You can set a bandwidth limit for your Google Cloud project, which limits the rate at which on-premises agents copy data to Google Cloud. The bandwidth limit is shared across all transfer jobs and their associated on-premises agents within your Google Cloud project.
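
As a rough planning aid, the effect of a bandwidth limit on transfer duration can be estimated with simple arithmetic. The following Python sketch is illustrative only: the function name, the efficiency parameter, and the example figures are assumptions, not part of the service. Because the limit is shared across the project, total_bytes should cover every job that runs at the same time.

```python
TB = 10**12    # bytes in a terabyte (decimal)
GBPS = 10**9   # bits per second in one gigabit per second


def estimate_transfer_days(total_bytes: int,
                           bandwidth_bits_per_sec: float,
                           efficiency: float = 1.0) -> float:
    """Estimate how many days a transfer takes under a bandwidth limit.

    `efficiency` is a hypothetical fudge factor (0 < efficiency <= 1) for
    protocol overhead and idle time; it is not a setting of the service.
    """
    if bandwidth_bits_per_sec <= 0:
        raise ValueError("bandwidth must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = (total_bytes * 8) / (bandwidth_bits_per_sec * efficiency)
    return seconds / 86_400  # seconds per day


# Example: 100 TB in total, with the project limited to 1 Gbps.
print(round(estimate_transfer_days(100 * TB, 1 * GBPS), 2))  # 9.26
```

Halving the limit doubles the estimate, so this kind of calculation can help when choosing a limit that leaves enough bandwidth for other traffic.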

How Transfer service for on-premises data works

The following is a high-level overview of how Transfer service for on-premises data works:

  1. Install Docker and run a small piece of software, called an agent, in your private data center. The agent runs within a Docker container and has access to your locally mounted NFS data.

    See Installing and running the on-premises agent for more information.

  2. Complete first-time setup for Transfer service for on-premises data. This includes granting access to resources used by Storage Transfer Service, such as Pub/Sub and Cloud Storage.

  3. Start a Transfer service for on-premises data transfer from the Google Cloud Console. You'll provide the NFS directory and a destination Cloud Storage bucket to transfer data to.

    See Creating a transfer job for more information.

  4. When the transfer starts, it recursively traverses the given NFS directory and transfers the data it finds to your Cloud Storage bucket.

    Transferred data is checksummed, files with errors are retried, and data is sent over a secure connection. A record of the transfer's progress is written to log objects within your destination Cloud Storage bucket. You can track the progress of the transfer within the Google Cloud Console.

  5. When the transfer completes, you can view error samples within the Google Cloud Console. You can also review the transfer log for a catalog of files transferred and any errors.
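
The traversal, checksum, and retry behavior described in the steps above can be illustrated with a minimal local sketch. This is not the agent's implementation: it copies between two local directories, with the destination directory standing in for the bucket, and the SHA-256 checksum, the retry count, and the function names are all assumptions chosen for illustration.

```python
import hashlib
import os
import shutil


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_tree_with_verification(src_root: str, dst_root: str,
                                max_attempts: int = 3):
    """Recursively copy src_root to dst_root, verifying each file.

    Returns (copied, failed): lists of paths relative to src_root.
    A file is retried up to max_attempts times before it is reported
    as failed.
    """
    copied, failed = [], []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            for _attempt in range(max_attempts):
                try:
                    shutil.copyfile(src, dst)
                    if file_digest(src) == file_digest(dst):
                        copied.append(rel)
                        break  # verified; move on to the next file
                except OSError:
                    pass  # treat I/O errors as a failed attempt and retry
            else:
                # Every attempt failed or produced a checksum mismatch.
                failed.append(rel)
    return copied, failed
```

The two returned lists play the role of the transfer log in this sketch: one catalogs the files that were copied and verified, the other the files that still had errors after all retries.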

How Transfer service for on-premises data agents work

The following describes Transfer service for on-premises data agent processes:

  • Agent processes are dynamic. While you are running a transfer, you can add agents to increase performance. Newly started agents join the agent pool and perform work from existing transfers. You can use this to adjust how many agents are running, or to adapt transfer performance to changing transfer demand.

  • Agent processes are a fault-tolerant collective. If one agent stops running, the remaining agents continue to do work. If all of your agents stop, when you restart the agents the transfer resumes where the agents stopped. This enables you to avoid monitoring agents, retrying transfers, or implementing recovery logic. You can patch, move, and dynamically scale your agent pool without transfer downtime by coordinating agents with Google Kubernetes Engine.

    For example, suppose you submit two transfers while two agents are running. If one of the agents stops due to a machine reboot or operating system patch, the remaining agent continues working. Both transfers keep running, but more slowly, because a single agent is moving data. If the remaining agent also stops, then all transfers stop making progress, because there are no agents running. When you restart the agent processes, the transfers resume where they left off.

  • Agent processes are a pool. They collectively move your data in parallel. Because of this, all agents must have the same access to all data sources that you want to transfer.

    For example, if you are transferring data from a particular file system, you must mount the file system to every machine that you've installed agents on. If some agents can reach a data source and others can't, transfers from that data source won't succeed.
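
The pool behavior described above can be modeled with a toy simulation. This is only a sketch under simplifying assumptions: work is a shared queue of file names, each running agent copies one file per round, and the names run_round, agent-a, and agent-b are hypothetical. It says nothing about how the real service schedules work.

```python
from collections import deque


def run_round(pending: deque, agents: list, completed: list) -> None:
    """Let each running agent take one file from the shared work queue."""
    for agent in agents:
        if not pending:
            return
        completed.append((pending.popleft(), agent))


# Two transfers of three files each, interleaved in one shared queue.
pending = deque(f"transfer{j}/file{i}" for i in range(3) for j in (1, 2))
completed = []

run_round(pending, ["agent-a", "agent-b"], completed)  # both agents running
run_round(pending, ["agent-a"], completed)             # agent-b stopped: slower
run_round(pending, [], completed)                      # no agents: no progress
run_round(pending, ["agent-a", "agent-b"], completed)  # restarted: work resumes

print(len(completed), len(pending))  # 5 1
```

Because the pending queue lives outside any single agent, stopping every agent pauses progress without losing track of the remaining work, which mirrors the resume behavior described above.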

Transfer service for on-premises data requirements

To use Transfer service for on-premises data, you need:

  • A POSIX-compliant source.

  • A network connection of 300 Mbps or faster.

  • A Docker-supported 64-bit Linux server or virtual machine that can access the data you plan to transfer.

    Docker Community Edition supports the CentOS, Debian, Fedora, and Ubuntu operating systems.

    To use other Linux operating systems, see Docker Enterprise.

  • Completed first-time setup for Transfer service for on-premises data.

Before you start a transfer, verify that:

  • TCP ports 80 (HTTP) and 443 (HTTPS) are open for outbound connections.
  • All agent processes within a single Google Cloud project have the same file system mounted at the same mount point.
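
The checks above can be scripted and run on each agent machine before starting a transfer. The following Python sketch uses only the standard library. The function names are hypothetical, and the host and mount point you pass in are your own choices: the service does not prescribe a particular host to test against.

```python
import os
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if an outbound TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def preflight(host: str, mount_point: str, ports=(80, 443)) -> dict:
    """Check outbound ports and that mount_point is a mounted file system.

    `host` is whatever endpoint you choose to test against (an assumption
    of this sketch, not something the service defines).
    """
    results = {f"tcp/{port}": can_connect(host, port) for port in ports}
    results["mounted"] = os.path.ismount(mount_point)
    return results
```

For example, calling preflight("storage.googleapis.com", "/mnt/nfs-data") on every agent machine, where both arguments are illustrative, returns a dictionary of pass/fail results. Comparing the results across machines helps confirm that each agent sees the same mount point.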

What's next?

Start your transfer by completing first-time setup.