Storage & Data Transfer
Getting to the cloud: Best practices for migrating from On-prem to Google Cloud using Storage Transfer Service
Organizations have been moving their on-premises data and applications to the cloud for the past several years, driven by reasons as varied as application modernization and content delivery to archival. In particular, we have seen migration momentum pick up in sectors like media and entertainment, where customers are rethinking how they monetize and store their valuable historical content, often while exploring Google Cloud’s many analytical and AI solutions.
Many of these customers are interested in moving their unstructured data from on-premises appliances to Google’s Cloud Storage. Over the past year, we’ve noticed an uptick in larger migrations, where customers move tens of petabytes or more of on-premises file data to Google’s flexible, secure object storage.
For customers like Telecom Italia/TIM Brasil, Google’s fully managed Storage Transfer Service played a key role in making this transformation possible by moving data from on-premises filesystems to extensible, low-cost Cloud Storage over the network.
“Storage Transfer Service helped us move petabyte-scale data from on-premises filesystem to Google Cloud in a highly performant and fully-managed way,” said Auana Mattar, CIO at TIM Brasil. “Setting up the transfer pipeline required performing some tests to figure out the ideal number of agents and networking settings in our on-prem environment. Once the initial setup was done, transferring data was seamless, and the service was able to saturate a 20 Gbps Partner Interconnect link.”
While large-scale cloud migrations can be intimidating, there are a number of actions that customers can take to ensure that a multi-petabyte data transfer goes as smoothly as possible. In the past, we’ve shared general architectural guidance for customers new to their cloud journey. And for customers looking for options, Google has multiple paths to move on-premises file data to the cloud, including our fully offline Transfer Appliance.
In this blog post, we’ll provide an updated perspective focused on how to use Storage Transfer Service to move data from on-prem to the cloud. Specifically, we’ll look at three different factors to consider prior to moving large amounts of data from your on-premises filesystem to Cloud Storage with our Storage Transfer Service.
Understanding the source files and filesystem
If you are moving data from an on-premises filesystem, you should be aware of how your source files, and your source filesystem, can impact transfer performance.
Each copy you make to Cloud Storage incurs some overhead from associated operations like metadata transfer, checksumming, and encryption. This means that, for a given amount of storage, transferring large numbers of very small files will take longer. As a rule of thumb, Storage Transfer Service will be most performant when moving files that are 16 MB or larger.
If you have a large number of smaller files, you may choose to batch them using tools like tar and upload as a single object. This will improve transfer performance but will limit how you can use those files in Cloud Storage. It’s an option best considered for use cases like archival storage, where the data transferred to Google Cloud may not be managed or accessed regularly. Our Nearline, Coldline, and Archive archival tiers offer excellent performance should you ever need to retrieve the archived data.
The source filesystem may also slow down transfer performance, particularly if the filesystem has limited read throughput. Tools like Fio can be used to test read throughput. We’ve included a command below to run a series of 1MB sequential read operations in Fio and to generate a report:
> sudo apt install -y fio
#Create a new directory fiotest
> sudo mkdir -p $TEST_DIR
#Test read throughput
> sudo fio --directory=$TEST_DIR --direct=1 --rw=randread --randrepeat=0 --ioengine=libaio --bs=1M --iodepth=8 --time_based=1 --runtime=180 --name=read_test --size=1G
Fio will then generate a report. The final line labeled ‘bw’ represents the total aggregate bandwidth of all threads, and it can be used as a proxy for read throughput. In general, you should strive for read throughput (‘bw’) that is 1.5x of your desired upload throughput or speed. (And one easy way to increase read throughput is to ensure that the filesystem itself is not imposing any limits on maximum throughput.)
Optimizing Storage Transfer Service resources
Within the Storage Transfer Service, there are a few settings that we can adjust ahead of time to ensure optimal performance for a larger workload.
First, we should ensure that we have the right number of transfer agents for our source data. We would advise that, for any transfer job larger than 1 GB, you start with at least three agents in separate VMs, with each agent assigned at least 4 vCPU and 8 GB of RAM. In addition to providing a foundation for performant data transfer, this architecture also ensures that the transfer is fault tolerant should one agent machine become unavailable.
Google Cloud supports up to 100 concurrent agents for a given Google Cloud project. To help you identify the right number of agents to support your workload, you should start your larger transfer first, then wait three minutes after adding each agent to ensure that throughput has stabilized.
In general, each agent can facilitate roughly 1 Gbps of throughput for up to 10 agents, at which point it may be necessary to add more agents for a very large amount of network bandwidth. For example, in one larger migration, a customer with 20 Gbps of dedicated network capacity ran ~30 agents at once. These numbers illustrate what was required at one enterprise data center. Across all of your environments, it is important to test and monitor your throughput via Cloud Monitoring to ensure you have the right configuration for your transfer goals.
Another area to optimize is where and how you install your agents. As we mentioned earlier, agents should be installed in separate VMs, and each host machine should dedicate at least 4 vCPUs and 8 GB of memory per agent. This is a starting off point, and larger, long-running transfers may require additional CPU or memory. For those longer jobs, we advise that you monitor CPU utilization and unused memory closely to ensure optimal performance. You should provision more CPUs when utilization exceeds 70%. Similarly, you should be ready to provision additional memory when the agent has less than 1GB of unused memory.
Preparing your network for large-scale data transfer
The third and final area to consider is your network connectivity. While it can be easy to reduce this to the bandwidth between the source filesystem and the Google Cloud bucket, the network includes two other components that can be easier to configure: first, the network interface from the on-premises agents to the WAN; and second, the agents’ connection to the on-premises filesystem.
For the first component, the network interface from the on-premises agents to the WAN, the general guidance is to not let this become a bottleneck. Specifically, you should ensure that this interface is greater than or equal to the bandwidth you require to read from the filesystem, plus the upload bandwidth to write to Google Cloud. In other words, if you plan on moving 10 Gbps of data from on-premises to Google Cloud, you will need 20 Gbps of bandwidth between the on-premises agents to the WAN: 10 Gbps to read from the networked filesystem, and 10 Gbps to transfer and write to Google Cloud.
On-premises filesystems, and on-premises networks, come in many flavors. For the second component, how the agents connect to an on-premises filesystem, our general rule is to be mindful of latency and to test regularly. It is essential to ensure that agents run on machines that can access a networked filesystem with very low latency.
Finally, if you are trying to maximize transfer performance, make sure you’ve configured your network to avoid bandwidth restrictions between on-premises filesystem and Google Cloud that might impact transfer speed. Storage Transfer Service will allow you to cap the bandwidth used by transfer, making it easy to minimize any impact on other production applications. Consider using tools like lperf3, tcpdump, and gsutil to measure the network bandwidth available to upload to Cloud Storage.
In particular, gsutil is worth some additional detail. Gsutil is a Python tool that can help you perform a number of object storage management tasks in Google Cloud, including checking your agent’s connection to the Cloud Storage APIs. It can be installed via the Google Cloud SDK, or separately. In this case, you should also ensure that gsutil is available in the same on-premises VM as the Storage Transfer Service agent.
If you’d like to use gsutil to test connectivity to Google Cloud, here’s the command:
gsutil cp test.txt gs://my-bucket
my-bucket with the name of your Cloud Storage bucket.
CP is a copy command that lets you copy data from on-premises to the cloud. In this case, gsutil is copying a test document, test.txt, to ensure that you have a connection with the Google Cloud APIs. You will need to create a test document before running this command.
Test twice, transfer once
One overarching theme across all factors is that testing can help you understand performance. For many customers, a large-scale data transfer from an on-premises filesystem to Google Cloud is an unusual event. And as with any unusual event in enterprise IT, it is a great idea to make sure that each party - from network administrators to filesystem and storage experts to cloud architects - is able to test their domain multiple times, to ensure the event will proceed seamlessly.
As you fine tune your testing in advance of a data transfer, you may want to learn more about your options for obtaining more network bandwidth, orchestrating transfer from SMB filesystems, or even how to make the right choices to save money on object storage. And we plan to share more in our blog about how customers have used our transfer offerings in the months ahead. Advanced agent setup | Cloud Storage Transfer Service Documentation