This page describes best practices for Transfer service for on-premises data agents.
Performance best practices
The following are best practices for ensuring good transfer performance:
Benchmark your performance by transferring a large data corpus, typically at least 100 GB in size.
Transfer service for on-premises data is a large-scale, throughput-optimized service, so your performance on very small test data sets is not indicative of your performance on large data sets in production.
Run agents in separate virtual machines (VMs) so that you can scale your resource consumption more effectively.
Verify that the network interface on the agent machines is sized for the read/write bandwidth you need.
For example, if you intend to fully utilize a 20 Gbps wide-area network (WAN), your agent machine's network interface must support 20 Gbps to read data from your networked file system and another 20 Gbps to transfer data to Cloud Storage, for a total of 40 Gbps of bandwidth.
Monitor the CPU, memory, and network on agent machines to ensure that the machines aren't overwhelmed by other workloads, as this can negatively affect performance. We expect agent machines to have at least 8 GB of memory per container and at least four CPUs to be most effective.
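The interface-sizing rule and the resource minimums above reduce to simple arithmetic. The following Python sketch illustrates them. The function names and the containers parameter are hypothetical and are not part of the service.

```python
MIN_MEMORY_GB_PER_CONTAINER = 8  # minimum stated in the guidance above
MIN_CPUS = 4                     # minimum stated in the guidance above


def required_nic_gbps(wan_gbps: float) -> float:
    """Return the interface bandwidth needed to fully use the WAN.

    The agent reads from the networked file system and writes to
    Cloud Storage at the same time, so the WAN rate is counted twice.
    """
    return 2 * wan_gbps


def meets_agent_minimums(memory_gb: float, cpus: int, containers: int = 1) -> bool:
    """Check a machine against the per-container memory and CPU minimums."""
    enough_memory = memory_gb >= MIN_MEMORY_GB_PER_CONTAINER * containers
    enough_cpus = cpus >= MIN_CPUS
    return enough_memory and enough_cpus


print(required_nic_gbps(20))            # 40
print(meets_agent_minimums(16, 4, 2))   # True
print(meets_agent_minimums(8, 2))       # False
```

Passing these checks only confirms the stated minimums. Other workloads on the same machine can still starve the agent, which is why ongoing monitoring is recommended.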
Setup best practices
This section describes the following best practices for setting up your agents:
Maximizing transfer agent performance
Your transfer performance is affected by the following variables:
File system capabilities.
Underlying hardware limitations.
The hard drive media type, input/output bus, and local area network (LAN) connectivity all affect performance.
WAN throughput and utilization.
A slower or highly utilized WAN slows performance.
File size characteristics.
For example, transfers of large files achieve higher network throughput than transfers of many small files, due to per-file networking overhead.
Because of these variables, we can't predict actual performance or provide an optimal number of agents to use.
At a minimum, we recommend that you use three agents, across different machines if possible, so that your transfer remains fault-tolerant. You can add transfer agents while transfers are running; throughput increases dynamically as agents join.
To observe the impact of adding agents, and to choose the number of agents that works best for your environment, do the following:
Start a large transfer that takes at least 1 hour to run. For example, start a transfer that contains at least 100k files and is at least 100 GB in total size.
Wait for the throughput to level off, and determine if you are limited by your WAN capacity or your bandwidth cap.
If you haven't saturated your WAN capacity, and you haven't reached your desired transfer limit, add another agent. The additional agent automatically increases transfer throughput. Wait approximately 3 minutes for throughput to stabilize in Cloud Monitoring.
Repeat the previous two steps, adding one agent at a time until you reach your desired limit. As long as computational, file system, and network resources are available, you can run up to 100 agents concurrently per transfer project.
If you saturate your outbound bandwidth before you reach your desired limit, you can do any of the following:
If you've added agents, but the throughput isn't increasing and your WAN isn't saturated, investigate the file system throughput. In rare cases the file system throughput is saturated, hampering your ability to increase your transfer performance.
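The procedure above amounts to a small decision rule that is evaluated each time throughput levels off. The following Python sketch is one way to express it. The function, the Action names, and the two thresholds (saturation and min_gain) are assumptions made for illustration. The throughput figures would come from your own monitoring, and nothing here calls a real API.

```python
from enum import Enum
from typing import Optional

MAX_AGENTS = 100  # concurrent agents per transfer project, per the text above


class Action(Enum):
    ADD_AGENT = "add another agent"
    TARGET_REACHED = "desired limit reached"
    WAN_SATURATED = "WAN capacity saturated"
    CHECK_FILE_SYSTEM = "investigate file system throughput"
    AGENT_LIMIT = "agent limit reached"


def next_action(
    agents: int,
    throughput_gbps: float,
    previous_gbps: Optional[float],
    wan_capacity_gbps: float,
    target_gbps: float,
    saturation: float = 0.95,  # assumed: 95% of capacity counts as saturated
    min_gain: float = 0.05,    # assumed: under 5% gain counts as no increase
) -> Action:
    """Decide what to do after throughput has stabilized.

    previous_gbps is the stable throughput before the last agent was
    added, or None if no agent has been added yet.
    """
    if throughput_gbps >= target_gbps:
        return Action.TARGET_REACHED
    if throughput_gbps >= saturation * wan_capacity_gbps:
        return Action.WAN_SATURATED
    if previous_gbps is not None and throughput_gbps <= previous_gbps * (1 + min_gain):
        # The last agent did not help and the WAN is not saturated.
        return Action.CHECK_FILE_SYSTEM
    if agents >= MAX_AGENTS:
        return Action.AGENT_LIMIT
    return Action.ADD_AGENT


print(next_action(3, 5.0, None, 20, 15).value)  # add another agent
print(next_action(6, 8.0, 8.0, 20, 15).value)   # investigate file system throughput
```

The order of the checks mirrors the text: stop when the target or the WAN limit is reached, look at the file system when an added agent yields no gain, and otherwise add one agent at a time.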
When naming agents, we recommend that you do the following:
Always include the hostname in your agent. This helps you find the machine an agent is running on. We recommend that you pass --hostname=$(hostname) to the docker run command.
Choose an agent prefix scheme that helps you identify agents in the context of your monitoring and infrastructure organization. For example:
If you have three separate transfer projects, you may want to include the team name in your agent prefix. For example,
If you are running two different transfer projects for two different data centers, you may want to include the data center name in the agent prefix. For example,
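As an illustration of a consistent naming scheme, the following Python sketch builds a prefix from components such as a team or data center name, and builds the hostname flag recommended above. The helper names, the normalization rules, and the sample values ("Logistics", "dc east 1") are hypothetical and are not taken from the product.

```python
import re
import socket


def agent_prefix(*parts: str) -> str:
    """Join naming components into one lowercase, hyphen-separated prefix."""
    cleaned = []
    for part in parts:
        # Replace every run of non-alphanumeric characters with one hyphen.
        slug = re.sub(r"[^a-z0-9]+", "-", part.lower()).strip("-")
        if slug:
            cleaned.append(slug)
    return "-".join(cleaned)


def docker_hostname_flag() -> str:
    """Python counterpart of --hostname=$(hostname) for a scripted launch."""
    return f"--hostname={socket.gethostname()}"


print(agent_prefix("Logistics", "dc east 1"))  # logistics-dc-east-1
print(docker_hostname_flag())
```

Normalizing the components keeps prefixes predictable, which makes agents easier to filter in monitoring dashboards.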