This document describes further details for using Transfer service for on-premises data, such as:
- Setup best practices
- Advanced setup options
- Using private API endpoints
Setup best practices
This section describes the following best practices for setting up your agents:
Determining the number of agents to run
We can't provide specific guidance on how many agents to run for a particular use case, because performance varies significantly based on the data source corpus. At a minimum, we recommend that you use three agents, across different machines if possible, so that your transfer remains fault-tolerant.
Transfers of many large files typically achieve higher network throughput than transfers of many small files. When measuring performance, we recommend the following:
Test with a small transfer before you start a large data migration.
Increase the number of agents until the outbound bandwidth is saturated or you no longer see any gains in the bandwidth used, up to 100 agents.
If your initial sizing is either too small or too large, you can start and stop agent processes while transfers are running, and performance adjusts dynamically without any other changes. As long as computational and file system resources are available, you can continue to run up to 100 agents concurrently per transfer project.
When naming agents, we recommend that you:
Always include the hostname in your agent name. This helps you find the machine an agent is running on. We recommend that you pass --hostname=$(hostname) to the Docker run command.
Choose an agent prefix scheme that helps you identify agents in the context of your monitoring and infrastructure organization. For example:
If you have three separate transfer projects, you may want to include the team name in your agent prefix. For example, "logistics".
If you are running two different transfer projects for two different data centers, you may want to include the data center name in the agent prefix. For example, "omaha".
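A minimal sketch of such a prefix scheme, combining the team and data center examples above ("logistics", "omaha"). The variable names are illustrative:

```shell
# Illustrative naming scheme: team plus data center.
TEAM="logistics"
DATACENTER="omaha"
AGENT_PREFIX="${TEAM}-${DATACENTER}"
echo "${AGENT_PREFIX}"

# The prefix would then be passed alongside the hostname, for example:
# sudo docker run ... gcr.io/cloud-ingest/tsop-agent:latest \
#   --hostname=$(hostname) --agent-id-prefix="${AGENT_PREFIX}"
```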
On-premises advanced setup
This section describes the following advanced setup options:
- Copying data on CIFS or SMB volumes
- Using service account credentials
- Restricting agent directory access
- Coordinating agents with Kubernetes
Copying data on CIFS or SMB volumes
Transfer for on-premises agents aren't directly supported on Windows servers. However, you can move data stored on any POSIX-compliant filesystem by mounting it on a Linux server or virtual machine (VM), and then running an agent from the Linux server or VM to copy your data to Cloud Storage.
To move data from a CIFS or SMB volume:
Provision a Linux server or VM.
For supported operating systems, see Prerequisites.
Run the following command on the Linux server or VM you provisioned to mount the volume:
sudo mount -t cifs -o username=windows-share-user,password=windows-share-password //ip-address/share-name /mnt
where:
ip-address - the IP address of the Microsoft Windows server that the CIFS or SMB volume is located on.
share-name - the share name you are mounting.
windows-share-user - an authorized user for accessing the CIFS or SMB volume.
windows-share-password - the password for the authorized user of the CIFS or SMB volume.
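For example, with illustrative values filled in (192.0.2.10 is a documentation-only address; the share and user names are hypothetical). The command is printed here for review rather than run, since mounting requires root and a reachable Windows server:

```shell
# Illustrative values only; replace with your own.
IP_ADDRESS="192.0.2.10"       # RFC 5737 documentation address
SHARE_NAME="backups"
SHARE_USER="svc-transfer"

# Build and print the mount command instead of executing it:
MOUNT_CMD="sudo mount -t cifs -o username=${SHARE_USER},password=your-password //${IP_ADDRESS}/${SHARE_NAME} /mnt"
echo "${MOUNT_CMD}"
```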
Confirm that the CIFS volume is mounted by running the following command:
df -h -t cifs
Confirm that the user that will run the agent can list and copy files on the mounted volume by running the following commands:
sudo -u username ls /mnt
sudo -u username cp /mnt/file1 /mnt/file2
where:
username - the user that will run the agent.
file1 - the file to copy from.
file2 - the filename to copy to.
Using service account credentials
You can use service account credentials to run the agent. Using service account credentials provides a way to authenticate the transfer agent without relying on a single user account. For more information about account types, see Principals.
Before using service account credentials with your agents, ensure that Transfer service for on-premises data is set up and ready to use.
To use service account credentials with your agents:
Create service account keys. For more information, see Creating and managing service account keys.
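As a sketch, key creation for a hypothetical service account (the account name and project are illustrative; the command is printed for review rather than executed):

```shell
# Hypothetical service account email; substitute your own.
SA_EMAIL="transfer-agent@my-project.iam.gserviceaccount.com"

# gcloud writes the new JSON key to credential-file.json.
KEY_CMD="gcloud iam service-accounts keys create credential-file.json --iam-account=${SA_EMAIL}"
echo "${KEY_CMD}"
```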
Start the agent Docker container by running the following command:
sudo docker run --ulimit memlock=64000000 -d --rm \ -v /:/transfer_root \ -v full-credential-file-path:full-credential-file-path \ gcr.io/cloud-ingest/tsop-agent:latest \ --project-id=project-id \ --creds-file=credential-file \ --hostname=$(hostname) \ --agent-id-prefix=id-prefix
where:
project-id - the ID of the project that hosts the transfer and in which the Pub/Sub resources are created and billed.
credential-file - a JSON-formatted service account credential file. For more information about generating a service account credential file, see Creating and managing service account keys.
id-prefix - a prefix that is prepended to the agent ID to help identify the agent or its machine in the Google Cloud Console. When a prefix is used, the agent ID is formatted as prefix + hostname + Docker container ID.
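The resulting agent ID can be sketched as the concatenation described above. The prefix is illustrative, the container ID here is a placeholder (Docker assigns the real one), and any separators between the parts are added by the agent itself:

```shell
PREFIX="logistics-"            # illustrative prefix
HOST=$(hostname)               # machine hostname
CONTAINER_ID="0123456789ab"    # placeholder; Docker assigns the real ID

# prefix + hostname + Docker container ID, as described above:
AGENT_ID="${PREFIX}${HOST}${CONTAINER_ID}"
echo "${AGENT_ID}"
```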
Restricting agent directory access
To specify directories that the agent can access while performing a transfer, pass the -v host-directory:container-directory flag to the agent, where:
host-directory - the directory on the host machine that you intend to copy from.
container-directory - the directory mapped within the agent container.
You can use more than one -v flag to further specify directories to copy from. For example:
sudo docker run --ulimit memlock=64000000 -d --rm --volumes-from gcloud-config \ -v /usr/local/research:/usr/local/research \ -v /usr/local/billing:/usr/local/billing \ -v /tmp:/tmp \ gcr.io/cloud-ingest/tsop-agent:latest \ --project-id=project-id \ --hostname=$(hostname) \ --agent-id-prefix=id-prefix
If you are using a service account, ensure that you mount the credentials file into the container and pass the --creds-file=credential-file flag. For example:
sudo docker run --ulimit memlock=64000000 -d --rm \ -v host-directory:container-directory \ -v /tmp:/tmp \ -v full-credential-file-path:full-credential-file-path \ gcr.io/cloud-ingest/tsop-agent:latest \ --project-id=project-id \ --creds-file=credential-file \ --hostname=$(hostname) \ --agent-id-prefix=agent-prefix
Coordinating agents with Kubernetes
Docker is a supported container runtime for Kubernetes. You can use Kubernetes to orchestrate starting and stopping many agents simultaneously. From the perspective of Kubernetes, the agent container is a stateless application, so you can follow the Kubernetes instructions for deploying a stateless application.
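A minimal sketch of this approach, assuming kubectl is configured for your cluster. The Deployment name and replica count are illustrative, and the commands are printed for review rather than executed:

```shell
# Agents are stateless, so a plain Deployment works; changing the replica
# count starts or stops agents while transfers are running.
CREATE_CMD="kubectl create deployment tsop-agent --image=gcr.io/cloud-ingest/tsop-agent:latest"
SCALE_CMD="kubectl scale deployment/tsop-agent --replicas=10"
echo "${CREATE_CMD}"
echo "${SCALE_CMD}"
```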
Using private API endpoints in Cloud Interconnect
To use private API endpoints in Cloud Interconnect:
Log into the on-premises host on which you intend to run the agent.
Configure Private Google Access. For more information, see Configuring Private Google Access for on-premises hosts.
Confirm that you can connect to Cloud Storage APIs and Pub/Sub APIs:
- For Cloud Storage APIs, run the following command from the same
machine as the transfer agent to test moving a file into your
Cloud Storage bucket:
gsutil cp test.txt gs://my-bucket
where my-bucket is the name of your Cloud Storage bucket. If the transfer works, the test is successful.
- For Pub/Sub APIs, run the following command from the same machine as the transfer agent to confirm that you can list existing Pub/Sub topics:
gcloud pubsub topics list --project=project-id
where project-id is the Google Cloud project ID. If a list of Pub/Sub topics is displayed, the test is successful.