Advanced agent setup

This document describes additional details for using Transfer service for on-premises data, including:

  • Setup best practices
  • Advanced setup options
  • Using private API endpoints

Setup best practices

This section describes best practices for setting up your agents.

Determining the number of agents to run

We can't provide specific guidance on how many agents to run for a particular use case, because performance varies significantly based on the data source corpus. At a minimum, we recommend that you use three agents, across different machines if possible, so that your transfer remains fault-tolerant.

Transfers of many large files typically achieve higher network throughput than transfers of many small files. When measuring performance, we recommend the following:

  • Test with a small transfer before you start a large data migration.

  • Increase the number of agents until the outbound bandwidth is saturated or you no longer see any gains in the bandwidth used, up to 100 agents.

If your initial sizing is either too small or too large, you can start and stop agent processes while transfers are running. Performance adjusts dynamically without any other changes. As long as computational and file system resources are available, you can continue to run up to 100 agents concurrently per transfer project.
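
For example, to add capacity while a transfer is running, start another agent container on an additional machine. The following is a minimal sketch that reuses the flags from the run commands shown later in this document; the project ID and prefix are placeholder values:

sudo docker run --ulimit memlock=64000000 -d --rm -v /:/transfer_root \
--volumes-from gcloud-config \
gcr.io/cloud-ingest/tsop-agent:latest \
--enable-mount-directory \
--project-id=my-transfer-project \
--hostname=$(hostname) \
--agent-id-prefix=extra-capacity

To remove capacity, list the running agent containers and stop one; the transfer continues on the remaining agents:

sudo docker ps --filter ancestor=gcr.io/cloud-ingest/tsop-agent:latest
sudo docker stop container-id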

Naming agents

When naming agents, we recommend that you:

  • Always include the hostname in your agent ID. This helps you find the machine that an agent is running on. We recommend that you pass --hostname=$(hostname) to the Docker run command.

  • Choose an agent prefix scheme that helps you identify agents in the context of your monitoring and infrastructure organization. For example:

    • If you have three separate transfer projects, you may want to include the team name in your agent prefix. For example, "logistics".

    • If you are running two different transfer projects for two different data centers, you may want to include the data center name in the agent prefix. For example, "omaha". A run command combining both conventions is sketched after this list.
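
A run command combining both conventions might look like the following sketch; the project ID is a placeholder, and the prefix joins the data center and team names:

sudo docker run --ulimit memlock=64000000 -d --rm -v /:/transfer_root \
--volumes-from gcloud-config \
gcr.io/cloud-ingest/tsop-agent:latest \
--enable-mount-directory \
--project-id=my-transfer-project \
--hostname=$(hostname) \
--agent-id-prefix=omaha-logistics

The resulting agent IDs take the form prefix + hostname + Docker container ID, so both the data center and the machine are visible at a glance when monitoring.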

On-premises advanced setup

This section describes advanced setup options for on-premises transfers.

Copying data on CIFS or SMB volumes

Transfer for on-premises agents aren't directly supported on Windows servers. However, you can move data stored on any POSIX-compliant filesystem by mounting it on a Linux server or virtual machine (VM), and then running an agent from the Linux server or VM to copy your data to Cloud Storage.

To move data from a CIFS or SMB volume (a worked example with hypothetical values follows these steps):

  1. Provision a Linux server or VM.

    For supported operating systems, see Prerequisites.

  2. Run the following command on the Linux server or VM you provisioned to mount the volume:

    sudo mount -t cifs -o username=windows-share-user,password=windows-share-password \
    //ip-address/share-name /mnt
    

    Where:

    • ip-address - the IP address of the Microsoft Windows server that the CIFS or SMB volume is located on.
    • share-name - the share name you are mounting.
    • windows-share-user - an authorized user for accessing the CIFS or SMB volume.
    • windows-share-password - the password for the authorized user of the CIFS or SMB volume.
  3. Confirm that the CIFS volume is mounted by running the following command:

    findmnt -l
    
  4. Confirm that the user that will run the agent can list and copy files on the mounted volume by running the following commands:

    sudo -u username cp /mnt/file1 /mnt/file2
    

    Where:

    • username - the user that will run the agent.
    • file1 - the file to copy from.
    • file2 - the filename to copy to.
  5. Install the Transfer for on-premises agent.
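
For reference, here is a worked example of steps 2 through 4 with hypothetical values: a Windows server at 192.168.1.50, a share named projects, a Windows share user svc-transfer, and a local agent user transfersvc. Adjust all of these values for your environment:

sudo mount -t cifs -o username=svc-transfer,password='example-password' \
//192.168.1.50/projects /mnt

# Confirm that the volume is mounted.
findmnt -l | grep /mnt

# Confirm that the agent user can list and copy files on the volume.
sudo -u transfersvc ls /mnt
sudo -u transfersvc cp /mnt/sample.txt /mnt/sample-copy.txt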

Using service account credentials

You can use service account credentials to run the agent. Using service account credentials provides a way to authenticate the transfer agent without relying on a single user account. For more information about account types, see Principals.

Before using service account credentials with your agents, ensure that Transfer service for on-premises data is ready by verifying that:

  1. First-time setup is complete.

  2. A transfer job exists.

To use service account credentials with your agents:

  1. Stop all agent containers.

  2. Create service account keys. For more information, see Creating and managing service account keys. A sketch of this step using the gcloud CLI follows these steps.

  3. Start the agent Docker container by running the following command:

    sudo docker run --ulimit memlock=64000000 -d --rm -v /:/transfer_root \
    gcr.io/cloud-ingest/tsop-agent:latest \
    --enable-mount-directory \
    --project-id=project-id \
    --creds-file=credential-file \
    --hostname=$(hostname) \
    --agent-id-prefix=agent-id-prefix

    Where:

  • project-id is the ID of the project that hosts the transfer and in which the Pub/Sub resources are created and billed.

  • credential-file is the path to a JSON-formatted service account credential file, as seen from inside the container. For more information about generating a service account credential file, see Creating and managing service account keys.

  • agent-id-prefix is a prefix that is prepended to the agent ID to help identify the agent or its machine in the Google Cloud Console. When a prefix is used, the agent ID is formatted as prefix + hostname + Docker container ID.
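
For reference, the following is a sketch of creating the service account key with the gcloud CLI; the service account name and project ID are placeholders:

gcloud iam service-accounts keys create /tmp/agent-creds.json \
--iam-account=transfer-agent@my-transfer-project.iam.gserviceaccount.com

Because -v /:/transfer_root maps the host filesystem into the container, the key written to /tmp/agent-creds.json above would be passed to the agent as --creds-file=/transfer_root/tmp/agent-creds.json.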

Restricting agent directory access

To specify directories that the agent can access while performing a transfer, pass -v host-directory:container-directory to the agent, where:

  • host-directory is the directory on the host machine that you intend to copy from.
  • container-directory is the directory mapped within the agent container.

You can use more than one -v flag to further specify directories to copy from. For example:

sudo docker run --ulimit memlock=64000000 -d --rm --volumes-from gcloud-config \
-v /usr/local/research:/usr/local/research \
-v /usr/local/billing:/usr/local/billing \
-v /tmp:/tmp \
gcr.io/cloud-ingest/tsop-agent:latest \
--project-id=project-id \
--hostname=$(hostname) \
--agent-id-prefix=id-prefix

If you are using a service account, ensure that you mount the credentials file into the container and pass the --creds-file=credential-file flag, where credential-file is the file's path inside the container. For example:

sudo docker run --ulimit memlock=64000000 -d --rm \
-v host-directory:container-directory \
-v /tmp:/tmp \
-v full-credential-file-path:full-credential-file-path \
gcr.io/cloud-ingest/tsop-agent:latest \
--project-id=project-id \
--creds-file=credential-file \
--hostname=$(hostname) \
--agent-id-prefix=agent-prefix

Coordinating agents with Kubernetes

Docker is a supported container runtime for Kubernetes. You can use Kubernetes to orchestrate starting and stopping many agents simultaneously. From a Kubernetes perspective, the agent container is a stateless application, so you can follow the Kubernetes instructions for deploying a stateless application.
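
For example, the following is a minimal sketch of a Deployment that runs three agents. The image and flags mirror the Docker examples above; the project ID, agent prefix, Secret name, and host path are placeholders, and it assumes a Secret named transfer-agent-creds was created from a service account key file. Note that this sketch omits the memlock ulimit from the Docker examples, which you would apply through your cluster's container runtime configuration:

kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: transfer-agents
spec:
  replicas: 3        # Scale agents up or down with: kubectl scale deployment transfer-agents --replicas=N
  selector:
    matchLabels:
      app: transfer-agent
  template:
    metadata:
      labels:
        app: transfer-agent
    spec:
      containers:
      - name: agent
        image: gcr.io/cloud-ingest/tsop-agent:latest
        args:
        - --project-id=my-transfer-project
        - --creds-file=/creds/agent-creds.json
        - --agent-id-prefix=k8s-omaha
        volumeMounts:
        - name: creds
          mountPath: /creds
        - name: source-data
          mountPath: /usr/local/research
      volumes:
      - name: creds
        secret:
          secretName: transfer-agent-creds   # kubectl create secret generic transfer-agent-creds --from-file=agent-creds.json
      - name: source-data
        hostPath:
          path: /usr/local/research          # directory on each node that holds the data to transfer
EOF

Scaling the Deployment up or down starts and stops agents in bulk, which matches the dynamic sizing behavior described in "Determining the number of agents to run".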

Using private API endpoints in Cloud Interconnect

To use private API endpoints in Cloud Interconnect:

  1. Log in to the on-premises host that you intend to run the agent on.

  2. Configure Private Google Access. For more information, see Configuring Private Google Access for on-premises hosts.

  3. Confirm that you can connect to Cloud Storage APIs and Pub/Sub APIs:

    1. For Cloud Storage APIs, run the following command from the same machine as the transfer agent to test moving a file into your Cloud Storage bucket:

       gsutil cp test.txt gs://my-bucket

       Where my-bucket is the name of your Cloud Storage bucket. If the transfer works, the test is successful.

    2. For Pub/Sub APIs, run the following command from the same machine as the transfer agent to confirm that you can find existing Pub/Sub topics:

       gcloud pubsub topics list --project=project-id

       Where project-id is your Google Cloud project ID. If a list of Pub/Sub topics is displayed, the test is successful.