Advanced agent setup

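This document describes advanced setup options for Transfer service for on-premises data agents, including copying data on CIFS or SMB volumes, using service account credentials, adjusting maximum agent memory, restricting agent directory access, coordinating agents with Kubernetes, using private API endpoints in Cloud Interconnect, and using a forward proxy.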

Copying data on CIFS or SMB volumes

Transfer for on-premises agents aren't directly supported on Windows servers. However, you can move data stored on any POSIX-compliant file system by mounting it on a Linux server or virtual machine (VM), and then running an agent from the Linux server or VM to copy your data to Cloud Storage.

To move data from a CIFS or SMB volume:

  1. Provision a Linux server or VM.

    For supported operating systems, see Prerequisites.

  2. Run the following command on the Linux server or VM you provisioned to mount the volume:

    sudo mount -t cifs \
    -o username=WINDOWS-SHARE-USER,password=WINDOWS-SHARE-PASSWORD \
    //IP-ADDRESS/SHARE-NAME /mnt
    

    Replace the following:

    • IP-ADDRESS: the IP address of the Microsoft Windows server that the CIFS or SMB volume is located on.
    • SHARE-NAME: the share name you are mounting.
    • WINDOWS-SHARE-USER: an authorized user for accessing the CIFS or SMB volume.
    • WINDOWS-SHARE-PASSWORD: the password for the authorized user of the CIFS or SMB volume.
  3. Confirm that the CIFS volume is mounted by running the following command:

    findmnt -l
    
  4. Confirm that the user that will run the agent can list and copy files on the mounted volume by running the following command:

    sudo -u USERNAME cp /mnt/FILE1 /mnt/FILE2
    

    Replace the following:

    • USERNAME: the user that will run the agent.
    • FILE1: the file to copy from.
    • FILE2: the file to copy to.
  5. Install the Transfer for on-premises agent.
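
Steps 3 and 4 can be combined into a small pre-flight check. The following is a minimal sketch, not part of the Transfer for on-premises tooling: the function names is_cifs_type, mount_fstype, and can_copy are hypothetical, and the sketch assumes that findmnt (from util-linux) and cmp are installed on the Linux server or VM.

```shell
# Hypothetical helpers for checking a mounted share before installing the agent.

# Succeeds if the given filesystem type string is "cifs".
is_cifs_type() {
  [ "$1" = "cifs" ]
}

# Prints the filesystem type backing a path, for example "cifs" or "ext4".
mount_fstype() {
  findmnt -n -o FSTYPE --target "$1"
}

# Copies SRC ($1) to DST ($2) and succeeds only if the copy is byte-identical.
can_copy() {
  cp "$1" "$2" && cmp -s "$1" "$2"
}
```

To check access for the user that will run the agent, one option is to save these functions in a script together with a final line such as is_cifs_type "$(mount_fstype /mnt)" && can_copy /mnt/FILE1 /mnt/FILE2, and then run that script with sudo -u USERNAME.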

Using service account credentials

You can use service account credentials to run the agent. Service account credentials let you authenticate the transfer agent without relying on a single user account. For more information about account types, see Principals.

Before using service account credentials with your agents, verify that Transfer service for on-premises data is ready:

  1. You have completed the steps in Setting up Transfer for on-premises.

  2. A transfer job exists.

To use service account credentials with your agents:

  1. Stop all agent containers.

  2. Create service account keys. For more information, see Creating and managing service account keys.

  3. Start the agent Docker container by running the following command:

    sudo docker run --ulimit memlock=64000000 -d --rm -v /:/transfer_root \
    gcr.io/cloud-ingest/tsop-agent:latest \
    --enable-mount-directory \
    --project-id=PROJECT-ID \
    --creds-file=CREDENTIAL-FILE \
    --hostname=$(hostname) \
    --agent-id-prefix=ID-PREFIX
    

    Replace the following:

  • PROJECT-ID: the ID of the project that hosts the transfer and in which Pub/Sub resources are created and billed.
  • CREDENTIAL-FILE: a JSON-formatted service account credential file. For more information about generating a service account credential file, see Creating and managing service account keys.
  • ID-PREFIX: the prefix that is prepended to the agent ID to help identify the agent or its machine in the Google Cloud Console. When a prefix is used, the agent ID is formatted as prefix + hostname + Docker container ID.
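
Before restarting the agents, it can help to confirm that no agent containers are still running and that the credential file is the kind of key the agent expects. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the image name is taken from the command above, and the key check relies only on the "type": "service_account" field that service account key files contain. It does not validate the key itself.

```shell
# Hypothetical pre-flight helpers for switching agents to service account credentials.

# Succeeds if FILE ($1) is readable and declares "type": "service_account".
looks_like_sa_key() {
  [ -r "$1" ] &&
    grep -Eq '"type"[[:space:]]*:[[:space:]]*"service_account"' "$1"
}

# Lists the IDs of running containers that were started from the agent image.
running_agent_ids() {
  sudo docker ps -q --filter "ancestor=gcr.io/cloud-ingest/tsop-agent:latest"
}
```

If running_agent_ids prints any container IDs, stop those containers with sudo docker stop before starting agents with the new credentials.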

Adjusting maximum agent memory

Transfer service for on-premises data agents default to using a maximum of 8GiB of system memory. You can adjust the maximum memory used by the agents to fit your environment by passing --max-physical-mem=MAXIMUM-MEMORY, replacing MAXIMUM-MEMORY with a value that fits your environment.

The following are memory requirements for Transfer service for on-premises data agents:
  • Minimum memory: 1GiB
  • Minimum memory to support high-performance uploads: 6GiB

We recommend the default of 8GiB.

The following table describes examples of acceptable formats for MAXIMUM-MEMORY:

max-physical-mem value    Maximum memory setting
6g                        6 gigabytes
6gb                       6 gigabytes
6GiB                      6 gibibytes
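
The table distinguishes gigabytes from gibibytes, which differ by about 7% at this size. The following sketch only illustrates the standard unit definitions (1 GB = 10^9 bytes, 1 GiB = 2^30 bytes). The function name to_bytes is hypothetical, and how the agent itself parses each suffix is not described here beyond the table above.

```shell
# Converts a size to bytes. UNIT ($2) is GB (10^9 bytes) or GiB (2^30 bytes).
to_bytes() {
  case "$2" in
    GB)  echo $(( $1 * 1000000000 )) ;;
    GiB) echo $(( $1 * 1073741824 )) ;;
    *)   echo "unknown unit: $2" >&2; return 1 ;;
  esac
}

to_bytes 6 GB    # 6 gigabytes in bytes
to_bytes 6 GiB   # 6 gibibytes in bytes
```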

Restricting agent directory access

To specify directories that the agent can access while performing a transfer, pass -v HOST-DIRECTORY:CONTAINER-DIRECTORY to the docker run command that starts the agent, where:

  • HOST-DIRECTORY is the directory on the host machine that you intend to copy from.
  • CONTAINER-DIRECTORY is the directory mapped within the agent container.

You can use more than one -v flag to further specify directories to copy from. For example:

sudo docker run --ulimit memlock=64000000 -d --rm --volumes-from gcloud-config \
-v /usr/local/research:/usr/local/research \
-v /usr/local/billing:/usr/local/billing \
-v /tmp:/tmp \
gcr.io/cloud-ingest/tsop-agent:latest \
--project-id=PROJECT-ID \
--hostname=$(hostname) \
--agent-id-prefix=ID-PREFIX

If you are using a service account, ensure that you mount the credentials file into the container and pass the --creds-file=CREDENTIAL-FILE flag. For example:

sudo docker run --ulimit memlock=64000000 -d --rm \
-v HOST-DIRECTORY:CONTAINER-DIRECTORY \
-v /tmp:/tmp \
-v FULL-CREDENTIAL-FILE-PATH:FULL-CREDENTIAL-FILE-PATH \
gcr.io/cloud-ingest/tsop-agent:latest \
--project-id=PROJECT-ID \
--creds-file=CREDENTIAL-FILE \
--hostname=$(hostname) \
--agent-id-prefix=ID-PREFIX

Replace the following:

  • HOST-DIRECTORY: the directory on the host machine that you intend to copy from.
  • CONTAINER-DIRECTORY: the directory mapped within the agent container.
  • FULL-CREDENTIAL-FILE-PATH: the fully-qualified path to the credentials file.
  • PROJECT-ID: the ID of the project that hosts the transfer and in which Pub/Sub resources are created and billed.
  • CREDENTIAL-FILE: a JSON-formatted service account credential file. For more information about generating a service account credential file, see Creating and managing service account keys.
  • ID-PREFIX: the prefix that is prepended to the agent ID to help identify the agent or its machine in the Google Cloud Console. When a prefix is used, the agent ID is formatted as prefix + hostname + Docker container ID.
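
When many directories are mapped to the same path inside the container, as in the first example above, the -v flags can be generated rather than typed by hand. The following is a minimal sketch. The function name volume_flags is hypothetical, and it rejects relative paths because Docker interprets a -v source that is not an absolute path as a named volume rather than a host directory.

```shell
# Prints one "-v DIR:DIR" pair per line, mapping each host directory
# to the same path inside the container.
volume_flags() {
  for dir in "$@"; do
    case "$dir" in
      /*) printf '%s %s:%s\n' -v "$dir" "$dir" ;;
      *)  echo "not an absolute path: $dir" >&2; return 1 ;;
    esac
  done
}

volume_flags /usr/local/research /usr/local/billing /tmp
```

The output can be placed unquoted in the docker run command, for example $(volume_flags /usr/local/research /tmp), provided that none of the paths contain spaces.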

Coordinating agents with Kubernetes

Docker is a supported container runtime for Kubernetes. You can use Kubernetes to orchestrate starting and stopping many agents simultaneously. From the perspective of Kubernetes, the agent container is a stateless application, so you can follow the Kubernetes instructions for deploying a stateless application.
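
As an illustration of starting and stopping many agents at once, the following sketch prints the kubectl scale command for a Deployment that runs the agent image. It assumes that such a Deployment already exists. The Deployment name tsop-agent and the function name scale_agents_cmd are hypothetical, and this document does not provide a manifest for the agent.

```shell
# Prints the kubectl command that sets the number of agent replicas.
# "tsop-agent" is a hypothetical Deployment name; replace it with your own.
scale_agents_cmd() {
  case "$1" in
    ''|*[!0-9]*)
      echo "replica count must be a non-negative integer" >&2
      return 1
      ;;
  esac
  echo "kubectl scale deployment/tsop-agent --replicas=$1"
}

scale_agents_cmd 5   # start five agents
scale_agents_cmd 0   # stop all agents
```

Printing the command instead of running it lets you review the replica count before applying it to the cluster.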

Using private API endpoints in Cloud Interconnect

To use private API endpoints in Cloud Interconnect:

  1. Log in to the on-premises host on which you intend to run the agent.

  2. Configure Private Google Access. For more information, see Configuring Private Google Access for on-premises hosts.

  3. Confirm that you can connect to Cloud Storage APIs and Pub/Sub APIs:

    1. For Cloud Storage APIs, run the following command from the same machine as the transfer agent to test moving a file into your Cloud Storage bucket:

      gsutil cp test.txt gs://MY-BUCKET

      Replace MY-BUCKET with the name of your Cloud Storage bucket. If the transfer works, the test is successful.
    2. For Pub/Sub APIs, run the following command from the same machine as the transfer agent to confirm that you can find existing Pub/Sub topics:

      gcloud pubsub topics list --project=PROJECT-ID

      Replace PROJECT-ID with the Google Cloud project ID. If a list of Pub/Sub topics is displayed, the test is successful.
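
The two tests above confirm that the APIs are reachable, but not that traffic uses the private endpoints. One additional check is to see which address the API hostnames resolve to. The following sketch is based on assumptions that come from the Private Google Access documentation rather than from this page: private.googleapis.com uses 199.36.153.8/30 and restricted.googleapis.com uses 199.36.153.4/30. The function names are hypothetical, and resolve_first_ip assumes that getent and awk are available.

```shell
# Classifies an IPv4 address as one of the Private Google Access virtual IPs.
classify_google_vip() {
  case "$1" in
    199.36.153.8|199.36.153.9|199.36.153.10|199.36.153.11) echo private ;;
    199.36.153.4|199.36.153.5|199.36.153.6|199.36.153.7)   echo restricted ;;
    *)                                                     echo other ;;
  esac
}

# Prints the first address that a hostname resolves to on this host.
resolve_first_ip() {
  getent hosts "$1" | awk 'NR==1 {print $1}'
}
```

For example, classify_google_vip "$(resolve_first_ip storage.googleapis.com)" prints other when the host still resolves Cloud Storage to a public address.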

Using a forward proxy

Transfer service for on-premises data agents support using a forward proxy on your network by passing the HTTPS_PROXY environment variable.

For example:

sudo docker run -d --ulimit memlock=64000000 --rm \
--volumes-from gcloud-config \
-v /:/transfer_root \
--env HTTPS_PROXY=PROXY \
gcr.io/cloud-ingest/tsop-agent:latest \
--enable-mount-directory \
--project-id=PROJECT-ID \
--hostname=$(hostname) \
--agent-id-prefix=ID-PREFIX

Replace the following:

  • PROXY: the HTTP URL and port of the proxy server. Ensure that you specify the HTTP URL, and not an HTTPS URL, to avoid double-wrapping requests in TLS encryption. Double-wrapped requests prevent the proxy server from sending valid outbound requests.
  • PROJECT-ID: the ID of the project that hosts the transfer and in which Pub/Sub resources are created and billed.
  • ID-PREFIX: the prefix that is prepended to the agent ID to help identify the agent or its machine in the Google Cloud Console. When a prefix is used, the agent ID is formatted as prefix + hostname + Docker container ID.
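
Because an HTTPS proxy URL leads to double-wrapped requests, it can be worth checking the value before starting the agent. The following is a minimal sketch. The function name is_http_proxy_url and the example proxy address are hypothetical, and the check looks only at the URL scheme, not at whether the proxy is reachable.

```shell
# Succeeds only if the proxy URL ($1) uses the http:// scheme.
is_http_proxy_url() {
  case "$1" in
    http://?*) return 0 ;;
    *)         return 1 ;;
  esac
}

# Hypothetical proxy address, used here only for illustration.
PROXY="http://proxy.example.com:3128"
if is_http_proxy_url "$PROXY"; then
  echo "proxy URL scheme looks correct"
else
  echo "use an http:// URL for HTTPS_PROXY" >&2
fi
```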