This document describes advanced setup options for Transfer service for on-premises data, including:
- Copying data on CIFS or SMB volumes
- Using service account credentials
- Adjusting maximum agent memory
- Restricting agent directory access
- Coordinating agents with Kubernetes
- Using private API endpoints in Cloud Interconnect
- Using a forward proxy
Copying data on CIFS or SMB volumes
Transfer service for on-premises data agents aren't directly supported on Windows servers. However, you can move data stored on any POSIX-compliant file system by mounting it on a Linux server or virtual machine (VM), and then running an agent from that Linux server or VM to copy your data to Cloud Storage.
To move data from a CIFS or SMB volume:
Provision a Linux server or VM.
For supported operating systems, see Prerequisites.
Run the following command on the Linux server or VM you provisioned to mount the volume:
sudo mount -t cifs -o username=WINDOWS-SHARE-USER,password=WINDOWS-SHARE-PASSWORD //IP-ADDRESS/SHARE-NAME /mnt
Replace the following:
- IP-ADDRESS: the IP address of the Microsoft Windows server that the CIFS or SMB volume is located on.
- SHARE-NAME: the share name you are mounting.
- WINDOWS-SHARE-USER: an authorized user for accessing the CIFS or SMB volume.
- WINDOWS-SHARE-PASSWORD: the password for the authorized user of the CIFS or SMB volume.
Confirm that the CIFS volume is mounted by running the following command:
findmnt -l
Confirm that the user that will run the agent can list and copy files on the mounted volume by running the following command:
sudo -u USERNAME cp /mnt/FILE1 /mnt/FILE2
Replace the following:
- USERNAME: the user that will run the agent.
- FILE1: the file to copy from.
- FILE2: the file name to copy to.
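The mount command above passes the share password on the command line, where it can end up in shell history. As a minimal sketch of one alternative, assuming the mount.cifs helper from cifs-utils is installed, you can store the credentials in a file that only its owner can read and reference it with the credentials= mount option. The function name write_cifs_credentials and the file path in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write a mount.cifs credentials file with owner-only permissions.
# Usage: write_cifs_credentials FILE USER PASSWORD
write_cifs_credentials() {
  cred_file=$1
  # Create (or empty) the file and restrict its permissions before writing any secret.
  ( umask 077 && : > "$cred_file" ) || return 1
  chmod 600 "$cred_file" || return 1
  # mount.cifs reads "username=" and "password=" lines from this file.
  printf 'username=%s\npassword=%s\n' "$2" "$3" > "$cred_file"
}

# Example with the placeholders from the steps above (not run here):
#   write_cifs_credentials /root/.smbcredentials WINDOWS-SHARE-USER WINDOWS-SHARE-PASSWORD
#   sudo mount -t cifs -o credentials=/root/.smbcredentials //IP-ADDRESS/SHARE-NAME /mnt
```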
Using service account credentials
You can use service account credentials to run the agent. Doing so lets you authenticate the transfer agent without relying on a single user account. For more information about account types, see Principals.
Before using service account credentials with your agents, ensure that Transfer service for on-premises data is ready by verifying that:
To use service account credentials with your agents:
Create service account keys. For more information, see Creating and managing service account keys.
Start the agent Docker container by running the following command:
sudo docker run --ulimit memlock=64000000 -d --rm -v /:/transfer_root \
  gcr.io/cloud-ingest/tsop-agent:latest \
  --enable-mount-directory \
  --project-id=PROJECT-ID \
  --creds-file=CREDENTIAL-FILE \
  --hostname=$(hostname) \
  --agent-id-prefix=ID-PREFIX
Replace the following:
- PROJECT-ID: the ID of the project that hosts the transfer and in which Pub/Sub resources are created and billed.
- CREDENTIAL-FILE: a JSON-formatted service account credential file. For more information about generating a service account credential file, see Creating and managing service account keys.
- ID-PREFIX: the prefix that is prepended to the agent ID to help identify the agent or its machine in the Google Cloud Console. When a prefix is used, the agent ID is formatted as prefix + hostname + Docker container ID.
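Before starting the agent, it can help to confirm that the file you pass to --creds-file is really a service account key. The following is a minimal sketch, assuming the standard JSON key format in which the "type" field is "service_account". The function name is_service_account_key is hypothetical, and the check is a text match, not a full JSON validation.

```shell
#!/bin/sh
# Hypothetical pre-flight check: succeed only if FILE is readable and
# contains a "type": "service_account" field.
# Usage: is_service_account_key FILE
is_service_account_key() {
  [ -r "$1" ] &&
    grep -Eq '"type"[[:space:]]*:[[:space:]]*"service_account"' "$1"
}

# Example (not run here):
#   is_service_account_key CREDENTIAL-FILE || echo "not a service account key" >&2
```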
Adjusting maximum agent memory
Transfer service for on-premises data agents default to using a maximum of 8GiB of system memory. You can adjust the maximum memory used by the agents to fit your environment by passing --max-physical-mem=MAXIMUM-MEMORY, replacing MAXIMUM-MEMORY with a value that fits your environment.
- Minimum memory: 1GiB
- Minimum memory to support high-performance uploads: 6GiB
We recommend the default of 8GiB.
The following table describes examples of acceptable formats for MAXIMUM-MEMORY:
--max-physical-mem value | Maximum memory setting |
---|---|
6g | 6 gigabytes |
6gb | 6 gigabytes |
6GiB | 6 gibibytes |
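The table distinguishes gigabytes from gibibytes, and the two differ by about 7%. As a quick arithmetic sketch, assuming the usual definitions (1 gigabyte = 10^9 bytes, 1 gibibyte = 2^30 bytes), the helper functions below convert each unit to bytes. The function names are hypothetical, and the source does not state how the agent rounds these values internally.

```shell
#!/bin/sh
# Hypothetical helpers: convert a whole number of gigabytes or gibibytes to bytes.
gb_to_bytes()  { echo $(( $1 * 1000 * 1000 * 1000 )); }   # decimal: 10^9 bytes each
gib_to_bytes() { echo $(( $1 * 1024 * 1024 * 1024 )); }   # binary: 2^30 bytes each

gb_to_bytes 6    # prints 6000000000
gib_to_bytes 6   # prints 6442450944
```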
Restricting agent directory access
To specify directories that the agent can access while performing a transfer, pass -v HOST-DIRECTORY:CONTAINER-DIRECTORY to the docker run command that starts the agent, where:
- HOST-DIRECTORY is the directory on the host machine that you intend to copy from.
- CONTAINER-DIRECTORY is the directory mapped within the agent container.
You can use more than one -v flag to specify additional directories to copy from. For example:
sudo docker run --ulimit memlock=64000000 -d --rm --volumes-from gcloud-config \
  -v /usr/local/research:/usr/local/research \
  -v /usr/local/billing:/usr/local/billing \
  -v /tmp:/tmp \
  gcr.io/cloud-ingest/tsop-agent:latest \
  --project-id=PROJECT-ID \
  --hostname=$(hostname) \
  --agent-id-prefix=ID-PREFIX
If you are using a service account, ensure that you mount the credentials file into the container and pass the --creds-file=CREDENTIAL-FILE flag. For example:
sudo docker run --ulimit memlock=64000000 -d --rm \
  -v HOST-DIRECTORY:CONTAINER-DIRECTORY \
  -v /tmp:/tmp \
  -v FULL-CREDENTIAL-FILE-PATH:FULL-CREDENTIAL-FILE-PATH \
  gcr.io/cloud-ingest/tsop-agent:latest \
  --project-id=PROJECT-ID \
  --creds-file=CREDENTIAL-FILE \
  --hostname=$(hostname) \
  --agent-id-prefix=ID-PREFIX
Replace the following:
- HOST-DIRECTORY: the directory on the host machine that you intend to copy from.
- CONTAINER-DIRECTORY: the directory mapped within the agent container.
- FULL-CREDENTIAL-FILE-PATH: the fully qualified path to the credentials file.
- PROJECT-ID: the ID of the project that hosts the transfer and in which Pub/Sub resources are created and billed.
- CREDENTIAL-FILE: a JSON-formatted service account credential file. For more information about generating a service account credential file, see Creating and managing service account keys.
- ID-PREFIX: the prefix that is prepended to the agent ID to help identify the agent or its machine in the Google Cloud Console. When a prefix is used, the agent ID is formatted as prefix + hostname + Docker container ID.
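When the list of allowed directories is long, the repeated -v flags can be generated instead of typed. The following is a minimal sketch that maps each host directory to the same path inside the container, as the first example in this section does. The function name volume_flags is hypothetical, and the sketch assumes directory paths without spaces.

```shell
#!/bin/sh
# Hypothetical helper: print one "-v DIR:DIR" pair per argument.
# Assumes paths contain no whitespace, because the caller expands the result unquoted.
volume_flags() {
  flags=""
  for dir in "$@"; do
    flags="$flags -v $dir:$dir"
  done
  # Drop the leading space before printing.
  printf '%s\n' "${flags# }"
}

# Example (not run here):
#   sudo docker run --ulimit memlock=64000000 -d --rm --volumes-from gcloud-config \
#     $(volume_flags /usr/local/research /usr/local/billing /tmp) \
#     gcr.io/cloud-ingest/tsop-agent:latest \
#     --project-id=PROJECT-ID --hostname=$(hostname) --agent-id-prefix=ID-PREFIX
```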
Coordinating agents with Kubernetes
Docker is a supported container runtime for Kubernetes. You can use Kubernetes to orchestrate starting and stopping many agents simultaneously. From the perspective of Kubernetes, the agent container is a stateless application, so you can follow the Kubernetes instructions for deploying a stateless application.
Using private API endpoints in Cloud Interconnect
To use private API endpoints in Cloud Interconnect:
Log in to the on-premises host on which you intend to run the agent.
Configure Private Google Access. For more information, see Configuring Private Google Access for on-premises hosts.
Confirm that you can connect to Cloud Storage APIs and Pub/Sub APIs:
- For Cloud Storage APIs, run the following command from the same machine as the transfer agent to test moving a file into your Cloud Storage bucket:
  gsutil cp test.txt gs://MY-BUCKET
  where MY-BUCKET is the name of your Cloud Storage bucket. If the transfer works, the test is successful.
- For Pub/Sub APIs, run the following command from the same machine as the transfer agent to confirm that you can find existing Pub/Sub topics:
  gcloud pubsub topics list --project=PROJECT-ID
  where PROJECT-ID is the Google Cloud project ID. If a list of Pub/Sub topics is displayed, the test is successful.
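The two connectivity checks can be combined into one script. The following is a minimal sketch, assuming gsutil and gcloud are installed and authenticated on the agent machine. The function name check_google_apis is hypothetical. It judges each check by the command's exit status only, so you still need to confirm yourself that the Pub/Sub topic list looks as expected.

```shell
#!/bin/sh
# Hypothetical wrapper around the two checks described above.
# Usage: check_google_apis BUCKET PROJECT_ID
check_google_apis() {
  bucket=$1
  project=$2
  status=0
  test_file=$(mktemp) || return 1
  echo "connectivity test" > "$test_file"

  # Cloud Storage: try to copy a small file into the bucket.
  if gsutil cp "$test_file" "gs://$bucket" >/dev/null 2>&1; then
    echo "Cloud Storage: OK"
  else
    echo "Cloud Storage: FAILED"
    status=1
  fi

  # Pub/Sub: try to list the topics in the project.
  if gcloud pubsub topics list --project="$project" >/dev/null 2>&1; then
    echo "Pub/Sub: OK"
  else
    echo "Pub/Sub: FAILED"
    status=1
  fi

  rm -f "$test_file"
  return "$status"
}

# Example (not run here):
#   check_google_apis MY-BUCKET PROJECT-ID
```

The Cloud Storage check leaves a small test object in the bucket, which you can delete afterward.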
Using a forward proxy
Transfer service for on-premises data agents support using a forward proxy on your network by passing the HTTPS_PROXY environment variable. For example:
sudo docker run -d --ulimit memlock=64000000 --rm \
  --volumes-from gcloud-config \
  -v /:/transfer_root \
  --env HTTPS_PROXY=PROXY \
  gcr.io/cloud-ingest/tsop-agent:latest \
  --enable-mount-directory \
  --project-id=PROJECT-ID \
  --hostname=$(hostname) \
  --agent-id-prefix=ID-PREFIX
Replace the following:
- PROXY: the HTTP URL and port of the proxy server. Ensure that you specify the HTTP URL, and not an HTTPS URL, to avoid double-wrapping requests in TLS encryption. Double-wrapped requests prevent the proxy server from sending valid outbound requests.
- PROJECT-ID: the ID of the project that hosts the transfer and in which Pub/Sub resources are created and billed.
- ID-PREFIX: the prefix that is prepended to the agent ID to help identify the agent or its machine in the Google Cloud Console. When a prefix is used, the agent ID is formatted as prefix + hostname + Docker container ID.
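Because an HTTPS proxy URL causes the double-wrapping problem described above, a small guard can catch the mistake before the agent starts. The following is a minimal sketch. The function name is_http_proxy_url and the example proxy address are hypothetical, and the check only inspects the URL scheme.

```shell
#!/bin/sh
# Hypothetical check: succeed only if the value starts with "http://"
# followed by at least one more character.
# Usage: is_http_proxy_url URL
is_http_proxy_url() {
  case $1 in
    http://?*) return 0 ;;
    *)         return 1 ;;
  esac
}

# Example (not run here; the proxy address is a placeholder):
#   PROXY=http://proxy.example.com:3128
#   is_http_proxy_url "$PROXY" || { echo "HTTPS_PROXY must use http://" >&2; exit 1; }
```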