Managing Transfer for on-premises jobs

Before you can start a transfer, you must create a transfer job and have one or more agents installed and connected to it. This document describes how to create transfer jobs, install transfer agents, and manage your transfer jobs.

Prerequisites

To use Transfer for on-premises, you need:

  • A POSIX-compliant source.

  • A network connection that is 300 Mbps or faster.

  • A Docker-supported 64-bit Linux server or virtual machine that can access the data you plan to transfer.

    Docker Community Edition supports the CentOS, Debian, Fedora, and Ubuntu operating systems.

    To use other Linux operating systems, see Docker Enterprise.

  • A Cloud Storage bucket without a retention policy.

    To transfer to a bucket with a retention policy, we recommend the following process:

    1. Create a Cloud Storage bucket within the same region as the final bucket. Ensure that this temporary bucket does not have a retention policy.

      For more information about regions, see Bucket locations.

    2. Use Transfer service for on-premises data to transfer your data to the temporary bucket you created without a retention policy.

    3. Perform a bucket-to-bucket transfer to move the data to the bucket with a retention policy (see the sketch after this list).

    4. Delete the Cloud Storage bucket that you created to temporarily store your data.

  • Complete Transfer for on-premises first-time setup.
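
If you'd rather script the bucket-to-bucket step of the retention-policy process, the following is a rough sketch using the Storage Transfer Service Python client (google-cloud-storage-transfer). The project ID and bucket names are placeholders, not values from this guide:

from google.cloud import storage_transfer

# Rough sketch: copies staged data from the temporary bucket into the final
# bucket that has the retention policy. All names below are placeholders.
client = storage_transfer.StorageTransferServiceClient()

job = client.create_transfer_job(
    {
        "transfer_job": {
            "project_id": "my-project",
            "description": "Staging bucket to retention-policy bucket",
            "status": storage_transfer.TransferJob.Status.ENABLED,
            "transfer_spec": {
                "gcs_data_source": {"bucket_name": "my-temp-bucket"},
                "gcs_data_sink": {"bucket_name": "my-final-bucket"},
            },
        }
    }
)

# A job created without a schedule doesn't start on its own; trigger one run.
client.run_transfer_job({"job_name": job.name, "project_id": "my-project"})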

Before you start a transfer, verify that:

  • TCP ports 80 (HTTP) and 443 (HTTPS) are open for outbound connections.
  • All agent processes within a single Google Cloud project have the same filesystem mounted at the same mount point.

Scaling restrictions on jobs and agents

Transfer for on-premises has the following scale restrictions on transfer jobs and agents:

  • Fewer than one billion files per job
  • 100 agents or fewer per transfer project
  • Bandwidth caps must be greater than 1 MB/s

Creating a transfer job

Before you can start a transfer, you must create a transfer job. The transfer job coordinates and controls your on-premises agents as they move your data.

To create a transfer job:

  1. Go to the Transfer service for on-premises data Web Console page in the Google Cloud Console.

    Go to the Transfer service for on-premises data Page

  2. Click Create Transfer Job.

    The Create a transfer job page is displayed.

  3. Describe the transfer job. Enter a short description of your transfer that will help you track it.

  4. Specify a source by entering the fully qualified path of the source file system directory.

  5. Specify a Cloud Storage destination bucket. You can enter a Cloud Storage bucket name, or you can create a new bucket.

    To create and select a new bucket:

    1. Click Browse.

    2. Click New bucket.

      The Create a bucket form is displayed.

    3. Complete the form, click Create, and then click Select.

  6. Optional: Enter an Object prefix. Without an object prefix, objects are named in Cloud Storage by their source paths relative to the root path. For example, if you have the following files:

    • /source_root_path/file1.txt
    • /source_root_path/dirA/file2.txt
    • /source_root_path/dirA/dirB/file3.txt

    Then the object names in Cloud Storage are:

    • file1.txt
    • dirA/file2.txt
    • dirA/dirB/file3.txt

    The object prefix is added to the object's destination name in Cloud Storage, after the / character that follows the destination bucket name and before the object's path relative to the source root. This prefix can help you distinguish objects transferred by different transfer jobs.

    The following table demonstrates several examples of object prefixes and their resulting object names in Cloud Storage, if the source object's path is /source_root_path/sub_folder_name/object_name:

    Prefix     Destination object name
    None       /destination_bucket/sub_folder_name/object_name
    prefix     /destination_bucket/prefixsub_folder_name/object_name
    prefix-    /destination_bucket/prefix-sub_folder_name/object_name
    prefix/    /destination_bucket/prefix/sub_folder_name/object_name

  7. Optional: Create a schedule for your job.

  8. Click Create.

If you haven't already done so, install and run on-premises transfer agents on each of your machines.
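
The console isn't the only way to create a job. The following is a minimal sketch of an equivalent on-premises job created with the Storage Transfer Service Python client (google-cloud-storage-transfer); the project ID, source directory, bucket name, and prefix are placeholders:

from google.cloud import storage_transfer

# Rough sketch: creates a transfer job from a POSIX file system source to a
# Cloud Storage bucket. All values are placeholders for illustration.
client = storage_transfer.StorageTransferServiceClient()

job = client.create_transfer_job(
    {
        "transfer_job": {
            "project_id": "my-project",
            "description": "On-premises transfer example",  # step 3
            "status": storage_transfer.TransferJob.Status.ENABLED,
            "transfer_spec": {
                # Fully qualified path of the source directory (step 4).
                "posix_data_source": {"root_directory": "/source_root_path"},
                # Destination bucket (step 5) with an object prefix (step 6).
                "gcs_data_sink": {"bucket_name": "my-bucket", "path": "prefix/"},
            },
        }
    }
)
print("Created job:", job.name)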

Controlling bandwidth usage for Transfer service for on-premises data

Bandwidth limits are helpful if you need to limit how much network bandwidth Transfer service for on-premises data uses when transferring data to Cloud Storage. Using a bandwidth limit helps ensure that:

  • Your network up-links are not saturated as a result of using Transfer service for on-premises data.

  • Your organization's existing application behavior doesn't degrade during the transfer.

  • You don't cause a sudden price increase if you're on a network connection that charges by peak bandwidth usage.

Bandwidth limits apply to an entire project.

Setting a bandwidth limit

To set a bandwidth limit:

  1. Go to the Transfer service for on-premises data Connection Settings page in Google Cloud Console.

    Go to the Transfer service for on-premises data Connection Settings Page

  2. Click Set Bandwidth Limit.

    The Set this project's bandwidth limit pane is displayed.

  3. In the Bandwidth limit text box, enter the desired network limit in megabytes per second (MB/s), and then click Set Bandwidth Limit.

    The bandwidth limit is displayed for the project.
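
Bandwidth limits can also be managed outside the console. Note that recent versions of the Storage Transfer API express limits on agent pools rather than as a single project-wide setting, so treat the following as a rough sketch; it assumes the google-cloud-storage-transfer Python client and a placeholder pool name:

from google.cloud import storage_transfer

client = storage_transfer.StorageTransferServiceClient()

# Placeholder pool; limit_mbps is the cap in megabytes per second.
pool = storage_transfer.AgentPool(
    name="projects/my-project/agentPools/my-pool",
    bandwidth_limit=storage_transfer.AgentPool.BandwidthLimit(limit_mbps=120),
)

# Patch only the bandwidth_limit field of the pool.
client.update_agent_pool(
    {"agent_pool": pool, "update_mask": {"paths": ["bandwidth_limit"]}}
)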

Editing a bandwidth limit

To edit an existing bandwidth limit:

  1. Go to the Transfer service for on-premises data Connection Settings page in Google Cloud Console.

    Go to the Transfer service for on-premises data Connection Settings Page

  2. In the displayed bandwidth limit, click Edit.

  3. In the Bandwidth limit text box, enter the desired network limit in megabytes per second (MB/s), and then click Set Bandwidth Limit.

    The bandwidth limit is displayed for the project.

Removing a bandwidth limit

To remove an existing bandwidth limit:

  1. Go to the Transfer service for on-premises data Connection Settings page in Google Cloud Console.

    Go to the Transfer service for on-premises data Connection Settings Page

  2. In the displayed bandwidth limit, click Use All Bandwidth.

  3. To confirm that you wish to remove the existing limit, click Confirm.

Monitoring jobs

You can monitor your Transfer service for on-premises data jobs to ensure they're working as expected.

To monitor your transfer jobs:

  1. Go to the Transfer service for on-premises data Transfer Jobs page in Google Cloud Console.

    Go to the Transfer service for on-premises data Transfer Jobs Page

    A list of jobs is displayed. This list includes both running and completed jobs.

  2. To display detailed information on a transfer job, click the Job description for the job you're interested in.

    The Job details page is displayed.

The Job details page displays the following:

  • How much data has been transferred.

  • Configuration information about the transfer job.

  • Scheduled or recurring job information.

  • Details of the most recent job run.

  • History of all past job runs.
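
You can do the same monitoring from a script. The following is a small sketch using the google-cloud-storage-transfer Python client; the project ID is a placeholder, and the filter is a JSON string, as the API expects:

import json
from google.cloud import storage_transfer

client = storage_transfer.StorageTransferServiceClient()

# List every transfer job in the project; add "jobStatuses": ["ENABLED"] to
# the filter to narrow the list, similar to the console filters below.
job_filter = json.dumps({"projectId": "my-project"})
for job in client.list_transfer_jobs({"filter": job_filter}):
    print(job.name, job.description, job.status)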

Filtering jobs

If you have many jobs and wish to monitor a subset of them, consider using filters to sort and display only the jobs you are interested in.

To filter your transfer jobs:

  1. Click Filter list.

  2. Select the filters you wish to apply.

Editing job configurations

You can edit the following items for an existing transfer job:

  • The job description
  • Sync option
  • Schedule

To edit a job configuration:

  1. Go to the Transfer service for on-premises data Transfer Jobs page in Google Cloud Console.

    Go to the Transfer service for on-premises data Transfer Jobs Page

  2. Click the Job description for the job you're editing.

    The Job details page is displayed.

  3. Click Configuration.

  4. Click the edit icon beside the configuration item you wish to edit.
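
The same kind of edit can be made through the API. The following is a hedged sketch that updates only the job description with the google-cloud-storage-transfer Python client; the job name and project ID are placeholders:

from google.cloud import storage_transfer

client = storage_transfer.StorageTransferServiceClient()

client.update_transfer_job(
    {
        "job_name": "transferJobs/OPI0123456789",  # placeholder
        "project_id": "my-project",
        "transfer_job": {"description": "Updated description"},
        # Restrict the patch to the description field.
        "update_transfer_job_field_mask": {"paths": ["description"]},
    }
)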

Re-running jobs

Transfer service for on-premises data supports running a completed job one more time. This can be helpful if you have additional data to move and you'd like to reuse an existing job configuration.

To re-run a job:

  1. Go to the Transfer service for on-premises data Transfer Jobs page in Google Cloud Console.

    Go to the Transfer service for on-premises data Transfer Jobs Page

  2. Click the Job description for the job you want to run again.

    The Job details page is displayed.

  3. Click Run again.

    The job starts.
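
Re-running is also available through the API. A minimal sketch with the google-cloud-storage-transfer Python client; the job name and project ID are placeholders:

from google.cloud import storage_transfer

client = storage_transfer.StorageTransferServiceClient()

# Starts one new run of an existing job and waits for it to finish.
operation = client.run_transfer_job(
    {"job_name": "transferJobs/OPI0123456789", "project_id": "my-project"}
)
operation.result()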

Viewing errors

To view a sample of errors encountered during the transfer:

  1. Go to the Transfer service for on-premises data Transfer Jobs page in Google Cloud Console.

    Go to the Transfer service for on-premises data Transfer Jobs Page

  2. Click the Job description for the job you're interested in.

    The Job details page is displayed.

  3. Click View error details.

    The Error details page is displayed, which shows a sample of errors encountered during the transfer.

Viewing transfer logs

Transfer service for on-premises data produces detailed transfer logs that you can use to verify the results of your transfer job. Each job produces a collection of transfer logs that are stored in the destination Cloud Storage bucket.

Logs are produced while the transfer job is running. The complete logs are typically available within 15 minutes of job completion.

You can view logs in either of the following ways:

Viewing logs within the Google Cloud Console

To view the transfer logs within the Google Cloud Console:

  1. Click View transfer logs.

    The Bucket details page is displayed, showing the log destination in your Cloud Storage bucket.

  2. Click on the transfer log you are interested in.

    The transfer logs are displayed. For more information, see transfer log format.

Viewing logs in the destination bucket

Transfer logs are stored in the destination bucket at the following path:

destination-bucket-name/storage-transfer/logs/transferJobs/job-name/transferOperations/operation-name

where:

  • destination-bucket-name is the name of the job's destination Cloud Storage bucket.
  • job-name is the job name, as displayed in the job list.
  • operation-name is the name of the individual transfer operation, composed of the ISO 8601 timestamp and a generated ID.

Logs are aggregated and stored as objects. Each batch of logs is named by its creation time. For example:

my-bucket/storage-transfer/logs/transferJobs/job1/transferOperations/2019-10-19T10_52_56.519081644-07_00.log

For more information about the contents of these logs, see transfer log format.
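
Because the logs are ordinary objects, you can also list them with the Cloud Storage client library. A short sketch, assuming the google-cloud-storage Python client and placeholder bucket and job names:

from google.cloud import storage

client = storage.Client()

# List the log objects written for one job, with their creation times.
prefix = "storage-transfer/logs/transferJobs/job1/"
for blob in client.list_blobs("my-bucket", prefix=prefix):
    print(blob.name, blob.time_created)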

Running BigQuery queries on transfer logs

To run BigQuery queries on your transfer logs:

  1. Load the CSV log data into BigQuery (see the sketch after these steps).

  2. Run your BigQuery query.
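
The following is a hedged sketch of step 1, using the google-cloud-bigquery Python client; the bucket, dataset, and table names are placeholders, and the schema is auto-detected rather than taken from the transfer log format:

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,  # assumes a header row in the log files
    autodetect=True,      # infer the schema from the CSV
)

# Placeholder bucket, job, and table names; the wildcard picks up all log
# objects written under the job's log path.
load_job = client.load_table_from_uri(
    "gs://my-bucket/storage-transfer/logs/transferJobs/job1/*",
    "my-project.my_dataset.transfer_logs",
    job_config=job_config,
)
load_job.result()  # waits for the load to finish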

Example queries

Display the number of files that attempted to transfer, grouped by success or failure status

select ActionStatus, count(*) as num_files
from big-query-table
where Action="TRANSFER"
group by 1;

Where big-query-table is the name of the BigQuery table that contains the transfer log.

Display all files that failed to transfer

select Src_File_Path  
from big-query-table
where Action="TRANSFER" and ActionStatus="FAILED";

Where big-query-table is the name of the BigQuery table that contains the transfer log.

Display checksum and timestamp for each file that successfully transferred

select Timestamp, Action, ActionStatus, Src_File_Path, Src_File_Size,
Src_File_Crc32C, Dst_Gcs_BucketName, Dst_Gcs_ObjectName, Dst_Gcs_Size,
Dst_Gcs_Crc32C, Dst_Gcs_Md5
from big-query-table
where Action="TRANSFER" and ActionStatus="SUCCEEDED";

Where big-query-table is the name of the BigQuery table that contains the transfer log.

Display all error information for directories that failed to transfer

select FailureDetails_ErrorType, FailureDetails_GrpcCode, FailureDetails_Message
from big-query-table
where Action="FIND" and ActionStatus="FAILED";

Where big-query-table is the name of the BigQuery table that contains the transfer log.