Managing Transfer for on-premises jobs

Before you can start a transfer, you must create a transfer job and have one or more agents installed and connected to the transfer job. This document describes how to create a transfer job, install transfer agents, and manage your transfer jobs.

Prerequisites

To use Transfer for on-premises, you need:

  • A POSIX-compliant source.

  • A network connection that is 300 Mbps or faster.

  • A Docker-supported 64-bit Linux server or virtual machine that can access the data you plan to transfer.

    Docker Community Edition supports the CentOS, Debian, Fedora, and Ubuntu operating systems.

    To use other Linux operating systems, see Docker Enterprise.

  • A Cloud Storage bucket without a retention policy.

    To transfer to a bucket with a retention policy, we recommend the following process:

    1. Create a Cloud Storage bucket within the same region as the final bucket. Ensure that this temporary bucket does not have a retention policy.

      For more information about regions, see Bucket locations.

    2. Use Transfer service for on-premises data to transfer your data to the temporary bucket you created without a retention policy.

    3. Perform a bucket-to-bucket transfer to move the data into the bucket with a retention policy. (A sample request for this step follows this prerequisites list.)

    4. Delete the Cloud Storage bucket that you created to temporarily store your data.

  • Complete Transfer for on-premises first-time setup.
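For step 3 of the retention-policy process above, the bucket-to-bucket transfer can be created with transferJobs.create by using a gcsDataSource instead of a posixDataSource. The following is a minimal sketch; the bucket names and project ID are placeholders for your own values:

POST https://storagetransfer.googleapis.com/v1/transferJobs
{
  "description": "Move staged data into the bucket with a retention policy",
  "status": "ENABLED",
  "projectId": "my_transfer_project_id",
  "transferSpec": {
      "gcsDataSource": {
          "bucketName": "temporary_bucket"
      },
      "gcsDataSink": {
          "bucketName": "final_bucket"
      }
  }
}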

Before you start a transfer, verify that:

  • TCP ports 80 (HTTP) and 443 (HTTPS) are open for outbound connections.
  • All agent processes within a single Google Cloud project have the same filesystem mounted at the same mount point.

Scaling restrictions on jobs and agents

Transfer for on-premises has the following scale restrictions on transfer jobs and agents:

  • Fewer than one billion files per job
  • 100 agents or fewer per transfer project
  • Bandwidth limits must be greater than 1 MB/s

Create a transfer job

Before you can start a transfer, you must create a transfer job. The transfer job coordinates and controls your on-premises agents as they move your data.

To create a transfer job:

Cloud Console

  1. Go to the Transfer service for on-premises data Web Console page in the Google Cloud Console.

    Go to the Transfer service for on-premises data Page

  2. Click Create Transfer Job.

    The Create a transfer job page is displayed.

  3. Select an agent pool for your transfer. To create a new agent pool:

    1. Click Create Agent Pool.

      The Create an agent pool form is displayed.

    2. Complete the form, then click Create.

      Your new agent pool is highlighted on the Create a transfer job page. Select it to confirm.

  4. Specify a source by entering the fully qualified path of the source file system directory.

  5. Specify a Cloud Storage destination bucket. You can enter a Cloud Storage bucket name, or you can create a new bucket.

    To create and select a new bucket:

    1. Click Browse.

    2. Click New bucket.

      The Create a bucket form is displayed.

    3. Complete the form, click Create, and then click Select.

  6. Optional: To transfer files into a folder instead of the top level of your bucket, specify the folder name and the full path leading to it.

  7. Describe the transfer job. Enter a short description of your transfer that will help you track it.

  8. Optional: Create a schedule for your job.

  9. Click Create.

REST API

Use transferJobs.create with a posixDataSource:

POST https://storagetransfer.googleapis.com/v1/transferJobs
{
  "name": "transferJobs/sample_transfer",
  "description": "My First Transfer",
  "status": "ENABLED",
  "projectId": "my_transfer_project_id",
  "schedule": {
      "scheduleStartDate": {
          "year": 2022,
          "month": 5,
          "day": 2
      },
      "startTimeOfDay": {
          "hours": 22,
          "minutes": 30,
          "seconds": 0,
          "nanos": 0
      },
      "scheduleEndDate": {
          "year": 2022,
          "month": 12,
          "day": 31
      },
      "repeatInterval": "259200s"
  },
  "transferSpec": {
      "posixDataSource": {
          "rootDirectory": "/bar/"
      },
      "gcsDataSink": {
          "bucketName": "destination_bucket",
          "path": "foo/bar/"
      }
  }
}

The schedule field is optional; if it's not included, the transfer job must be started with a transferJobs.run request.

To check your transfer's status after creating a job, use transferJobs.get:

GET https://storagetransfer.googleapis.com/v1/transferJobs/sample_transfer?project_id=my_transfer_project_id

If you haven't already done so, install and run on-premises transfer agents on each of your machines.

Control bandwidth usage for Transfer service for on-premises data

Bandwidth limits are helpful if you need to limit how much network bandwidth Transfer service for on-premises data uses to move data to Cloud Storage. Using a bandwidth limit helps ensure that:

  • Your network up-links are not saturated as a result of using Transfer service for on-premises data.

  • Your organization's existing application behavior doesn't degrade during the transfer.

  • You don't cause a sudden price increase if you're on a network connection that charges by peak bandwidth usage.

Bandwidth limits are applied at the agent pool level and are divided among all agents in the pool.
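If you manage agent pools programmatically, the limit can also be set on the pool resource through the Storage Transfer API. The following is a hedged sketch, assuming your API version exposes the projects.agentPools.patch method and its bandwidthLimit field; the project ID, pool ID, and limit value are placeholders:

PATCH https://storagetransfer.googleapis.com/v1/projects/my_transfer_project_id/agentPools/my_agent_pool?updateMask=bandwidthLimit
{
  "bandwidthLimit": {
      "limitMbps": "120"
  }
}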

Set a bandwidth limit

To set a bandwidth limit:

  1. In the Cloud Console, go to the Transfer service for on-premises data page.

    Go to Transfer service for on-premises data

  2. Click Connection settings.

  3. Select the agent pool to update.

  4. Click Set bandwidth limit.

  5. Enter the desired network limit in megabytes per second (MB/s) and click Set limit.

    The bandwidth limit is displayed for the project.

Edit a bandwidth limit

To edit an existing bandwidth limit, from the Connection Settings page, click Edit limit.

To remove a limit, click Use all bandwidth.

Monitor jobs

You can monitor your Transfer service for on-premises data jobs to ensure they're working as expected.

To monitor your transfer jobs:

Cloud Console

  1. Go to the Transfer service for on-premises data Transfer Jobs page in Google Cloud Console.

    Go to the Transfer service for on-premises data Transfer Jobs Page

    A list of jobs is displayed. This list includes both running and completed jobs.

  2. To display detailed information on a transfer job, click the Job description for the job you're interested in.

    The Job details page is displayed.

The Job details page displays the following:

  • How much data has been transferred.

  • Configuration information about the transfer job.

  • Scheduled or recurring job information.

  • Details of the most recent job run.

  • History of all past job runs.

REST API

Use transferJobs.list to return a list of all transfer jobs.

To get more information on a specific transfer job, use transferJobs.get to return a TransferJob object.

To check on the status of an ongoing transfer, pass the TransferJob.latestOperationName value to transferOperations.get.
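For example, if TransferJob.latestOperationName is transferOperations/sample_operation (a placeholder value), check its status with:

GET https://storagetransfer.googleapis.com/v1/transferOperations/sample_operation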

Filter jobs

If you have many jobs and wish to monitor a subset of them, consider using filters to sort and display only the jobs you are interested in.

To filter your transfer jobs:

Cloud Console

  1. Click Filter list.

  2. Select the filters you wish to apply.

REST API

To filter transfer jobs, provide the filter query parameter to transferJobs.list.
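For example, to list only enabled jobs, pass a JSON filter with the projectId and jobStatuses fields. The filter is shown unencoded here for readability; URL-encode it in an actual request. The project ID is a placeholder:

GET https://storagetransfer.googleapis.com/v1/transferJobs?filter={"projectId":"my_transfer_project_id","jobStatuses":["ENABLED"]}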

Edit job configurations

You can edit the following items for an existing transfer job:

  • The job description
  • Sync option
  • Schedule

To edit a job configuration:

Cloud Console

  1. Go to the Transfer service for on-premises data Transfer Jobs page in Google Cloud Console.

    Go to the Transfer service for on-premises data Transfer Jobs Page

  2. Click the Job description for the job you're editing.

    The Job details page is displayed.

  3. Click Configuration.

  4. Click Edit beside the configuration item you wish to edit.

REST API

You can update a transfer job after creating the job using transferJobs.patch.
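For example, to update only the job's description, send a patch request whose updateTransferJobFieldMask names the fields to change. The job name, project ID, and description are the sample values used earlier in this document:

PATCH https://storagetransfer.googleapis.com/v1/transferJobs/sample_transfer
{
  "projectId": "my_transfer_project_id",
  "transferJob": {
      "description": "My Updated Transfer"
  },
  "updateTransferJobFieldMask": "description"
}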

Re-run jobs

Transfer service for on-premises data supports re-running a completed job a single time. This can be helpful if you have some additional data to move and you'd like to reuse an existing job configuration.

To re-run a job:

Cloud Console

  1. Go to the Transfer service for on-premises data Transfer Jobs page in Google Cloud Console.

    Go to the Transfer service for on-premises data Transfer Jobs Page

  2. Click the Job description for the job you want to run again.

    The Job details page is displayed.

  3. Click Run again.

    The job starts.

REST API

You can re-run a transfer job using transferJobs.run and providing the jobName.
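For example, to re-run the sample job created earlier in this document:

POST https://storagetransfer.googleapis.com/v1/transferJobs/sample_transfer:run
{
  "projectId": "my_transfer_project_id"
}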

View errors

To view a sample of errors encountered during the transfer:

Cloud Console

  1. Go to the Transfer service for on-premises data Transfer Jobs page in Google Cloud Console.

    Go to the Transfer service for on-premises data Transfer Jobs Page

  2. Click the Job description for the job you're interested in.

    The Job details page is displayed.

  3. Click View error details.

    The Error details page is displayed, which shows a sample of errors encountered during the transfer.

REST API

You can view transfer errors using transferOperations.get.

View transfer logs

Transfer service for on-premises data produces detailed transfer logs that you can use to verify the results of your transfer job. Each job produces a collection of transfer logs that are stored in the destination Cloud Storage bucket.

Logs are produced while the transfer job is running. The complete logs are typically available within 15 minutes of job completion.

You can view logs in either of the following ways:

View errors within the Google Cloud Console

To display all errors encountered during the transfer within the Google Cloud Console:

  1. Click View transfer logs.

    The Bucket details page is displayed. This is the log location within your destination Cloud Storage bucket.

  2. Click on the transfer log you are interested in.

    The transfer logs are displayed. For more information, see transfer log format.

View logs in the destination bucket

Transfer logs are stored in the destination bucket at the following path:

destination-bucket-name/storage-transfer/logs/transferJobs/job-name/transferOperations/operation-name

where:

  • destination-bucket-name is the name of the job's destination Cloud Storage bucket.
  • job-name is the job name, as displayed in the job list.
  • operation-name is the name of the individual transfer operation, composed of an ISO 8601 timestamp and a generated ID.

Logs are aggregated and stored as objects. Each batch of logs is named by its creation time. For example:

my-bucket/storage-transfer/logs/transferJobs/job1/transferOperations/2019-10-19T10_52_56.519081644-07_00.log

For more information about the contents of these logs, see transfer log format.

Run BigQuery queries on transfer logs

To run BigQuery queries on your transfer logs:

  1. Load the CSV log data into BigQuery (a sample command follows this list).

  2. Run your BigQuery query.
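As a minimal sketch of step 1, you can load the log objects for a job with the bq command-line tool; the dataset and table names below are hypothetical, and the wildcard path follows the log path format described earlier. The --autodetect flag infers the schema from the CSV header row:

bq load --source_format=CSV --autodetect \
    my_dataset.transfer_log \
    gs://destination-bucket-name/storage-transfer/logs/transferJobs/job-name/transferOperations/*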

Example queries

Display the number of files attempted and their success or failure status

select ActionStatus, count(*) as num_files
from big-query-table
where Action="TRANSFER"
group by 1;

Where big-query-table is the name of the BigQuery table that contains the transfer log.

Display all files that failed to transfer

select Src_File_Path
from big-query-table
where Action="TRANSFER" and ActionStatus="FAILED";

Where big-query-table is the name of the BigQuery table that contains the transfer log.

Display checksum and timestamp for each file that successfully transferred

select Timestamp, Action, ActionStatus, Src_File_Path, Src_File_Size,
Src_File_Crc32C, Dst_Gcs_BucketName, Dst_Gcs_ObjectName, Dst_Gcs_Size,
Dst_Gcs_Crc32C, Dst_Gcs_Md5
from big-query-table
where Action="TRANSFER" and ActionStatus="SUCCEEDED";

Where big-query-table is the name of the BigQuery table that contains the transfer log.

Display all error information for directories that failed to transfer

select FailureDetails_ErrorType, FailureDetails_GrpcCode, FailureDetails_Message
from big-query-table
where Action="FIND" and ActionStatus="FAILED";

Where big-query-table is the name of the BigQuery table that contains the transfer log.