Transfer specific files or objects using a manifest

This page shows you how to create a manifest of specific objects or files that you want to transfer. A manifest is a Cloud Storage object in CSV format that contains a list of files or objects for Storage Transfer Service to act upon. Manifests allow you to:

  • Transfer a specific list of files from a POSIX file system to a Cloud Storage bucket.
  • Transfer a specific list of objects from a Cloud Storage bucket to a POSIX file system.
  • Transfer a specific list of objects from Amazon S3, Azure Blob Storage, or Cloud Storage to a Cloud Storage bucket.

When you specify a manifest while creating a transfer job, only the files or objects listed in the manifest are transferred.

Create a manifest of files or objects for a transfer

Manifests must be in the CSV file format and can contain any UTF-8 characters. The first column must be a file path or object name specified as a string. We recommend testing the transfer with a small subset of files or objects to avoid making a large number of wasted API calls due to configuration errors.
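Following the recommendation to test with a small subset first, one way to cut a trial manifest from a larger one is sketched below in Python. The function name `write_sample_manifest` and the default of 10 rows are illustrative choices, not part of Storage Transfer Service. Rows are parsed and rewritten with the `csv` module so that quoted names survive the copy.

```python
import csv
from itertools import islice


def write_sample_manifest(source_path: str, sample_path: str, limit: int = 10) -> int:
    """Copy the first `limit` rows of a manifest into a smaller test manifest.

    Hypothetical helper. Returns the number of rows written.
    """
    count = 0
    with open(source_path, newline="", encoding="utf-8") as src, \
            open(sample_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        # Re-serialize each parsed row so names containing commas stay quoted.
        for row in islice(csv.reader(src), limit):
            writer.writerow(row)
            count += 1
    return count
```

Upload the sample manifest, run a transfer with it, and check the transfer logs before switching to the full manifest.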

You can monitor the status of file transfers from the Transfer Jobs page. Files or objects that fail to transfer are listed in the transfer logs.

Manifest of files

To create a manifest of files to transfer from a POSIX file system to Cloud Storage, create a CSV file with a single column containing the file paths relative to the root_directory specified in createTransferJob.

Example manifest of files

If root_directory is rootdir, a manifest of the following files:

File path
rootdir/dir1/subdir1/file1.txt
rootdir/File2.txt
rootdir/dir1/subdir1/file3.txt
rootdir/dir1/subdir4/file4.txt
rootdir/dir1/subdir1/file5.txt

would look like:

"dir1/subdir1/file1.txt"
"File2.txt"
"dir1/subdir1/file3.txt"
"dir1/subdir4/file4.txt"
"dir1/subdir1/file5.txt"
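Producing the relative paths by hand is error-prone for long lists. A minimal Python sketch, assuming the input paths are absolute or relative to the current working directory and that the source is a POSIX file system (so `/` is the separator); `build_file_manifest` is a hypothetical helper, and quoting every field mirrors the example above rather than a documented requirement.

```python
import csv
import os


def build_file_manifest(root_directory: str, file_paths: list[str], manifest_path: str) -> list[str]:
    """Write a single-column manifest of paths relative to root_directory.

    Hypothetical helper. Raises ValueError for a path outside root_directory,
    because such a file cannot be addressed relative to the transfer's root.
    """
    root = os.path.abspath(root_directory)
    relative_paths = []
    for path in file_paths:
        absolute = os.path.abspath(path)
        if os.path.commonpath([root, absolute]) != root:
            raise ValueError(f"{path} is outside {root_directory}")
        relative_paths.append(os.path.relpath(absolute, root).replace(os.sep, "/"))
    with open(manifest_path, "w", newline="", encoding="utf-8") as f:
        # QUOTE_ALL wraps every path in double quotes, as in the example manifest.
        writer = csv.writer(f, quoting=csv.QUOTE_ALL)
        for rel in relative_paths:
            writer.writerow([rel])
    return relative_paths
```

Called with `"rootdir"` and the five example paths, this writes the five quoted lines of the example manifest.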

Manifest of objects

To create a manifest of objects, create a CSV file whose first column contains the object names relative to the bucketName and path specified in createTransferJob. If an object name contains a comma, enclose the name in double quotes, as the CSV standard (RFC 4180) requires; a double quote inside a quoted name is escaped by doubling it. You can also specify an optional second column with the object version. All listed objects must be in the single bucket specified in the source path.
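Python's standard `csv` module applies these quoting rules automatically, so names do not need to be escaped by hand. A sketch, with `format_object_manifest` as a hypothetical helper name; passing `None` as the version omits the optional second column.

```python
import csv
import io


def format_object_manifest(rows):
    """Return manifest text for (object_name, version) pairs.

    Hypothetical helper. csv.writer quotes any name containing a comma or a
    double quote and doubles embedded double quotes, following RFC 4180.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for name, version in rows:
        # Omit the optional version column when no version is given.
        writer.writerow([name] if version is None else [name, version])
    return buffer.getvalue()
```

For example, the name `report, final.pdf` with version `585902` becomes the line `"report, final.pdf",585902`.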

Example manifest of objects

A manifest of the following objects:

Object path                 Object version (optional)
SOURCE_PATH/object1.pdf     15857022
SOURCE_PATH/object2.pdf     585902
SOURCE_PATH/object3.pdf     74845
SOURCE_PATH/object4.jpg     149937

would look like:

object1.pdf,15857022
object2.pdf,585902
object3.pdf,74845
object4.jpg,149937
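Before uploading, a quick local check can catch malformed rows. This Python sketch assumes versions are Cloud Storage generation numbers (integers, like those in the example above); if your source uses non-numeric version IDs, drop that check. `parse_object_manifest` is a hypothetical name, and skipping blank lines is a choice of this sketch, not documented service behavior.

```python
import csv
import io


def parse_object_manifest(text):
    """Parse manifest text into (object_name, version) tuples.

    Hypothetical helper. Accepts one required name column and an optional
    numeric version column; version is None when the column is absent.
    """
    entries = []
    for line_number, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not row:
            continue  # this sketch tolerates blank lines
        if len(row) > 2 or not row[0]:
            raise ValueError(f"line {line_number}: expected name[,version], got {row}")
        version = None
        if len(row) == 2:
            # Assumption: versions are integer Cloud Storage generations.
            if not row[1].isdecimal():
                raise ValueError(f"line {line_number}: version {row[1]!r} is not numeric")
            version = int(row[1])
        entries.append((row[0], version))
    return entries
```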

Upload the manifest to the proper location

After you've created the manifest, upload it to a Cloud Storage bucket. The service agent running the transfer must have the storage.objects.get permission on the bucket that contains the manifest file. For instructions on how to grant permissions to the service agent, see Setting up access to the data source.
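One way to grant that permission with the `google-cloud-storage` Python library is to add the service agent to the bucket's IAM policy with the predefined Storage Object Viewer role (`roles/storage.objectViewer`), which includes `storage.objects.get`. This is a sketch: `grant_manifest_read_access` is a hypothetical helper, and the bucket object is passed in so the logic can run without network access.

```python
ROLE = "roles/storage.objectViewer"  # predefined role that includes storage.objects.get


def grant_manifest_read_access(bucket, service_agent_email):
    """Add the service agent to the bucket's IAM policy; return True if changed.

    Hypothetical helper. `bucket` is a google.cloud.storage Bucket.
    """
    member = f"serviceAccount:{service_agent_email}"
    # Request policy version 3 so conditional bindings are returned intact.
    policy = bucket.get_iam_policy(requested_policy_version=3)
    for binding in policy.bindings:
        if (binding["role"] == ROLE and "condition" not in binding
                and member in binding["members"]):
            return False  # an unconditional grant already exists
    policy.bindings.append({"role": ROLE, "members": {member}})
    bucket.set_iam_policy(policy)
    return True
```

In practice, `bucket` would come from `storage.Client().bucket("my_bucket")`, and the email from `storage_transfer.StorageTransferServiceClient().get_google_service_account({"project_id": PROJECT_ID}).account_email`.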

You can encrypt manifests stored in a Cloud Storage bucket with customer-managed Cloud KMS encryption keys. In this case, ensure that any service accounts that access the manifest are granted access to the applicable encryption keys. Customer-supplied encryption keys are not supported.
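A sketch of the upload step with the `google-cloud-storage` Python library, assuming `client` is a `storage.Client()`. When `kms_key_name` is given, `Bucket.blob(..., kms_key_name=...)` asks Cloud Storage to encrypt the object with that customer-managed key. The returned `gs://` URL is the form the transfer job expects as its manifest location. The helper name and the `text/csv` content type are illustrative choices.

```python
def upload_manifest(client, bucket_name, object_name, local_path, kms_key_name=None):
    """Upload a local manifest and return its gs:// URL.

    Hypothetical helper. `client` is a google.cloud.storage.Client; kms_key_name,
    if set, has the form
    projects/PROJECT/locations/LOCATION/keyRings/RING/cryptoKeys/KEY.
    """
    # With kms_key_name=None, the bucket's default encryption applies.
    blob = client.bucket(bucket_name).blob(object_name, kms_key_name=kms_key_name)
    blob.upload_from_filename(local_path, content_type="text/csv")
    return f"gs://{bucket_name}/{object_name}"
```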

Start a transfer with the manifest specified

gcloud

To transfer the files or objects that are listed in the manifest, include the --manifest-file=MANIFEST_FILE flag with your gcloud transfer jobs create command. MANIFEST_FILE must be the path to the CSV file in a Cloud Storage bucket, as shown in the example below.

gcloud transfer jobs create SOURCE DESTINATION \
  --manifest-file=gs://my_bucket/sample_manifest.csv

REST + Client libraries

REST


To transfer the files or objects that are listed in the manifest, make a createTransferJob API call that specifies a transferSpec with the transferManifest field added. For example:

POST https://storagetransfer.googleapis.com/v1/transferJobs

...
  "transferSpec": {
      "posixDataSource": {
          "rootDirectory": "/home/"
      },
      "gcsDataSink": {
          "bucketName": "GCS_NEARLINE_SINK_NAME",
          "path": "GCS_SINK_PATH"
      },
      "transferManifest": {
          "location": "gs://my_bucket/sample_manifest.csv"
      }
  }
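The same request can be assembled and sent from Python. This sketch assumes Application Default Credentials and the `google-auth` and `requests` packages; the helper names are hypothetical. The request body for `POST .../v1/transferJobs` is the TransferJob itself.

```python
CREATE_URL = "https://storagetransfer.googleapis.com/v1/transferJobs"


def build_manifest_job_body(project_id, root_directory, sink_bucket, manifest_location,
                            sink_path=None, source_agent_pool_name=None):
    """Return a TransferJob body equivalent to the REST example (hypothetical helper)."""
    sink = {"bucketName": sink_bucket}
    if sink_path:
        sink["path"] = sink_path  # a non-empty sink path must end with "/"
    transfer_spec = {
        "posixDataSource": {"rootDirectory": root_directory},
        "gcsDataSink": sink,
        "transferManifest": {"location": manifest_location},
    }
    if source_agent_pool_name:
        transfer_spec["sourceAgentPoolName"] = source_agent_pool_name
    return {"projectId": project_id, "status": "ENABLED", "transferSpec": transfer_spec}


def create_transfer_job(body):
    """POST the body with Application Default Credentials; return the created job."""
    import google.auth
    from google.auth.transport.requests import AuthorizedSession

    credentials, _ = google.auth.default(
        scopes=["https://www.googleapis.com/auth/cloud-platform"])
    response = AuthorizedSession(credentials).post(CREATE_URL, json=body)
    response.raise_for_status()
    return response.json()
```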

The objects or files in the manifest aren't necessarily transferred in the listed order.

If the manifest includes files that already exist in the destination, those files are skipped.

If the manifest includes objects that exist in a different version in the destination, the destination object is overwritten with the source version of the object. If the destination is a versioned bucket, a new version of the object is created. If the destination object is the same as the source object, the object is skipped unless transferOptions.overwriteObjectsAlreadyExistingInSink is set to true.
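To re-copy objects that are identical in source and destination, the option goes in the transfer spec's `transferOptions`. A small sketch operating on a REST-style `transferSpec` dictionary; `with_overwrite` is a hypothetical helper. In the Python client library the same field is spelled `overwrite_objects_already_existing_in_sink`.

```python
import copy


def with_overwrite(transfer_spec, overwrite=True):
    """Return a copy of transfer_spec with overwriteObjectsAlreadyExistingInSink set.

    Hypothetical helper; the input dictionary is left unchanged.
    """
    spec = copy.deepcopy(transfer_spec)
    spec.setdefault("transferOptions", {})["overwriteObjectsAlreadyExistingInSink"] = overwrite
    return spec
```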

Go


import (
	"context"
	"fmt"
	"io"

	storagetransfer "cloud.google.com/go/storagetransfer/apiv1"
	storagetransferpb "google.golang.org/genproto/googleapis/storagetransfer/v1"
)

func transferUsingManifest(w io.Writer, projectID string, sourceAgentPoolName string, rootDirectory string, gcsSinkBucket string, manifestBucket string, manifestObjectName string) (*storagetransferpb.TransferJob, error) {
	// Your project ID
	// projectID := "my-project-id"

	// The agent pool associated with the POSIX data source. If not provided, defaults to the default agent
	// sourceAgentPoolName := "projects/my-project/agentPools/transfer_service_default"

	// The root directory path on the source filesystem
	// rootDirectory := "/directory/to/transfer/source"

	// The ID of the GCS bucket to transfer data to
	// gcsSinkBucket := "my-sink-bucket"

	// The ID of the GCS bucket that contains the manifest file
	// manifestBucket := "my-manifest-bucket"

	// The name of the manifest file in manifestBucket that specifies which objects to transfer
	// manifestObjectName := "path/to/manifest.csv"

	ctx := context.Background()
	client, err := storagetransfer.NewClient(ctx)
	if err != nil {
		return nil, fmt.Errorf("storagetransfer.NewClient: %w", err)
	}
	defer client.Close()

	manifestLocation := "gs://" + manifestBucket + "/" + manifestObjectName
	req := &storagetransferpb.CreateTransferJobRequest{
		TransferJob: &storagetransferpb.TransferJob{
			ProjectId: projectID,
			TransferSpec: &storagetransferpb.TransferSpec{
				SourceAgentPoolName: sourceAgentPoolName,
				DataSource: &storagetransferpb.TransferSpec_PosixDataSource{
					PosixDataSource: &storagetransferpb.PosixFilesystem{RootDirectory: rootDirectory},
				},
				DataSink: &storagetransferpb.TransferSpec_GcsDataSink{
					GcsDataSink: &storagetransferpb.GcsData{BucketName: gcsSinkBucket},
				},
				TransferManifest: &storagetransferpb.TransferManifest{Location: manifestLocation},
			},
			Status: storagetransferpb.TransferJob_ENABLED,
		},
	}

	resp, err := client.CreateTransferJob(ctx, req)
	if err != nil {
		return nil, fmt.Errorf("failed to create transfer job: %w", err)
	}
	if _, err = client.RunTransferJob(ctx, &storagetransferpb.RunTransferJobRequest{
		ProjectId: projectID,
		JobName:   resp.Name,
	}); err != nil {
		return nil, fmt.Errorf("failed to run transfer job: %w", err)
	}
	fmt.Fprintf(w, "Created and ran transfer job from %v to %v using manifest file %v with name %v\n", rootDirectory, gcsSinkBucket, manifestLocation, resp.Name)
	return resp, nil
}

Java


import com.google.storagetransfer.v1.proto.StorageTransferServiceClient;
import com.google.storagetransfer.v1.proto.TransferProto;
import com.google.storagetransfer.v1.proto.TransferTypes.GcsData;
import com.google.storagetransfer.v1.proto.TransferTypes.PosixFilesystem;
import com.google.storagetransfer.v1.proto.TransferTypes.TransferJob;
import com.google.storagetransfer.v1.proto.TransferTypes.TransferManifest;
import com.google.storagetransfer.v1.proto.TransferTypes.TransferSpec;
import java.io.IOException;

public class TransferUsingManifest {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.

    // Your project id
    String projectId = "my-project-id";

    // The agent pool associated with the POSIX data source. If not provided, defaults to the
    // default agent
    String sourceAgentPoolName = "projects/my-project-id/agentPools/transfer_service_default";

    // The root directory path on the source filesystem
    String rootDirectory = "/directory/to/transfer/source";

    // The ID of the GCS bucket to transfer data to
    String gcsSinkBucket = "my-sink-bucket";

    // The ID of the GCS bucket which has your manifest file
    String manifestBucket = "my-bucket";

    // The ID of the object in manifestBucket that specifies which files to transfer
    String manifestObjectName = "path/to/manifest.csv";

    transferUsingManifest(
        projectId,
        sourceAgentPoolName,
        rootDirectory,
        gcsSinkBucket,
        manifestBucket,
        manifestObjectName);
  }

  public static void transferUsingManifest(
      String projectId,
      String sourceAgentPoolName,
      String rootDirectory,
      String gcsSinkBucket,
      String manifestBucket,
      String manifestObjectName)
      throws IOException {
    String manifestLocation = "gs://" + manifestBucket + "/" + manifestObjectName;
    TransferJob transferJob =
        TransferJob.newBuilder()
            .setProjectId(projectId)
            .setTransferSpec(
                TransferSpec.newBuilder()
                    .setSourceAgentPoolName(sourceAgentPoolName)
                    .setPosixDataSource(
                        PosixFilesystem.newBuilder().setRootDirectory(rootDirectory).build())
                    .setGcsDataSink(GcsData.newBuilder().setBucketName(gcsSinkBucket).build())
                    .setTransferManifest(
                        TransferManifest.newBuilder().setLocation(manifestLocation).build()))
            .setStatus(TransferJob.Status.ENABLED)
            .build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources,
    // or use a "try-with-resources" statement to do this automatically.
    try (StorageTransferServiceClient storageTransfer = StorageTransferServiceClient.create()) {

      // Create the transfer job
      TransferJob response =
          storageTransfer.createTransferJob(
              TransferProto.CreateTransferJobRequest.newBuilder()
                  .setTransferJob(transferJob)
                  .build());

      // This sample only creates the job; it does not call RunTransferJob to start it.
      System.out.println(
          "Created a transfer job from "
              + rootDirectory
              + " to "
              + gcsSinkBucket
              + " using "
              + "manifest file "
              + manifestLocation
              + " with name "
              + response.getName());
    }
  }
}

Node.js


// Imports the Google Cloud client library
const {
  StorageTransferServiceClient,
} = require('@google-cloud/storage-transfer');

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// Your project id
// const projectId = 'my-project'

// The agent pool associated with the POSIX data source. Defaults to the default agent
// const sourceAgentPoolName = 'projects/my-project/agentPools/transfer_service_default'

// The root directory path on the source filesystem
// const rootDirectory = '/directory/to/transfer/source'

// The ID of the GCS bucket to transfer data to
// const gcsSinkBucket = 'my-sink-bucket'

// Transfer manifest location. Must be a `gs:` URL
// const manifestLocation = 'gs://my-bucket/sample_manifest.csv'

// Creates a client
const client = new StorageTransferServiceClient();

/**
 * Creates a request to transfer from the local file system to the sink bucket
 */
async function transferViaManifest() {
  const createRequest = {
    transferJob: {
      projectId,
      transferSpec: {
        sourceAgentPoolName,
        posixDataSource: {
          rootDirectory,
        },
        gcsDataSink: {bucketName: gcsSinkBucket},
        transferManifest: {
          location: manifestLocation,
        },
      },
      status: 'ENABLED',
    },
  };

  // Runs the request and creates the job
  const [transferJob] = await client.createTransferJob(createRequest);

  const runRequest = {
    jobName: transferJob.name,
    projectId: projectId,
  };

  await client.runTransferJob(runRequest);

  console.log(
    `Created and ran a transfer job from '${rootDirectory}' to '${gcsSinkBucket}' using manifest \`${manifestLocation}\` with name ${transferJob.name}`
  );
}

transferViaManifest().catch(console.error);

Python

from google.cloud import storage_transfer


def create_transfer_with_manifest(
        project_id: str, description: str, source_agent_pool_name: str,
        root_directory: str, sink_bucket: str, manifest_location: str):
    """Create a transfer from a POSIX file system to a GCS bucket using
    a manifest file."""

    client = storage_transfer.StorageTransferServiceClient()

    # The ID of the Google Cloud Platform Project that owns the job
    # project_id = 'my-project-id'

    # A useful description for your transfer job
    # description = 'My transfer job'

    # The agent pool associated with the POSIX data source.
    # Defaults to 'projects/{project_id}/agentPools/transfer_service_default'
    # source_agent_pool_name = 'projects/my-project/agentPools/my-agent'

    # The root directory path on the source filesystem
    # root_directory = '/directory/to/transfer/source'

    # Google Cloud Storage destination bucket name
    # sink_bucket = 'my-gcs-destination-bucket'

    # Transfer manifest location. Must be a `gs:` URL
    # manifest_location = 'gs://my-bucket/sample_manifest.csv'

    transfer_job_request = storage_transfer.CreateTransferJobRequest({
        'transfer_job': {
            'project_id': project_id,
            'description': description,
            'status': storage_transfer.TransferJob.Status.ENABLED,
            'transfer_spec': {
                'source_agent_pool_name': source_agent_pool_name,
                'posix_data_source': {
                    'root_directory': root_directory,
                },
                'gcs_data_sink': {
                    'bucket_name': sink_bucket,
                },
                'transfer_manifest': {
                    'location': manifest_location
                }
            }
        }
    })

    result = client.create_transfer_job(transfer_job_request)
    print(f'Created transferJob: {result.name}')
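The Python sample creates the job but, unlike the Go and Node.js samples, does not call RunTransferJob. A minimal sketch of that extra step, assuming `client` is the same `storage_transfer.StorageTransferServiceClient` and `result.name` is passed as `job_name`; the helper name `run_and_wait` is hypothetical. `run_transfer_job` returns a long-running operation whose `result()` blocks until the transfer finishes and raises if it fails, so consider a `timeout` for large transfers.

```python
def run_and_wait(client, project_id, job_name, timeout=None):
    """Start an existing transfer job and block until its operation completes.

    Hypothetical helper. `client` is a storage_transfer.StorageTransferServiceClient.
    """
    operation = client.run_transfer_job({"job_name": job_name, "project_id": project_id})
    # result() waits for the long-running operation and raises on failure.
    response = operation.result(timeout=timeout)
    print(f"Transfer job {job_name} finished")
    return response
```

For example, call `run_and_wait(client, project_id, result.name)` right after `create_transfer_job`.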

What's next