XML API multipart uploads

This page discusses XML API multipart uploads in Cloud Storage. This upload method uploads files in parts and then assembles them into a single object using a final request. XML API multipart uploads are compatible with Amazon S3 multipart uploads.

Overview

An XML API multipart upload allows you to upload data in multiple parts and then assemble them into a final object. This behavior has several advantages, particularly for large files:

  • You can upload parts simultaneously, reducing the time it takes to upload the data in its entirety.

  • If one of the upload operations fails, you only have to re-upload a portion of the overall object, instead of restarting from the beginning.

  • Because the total file size is not specified in advance, you can use XML API multipart uploads for streaming uploads or for compressing data on the fly while uploading.
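The second and third advantages can be illustrated with a minimal sketch. It reads a stream of unknown length in fixed-size parts and retries only the part that failed. The names `iter_parts`, `upload_stream`, and `upload_part` are hypothetical and are not part of any Cloud Storage library; `upload_part` stands in for whatever code sends one part and returns its ETag, and `OSError` stands in for a transient transport error.

```python
import io
from typing import BinaryIO, Callable, Dict, Iterator, Tuple


def iter_parts(stream: BinaryIO, part_size: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (part_number, data) pairs until the stream is exhausted.

    Part numbers start at 1. The total size is never needed up front.
    Assumes a buffered stream, where read(n) returns n bytes until EOF.
    """
    part_number = 1
    while True:
        data = stream.read(part_size)
        if not data:
            break
        yield part_number, data
        part_number += 1


def upload_stream(
    stream: BinaryIO,
    part_size: int,
    upload_part: Callable[[int, bytes], str],
    max_attempts: int = 3,
) -> Dict[int, str]:
    """Upload every part, retrying only the part that failed."""
    etags: Dict[int, str] = {}
    for part_number, data in iter_parts(stream, part_size):
        for attempt in range(1, max_attempts + 1):
            try:
                etags[part_number] = upload_part(part_number, data)
                break
            except OSError:
                # Hypothetical transient error: re-send this part only.
                if attempt == max_attempts:
                    raise
    return etags


if __name__ == "__main__":
    # Demo with a stand-in uploader that only records part sizes.
    sizes = {}

    def fake_upload(part_number: int, data: bytes) -> str:
        sizes[part_number] = len(data)
        return f"etag-{part_number}"

    print(upload_stream(io.BytesIO(b"x" * 25), 10, fake_upload), sizes)
```

The sketch uploads parts one at a time to keep the retry logic visible; a real uploader would typically send parts concurrently, which is where the first advantage comes from.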

An XML API multipart upload has three required steps:

  1. Initiate the upload using a POST request, which includes specifying any metadata that the completed object should have. The response returns an UploadId that you use in all subsequent requests associated with the upload.

  2. Upload the data using one or more PUT requests.

  3. Complete the upload using a POST request. This request overwrites any existing object in the bucket with the same name.
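The three steps above can be sketched as the HTTP requests they correspond to. The following is an illustrative sketch only: it builds the method, URL, and completion body for each step but does not send anything, and it omits authentication headers and object metadata. The helper names are hypothetical; the path-style `storage.googleapis.com` endpoint and the S3-compatible `uploads`, `partNumber`, and `uploadId` query parameters are assumptions to verify against the XML API reference.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple
from urllib.parse import quote, urlencode

ENDPOINT = "https://storage.googleapis.com"  # assumed path-style endpoint


def _object_url(bucket: str, object_name: str) -> str:
    # Percent-encode the object name but keep "/" separators readable.
    return f"{ENDPOINT}/{bucket}/{quote(object_name)}"


def initiate_request(bucket: str, object_name: str) -> Tuple[str, str]:
    """Step 1: POST that starts the upload; the response carries the UploadId."""
    return "POST", f"{_object_url(bucket, object_name)}?uploads"


def part_request(
    bucket: str, object_name: str, upload_id: str, part_number: int
) -> Tuple[str, str]:
    """Step 2: PUT that uploads one part; the response carries the part's ETag."""
    query = urlencode({"partNumber": part_number, "uploadId": upload_id})
    return "PUT", f"{_object_url(bucket, object_name)}?{query}"


def complete_request(
    bucket: str, object_name: str, upload_id: str, etags: Dict[int, str]
) -> Tuple[str, str, str]:
    """Step 3: POST whose XML body lists the parts to assemble, in order."""
    root = ET.Element("CompleteMultipartUpload")
    for number in sorted(etags):
        part = ET.SubElement(root, "Part")
        ET.SubElement(part, "PartNumber").text = str(number)
        ET.SubElement(part, "ETag").text = etags[number]
    body = ET.tostring(root, encoding="unicode")
    query = urlencode({"uploadId": upload_id})
    return "POST", f"{_object_url(bucket, object_name)}?{query}", body
```

Only the parts listed in the completion body become part of the final object, which is why the ETag returned for each part needs to be recorded as it is uploaded.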

There is no limit to how long a multipart upload and its uploaded parts can remain unfinished or idle in a bucket, but note that successfully uploaded parts count toward your monthly storage usage. You can avoid a buildup of abandoned multipart uploads by using Object Lifecycle Management to automatically remove multipart uploads when they reach a specified age.
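As a rough illustration of such a cleanup rule, the following sketch builds a lifecycle configuration that aborts multipart uploads older than a given number of days. The function name is hypothetical, and the `AbortIncompleteMultipartUpload` action type and `age` condition should be checked against the Object Lifecycle Management documentation before use.

```python
import json


def abort_incomplete_uploads_rule(age_days: int) -> dict:
    """Build a lifecycle configuration that aborts stale multipart uploads.

    Assumes the action type "AbortIncompleteMultipartUpload" and an "age"
    condition measured in days.
    """
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    return {
        "rule": [
            {
                "action": {"type": "AbortIncompleteMultipartUpload"},
                "condition": {"age": age_days},
            }
        ]
    }


if __name__ == "__main__":
    # Write the JSON to a file and apply it to the bucket with your usual tooling.
    print(json.dumps(abort_incomplete_uploads_rule(7), indent=2))
```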

Considerations

The following limitations apply to using XML API multipart uploads:

  • Parts have a minimum size (5 MiB) and a maximum size (5 GiB), and there is a limit on the number of parts used to assemble the completed upload.
  • Preconditions are not supported in the requests.
  • MD5 hashes do not exist for objects uploaded using this method.
  • This upload method is not supported in the Google Cloud console or the Google Cloud CLI.
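A quick check of a planned upload against the part limits might look like the sketch below. The 5 MiB and 5 GiB bounds come from the Python sample on this page; the 10,000-part ceiling and the exemption of the final part from the minimum size are assumptions to confirm against the quotas and limits page. The function names are hypothetical.

```python
MIN_PART_SIZE = 5 * 1024 * 1024   # 5 MiB
MAX_PART_SIZE = 5 * 1024 ** 3     # 5 GiB
MAX_PARTS = 10_000                # assumption: verify against current limits


def _ceil_div(a: int, b: int) -> int:
    # Integer ceiling division; avoids float rounding on large sizes.
    return -(-a // b)


def plan_parts(total_size: int, part_size: int) -> int:
    """Return the number of parts needed, or raise if a limit is violated."""
    if total_size <= 0 or part_size <= 0:
        raise ValueError("sizes must be positive")
    if part_size > MAX_PART_SIZE:
        raise ValueError("part size exceeds the maximum part size")
    count = _ceil_div(total_size, part_size)
    # Assumed: only the final part may be smaller than the minimum.
    if count > 1 and part_size < MIN_PART_SIZE:
        raise ValueError("part size is below the minimum part size")
    if count > MAX_PARTS:
        raise ValueError("upload would need too many parts")
    return count


def smallest_valid_part_size(total_size: int) -> int:
    """Smallest part size that keeps the upload within MAX_PARTS."""
    return max(MIN_PART_SIZE, _ceil_div(total_size, MAX_PARTS))
```

For example, a 100 MiB file with 32 MiB parts needs four parts, with the last part holding the remaining 4 MiB.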

Keep in mind the following when working with XML API multipart uploads:

  • XML API multipart uploads have specific IAM permissions. If you use custom IAM roles, you should ensure those roles have the permissions you need.

  • While you can initiate an upload and upload parts, the request to complete the upload fails if it would overwrite an object that has a hold on it or an unfulfilled retention period.

  • You can list ongoing uploads in a bucket, but only a completed upload appears in the normal list of objects in the bucket.

  • Uploaded parts are subject to early deletion charges if they use an applicable storage class and one of the following occurs before the part reaches its minimum storage duration:

    • The upload completes but the part is not used in the completion request.
    • The part is overwritten by another uploaded part.
    • The overall multipart upload is aborted, either directly or through Object Lifecycle Management.

    The storage duration for each part in a multipart upload begins at the time the upload of the part completes.

How client libraries use XML API multipart uploads

This section provides information about performing XML API multipart uploads with client libraries that support them.

Node.js

For more information, see the Cloud Storage Node.js API reference documentation.

To authenticate to Cloud Storage, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

You can perform XML API multipart uploads using the uploadFileInChunks method. For example:

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// The ID of your GCS bucket
// const bucketName = 'your-unique-bucket-name';

// The path of the file to upload
// const filePath = 'path/to/your/file';

// The size of each chunk to be uploaded
// const chunkSize = 32 * 1024 * 1024;

// Imports the Google Cloud client library
const {Storage, TransferManager} = require('@google-cloud/storage');

// Creates a client
const storage = new Storage();

// Creates a transfer manager client
const transferManager = new TransferManager(storage.bucket(bucketName));

async function uploadFileInChunksWithTransferManager() {
  // Uploads the file in chunks
  await transferManager.uploadFileInChunks(filePath, {
    chunkSizeBytes: chunkSize,
  });

  console.log(`${filePath} uploaded to ${bucketName}.`);
}

uploadFileInChunksWithTransferManager().catch(console.error);

Python

For more information, see the Cloud Storage Python API reference documentation.

To authenticate to Cloud Storage, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

You can perform XML API multipart uploads using the upload_chunks_concurrently method. For example:

def upload_chunks_concurrently(
    bucket_name,
    source_filename,
    destination_blob_name,
    chunk_size=32 * 1024 * 1024,
    workers=8,
):
    """Upload a single file, in chunks, concurrently in a process pool."""
    # The ID of your GCS bucket
    # bucket_name = "your-bucket-name"

    # The path to your file to upload
    # source_filename = "local/path/to/file"

    # The ID of your GCS object
    # destination_blob_name = "storage-object-name"

    # The size of each chunk. The performance impact of this value depends on
    # the use case. The remote service has a minimum of 5 MiB and a maximum of
    # 5 GiB.
    # chunk_size = 32 * 1024 * 1024 (32 MiB)

    # The maximum number of processes to use for the operation. The performance
    # impact of this value depends on the use case. Each additional process
    # occupies some CPU and memory resources until finished. Threads can be used
    # instead of processes by passing `worker_type=transfer_manager.THREAD`.
    # workers=8

    from google.cloud.storage import Client, transfer_manager

    storage_client = Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)

    transfer_manager.upload_chunks_concurrently(
        source_filename, blob, chunk_size=chunk_size, max_workers=workers
    )

    print(f"File {source_filename} uploaded to {destination_blob_name}.")

What's next