Resumable uploads

This page discusses resumable uploads in Cloud Storage. Resumable uploads are the recommended method for uploading large files, because you don't have to restart them from the beginning if there is a network failure while the upload is underway.

Introduction

A resumable upload allows you to resume data transfer operations to Cloud Storage after a communication failure has interrupted the flow of data. Resumable uploads work by sending multiple requests, each of which contains a portion of the object you're uploading. This is different from a simple upload, which contains all of the object's data in a single request and must restart from the beginning if it fails part way through.

  • Use a resumable upload if you are uploading large files or uploading over a slow connection. For examples of file size cutoffs for using resumable uploads, see upload size considerations.

  • A completed resumable upload is considered one Class A operation.

How tools and APIs use resumable uploads

Depending on how you interact with Cloud Storage, resumable uploads may be managed automatically on your behalf. The sections below describe how each tool and API handles them:

Console

The Cloud Console manages resumable uploads automatically on your behalf. However, if you refresh or navigate away from the Cloud Console while an upload is underway, the upload is cancelled.

gsutil

The gsutil command-line tool allows you to set a minimum size for performing resumable uploads with the resumable_threshold parameter in the boto configuration file. The default value for resumable_threshold is 8 MiB.

Client libraries

C++

You can toggle the use of resumable uploads as part of the WriteObject method.

C#

You can initiate a resumable upload with CreateObjectUploader.

Go

You control the minimum size for performing resumable uploads with Writer.ChunkSize. Go always performs chunked resumable uploads.

Java

Resumable uploads are controlled through the writer method.

Node.js

Resumable uploads are automatically managed when using the createWriteStream method.

PHP

Resumable uploads are automatically managed on your behalf, but can be directly controlled using the resumable option.

Python

Resumable uploads occur automatically when the file is larger than 8 MiB. Alternatively, you can use Resumable Media to manage resumable uploads on your own.

Ruby

All uploads are treated as resumable uploads.

REST APIs

JSON API

The Cloud Storage JSON API uses a POST Object request that includes the query parameter uploadType=resumable to initiate the resumable upload and then one or more PUT Object requests to upload the object data. For a step-by-step guide to building your own logic for resumable uploading, see Performing resumable uploads.

XML API

The Cloud Storage XML API uses a POST Object request that includes the header x-goog-resumable: start to initiate the resumable upload and then one or more PUT Object requests to upload the object data. For a step-by-step guide to building your own logic for resumable uploading, see Performing resumable uploads.
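The two initiation requests described above can be sketched side by side. This is a minimal illustration, not a definitive client: it only builds the method, URL, and headers and sends nothing. The host name, the /upload/storage/v1 path, the Bearer token header, and the function names are assumptions that do not appear on this page, so check them against the API reference before relying on them.

```python
from urllib.parse import quote, urlencode

# Assumption: the public Cloud Storage endpoint.
HOST = "https://storage.googleapis.com"


def json_api_initiation(bucket: str, object_name: str, token: str):
    """Build (method, url, headers) for a JSON API resumable-upload initiation."""
    # uploadType=resumable is what marks the POST as the start of a resumable upload.
    query = urlencode({"uploadType": "resumable", "name": object_name}, quote_via=quote)
    url = f"{HOST}/upload/storage/v1/b/{quote(bucket, safe='')}/o?{query}"
    headers = {"Authorization": f"Bearer {token}", "Content-Length": "0"}
    return "POST", url, headers


def xml_api_initiation(bucket: str, object_name: str, token: str):
    """Build (method, url, headers) for an XML API resumable-upload initiation."""
    url = f"{HOST}/{quote(bucket, safe='')}/{quote(object_name, safe='/')}"
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Length": "0",
        # The header named in the text above.
        "x-goog-resumable": "start",
    }
    return "POST", url, headers
```

In both cases the object data itself is sent afterwards, in one or more PUT requests.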

Resumable uploads of unknown size

The resumable upload mechanism supports transfers where the file size is not known in advance. This can be useful for cases like compressing an object on-the-fly while uploading, since it's difficult to predict the exact file size for the compressed file at the start of a transfer. The mechanism is useful either if you want to stream a transfer that can be resumed after being interrupted, or if chunked transfer encoding does not work for your application.

For more information, see Streaming transfers.
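As a rough sketch of how an upload of unknown size might label its chunks, the helper below builds Content-Range values that use "*" in place of the total until the final chunk is reached. The function names are hypothetical, and the 256 KiB chunk multiple is an assumption that is not stated on this page. The sketch also assumes the stream's read() returns a full chunk whenever more data remains, as regular files and in-memory buffers do.

```python
from typing import Optional

# Assumption: non-final chunks are sent in multiples of 256 KiB.
CHUNK_SIZE = 256 * 1024


def content_range(start: int, length: int, total: Optional[int]) -> str:
    """Content-Range value for one chunk; total is None while the size is unknown."""
    total_part = "*" if total is None else str(total)
    if length == 0:
        # No bytes in this request, for example when finalizing an empty object.
        return f"bytes */{total_part}"
    return f"bytes {start}-{start + length - 1}/{total_part}"


def chunks_with_ranges(stream, chunk_size: int = CHUNK_SIZE):
    """Yield (data, Content-Range) pairs for a stream whose size is not known up front."""
    offset = 0
    current = stream.read(chunk_size)
    while True:
        # Read one chunk ahead so we know whether `current` is the last one.
        upcoming = stream.read(chunk_size)
        is_last = len(upcoming) == 0
        total = offset + len(current) if is_last else None
        yield current, content_range(offset, len(current), total)
        if is_last:
            return
        offset += len(current)
        current = upcoming
```

Reading one chunk ahead is a design choice: it lets the final request carry the real total, so no extra empty request is needed to close the upload.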

Recommended practices

  • When you initiate a resumable upload session, a session URI is returned that you use when uploading your object data. This session URI acts as an authentication token, so the upload PUT requests don't need to be signed. Be judicious in sharing the session URI and only transmit it over HTTPS, because it can be used by anyone to upload data to the target bucket without any further authentication.

  • A session URI expires after one week. We recommend that you start a resumable upload as soon as you obtain the session URI and that you resume an interrupted upload shortly after the interruption occurred.

  • If you use an expired session URI in a request, you receive a 400 Bad Request status code. In this case, you have to initiate a new resumable upload, obtain a new session URI, and start the upload from the beginning using the new session URI.

  • You should retry any requests that return one of the following status codes, using truncated exponential backoff between attempts:

    • 408 Request Timeout
    • 500 Internal Server Error
    • 502 Bad Gateway
    • 503 Service Unavailable
    • 504 Gateway Timeout

  • When performing a resumable upload, handle 404 Not Found errors by starting the entire upload over from the beginning.

  • In addition, we recommend that you request an integrity check of the final uploaded object to be sure that it matches the source file. You can do this by calculating the MD5 digest of the source file and adding it to the Content-MD5 request header. Checking the integrity of the uploaded file is particularly important if you are uploading a large file over a long period of time, because there is an increased likelihood of the source file being modified over the course of the upload operation.

  • Resumable uploads are pinned in the region they start in. For example, if you create a resumable upload URL in the US and give it to a client in Asia, the upload will still go through the US. Continuing a resumable upload in a region where it wasn't initiated can cause slow uploads.

  • If you use Compute Engine instances with processes that POST to Cloud Storage to initiate a resumable upload, then you should use Compute Engine instances in the same locations as your Cloud Storage buckets. You can then use a geo IP service to pick the Compute Engine region to which you route customer requests, which will help keep traffic localized to a geo-region.
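Three of the recommendations above (retrying on specific status codes, truncated exponential backoff, and the Content-MD5 integrity check) can be sketched as small helpers. All function names, the one-second base delay, and the 32-second cap are illustrative assumptions, not values taken from this page.

```python
import base64
import hashlib
import random

# Status codes the list above says should be retried.
RETRYABLE_STATUS = {408, 500, 502, 503, 504}


def classify_status(status: int) -> str:
    """Map an HTTP status code to the action suggested in the list above."""
    if status in RETRYABLE_STATUS:
        return "retry"      # retry the same request after a backoff delay
    if status == 404:
        return "restart"    # start the entire upload over from the beginning
    return "other"          # success or an error this sketch does not handle


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 32.0,
                  jitter=random.random) -> float:
    """Seconds to wait before retry number `attempt` (0-based), truncated at `cap`."""
    # Assumption: base * 2^attempt plus up to one second of random jitter.
    return min(base * (2 ** attempt) + jitter(), cap)


def content_md5(stream, block_size: int = 1024 * 1024) -> str:
    """Base64-encoded MD5 digest of a binary stream, for the Content-MD5 header."""
    digest = hashlib.md5()
    # Read in blocks so large source files do not need to fit in memory.
    for block in iter(lambda: stream.read(block_size), b""):
        digest.update(block)
    return base64.b64encode(digest.digest()).decode("ascii")
```

A caller would typically loop over attempts, sleep for backoff_delay(attempt) whenever classify_status returns "retry", and give up after a fixed number of attempts.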

Optional optimization

If you receive a 308 Resume Incomplete response with no Range header, it's possible that Cloud Storage has received some bytes but had not yet persisted them when it received the query. Retransmitting from the beginning of the file in this case is somewhat wasteful. To reduce the likelihood of this case, you can wait a few seconds after the first 308 response and then query a second time; at that point you might receive a Range header, allowing you to avoid retransmitting the start of the file. If you don't receive a Range header on this second try, do not wait and query a third time: continuing to re-query could lead to a "hung" upload in the case where Cloud Storage truly has not received any data for the upload.
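The query-once-more logic described above might look like the following sketch. Here query_range is a hypothetical callable that sends a status query and returns the Range header of the 308 response, or None if the header is absent. The three-second wait and the "bytes=0-N" header format are assumptions.

```python
import re
import time
from typing import Callable, Optional


def resume_offset(range_header: Optional[str]) -> int:
    """Return the index of the next byte to send, given a 308 response's Range header."""
    if not range_header:
        return 0  # nothing persisted as far as we know: send from the start
    # Assumption: the header looks like "bytes=0-42", where 42 is the last persisted byte.
    match = re.fullmatch(r"bytes=0-(\d+)", range_header.strip())
    if match is None:
        raise ValueError(f"unexpected Range header: {range_header!r}")
    return int(match.group(1)) + 1


def offset_after_status_query(query_range: Callable[[], Optional[str]],
                              wait_seconds: float = 3.0,
                              sleep: Callable[[float], None] = time.sleep) -> int:
    """Query the upload status, re-querying at most once if no Range header comes back."""
    header = query_range()
    if header is None:
        sleep(wait_seconds)
        # Second and final query; never loop here, to avoid a "hung" upload.
        header = query_range()
    return resume_offset(header)
```

Passing sleep as a parameter keeps the helper easy to test without real delays.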

What's next