Resumable Uploads with the XML API

The Google Cloud Storage API provides a resumable data transfer feature that lets you resume upload operations after a communication failure has interrupted the flow of data. Resumable uploads are useful when you are transferring large files, because long transfers are more likely to hit a network interruption or some other transmission failure. Using the resumable upload feature can also reduce your bandwidth usage (and therefore your bandwidth cost), because you do not have to restart an interrupted upload from the beginning. For tips on uploading to Google Cloud Storage, see best practices.

This section shows you how to implement the resumable upload feature using the XML API. You can also perform a resumable upload by using the gsutil tool.

Implementing Resumable Uploads with the XML API

The Google Cloud Storage XML API provides two standard HTTP methods for uploading data: POST Object and PUT Object. To implement a resumable upload, you use both of these methods in conjunction with various headers and query string parameters. The following procedure shows you how to do this:

Step 1—Initiate the resumable upload

To begin a resumable upload, you send a POST Object request to Google Cloud Storage. The POST Object request does not contain the file you are uploading. Instead, it contains a few headers that tell the Google Cloud Storage system that you want to perform a resumable upload. Specifically, the POST Object request must have the following:

  • An empty entity body.
  • A Content-Length request header, which must be set to 0.
  • An x-goog-resumable request header, which must be set to start.

You can include a Content-Type request header if you want to specify a content type for the file you are uploading. If you do not specify a content type, the Google Cloud Storage system will set the content type to application/octet-stream when it serves the object you are uploading.

The x-goog-resumable header is a Google Cloud Storage extension (custom) header that tells the Google Cloud Storage system that you want to initiate a resumable upload. It can be used only with a POST Object request, and only for resumable uploads.

In addition, you must use the standard Google Cloud Storage host name in the request (storage.googleapis.com), and you must authenticate the POST Object request just as you would any authenticated request. For more information, see Request URIs and Authentication.

The following example shows how to initiate a resumable upload for a file named music.mp3 that's being uploaded into a bucket named example.

POST /music.mp3 HTTP/1.1
Host: example.storage.googleapis.com
Date: Fri, 01 Oct 2010 21:56:18 GMT
Content-Length: 0
Content-Type: audio/mpeg
x-goog-resumable: start
Authorization: Bearer ya29.AHES6ZRVmB7fkLtd1XTmq6mo0S1wqZZi3-Lh_s-6Uw7p8vtgSwg
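The request above can be built with Python's standard library. This is a minimal sketch that assumes you pass in the bucket and object names separately and already hold an OAuth 2.0 access token. The function name build_initiate_request is hypothetical. The function only constructs the request, so you can inspect it before sending it (for example, with urllib.request.urlopen).

```python
import urllib.parse
import urllib.request
from typing import Optional


def build_initiate_request(bucket: str, object_name: str, token: str,
                           content_type: Optional[str] = None) -> urllib.request.Request:
    """Build (without sending) the POST Object request that starts a resumable upload."""
    # Virtual-hosted-style URL on the standard storage.googleapis.com host name.
    url = f"https://{bucket}.storage.googleapis.com/{urllib.parse.quote(object_name)}"
    headers = {
        "Content-Length": "0",            # the entity body is empty
        "x-goog-resumable": "start",      # ask Cloud Storage to open a resumable session
        "Authorization": f"Bearer {token}",
    }
    if content_type is not None:
        # Without this header the object is served as application/octet-stream.
        headers["Content-Type"] = content_type
    return urllib.request.Request(url, data=b"", headers=headers, method="POST")
```

urllib.request.Request stores header names with only the first letter capitalized, so you read them back with names such as "X-goog-resumable".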

Step 2—Process the response

After you initiate the resumable upload with a POST Object request, Google Cloud Storage responds with a 201 Created status message. The status message includes a Location header whose value is the resumable session URI. You must save the session URI, because you will use it in all further requests during your upload operation.

The following example shows the response to the POST Object request that was shown in Step 1.

HTTP/1.1 201 Created
Location: https://example.storage.googleapis.com/music.mp3?upload_id=tvA0ExBntDa...gAAEnB2Uowrot
Date: Fri, 01 Oct 2010 21:56:18 GMT
Content-Length: 0
Content-Type: audio/mpeg
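Here is a minimal sketch of extracting the session URI from this response. It assumes the status code and response headers are available as an integer and a dictionary. The helper names session_uri_from_response and upload_id_of are hypothetical.

```python
from typing import Mapping
from urllib.parse import parse_qs, urlsplit


def session_uri_from_response(status: int, headers: Mapping[str, str]) -> str:
    """Return the resumable session URI from the response to the Step 1 request."""
    if status != 201:
        raise RuntimeError(f"expected 201 Created, got {status}")
    # HTTP header names are case-insensitive, so compare them in lower case.
    for name, value in headers.items():
        if name.lower() == "location":
            return value
    raise RuntimeError("201 Created response has no Location header")


def upload_id_of(session_uri: str) -> str:
    """Extract the upload_id query parameter, for example for logging."""
    return parse_qs(urlsplit(session_uri).query)["upload_id"][0]
```

Keep the returned URI somewhere durable, such as a file or database, so that a restarted process can still resume the upload.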

Step 3—Upload the file

Next, you send a PUT Object request that uploads the file data to Google Cloud Storage. Use the session URI you obtained in Step 2 as the request URI. The request must also include a Content-Length header that specifies the size of the file you are uploading.

As with the POST Object request in Step 1, you must use the standard Google Cloud Storage host name in the request (storage.googleapis.com). You do not need to include an explicit authentication token, because the session URI itself acts as an authentication token.

The following example shows how to upload the music.mp3 file whose upload was initiated in Step 1:

PUT https://example.storage.googleapis.com/music.mp3?upload_id=tvA0ExBntDa...gAAEnB2Uowrot HTTP/1.1
Date: Fri, 01 Oct 2010 21:56:18 GMT
Content-Length: 7351375

If the PUT Object request is not interrupted and the file is successfully uploaded, Google Cloud Storage responds with a 200 OK status code. If the upload is interrupted, you can resume the upload by performing Steps 4, 5, and 6.
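The Step 3 request can be sketched the same way. This sketch assumes the file is passed as an open binary file object together with its size (for example, from os.path.getsize). build_upload_request is a hypothetical name. As described above, the sketch sets no Authorization header.

```python
import urllib.request
from typing import BinaryIO


def build_upload_request(session_uri: str, body: BinaryIO, size: int) -> urllib.request.Request:
    """Build the Step 3 PUT Object request that sends the whole file at once."""
    # No Authorization header: the session URI itself acts as the credential.
    # Content-Length is required and must equal the full size of the file.
    return urllib.request.Request(session_uri, data=body,
                                  headers={"Content-Length": str(size)}, method="PUT")
```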

Step 4—Query Google Cloud Storage for the upload status

If the upload operation is interrupted, or if it receives an HTTP 500 or 503 response, query Google Cloud Storage for the number of bytes it has received by sending another PUT Object request. This PUT Object request must have the following:

  • An empty entity body.
  • A Content-Length request header, which must be set to 0.
  • A Content-Range request header, which specifies the byte range you are seeking status for.
  • A request URI equal to the session URI for the resumable upload.

The value of the Content-Range request header must be in the following format:

Content-Range: bytes */<content-length>

In this format, <content-length> is the value of the Content-Length header that you specified in the original PUT Object request (Step 3).

In addition, you must use the standard Google Cloud Storage host name in the request (storage.googleapis.com).

The following example shows how to query the Google Cloud Storage system after a resumable upload is interrupted:

PUT https://example.storage.googleapis.com/music.mp3?upload_id=tvA0ExBntDa...gAAEnB2Uowrot HTTP/1.1
Date: Fri, 01 Oct 2010 22:25:53 GMT
Content-Range: bytes */7351375
Content-Length: 0
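Here is a minimal sketch of the status query. It assumes you saved the session URI and the total file size from the earlier steps. build_status_query is a hypothetical name.

```python
import urllib.request


def build_status_query(session_uri: str, total_size: int) -> urllib.request.Request:
    """Build the Step 4 PUT Object request that asks how many bytes were received."""
    headers = {
        "Content-Length": "0",                      # empty entity body
        "Content-Range": f"bytes */{total_size}",   # total size from Step 3
    }
    return urllib.request.Request(session_uri, data=b"", headers=headers, method="PUT")
```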

Step 5—Process the status response

After you query the Google Cloud Storage system for the status of the interrupted upload, it responds with a 308 Resume Incomplete status code. The response includes a Range header, which tells you the range of bytes that Google Cloud Storage has received. Use the value of the Range header to determine which bytes were not successfully uploaded. You will send those bytes in Step 6.

The following example shows the response to the PUT Object request that was shown in Step 4:

HTTP/1.1 308 Resume Incomplete
Range: bytes=0-2359295
Date: Fri, 01 Oct 2010 22:25:53 GMT
Content-Length: 0
Content-Type: audio/mpeg

The example indicates that Google Cloud Storage received the first 2359296 bytes of the music.mp3 file.
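Turning the Range header into the offset of the first missing byte takes one step of arithmetic. The header reports an inclusive range, so the next byte to send is one past its end (2359295 + 1 = 2359296 in the example). The sketch below assumes the range always starts at byte 0, as in the example, and treats a missing Range header as meaning that no bytes were stored. next_offset is a hypothetical name.

```python
import re
from typing import Optional

_RANGE = re.compile(r"bytes=0-(\d+)")


def next_offset(range_header: Optional[str]) -> int:
    """Offset of the first byte Cloud Storage has not yet received."""
    if range_header is None:
        # Assumption: a 308 response without a Range header means no bytes were stored.
        return 0
    match = _RANGE.fullmatch(range_header.strip())
    if match is None:
        raise ValueError(f"unexpected Range header: {range_header!r}")
    # The range is inclusive, so the next byte is one past its end.
    return int(match.group(1)) + 1
```

Some HTTP client libraries treat 308 as Permanent Redirect (RFC 7538). You may need to disable automatic redirect handling to see this response at all.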

Step 6—Resume the upload

Finally, you can resume the upload operation by sending a PUT Object request. The PUT Object request must have the following:

  • An entity body containing the bytes that still need to be uploaded. These start at the byte after the last byte reported in the Range header (which you obtained in Step 5) and end at the last byte of the file, whose size you specified in the Content-Length header in Step 3.
  • A Content-Length request header, which specifies the number of bytes you are uploading in the current request.
  • A Content-Range request header, which specifies the byte range you are uploading in the request.
  • A request URI equal to the session URI for the resumable upload.

You must use the standard Google Cloud Storage host name in the request (storage.googleapis.com).

The following example shows a PUT Object request that resumes the upload of the music.mp3 file into the example bucket.

PUT https://example.storage.googleapis.com/music.mp3?upload_id=tvA0ExBntDa...gAAEnB2Uowrot HTTP/1.1
Date: Fri, 01 Oct 2010 22:25:53 GMT
Content-Range: bytes 2359296-7351374/7351375
Content-Length: 4992079
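The Step 6 headers follow directly from the offset computed in Step 5 and the total size from Step 3. The sketch below (resume_headers is a hypothetical name) reproduces the values in the example above.

```python
from typing import Dict


def resume_headers(offset: int, total_size: int) -> Dict[str, str]:
    """Step 6 headers for resuming at `offset` (the value derived in Step 5)."""
    if not 0 <= offset < total_size:
        raise ValueError("offset must point inside the file")
    last = total_size - 1  # Content-Range uses inclusive byte offsets
    return {
        "Content-Length": str(total_size - offset),  # bytes sent in this request
        "Content-Range": f"bytes {offset}-{last}/{total_size}",
    }
```

The request body is then the slice of the file that starts at the same offset, for example data[offset:].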

You can perform Steps 4, 5, and 6 as many times as necessary, but when retrying requests, use truncated exponential backoff. For an example, see the boto implementation of this logic.
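Truncated exponential backoff doubles the wait after each failed attempt, caps it at a maximum, and adds random jitter so that many clients do not retry in lockstep. The sketch below is one common formulation and is not necessarily the one boto uses. The 1-second base, the 32-second cap, and the jitter of up to 1 second are assumed defaults, and backoff_delays is a hypothetical name.

```python
import random
from typing import Callable, Iterator


def backoff_delays(max_retries: int, base: float = 1.0, cap: float = 32.0,
                   jitter: Callable[[], float] = random.random) -> Iterator[float]:
    """Yield up to max_retries sleep intervals using truncated exponential backoff."""
    for attempt in range(max_retries):
        # Exponential growth (base, 2*base, 4*base, ...) truncated at `cap`,
        # plus a random jitter term (random.random() gives a value in [0, 1)).
        yield min(base * (2 ** attempt), cap) + jitter()
```

A caller might loop over backoff_delays(5), attempt the request in each iteration, and call time.sleep(delay) before the next attempt if the current one fails.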

When the file is successfully uploaded, Google Cloud Storage responds with a 200 OK status code.
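Steps 4 through 6 can be combined into a loop that keeps querying and resuming until Google Cloud Storage answers 200 OK. This sketch assumes a hypothetical send(method, headers, body) callable that issues a PUT to the session URI and returns the status code and response headers. Injecting it keeps the logic testable without a network. The sketch omits backoff between rounds to stay short.

```python
from typing import Callable, Dict, Tuple

# Hypothetical transport: send(method, headers, body) -> (status, response_headers).
Send = Callable[[str, Dict[str, str], bytes], Tuple[int, Dict[str, str]]]


def finish_upload(data: bytes, send: Send, max_rounds: int = 10) -> None:
    """Repeat Steps 4-6 until Cloud Storage answers 200 OK."""
    total = len(data)
    for _ in range(max_rounds):
        # Step 4: empty PUT asking how many bytes have been persisted.
        status, headers = send("PUT", {"Content-Length": "0",
                                       "Content-Range": f"bytes */{total}"}, b"")
        if status == 200:
            return  # the upload had already completed
        if status != 308:
            raise RuntimeError(f"status query failed with HTTP {status}")
        # Step 5: a missing Range header is taken to mean nothing was received.
        received = {k.lower(): v for k, v in headers.items()}.get("range")
        offset = int(received.rsplit("-", 1)[1]) + 1 if received else 0
        if offset >= total:
            raise RuntimeError("all bytes reported received, but upload not complete")
        # Step 6: send the remaining bytes with a matching Content-Range.
        status, _ = send("PUT", {"Content-Length": str(total - offset),
                                 "Content-Range": f"bytes {offset}-{total - 1}/{total}"},
                         data[offset:])
        if status == 200:
            return
        # Any other status (for example, another interruption): back to Step 4.
    raise RuntimeError(f"upload not complete after {max_rounds} rounds")
```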

Resumable Uploads of Unknown Size

The resumable upload mechanism supports transfers where the file size is not known in advance. This is useful, for example, when you compress an object on the fly while uploading, because it is difficult to predict the exact size of the compressed file at the start of the transfer. The mechanism is also useful if you want to stream a transfer that can be resumed after an interruption, or if chunked transfer encoding does not work for your application.

Step 1—Initiate the resumable upload

To begin a resumable upload, you send a POST Object request to Google Cloud Storage. The POST Object request does not contain the file you are uploading. Rather, it contains a few headers that inform the Google Cloud Storage system that you want to perform a resumable upload. The following example shows how to initiate a resumable upload for a file named myFile.zip.

POST https://example.storage.googleapis.com/myFile.zip HTTP/1.1
Date: Fri, 22 Jun 2012 21:56:18 GMT
Content-Length: 0
Content-Type: application/octet-stream
x-goog-resumable: start
Authorization: Bearer ya29.AHES6ZRVmB7fkLtd1XTmq6mo0S1wqZZi3-Lh_s-6Uw7p8vtgSwg

Step 2—Process the response

After you initiate the resumable upload with a POST Object request, Google Cloud Storage responds with a 201 Created status message. The status message includes a Location header whose value is the resumable session URI. You must save the session URI, because you will use it in all further requests during your upload operation.

The following example shows the response to the POST Object request.

HTTP/1.1 201 Created
Location: https://example.storage.googleapis.com/myFile.zip?upload_id=tvA0ExBntDa...gAAEnB2Uowrot
Date: Fri, 22 Jun 2012 21:56:18 GMT
Content-Length: 0
Content-Type: text/html; charset=UTF-8

Step 3—Upload the file blocks

Next, you send the file to Google Cloud Storage in blocks, using one PUT Object request per block. The request URI for each PUT Object request is the session URI you obtained in Step 2.

The size of every block except the final block must be a multiple of 256 KiB (that is, 262144 bytes). For each block except the last, send a PUT request and set the Content-Range header to bytes X-Y/*, where X is the offset of the block's first byte and Y is the offset of its last byte. For example, if the first block is 512 KiB, then X = 0 and Y = 524287 (that is, 524288 - 1). The following example shows the corresponding request.

PUT https://example.storage.googleapis.com/myFile.zip?upload_id=tvA0ExBntDa...gAAEnB2Uowrot HTTP/1.1
Date: Fri, 22 Jun 2012 21:56:18 GMT
Content-Length: 524288
Content-Range: bytes 0-524287/*
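The Content-Range rules for blocks can be captured in a small helper. This sketch (block_content_range is a hypothetical name) rejects non-final blocks whose size is not a multiple of 262144 bytes, and it checks that the final block ends on the object's last byte. Calling it with a start of 0 and a length of 524288 reproduces the header in the example above.

```python
from typing import Optional

BLOCK_MULTIPLE = 256 * 1024  # 262144 bytes


def block_content_range(first: int, length: int, total: Optional[int] = None) -> str:
    """Content-Range value for one block; total=None marks a non-final block."""
    if length <= 0:
        raise ValueError("a block must contain at least one byte")
    last = first + length - 1
    if total is None:
        # Every block except the final one must be a multiple of 256 KiB.
        if length % BLOCK_MULTIPLE:
            raise ValueError("non-final blocks must be a multiple of 262144 bytes")
        return f"bytes {first}-{last}/*"
    # The final block must end on the last byte of the object.
    if last != total - 1:
        raise ValueError("the final block must end at byte total - 1")
    return f"bytes {first}-{last}/{total}"
```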

After each block is uploaded, Google Cloud Storage responds with a 308 Resume Incomplete status code. The response includes a Range header, which tells you the range of bytes that Google Cloud Storage has received.

For the final block, send the PUT request and set the Content-Range header to bytes X-Y/Z, where X and Y are defined as before and Z is the total number of bytes in the file. For simplicity, assume that the file to transfer is 588288 bytes. After the first transfer of 524288 bytes, shown in the previous example, 64000 bytes remain (that is, 588288 - 524288). It follows that X = 524288 (the first byte in the block), Y = 588287 (the last byte in the block, that is, 588288 - 1), and Z = 588288 (the file size). The following example shows the corresponding request.

PUT https://example.storage.googleapis.com/myFile.zip?upload_id=tvA0ExBntDa...gAAEnB2Uowrot HTTP/1.1
Date: Fri, 22 Jun 2012 21:56:18 GMT
Content-Length: 64000
Content-Range: bytes 524288-588287/588288
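When the data comes from a stream, you only know that a block is the final one once the stream is exhausted. One way to handle this, sketched below, is to read one block ahead: if the next read returns nothing, the current block is sent with the total size instead of *. The default block size of 512 KiB and the function names are assumptions. For the 588288-byte example, this yields the two Content-Range values shown above.

```python
from typing import BinaryIO, Dict, Iterator, Tuple

BLOCK_MULTIPLE = 256 * 1024  # 262144 bytes


def _read_full(stream: BinaryIO, n: int) -> bytes:
    """Read n bytes, or fewer only if the stream ends first."""
    parts, remaining = [], n
    while remaining:
        chunk = stream.read(remaining)
        if not chunk:
            break  # end of stream
        parts.append(chunk)
        remaining -= len(chunk)
    return b"".join(parts)


def iter_blocks(stream: BinaryIO, block_size: int = 2 * BLOCK_MULTIPLE
                ) -> Iterator[Tuple[Dict[str, str], bytes]]:
    """Yield (headers, body) pairs for an upload whose total size is unknown."""
    if block_size <= 0 or block_size % BLOCK_MULTIPLE:
        raise ValueError("block_size must be a positive multiple of 262144")
    offset = 0
    current = _read_full(stream, block_size)
    if not current:
        raise ValueError("this sketch does not handle empty streams")
    while True:
        upcoming = _read_full(stream, block_size)  # read one block ahead
        last = offset + len(current) - 1
        # Only the final block (no data after it) carries the total size.
        total = str(last + 1) if not upcoming else "*"
        yield {"Content-Length": str(len(current)),
               "Content-Range": f"bytes {offset}-{last}/{total}"}, current
        if not upcoming:
            return
        offset += len(current)
        current = upcoming
```

Reading ahead costs one extra block of memory. In exchange, you never have to send a separate request just to announce the total size.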

Recommended Practices

A session URI expires after one week. We recommend that you start a resumable upload as soon as you obtain the session URI, and that you resume an interrupted upload shortly after the interruption occurs.

If you use an expired session URI in a request, you receive a 400 Bad Request status code. In this case, you must initiate a new resumable upload, obtain a new session URI, and start the upload from the beginning using the new session URI.

Also, you should retry any requests that return the following status codes:

  • 408 Request Timeout
  • 500 Internal Server Error
  • 502 Bad Gateway
  • 503 Service Unavailable
  • 504 Gateway Timeout

When performing retry requests, use truncated exponential backoff.
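The status-code guidance in this section can be summarized as a small dispatch function for responses to the PUT requests. This is a sketch: next_action and its return values are hypothetical. It treats every 400 Bad Request as an expired session, following the paragraph on session expiry, even though a 400 can also indicate other problems with a request.

```python
RETRYABLE = {408, 500, 502, 503, 504}


def next_action(status: int) -> str:
    """Map a PUT response status to what the uploader should do next."""
    if status == 200:
        return "done"        # the object has been fully uploaded
    if status == 308:
        return "continue"    # Resume Incomplete: send more bytes (Steps 4-6)
    if status in RETRYABLE:
        return "retry"       # retry with truncated exponential backoff
    if status == 400:
        return "restart"     # treated as an expired session URI: start a new upload
    return "fail"
```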

In addition, we recommend that you request an integrity check of the final uploaded object to make sure that it matches the source file. To do this, calculate the MD5 digest of the source file, base64-encode it, and add it to the Content-MD5 request header. Checking the integrity of the uploaded file is particularly important if you are uploading a large file over a long period of time, because the source file is more likely to be modified during the upload operation.
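The Content-MD5 header carries the MD5 digest encoded in base64 (RFC 1864), not as a hex string. Here is a sketch that computes it incrementally, so large files do not need to fit in memory. content_md5 and its 1 MiB default chunk size are hypothetical.

```python
import base64
import hashlib
from typing import BinaryIO


def content_md5(stream: BinaryIO, chunk_size: int = 1024 * 1024) -> str:
    """Base64-encoded MD5 digest of a stream, as used by the Content-MD5 header."""
    digest = hashlib.md5()
    # Feed the file to the hash one chunk at a time until read() returns b"".
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")
```

Computing the digest before the upload starts, and comparing it with the object after the upload completes, also catches the case where the source file changed during a long transfer.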

Resumable uploads are pinned in the region they start in. For example, if you create a resumable upload URL in the US and give it to a client in Asia, the upload will still go through the US. Continuing a resumable upload in a region where it wasn't initiated can cause slow uploads.

If you use Google Compute Engine instances with processes that POST to Cloud Storage to initiate a resumable upload, use Compute Engine instances in the same locations as your Cloud Storage buckets. You can then use a geo IP service to pick the Compute Engine region to which you route customer requests, which helps keep traffic localized to a geo-region.
