The Cloud Storage API provides a resumable data transfer feature that lets you resume upload operations after a communication failure has interrupted the flow of data. Resumable uploads are useful if you are transferring large files, because the likelihood of a network interruption or some other transmission failure is high. Also, by using the resumable upload feature you can reduce your bandwidth usage (and therefore your bandwidth cost) because you do not have to restart large file uploads from the beginning. For tips on uploading to Cloud Storage, see best practices.
This section shows you how to implement the resumable upload feature using the XML API. You can also perform a resumable upload by using the gsutil tool.
Implementing Resumable Uploads with the XML API
The Cloud Storage XML API provides two standard HTTP methods for uploading data: POST Object and PUT Object. To implement a resumable upload, you use both of these methods in conjunction with various headers and query string parameters. The following procedure shows you how to do this:
Step 1—Initiate the resumable upload
To begin a resumable upload, you send a POST Object request to Cloud Storage. The POST Object request does not contain the file you are uploading. Rather, it contains a few headers that inform the Cloud Storage system that you want to perform a resumable upload. Specifically, the POST Object request must have the following:
- An empty entity body.
- A Content-Length request header, which must be set to 0.
- An x-goog-resumable header, which must be set to start.
- If you have enabled Cross-Origin Resource Sharing, an Origin header that you use in subsequent upload requests for the file you are uploading.
You can include a Content-Type request header if you want to specify a content type for the file you are uploading. If you do not specify a content type, the Cloud Storage system sets the content type to application/octet-stream when it serves the object you are uploading.
The x-goog-resumable header is a Cloud Storage extension (custom) header. The header notifies the Cloud Storage system that you want to initiate a resumable upload. The header can be used only with a POST Object request and can be used only for resumable uploads.
In addition, you must use the standard Cloud Storage host name in the request (storage.googleapis.com), and you must authenticate the POST Object request just as you would any authenticated request. For more information, see Request Endpoints and Authentication.
The following example shows how to initiate a resumable upload for a file named music.mp3 that's being uploaded into a bucket named example.
POST /music.mp3 HTTP/1.1
Host: example.storage.googleapis.com
Date: Fri, 01 Oct 2010 21:56:18 GMT
Content-Length: 0
Content-Type: audio/mpeg
x-goog-resumable: start
Authorization: Bearer ya29.AHES6ZRVmB7fkLtd1XTmq6mo0S1wqZZi3-Lh_s-6Uw7p8vtgSwg
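The header requirements for the initiating request can be sketched in Python as follows. This is a minimal illustration, not an official client: the function name build_initiate_headers and its parameters are hypothetical, and the sketch only assembles headers without sending anything.

```python
from typing import Dict, Optional


def build_initiate_headers(
    access_token: str,
    content_type: Optional[str] = None,
    origin: Optional[str] = None,
) -> Dict[str, str]:
    """Assemble headers for the POST Object request that starts a resumable upload.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    headers = {
        "Content-Length": "0",        # the entity body must be empty
        "x-goog-resumable": "start",  # asks Cloud Storage to begin a resumable upload
        "Authorization": f"Bearer {access_token}",
    }
    if content_type is not None:
        # Optional; without it the object is served as application/octet-stream.
        headers["Content-Type"] = content_type
    if origin is not None:
        # Only relevant if Cross-Origin Resource Sharing is enabled.
        headers["Origin"] = origin
    return headers
```

Calling build_initiate_headers(token, content_type="audio/mpeg") would produce the same header set as the music.mp3 example, apart from Host and Date, which an HTTP client normally adds itself.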
Step 2—Process the response
After you initiate the resumable upload with a POST Object request, Cloud Storage responds with a 201 Created status code. The response includes a Location header whose value is the resumable session URI. You must save the session URI, because you will use it in all further requests during your upload operation.
The following example shows the response to the POST Object request that was shown in Step 1.
HTTP/1.1 201 Created
Location: https://example.storage.googleapis.com/music.mp3?upload_id=tvA0ExBntDa...gAAEnB2Uowrot
Date: Fri, 01 Oct 2010 21:56:18 GMT
Content-Length: 0
Content-Type: audio/mpeg
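As a rough sketch, extracting the session URI from the response might look like the following in Python. The helper name session_uri_from_response is hypothetical, and the sketch assumes your HTTP client exposes the status code as an integer and the response headers as a mapping.

```python
from typing import Mapping


def session_uri_from_response(status: int, headers: Mapping[str, str]) -> str:
    """Return the resumable session URI from the response to the initiating POST.

    Hypothetical helper; raises ValueError if the response is not usable.
    """
    if status != 201:
        raise ValueError(f"expected 201 Created, got {status}")
    # HTTP header names are case-insensitive, so compare in lower case.
    for name, value in headers.items():
        if name.lower() == "location":
            return value
    raise ValueError("201 response did not include a Location header")
```

Store the returned URI somewhere durable if the upload has to survive a process restart, because every later request in the upload uses it.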
Step 3—Upload the file
Next, you implement a PUT Object request that sends the file to Cloud Storage. Use the session URI you obtained in Step 2 as the PUT request's request URI. The request also includes a Content-Length header, which you must use to specify the size of the file you are uploading.
As with the POST Object request in Step 1, you must use the standard Cloud Storage host name in the request (storage.googleapis.com). You do not need to use an explicit authentication token since the session URI is, in effect, an authentication token.
The following example shows how to upload the music.mp3 file that was initiated in Step 1:
PUT https://example.storage.googleapis.com/music.mp3?upload_id=tvA0ExBntDa...gAAEnB2Uowrot HTTP/1.1
Date: Fri, 01 Oct 2010 21:56:18 GMT
Content-Length: 7351375
If the PUT Object request is not interrupted and the file is successfully uploaded, Cloud Storage responds with a 200 OK status code. If the upload is interrupted, you can resume the upload by performing Steps 4, 5, and 6.
Step 4—Query Cloud Storage for the upload status
If the upload operation is interrupted or receives an HTTP 503 or 500 response, you should query Cloud Storage for the number of bytes it has received by sending another PUT Object request. The PUT Object request must have the following:
- An empty entity body.
- A Content-Length request header, which must be set to 0.
- A Content-Range request header, which specifies the byte range you are seeking status for.
- A request URI equal to the session URI for the resumable upload.
The value of the Content-Range request header must be in the following format:
Content-Range: bytes */<content-length>
Where <content-length> is the value of the Content-Length header that you specified in the original PUT Object request (Step 3).
In addition, you must use the standard Cloud Storage host name in the request (storage.googleapis.com).
The following example shows how to query the Cloud Storage system after a resumable upload is interrupted:
PUT https://example.storage.googleapis.com/music.mp3?upload_id=tvA0ExBntDa...gAAEnB2Uowrot HTTP/1.1
Date: Fri, 01 Oct 2010 22:25:53 GMT
Content-Range: bytes */7351375
Content-Length: 0
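The headers for a status query follow directly from the format above. Here is a small Python sketch; the function name build_status_query_headers is an assumption made for illustration, and the request itself still has to be sent as a PUT to the session URI.

```python
from typing import Dict


def build_status_query_headers(total_size: int) -> Dict[str, str]:
    """Headers for the empty PUT that asks how many bytes have been received.

    total_size is the Content-Length given in the original upload request.
    Hypothetical helper for illustration.
    """
    if total_size < 0:
        raise ValueError("total_size must not be negative")
    return {
        "Content-Length": "0",                    # the query carries no body
        "Content-Range": f"bytes */{total_size}",  # '*' means no bytes are being sent
    }
```

For the 7351375-byte music.mp3 file, this yields the same Content-Range and Content-Length values as the example request.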
Step 5—Process the status response
The status code of the response tells you how to proceed:
- A 200 OK or 201 Created response indicates that the upload was completed, and no further action is necessary.
- A 308 Resume Incomplete response indicates that you need to continue uploading the file.
If you received a 308 Resume Incomplete response, process the response's Range header, which specifies which bytes Cloud Storage has received so far. You will use this number in Step 6. The response does not have a Range header if Cloud Storage has not yet received any bytes.
The following example shows the response to the PUT Object request that was shown in Step 4:
HTTP/1.1 308 Resume Incomplete
Range: bytes=0-2359295
Date: Fri, 01 Oct 2010 22:25:53 GMT
Content-Length: 0
Content-Type: audio/mpeg
The example indicates that Cloud Storage received the first 2359296 bytes of the music.mp3 file.
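Because byte offsets are zero-based, the next byte to send is the last byte in the Range header plus one. The following Python sketch shows one way to turn the header into a resume offset. The name next_offset is hypothetical, and the sketch assumes the Range value always has the form bytes=0-N, as in the example response.

```python
import re
from typing import Optional

# Assumption: the Range header of a 308 response looks like "bytes=0-2359295".
_RANGE_RE = re.compile(r"^bytes=0-(\d+)$")


def next_offset(range_header: Optional[str]) -> int:
    """Return the offset of the first byte that still needs to be uploaded.

    Pass None when the 308 response had no Range header, which means
    Cloud Storage has not yet received any bytes.
    """
    if range_header is None:
        return 0
    match = _RANGE_RE.match(range_header.strip())
    if match is None:
        raise ValueError(f"unexpected Range header: {range_header!r}")
    return int(match.group(1)) + 1  # last received byte + 1
```

With the example response, next_offset("bytes=0-2359295") gives 2359296, which matches the number of bytes received.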
Step 6—Resume the upload
Finally, you can resume the upload operation by sending a PUT Object request. The PUT Object request must have the following:
- An entity body containing the range of bytes that still need to be uploaded. This range starts at the byte after the last byte reported in the Range header (which you obtained in Step 5) and runs through the end of the file, whose size is the Content-Length you specified in Step 3.
- A Content-Length request header, which specifies the number of bytes you are uploading in the current request.
- A Content-Range request header, which specifies the byte range you are uploading in the request.
- A request URI equal to the session URI for the resumable upload.
You must use the standard Cloud Storage host name in the request (storage.googleapis.com).
The following example shows a PUT Object request that resumes the upload of the music.mp3 file into the example bucket.
PUT https://example.storage.googleapis.com/music.mp3?upload_id=tvA0ExBntDa...gAAEnB2Uowrot HTTP/1.1
Date: Fri, 01 Oct 2010 22:25:53 GMT
Content-Range: bytes 2359296-7351374/7351375
Content-Length: 4992079
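The arithmetic behind the resume request can be sketched in Python as follows. The function name build_resume_headers is an assumption for illustration. It takes the offset of the first byte that Cloud Storage has not yet received and the total file size.

```python
from typing import Dict


def build_resume_headers(offset: int, total_size: int) -> Dict[str, str]:
    """Headers for the PUT that sends bytes offset..total_size-1.

    Hypothetical helper: offset is the first byte not yet received,
    total_size is the size of the whole file.
    """
    if not 0 <= offset < total_size:
        raise ValueError("offset must lie inside the file")
    remaining = total_size - offset  # number of bytes in this request's body
    last_byte = total_size - 1       # byte positions are zero-based
    return {
        "Content-Range": f"bytes {offset}-{last_byte}/{total_size}",
        "Content-Length": str(remaining),
    }
```

For offset 2359296 and a total size of 7351375, the remaining length is 7351375 - 2359296 = 4992079 bytes, which agrees with the example request.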
You can repeat Steps 4, 5, and 6 as many times as necessary, but when retrying requests, use truncated exponential backoff. For an example, see the boto implementation of this logic.
When the file is successfully uploaded, Cloud Storage responds with a 200 OK status code.
Optional Optimization
If you receive a 308 Resume Incomplete response with no Range header, it's possible that some bytes have been received by Cloud Storage but were not yet persisted at the time Cloud Storage received the query. Retransmitting from the beginning of the file in this case is somewhat wasteful. To reduce the likelihood of this case, you can wait a few seconds after the first 308 response and then query a second time; at that point you might receive a Range header, allowing you to avoid retransmitting the start of the file. If you don't receive a Range header on this second try, you should not continue to wait and try a third time, because continuing to re-query could lead to a "hung" upload in the case where Cloud Storage truly has not received any data for the upload.
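One possible shape for this query-once-more logic is sketched below in Python. Everything here is illustrative: the function name, the 3-second default wait, and the query_range callable are assumptions. The callable stands in for whatever code sends the status query and returns the Range header of the 308 response, or None if the header is absent.

```python
import time
from typing import Callable, Optional


def resume_offset_with_one_requery(
    query_range: Callable[[], Optional[str]],
    wait_seconds: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Query the upload status at most twice and return the resume offset.

    query_range is a caller-supplied (hypothetical) function that performs
    the status query and returns the Range header value, e.g.
    "bytes=0-2359295", or None when the 308 response has no Range header.
    """
    for attempt in range(2):
        range_header = query_range()
        if range_header is not None:
            last_byte = int(range_header.split("-")[-1])
            return last_byte + 1
        if attempt == 0:
            # Give Cloud Storage a moment to persist bytes it may have received.
            sleep(wait_seconds)
    # Still no Range header after the second query: restart from byte 0
    # rather than waiting indefinitely.
    return 0
```

Passing sleep as a parameter keeps the function easy to exercise without real delays; in production code the default time.sleep is used.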
Resumable Uploads of Unknown Size
The resumable upload mechanism supports transfers where the file size is not known in advance. This can be useful for cases like compressing an object on-the-fly while uploading, since it's difficult to predict the exact file size for the compressed file at the start of a transfer. The mechanism is useful either if you want to stream a transfer that can be resumed after being interrupted, or if chunked transfer encoding does not work for your application.
Step 1—Initiate the resumable upload
To begin a resumable upload, you send a POST Object request to Cloud Storage. The POST Object request does not contain the file you are uploading. Rather, it contains a few headers that inform the Cloud Storage system that you want to perform a resumable upload. The following example shows how to initiate a resumable upload for a file named myFile.zip.
POST https://example.storage.googleapis.com/myFile.zip HTTP/1.1
Date: Fri, 22 Jun 2012 21:56:18 GMT
Content-Type: application/octet-stream
x-goog-resumable: start
Authorization: Bearer ya29.AHES6ZRVmB7fkLtd1XTmq6mo0S1wqZZi3-Lh_s-6Uw7p8vtgSwg
Step 2—Process the response
After you initiate the resumable upload with a POST Object request, Cloud Storage responds with a 201 Created status code. The response includes a Location header whose value is the resumable session URI. You must save the session URI, because you will use it in all further requests during your upload operation.
The following example shows the response to the POST Object request.
HTTP/1.1 201 Created
Location: https://example.storage.googleapis.com/myFile.zip?upload_id=tvA0ExBntDa...gAAEnB2Uowrot
Date: Fri, 22 Jun 2012 21:56:18 GMT
Content-Length: 0
Content-Type: text/html; charset=UTF-8
Step 3—Upload the file blocks
Next, you implement a PUT Object request that sends the file blocks to Cloud Storage. The PUT Object request URI is the session URI you obtained in Step 2.
The size of every block written, except the final block, must be a multiple of 256 KiB (that is, 262144 bytes). For each block except the last, send a PUT request and set the Content-Range header to bytes X-Y/*, where X is the position of the first byte of the block and Y is the position of its last byte.
For example, if the size of the first block to transfer is 512 KiB, then X = 0 and Y = 524287 (that is, 524288 - 1). The following example shows how to perform the related request.
PUT https://example.storage.googleapis.com/myFile.zip?upload_id=tvA0ExBntDa...gAAEnB2Uowrot HTTP/1.1
Date: Fri, 22 Jun 2012 21:56:18 GMT
Content-Length: 524288
Content-Range: bytes 0-524287/*
After each upload, Cloud Storage responds with a 308 Resume Incomplete status code. The response includes a Range header, which tells you the range of bytes that the Cloud Storage system has received.
For the final block, send a PUT request and set the Content-Range header to bytes X-Y/Z, where X and Y are as defined before, and Z is the total byte count. To keep things simple, assume that the size of the file to transfer is 588288 bytes. After the first transfer of 524288 bytes, shown in the previous example, there are 64000 (that is, 588288 - 524288) bytes left. It follows that X = 524288 (the first byte in the block), Z = 588288 (the file size), and Y = 588287 (the last byte in the block, that is, 588288 - 1). The following example shows how to perform the related request.
PUT https://example.storage.googleapis.com/myFile.zip?upload_id=tvA0ExBntDa...gAAEnB2Uowrot HTTP/1.1
Date: Fri, 22 Jun 2012 21:56:18 GMT
Content-Length: 64000
Content-Range: bytes 524288-588287/588288
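The block-by-block Content-Range values can be generated with a one-block lookahead, so that the final block is recognized even when the total size is not known in advance. The following Python sketch is one way to do that. The names iter_blocks and BLOCK_MULTIPLE are hypothetical, and the sketch assumes that stream.read(n) returns exactly n bytes until the end of the data, which holds for io.BytesIO and ordinary buffered files but not necessarily for sockets or pipes.

```python
import io
from typing import BinaryIO, Iterator, Tuple

BLOCK_MULTIPLE = 256 * 1024  # 262144 bytes


def iter_blocks(
    stream: BinaryIO, block_size: int = 2 * BLOCK_MULTIPLE
) -> Iterator[Tuple[bytes, str]]:
    """Yield (block, Content-Range value) pairs for an upload of unknown size."""
    if block_size <= 0 or block_size % BLOCK_MULTIPLE != 0:
        raise ValueError("block_size must be a positive multiple of 262144")
    offset = 0
    current = stream.read(block_size)
    if not current:
        return  # an empty stream is outside the scope of this sketch
    while True:
        following = stream.read(block_size)  # look ahead to detect the last block
        last_byte = offset + len(current) - 1
        if following:
            # More data follows, so the total size is still unknown: use '*'.
            yield current, f"bytes {offset}-{last_byte}/*"
            offset += len(current)
            current = following
        else:
            # This is the final block, so the total size is now known.
            total = offset + len(current)
            yield current, f"bytes {offset}-{last_byte}/{total}"
            return
```

For a 588288-byte stream and the default 512 KiB block size, the generator produces the two Content-Range values used in the examples: bytes 0-524287/* and bytes 524288-588287/588288. Each block's Content-Length is simply len(block).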
Cancelling an Upload
If you want to cancel the upload and prevent any further action on it, issue a DELETE request on the unique upload URI. After the DELETE request has succeeded, future attempts to query or resume the upload will result in a 4xx response.
Example: Cancelling an upload
The following example shows how to cancel a resumable upload:
DELETE https://example.storage.googleapis.com/myFile.zip?upload_id=tvA0ExBntDa...gAAEnB2Uowrot HTTP/1.1
Content-Length: 0
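As an illustration, the cancel request could be constructed with Python's standard urllib.request module as shown below. The helper name build_cancel_request is hypothetical, and the sketch only builds the request object; nothing is sent until you pass it to urllib.request.urlopen.

```python
import urllib.request


def build_cancel_request(session_uri: str) -> urllib.request.Request:
    """Build (but do not send) the DELETE request that cancels a resumable upload.

    Hypothetical helper; session_uri is the URI saved from the Location header.
    """
    return urllib.request.Request(
        session_uri,
        method="DELETE",
        headers={"Content-Length": "0"},  # the request has no body
    )
```

When the request is sent with urllib.request.urlopen, an error status is raised as urllib.error.HTTPError, so the calling code should be prepared to inspect the status code of that exception.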
Recommended Practices
A session URI expires after one week. We recommend that you start a resumable upload as soon as you obtain the session URI, and that you resume an interrupted upload shortly after the interruption occurred.
If you use an expired session URI in a request, you will receive a 400 Bad Request status code. In this case, you will have to initiate a new resumable upload, obtain a new session URI, and start the upload from the beginning using the new session URI.
Also, you should retry any requests that return the following status codes:
408 Request Timeout
500 Internal Server Error
502 Bad Gateway
503 Service Unavailable
504 Gateway Timeout
When performing retry requests, use truncated exponential backoff.
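A minimal Python sketch of the retry policy is shown below. The status codes come from the list above; the function names and the default parameters (5 retries, a 1-second base delay, a 32-second cap, and up to 1 second of random jitter) are assumptions chosen for illustration, not values prescribed by Cloud Storage.

```python
import random
from typing import Callable, Iterator

# Status codes that should be retried, as listed above.
RETRYABLE_STATUS_CODES = frozenset({408, 500, 502, 503, 504})


def should_retry(status: int) -> bool:
    """Return True if a request that got this status code should be retried."""
    return status in RETRYABLE_STATUS_CODES


def backoff_delays(
    max_retries: int = 5,
    base_seconds: float = 1.0,
    max_seconds: float = 32.0,
    jitter: Callable[[], float] = random.random,
) -> Iterator[float]:
    """Yield wait times that double on each retry and are truncated at max_seconds."""
    for attempt in range(max_retries):
        # Exponential growth plus random jitter, capped ("truncated") at the maximum.
        yield min(base_seconds * 2 ** attempt + jitter(), max_seconds)
```

A caller would typically loop over backoff_delays(), sleep for each yielded delay, and retry the request while should_retry(status) is true, giving up once the generator is exhausted.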
In addition, we recommend that you request an integrity check of the final uploaded object to be sure that it matches the source file. You can do this by calculating the MD5 digest of the source file and adding it to the Content-MD5 request header. Checking the integrity of the uploaded file is particularly important if you are uploading a large file over a long period of time, because there is an increased likelihood of the source file being modified over the course of the upload operation.
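The Content-MD5 header carries the base64 encoding of the binary MD5 digest, as defined for that header in RFC 1864. The following Python sketch computes such a value incrementally, so a large file does not have to be held in memory. The function name content_md5 is hypothetical.

```python
import base64
import hashlib
from typing import Iterable


def content_md5(chunks: Iterable[bytes]) -> str:
    """Return the base64-encoded MD5 digest of the concatenated chunks.

    Hypothetical helper; the result is suitable as a Content-MD5 header value.
    """
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)  # feed the data piece by piece
    return base64.b64encode(digest.digest()).decode("ascii")
```

To hash a file, you could open it in binary mode as f and pass iter(lambda: f.read(1024 * 1024), b"") as the chunks argument, which reads the file 1 MiB at a time.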
Resumable uploads are pinned in the region they start in. For example, if you create a resumable upload URL in the US and give it to a client in Asia, the upload will still go through the US. Continuing a resumable upload in a region where it wasn't initiated can cause slow uploads.
If you use Compute Engine instances with processes that POST to Cloud Storage to initiate a resumable upload, then you should use Compute Engine instances in the same locations as your Cloud Storage buckets. You can then use a geo IP service to pick the Compute Engine region to which you route customer requests, which will help keep traffic localized to a geo-region.