This page discusses concepts related to uploading and downloading objects. You can upload and store any MIME type of data up to 5 TiB in size.
You can send upload requests to Cloud Storage in the following ways:
Single-request upload. Use this if the file is small enough to upload again in its entirety if the connection fails.
- The JSON API further distinguishes between media uploads, in which only object data is included in the request, and JSON API multipart uploads, in which both object data and object metadata are included in the request.
Resumable upload. Use this for a more reliable transfer, which is especially important with large files. Resumable uploads are a good choice for most applications, since they also work for small files at the cost of one additional HTTP request per upload. You can also use resumable uploads to perform streaming transfers, which allows you to upload an object of unknown size.
XML API multipart upload. An upload method that is compatible with Amazon S3 multipart uploads. Files are uploaded in parts and assembled into a single object with the final request. XML API multipart uploads allow you to upload the parts in parallel, potentially reducing the time to complete the overall upload.
Upload size considerations
When choosing whether to use a single-request upload instead of a resumable upload or XML API multipart upload, consider the amount of time that you're willing to lose should a network failure occur and you need to restart the upload from the beginning. For faster connections, your cutoff size can typically be larger.
For example, say you're willing to tolerate 30 seconds of lost time:
If you upload from a local system with an average upload speed of 8 Mbps, you can use single-request uploads for files as large as 30 MB.
If you upload from an in-region service that averages 500 Mbps for its upload speed, the cutoff size for files is almost 2 GB.
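The arithmetic behind these cutoffs is upload bandwidth multiplied by the time you can afford to lose. The following is a minimal sketch of that calculation; the function name `single_request_cutoff_mb` is hypothetical, and it assumes decimal units (8 megabits per megabyte).

```python
def single_request_cutoff_mb(upload_mbps: float, tolerated_seconds: float) -> float:
    """Return the largest file size, in MB, that uploads within the tolerated time.

    upload_mbps is the average upload speed in megabits per second.
    """
    if upload_mbps <= 0 or tolerated_seconds < 0:
        raise ValueError("speed must be positive and time must be non-negative")
    megabits = upload_mbps * tolerated_seconds
    return megabits / 8  # 8 bits per byte


# The two scenarios from the text, with 30 seconds of tolerated loss.
print(single_request_cutoff_mb(8, 30))    # 30.0 (MB)
print(single_request_cutoff_mb(500, 30))  # 1875.0 (MB), just under 2 GB
```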
Parallel composite uploads
One strategy for uploading large files is called parallel composite uploads. In such an upload, a file is divided into up to 32 chunks, the chunks are uploaded in parallel to temporary objects, the final object is recreated using the temporary objects, and the temporary objects are deleted.
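The four steps described above (split, upload in parallel, compose, delete) can be sketched with an in-memory stand-in for a bucket. This is an illustration of the control flow only, not how gsutil or any client library implements it: the `bucket` dictionary, the function names, and the `.tmp-N` naming are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_COMPONENTS = 32  # upper limit on chunks, as stated in the text


def split_into_chunks(data: bytes, num_chunks: int) -> list[bytes]:
    """Split data into at most num_chunks contiguous pieces."""
    if not data:
        return [b""]
    num_chunks = max(1, min(num_chunks, MAX_COMPONENTS, len(data)))
    size = -(-len(data) // num_chunks)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_composite_upload(bucket: dict, name: str, data: bytes,
                              num_chunks: int = 4) -> None:
    """Simulate a parallel composite upload into a dict acting as a bucket."""
    chunks = split_into_chunks(data, num_chunks)
    # Hypothetical temporary names; real tools use their own naming scheme.
    temp_names = [f"{name}.tmp-{i}" for i in range(len(chunks))]

    def upload(item):
        temp_name, chunk = item
        bucket[temp_name] = chunk  # stand-in for a real upload request

    # Step 2: upload every chunk to its temporary object in parallel.
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        list(pool.map(upload, zip(temp_names, chunks)))

    # Step 3: recreate the final object from the temporary objects, in order.
    bucket[name] = b"".join(bucket[t] for t in temp_names)

    # Step 4: delete the temporary objects.
    for temp_name in temp_names:
        del bucket[temp_name]
```

After a call such as `parallel_composite_upload(bucket, "video.bin", data, num_chunks=8)`, the dictionary holds only the final object, mirroring the cleanup step in the description.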
Parallel composite uploads can be significantly faster if network and disk speed are not limiting factors; however, the final object stored in your bucket is a composite object, which only has a crc32c hash and not an MD5 hash. As a result, you must use crcmod to perform integrity checks when downloading the object with gsutil or other Python applications. You should only perform parallel composite uploads if the following apply:
- Any Python user who needs to download your objects has either google-crc32c or crcmod installed.
- Any gsutil user who needs to download your objects has crcmod installed. For example, if you use gsutil to upload video assets that are only served by a Java application, parallel composite uploads are a good choice because there are efficient CRC32C implementations available in Java.
- You do not need the uploaded objects to have an MD5 hash.
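To make the CRC32C requirement concrete, here is a minimal pure-Python sketch of the checksum. It is meant only to show what is being computed; it is far slower than the compiled google-crc32c or crcmod packages mentioned above, which are what you would use in practice. The function names are hypothetical, and the base64 encoding of the big-endian checksum bytes reflects my understanding of how Cloud Storage reports the value, so verify it against the object metadata you actually receive.

```python
import base64

_POLY = 0x82F63B78  # reflected form of the CRC32C (Castagnoli) polynomial


def _make_table() -> list[int]:
    """Precompute the 256-entry lookup table for byte-at-a-time CRC32C."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ _POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data: bytes, crc: int = 0) -> int:
    """Return the CRC32C of data; pass a previous result to continue a stream."""
    crc ^= 0xFFFFFFFF
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def crc32c_base64(data: bytes) -> str:
    """Encode the checksum as base64 over its 4 big-endian bytes."""
    return base64.b64encode(crc32c(data).to_bytes(4, "big")).decode("ascii")
```

An integrity check then amounts to comparing `crc32c_base64(downloaded_bytes)` with the checksum stored in the object's metadata.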
You can configure how and when gsutil cp performs parallel composite uploads, which are disabled by default, by modifying the following two parameters:
parallel_composite_upload_threshold: The minimum total file size for performing a parallel composite upload. You can disable all parallel composite uploads in gsutil by setting this value to
parallel_composite_upload_component_size: The maximum size for each temporary object. The parameter is ignored if the total file size is so large that it would require more than 32 chunks at this size.
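The interaction between the component size and the 32-chunk limit can be sketched as follows. This is an assumption-laden illustration, not gsutil's actual code: the function name `plan_components` is hypothetical, and when the configured size would need more than 32 chunks, the sketch assumes the file is instead spread evenly across 32 of them.

```python
MAX_COMPONENTS = 32  # chunk limit stated in the text


def plan_components(file_size: int, component_size: int) -> list[tuple[int, int]]:
    """Return (start, end) byte offsets, end exclusive, for each temporary object."""
    if file_size <= 0 or component_size <= 0:
        raise ValueError("file_size and component_size must be positive")
    needed = -(-file_size // component_size)  # ceiling division
    if needed > MAX_COMPONENTS:
        # The configured size is ignored. Assumption: fall back to the
        # smallest size that fits the whole file into 32 components.
        component_size = -(-file_size // MAX_COMPONENTS)
    return [
        (start, min(start + component_size, file_size))
        for start in range(0, file_size, component_size)
    ]


print(plan_components(100, 50))        # [(0, 50), (50, 100)]
print(len(plan_components(1000, 10)))  # 32
```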
No additional local disk space is required when using gsutil to perform parallel composite uploads. If a parallel composite upload fails prior to composition, run the gsutil command again to take advantage of resumable uploads for the temporary objects that failed. Any temporary objects that uploaded successfully before the failure do not get re-uploaded when you resume the upload.
Temporary objects are named in the following fashion:
RANDOM_ID is a numerical value, and HASH is an MD5 hash (not related to the hash of the contents of the file or object).
Temporary objects are generally deleted at the end of a parallel composite upload. To avoid leaving any behind, check the exit status of the gsutil command, and manually delete any temporary objects that were uploaded as part of an aborted upload.
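If you drive gsutil from a script, checking the exit status might look like the sketch below. The helper name `run_and_check` is hypothetical, and the sketch only reports the failure; finding and deleting leftover temporary objects is left to you.

```python
import subprocess
import sys


def run_and_check(command: list[str]) -> bool:
    """Run a command and report whether it exited with status 0."""
    result = subprocess.run(command)
    if result.returncode != 0:
        print(
            f"command failed with exit status {result.returncode}; "
            "check the bucket for leftover temporary objects",
            file=sys.stderr,
        )
        return False
    return True
```

For example, `run_and_check(["gsutil", "cp", "large-file.bin", "gs://example-bucket/"])`, where the file and bucket names are placeholders. Because rerunning the same command resumes the failed temporary objects, a script could retry before resorting to manual cleanup.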
JSON and XML support
Both the JSON API and XML API support uploading object chunks in parallel and recombining them into a single object using the compose operation.
Keep the following in mind when designing code for parallel composite uploads:
When using the compose operation, the source objects are unaffected by the composition process. This means that if they are meant to be temporary, you must explicitly delete them once you've successfully completed the composition, or else the source objects remain in your bucket and are billed accordingly.
In order to protect against changes to source objects between the upload and compose requests, you should provide an expected generation number for each source.
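As an illustration of supplying an expected generation per source, the sketch below builds a JSON API compose request body. The field names (`sourceObjects`, `objectPreconditions`, `ifGenerationMatch`, `destination`) reflect my understanding of the JSON API and should be checked against the current API reference; the helper name `build_compose_body` is hypothetical, and sending the request (with authentication) is out of scope here.

```python
import json

MAX_SOURCES = 32  # matches the 32-chunk limit described earlier


def build_compose_body(sources: list[tuple[str, int]], content_type: str) -> str:
    """Build a compose request body.

    sources is a list of (object name, expected generation) pairs, in the
    order the objects should be concatenated.
    """
    if not 1 <= len(sources) <= MAX_SOURCES:
        raise ValueError(f"expected between 1 and {MAX_SOURCES} source objects")
    body = {
        "sourceObjects": [
            {
                "name": name,
                # The compose fails if the source changed after it was uploaded.
                "objectPreconditions": {"ifGenerationMatch": generation},
            }
            for name, generation in sources
        ],
        "destination": {"contentType": content_type},
    }
    return json.dumps(body)
```

The generation for each source is available in the response to its upload request, so a caller can record it at upload time and pass it here unchanged.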
All downloads from Cloud Storage have the same basic behavior: an HTTP or HTTPS GET request that can include an optional Range header, which defines a specific portion of the object to download.
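A ranged download can be sketched with the Python standard library as follows. The helper name `build_range_request` is hypothetical, and the sketch only constructs the request; actually sending it requires a real object URL and, for non-public objects, credentials.

```python
import urllib.request


def build_range_request(url: str, start: int, end: int) -> urllib.request.Request:
    """Build a GET request for bytes start through end (inclusive) of an object."""
    if start < 0 or end < start:
        raise ValueError("require 0 <= start <= end")
    return urllib.request.Request(
        url,
        headers={"Range": f"bytes={start}-{end}"},
        method="GET",
    )
```

For example, `build_range_request("https://storage.googleapis.com/example-bucket/example-object", 0, 1023)` asks for the first 1,024 bytes of a placeholder object; passing the result to `urllib.request.urlopen` would perform the download.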
Upload and download support per tool
If you use REST APIs to upload and download, see Request endpoints for a complete discussion on the request endpoints you can use.
- Transfer objects from your Compute Engine instance.
- View best practices for uploading objects.
- Learn how to perform streamed uploads.
- Make your data publicly accessible.
- View and edit your object metadata.