One strategy for downloading large files is called sliced object downloads. In such a download, ranged GET requests are made in parallel, storing data within a temporary, pre-allocated destination file. Once all slices have completed downloading, the temporary file is renamed to the destination file.
Sliced object downloads can be significantly faster if network and disk speed are not limiting factors. However, because slices are written to different locations on disk at the same time, this strategy can degrade performance on disks with slow seek times, especially when a download is split into a large number of slices. To limit this risk, tools such as gcloud and gsutil use low default values for the number of slices they create.
Sliced object downloads should always use a fast composable checksum (CRC32C) to verify the data integrity of the slices. To perform sliced object downloads, tools such as gsutil and gcloud require a compiled version of crcmod on the machine performing the download. If compiled crcmod is not available, gsutil and gcloud perform non-sliced object downloads instead.
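The mechanism described above can be sketched in a few lines of Python. This is a minimal illustration, not how gcloud or gsutil are implemented: `fetch_range` is a hypothetical callable standing in for a ranged GET request, the helper names are invented for this example, and CRC32C verification is left out for brevity.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def slice_ranges(total_size: int, num_slices: int) -> list[tuple[int, int]]:
    """Split [0, total_size) into contiguous, inclusive (start, end) byte ranges."""
    if total_size <= 0:
        return []
    num_slices = max(1, min(num_slices, total_size))
    base, extra = divmod(total_size, num_slices)
    ranges, start = [], 0
    for i in range(num_slices):
        length = base + (1 if i < extra else 0)
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def sliced_download(
    fetch_range: Callable[[int, int], bytes],  # hypothetical: returns bytes start..end inclusive
    total_size: int,
    destination: str,
    num_slices: int = 4,
) -> None:
    """Download slices in parallel into a temporary file, then rename it."""
    temp_path = destination + "_.gstmp"

    # Pre-allocate the temporary file so each slice can be written at its own offset.
    with open(temp_path, "wb") as f:
        f.truncate(total_size)

    def download_slice(byte_range: tuple[int, int]) -> None:
        start, end = byte_range
        data = fetch_range(start, end)
        if len(data) != end - start + 1:
            raise IOError(f"short read for bytes {start}-{end}")
        # Each worker opens its own handle and writes at the slice's offset.
        with open(temp_path, "r+b") as f:
            f.seek(start)
            f.write(data)

    with ThreadPoolExecutor(max_workers=max(1, num_slices)) as pool:
        # list() forces evaluation so any slice failure is raised here.
        list(pool.map(download_slice, slice_ranges(total_size, num_slices)))

    # Only rename once every slice has completed.
    os.replace(temp_path, destination)
```

Because every slice writes to a different offset of the same file, the disk sees several scattered writes at once, which is the seek-time cost noted above.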
How tools and APIs use sliced object downloads
Depending on how you interact with Cloud Storage, sliced object downloads might be managed automatically on your behalf. This section describes the sliced object download behavior for different tools and provides information for how you can modify the behavior.
Console
The Google Cloud console does not perform sliced object downloads.
Command line
gcloud
By default, gcloud storage cp enables sliced object downloads. You can control how and when gcloud performs sliced object downloads by modifying the following properties:
storage/sliced_object_download_threshold
: The minimum total file size for performing a sliced object download. You can disable all sliced object downloads by setting this value to 0.
storage/sliced_object_download_max_components
: The maximum number of slices to use in the download. Set 0 for no limit, in which case the number of slices is determined solely by storage/sliced_object_download_component_size.
storage/sliced_object_download_component_size
: The target size for each download slice. This property is ignored if the total file size is so large that downloading slices of this size would require more slices than allowed, as set in storage/sliced_object_download_max_components.
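One way to read how these three properties interact is the following sketch. It is an interpretation of the descriptions above, not gcloud's actual code: the function name `plan_slices` is hypothetical, and the exact behavior at the threshold boundary is an assumption.

```python
import math


def plan_slices(file_size: int, threshold: int, max_components: int, component_size: int) -> int:
    """Return the number of slices to use; 1 means a non-sliced download."""
    # A threshold of 0 disables sliced downloads entirely.
    if threshold == 0 or file_size < threshold:
        return 1
    # Aim for slices of roughly component_size bytes each.
    wanted = max(1, math.ceil(file_size / component_size))
    # A non-zero max_components caps the slice count; when the cap applies,
    # each slice ends up larger than component_size.
    if max_components > 0:
        wanted = min(wanted, max_components)
    return wanted
```

For instance, under these assumptions a 1,000-byte file with a 100-byte threshold, a 100-byte component size, and a maximum of 4 components is split into 4 slices, while setting the maximum to 0 yields 10 slices.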
You can modify these properties by creating a named configuration and applying the configuration either on a per-command basis by using the --configuration project-wide flag or for all gcloud commands by using the gcloud config set command.
No additional local disk space is required when using gcloud to perform sliced object downloads. If the download fails prior to completion, run the gcloud command again to resume the slices that failed. Slices that were downloaded successfully before the failure are not re-downloaded when you retry, except in the case where the source object has changed between download attempts.
Temporary downloaded objects appear in the destination directory with the suffix _.gstmp in their name.
gsutil
By default, gsutil cp enables sliced object downloads. You can control how and when gsutil performs sliced object downloads by modifying the following parameters:
sliced_object_download_threshold
: The minimum total file size for performing a sliced object download. You can disable all sliced object downloads by setting this value to 0.
sliced_object_download_max_components
: The maximum number of slices to use in the download. Set 0 for no limit, in which case the number of slices is determined solely by sliced_object_download_component_size.
sliced_object_download_component_size
: The target size for each download slice. This parameter is ignored if the total file size is so large that downloading slices of this size would require more slices than allowed, as set in sliced_object_download_max_components.
You can modify these parameters either on a per-command basis by using the -o global option or for all gsutil commands by editing the .boto configuration file.
No additional local disk space is required when using gsutil to perform sliced object downloads. If the download fails prior to completion, run the gsutil command again to resume the slices that failed. Slices that were downloaded successfully before the failure are not re-downloaded when you retry the command.
Temporary downloaded objects appear in the destination directory with the suffix _.gstmp in their name.
Client libraries
Node.js
For more information, see the Cloud Storage Node.js API reference documentation.
To authenticate to Cloud Storage, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
You can perform sliced object downloads using the downloadFileInChunks method.
Python
For more information, see the Cloud Storage Python API reference documentation.
To authenticate to Cloud Storage, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
You can perform sliced object downloads using the download_chunks_concurrently method.
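A minimal Python sketch of calling download_chunks_concurrently might look like the following. It assumes the google-cloud-storage package is installed and Application Default Credentials are configured. The wrapper name `download_object_in_slices`, the default values, and the bucket and object names in the usage line are placeholders chosen for illustration.

```python
def download_object_in_slices(
    bucket_name: str,
    blob_name: str,
    filename: str,
    chunk_size: int = 32 * 1024 * 1024,  # illustrative slice size in bytes
    workers: int = 8,                    # illustrative number of concurrent workers
) -> None:
    """Download one object to a local file as concurrently fetched chunks."""
    # Imported here so that this module can be loaded without the library installed.
    from google.cloud.storage import Client, transfer_manager

    client = Client()
    blob = client.bucket(bucket_name).blob(blob_name)

    # Each chunk is fetched with its own ranged request and written to `filename`.
    transfer_manager.download_chunks_concurrently(
        blob, filename, chunk_size=chunk_size, max_workers=workers
    )
```

A call such as `download_object_in_slices("my-bucket", "large-file.bin", "/tmp/large-file.bin")` would then write the object to the given local path.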
REST APIs
Both the JSON API and XML API support ranged GET requests, which means you can use either API to implement your own sliced object download strategy.
To protect against data corruption caused by the source object changing during the download, provide the generation number of the source object in every request that downloads a slice of the object.
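As a sketch of what one such slice request could look like with the JSON API, the helper below builds (but does not send) a ranged GET that pins the object generation. The function name `build_slice_request` and its parameters are hypothetical, and obtaining an OAuth 2.0 access token is assumed to happen elsewhere.

```python
from typing import Optional
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_slice_request(
    bucket: str,
    object_name: str,
    generation: int,
    start: int,
    end: int,
    access_token: Optional[str] = None,
) -> Request:
    """Build a JSON API media request for bytes start..end (inclusive) of one generation."""
    # alt=media returns object data; generation pins the exact object version,
    # so every slice is read from the same source data.
    query = urlencode({"alt": "media", "generation": generation})
    url = (
        "https://storage.googleapis.com/storage/v1/b/"
        f"{quote(bucket, safe='')}/o/{quote(object_name, safe='')}?{query}"
    )
    request = Request(url, method="GET")
    # Standard HTTP Range header; both offsets are inclusive.
    request.add_header("Range", f"bytes={start}-{end}")
    if access_token:
        request.add_header("Authorization", f"Bearer {access_token}")
    return request
```

Issuing one request per slice with the same generation value means that if the object is overwritten mid-download, the remaining requests still target the original generation rather than silently mixing bytes from two versions.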