Concurrent media operations. This is a PREVIEW FEATURE: API may change.
Functions
download_chunks_concurrently
download_chunks_concurrently(
blob,
filename,
chunk_size=33554432,
download_kwargs=None,
deadline=None,
worker_type="process",
max_workers=8,
)
Download a single file in chunks, concurrently.
This function is a PREVIEW FEATURE: the API may change in a future version.
In some environments, using this feature with multiple processes will result in faster downloads of large files.
Using this feature with multiple threads is unlikely to improve download performance under normal circumstances due to Python interpreter threading behavior. The default is therefore to use processes instead of threads.
Checksumming (md5 or crc32c) is not supported for chunked operations. Any checksum parameter passed in to download_kwargs will be ignored.
Parameters

| Name | Description |
|---|---|
| blob | The blob to be downloaded. |
| filename | str. The destination filename or path. |
| chunk_size | int. The size in bytes of each chunk to send. The optimal chunk size for maximum throughput may vary depending on the exact network environment and the size of the blob. |
| download_kwargs | dict. A dictionary of keyword arguments to pass to the download method. Refer to the documentation for blob.download_to_file() or blob.download_to_filename() for more information. The dict is passed directly into the download methods and is not validated by this function. The keyword arguments "start" and "end" are not supported and will cause a ValueError if present. |
| deadline | int. The number of seconds to wait for all threads to resolve. If the deadline is reached, all threads will be terminated regardless of their progress and concurrent.futures.TimeoutError will be raised. This can be left as the default of None (no deadline) for most use cases. |
| worker_type | str. The worker type to use; one of google.cloud.storage.transfer_manager.PROCESS or google.cloud.storage.transfer_manager.THREAD. Although the exact performance impact depends on the use case, in most situations the PROCESS worker type will use more system resources (both memory and CPU) and result in faster operations than THREAD workers. Because the subprocesses of the PROCESS worker type can't access memory from the main process, Client objects have to be serialized and then recreated in each subprocess. The serialization of the Client object for use in subprocesses is an approximation and may not capture every detail of the Client object, especially if the Client was modified after its initial creation or if Client._http was modified in any way. |
| max_workers | int. The maximum number of workers to create to handle the workload. With PROCESS workers, a larger number of workers will consume more system resources (memory and CPU) at once. The optimal number of workers depends heavily on the specific use case, and the default is a conservative number that should work well in most cases without consuming excessive resources. |
Exceptions

| Type | Description |
|---|---|
| concurrent.futures.TimeoutError | If the deadline is exceeded. |
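For illustration, a minimal sketch of a chunked download using the signature above; the bucket name, object name, and local filename are placeholders, not values from this page:

```python
from google.cloud import storage
from google.cloud.storage import transfer_manager


def main():
    client = storage.Client()
    blob = client.bucket("my-bucket").blob("large-file.bin")

    # Download the object in 32 MiB chunks with the default pool of
    # 8 worker processes.
    transfer_manager.download_chunks_concurrently(
        blob,
        "large-file.bin",
        chunk_size=32 * 1024 * 1024,
    )


if __name__ == "__main__":
    # The default PROCESS worker type spawns subprocesses, so the entry
    # point should be guarded when this runs as a script.
    main()
```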
upload_chunks_concurrently
upload_chunks_concurrently(
    filename,
    blob,
    content_type=None,
    chunk_size=33554432,
    deadline=None,
    worker_type='process',
    max_workers=8,
    *,
    checksum='md5',
    timeout=60,
    retry=<google.api_core.retry.Retry object>,
)
Upload a single file in chunks, concurrently.
This function uses the XML MPU API to initialize an upload and upload a file in chunks, concurrently with a worker pool.
The XML MPU API is significantly different from other uploads; please review the documentation at https://cloud.google.com/storage/docs/multipart-uploads before using this feature.
The library will attempt to cancel uploads that fail due to an exception.
If the upload fails in a way that precludes cancellation, such as a hardware failure, process termination, or power outage, then the incomplete upload may persist indefinitely. To mitigate this, set the AbortIncompleteMultipartUpload action with a nonzero Age in bucket lifecycle rules, or refer to the XML API documentation linked above to learn more about how to list and delete individual uploads.
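As a sketch of that lifecycle mitigation, assuming the add_lifecycle_abort_incomplete_multipart_upload_rule helper available on Bucket in recent versions of this library; the bucket name is a placeholder:

```python
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("my-bucket")

# Abort any multipart upload that is still incomplete one day after it
# was initiated.
bucket.add_lifecycle_abort_incomplete_multipart_upload_rule(age=1)
bucket.patch()
```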
Using this feature with multiple threads is unlikely to improve upload performance under normal circumstances due to Python interpreter threading behavior. The default is therefore to use processes instead of threads.
ACL information cannot be sent with this function and should be set separately with ObjectACL methods.
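For example, a small sketch of setting ACLs afterward through the ObjectACL interface exposed as blob.acl; the bucket, object, and grantee email are placeholders:

```python
from google.cloud import storage

client = storage.Client()
blob = client.bucket("my-bucket").blob("large-file.bin")

# Grant read access to a single user on the uploaded object, then
# persist the modified ACL.
blob.acl.user("reader@example.com").grant_read()
blob.acl.save()
```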
Parameters

| Name | Description |
|---|---|
| filename | str. The path to the file to upload. File-like objects are not supported. |
| blob | The blob to which to upload. |
| content_type | str. (Optional) Type of content being uploaded. |
| chunk_size | int. The size in bytes of each chunk to send. The optimal chunk size for maximum throughput may vary depending on the exact network environment and the size of the blob. The remote API has restrictions on the minimum and maximum size allowable; see: https://cloud.google.com/storage/quotas#requests |
| deadline | int. The number of seconds to wait for all threads to resolve. If the deadline is reached, all threads will be terminated regardless of their progress and concurrent.futures.TimeoutError will be raised. This can be left as the default of None (no deadline) for most use cases. |
| worker_type | str. The worker type to use; one of google.cloud.storage.transfer_manager.PROCESS or google.cloud.storage.transfer_manager.THREAD. Although the exact performance impact depends on the use case, in most situations the PROCESS worker type will use more system resources (both memory and CPU) and result in faster operations than THREAD workers. Because the subprocesses of the PROCESS worker type can't access memory from the main process, Client objects have to be serialized and then recreated in each subprocess. The serialization of the Client object for use in subprocesses is an approximation and may not capture every detail of the Client object, especially if the Client was modified after its initial creation or if Client._http was modified in any way. |
| max_workers | int. The maximum number of workers to create to handle the workload. With PROCESS workers, a larger number of workers will consume more system resources (memory and CPU) at once. The optimal number of workers depends heavily on the specific use case, and the default is a conservative number that should work well in most cases without consuming excessive resources. |
| checksum | str. (Optional) The checksum scheme to use: either 'md5', 'crc32c' or None. Each individual part is checksummed. At present, the selected checksum rule is only applied to parts, and a separate checksum of the entire resulting blob is not computed. If needed, compute and compare the checksum of the file to the resulting blob separately, using the 'crc32c' algorithm as per the XML MPU documentation. |
| timeout | float or tuple. (Optional) The amount of time, in seconds, to wait for the server response. |
| retry | google.api_core.retry.Retry. (Optional) How to retry the RPC. A None value will disable retries. A google.api_core.retry.Retry value will enable retries, and the object will configure backoff and timeout options. Custom predicates (customizable error codes) are not supported for media operations such as this one. This function does not accept ConditionalRetryPolicy values because preconditions are not supported by the underlying API call. See the retry.py source code and docstrings in this package (google.cloud.storage.retry) for information on retry types and how to configure them. |
Exceptions

| Type | Description |
|---|---|
| concurrent.futures.TimeoutError | If the deadline is exceeded. |
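For illustration, a minimal sketch of a chunked upload using the signature above; the bucket name, object name, and local filename are placeholders, not values from this page:

```python
from google.cloud import storage
from google.cloud.storage import transfer_manager


def main():
    client = storage.Client()
    blob = client.bucket("my-bucket").blob("large-file.bin")

    # Upload the local file in 32 MiB parts with 8 worker processes;
    # each part is md5-checksummed by default.
    transfer_manager.upload_chunks_concurrently(
        "large-file.bin",
        blob,
        chunk_size=32 * 1024 * 1024,
        max_workers=8,
    )


if __name__ == "__main__":
    # As with downloads, guard the entry point because the default
    # PROCESS worker type spawns subprocesses.
    main()
```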