This page describes retry strategies such as truncated exponential backoff for failed requests to Cloud Storage.
Overview
To decide whether to retry a failed request to Cloud Storage, consider the type of the request and its idempotency, which determines whether the operation is safe to retry. Generally, you should use truncated exponential backoff to retry the following types of requests:
- All requests to Cloud Storage that return HTTP 5xx and 429 response codes, including uploads and downloads of data or metadata.
- Resumable uploads that return HTTP 408 response codes.
- Socket timeouts and TCP disconnects.
For more information, see the status and error codes for JSON and XML.
Exponential backoff algorithm
Truncated exponential backoff is a standard error handling strategy for network applications in which a client periodically retries a failed request with increasing delays between requests.
An exponential backoff algorithm retries requests exponentially, increasing the waiting time between retries up to a maximum backoff time. An example is:
1. Make a request to Cloud Storage.
2. If the request fails, wait 1 + random_number_milliseconds seconds and retry the request.
3. If the request fails, wait 2 + random_number_milliseconds seconds and retry the request.
4. If the request fails, wait 4 + random_number_milliseconds seconds and retry the request.
5. And so on, up to a maximum_backoff time.
6. Continue waiting and retrying up to a maximum amount of time (deadline), but do not increase the maximum_backoff wait period between retries.
where:
- The wait time is min((2^n + random_number_milliseconds), maximum_backoff), with n incremented by 1 for each iteration (request).
- random_number_milliseconds is a random number of milliseconds less than or equal to 1000. This helps to avoid cases where many clients become synchronized and all retry at once, sending requests in synchronized waves. The value of random_number_milliseconds is recalculated after each retry request.
- maximum_backoff is typically 32 or 64 seconds. The appropriate value depends on the use case.
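The schedule above can be sketched in a few lines of Python. This is an illustrative sketch, not library code: backoff_delays is a hypothetical helper, and the maximum_backoff of 64 seconds and deadline of 600 seconds match the example values used later on this page.

```python
import random

def backoff_delays(maximum_backoff=64, deadline=600):
    """Yield truncated exponential backoff delays, in seconds, until the
    cumulative wait time would exceed the deadline."""
    total = 0.0
    n = 0
    while True:
        # random_number_milliseconds is recalculated for every retry.
        jitter = random.randint(0, 1000) / 1000.0
        delay = min(2 ** n + jitter, maximum_backoff)
        if total + delay > deadline:
            return
        yield delay
        total += delay
        n += 1

delays = list(backoff_delays())
print(delays[:4])  # roughly [1.x, 2.x, 4.x, 8.x]
```

Note that each delay is drawn independently, so two clients that fail at the same moment quickly drift apart in their retry times.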
You can continue retrying once you reach the maximum_backoff time, but we recommend that your requests fail after a set amount of time to prevent your application from becoming unresponsive. For example, if a client uses a maximum_backoff time of 64 seconds, then after reaching this value, the client can retry every 64 seconds. The client then stops retrying after a deadline of 600 seconds.
How long clients should wait between retries and how many times they should retry depends on your use case and network conditions. For example, mobile clients of an application may need to retry more times and for longer intervals when compared to desktop clients of the same application.
If the retry requests fail after exceeding the maximum_backoff plus any additional time allowed for retries, report or log an error using one of the methods listed under Support & help.
Idempotency
To determine whether it's safe to retry a failed request to Cloud Storage, consider whether the request is idempotent, which means that applying the same operation multiple times has the same effect on the state of the targeted resource. Idempotent operations are generally safe to retry.
The following are examples of conditions that satisfy idempotency:
- The operation has the same observable effect on the targeted resource even when continually requested.
- The operation only succeeds once.
- The operation has no observable effect on the state of the targeted resource.
For example, a request to list buckets has the same effect even if the request succeeds multiple times. On the other hand, an operation like creating a new Pub/Sub notification is not idempotent, because it creates a new notification ID each time the request succeeds.
Conditional idempotency
A subset of requests are conditionally idempotent, which means they are only idempotent if they include specific optional arguments. Operations which are conditionally safe to retry should only be retried by default if the condition case passes. Cloud Storage accepts preconditions and ETags as condition cases for requests.
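To make this concrete, here is a toy simulation of how a generation-match precondition turns an overwrite into a conditionally idempotent operation. This is not the real Cloud Storage API; the FakeObject class and its put method are invented purely for illustration.

```python
class FakeObject:
    """Toy stand-in for a stored object; illustration only."""
    def __init__(self):
        self.data = None
        self.generation = 0  # incremented on every successful write

    def put(self, data, if_generation_match=None):
        # A precondition makes the write conditionally idempotent: it can
        # only succeed against the exact generation it was aimed at.
        if if_generation_match is not None and if_generation_match != self.generation:
            raise RuntimeError("412 Precondition Failed")
        self.data = data
        self.generation += 1
        return self.generation

obj = FakeObject()
gen = obj.put(b"v1")                     # unconditional write; generation is now 1
obj.put(b"v2", if_generation_match=gen)  # conditional write succeeds; generation 2
try:
    # A blind retry of the same conditional write cannot apply twice.
    obj.put(b"v2", if_generation_match=gen)
except RuntimeError as err:
    print(err)  # 412 Precondition Failed
```

Because the duplicate write fails with a precondition error instead of silently overwriting, a retry layer can safely resend the conditional request.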
Retry strategy per Cloud Storage tool
Click the tabs below to view retry strategy recommendations for each Cloud Storage tool.
Console
The Cloud Console sends requests to Cloud Storage on your behalf and handles any necessary backoff.
gsutil
gsutil retries the errors listed in the Overview section without requiring you to take additional action. You may have to take action for other errors, such as the following:
- Invalid credentials or insufficient permissions.
- Network unreachable because of a proxy configuration problem.
- Individual operations that fail within a command where you use the -m top-level flag.
For retryable errors, gsutil retries requests using a truncated binary exponential backoff strategy. By default, gsutil retries 23 times over 1+2+4+8+16+32+60... seconds for about 10 minutes:
- If a request fails, wait a random period between [0..1] seconds and retry;
- If the request fails again, wait a random period between [0..2] seconds and retry;
- If the request fails again, wait a random period between [0..4] seconds and retry;
- And so on, up to 23 retries, with each retry period bounded by a default maximum of 60 seconds.
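A quick back-of-the-envelope check of that schedule, using the retry count and 60-second cap quoted above:

```python
# Upper bound of each of gsutil's 23 default retry waits: 2^n seconds,
# capped at the default max_retry_delay of 60 seconds.
caps = [min(2 ** n, 60) for n in range(23)]

print(caps[:8])            # [1, 2, 4, 8, 16, 32, 60, 60]
print(sum(caps))           # 1083 -> worst-case total wait in seconds
print(sum(caps) / 2 / 60)  # ~9.0 -> expected total in minutes, since uniform
                           # jitter averages half of each cap ("about 10 minutes")
```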
You can configure the number of retries and the maximum delay of any individual retry by editing the num_retries and max_retry_delay configuration variables in the "[Boto]" section of the .boto config file.
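For example, a .boto file that allows more retries with a longer per-retry cap might contain the following (the values here are illustrative, not recommendations):

```
[Boto]
num_retries = 10
max_retry_delay = 120
```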
For data transfers using the gsutil cp and rsync commands, gsutil provides additional retry functionality in the form of resumable transfers.
Client libraries
C++
The C++ client library uses exponential backoff by default.
C#
The C# client library uses exponential backoff by default.
Go
The Go client library uses exponential backoff by default.
Java
The Java client library uses exponential backoff by default.
Node.js
The Node.js client library can automatically use backoff strategies to retry requests with the autoRetry parameter.
PHP
The PHP client library uses exponential backoff by default.
Python
For retry strategy, the Python client library distinguishes between media and non-media operations:
- Media operations include all actions that fetch or send payload data to objects. For example, this includes all methods of a Blob starting with the words "upload" or "download", as well as Client.download_blob_to_file.
- Non-media operations are actions that only handle object metadata.
By default, media and non-media operations support retries for the following error codes:
- Connection errors:
  - requests.exceptions.ConnectionError
  - requests.exceptions.ChunkedEncodingError (media API calls only)
- HTTP codes:
  - 429 Too Many Requests
  - 500 Internal Server Error
  - 502 Bad Gateway
  - 503 Service Unavailable
  - 504 Gateway Timeout
  - 508 Resource Limit Exceeded
Operations through Python use the following default settings for exponential backoff:
| Default setting | Media calls | Non-media calls |
|---|---|---|
| Initial wait time (seconds) | 1 | 1 |
| Wait time multiplier per iteration | 2 | 2 |
| Maximum amount of wait time (seconds) | 64 | 60 |
| Default deadline (seconds) | 600 | 120 |
| Jitter implemented | Yes | No |
A subset of media and non-media operations are only idempotent if they include specific optional arguments. Operations which are conditionally safe to retry are only retried by default if the condition case passes. Currently, these conditions include the following:
- DEFAULT_RETRY_IF_GENERATION_SPECIFIED - Safe to retry if generation or if_generation_match was passed in as an argument to the method. Often methods only accept one of these two parameters.
- DEFAULT_RETRY_IF_METAGENERATION_SPECIFIED - Safe to retry if if_metageneration_match was passed in as an argument to the method.
- DEFAULT_RETRY_IF_ETAG_IN_JSON - Safe to retry if the method inserts an etag into the JSON request body. For HMACKeyMetadata.update() this means the etag must be set on the HMACKeyMetadata object itself. For the set_iam_policy() method on other classes, this means the etag must be set in the "policy" argument passed into the method.
Retry policies for media operations
For media operations, you can configure the num_retries argument for upload methods to specify the number of upload retries. By default, only uploads with the if_metageneration_match condition are retried to guarantee idempotency. Setting the num_retries argument overrides the default behavior and guarantees retries even without the if_metageneration_match condition.
Retry policies for non-media operations
Non-media operations which are either safe or conditionally safe to retry have a retry parameter added to their method signature. The default for these parameters is one of the following:

- DEFAULT_RETRY
- DEFAULT_RETRY_IF_GENERATION_SPECIFIED
- DEFAULT_RETRY_IF_METAGENERATION_SPECIFIED
- DEFAULT_RETRY_IF_ETAG_IN_JSON
To modify the default retry behavior, create a copy of the google.cloud.storage.retry.DEFAULT_RETRY object by calling it with a with_XXX method. For example, to modify the default deadline to 30 seconds, pass retry=DEFAULT_RETRY.with_deadline(30). We recommend you modify attributes one by one. For more information, see the google-api-core Retry reference.
To configure your own conditional retry, create a ConditionalRetryPolicy object and wrap your custom Retry object with DEFAULT_RETRY_IF_GENERATION_SPECIFIED, DEFAULT_RETRY_IF_METAGENERATION_SPECIFIED, or DEFAULT_RETRY_IF_ETAG_IN_JSON.
The following are examples of customized conditional retries:
- blob.reload() uses DEFAULT_RETRY by default. To override this so that the function is not retried at all, call it as blob.reload(retry=None).
- bucket.update() uses DEFAULT_RETRY_IF_METAGENERATION_SPECIFIED by default. To override this so that the function retries even if the metageneration number is not specified, call it as:

      from google.cloud.storage.retry import DEFAULT_RETRY
      bucket.update(retry=DEFAULT_RETRY)

- bucket.list_blobs() uses DEFAULT_RETRY by default. To override this so that the API call retries with a deadline of 20 seconds instead of the default 120 seconds, call it as:

      from google.cloud.storage.retry import DEFAULT_RETRY
      modified_retry = DEFAULT_RETRY.with_deadline(20)
      bucket.list_blobs(retry=modified_retry)
Ruby
The Ruby client library uses exponential backoff by default.
REST APIs
When calling the JSON or XML API directly, you should use the exponential backoff algorithm to implement your own retry strategy.
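One possible shape for such a strategy is sketched below. This is a self-contained illustration, not an official client: request_fn stands in for whatever function issues the actual JSON or XML API call and returns its HTTP status code, and the retryable statuses follow the codes listed in the Overview section.

```python
import random
import time

def is_retryable(status):
    """5xx and 429 are retryable for any request; 408 applies to
    resumable uploads (see the Overview section)."""
    return status in (408, 429) or 500 <= status < 600

def call_with_backoff(request_fn, maximum_backoff=64, deadline=600):
    """Call request_fn until it returns a non-retryable HTTP status code
    or the retry deadline would be exceeded; return the last status."""
    start = time.monotonic()
    n = 0
    while True:
        status = request_fn()
        if not is_retryable(status):
            return status
        # Truncated exponential backoff with jitter, as described above.
        delay = min(2 ** n + random.randint(0, 1000) / 1000.0, maximum_backoff)
        if time.monotonic() - start + delay > deadline:
            return status
        time.sleep(delay)
        n += 1
```

A production version would also need to distinguish idempotent from non-idempotent calls before retrying, as described in the Idempotency section.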
Idempotency of operations
The following table lists the Cloud Storage operations that fall into each category of idempotency.
| Idempotency | Operations |
|---|---|
| Always idempotent | |
| Conditionally idempotent | |
| Never idempotent | |
What's next
- Learn how to retry requests in Storage Transfer Service with Java or Python.
- Learn more about preconditions and generation numbers.