Retry strategy

This page describes retry strategies such as truncated exponential backoff for failed requests to Cloud Storage.

Overview

To decide whether to retry a failed request to Cloud Storage, consider the type of the request and its idempotency, which determines whether the operation is safe to retry. Generally, you should use truncated exponential backoff to retry the following types of requests:

  • All requests to Cloud Storage that return HTTP 5xx and 429 response codes, including uploads and downloads of data or metadata.

  • Resumable uploads that return HTTP 408 response codes.

  • Socket timeouts and TCP disconnects.

For more information, see the status and error codes for JSON and XML.

Exponential backoff algorithm

Truncated exponential backoff is a standard error handling strategy for network applications in which a client periodically retries a failed request with increasing delays between requests.

An exponential backoff algorithm retries requests exponentially, increasing the waiting time between retries up to a maximum backoff time. An example is:

  1. Make a request to Cloud Storage.

  2. If the request fails, wait 1 + random_number_milliseconds seconds and retry the request.

  3. If the request fails, wait 2 + random_number_milliseconds seconds and retry the request.

  4. If the request fails, wait 4 + random_number_milliseconds seconds and retry the request.

  5. And so on, up to a maximum_backoff time.

  6. Continue waiting and retrying up to a maximum amount of time (deadline), but do not increase the maximum_backoff wait period between retries.

where:

  • The wait time is min((2^n + random_number_milliseconds), maximum_backoff), with n incremented by 1 for each iteration (request).

  • random_number_milliseconds is a random number of milliseconds less than or equal to 1000. This helps to avoid cases where many clients become synchronized and all retry at once, sending requests in synchronized waves. The value of random_number_milliseconds is recalculated after each retry request.

  • maximum_backoff is typically 32 or 64 seconds. The appropriate value depends on the use case.

You can continue retrying once you reach the maximum_backoff time, but we recommend that your requests fail after a total amount of time (the deadline) to prevent your application from becoming unresponsive. For example, if a client uses a maximum_backoff time of 64 seconds, then after reaching this value, the client can retry every 64 seconds. The client then stops retrying after a deadline of 600 seconds.
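The steps above can be sketched in plain Python. The RETRYABLE_STATUSES set, function names, and the callable-based request interface are illustrative assumptions for this sketch, not part of any Cloud Storage client library:

```python
import random
import time

# Illustrative set of retryable results, per the Overview section.
RETRYABLE_STATUSES = {408, 429, 500, 502, 503, 504}

def backoff_delay(n, maximum_backoff=64):
    """Wait time before retry n: min(2^n + random_number_milliseconds, maximum_backoff)."""
    random_number_milliseconds = random.randint(0, 1000) / 1000.0
    return min(2 ** n + random_number_milliseconds, maximum_backoff)

def call_with_backoff(request, maximum_backoff=64, deadline=600):
    """Call `request` (a zero-argument callable returning an HTTP status code),
    retrying retryable statuses with truncated exponential backoff until the
    request succeeds or the deadline would be exceeded."""
    start = time.monotonic()
    n = 0
    while True:
        status = request()
        if status not in RETRYABLE_STATUSES:
            return status
        delay = backoff_delay(n, maximum_backoff)
        if time.monotonic() + delay - start > deadline:
            raise TimeoutError("retry deadline exceeded")
        time.sleep(delay)
        n += 1
```

Note that once 2^n exceeds maximum_backoff, the delay stays pinned at the cap, which is what makes the backoff "truncated".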

How long clients should wait between retries and how many times they should retry depends on your use case and network conditions. For example, mobile clients of an application may need to retry more times and for longer intervals when compared to desktop clients of the same application.

If the retry requests fail after exceeding the maximum_backoff plus any additional time allowed for retries, report or log an error using one of the methods listed under Support & help.

Idempotency

To determine whether it's safe to retry a failed request to Cloud Storage, consider whether the request is idempotent, which means that applying the same operation multiple times has the same effect on the state of the targeted resource. Idempotent operations are generally safe to retry.

The following are examples of conditions that satisfy idempotency:

  • The operation has the same observable effect on the targeted resource even when continually requested.

  • The operation only succeeds once.

  • The operation has no observable effect on the state of the targeted resource.

For example, a request to list buckets has the same effect even if the request succeeds multiple times. On the other hand, an operation like creating a new Pub/Sub notification is not idempotent, because it creates a new notification ID each time the request succeeds.

Conditional idempotency

A subset of requests are conditionally idempotent, which means they are only idempotent if they include specific optional arguments. Operations that are conditionally safe to retry should be retried by default only if the condition case passes. Cloud Storage accepts preconditions and ETags as condition cases for requests.
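As a sketch of this rule, a client-side predicate might permit retries of conditionally idempotent writes only when the request carries a precondition. The method grouping and header names below are an illustrative simplification of the idempotency table later on this page, not an exhaustive mapping:

```python
# Illustrative subset of precondition headers that make a Cloud Storage
# write conditionally idempotent (XML API style names plus If-Match for ETags).
PRECONDITIONS = {
    "x-goog-if-generation-match",
    "x-goog-if-metageneration-match",
    "If-Match",
}

def is_safe_to_retry(method, headers):
    """Hypothetical retry predicate: reads are always safe; writes are
    safe only when guarded by a precondition that makes a duplicate
    request apply at most once."""
    if method in ("GET", "HEAD"):
        return True
    if method in ("PUT", "PATCH", "POST", "DELETE"):
        return any(h in headers for h in PRECONDITIONS)
    return False
```

For example, retrying a PUT guarded by x-goog-if-generation-match is safe because a duplicate that arrives after the first success fails its precondition instead of overwriting newer data.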

Retry strategy per Cloud Storage tool

The following sections give retry strategy recommendations for each Cloud Storage tool.

Console

The Cloud Console sends requests to Cloud Storage on your behalf and handles any necessary backoff.

gsutil

gsutil retries the errors listed in the Overview section without requiring you to take additional action. You may have to take action for other errors, such as the following:

  • Invalid credentials or insufficient permissions.

  • Network unreachable because of a proxy configuration problem.

  • Individual operations that fail within a command where you use the -m top-level flag.

For retryable errors, gsutil retries requests using a truncated binary exponential backoff strategy. By default, gsutil retries 23 times over 1+2+4+8+16+32+60... seconds for about 10 minutes:

  • If a request fails, wait a random period between [0..1] seconds and retry;
  • If the request fails again, wait a random period between [0..2] seconds and retry;
  • If the request fails again, wait a random period between [0..4] seconds and retry;
  • And so on, up to 23 retries, with each retry period bounded by a default maximum of 60 seconds.

You can configure the number of retries and maximum delay of any individual retry by editing the num_retries and max_retry_delay configuration variables in the "[Boto]" section of the .boto config file.
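For example, a .boto file that raises the retry count and lowers the per-retry delay cap might contain the following (the values are illustrative):

```ini
[Boto]
num_retries = 10
max_retry_delay = 32
```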

For data transfers using the gsutil cp and rsync commands, gsutil provides additional retry functionality in the form of resumable transfers.

Client libraries

C++

The C++ client library uses exponential backoff by default.

C#

The C# client library uses exponential backoff by default.

Go

The Go client library uses exponential backoff by default.

Java

The Java client library uses exponential backoff by default.

Node.js

The Node.js client library can automatically retry requests using a backoff strategy when the autoRetry parameter is enabled.

PHP

The PHP client library uses exponential backoff by default.

Python

For retry strategy, the Python client library distinguishes between media and non-media operations:

  • Media operations include all actions that fetch or send payload data to objects. For example, this includes all methods of a Blob starting with the words "upload" or "download", as well as Client.download_blob_to_file.

  • Non-media operations are actions that only handle object metadata.

By default, media and non-media operations support retries for the following error codes:

  • Connection errors:
    • requests.exceptions.ConnectionError
    • requests.exceptions.ChunkedEncodingError (media API calls only)
  • HTTP codes:
    • 429 Too Many Requests
    • 500 Internal Server Error
    • 502 Bad Gateway
    • 503 Service Unavailable
    • 504 Gateway Timeout
    • 508 Resource Limit Exceeded

Operations through Python use the following default settings for exponential backoff:

Default setting                          Media calls   Non-media calls
Initial wait time (seconds)              1             1
Wait time multiplier per iteration       2             2
Maximum wait time (seconds)              64            60
Default deadline (seconds)               600           120
Jitter implemented                       Yes           No
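Assuming the deadline bounds the cumulative wait time (strictly, it bounds total elapsed time, including the requests themselves), the worst-case wait sequence under these defaults can be computed with plain Python; wait_schedule is an illustrative helper, not client-library code:

```python
def wait_schedule(initial=1.0, multiplier=2.0, maximum=60.0, deadline=120.0):
    """Successive worst-case wait times (no jitter) until the cumulative
    wait would exceed the deadline. Defaults match the non-media column."""
    waits, total, current = [], 0.0, initial
    while total + current <= deadline:
        waits.append(current)
        total += current
        current = min(current * multiplier, maximum)
    return waits
```

Under the non-media defaults this yields waits of 1, 2, 4, 8, 16, and 32 seconds before the 120-second deadline cuts off the next 60-second wait; the media defaults (64-second cap, 600-second deadline) allow several additional retries at the cap.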

A subset of media and non-media operations are only idempotent if they include specific optional arguments. Operations which are conditionally safe to retry are only retried by default if the condition case passes. Currently, these conditions include the following:

  • DEFAULT_RETRY_IF_GENERATION_SPECIFIED

    • Safe to retry if generation or if_generation_match was passed in as an argument to the method. Often methods only accept one of these two parameters.
  • DEFAULT_RETRY_IF_METAGENERATION_SPECIFIED

    • Safe to retry if if_metageneration_match was passed in as an argument to the method.
  • DEFAULT_RETRY_IF_ETAG_IN_JSON

    • Safe to retry if the method inserts an etag into the JSON request body. For HMACKeyMetadata.update() this means etag must be set on the HMACKeyMetadata object itself. For the set_iam_policy() method on other classes, this means the etag must be set in the "policy" argument passed into the method.

Retry policies for media operations

For media operations, you can configure the num_retries argument for upload methods to specify the number of upload retries. By default, only uploads with the if_metageneration_match condition are retried to guarantee idempotency. Setting the num_retries argument overrides the default behavior and guarantees retries even without the if_metageneration_match condition.

Retry policies for non-media operations

Non-media operations which are either safe or conditionally safe to retry have a retry parameter added to their method signature. The default for these parameters is one of the following:

  • DEFAULT_RETRY
  • DEFAULT_RETRY_IF_GENERATION_SPECIFIED
  • DEFAULT_RETRY_IF_METAGENERATION_SPECIFIED
  • DEFAULT_RETRY_IF_ETAG_IN_JSON

To modify the default retry behavior, create a modified copy of the google.cloud.storage.retry.DEFAULT_RETRY object by calling one of its with_XXX methods. For example, to modify the default deadline to 30 seconds, pass retry=DEFAULT_RETRY.with_deadline(30). We recommend you modify attributes one by one. For more information, see the google-api-core Retry reference.

To configure your own conditional retry, create a ConditionalRetryPolicy object and wrap your custom Retry object with DEFAULT_RETRY_IF_GENERATION_SPECIFIED, DEFAULT_RETRY_IF_METAGENERATION_SPECIFIED, or DEFAULT_RETRY_IF_ETAG_IN_JSON.

The following are examples of customized conditional retries:

  • blob.reload() uses DEFAULT_RETRY by default. To override this so that the function is not retried at all, call it as blob.reload(retry=None).

  • bucket.update() uses DEFAULT_RETRY_IF_METAGENERATION_SPECIFIED by default. To override this so that the function retries even if the metageneration number is not specified, call it as:

    from google.cloud.storage.retry import DEFAULT_RETRY
    bucket.update(retry=DEFAULT_RETRY)
  • bucket.list_blobs() uses DEFAULT_RETRY by default. To override this so that the API call retries with a deadline of 20 seconds instead of the default 120 seconds, call it as:

    from google.cloud.storage.retry import DEFAULT_RETRY
    modified_retry = DEFAULT_RETRY.with_deadline(20)
    bucket.list_blobs(retry=modified_retry)

Ruby

The Ruby client library uses exponential backoff by default.

REST APIs

When calling the JSON or XML API directly, you should use the exponential backoff algorithm to implement your own retry strategy.

Idempotency of operations

The following table lists the Cloud Storage operations that fall into each category of idempotency.

Always idempotent:
  • All get and list requests
  • Insert or delete buckets
  • Test bucket IAM policies and permissions
  • Lock retention policies
  • Delete an HMAC key or Pub/Sub notification
Conditionally idempotent:
  • Update/patch requests for buckets with IfMetagenerationMatch or ETag as HTTP precondition
  • Update/patch requests for objects with IfMetagenerationMatch or ETag as HTTP precondition
  • Set a bucket IAM policy with ETag as HTTP precondition or in resource body
  • Update an HMAC key with ETag as HTTP precondition or in resource body
  • Insert, copy, compose, or rewrite objects with ifGenerationMatch
  • Delete an object with ifGenerationMatch (or with a generation number for object versions)
Never idempotent:
  • Create an HMAC key
  • Create a Pub/Sub notification
  • Create, delete, or send patch/update requests for bucket and object ACLs or default object ACLs
