Retry strategy

This page describes retry strategies such as truncated exponential backoff for failed requests to Cloud Storage.

Overview

Many Cloud Storage tools, such as the Cloud Console and most client libraries, automatically use a retry strategy, so you typically don't need to implement your own. If you do implement your own retry strategy, there are two factors that determine whether a request is safe to retry:

  1. The response that you receive from the request.
  2. The idempotency of the request.

See Implementing a retry strategy for a discussion of these factors and the recommended retry algorithm to use.

Retry strategy per Cloud Storage tool

The following sections describe retry strategy recommendations for each Cloud Storage tool.

Console

The Cloud Console sends requests to Cloud Storage on your behalf and handles any necessary backoff.

gsutil

gsutil retries the errors listed in the Response section without requiring you to take additional action. You may have to take action for other errors, such as the following:

  • Invalid credentials or insufficient permissions.

  • Network unreachable because of a proxy configuration problem.

  • Individual operations that fail within a command where you use the -m top-level flag.

For retryable errors, gsutil retries requests using a truncated binary exponential backoff strategy. By default, gsutil retries 23 times over 1+2+4+8+16+32+60... seconds for about 10 minutes:

  • If a request fails, wait a random period between [0..1] seconds and retry;
  • If the request fails again, wait a random period between [0..2] seconds and retry;
  • If the request fails again, wait a random period between [0..4] seconds and retry;
  • And so on, up to 23 retries, with each retry period bounded by a default maximum of 60 seconds.

You can configure the number of retries and maximum delay of any individual retry by editing the num_retries and max_retry_delay configuration variables in the "[Boto]" section of the .boto config file.
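
For example, to allow more retry attempts with a longer cap on any individual delay, you might add the following to your .boto file (the values shown are illustrative, not recommendations):

[Boto]
num_retries = 10
max_retry_delay = 120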

Client libraries

C++

Default retry behavior

By default, operations support retries for the following HTTP error codes, as well as any socket errors that indicate the connection was lost or never successfully established.

  • 408 Request Timeout
  • 429 Too Many Requests
  • 500 Internal Server Error
  • 502 Bad Gateway
  • 503 Service Unavailable
  • 504 Gateway Timeout

All exponential backoff and retry settings in the C++ library are configurable. If the algorithms implemented in the library do not support your needs, you can provide custom code to implement your own strategies.

Setting                              Default value
Auto retry                           True
Maximum time retrying a request      15 minutes
Initial wait (backoff) time          1 second
Wait time multiplier per iteration   2
Maximum amount of wait time          5 minutes

By default, the C++ library retries all operations with retryable errors, even those that are never idempotent and can delete or create multiple resources when repeatedly successful. To retry only idempotent operations, use the google::cloud::storage::StrictIdempotencyPolicy class.

Customize retries

To customize the retry behavior, provide values for the following options when you initialize the google::cloud::storage::Client object:

  • google::cloud::storage::RetryPolicyOption: The library provides google::cloud::storage::LimitedErrorCountRetryPolicy and google::cloud::storage::LimitedTimeRetryPolicy classes. You can provide your own class, which must implement the google::cloud::RetryPolicy interface.

  • google::cloud::storage::BackoffPolicyOption: The library provides the google::cloud::storage::ExponentialBackoffPolicy class. You can provide your own class, which must implement the google::cloud::storage::BackoffPolicy interface.

  • google::cloud::storage::IdempotencyPolicyOption: The library provides the google::cloud::storage::StrictIdempotencyPolicy and google::cloud::storage::AlwaysRetryIdempotencyPolicy classes. You can provide your own class, which must implement the google::cloud::storage::IdempotencyPolicy interface.

namespace gcs = ::google::cloud::storage;
// Create the client configuration:
auto options = google::cloud::Options{};
// Retries only idempotent operations.
options.set<gcs::IdempotencyPolicyOption>(
    gcs::StrictIdempotencyPolicy().clone());
// On error, it backs off for 1 second, then 3 seconds, then 9 seconds, etc.
// The backoff time never grows larger than 1 minute. The strategy introduces
// jitter around the backoff delay.
options.set<gcs::BackoffPolicyOption>(
    gcs::ExponentialBackoffPolicy(
        /*initial_delay=*/std::chrono::seconds(1),
        /*maximum_delay=*/std::chrono::minutes(1),
        /*scaling=*/3.0)
        .clone());
// Retries all operations for up to 5 minutes, including any backoff time.
options.set<gcs::RetryPolicyOption>(
    gcs::LimitedTimeRetryPolicy(std::chrono::minutes(5)).clone());
return gcs::Client(std::move(options));

C#

The C# client library uses exponential backoff by default.

Go

The Go client library uses exponential backoff by default.

Java

The Java client library uses exponential backoff by default.

Node.js

Default retry behavior

By default, operations support retries for the following error codes:

  • Connection errors:
    • EAI_AGAIN: This is a DNS lookup error.
    • Connection reset by peer: This means that GCP has reset the connection.
    • Unexpected connection closure: This means GCP has closed the connection.
  • HTTP codes:
    • 408 Request Timeout
    • 429 Too Many Requests
    • 500 Internal Server Error
    • 502 Bad Gateway
    • 503 Service Unavailable
    • 504 Gateway Timeout

All exponential backoff settings in the Node.js library are configurable. By default, operations through Node.js use the following settings for exponential backoff:

Setting                              Default value
Auto retry                           True
Maximum number of retries            3
Initial wait time                    1 second
Wait time multiplier per iteration   2
Maximum amount of wait time          64 seconds
Default deadline                     600 seconds

There is a subset of Node.js operations that are conditionally idempotent (conditionally safe to retry). These operations only retry if they include specific arguments, also known as preconditions:

  • ifGenerationMatch or generation

    • Safe to retry if ifGenerationMatch or generation was passed in as an option to the method. Often, methods only accept one of these two parameters.
  • ifMetagenerationMatch

    • Safe to retry if ifMetagenerationMatch was passed in as an option.

retryOptions.idempotencyStrategy is set to IdempotencyStrategy.RetryConditional by default. See the Customize retries section below for examples of how to modify the default retry behavior.

Customize retries

When you initialize Cloud Storage, a retryOptions configuration object is initialized as well. Unless you override them, the options in the config are set to the values in the table above. To modify the default retry behavior, pass a custom retryOptions configuration into the Storage constructor upon initialization. When autoRetry is enabled, the Node.js client library automatically uses backoff strategies to retry requests.

See the following code sample to learn how to customize your retry behavior.

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// The ID of your GCS bucket
// const bucketName = 'your-unique-bucket-name';

// The ID of your GCS file
// const fileName = 'your-file-name';

// Imports the Google Cloud client library
const {Storage, IdempotencyStrategy} = require('@google-cloud/storage');

// Creates a client
const storage = new Storage({
  retryOptions: {
    // If this is false, requests will not retry and the parameters
    // below will not affect retry behavior.
    autoRetry: true,
    // The multiplier by which to increase the delay time between the
    // completion of failed requests, and the initiation of the subsequent
    // retrying request.
    retryDelayMultiplier: 3,
    // The total time between an initial request getting sent and its timeout.
    // After timeout, an error will be returned regardless of any retry attempts
    // made during this time period.
    totalTimeout: 500,
    // The maximum delay time between requests. When this value is reached,
    // retryDelayMultiplier will no longer be used to increase delay time.
    maxRetryDelay: 60,
    // The maximum number of automatic retries attempted before returning
    // the error.
    maxRetries: 5,
    // Will respect other retry settings and attempt to always retry
    // conditionally idempotent operations, regardless of precondition
    idempotencyStrategy: IdempotencyStrategy.RetryAlways,
  },
});
console.log(
  'Functions are customized to be retried according to the following parameters:'
);
console.log(`Auto Retry: ${storage.retryOptions.autoRetry}`);
console.log(
  `Retry delay multiplier: ${storage.retryOptions.retryDelayMultiplier}`
);
console.log(`Total timeout: ${storage.retryOptions.totalTimeout}`);
console.log(`Maximum retry delay: ${storage.retryOptions.maxRetryDelay}`);
console.log(`Maximum retries: ${storage.retryOptions.maxRetries}`);
console.log(
  `Idempotency strategy: ${storage.retryOptions.idempotencyStrategy}`
);

async function deleteFileWithCustomizedRetrySetting() {
  await storage.bucket(bucketName).file(fileName).delete();
  console.log(`File ${fileName} deleted with a customized retry strategy.`);
}

deleteFileWithCustomizedRetrySetting();

PHP

The PHP client library uses exponential backoff by default.

Python

Default retry behavior

By default, operations support retries for the following error codes:

  • Connection errors:
    • requests.exceptions.ConnectionError
    • requests.exceptions.ChunkedEncodingError (only for operations that fetch or send payload data to objects, like uploads and downloads)
    • ConnectionError
  • HTTP codes:
    • 408 Request Timeout
    • 429 Too Many Requests
    • 500 Internal Server Error
    • 502 Bad Gateway
    • 503 Service Unavailable
    • 504 Gateway Timeout

Operations through Python use the following default settings for exponential backoff:

Setting                              Default value
Initial wait time                    1 second
Wait time multiplier per iteration   2
Maximum amount of wait time          60 seconds
Default deadline                     120 seconds

There is a subset of Python operations that are conditionally idempotent (conditionally safe to retry) when they include specific arguments. These operations only retry if a condition case passes:

  • DEFAULT_RETRY_IF_GENERATION_SPECIFIED

    • Safe to retry if generation or if_generation_match was passed in as an argument to the method. Often methods only accept one of these two parameters.
  • DEFAULT_RETRY_IF_METAGENERATION_SPECIFIED

    • Safe to retry if if_metageneration_match was passed in as an argument to the method.
  • DEFAULT_RETRY_IF_ETAG_IN_JSON

    • Safe to retry if the method inserts an etag into the JSON request body. For HMACKeyMetadata.update() this means etag must be set on the HMACKeyMetadata object itself. For the set_iam_policy() method on other classes, this means the etag must be set in the "policy" argument passed into the method.
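
As an illustration, the following sketch deletes an object with a generation precondition, which satisfies the DEFAULT_RETRY_IF_GENERATION_SPECIFIED condition, so the delete is retried by default. The bucket and object names are placeholders.

from google.cloud import storage

# Placeholder names for illustration.
bucket_name = "your-bucket-name"
blob_name = "your-object-name"

storage_client = storage.Client()
blob = storage_client.bucket(bucket_name).blob(blob_name)
blob.reload()  # Fetches object metadata, including blob.generation.

# Passing if_generation_match makes the delete conditionally idempotent,
# so the default conditional retry applies.
blob.delete(if_generation_match=blob.generation)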

Customize retries

To modify the default retry behavior, create a copy of the google.cloud.storage.retry.DEFAULT_RETRY object by calling one of its with_XXX methods. The Python client library automatically uses backoff strategies to retry requests if you include the DEFAULT_RETRY parameter.

Note that with_predicate is not supported for operations that fetch or send payload data to objects, like uploads and downloads. We recommend you modify attributes one by one. For more information, see the google-api-core Retry reference.

To configure your own conditional retry, create a ConditionalRetryPolicy object and wrap your custom Retry object with DEFAULT_RETRY_IF_GENERATION_SPECIFIED, DEFAULT_RETRY_IF_METAGENERATION_SPECIFIED, or DEFAULT_RETRY_IF_ETAG_IN_JSON.
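
The following is a minimal sketch of that pattern. It assumes the ConditionalRetryPolicy class and the is_generation_specified predicate exposed by google.cloud.storage.retry, which mirror how the library itself composes DEFAULT_RETRY_IF_GENERATION_SPECIFIED.

from google.cloud import storage
from google.cloud.storage.retry import (
    DEFAULT_RETRY,
    ConditionalRetryPolicy,
    is_generation_specified,
)

# Assumption: ConditionalRetryPolicy(retry, predicate, required_kwargs) and
# is_generation_specified are available in google.cloud.storage.retry, as in
# the library's own definition of DEFAULT_RETRY_IF_GENERATION_SPECIFIED.
custom_retry = DEFAULT_RETRY.with_deadline(300.0)
conditional_retry = ConditionalRetryPolicy(
    custom_retry, is_generation_specified, ["query_params"]
)

# Placeholder names; the retry applies only because if_generation_match
# puts a generation precondition on the request.
client = storage.Client()
blob = client.bucket("your-bucket-name").blob("your-object-name")
blob.reload()
blob.delete(retry=conditional_retry, if_generation_match=blob.generation)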

See the following code sample to learn how to customize your retry behavior.

from google.cloud import storage
from google.cloud.storage.retry import DEFAULT_RETRY


def configure_retries(bucket_name, blob_name):
    """Configures retries with customizations."""
    # The ID of your GCS bucket
    # bucket_name = "your-bucket-name"
    # The ID of your GCS object
    # blob_name = "your-object-name"

    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(blob_name)

    # Customize retry with a deadline of 500 seconds (default=120 seconds).
    modified_retry = DEFAULT_RETRY.with_deadline(500.0)
    # Customize retry with an initial wait time of 1.5 (default=1.0).
    # Customize retry with a wait time multiplier per iteration of 1.2 (default=2.0).
    # Customize retry with a maximum wait time of 45.0 (default=60.0).
    modified_retry = modified_retry.with_delay(initial=1.5, multiplier=1.2, maximum=45.0)

    # blob.delete() uses DEFAULT_RETRY_IF_GENERATION_SPECIFIED by default.
    # Override with modified_retry so the function retries even if the generation
    # number is not specified.
    print(
        f"The following library method is customized to be retried according to the following configurations: {modified_retry}"
    )

    blob.delete(retry=modified_retry)
    print("Blob {} deleted with a customized retry strategy.".format(blob_name))

Ruby

The Ruby client library uses exponential backoff by default.

REST APIs

When calling the JSON or XML API directly, you should use the exponential backoff algorithm to implement your own retry strategy.

Implementing a retry strategy

This section discusses the factors to consider when building your own retry strategy, as well as the recommended retry algorithm to use.

Response

The response that you receive from your request indicates whether it's useful to retry the request. Responses related to transient problems are generally retryable. On the other hand, responses related to permanent errors indicate that you need to make changes, such as authorization or configuration changes, before it's useful to try the request again. The following responses indicate transient problems that are useful to retry:

  • HTTP 408, 429, and 5xx response codes.
  • Socket timeouts and TCP disconnects.

For more information, see the status and error codes for JSON and XML.
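
As a concrete illustration of the response criterion, a client might classify status codes as follows (a minimal sketch; the codes match the list above):

def is_retryable_status(status_code: int) -> bool:
    """Returns True for responses that indicate a transient problem."""
    # HTTP 408, 429, and all 5xx codes are worth retrying.
    return status_code in (408, 429) or 500 <= status_code <= 599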

Idempotency

A request is idempotent when it can be performed repeatedly and always leaves the resource it targets in the same end state. For example, listing requests are always idempotent, because such requests do not modify resources. On the other hand, creating a new Pub/Sub notification is never idempotent, because it creates a new notification ID each time the request succeeds.

The following are examples of conditions that satisfy idempotency:

  • The operation has the same observable effect on the targeted resource even when continually requested.

  • The operation only succeeds once.

  • The operation has no observable effect on the state of the targeted resource.

When you receive a retryable response, you should consider the idempotency of the request, because retrying requests that are not idempotent can lead to race conditions and other conflicts.

Conditional idempotency

A subset of requests are conditionally idempotent, which means they are only idempotent if they include specific optional arguments. Operations which are conditionally safe to retry should only be retried by default if the condition case passes. Cloud Storage accepts preconditions and ETags as condition cases for requests.

Idempotency of operations

The following list groups the Cloud Storage operations that fall into each category of idempotency.

Always idempotent
  • All get and list requests
  • Insert or delete buckets
  • Test bucket IAM policies and permissions
  • Lock retention policies
  • Delete an HMAC key or Pub/Sub notification
Conditionally idempotent
  • Update/patch requests for buckets with IfMetagenerationMatch or ETag as HTTP precondition
  • Update/patch requests for objects with IfMetagenerationMatch or ETag as HTTP precondition
  • Set a bucket IAM policy with ETag as HTTP precondition or in resource body
  • Update an HMAC key with ETag as HTTP precondition or in resource body
  • Insert, copy, compose, or rewrite objects with ifGenerationMatch
  • Delete an object with ifGenerationMatch (or with a generation number for object versions)
Never idempotent
  • Create an HMAC key
  • Create a Pub/Sub notification
  • Create, delete, or send patch/update requests for bucket and object ACLs or default object ACLs

Exponential backoff algorithm

For requests that meet both the response and idempotency criteria, you should generally use truncated exponential backoff.

Truncated exponential backoff is a standard error handling strategy for network applications in which a client periodically retries a failed request with increasing delays between requests.

An exponential backoff algorithm retries requests exponentially, increasing the waiting time between retries up to a maximum backoff time. An example is:

  1. Make a request to Cloud Storage.

  2. If the request fails, wait 1 + random_number_milliseconds seconds and retry the request.

  3. If the request fails, wait 2 + random_number_milliseconds seconds and retry the request.

  4. If the request fails, wait 4 + random_number_milliseconds seconds and retry the request.

  5. And so on, up to a maximum_backoff time.

  6. Continue waiting and retrying up to a maximum amount of time (deadline), but do not increase the maximum_backoff wait period between retries.

where:

  • The wait time is min((2^n + random_number_milliseconds), maximum_backoff), with n incremented by 1 for each iteration (request).

  • random_number_milliseconds is a random number of milliseconds less than or equal to 1000. This helps to avoid cases where many clients become synchronized and all retry at once, sending requests in synchronized waves. The value of random_number_milliseconds is recalculated after each retry request.

  • maximum_backoff is typically 32 or 64 seconds. The appropriate value depends on the use case.
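
Putting these pieces together, the following sketch implements truncated exponential backoff. Here, make_request is a placeholder for your own request function, and the maximum_backoff and deadline values are examples, not recommendations.

import random
import time

def retry_with_backoff(make_request, maximum_backoff=64.0, deadline=600.0):
    """Retries make_request with truncated exponential backoff and jitter.

    make_request is a placeholder callable that returns True on success and
    False on a retryable failure. Times are in seconds.
    """
    start = time.monotonic()
    n = 0
    while True:
        if make_request():
            return True
        # Wait min((2^n + random_number_milliseconds), maximum_backoff).
        wait = min(2**n + random.random(), maximum_backoff)
        # Stop once the overall deadline would be exceeded.
        if time.monotonic() - start + wait > deadline:
            return False
        time.sleep(wait)
        n += 1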

You can continue retrying once you reach the maximum_backoff time, but we recommend that your requests fail after a total amount of time to prevent your application from becoming unresponsive. For example, if a client uses a maximum_backoff time of 64 seconds, then after reaching this value, the client can retry every 64 seconds. The client then stops retrying after a deadline of 600 seconds.

How long clients should wait between retries and how many times they should retry depends on your use case and network conditions. For example, mobile clients of an application may need to retry more times and for longer intervals when compared to desktop clients of the same application.

If the retry requests fail after exceeding the maximum_backoff plus any additional time allowed for retries, report or log an error using one of the methods listed under Support & help.
