Retry strategy

This page describes retry strategies such as truncated exponential backoff for failed requests to Cloud Storage. Most Cloud Storage tools provide automatic retries so you don't need to implement your own retry strategy.

How Cloud Storage tools implement retry strategies

Console

The Google Cloud console sends requests to Cloud Storage on your behalf and handles any necessary backoff.

Command line

Both gcloud storage commands and gsutil retry the errors listed in the Response section without requiring you to take additional action. You may have to take action for other errors, such as the following:

  • Invalid credentials or insufficient permissions.

  • Network unreachable because of a proxy configuration problem.

  • Individual operations that fail within a gsutil command where you use the -m top-level flag.

For retryable errors, the command line tools retry requests using a truncated binary exponential backoff strategy. The default maximum number of retries is 32 for the gcloud CLI and 23 for gsutil.

For gcloud storage commands, you can control the retry strategy by creating a named configuration and setting some or all of the following properties:

  • base_retry_delay
  • exponential_sleep_multiplier
  • max_retries
  • max_retry_delay

You then apply the defined configuration either on a per-command basis by using the --configuration project-wide flag or for all gcloud commands by using the gcloud config set command.

For gsutil, you can configure the number of retries and the maximum delay of any individual retry by editing the num_retries and max_retry_delay configuration variables in the "[Boto]" section of the .boto config file.

Client libraries

C++

Default retry behavior

By default, operations support retries for the following HTTP error codes, as well as any socket errors that indicate the connection was lost or never successfully established.

  • 408 Request Timeout
  • 429 Too Many Requests
  • 500 Internal Server Error
  • 502 Bad Gateway
  • 503 Service Unavailable
  • 504 Gateway Timeout

All exponential backoff and retry settings in the C++ library are configurable. If the algorithms implemented in the library do not support your needs, you can provide custom code to implement your own strategies.

Setting                               Default value
Auto retry                            True
Maximum time retrying a request       15 minutes
Initial wait (backoff) time           1 second
Wait time multiplier per iteration    2
Maximum amount of wait time           5 minutes

By default, the C++ library retries all operations with retryable errors, even those that are never idempotent and can delete or create multiple resources when repeatedly successful. To only retry idempotent operations, use the google::cloud::storage::StrictIdempotencyPolicy.

Customize retries

To customize the retry behavior, provide values for the following options when you initialize the google::cloud::storage::Client object:

  • google::cloud::storage::RetryPolicyOption: The library provides google::cloud::storage::LimitedErrorCountRetryPolicy and google::cloud::storage::LimitedTimeRetryPolicy classes. You can provide your own class, which must implement the google::cloud::RetryPolicy interface.

  • google::cloud::storage::BackoffPolicyOption: The library provides the google::cloud::storage::ExponentialBackoffPolicy class. You can provide your own class, which must implement the google::cloud::storage::BackoffPolicy interface.

  • google::cloud::storage::IdempotencyPolicyOption: The library provides the google::cloud::storage::StrictIdempotencyPolicy and google::cloud::storage::AlwaysRetryIdempotencyPolicy classes. You can provide your own class, which must implement the google::cloud::storage::IdempotencyPolicy interface.

#include "google/cloud/storage/client.h"
#include <chrono>
#include <utility>

namespace gcs = ::google::cloud::storage;

// Returns a client with custom idempotency, backoff, and retry policies.
// The function name is illustrative only.
gcs::Client CreateClientWithCustomRetries() {
  // Create the client configuration:
  auto options = google::cloud::Options{};
  // Retries only idempotent operations.
  options.set<gcs::IdempotencyPolicyOption>(
      gcs::StrictIdempotencyPolicy().clone());
  // On error, it backs off for 1 second, then 3 seconds, then 9 seconds, etc.
  // The backoff time never grows larger than 1 minute. The strategy introduces
  // jitter around the backoff delay.
  options.set<gcs::BackoffPolicyOption>(
      gcs::ExponentialBackoffPolicy(
          /*initial_delay=*/std::chrono::seconds(1),
          /*maximum_delay=*/std::chrono::minutes(1),
          /*scaling=*/3.0)
          .clone());
  // Retries all operations for up to 5 minutes, including any backoff time.
  options.set<gcs::RetryPolicyOption>(
      gcs::LimitedTimeRetryPolicy(std::chrono::minutes(5)).clone());
  return gcs::Client(std::move(options));
}

C#

The C# client library uses exponential backoff by default.

Go

Default retry behavior

By default, operations support retries for the following errors:

  • Connection errors:
    • io.ErrUnexpectedEOF: This may occur due to transient network issues.
    • url.Error containing connection refused: This may occur due to transient network issues.
    • url.Error containing connection reset by peer: This means that GCP has reset the connection.
    • net.ErrClosed: This means that GCP has closed the connection.
  • HTTP codes:
    • 408 Request Timeout
    • 429 Too Many Requests
    • 500 Internal Server Error
    • 502 Bad Gateway
    • 503 Service Unavailable
    • 504 Gateway Timeout
  • Errors that implement the Temporary() interface and give a value of err.Temporary() == true
  • Any of the above errors that have been wrapped using Go 1.13 error wrapping

All exponential backoff settings in the Go library are configurable. By default, operations in Go use the following settings for exponential backoff (defaults are taken from gax):

Setting                                   Default value
Auto retry                                True if idempotent
Max number of attempts                    No limit
Initial retry delay                       1 second
Retry delay multiplier                    2.0
Maximum retry delay                       30 seconds
Total timeout (resumable upload chunk)    32 seconds
Total timeout (all other operations)      No limit

In general, retrying continues indefinitely unless the controlling context is canceled, the client is closed, or a non-transient error is received. To stop retries from continuing, use context timeouts or cancellation. The only exception to this behavior is when performing resumable uploads using Writer, where the data is large enough that it requires multiple requests. In this scenario, each chunk times out and stops retrying after 32 seconds by default. You can adjust the default timeout by changing Writer.ChunkRetryDeadline.

There is a subset of Go operations that are conditionally idempotent (conditionally safe to retry). These operations only retry if they meet specific conditions:

  • GenerationMatch or Generation

    • Safe to retry if a GenerationMatch precondition was applied to the call, or if ObjectHandle.Generation was set.
  • MetagenerationMatch

    • Safe to retry if a MetagenerationMatch precondition was applied to the call.
  • Etag

    • Safe to retry if the method inserts an etag into the JSON request body. Only used in HMACKeyHandle.Update when HmacKeyMetadata.Etag has been set.

RetryPolicy is set to RetryPolicy.RetryIdempotent by default. See the Customize retries section below for examples on how to modify the default retry behavior.

Customize retries

When you initialize a storage client, a default retry configuration will be set. Unless they're overridden, the options in the config are set to the values in the table above. Users can configure non-default retry behavior for a single library call (using BucketHandle.Retryer and ObjectHandle.Retryer) or for all calls made by a client (using Client.SetRetry). To modify retry behavior, pass in the desired RetryOptions to one of these methods.

See the following code sample to learn how to customize your retry behavior.

import (
	"context"
	"fmt"
	"io"
	"time"

	"cloud.google.com/go/storage"
	"github.com/googleapis/gax-go/v2"
)

// configureRetries configures a custom retry strategy for a single API call.
func configureRetries(w io.Writer, bucket, object string) error {
	// bucket := "bucket-name"
	// object := "object-name"
	ctx := context.Background()
	client, err := storage.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("storage.NewClient: %v", err)
	}
	defer client.Close()

	// Configure retries for all operations using this ObjectHandle. Retries may
	// also be configured on the BucketHandle or Client types.
	o := client.Bucket(bucket).Object(object).Retryer(
		// Use WithBackoff to control the timing of the exponential backoff.
		storage.WithBackoff(gax.Backoff{
			// Set the initial retry delay to a maximum of 2 seconds. The length of
			// pauses between retries is subject to random jitter.
			Initial: 2 * time.Second,
			// Set the maximum retry delay to 60 seconds.
			Max: 60 * time.Second,
			// Set the backoff multiplier to 3.0.
			Multiplier: 3,
		}),
		// Use WithPolicy to customize retry so that all requests are retried even
		// if they are non-idempotent.
		storage.WithPolicy(storage.RetryAlways),
	)

	// Use context timeouts to set an overall deadline on the call, including all
	// potential retries.
	ctx, cancel := context.WithTimeout(ctx, 500*time.Second)
	defer cancel()

	// Delete an object using the specified retry policy.
	if err := o.Delete(ctx); err != nil {
		return fmt.Errorf("Object(%q).Delete: %v", object, err)
	}
	fmt.Fprintf(w, "Blob %v deleted with a customized retry strategy.\n", object)
	return nil
}

Java

Default retry behavior

By default, operations support retries for the following errors:

  • Connection errors:
    • Connection reset by peer: This means that GCP has reset the connection.
    • Unexpected connection closure: This means GCP has closed the connection.
  • HTTP codes:
    • 408 Request Timeout
    • 429 Too Many Requests
    • 500 Internal Server Error
    • 502 Bad Gateway
    • 503 Service Unavailable
    • 504 Gateway Timeout

All exponential backoff settings in the Java library are configurable. By default, operations through Java use the following settings for exponential backoff:

Setting                   Default value
Auto retry                True if idempotent
Max number of attempts    6
Initial retry delay       1 second
Retry delay multiplier    2.0
Maximum retry delay       32 seconds
Total Timeout             50 seconds
Initial RPC Timeout       50 seconds
RPC Timeout Multiplier    1.0
Max RPC Timeout           50 seconds
Connect Timeout           20 seconds
Read Timeout              20 seconds

For more information about the parameters above, see the Java reference documentation for RetrySettings.Builder and HttpTransportOptions.Builder.
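To make the defaults in the table concrete, the following Python sketch computes the nominal delay before each retry. It is an illustration only, not the Java library's implementation: it ignores jitter and per-request timeouts, and the function name nominal_retry_delays is hypothetical.

```python
def nominal_retry_delays(max_attempts, initial_delay, multiplier, max_delay):
    """Return the un-jittered delay before each retry, in seconds."""
    delays = []
    delay = initial_delay
    # max_attempts counts the first try, so there are max_attempts - 1 retries.
    for _ in range(max_attempts - 1):
        delays.append(min(delay, max_delay))
        delay *= multiplier
    return delays


# Defaults from the table above: 6 attempts, 1 s initial delay,
# multiplier 2.0, 32 s maximum retry delay.
delays = nominal_retry_delays(6, 1.0, 2.0, 32.0)
print(delays)       # [1.0, 2.0, 4.0, 8.0, 16.0]
print(sum(delays))  # 31.0
```

Under these assumptions, the five retries add up to 31 seconds of nominal waiting, which is less than the 50-second total timeout. Slow responses or jitter can change which of the two limits is reached first.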

There is a subset of Java operations that are conditionally idempotent (conditionally safe to retry). These operations only retry if they include specific arguments:

  • ifGenerationMatch or generation

    • Safe to retry if ifGenerationMatch or generation was passed in as an option to the method.
  • ifMetagenerationMatch

    • Safe to retry if ifMetagenerationMatch was passed in as an option.

StorageOptions.setStorageRetryStrategy is set to StorageRetryStrategy#getDefaultStorageRetryStrategy by default. See the Customize retries section below for examples on how to modify the default retry behavior.

Customize retries

When you initialize Storage, an instance of RetrySettings is initialized as well. Unless they are overridden, the options in the RetrySettings are set to the values in the table above. To modify the default automatic retry behavior, pass the custom StorageRetryStrategy into the StorageOptions used to construct the Storage instance. To modify any of the other scalar parameters, pass a custom RetrySettings into the StorageOptions used to construct the Storage instance.

See the following example to learn how to customize your retry behavior:


import com.google.api.gax.retrying.RetrySettings;
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import com.google.cloud.storage.StorageRetryStrategy;
import org.threeten.bp.Duration;

public final class ConfigureRetries {
  public static void main(String[] args) {
    String bucketName = "my-bucket";
    String blobName = "blob/to/delete";
    deleteBlob(bucketName, blobName);
  }

  static void deleteBlob(String bucketName, String blobName) {
    // Customize retry behavior
    RetrySettings retrySettings =
        StorageOptions.getDefaultRetrySettings()
            .toBuilder()
            // Set the max number of attempts to 10 (initial attempt plus 9 retries)
            .setMaxAttempts(10)
            // Set the backoff multiplier to 3.0
            .setRetryDelayMultiplier(3.0)
            // Set the max duration of all attempts to 5 minutes
            .setTotalTimeout(Duration.ofMinutes(5))
            .build();

    StorageOptions alwaysRetryStorageOptions =
        StorageOptions.newBuilder()
            // Customize retry so all requests are retried even if they are non-idempotent.
            .setStorageRetryStrategy(StorageRetryStrategy.getUniformStorageRetryStrategy())
            // provide the previously configured retrySettings
            .setRetrySettings(retrySettings)
            .build();

    // Instantiate a client
    Storage storage = alwaysRetryStorageOptions.getService();

    // Delete the blob
    BlobId blobId = BlobId.of(bucketName, blobName);
    boolean success = storage.delete(blobId);

    System.out.printf(
        "Deletion of Blob %s completed %s.%n", blobId, success ? "successfully" : "unsuccessfully");
  }
}

Node.js

Default retry behavior

By default, operations support retries for the following error codes:

  • Connection errors:
    • EAI_AGAIN: This is a DNS lookup error.
    • Connection reset by peer: This means that GCP has reset the connection.
    • Unexpected connection closure: This means GCP has closed the connection.
  • HTTP codes:
    • 408 Request Timeout
    • 429 Too Many Requests
    • 500 Internal Server Error
    • 502 Bad Gateway
    • 503 Service Unavailable
    • 504 Gateway Timeout

All exponential backoff settings in the Node.js library are configurable. By default, operations through Node.js use the following settings for exponential backoff:

Setting                               Default value
Auto retry                            True if idempotent
Maximum number of retries             3
Initial wait time                     1 second
Wait time multiplier per iteration    2
Maximum amount of wait time           64 seconds
Default deadline                      600 seconds

There is a subset of Node.js operations that are conditionally idempotent (conditionally safe to retry). These operations only retry if they include specific arguments:

  • ifGenerationMatch or generation

    • Safe to retry if ifGenerationMatch or generation was passed in as an option to the method. Often, methods only accept one of these two parameters.
  • ifMetagenerationMatch

    • Safe to retry if ifMetagenerationMatch was passed in as an option.

retryOptions.idempotencyStrategy is set to IdempotencyStrategy.RetryConditional by default. See the Customize retries section below for examples on how to modify the default retry behavior.

Customize retries

When you initialize Cloud Storage, a retryOptions configuration object is initialized as well. Unless they're overridden, the options in the configuration are set to the values in the table above. To modify the default retry behavior, pass a custom retryOptions configuration into the Storage constructor upon initialization. The autoRetry parameter controls whether the Node.js client library automatically uses backoff strategies to retry requests.

See the following code sample to learn how to customize your retry behavior.

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// The ID of your GCS bucket
// const bucketName = 'your-unique-bucket-name';

// The ID of your GCS file
// const fileName = 'your-file-name';

// Imports the Google Cloud client library
const {Storage, IdempotencyStrategy} = require('@google-cloud/storage');

// Creates a client
const storage = new Storage({
  retryOptions: {
    // If this is false, requests will not retry and the parameters
    // below will not affect retry behavior.
    autoRetry: true,
    // The multiplier by which to increase the delay time between the
    // completion of failed requests, and the initiation of the subsequent
    // retrying request.
    retryDelayMultiplier: 3,
    // The total time between an initial request getting sent and its timeout.
    // After timeout, an error will be returned regardless of any retry attempts
    // made during this time period.
    totalTimeout: 500,
    // The maximum delay time between requests. When this value is reached,
    // retryDelayMultiplier will no longer be used to increase delay time.
    maxRetryDelay: 60,
    // The maximum number of automatic retries attempted before returning
    // the error.
    maxRetries: 5,
    // Will respect other retry settings and attempt to always retry
    // conditionally idempotent operations, regardless of precondition
    idempotencyStrategy: IdempotencyStrategy.RetryAlways,
  },
});
console.log(
  'Functions are customized to be retried according to the following parameters:'
);
console.log(`Auto Retry: ${storage.retryOptions.autoRetry}`);
console.log(
  `Retry delay multiplier: ${storage.retryOptions.retryDelayMultiplier}`
);
console.log(`Total timeout: ${storage.retryOptions.totalTimeout}`);
console.log(`Maximum retry delay: ${storage.retryOptions.maxRetryDelay}`);
console.log(`Maximum retries: ${storage.retryOptions.maxRetries}`);
console.log(
  `Idempotency strategy: ${storage.retryOptions.idempotencyStrategy}`
);

async function deleteFileWithCustomizedRetrySetting() {
  await storage.bucket(bucketName).file(fileName).delete();
  console.log(`File ${fileName} deleted with a customized retry strategy.`);
}

deleteFileWithCustomizedRetrySetting().catch(console.error);

PHP

The PHP client library uses exponential backoff by default.

Python

Default retry behavior

By default, operations support retries for the following error codes:

  • Connection errors:
    • requests.exceptions.ConnectionError
    • requests.exceptions.ChunkedEncodingError (only for operations that fetch or send payload data to objects, like uploads and downloads)
    • ConnectionError
  • HTTP codes:
    • 408 Request Timeout
    • 429 Too Many Requests
    • 500 Internal Server Error
    • 502 Bad Gateway
    • 503 Service Unavailable
    • 504 Gateway Timeout

Operations through Python use the following default settings for exponential backoff:

Setting                               Default value
Auto retry                            True if idempotent
Initial wait time                     1 second
Wait time multiplier per iteration    2
Maximum amount of wait time           60 seconds
Default deadline                      120 seconds

There is a subset of Python operations that are conditionally idempotent (conditionally safe to retry) when they include specific arguments. These operations only retry if a condition case passes:

  • DEFAULT_RETRY_IF_GENERATION_SPECIFIED

    • Safe to retry if generation or if_generation_match was passed in as an argument to the method. Often methods only accept one of these two parameters.
  • DEFAULT_RETRY_IF_METAGENERATION_SPECIFIED

    • Safe to retry if if_metageneration_match was passed in as an argument to the method.
  • DEFAULT_RETRY_IF_ETAG_IN_JSON

    • Safe to retry if the method inserts an etag into the JSON request body. For HMACKeyMetadata.update() this means etag must be set on the HMACKeyMetadata object itself. For the set_iam_policy() method on other classes, this means the etag must be set in the "policy" argument passed into the method.

Customize retries

To modify the default retry behavior, create a copy of the google.cloud.storage.retry.DEFAULT_RETRY object by calling one of its with_XXX methods, such as with_deadline or with_delay. The Python client library automatically uses backoff strategies to retry requests if you include the DEFAULT_RETRY parameter.

Note that with_predicate is not supported for operations that fetch or send payload data to objects, like uploads and downloads. It's recommended that you modify attributes one by one. For more information, see the google-api-core Retry reference.

To configure your own conditional retry, create a ConditionalRetryPolicy object and wrap your custom Retry object with DEFAULT_RETRY_IF_GENERATION_SPECIFIED, DEFAULT_RETRY_IF_METAGENERATION_SPECIFIED, or DEFAULT_RETRY_IF_ETAG_IN_JSON.

See the following code sample to learn how to customize your retry behavior.

from google.cloud import storage
from google.cloud.storage.retry import DEFAULT_RETRY


def configure_retries(bucket_name, blob_name):
    """Configures retries with customizations."""
    # The ID of your GCS bucket
    # bucket_name = "your-bucket-name"
    # The ID of your GCS object
    # blob_name = "your-object-name"

    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(blob_name)

    # Customize retry with a deadline of 500 seconds (default=120 seconds).
    modified_retry = DEFAULT_RETRY.with_deadline(500.0)
    # Customize retry with an initial wait time of 1.5 (default=1.0).
    # Customize retry with a wait time multiplier per iteration of 1.2 (default=2.0).
    # Customize retry with a maximum wait time of 45.0 (default=60.0).
    modified_retry = modified_retry.with_delay(initial=1.5, multiplier=1.2, maximum=45.0)

    # blob.delete() uses DEFAULT_RETRY_IF_GENERATION_SPECIFIED by default.
    # Override with modified_retry so the function retries even if the generation
    # number is not specified.
    print(
        f"The following library method is customized to be retried according to the following configurations: {modified_retry}"
    )

    blob.delete(retry=modified_retry)
    print(f"Blob {blob_name} deleted with a customized retry strategy.")

Ruby

The Ruby client library uses exponential backoff by default.

REST APIs

When calling the JSON or XML API directly, you should use the exponential backoff algorithm to implement your own retry strategy.

Implement your own retry strategy

This section describes how you can implement your own retry strategy. It also provides guidance for using the exponential backoff algorithm.

There are two factors that determine whether or not a request is safe to retry:

  1. The response that you receive from the request.
  2. The idempotency of the request.

Response

The response that you receive from your request indicates whether or not it's useful to retry the request. Responses related to transient problems are generally retryable. On the other hand, responses related to permanent errors indicate that you need to make changes, such as authorization or configuration changes, before it's useful to try the request again. The following responses indicate transient problems that are useful to retry:

  • HTTP 408, 429, and 5xx response codes.
  • Socket timeouts and TCP disconnects.

For more information, see the status and error codes for JSON and XML.
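The response check above can be sketched in a few lines of Python. This is a minimal illustration, not code from any Cloud Storage library: the function names are hypothetical, and mapping "socket timeouts and TCP disconnects" to Python's socket.timeout, TimeoutError, and ConnectionError exceptions is an assumption.

```python
import socket

# HTTP status codes outside the 5xx range that indicate a transient problem.
RETRYABLE_STATUS_CODES = {408, 429}


def is_retryable_status(status_code):
    """Return True for HTTP 408, 429, and any 5xx response code."""
    return status_code in RETRYABLE_STATUS_CODES or 500 <= status_code <= 599


def is_retryable_exception(exc):
    """Return True for socket timeouts and dropped TCP connections."""
    return isinstance(exc, (socket.timeout, TimeoutError, ConnectionError))
```

For example, is_retryable_status(503) is true, while is_retryable_status(403) is false because a permissions problem does not resolve itself on retry.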

Idempotency

An idempotent request can be performed repeatedly and always leaves the targeted resource in the same end state. For example, listing requests are always idempotent, because such requests do not modify resources. On the other hand, creating a new Pub/Sub notification is never idempotent, because it creates a new notification ID each time the request succeeds.

The following are examples of conditions that make an operation idempotent:

  • The operation has the same observable effect on the targeted resource even when continually requested.

  • The operation only succeeds once.

  • The operation has no observable effect on the state of the targeted resource.

When you receive a retryable response, you should consider the idempotency of the request, because retrying requests that are not idempotent can lead to race conditions and other conflicts.

Conditional idempotency

A subset of requests are conditionally idempotent, which means they are only idempotent if they include specific optional arguments. Operations that are conditionally safe to retry should only be retried by default if the condition case passes. Cloud Storage accepts preconditions and ETags as condition cases for requests.

Idempotency of operations

The following table lists the Cloud Storage operations that fall into each category of idempotency.

Idempotency Operations
Always idempotent
  • All get and list requests
  • Insert or delete buckets
  • Test bucket IAM policies and permissions
  • Lock retention policies
  • Delete an HMAC key or Pub/Sub notification
Conditionally idempotent
  • Update/patch requests for buckets with IfMetagenerationMatch or ETag as HTTP precondition
  • Update/patch requests for objects with IfMetagenerationMatch or ETag as HTTP precondition
  • Set a bucket IAM policy with ETag as HTTP precondition or in resource body
  • Update an HMAC key with ETag as HTTP precondition or in resource body
  • Insert, copy, compose, or rewrite objects with ifGenerationMatch
  • Delete an object with ifGenerationMatch (or with a generation number for object versions)
Never idempotent
  • Create an HMAC key
  • Create a Pub/Sub notification
  • Create, delete, or send patch/update requests for bucket and object ACLs or default object ACLs
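The three categories in the table can drive a simple retry decision. The following Python sketch is an illustration under stated assumptions: the Idempotency enum and the safe_to_retry function are hypothetical names, and has_precondition stands for "the request carried the precondition or ETag that the table lists for that operation".

```python
from enum import Enum


class Idempotency(Enum):
    ALWAYS = "always"
    CONDITIONAL = "conditional"
    NEVER = "never"


def safe_to_retry(idempotency, has_precondition=False):
    """Decide whether a request may be retried, based on its idempotency."""
    if idempotency is Idempotency.ALWAYS:
        return True
    if idempotency is Idempotency.CONDITIONAL:
        # Conditionally idempotent requests are only safe to retry when
        # the condition case (precondition or ETag) was supplied.
        return has_precondition
    return False
```

For example, deleting an object is conditionally idempotent, so safe_to_retry(Idempotency.CONDITIONAL, has_precondition=True) allows the retry, while the same call without ifGenerationMatch or a generation number does not. This check applies in addition to the response check: a request should be retried only when the response is retryable and the request is safe to repeat.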

Exponential backoff algorithm

For requests that meet both the response and idempotency criteria, you should generally use truncated exponential backoff.

Truncated exponential backoff is a standard error handling strategy for network applications in which a client periodically retries a failed request with increasing delays between requests.

An exponential backoff algorithm retries requests with exponentially increasing waiting times between retries, up to a maximum backoff time. See the following workflow example to learn how exponential backoff works:

  1. You make a request to Cloud Storage.

  2. If the request fails, wait 1 + random_number_milliseconds seconds and retry the request.

  3. If the request fails, wait 2 + random_number_milliseconds seconds and retry the request.

  4. If the request fails, wait 4 + random_number_milliseconds seconds and retry the request.

  5. And so on, up to a maximum_backoff time.

  6. Continue waiting and retrying up to a maximum amount of time (deadline), but do not increase the wait period between retries beyond maximum_backoff.

where:

  • The wait time is min((2^n + random_number_milliseconds), maximum_backoff), with n starting at 0 and incremented by 1 for each iteration (request).

  • random_number_milliseconds is a random number of milliseconds less than or equal to 1000. This helps to avoid cases where many clients become synchronized and all retry at once, sending requests in synchronized waves. The value of random_number_milliseconds is recalculated after each retry request.

  • maximum_backoff is typically 32 or 64 seconds. The appropriate value depends on the use case.
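The wait time formula can be written as a short Python function. This is a sketch: the function name wait_time is hypothetical, n is assumed to start at 0 for the first retry (matching the 1, 2, 4 second progression in the workflow), and the jitter of up to 1000 milliseconds is expressed in seconds.

```python
import random


def wait_time(n, maximum_backoff=64.0):
    """Return the seconds to wait before retry n (n = 0 for the first retry)."""
    # random_number_milliseconds: up to 1000 ms, expressed here in seconds.
    jitter = random.uniform(0.0, 1.0)
    return min(2 ** n + jitter, maximum_backoff)


# Print one possible schedule; the values differ on every run due to jitter.
for n in range(8):
    print(n, round(wait_time(n), 3))
```

With maximum_backoff set to 64, the waits grow from about 1 second to about 32 seconds and then stay at exactly 64 seconds from n = 6 onward.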

You can continue retrying once you reach the maximum_backoff time, but it's recommended that you abort your request after a certain amount of time to prevent your application from becoming unresponsive. For example, say a client uses a maximum_backoff time of 64 seconds. After reaching this value, the client can retry every 64 seconds. The client then stops retrying after a deadline of 600 seconds.
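Putting the pieces together, a retry loop with a deadline might look like the following Python sketch. It is an assumption-laden illustration rather than a reference implementation: RetryableError is a hypothetical exception that the caller's request function raises for failures that passed both the response and idempotency checks, and the sleep and clock parameters exist only so that the loop can be exercised without real waiting.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical marker for failures that are safe to retry."""


def call_with_backoff(request, maximum_backoff=64.0, deadline=600.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Call request() until it succeeds or the deadline would be exceeded."""
    start = clock()
    n = 0
    while True:
        try:
            return request()
        except RetryableError:
            # Wait min(2^n + jitter, maximum_backoff) seconds.
            wait = min(2 ** n + random.uniform(0.0, 1.0), maximum_backoff)
            if clock() - start + wait > deadline:
                raise  # Out of time: surface the last error to the caller.
            sleep(wait)
            # Stop growing the exponent once the cap has been reached.
            if 2 ** n < maximum_backoff:
                n += 1
```

With the default arguments, this mirrors the example above: the wait grows to 64 seconds, stays there, and the last error is raised once another wait would pass the 600-second deadline.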

How long clients should wait between retries and how many times they should retry depends on your use case and network conditions. For example, mobile clients of an application may need to retry more times and for longer intervals when compared to desktop clients of the same application.

If requests still fail once the maximum_backoff time plus any additional time allowed for retries has passed, log the error or report it to Support.

What's next