Override Retry, Backoff, and Idempotency Policies
When it is safe to do so, the library automatically retries requests that fail due to a transient error, using exponential backoff to delay each new attempt. Which operations are considered safe to retry, which errors are treated as transient failures, the details of the exponential backoff algorithm, and for how long the library retries are all configurable via policies.
This document provides examples showing how to override the default policies.
The policies can be set when the *Connection object is created. The library provides default policies for any policy that is not set. The application can also override some (or all) policies when the *Client object is created. This can be useful if multiple *Client objects share the same *Connection object, but you want different retry behavior in some of the clients. Finally, the application can override some retry policies when calling a specific member function.
The library uses three different options to control the retry loop. The options have per-client names.
Configuring the transient errors and retry duration
The *RetryPolicyOption controls:
- Which errors are to be treated as transient errors.
- How long the library will keep retrying transient errors.
You can provide your own class for this option. The library also provides two built-in policies:
- *LimitedErrorCountRetryPolicy: stops retrying after a specified number of transient errors.
- *LimitedTimeRetryPolicy: stops retrying after a specified time.
Note that a library may have more than one version of these classes. Their names match the *Client and *Connection objects they are intended to be used with. Some *Client objects treat different error codes as transient errors. In most cases, only kUnavailable is treated as a transient error.
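To make the error-count limit concrete, the following is a minimal, self-contained sketch of how such a policy could behave. It is not the library's implementation: `SketchCode` and `SketchLimitedErrorCountPolicy` are hypothetical names, and the sketch assumes that only kUnavailable is transient and that a limit of N tolerates N transient failures before giving up.

```cpp
#include <cassert>

// Hypothetical stand-in for the library's status codes; only the values
// needed for this sketch are modeled.
enum class SketchCode { kOk, kUnavailable, kPermissionDenied };

// A simplified model of an error-count-limited retry policy.
class SketchLimitedErrorCountPolicy {
 public:
  explicit SketchLimitedErrorCountPolicy(int maximum_failures)
      : maximum_failures_(maximum_failures) {
    assert(maximum_failures >= 0);
  }

  // Assumption: only kUnavailable is treated as a transient error.
  static bool IsTransient(SketchCode code) {
    return code == SketchCode::kUnavailable;
  }

  // Records a failure and returns true if the caller may try again.
  // Permanent errors stop the loop immediately and are not counted.
  bool OnFailure(SketchCode code) {
    if (!IsTransient(code)) return false;
    return ++failure_count_ <= maximum_failures_;
  }

  // True once more transient failures were seen than the limit allows.
  bool IsExhausted() const { return failure_count_ > maximum_failures_; }

 private:
  int maximum_failures_;
  int failure_count_ = 0;
};
```

With a limit of 3, the first three calls to `OnFailure(SketchCode::kUnavailable)` return true and the fourth returns false. A time-limited policy would follow the same shape, comparing the current time against a deadline instead of counting failures.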
Controlling the backoff algorithm
The *BackoffPolicyOption controls how long the client library will wait before retrying a request that failed with a transient error. You can provide your own class for this option.
The only built-in backoff policy is ExponentialBackoffPolicy. This class implements a truncated exponential backoff algorithm with jitter. In summary, the actual backoff time for an RPC is chosen at random, but never exceeds the current backoff. The current backoff is doubled after each failure (or, more generally, multiplied by the configured scaling factor), but is capped, or "truncated", at a prescribed maximum.
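The algorithm described above can be sketched in a few lines of standard C++. This is an illustration under stated assumptions, not the code in ExponentialBackoffPolicy: `SketchExponentialBackoff` is a hypothetical class, and drawing the delay from the upper half of the current range is an arbitrary choice made here. The text only requires that the random delay never exceeds the current backoff.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <random>

// Hypothetical model of truncated exponential backoff with jitter.
class SketchExponentialBackoff {
 public:
  using Millis = std::chrono::milliseconds;

  SketchExponentialBackoff(Millis initial_delay, Millis maximum_delay,
                           double scaling,
                           std::mt19937::result_type seed)
      : current_(initial_delay),
        maximum_(maximum_delay),
        scaling_(scaling),
        generator_(seed) {
    assert(initial_delay.count() > 0);
    assert(maximum_delay >= initial_delay);
    assert(scaling > 1.0);
  }

  // The largest delay the next call to OnFailure() can return.
  Millis current_upper_bound() const { return current_; }

  // Returns how long to wait before the next attempt, then grows the
  // upper bound for the attempt after that.
  Millis OnFailure() {
    // Jitter: pick a random delay that never exceeds the current backoff.
    // Assumption: the lower bound is half the current backoff.
    std::uniform_int_distribution<Millis::rep> jitter(current_.count() / 2,
                                                      current_.count());
    Millis const delay(jitter(generator_));
    // Exponential growth, truncated at the prescribed maximum.
    auto const grown = static_cast<Millis::rep>(
        static_cast<double>(current_.count()) * scaling_);
    current_ = std::min(Millis(grown), maximum_);
    return delay;
  }

 private:
  Millis current_;
  Millis maximum_;
  double scaling_;
  std::mt19937 generator_;
};
```

With an initial delay of 200 ms, a maximum of 45 s, and a scaling factor of 2.0, the upper bound in this sketch grows 200 ms, 400 ms, 800 ms, and so on, until it would pass 45 s, after which it stays at 45 s.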
Controlling which operations are retryable
The *IdempotencyPolicyOption controls which requests are retryable, as some requests are never safe to retry.
Only one built-in idempotency policy is provided by the library. The name matches the name of the client it is intended for. For example, FooBarClient will use FooBarIdempotencyPolicy. This policy is very conservative.
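As a rough illustration of what "conservative" can mean here, the sketch below treats only read-only requests as idempotent and retries a failed request only when the error is transient and the request is idempotent. All names (`SketchIdempotency`, `SketchRequestKind`, `ClassifyConservatively`, `MayRetry`) are hypothetical, and the read/mutation split is an assumption made for the example rather than a description of any particular built-in policy.

```cpp
// Hypothetical names throughout; none of these are part of the library.
enum class SketchIdempotency { kIdempotent, kNonIdempotent };
enum class SketchRequestKind { kRead, kMutation };

// A conservative classification: only read-only requests are considered
// safe to send more than once.
constexpr SketchIdempotency ClassifyConservatively(SketchRequestKind kind) {
  return kind == SketchRequestKind::kRead ? SketchIdempotency::kIdempotent
                                          : SketchIdempotency::kNonIdempotent;
}

// A failed request is retried only if the error is transient *and* the
// request is idempotent. Limits on retry count or time are ignored here.
constexpr bool MayRetry(SketchRequestKind kind, bool error_is_transient) {
  return error_is_transient &&
         ClassifyConservatively(kind) == SketchIdempotency::kIdempotent;
}
```

A custom policy would typically relax this for specific requests that the application knows are safe to repeat, for example because the request carries a caller-chosen identifier.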
Example
For example, this will override the retry policies for datamigration_v1::DataMigrationServiceClient:
  auto options =
      google::cloud::Options{}
          .set<google::cloud::datamigration_v1::
                   DataMigrationServiceConnectionIdempotencyPolicyOption>(
              CustomIdempotencyPolicy().clone())
          .set<google::cloud::datamigration_v1::
                   DataMigrationServiceRetryPolicyOption>(
              google::cloud::datamigration_v1::
                  DataMigrationServiceLimitedErrorCountRetryPolicy(3)
                      .clone())
          .set<google::cloud::datamigration_v1::
                   DataMigrationServiceBackoffPolicyOption>(
              google::cloud::ExponentialBackoffPolicy(
                  /*initial_delay=*/std::chrono::milliseconds(200),
                  /*maximum_delay=*/std::chrono::seconds(45),
                  /*scaling=*/2.0)
                  .clone());
  auto connection =
      google::cloud::datamigration_v1::MakeDataMigrationServiceConnection(
          options);
  // c1 and c2 share the same retry policies
  auto c1 =
      google::cloud::datamigration_v1::DataMigrationServiceClient(connection);
  auto c2 =
      google::cloud::datamigration_v1::DataMigrationServiceClient(connection);
  // You can override any of the policies in a new client. This new client
  // will share the policies from c1 (or c2) *except* for the retry policy.
  auto c3 = google::cloud::datamigration_v1::DataMigrationServiceClient(
      connection, google::cloud::Options{}
                      .set<google::cloud::datamigration_v1::
                               DataMigrationServiceRetryPolicyOption>(
                          google::cloud::datamigration_v1::
                              DataMigrationServiceLimitedTimeRetryPolicy(
                                  std::chrono::minutes(5))
                                  .clone()));
  // You can also override the policies in a single call:
  // c3.SomeRpc(..., google::cloud::Options{}
  //     .set<google::cloud::datamigration_v1::DataMigrationServiceRetryPolicyOption>(
  //       google::cloud::datamigration_v1::DataMigrationServiceLimitedErrorCountRetryPolicy(10).clone()));
This assumes you have created a custom idempotency policy, such as:
class CustomIdempotencyPolicy
    : public google::cloud::datamigration_v1::
          DataMigrationServiceConnectionIdempotencyPolicy {
 public:
  ~CustomIdempotencyPolicy() override = default;
  std::unique_ptr<google::cloud::datamigration_v1::
                      DataMigrationServiceConnectionIdempotencyPolicy>
  clone() const override {
    return std::make_unique<CustomIdempotencyPolicy>(*this);
  }
  // Override inherited functions to define as needed.
};
This will override the polling policies for datamigration_v1::DataMigrationServiceClient:
  // The polling policy controls how the client waits for long-running
  // operations. `GenericPollingPolicy<>` combines existing policies.
  // In this case, keep polling until the operation completes (with success
  // or error) or 45 minutes, whichever happens first. Initially pause for
  // 10 seconds between polling requests, increasing the pause by a factor
  // of 4 until it becomes 2 minutes.
  auto options =
      google::cloud::Options{}
          .set<google::cloud::datamigration_v1::
                   DataMigrationServicePollingPolicyOption>(
              google::cloud::GenericPollingPolicy<
                  google::cloud::datamigration_v1::
                      DataMigrationServiceRetryPolicyOption::Type,
                  google::cloud::datamigration_v1::
                      DataMigrationServiceBackoffPolicyOption::Type>(
                  google::cloud::datamigration_v1::
                      DataMigrationServiceLimitedTimeRetryPolicy(
                          /*maximum_duration=*/std::chrono::minutes(45))
                          .clone(),
                  google::cloud::ExponentialBackoffPolicy(
                      /*initial_delay=*/std::chrono::seconds(10),
                      /*maximum_delay=*/std::chrono::minutes(2),
                      /*scaling=*/4.0)
                      .clone())
                  .clone());
  auto connection =
      google::cloud::datamigration_v1::MakeDataMigrationServiceConnection(
          options);
  // c1 and c2 share the same polling policies.
  auto c1 =
      google::cloud::datamigration_v1::DataMigrationServiceClient(connection);
  auto c2 =
      google::cloud::datamigration_v1::DataMigrationServiceClient(connection);