Retrying Background Functions

This document describes how to enable retrying for background functions.

Semantics of retry

By default, without retries enabled, the semantics of executing a background function are "best effort." This means that while the goal is to execute the function exactly once, this is not guaranteed.

When you enable retries on a background function, however, the semantics change to "at-least-once" delivery (with the caveat that a function that keeps failing for days may eventually time out).

Why background functions fail to complete

On rare occasions, a function might exit prematurely due to an internal error, and by default the function might or might not be automatically retried.

More typically, a background function may fail to successfully complete due to errors thrown in the function code itself. Some of the reasons this might happen are as follows:

  • The function contains a bug and the runtime throws an exception.
  • The function cannot reach a service endpoint, or times out while trying to reach the endpoint.
  • The function intentionally throws an exception (for example, when a parameter fails validation).
  • When functions written in Node.js return a rejected promise or pass a non-null value to a callback.

In any of the above cases, the function stops executing by default and the event is discarded. If you want to retry the function when an error occurs, you can change the default retry policy by setting the "retry on failure" property. This causes the event to be retried repeatedly for up to multiple days until the function successfully completes.

Enabling and disabling retries

To enable or disable retries, you can either use the gcloud command-line tool or the GCP Console. By default, retries are disabled.

Using the gcloud command-line tool

To enable retries via the gcloud command-line tool, include the --retry flag when deploying your function:

gcloud functions deploy FUNCTION_NAME --retry FLAGS...

To disable retries, re-deploy the function without the --retry flag:

gcloud functions deploy FUNCTION_NAME FLAGS...

Using the GCP Console

You can enable or disable retries in the GCP Console as follows:

  1. Go to the Cloud Functions Overview page in the Cloud Platform Console.

  2. Click Create function. Alternatively, click an existing function to go to its details page and click Edit.

  3. Fill in the required fields for your function.

  4. Ensure the Trigger field is set to a background function trigger type, such as Cloud Pub/Sub or Cloud Storage.

  5. Expand the advanced settings by clicking More.

  6. Check or uncheck the box labeled Retry on failure.

Best practices

This section describes best practices for using retries.

Use retry to handle transient errors

Because your function is retried continuously until successful execution, permanent errors like bugs should be eliminated from your code thorough testing before enabling retries. Retries are best used to handle intermittent/transient failures that have a high likelihood of resolution upon retrying, such as a flaky service endpoint or timeout.

Set an end condition to avoid infinite retry loops

It is best practice to protect your function against continuous looping when using retries. You can do this by including a well-defined end condition, before the function begins processing. A simple yet effective approach is to discard events with timestamps older than a certain time. This helps to avoid excessive executions when failures are either persistent or longer-lived than expected.

For example, this code snippet discards all events older than 10 seconds:

Node.js 6

/**
 * Background Cloud Function that only executes within
 * a certain time period after the triggering event
 *
 * @param {object} event The Cloud Functions event.
 * @param {function} callback The callback function.
 */
exports.avoidInfiniteRetries = (event, callback) => {
  const eventAge = Date.now() - Date.parse(event.timestamp);
  const eventMaxAge = 10000;

  // Ignore events that are too old
  if (eventAge > eventMaxAge) {
    console.log(`Dropping event ${event} with age ${eventAge} ms.`);
    callback();
    return;
  }

  // Do what the function is supposed to do
  console.log(`Processing event ${event} with age ${eventAge} ms.`);
  callback();
};

Node.js 8 (Beta)

/**
 * Background Cloud Function that only executes within a certain time
 * period after the triggering event to avoid infinite retry loops.
 *
 * @param {object} data The event payload.
 * @param {object} context The event metadata.
 */
exports.avoidInfiniteRetries = (data, context) => {
  const eventAge = Date.now() - Date.parse(context.timestamp);
  const eventMaxAge = 10000;

  // Ignore events that are too old
  if (eventAge > eventMaxAge) {
    console.log(`Dropping event ${context.eventId} with age ${eventAge} ms.`);
    return;
  }

  // Do what the function is supposed to do
  console.log(`Processing event ${context.eventId} with age ${eventAge} ms.`);
};

Python (Beta)

from datetime import datetime, timezone
# The 'python-dateutil' package must be included in requirements.txt.
from dateutil import parser

def avoid_infinite_retries(data, context):
    """Background Cloud Function that only executes within a certain
    time period after the triggering event.

    Args:
        data (dict): The event payload.
        context (google.cloud.functions.Context): The event metadata.
    Returns:
        None; output is written to Stackdriver Logging
    """

    timestamp = context.timestamp

    event_time = parser.parse(timestamp)
    event_age = (datetime.now(timezone.utc) - event_time).total_seconds()
    event_age_ms = event_age * 1000

    # Ignore events that are too old
    max_age_ms = 10000
    if event_age_ms > max_age_ms:
        print('Dropped {} (age {}ms)'.format(context.event_id, event_age_ms))
        return 'Timeout'

    # Do what the function is supposed to do
    print('Processed {} (age {}ms)'.format(context.event_id, event_age_ms))
    return

Distinguish between retriable and fatal errors

If your function has retries enabled, any unhandled error will trigger a retry. Make sure that your code captures any errors that should not result in a retry.

Node.js 6

/**
 * Background Cloud Function that demonstrates
 * how to toggle retries using a promise
 *
 * @param {object} event The Cloud Functions event.
 * @param {object} event.data Data included with the event.
 * @param {object} event.data.retry Whether or not to retry the function.
 */
exports.retryPromise = (event) => {
  const tryAgain = !!event.data.retry;

  if (tryAgain) {
    throw new Error(`Retrying...`);
  } else {
    return Promise.reject(new Error('Not retrying...'));
  }
};

/**
 * Background Cloud Function that demonstrates
 * how to toggle retries using a callback
 *
 * @param {object} event The Cloud Functions event.
 * @param {object} event.data Data included with the event.
 * @param {object} event.data.retry Whether or not to retry the function.
 * @param {function} callback The callback function.
 */
exports.retryCallback = (event, callback) => {
  const tryAgain = !!event.data.retry;
  const err = new Error('Error!');

  if (tryAgain) {
    console.error('Retrying:', err);
    callback(err);
  } else {
    console.error('Not retrying:', err);
    callback();
  }
};

Node.js 8 (Beta)

/**
 * Background Cloud Function that demonstrates
 * how to toggle retries using a promise
 *
 * @param {object} data The event payload.
 * @param {object} context The event metadata.
 */
exports.retryPromise = (data, context) => {
  const tryAgain = !!data.retry;

  if (tryAgain) {
    throw new Error(`Retrying...`);
  } else {
    return new Error('Not retrying...');
  }
};

Python (Beta)

from google.cloud import error_reporting
error_client = error_reporting.Client()


def retry_or_not(data, context):
    """Background Cloud Function that demonstrates how to toggle retries.

    Args:
        data (dict): The event payload.
        context (google.cloud.functions.Context): The event metadata.
    Returns:
        None; output is written to Stackdriver Logging
    """

    if data.data.get('retry'):
        try_again = True
    else:
        try_again = False

    try:
        raise RuntimeError('I failed you')
    except RuntimeError:
        error_client.report_exception()
        if try_again:
            raise  # Raise the exception and try again
        else:
            pass   # Swallow the exception and don't retry

Make retryable background functions idempotent

Background functions that can retried must be idempotent. Here are some general guidelines for making a background function idempotent:

  • Many external APIs (such as Stripe) let you supply an idempotency key as a parameter. If you are using such an API, you should use the event ID as the idempotency key.
  • Idempotency works well with at-least-once delivery, because it makes it safe to retry. So a general best practice for writing reliable code is to combine idempotency with retries.
  • Make sure that your code is internally idempotent. For example:
    • Make sure that mutations can happen more than once without changing the outcome.
    • Query database state in a transaction before mutating the state.
    • Make sure that all side effects are themselves idempotent.
  • Impose a transactional check outside the function, independent of the code. For example, persist state somewhere recording that a given event ID has already been processed.
  • Deal with duplicate function calls out-of-band. For example, have a separate clean up process that cleans up after duplicate function calls.

Next steps

Was this page helpful? Let us know how we did:

Send feedback about...

Cloud Functions Documentation