Pub/Sub retry policy warning in Dataflow job

Problem

After updating to Apache Beam 2.34.0, the following warning appears in the log of a Dataflow job that pulls data from a Pub/Sub subscription:

Pub/Sub subscription projects/<project_name>/subscriptions/<subscription_name> has a retry policy configured which will not work as expected when Dataflow pulls from subscription. See https://cloud.google.com/dataflow/docs/concepts/streaming-with-cloud-pubsub for further details.

Environment

  • Apache Beam 2.34.0
  • Dataflow
  • Pub/Sub

Solution

  1. Set the retry policy on the subscription to Retry immediately. This clears the warning in the job.
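
The step above can also be done programmatically. The following is a minimal sketch, assuming the google-cloud-pubsub Python client library is installed and application credentials are configured. The helper names subscription_path and clear_retry_policy are hypothetical and chosen for illustration. The sketch relies on the Pub/Sub behavior that a subscription with no retry policy set uses the default policy, which retries as soon as possible.

```python
def subscription_path(project_id: str, subscription_id: str) -> str:
    """Return the fully qualified subscription name used by the Pub/Sub API."""
    return f"projects/{project_id}/subscriptions/{subscription_id}"


def clear_retry_policy(project_id: str, subscription_id: str):
    """Reset the subscription's retry policy to the default (retry immediately)."""
    # Imported lazily so this module still loads where the client library
    # is not installed (assumption: google-cloud-pubsub is available at call time).
    from google.cloud import pubsub_v1
    from google.protobuf import field_mask_pb2

    path = subscription_path(project_id, subscription_id)
    subscriber = pubsub_v1.SubscriberClient()
    with subscriber:
        # Read the current subscription so the update carries its name and topic.
        current = subscriber.get_subscription(request={"subscription": path})
        # retry_policy is left unset here; naming it in the update mask clears it.
        updated = pubsub_v1.types.Subscription(name=current.name, topic=current.topic)
        mask = field_mask_pb2.FieldMask(paths=["retry_policy"])
        return subscriber.update_subscription(
            request={"subscription": updated, "update_mask": mask}
        )
```

A call such as clear_retry_policy("my-project", "my-sub"), where both IDs are placeholders, needs permission to update the subscription.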

Cause

This issue occurs when the retry policy on the subscription is set to Retry after exponential backoff delay. Pub/Sub dead-letter topics and exponential backoff delay retry policies are not fully supported by Dataflow, for the following reasons:

  • Dataflow does not NACK (negatively acknowledge) messages. Instead, it retries message processing indefinitely while continually extending the acknowledgment deadline for the message.
  • Dataflow may acknowledge messages before the pipeline has fully processed the data. When reading from Pub/Sub, Dataflow acknowledges messages once they are written to shuffle or to a sink.
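
To find out which of these settings a subscription has, its configuration can be inspected, for example from the JSON printed by gcloud pubsub subscriptions describe <subscription_name> --format=json. The sketch below is illustrative only: the function name unsupported_policies and the sample values are made up, and it assumes the JSON uses the Pub/Sub API field names retryPolicy and deadLetterPolicy.

```python
import json
from typing import List


def unsupported_policies(subscription: dict) -> List[str]:
    """List the subscription settings that Dataflow does not fully support."""
    findings = []
    # A retryPolicy entry means exponential backoff is configured;
    # its absence means the default "retry immediately" behavior.
    if "retryPolicy" in subscription:
        findings.append("exponential backoff retry policy")
    if "deadLetterPolicy" in subscription:
        findings.append("dead-letter topic")
    return findings


# Hypothetical describe output, trimmed to the relevant fields.
sample = json.loads("""
{
  "name": "projects/my-project/subscriptions/my-sub",
  "retryPolicy": {"minimumBackoff": "10s", "maximumBackoff": "600s"}
}
""")

for finding in unsupported_policies(sample):
    print("Not fully supported by Dataflow:", finding)
```

An empty result means neither setting is configured on the subscription.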

More details on these reasons are provided in the Dataflow documentation.