Cloud Pub/Sub Seek allows users to replay and reprocess previously acknowledged messages or to acknowledge messages in bulk. However, we do not recommend direct access to Cloud Pub/Sub Seek from within a running Cloud Dataflow pipeline. Direct access invalidates Cloud Dataflow's watermark logic and does not work well with exactly-once processing. In addition, direct access conflicts with the state of a pipeline that incorporates processed data.
We recommend using Cloud Pub/Sub Seek with the following workflow:
- Make a snapshot of the subscription.
- Drain the Cloud Dataflow pipeline so that in-flight messages finish processing.
- Seek the subscription to the snapshot.
- Restart the pipeline.
Creating a snapshot
Seeking rewinds a subscription to a snapshot so that the messages can be processed again. To create this snapshot using the gcloud command-line tool, run the following commands:
alias pubsub='gcloud pubsub'
pubsub snapshots create my-snapshot --subscription=seek-demo-sub
To verify that you have created the snapshot, run the command:
pubsub snapshots list
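The two commands above can also be wrapped in a small script so that the snapshot is confirmed before anything is drained. The following is a minimal sketch, not an official recipe: the `run` helper, the `DRY_RUN` variable, and the function name are hypothetical, and the snapshot and subscription names are simply the ones used in this guide. It calls `gcloud pubsub` directly because shell aliases are not expanded in non-interactive scripts.

```shell
#!/usr/bin/env bash
# Names reused from the commands in this guide; replace with your own.
SNAPSHOT="my-snapshot"
SUBSCRIPTION="seek-demo-sub"

# Hypothetical helper: prints the command when DRY_RUN=1, otherwise runs it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

create_and_verify_snapshot() {
  # Create the snapshot from the subscription's current state.
  run gcloud pubsub snapshots create "$SNAPSHOT" \
    --subscription="$SUBSCRIPTION" || return 1
  # Describe the snapshot; a non-zero exit here means it was not found.
  run gcloud pubsub snapshots describe "$SNAPSHOT" || return 1
}
```

Setting `DRY_RUN=1` before calling `create_and_verify_snapshot` prints the commands without contacting the service, which is a cheap way to check the names before running them for real.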
Draining the subscription
To drain the subscription and restart the pipeline, follow these steps:
- Navigate to the Cloud Dataflow console and click on your streaming pipeline.
- In the Summary pane, click on Stop Job.
- Select Drain to allow for processing of the in-flight messages and wait until the job is terminated.
- Seek your subscription to the snapshot with the following command:
pubsub subscriptions seek seek-demo-sub --snapshot=my-snapshot
- Restart your Cloud Dataflow pipeline.
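If you prefer the command line to the console, the drain and seek steps above could be scripted roughly as follows. This is a sketch under assumptions: `JOB_ID` and `REGION` are placeholders, the `run` helper and `DRY_RUN` variable are hypothetical, and the `JOB_STATE_DRAINED` value is assumed to be what `gcloud dataflow jobs describe` reports for a drained job. The restart step is left out because the launch command depends on how your pipeline is deployed.

```shell
#!/usr/bin/env bash
# Placeholders (assumptions): set these to your own job ID and region.
JOB_ID="${JOB_ID:-my-job-id}"
REGION="${REGION:-us-central1}"
# Names reused from the commands in this guide.
SUBSCRIPTION="seek-demo-sub"
SNAPSHOT="my-snapshot"

# Hypothetical helper: prints the command when DRY_RUN=1, otherwise runs it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

drain_and_seek() {
  # Ask Cloud Dataflow to drain the job so in-flight messages finish.
  run gcloud dataflow jobs drain "$JOB_ID" --region="$REGION" || return 1

  # Poll until the job reports the drained state (skipped in dry-run mode).
  # Assumption: the state string is JOB_STATE_DRAINED. This loop has no
  # timeout, so a failed job would need to be interrupted manually.
  if [ "${DRY_RUN:-0}" != "1" ]; then
    until [ "$(gcloud dataflow jobs describe "$JOB_ID" --region="$REGION" \
                --format='value(currentState)')" = "JOB_STATE_DRAINED" ]; do
      sleep 30
    done
  fi

  # Only after the drain completes, rewind the subscription to the snapshot.
  run gcloud pubsub subscriptions seek "$SUBSCRIPTION" \
    --snapshot="$SNAPSHOT" || return 1
  echo "Seek complete; restart the pipeline with your usual launch command."
}
```

Seeking only after the drain has finished keeps the running pipeline from observing the rewind, which is the situation the warning at the top of this guide is about.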