Pub/Sub Seek lets users replay and reprocess previously acknowledged messages, or acknowledge messages in bulk. However, we do not recommend calling Pub/Sub Seek directly from within a running Dataflow pipeline. Direct access invalidates Dataflow's watermark logic and does not work well with exactly-once processing. It also conflicts with pipeline state that already incorporates the processed data.
We recommend using Pub/Sub Seek with the following workflow:
- Make a snapshot of the subscription.
- Drain the Dataflow pipeline so that in-flight subscription messages finish processing.
- Seek the subscription to the snapshot.
- Resubmit the pipeline.
Creating a snapshot
You seek to a subscription snapshot and reprocess messages from that point. To create this snapshot using the gcloud command-line tool, run the following commands:
alias pubsub='gcloud pubsub'
pubsub snapshots create my-snapshot --subscription=seek-demo-sub
To verify that you have created the snapshot, run the following command:
pubsub snapshots list
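In a script, it can be more convenient to check for one specific snapshot than to read through the list output. The following is a minimal sketch, not an official recipe: the function name snapshot_exists is hypothetical, and it assumes that `gcloud pubsub snapshots describe` exits with a non-zero status when the snapshot cannot be found. It spells out `gcloud pubsub` in full because shell aliases are typically not expanded inside non-interactive scripts.

```shell
# Sketch: succeed only if the named snapshot can be described.
# snapshot_exists is a hypothetical helper name, not part of gcloud.
snapshot_exists() {
  # $1 is the snapshot name, for example my-snapshot.
  # Output is discarded; only the exit status is used.
  gcloud pubsub snapshots describe "$1" >/dev/null 2>&1
}

# Example call (uses the snapshot name from the commands above):
#   snapshot_exists my-snapshot && echo "snapshot is ready"
```

Running this check before you drain the pipeline avoids stopping the job only to discover that there is nothing to seek to.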
Draining the subscription
To drain the subscription and resubmit the pipeline, follow these steps:
- Navigate to the Dataflow console and click your streaming pipeline.
- In the Summary pane, click Stop Job.
- Select Drain to allow for processing of the in-flight messages and wait until the job is terminated.
- Seek your subscription to the snapshot with the gcloud command-line tool:
  pubsub subscriptions seek seek-demo-sub --snapshot=my-snapshot
- Resubmit your Dataflow pipeline.
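The console steps above can also be approximated from the command line. The sketch below is one possible arrangement under stated assumptions: the function name drain_and_seek and the POLL_SECONDS variable are hypothetical, the job ID and region are placeholders that you supply, and it assumes that `gcloud dataflow jobs describe` reports the job state as JOB_STATE_DRAINED once the drain completes. It does not resubmit the pipeline, because that step depends on how your pipeline is launched.

```shell
# Sketch: drain a Dataflow job, wait for the drain to finish, then seek.
# drain_and_seek and POLL_SECONDS are hypothetical names for illustration.
drain_and_seek() {
  job_id="$1"        # Dataflow job ID (placeholder, supplied by you)
  region="$2"        # Region the job runs in (placeholder)
  subscription="$3"  # For example seek-demo-sub
  snapshot="$4"      # For example my-snapshot

  # Ask Dataflow to drain so that in-flight messages finish processing.
  gcloud dataflow jobs drain "$job_id" --region="$region" || return 1

  # Poll the job state until the drain completes or the job ends abnormally.
  while :; do
    state="$(gcloud dataflow jobs describe "$job_id" \
      --region="$region" --format='value(currentState)')"
    case "$state" in
      JOB_STATE_DRAINED) break ;;
      JOB_STATE_FAILED|JOB_STATE_CANCELLED)
        echo "job ended in state $state; not seeking" >&2
        return 1 ;;
    esac
    sleep "${POLL_SECONDS:-30}"
  done

  # Only after the job has drained, rewind the subscription to the snapshot.
  gcloud pubsub subscriptions seek "$subscription" --snapshot="$snapshot"
}

# Example call:
#   drain_and_seek JOB_ID REGION seek-demo-sub my-snapshot
```

Seeking only after the job reports a drained state mirrors the order of the steps above, so the seek never runs while the pipeline is still consuming from the subscription.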