Replaying and discarding messages

The Cloud Pub/Sub subscriber data APIs, such as pull, provide limited access to message data. Normally, acknowledged messages are inaccessible to subscribers of a given subscription. In addition, subscriber clients must process every message in a subscription even if only a subset is needed.

The Seek feature extends subscriber functionality by allowing you to alter the acknowledgement state of messages in bulk. For example, you can replay previously acknowledged messages or discard messages in bulk. In addition, you can copy the state of one subscription to another by using seek in combination with a Snapshot, introduced as part of the seek feature. Note that recovering acknowledged messages generally requires the source subscription to be configured in advance and will result in additional storage fees.

These features are described below. For a working example, see the quickstart.

How does it work?

Seeking to a timestamp

Seeking to a time has the effect of marking every message received by Cloud Pub/Sub before the time as acknowledged, and all messages received after the time as unacknowledged. You can seek to a time in the future to discard messages. To replay and reprocess previously acknowledged messages, seek to a prior time. The message publication time is generated by the Cloud Pub/Sub servers (see publishTime in the API reference). Note that this approach is imprecise due to possible clock skew among Cloud Pub/Sub servers, as well as the fact that Cloud Pub/Sub has to work with the arrival time of the publish request rather than when an event occurred in the source system.
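The rule above can be illustrated with a small, self-contained Python model. This is only a sketch of the semantics, not the Cloud Pub/Sub client library: the Message class, the seek_to_time function, and the treatment of a message stamped exactly at the seek time are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Message:
    data: str
    publish_time: datetime  # in the real service this is assigned by the servers
    acked: bool = False


def seek_to_time(messages, target):
    """Mark messages published before `target` as acknowledged, the rest as unacknowledged."""
    for m in messages:
        # Assumption: a message stamped exactly at `target` counts as unacknowledged.
        m.acked = m.publish_time < target


def t(hour):
    """Hypothetical helper: a UTC timestamp on an arbitrary example day."""
    return datetime(2024, 1, 1, hour, tzinfo=timezone.utc)


backlog = [
    Message("a", t(1), acked=True),
    Message("b", t(2), acked=True),
    Message("c", t(3)),
]

# Replay: seeking to a prior time makes "b" and "c" unacknowledged again.
seek_to_time(backlog, t(2))
replayed = [m.data for m in backlog if not m.acked]

# Discard: seeking to a time after every message acknowledges the whole backlog.
seek_to_time(backlog, t(23))
remaining = [m.data for m in backlog if not m.acked]
```

In this model the replay leaves "b" and "c" unacknowledged, and the later seek leaves nothing to deliver. The real service is less precise than this, for the clock-skew reasons described above.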

To seek to a prior time, you must first configure your subscription to retain acknowledged messages (the retainAckedMessages subscription property in the API reference).
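The following toy model sketches why this configuration has to be in place before the messages are acknowledged. It is not the client library: ToyMessageStore and its methods are hypothetical, and it simplifies the real service by dropping an acknowledged message immediately when retention is off.

```python
from datetime import datetime, timedelta, timezone


class ToyMessageStore:
    """Hypothetical stand-in for one subscription's message storage."""

    def __init__(self, retain_acked_messages):
        self.retain_acked_messages = retain_acked_messages
        self.messages = {}  # message id -> {"publish_time": ..., "acked": ...}

    def publish(self, msg_id, publish_time):
        self.messages[msg_id] = {"publish_time": publish_time, "acked": False}

    def ack(self, msg_id):
        if self.retain_acked_messages:
            # Kept in storage, so a later seek can make it deliverable again.
            self.messages[msg_id]["acked"] = True
        else:
            # Simplification: without retention the message is gone for good.
            del self.messages[msg_id]

    def seek(self, target):
        for m in self.messages.values():
            m["acked"] = m["publish_time"] < target

    def unacked(self):
        return sorted(i for i, m in self.messages.items() if not m["acked"])


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
results = {}
for retain in (False, True):
    store = ToyMessageStore(retain_acked_messages=retain)
    store.publish("m1", start + timedelta(minutes=1))
    store.publish("m2", start + timedelta(minutes=2))
    store.ack("m1")
    store.ack("m2")
    store.seek(start)  # seek to a time before both messages
    results[retain] = store.unacked()
```

With retention off, the seek finds nothing to replay. With retention on, both messages become unacknowledged again, which is also why retaining them incurs storage cost.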

Seeking to a snapshot

The snapshot feature allows you to capture the message acknowledgment state of a subscription. Once a snapshot is created, it retains all messages that were unacknowledged in the source subscription (at the time of the snapshot's creation), as well as any messages published to the topic thereafter. You can replay these unacknowledged messages by using a snapshot to seek to any of the topic's subscriptions.

Unlike with seeking to a time, you don't need to perform any special subscription configuration to seek to a snapshot—you just need to create the snapshot ahead of time. For example, you might create a snapshot when deploying new subscriber code, in case you need to recover from unexpected or erroneous acknowledgements.
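As a rough sketch of these semantics, the toy model below captures a subscription's acknowledgment state and later applies it to the same subscription and to a new one. ToyTopic, ToySubscription, create_snapshot, and seek_to_snapshot are hypothetical names, not the client library API, and the model ignores expiration and storage.

```python
class ToyTopic:
    """Hypothetical topic: just the message IDs in publish order."""

    def __init__(self):
        self.published = []


class ToySubscription:
    """Hypothetical subscription: tracks which of the topic's messages it has acknowledged."""

    def __init__(self, topic):
        self.topic = topic
        self.acked = set()

    def unacked(self):
        return [m for m in self.topic.published if m not in self.acked]


def create_snapshot(subscription):
    # Freeze the acknowledgment state. Anything not in this set, including
    # messages published later, counts as unacknowledged in the snapshot.
    return frozenset(subscription.acked)


def seek_to_snapshot(subscription, snapshot):
    subscription.acked = set(snapshot)


topic = ToyTopic()
topic.published += ["m1", "m2", "m3"]

prod = ToySubscription(topic)
prod.acked.add("m1")
snap = create_snapshot(prod)  # taken just before deploying new subscriber code

topic.published.append("m4")
prod.acked.update({"m2", "m3", "m4"})  # the new code acknowledges messages in error
lost = prod.unacked()

seek_to_snapshot(prod, snap)
recovered = prod.unacked()

# The same snapshot can be applied to another subscription on the topic.
test_sub = ToySubscription(topic)
seek_to_snapshot(test_sub, snap)
replayed_elsewhere = test_sub.unacked()
```

After the erroneous acknowledgements nothing is left to deliver, and seeking back to the snapshot restores "m2", "m3", and the later "m4" on either subscription.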

Snapshots expire and are deleted in the following cases (whichever comes first):

  • The snapshot reaches a lifespan of seven days.
  • The oldest unacknowledged message in the snapshot exceeds the message retention duration.

For example, consider a snapshot of a subscription with a backlog where the oldest unacknowledged message is a day old. Assuming a message retention duration of seven days, that message has six days of retention left, so the snapshot will expire after six days, rather than seven. This is necessary for snapshots to offer strong at-least-once delivery guarantees.
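The two expiration rules amount to taking the earlier of two deadlines. The snippet below works through the example, assuming a seven-day message retention duration; snapshot_expiry and its parameter names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

MAX_SNAPSHOT_LIFESPAN = timedelta(days=7)


def snapshot_expiry(created_at, oldest_unacked_publish_time, message_retention):
    """Return the earlier of the lifespan deadline and the retention deadline."""
    lifespan_deadline = created_at + MAX_SNAPSHOT_LIFESPAN
    retention_deadline = oldest_unacked_publish_time + message_retention
    return min(lifespan_deadline, retention_deadline)


created = datetime(2024, 1, 8, tzinfo=timezone.utc)
expiry = snapshot_expiry(
    created_at=created,
    oldest_unacked_publish_time=created - timedelta(days=1),  # a day old
    message_retention=timedelta(days=7),  # assumed retention duration
)
days_until_expiry = (expiry - created).days  # 6
```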

Eventual consistency

Seek operations are strictly consistent with respect to message delivery guarantees. This means that any message that is to become unacknowledged based on the seek condition is guaranteed to be eventually delivered at least once after the seek operation succeeds. This does not mean, however, that delivered messages instantly become consistent with the seek operation. So a message that was published before the seek timestamp or that is acknowledged in a snapshot may still be delivered after the seek operation. In a sense, message delivery operates as an eventually consistent system with respect to the seek operation: it might take as long as a minute for the operation to take full effect.
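One possible way for a subscriber to tolerate such late deliveries after seeking to a timestamp is to compare each message's publish time with the seek time, and to acknowledge without processing anything the seek was meant to discard. The sketch below assumes the subscriber knows the seek time; should_process is a hypothetical helper, and the check does not apply to snapshot seeks, where publish time alone does not determine acknowledgment state.

```python
from datetime import datetime, timedelta, timezone


def should_process(publish_time, seek_time):
    """Return False for messages that the seek was meant to mark as acknowledged."""
    return publish_time >= seek_time


seek_time = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)

# Deliveries seen shortly after the seek: one straggler and one new message.
deliveries = [
    ("straggler", seek_time - timedelta(seconds=5)),
    ("fresh", seek_time + timedelta(seconds=5)),
]

processed = [name for name, ts in deliveries if should_process(ts, seek_time)]
skipped = [name for name, ts in deliveries if not should_process(ts, seek_time)]
```

Making message processing idempotent is a more general safeguard, since delivery is at least once in any case.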

Use cases

  • Update subscriber code safely. A concern with deploying new subscriber code is that the new executable may erroneously acknowledge messages, leading to message loss. Incorporating snapshots into your deployment process gives you a way to recover from bugs in new subscriber code.
  • Recover from unexpected subscriber problems. In cases where subscriber problems are not associated with a specific deployment event, you might not have a relevant snapshot. In this case, if you have enabled acknowledged message retention for a subscription, seeking to a past time gives you a way to recover from the error.
  • Save processing time and cost. Perform a bulk acknowledgement on a large backlog of messages that are no longer relevant.
  • Test subscriber code on known data. When testing subscriber code for performance and consistency, it is useful to use the same data in every run. Snapshots enable this with strong semantics, and they can be applied to any subscription on a given topic, including a newly created one.

What's next

You can use Cloud Pub/Sub with Cloud Dataflow. However, we do not recommend direct access to Cloud Pub/Sub Seek from within a running Cloud Dataflow pipeline. For the recommended workflow, see Using Cloud Pub/Sub with Cloud Dataflow.
