The Cloud Pub/Sub subscriber data APIs, such as pull, provide limited access to message data. Normally, acknowledged messages are inaccessible to subscribers of a given subscription. In addition, subscriber clients must process every message in a subscription even if only a subset is needed.
The Seek feature extends subscriber functionality by allowing you to alter the acknowledgement state of messages in bulk. For example, you can replay previously acknowledged messages or discard messages in bulk. In addition, you can copy the state of one subscription to another by using seek in combination with a Snapshot, introduced as part of the seek feature. Note that recovering acknowledged messages generally requires the source subscription to be configured in advance and will result in additional storage fees.
These features are described below. For a working example, see the quickstart.
How does it work?
Seeking to a timestamp
Seeking to a time has the effect of marking every message received by
Cloud Pub/Sub before the time as acknowledged, and all messages
received after the time as unacknowledged. You can seek to a time in the future
to discard messages. To replay and reprocess previously acknowledged messages,
seek to a prior time. The message publication time is generated by the
Cloud Pub/Sub servers (see
publishTime in the API
reference). Note that this approach is imprecise due to possible clock skew
among Cloud Pub/Sub servers, as well as the fact that
Cloud Pub/Sub has to work with the arrival time of the publish request
rather than when an event occurred in the source system.
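The rule above can be sketched as a small in-memory model. This is only an illustration of the described semantics, not the Cloud Pub/Sub client library: the `Message` class and `seek_to_time` function are hypothetical names, and treating a message published exactly at the seek time as unacknowledged is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Message:
    """Hypothetical stand-in for a message held by a subscription."""
    data: str
    publish_time: datetime  # assigned by the server when the publish request arrives
    acked: bool = False


def seek_to_time(messages, target):
    """Apply the seek-to-time rule in place.

    Messages published before `target` become acknowledged; all others
    become unacknowledged (the exact-boundary case is an assumption).
    """
    for msg in messages:
        msg.acked = msg.publish_time < target


now = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
backlog = [
    Message("a", now - timedelta(hours=2), acked=True),
    Message("b", now - timedelta(hours=1), acked=True),
    Message("c", now - timedelta(minutes=5), acked=False),
]

# Replay: seek 90 minutes into the past, so "b" and "c" are due for delivery again.
seek_to_time(backlog, now - timedelta(minutes=90))
replayed = [m.data for m in backlog if not m.acked]

# Discard: seek to a future time, so the whole backlog is marked acknowledged.
seek_to_time(backlog, now + timedelta(minutes=1))
remaining = [m.data for m in backlog if not m.acked]
```

In this model, `replayed` holds the messages published at or after the earlier seek time, and `remaining` is empty after the seek to a future time.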
To seek to a prior time, you must first configure your subscription to retain acknowledged messages:
- An acknowledged message is retained in a subscription only if the retain_acked_messages property is set to true (the default is false), for up to message_retention_duration after it is published (the default is 7 days). Acknowledged messages are retained only if they are acknowledged after the subscription’s retain_acked_messages is set to true.
- An unacknowledged message is retained in a subscription for up to message_retention_duration after it is published (the default is 7 days).
- Both the retain_acked_messages and message_retention_duration properties of a subscription can be specified at subscription creation, or updated for an existing subscription.
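The retention rules in this list can be expressed as a single predicate. The sketch below is a simplified model, assuming a hypothetical `is_retained` helper whose parameters mirror the subscription properties named above; it does not call the Cloud Pub/Sub API, and its handling of exact boundary times is an assumption.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_RETENTION = timedelta(days=7)  # default message_retention_duration


def is_retained(publish_time, now, *, acked_at=None, retain_acked_since=None,
                message_retention_duration=DEFAULT_RETENTION):
    """Return True if this simplified model keeps the message in the subscription.

    acked_at: when the message was acknowledged, or None if unacknowledged.
    retain_acked_since: when retain_acked_messages was set to true,
        or None if it is false (the default).
    """
    within_window = now - publish_time <= message_retention_duration
    if acked_at is None:
        # Unacknowledged messages only need to be inside the retention window.
        return within_window
    if retain_acked_since is None:
        # Acknowledged messages are not retained unless the property is enabled.
        return False
    # Retained only if acknowledged after the property was enabled.
    return within_window and acked_at >= retain_acked_since


now = datetime(2020, 1, 10, tzinfo=timezone.utc)
published = now - timedelta(days=3)
acked = now - timedelta(days=2)

unacked_kept = is_retained(published, now)
acked_dropped = is_retained(published, now, acked_at=acked)
acked_kept = is_retained(published, now, acked_at=acked,
                         retain_acked_since=now - timedelta(days=2, hours=12))
acked_too_early = is_retained(published, now, acked_at=acked,
                              retain_acked_since=now - timedelta(days=1))
expired = is_retained(now - timedelta(days=8), now)
```

The `acked_too_early` case shows why the subscription must be configured in advance: enabling retention after a message has been acknowledged does not bring that message back.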
Seeking to a snapshot
The snapshot feature allows you to capture the message acknowledgment state of a subscription. Once a snapshot is created, it retains all messages that were unacknowledged in the source subscription (at the time of the snapshot's creation), as well as any messages published to the topic thereafter. You can replay these unacknowledged messages by using a snapshot to seek to any of the topic's subscriptions.
Unlike with seeking to a time, you don't need to perform any special subscription configuration to seek to a snapshot—you just need to create the snapshot ahead of time. For example, you might create a snapshot when deploying new subscriber code, in case you need to recover from unexpected or erroneous acknowledgements.
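To make the snapshot behavior concrete, here is a minimal in-memory sketch. The `Snapshot` class and both functions are hypothetical names chosen for illustration; the model represents a topic as a list of message IDs in publish order and is not how the service stores data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical model of a snapshot of one subscription."""
    unacked_at_creation: frozenset  # IDs unacknowledged when the snapshot was taken
    topic_position: int             # number of messages published to the topic so far


def create_snapshot(subscription_unacked, topic_log):
    """Capture the acknowledgment state of a subscription."""
    return Snapshot(frozenset(subscription_unacked), len(topic_log))


def seek_to_snapshot(snapshot, topic_log):
    """Return the unacknowledged IDs a subscription has after seeking.

    That is everything unacknowledged at snapshot creation, plus everything
    published to the topic after the snapshot was created.
    """
    return set(snapshot.unacked_at_creation) | set(topic_log[snapshot.topic_position:])


topic_log = ["m1", "m2", "m3"]
prod_unacked = {"m2", "m3"}  # "m1" was already acknowledged

# Take a snapshot before deploying new subscriber code.
snap = create_snapshot(prod_unacked, topic_log)

# The new code erroneously acknowledges everything while "m4" is published.
topic_log.append("m4")
prod_unacked = set()

# Recover by seeking the subscription back to the snapshot.
prod_unacked = seek_to_snapshot(snap, topic_log)

# The same snapshot can be applied to another subscription on the topic.
test_unacked = seek_to_snapshot(snap, topic_log)
```

After the seek, the erroneously acknowledged messages and the newly published one are unacknowledged again, while "m1", acknowledged before the snapshot, stays acknowledged.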
Snapshots expire and are deleted in the following cases (whichever comes first):
- The snapshot reaches a lifespan of seven days.
- The oldest unacknowledged message in the snapshot exceeds the message retention duration.
For example, consider a snapshot of a subscription with a backlog where the oldest unacknowledged message is a day old. The snapshot will expire after six days, rather than seven. This is necessary for snapshots to offer strong at-least-once delivery guarantees.
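The arithmetic in this example can be checked with a short calculation. This is a sketch assuming a hypothetical `snapshot_expiration` helper and a message retention duration of seven days; it simply takes the earlier of the two limits listed above.

```python
from datetime import datetime, timedelta, timezone

SNAPSHOT_MAX_LIFESPAN = timedelta(days=7)


def snapshot_expiration(created_at, oldest_unacked_publish_time,
                        message_retention_duration=timedelta(days=7)):
    """Return the earlier of the two expiration conditions.

    The seven-day retention default here is an assumption of this sketch.
    """
    lifespan_limit = created_at + SNAPSHOT_MAX_LIFESPAN
    retention_limit = oldest_unacked_publish_time + message_retention_duration
    return min(lifespan_limit, retention_limit)


created = datetime(2020, 1, 8, tzinfo=timezone.utc)

# The oldest unacknowledged message is one day old when the snapshot is created.
expires = snapshot_expiration(created, created - timedelta(days=1))
days_until_expiry = (expires - created).days

# With an empty backlog age (oldest message published at creation time),
# the seven-day lifespan is the limiting condition.
fresh_days = (snapshot_expiration(created, created) - created).days
```

Here `days_until_expiry` is 6, matching the example, and `fresh_days` is 7.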
Seek operations are strictly consistent with respect to message delivery guarantees. This means that any message that is to become unacknowledged based on the seek condition is guaranteed to be eventually delivered at least once after the seek operation succeeds. This does not mean, however, that delivered messages instantly become consistent with the seek operation. So a message that was published before the seek timestamp or that is acknowledged in a snapshot may still be delivered after the seek operation. In a sense, message delivery operates as an eventually consistent system with respect to the seek operation: it might take as long as a minute for the operation to take full effect.
- Update subscriber code safely. A concern with deploying new subscriber code is that the new executable may erroneously acknowledge messages, leading to message loss. Incorporating snapshots into your deployment process gives you a way to recover from bugs in new subscriber code.
- Recover from unexpected subscriber problems. In cases where subscriber problems are not associated with a specific deployment event, you might not have a relevant snapshot. In this case, if you have enabled acknowledged message retention for a subscription, seeking to a past time gives you a way to recover from the error.
- Save processing time and cost. Perform a bulk acknowledgement on a large backlog of messages that are no longer relevant.
- Test subscriber code on known data. When testing subscriber code for performance and consistency, it is useful to use the same data in every run. Snapshots enable this with strong semantics, and they can be applied to any subscription on a given topic, including a newly created one.
You can use Cloud Pub/Sub with Cloud Dataflow. However, we do not recommend direct access to Cloud Pub/Sub Seek from within a running Cloud Dataflow pipeline. For the recommended workflow, see Using Cloud Pub/Sub with Cloud Dataflow.