PubsubUnboundedSink (Google Cloud Dataflow SDK 1.9.1 API)

Google Cloud Dataflow SDK for Java, version 1.9.1

Class PubsubUnboundedSink<T>

  • All Implemented Interfaces:
    HasDisplayData, Serializable

    public class PubsubUnboundedSink<T>
    extends PTransform<PCollection<T>,PDone>
    A PTransform which streams messages to Pubsub.
    • The underlying implementation is just a GroupByKey followed by a ParDo which publishes as a side effect. (In the future we want to design and switch to a custom UnboundedSink implementation so as to gain access to system watermark and end-of-pipeline cleanup.)
    • We try to send messages in batches while also limiting send latency.
    • No stats are logged. Rather some counters are used to keep track of elements and batches.
    • Though some background threads are used by the underlying netty system all actual Pubsub calls are blocking. We rely on the underlying runner to allow multiple DoFn instances to execute concurrently and hide latency.
    • A failed bundle will cause messages to be resent. Thus we rely on the Pubsub consumer to dedup messages.

    NOTE: This is not the implementation used when running on the Google Cloud Dataflow service.

    See Also:
    Serialized Form
    • Constructor Detail

      • PubsubUnboundedSink

        public PubsubUnboundedSink( pubsubFactory,
                                   ValueProvider<> topic,
                                   Coder<T> elementCoder,
                                   String timestampLabel,
                                   String idLabel,
                                   int numShards)
    • Method Detail

      • getTopic

        public getTopic()
      • getTopicProvider

        public ValueProvider<> getTopicProvider()
      • getTimestampLabel

        public String getTimestampLabel()
      • getIdLabel

        public String getIdLabel()
      • getElementCoder

        public Coder<T> getElementCoder()