Google Cloud Dataflow SDK for Java, version 1.9.1
Class PubsubUnboundedSink<T>
- java.lang.Object
-
- com.google.cloud.dataflow.sdk.transforms.PTransform<PCollection<T>,PDone>
-
- com.google.cloud.dataflow.sdk.io.PubsubUnboundedSink<T>
-
- All Implemented Interfaces:
- HasDisplayData, Serializable
public class PubsubUnboundedSink<T> extends PTransform<PCollection<T>,PDone>
A PTransform which streams messages to Pubsub.- The underlying implementation is just a
GroupByKey
followed by aParDo
which publishes as a side effect. (In the future we want to design and switch to a customUnboundedSink
implementation so as to gain access to system watermark and end-of-pipeline cleanup.) - We try to send messages in batches while also limiting send latency.
- No stats are logged. Rather some counters are used to keep track of elements and batches.
- Though some background threads are used by the underlying netty system all actual Pubsub
calls are blocking. We rely on the underlying runner to allow multiple
DoFn
instances to execute concurrently and hide latency. - A failed bundle will cause messages to be resent. Thus we rely on the Pubsub consumer to dedup messages.
NOTE: This is not the implementation used when running on the Google Cloud Dataflow service.
- See Also:
- Serialized Form
-
-
Field Summary
-
Fields inherited from class com.google.cloud.dataflow.sdk.transforms.PTransform
name
-
-
Constructor Summary
Constructors Constructor and Description PubsubUnboundedSink(com.google.cloud.dataflow.sdk.util.PubsubClient.PubsubClientFactory pubsubFactory, ValueProvider<com.google.cloud.dataflow.sdk.util.PubsubClient.TopicPath> topic, Coder<T> elementCoder, String timestampLabel, String idLabel, int numShards)
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method and Description PDone
apply(PCollection<T> input)
Applies thisPTransform
on the givenInputT
, and returns itsOutput
.Coder<T>
getElementCoder()
String
getIdLabel()
String
getTimestampLabel()
com.google.cloud.dataflow.sdk.util.PubsubClient.TopicPath
getTopic()
ValueProvider<com.google.cloud.dataflow.sdk.util.PubsubClient.TopicPath>
getTopicProvider()
-
Methods inherited from class com.google.cloud.dataflow.sdk.transforms.PTransform
getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, populateDisplayData, toString, validate
-
-
-
-
Constructor Detail
-
PubsubUnboundedSink
public PubsubUnboundedSink(com.google.cloud.dataflow.sdk.util.PubsubClient.PubsubClientFactory pubsubFactory, ValueProvider<com.google.cloud.dataflow.sdk.util.PubsubClient.TopicPath> topic, Coder<T> elementCoder, String timestampLabel, String idLabel, int numShards)
-
-
Method Detail
-
getTopic
public com.google.cloud.dataflow.sdk.util.PubsubClient.TopicPath getTopic()
-
getTopicProvider
public ValueProvider<com.google.cloud.dataflow.sdk.util.PubsubClient.TopicPath> getTopicProvider()
-
getTimestampLabel
@Nullable public String getTimestampLabel()
-
getIdLabel
@Nullable public String getIdLabel()
-
apply
public PDone apply(PCollection<T> input)
Description copied from class:PTransform
Applies thisPTransform
on the givenInputT
, and returns itsOutput
.Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
The default implementation throws an exception. A derived class must either implement apply, or else each runner must supply a custom implementation via
PipelineRunner.apply(com.google.cloud.dataflow.sdk.transforms.PTransform<InputT, OutputT>, InputT)
.- Overrides:
apply
in classPTransform<PCollection<T>,PDone>
-
-