Google Cloud Dataflow SDK for Java, version 1.9.1
Class PubsubIO.Read
- java.lang.Object
-
- com.google.cloud.dataflow.sdk.io.PubsubIO.Read
-
- Enclosing class:
- PubsubIO
public static class PubsubIO.Read extends Object
APTransform
that continuously reads from a Cloud Pub/Sub stream and returns aPCollection
ofStrings
containing the items from the stream.When running with a
PipelineRunner
that only supports boundedPCollections
(such asDirectPipelineRunner
), only a bounded portion of the input Pub/Sub stream can be processed. As such, eitherPubsubIO.Read.Bound.maxNumRecords(int)
orPubsubIO.Read.Bound.maxReadTime(Duration)
must be set.
-
-
Nested Class Summary
Nested Classes Modifier and Type Class and Description static class
PubsubIO.Read.Bound<T>
APTransform
that reads from a Cloud Pub/Sub source and returns a unboundedPCollection
containing the items from the stream.
-
Method Summary
All Methods Static Methods Concrete Methods Modifier and Type Method and Description static PubsubIO.Read.Bound<String>
idLabel(String idLabel)
Creates and returns a transform for reading from Cloud Pub/Sub where unique record identifiers are expected to be provided as Pub/Sub message attributes.static PubsubIO.Read.Bound<String>
maxNumRecords(int maxNumRecords)
Creates and returns a transform for reading from Cloud Pub/Sub with a maximum number of records that will be read.static PubsubIO.Read.Bound<String>
maxReadTime(Duration maxReadTime)
Creates and returns a transform for reading from Cloud Pub/Sub with a maximum number of duration during which records will be read.static PubsubIO.Read.Bound<String>
named(String name)
Creates and returns a transform for reading from Cloud Pub/Sub with the specified transform name.static PubsubIO.Read.Bound<String>
subscription(String subscription)
Creates and returns a transform for reading from a specific Cloud Pub/Sub subscription.static PubsubIO.Read.Bound<String>
subscription(ValueProvider<String> subscription)
Liketopic()
but with aValueProvider
.static PubsubIO.Read.Bound<String>
timestampLabel(String timestampLabel)
Creates and returns a transform reading from Cloud Pub/Sub where record timestamps are expected to be provided as Pub/Sub message attributes.static PubsubIO.Read.Bound<String>
topic(String topic)
Creates and returns a transform for reading from a Cloud Pub/Sub topic.static PubsubIO.Read.Bound<String>
topic(ValueProvider<String> topic)
Liketopic()
but with aValueProvider
.static <T> PubsubIO.Read.Bound<T>
withCoder(Coder<T> coder)
Creates and returns a transform for reading from Cloud Pub/Sub that uses the givenCoder
to decode Pub/Sub messages into a value of typeT
.
-
-
-
Method Detail
-
named
public static PubsubIO.Read.Bound<String> named(String name)
Creates and returns a transform for reading from Cloud Pub/Sub with the specified transform name.
-
topic
public static PubsubIO.Read.Bound<String> topic(ValueProvider<String> topic)
Liketopic()
but with aValueProvider
.
-
topic
public static PubsubIO.Read.Bound<String> topic(String topic)
Creates and returns a transform for reading from a Cloud Pub/Sub topic. Mutually exclusive withsubscription(String)
.See
PubsubIO.PubsubTopic.fromPath(String)
for more details on the format of thetopic
string.Dataflow will start reading data published on this topic from the time the pipeline is started. Any data published on the topic before the pipeline is started will not be read by Dataflow.
-
subscription
public static PubsubIO.Read.Bound<String> subscription(String subscription)
Creates and returns a transform for reading from a specific Cloud Pub/Sub subscription. Mutually exclusive withtopic(String)
.See
PubsubIO.PubsubSubscription.fromPath(String)
for more details on the format of thesubscription
string.
-
subscription
public static PubsubIO.Read.Bound<String> subscription(ValueProvider<String> subscription)
Liketopic()
but with aValueProvider
.
-
timestampLabel
public static PubsubIO.Read.Bound<String> timestampLabel(String timestampLabel)
Creates and returns a transform reading from Cloud Pub/Sub where record timestamps are expected to be provided as Pub/Sub message attributes. ThetimestampLabel
parameter specifies the name of the attribute that contains the timestamp.The timestamp value is expected to be represented in the attribute as either:
- a numerical value representing the number of milliseconds since the Unix epoch. For
example, if using the Joda time classes,
Instant.getMillis()
returns the correct value for this attribute. - a String in RFC 3339 format. For example,
2015-10-29T23:41:41.123Z
. The sub-second component of the timestamp is optional, and digits beyond the first three (i.e., time units smaller than milliseconds) will be ignored.
If
timestampLabel
is not provided, the system will generate record timestamps the first time it sees each record. All windowing will be done relative to these timestamps.By default, windows are emitted based on an estimate of when this source is likely done producing data for a given timestamp (referred to as the Watermark; see
AfterWatermark
for more details). Any late data will be handled by the trigger specified with the windowing strategy – by default it will be output immediately.Note that the system can guarantee that no late data will ever be seen when it assigns timestamps by arrival time (i.e.
timestampLabel
is not provided).- See Also:
- RFC 3339
- a numerical value representing the number of milliseconds since the Unix epoch. For
example, if using the Joda time classes,
-
idLabel
public static PubsubIO.Read.Bound<String> idLabel(String idLabel)
Creates and returns a transform for reading from Cloud Pub/Sub where unique record identifiers are expected to be provided as Pub/Sub message attributes. TheidLabel
parameter specifies the attribute name. The value of the attribute can be any string that uniquely identifies this record.If
idLabel
is not provided, Dataflow cannot guarantee that no duplicate data will be delivered on the Pub/Sub stream. In this case, deduplication of the stream will be strictly best effort.
-
withCoder
public static <T> PubsubIO.Read.Bound<T> withCoder(Coder<T> coder)
Creates and returns a transform for reading from Cloud Pub/Sub that uses the givenCoder
to decode Pub/Sub messages into a value of typeT
.By default, uses
StringUtf8Coder
, which just returns the text lines as Java strings.- Type Parameters:
T
- the type of the decoded elements, and the elements of the resulting PCollection.
-
maxNumRecords
public static PubsubIO.Read.Bound<String> maxNumRecords(int maxNumRecords)
Creates and returns a transform for reading from Cloud Pub/Sub with a maximum number of records that will be read. The transform produces a boundedPCollection
.Either this option or
maxReadTime(Duration)
must be set in order to create a bounded source.
-
maxReadTime
public static PubsubIO.Read.Bound<String> maxReadTime(Duration maxReadTime)
Creates and returns a transform for reading from Cloud Pub/Sub with a maximum number of duration during which records will be read. The transform produces a boundedPCollection
.Either this option or
maxNumRecords(int)
must be set in order to create a bounded source.
-
-