PubsubIO.Read (Google Cloud Dataflow SDK 1.9.1 API)

Google Cloud Dataflow SDK for Java, version 1.9.1

com.google.cloud.dataflow.sdk.io

Class PubsubIO.Read

    • Method Detail

      • named

        public static PubsubIO.Read.Bound<String> named(String name)
        Creates and returns a transform for reading from Cloud Pub/Sub with the specified transform name.
      • topic

        public static PubsubIO.Read.Bound<String> topic(String topic)
        Creates and returns a transform for reading from a Cloud Pub/Sub topic. Mutually exclusive with subscription(String).

        See PubsubIO.PubsubTopic.fromPath(String) for more details on the format of the topic string.

        Dataflow will start reading data published on this topic from the time the pipeline is started. Any data published on the topic before the pipeline is started will not be read by Dataflow.

      • timestampLabel

        public static PubsubIO.Read.Bound<String> timestampLabel(String timestampLabel)
        Creates and returns a transform reading from Cloud Pub/Sub where record timestamps are expected to be provided as Pub/Sub message attributes. The timestampLabel parameter specifies the name of the attribute that contains the timestamp.

        The timestamp value is expected to be represented in the attribute as either:

        • a numerical value representing the number of milliseconds since the Unix epoch. For example, if using the Joda time classes, Instant.getMillis() returns the correct value for this attribute.
        • a String in RFC 3339 format. For example, 2015-10-29T23:41:41.123Z. The sub-second component of the timestamp is optional, and digits beyond the first three (i.e., time units smaller than milliseconds) will be ignored.
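
        The two accepted representations can be normalized to epoch milliseconds as in the following minimal sketch. It is an illustration only, not the SDK's own parsing code: the class TimestampAttribute and the method parseMillis are hypothetical names, and the sketch assumes java.time (Java 8+) rather than the Joda time classes mentioned above.

```java
import java.time.OffsetDateTime;

// Hypothetical helper for illustration; not part of the Dataflow SDK.
class TimestampAttribute {

    /** Returns milliseconds since the Unix epoch for either accepted format. */
    static long parseMillis(String value) {
        try {
            // Format 1: a decimal count of milliseconds since the Unix epoch.
            return Long.parseLong(value);
        } catch (NumberFormatException notNumeric) {
            // Format 2: an RFC 3339 string such as 2015-10-29T23:41:41.123Z.
            // toEpochMilli() discards any digits finer than one millisecond.
            return OffsetDateTime.parse(value).toInstant().toEpochMilli();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseMillis("1446162101123"));
        System.out.println(parseMillis("2015-10-29T23:41:41.123Z"));
        System.out.println(parseMillis("2015-10-29T23:41:41.123456Z"));
    }
}
```

        All three calls in main yield the same value, since the numeric input is the epoch-millisecond form of the same instant and the extra digits in the last input fall below millisecond precision.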

        If timestampLabel is not provided, the system will generate record timestamps the first time it sees each record. All windowing will be done relative to these timestamps.

        By default, windows are emitted based on an estimate of when this source is likely done producing data for a given timestamp (referred to as the Watermark; see AfterWatermark for more details). Any late data will be handled by the trigger specified with the windowing strategy – by default it will be output immediately.

        Note that the system can guarantee that no late data will ever be seen when it assigns timestamps by arrival time (i.e. timestampLabel is not provided).

        See Also:
        RFC 3339
      • idLabel

        public static PubsubIO.Read.Bound<String> idLabel(String idLabel)
        Creates and returns a transform for reading from Cloud Pub/Sub where unique record identifiers are expected to be provided as Pub/Sub message attributes. The idLabel parameter specifies the attribute name. The value of the attribute can be any string that uniquely identifies this record.

        If idLabel is not provided, Dataflow cannot guarantee that no duplicate data will be delivered on the Pub/Sub stream. In this case, deduplication of the stream will be strictly best effort.
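
        The idea behind identifier-based deduplication can be sketched as follows. This is a simplified, in-memory illustration under stated assumptions, not how Dataflow implements it: the class IdLabelDedup and the method dedup are hypothetical, and each message is modeled only as its attribute map.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; not part of the Dataflow SDK.
class IdLabelDedup {

    /** Keeps the first message seen for each distinct value of the idLabel attribute. */
    static List<Map<String, String>> dedup(List<Map<String, String>> messages, String idLabel) {
        Set<String> seen = new HashSet<>();
        List<Map<String, String>> unique = new ArrayList<>();
        for (Map<String, String> attributes : messages) {
            String id = attributes.get(idLabel);
            // A message without the attribute cannot be matched against
            // earlier ones, so this sketch passes it through unchanged.
            if (id == null || seen.add(id)) {
                unique.add(attributes);
            }
        }
        return unique;
    }
}
```

        A redelivered message carries the same attribute value as the original, so only its first copy survives; messages lacking the attribute are never dropped in this sketch.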

      • withCoder

        public static <T> PubsubIO.Read.Bound<T> withCoder(Coder<T> coder)
        Creates and returns a transform for reading from Cloud Pub/Sub that uses the given Coder to decode Pub/Sub messages into a value of type T.

        By default, the transform uses StringUtf8Coder, which decodes each message payload as a UTF-8 Java String.
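
        Conceptually, a coder maps the raw bytes of a message payload to a value of type T. The sketch below models that with a plain java.util.function.Function instead of the SDK's Coder interface; the class PayloadDecoding and its members are hypothetical names used only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Hypothetical illustration; stands in for the SDK's Coder<T>, which is not used here.
class PayloadDecoding {

    // Mirrors the default behaviour: interpret the payload as UTF-8 text.
    static final Function<byte[], String> UTF8 =
            bytes -> new String(bytes, StandardCharsets.UTF_8);

    // Example of a custom decoder: the payload is a decimal integer in UTF-8 text.
    static final Function<byte[], Integer> DECIMAL_INT =
            bytes -> Integer.parseInt(new String(bytes, StandardCharsets.UTF_8));

    /** Applies the chosen decoder to one message payload. */
    static <T> T decode(byte[] payload, Function<byte[], T> decoder) {
        return decoder.apply(payload);
    }
}
```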

        Type Parameters:
        T - the type of the decoded elements, and the elements of the resulting PCollection.
      • maxNumRecords

        public static PubsubIO.Read.Bound<String> maxNumRecords(int maxNumRecords)
        Creates and returns a transform for reading from Cloud Pub/Sub with a maximum number of records that will be read. The transform produces a bounded PCollection.

        Either this option or maxReadTime(Duration) must be set in order to create a bounded source.

      • maxReadTime

        public static PubsubIO.Read.Bound<String> maxReadTime(Duration maxReadTime)
        Creates and returns a transform for reading from Cloud Pub/Sub with a maximum duration during which records will be read. The transform produces a bounded PCollection.

        Either this option or maxNumRecords(int) must be set in order to create a bounded source.
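
        The effect of the two bounding options can be pictured with the following sketch, which stops reading once a record limit or a time limit is reached. It is an assumption-laden model, not the SDK's reader: the class BoundedRead, the method read, the convention that a value of 0 or less means "not set", and the injectable clock are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical illustration; not part of the Dataflow SDK.
class BoundedRead {

    /**
     * Reads from source until maxNumRecords records have been collected or
     * maxReadMillis have elapsed on the supplied clock, whichever comes first.
     * A limit of 0 or less is treated as "not set" in this sketch.
     */
    static <T> List<T> read(Iterator<T> source, int maxNumRecords,
                            long maxReadMillis, LongSupplier clockMillis) {
        if (maxNumRecords <= 0 && maxReadMillis <= 0) {
            // With neither limit set, the read would be unbounded.
            throw new IllegalArgumentException(
                    "set maxNumRecords or maxReadTime to bound the read");
        }
        long start = clockMillis.getAsLong();
        List<T> records = new ArrayList<>();
        while (source.hasNext()) {
            if (maxNumRecords > 0 && records.size() >= maxNumRecords) {
                break; // record limit reached
            }
            if (maxReadMillis > 0 && clockMillis.getAsLong() - start >= maxReadMillis) {
                break; // time limit reached
            }
            records.add(source.next());
        }
        return records;
    }
}
```

        Passing the clock as a parameter keeps the sketch deterministic under test; a caller could supply System::currentTimeMillis for wall-clock behaviour.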

