DoFn (Google Cloud Dataflow SDK 1.9.1 API)

Google Cloud Dataflow SDK for Java, version 1.9.1

Class DoFn<InputT,OutputT>

  • java.lang.Object
    • Constructor Detail

      • DoFn

        public DoFn()
    • Method Detail

      • getAllowedTimestampSkew

        public Duration getAllowedTimestampSkew()
        Deprecated. does not interact well with the watermark.
        Returns the allowed timestamp skew duration, which is the maximum duration that timestamps can be shifted backward in DoFn.Context.outputWithTimestamp(OutputT, org.joda.time.Instant).

        The default value is Duration.ZERO, in which case timestamps can only be shifted forward to future. For infinite skew, return Duration.millis(Long.MAX_VALUE).

        Note that producing an element whose timestamp is less than the current timestamp may result in late data, i.e. returning a non-zero value here does not impact watermark calculations used for firing windows.

      • startBundle

        public void startBundle(DoFn.Context c)
                         throws Exception
        Prepares this DoFn instance for processing a batch of elements.

        By default, does nothing.

      • processElement

        public abstract void processElement(DoFn.ProcessContext c)
                                     throws Exception
        Processes one input element.

        The current element of the input PCollection is returned by c.element(). It should be considered immutable. The Dataflow runtime will not mutate the element, so it is safe to cache, etc. The element should not be mutated by any of the DoFn methods, because it may be cached elsewhere, retained by the Dataflow runtime, or used in other unspecified ways.

        A value is added to the main output PCollection by DoFn.Context.output(OutputT). Once passed to output the element should be considered immutable and not be modified in any way. It may be cached elsewhere, retained by the Dataflow runtime, or used in other unspecified ways.

        See Also:
      • finishBundle

        public void finishBundle(DoFn.Context c)
                          throws Exception
        Finishes processing this batch of elements.

        By default, does nothing.

      • populateDisplayData

        public void populateDisplayData(DisplayData.Builder builder)
        Register display data for the given transform or component.

        populateDisplayData(DisplayData.Builder) is invoked by Pipeline runners to collect display data via DisplayData.from(HasDisplayData). Implementations may call super.populateDisplayData(builder) in order to register display data in the current namespace, but should otherwise use subcomponent.populateDisplayData(builder) to use the namespace of the subcomponent.

        By default, does not register any display data. Implementors may override this method to provide their own display data.

        Specified by:
        populateDisplayData in interface HasDisplayData
        builder - The builder to populate with display data.
        See Also:
      • getOutputTypeDescriptor

        protected TypeDescriptor<OutputT> getOutputTypeDescriptor()
        Returns a TypeDescriptor capturing what is known statically about the output type of this DoFn instance's most-derived class.

        In the normal case of a concrete DoFn subclass with no generic type parameters of its own (including anonymous inner classes), this will be a complete non-generic type, which is good for choosing a default output Coder<OutputT> for the output PCollection<OutputT>.

      • createAggregator

        protected final <AggInputT,AggOutputT> Aggregator<AggInputT,AggOutputT> createAggregator(String name,
                                                                                                 Combine.CombineFn<? super AggInputT,?,AggOutputT> combiner)
        Returns an Aggregator with aggregation logic specified by the Combine.CombineFn argument. The name provided must be unique across Aggregators created within the DoFn. Aggregators can only be created during pipeline construction.
        name - the name of the aggregator
        combiner - the Combine.CombineFn to use in the aggregator
        an aggregator for the provided name and combiner in the scope of this DoFn
        NullPointerException - if the name or combiner is null
        IllegalArgumentException - if the given name collides with another aggregator in this scope
        IllegalStateException - if called during pipeline processing.

Was this page helpful? Let us know how we did:

Send feedback about...

Cloud Dataflow