DoFnWithContext (Google Cloud Dataflow SDK 1.9.1 API)

Google Cloud Dataflow SDK for Java, version 1.9.1

com.google.cloud.dataflow.sdk.transforms

Class DoFnWithContext<InputT,OutputT>

  • java.lang.Object
    • com.google.cloud.dataflow.sdk.transforms.DoFnWithContext<InputT,OutputT>
  • Type Parameters:
    InputT - the type of the (main) input elements
    OutputT - the type of the (main) output elements
    All Implemented Interfaces:
    HasDisplayData, Serializable


    @Experimental
    public abstract class DoFnWithContext<InputT,OutputT>
    extends Object
    implements Serializable, HasDisplayData
    The argument to ParDo providing the code to use to process elements of the input PCollection.

    See ParDo for more explanation, examples of use, and discussion of constraints on DoFnWithContexts, including their serializability, lack of access to global shared mutable state, requirements for failure tolerance, and benefits of optimization.

    DoFnWithContexts can be tested in a particular Pipeline by running that Pipeline on sample input and then checking its output. Unit testing of a DoFnWithContext, separately from any ParDo transform or Pipeline, can be done via the DoFnTester harness.

    Implementations must define a method annotated with DoFnWithContext.ProcessElement that satisfies the requirements described there. See the DoFnWithContext.ProcessElement for details.

    This functionality is experimental and likely to change.

    Example usage:

     {@code
     PCollection lines = ... ;
     PCollection words =
         lines.apply(ParDo.of(new DoFnWithContext() {
    See Also:
    Serialized Form
    • Constructor Detail

      • DoFnWithContext

        public DoFnWithContext()
    • Method Detail

      • getAllowedTimestampSkew

        public Duration getAllowedTimestampSkew()
        Returns the allowed timestamp skew duration, which is the maximum duration that timestamps can be shifted backward in DoFnWithContext.Context.outputWithTimestamp(OutputT, org.joda.time.Instant).

        The default value is Duration.ZERO, in which case timestamps can only be shifted forward to future. For infinite skew, return Duration.millis(Long.MAX_VALUE).

      • getOutputTypeDescriptor

        protected TypeDescriptor<OutputT> getOutputTypeDescriptor()
        Returns a TypeDescriptor capturing what is known statically about the output type of this DoFnWithContext instance's most-derived class.

        In the normal case of a concrete DoFnWithContext subclass with no generic type parameters of its own (including anonymous inner classes), this will be a complete non-generic type, which is good for choosing a default output Coder<O> for the output PCollection<O>.

      • createAggregator

        public final <AggInputT,AggOutputT> Aggregator<AggInputT,AggOutputT> createAggregator(String name,
                                                                                              Combine.CombineFn<? super AggInputT,?,AggOutputT> combiner)
        Returns an Aggregator with aggregation logic specified by the Combine.CombineFn argument. The name provided must be unique across Aggregators created within the DoFn. Aggregators can only be created during pipeline construction.
        Parameters:
        name - the name of the aggregator
        combiner - the Combine.CombineFn to use in the aggregator
        Returns:
        an aggregator for the provided name and combiner in the scope of this DoFn
        Throws:
        NullPointerException - if the name or combiner is null
        IllegalArgumentException - if the given name collides with another aggregator in this scope
        IllegalStateException - if called during pipeline execution.
      • populateDisplayData

        public void populateDisplayData(DisplayData.Builder builder)
        Register display data for the given transform or component.

        populateDisplayData(DisplayData.Builder) is invoked by Pipeline runners to collect display data via DisplayData.from(HasDisplayData). Implementations may call super.populateDisplayData(builder) in order to register display data in the current namespace, but should otherwise use subcomponent.populateDisplayData(builder) to use the namespace of the subcomponent.

        By default, does not register any display data. Implementors may override this method to provide their own display data.

        Specified by:
        populateDisplayData in interface HasDisplayData
        Parameters:
        builder - The builder to populate with display data.
        See Also:
        HasDisplayData


Was this page helpful? Let us know how we did:

Send feedback about...

Cloud Dataflow