Google Cloud Dataflow SDK for Java, version 1.9.1
Class DoFnWithContext<InputT,OutputT>
- java.lang.Object
-
- com.google.cloud.dataflow.sdk.transforms.DoFnWithContext<InputT,OutputT>
-
- Type Parameters:
InputT
- the type of the (main) input elementsOutputT
- the type of the (main) output elements
- All Implemented Interfaces:
- HasDisplayData, Serializable
@Experimental public abstract class DoFnWithContext<InputT,OutputT> extends Object implements Serializable, HasDisplayData
The argument toParDo
providing the code to use to process elements of the inputPCollection
.See
ParDo
for more explanation, examples of use, and discussion of constraints onDoFnWithContext
s, including their serializability, lack of access to global shared mutable state, requirements for failure tolerance, and benefits of optimization.DoFnWithContext
s can be tested in a particularPipeline
by running thatPipeline
on sample input and then checking its output. Unit testing of aDoFnWithContext
, separately from anyParDo
transform orPipeline
, can be done via theDoFnTester
harness.Implementations must define a method annotated with
DoFnWithContext.ProcessElement
that satisfies the requirements described there. See theDoFnWithContext.ProcessElement
for details.This functionality is experimental and likely to change.
Example usage:
{@code PCollection
lines = ... ; PCollection words = lines.apply(ParDo.of(new DoFnWithContext () { - See Also:
- Serialized Form
-
-
Nested Class Summary
Nested Classes Modifier and Type Class and Description class
DoFnWithContext.Context
Information accessible to all methods in thisDoFnWithContext
.static interface
DoFnWithContext.ExtraContextFactory<InputT,OutputT>
Interface for runner implementors to provide implementations of extra context information.static interface
DoFnWithContext.FinishBundle
Annotation for the method to use to prepare an instance for processing a batch of elements.class
DoFnWithContext.ProcessContext
Information accessible when runningDoFn.processElement(com.google.cloud.dataflow.sdk.transforms.DoFn<InputT, OutputT>.ProcessContext)
.static interface
DoFnWithContext.ProcessElement
Annotation for the method to use for processing elements.static interface
DoFnWithContext.StartBundle
Annotation for the method to use to prepare an instance for processing a batch of elements.
-
Constructor Summary
Constructors Constructor and Description DoFnWithContext()
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method and Description <AggInputT,AggOutputT>
Aggregator<AggInputT,AggOutputT>createAggregator(String name, Combine.CombineFn<? super AggInputT,?,AggOutputT> combiner)
Returns anAggregator
with aggregation logic specified by theCombine.CombineFn
argument.<AggInputT>
Aggregator<AggInputT,AggInputT>createAggregator(String name, SerializableFunction<Iterable<AggInputT>,AggInputT> combiner)
Returns anAggregator
with the aggregation logic specified by theSerializableFunction
argument.Duration
getAllowedTimestampSkew()
Returns the allowed timestamp skew duration, which is the maximum duration that timestamps can be shifted backward inDoFnWithContext.Context.outputWithTimestamp(OutputT, org.joda.time.Instant)
.protected TypeDescriptor<InputT>
getInputTypeDescriptor()
Returns aTypeDescriptor
capturing what is known statically about the input type of thisDoFnWithContext
instance's most-derived class.protected TypeDescriptor<OutputT>
getOutputTypeDescriptor()
Returns aTypeDescriptor
capturing what is known statically about the output type of thisDoFnWithContext
instance's most-derived class.void
populateDisplayData(DisplayData.Builder builder)
Register display data for the given transform or component.
-
-
-
Method Detail
-
getAllowedTimestampSkew
public Duration getAllowedTimestampSkew()
Returns the allowed timestamp skew duration, which is the maximum duration that timestamps can be shifted backward inDoFnWithContext.Context.outputWithTimestamp(OutputT, org.joda.time.Instant)
.The default value is
Duration.ZERO
, in which case timestamps can only be shifted forward to future. For infinite skew, returnDuration.millis(Long.MAX_VALUE)
.
-
getInputTypeDescriptor
protected TypeDescriptor<InputT> getInputTypeDescriptor()
Returns aTypeDescriptor
capturing what is known statically about the input type of thisDoFnWithContext
instance's most-derived class.See
getOutputTypeDescriptor()
for more discussion.
-
getOutputTypeDescriptor
protected TypeDescriptor<OutputT> getOutputTypeDescriptor()
Returns aTypeDescriptor
capturing what is known statically about the output type of thisDoFnWithContext
instance's most-derived class.In the normal case of a concrete
DoFnWithContext
subclass with no generic type parameters of its own (including anonymous inner classes), this will be a complete non-generic type, which is good for choosing a default outputCoder<O>
for the outputPCollection<O>
.
-
createAggregator
public final <AggInputT,AggOutputT> Aggregator<AggInputT,AggOutputT> createAggregator(String name, Combine.CombineFn<? super AggInputT,?,AggOutputT> combiner)
Returns anAggregator
with aggregation logic specified by theCombine.CombineFn
argument. The name provided must be unique acrossAggregator
s created within the DoFn. Aggregators can only be created during pipeline construction.- Parameters:
name
- the name of the aggregatorcombiner
- theCombine.CombineFn
to use in the aggregator- Returns:
- an aggregator for the provided name and combiner in the scope of this DoFn
- Throws:
NullPointerException
- if the name or combiner is nullIllegalArgumentException
- if the given name collides with another aggregator in this scopeIllegalStateException
- if called during pipeline execution.
-
createAggregator
public final <AggInputT> Aggregator<AggInputT,AggInputT> createAggregator(String name, SerializableFunction<Iterable<AggInputT>,AggInputT> combiner)
Returns anAggregator
with the aggregation logic specified by theSerializableFunction
argument. The name provided must be unique acrossAggregator
s created within the DoFn. Aggregators can only be created during pipeline construction.- Parameters:
name
- the name of the aggregatorcombiner
- theSerializableFunction
to use in the aggregator- Returns:
- an aggregator for the provided name and combiner in the scope of this DoFn
- Throws:
NullPointerException
- if the name or combiner is nullIllegalArgumentException
- if the given name collides with another aggregator in this scopeIllegalStateException
- if called during pipeline execution.
-
populateDisplayData
public void populateDisplayData(DisplayData.Builder builder)
Register display data for the given transform or component.populateDisplayData(DisplayData.Builder)
is invoked by Pipeline runners to collect display data viaDisplayData.from(HasDisplayData)
. Implementations may callsuper.populateDisplayData(builder)
in order to register display data in the current namespace, but should otherwise usesubcomponent.populateDisplayData(builder)
to use the namespace of the subcomponent.By default, does not register any display data. Implementors may override this method to provide their own display data.
- Specified by:
populateDisplayData
in interfaceHasDisplayData
- Parameters:
builder
- The builder to populate with display data.- See Also:
HasDisplayData
-
-