Google Cloud Dataflow SDK for Java, version 1.9.1
Class DirectPipelineRunner.TestCombineDoFn<K,InputT,AccumT,OutputT>
- java.lang.Object
-
- com.google.cloud.dataflow.sdk.transforms.DoFn<KV<K,Iterable<InputT>>,KV<K,OutputT>>
-
- com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner.TestCombineDoFn<K,InputT,AccumT,OutputT>
-
- All Implemented Interfaces:
- HasDisplayData, Serializable
- Enclosing class:
- DirectPipelineRunner
public static class DirectPipelineRunner.TestCombineDoFn<K,InputT,AccumT,OutputT> extends DoFn<KV<K,Iterable<InputT>>,KV<K,OutputT>>
The implementation may split theCombine.KeyedCombineFn
into ADD, MERGE and EXTRACT phases ( seecom.google.cloud.dataflow.sdk.runners.worker.CombineValuesFn
). In order to emulate this for theDirectPipelineRunner
and provide an experience closer to the service, go through heavy serializability checks for the equivalent of the results of the ADD phase, but after theGroupByKey
shuffle, and the MERGE phase. Doing these checks ensure that not only is the accumulator coder serializable, but the accumulator coder can actually serialize the data in question.- See Also:
- Serialized Form
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class com.google.cloud.dataflow.sdk.transforms.DoFn
DoFn.Context, DoFn.ProcessContext, DoFn.RequiresWindowAccess
-
-
Constructor Summary
Constructors Constructor and Description TestCombineDoFn(com.google.cloud.dataflow.sdk.util.PerKeyCombineFnRunner<? super K,? super InputT,AccumT,OutputT> fnRunner, Coder<AccumT> accumCoder, boolean testSerializability, Random rand)
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method and Description static <K,AccumT,InputT>
List<AccumT>addInputsRandomly(com.google.cloud.dataflow.sdk.util.PerKeyCombineFnRunner<? super K,? super InputT,AccumT,?> fnRunner, K key, Iterable<InputT> values, Random random, DoFn.ProcessContext c)
Create a random list of accumulators from the given list of values.static <K,InputT,AccumT,OutputT>
DirectPipelineRunner.TestCombineDoFn<K,InputT,AccumT,OutputT>create(Combine.GroupedValues<K,InputT,OutputT> transform, PCollection<KV<K,Iterable<InputT>>> input, boolean testSerializability, Random rand)
<T> T
ensureSerializableByCoder(Coder<T> coder, T value, String errorContext)
void
processElement(DoFn.ProcessContext c)
Processes one input element.-
Methods inherited from class com.google.cloud.dataflow.sdk.transforms.DoFn
createAggregator, createAggregator, finishBundle, getAllowedTimestampSkew, getInputTypeDescriptor, getOutputTypeDescriptor, populateDisplayData, startBundle
-
-
-
-
Method Detail
-
create
public static <K,InputT,AccumT,OutputT> DirectPipelineRunner.TestCombineDoFn<K,InputT,AccumT,OutputT> create(Combine.GroupedValues<K,InputT,OutputT> transform, PCollection<KV<K,Iterable<InputT>>> input, boolean testSerializability, Random rand)
-
processElement
public void processElement(DoFn.ProcessContext c) throws Exception
Description copied from class:DoFn
Processes one input element.The current element of the input
PCollection
is returned byc.element()
. It should be considered immutable. The Dataflow runtime will not mutate the element, so it is safe to cache, etc. The element should not be mutated by any of theDoFn
methods, because it may be cached elsewhere, retained by the Dataflow runtime, or used in other unspecified ways.A value is added to the main output
PCollection
byDoFn.Context.output(OutputT)
. Once passed tooutput
the element should be considered immutable and not be modified in any way. It may be cached elsewhere, retained by the Dataflow runtime, or used in other unspecified ways.
-
addInputsRandomly
public static <K,AccumT,InputT> List<AccumT> addInputsRandomly(com.google.cloud.dataflow.sdk.util.PerKeyCombineFnRunner<? super K,? super InputT,AccumT,?> fnRunner, K key, Iterable<InputT> values, Random random, DoFn.ProcessContext c)
Create a random list of accumulators from the given list of values.Visible for testing purposes only.
-
-