Google Cloud Dataflow SDK for Java, version 1.9.1
Class Combine.KeyedCombineFn<K,InputT,AccumT,OutputT>
- java.lang.Object
  - com.google.cloud.dataflow.sdk.transforms.Combine.KeyedCombineFn<K,InputT,AccumT,OutputT>
- Type Parameters:
  - K - type of keys
  - InputT - type of input values
  - AccumT - type of mutable accumulator values
  - OutputT - type of output values
- All Implemented Interfaces:
- CombineFnBase.PerKeyCombineFn<K,InputT,AccumT,OutputT>, HasDisplayData, Serializable
- Direct Known Subclasses:
- CombineFns.ComposedKeyedCombineFn
- Enclosing class:
- Combine
public abstract static class Combine.KeyedCombineFn<K,InputT,AccumT,OutputT> extends Object
A KeyedCombineFn<K, InputT, AccumT, OutputT> specifies how to combine a collection of input values of type InputT, associated with a key of type K, into a single output value of type OutputT. It does this via one or more intermediate mutable accumulator values of type AccumT.

The overall process to combine a collection of input InputT values associated with an input K key into a single output OutputT value is as follows:

- The input InputT values are partitioned into one or more batches.
- For each batch, the createAccumulator(K) operation is invoked to create a fresh mutable accumulator value of type AccumT, initialized to represent the combination of zero values.
- For each input InputT value in a batch, the addInput(K, AccumT, InputT) operation is invoked to add the value to that batch's accumulator AccumT value. The accumulator may just record the new value (e.g., if AccumT == List<InputT>), or may do work to represent the combination more compactly.
- The mergeAccumulators(K, java.lang.Iterable<AccumT>) operation is invoked to combine a collection of accumulator AccumT values into a single combined output accumulator AccumT value, once the merging accumulators have had all the input values in their batches added to them. This operation is invoked repeatedly, until there is only one accumulator value left.
- The extractOutput(K, AccumT) operation is invoked on the final accumulator AccumT value to get the output OutputT value.

All of these operations are passed the K key that the values being combined are associated with.

For example:
    public class ConcatFn
        extends KeyedCombineFn<String, Integer, ConcatFn.Accum, String> {
      public static class Accum {
        String s = "";
      }

      public Accum createAccumulator(String key) {
        return new Accum();
      }

      public Accum addInput(String key, Accum accum, Integer input) {
        accum.s += "+" + input;
        return accum;
      }

      public Accum mergeAccumulators(String key, Iterable<Accum> accums) {
        Accum merged = new Accum();
        for (Accum accum : accums) {
          merged.s += accum.s;
        }
        return merged;
      }

      public String extractOutput(String key, Accum accum) {
        return key + accum.s;
      }
    }

    PCollection<KV<String, Integer>> pc = ...;
    PCollection<KV<String, String>> pc2 = pc.apply(
        Combine.perKey(new ConcatFn()));
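To make the batching steps concrete, the following standalone sketch walks the same concatenation logic through the four operations by hand. It deliberately uses no SDK types: the class `CombineProcessSketch` and its `combine` helper are hypothetical, and the batching is passed in explicitly, whereas a real runner chooses it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Standalone sketch: no Dataflow SDK types are used. The method names mirror
// the four KeyedCombineFn operations, but this class is hypothetical.
final class CombineProcessSketch {

  // Mutable accumulator holding the concatenation built so far.
  static final class Accum {
    String s = "";
  }

  static Accum createAccumulator(String key) {
    return new Accum();
  }

  static Accum addInput(String key, Accum accum, Integer input) {
    accum.s += "+" + input;
    return accum;
  }

  static Accum mergeAccumulators(String key, Iterable<Accum> accums) {
    Accum merged = new Accum();
    for (Accum accum : accums) {
      merged.s += accum.s;
    }
    return merged;
  }

  static String extractOutput(String key, Accum accum) {
    return key + accum.s;
  }

  // Runs the four steps over batches supplied by the caller (an assumption:
  // in a pipeline the runner decides how inputs are batched).
  static String combine(String key, List<List<Integer>> batches) {
    List<Accum> accums = new ArrayList<>();
    for (List<Integer> batch : batches) {
      Accum accum = createAccumulator(key);      // one fresh accumulator per batch
      for (Integer input : batch) {
        accum = addInput(key, accum, input);     // fold each input into it
      }
      accums.add(accum);
    }
    // Merge the per-batch accumulators, then extract the final output.
    return extractOutput(key, mergeAccumulators(key, accums));
  }

  public static void main(String[] args) {
    List<List<Integer>> batches =
        Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3));
    System.out.println(combine("k", batches)); // prints "k+1+2+3"
  }
}
```

The sketch merges all accumulators in a single call; the description above allows the merge to be invoked repeatedly over smaller groups instead.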
Keyed combining functions used by Combine.PerKey, Combine.GroupedValues, and PTransforms derived from them should be associative and commutative. Associativity is required because input values are first broken up into subgroups before being combined, and their intermediate results further combined, in an arbitrary tree structure. Commutativity is required because any order of the input values is ignored when breaking up input values into groups.

- See Also:
  - Serialized Form
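The associativity and commutativity requirement can be illustrated without the SDK. The sketch below folds each batch separately and then folds the partial results, imitating a two-level combine tree. The class `CombineOrderSketch` and the method `combineInBatches` are hypothetical names, and the code assumes Java 8 or later for method references. Integer addition gives the same answer for any batching and order, while string concatenation (similar in spirit to the concatenation in ConcatFn) does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

// Standalone sketch; no SDK types. Each batch is folded on its own and the
// partial results are then folded together.
final class CombineOrderSketch {

  static <T> T combineInBatches(T zero, BinaryOperator<T> op, List<List<T>> batches) {
    List<T> partials = new ArrayList<>();
    for (List<T> batch : batches) {
      T acc = zero;
      for (T value : batch) {
        acc = op.apply(acc, value);
      }
      partials.add(acc);
    }
    T result = zero;
    for (T partial : partials) {
      result = op.apply(result, partial);
    }
    return result;
  }

  public static void main(String[] args) {
    // The same three inputs, batched and ordered in two different ways.
    List<List<Integer>> a = Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3));
    List<List<Integer>> b = Arrays.asList(Arrays.asList(3, 1), Arrays.asList(2));

    // Addition is associative and commutative: both arrangements agree.
    Integer sumA = combineInBatches(0, Integer::sum, a);
    Integer sumB = combineInBatches(0, Integer::sum, b);
    System.out.println(sumA + " " + sumB); // prints "6 6"

    // Concatenation is associative but not commutative: the results differ.
    List<List<String>> c = Arrays.asList(Arrays.asList("1", "2"), Arrays.asList("3"));
    List<List<String>> d = Arrays.asList(Arrays.asList("3", "1"), Arrays.asList("2"));
    String catC = combineInBatches("", String::concat, c);
    String catD = combineInBatches("", String::concat, d);
    System.out.println(catC + " " + catD); // prints "123 312"
  }
}
```

A function that is sensitive to input order, like the concatenation here, may therefore produce different outputs from run to run when used in a pipeline.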
-
Constructor Summary
- KeyedCombineFn()
-
Method Summary
Modifier and Type, Method, and Description
- abstract AccumT addInput(K key, AccumT accumulator, InputT value)
  Adds the given input value to the given accumulator, returning the new accumulator value.
- OutputT apply(K key, Iterable<? extends InputT> inputs)
  Applies this KeyedCombineFn to a key and a collection of input values to produce a combined output value.
- AccumT compact(K key, AccumT accumulator)
  Returns an accumulator that represents the same logical value as the input accumulator, but may have a more compact representation.
- abstract AccumT createAccumulator(K key)
  Returns a new, mutable accumulator value representing the accumulation of zero input values.
- abstract OutputT extractOutput(K key, AccumT accumulator)
  Returns the output value that is the result of combining all the input values represented by the given accumulator.
- Combine.CombineFn<InputT,AccumT,OutputT> forKey(K key, Coder<K> keyCoder)
  Returns a regular CombineFnBase.GlobalCombineFn that operates on a specific key.
- TypeVariable<?> getAccumTVariable()
  Returns the TypeVariable of AccumT.
- Coder<AccumT> getAccumulatorCoder(CoderRegistry registry, Coder<K> keyCoder, Coder<InputT> inputCoder)
  Returns the Coder to use for accumulator AccumT values, or null if it is not able to be inferred.
- Coder<OutputT> getDefaultOutputCoder(CoderRegistry registry, Coder<K> keyCoder, Coder<InputT> inputCoder)
  Returns the Coder to use by default for output OutputT values, or null if it is not able to be inferred.
- TypeVariable<?> getInputTVariable()
  Returns the TypeVariable of InputT.
- TypeVariable<?> getKTypeVariable()
  Returns the TypeVariable of K.
- TypeVariable<?> getOutputTVariable()
  Returns the TypeVariable of OutputT.
- abstract AccumT mergeAccumulators(K key, Iterable<AccumT> accumulators)
  Returns an accumulator representing the accumulation of all the input values accumulated in the merging accumulators.
- void populateDisplayData(DisplayData.Builder builder)
  Register display data for the given transform or component.
-
Method Detail
-
createAccumulator
public abstract AccumT createAccumulator(K key)
Returns a new, mutable accumulator value representing the accumulation of zero input values.
- Parameters:
  - key - the key that all the accumulated values using the accumulator are associated with
-
addInput
public abstract AccumT addInput(K key, AccumT accumulator, InputT value)
Adds the given input value to the given accumulator, returning the new accumulator value.
For efficiency, the input accumulator may be modified and returned.
- Parameters:
  - key - the key that all the accumulated values using the accumulator are associated with
-
mergeAccumulators
public abstract AccumT mergeAccumulators(K key, Iterable<AccumT> accumulators)
Returns an accumulator representing the accumulation of all the input values accumulated in the merging accumulators.
May modify any of the argument accumulators. May return a fresh accumulator, or may return one of the (modified) argument accumulators.
- Parameters:
  - key - the key that all the accumulators are associated with
-
extractOutput
public abstract OutputT extractOutput(K key, AccumT accumulator)
Returns the output value that is the result of combining all the input values represented by the given accumulator.
- Parameters:
  - key - the key that all the accumulated values using the accumulator are associated with
-
compact
public AccumT compact(K key, AccumT accumulator)
Returns an accumulator that represents the same logical value as the input accumulator, but may have a more compact representation.
For most CombineFns this would be a no-op, but it should be overridden by CombineFns that (for example) buffer up elements and combine them in batches.
For efficiency, the input accumulator may be modified and returned.
By default returns the original accumulator.
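As an illustration of the buffering case mentioned above, here is a minimal standalone sketch of an accumulator that appends inputs cheaply and folds them into a running total only when compacted. `CompactSketch` and `BufferingSum` are hypothetical names, no SDK types are used, and the key parameter is omitted for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch; BufferingSum is a hypothetical accumulator, not an SDK type.
final class CompactSketch {

  // Buffers raw inputs and folds them into a running total only on demand.
  static final class BufferingSum {
    long total = 0;
    final List<Integer> buffer = new ArrayList<>();
  }

  static BufferingSum addInput(BufferingSum accum, int input) {
    accum.buffer.add(input); // cheap append; no arithmetic yet
    return accum;
  }

  // Mirrors the role of compact: same logical value, smaller representation.
  // The input accumulator is modified and returned.
  static BufferingSum compact(BufferingSum accum) {
    for (int value : accum.buffer) {
      accum.total += value;
    }
    accum.buffer.clear();
    return accum;
  }

  static long extractOutput(BufferingSum accum) {
    return compact(accum).total;
  }

  public static void main(String[] args) {
    BufferingSum accum = new BufferingSum();
    for (int i = 1; i <= 4; i++) {
      addInput(accum, i);
    }
    System.out.println(accum.buffer.size());      // prints 4
    BufferingSum compacted = compact(accum);
    System.out.println(compacted.buffer.size());  // prints 0
    System.out.println(extractOutput(compacted)); // prints 10
  }
}
```

Compacting before an accumulator is encoded would shrink what has to be serialized, which is the kind of benefit the override is meant to provide.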
-
forKey
public Combine.CombineFn<InputT,AccumT,OutputT> forKey(K key, Coder<K> keyCoder)
Description copied from interface: CombineFnBase.PerKeyCombineFn
Returns a regular CombineFnBase.GlobalCombineFn that operates on a specific key.
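The idea of binding a key can be sketched in isolation as follows. The interfaces `KeyedFn` and `GlobalFn` are hypothetical stand-ins for the SDK's keyed and global combine function types. The sketch leaves out accumulator merging and the keyCoder parameter, and it makes no claim about how the SDK implements forKey.

```java
// Standalone sketch; these interfaces are hypothetical stand-ins for the
// SDK's keyed and global combine function types.
final class ForKeySketch {

  interface KeyedFn<K, I, A, O> {
    A createAccumulator(K key);
    A addInput(K key, A accum, I input);
    O extractOutput(K key, A accum);
  }

  interface GlobalFn<I, A, O> {
    A createAccumulator();
    A addInput(A accum, I input);
    O extractOutput(A accum);
  }

  // Binds a key once so callers no longer pass it to every operation.
  static <K, I, A, O> GlobalFn<I, A, O> forKey(final KeyedFn<K, I, A, O> fn, final K key) {
    return new GlobalFn<I, A, O>() {
      @Override public A createAccumulator() { return fn.createAccumulator(key); }
      @Override public A addInput(A accum, I input) { return fn.addInput(key, accum, input); }
      @Override public O extractOutput(A accum) { return fn.extractOutput(key, accum); }
    };
  }

  // Keyed function that sums its inputs and labels the result with the key.
  static final KeyedFn<String, Integer, int[], String> KEYED_SUM =
      new KeyedFn<String, Integer, int[], String>() {
        @Override public int[] createAccumulator(String key) { return new int[] {0}; }
        @Override public int[] addInput(String key, int[] accum, Integer input) {
          accum[0] += input;
          return accum;
        }
        @Override public String extractOutput(String key, int[] accum) {
          return key + "=" + accum[0];
        }
      };

  public static void main(String[] args) {
    GlobalFn<Integer, int[], String> bound = forKey(KEYED_SUM, "a");
    int[] accum = bound.createAccumulator();
    accum = bound.addInput(accum, 2);
    accum = bound.addInput(accum, 5);
    System.out.println(bound.extractOutput(accum)); // prints "a=7"
  }
}
```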
-
apply
public OutputT apply(K key, Iterable<? extends InputT> inputs)
Applies this KeyedCombineFn to a key and a collection of input values to produce a combined output value.
Useful when testing the behavior of a KeyedCombineFn separately from a Combine transform.
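A test in that style might look like the sketch below. To stay runnable without the SDK it uses a hypothetical stand-in class, `MiniKeyedFn`, whose `apply` is assumed to create one accumulator, add every input, and extract the output. That single-batch behaviour is an assumption made for illustration, not a statement about the SDK's implementation.

```java
import java.util.Arrays;

// Standalone sketch; MiniKeyedFn is a hypothetical stand-in, not the SDK class.
final class ApplySketch {

  abstract static class MiniKeyedFn<K, I, A, O> {
    abstract A createAccumulator(K key);
    abstract A addInput(K key, A accum, I input);
    abstract O extractOutput(K key, A accum);

    // Assumed behaviour: one accumulator, every input added, output extracted.
    O apply(K key, Iterable<? extends I> inputs) {
      A accum = createAccumulator(key);
      for (I input : inputs) {
        accum = addInput(key, accum, input);
      }
      return extractOutput(key, accum);
    }
  }

  // Counts the inputs seen for a key.
  static final class CountFn extends MiniKeyedFn<String, Integer, long[], String> {
    @Override long[] createAccumulator(String key) { return new long[] {0}; }
    @Override long[] addInput(String key, long[] accum, Integer input) {
      accum[0]++;
      return accum;
    }
    @Override String extractOutput(String key, long[] accum) {
      return key + ":" + accum[0];
    }
  }

  public static void main(String[] args) {
    // Exercise the function directly, with no pipeline involved.
    String out = new CountFn().apply("k", Arrays.asList(7, 8, 9));
    if (!out.equals("k:3")) {
      throw new AssertionError("unexpected output: " + out);
    }
    System.out.println(out); // prints "k:3"
  }
}
```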
-
getAccumulatorCoder
public Coder<AccumT> getAccumulatorCoder(CoderRegistry registry, Coder<K> keyCoder, Coder<InputT> inputCoder) throws CannotProvideCoderException
Description copied from interface: CombineFnBase.PerKeyCombineFn
Returns the Coder to use for accumulator AccumT values, or null if it is not able to be inferred.
By default, uses the knowledge of the Coder being used for K keys and input InputT values and the enclosing Pipeline's CoderRegistry to try to infer the Coder for AccumT values.
This is the Coder used to send data through a communication-intensive shuffle step, so a compact and efficient representation may have significant performance benefits.
- Specified by:
  - getAccumulatorCoder in interface CombineFnBase.PerKeyCombineFn<K,InputT,AccumT,OutputT>
- Throws:
  - CannotProvideCoderException
-
getDefaultOutputCoder
public Coder<OutputT> getDefaultOutputCoder(CoderRegistry registry, Coder<K> keyCoder, Coder<InputT> inputCoder) throws CannotProvideCoderException
Description copied from interface: CombineFnBase.PerKeyCombineFn
Returns the Coder to use by default for output OutputT values, or null if it is not able to be inferred.
By default, uses the knowledge of the Coder being used for K keys and input InputT values and the enclosing Pipeline's CoderRegistry to try to infer the Coder for OutputT values.
- Specified by:
  - getDefaultOutputCoder in interface CombineFnBase.PerKeyCombineFn<K,InputT,AccumT,OutputT>
- Throws:
  - CannotProvideCoderException
-
getKTypeVariable
public TypeVariable<?> getKTypeVariable()
Returns the TypeVariable of K.
-
getInputTVariable
public TypeVariable<?> getInputTVariable()
Returns the TypeVariable of InputT.
-
getAccumTVariable
public TypeVariable<?> getAccumTVariable()
Returns the TypeVariable of AccumT.
-
getOutputTVariable
public TypeVariable<?> getOutputTVariable()
Returns the TypeVariable of OutputT.
-
populateDisplayData
public void populateDisplayData(DisplayData.Builder builder)
Register display data for the given transform or component.
populateDisplayData(DisplayData.Builder) is invoked by Pipeline runners to collect display data via DisplayData.from(HasDisplayData). Implementations may call super.populateDisplayData(builder) in order to register display data in the current namespace, but should otherwise use subcomponent.populateDisplayData(builder) to use the namespace of the subcomponent.
By default, does not register any display data. Implementors may override this method to provide their own display data.
- Specified by:
  - populateDisplayData in interface HasDisplayData
- Parameters:
  - builder - The builder to populate with display data.
- See Also:
  - HasDisplayData