Google Cloud Dataflow SDK for Java, version 1.9.1
Class Combine.GroupedValues<K,InputT,OutputT>
- java.lang.Object
  - com.google.cloud.dataflow.sdk.transforms.PTransform<PCollection<? extends KV<K,? extends Iterable<InputT>>>,PCollection<KV<K,OutputT>>>
    - com.google.cloud.dataflow.sdk.transforms.Combine.GroupedValues<K,InputT,OutputT>
- Type Parameters:
K - type of input and output keys
InputT - type of input values
OutputT - type of output values
- All Implemented Interfaces:
- HasDisplayData, Serializable
- Enclosing class:
- Combine
public static class Combine.GroupedValues<K,InputT,OutputT> extends PTransform<PCollection<? extends KV<K,? extends Iterable<InputT>>>,PCollection<KV<K,OutputT>>>
GroupedValues<K, InputT, OutputT> takes a PCollection<KV<K, Iterable<InputT>>>, such as the result of GroupByKey, applies a specified KeyedCombineFn<K, InputT, AccumT, OutputT> to each of the input KV<K, Iterable<InputT>> elements to produce a combined output KV<K, OutputT> element, and returns a PCollection<KV<K, OutputT>> containing all the combined output elements. It is common for InputT == OutputT, but not required. Common combining functions include sums, mins, maxes, and averages of numbers, conjunctions and disjunctions of booleans, statistical aggregations, etc.

Example of use:
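    PCollection<KV<String, Integer>> pc = ...;
    // Group the values by key, using the GroupByKey factory method.
    PCollection<KV<String, Iterable<Integer>>> groupedByKey =
        pc.apply(GroupByKey.<String, Integer>create());
    // Sum each key's values; the type arguments are K, InputT, OutputT.
    PCollection<KV<String, Integer>> sumByKey =
        groupedByKey.apply(
            Combine.<String, Integer, Integer>groupedValues(new Sum.SumIntegerFn()));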
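To make the data flow of the example concrete without a running pipeline, the following is a minimal plain-Java sketch of what the transform computes: each key's Iterable of values is reduced to one output value. It does not use the Dataflow SDK. The names GroupedValuesModel, combineGroupedValues, and sum are hypothetical, a java.util.Map stands in for a PCollection of KV elements, and a java.util.function.Function stands in for the combining function. Windowing, coders, and parallelism are ignored. It assumes Java 8 or later.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Plain-Java model of what Combine.GroupedValues computes; not SDK code. */
final class GroupedValuesModel {

  /** Applies fn to each key's grouped values, yielding one output per key. */
  static <K, InputT, OutputT> Map<K, OutputT> combineGroupedValues(
      Map<K, ? extends Iterable<InputT>> grouped,
      Function<Iterable<InputT>, OutputT> fn) {
    Map<K, OutputT> out = new LinkedHashMap<>();
    for (Map.Entry<K, ? extends Iterable<InputT>> e : grouped.entrySet()) {
      out.put(e.getKey(), fn.apply(e.getValue()));
    }
    return out;
  }

  /** Hypothetical stand-in for a summing combine function. */
  static Integer sum(Iterable<Integer> values) {
    int total = 0;
    for (int v : values) {
      total += v;
    }
    return total;
  }

  public static void main(String[] args) {
    // Stand-in for the output of GroupByKey: one Iterable of values per key.
    Map<String, List<Integer>> groupedByKey = new LinkedHashMap<>();
    groupedByKey.put("a", Arrays.asList(1, 2, 3));
    groupedByKey.put("b", Arrays.asList(10));

    Map<String, Integer> sumByKey =
        GroupedValuesModel.<String, Integer, Integer>combineGroupedValues(
            groupedByKey, GroupedValuesModel::sum);
    System.out.println(sumByKey); // prints {a=6, b=10}
  }
}
```

The three type parameters of combineGroupedValues mirror K, InputT, and OutputT of the class; here InputT and OutputT are both Integer, which is the common case mentioned above.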
See also Combine.perKey(com.google.cloud.dataflow.sdk.transforms.SerializableFunction<java.lang.Iterable<V>, V>) / Combine.PerKey, which captures the common pattern of "combining by key" in a single easy-to-use PTransform.

Combining for different keys can happen in parallel. Moreover, combining of the Iterable<InputT> values associated with a single key can happen in parallel, with different subsets of the values being combined separately, and their intermediate results combined further, in an arbitrary tree reduction pattern, until a single result value is produced for each key.

By default, the Coder of the keys of the output PCollection<KV<K, OutputT>> is that of the keys of the input PCollection<KV<K, Iterable<InputT>>>, and the Coder of the values of the output PCollection<KV<K, OutputT>> is inferred from the concrete type of the KeyedCombineFn<K, InputT, AccumT, OutputT>'s output type OutputT.

Each output element has the same timestamp and is in the same window as its corresponding input element, and the output PCollection has the same WindowFn associated with it as the input.

See also Combine.globally(com.google.cloud.dataflow.sdk.transforms.SerializableFunction<java.lang.Iterable<V>, V>) / Combine.Globally, which combines all the values in a PCollection into a single value in a PCollection.

- See Also:
- Serialized Form
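The class description says that the values for a single key may be combined in subsets whose intermediate results are then merged. The following plain-Java sketch illustrates why that works when the combining function keeps an accumulator separate from its output. It is a model only and does not use the Dataflow SDK: SimpleCombineFn, MeanFn, combineSequentially, and combineInSubsets are hypothetical names, and the sketch merges in two levels rather than in an arbitrary tree. The four method names are chosen to resemble those of an accumulator-based combine function.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of tree reduction over one key's values; not SDK code. */
final class TreeReductionSketch {

  /** Hypothetical stand-in for a combine function with an accumulator type. */
  interface SimpleCombineFn<InputT, AccumT, OutputT> {
    AccumT createAccumulator();
    AccumT addInput(AccumT accumulator, InputT input);
    AccumT mergeAccumulators(Iterable<AccumT> accumulators);
    OutputT extractOutput(AccumT accumulator);
  }

  /** Mean of integers; the accumulator is the pair {sum, count}. */
  static final class MeanFn implements SimpleCombineFn<Integer, long[], Double> {
    @Override public long[] createAccumulator() {
      return new long[] {0, 0};
    }
    @Override public long[] addInput(long[] acc, Integer input) {
      acc[0] += input;
      acc[1]++;
      return acc;
    }
    @Override public long[] mergeAccumulators(Iterable<long[]> accs) {
      long[] merged = createAccumulator();
      for (long[] a : accs) {
        merged[0] += a[0];
        merged[1] += a[1];
      }
      return merged;
    }
    @Override public Double extractOutput(long[] acc) {
      return acc[1] == 0 ? Double.NaN : (double) acc[0] / acc[1];
    }
  }

  /** Combines all values with a single accumulator. */
  static <I, A, O> O combineSequentially(SimpleCombineFn<I, A, O> fn, List<I> values) {
    A acc = fn.createAccumulator();
    for (I v : values) {
      acc = fn.addInput(acc, v);
    }
    return fn.extractOutput(acc);
  }

  /** Combines subsets separately, then merges the partial accumulators. */
  static <I, A, O> O combineInSubsets(
      SimpleCombineFn<I, A, O> fn, List<I> values, int subsetSize) {
    if (subsetSize <= 0) {
      throw new IllegalArgumentException("subsetSize must be positive");
    }
    List<A> partials = new ArrayList<>();
    for (int start = 0; start < values.size(); start += subsetSize) {
      int end = Math.min(start + subsetSize, values.size());
      A acc = fn.createAccumulator();
      for (I v : values.subList(start, end)) {
        acc = fn.addInput(acc, v);
      }
      partials.add(acc);
    }
    return fn.extractOutput(fn.mergeAccumulators(partials));
  }

  public static void main(String[] args) {
    List<Integer> values = Arrays.asList(1, 2, 3, 4, 5, 6);
    MeanFn mean = new MeanFn();
    System.out.println(combineSequentially(mean, values)); // prints 3.5
    // Subsets [1,2,3,4] and [5,6]: averaging their means would give 4.0,
    // but merging {sum, count} accumulators still gives the true mean.
    System.out.println(combineInSubsets(mean, values, 4)); // prints 3.5
  }
}
```

Because the merge step operates on accumulators rather than on finished outputs, the result does not depend on how the values were split, which is the property that lets a runner choose the reduction shape freely.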
-
Field Summary
-
Fields inherited from class com.google.cloud.dataflow.sdk.transforms.PTransform
name
-
Method Summary
Modifier and Type / Method and Description

PCollection<KV<K,OutputT>>
apply(PCollection<? extends KV<K,? extends Iterable<InputT>>> input)
Applies this PTransform on the given InputT, and returns its OutputT.

com.google.cloud.dataflow.sdk.util.AppliedCombineFn<? super K,? super InputT,?,OutputT>
getAppliedFn(CoderRegistry registry, Coder<? extends KV<K,? extends Iterable<InputT>>> inputCoder, com.google.cloud.dataflow.sdk.util.WindowingStrategy<?,?> windowingStrategy)
Returns the Combine.CombineFn bound to its coders.

Coder<KV<K,OutputT>>
getDefaultOutputCoder(PCollection<? extends KV<K,? extends Iterable<InputT>>> input)
Returns the default Coder to use for the output of this single-output PTransform when applied to the given input.

CombineFnBase.PerKeyCombineFn<? super K,? super InputT,?,OutputT>
getFn()
Returns the KeyedCombineFn used by this Combine operation.

List<PCollectionView<?>>
getSideInputs()

void
populateDisplayData(DisplayData.Builder builder)
Register display data for the given transform or component.

Combine.GroupedValues<K,InputT,OutputT>
withSideInputs(Iterable<? extends PCollectionView<?>> sideInputs)
-
Methods inherited from class com.google.cloud.dataflow.sdk.transforms.PTransform
getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, toString, validate
-
Method Detail
-
withSideInputs
public Combine.GroupedValues<K,InputT,OutputT> withSideInputs(Iterable<? extends PCollectionView<?>> sideInputs)
-
getFn
public CombineFnBase.PerKeyCombineFn<? super K,? super InputT,?,OutputT> getFn()
Returns the KeyedCombineFn used by this Combine operation.
-
getSideInputs
public List<PCollectionView<?>> getSideInputs()
-
apply
public PCollection<KV<K,OutputT>> apply(PCollection<? extends KV<K,? extends Iterable<InputT>>> input)
Description copied from class: PTransform
Applies this PTransform on the given InputT, and returns its OutputT.

Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).

The default implementation throws an exception. A derived class must either implement apply, or else each runner must supply a custom implementation via PipelineRunner.apply(com.google.cloud.dataflow.sdk.transforms.PTransform<InputT, OutputT>, InputT).
- Overrides:
apply in class PTransform<PCollection<? extends KV<K,? extends Iterable<InputT>>>,PCollection<KV<K,OutputT>>>
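The contract described for apply (a base implementation that throws, and composite transforms that return the output of a transform they compose) can be illustrated with a small plain-Java model. This is a sketch under stated assumptions, not SDK code: Transform, DoubleAll, SumAll, DoubleThenSum, and Unimplemented are hypothetical names, and the leaf transforms compute their results eagerly as a simplification, whereas real non-composite transforms leave evaluation to the runner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of the apply contract; not SDK code. */
final class CompositeTransformSketch {

  /** Hypothetical stand-in for PTransform: apply throws unless overridden. */
  abstract static class Transform<InputT, OutputT> {
    OutputT apply(InputT input) {
      throw new UnsupportedOperationException(
          getClass().getSimpleName() + " does not implement apply");
    }
  }

  /** Leaf transform: doubles every element (computed eagerly in this model). */
  static final class DoubleAll extends Transform<List<Integer>, List<Integer>> {
    @Override List<Integer> apply(List<Integer> input) {
      List<Integer> out = new ArrayList<>();
      for (int v : input) {
        out.add(v * 2);
      }
      return out;
    }
  }

  /** Leaf transform: sums all elements (computed eagerly in this model). */
  static final class SumAll extends Transform<List<Integer>, Integer> {
    @Override Integer apply(List<Integer> input) {
      int total = 0;
      for (int v : input) {
        total += v;
      }
      return total;
    }
  }

  /** Composite: defined via other transforms; returns the last one's output. */
  static final class DoubleThenSum extends Transform<List<Integer>, Integer> {
    @Override Integer apply(List<Integer> input) {
      return new SumAll().apply(new DoubleAll().apply(input));
    }
  }

  /** Declares no apply of its own, so the base implementation throws. */
  static final class Unimplemented extends Transform<List<Integer>, Integer> {}

  public static void main(String[] args) {
    System.out.println(new DoubleThenSum().apply(Arrays.asList(1, 2, 3))); // prints 12
    try {
      new Unimplemented().apply(Arrays.asList(1));
    } catch (UnsupportedOperationException e) {
      System.out.println("default apply threw: " + e.getMessage());
    }
  }
}
```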
-
getAppliedFn
public com.google.cloud.dataflow.sdk.util.AppliedCombineFn<? super K,? super InputT,?,OutputT> getAppliedFn(CoderRegistry registry, Coder<? extends KV<K,? extends Iterable<InputT>>> inputCoder, com.google.cloud.dataflow.sdk.util.WindowingStrategy<?,?> windowingStrategy)
Returns the Combine.CombineFn bound to its coders.

For internal use.
-
getDefaultOutputCoder
public Coder<KV<K,OutputT>> getDefaultOutputCoder(PCollection<? extends KV<K,? extends Iterable<InputT>>> input) throws CannotProvideCoderException
Description copied from class: PTransform
Returns the default Coder to use for the output of this single-output PTransform when applied to the given input.

By default, always throws.
- Overrides:
getDefaultOutputCoder in class PTransform<PCollection<? extends KV<K,? extends Iterable<InputT>>>,PCollection<KV<K,OutputT>>>
- Throws:
CannotProvideCoderException - if none can be inferred.
-
populateDisplayData
public void populateDisplayData(DisplayData.Builder builder)
Description copied from class: PTransform
Register display data for the given transform or component.

populateDisplayData(DisplayData.Builder) is invoked by Pipeline runners to collect display data via DisplayData.from(HasDisplayData). Implementations may call super.populateDisplayData(builder) in order to register display data in the current namespace, but should otherwise use subcomponent.populateDisplayData(builder) to use the namespace of the subcomponent.

By default, does not register any display data. Implementors may override this method to provide their own display data.
- Specified by:
populateDisplayData in interface HasDisplayData
- Overrides:
populateDisplayData in class PTransform<PCollection<? extends KV<K,? extends Iterable<InputT>>>,PCollection<KV<K,OutputT>>>
- Parameters:
builder - The builder to populate with display data.
- See Also:
HasDisplayData