Google Cloud Dataflow SDK for Java, version 1.9.1
Class Combine.PerKey<K,InputT,OutputT>
- java.lang.Object
  - com.google.cloud.dataflow.sdk.transforms.PTransform<PCollection<KV<K,InputT>>,PCollection<KV<K,OutputT>>>
    - com.google.cloud.dataflow.sdk.transforms.Combine.PerKey<K,InputT,OutputT>
- Type Parameters:
K
- the type of the keys of the input and output PCollections
InputT
- the type of the values of the input PCollection
OutputT
- the type of the values of the output PCollection
- All Implemented Interfaces:
- HasDisplayData, Serializable
- Enclosing class:
- Combine
public static class Combine.PerKey<K,InputT,OutputT> extends PTransform<PCollection<KV<K,InputT>>,PCollection<KV<K,OutputT>>>
PerKey<K, InputT, OutputT> takes a PCollection<KV<K, InputT>>, groups it by key, applies a combining function to the InputT values associated with each key to produce a combined OutputT value, and returns a PCollection<KV<K, OutputT>> representing a map from each distinct key of the input PCollection to the corresponding combined value. InputT and OutputT are often the same.
This is a concise shorthand for an application of GroupByKey followed by an application of Combine.GroupedValues. See those operations for more details on how keys are compared for equality and on the default Coder for the output.
Example of use:
PCollection<KV<String, Double>> salesRecords = ...;
PCollection<KV<String, Double>> totalSalesPerPerson =
    salesRecords.apply(Combine.<String, Double>perKey(
        new Sum.SumDoubleFn()));
Each output element is in the window by which its corresponding input was grouped, and has the timestamp of the end of that window. The output PCollection has the same WindowFn as the input.
- See Also:
- Serialized Form
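The "GroupByKey followed by Combine.GroupedValues" behaviour described above can be pictured with ordinary Java collections. The following is only a rough sketch under stated assumptions: it does not use the Dataflow SDK, it ignores windowing, coders, and distribution, and every name in it (PerKeySketch, kv, groupByKey, combineGroupedValues, perKey, sum) is hypothetical and chosen for illustration.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, SDK-free illustration of the per-key combine semantics.
final class PerKeySketch {

    // Small helper that builds an immutable key/value pair (stands in for KV.of).
    static <K, V> Map.Entry<K, V> kv(K key, V value) {
        return new AbstractMap.SimpleImmutableEntry<>(key, value);
    }

    // Step 1, analogous to GroupByKey: collect all values that share a key.
    static <K, V> Map<K, List<V>> groupByKey(List<Map.Entry<K, V>> input) {
        Map<K, List<V>> grouped = new LinkedHashMap<>();
        for (Map.Entry<K, V> element : input) {
            grouped.computeIfAbsent(element.getKey(), k -> new ArrayList<>())
                   .add(element.getValue());
        }
        return grouped;
    }

    // Step 2, analogous to Combine.GroupedValues: reduce each group to one value.
    static <K, InputT, OutputT> Map<K, OutputT> combineGroupedValues(
            Map<K, List<InputT>> grouped, Function<List<InputT>, OutputT> combiner) {
        Map<K, OutputT> combined = new LinkedHashMap<>();
        for (Map.Entry<K, List<InputT>> group : grouped.entrySet()) {
            combined.put(group.getKey(), combiner.apply(group.getValue()));
        }
        return combined;
    }

    // The shorthand: both steps applied back to back.
    static <K, InputT, OutputT> Map<K, OutputT> perKey(
            List<Map.Entry<K, InputT>> input, Function<List<InputT>, OutputT> combiner) {
        return combineGroupedValues(groupByKey(input), combiner);
    }

    // A simple combining function: the sum of all values in a group.
    static Double sum(List<Double> values) {
        double total = 0.0;
        for (double value : values) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Double>> salesRecords = Arrays.asList(
                kv("alice", 10.0), kv("bob", 5.0), kv("alice", 2.5));
        Map<String, Double> totalSalesPerPerson = perKey(salesRecords, PerKeySketch::sum);
        System.out.println(totalSalesPerPerson);
    }
}
```

In this sketch the result holds exactly one entry per distinct input key, which mirrors the "map from each distinct key to the corresponding combined value" wording of the class description.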
-
-
Field Summary
-
Fields inherited from class com.google.cloud.dataflow.sdk.transforms.PTransform
name
-
-
Method Summary
Modifier and Type, Method and Description
PCollection<KV<K,OutputT>> apply(PCollection<KV<K,InputT>> input)
Applies this PTransform on the given InputT, and returns its OutputT.
CombineFnBase.PerKeyCombineFn<? super K,? super InputT,?,OutputT> getFn()
Returns the CombineFnBase.PerKeyCombineFn used by this Combine operation.
List<PCollectionView<?>> getSideInputs()
Returns the side inputs used by this Combine operation.
Combine.PerKey<K,InputT,OutputT> named(String name)
Returns a new PerKey transform that's like this transform but with the specified name.
void populateDisplayData(DisplayData.Builder builder)
Register display data for the given transform or component.
Combine.PerKeyWithHotKeyFanout<K,InputT,OutputT> withHotKeyFanout(int hotKeyFanout)
Like withHotKeyFanout(SerializableFunction), but returning the given constant value for every key.
Combine.PerKeyWithHotKeyFanout<K,InputT,OutputT> withHotKeyFanout(SerializableFunction<? super K,Integer> hotKeyFanout)
If a single key has disproportionately many values, it may become a bottleneck, especially in streaming mode.
Combine.PerKey<K,InputT,OutputT> withSideInputs(Iterable<? extends PCollectionView<?>> sideInputs)
Returns a PTransform identical to this, but with the specified side inputs to use in CombineWithContext.KeyedCombineFnWithContext.
Methods inherited from class com.google.cloud.dataflow.sdk.transforms.PTransform
getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, toString, validate
-
Method Detail
-
named
public Combine.PerKey<K,InputT,OutputT> named(String name)
Returns a new PerKey transform that's like this transform but with the specified name. Does not modify this transform.
-
withSideInputs
public Combine.PerKey<K,InputT,OutputT> withSideInputs(Iterable<? extends PCollectionView<?>> sideInputs)
Returns a PTransform identical to this, but with the specified side inputs to use in CombineWithContext.KeyedCombineFnWithContext.
-
withHotKeyFanout
public Combine.PerKeyWithHotKeyFanout<K,InputT,OutputT> withHotKeyFanout(SerializableFunction<? super K,Integer> hotKeyFanout)
If a single key has disproportionately many values, it may become a bottleneck, especially in streaming mode. This returns a new per-key combining transform that inserts an intermediate node to combine "hot" keys partially before performing the full combine.
- Parameters:
hotKeyFanout
- a function from keys to an integer N, where the key will be spread among N intermediate nodes for partial combining. If N is less than or equal to 1, this key will not be sent through an intermediate node.
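The idea of partially combining a hot key on N intermediate nodes before the full combine can be sketched in plain Java. This is an illustration only, not the SDK's implementation: the class and method names (HotKeyFanoutSketch, combineWithFanout, combineAll) are hypothetical, the round-robin assignment of values to intermediate buckets is an assumption made for simplicity, and the sketch assumes the combining operation is associative and commutative with a known identity value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BinaryOperator;

// Hypothetical, SDK-free illustration of two-stage combining for one hot key.
final class HotKeyFanoutSketch {

    // Combines all values of a single key.
    // If fanout <= 1 the values are combined directly, with no intermediate stage.
    // Otherwise the values are spread over `fanout` buckets, each bucket is
    // combined on its own, and the partial results are combined at the end.
    static <V> V combineWithFanout(
            List<V> values, int fanout, V identity, BinaryOperator<V> combiner) {
        if (fanout <= 1) {
            return combineAll(values, identity, combiner);
        }

        // Create one bucket per intermediate "node".
        List<List<V>> buckets = new ArrayList<>();
        for (int i = 0; i < fanout; i++) {
            buckets.add(new ArrayList<>());
        }

        // Assumption: values are assigned to buckets round-robin.
        for (int i = 0; i < values.size(); i++) {
            buckets.get(i % fanout).add(values.get(i));
        }

        // Partial combine: each bucket could be processed independently.
        List<V> partials = new ArrayList<>();
        for (List<V> bucket : buckets) {
            partials.add(combineAll(bucket, identity, combiner));
        }

        // Full combine over at most `fanout` partial results.
        return combineAll(partials, identity, combiner);
    }

    // Folds a list into a single value, starting from the identity.
    static <V> V combineAll(List<V> values, V identity, BinaryOperator<V> combiner) {
        V accumulator = identity;
        for (V value : values) {
            accumulator = combiner.apply(accumulator, value);
        }
        return accumulator;
    }

    public static void main(String[] args) {
        List<Long> values = new ArrayList<>();
        for (long i = 1; i <= 100; i++) {
            values.add(i);
        }
        // Both calls compute the same sum; only the staging differs.
        System.out.println(combineWithFanout(values, 4, 0L, Long::sum));
        System.out.println(combineWithFanout(values, 1, 0L, Long::sum));
    }
}
```

The final stage here only has to combine as many partial results as there are buckets, which is the reason a fanout can relieve a key that would otherwise funnel all of its values through a single combine step.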
-
withHotKeyFanout
public Combine.PerKeyWithHotKeyFanout<K,InputT,OutputT> withHotKeyFanout(int hotKeyFanout)
Like withHotKeyFanout(SerializableFunction), but returning the given constant value for every key.
-
getFn
public CombineFnBase.PerKeyCombineFn<? super K,? super InputT,?,OutputT> getFn()
Returns the CombineFnBase.PerKeyCombineFn used by this Combine operation.
-
getSideInputs
public List<PCollectionView<?>> getSideInputs()
Returns the side inputs used by this Combine operation.
-
apply
public PCollection<KV<K,OutputT>> apply(PCollection<KV<K,InputT>> input)
Description copied from class: PTransform
Applies this PTransform on the given InputT, and returns its OutputT.
Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
The default implementation throws an exception. A derived class must either implement apply, or else each runner must supply a custom implementation via PipelineRunner.apply(com.google.cloud.dataflow.sdk.transforms.PTransform<InputT, OutputT>, InputT).
- Overrides:
apply in class PTransform<PCollection<KV<K,InputT>>,PCollection<KV<K,OutputT>>>
-
populateDisplayData
public void populateDisplayData(DisplayData.Builder builder)
Description copied from class: PTransform
Register display data for the given transform or component.
populateDisplayData(DisplayData.Builder) is invoked by Pipeline runners to collect display data via DisplayData.from(HasDisplayData). Implementations may call super.populateDisplayData(builder) in order to register display data in the current namespace, but should otherwise use subcomponent.populateDisplayData(builder) to use the namespace of the subcomponent.
By default, does not register any display data. Implementors may override this method to provide their own display data.
- Specified by:
populateDisplayData in interface HasDisplayData
- Overrides:
populateDisplayData in class PTransform<PCollection<KV<K,InputT>>,PCollection<KV<K,OutputT>>>
- Parameters:
builder
- The builder to populate with display data.
- See Also:
HasDisplayData
-
-