Google Cloud Dataflow SDK for Java, version 1.9.1
Class ApproximateUnique.ApproximateUniqueCombineFn<T>
- java.lang.Object
-
- com.google.cloud.dataflow.sdk.transforms.Combine.CombineFn<T,ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique,Long>
-
- com.google.cloud.dataflow.sdk.transforms.ApproximateUnique.ApproximateUniqueCombineFn<T>
-
- Type Parameters:
T - the type of the values being combined
- All Implemented Interfaces:
- CombineFnBase.GlobalCombineFn<T,ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique,Long>, HasDisplayData, Serializable
- Enclosing class:
- ApproximateUnique
public static class ApproximateUnique.ApproximateUniqueCombineFn<T> extends Combine.CombineFn<T,ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique,Long>
CombineFn that computes an estimate of the number of distinct values that were combined.
Hashes input elements, computes the top sampleSize hash values, and uses those to extrapolate the size of the entire set of hash values by assuming the rest of the hash values are as densely distributed as the top sampleSize.
Used to implement ApproximateUnique.globally(...) and ApproximateUnique.perKey(...).
- See Also:
- Serialized Form
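The estimation idea described above (hash every input, keep only the sampleSize largest hashes, then extrapolate from how much of the hash space those hashes cover) can be illustrated with a small stand-alone sketch. Everything here is an assumption made for illustration: the class name DistinctEstimateSketch, the restriction to long inputs, the SplitMix64-style mixing function used as the hash, and the simple linear extrapolation. It is not the SDK's implementation and may differ from it in both the hash function and the exact formula.

```java
import java.util.TreeSet;

/** Hypothetical illustration only; this is not the SDK class. */
class DistinctEstimateSketch {
    private final int sampleSize;
    // Holds at most sampleSize distinct hashes: the largest ones seen so far.
    private final TreeSet<Long> largestHashes = new TreeSet<>();

    DistinctEstimateSketch(int sampleSize) {
        if (sampleSize < 2) {
            throw new IllegalArgumentException("sampleSize must be at least 2");
        }
        this.sampleSize = sampleSize;
    }

    /** Assumed hash: a SplitMix64-style 64-bit mixer (a bijection on long values). */
    static long hash64(long value) {
        long z = value;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    /** Hashes the value and keeps the hash only if it is among the largest seen. */
    void add(long value) {
        long h = hash64(value);
        if (largestHashes.size() < sampleSize) {
            largestHashes.add(h); // duplicates are ignored by the set
        } else if (h > largestHashes.first() && largestHashes.add(h)) {
            largestHashes.pollFirst(); // evict the smallest retained hash
        }
    }

    /** Exact count while the sample is not full; otherwise a linear extrapolation. */
    long estimate() {
        if (largestHashes.size() < sampleSize) {
            return largestHashes.size();
        }
        double hashSpace = (double) Long.MAX_VALUE - (double) Long.MIN_VALUE;
        // Portion of the hash space spanned by the retained (largest) hashes.
        double covered = (double) Long.MAX_VALUE - (double) largestHashes.first();
        // Assume the rest of the space is as densely populated as the covered part.
        return Math.round(sampleSize * hashSpace / covered);
    }

    public static void main(String[] args) {
        DistinctEstimateSketch sketch = new DistinctEstimateSketch(1024);
        for (long v = 0; v < 100_000; v++) {
            sketch.add(v);
            sketch.add(v); // repeated values do not change the estimate
        }
        System.out.println("estimated distinct values: " + sketch.estimate());
    }
}
```

In this sketch a larger sampleSize retains more hashes and so tends to give a tighter estimate, at the cost of a larger accumulator.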
Nested Class Summary
Modifier and Type, Class and Description
static class ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique
A heap utility class to efficiently track the largest added elements.
-
Constructor Summary
Constructor and Description
ApproximateUniqueCombineFn(long sampleSize, Coder<T> coder)
-
Method Summary
Modifier and Type, Method and Description
ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique addInput(ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique heap, T input)
Adds the given input value to the given accumulator, returning the new accumulator value.
ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique createAccumulator()
Returns a new, mutable accumulator value, representing the accumulation of zero input values.
Long extractOutput(ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique heap)
Returns the output value that is the result of combining all the input values represented by the given accumulator.
TypeVariable<?> getAccumTVariable()
Returns the TypeVariable of AccumT.
Coder<ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique> getAccumulatorCoder(CoderRegistry registry, Coder<T> inputCoder)
Returns the Coder to use for accumulator AccumT values, or null if it is not able to be inferred.
Coder<OutputT> getDefaultOutputCoder(CoderRegistry registry, Coder<InputT> inputCoder)
Returns the Coder to use by default for output OutputT values, or null if it is not able to be inferred.
String getIncompatibleGlobalWindowErrorMessage()
Returns the error message for not supported default values in Combine.globally().
TypeVariable<?> getInputTVariable()
Returns the TypeVariable of InputT.
TypeVariable<?> getOutputTVariable()
Returns the TypeVariable of OutputT.
ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique mergeAccumulators(Iterable<ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique> heaps)
Returns an accumulator representing the accumulation of all the input values accumulated in the merging accumulators.
void populateDisplayData(DisplayData.Builder builder)
Register display data for the given transform or component.
Methods inherited from class com.google.cloud.dataflow.sdk.transforms.Combine.CombineFn
apply, asKeyedFn, compact, defaultValue, getOutputType
Method Detail
-
createAccumulator
public ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique createAccumulator()
Description copied from class: Combine.CombineFn
Returns a new, mutable accumulator value, representing the accumulation of zero input values.
- Specified by:
createAccumulator in class Combine.CombineFn<T,ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique,Long>
-
addInput
public ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique addInput(ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique heap, T input)
Description copied from class: Combine.CombineFn
Adds the given input value to the given accumulator, returning the new accumulator value.
For efficiency, the input accumulator may be modified and returned.
- Specified by:
addInput in class Combine.CombineFn<T,ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique,Long>
-
mergeAccumulators
public ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique mergeAccumulators(Iterable<ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique> heaps)
Description copied from class: Combine.CombineFn
Returns an accumulator representing the accumulation of all the input values accumulated in the merging accumulators.
May modify any of the argument accumulators. May return a fresh accumulator, or may return one of the (modified) argument accumulators.
- Specified by:
mergeAccumulators in class Combine.CombineFn<T,ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique,Long>
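To make the merge contract concrete, here is a minimal stand-alone sketch of one way to merge "largest hashes" accumulators. It assumes each accumulator is modelled as a TreeSet<Long> holding at most sampleSize values; the class name MergeSketch and the explicit sampleSize parameter are hypothetical and are not part of the SDK. The sketch takes the option the contract permits of folding everything into the first accumulator and returning that same (modified) object.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical illustration only; accumulators are modelled as TreeSet<Long>. */
class MergeSketch {
    /**
     * Folds every accumulator into the first one and returns it, keeping only
     * the sampleSize largest distinct values. The first argument accumulator is
     * modified in place; an empty input yields a fresh empty accumulator.
     */
    static TreeSet<Long> mergeAccumulators(Iterable<TreeSet<Long>> heaps, int sampleSize) {
        Iterator<TreeSet<Long>> it = heaps.iterator();
        TreeSet<Long> merged = it.hasNext() ? it.next() : new TreeSet<Long>();
        while (it.hasNext()) {
            for (long hash : it.next()) {
                merged.add(hash); // no effect if the hash is already present
                if (merged.size() > sampleSize) {
                    merged.pollFirst(); // drop the smallest to stay within sampleSize
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        TreeSet<Long> a = new TreeSet<>(Arrays.asList(1L, 5L, 9L));
        TreeSet<Long> b = new TreeSet<>(Arrays.asList(2L, 5L, 10L));
        TreeSet<Long> c = new TreeSet<>(Arrays.asList(7L));
        List<TreeSet<Long>> heaps = Arrays.asList(a, b, c);

        TreeSet<Long> merged = mergeAccumulators(heaps, 3);
        System.out.println(merged);      // prints [7, 9, 10]
        System.out.println(merged == a); // prints true: the first accumulator was reused
    }
}
```

Because only the largest values survive and duplicates collapse, the result under this model is the same set of values however the inputs were split across accumulators.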
-
extractOutput
public Long extractOutput(ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique heap)
Description copied from class: Combine.CombineFn
Returns the output value that is the result of combining all the input values represented by the given accumulator.
- Specified by:
extractOutput in class Combine.CombineFn<T,ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique,Long>
-
getAccumulatorCoder
public Coder<ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique> getAccumulatorCoder(CoderRegistry registry, Coder<T> inputCoder)
Description copied from interface: CombineFnBase.GlobalCombineFn
Returns the Coder to use for accumulator AccumT values, or null if it is not able to be inferred.
By default, uses the knowledge of the Coder being used for InputT values and the enclosing Pipeline's CoderRegistry to try to infer the Coder for AccumT values.
This is the Coder used to send data through a communication-intensive shuffle step, so a compact and efficient representation may have significant performance benefits.
- Specified by:
getAccumulatorCoder in interface CombineFnBase.GlobalCombineFn<T,ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique,Long>
-
getDefaultOutputCoder
public Coder<OutputT> getDefaultOutputCoder(CoderRegistry registry, Coder<InputT> inputCoder) throws CannotProvideCoderException
Description copied from interface: CombineFnBase.GlobalCombineFn
Returns the Coder to use by default for output OutputT values, or null if it is not able to be inferred.
By default, uses the knowledge of the Coder being used for input InputT values and the enclosing Pipeline's CoderRegistry to try to infer the Coder for OutputT values.
- Specified by:
getDefaultOutputCoder in interface CombineFnBase.GlobalCombineFn<InputT,AccumT,OutputT>
- Throws:
CannotProvideCoderException
-
getIncompatibleGlobalWindowErrorMessage
public String getIncompatibleGlobalWindowErrorMessage()
Description copied from interface: CombineFnBase.GlobalCombineFn
Returns the error message for not supported default values in Combine.globally().
- Specified by:
getIncompatibleGlobalWindowErrorMessage in interface CombineFnBase.GlobalCombineFn<InputT,AccumT,OutputT>
-
getInputTVariable
public TypeVariable<?> getInputTVariable()
Returns the TypeVariable of InputT.
-
getAccumTVariable
public TypeVariable<?> getAccumTVariable()
Returns the TypeVariable of AccumT.
-
getOutputTVariable
public TypeVariable<?> getOutputTVariable()
Returns the TypeVariable of OutputT.
-
populateDisplayData
public void populateDisplayData(DisplayData.Builder builder)
Register display data for the given transform or component.
populateDisplayData(DisplayData.Builder) is invoked by Pipeline runners to collect display data via DisplayData.from(HasDisplayData). Implementations may call super.populateDisplayData(builder) in order to register display data in the current namespace, but should otherwise use subcomponent.populateDisplayData(builder) to use the namespace of the subcomponent.
By default, does not register any display data. Implementors may override this method to provide their own display data.
- Specified by:
populateDisplayData in interface HasDisplayData
- Parameters:
builder - The builder to populate with display data.
- See Also:
HasDisplayData