Google Cloud Dataflow SDK for Java, version 1.9.1
Class Partition<T>
- java.lang.Object
  - com.google.cloud.dataflow.sdk.transforms.PTransform<PCollection<T>,PCollectionList<T>>
    - com.google.cloud.dataflow.sdk.transforms.Partition<T>
- Type Parameters:
T - the type of the elements of the input and output PCollections
- All Implemented Interfaces:
HasDisplayData, Serializable
public class Partition<T> extends PTransform<PCollection<T>,PCollectionList<T>>
Partition takes a PCollection<T> and a PartitionFn, uses the PartitionFn to split the elements of the input PCollection into N partitions, and returns a PCollectionList<T> that bundles N PCollection<T>s containing the split elements.
Example of use:
    PCollection<Student> students = ...;
    // Split students up into 10 partitions, by percentile:
    PCollectionList<Student> studentsByPercentile =
        students.apply(Partition.of(10, new PartitionFn<Student>() {
            public int partitionFor(Student student, int numPartitions) {
                return student.getPercentile()  // 0..99
                     * numPartitions / 100;
            }}));
    for (int i = 0; i < 10; i++) {
        PCollection<Student> partition = studentsByPercentile.get(i);
        ...
    }
By default, the Coder of each of the PCollections in the output PCollectionList is the same as the Coder of the input PCollection.
Each output element has the same timestamp and is in the same windows as its corresponding input element, and each output PCollection has the same WindowFn associated with it as the input.
- See Also:
- Serialized Form
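
The routing behavior described above can be modeled without the SDK. The following is a minimal plain-Java sketch, not the SDK implementation: `PartitionSketch`, its nested `PartitionFn` interface, and the `partition` helper are hypothetical stand-ins that operate on in-memory lists instead of PCollections. Rejecting an out-of-range index is an assumption made for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of the partitioning semantics; does not use the Dataflow SDK. */
class PartitionSketch {

  /** Hypothetical stand-in that mirrors the shape of Partition.PartitionFn<T>. */
  interface PartitionFn<T> {
    int partitionFor(T elem, int numPartitions);
  }

  /** Routes each input element to exactly one of numPartitions output lists. */
  static <T> List<List<T>> partition(
      List<T> input, int numPartitions, PartitionFn<? super T> fn) {
    if (numPartitions <= 0) {
      throw new IllegalArgumentException("numPartitions must be > 0");
    }
    List<List<T>> outputs = new ArrayList<>();
    for (int i = 0; i < numPartitions; i++) {
      outputs.add(new ArrayList<T>());
    }
    for (T elem : input) {
      int index = fn.partitionFor(elem, numPartitions);
      // Assumption for this sketch: an out-of-range index is an error.
      if (index < 0 || index >= numPartitions) {
        throw new IndexOutOfBoundsException("Partition index out of range: " + index);
      }
      outputs.get(index).add(elem);
    }
    return outputs;
  }

  public static void main(String[] args) {
    // Percentiles in 0..99, split into 10 partitions as in the example above.
    List<Integer> percentiles = Arrays.asList(5, 17, 42, 48, 99);
    List<List<Integer>> byDecile = partition(percentiles, 10, (p, n) -> p * n / 100);
    for (int i = 0; i < byDecile.size(); i++) {
      System.out.println(i + ": " + byDecile.get(i));
    }
  }
}
```

Every element lands in exactly one output list, and partitions that receive no elements remain present but empty, which matches the idea of a fixed-size list of N outputs.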
Nested Class Summary
Nested Classes
Modifier and Type | Class and Description
static interface | Partition.PartitionFn<T>: A function object that chooses an output partition for an element.
-
Field Summary
-
Fields inherited from class com.google.cloud.dataflow.sdk.transforms.PTransform
name
-
-
Method Summary
Modifier and Type | Method and Description
PCollectionList<T> | apply(PCollection<T> in): Applies this PTransform on the given InputT, and returns its Output.
static <T> Partition<T> | of(int numPartitions, Partition.PartitionFn<? super T> partitionFn): Returns a new Partition PTransform that divides its input PCollection into the given number of partitions, using the given partitioning function.
void | populateDisplayData(DisplayData.Builder builder): Register display data for the given transform or component.
Methods inherited from class com.google.cloud.dataflow.sdk.transforms.PTransform
getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, toString, validate
Method Detail
-
of
public static <T> Partition<T> of(int numPartitions, Partition.PartitionFn<? super T> partitionFn)
Returns a new Partition PTransform that divides its input PCollection into the given number of partitions, using the given partitioning function.
- Parameters:
numPartitions - the number of partitions to divide the input PCollection into
partitionFn - the function to invoke on each element to choose its output partition
- Throws:
IllegalArgumentException - if numPartitions <= 0
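
The `PartitionFn<? super T>` bound means a function written for a supertype can partition any more specific element type. The sketch below illustrates this in plain Java, without the SDK: `PartitionFnSketch`, its nested `PartitionFn` interface, `BY_HASH`, and `choose` are hypothetical names introduced only for illustration. It assumes the function should return an index in the range 0 to numPartitions - 1, and uses `Math.floorMod` so that negative hash codes still map into that range.

```java
/** Plain-Java illustration of the wildcard bound; does not use the Dataflow SDK. */
class PartitionFnSketch {

  /** Hypothetical stand-in that mirrors the shape of Partition.PartitionFn<T>. */
  interface PartitionFn<T> {
    int partitionFor(T elem, int numPartitions);
  }

  /** Mirrors the documented precondition of Partition.of. */
  static void checkNumPartitions(int numPartitions) {
    if (numPartitions <= 0) {
      throw new IllegalArgumentException("numPartitions must be > 0");
    }
  }

  /**
   * Accepts any Object, so it satisfies PartitionFn<? super T> for every T.
   * floorMod keeps the result in [0, numPartitions) even for negative hash codes,
   * whereas the % operator could return a negative index.
   */
  static final PartitionFn<Object> BY_HASH =
      (elem, numPartitions) -> Math.floorMod(elem.hashCode(), numPartitions);

  /** Generic helper that uses the same wildcard bound as Partition.of. */
  static <T> int choose(T elem, int numPartitions, PartitionFn<? super T> fn) {
    checkNumPartitions(numPartitions);
    return fn.partitionFor(elem, numPartitions);
  }

  public static void main(String[] args) {
    // The same Object-typed function serves String and Integer elements.
    System.out.println(choose("alice", 4, BY_HASH));
    System.out.println(choose(-7, 4, BY_HASH));
  }
}
```

Hash-based partitioning is only one possible choice; a domain-specific function such as the percentile example in the class description may distribute elements more meaningfully.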
-
apply
public PCollectionList<T> apply(PCollection<T> in)
Description copied from class: PTransform
Applies this PTransform on the given InputT, and returns its Output.
Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
The default implementation throws an exception. A derived class must either implement apply, or else each runner must supply a custom implementation via PipelineRunner.apply(com.google.cloud.dataflow.sdk.transforms.PTransform<InputT, OutputT>, InputT).
- Overrides:
apply in class PTransform<PCollection<T>,PCollectionList<T>>
-
populateDisplayData
public void populateDisplayData(DisplayData.Builder builder)
Description copied from class: PTransform
Register display data for the given transform or component.
populateDisplayData(DisplayData.Builder) is invoked by Pipeline runners to collect display data via DisplayData.from(HasDisplayData). Implementations may call super.populateDisplayData(builder) in order to register display data in the current namespace, but should otherwise use subcomponent.populateDisplayData(builder) to use the namespace of the subcomponent.
By default, does not register any display data. Implementors may override this method to provide their own display data.
- Specified by:
populateDisplayData in interface HasDisplayData
- Overrides:
populateDisplayData in class PTransform<PCollection<T>,PCollectionList<T>>
- Parameters:
builder - The builder to populate with display data.
- See Also:
HasDisplayData