Google Cloud Dataflow SDK for Java, version 1.9.1
Class CoGroupByKey<K>
- java.lang.Object
- com.google.cloud.dataflow.sdk.transforms.PTransform<KeyedPCollectionTuple<K>,PCollection<KV<K,CoGbkResult>>>
- com.google.cloud.dataflow.sdk.transforms.join.CoGroupByKey<K>
- Type Parameters:
- K - the type of the keys in the input and output PCollections
- All Implemented Interfaces:
- HasDisplayData, Serializable
public class CoGroupByKey<K> extends PTransform<KeyedPCollectionTuple<K>,PCollection<KV<K,CoGbkResult>>>
A PTransform that performs a CoGroupByKey on a tuple of tables. A CoGroupByKey groups results from all tables by like keys into CoGbkResults, from which the results for any specific table can be accessed by the TupleTag supplied with the initial table.

Example of performing a CoGroupByKey followed by a ParDo that consumes the results:

    PCollection<KV<K, V1>> pt1 = ...;
    PCollection<KV<K, V2>> pt2 = ...;

    final TupleTag<V1> t1 = new TupleTag<>();
    final TupleTag<V2> t2 = new TupleTag<>();
    PCollection<KV<K, CoGbkResult>> coGbkResultCollection =
        KeyedPCollectionTuple.of(t1, pt1)
                             .and(t2, pt2)
                             .apply(CoGroupByKey.<K>create());

    PCollection<T> finalResultCollection =
        coGbkResultCollection.apply(ParDo.of(
            new DoFn<KV<K, CoGbkResult>, T>() {
              @Override
              public void processElement(ProcessContext c) {
                KV<K, CoGbkResult> e = c.element();
                Iterable<V1> pt1Vals = e.getValue().getAll(t1);
                V2 pt2Val = e.getValue().getOnly(t2);
                ... Do Something ....
                c.output(...some T...);
              }
            }));
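The grouping behaviour described above can be illustrated without the SDK. The following is a minimal plain-Java sketch, not the SDK implementation: `CoGroupSketch`, `Pair`, and `Grouped` are hypothetical stand-ins for the pipeline, `KV`, and a two-table `CoGbkResult`, and the `TreeMap` (with its `Comparable` key bound) is an assumption made only so the printed order is deterministic. It shows that every key appearing in either table yields one result, with the values from each table kept separately.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java model of co-grouping two keyed tables; not the SDK implementation. */
public class CoGroupSketch {

    /** Hypothetical stand-in for KV<K, V>. */
    public static final class Pair<K, V> {
        public final K key;
        public final V value;

        public Pair(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Hypothetical stand-in for a CoGbkResult restricted to two tables. */
    public static final class Grouped<V1, V2> {
        public final List<V1> first = new ArrayList<>();   // values from table 1
        public final List<V2> second = new ArrayList<>();  // values from table 2
    }

    /**
     * Groups both tables by key. A key present in only one table still
     * produces a result; the list for the other table is simply empty.
     */
    public static <K extends Comparable<K>, V1, V2> Map<K, Grouped<V1, V2>> coGroup(
            List<Pair<K, V1>> table1, List<Pair<K, V2>> table2) {
        // TreeMap is used only for deterministic iteration order in this sketch.
        Map<K, Grouped<V1, V2>> out = new TreeMap<>();
        for (Pair<K, V1> p : table1) {
            out.computeIfAbsent(p.key, k -> new Grouped<>()).first.add(p.value);
        }
        for (Pair<K, V2> p : table2) {
            out.computeIfAbsent(p.key, k -> new Grouped<>()).second.add(p.value);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Pair<String, Integer>> t1 = new ArrayList<>();
        t1.add(new Pair<String, Integer>("a", 1));
        t1.add(new Pair<String, Integer>("a", 2));
        t1.add(new Pair<String, Integer>("b", 3));

        List<Pair<String, String>> t2 = new ArrayList<>();
        t2.add(new Pair<String, String>("a", "x"));
        t2.add(new Pair<String, String>("c", "y"));

        for (Map.Entry<String, Grouped<Integer, String>> e : coGroup(t1, t2).entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue().first + " " + e.getValue().second);
        }
        // prints:
        // a [1, 2] [x]
        // b [3] []
        // c [] [y]
    }
}
```

In the sketch, the two fields of `Grouped` play the role that the two `TupleTag`s play in the SDK example: they select which table's values to read from a grouped result.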
- See Also:
- Serialized Form
Field Summary
Fields inherited from class com.google.cloud.dataflow.sdk.transforms.PTransform
name
Method Summary
Modifier and Type, Method and Description
- PCollection<KV<K,CoGbkResult>> apply(KeyedPCollectionTuple<K> input)
  Applies this PTransform on the given InputT, and returns its Output.
- static <K> CoGroupByKey<K> create()
  Returns a CoGroupByKey<K> PTransform.
Methods inherited from class com.google.cloud.dataflow.sdk.transforms.PTransform
getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, populateDisplayData, toString, validate
Method Detail
create
public static <K> CoGroupByKey<K> create()
Returns a CoGroupByKey<K> PTransform.
- Type Parameters:
- K - the type of the keys in the input and output PCollections
apply
public PCollection<KV<K,CoGbkResult>> apply(KeyedPCollectionTuple<K> input)
Description copied from class: PTransform
Applies this PTransform on the given InputT, and returns its Output.
Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
The default implementation throws an exception. A derived class must either implement apply, or else each runner must supply a custom implementation via PipelineRunner.apply(com.google.cloud.dataflow.sdk.transforms.PTransform<InputT, OutputT>, InputT).
- Overrides:
- apply in class PTransform<KeyedPCollectionTuple<K>,PCollection<KV<K,CoGbkResult>>>