CoGroupByKey (Google Cloud Dataflow SDK 1.9.1 API)

Google Cloud Dataflow SDK for Java, version 1.9.1

com.google.cloud.dataflow.sdk.transforms.join

Class CoGroupByKey<K>

  • Type Parameters:
    K - the type of the keys in the input and output PCollections
    All Implemented Interfaces:
    HasDisplayData, Serializable


    public class CoGroupByKey<K>
    extends PTransform<KeyedPCollectionTuple<K>,PCollection<KV<K,CoGbkResult>>>
    A PTransform that performs a CoGroupByKey on a tuple of tables. A CoGroupByKey groups results from all tables by like keys into CoGbkResults, from which the results for any specific table can be accessed by the TupleTag supplied with the initial table.

    Example of performing a CoGroupByKey followed by a ParDo that consumes the results:

    
     PCollection<KV<K, V1>> pt1 = ...;
     PCollection<KV<K, V2>> pt2 = ...;
    
     final TupleTag<V1> t1 = new TupleTag<>();
     final TupleTag<V2> t2 = new TupleTag<>();
     PCollection<KV<K, CoGbkResult>> coGbkResultCollection =
       KeyedPCollectionTuple.of(t1, pt1)
                            .and(t2, pt2)
                            .apply(CoGroupByKey.<K>create());
    
     PCollection<T> finalResultCollection =
       coGbkResultCollection.apply(ParDo.of(
         new DoFn<KV<K, CoGbkResult>, T>() {
           @Override
           public void processElement(ProcessContext c) {
             KV<K, CoGbkResult> e = c.element();
             Iterable<V1> pt1Vals = e.getValue().getAll(t1);
             V2 pt2Val = e.getValue().getOnly(t2);
              ... Do Something ....
             c.output(...some T...);
           }
         }));
    
    See Also:
    Serialized Form


Send feedback about...

Cloud Dataflow