PCollectionTuple (Google Cloud Dataflow SDK 1.9.1 API)

Google Cloud Dataflow SDK for Java, version 1.9.1


Class PCollectionTuple

  • All Implemented Interfaces:
    PInput, POutput

    public class PCollectionTuple
    extends Object
    implements PInput, POutput
    A PCollectionTuple is an immutable tuple of heterogeneously-typed PCollections, "keyed" by TupleTags. It can serve as the input or output of a PTransform that takes or produces multiple PCollections of possibly different types, for instance a ParDo with side outputs.

    A PCollectionTuple can be created and accessed as follows:

     PCollection<String> pc1 = ...;
     PCollection<Integer> pc2 = ...;
     PCollection<Iterable<String>> pc3 = ...;
     // Create TupleTags for each of the PCollections to put in the
     // PCollectionTuple (the type of the TupleTag enables tracking the
     // static type of each of the PCollections in the PCollectionTuple):
     TupleTag<String> tag1 = new TupleTag<>();
     TupleTag<Integer> tag2 = new TupleTag<>();
     TupleTag<Iterable<String>> tag3 = new TupleTag<>();
     // Create a PCollectionTuple with three PCollections:
     PCollectionTuple pcs =
         PCollectionTuple.of(tag1, pc1)
                         .and(tag2, pc2)
                         .and(tag3, pc3);
     // Create an empty PCollectionTuple:
     Pipeline p = ...;
     PCollectionTuple pcs2 = PCollectionTuple.empty(p);
     // Get PCollections out of a PCollectionTuple, using the same tags
     // that were used to put them in:
     PCollection<Integer> pcX = pcs.get(tag2);
     PCollection<String> pcY = pcs.get(tag1);
     PCollection<Iterable<String>> pcZ = pcs.get(tag3);
     // Get a map of all PCollections in a PCollectionTuple:
     Map<TupleTag<?>, PCollection<?>> allPcs = pcs.getAll();
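
    The sample above relies on each tag's type parameter to recover the static type of the collection stored under it. The following is a minimal, SDK-free sketch of that pattern, not the SDK's implementation. The names TypedTupleSketch, Tag, and TypedTuple are hypothetical stand-ins for TupleTag and PCollectionTuple, plain Lists stand in for PCollections, and tags are compared by object identity as a simplification. Rejecting a duplicate tag in and() is a choice made for this sketch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; none of these types belong to the Dataflow SDK. */
public class TypedTupleSketch {

    /** Stand-in for TupleTag<T>: the type parameter records the element type. */
    public static final class Tag<T> {
        private final String id;

        public Tag(String id) {
            this.id = id;
        }

        public String getId() {
            return id;
        }
    }

    /** Stand-in for PCollectionTuple: an immutable tuple keyed by typed tags. */
    public static final class TypedTuple {
        private final Map<Tag<?>, List<?>> entries;

        private TypedTuple(Map<Tag<?>, List<?>> entries) {
            this.entries = Collections.unmodifiableMap(entries);
        }

        public static TypedTuple empty() {
            return new TypedTuple(new LinkedHashMap<Tag<?>, List<?>>());
        }

        public static <T> TypedTuple of(Tag<T> tag, List<T> values) {
            return empty().and(tag, values);
        }

        /** Returns a new tuple with one more entry; the receiver is not modified. */
        public <T> TypedTuple and(Tag<T> tag, List<T> values) {
            if (entries.containsKey(tag)) {
                throw new IllegalArgumentException("Tag already present: " + tag.getId());
            }
            Map<Tag<?>, List<?>> copy = new LinkedHashMap<>(entries);
            copy.put(tag, values);
            return new TypedTuple(copy);
        }

        /** The cast is safe because and() only stores a List<T> under a Tag<T>. */
        @SuppressWarnings("unchecked")
        public <T> List<T> get(Tag<T> tag) {
            List<?> values = entries.get(tag);
            if (values == null) {
                throw new IllegalArgumentException("Tag not found: " + tag.getId());
            }
            return (List<T>) values;
        }

        public Map<Tag<?>, List<?>> getAll() {
            return entries;
        }
    }

    public static void main(String[] args) {
        Tag<String> words = new Tag<>("words");
        Tag<Integer> counts = new Tag<>("counts");

        TypedTuple tuple = TypedTuple.of(words, Arrays.asList("a", "b"))
                                     .and(counts, Arrays.asList(1, 2, 3));

        // No casts at the call site: the tag's type parameter fixes the result type.
        List<String> w = tuple.get(words);
        List<Integer> c = tuple.get(counts);
        System.out.println(w + " " + c);
    }
}
```

    Because and() copies the underlying map instead of mutating it, an intermediate tuple can be reused after further entries are chained onto it, which mirrors the immutability described for PCollectionTuple.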
