Google Cloud Dataflow SDK for Java, version 1.9.1
Class PCollectionTuple
- java.lang.Object
  - com.google.cloud.dataflow.sdk.values.PCollectionTuple
public class PCollectionTuple extends Object implements PInput, POutput
A PCollectionTuple is an immutable tuple of heterogeneously-typed PCollections, "keyed" by TupleTags. A PCollectionTuple can be used as the input or output of a PTransform taking or producing multiple PCollection inputs or outputs that can be of different types, for instance a ParDo with side outputs.
A PCollectionTuple can be created and accessed as follows:

    PCollection<String> pc1 = ...;
    PCollection<Integer> pc2 = ...;
    PCollection<Iterable<String>> pc3 = ...;

    // Create TupleTags for each of the PCollections to put in the
    // PCollectionTuple (the type of the TupleTag enables tracking the
    // static type of each of the PCollections in the PCollectionTuple):
    TupleTag<String> tag1 = new TupleTag<>();
    TupleTag<Integer> tag2 = new TupleTag<>();
    TupleTag<Iterable<String>> tag3 = new TupleTag<>();

    // Create a PCollectionTuple with three PCollections:
    PCollectionTuple pcs =
        PCollectionTuple.of(tag1, pc1)
                        .and(tag2, pc2)
                        .and(tag3, pc3);

    // Create an empty PCollectionTuple:
    Pipeline p = ...;
    PCollectionTuple pcs2 = PCollectionTuple.empty(p);

    // Get PCollections out of a PCollectionTuple, using the same tags
    // that were used to put them in:
    PCollection<Integer> pcX = pcs.get(tag2);
    PCollection<String> pcY = pcs.get(tag1);
    PCollection<Iterable<String>> pcZ = pcs.get(tag3);

    // Get a map of all PCollections in a PCollectionTuple:
    Map<TupleTag<?>, PCollection<?>> allPcs = pcs.getAll();
-
Method Summary
Modifier and Type, followed by Method and Description:
- <T> PCollectionTuple and(TupleTag<T> tag, PCollection<T> pc)
  Returns a new PCollectionTuple that has each PCollection and TupleTag of this PCollectionTuple plus the given PCollection associated with the given TupleTag.
- <OutputT extends POutput> OutputT apply(PTransform<PCollectionTuple,OutputT> t)
  Like apply(String, PTransform) but defaulting to the name of the PTransform.
- <OutputT extends POutput> OutputT apply(String name, PTransform<PCollectionTuple,OutputT> t)
  Applies the given PTransform to this input PCollectionTuple, using name to identify this specific application of the transform.
- static PCollectionTuple empty(Pipeline pipeline)
  Returns an empty PCollectionTuple that is part of the given Pipeline.
- Collection<? extends PValue> expand()
- void finishSpecifying()
  After building, finalizes this PInput to make it ready for being used as an input to a PTransform.
- void finishSpecifyingOutput()
  As part of applying the producing PTransform, finalizes this output to make it ready for being used as an input and for running.
- <T> PCollection<T> get(TupleTag<T> tag)
- Map<TupleTag<?>,PCollection<?>> getAll()
  Returns an immutable Map from TupleTag to corresponding PCollection, for all the members of this PCollectionTuple.
- Pipeline getPipeline()
- <T> boolean has(TupleTag<T> tag)
  Returns whether this PCollectionTuple contains a PCollection with the given tag.
- static <T> PCollectionTuple of(TupleTag<T> tag, PCollection<T> pc)
- static PCollectionTuple ofPrimitiveOutputsInternal(Pipeline pipeline, TupleTagList outputTags, com.google.cloud.dataflow.sdk.util.WindowingStrategy<?,?> windowingStrategy, PCollection.IsBounded isBounded)
  Returns a PCollectionTuple with each of the given tags mapping to a new output PCollection.
- void recordAsOutput(AppliedPTransform<?,?,?> transform)
  Records that this POutput is an output of the given PTransform.
-
Method Detail
-
empty
public static PCollectionTuple empty(Pipeline pipeline)
Returns an empty PCollectionTuple that is part of the given Pipeline.
A PCollectionTuple containing additional elements can be created by calling and(TupleTag<T>, PCollection<T>) on the result.
-
of
public static <T> PCollectionTuple of(TupleTag<T> tag, PCollection<T> pc)
Returns a singleton PCollectionTuple containing the given PCollection keyed by the given TupleTag.
A PCollectionTuple containing additional elements can be created by calling and(TupleTag<T>, PCollection<T>) on the result.
-
and
public <T> PCollectionTuple and(TupleTag<T> tag, PCollection<T> pc)
Returns a new PCollectionTuple that has each PCollection and TupleTag of this PCollectionTuple plus the given PCollection associated with the given TupleTag.
The given TupleTag should not already be mapped to a PCollection in this PCollectionTuple.
Each PCollection in the resulting PCollectionTuple must be part of the same Pipeline.
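The copy-on-add behaviour described for and(...) can be illustrated without the SDK. The following is a minimal, self-contained sketch in plain Java. SketchTag and SketchTuple are hypothetical stand-ins invented for this illustration (they hold java.util.List values rather than PCollections) and are not Dataflow classes. Rejecting an already-mapped tag with IllegalArgumentException is an assumption of the sketch; the text above only says that the tag should not already be mapped.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for TupleTag<T>: the type parameter records the element type.
// Tags are compared by object identity in this sketch.
final class SketchTag<T> {
    private final String id;

    SketchTag(String id) {
        this.id = id;
    }

    @Override
    public String toString() {
        return id;
    }
}

// Hypothetical stand-in for PCollectionTuple, holding plain Lists instead of PCollections.
final class SketchTuple {
    private final Map<SketchTag<?>, List<?>> members;

    private SketchTuple(Map<SketchTag<?>, List<?>> members) {
        this.members = Collections.unmodifiableMap(members);
    }

    static SketchTuple empty() {
        return new SketchTuple(new LinkedHashMap<SketchTag<?>, List<?>>());
    }

    // Copies the existing members, adds the new pair, and returns a NEW tuple.
    // The receiver is left unchanged.
    <T> SketchTuple and(SketchTag<T> tag, List<T> values) {
        if (members.containsKey(tag)) {
            // Assumption of this sketch, not documented SDK behaviour.
            throw new IllegalArgumentException("tag already mapped: " + tag);
        }
        Map<SketchTag<?>, List<?>> copy = new LinkedHashMap<SketchTag<?>, List<?>>(members);
        copy.put(tag, values);
        return new SketchTuple(copy);
    }

    int size() {
        return members.size();
    }
}
```

In this model, calling and(...) on a one-member tuple yields a two-member tuple while the original still reports one member, which is the property that makes chained calls such as of(...).and(...).and(...) safe to share.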
-
has
public <T> boolean has(TupleTag<T> tag)
Returns whether this PCollectionTuple contains a PCollection with the given tag.
-
get
public <T> PCollection<T> get(TupleTag<T> tag)
Returns the PCollection associated with the given TupleTag in this PCollectionTuple. Throws IllegalArgumentException if there is no such PCollection, i.e., !has(tag).
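Because get(tag) throws when the tag is absent, code that handles optional members typically checks has(tag) first. The sketch below models that contract in plain Java, assuming nothing beyond what is stated above. LookupTag, LookupTuple, LookupDemo and getOrEmpty are hypothetical names invented for this illustration, and the members are java.util.List values rather than PCollections.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for TupleTag<T>; compared by object identity in this sketch.
final class LookupTag<T> {
}

// Hypothetical stand-in for a PCollectionTuple built with of(tag, pc).
final class LookupTuple {
    private final Map<LookupTag<?>, List<?>> members;

    private LookupTuple(Map<LookupTag<?>, List<?>> members) {
        this.members = members;
    }

    // Mirrors the singleton factory: one tag, one member.
    static <T> LookupTuple of(LookupTag<T> tag, List<T> values) {
        Map<LookupTag<?>, List<?>> m = new HashMap<LookupTag<?>, List<?>>();
        m.put(tag, values);
        return new LookupTuple(m);
    }

    <T> boolean has(LookupTag<T> tag) {
        return members.containsKey(tag);
    }

    // The tag's type parameter lets the caller receive List<T> without a cast.
    // The cast is safe because of(...) only stores a List<T> under a LookupTag<T>.
    @SuppressWarnings("unchecked")
    <T> List<T> get(LookupTag<T> tag) {
        if (!has(tag)) {
            throw new IllegalArgumentException("no member for the given tag");
        }
        return (List<T>) members.get(tag);
    }
}

final class LookupDemo {
    // Returns the member for the tag, or an empty list when the tag is absent,
    // so the caller never triggers the IllegalArgumentException.
    static <T> List<T> getOrEmpty(LookupTuple tuple, LookupTag<T> tag) {
        return tuple.has(tag) ? tuple.get(tag) : Collections.<T>emptyList();
    }
}
```

The same has-then-get pattern applies to a real PCollectionTuple, although what to substitute for a missing PCollection depends on the pipeline and is not covered by this model.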
-
getAll
public Map<TupleTag<?>,PCollection<?>> getAll()
Returns an immutable Map from TupleTag to corresponding PCollection, for all the members of this PCollectionTuple.
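Two consequences of the Map<TupleTag<?>,PCollection<?>> return type are worth seeing concretely: the values are wildcard-typed, so per-member element types are not available when iterating, and the map cannot be modified. The sketch below shows both with plain java.util collections. ReadOnlyMembers is a hypothetical class, String keys stand in for TupleTags, and Lists stand in for PCollections. The exception thrown on mutation here is that of Collections.unmodifiableMap; what the SDK's map throws is not stated above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ReadOnlyMembers {
    // Builds a read-only, wildcard-typed map similar in shape to the result of getAll().
    static Map<String, List<?>> snapshot() {
        Map<String, List<?>> m = new LinkedHashMap<String, List<?>>();
        m.put("words", Arrays.asList("a", "b"));
        m.put("counts", Arrays.asList(1, 2, 3));
        return Collections.unmodifiableMap(m);
    }

    // Values are List<?>, so only type-agnostic operations such as size() are usable here.
    static int totalElements(Map<String, List<?>> all) {
        int total = 0;
        for (List<?> values : all.values()) {
            total += values.size();
        }
        return total;
    }
}
```

When the element type matters, retrieving a single member through its typed tag with get(tag) keeps the static type that iterating over the whole map loses.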
-
apply
public <OutputT extends POutput> OutputT apply(PTransform<PCollectionTuple,OutputT> t)
Like apply(String, PTransform) but defaulting to the name of the PTransform.
- Returns: the output of the applied PTransform
-
apply
public <OutputT extends POutput> OutputT apply(String name, PTransform<PCollectionTuple,OutputT> t)
Applies the given PTransform to this input PCollectionTuple, using name to identify this specific application of the transform. This name is used in various places, including the monitoring UI, logging, and to stably identify this application node in the job graph.
- Returns: the output of the applied PTransform
-
ofPrimitiveOutputsInternal
public static PCollectionTuple ofPrimitiveOutputsInternal(Pipeline pipeline, TupleTagList outputTags, com.google.cloud.dataflow.sdk.util.WindowingStrategy<?,?> windowingStrategy, PCollection.IsBounded isBounded)
Returns a PCollectionTuple with each of the given tags mapping to a new output PCollection.
For use by primitive transformations only.
-
getPipeline
public Pipeline getPipeline()
Description copied from interface: PInput
- Specified by: getPipeline in interface PInput
- Specified by: getPipeline in interface POutput
-
expand
public Collection<? extends PValue> expand()
Description copied from interface: PInput
Expands this PInput into a list of its component output PValues.
- A PValue expands to itself.
- A tuple or list of PValues (such as PCollectionTuple or PCollectionList) expands to its component PValues.
Not intended to be invoked directly by user code.
-
recordAsOutput
public void recordAsOutput(AppliedPTransform<?,?,?> transform)
Description copied from interface: POutput
Records that this POutput is an output of the given PTransform.
For a compound POutput, it is advised to call this method on each component POutput.
This is not intended to be invoked by user code, but is automatically invoked as part of applying the producing PTransform.
- Specified by: recordAsOutput in interface POutput
-
finishSpecifying
public void finishSpecifying()
Description copied from interface: PInput
After building, finalizes this PInput to make it ready for being used as an input to a PTransform.
Automatically invoked whenever apply() is invoked on this PInput, so users do not normally call this explicitly.
- Specified by: finishSpecifying in interface PInput
-
finishSpecifyingOutput
public void finishSpecifyingOutput()
Description copied from interface: POutput
As part of applying the producing PTransform, finalizes this output to make it ready for being used as an input and for running.
This includes ensuring that all PCollections have Coders specified or defaulted.
Automatically invoked whenever this POutput is used as a PInput to another PTransform, or if never used as a PInput, when Pipeline.run() is called, so users do not normally call this explicitly.
- Specified by: finishSpecifyingOutput in interface POutput