Google Cloud Dataflow SDK for Java, version 1.9.1
Class PCollection<T>
- java.lang.Object
-
- com.google.cloud.dataflow.sdk.values.POutputValueBase
-
- com.google.cloud.dataflow.sdk.values.PValueBase
-
- com.google.cloud.dataflow.sdk.values.TypedPValue<T>
-
- com.google.cloud.dataflow.sdk.values.PCollection<T>
-
- Type Parameters:
T
- the type of the elements of thisPCollection
public class PCollection<T> extends TypedPValue<T>
APCollection<T>
is an immutable collection of values of typeT
. APCollection
can contain either a bounded or unbounded number of elements. Bounded and unboundedPCollections
are produced as the output ofPTransforms
(including root PTransforms likeRead
andCreate
), and can be passed as the inputs of other PTransforms.Some root transforms produce bounded
PCollections
and others produce unbounded ones. For example,TextIO.Read
reads a static set of files, so it produces a boundedPCollection
.PubsubIO.Read
, on the other hand, receives a potentially infinite stream of Pubsub messages, so it produces an unboundedPCollection
.Each element in a
PCollection
may have an associated implicit timestamp. Readers assign timestamps to elements when they createPCollections
, and otherPTransforms
propagate these timestamps from their input to their output. For example,PubsubIO.Read
assigns pubsub message timestamps to elements, andTextIO.Read
assigns the default valueBoundedWindow.TIMESTAMP_MIN_VALUE
to elements. User code can explicitly assign timestamps to elements withDoFn.Context.outputWithTimestamp(OutputT, org.joda.time.Instant)
.Additionally, a
PCollection
has an associatedWindowFn
and each element is assigned to a set of windows. By default, the windowing function isGlobalWindows
and all elements are assigned into a single default window. This default can be overridden with theWindow
PTransform
.See the individual
PTransform
subclasses for specific information on how they propagate timestamps and windowing.
-
-
Nested Class Summary
Nested Classes Modifier and Type Class and Description static class
PCollection.IsBounded
The enumeration of cases for whether aPCollection
is bounded.
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method and Description <OutputT extends POutput>
OutputTapply(PTransform<? super PCollection<T>,OutputT> t)
Likeapply(String, PTransform)
but defaulting to the name of thePTransform
.<OutputT extends POutput>
OutputTapply(String name, PTransform<? super PCollection<T>,OutputT> t)
Applies the givenPTransform
to this inputPCollection
, usingname
to identify this specific application of the transform.static <T> PCollection<T>
createPrimitiveOutputInternal(Pipeline pipeline, com.google.cloud.dataflow.sdk.util.WindowingStrategy<?,?> windowingStrategy, PCollection.IsBounded isBounded)
Creates and returns a newPCollection
for a primitive output.Coder<T>
getCoder()
Returns theCoder
used by thisPCollection
to encode and decode the values stored in it.String
getName()
Returns the name of thisPCollection
.com.google.cloud.dataflow.sdk.util.WindowingStrategy<?,?>
getWindowingStrategy()
Returns theWindowingStrategy
of thisPCollection
.PCollection.IsBounded
isBounded()
PCollection<T>
setCoder(Coder<T> coder)
Sets theCoder
used by thisPCollection
to encode and decode the values stored in it.PCollection<T>
setIsBoundedInternal(PCollection.IsBounded isBounded)
Sets thePCollection.IsBounded
of thisPCollection
.PCollection<T>
setName(String name)
Sets the name of thisPCollection
.PCollection<T>
setTypeDescriptorInternal(TypeDescriptor<T> typeDescriptor)
Sets theTypeDescriptor<T>
for thisPCollection<T>
.PCollection<T>
setWindowingStrategyInternal(com.google.cloud.dataflow.sdk.util.WindowingStrategy<?,?> windowingStrategy)
Sets theWindowingStrategy
of thisPCollection
.-
Methods inherited from class com.google.cloud.dataflow.sdk.values.TypedPValue
finishSpecifying, getTypeDescriptor
-
Methods inherited from class com.google.cloud.dataflow.sdk.values.PValueBase
expand, getKindString, isFinishedSpecifyingInternal, recordAsOutput, recordAsOutput, toString
-
Methods inherited from class com.google.cloud.dataflow.sdk.values.POutputValueBase
finishSpecifyingOutput, getPipeline, getProducingTransformInternal
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface com.google.cloud.dataflow.sdk.values.PValue
getProducingTransformInternal
-
Methods inherited from interface com.google.cloud.dataflow.sdk.values.POutput
expand, finishSpecifyingOutput, getPipeline, recordAsOutput
-
Methods inherited from interface com.google.cloud.dataflow.sdk.values.PInput
expand, getPipeline
-
-
-
-
Method Detail
-
getName
public String getName()
Returns the name of thisPCollection
.By default, the name of a
PCollection
is based on the name of thePTransform
that produces it. It can be specified explicitly by callingsetName(java.lang.String)
.- Specified by:
getName
in interfacePValue
- Overrides:
getName
in classPValueBase
- Throws:
IllegalStateException
- if the name hasn't been set yet
-
setName
public PCollection<T> setName(String name)
Sets the name of thisPCollection
. Returnsthis
.- Overrides:
setName
in classPValueBase
- Throws:
IllegalStateException
- if thisPCollection
has already been finalized and may no longer be set. Onceapply(com.google.cloud.dataflow.sdk.transforms.PTransform<? super com.google.cloud.dataflow.sdk.values.PCollection<T>, OutputT>)
has been called, this will be the case.
-
getCoder
public Coder<T> getCoder()
Returns theCoder
used by thisPCollection
to encode and decode the values stored in it.- Overrides:
getCoder
in classTypedPValue<T>
- Throws:
IllegalStateException
- if theCoder
hasn't been set, and couldn't be inferred.
-
setCoder
public PCollection<T> setCoder(Coder<T> coder)
- Overrides:
setCoder
in classTypedPValue<T>
- Throws:
IllegalStateException
- if thisPCollection
has already been finalized and may no longer be set. Onceapply(com.google.cloud.dataflow.sdk.transforms.PTransform<? super com.google.cloud.dataflow.sdk.values.PCollection<T>, OutputT>)
has been called, this will be the case.
-
apply
public <OutputT extends POutput> OutputT apply(PTransform<? super PCollection<T>,OutputT> t)
Likeapply(String, PTransform)
but defaulting to the name of thePTransform
.- Returns:
- the output of the applied
PTransform
-
apply
public <OutputT extends POutput> OutputT apply(String name, PTransform<? super PCollection<T>,OutputT> t)
Applies the givenPTransform
to this inputPCollection
, usingname
to identify this specific application of the transform. This name is used in various places, including the monitoring UI, logging, and to stably identify this application node in the job graph.- Returns:
- the output of the applied
PTransform
-
getWindowingStrategy
public com.google.cloud.dataflow.sdk.util.WindowingStrategy<?,?> getWindowingStrategy()
Returns theWindowingStrategy
of thisPCollection
.
-
isBounded
public PCollection.IsBounded isBounded()
-
setTypeDescriptorInternal
public PCollection<T> setTypeDescriptorInternal(TypeDescriptor<T> typeDescriptor)
Sets theTypeDescriptor<T>
for thisPCollection<T>
. This may allow the enclosingPCollectionTuple
,PCollectionList
, orPTransform<?, PCollection<T>>
, etc., to provide more detailed reflective information.- Overrides:
setTypeDescriptorInternal
in classTypedPValue<T>
-
setWindowingStrategyInternal
public PCollection<T> setWindowingStrategyInternal(com.google.cloud.dataflow.sdk.util.WindowingStrategy<?,?> windowingStrategy)
-
setIsBoundedInternal
public PCollection<T> setIsBoundedInternal(PCollection.IsBounded isBounded)
-
createPrimitiveOutputInternal
public static <T> PCollection<T> createPrimitiveOutputInternal(Pipeline pipeline, com.google.cloud.dataflow.sdk.util.WindowingStrategy<?,?> windowingStrategy, PCollection.IsBounded isBounded)
Creates and returns a newPCollection
for a primitive output.For use by primitive transformations only.
-
-