Google Cloud Dataflow SDK for Java, version 1.9.1
Package com.google.cloud.dataflow.sdk.transforms
PTransforms for transforming data in a pipeline.
Interface Summary
Aggregator<InputT,OutputT>: An Aggregator<InputT> enables monitoring of values of type InputT, to be combined across all bundles.
Combine.AccumulatingCombineFn.Accumulator<InputT,AccumT,OutputT>: The type of mutable accumulator values used by this AccumulatingCombineFn.
CombineFnBase.GlobalCombineFn<InputT,AccumT,OutputT>: A GloballyCombineFn<InputT, AccumT, OutputT> specifies how to combine a collection of input values of type InputT into a single output value of type OutputT.
CombineFnBase.PerKeyCombineFn<K,InputT,AccumT,OutputT>: A PerKeyCombineFn<K, InputT, AccumT, OutputT> specifies how to combine a collection of input values of type InputT, associated with a key of type K, into a single output value of type OutputT.
CombineWithContext.RequiresContextInternal: An internal interface for signaling that a GloballyCombineFn or a PerKeyCombineFn needs to access CombineWithContext.Context.
DoFn.RequiresWindowAccess: Interface for signaling that a DoFn needs to access the window the element is being processed in, via DoFn.ProcessContext.window().
DoFnWithContext.ExtraContextFactory<InputT,OutputT>: Interface for runner implementors to provide implementations of extra context information.
Partition.PartitionFn<T>: A function object that chooses an output partition for an element.
SerializableComparator<T>: A Comparator that is also Serializable.
SerializableFunction<InputT,OutputT>: A function that computes an output value of type OutputT from an input value of type InputT and is Serializable.
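The last two interfaces in this table pair an ordinary function or comparator shape with Serializable. The sketch below illustrates why that pairing matters, using plain Java only: the interfaces SerializableFn and SerializableCmp and the classes Length and ByLength are hypothetical stand-ins invented for this example, not the SDK types, and the round trip through Java serialization is only an assumption about how user code might be shipped to another process.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Comparator;

// Hypothetical stand-ins for the SDK interfaces; the names are illustrative only.
interface SerializableFn<InputT, OutputT> extends Serializable {
  OutputT apply(InputT input);
}

interface SerializableCmp<T> extends Comparator<T>, Serializable {}

class SerializableSketch {
  // A function from String to Integer that can be serialized.
  static class Length implements SerializableFn<String, Integer> {
    private static final long serialVersionUID = 1L;
    @Override public Integer apply(String input) { return input.length(); }
  }

  // A comparator that orders strings by length and can be serialized.
  static class ByLength implements SerializableCmp<String> {
    private static final long serialVersionUID = 1L;
    @Override public int compare(String a, String b) {
      return Integer.compare(a.length(), b.length());
    }
  }

  // Writes the object to bytes and reads it back, standing in for a trip to another process.
  @SuppressWarnings("unchecked")
  static <T extends Serializable> T roundTrip(T value) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(value);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (T) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    SerializableFn<String, Integer> fn = roundTrip(new Length());
    SerializableCmp<String> cmp = roundTrip(new ByLength());
    System.out.println(fn.apply("pipeline"));        // prints 8
    System.out.println(cmp.compare("a", "abc") < 0); // prints true
  }
}
```

The deserialized copies behave like the originals, which is the property a function object needs if it is to run somewhere other than where it was constructed.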
Class Summary
AggregatorRetriever: An internal class for extracting Aggregators from DoFns.
AppliedPTransform<InputT extends PInput,OutputT extends POutput,TransformT extends PTransform<? super InputT,OutputT>>: Represents the application of a PTransform to a specific input to produce a specific output.
ApproximateQuantiles: PTransforms for getting an idea of a PCollection's data distribution using approximate N-tiles (e.g.
ApproximateQuantiles.ApproximateQuantilesCombineFn<T,ComparatorT extends Comparator<T> & Serializable>: The ApproximateQuantilesCombineFn combiner gives an idea of the distribution of a collection of values using approximate N-tiles.
ApproximateUnique: PTransforms for estimating the number of distinct elements in a PCollection, or the number of distinct values associated with each key in a PCollection of KVs.
ApproximateUnique.ApproximateUniqueCombineFn<T>: CombineFn that computes an estimate of the number of distinct values that were combined.
ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique: A heap utility class to efficiently track the largest added elements.
Combine: PTransforms for combining PCollection elements globally and per-key.
Combine.AccumulatingCombineFn<InputT,AccumT extends Combine.AccumulatingCombineFn.Accumulator<InputT,AccumT,OutputT>,OutputT>: A CombineFn that uses a subclass of Combine.AccumulatingCombineFn.Accumulator as its accumulator type.
Combine.BinaryCombineDoubleFn: An abstract subclass of Combine.CombineFn for implementing combiners that are more easily and efficiently expressed as binary operations on doubles.
Combine.BinaryCombineFn<V>: An abstract subclass of Combine.CombineFn for implementing combiners that are more easily expressed as binary operations.
Combine.BinaryCombineIntegerFn: An abstract subclass of Combine.CombineFn for implementing combiners that are more easily and efficiently expressed as binary operations on ints.
Combine.BinaryCombineLongFn: An abstract subclass of Combine.CombineFn for implementing combiners that are more easily and efficiently expressed as binary operations on longs.
Combine.CombineFn<InputT,AccumT,OutputT>: A CombineFn<InputT, AccumT, OutputT> specifies how to combine a collection of input values of type InputT into a single output value of type OutputT.
Combine.Globally<InputT,OutputT>: Combine.Globally<InputT, OutputT> takes a PCollection<InputT> and returns a PCollection<OutputT> whose elements are the result of combining all the elements in each window of the input PCollection, using a specified CombineFn<InputT, AccumT, OutputT>.
Combine.GloballyAsSingletonView<InputT,OutputT>: Combine.GloballyAsSingletonView<InputT, OutputT> takes a PCollection<InputT> and returns a PCollectionView<OutputT> whose elements are the result of combining all the elements in each window of the input PCollection, using a specified CombineFn<InputT, AccumT, OutputT>.
Combine.GroupedValues<K,InputT,OutputT>: GroupedValues<K, InputT, OutputT> takes a PCollection<KV<K, Iterable<InputT>>>, such as the result of GroupByKey, applies a specified KeyedCombineFn<K, InputT, AccumT, OutputT> to each of the input KV<K, Iterable<InputT>> elements to produce a combined output KV<K, OutputT> element, and returns a PCollection<KV<K, OutputT>> containing all the combined output elements.
Combine.Holder<V>: Holds a single value of type V which may or may not be present.
Combine.IterableCombineFn<V>
Combine.KeyedCombineFn<K,InputT,AccumT,OutputT>: A KeyedCombineFn<K, InputT, AccumT, OutputT> specifies how to combine a collection of input values of type InputT, associated with a key of type K, into a single output value of type OutputT.
Combine.PerKey<K,InputT,OutputT>: PerKey<K, InputT, OutputT> takes a PCollection<KV<K, InputT>>, groups it by key, applies a combining function to the InputT values associated with each key to produce a combined OutputT value, and returns a PCollection<KV<K, OutputT>> representing a map from each distinct key of the input PCollection to the corresponding combined value.
Combine.PerKeyWithHotKeyFanout<K,InputT,OutputT>: Like Combine.PerKey, but sharding the combining of hot keys.
Combine.SimpleCombineFn<V>: Deprecated
CombineFnBase: This class contains the shared interfaces and abstract classes for different types of combine functions.
CombineFns: Static utility methods that create combine function instances.
CombineFns.CoCombineResult: A tuple of outputs produced by a composed combine function.
CombineFns.ComposeCombineFnBuilder: A builder class to construct a composed CombineFnBase.GlobalCombineFn.
CombineFns.ComposedCombineFn<DataT>: A composed Combine.CombineFn that applies multiple CombineFns.
CombineFns.ComposedCombineFnWithContext<DataT>: A composed CombineWithContext.CombineFnWithContext that applies multiple CombineFnWithContexts.
CombineFns.ComposedKeyedCombineFn<DataT,K>: A composed Combine.KeyedCombineFn that applies multiple KeyedCombineFns.
CombineFns.ComposedKeyedCombineFnWithContext<DataT,K>: A composed CombineWithContext.KeyedCombineFnWithContext that applies multiple KeyedCombineFnWithContexts.
CombineFns.ComposeKeyedCombineFnBuilder: A builder class to construct a composed CombineFnBase.PerKeyCombineFn.
CombineWithContext: This class contains combine functions that have access to PipelineOptions and side inputs through CombineWithContext.Context.
CombineWithContext.CombineFnWithContext<InputT,AccumT,OutputT>: A combine function that has access to PipelineOptions and side inputs through CombineWithContext.Context.
CombineWithContext.Context: Information accessible to all methods in CombineFnWithContext and KeyedCombineFnWithContext.
CombineWithContext.KeyedCombineFnWithContext<K,InputT,AccumT,OutputT>: A keyed combine function that has access to PipelineOptions and side inputs through CombineWithContext.Context.
Count: PTransforms to count the elements in a PCollection.
Count.PerElement<T>: Count.PerElement<T> takes a PCollection<T> and returns a PCollection<KV<T, Long>> representing a map from each distinct element of the input PCollection to the number of times that element occurs in the input.
Create<T>: Create<T> takes a collection of elements of type T known when the pipeline is constructed and returns a PCollection<T> containing the elements.
Create.TimestampedValues<T>: A PTransform that creates a PCollection whose elements have associated timestamps.
Create.Values<T>: A PTransform that creates a PCollection from a set of in-memory objects.
DoFn<InputT,OutputT>: The argument to ParDo providing the code to use to process elements of the input PCollection.
DoFnReflector: Utility implementing the necessary reflection for working with DoFnWithContexts.
DoFnTester<InputT,OutputT>: A harness for unit-testing a DoFn.
DoFnTester.OutputElementWithTimestamp<OutputT>: Holder for an OutputElement along with its associated timestamp.
DoFnWithContext<InputT,OutputT>: The argument to ParDo providing the code to use to process elements of the input PCollection.
Filter<T>: PTransforms for filtering from a PCollection the elements satisfying a predicate, or satisfying an inequality with a given value based on the elements' natural ordering.
FlatMapElements<InputT,OutputT>: PTransforms for mapping a simple function that returns iterables over the elements of a PCollection and merging the results.
FlatMapElements.MissingOutputTypeDescriptor<InputT,OutputT>: An intermediate builder for a FlatMapElements transform.
Flatten: Flatten<T> takes multiple PCollection<T>s bundled into a PCollectionList<T> and returns a single PCollection<T> containing all the elements in all the input PCollections.
Flatten.FlattenIterables<T>: FlattenIterables<T> takes a PCollection<Iterable<T>> and returns a PCollection<T> that contains all the elements from each iterable.
Flatten.FlattenPCollectionList<T>: A PTransform that flattens a PCollectionList into a PCollection containing all the elements of all the PCollections in its input.
GroupByKey<K,V>: GroupByKey<K, V> takes a PCollection<KV<K, V>>, groups the values by key and windows, and returns a PCollection<KV<K, Iterable<V>>> representing a map from each distinct key and window of the input PCollection to an Iterable over all the values associated with that key in the input per window.
GroupByKey.GroupAlsoByWindow<K,V>: Helper transform that takes a collection of timestamp-ordered values associated with each key, groups the values by window, combines windows as needed, and for each window in each key, outputs a collection of key/value-list pairs implicitly assigned to the window and with the timestamp derived from that window.
GroupByKey.GroupByKeyOnly<K,V>: Primitive helper transform that groups by key only, ignoring any window assignments.
GroupByKey.ReifyTimestampsAndWindows<K,V>: Helper transform that makes timestamps and window assignments explicit in the value part of each key/value pair.
GroupByKey.SortValuesByTimestamp<K,V>: Helper transform that sorts the values associated with each key by timestamp.
IntraBundleParallelization: Provides multi-threading of DoFns, using threaded execution to process multiple elements concurrently within a bundle.
IntraBundleParallelization.Bound<InputT,OutputT>: A PTransform that, when applied to a PCollection<InputT>, invokes a user-specified DoFn<InputT, OutputT> on all its elements, with all its outputs collected into an output PCollection<OutputT>.
IntraBundleParallelization.MultiThreadedIntraBundleProcessingDoFn<InputT,OutputT>: A multi-threaded DoFn wrapper.
IntraBundleParallelization.Unbound: An incomplete IntraBundleParallelization transform, with unbound input/output types.
Keys<K>: Keys<K> takes a PCollection of KV<K, V>s and returns a PCollection<K> of the keys.
KvSwap<K,V>: KvSwap<K, V> takes a PCollection<KV<K, V>> and returns a PCollection<KV<V, K>>, where all the keys and values have been swapped.
MapElements<InputT,OutputT>: PTransforms for mapping a simple function over the elements of a PCollection.
MapElements.MissingOutputTypeDescriptor<InputT,OutputT>: An intermediate builder for a MapElements transform.
Max: PTransforms for computing the maximum of the elements in a PCollection, or the maximum of the values associated with each key in a PCollection of KVs.
Max.MaxDoubleFn: A CombineFn that computes the maximum of a collection of Doubles, useful as an argument to Combine.globally(com.google.cloud.dataflow.sdk.transforms.SerializableFunction<java.lang.Iterable<V>, V>) or Combine.perKey(com.google.cloud.dataflow.sdk.transforms.SerializableFunction<java.lang.Iterable<V>, V>).
Max.MaxFn<T>: A CombineFn that computes the maximum of a collection of elements of type T using an arbitrary Comparator, useful as an argument to Combine.globally(com.google.cloud.dataflow.sdk.transforms.SerializableFunction<java.lang.Iterable<V>, V>) or Combine.perKey(com.google.cloud.dataflow.sdk.transforms.SerializableFunction<java.lang.Iterable<V>, V>).
Max.MaxIntegerFn: A CombineFn that computes the maximum of a collection of Integers, useful as an argument to Combine.globally(com.google.cloud.dataflow.sdk.transforms.SerializableFunction<java.lang.Iterable<V>, V>) or Combine.perKey(com.google.cloud.dataflow.sdk.transforms.SerializableFunction<java.lang.Iterable<V>, V>).
Max.MaxLongFn: A CombineFn that computes the maximum of a collection of Longs, useful as an argument to Combine.globally(com.google.cloud.dataflow.sdk.transforms.SerializableFunction<java.lang.Iterable<V>, V>) or Combine.perKey(com.google.cloud.dataflow.sdk.transforms.SerializableFunction<java.lang.Iterable<V>, V>).
Mean: PTransforms for computing the arithmetic mean (a.k.a.
Min: PTransforms for computing the minimum of the elements in a PCollection, or the minimum of the values associated with each key in a PCollection of KVs.
Min.MinDoubleFn: A CombineFn that computes the minimum of a collection of Doubles, useful as an argument to Combine.globally(com.google.cloud.dataflow.sdk.transforms.SerializableFunction<java.lang.Iterable<V>, V>) or Combine.perKey(com.google.cloud.dataflow.sdk.transforms.SerializableFunction<java.lang.Iterable<V>, V>).
Min.MinFn<T>: A CombineFn that computes the minimum of a collection of elements of type T using an arbitrary Comparator, useful as an argument to Combine.globally(com.google.cloud.dataflow.sdk.transforms.SerializableFunction<java.lang.Iterable<V>, V>) or Combine.perKey(com.google.cloud.dataflow.sdk.transforms.SerializableFunction<java.lang.Iterable<V>, V>).
Min.MinIntegerFn: A CombineFn that computes the minimum of a collection of Integers, useful as an argument to Combine.globally(com.google.cloud.dataflow.sdk.transforms.SerializableFunction<java.lang.Iterable<V>, V>) or Combine.perKey(com.google.cloud.dataflow.sdk.transforms.SerializableFunction<java.lang.Iterable<V>, V>).
Min.MinLongFn: A CombineFn that computes the minimum of a collection of Longs, useful as an argument to Combine.globally(com.google.cloud.dataflow.sdk.transforms.SerializableFunction<java.lang.Iterable<V>, V>) or Combine.perKey(com.google.cloud.dataflow.sdk.transforms.SerializableFunction<java.lang.Iterable<V>, V>).
ParDo: ParDo is the core element-wise transform in Google Cloud Dataflow, invoking a user-specified function on each of the elements of the input PCollection to produce zero or more output elements, all of which are collected into the output PCollection.
ParDo.Bound<InputT,OutputT>: A PTransform that, when applied to a PCollection<InputT>, invokes a user-specified DoFn<InputT, OutputT> on all its elements, with all its outputs collected into an output PCollection<OutputT>.
ParDo.BoundMulti<InputT,OutputT>: A PTransform that, when applied to a PCollection<InputT>, invokes a user-specified DoFn<InputT, OutputT> on all its elements, which can emit elements to any of the PTransform's main and side output PCollections, which are bundled into a result PCollectionTuple.
ParDo.Unbound: An incomplete ParDo transform, with unbound input/output types.
ParDo.UnboundMulti<OutputT>: An incomplete multi-output ParDo transform, with unbound input type.
Partition<T>: Partition takes a PCollection<T> and a PartitionFn, uses the PartitionFn to split the elements of the input PCollection into N partitions, and returns a PCollectionList<T> that bundles N PCollection<T>s containing the split elements.
PTransform<InputT extends PInput,OutputT extends POutput>
RemoveDuplicates<T>: RemoveDuplicates<T> takes a PCollection<T> and returns a PCollection<T> that has all the elements of the input but with duplicate elements removed such that each element is unique within each window.
RemoveDuplicates.WithRepresentativeValues<T,IdT>: A RemoveDuplicates PTransform that uses a SerializableFunction to obtain a representative value for each input element.
Sample: PTransforms for taking samples of the elements in a PCollection, or samples of the values associated with each key in a PCollection of KVs.
Sample.FixedSizedSampleFn<T>: CombineFn that computes a fixed-size sample of a collection of values.
Sample.SampleAny<T>: A PTransform that takes a PCollection<T> and a limit, and produces a new PCollection<T> containing up to limit elements of the input PCollection.
SimpleFunction<InputT,OutputT>: A SerializableFunction which is not a functional interface.
Sum: PTransforms for computing the sum of the elements in a PCollection, or the sum of the values associated with each key in a PCollection of KVs.
Sum.SumDoubleFn: A SerializableFunction that computes the sum of an Iterable of Doubles, useful as an argument to Combine.globally(com.google.cloud.dataflow.sdk.transforms.SerializableFunction<java.lang.Iterable<V>, V>) or Combine.perKey(com.google.cloud.dataflow.sdk.transforms.SerializableFunction<java.lang.Iterable<V>, V>).
Sum.SumIntegerFn: A SerializableFunction that computes the sum of an Iterable of Integers, useful as an argument to Combine.globally(com.google.cloud.dataflow.sdk.transforms.SerializableFunction<java.lang.Iterable<V>, V>) or Combine.perKey(com.google.cloud.dataflow.sdk.transforms.SerializableFunction<java.lang.Iterable<V>, V>).
Sum.SumLongFn: A SerializableFunction that computes the sum of an Iterable of Longs, useful as an argument to Combine.globally(com.google.cloud.dataflow.sdk.transforms.SerializableFunction<java.lang.Iterable<V>, V>) or Combine.perKey(com.google.cloud.dataflow.sdk.transforms.SerializableFunction<java.lang.Iterable<V>, V>).
Top: PTransforms for finding the largest (or smallest) set of elements in a PCollection, or the largest (or smallest) set of values associated with each key in a PCollection of KVs.
Top.Largest<T extends Comparable<? super T>>: A Serializable Comparator that uses the compared elements' natural ordering.
Top.Smallest<T extends Comparable<? super T>>: Serializable Comparator that uses the reverse of the compared elements' natural ordering.
Top.TopCombineFn<T,ComparatorT extends Comparator<T> & Serializable>: CombineFn for Top transforms that combines a bunch of Ts into a single count-long List<T>, using compareFn to choose the largest Ts.
Values<V>: Values<V> takes a PCollection of KV<K, V>s and returns a PCollection<V> of the values.
View: Transforms for creating PCollectionViews from PCollections (to read them as side inputs).
View.AsIterable<T>: Not intended for direct use by pipeline authors; public only so a PipelineRunner may override its behavior.
View.AsList<T>: Not intended for direct use by pipeline authors; public only so a PipelineRunner may override its behavior.
View.AsMap<K,V>: Not intended for direct use by pipeline authors; public only so a PipelineRunner may override its behavior.
View.AsMultimap<K,V>: Not intended for direct use by pipeline authors; public only so a PipelineRunner may override its behavior.
View.AsSingleton<T>: Not intended for direct use by pipeline authors; public only so a PipelineRunner may override its behavior.
View.CreatePCollectionView<ElemT,ViewT>: Creates a primitive PCollectionView.
WithKeys<K,V>: WithKeys<K, V> takes a PCollection<V>, and either a constant key of type K or a function from V to K, and returns a PCollection<KV<K, V>>, where each of the values in the input PCollection has been paired with either the constant key or a key computed from the value.
WithTimestamps<T>: A PTransform for assigning timestamps to all the elements of a PCollection.
Write: Deprecated
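Many entries in the class summary describe combining input values of type InputT into an output of type OutputT by way of an accumulator of type AccumT. The following is a minimal plain-Java sketch of that accumulator pattern. SimpleCombineFn, MeanFn and CombineSketch are local stand-ins written for this example, and the four method names are an assumption modelled on the SDK's combine functions; consult the Combine.CombineFn class page for the real signatures. The nested lists play the role of bundles processed separately before their partial results are merged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Local stand-in for a combining contract; not the SDK class.
interface SimpleCombineFn<InputT, AccumT, OutputT> {
  AccumT createAccumulator();
  AccumT addInput(AccumT accumulator, InputT input);
  AccumT mergeAccumulators(Iterable<AccumT> accumulators);
  OutputT extractOutput(AccumT accumulator);
}

// Computes a mean; the accumulator is a two-element array {sum, count}.
class MeanFn implements SimpleCombineFn<Integer, long[], Double> {
  @Override public long[] createAccumulator() { return new long[] {0L, 0L}; }

  @Override public long[] addInput(long[] accumulator, Integer input) {
    accumulator[0] += input;
    accumulator[1]++;
    return accumulator;
  }

  @Override public long[] mergeAccumulators(Iterable<long[]> accumulators) {
    long[] merged = createAccumulator();
    for (long[] a : accumulators) {
      merged[0] += a[0];
      merged[1] += a[1];
    }
    return merged;
  }

  @Override public Double extractOutput(long[] accumulator) {
    return accumulator[1] == 0 ? Double.NaN : (double) accumulator[0] / accumulator[1];
  }
}

class CombineSketch {
  // Accumulates each bundle independently, then merges the partial accumulators.
  static <I, A, O> O combine(SimpleCombineFn<I, A, O> fn, List<List<I>> bundles) {
    List<A> partials = new ArrayList<>();
    for (List<I> bundle : bundles) {
      A accumulator = fn.createAccumulator();
      for (I input : bundle) {
        accumulator = fn.addInput(accumulator, input);
      }
      partials.add(accumulator);
    }
    return fn.extractOutput(fn.mergeAccumulators(partials));
  }

  public static void main(String[] args) {
    List<List<Integer>> bundles =
        Arrays.asList(Arrays.asList(1, 2, 3), Arrays.asList(4), Arrays.asList(5, 6));
    System.out.println(combine(new MeanFn(), bundles)); // prints 3.5
  }
}
```

Keeping sum and count in the accumulator, rather than a running mean, is what lets partial results from different bundles be merged in any grouping with the same final answer.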
Enum Summary
DoFnTester.CloningBehavior: Whether or not a DoFnTester should clone the DoFn under test.
Annotation Types Summary
DoFnWithContext.FinishBundle: Annotation for the method to use to finish processing a batch of elements.
DoFnWithContext.ProcessElement: Annotation for the method to use for processing elements.
DoFnWithContext.StartBundle: Annotation for the method to use to prepare an instance for processing a batch of elements.
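The three annotation types mark the methods that bracket and process one batch (bundle) of elements. As a rough illustration of how annotated methods can be discovered and invoked by reflection, here is a plain-Java sketch. The annotations StartBundle, ProcessElement and FinishBundle below are local stand-ins declared in the example, TracingFn and LifecycleSketch are hypothetical, and the call order shown is an assumption for illustration rather than a description of what DoFnReflector does.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Local stand-in annotations; not the SDK's DoFnWithContext annotations.
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface StartBundle {}
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface ProcessElement {}
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface FinishBundle {}

// Hypothetical user function that records the order in which its methods are called.
class TracingFn {
  final List<String> log = new ArrayList<>();

  @StartBundle public void start() { log.add("start"); }
  @ProcessElement public void process(String element) { log.add("process:" + element); }
  @FinishBundle public void finish() { log.add("finish"); }
}

class LifecycleSketch {
  // Returns the first public method carrying the given annotation.
  static Method find(Class<?> type, Class<? extends Annotation> marker) {
    for (Method method : type.getMethods()) {
      if (method.isAnnotationPresent(marker)) {
        return method;
      }
    }
    throw new IllegalArgumentException("no method annotated with " + marker.getSimpleName());
  }

  // Calls the start method once, the process method per element, then the finish method once.
  static void runBundle(Object fn, List<String> bundle) {
    try {
      Class<?> type = fn.getClass();
      find(type, StartBundle.class).invoke(fn);
      Method process = find(type, ProcessElement.class);
      for (String element : bundle) {
        process.invoke(fn, element);
      }
      find(type, FinishBundle.class).invoke(fn);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    TracingFn fn = new TracingFn();
    runBundle(fn, Arrays.asList("a", "b"));
    System.out.println(fn.log); // prints [start, process:a, process:b, finish]
  }
}
```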
Package com.google.cloud.dataflow.sdk.transforms Description
PTransforms for transforming data in a pipeline.
A PTransform is an operation that takes an InputT (some subtype of PInput) and produces an OutputT (some subtype of POutput).
Common PTransforms include root PTransforms like TextIO.Read and Create, processing and conversion operations like ParDo, GroupByKey, CoGroupByKey, Combine, and Count, and outputting PTransforms like TextIO.Write.
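To make the root, processing, and output roles concrete, the sketch below walks the same shape with ordinary Java streams on an in-memory list. This is only an analogy: it uses no SDK classes, nothing in it is distributed or windowed, and PipelineShapeSketch and countWords are names made up for the example. The comments note which kind of PTransform each step loosely corresponds to.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class PipelineShapeSketch {
  // Counts word occurrences; a sorted map keeps the printed order predictable.
  static Map<String, Long> countWords(List<String> lines) {
    return lines.stream()                                   // root step: in-memory input, in the role of Create
        .flatMap(line -> Arrays.stream(line.split("\\s+"))) // element-wise step with zero or more outputs, in the role of ParDo
        .filter(word -> !word.isEmpty())                    // drops empty tokens produced by leading whitespace
        .collect(Collectors.groupingBy(                     // group by key, then combine per key, in the role of Count
            word -> word, TreeMap::new, Collectors.counting()));
  }

  public static void main(String[] args) {
    // Output step: print the result, in the role of a write transform.
    System.out.println(countWords(Arrays.asList("to be or", "not to be")));
    // prints {be=2, not=1, or=1, to=2}
  }
}
```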
New PTransforms can be created by composing existing PTransforms. Most PTransforms in this package are composites, and users can also create composite PTransforms for their own application-specific logic.
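Composition can be pictured as wrapping a chain of existing transforms behind a single transform. The plain-Java sketch below shows the idea under loud assumptions: Transform, its then method, mapEach and trimmedLengths are all hypothetical names defined here, and lists stand in for collections. It does not show how the SDK's PTransform is subclassed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for a transform: takes an input and produces an output.
interface Transform<InputT, OutputT> {
  OutputT apply(InputT input);

  // Chains this transform with the next one, yielding a single composite transform.
  default <NextT> Transform<InputT, NextT> then(Transform<? super OutputT, NextT> next) {
    return input -> next.apply(apply(input));
  }
}

class CompositeSketch {
  // A basic building block: applies a function to every element of a list.
  static <A, B> Transform<List<A>, List<B>> mapEach(Function<A, B> fn) {
    return input -> {
      List<B> output = new ArrayList<>();
      for (A element : input) {
        output.add(fn.apply(element));
      }
      return output;
    };
  }

  // A composite built from two existing transforms; callers see only one transform.
  static Transform<List<String>, List<Integer>> trimmedLengths() {
    Transform<List<String>, List<String>> trim = mapEach(String::trim);
    Transform<List<String>, List<Integer>> length = mapEach(String::length);
    return trim.then(length);
  }

  public static void main(String[] args) {
    System.out.println(trimmedLengths().apply(Arrays.asList(" ab ", "cde"))); // prints [2, 3]
  }
}
```

The caller of trimmedLengths() does not need to know how many steps sit inside it, which is the property that makes application-specific composites convenient to reuse.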