Google Cloud Dataflow SDK for Java, version 1.9.1
Class View
- java.lang.Object
-
- com.google.cloud.dataflow.sdk.transforms.View
-
public class View extends Object
Transforms for creatingPCollectionViews
fromPCollections
(to read them as side inputs).While a
PCollection<ElemT>
has many values of typeElemT
per window, aPCollectionView<ViewT>
has a single value of typeViewT
for each window. It can be thought of as a mapping from windows to values of typeViewT
. The transforms here represent ways of converting theElemT
values in a window into aViewT
for that window.When a
ParDo
tranform is processing a main input element in a windoww
and aPCollectionView
is read viaDoFn.ProcessContext.sideInput(com.google.cloud.dataflow.sdk.values.PCollectionView<T>)
, the value of the view forw
is returned.The SDK supports viewing a
PCollection
, per window, as a single value, aList
, anIterable
, aMap
, or a multimap (iterable-valuedMap
).For a
PCollection
that contains a single value of typeT
per window, such as the output ofCombine.globally(com.google.cloud.dataflow.sdk.transforms.SerializableFunction<java.lang.Iterable<V>, V>)
, useasSingleton()
to prepare it for use as a side input:PCollectionView<T> output = someOtherPCollection .apply(Combine.globally(...)) .apply(View.<T>asSingleton());
For a small
PCollection
with windows that can fit entirely in memory, useasList()
to prepare it for use as aList
. When read as a side input, the entire list for a window will be cached in memory.PCollectionView<List<T>> output = smallPCollection.apply(View.<T>asList());
If a
PCollection
ofKV<K, V>
is known to have a single value per window for each key, then useasMap()
to view it as aMap<K, V>
:PCollectionView<Map<K, V> output = somePCollection.apply(View.<K, V>asMap());
Otherwise, to access a
PCollection
ofKV<K, V>
as aMap<K, Iterable<V>>
side input, useasMultimap()
:PCollectionView<Map<K, Iterable<V>> output = somePCollection.apply(View.<K, Iterable<V>>asMap());
To iterate over an entire window of a
PCollection
via side input, useasIterable()
:PCollectionView<Iterable<T>> output = somePCollection.apply(View.<T>asIterable());
Both
asMultimap()
andasMap()
are useful for implementing lookup based "joins" with the main input, when the side input is small enough to fit into memory.For example, if you represent a page on a website via some
Page
object and have some typeUrlVisits
logging that a URL was visited, you could convert these to more fully structuredPageVisit
objects using a side input, something like the following:PCollection<Page> pages = ... // pages fit into memory PCollection<UrlVisit> urlVisits = ... // very large collection final PCollectionView<Map<URL, Page>> urlToPage = pages .apply(WithKeys.of( ... )) // extract the URL from the page .apply(View.<URL, Page>asMap()); PCollection PageVisits = urlVisits .apply(ParDo.withSideInputs(urlToPage) .of(new DoFn<UrlVisit, PageVisit>() { @Override void processElement(ProcessContext context) { UrlVisit urlVisit = context.element(); Map<URL, Page> urlToPageMap = context.sideInput(urlToPage); Page page = urlToPageMap.get(urlVisit.getUrl()); c.output(new PageVisit(page, urlVisit.getVisitData())); } }));
See
ParDo.withSideInputs(com.google.cloud.dataflow.sdk.values.PCollectionView<?>...)
for details on how to access this variable inside aParDo
over anotherPCollection
.
-
-
Nested Class Summary
Nested Classes Modifier and Type Class and Description static class
View.AsIterable<T>
Not intended for direct use by pipeline authors; public only so aPipelineRunner
may override its behavior.static class
View.AsList<T>
Not intended for direct use by pipeline authors; public only so aPipelineRunner
may override its behavior.static class
View.AsMap<K,V>
Not intended for direct use by pipeline authors; public only so aPipelineRunner
may override its behavior.static class
View.AsMultimap<K,V>
Not intended for direct use by pipeline authors; public only so aPipelineRunner
may override its behavior.static class
View.AsSingleton<T>
Not intended for direct use by pipeline authors; public only so aPipelineRunner
may override its behavior.static class
View.CreatePCollectionView<ElemT,ViewT>
Creates a primitivePCollectionView
.
-
Method Summary
All Methods Static Methods Concrete Methods Modifier and Type Method and Description static <T> View.AsIterable<T>
asIterable()
Returns aView.AsIterable
transform that takes aPCollection
as input and produces aPCollectionView
mapping each window to anIterable
of the values in that window.static <T> View.AsList<T>
asList()
Returns aView.AsList
transform that takes aPCollection
and returns aPCollectionView
mapping each window to aList
containing all of the elements in the window.static <K,V> View.AsMap<K,V>
asMap()
Returns aView.AsMap
transform that takes aPCollection<KV<K, V>>
as input and produces aPCollectionView
mapping each window to aMap<K, V>
.static <K,V> View.AsMultimap<K,V>
asMultimap()
Returns aView.AsMultimap
transform that takes aPCollection<KV<K, V>>
as input and produces aPCollectionView
mapping each window to its contents as aMap<K, Iterable<V>>
for use as a side input.static <T> View.AsSingleton<T>
asSingleton()
Returns aView.AsSingleton
transform that takes aPCollection
with a single value per window as input and produces aPCollectionView
that returns the value in the main input window when read as a side input.
-
-
-
Method Detail
-
asSingleton
public static <T> View.AsSingleton<T> asSingleton()
Returns aView.AsSingleton
transform that takes aPCollection
with a single value per window as input and produces aPCollectionView
that returns the value in the main input window when read as a side input.PCollection<InputT> input = ... CombineFn<InputT, OutputT> yourCombineFn = ... PCollectionView<OutputT> output = input .apply(Combine.globally(yourCombineFn)) .apply(View.<OutputT>asSingleton());
If the input
PCollection
is empty, throwsNoSuchElementException
in the consumingDoFn
.If the input
PCollection
contains more than one element, throwsIllegalArgumentException
in the consumingDoFn
.
-
asList
public static <T> View.AsList<T> asList()
Returns aView.AsList
transform that takes aPCollection
and returns aPCollectionView
mapping each window to aList
containing all of the elements in the window.The resulting list is required to fit in memory.
-
asIterable
public static <T> View.AsIterable<T> asIterable()
Returns aView.AsIterable
transform that takes aPCollection
as input and produces aPCollectionView
mapping each window to anIterable
of the values in that window.The values of the
Iterable
for a window are not required to fit in memory, but they may also not be effectively cached. If it is known that every window fits in memory, and stronger caching is desired, useasList()
.
-
asMap
public static <K,V> View.AsMap<K,V> asMap()
Returns aView.AsMap
transform that takes aPCollection<KV<K, V>>
as input and produces aPCollectionView
mapping each window to aMap<K, V>
. It is required that each key of the input be associated with a single value, per window. If this is not the case, precede this view withCombine.perKey
, as in the example below, or alternatively useasMultimap()
.PCollection<KV<K, V>> input = ... CombineFn<V, OutputT> yourCombineFn = ... PCollectionView<Map<K, OutputT>> output = input .apply(Combine.perKey(yourCombineFn.<K>asKeyedFn())) .apply(View.<K, OutputT>asMap());
Currently, the resulting map is required to fit into memory.
-
asMultimap
public static <K,V> View.AsMultimap<K,V> asMultimap()
Returns aView.AsMultimap
transform that takes aPCollection<KV<K, V>>
as input and produces aPCollectionView
mapping each window to its contents as aMap<K, Iterable<V>>
for use as a side input. In contrast toasMap()
, it is not required that the keys in the input collection be unique.PCollection<KV<K, V>> input = ... // maybe more than one occurrence of a some keys PCollectionView<Map<K, V>> output = input.apply(View.<K, V>asMultimap());
Currently, the resulting map is required to fit into memory.
-
-