Google Cloud Dataflow SDK for Java, version 1.9.1
Class Window
- java.lang.Object
  - com.google.cloud.dataflow.sdk.transforms.windowing.Window
public class Window extends Object
Window logically divides up or groups the elements of a PCollection into finite windows according to a WindowFn. The output of Window contains the same elements as the input, but they have been logically assigned to windows. The next GroupByKey, including one within a composite transform, will group by the combination of keys and windows.

See GroupByKey for more information about how grouping with windows works.

Windowing
Windowing a PCollection divides the elements into windows based on the associated event time for each element. This is especially useful for PCollections with unbounded size, since it allows operating on a subgroup of the elements placed into a related window. For PCollections with a bounded size (a.k.a. conventional batch mode), by default, all data is implicitly in a single window, unless Window is applied.

For example, a simple form of windowing divides up the data into fixed-width time intervals, using FixedWindows. The following example demonstrates how to use Window in a pipeline that counts the number of occurrences of strings each minute:

    PCollection<String> items = ...;
    PCollection<String> windowed_items = items.apply(
        Window.<String>into(FixedWindows.of(Duration.standardMinutes(1))));
    PCollection<KV<String, Long>> windowed_counts = windowed_items.apply(
        Count.<String>perElement());

Let (data, timestamp) denote a data element along with its timestamp. Then, if the input to this pipeline consists of {("foo", 15s), ("bar", 30s), ("foo", 45s), ("foo", 1m30s)}, the output will be {(KV("foo", 2), 1m), (KV("bar", 1), 1m), (KV("foo", 1), 2m)}.
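The window arithmetic behind this example can be illustrated without the SDK. The following is a minimal plain-Java sketch, not the SDK's implementation: the class FixedWindowCountSketch and its methods are hypothetical names, timestamps are plain milliseconds, and each window is identified by its end time so that the result lines up with the output shown above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of the Dataflow SDK. */
final class FixedWindowCountSketch {
  /** Returns the exclusive end (in ms) of the fixed window that contains timestampMillis. */
  static long windowEnd(long timestampMillis, long sizeMillis) {
    long start = timestampMillis - Math.floorMod(timestampMillis, sizeMillis);
    return start + sizeMillis;
  }

  /** Counts occurrences per (value, window end); keys have the form "value@windowEndMillis". */
  static Map<String, Long> countPerWindow(
      String[] values, long[] timestampsMillis, long sizeMillis) {
    Map<String, Long> counts = new LinkedHashMap<>();
    for (int i = 0; i < values.length; i++) {
      String key = values[i] + "@" + windowEnd(timestampsMillis[i], sizeMillis);
      counts.merge(key, 1L, Long::sum); // grouping by value and window together
    }
    return counts;
  }
}
```

Calling countPerWindow with the four elements above (15s, 30s, 45s and 1m30s expressed in milliseconds) and a window size of 60000 groups "foo" twice and "bar" once under the window ending at 1m, and "foo" once under the window ending at 2m.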
Several predefined WindowFns are provided:
- FixedWindows partitions the timestamps into fixed-width intervals.
- SlidingWindows places data into overlapping fixed-width intervals.
- Sessions groups data into sessions where each item in a window is separated from the next by no more than a specified gap.
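To make the difference between overlapping and gap-based windows concrete, here is a small plain-Java sketch. It is an illustration under stated assumptions, not SDK code: WindowAssignmentSketch and its methods are hypothetical names, times are milliseconds, and the session logic treats two elements as belonging to the same session only when they are strictly less than the gap apart (the boundary handling is an assumption of this sketch).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration only; not part of the Dataflow SDK. */
final class WindowAssignmentSketch {
  /** Start times of every sliding window (length sizeMillis, one every periodMillis) containing t. */
  static List<Long> slidingWindowStarts(long t, long sizeMillis, long periodMillis) {
    List<Long> starts = new ArrayList<>();
    long lastStart = t - Math.floorMod(t, periodMillis);
    for (long start = lastStart; start > t - sizeMillis; start -= periodMillis) {
      starts.add(start); // each start denotes the window [start, start + sizeMillis)
    }
    return starts;
  }

  /** Merges per-element windows [t, t + gapMillis) that overlap; returns {start, end} pairs. */
  static List<long[]> sessions(long[] timestampsMillis, long gapMillis) {
    long[] sorted = timestampsMillis.clone();
    Arrays.sort(sorted);
    List<long[]> merged = new ArrayList<>();
    for (long t : sorted) {
      long[] last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
      if (last != null && t < last[1]) {
        last[1] = Math.max(last[1], t + gapMillis); // extend the current session
      } else {
        merged.add(new long[] {t, t + gapMillis}); // start a new session
      }
    }
    return merged;
  }
}
```

With two-minute windows sliding every minute, an element at 1m30s falls into the windows starting at 1m and at 0m. With a 30-second gap, elements at 0s, 20s and 30s form one session and an element at 100s starts another.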
Additionally, custom WindowFns can be created by subclassing WindowFn.

Triggers
Window.Bound.triggering(TriggerBuilder) allows specifying a trigger to control when (in processing time) results for the given window can be produced. If unspecified, the default behavior is to trigger first when the watermark passes the end of the window, and then trigger again every time there is late-arriving data.

Elements are added to the current window pane as they arrive. When the root trigger fires, output is produced based on the elements in the current pane.

Depending on the trigger, this can be used both to output partial results early during the processing of the whole window, and to deal with late-arriving data in batches.
Continuing the earlier example, suppose we want to emit the values that were available when the watermark passed the end of the window, and then output any late-arriving elements once per hour of processing time until we have finished processing the next 24 hours of data. (The use of watermark time to stop processing tends to be more robust if the data source is slow for a few days, etc.) This could be written as follows:

    PCollection<String> items = ...;
    PCollection<String> windowed_items = items.apply(
        Window.<String>into(FixedWindows.of(Duration.standardMinutes(1)))
            .triggering(
                AfterWatermark.pastEndOfWindow()
                    .withLateFirings(AfterProcessingTime
                        .pastFirstElementInPane().plusDelayOf(Duration.standardHours(1))))
            .withAllowedLateness(Duration.standardDays(1)));
    PCollection<KV<String, Long>> windowed_counts = windowed_items.apply(
        Count.<String>perElement());
On the other hand, if we wanted to get early results every minute of processing time (for which there were new elements in the given window), we could do the following:

    PCollection<String> windowed_items = items.apply(
        Window.<String>into(FixedWindows.of(Duration.standardMinutes(1)))
            .triggering(
                AfterWatermark.pastEndOfWindow()
                    .withEarlyFirings(AfterProcessingTime
                        .pastFirstElementInPane().plusDelayOf(Duration.standardMinutes(1))))
            .withAllowedLateness(Duration.ZERO));
After a GroupByKey, the trigger is set to a trigger that will preserve the intent of the upstream trigger. See Trigger.getContinuationTrigger() for more information.

See Trigger for details on the available triggers.
-
Nested Class Summary
Nested Classes
Modifier and Type, Class and Description:
- static class Window.Bound<T>
  A PTransform that windows the elements of a PCollection<T> into finite windows according to a user-specified WindowFn.
- static class Window.ClosingBehavior
  Specifies the conditions under which a final pane will be created when a window is permanently closed.
- static class Window.Remerge<T>
  A PTransform that does not change assigned windows, but will cause windows to be merged again as part of the next GroupByKey.
- static class Window.Unbound
  An incomplete Window transform, with unbound input/output type.
-
Constructor Summary
Constructors
Constructor and Description:
- Window()
-
Method Summary
Modifier and Type, Method and Description:
- static <T> Window.Bound<T> accumulatingFiredPanes()
  Returns a new Window PTransform that uses the registered WindowFn and Triggering behavior, and that accumulates elements in a pane after they are triggered.
- static <T> Window.Bound<T> discardingFiredPanes()
  Returns a new Window PTransform that uses the registered WindowFn and Triggering behavior, and that discards elements in a pane after they are triggered.
- static <T> Window.Bound<T> into(WindowFn<? super T,?> fn)
  Creates a Window PTransform that uses the given WindowFn to window the data.
- static Window.Unbound named(String name)
  Creates a Window PTransform with the given name.
- static <T> Window.Remerge<T> remerge()
  Creates a Window PTransform that does not change assigned windows, but will cause windows to be merged again as part of the next GroupByKey.
- static <T> Window.Bound<T> triggering(TriggerBuilder<?> trigger)
  Sets a non-default trigger for this Window PTransform.
- static <T> Window.Bound<T> withAllowedLateness(Duration allowedLateness)
  Overrides the amount of lateness allowed for data elements in the pipeline.
-
Method Detail
-
named
public static Window.Unbound named(String name)
Creates a Window PTransform with the given name.

See the discussion of Naming in ParDo for more explanation.

The resulting PTransform is incomplete, and its input/output type is not yet bound. Use Window.Unbound.into(com.google.cloud.dataflow.sdk.transforms.windowing.WindowFn<? super T, ?>) to specify the WindowFn to use, which will also bind the input/output type of this PTransform.
-
into
public static <T> Window.Bound<T> into(WindowFn<? super T,?> fn)
Creates a Window PTransform that uses the given WindowFn to window the data.

The resulting PTransform's types have been bound, with both the input and output being a PCollection<T>, inferred from the types of the argument WindowFn. It is ready to be applied, or further properties can be set on it first.
-
triggering
@Experimental(value=TRIGGER)
public static <T> Window.Bound<T> triggering(TriggerBuilder<?> trigger)
Sets a non-default trigger for this Window PTransform. Elements that are assigned to a specific window will be output when the trigger fires.

You must also specify the allowed lateness using withAllowedLateness(org.joda.time.Duration) and the accumulation mode using either discardingFiredPanes() or accumulatingFiredPanes().
-
discardingFiredPanes
@Experimental(value=TRIGGER)
public static <T> Window.Bound<T> discardingFiredPanes()
Returns a new Window PTransform that uses the registered WindowFn and Triggering behavior, and that discards elements in a pane after they are triggered.

Does not modify this transform. The resulting PTransform is sufficiently specified to be applied, but more properties can still be specified.
-
accumulatingFiredPanes
@Experimental(value=TRIGGER)
public static <T> Window.Bound<T> accumulatingFiredPanes()
Returns a new Window PTransform that uses the registered WindowFn and Triggering behavior, and that accumulates elements in a pane after they are triggered.

Does not modify this transform. The resulting PTransform is sufficiently specified to be applied, but more properties can still be specified.
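The difference between discarding and accumulating fired panes can be pictured with a toy model of a single window's pane state. This is a hedged plain-Java sketch, not the SDK's implementation: PaneSketch, add and fire are hypothetical names, and fire() stands in for whatever trigger firing the runner performs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of one window's pane state; for illustration only, not SDK code. */
final class PaneSketch {
  private final boolean accumulating;
  private final List<String> buffered = new ArrayList<>();

  PaneSketch(boolean accumulating) {
    this.accumulating = accumulating;
  }

  /** Adds an element to the current pane as it arrives. */
  void add(String element) {
    buffered.add(element);
  }

  /** Simulates a trigger firing: returns the pane contents, then keeps or clears them. */
  List<String> fire() {
    List<String> output = new ArrayList<>(buffered);
    if (!accumulating) {
      buffered.clear(); // discarding mode: the next pane starts empty
    }
    return output;
  }
}
```

If "a" and "b" arrive, the trigger fires, "c" arrives, and the trigger fires again, the discarding model emits [a, b] and then [c], while the accumulating model emits [a, b] and then [a, b, c].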
-
withAllowedLateness
@Experimental(value=TRIGGER)
public static <T> Window.Bound<T> withAllowedLateness(Duration allowedLateness)
Overrides the amount of lateness allowed for data elements in the pipeline. Like the other properties on this Window operation, this will be applied at the next GroupByKey. Any elements that are later than this, as decided by the system-maintained watermark, will be dropped.

This value also determines how long state will be kept around for old windows. Once no more elements can be added to a window (because this duration has passed), any state associated with the window will be cleaned up.
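The drop decision described above amounts to comparing the watermark against the end of the window plus the allowed lateness. The following plain-Java sketch illustrates that comparison only; AllowedLatenessSketch and its methods are hypothetical names, times are milliseconds, and the exact boundary condition (>= rather than >) is an assumption of this sketch rather than a statement about the SDK.

```java
/** Hypothetical helper for illustration only; not part of the Dataflow SDK. */
final class AllowedLatenessSketch {
  /** True once the watermark has reached the window end plus the allowed lateness. */
  static boolean isWindowExpired(
      long windowEndMillis, long allowedLatenessMillis, long watermarkMillis) {
    return watermarkMillis >= windowEndMillis + allowedLatenessMillis;
  }

  /** True if an element for this window is late but would still be accepted. */
  static boolean isLateButAccepted(
      long windowEndMillis, long allowedLatenessMillis, long watermarkMillis) {
    boolean late = watermarkMillis >= windowEndMillis;
    return late && !isWindowExpired(windowEndMillis, allowedLatenessMillis, watermarkMillis);
  }
}
```

For a window ending at 60000 ms with one day of allowed lateness, an element arriving when the watermark is at 120000 ms is late but still accepted. With an allowed lateness of zero, the same window is already expired as soon as the watermark reaches 60000 ms, so its state could be cleaned up.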
-
remerge
public static <T> Window.Remerge<T> remerge()
Creates a Window PTransform that does not change assigned windows, but will cause windows to be merged again as part of the next GroupByKey.
-