AfterWatermark (Google Cloud Dataflow SDK 1.9.1 API)

Google Cloud Dataflow SDK for Java, version 1.9.1

Class AfterWatermark<W extends BoundedWindow>

  • java.lang.Object
  • Type Parameters:
    W - BoundedWindow subclass used to represent the windows used.

    public class AfterWatermark<W extends BoundedWindow>
    extends Object
    AfterWatermark triggers fire based on the progress of the system watermark. The watermark is a lower bound, sometimes heuristically established, on the event times of data the pipeline has yet to process; all data with earlier event times is considered fully processed.

    For sources that provide non-heuristic watermarks (e.g. PubsubIO when using arrival times as event times), the watermark is a strict guarantee that no data with an event time earlier than that watermark will ever be observed in the pipeline. In this case, it's safe to assume that any pane triggered by an AfterWatermark trigger with a reference point at or beyond the end of the window will be the last pane ever for that window.

    For sources that provide heuristic watermarks (e.g. PubsubIO when using user-supplied event times), the watermark is only an estimate that no data with an event time earlier than that watermark (i.e. "late data") will ever be observed in the pipeline. These heuristics can often be quite accurate, but the chance of seeing late data for any given window is non-zero. Thus, if absolute correctness over time is important to your use case, you may want to consider using a trigger that accounts for late data. The default trigger, Repeatedly.forever(AfterWatermark.pastEndOfWindow()), is one such example: it fires once when the watermark passes the end of the window and then again immediately whenever late data arrives.
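    The firing behavior described above can be illustrated with a toy model of a single window. This is a minimal sketch in plain Java, not the Dataflow SDK: the class WatermarkTriggerSketch and its methods are hypothetical names, and the model assumes that fired panes are discarded and that late data is never dropped (it ignores allowed lateness).

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (NOT the Dataflow SDK) of one window that fires on the watermark, then on late data. */
final class WatermarkTriggerSketch {
    private final long windowEnd;                          // exclusive end of the window (event time)
    private final List<String> buffered = new ArrayList<>(); // elements not yet emitted in a pane
    private final List<String> panes = new ArrayList<>();    // one description per emitted pane
    private boolean onTimeFired = false;

    WatermarkTriggerSketch(long windowEnd) {
        this.windowEnd = windowEnd;
    }

    /** An element assigned to this window arrives. */
    void onElement(String value) {
        buffered.add(value);
        if (onTimeFired) {
            // The watermark already passed the end of the window, so this
            // element is late data: fire again immediately.
            emit("LATE");
        }
    }

    /** The watermark advances to the given event time. */
    void onWatermark(long watermark) {
        if (!onTimeFired && watermark >= windowEnd) {
            onTimeFired = true;
            emit("ON_TIME");
        }
    }

    /** Emits the buffered elements as one pane and clears the buffer (assumed discarding mode). */
    private void emit(String timing) {
        panes.add(timing + buffered);
        buffered.clear();
    }

    List<String> panes() {
        return panes;
    }

    public static void main(String[] args) {
        WatermarkTriggerSketch window = new WatermarkTriggerSketch(100L);
        window.onElement("a");
        window.onElement("b");
        window.onWatermark(100L); // watermark reaches the end of the window
        window.onElement("c");    // arrives after the on-time pane: late data
        System.out.println(window.panes()); // prints [ON_TIME[a, b], LATE[c]]
    }
}
```

    With a non-heuristic watermark the LATE branch would never run, which is why the on-time pane can be treated as the final pane in that case.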

    The watermark is the clock that defines TimeDomain.EVENT_TIME.

    Additional firings before or after the watermark passes the end of the window can be requested by calling AfterWatermark.pastEndOfWindow().withEarlyFirings(OnceTrigger) or AfterWatermark.pastEndOfWindow().withLateFirings(OnceTrigger).
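    As a configuration sketch, early and late firings might be combined as shown below. It assumes the Dataflow SDK 1.x and Joda-Time are on the classpath. The class name EarlyAndLateFiringsSketch, the five-minute windows, the 30-second early-firing delay, and the ten-minute allowed lateness are illustrative choices, not values taken from this page.

```java
import com.google.cloud.dataflow.sdk.transforms.windowing.AfterPane;
import com.google.cloud.dataflow.sdk.transforms.windowing.AfterProcessingTime;
import com.google.cloud.dataflow.sdk.transforms.windowing.AfterWatermark;
import com.google.cloud.dataflow.sdk.transforms.windowing.FixedWindows;
import com.google.cloud.dataflow.sdk.transforms.windowing.Window;
import org.joda.time.Duration;

/** Hypothetical helper: builds a windowing transform with early and late firings. */
final class EarlyAndLateFiringsSketch {
    static <T> Window.Bound<T> windowing() {
        return Window.<T>into(FixedWindows.of(Duration.standardMinutes(5)))
            .triggering(
                AfterWatermark.pastEndOfWindow()
                    // Speculative panes: 30 seconds of processing time after
                    // the first element of each pane arrives.
                    .withEarlyFirings(
                        AfterProcessingTime.pastFirstElementInPane()
                            .plusDelayOf(Duration.standardSeconds(30)))
                    // After the on-time pane: fire for every late element.
                    .withLateFirings(AfterPane.elementCountAtLeast(1)))
            // Late firings only happen for data within the allowed lateness.
            .withAllowedLateness(Duration.standardMinutes(10))
            // Each pane contains everything seen so far for the window.
            .accumulatingFiredPanes();
    }
}
```

    The result would be applied to a PCollection, for example input.apply(EarlyAndLateFiringsSketch.<String>windowing()). Whether to accumulate or discard fired panes depends on how downstream consumers combine the repeated results.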