Sink.WriteOperation (Google Cloud Dataflow SDK 1.9.1 API)

Google Cloud Dataflow SDK for Java, version 1.9.1

Class Sink.WriteOperation<T,WriteT>

  • java.lang.Object
  • Type Parameters:
    T - The type of objects to write
    WriteT - The result of a per-bundle write
    All Implemented Interfaces:
    Direct Known Subclasses:
    Enclosing class:

    public abstract static class Sink.WriteOperation<T,WriteT>
    extends Object
    implements Serializable
    A Sink.WriteOperation defines the process of a parallel write of objects to a Sink.

    The WriteOperation defines how to perform initialization and finalization of a parallel write to a sink as well as how to create a Sink.Writer object that can write a bundle to the sink.

    Since operations in Dataflow may be run multiple times for redundancy or fault-tolerance, the initialization and finalization defined by a WriteOperation must be idempotent.

    WriteOperations may be mutable; a WriteOperation is serialized after the call to initialize method and deserialized before calls to createWriter and finalized. However, it is not reserialized after createWriter, so createWriter should not mutate the state of the WriteOperation.

    See Sink for more detailed documentation about the process of writing to a Sink.

    See Also:
    Serialized Form
    • Constructor Detail

      • WriteOperation

        public WriteOperation()
    • Method Detail

      • initialize

        public abstract void initialize(PipelineOptions options)
                                 throws Exception
        Performs initialization before writing to the sink. Called before writing begins.
      • finalize

        public abstract void finalize(Iterable<WriteT> writerResults,
                                      PipelineOptions options)
                               throws Exception
        Given an Iterable of results from bundle writes, performs finalization after writing and closes the sink. Called after all bundle writes are complete.

        The results that are passed to finalize are those returned by bundles that completed successfully. Although bundles may have been run multiple times (for fault-tolerance), only one writer result will be passed to finalize for each bundle. An implementation of finalize should perform clean up of any failed and successfully retried bundles. Note that these failed bundles will not have their writer result passed to finalize, so finalize should be capable of locating any temporary/partial output written by failed bundles.

        A best practice is to make finalize atomic. If this is impossible given the semantics of the sink, finalize should be idempotent, as it may be called multiple times in the case of failure/retry or for redundancy.

        Note that the iteration order of the writer results is not guaranteed to be consistent if finalize is called multiple times.

        writerResults - an Iterable of results from successful bundle writes.
      • getSink

        public abstract Sink<T> getSink()
        Returns the Sink that this write operation writes to.
      • getWriterResultCoder

        public Coder<WriteT> getWriterResultCoder()
        Returns a coder for the writer result type.