Google Cloud Dataflow SDK for Java, version 1.9.1
Class Sink.WriteOperation<T,WriteT>
- java.lang.Object
  - com.google.cloud.dataflow.sdk.io.Sink.WriteOperation<T,WriteT>
- Type Parameters:
T
- The type of objects to write
WriteT
- The result of a per-bundle write
- All Implemented Interfaces:
- Serializable
- Direct Known Subclasses:
- FileBasedSink.FileBasedWriteOperation
public abstract static class Sink.WriteOperation<T,WriteT> extends Object implements Serializable
A Sink.WriteOperation defines the process of a parallel write of objects to a Sink.
The WriteOperation defines how to perform initialization and finalization of a parallel write to a sink, as well as how to create a Sink.Writer object that can write a bundle to the sink.
Since operations in Dataflow may be run multiple times for redundancy or fault-tolerance, the initialization and finalization defined by a WriteOperation must be idempotent.
WriteOperations may be mutable; a WriteOperation is serialized after the call to the initialize method and deserialized before calls to createWriter and finalize. However, it is not reserialized after createWriter, so createWriter should not mutate the state of the WriteOperation.
See Sink for more detailed documentation about the process of writing to a Sink.
- See Also:
- Serialized Form
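The serialization rule above can be illustrated with plain Java serialization. This is a minimal sketch, not the SDK's own mechanism: DemoWriteOperation is a hypothetical stand-in class, and the way the snapshot is taken and restored is an assumption made only to mimic "serialized after initialize, not reserialized after createWriter".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializationLifecycleSketch {

    /** Hypothetical stand-in for a WriteOperation; not the SDK class. */
    static class DemoWriteOperation implements Serializable {
        private static final long serialVersionUID = 1L;
        String tempLocation;     // set in initialize, so it survives serialization
        int writersCreated = 0;  // mutated in createWriter, which is the mistake shown here

        void initialize() {
            tempLocation = "temp-output";
        }

        String createWriter() {
            writersCreated++;    // lost: this copy is never serialized again
            return tempLocation + "/writer";
        }
    }

    static byte[] serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    static DemoWriteOperation deserialize(byte[] data)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (DemoWriteOperation) in.readObject();
        }
    }

    /** Returns the writersCreated value seen by the copy used for finalization. */
    static int writersSeenAtFinalize() {
        try {
            DemoWriteOperation op = new DemoWriteOperation();
            op.initialize();
            byte[] snapshot = serialize(op);  // serialized once, after initialize

            DemoWriteOperation workerCopy = deserialize(snapshot);
            workerCopy.createWriter();        // mutates only the worker's copy

            DemoWriteOperation finalizeCopy = deserialize(snapshot);
            return finalizeCopy.writersCreated;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints "writersCreated at finalize: 0"
        System.out.println("writersCreated at finalize: " + writersSeenAtFinalize());
    }
}
```

State set in initialize reaches every later copy, while the counter incremented in createWriter is invisible to the copy that runs finalization. Any information a writer needs to hand back should travel in its writer result instead.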
Constructor Summary
Constructor and Description
WriteOperation()
-
Method Summary
Modifier and Type | Method and Description
abstract Sink.Writer<T,WriteT> | createWriter(PipelineOptions options): Creates a new Sink.Writer to write a bundle of the input to the sink.
abstract void | finalize(Iterable<WriteT> writerResults, PipelineOptions options): Given an Iterable of results from bundle writes, performs finalization after writing and closes the sink.
abstract Sink<T> | getSink(): Returns the Sink that this write operation writes to.
Coder<WriteT> | getWriterResultCoder(): Returns a coder for the writer result type.
abstract void | initialize(PipelineOptions options): Performs initialization before writing to the sink.
Method Detail
-
initialize
public abstract void initialize(PipelineOptions options) throws Exception
Performs initialization before writing to the sink. Called before writing begins.
- Throws:
Exception
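Because the class description requires initialization to be idempotent, one way to picture it is a setup step that can be repeated without changing the outcome. The following is a minimal sketch under assumptions: the initialize helper, its Path parameter, and the "temp-output" directory name are hypothetical and are not part of the SDK, whose initialize method takes PipelineOptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class IdempotentInitializeSketch {

    /** Hypothetical initialization step: prepares a temp directory for bundle output. */
    static Path initialize(Path baseDir) throws IOException {
        Path tempDir = baseDir.resolve("temp-output");
        // createDirectories does not fail when the directory already exists,
        // so a repeated call leaves the same state as the first one.
        Files.createDirectories(tempDir);
        return tempDir;
    }

    /** Runs initialize twice and reports whether both calls agree. */
    static boolean repeatedInitializeIsStable() {
        try {
            Path base = Files.createTempDirectory("sink-demo");
            Path first = initialize(base);
            Path second = initialize(base);
            return first.equals(second) && Files.isDirectory(second);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // prints "true"
        System.out.println(repeatedInitializeIsStable());
    }
}
```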
-
finalize
public abstract void finalize(Iterable<WriteT> writerResults, PipelineOptions options) throws Exception
Given an Iterable of results from bundle writes, performs finalization after writing and closes the sink. Called after all bundle writes are complete.
The results that are passed to finalize are those returned by bundles that completed successfully. Although bundles may have been run multiple times (for fault-tolerance), only one writer result will be passed to finalize for each bundle. An implementation of finalize should perform cleanup of any failed and successfully retried bundles. Note that these failed bundles will not have their writer result passed to finalize, so finalize should be capable of locating any temporary/partial output written by failed bundles.
A best practice is to make finalize atomic. If this is impossible given the semantics of the sink, finalize should be idempotent, as it may be called multiple times in the case of failure/retry or for redundancy.
Note that the iteration order of the writer results is not guaranteed to be consistent if finalize is called multiple times.
- Parameters:
writerResults
- an Iterable of results from successful bundle writes.
- Throws:
Exception
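The finalization contract described above (commit the successful results, locate and clean up leftovers from failed attempts, and tolerate being called again) can be sketched with local files. This is an illustration under assumptions, not the SDK's or FileBasedSink's implementation: the finalizeWrite helper, the use of a String file name as the writer result, and the temp/output directory layout are all hypothetical. The sketch is idempotent rather than atomic.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class IdempotentFinalizeSketch {

    /**
     * Hypothetical finalize step. Each writer result is assumed to be the name
     * of a temp file written by a bundle that completed successfully.
     */
    static void finalizeWrite(Iterable<String> writerResults, Path tempDir, Path outputDir)
            throws IOException {
        Files.createDirectories(outputDir);
        for (String result : writerResults) {
            Path source = tempDir.resolve(result);
            // On a repeated call the file has already been moved, so skip it.
            if (Files.exists(source)) {
                Files.move(source, outputDir.resolve(result),
                        StandardCopyOption.REPLACE_EXISTING);
            }
        }
        // Anything still in tempDir came from attempts whose results were not
        // passed in; it is found by listing the directory, not from the results.
        if (Files.isDirectory(tempDir)) {
            try (DirectoryStream<Path> leftovers = Files.newDirectoryStream(tempDir)) {
                for (Path leftover : leftovers) {
                    Files.delete(leftover);
                }
            }
            Files.delete(tempDir);
        }
    }

    /** Writes two good bundles and one stray file, finalizes twice, lists the output. */
    static List<String> demo() {
        try {
            Path base = Files.createTempDirectory("sink-demo");
            Path tempDir = Files.createDirectories(base.resolve("temp"));
            Path outputDir = base.resolve("out");
            Files.write(tempDir.resolve("bundle-a"), "a".getBytes(StandardCharsets.UTF_8));
            Files.write(tempDir.resolve("bundle-b"), "b".getBytes(StandardCharsets.UTF_8));
            Files.write(tempDir.resolve("bundle-b-failed-attempt"),
                    "partial".getBytes(StandardCharsets.UTF_8));

            List<String> results = Arrays.asList("bundle-a", "bundle-b");
            finalizeWrite(results, tempDir, outputDir);
            finalizeWrite(results, tempDir, outputDir);  // retry reaches the same end state

            List<String> names = new ArrayList<>();
            try (DirectoryStream<Path> out = Files.newDirectoryStream(outputDir)) {
                for (Path p : out) {
                    names.add(p.getFileName().toString());
                }
            }
            Collections.sort(names);
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // prints "[bundle-a, bundle-b]"
        System.out.println(demo());
    }
}
```

The loop does not depend on the order of the results, which matches the note that iteration order may differ between calls.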
-
createWriter
public abstract Sink.Writer<T,WriteT> createWriter(PipelineOptions options) throws Exception
Creates a new Sink.Writer to write a bundle of the input to the sink.
The bundle id that the writer will use to uniquely identify its output will be passed to Sink.Writer.open(java.lang.String).
Must not mutate the state of the WriteOperation.
- Throws:
Exception
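To make the role of the bundle id concrete, here is a small sketch using stand-in types rather than the SDK classes. DemoWriter and DemoWriteOperation are hypothetical; only open(String) is taken from the text above, while the write and close methods, the String result type, and the naming scheme are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

class CreateWriterSketch {

    /** Hypothetical stand-in for a Sink.Writer; not the SDK class. */
    static class DemoWriter {
        private final String tempPrefix;
        private String outputName;
        private int recordsWritten = 0;

        DemoWriter(String tempPrefix) {
            this.tempPrefix = tempPrefix;
        }

        /** Receives the bundle id and derives a unique output name from it. */
        void open(String uId) {
            outputName = tempPrefix + "-" + uId;
        }

        void write(String value) {
            recordsWritten++;  // a real writer would append the value to its output
        }

        /** Returns the writer result, here the name of the bundle's output. */
        String close() {
            return outputName;
        }
    }

    /** Hypothetical stand-in for a WriteOperation; not the SDK class. */
    static class DemoWriteOperation {
        private final String tempPrefix;

        DemoWriteOperation(String tempPrefix) {
            this.tempPrefix = tempPrefix;
        }

        /** Only reads the operation's state; it does not change it. */
        DemoWriter createWriter() {
            return new DemoWriter(tempPrefix);
        }
    }

    /** Writes one bundle and returns its writer result. */
    static String writeBundle(DemoWriteOperation op, String bundleId, List<String> values) {
        DemoWriter writer = op.createWriter();
        writer.open(bundleId);
        for (String value : values) {
            writer.write(value);
        }
        return writer.close();
    }

    public static void main(String[] args) {
        DemoWriteOperation op = new DemoWriteOperation("temp/part");
        System.out.println(writeBundle(op, "bundle-1", Arrays.asList("a", "b")));  // temp/part-bundle-1
        System.out.println(writeBundle(op, "bundle-2", Arrays.asList("c")));       // temp/part-bundle-2
    }
}
```

Keeping all per-bundle state inside the writer means the same operation can create any number of writers without being changed by them.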