FileBasedSink.FileBasedWriteOperation (Google Cloud Dataflow SDK 1.9.1 API)

Google Cloud Dataflow SDK for Java, version 1.9.1

com.google.cloud.dataflow.sdk.io

Class FileBasedSink.FileBasedWriteOperation<T>

  • Type Parameters:
    T - the type of values written to the sink.
    All Implemented Interfaces:
    Serializable
    Direct Known Subclasses:
    XmlSink.XmlWriteOperation
    Enclosing class:
    FileBasedSink<T>


    public abstract static class FileBasedSink.FileBasedWriteOperation<T>
    extends Sink.WriteOperation<T,FileBasedSink.FileResult>
    Abstract Sink.WriteOperation that manages the process of writing to a FileBasedSink.

    The primary responsibility of the FileBasedWriteOperation is the management of output files. During a write, FileBasedSink.FileBasedWriters write bundles to temporary file locations. After the bundles have been written:

    1. finalize(Iterable<FileBasedSink.FileResult>, PipelineOptions) is given a list of the temporary files containing the output bundles.
    2. During finalize, these temporary files are copied to their final output locations and named according to a file naming template.
    3. Finally, any temporary files that were created during the write are removed.
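The three steps above can be sketched against the local filesystem with java.nio.file. This is a minimal illustration, not the SDK's implementation: FinalizeSketch and finalizeOutputs are hypothetical names, the SDK's own file-handling utilities are replaced by java.nio.file.Files, and the sketch assumes the default naming format with zero-based shard indices. An empty input list produces no output files.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, local-filesystem model of the finalize step; not the SDK implementation. */
class FinalizeSketch {

  /**
   * Copies each temporary bundle file to {baseOutput}-SSSSS-of-NNNNN.{extension}
   * (zero-based index, five-digit padding), then deletes the temporary files.
   * Returns the final paths in shard order; an empty input yields an empty list.
   */
  static List<Path> finalizeOutputs(List<Path> tempFiles, String baseOutput, String extension)
      throws IOException {
    List<Path> finalFiles = new ArrayList<>();
    int numShards = tempFiles.size();
    // Step 2: copy each temporary file to its final, templated name.
    for (int i = 0; i < numShards; i++) {
      Path dest = Paths.get(
          String.format("%s-%05d-of-%05d.%s", baseOutput, i, numShards, extension));
      Files.copy(tempFiles.get(i), dest, StandardCopyOption.REPLACE_EXISTING);
      finalFiles.add(dest);
    }
    // Step 3: remove the temporary files only after every copy has succeeded.
    for (Path temp : tempFiles) {
      Files.deleteIfExists(temp);
    }
    return finalFiles;
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("finalize-sketch");
    List<Path> temps = new ArrayList<>();
    // Step 1 stand-in: pretend two bundles were written to temporary files.
    for (int bundleId : new int[] {15723, 15724}) {
      Path temp = dir.resolve("my_temp_output-temp-" + bundleId);
      Files.write(temp, ("bundle " + bundleId).getBytes(StandardCharsets.UTF_8));
      temps.add(temp);
    }
    for (Path p : finalizeOutputs(temps, dir.resolve("output").toString(), "txt")) {
      System.out.println(p.getFileName());
    }
  }
}
```

Copying before deleting means a failure partway through leaves the temporary files in place, which is consistent with the cleanup behavior described later on this page.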

    Subclasses of FileBasedWriteOperation must implement createWriter(PipelineOptions) to return a concrete FileBasedSink.FileBasedWriter.
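In the SDK, createWriter takes PipelineOptions and returns a FileBasedSink.FileBasedWriter. The self-contained sketch below only models the shape of that contract so it compiles without the SDK on the classpath: SketchWriteOperation, SketchWriter, and LineWriteOperation are hypothetical classes, the options parameter is omitted, and each writer buffers its bundle in memory instead of writing to a temporary file.

```java
/** Hypothetical, simplified stand-in for FileBasedSink.FileBasedWriter: buffers one bundle in memory. */
abstract class SketchWriter<T> {
  protected final StringBuilder buffer = new StringBuilder();

  /** Subclasses decide how a single element is encoded. */
  abstract void write(T value);

  /** Returns the encoded bundle; a real writer would instead finish its temporary file. */
  String close() {
    return buffer.toString();
  }
}

/** Hypothetical stand-in for FileBasedWriteOperation; its one required hook is createWriter. */
abstract class SketchWriteOperation<T> {
  abstract SketchWriter<T> createWriter();
}

/** Example subclass whose writers emit one line per element. */
class LineWriteOperation extends SketchWriteOperation<String> {
  @Override
  SketchWriter<String> createWriter() {
    // Each call returns a fresh writer, so every bundle gets its own buffer.
    return new SketchWriter<String>() {
      @Override
      void write(String value) {
        buffer.append(value).append('\n');
      }
    };
  }
}

class CreateWriterDemo {
  public static void main(String[] args) {
    SketchWriter<String> writer = new LineWriteOperation().createWriter();
    writer.write("alpha");
    writer.write("beta");
    System.out.print(writer.close());
  }
}
```

Keeping the encoding logic in the writer and the factory method in the write operation mirrors the division of labor described above: the operation manages files, while each writer handles the elements of a single bundle.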

    Temporary and Output File Naming:

    During the write, bundles are written to temporary files using the baseTemporaryFilename that can be provided via the constructor of FileBasedWriteOperation. These temporary files will be named {baseTemporaryFilename}-temp-{bundleId}, where bundleId is the unique id of the bundle. For example, if baseTemporaryFilename is "gs://my-bucket/my_temp_output", the output for a bundle with bundle id 15723 will be "gs://my-bucket/my_temp_output-temp-15723".
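The temporary naming rule is plain string concatenation. The helper below is a hypothetical illustration (TempNaming is not an SDK class) that reproduces the example from the paragraph above; treating the bundle id as a string is an assumption of the sketch.

```java
/** Hypothetical helper reproducing the documented temporary-file naming pattern. */
final class TempNaming {
  private TempNaming() {}

  /** Returns {baseTemporaryFilename}-temp-{bundleId}. */
  static String temporaryFilename(String baseTemporaryFilename, String bundleId) {
    return baseTemporaryFilename + "-temp-" + bundleId;
  }

  public static void main(String[] args) {
    // Prints the name from the example: gs://my-bucket/my_temp_output-temp-15723
    System.out.println(temporaryFilename("gs://my-bucket/my_temp_output", "15723"));
  }
}
```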

    Final output files are written to baseOutputFilename with the format {baseOutputFilename}-0000i-of-0000n.{extension}, where i is the zero-based index of the output file, n is the total number of bundles written, and extension is the file extension. Both baseOutputFilename and extension are required constructor arguments.
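The default final-name format can be expressed with a single String.format call. This hypothetical helper (FinalNaming is not an SDK class) assumes zero-based shard indices and five-digit zero padding, matching the 0000i-of-0000n pattern above; it is not the SDK's own naming code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for the default {base}-0000i-of-0000n.{extension} naming. */
final class FinalNaming {
  private FinalNaming() {}

  /** Returns one final filename; shardIndex must lie in [0, numShards). */
  static String finalFilename(String baseOutputFilename, int shardIndex, int numShards, String extension) {
    if (shardIndex < 0 || shardIndex >= numShards) {
      throw new IllegalArgumentException("shard index out of range: " + shardIndex);
    }
    return String.format("%s-%05d-of-%05d.%s", baseOutputFilename, shardIndex, numShards, extension);
  }

  /** Returns the names of all n output files, in shard order. */
  static List<String> allFinalFilenames(String baseOutputFilename, int numShards, String extension) {
    List<String> names = new ArrayList<>();
    for (int i = 0; i < numShards; i++) {
      names.add(finalFilename(baseOutputFilename, i, numShards, extension));
    }
    return names;
  }

  public static void main(String[] args) {
    for (String name : allFinalFilenames("gs://my-bucket/output", 3, "txt")) {
      System.out.println(name);
    }
  }
}
```

Because n appears in every name, the final names cannot be chosen until all bundles are written, which is one reason bundles go to temporary files first.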

    Subclass implementations can change the file naming template by supplying a value for FileBasedSink.fileNamingTemplate.
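The SDK's shard name templates are written with runs of placeholder letters, such as -SSSSS-of-NNNNN. The sketch below assumes that a run of S stands for the shard index and a run of N for the number of shards, each zero-padded to the length of the run, and that the expanded template sits between a prefix and a suffix. ShardTemplateSketch and constructName are hypothetical names, and the SDK's exact expansion rules may differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical expansion of an S/N shard template; not the SDK's implementation. */
final class ShardTemplateSketch {
  private static final Pattern SHARD_INDEX = Pattern.compile("S+");
  private static final Pattern SHARD_COUNT = Pattern.compile("N+");

  private ShardTemplateSketch() {}

  /** Replaces each run matched by pattern with value, zero-padded to the run's length. */
  private static String fill(Pattern pattern, String template, int value) {
    Matcher m = pattern.matcher(template);
    StringBuffer sb = new StringBuffer();
    while (m.find()) {
      // The replacement contains only digits, so it needs no escaping.
      m.appendReplacement(sb, String.format("%0" + m.group().length() + "d", value));
    }
    m.appendTail(sb);
    return sb.toString();
  }

  /** Builds prefix + expanded template + suffix for one shard. */
  static String constructName(String prefix, String template, String suffix, int shardIndex, int numShards) {
    String expanded = fill(SHARD_COUNT, fill(SHARD_INDEX, template, shardIndex), numShards);
    return prefix + expanded + suffix;
  }

  public static void main(String[] args) {
    System.out.println(constructName("gs://my-bucket/output", "-SSSSS-of-NNNNN", ".txt", 4, 10));
    System.out.println(constructName("gs://my-bucket/output", "/part-SSSSS", "", 4, 10));
  }
}
```

Only the template is expanded, so placeholder letters that happen to appear in the prefix or suffix are left untouched.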

    Temporary Bundle File Handling:

    temporaryFileRetention controls how temporary files are managed. By default, temporary files are removed; subclasses can pass a different value to the constructor.
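A minimal sketch of the retention decision follows. It assumes a two-valued setting (keep or remove) with removal as the default, as the paragraph above describes. RetentionSketch, its Retention enum, and cleanUp are hypothetical names that only model the choice; they are not the SDK's types.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical model of temporary-file retention; not the SDK's types. */
final class RetentionSketch {
  /** Assumed two-valued retention setting. */
  enum Retention { KEEP, REMOVE }

  /** The text states that temporary files are removed by default. */
  static final Retention DEFAULT = Retention.REMOVE;

  private RetentionSketch() {}

  /** Deletes the temporary files only when the policy is REMOVE; returns how many were deleted. */
  static int cleanUp(List<Path> temporaryFiles, Retention retention) throws IOException {
    if (retention == Retention.KEEP) {
      return 0; // leave temporary files in place, for example to inspect them while debugging
    }
    int deleted = 0;
    for (Path p : temporaryFiles) {
      if (Files.deleteIfExists(p)) {
        deleted++;
      }
    }
    return deleted;
  }
}
```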

    Note that if a bundle's write fails permanently, its temporary files are not cleaned up.

    If there are no elements in the PCollection being written, no output will be generated.


