FileBasedSink (Google Cloud Dataflow SDK 1.9.1 API)

Google Cloud Dataflow SDK for Java, version 1.9.1

com.google.cloud.dataflow.sdk.io

Class FileBasedSink<T>

  • Type Parameters:
    T - the type of values written to the sink.
    All Implemented Interfaces:
    HasDisplayData, Serializable
    Direct Known Subclasses:
    XmlSink.Bound


    public abstract class FileBasedSink<T>
    extends Sink<T>
    Abstract Sink for file-based output. An implementation of FileBasedSink writes file-based output and defines the format of output files (how values are written, headers/footers, MIME type, etc.).

    At pipeline construction time, the methods of FileBasedSink are called to validate the sink and to create a Sink.WriteOperation that manages the process of writing to the sink.

    The process of writing to a file-based sink is as follows:

    1. An optional, subclass-defined initialization.
    2. A parallel write of bundles to temporary files.
    3. A rename of these temporary files to their final output filenames.

    Supported file systems are those registered with IOChannelUtils.
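
    The three-step process described above can be illustrated with a minimal, SDK-independent sketch. It uses only java.nio.file and is not the Dataflow implementation: the class name TempThenRenameSketch, the method write, the temporary-file prefix, and the "-SSSSS-of-NNNNN" style of final names are all assumptions chosen for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical stand-in for the write process; not part of the Dataflow SDK.
final class TempThenRenameSketch {

  /** Writes each bundle to a temporary file, then renames the files to their final names. */
  static List<Path> write(Path dir, String baseName, String extension,
                          List<List<String>> bundles) {
    try {
      // Step 1: optional initialization (here: make sure the output directory exists).
      Files.createDirectories(dir);

      // Step 2: write every bundle to its own temporary file, in parallel.
      List<Path> temps = IntStream.range(0, bundles.size())
          .parallel()
          .mapToObj(i -> writeTemp(dir, baseName, bundles.get(i)))
          .collect(Collectors.toList());

      // Step 3: rename the temporary files to the final output filenames.
      List<Path> finals = new ArrayList<>();
      for (int i = 0; i < temps.size(); i++) {
        String name = String.format("%s-%05d-of-%05d%s", baseName, i, temps.size(), extension);
        finals.add(Files.move(temps.get(i), dir.resolve(name),
            StandardCopyOption.REPLACE_EXISTING));
      }
      return finals;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Writes one bundle, one value per line, to a fresh temporary file in dir. */
  private static Path writeTemp(Path dir, String baseName, List<String> lines) {
    try {
      Path temp = Files.createTempFile(dir, baseName + "-temp-", ".tmp");
      return Files.write(temp, lines, StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

    Because final names appear only after the rename step, a reader of the output directory in this sketch never sees a final filename attached to a partially written file.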

    See Also:
    Serialized Form
    • Field Detail

      • baseOutputFilename

        protected final ValueProvider<String> baseOutputFilename
        Base filename for final output files.
      • extension

        protected final String extension
        The extension to be used for the final output files.
    • Constructor Detail

      • FileBasedSink

        public FileBasedSink(String baseOutputFilename,
                             String extension)
        Construct a FileBasedSink with the given base output filename and extension.
      • FileBasedSink

        public FileBasedSink(String baseOutputFilename,
                             String extension,
                             String fileNamingTemplate)
        Construct a FileBasedSink with the given base output filename, extension, and file naming template.

        See ShardNameTemplate for a description of file naming templates.

      • FileBasedSink

        public FileBasedSink(ValueProvider<String> baseOutputFilename,
                             String extension,
                             String fileNamingTemplate)
        Construct a FileBasedSink with the given base output filename (supplied as a ValueProvider), extension, and file naming template.

        See ShardNameTemplate for a description of file naming templates.
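
        As a rough illustration of what a file naming template does, the sketch below expands a template in which runs of 'S' stand for the zero-padded shard index and runs of 'N' for the zero-padded shard count. This placeholder convention is assumed here from ShardNameTemplate; the class ShardNameSketch and its expand method are hypothetical and are not SDK APIs. The extension is appended verbatim, which is also an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the Dataflow SDK.
final class ShardNameSketch {
  // A placeholder is a run of 'S' (shard index) or a run of 'N' (shard count).
  private static final Pattern PLACEHOLDER = Pattern.compile("S+|N+");

  /** Returns base + expanded template + extension for one shard. */
  static String expand(String base, String template, String extension,
                       int shard, int numShards) {
    Matcher m = PLACEHOLDER.matcher(template);
    StringBuffer sb = new StringBuffer(base);
    while (m.find()) {
      boolean isShardIndex = m.group().charAt(0) == 'S';
      int value = isShardIndex ? shard : numShards;
      // Zero-pad the value to the width of the placeholder run.
      String padded = String.format("%0" + m.group().length() + "d", value);
      m.appendReplacement(sb, padded);
    }
    m.appendTail(sb);
    return sb.append(extension).toString();
  }
}
```

        For example, ShardNameSketch.expand("out", "-SSSSS-of-NNNNN", ".txt", 2, 10) yields "out-00002-of-00010.txt".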

    • Method Detail

      • getBaseOutputFilename

        public String getBaseOutputFilename()
        Returns the base output filename for this file-based sink.
      • getBaseOutputFilenameProvider

        public ValueProvider<String> getBaseOutputFilenameProvider()
        Returns the base output filename for this file-based sink as a ValueProvider.
      • validate

        public void validate(PipelineOptions options)
        Perform pipeline-construction-time validation. The default implementation is a no-op. Subclasses should override to ensure the sink is valid and can be written to. It is recommended to use Preconditions in the implementation of this method.
        Specified by:
        validate in class Sink<T>
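
        A precondition-style validation might look like the following sketch. To stay self-contained it defines its own checkArgument helper rather than using a library Preconditions class, and it validates plain strings instead of overriding validate(PipelineOptions). The class ValidateSketch and the specific checks are assumptions for illustration, not rules enforced by the SDK.

```java
// Hypothetical stand-in for an overridden validate(); not part of the Dataflow SDK.
final class ValidateSketch {

  /** Minimal local equivalent of a Preconditions-style argument check. */
  static void checkArgument(boolean condition, String message) {
    if (!condition) {
      throw new IllegalArgumentException(message);
    }
  }

  /** Fails fast at construction time if the sink configuration is unusable. */
  static void validate(String baseOutputFilename, String extension) {
    checkArgument(baseOutputFilename != null && !baseOutputFilename.isEmpty(),
        "baseOutputFilename must be a non-empty string");
    checkArgument(extension != null, "extension must not be null");
  }
}
```

        Failing here, before any bundle is written, surfaces configuration errors when the pipeline is constructed rather than partway through a run.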
      • populateDisplayData

        public void populateDisplayData(DisplayData.Builder builder)
        Description copied from class: Sink
        Register display data for the given transform or component.

        populateDisplayData(DisplayData.Builder) is invoked by Pipeline runners to collect display data via DisplayData.from(HasDisplayData). Implementations may call super.populateDisplayData(builder) in order to register display data in the current namespace, but should otherwise use subcomponent.populateDisplayData(builder) to use the namespace of the subcomponent.

        By default, does not register any display data. Implementors may override this method to provide their own display data.

        Specified by:
        populateDisplayData in interface HasDisplayData
        Overrides:
        populateDisplayData in class Sink<T>
        Parameters:
        builder - The builder to populate with display data.
        See Also:
        HasDisplayData

