Google Cloud Dataflow SDK for Java, version 1.9.1
Class FileBasedSink<T>
- java.lang.Object
  - com.google.cloud.dataflow.sdk.io.Sink<T>
    - com.google.cloud.dataflow.sdk.io.FileBasedSink<T>
- Type Parameters:
  T - the type of values written to the sink.
- All Implemented Interfaces:
  HasDisplayData, Serializable
- Direct Known Subclasses:
  XmlSink.Bound
public abstract class FileBasedSink<T> extends Sink<T>
Abstract Sink for file-based output. An implementation of FileBasedSink writes file-based output and defines the format of output files (how values are written, headers/footers, MIME type, etc.).

At pipeline construction time, the methods of FileBasedSink are called to validate the sink and to create a Sink.WriteOperation that manages the process of writing to the sink.

The process of writing to a file-based sink is as follows:
- an optional subclass-defined initialization,
- a parallel write of bundles to temporary files, and finally,
- renaming of these temporary files to their final output filenames.
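The three phases above can be modelled with plain java.nio.file calls. The sketch below is a hypothetical, local-file-system illustration, not the SDK's implementation: TempThenRenameSketch and its methods are invented names, the per-bundle temporary file loosely plays the role of a FileResult, and the final names assume the default "-SSSSS-of-NNNNN" (ShardNameTemplate.INDEX_OF_MAX) pattern.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of initialize -> parallel bundle writes -> rename.
// Local files only; the real sink goes through the registered IOChannelUtils file systems.
final class TempThenRenameSketch {
  private TempThenRenameSketch() {}

  // Phase 1: optional initialization, here creating a temporary directory.
  static Path initialize(Path outputDir) {
    try {
      return Files.createTempDirectory(outputDir, "temp-");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Phase 2: each bundle writes its values to its own temporary file.
  static Path writeBundle(Path tempDir, int bundleId, List<String> values) {
    Path tempFile = tempDir.resolve("bundle-" + bundleId + ".tmp");
    try {
      Files.write(tempFile, values, StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return tempFile; // stands in for the per-bundle write result
  }

  // Phase 3: move each temporary file to a shard-numbered final name,
  // then remove the (now empty) temporary directory.
  static List<Path> finalizeWrite(Path outputDir, Path tempDir, List<Path> tempFiles,
      String baseName, String extension) {
    int numShards = tempFiles.size();
    List<Path> finalFiles = new ArrayList<>();
    try {
      for (int shard = 0; shard < numShards; shard++) {
        // Assumed default pattern: base-SSSSS-of-NNNNN plus extension.
        String name = String.format("%s-%05d-of-%05d%s", baseName, shard, numShards, extension);
        Path target = outputDir.resolve(name);
        Files.move(tempFiles.get(shard), target, StandardCopyOption.REPLACE_EXISTING);
        finalFiles.add(target);
      }
      Files.delete(tempDir);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return finalFiles;
  }
}
```

Writing to temporary files first means a failed or retried bundle never leaves a partially written file under a final output name; only the last phase makes output visible.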
Supported file systems are those registered with IOChannelUtils.
- See Also:
  Serialized Form
Nested Class Summary
Nested Classes
- static class FileBasedSink.FileBasedWriteOperation<T>
  Abstract Sink.WriteOperation that manages the process of writing to a FileBasedSink.
- static class FileBasedSink.FileBasedWriter<T>
  Abstract Sink.Writer that writes a bundle to a FileBasedSink.
- static class FileBasedSink.FileResult
  Result of a single bundle write.
- Nested classes/interfaces inherited from class com.google.cloud.dataflow.sdk.io.Sink:
  Sink.WriteOperation<T,WriteT>, Sink.Writer<T,WriteT>
Field Summary
Fields
- protected ValueProvider<String> baseOutputFilename
  Base filename for final output files.
- protected String extension
  The extension to be used for the final output files.
- protected String fileNamingTemplate
  Naming template for output files.
Constructor Summary
Constructors
- FileBasedSink(String baseOutputFilename, String extension)
  Construct a FileBasedSink with the given base output filename and extension.
- FileBasedSink(String baseOutputFilename, String extension, String fileNamingTemplate)
  Construct a FileBasedSink with the given base output filename, extension, and file naming template.
- FileBasedSink(ValueProvider<String> baseOutputFilename, String extension, String fileNamingTemplate)
  Construct a FileBasedSink with the given base output filename, extension, and file naming template.
Method Summary
Methods
- abstract FileBasedSink.FileBasedWriteOperation<T> createWriteOperation(PipelineOptions options)
  Return a subclass of FileBasedSink.FileBasedWriteOperation that will manage the write to the sink.
- String getBaseOutputFilename()
  Returns the base output filename for this file-based sink.
- ValueProvider<String> getBaseOutputFilenameProvider()
  Returns the ValueProvider for the base output filename of this file-based sink.
- void populateDisplayData(DisplayData.Builder builder)
  Register display data for the given transform or component.
- void validate(PipelineOptions options)
  Perform pipeline-construction-time validation.
Field Detail
- baseOutputFilename
  protected final ValueProvider<String> baseOutputFilename
  Base filename for final output files.
- extension
  protected final String extension
  The extension to be used for the final output files.
- fileNamingTemplate
  protected final String fileNamingTemplate
  Naming template for output files. See ShardNameTemplate for a description of possible naming templates. Default is ShardNameTemplate.INDEX_OF_MAX.
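A naming template is expanded once per output shard. The sketch below assumes ShardNameTemplate-style semantics: each run of S characters becomes the shard index and each run of N characters becomes the shard count, zero-padded to the length of the run. expandShardName is a hypothetical helper, not an SDK method, and INDEX_OF_MAX is assumed here to be "-SSSSS-of-NNNNN". The suffix is passed with its leading dot; how FileBasedSink joins the extension is not specified on this page.

```java
// Hypothetical expansion of a shard naming template; not the SDK implementation.
// Every 'S' or 'N' in the template is treated as a placeholder character.
final class ShardNameSketch {
  private ShardNameSketch() {}

  static String expandShardName(String prefix, String template, String suffix,
      int shardNum, int numShards) {
    StringBuilder out = new StringBuilder(prefix);
    int i = 0;
    while (i < template.length()) {
      char c = template.charAt(i);
      if (c == 'S' || c == 'N') {
        // Measure the run of identical placeholder characters.
        int start = i;
        while (i < template.length() && template.charAt(i) == c) {
          i++;
        }
        int width = i - start;
        int value = (c == 'S') ? shardNum : numShards;
        // Zero-pad the number to the width of the placeholder run.
        out.append(String.format("%0" + width + "d", value));
      } else {
        out.append(c); // literal template character
        i++;
      }
    }
    return out.append(suffix).toString();
  }
}
```

For example, expandShardName("gs://bucket/out", "-SSSSS-of-NNNNN", ".txt", 3, 10) yields gs://bucket/out-00003-of-00010.txt, and an empty template yields gs://bucket/out.txt.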
Constructor Detail
- FileBasedSink
  public FileBasedSink(String baseOutputFilename, String extension)
  Construct a FileBasedSink with the given base output filename and extension.
- FileBasedSink
  public FileBasedSink(String baseOutputFilename, String extension, String fileNamingTemplate)
  Construct a FileBasedSink with the given base output filename, extension, and file naming template. See ShardNameTemplate for a description of file naming templates.
- FileBasedSink
  public FileBasedSink(ValueProvider<String> baseOutputFilename, String extension, String fileNamingTemplate)
  Construct a FileBasedSink with the given base output filename, extension, and file naming template. See ShardNameTemplate for a description of file naming templates.
Method Detail
- getBaseOutputFilename
  public String getBaseOutputFilename()
  Returns the base output filename for this file-based sink.
- getBaseOutputFilenameProvider
  public ValueProvider<String> getBaseOutputFilenameProvider()
  Returns the ValueProvider for the base output filename of this file-based sink.
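Because the baseOutputFilename field is a ValueProvider<String>, the constructors and the two getters fit together roughly as modelled below. This is a hypothetical sketch: ValueProviderSketch, StaticValueSketch, RuntimeValueSketch and FileSinkSketch are stand-ins for ValueProvider and FileBasedSink, and it assumes (without confirmation from this page) that the String constructors wrap their argument in a fixed-value provider and that getBaseOutputFilename() unwraps it. The two-argument constructor's default template follows the fileNamingTemplate field documentation.

```java
// Hypothetical stand-in for ValueProvider: a value that may only exist at run time.
interface ValueProviderSketch<T> {
  boolean isAccessible();
  T get();
}

// A value known at pipeline construction time.
final class StaticValueSketch<T> implements ValueProviderSketch<T> {
  private final T value;
  StaticValueSketch(T value) { this.value = value; }
  public boolean isAccessible() { return true; }
  public T get() { return value; }
}

// A value supplied later, e.g. by a (hypothetical) runtime parameter.
final class RuntimeValueSketch<T> implements ValueProviderSketch<T> {
  private T value;
  void set(T value) { this.value = value; }
  public boolean isAccessible() { return value != null; }
  public T get() {
    if (value == null) {
      throw new IllegalStateException("value is not available until run time");
    }
    return value;
  }
}

class FileSinkSketch {
  static final String INDEX_OF_MAX = "-SSSSS-of-NNNNN"; // assumed default template
  protected final ValueProviderSketch<String> baseOutputFilename;
  protected final String extension;
  protected final String fileNamingTemplate;

  FileSinkSketch(String baseOutputFilename, String extension) {
    this(baseOutputFilename, extension, INDEX_OF_MAX);
  }

  FileSinkSketch(String baseOutputFilename, String extension, String fileNamingTemplate) {
    this(new StaticValueSketch<>(baseOutputFilename), extension, fileNamingTemplate);
  }

  FileSinkSketch(ValueProviderSketch<String> baseOutputFilename, String extension,
      String fileNamingTemplate) {
    this.baseOutputFilename = baseOutputFilename;
    this.extension = extension;
    this.fileNamingTemplate = fileNamingTemplate;
  }

  // Unwraps the provider; fails if the value is not yet accessible.
  String getBaseOutputFilename() { return baseOutputFilename.get(); }

  // Hands out the provider itself, which is safe to call at construction time.
  ValueProviderSketch<String> getBaseOutputFilenameProvider() { return baseOutputFilename; }
}
```

Under this model, code running at pipeline construction time should prefer getBaseOutputFilenameProvider() and check isAccessible() before calling get().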
- validate
  public void validate(PipelineOptions options)
  Perform pipeline-construction-time validation. The default implementation is a no-op. Subclasses should override this method to ensure that the sink is valid and can be written to. It is recommended to use Preconditions in the implementation of this method.
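An override of validate might run checks like the ones below. This is a hypothetical sketch for local files only: validateLocalOutput is an invented helper, checkArgument is a dependency-free stand-in for Guava's Preconditions.checkArgument, and remote paths such as gs:// URIs would need file-system-specific checks instead.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical construction-time checks in the spirit of validate(PipelineOptions).
final class SinkValidationSketch {
  private SinkValidationSketch() {}

  // Mirrors the behavior of Preconditions.checkArgument(boolean, Object).
  static void checkArgument(boolean condition, String message) {
    if (!condition) {
      throw new IllegalArgumentException(message);
    }
  }

  // Fails fast, before any bundle is written, if the local output location is unusable.
  static void validateLocalOutput(String baseOutputFilename, String fileNamingTemplate) {
    checkArgument(baseOutputFilename != null && !baseOutputFilename.isEmpty(),
        "baseOutputFilename must be non-empty");
    checkArgument(fileNamingTemplate != null, "fileNamingTemplate must not be null");
    Path parent = Paths.get(baseOutputFilename).toAbsolutePath().getParent();
    checkArgument(parent != null && Files.isDirectory(parent),
        "output directory does not exist: " + parent);
    checkArgument(Files.isWritable(parent), "output directory is not writable: " + parent);
  }
}
```

Rejecting a bad configuration here surfaces the error while the pipeline is being built, rather than after workers have started writing temporary files.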
- createWriteOperation
  public abstract FileBasedSink.FileBasedWriteOperation<T> createWriteOperation(PipelineOptions options)
  Return a subclass of FileBasedSink.FileBasedWriteOperation that will manage the write to the sink.
  Specified by:
    createWriteOperation in class Sink<T>
- populateDisplayData
  public void populateDisplayData(DisplayData.Builder builder)
  Description copied from class: Sink
  Register display data for the given transform or component.
  populateDisplayData(DisplayData.Builder) is invoked by Pipeline runners to collect display data via DisplayData.from(HasDisplayData). Implementations may call super.populateDisplayData(builder) in order to register display data in the current namespace, but should otherwise use subcomponent.populateDisplayData(builder) to use the namespace of the subcomponent.
  By default, does not register any display data. Implementors may override this method to provide their own display data.
  Specified by:
    populateDisplayData in interface HasDisplayData
  Overrides:
    populateDisplayData in class Sink<T>
  Parameters:
    builder - The builder to populate with display data.
  See Also:
    HasDisplayData