Google Cloud Dataflow SDK for Java, version 1.9.1
Class TextIO.Read
- java.lang.Object
  - com.google.cloud.dataflow.sdk.io.TextIO.Read

Enclosing class: TextIO
public static class TextIO.Read extends Object
A PTransform that reads from a text file (or multiple text files matching a pattern) and returns a PCollection containing the decoding of each of the lines of the text file(s). The default decoding just returns each line as a String, but you may call withCoder(Coder) to change the return type.
Nested Class Summary
Modifier and Type, Class and Description:
- static class TextIO.Read.Bound<T>
  A PTransform that reads from one or more text files and returns a bounded PCollection containing one element for each line of the input files.
Method Summary
Modifier and Type, Method and Description (all methods are static):
- static TextIO.Read.Bound<String> from(String filepattern)
  Returns a transform for reading text files that reads from the file(s) with the given filename or filename pattern.
- static TextIO.Read.Bound<String> from(ValueProvider<String> filepattern)
  Same as from(filepattern), but accepting a ValueProvider.
- static TextIO.Read.Bound<String> named(String name)
  Returns a transform for reading text files that uses the given step name.
- static <T> TextIO.Read.Bound<T> withCoder(Coder<T> coder)
  Returns a transform for reading text files that uses the given Coder<T> to decode each of the lines of the file into a value of type T.
- static TextIO.Read.Bound<String> withCompressionType(TextIO.CompressionType compressionType)
  Returns a transform for reading text files that decompresses all input files using the specified compression type.
- static TextIO.Read.Bound<String> withoutValidation()
  Returns a transform for reading text files that has GCS path validation on pipeline creation disabled.
Method Detail
-
named
public static TextIO.Read.Bound<String> named(String name)
Returns a transform for reading text files that uses the given step name.
-
from
public static TextIO.Read.Bound<String> from(String filepattern)
Returns a transform for reading text files that reads from the file(s) with the given filename or filename pattern. This can be a local path (if running locally), or a Google Cloud Storage filename or filename pattern of the form "gs://<bucket>/<filepath>" (if running locally or via the Google Cloud Dataflow service). Standard Java Filesystem glob patterns ("*", "?", "[..]") are supported.
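To make the glob syntax concrete, the following is a minimal sketch that uses only the JDK's java.nio.file glob matcher. The class name GlobSketch and the sample file names are hypothetical, and this is not the SDK's own matching code; matching of "gs://" paths by the service may differ in detail.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Hypothetical helper, not part of the Dataflow SDK: it only demonstrates
// how the "*", "?" and "[..]" glob wildcards behave in java.nio.file.
final class GlobSketch {

    /** Returns true if fileName matches the given java.nio glob pattern. */
    static boolean matches(String glob, String fileName) {
        PathMatcher matcher =
            FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(fileName));
    }

    public static void main(String[] args) {
        // "*" matches any run of characters within one path component.
        System.out.println(matches("part-*.txt", "part-0001.txt"));  // true
        // "?" matches exactly one character.
        System.out.println(matches("part-?.txt", "part-1.txt"));     // true
        System.out.println(matches("part-?.txt", "part-10.txt"));    // false
        // "[..]" matches one character from the bracketed set or range.
        System.out.println(matches("part-[0-3].txt", "part-2.txt")); // true
        System.out.println(matches("part-[0-3].txt", "part-7.txt")); // false
    }
}
```

A pattern such as "part-*.txt" is what would typically be passed to from(String) to read every shard of a sharded input.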
-
from
public static TextIO.Read.Bound<String> from(ValueProvider<String> filepattern)
Same as from(filepattern), but accepting a ValueProvider.
-
withCoder
public static <T> TextIO.Read.Bound<T> withCoder(Coder<T> coder)
Returns a transform for reading text files that uses the given Coder<T> to decode each of the lines of the file into a value of type T.
By default, uses StringUtf8Coder, which just returns the text lines as Java strings.
Type Parameters:
T - the type of the decoded elements, and the elements of the resulting PCollection
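As a conceptual model of per-line decoding, the sketch below applies a decoder function to the raw bytes of each line and collects the typed results. It uses only the JDK: LineDecodingSketch, UTF8_STRING and UTF8_INT are hypothetical names standing in for a Coder<T>, and the code is not how the SDK implements its coders.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical model of "decode each line into a value of type T".
// A java.util.function.Function stands in for the SDK's Coder<T>.
final class LineDecodingSketch {

    /** Stand-in for the default behavior: each line becomes a Java String. */
    static final Function<byte[], String> UTF8_STRING =
        raw -> new String(raw, StandardCharsets.UTF_8);

    /** Stand-in for a custom decoder: each line is parsed as an integer. */
    static final Function<byte[], Integer> UTF8_INT =
        raw -> Integer.valueOf(new String(raw, StandardCharsets.UTF_8).trim());

    /** Applies the decoder to every raw line, preserving order. */
    static <T> List<T> decodeLines(List<byte[]> rawLines,
                                   Function<byte[], T> decoder) {
        List<T> decoded = new ArrayList<>();
        for (byte[] raw : rawLines) {
            decoded.add(decoder.apply(raw));
        }
        return decoded;
    }
}
```

Swapping UTF8_STRING for UTF8_INT changes the element type of the result from String to Integer, which mirrors how the type parameter T of withCoder determines the element type of the resulting PCollection.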
-
withoutValidation
public static TextIO.Read.Bound<String> withoutValidation()
Returns a transform for reading text files that has GCS path validation on pipeline creation disabled. This can be useful when the GCS input does not exist at pipeline creation time but is expected to be available at execution time.
-
withCompressionType
public static TextIO.Read.Bound<String> withCompressionType(TextIO.CompressionType compressionType)
Returns a transform for reading text files that decompresses all input files using the specified compression type.
If no compression type is specified, the default is TextIO.CompressionType.AUTO. In this mode, the compression type of the file is determined by its extension (e.g., *.gz is gzipped, *.bz2 is bzipped, and all other extensions are uncompressed).
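The extension rule described for AUTO mode can be sketched as follows. The Mode enum and detect method are hypothetical and are not the SDK's TextIO.CompressionType. Case-sensitive suffix matching is an assumption of this sketch, since the text above does not say how letter case is handled.

```java
// Hypothetical illustration of extension-based compression detection.
// Not the SDK's implementation; the enum below is not TextIO.CompressionType.
final class CompressionSketch {

    enum Mode { GZIP, BZIP2, UNCOMPRESSED }

    /**
     * Picks a decompression mode from the file name's extension:
     * ".gz" is gzip, ".bz2" is bzip2, anything else is uncompressed.
     * Assumption: the suffix comparison is case-sensitive.
     */
    static Mode detect(String fileName) {
        if (fileName.endsWith(".gz")) {
            return Mode.GZIP;
        }
        if (fileName.endsWith(".bz2")) {
            return Mode.BZIP2;
        }
        return Mode.UNCOMPRESSED;
    }
}
```

Under this rule a gzipped file whose name lacks the ".gz" suffix would be read as plain text, which is the kind of case where passing an explicit type to withCompressionType is useful.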