TextIO.Read (Google Cloud Dataflow SDK 1.9.1 API)

Google Cloud Dataflow SDK for Java, version 1.9.1


Class TextIO.Read

  • Enclosing class: TextIO

    public static class TextIO.Read
    extends Object
    A PTransform that reads from a text file (or from multiple text files matching a pattern) and returns a PCollection containing the decoded value of each line of the file(s). The default decoding returns each line as a String; call withCoder(Coder) to change the element type.
    • Method Detail

      • from

        public static TextIO.Read.Bound<String> from(String filepattern)
        Returns a transform that reads text from the file(s) matching the given filename or filename pattern. This can be a local path (if running locally), or a Google Cloud Storage filename or filename pattern of the form "gs://<bucket>/<filepath>" (if running locally or via the Google Cloud Dataflow service). Standard Java Filesystem glob patterns ("*", "?", "[..]") are supported.
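
        To illustrate the glob syntax mentioned above, the following minimal sketch uses the JDK's own java.nio glob matcher. It does not call TextIO or the Dataflow SDK, and it is only an assumption that TextIO's matching agrees with java.nio in every corner case; the class name GlobSketch and the sample file names are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

final class GlobSketch {
    // Returns true if fileName matches the glob under java.nio glob rules.
    // This only demonstrates "*", "?" and "[..]"; it is not TextIO's matcher.
    static boolean matches(String glob, String fileName) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(fileName));
    }

    public static void main(String[] args) {
        // "*" matches any run of characters within a single name.
        System.out.println(matches("part-*.txt", "part-00001.txt")); // true
        // "?" matches exactly one character, so a two-digit suffix fails.
        System.out.println(matches("log-?.txt", "log-10.txt"));      // false
        // "[..]" matches one character from the bracketed set or range.
        System.out.println(matches("data-[0-9].csv", "data-7.csv")); // true
    }
}
```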
      • withCoder

        public static <T> TextIO.Read.Bound<T> withCoder(Coder<T> coder)
        Returns a transform for reading text files that uses the given Coder<T> to decode each of the lines of the file into a value of type T.

        By default, the transform uses StringUtf8Coder, which returns each text line as a Java String.

        Type Parameters:
        T - the type of the decoded elements, and the elements of the resulting PCollection
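
        The idea of decoding each line into a value of type T can be sketched in plain Java, without the SDK. In this hypothetical example a java.util.function.Function stands in for the Coder<T>; the class LineDecodingSketch and the method decodeLines are illustrative names, not part of the Dataflow API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

final class LineDecodingSketch {
    // Applies the decoder to every line, one line at a time.
    // The Function is a stand-in (an assumption) for a Coder<T>.
    static <T> List<T> decodeLines(List<String> lines, Function<String, T> decoder) {
        List<T> decoded = new ArrayList<>(lines.size());
        for (String line : lines) {
            decoded.add(decoder.apply(line));
        }
        return decoded;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("1", "22", "333");

        // Default-style decoding: each line is kept as a String.
        List<String> asStrings = decodeLines(lines, Function.identity());

        // Custom decoding: each line is parsed into an Integer.
        List<Integer> asInts = LineDecodingSketch.<Integer>decodeLines(lines, Integer::valueOf);

        System.out.println(asStrings);
        System.out.println(asInts);
    }
}
```

        The element type of the result follows the decoder, in the same way that T in withCoder(Coder<T>) determines the element type of the resulting PCollection.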
      • withoutValidation

        public static TextIO.Read.Bound<String> withoutValidation()
        Returns a transform for reading text files with validation of the GCS path at pipeline creation time disabled.

        This is useful when the GCS input does not exist at pipeline creation time but is expected to be available at execution time.

      • withCompressionType

        public static TextIO.Read.Bound<String> withCompressionType(TextIO.CompressionType compressionType)
        Returns a transform for reading text files that decompresses all input files using the specified compression type.

        If no compression type is specified, the default is TextIO.CompressionType.AUTO. In this mode, the compression type of each file is determined by its extension (e.g., *.gz is treated as gzip-compressed, *.bz2 as bzip2-compressed, and files with any other extension as uncompressed).
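
        The extension-based rule described for AUTO mode can be sketched as follows. This is a plain-Java illustration under stated assumptions, not the SDK's implementation: the enum Compression is a hypothetical stand-in for TextIO.CompressionType, and the case-sensitive suffix check is an assumption.

```java
final class AutoCompressionSketch {
    // Hypothetical stand-in for the SDK's compression types.
    enum Compression { GZIP, BZIP2, UNCOMPRESSED }

    // Chooses a compression type from the file name's extension:
    // ".gz" is gzip, ".bz2" is bzip2, anything else is uncompressed.
    // Case-sensitive matching is an assumption of this sketch.
    static Compression detect(String fileName) {
        if (fileName.endsWith(".gz")) {
            return Compression.GZIP;
        }
        if (fileName.endsWith(".bz2")) {
            return Compression.BZIP2;
        }
        return Compression.UNCOMPRESSED;
    }

    public static void main(String[] args) {
        System.out.println(detect("logs/part-00000.gz"));  // GZIP
        System.out.println(detect("logs/part-00001.bz2")); // BZIP2
        System.out.println(detect("logs/part-00002.txt")); // UNCOMPRESSED
    }
}
```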