TextIO (Google Cloud Dataflow SDK 1.9.1 API)

Google Cloud Dataflow SDK for Java, version 1.9.1

com.google.cloud.dataflow.sdk.io

Class TextIO



  • public class TextIO
    extends Object
    PTransforms for reading and writing text files.

    To read a PCollection from one or more text files, use TextIO.Read. You can instantiate a transform using TextIO.Read.from(String) to specify the path of the file(s) to read from (e.g., a local filename or filename pattern if running locally, or a Google Cloud Storage filename or filename pattern of the form "gs://<bucket>/<filepath>"). You may optionally call TextIO.Read.named(String) to specify the name of the pipeline step.

    By default, TextIO.Read returns a PCollection of Strings, each corresponding to one line of an input UTF-8 text file. To convert directly from the raw bytes (split into lines delimited by '\n', '\r', or '\r\n') to another object of type T, supply a Coder<T> using TextIO.Read.withCoder(Coder).

    See the following examples:

    
     Pipeline p = ...;
    
     // A simple Read of a local file (only runs locally):
     PCollection<String> lines =
         p.apply(TextIO.Read.from("/local/path/to/file.txt"));
    
     // A fully-specified Read from a GCS file (runs locally and via the
     // Google Cloud Dataflow service):
     PCollection<Integer> numbers =
         p.apply(TextIO.Read.named("ReadNumbers")
                            .from("gs://my_bucket/path/to/numbers-*.txt")
                            .withCoder(TextualIntegerCoder.of()));
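The line-splitting rule described above (lines delimited by '\n', '\r', or '\r\n') can be illustrated without the SDK. The following is a minimal sketch in plain Java, not the SDK's implementation: the class `LineSplitSketch` and its methods `splitLines` and `parseInts` are hypothetical names, `parseInts` only stands in for what a textual integer coder would do per line, and dropping an empty trailing line is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Dataflow SDK.
final class LineSplitSketch {

    /** Splits text into lines, treating "\n", "\r", and "\r\n" each as one delimiter. */
    static List<String> splitLines(String text) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '\n' || c == '\r') {
                lines.add(current.toString());
                current.setLength(0);
                // "\r\n" counts as a single delimiter, so skip the '\n' as well.
                if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;
                }
            } else {
                current.append(c);
            }
            i++;
        }
        // Assumption of this sketch: emit a final line only if characters remain.
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }

    /** Decodes each line as a decimal integer, standing in for a textual integer coder. */
    static List<Integer> parseInts(List<String> lines) {
        List<Integer> values = new ArrayList<>();
        for (String line : lines) {
            values.add(Integer.parseInt(line.trim()));
        }
        return values;
    }

    public static void main(String[] args) {
        // Mixed delimiters yield four lines, then four integers.
        System.out.println(parseInts(splitLines("1\n2\r3\r\n4"))); // prints [1, 2, 3, 4]
    }
}
```

In a real pipeline the splitting and decoding are handled by TextIO.Read and the supplied Coder; the sketch only shows why mixed line endings in an input file still produce one element per line.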
     

    To write a PCollection to one or more text files, use TextIO.Write, calling TextIO.Write.to(String) to specify the path of the file to write to (e.g., a local filename or sharded filename pattern if running locally, or a Google Cloud Storage filename or sharded filename pattern of the form "gs://<bucket>/<filepath>"). You can optionally name the resulting transform using TextIO.Write.named(String), and you can use TextIO.Write.withCoder(Coder) to specify the Coder used to encode the Java values into text lines.

    Any existing files with the same names as generated output files will be overwritten.

    For example:

    
     // A simple Write to a local file (only runs locally):
     PCollection<String> lines = ...;
     lines.apply(TextIO.Write.to("/path/to/file.txt"));
    
     // A fully-specified Write to a sharded GCS file (runs locally and via the
     // Google Cloud Dataflow service):
     PCollection<Integer> numbers = ...;
     numbers.apply(TextIO.Write.named("WriteNumbers")
                               .to("gs://my_bucket/path/to/numbers")
                               .withSuffix(".txt")
                               .withCoder(TextualIntegerCoder.of()));
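Because a sharded write produces several output files, it can help to see how the names are plausibly composed from the prefix passed to to(String) and the suffix passed to withSuffix(String). The sketch below is plain Java and makes an assumption not stated on this page: that the shard template is "-SSSSS-of-NNNNN" (shard index and shard count, each zero-padded to five digits). The class `ShardNameSketch` and method `shardFileName` are hypothetical names, not SDK API.

```java
import java.util.Locale;

// Hypothetical helper, not part of the Dataflow SDK.
final class ShardNameSketch {

    /**
     * Builds one output file name as prefix + shard part + suffix.
     * Assumes the shard part follows the pattern "-SSSSS-of-NNNNN".
     */
    static String shardFileName(String prefix, String suffix, int shardIndex, int shardCount) {
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return String.format(Locale.ROOT, "%s-%05d-of-%05d%s",
                prefix, shardIndex, shardCount, suffix);
    }

    public static void main(String[] args) {
        // List the names a three-shard write would use under the assumed template.
        for (int shard = 0; shard < 3; shard++) {
            System.out.println(
                    shardFileName("gs://my_bucket/path/to/numbers", ".txt", shard, 3));
        }
    }
}
```

Under that assumption, the first file of a three-shard write would be named gs://my_bucket/path/to/numbers-00000-of-00003.txt. Any existing file with one of these names would be overwritten, as noted above.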
     

    Permissions

    When run using the DirectPipelineRunner, your pipeline can read and write text files on your local drive, as well as remote text files on Google Cloud Storage (GCS) that you can access with your gcloud credentials. When run in the Dataflow service using the DataflowPipelineRunner, the pipeline can only read and write files on GCS. For more information about permissions, see the Cloud Dataflow documentation on Security and Permissions.
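The path restriction above can be checked before a pipeline is submitted. The following is a minimal sketch in plain Java under stated assumptions: the enum `Runner` and the methods `isGcsPath` and `isUsable` are hypothetical names, and the check only looks at the "gs://" prefix. It does not verify credentials or that the bucket exists.

```java
// Hypothetical pre-flight check, not part of the Dataflow SDK.
final class PathCheckSketch {

    /** Stand-ins for the two runners named in the Permissions section. */
    enum Runner { DIRECT, DATAFLOW_SERVICE }

    /** Returns true if the path has the form "gs://<bucket>/<filepath>" (prefix check only). */
    static boolean isGcsPath(String path) {
        return path.startsWith("gs://");
    }

    /** Local runs accept local and GCS paths; service runs accept GCS paths only. */
    static boolean isUsable(Runner runner, String path) {
        return runner == Runner.DIRECT || isGcsPath(path);
    }

    public static void main(String[] args) {
        System.out.println(isUsable(Runner.DIRECT, "/local/path/to/file.txt"));           // prints true
        System.out.println(isUsable(Runner.DATAFLOW_SERVICE, "/local/path/to/file.txt")); // prints false
        System.out.println(isUsable(Runner.DATAFLOW_SERVICE, "gs://my_bucket/path/to/numbers-*.txt")); // prints true
    }
}
```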

    • Field Detail

      • DEFAULT_TEXT_CODER

        public static final Coder<String> DEFAULT_TEXT_CODER
        The default coder, which returns each line of the input file as a string.

