Google Cloud Dataflow SDK for Java, version 1.9.1
Class TextIO
- java.lang.Object
  - com.google.cloud.dataflow.sdk.io.TextIO
public class TextIO extends Object
PTransforms for reading and writing text files.

To read a PCollection from one or more text files, use TextIO.Read. You can instantiate a transform using TextIO.Read.from(String) to specify the path of the file(s) to read from (e.g., a local filename or filename pattern if running locally, or a Google Cloud Storage filename or filename pattern of the form "gs://<bucket>/<filepath>"). You may optionally call TextIO.Read.named(String) to specify the name of the pipeline step.

By default, TextIO.Read returns a PCollection of Strings, each corresponding to one line of an input UTF-8 text file. To convert directly from the raw bytes (split into lines delimited by '\n', '\r', or '\r\n') to another object of type T, supply a Coder<T> using TextIO.Read.withCoder(Coder).

See the following examples:
 Pipeline p = ...;

 // A simple Read of a local file (only runs locally):
 PCollection<String> lines =
     p.apply(TextIO.Read.from("/local/path/to/file.txt"));

 // A fully-specified Read from a GCS file (runs locally and via the
 // Google Cloud Dataflow service):
 PCollection<Integer> numbers =
     p.apply(TextIO.Read.named("ReadNumbers")
                        .from("gs://my_bucket/path/to/numbers-*.txt")
                        .withCoder(TextualIntegerCoder.of()));
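TextIO.Read can also consume compressed inputs via TextIO.CompressionType (listed in the nested class summary below). A minimal sketch, assuming the TextIO.Read.withCompressionType(TextIO.CompressionType) setter is available in this SDK version; the bucket and file pattern are placeholders:

 // Reading gzip-compressed text files; assumes
 // TextIO.Read.withCompressionType(TextIO.CompressionType) is available
 // in this SDK version. Bucket and file pattern are placeholders.
 PCollection<String> compressedLines =
     p.apply(TextIO.Read.named("ReadCompressed")
                        .from("gs://my_bucket/path/to/logs-*.txt.gz")
                        .withCompressionType(TextIO.CompressionType.GZIP));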
To write a PCollection to one or more text files, use TextIO.Write, calling TextIO.Write.to(String) to specify the path of the file to write to (e.g., a local filename or sharded filename pattern if running locally, or a Google Cloud Storage filename or sharded filename pattern of the form "gs://<bucket>/<filepath>"). You can optionally name the resulting transform using TextIO.Write.named(String), and you can use TextIO.Write.withCoder(Coder) to specify the Coder to use to encode the Java values into text lines.

Any existing files with the same names as generated output files will be overwritten.

For example:
 // A simple Write to a local file (only runs locally):
 PCollection<String> lines = ...;
 lines.apply(TextIO.Write.to("/path/to/file.txt"));

 // A fully-specified Write to a sharded GCS file (runs locally and via the
 // Google Cloud Dataflow service):
 PCollection<Integer> numbers = ...;
 numbers.apply(TextIO.Write.named("WriteNumbers")
                           .to("gs://my_bucket/path/to/numbers")
                           .withSuffix(".txt")
                           .withCoder(TextualIntegerCoder.of()));
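Output sharding can also be controlled explicitly. A minimal sketch, assuming the TextIO.Write.withNumShards(int) and TextIO.Write.withoutSharding() options are available in this SDK version; bucket paths are placeholders:

 // Controlling output sharding; assumes TextIO.Write.withNumShards(int)
 // and TextIO.Write.withoutSharding() are available in this SDK version.
 PCollection<String> lines = ...;

 // Write exactly 10 output shards:
 lines.apply(TextIO.Write.named("WriteSharded")
                         .to("gs://my_bucket/path/to/output")
                         .withSuffix(".txt")
                         .withNumShards(10));

 // Or force a single output file (may limit parallelism for large outputs):
 lines.apply(TextIO.Write.named("WriteSingleFile")
                         .to("gs://my_bucket/path/to/single-output")
                         .withSuffix(".txt")
                         .withoutSharding());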
Permissions
When run using the DirectPipelineRunner, your pipeline can read and write text files on your local drive and remote text files on Google Cloud Storage that you have access to using your gcloud credentials. When running in the Dataflow service using DataflowPipelineRunner, the pipeline can only read and write files from GCS. For more information about permissions, see the Cloud Dataflow documentation on Security and Permissions.
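As a concrete illustration of the two execution modes, the sketch below builds a pipeline and selects a runner through pipeline options. It assumes the standard PipelineOptionsFactory and DataflowPipelineOptions classes from this SDK; the project ID and staging location are placeholders:

 // Selecting a runner via pipeline options; assumes PipelineOptionsFactory,
 // DataflowPipelineOptions, DirectPipelineRunner, and DataflowPipelineRunner
 // from this SDK. Project ID and staging location are placeholders.
 DataflowPipelineOptions options =
     PipelineOptionsFactory.create().as(DataflowPipelineOptions.class);
 options.setProject("my-project-id");                  // placeholder
 options.setStagingLocation("gs://my_bucket/staging"); // placeholder

 // Local execution (can read/write local files and GCS):
 options.setRunner(DirectPipelineRunner.class);
 // Dataflow service execution (reads/writes GCS only):
 // options.setRunner(DataflowPipelineRunner.class);

 Pipeline p = Pipeline.create(options);
 p.apply(TextIO.Read.from("gs://my_bucket/path/to/input-*.txt"))
  .apply(TextIO.Write.to("gs://my_bucket/path/to/output"));
 p.run();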
Nested Class Summary
Nested Classes:
- static class TextIO.CompressionType
  Possible text file compression types.
- static class TextIO.Read
  A PTransform that reads from a text file (or multiple text files matching a pattern) and returns a PCollection containing the decoding of each of the lines of the text file(s).
- static class TextIO.Write
  A PTransform that writes a PCollection to a text file (or multiple text files matching a sharding pattern), with each element of the input collection encoded into its own line.
Field Summary
Fields:
- static Coder<String> DEFAULT_TEXT_CODER
  The default coder, which returns each line of the input file as a string.
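For comparison, explicitly supplying a UTF-8 string coder behaves like relying on this default; a minimal sketch, assuming the default corresponds to StringUtf8Coder in this SDK version:

 // Explicitly supplying the UTF-8 string coder; assumed to match the
 // behavior of DEFAULT_TEXT_CODER in this SDK version.
 PCollection<String> lines = ...;
 lines.apply(TextIO.Write.to("gs://my_bucket/path/to/output")
                         .withCoder(StringUtf8Coder.of()));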