TextIO.Write (Google Cloud Dataflow SDK 1.9.1 API)

Google Cloud Dataflow SDK for Java, version 1.9.1

com.google.cloud.dataflow.sdk.io

Class TextIO.Write

  • Enclosing class:
    TextIO


    public static class TextIO.Write
    extends Object
    A PTransform that writes a PCollection to a text file (or to multiple text files matching a sharding pattern), with each element of the input collection encoded into its own line.
    • Constructor Detail

      • Write

        public Write()
    • Method Detail

      • to

        public static TextIO.Write.Bound<String> to(String prefix)
        Returns a transform for writing to text files that writes to the file(s) with the given prefix. This can be a local filename (if running locally), or a Google Cloud Storage filename of the form "gs://<bucket>/<filepath>" (if running locally or via the Google Cloud Dataflow service).

        The files written will begin with this prefix, followed by a shard identifier (see TextIO.Write.Bound.withNumShards(int)), and end in a common extension, if given by TextIO.Write.Bound.withSuffix(String).
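
The naming scheme described above (prefix, then shard identifier, then optional suffix) can be sketched in plain Java without the SDK. This is an illustrative sketch only, not the SDK's implementation: the class `ShardNameSketch`, the method `shardFileName`, and the bucket path are hypothetical, and the template `-SSSSS-of-NNNNN` is assumed here to be the default shard name template (see ShardNameTemplate for the authoritative description). Runs of `S` are replaced by the zero-padded shard index and runs of `N` by the zero-padded shard count.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the Dataflow SDK.
class ShardNameSketch {
    // ASSUMPTION: taken here to be the default shard name template.
    static final String ASSUMED_DEFAULT_TEMPLATE = "-SSSSS-of-NNNNN";

    // Builds prefix + expanded template + suffix for one shard.
    static String shardFileName(String prefix, String template, String suffix,
                                int shard, int numShards) {
        Matcher m = Pattern.compile("S+|N+").matcher(template);
        StringBuffer expanded = new StringBuffer();
        while (m.find()) {
            String run = m.group();
            // A run of 'S' stands for the shard index, a run of 'N' for the shard count.
            int value = run.charAt(0) == 'S' ? shard : numShards;
            // Zero-pad the value to the length of the run.
            String padded = String.format("%0" + run.length() + "d", value);
            m.appendReplacement(expanded, padded);
        }
        m.appendTail(expanded);
        return prefix + expanded + suffix;
    }

    public static void main(String[] args) {
        // Roughly what one might expect from a call chain such as
        // TextIO.Write.to("gs://my-bucket/output").withSuffix(".txt").withNumShards(3)
        for (int shard = 0; shard < 3; shard++) {
            System.out.println(shardFileName(
                "gs://my-bucket/output", ASSUMED_DEFAULT_TEMPLATE, ".txt", shard, 3));
        }
        // prints:
        // gs://my-bucket/output-00000-of-00003.txt
        // gs://my-bucket/output-00001-of-00003.txt
        // gs://my-bucket/output-00002-of-00003.txt
    }
}
```

Passing an empty template and a shard count of 1 to this sketch yields just the prefix plus the suffix, which is one way to picture single-file output.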

      • withSuffix

        public static TextIO.Write.Bound<String> withSuffix(String nameExtension)
        Returns a transform for writing to text files that appends the specified suffix to the created files.
      • withNumShards

        public static TextIO.Write.Bound<String> withNumShards(int numShards)
        Returns a transform for writing to text files that uses the provided shard count.

        Constraining the number of shards is likely to reduce the performance of a pipeline. Setting this value is not recommended unless you require a specific number of output files.

        Parameters:
        numShards - the number of shards to use, or 0 to let the system decide.
      • withShardNameTemplate

        public static TextIO.Write.Bound<String> withShardNameTemplate(String shardTemplate)
        Returns a transform for writing to text files that uses the given shard name template.

        See ShardNameTemplate for a description of shard templates.

      • withoutSharding

        public static TextIO.Write.Bound<String> withoutSharding()
        Returns a transform for writing to text files that forces a single file as output.
      • withCoder

        public static <T> TextIO.Write.Bound<T> withCoder(Coder<T> coder)
        Returns a transform for writing to text files that uses the given Coder to encode each of the elements of the input PCollection into an output text line.

        By default, uses StringUtf8Coder, which writes input Java strings directly as output lines.

        Type Parameters:
        T - the type of the elements of the input PCollection
      • withoutValidation

        public static TextIO.Write.Bound<String> withoutValidation()
        Returns a transform for writing to text files that has GCS path validation on pipeline creation disabled.

        This can be useful when the GCS output location does not exist at pipeline creation time but is expected to be available at execution time.

      • withHeader

        public static TextIO.Write.Bound<String> withHeader(@Nullable
                                                            String header)
        Returns a transform for writing to text files that adds a header string to the files it writes. Note that a newline character will be added after the header.

        A null value will clear any previously configured header.

        Parameters:
        header - the string to be added as file header
      • withFooter

        public static TextIO.Write.Bound<String> withFooter(@Nullable
                                                            String footer)
        Returns a transform for writing to text files that adds a footer string to the files it writes. Note that a newline character will be added after the footer.

        A null value will clear any previously configured footer.

        Parameters:
        footer - the string to be added as file footer
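
The effect of withHeader and withFooter on the contents of one output file can be pictured with a small plain-Java sketch. It is illustrative only and does not use the SDK: the class `HeaderFooterSketch` and the method `render` are hypothetical, and the sketch assumes that the header, each element, and the footer are each followed by a single newline, and that a null header or footer is simply omitted.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the Dataflow SDK.
class HeaderFooterSketch {
    // Renders the assumed contents of a single output file.
    static String render(String header, List<String> lines, String footer) {
        StringBuilder out = new StringBuilder();
        if (header != null) {
            // ASSUMPTION: one newline follows the header.
            out.append(header).append('\n');
        }
        for (String line : lines) {
            // Each element of the collection goes on its own line.
            out.append(line).append('\n');
        }
        if (footer != null) {
            // ASSUMPTION: one newline follows the footer as well.
            out.append(footer).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> elements = Arrays.asList("1,alice", "2,bob");
        // Header and footer both configured.
        System.out.print(render("id,name", elements, "# end of file"));
        // A null header and footer leave only the element lines.
        System.out.print(render(null, elements, null));
    }
}
```

Because the header and footer are added to the files the transform writes, a sharded output would be expected to carry them in every shard file, not only in the first or last one.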

