ShardNameTemplate (Google Cloud Dataflow SDK 1.9.1 API)

Google Cloud Dataflow SDK for Java, version 1.9.1

com.google.cloud.dataflow.sdk.io

Class ShardNameTemplate



  • public class ShardNameTemplate
    extends Object
    Standard shard naming templates.

    Shard naming templates are strings that may contain placeholders for the shard number and shard count. When constructing a filename for a particular shard number, the upper-case letters 'S' and 'N' are replaced with the 0-padded shard number and shard count respectively.

    Left-padding of the numbers enables lexicographical sorting of the resulting filenames. If the shard number or count is too large for the space provided in the template, the result may no longer sort lexicographically. For example, the shard template "S-of-N", for 200 shards, results in outputs named "0-of-200", ..., "10-of-200", ..., "100-of-200", etc.
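
    The sorting behavior described above can be checked with a small standalone sketch. It does not use the SDK; the class and method names (ShardSortSketch, names, sortsInShardOrder) are hypothetical, and the "-of-" name format is hard-coded for illustration only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Illustrative sketch only; not part of the Dataflow SDK. */
final class ShardSortSketch {
    /** Builds names for shards 0..count-1, zero-padding the shard number to the given width. */
    static List<String> names(int count, int width) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            // Numbers wider than the requested width are printed in full, not truncated.
            result.add(String.format(Locale.ROOT, "%0" + width + "d-of-%d", i, count));
        }
        return result;
    }

    /** Returns true if sorting the names as strings keeps them in shard-number order. */
    static boolean sortsInShardOrder(List<String> namesInShardOrder) {
        List<String> sorted = new ArrayList<>(namesInShardOrder);
        Collections.sort(sorted);
        return sorted.equals(namesInShardOrder);
    }
}
```

    With 200 shards, a width of 1 (the "S-of-N" case) places "10-of-200" before "2-of-200" when sorted as strings, so sortsInShardOrder returns false; a width of 3 keeps the names in shard order.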

    Shard numbers start with 0, so the last shard number is the shard count minus one. For example, the template "-SSSSS-of-NNNNN" will be instantiated as "-00000-of-01000" for the first shard (shard 0) of a 1000-way sharded output.
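
    As a rough illustration of the substitution described above, the following standalone sketch expands a template by replacing each run of 'S' or 'N' with a zero-padded number. It is an assumption about how such an expansion could be written, not the SDK's own implementation; ShardNameSketch and expand are hypothetical names.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch only; not the SDK's own implementation. */
final class ShardNameSketch {
    // A run of one or more 'S', or a run of one or more 'N'.
    private static final Pattern PLACEHOLDER = Pattern.compile("S+|N+");

    /** Expands a template such as "-SSSSS-of-NNNNN" for the given shard. */
    static String expand(String template, int shardNumber, int shardCount) {
        if (shardNumber < 0 || shardNumber >= shardCount) {
            throw new IllegalArgumentException("shard number must be in [0, shardCount)");
        }
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int value = m.group().charAt(0) == 'S' ? shardNumber : shardCount;
            // Zero-pad to the length of the placeholder run; wider values are kept in full.
            String padded = String.format(Locale.ROOT, "%0" + m.group().length() + "d", value);
            m.appendReplacement(out, Matcher.quoteReplacement(padded));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("-SSSSS-of-NNNNN", 0, 1000)); // -00000-of-01000
    }
}
```

    Since shard numbers start at 0, the last valid call for a 1000-way output is expand("-SSSSS-of-NNNNN", 999, 1000), which yields "-00999-of-01000".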

    A shard name template is typically provided along with a name prefix and suffix, which allows constructing complex paths that have embedded shard information. For example, outputs in the form "gs://bucket/path-01-of-99.txt" could be constructed by providing the individual components:

    
       pipeline.apply(
           TextIO.Write.to("gs://bucket/path")
                       .withShardNameTemplate("-SS-of-NN")
                       .withSuffix(".txt"))
     

    In the example above, parts of the output name could be made user-configurable without requiring users to specify every component of the name.

    If a shard name template contains no 'S' placeholder, the output shard count must be 1; otherwise the same filename would be generated for multiple shards.
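
    The constraint above could be enforced before writing with a check along these lines. This is a minimal sketch under the assumption that any upper-case 'S' in the template counts as a shard number placeholder; ShardTemplateCheck and its methods are hypothetical and are not SDK APIs.

```java
/** Illustrative sketch only; not part of the Dataflow SDK. */
final class ShardTemplateCheck {
    /** Returns true if the template has at least one 'S' (shard number) placeholder. */
    static boolean hasShardNumberPlaceholder(String template) {
        return template.indexOf('S') >= 0;
    }

    /** Rejects templates that would give several shards the same filename. */
    static void check(String template, int shardCount) {
        if (shardCount < 1) {
            throw new IllegalArgumentException("shard count must be at least 1");
        }
        if (shardCount > 1 && !hasShardNumberPlaceholder(template)) {
            throw new IllegalArgumentException(
                "template \"" + template + "\" has no 'S' placeholder but shard count is "
                    + shardCount);
        }
    }
}
```

    For example, check("-of-NN", 1) passes, while check("-of-NN", 2) throws because both shards would receive the same name.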

    • Field Detail

      • INDEX_OF_MAX

        public static final String INDEX_OF_MAX
        Shard name containing the index and max.

        E.g.: [prefix]-00000-of-00100[suffix] and [prefix]-00001-of-00100[suffix]

        See Also:
        Constant Field Values
      • DIRECTORY_CONTAINER

        public static final String DIRECTORY_CONTAINER
        Shard is a file within a directory.

        E.g.: [prefix]/part-00000[suffix] and [prefix]/part-00001[suffix]

        See Also:
        Constant Field Values
    • Constructor Detail

      • ShardNameTemplate

        public ShardNameTemplate()

