Write (Google Cloud Dataflow SDK 1.9.1 API)

Google Cloud Dataflow SDK for Java, version 1.9.1


Class Write

  • Direct Known Subclasses:

    public class Write
    extends Object
    A PTransform that writes to a Sink. A write begins with a sequential global initialization of a sink, followed by a parallel write, and ends with a sequential finalization of the write. The output of a write is PDone.
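
    The three phases described above can be pictured with a plain-Java simulation that does not use the Dataflow SDK. This is only a sketch of the ordering: the class name WriteLifecycleSketch, the run method, and the representation of bundles as lists of strings are all hypothetical and are not part of the SDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration only; none of these names exist in the Dataflow SDK.
final class WriteLifecycleSketch {

    /** Simulates the three phases of a write and returns a log of what happened. */
    static List<String> run(List<List<String>> bundles) {
        List<String> log = new ArrayList<>();

        // Phase 1: sequential, global initialization; happens exactly once.
        log.add("initialize");

        // Phase 2: parallel write; each bundle produces one result
        // (here, simply the number of records it contained).
        List<Integer> bundleResults = bundles.parallelStream()
                .map(List::size)
                .collect(Collectors.toList());
        log.add("wrote " + bundleResults.size() + " bundles");

        // Phase 3: sequential finalization; sees all bundle results at once.
        int total = 0;
        for (int count : bundleResults) {
            total += count;
        }
        log.add("finalize: " + total + " records");
        return log;
    }

    public static void main(String[] args) {
        List<List<String>> bundles = Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("c"),
                Arrays.asList("d", "e", "f"));
        System.out.println(run(bundles));
    }
}
```

    In this model the number of bundle results equals the number of bundles, which mirrors why the number of outputs of a real write depends on how the runner bundles the input.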

    By default, every bundle in the input PCollection will be processed by a Sink.WriteOperation, so the number of outputs will vary based on runner behavior, though at least one output will always be produced. The exact parallelism of the write stage can be controlled using Write.Bound.withNumShards(int), which is typically used to control how many files are produced or to globally limit the number of workers connecting to an external service. However, this option can often hurt performance because it adds an additional GroupByKey to the pipeline.
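
    One way to picture why a fixed shard count requires an extra grouping step is to model it in plain Java: every record is first assigned a shard key, and the records are then grouped by that key before being written. The sketch below is illustrative only. FixedShardingSketch and groupIntoShards are hypothetical names, and the round-robin assignment is an assumption, not necessarily how the SDK assigns shards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names exist in the Dataflow SDK.
final class FixedShardingSketch {

    /** Assigns each record a shard key and groups the records by that key. */
    static Map<Integer, List<String>> groupIntoShards(List<String> records, int numShards) {
        if (numShards <= 0) {
            throw new IllegalArgumentException("numShards must be positive");
        }
        // The grouping below plays the role of the additional GroupByKey:
        // all records for one shard must be brought together before writing.
        Map<Integer, List<String>> shards = new TreeMap<>();
        for (int i = 0; i < records.size(); i++) {
            int shard = i % numShards; // round-robin assignment (an assumption)
            shards.computeIfAbsent(shard, key -> new ArrayList<>()).add(records.get(i));
        }
        return shards;
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> shards =
                groupIntoShards(Arrays.asList("a", "b", "c", "d", "e"), 3);
        System.out.println(shards); // prints {0=[a, d], 1=[b, e], 2=[c]}
    }
}
```

    Here the number of outputs is bounded by numShards regardless of how the input was bundled, at the cost of moving every record to the group for its shard.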

    Write re-windows the data into the global window, so it is typically not well suited for use in streaming pipelines.

    Example usage with runner-controlled sharding:

     p.apply(Write.to(new MySink(...)));

    Example usage with a fixed number of shards:

     p.apply(Write.to(new MySink(...)).withNumShards(3));

    • Constructor Detail

      • Write

        public Write()

    • Method Detail

      • to

        public static <T> Write.Bound<T> to(Sink<T> sink)
        Creates a Write transform that writes to the given Sink, letting the runner control how many different shards are produced.