Write (Google Cloud Dataflow SDK 1.9.1 API)

Google Cloud Dataflow SDK for Java, version 1.9.1

com.google.cloud.dataflow.sdk.io

Class Write

  • Direct Known Subclasses:
    Write


    @Experimental(value=SOURCE_SINK)
    public class Write
    extends Object
    A PTransform that writes to a Sink. A write begins with a sequential global initialization of a sink, followed by a parallel write, and ends with a sequential finalization of the write. The output of a write is PDone.

    By default, every bundle in the input PCollection will be processed by a Sink.WriteOperation, so the number of outputs will vary based on runner behavior, though at least 1 output will always be produced. The exact parallelism of the write stage can be controlled using Write.Bound.withNumShards(int), typically used to control how many files are produced or to globally limit the number of workers connecting to an external service. However, this option can often hurt performance: it adds an additional GroupByKey to the pipeline.

    Write re-windows the data into the global window, so it is typically not well suited to use in streaming pipelines.

    Example usage with runner-controlled sharding:

     p.apply(Write.to(new MySink(...)));

    Example usage with a fixed number of shards:

     p.apply(Write.to(new MySink(...)).withNumShards(3));

    • Constructor Detail

      • Write

        public Write()
    • Method Detail

      • to

        public static <T> Write.Bound<T> to(Sink<T> sink)
        Creates a Write transform that writes to the given Sink, letting the runner control how many different shards are produced.


이 페이지가 도움이 되었나요? 평가를 부탁드립니다.

다음에 대한 의견 보내기...

도움이 필요하시나요? 지원 페이지를 방문하세요.