Google Cloud Dataflow SDK for Java, version 1.9.1
Class Write
java.lang.Object
  com.google.cloud.dataflow.sdk.io.Write

Direct Known Subclasses:
Write
@Experimental(value=SOURCE_SINK)
public class Write
extends Object

A PTransform that writes to a Sink. A write begins with a sequential global initialization of a sink, followed by a parallel write, and ends with a sequential finalization of the write. The output of a write is PDone.

By default, every bundle in the input PCollection will be processed by a Sink.WriteOperation, so the number of outputs will vary based on runner behavior, though at least one output will always be produced. The exact parallelism of the write stage can be controlled using Write.Bound.withNumShards(int), typically used to control how many files are produced or to globally limit the number of workers connecting to an external service. However, this option can often hurt performance: it adds an additional GroupByKey to the pipeline.

Write re-windows the data into the global window, so it is typically not well suited to use in streaming pipelines.

Example usage with runner-controlled sharding:
p.apply(Write.to(new MySink(...)));
Example usage with a fixed number of shards:
p.apply(Write.to(new MySink(...)).withNumShards(3));
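The three-phase lifecycle described above (sequential initialization, parallel per-bundle writes, sequential finalization) can be sketched with a self-contained stand-in. The MiniSink and MiniWriteOperation names below are hypothetical illustrations, not SDK classes; a real sink would implement Sink and Sink.WriteOperation from the SDK:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Sink, illustrating the write lifecycle
// described above. A real implementation would extend
// com.google.cloud.dataflow.sdk.io.Sink and Sink.WriteOperation.
class MiniSink {
  MiniWriteOperation createWriteOperation() {
    return new MiniWriteOperation();
  }
}

class MiniWriteOperation {
  final List<String> committedShards = new ArrayList<>();
  boolean initialized = false;
  boolean finalized = false;

  // Phase 1: sequential global initialization, runs once before any
  // bundle is written (e.g. creating a temporary output directory).
  void initialize() {
    initialized = true;
  }

  // Phase 2: one write per input bundle; bundles may be written in
  // parallel, so each write returns a result token (e.g. a temp file
  // name) that is collected for finalization.
  String writeBundle(int bundleId, List<String> elements) {
    return "temp-shard-" + bundleId + "-" + elements.size();
  }

  // Phase 3: sequential finalization, runs once after all bundles
  // have been written; commits the per-bundle results.
  void finalizeWrite(List<String> writerResults) {
    committedShards.addAll(writerResults);
    finalized = true;
  }
}
```

Because phase 2 produces one result per bundle, the number of outputs varies with how the runner bundles the input, which is why withNumShards(int) exists to pin the shard count when needed.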
Nested Class Summary

static class Write.Bound<T>
    A PTransform that writes to a Sink.
Constructor Summary

Write()
Method Detail

to

public static <T> Write.Bound<T> to(Sink<T> sink)

Creates a Write transform that writes to the given Sink.