Google Cloud Dataflow SDK for Java, version 1.9.1
@Experimental(value=SOURCE_SINK)
public class Write extends Object
A PTransform that writes to a
Sink. A write begins with a sequential global initialization of a sink, followed by a parallel write, and ends with a sequential finalization of the write. The output of a write is PDone.
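The three phases described above can be modeled with a minimal, standard-library-only sketch. It does not use the Dataflow SDK: the names WriteLifecycleSketch, InMemoryWriteOperation, writeBundle and finalizeWrite are hypothetical, and a parallel stream stands in for the runner's parallel workers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Stdlib-only model of the three write phases.
// None of these names come from the Dataflow SDK; they are illustrative assumptions.
final class WriteLifecycleSketch {

  /** Hypothetical stand-in for a sink's write operation. */
  static final class InMemoryWriteOperation {
    final List<String> log = new ArrayList<>();

    // Phase 1: sequential, runs once before any bundle is written.
    void initialize() {
      log.add("initialize");
    }

    // Phase 2: runs once per bundle, possibly in parallel.
    // It touches no shared state and returns a per-bundle result.
    String writeBundle(int bundleId, List<String> records) {
      return "bundle-" + bundleId + ":" + records.size();
    }

    // Phase 3: sequential, receives the results of every bundle write.
    void finalizeWrite(List<String> results) {
      log.add("finalize " + results);
    }
  }

  /** Runs all three phases over the given bundles and returns the sequential log. */
  static List<String> run(List<List<String>> bundles) {
    InMemoryWriteOperation op = new InMemoryWriteOperation();
    op.initialize();
    List<String> results = IntStream.range(0, bundles.size())
        .parallel()
        .mapToObj(i -> op.writeBundle(i, bundles.get(i)))
        .collect(Collectors.toList());
    op.finalizeWrite(results);
    return op.log;
  }
}
```

Only the first and last phases write to the shared log, which mirrors why initialization and finalization are sequential while the per-bundle writes can run concurrently.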
By default, every bundle in the input PCollection will be processed by a Sink.WriteOperation, so the number of outputs will vary based on runner behavior, though at least 1 output will always be produced. The exact parallelism of the write stage can be controlled using Write.Bound.withNumShards(int), typically used to control how many files are produced or to globally limit the number of workers connecting to an external service. However, this option can often hurt performance: it adds an additional GroupByKey to the pipeline.
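To illustrate why a fixed shard count implies a regrouping step, here is a minimal, standard-library-only sketch. It is not the SDK's implementation: FixedShardingSketch and groupIntoShards are hypothetical names, and the round-robin key assignment is an assumption chosen for simplicity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Stdlib-only model of fixed sharding: key every record by a shard number,
// then group by that key so each shard can be written by a single writer.
final class FixedShardingSketch {

  /** Groups records into at most numShards groups using round-robin shard keys. */
  static Map<Integer, List<String>> groupIntoShards(List<String> records, int numShards) {
    if (numShards <= 0) {
      throw new IllegalArgumentException("numShards must be positive");
    }
    Map<Integer, List<String>> shards = new TreeMap<>();
    for (int i = 0; i < records.size(); i++) {
      // The grouping below plays the role of the extra GroupByKey:
      // every record must be routed to its shard before any shard is written.
      shards.computeIfAbsent(i % numShards, k -> new ArrayList<>()).add(records.get(i));
    }
    return shards;
  }
}
```

In this model the number of groups is bounded by numShards regardless of how the input was bundled, at the price of moving every record to the group that owns its shard.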
Write re-windows the data into the global window, so it is typically not well suited to use in streaming pipelines.
Example usage with runner-controlled sharding:
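A minimal sketch, assuming the Dataflow SDK for Java 1.x is on the classpath. The wrapper class RunnerShardedWrite and the method writeAll are hypothetical, and the caller supplies the Sink, since no concrete sink is named here.

```java
import com.google.cloud.dataflow.sdk.io.Sink;
import com.google.cloud.dataflow.sdk.io.Write;
import com.google.cloud.dataflow.sdk.values.PCollection;
import com.google.cloud.dataflow.sdk.values.PDone;

// Hypothetical wrapper class, used only to keep the snippet self-contained.
final class RunnerShardedWrite {
  // Applies Write with no shard count, leaving the number of outputs to the runner.
  static PDone writeAll(PCollection<String> input, Sink<String> sink) {
    return input.apply(Write.to(sink));
  }
}
```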
Example usage with a fixed number of shards:
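A minimal sketch, assuming the Dataflow SDK for Java 1.x is on the classpath. The wrapper class FixedShardWrite, the method writeSharded, and the shard count of 3 are illustrative assumptions, and the caller supplies the Sink.

```java
import com.google.cloud.dataflow.sdk.io.Sink;
import com.google.cloud.dataflow.sdk.io.Write;
import com.google.cloud.dataflow.sdk.values.PCollection;
import com.google.cloud.dataflow.sdk.values.PDone;

// Hypothetical wrapper class, used only to keep the snippet self-contained.
final class FixedShardWrite {
  // Applies Write with a fixed shard count (3 is an arbitrary example value).
  static PDone writeSharded(PCollection<String> input, Sink<String> sink) {
    return input.apply(Write.to(sink).withNumShards(3));
  }
}
```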
Nested Class Summary
Nested Classes Modifier and Type Class and Description
Constructors Constructor and Description