Google Cloud Dataflow SDK for Java, version 1.9.1
public class IntraBundleParallelization extends Object
Provides multi-threading of DoFns, using threaded execution to process multiple elements concurrently within a bundle.
Note that each Dataflow worker already processes multiple bundles concurrently; this class is meant only for cases where processing the elements within a bundle is limited by blocking calls.
CPU-intensive or IO-intensive tasks are in general a poor fit for parallelization, because a limited resource that is already maximally utilized does not benefit from subdivision of work. Parallelization will increase the time taken to process each element, yet throughput will remain roughly the same. For example, if the local disk (an IO resource) has a maximum write rate of 10 MiB/s and processing each element requires writing 20 MiB to disk, then processing one element takes 2 seconds. Processing 3 elements concurrently (each getting an equal share of the maximum write rate) takes at least 6 seconds to complete, since the extra parallelization adds overhead.
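The disk example above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming an idealized disk that is shared equally among concurrent writers and ignoring the parallelization overhead the text mentions; the class and method names are hypothetical and not part of the SDK.

```java
/** Hypothetical calculator for the shared-disk example; not part of the Dataflow SDK. */
public class SharedDiskEstimate {

    /** Seconds until each of `concurrent` equal-sized writes finishes when the disk is shared equally. */
    static double secondsPerElement(double elementMiB, double diskMiBPerSec, int concurrent) {
        double sharePerWriter = diskMiBPerSec / concurrent; // each writer's slice of the bandwidth
        return elementMiB / sharePerWriter;
    }

    /** Elements completed per second across all concurrent writers. */
    static double elementsPerSecond(double elementMiB, double diskMiBPerSec, int concurrent) {
        return concurrent / secondsPerElement(elementMiB, diskMiBPerSec, concurrent);
    }

    public static void main(String[] args) {
        // Values taken from the example: 20 MiB per element, 10 MiB/s disk.
        for (int n : new int[] {1, 3}) {
            System.out.printf("concurrent=%d latency=%.1f s throughput=%.2f elements/s%n",
                    n, secondsPerElement(20, 10, n), elementsPerSecond(20, 10, n));
        }
    }
}
```

Under these assumptions, per-element latency grows in proportion to the number of concurrent writers while throughput stays flat, which is why a saturated resource gains nothing from extra threads.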
To parallelize a DoFn to 10 threads:
PCollection<T> data = ...;
data.apply(
    IntraBundleParallelization.of(new MyDoFn())
        .withMaxParallelism(10));
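To illustrate what a maximum parallelism of N means for elements that block, here is a minimal sketch using only the Java standard library. It is not the SDK's implementation: `BoundedBundleRunner`, its methods, and the 50 ms sleep standing in for a blocking call are all assumptions made for illustration. A semaphore caps the number of in-flight elements, and finishing the bundle waits for every outstanding element.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Hypothetical helper that mimics bounded per-bundle parallelism; not part of the Dataflow SDK. */
public class BoundedBundleRunner<T> {
    private final int maxParallelism;
    private final Consumer<T> elementFn;
    private final Semaphore permits;
    private final ExecutorService executor = Executors.newCachedThreadPool();

    public BoundedBundleRunner(int maxParallelism, Consumer<T> elementFn) {
        this.maxParallelism = maxParallelism;
        this.elementFn = elementFn;
        this.permits = new Semaphore(maxParallelism);
    }

    /** Blocks the caller while maxParallelism elements are already in flight. */
    public void processElement(T element) throws InterruptedException {
        permits.acquire();
        executor.submit(() -> {
            try {
                elementFn.accept(element);
            } finally {
                permits.release(); // free the slot even if the element fails
            }
        });
    }

    /** Waits for all in-flight elements by taking every permit, then shuts the pool down. */
    public void finishBundle() throws InterruptedException {
        permits.acquire(maxParallelism);
        permits.release(maxParallelism);
        executor.shutdown();
    }

    /** Processes `count` elements that each block for 50 ms; returns the peak concurrency observed. */
    static int peakConcurrency(int maxParallelism, int count) {
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        BoundedBundleRunner<Integer> runner = new BoundedBundleRunner<>(maxParallelism, e -> {
            peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
            try {
                Thread.sleep(50); // stand-in for a blocking call such as a remote request
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            } finally {
                inFlight.decrementAndGet();
            }
        });
        try {
            for (int i = 0; i < count; i++) {
                runner.processElement(i);
            }
            runner.finishBundle();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrency: " + peakConcurrency(10, 30));
    }
}
```

Because each element spends its time waiting rather than using CPU or disk, running several at once shortens the bundle's wall-clock time, which is the case this class is meant for. This sketch silently drops exceptions thrown by the element function.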
An uncaught exception from the wrapped DoFn will result in the exception being rethrown in later calls to IntraBundleParallelization.MultiThreadedIntraBundleProcessingDoFn.processElement(com.google.cloud.dataflow.sdk.transforms.DoFn<InputT, OutputT>.ProcessContext) or a call to
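The deferred rethrow described above could look roughly like the following. This is a sketch of the general pattern, not the SDK's code: `DeferredFailureRunner`, its `processElement` and `finishBundle` methods, and the fixed pool of four threads are hypothetical. A worker thread records the first failure, and the calling thread surfaces it on a later call.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

/** Hypothetical illustration of deferred failure reporting; not part of the Dataflow SDK. */
public class DeferredFailureRunner<T> {
    private final Consumer<T> elementFn;
    private final ExecutorService executor = Executors.newFixedThreadPool(4);
    private final AtomicReference<Throwable> failure = new AtomicReference<>();

    public DeferredFailureRunner(Consumer<T> elementFn) {
        this.elementFn = elementFn;
    }

    /** Rethrows any earlier failure, otherwise schedules the element on a worker thread. */
    public void processElement(T element) {
        rethrowIfFailed();
        executor.execute(() -> {
            try {
                elementFn.accept(element);
            } catch (Throwable t) {
                failure.compareAndSet(null, t); // remember only the first failure
            }
        });
    }

    /** Waits for outstanding elements, then rethrows a recorded failure if there is one. */
    public void finishBundle() {
        executor.shutdown();
        try {
            executor.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        rethrowIfFailed();
    }

    private void rethrowIfFailed() {
        Throwable t = failure.get();
        if (t != null) {
            throw new RuntimeException("element processing failed", t);
        }
    }

    /** Runs a five-element bundle in which element 2 fails; returns the message of the surfaced cause. */
    static String demo() {
        DeferredFailureRunner<Integer> runner = new DeferredFailureRunner<>(e -> {
            if (e == 2) {
                throw new IllegalStateException("bad element " + e);
            }
        });
        try {
            for (int i = 0; i < 5; i++) {
                runner.processElement(i);
            }
            runner.finishBundle();
            return "no failure";
        } catch (RuntimeException ex) {
            runner.executor.shutdownNow(); // stop the pool if the failure surfaced mid-bundle
            return ex.getCause().getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Depending on timing, the failure from element 2 surfaces either on a later `processElement` call or when the bundle is finished, so callers should not assume that the element being processed at the time of the exception is the one that caused it.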
Nested Class Summary
Nested Classes Modifier and Type Class and Description
PTransform that, when applied to a PCollection<InputT>, invokes a user-specified DoFn<InputT, OutputT> on all its elements, with all its outputs collected into an output
IntraBundleParallelization transform, with unbound input/output types.