DirectPipelineRunner.TestCombineDoFn (Google Cloud Dataflow SDK 1.9.1 API)

Google Cloud Dataflow SDK for Java, version 1.9.1

Class DirectPipelineRunner.TestCombineDoFn<K,InputT,AccumT,OutputT>

  • All Implemented Interfaces:
    HasDisplayData, Serializable
    Enclosing class:

    public static class DirectPipelineRunner.TestCombineDoFn<K,InputT,AccumT,OutputT>
    extends DoFn<KV<K,Iterable<InputT>>,KV<K,OutputT>>
    The implementation may split the Combine.KeyedCombineFn into ADD, MERGE and EXTRACT phases ( see In order to emulate this for the DirectPipelineRunner and provide an experience closer to the service, go through heavy serializability checks for the equivalent of the results of the ADD phase, but after the GroupByKey shuffle, and the MERGE phase. Doing these checks ensure that not only is the accumulator coder serializable, but the accumulator coder can actually serialize the data in question.
    See Also:
    Serialized Form
    • Constructor Detail

      • TestCombineDoFn

        public TestCombineDoFn(<? super K,? super InputT,AccumT,OutputT> fnRunner,
                               Coder<AccumT> accumCoder,
                               boolean testSerializability,
                               Random rand)
    • Method Detail

      • processElement

        public void processElement(DoFn.ProcessContext c)
                            throws Exception
        Description copied from class: DoFn
        Processes one input element.

        The current element of the input PCollection is returned by c.element(). It should be considered immutable. The Dataflow runtime will not mutate the element, so it is safe to cache, etc. The element should not be mutated by any of the DoFn methods, because it may be cached elsewhere, retained by the Dataflow runtime, or used in other unspecified ways.

        A value is added to the main output PCollection by DoFn.Context.output(OutputT). Once passed to output the element should be considered immutable and not be modified in any way. It may be cached elsewhere, retained by the Dataflow runtime, or used in other unspecified ways.

        Specified by:
        processElement in class DoFn<KV<K,Iterable<InputT>>,KV<K,OutputT>>
        See Also:
      • addInputsRandomly

        public static <K,AccumT,InputT> List<AccumT> addInputsRandomly(<? super K,? super InputT,AccumT,?> fnRunner,
                                                                       K key,
                                                                       Iterable<InputT> values,
                                                                       Random random,
                                                                       DoFn.ProcessContext c)
        Create a random list of accumulators from the given list of values.

        Visible for testing purposes only.

      • ensureSerializableByCoder

        public <T> T ensureSerializableByCoder(Coder<T> coder,
                                               T value,
                                               String errorContext)

Send feedback about...

Cloud Dataflow