Combine.GroupedValues (Google Cloud Dataflow SDK 1.9.1 API)

Google Cloud Dataflow SDK for Java, version 1.9.1

com.google.cloud.dataflow.sdk.transforms

Class Combine.GroupedValues<K,InputT,OutputT>

    Type Parameters:
        K - type of input and output keys
        InputT - type of input values
        OutputT - type of output values
    All Implemented Interfaces:
        HasDisplayData, Serializable
    Enclosing class:
        Combine


    public static class Combine.GroupedValues<K,InputT,OutputT>
    extends PTransform<PCollection<? extends KV<K,? extends Iterable<InputT>>>,PCollection<KV<K,OutputT>>>

    GroupedValues<K, InputT, OutputT> takes a PCollection<KV<K, Iterable<InputT>>>, such as the result of GroupByKey, and applies a specified KeyedCombineFn<K, InputT, AccumT, OutputT> to each input KV<K, Iterable<InputT>> element to produce a combined output KV<K, OutputT> element. It returns a PCollection<KV<K, OutputT>> containing all the combined output elements. InputT and OutputT are often the same type, but this is not required. Common combining functions include sums, mins, maxes, and averages of numbers, conjunctions and disjunctions of booleans, and statistical aggregations.

    Example of use:

     
     PCollection<KV<String, Integer>> pc = ...;
     // Group the values for each key.
     PCollection<KV<String, Iterable<Integer>>> groupedByKey = pc.apply(
         GroupByKey.<String, Integer>create());
     // Sum the grouped values for each key.
     PCollection<KV<String, Integer>> sumByKey = groupedByKey.apply(
         Combine.<String, Integer, Integer>groupedValues(
             new Sum.SumIntegerFn()));
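
    The per-key behavior described above, including the case where InputT and OutputT differ, can be modeled in plain Java. The sketch below does not use the Dataflow SDK: KeyedCombiner is a hypothetical stand-in for KeyedCombineFn (the real class also defines mergeAccumulators, omitted here), MeanCombiner and combineGroupedValues are illustrative names, and a Map of Lists stands in for the grouped PCollection.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of combining grouped values. This is not Dataflow SDK code. */
class GroupedValuesSketch {

  /** Hypothetical stand-in for KeyedCombineFn<K, InputT, AccumT, OutputT>. */
  interface KeyedCombiner<K, InputT, AccumT, OutputT> {
    AccumT createAccumulator(K key);
    AccumT addInput(K key, AccumT accumulator, InputT input);
    OutputT extractOutput(K key, AccumT accumulator);
  }

  /** Mean of integers: InputT = Integer, AccumT = long[]{sum, count}, OutputT = Double. */
  static final class MeanCombiner implements KeyedCombiner<String, Integer, long[], Double> {
    @Override
    public long[] createAccumulator(String key) {
      return new long[] {0L, 0L};
    }

    @Override
    public long[] addInput(String key, long[] accumulator, Integer input) {
      accumulator[0] += input; // running sum
      accumulator[1] += 1;     // running count
      return accumulator;
    }

    @Override
    public Double extractOutput(String key, long[] accumulator) {
      // An empty group would yield NaN.
      return (double) accumulator[0] / accumulator[1];
    }
  }

  /** Applies fn to each (key, values) group and returns one output per key. */
  static <K, InputT, AccumT, OutputT> Map<K, OutputT> combineGroupedValues(
      Map<K, List<InputT>> grouped, KeyedCombiner<K, InputT, AccumT, OutputT> fn) {
    Map<K, OutputT> combined = new LinkedHashMap<>();
    for (Map.Entry<K, List<InputT>> group : grouped.entrySet()) {
      K key = group.getKey();
      AccumT accumulator = fn.createAccumulator(key);
      for (InputT value : group.getValue()) {
        accumulator = fn.addInput(key, accumulator, value);
      }
      combined.put(key, fn.extractOutput(key, accumulator));
    }
    return combined;
  }

  public static void main(String[] args) {
    Map<String, List<Integer>> grouped = new LinkedHashMap<>();
    grouped.put("a", Arrays.asList(1, 2, 3));
    grouped.put("b", Arrays.asList(10, 20));
    System.out.println(combineGroupedValues(grouped, new MeanCombiner()));
  }
}
```

    Here the input values are Integers and the outputs are Doubles, with a sum-and-count pair as the intermediate accumulator; running main prints {a=2.0, b=15.0}.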
      

    See also Combine.perKey(SerializableFunction<Iterable<V>, V>)/Combine.PerKey, which captures the common pattern of "combining by key" in a single easy-to-use PTransform.

    Combining for different keys can happen in parallel. Moreover, combining of the Iterable<InputT> values associated with a single key can happen in parallel, with different subsets of the values being combined separately, and their intermediate results combined further, in an arbitrary tree reduction pattern, until a single result value is produced for each key.
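
    Because the reduction tree is arbitrary, a combining function gives a well-defined result only if it is associative and commutative. The plain-Java sketch below (not Dataflow SDK code; TreeReductionSketch and its methods are illustrative names) compares a left-to-right fold with one possible tree shape, pairing neighbours level by level. An actual runner is free to choose a different shape.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

/** Plain-Java illustration of tree reduction. This is not Dataflow SDK code. */
class TreeReductionSketch {

  /** Folds the values left to right, one at a time. */
  static <T> T sequentialCombine(List<T> values, BinaryOperator<T> combiner) {
    if (values.isEmpty()) {
      throw new IllegalArgumentException("no values to combine");
    }
    T result = values.get(0);
    for (int i = 1; i < values.size(); i++) {
      result = combiner.apply(result, values.get(i));
    }
    return result;
  }

  /** Combines neighbouring pairs level by level until one value remains. */
  static <T> T treeCombine(List<T> values, BinaryOperator<T> combiner) {
    if (values.isEmpty()) {
      throw new IllegalArgumentException("no values to combine");
    }
    List<T> level = new ArrayList<>(values);
    while (level.size() > 1) {
      List<T> next = new ArrayList<>();
      for (int i = 0; i < level.size(); i += 2) {
        if (i + 1 < level.size()) {
          // Each pair is independent, so a runner could combine pairs in parallel.
          next.add(combiner.apply(level.get(i), level.get(i + 1)));
        } else {
          next.add(level.get(i)); // odd element carries over to the next level
        }
      }
      level = next;
    }
    return level.get(0);
  }

  public static void main(String[] args) {
    List<Integer> values = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
    BinaryOperator<Integer> sum = Integer::sum;
    BinaryOperator<Integer> minus = (a, b) -> a - b;
    // Associative and commutative: both strategies agree.
    System.out.println(sequentialCombine(values, sum) + " " + treeCombine(values, sum));
    // Not associative: the result depends on the shape of the tree.
    System.out.println(sequentialCombine(values, minus) + " " + treeCombine(values, minus));
  }
}
```

    With addition, both strategies produce 28. With subtraction, the left-to-right fold gives -26 and this particular tree gives 8, which is why such a function is unsuitable for combining.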

    By default, the Coder of the keys of the output PCollection<KV<K, OutputT>> is that of the keys of the input PCollection<KV<K, Iterable<InputT>>>, and the Coder of the values of the output PCollection<KV<K, OutputT>> is inferred from the concrete type of the KeyedCombineFn<K, InputT, AccumT, OutputT>'s output type OutputT.

    Each output element has the same timestamp and is in the same window as its corresponding input element, and the output PCollection has the same WindowFn associated with it as the input.

    See also Combine.globally(SerializableFunction<Iterable<V>, V>)/Combine.Globally, which combines all the values in a PCollection into a single value in a PCollection.

    See Also:
    Serialized Form

