Combine.PerKey (Google Cloud Dataflow SDK 1.9.1 API)

Google Cloud Dataflow SDK for Java, version 1.9.1

Class Combine.PerKey<K,InputT,OutputT>

    Type Parameters:
    K - the type of the keys of the input and output PCollections
    InputT - the type of the values of the input PCollection
    OutputT - the type of the values of the output PCollection
    All Implemented Interfaces:
    HasDisplayData, Serializable
    Enclosing class:
    Combine
    public static class Combine.PerKey<K,InputT,OutputT>
    extends PTransform<PCollection<KV<K,InputT>>,PCollection<KV<K,OutputT>>>
    PerKey<K, InputT, OutputT> takes a PCollection<KV<K, InputT>>, groups it by key, applies a combining function to the InputT values associated with each key to produce a combined OutputT value, and returns a PCollection<KV<K, OutputT>> representing a map from each distinct key of the input PCollection to the corresponding combined value. InputT and OutputT are often the same.
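The group-then-combine behavior described above can be modeled in plain Java. This is a minimal in-memory sketch, not the SDK: `CombiningFunction`, `MeanFn`, and `perKey` are hypothetical names that only mimic the create/add/extract shape of a combining function. It uses a mean, so `InputT` (`Integer`) and `OutputT` (`Double`) differ.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PerKeyMeanSketch {
    // Hypothetical, simplified stand-in for a combining function (NOT the SDK's CombineFn).
    interface CombiningFunction<InputT, AccumT, OutputT> {
        AccumT createAccumulator();
        AccumT addInput(AccumT accumulator, InputT input);
        OutputT extractOutput(AccumT accumulator);
    }

    // Mean of Integer values: InputT = Integer, OutputT = Double.
    static final class MeanFn implements CombiningFunction<Integer, long[], Double> {
        public long[] createAccumulator() {
            return new long[] {0L, 0L}; // {sum, count}
        }

        public long[] addInput(long[] acc, Integer input) {
            acc[0] += input;
            acc[1] += 1;
            return acc;
        }

        public Double extractOutput(long[] acc) {
            return acc[1] == 0 ? 0.0 : (double) acc[0] / acc[1];
        }
    }

    // Groups (key, value) pairs by key and combines each key's values into one output.
    static <K, InputT, AccumT, OutputT> Map<K, OutputT> perKey(
            List<Map.Entry<K, InputT>> input, CombiningFunction<InputT, AccumT, OutputT> fn) {
        Map<K, AccumT> accumulators = new HashMap<>();
        for (Map.Entry<K, InputT> kv : input) {
            AccumT acc = accumulators.computeIfAbsent(kv.getKey(), k -> fn.createAccumulator());
            accumulators.put(kv.getKey(), fn.addInput(acc, kv.getValue()));
        }
        Map<K, OutputT> result = new HashMap<>();
        accumulators.forEach((k, acc) -> result.put(k, fn.extractOutput(acc)));
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> scores = List.of(
                Map.entry("alice", 3), Map.entry("bob", 10),
                Map.entry("alice", 4), Map.entry("bob", 20));
        Map<String, Double> means = perKey(scores, new MeanFn());
        System.out.println(means.get("alice") + " " + means.get("bob")); // 3.5 15.0
    }
}
```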

    This is a concise shorthand for an application of GroupByKey followed by an application of Combine.GroupedValues. See those operations for more details on how keys are compared for equality and on the default Coder for the output.
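The equivalence stated above (a GroupByKey followed by Combine.GroupedValues) can be illustrated with a small in-memory model. The methods `groupByKey`, `groupedValues`, and `perKey` below are hypothetical plain-Java stand-ins for the two stages; they say nothing about how the Dataflow service actually executes the transform.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class GroupThenCombineSketch {
    // Stage 1 (models GroupByKey): collect all values that share a key.
    static <K, V> Map<K, List<V>> groupByKey(List<Map.Entry<K, V>> input) {
        Map<K, List<V>> grouped = new LinkedHashMap<>();
        for (Map.Entry<K, V> kv : input) {
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        return grouped;
    }

    // Stage 2 (models Combine.GroupedValues): reduce each key's grouped values to one value.
    static <K, V, OutputT> Map<K, OutputT> groupedValues(
            Map<K, List<V>> grouped, Function<Iterable<V>, OutputT> combine) {
        Map<K, OutputT> out = new LinkedHashMap<>();
        grouped.forEach((k, values) -> out.put(k, combine.apply(values)));
        return out;
    }

    // The shorthand: the per-key combine is the two stages applied back to back.
    static <K, V, OutputT> Map<K, OutputT> perKey(
            List<Map.Entry<K, V>> input, Function<Iterable<V>, OutputT> combine) {
        return groupedValues(groupByKey(input), combine);
    }

    // Example combining function: maximum of a non-empty group of integers.
    static Integer max(Iterable<Integer> values) {
        Integer best = null;
        for (Integer v : values) {
            if (best == null || v > best) {
                best = v;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> input = List.of(
                Map.entry("x", 5), Map.entry("y", 2), Map.entry("x", 9));
        System.out.println(perKey(input, GroupThenCombineSketch::max)); // {x=9, y=2}
    }
}
```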

    Example of use:

     PCollection<KV<String, Double>> salesRecords = ...;
     // Sum all sales amounts that share the same key (person).
     PCollection<KV<String, Double>> totalSalesPerPerson =
         salesRecords.apply(Combine.<String, Double, Double>perKey(
             new Sum.SumDoubleFn()));
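To check the expected result of the sales example without running a pipeline, here is a plain-Java analog. It is a sketch under stated assumptions: `TotalSalesSketch` and `totalSalesPerPerson` are hypothetical names, the records sit in a `List` rather than a `PCollection`, and `Map.merge` with `Double::sum` stands in for `Sum.SumDoubleFn`.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TotalSalesSketch {
    // In-memory analog of combining per key with a double sum (not the SDK transform).
    static Map<String, Double> totalSalesPerPerson(List<Map.Entry<String, Double>> salesRecords) {
        Map<String, Double> totals = new TreeMap<>(); // sorted keys for stable output
        for (Map.Entry<String, Double> record : salesRecords) {
            // Group by person and fold each new amount into that person's running sum.
            totals.merge(record.getKey(), record.getValue(), Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Double>> salesRecords = List.of(
                Map.entry("alice", 12.5),
                Map.entry("bob", 3.0),
                Map.entry("alice", 7.25));
        System.out.println(totalSalesPerPerson(salesRecords)); // {alice=19.75, bob=3.0}
    }
}
```

Summation works as a per-key combiner because it is associative and commutative, so the order in which a key's values are added does not change the total (up to floating-point rounding).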

    Each output element is in the window by which its corresponding input was grouped, and has the timestamp of the end of that window. The output PCollection has the same WindowFn as the input.
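The windowing rule above can be sketched by making the window part of the grouping key, so values from different windows are never combined together. This toy model assumes fixed, non-overlapping windows of `windowSizeMillis` (the text does not name a particular WindowFn) and uses hypothetical `Event` and `Result` classes rather than SDK types. As a simplification, each output is stamped with the window's exclusive end boundary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WindowedPerKeySketch {
    // Hypothetical timestamped input element (not an SDK type).
    static final class Event {
        final String key;
        final long timestampMillis;
        final double value;

        Event(String key, long timestampMillis, double value) {
            this.key = key;
            this.timestampMillis = timestampMillis;
            this.value = value;
        }
    }

    // Hypothetical output element: the combined value, its window, and its timestamp.
    static final class Result {
        final String key;
        final long windowStart;
        final long windowEnd;
        final long timestamp;
        final double sum;

        Result(String key, long windowStart, long windowEnd, double sum) {
            this.key = key;
            this.windowStart = windowStart;
            this.windowEnd = windowEnd;
            this.timestamp = windowEnd; // simplification: stamped at the window's end boundary
            this.sum = sum;
        }
    }

    // Sums values per (window, key); each window is combined independently.
    static List<Result> sumPerKeyAndWindow(List<Event> events, long windowSizeMillis) {
        Map<Long, Map<String, Double>> sums = new TreeMap<>();
        for (Event e : events) {
            // Assign the element to the fixed window [start, start + windowSizeMillis).
            long start = e.timestampMillis - Math.floorMod(e.timestampMillis, windowSizeMillis);
            sums.computeIfAbsent(start, s -> new TreeMap<>()).merge(e.key, e.value, Double::sum);
        }
        List<Result> out = new ArrayList<>();
        sums.forEach((start, perKey) -> perKey.forEach((key, sum) ->
                out.add(new Result(key, start, start + windowSizeMillis, sum))));
        return out;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event("alice", 100, 2.0),
                new Event("alice", 900, 3.0),
                new Event("alice", 1200, 5.0),
                new Event("bob", 50, 1.0));
        for (Result r : sumPerKeyAndWindow(events, 1000)) {
            System.out.println(r.key + " [" + r.windowStart + ", " + r.windowEnd + ") sum="
                    + r.sum + " timestamp=" + r.timestamp);
        }
    }
}
```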
