Count.PerElement (Google Cloud Dataflow SDK 1.9.1 API)

Google Cloud Dataflow SDK for Java, version 1.9.1

com.google.cloud.dataflow.sdk.transforms

Class Count.PerElement<T>

    Type Parameters:
    T - the type of the elements of the input PCollection, and the type of the keys of the output PCollection
    All Implemented Interfaces:
    HasDisplayData, Serializable
    Enclosing class:
    Count


    public static class Count.PerElement<T>
    extends PTransform<PCollection<T>,PCollection<KV<T,Long>>>

    Count.PerElement<T> takes a PCollection<T> and returns a PCollection<KV<T, Long>> representing a map from each distinct element of the input PCollection to the number of times that element occurs in the input. Each key in the output PCollection is unique.
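
    The shape of the result can be pictured with a small in-memory model. The sketch below is plain Java, not SDK code: the class name PerElementModel and the helper countPerElement are hypothetical, and a java.util.Map stands in for the output PCollection<KV<T, Long>>. It groups elements with equals(), which is an assumption of the model only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class PerElementModel {

    /**
     * Hypothetical in-memory analogue: maps each distinct element of the
     * input to the number of times it occurs. Each key appears once.
     */
    static <T> Map<T, Long> countPerElement(List<T> input) {
        return input.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        Map<String, Long> counts = countPerElement(Arrays.asList("hi", "there", "hi"));
        System.out.println(counts.get("hi"));    // prints 2
        System.out.println(counts.get("there")); // prints 1
    }
}
```

    Each entry of the map corresponds to one KV<T, Long> in the output: the key is the distinct element and the value is its occurrence count.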

    This transform compares two values of type T by first encoding each with the input PCollection's Coder and then comparing the encoded bytes. The input Coder must therefore be deterministic (see Coder.verifyDeterministic() for more detail). Comparing encoded bytes in this way allows efficient parallel evaluation.
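
    To see why determinism matters when equality is decided on encoded bytes, consider the following plain-Java sketch. It is an illustration under stated assumptions, not the SDK's implementation: the class EncodedCountSketch and both encoding functions are hypothetical stand-ins for a Coder, and a HashMap keyed by ByteBuffer stands in for the grouping step.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

public class EncodedCountSketch {

    /** Counts elements by grouping on their encoded bytes, not on equals(). */
    static <T> Map<ByteBuffer, Long> countByEncoding(
            List<T> input, Function<T, byte[]> encoder) {
        Map<ByteBuffer, Long> counts = new HashMap<>();
        for (T element : input) {
            // ByteBuffer compares by content, so equal byte sequences share a key.
            counts.merge(ByteBuffer.wrap(encoder.apply(element)), 1L, Long::sum);
        }
        return counts;
    }

    /** Hypothetical non-deterministic encoding: output depends on iteration order. */
    static byte[] encodeInIterationOrder(Set<String> set) {
        return String.join(",", set).getBytes(StandardCharsets.UTF_8);
    }

    /** Hypothetical deterministic encoding: sorts the members before writing. */
    static byte[] encodeSorted(Set<String> set) {
        List<String> sorted = new ArrayList<>(set);
        Collections.sort(sorted);
        return String.join(",", sorted).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Two sets that are equal under equals() but iterate in different orders.
        Set<String> xy = new LinkedHashSet<>(Arrays.asList("x", "y"));
        Set<String> yx = new LinkedHashSet<>(Arrays.asList("y", "x"));
        List<Set<String>> input = Arrays.asList(xy, yx);

        System.out.println(xy.equals(yx)); // prints true
        // Order-dependent encoding: the equal sets land under two different keys.
        System.out.println(
                countByEncoding(input, EncodedCountSketch::encodeInIterationOrder).size()); // prints 2
        // Deterministic encoding: the equal sets share one key with count 2.
        System.out.println(
                countByEncoding(input, EncodedCountSketch::encodeSorted).size()); // prints 1
    }
}
```

    With the order-dependent encoding, two values that are equal under equals() produce different bytes and are counted as separate keys. With the sorted encoding they collapse into a single key, which is the behavior a deterministic Coder is meant to guarantee.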

    By default, the Coder of the keys of the output PCollection is the same as the Coder of the elements of the input PCollection.

    Example of use:

     
     PCollection<String> words = ...;
     PCollection<KV<String, Long>> wordCounts =
         words.apply(Count.<String>perElement());
      
    See Also:
    Serialized Form

