Top (Google Cloud Dataflow SDK 1.9.1 API)

Google Cloud Dataflow SDK for Java, version 1.9.1

com.google.cloud.dataflow.sdk.transforms

Class Top



  • public class Top
    extends Object
    PTransforms for finding the largest (or smallest) set of elements in a PCollection, or the largest (or smallest) set of values associated with each key in a PCollection of KVs.
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class and Description
      static class  Top.Largest<T extends Comparable<? super T>>
      A Serializable Comparator that that uses the compared elements' natural ordering.
      static class  Top.Smallest<T extends Comparable<? super T>>
      Serializable Comparator that that uses the reverse of the compared elements' natural ordering.
      static class  Top.TopCombineFn<T,ComparatorT extends Comparator<T> & Serializable>
      CombineFn for Top transforms that combines a bunch of Ts into a single count-long List<T>, using compareFn to choose the largest Ts.
    • Method Summary

      All Methods Static Methods Concrete Methods 
      Modifier and Type Method and Description
      static <T extends Comparable<T>>
      Combine.Globally<T,List<T>>
      largest(int count)
      Returns a PTransform that takes an input PCollection<T> and returns a PCollection<List<T>> with a single element containing the largest count elements of the input PCollection<T>, in decreasing order, sorted according to their natural order.
      static <K,V extends Comparable<V>>
      Combine.PerKey<K,V,List<V>>
      largestPerKey(int count)
      Returns a PTransform that takes an input PCollection<KV<K, V>> and returns a PCollection<KV<K, List<V>>> that contains an output element mapping each distinct key in the input PCollection to the largest count values associated with that key in the input PCollection<KV<K, V>>, in decreasing order, sorted according to their natural order.
      static <T,ComparatorT extends Comparator<T> & Serializable>
      Combine.Globally<T,List<T>>
      of(int count, ComparatorT compareFn)
      Returns a PTransform that takes an input PCollection<T> and returns a PCollection<List<T>> with a single element containing the largest count elements of the input PCollection<T>, in decreasing order, sorted using the given Comparator<T>.
      static <K,V,ComparatorT extends Comparator<V> & Serializable>
      PTransform<PCollection<KV<K,V>>,PCollection<KV<K,List<V>>>>
      perKey(int count, ComparatorT compareFn)
      Returns a PTransform that takes an input PCollection<KV<K, V>> and returns a PCollection<KV<K, List<V>>> that contains an output element mapping each distinct key in the input PCollection to the largest count values associated with that key in the input PCollection<KV<K, V>>, in decreasing order, sorted using the given Comparator<V>.
      static <T extends Comparable<T>>
      Combine.Globally<T,List<T>>
      smallest(int count)
      Returns a PTransform that takes an input PCollection<T> and returns a PCollection<List<T>> with a single element containing the smallest count elements of the input PCollection<T>, in increasing order, sorted according to their natural order.
      static <K,V extends Comparable<V>>
      PTransform<PCollection<KV<K,V>>,PCollection<KV<K,List<V>>>>
      smallestPerKey(int count)
      Returns a PTransform that takes an input PCollection<KV<K, V>> and returns a PCollection<KV<K, List<V>>> that contains an output element mapping each distinct key in the input PCollection to the smallest count values associated with that key in the input PCollection<KV<K, V>>, in increasing order, sorted according to their natural order.
    • Method Detail

      • of

        public static <T,ComparatorT extends Comparator<T> & SerializableCombine.Globally<T,List<T>> of(int count,
                                                                                                          ComparatorT compareFn)
        Returns a PTransform that takes an input PCollection<T> and returns a PCollection<List<T>> with a single element containing the largest count elements of the input PCollection<T>, in decreasing order, sorted using the given Comparator<T>. The Comparator<T> must also be Serializable.

        If count < the number of elements in the input PCollection, then all the elements of the input PCollection will be in the resulting List, albeit in sorted order.

        All the elements of the result's List must fit into the memory of a single machine.

        Example of use:

         
         PCollection<Student> students = ...;
         PCollection<List<Student>> top10Students =
             students.apply(Top.of(10, new CompareStudentsByAvgGrade()));
          

        By default, the Coder of the output PCollection is a ListCoder of the Coder of the elements of the input PCollection.

        If the input PCollection is windowed into GlobalWindows, an empty List<T> in the GlobalWindow will be output if the input PCollection is empty. To use this with inputs with other windowing, either withoutDefaults or asSingletonView must be called.

        See also smallest(int) and largest(int), which sort Comparable elements using their natural ordering.

        See also perKey(int, ComparatorT), smallestPerKey(int), and largestPerKey(int), which take a PCollection of KVs and return the top values associated with each key.

      • smallest

        public static <T extends Comparable<T>> Combine.Globally<T,List<T>> smallest(int count)
        Returns a PTransform that takes an input PCollection<T> and returns a PCollection<List<T>> with a single element containing the smallest count elements of the input PCollection<T>, in increasing order, sorted according to their natural order.

        If count < the number of elements in the input PCollection, then all the elements of the input PCollection will be in the resulting PCollection's List, albeit in sorted order.

        All the elements of the result List must fit into the memory of a single machine.

        Example of use:

         
         PCollection<Integer> values = ...;
         PCollection<List<Integer>> smallest10Values = values.apply(Top.smallest(10));
          

        By default, the Coder of the output PCollection is a ListCoder of the Coder of the elements of the input PCollection.

        If the input PCollection is windowed into GlobalWindows, an empty List<T> in the GlobalWindow will be output if the input PCollection is empty. To use this with inputs with other windowing, either withoutDefaults or asSingletonView must be called.

        See also largest(int).

        See also of(int, ComparatorT), which sorts using a user-specified Comparator function.

        See also perKey(int, ComparatorT), smallestPerKey(int), and largestPerKey(int), which take a PCollection of KVs and return the top values associated with each key.

      • largest

        public static <T extends Comparable<T>> Combine.Globally<T,List<T>> largest(int count)
        Returns a PTransform that takes an input PCollection<T> and returns a PCollection<List<T>> with a single element containing the largest count elements of the input PCollection<T>, in decreasing order, sorted according to their natural order.

        If count < the number of elements in the input PCollection, then all the elements of the input PCollection will be in the resulting PCollection's List, albeit in sorted order.

        All the elements of the result's List must fit into the memory of a single machine.

        Example of use:

         
         PCollection<Integer> values = ...;
         PCollection<List<Integer>> largest10Values = values.apply(Top.largest(10));
          

        By default, the Coder of the output PCollection is a ListCoder of the Coder of the elements of the input PCollection.

        If the input PCollection is windowed into GlobalWindows, an empty List<T> in the GlobalWindow will be output if the input PCollection is empty. To use this with inputs with other windowing, either withoutDefaults or asSingletonView must be called.

        See also smallest(int).

        See also of(int, ComparatorT), which sorts using a user-specified Comparator function.

        See also perKey(int, ComparatorT), smallestPerKey(int), and largestPerKey(int), which take a PCollection of KVs and return the top values associated with each key.

      • perKey

        public static <K,V,ComparatorT extends Comparator<V> & SerializablePTransform<PCollection<KV<K,V>>,PCollection<KV<K,List<V>>>> perKey(int count,
                                                                                                                                                ComparatorT compareFn)
        Returns a PTransform that takes an input PCollection<KV<K, V>> and returns a PCollection<KV<K, List<V>>> that contains an output element mapping each distinct key in the input PCollection to the largest count values associated with that key in the input PCollection<KV<K, V>>, in decreasing order, sorted using the given Comparator<V>. The Comparator<V> must also be Serializable.

        If there are fewer than count values associated with a particular key, then all those values will be in the result mapping for that key, albeit in sorted order.

        All the values associated with a single key must fit into the memory of a single machine, but there can be many more KVs in the resulting PCollection than can fit into the memory of a single machine.

        Example of use:

         
         PCollection<KV<School, Student>> studentsBySchool = ...;
         PCollection<KV<School, List<Student>>> top10StudentsBySchool =
             studentsBySchool.apply(
                 Top.perKey(10, new CompareStudentsByAvgGrade()));
          

        By default, the Coder of the keys of the output PCollection is the same as that of the keys of the input PCollection, and the Coder of the values of the output PCollection is a ListCoder of the Coder of the values of the input PCollection.

        See also smallestPerKey(int) and largestPerKey(int), which sort Comparable<V> values using their natural ordering.

        See also of(int, ComparatorT), smallest(int), and largest(int), which take a PCollection and return the top elements.

      • smallestPerKey

        public static <K,V extends Comparable<V>> PTransform<PCollection<KV<K,V>>,PCollection<KV<K,List<V>>>> smallestPerKey(int count)
        Returns a PTransform that takes an input PCollection<KV<K, V>> and returns a PCollection<KV<K, List<V>>> that contains an output element mapping each distinct key in the input PCollection to the smallest count values associated with that key in the input PCollection<KV<K, V>>, in increasing order, sorted according to their natural order.

        If there are fewer than count values associated with a particular key, then all those values will be in the result mapping for that key, albeit in sorted order.

        All the values associated with a single key must fit into the memory of a single machine, but there can be many more KVs in the resulting PCollection than can fit into the memory of a single machine.

        Example of use:

         
         PCollection<KV<String, Integer>> keyedValues = ...;
         PCollection<KV<String, List<Integer>>> smallest10ValuesPerKey =
             keyedValues.apply(Top.smallestPerKey(10));
          

        By default, the Coder of the keys of the output PCollection is the same as that of the keys of the input PCollection, and the Coder of the values of the output PCollection is a ListCoder of the Coder of the values of the input PCollection.

        See also largestPerKey(int).

        See also perKey(int, ComparatorT), which sorts values using a user-specified Comparator function.

        See also of(int, ComparatorT), smallest(int), and largest(int), which take a PCollection and return the top elements.

      • largestPerKey

        public static <K,V extends Comparable<V>> Combine.PerKey<K,V,List<V>> largestPerKey(int count)
        Returns a PTransform that takes an input PCollection<KV<K, V>> and returns a PCollection<KV<K, List<V>>> that contains an output element mapping each distinct key in the input PCollection to the largest count values associated with that key in the input PCollection<KV<K, V>>, in decreasing order, sorted according to their natural order.

        If there are fewer than count values associated with a particular key, then all those values will be in the result mapping for that key, albeit in sorted order.

        All the values associated with a single key must fit into the memory of a single machine, but there can be many more KVs in the resulting PCollection than can fit into the memory of a single machine.

        Example of use:

         
         PCollection<KV<String, Integer>> keyedValues = ...;
         PCollection<KV<String, List<Integer>>> largest10ValuesPerKey =
             keyedValues.apply(Top.largestPerKey(10));
          

        By default, the Coder of the keys of the output PCollection is the same as that of the keys of the input PCollection, and the Coder of the values of the output PCollection is a ListCoder of the Coder of the values of the input PCollection.

        See also smallestPerKey(int).

        See also perKey(int, ComparatorT), which sorts values using a user-specified Comparator function.

        See also of(int, ComparatorT), smallest(int), and largest(int), which take a PCollection and return the top elements.


Was this page helpful? Let us know how we did:

Send feedback about...

Cloud Dataflow
Need help? Visit our support page.