ApproximateQuantiles (Google Cloud Dataflow SDK 1.9.1 API)

Google Cloud Dataflow SDK for Java, version 1.9.1

com.google.cloud.dataflow.sdk.transforms

Class ApproximateQuantiles

  • java.lang.Object
    • com.google.cloud.dataflow.sdk.transforms.ApproximateQuantiles


  • public class ApproximateQuantiles
    extends Object
    PTransforms for getting an idea of a PCollection's data distribution using approximate N-tiles (e.g. quartiles, percentiles, etc.), either globally or per-key.
    • Method Detail

      • globally

        public static <T,ComparatorT extends Comparator<T> & SerializablePTransform<PCollection<T>,PCollection<List<T>>> globally(int numQuantiles,
                                                                                                                                    ComparatorT compareFn)
        Returns a PTransform that takes a PCollection<T> and returns a PCollection<List<T>> whose single value is a List of the approximate N-tiles of the elements of the input PCollection. This gives an idea of the distribution of the input elements.

        The computed List is of size numQuantiles, and contains the input elements' minimum value, numQuantiles-2 intermediate values, and maximum value, in sorted order, using the given Comparator to order values. To compute traditional N-tiles, one should use ApproximateQuantiles.globally(N+1, compareFn); for example, quartiles (N = 4) require numQuantiles = 5.

        If there are fewer input elements than numQuantiles, then the result List will contain all the input elements, in sorted order.
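
        The shape of the result List can be illustrated without the SDK. The following is a minimal sketch in plain Java that computes exact quantiles in memory; it is not the SDK's approximation algorithm, and the class name QuantileShapeSketch, the method exactQuantiles, the evenly spaced index rule, and the numQuantiles >= 2 check are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Exact, in-memory stand-in for the output shape described above (not the SDK's algorithm). */
final class QuantileShapeSketch {

  /** Returns numQuantiles values: the minimum, evenly spaced intermediates, and the maximum. */
  static <T> List<T> exactQuantiles(
      List<T> input, int numQuantiles, Comparator<? super T> compareFn) {
    if (numQuantiles < 2) {
      // Assumption of this sketch: a minimum and a maximum are always requested.
      throw new IllegalArgumentException("numQuantiles must be at least 2");
    }
    List<T> sorted = new ArrayList<>(input);
    Collections.sort(sorted, compareFn);
    // Fewer inputs than requested quantiles: return every element, sorted.
    if (sorted.size() < numQuantiles) {
      return sorted;
    }
    List<T> result = new ArrayList<>(numQuantiles);
    for (int i = 0; i < numQuantiles; i++) {
      // Assumed index rule: spread positions evenly from the first to the last element.
      long index = Math.round((double) i * (sorted.size() - 1) / (numQuantiles - 1));
      result.add(sorted.get((int) index));
    }
    return result;
  }

  public static void main(String[] args) {
    List<Integer> values = new ArrayList<>();
    for (int i = 100; i >= 0; i--) {
      values.add(i); // 101 values, deliberately unsorted
    }
    // Quartiles (N = 4) need numQuantiles = N + 1 = 5 boundary values.
    System.out.println(exactQuantiles(values, 5, Comparator.<Integer>naturalOrder()));
    // prints [0, 25, 50, 75, 100]

    // Only three inputs for five requested quantiles: all inputs come back, sorted.
    System.out.println(exactQuantiles(Arrays.asList(3, 1, 2), 5, Comparator.<Integer>naturalOrder()));
    // prints [1, 2, 3]
  }
}
```

        The real transform returns approximate values, so its intermediate entries may differ from this exact computation; only the size and ordering of the List are meant to carry over.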

        The argument Comparator must be Serializable.
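
        One way to satisfy the Comparator<T> & Serializable bound is a named class that implements both interfaces. The sketch below uses only the JDK; the comparator ByLengthThenAlpha and the helper roundTrip are hypothetical names, and the round trip through Java serialization is an assumption about why the bound exists, included only to show that such a comparator survives being serialized.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Comparator;

final class SerializableComparatorSketch {

  /** Hypothetical comparator: orders strings by length, then lexicographically. */
  static final class ByLengthThenAlpha implements Comparator<String>, Serializable {
    private static final long serialVersionUID = 1L;

    @Override
    public int compare(String a, String b) {
      int byLength = Integer.compare(a.length(), b.length());
      return byLength != 0 ? byLength : a.compareTo(b);
    }
  }

  /** Serializes a value to bytes and reads it back, returning the deserialized copy. */
  static <T extends Serializable> T roundTrip(T value) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(value);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        @SuppressWarnings("unchecked")
        T copy = (T) in.readObject();
        return copy;
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("value is not serializable", e);
    }
  }

  public static void main(String[] args) {
    Comparator<String> copy = roundTrip(new ByLengthThenAlpha());
    // "pear" (4 characters) sorts before "banana" (6 characters).
    System.out.println(copy.compare("pear", "banana") < 0); // prints true
  }
}
```

        A comparator written as an anonymous inner class captures a reference to its enclosing instance, so a static nested or top-level class like this one is usually the safer choice when the enclosing class is not itself Serializable.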

        Example of use:

         
         PCollection<String> pc = ...;
         PCollection<List<String>> quantiles =
             pc.apply(ApproximateQuantiles.globally(11, stringCompareFn));
          
        Type Parameters:
        T - the type of the elements in the input PCollection
        Parameters:
        numQuantiles - the number of elements in the resulting quantile values List
        compareFn - the function to use to order the elements
      • globally

        public static <T extends Comparable<T>> PTransform<PCollection<T>,PCollection<List<T>>> globally(int numQuantiles)
        Like globally(int, Comparator), but sorts using the elements' natural ordering.
        Type Parameters:
        T - the type of the elements in the input PCollection
        Parameters:
        numQuantiles - the number of elements in the resulting quantile values List
      • perKey

        public static <K,V,ComparatorT extends Comparator<V> & SerializablePTransform<PCollection<KV<K,V>>,PCollection<KV<K,List<V>>>> perKey(int numQuantiles,
                                                                                                                                                ComparatorT compareFn)
        Returns a PTransform that takes a PCollection<KV<K, V>> and returns a PCollection<KV<K, List<V>>> that contains an output element mapping each distinct key in the input PCollection to a List of the approximate N-tiles of the values associated with that key in the input PCollection. This gives an idea of the distribution of the input values for each key.

        Each of the computed Lists is of size numQuantiles, and contains the input values' minimum value, numQuantiles-2 intermediate values, and maximum value, in sorted order, using the given Comparator to order values. To compute traditional N-tiles, one should use ApproximateQuantiles.perKey(N+1, compareFn).

        If a key has fewer than numQuantiles values associated with it, then that key's output List will contain all the key's input values, in sorted order.

        The argument Comparator must be Serializable.
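
        The per-key behavior, including the case of a key with fewer than numQuantiles values, can be sketched in plain Java as follows. This is an exact in-memory computation over Map.Entry pairs rather than the SDK's KV type or its approximation algorithm; the class PerKeyShapeSketch, the method exactQuantilesPerKey, and the evenly spaced index rule are assumptions made for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Exact, in-memory stand-in for the per-key output shape (not the SDK's algorithm). */
final class PerKeyShapeSketch {

  /** Groups values by key, then summarizes each key's values independently. */
  static <K, V> Map<K, List<V>> exactQuantilesPerKey(
      List<Map.Entry<K, V>> input, int numQuantiles, Comparator<? super V> compareFn) {
    if (numQuantiles < 2) {
      throw new IllegalArgumentException("numQuantiles must be at least 2");
    }
    // Step 1: collect the values of each distinct key, in first-seen key order.
    Map<K, List<V>> grouped = new LinkedHashMap<>();
    for (Map.Entry<K, V> kv : input) {
      grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
    }
    // Step 2: sort each group and pick numQuantiles evenly spaced values from it.
    Map<K, List<V>> result = new LinkedHashMap<>();
    for (Map.Entry<K, List<V>> group : grouped.entrySet()) {
      List<V> sorted = group.getValue();
      Collections.sort(sorted, compareFn);
      List<V> summary = sorted; // a short group is returned whole, in sorted order
      if (sorted.size() >= numQuantiles) {
        summary = new ArrayList<>(numQuantiles);
        for (int i = 0; i < numQuantiles; i++) {
          long index = Math.round((double) i * (sorted.size() - 1) / (numQuantiles - 1));
          summary.add(sorted.get((int) index));
        }
      }
      result.put(group.getKey(), summary);
    }
    return result;
  }

  public static void main(String[] args) {
    List<Map.Entry<Integer, String>> input = new ArrayList<>();
    for (String s : new String[] {"g", "a", "c", "e", "b", "f", "d"}) {
      input.add(new SimpleEntry<>(1, s)); // key 1 has seven values
    }
    input.add(new SimpleEntry<>(2, "z")); // key 2 has only two values
    input.add(new SimpleEntry<>(2, "y"));
    System.out.println(exactQuantilesPerKey(input, 3, Comparator.<String>naturalOrder()));
    // prints {1=[a, d, g], 2=[y, z]}
  }
}
```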

        Example of use:

         
         PCollection<KV<Integer, String>> pc = ...;
         PCollection<KV<Integer, List<String>>> quantilesPerKey =
             pc.apply(ApproximateQuantiles.perKey(11, stringCompareFn));
          

        See Combine.PerKey for how this affects timestamps and windowing.

        Type Parameters:
        K - the type of the keys in the input and output PCollections
        V - the type of the values in the input PCollection
        Parameters:
        numQuantiles - the number of elements in the resulting quantile values List
        compareFn - the function to use to order the elements
      • perKey

        public static <K,V extends Comparable<V>> PTransform<PCollection<KV<K,V>>,PCollection<KV<K,List<V>>>> perKey(int numQuantiles)
        Like perKey(int, Comparator), but sorts values using their natural ordering.
        Type Parameters:
        K - the type of the keys in the input and output PCollections
        V - the type of the values in the input PCollection
        Parameters:
        numQuantiles - the number of elements in the resulting quantile values List

