RemoveDuplicates (Google Cloud Dataflow SDK 1.9.1 API)

Google Cloud Dataflow SDK for Java, version 1.9.1

com.google.cloud.dataflow.sdk.transforms

Class RemoveDuplicates<T>

  • Type Parameters:
    T - the type of the elements of the input and output PCollections
    All Implemented Interfaces:
    HasDisplayData, Serializable


    public class RemoveDuplicates<T>
    extends PTransform<PCollection<T>,PCollection<T>>
    RemoveDuplicates<T> takes a PCollection<T> and returns a PCollection<T> containing the elements of the input with duplicates removed, so that each element is unique within each window.

    Two values of type T are compared for equality not with Java's Object.equals(java.lang.Object), but by first encoding each element using the PCollection's Coder and then comparing the encoded bytes. This admits efficient parallel evaluation.
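The comparison rule above can be modelled in plain Java, without the SDK. The following is a minimal sketch, not the transform's implementation: the class `EncodedDedupSketch`, the method `dedupByEncoding`, and the use of a `Function<T, byte[]>` in place of a real Coder are all hypothetical names introduced for illustration. It shows that two objects that are not equal under `Object.equals` can still be treated as duplicates when their encoded bytes match.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical illustration only; this class is not part of the Dataflow SDK.
final class EncodedDedupSketch {

    /**
     * Keeps one element per distinct encoded form. The encoder stands in for
     * the PCollection's Coder (an assumption made for this sketch).
     */
    static <T> List<T> dedupByEncoding(List<T> input, Function<T, byte[]> encoder) {
        Set<ByteBuffer> seen = new HashSet<>();
        List<T> out = new ArrayList<>();
        for (T element : input) {
            // ByteBuffer compares by content, whereas byte[] compares by identity.
            if (seen.add(ByteBuffer.wrap(encoder.apply(element)))) {
                out.add(element);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // StringBuilder does not override equals, so these two objects are not
        // equal under Object.equals, yet their encoded bytes are identical.
        List<StringBuilder> builders =
            Arrays.asList(new StringBuilder("x"), new StringBuilder("x"));
        System.out.println(builders.get(0).equals(builders.get(1))); // prints "false"

        List<StringBuilder> unique = dedupByEncoding(
            builders, sb -> sb.toString().getBytes(StandardCharsets.UTF_8));
        System.out.println(unique.size()); // prints "1"
    }
}
```

The sketch happens to keep the first occurrence in input order because it runs sequentially over a list; that is a property of the sketch, not something this page states about the transform.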

    Optionally, a function may be provided that maps each element to a representative value. In this case, two elements will be considered duplicates if they have equal representative values, with equality being determined as above.
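To make the representative-value rule concrete, here is a small plain-Java sketch that does not use the SDK. The class `RepresentativeDedupSketch` and the method `dedupByRepresentative` are hypothetical, and a `java.util.function.Function` stands in for the SDK's SerializableFunction. As a simplification, the sketch compares representative values with `equals`/`hashCode`, whereas the transform compares their encoded bytes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

// Hypothetical illustration only; this class is not part of the Dataflow SDK.
final class RepresentativeDedupSketch {

    /**
     * Keeps one element per distinct representative value. Elements whose
     * representative values are equal are treated as duplicates of each other.
     */
    static <T, IdT> List<T> dedupByRepresentative(List<T> input, Function<T, IdT> fn) {
        Set<IdT> seen = new HashSet<>();
        List<T> out = new ArrayList<>();
        for (T element : input) {
            if (seen.add(fn.apply(element))) {
                out.add(element);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Apple", "apple", "Pear", "APPLE");
        // The lower-cased word is the representative value, so words that
        // differ only in case count as duplicates.
        List<String> unique =
            dedupByRepresentative(words, w -> w.toLowerCase(Locale.ROOT));
        System.out.println(unique); // prints "[Apple, Pear]"
    }
}
```

Which of several duplicate elements survives is decided here by input order; this page does not say which one the transform emits, so that choice should be read as an artifact of the sketch.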

    By default, the Coder of the output PCollection is the same as the Coder of the input PCollection.

    Each output element is in the same window as its corresponding input element, and has the timestamp of the end of that window. The output PCollection has the same WindowFn as the input.
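The per-window behaviour described above can be sketched in plain Java under some loudly stated assumptions: fixed-size windows aligned at zero, non-negative millisecond timestamps, and "end of the window" taken literally as window start plus window size. The class `WindowedDedupSketch`, its nested `Timestamped` type, and the method `dedupPerWindow` are hypothetical; the SDK's actual WindowFn machinery and exact boundary convention are not modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; this class is not part of the Dataflow SDK.
final class WindowedDedupSketch {

    /** Hypothetical element type: a value plus a timestamp in milliseconds. */
    static final class Timestamped {
        final String value;
        final long timestampMillis;

        Timestamped(String value, long timestampMillis) {
            this.value = value;
            this.timestampMillis = timestampMillis;
        }

        @Override
        public String toString() {
            return value + "@" + timestampMillis;
        }
    }

    /** Removes duplicates separately within each fixed window. */
    static List<Timestamped> dedupPerWindow(List<Timestamped> input, long windowSizeMillis) {
        // Maps each window start to the distinct values seen in that window.
        Map<Long, Set<String>> perWindow = new LinkedHashMap<>();
        for (Timestamped e : input) {
            long windowStart = e.timestampMillis - (e.timestampMillis % windowSizeMillis);
            perWindow.computeIfAbsent(windowStart, k -> new LinkedHashSet<>()).add(e.value);
        }
        List<Timestamped> out = new ArrayList<>();
        for (Map.Entry<Long, Set<String>> window : perWindow.entrySet()) {
            // Each output element is stamped with the end of its window.
            long windowEnd = window.getKey() + windowSizeMillis;
            for (String value : window.getValue()) {
                out.add(new Timestamped(value, windowEnd));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Timestamped> input = Arrays.asList(
            new Timestamped("a", 1), new Timestamped("a", 5),
            new Timestamped("b", 7), new Timestamped("a", 12));
        // "a" appears twice in window [0, 10) and once in window [10, 20).
        System.out.println(dedupPerWindow(input, 10)); // prints "[a@10, b@10, a@20]"
    }
}
```

Note that "a" appears in the output twice: uniqueness holds within each window, not across the whole collection.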

    The transform does not preserve any order the input PCollection might have had.

    Example of use:

     
     PCollection<String> words = ...;
     PCollection<String> uniqueWords =
         words.apply(RemoveDuplicates.<String>create());
      

    See Also:
    Serialized Form
    • Constructor Detail

      • RemoveDuplicates

        public RemoveDuplicates()
    • Method Detail

      • create

        public static <T> RemoveDuplicates<T> create()
        Returns a RemoveDuplicates<T> PTransform.
        Type Parameters:
        T - the type of the elements of the input and output PCollections
      • withRepresentativeValueFn

        public static <T,IdT> RemoveDuplicates.WithRepresentativeValues<T,IdT> withRepresentativeValueFn(SerializableFunction<T,IdT> fn)
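        Returns a RemoveDuplicates.WithRepresentativeValues<T, IdT> PTransform that treats two elements as duplicates when fn maps them to equal representative values.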
        Type Parameters:
        T - the type of the elements of the input and output PCollections
        IdT - the type of the representative value used to identify duplicates

