Source (Google Cloud Dataflow SDK 1.9.1 API)

Google Cloud Dataflow SDK for Java, version 1.9.1

com.google.cloud.dataflow.sdk.io

Class Source<T>

  • Type Parameters:
    T - Type of elements read by the source.
    All Implemented Interfaces:
    HasDisplayData, Serializable
    Direct Known Subclasses:
    BoundedSource, UnboundedSource


    @Experimental(value=SOURCE_SINK)
    public abstract class Source<T>
    extends Object
    implements Serializable, HasDisplayData
    Base class for defining input formats and creating a Source for reading the input.

    This class is not intended to be subclassed directly. Instead, to define a bounded source (a source which produces a finite amount of input), subclass BoundedSource; to define an unbounded source, subclass UnboundedSource.

    A Source passed to a Read transform must be Serializable. This allows the Source instance created in this "main program" to be sent (in serialized form) to remote worker machines and reconstituted for each batch of elements of the input PCollection being processed or for each source splitting operation. A Source can have instance variable state, and non-transient instance variable state will be serialized in the main program and then deserialized on remote worker machines.

    Source classes MUST be effectively immutable. The only acceptable use of mutable fields is to cache the results of expensive operations, and such fields MUST be marked transient.

    Source objects should override Object.toString(), as it will be used in important error and debugging messages.

    See Also:
    Serialized Form
    • Constructor Detail

      • Source

        public Source()
    • Method Detail

      • validate

        public abstract void validate()
        Checks that this source is valid, before it can be used in a pipeline.

        It is recommended to use Preconditions for implementing this method.

      • getDefaultOutputCoder

        public abstract Coder<T> getDefaultOutputCoder()
        Returns the default Coder to use for the data read from this source.
      • populateDisplayData

        public void populateDisplayData(DisplayData.Builder builder)
        Register display data for the given transform or component.

        populateDisplayData(DisplayData.Builder) is invoked by Pipeline runners to collect display data via DisplayData.from(HasDisplayData). Implementations may call super.populateDisplayData(builder) in order to register display data in the current namespace, but should otherwise use subcomponent.populateDisplayData(builder) to use the namespace of the subcomponent.

        By default, does not register any display data. Implementors may override this method to provide their own display data.

        Specified by:
        populateDisplayData in interface HasDisplayData
        Parameters:
        builder - The builder to populate with display data.
        See Also:
        HasDisplayData


Was this page helpful? Let us know how we did:

Send feedback about...

Cloud Dataflow