Source.Reader (Google Cloud Dataflow SDK 1.9.1 API)

Google Cloud Dataflow SDK for Java, version 1.9.1

com.google.cloud.dataflow.sdk.io

Class Source.Reader<T>

  • All Implemented Interfaces:
    AutoCloseable
    Direct Known Subclasses:
    BoundedSource.BoundedReader, UnboundedSource.UnboundedReader
    Enclosing class:
    Source<T>


    public abstract static class Source.Reader<T>
    extends Object
    implements AutoCloseable
    The interface that readers of custom input sources must implement.

    This interface is deliberately distinct from Iterator because the current model tends to be easier to program and more efficient in practice for iterating over sources such as files, databases etc. (rather than pure collections).

    Reading data from the Source.Reader must obey the following access pattern:

    • One call to start()
      • If start() returned true, any number of calls to getCurrent* methods
    • Repeatedly, a call to advance(). This may be called regardless of what the previous start()/advance() returned.
      • If advance() returned true, any number of calls to getCurrent* methods

    For example, if the reader is reading a fixed set of data:

       try {
         for (boolean available = reader.start(); available; available = reader.advance()) {
           T item = reader.getCurrent();
           Instant timestamp = reader.getCurrentTimestamp();
           ...
         }
       } finally {
         reader.close();
       }
     

    If the set of data being read is continually growing:

       try {
         boolean available = reader.start();
         while (true) {
           if (available) {
             T item = reader.getCurrent();
             Instant timestamp = reader.getCurrentTimestamp();
             ...
             resetExponentialBackoff();
           } else {
             exponentialBackoff();
           }
           available = reader.advance();
         }
       } finally {
         reader.close();
       }
     

    Note: this interface is a work-in-progress and may change.

    All Reader functions except getCurrentSource() do not need to be thread-safe; they may only be accessed by a single thread at once. However, getCurrentSource() needs to be thread-safe, and other functions should assume that its returned value can change asynchronously.

    • Constructor Detail

      • Reader

        public Reader()
    • Method Detail

      • start

        public abstract boolean start()
                               throws IOException
        Initializes the reader and advances the reader to the first record.

        This method should be called exactly once. The invocation should occur prior to calling advance() or getCurrent(). This method may perform expensive operations that are needed to initialize the reader.

        Returns:
        true if a record was read, false if there is no more input available.
        Throws:
        IOException
      • advance

        public abstract boolean advance()
                                 throws IOException
        Advances the reader to the next valid record.

        It is an error to call this without having called start() first.

        Returns:
        true if a record was read, false if there is no more input available.
        Throws:
        IOException
      • getCurrentTimestamp

        public abstract Instant getCurrentTimestamp()
                                             throws NoSuchElementException
        Returns the timestamp associated with the current data item.

        If the source does not support timestamps, this should return BoundedWindow.TIMESTAMP_MIN_VALUE.

        Multiple calls to this method without an intervening call to advance() should return the same result.

        Throws:
        NoSuchElementException - if the reader is at the beginning of the input and start() or advance() wasn't called, or if the last start() or advance() returned false.
      • close

        public abstract void close()
                            throws IOException
        Closes the reader. The reader cannot be used after this method is called.
        Specified by:
        close in interface AutoCloseable
        Throws:
        IOException
      • getCurrentSource

        public abstract Source<T> getCurrentSource()
        Returns a Source describing the same input that this Reader currently reads (including items already read).

        Usually, an implementation will simply return the immutable Source object from which the current Source.Reader was constructed, or delegate to the base class. However, when using or implementing this method on a BoundedSource.BoundedReader, special considerations apply, see documentation for BoundedSource.BoundedReader.getCurrentSource().


Was this page helpful? Let us know how we did:

Send feedback about...

Cloud Dataflow