FileBasedSource.FileBasedReader (Google Cloud Dataflow SDK 1.9.1 API)

Google Cloud Dataflow SDK for Java, version 1.9.1

Class FileBasedSource.FileBasedReader<T>

    • Method Detail

      • getCurrentSource

        public FileBasedSource<T> getCurrentSource()
        Description copied from class: BoundedSource.BoundedReader
        Returns a Source describing the same input that this Reader currently reads (including items already read).


        Reader subclasses can use this method for convenience to access unchanging properties of the source being read. Alternatively, they can cache these properties in the constructor.

        The framework will call this method in the course of dynamic work rebalancing, e.g. after a successful BoundedSource.BoundedReader.splitAtFraction(double) call.

        Mutability and thread safety

        Remember that Source objects must always be immutable. However, the return value of this function may be affected by dynamic work rebalancing, happening asynchronously via BoundedSource.BoundedReader.splitAtFraction(double), meaning it can return a different Source object. However, the returned object itself will still itself be immutable. Callers must take care not to rely on properties of the returned source that may be asynchronously changed as a result of this process (e.g. do not cache an end offset when reading a file).


        For convenience, subclasses should usually return the most concrete subclass of Source possible. In practice, the implementation of this method should nearly always be one of the following:
        • Source that inherits from a base class that already implements BoundedSource.BoundedReader.getCurrentSource(): delegate to base class. In this case, it is almost always an error for the subclass to maintain its own copy of the source.
             public FooReader(FooSource<T> source) {
             public FooSource<T> getCurrentSource() {
               return (FooSource<T>)super.getCurrentSource();
        • Source that does not support dynamic work rebalancing: return a private final variable.
             private final FooSource<T> source;
             public FooReader(FooSource<T> source) {
               this.source = source;
             public FooSource<T> getCurrentSource() {
               return source;
        • BoundedSource.BoundedReader that explicitly supports dynamic work rebalancing: maintain a variable pointing to an immutable source object, and protect it with synchronization.
             private FooSource<T> source;
             public FooReader(FooSource<T> source) {
               this.source = source;
             public synchronized FooSource<T> getCurrentSource() {
               return source;
             public synchronized FooSource<T> splitAtFraction(double fraction) {
               FooSource<T> primary = ...;
               FooSource<T> residual = ...;
               this.source = primary;
               return residual;

        getCurrentSource in class OffsetBasedSource.OffsetBasedReader<T>
      • startReading

        protected abstract void startReading(ReadableByteChannel channel)
                                      throws IOException
        Performs any initialization of the subclass of FileBasedReader that involves IO operations. Will only be invoked once and before that invocation the base class will seek the channel to the source's starting offset.

        Provided ReadableByteChannel is for the file represented by the source of this reader. Subclass may use the channel to build a higher level IO abstraction, e.g., a BufferedReader or an XML parser.

        If the corresponding source is for a subrange of a file, channel is guaranteed to be an instance of the type SeekableByteChannel.

        After this method is invoked the base class will not be reading data from the channel or adjusting the position of the channel. But the base class is responsible for properly closing the channel.

        channel - a byte channel representing the file backing the reader.