OffsetBasedSource (Google Cloud Dataflow SDK 1.9.1 API)

Google Cloud Dataflow SDK for Java, version 1.9.1

Class OffsetBasedSource<T>

    • Constructor Detail

      • OffsetBasedSource

        public OffsetBasedSource(long startOffset,
                                 long endOffset,
                                 long minBundleSize)
        Parameters:
        startOffset - starting offset (inclusive) of the source. Must be non-negative.
        endOffset - ending offset (exclusive) of the source. Use Long.MAX_VALUE to indicate that the entire source after startOffset should be read. Must be greater than startOffset.
        minBundleSize - minimum bundle size, in offset units, that should be used when splitting the source into sub-sources. This value may not be respected if the total range of the source is smaller than the specified minBundleSize. Must be non-negative.
    • Method Detail

      • getStartOffset

        public long getStartOffset()
        Returns the starting offset of the source.
      • getMinBundleSize

        public long getMinBundleSize()
        Returns the minimum bundle size that should be used when splitting the source into sub-sources. This value may not be respected if the total range of the source is smaller than the specified minBundleSize.
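
As a rough illustration of how a minimum bundle size can constrain splitting, the sketch below divides an offset range into consecutive sub-ranges. The class BundleSplitSketch, its split method, and the desiredBundleSize parameter are hypothetical names introduced here, not part of the SDK, and the SDK's own splitting logic may differ.

```java
import java.util.ArrayList;
import java.util.List;

final class BundleSplitSketch {
    /** Splits [start, end) into consecutive sub-ranges, each returned as {subStart, subEnd}. */
    static List<long[]> split(long start, long end, long desiredBundleSize, long minBundleSize) {
        // Never aim for a bundle smaller than minBundleSize (or smaller than one offset).
        long size = Math.max(1L, Math.max(desiredBundleSize, minBundleSize));
        List<long[]> ranges = new ArrayList<>();
        long cur = start;
        while (cur < end) {
            // Comparing the remaining length first avoids overflow when end is very large.
            long next = (end - cur <= size) ? end : cur + size;
            // Fold a tail shorter than minBundleSize into the current bundle.
            if (end - next < minBundleSize) {
                next = end;
            }
            ranges.add(new long[] {cur, next});
            cur = next;
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (long[] r : split(0L, 100L, 30L, 20L)) {
            System.out.println("[" + r[0] + ", " + r[1] + ")");
        }
    }
}
```

When the whole range is shorter than minBundleSize, this sketch returns the range as a single bundle, which is the case where the minimum cannot be respected.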
      • getEstimatedSizeBytes

        public long getEstimatedSizeBytes(PipelineOptions options)
                                   throws Exception
        Description copied from class: BoundedSource
        An estimate of the total size (in bytes) of the data that would be read from this source. This estimate is in terms of external storage size, before any decompression or other processing done by the reader.
        Specified by:
        getEstimatedSizeBytes in class BoundedSource<T>
      • validate

        public void validate()
        Description copied from class: Source
        Checks that this source is valid, before it can be used in a pipeline.

        Implementations are encouraged to use Preconditions to perform these checks.

        Specified by:
        validate in class Source<T>
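
The constraints listed for the constructor arguments suggest what such a validity check might cover. The following is a minimal sketch under that assumption, not the SDK's implementation. OffsetValidationSketch is a hypothetical class, and its private checkArgument helper stands in for a Preconditions-style check so that the example runs without extra dependencies.

```java
final class OffsetValidationSketch {
    /** Throws IllegalArgumentException if the arguments violate the documented constraints. */
    static void validate(long startOffset, long endOffset, long minBundleSize) {
        checkArgument(startOffset >= 0,
            "startOffset must be non-negative, got " + startOffset);
        checkArgument(endOffset > startOffset,
            "endOffset (" + endOffset + ") must be greater than startOffset (" + startOffset + ")");
        checkArgument(minBundleSize >= 0,
            "minBundleSize must be non-negative, got " + minBundleSize);
    }

    // Hypothetical stand-in for a Preconditions-style argument check.
    private static void checkArgument(boolean condition, String message) {
        if (!condition) {
            throw new IllegalArgumentException(message);
        }
    }

    public static void main(String[] args) {
        validate(0L, Long.MAX_VALUE, 1024L); // satisfies all three constraints
        try {
            validate(10L, 5L, 0L); // endOffset is not greater than startOffset
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```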
      • getMaxEndOffset

        public abstract long getMaxEndOffset(PipelineOptions options)
                                      throws Exception
        Returns the actual ending offset of the current source. The value returned by this function will be used to clip the end of the range [startOffset, endOffset) such that the range used is [startOffset, min(endOffset, maxEndOffset)).

        As an example in which OffsetBasedSource is used to implement a file source, suppose that this source was constructed with an endOffset of Long.MAX_VALUE to indicate that a file should be read to the end. Then this function should determine the actual, exact size of the file in bytes and return it.
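
The clipping rule described above can be sketched in a few lines. RangeClipSketch and effectiveEnd are hypothetical names used only for illustration, and the 2048-byte file size is an assumed value.

```java
final class RangeClipSketch {
    /** Returns the exclusive end of the range actually used: min(endOffset, maxEndOffset). */
    static long effectiveEnd(long endOffset, long maxEndOffset) {
        return Math.min(endOffset, maxEndOffset);
    }

    public static void main(String[] args) {
        long fileSizeBytes = 2048L; // assumed size of the underlying file
        // A source constructed with Long.MAX_VALUE is clipped to the file size.
        System.out.println(effectiveEnd(Long.MAX_VALUE, fileSizeBytes)); // prints 2048
        // A source whose endOffset lies inside the file is left unchanged.
        System.out.println(effectiveEnd(512L, fileSizeBytes)); // prints 512
    }
}
```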

      • createSourceForSubrange

        public abstract OffsetBasedSource<T> createSourceForSubrange(long start,
                                                                     long end)
        Returns an OffsetBasedSource for a subrange of the current source. The subrange [start, end) must be within the range [startOffset, endOffset) of the current source, i.e. startOffset <= start < end <= endOffset.
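
The containment condition startOffset <= start < end <= endOffset can be checked directly. SubrangeCheckSketch and isValidSubrange are hypothetical names for this sketch; the SDK does not necessarily expose such a helper.

```java
final class SubrangeCheckSketch {
    /** Returns true if [start, end) is a non-empty range inside [startOffset, endOffset). */
    static boolean isValidSubrange(long startOffset, long endOffset, long start, long end) {
        return startOffset <= start && start < end && end <= endOffset;
    }

    public static void main(String[] args) {
        System.out.println(isValidSubrange(0L, 100L, 10L, 50L));  // inside the parent range
        System.out.println(isValidSubrange(0L, 100L, 50L, 150L)); // end exceeds endOffset
        System.out.println(isValidSubrange(0L, 100L, 20L, 20L));  // empty subrange
    }
}
```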
      • allowsDynamicSplitting

        public boolean allowsDynamicSplitting()
        Whether this source should allow dynamic splitting of the offset ranges.

        True by default. Override this to return false if the source cannot support dynamic splitting correctly. If this returns false, OffsetBasedSource.OffsetBasedReader.splitAtFraction(double) will refuse all split requests.
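
To illustrate how such a flag can gate split requests, the sketch below refuses every request when dynamic splitting is disallowed. DynamicSplitSketch and its method are hypothetical and deliberately simplified. Returning null to signal a refused split is an assumption of this sketch, and a real reader would also need to take its current read position into account.

```java
final class DynamicSplitSketch {
    /**
     * Returns the residual range {splitOffset, end} that would be handed off,
     * or null if the split request is refused.
     */
    static long[] splitAtFraction(long start, long end, double fraction,
                                  boolean allowsDynamicSplitting) {
        if (!allowsDynamicSplitting) {
            return null; // the source opted out, so every request is refused
        }
        long splitOffset = start + (long) (fraction * (end - start));
        if (splitOffset <= start || splitOffset >= end) {
            return null; // splitting here would leave one side empty
        }
        return new long[] {splitOffset, end};
    }

    public static void main(String[] args) {
        long[] residual = splitAtFraction(0L, 100L, 0.5, true);
        System.out.println("[" + residual[0] + ", " + residual[1] + ")"); // prints [50, 100)
        System.out.println(splitAtFraction(0L, 100L, 0.5, false) == null); // prints true
    }
}
```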

      • populateDisplayData

        public void populateDisplayData(DisplayData.Builder builder)
        Description copied from class: Source
        Register display data for the given transform or component.

        populateDisplayData(DisplayData.Builder) is invoked by Pipeline runners to collect display data via DisplayData.from(HasDisplayData). Implementations may call super.populateDisplayData(builder) in order to register display data in the current namespace, but should otherwise use subcomponent.populateDisplayData(builder) to use the namespace of the subcomponent.

        By default, does not register any display data. Implementors may override this method to provide their own display data.

        Specified by:
        populateDisplayData in interface HasDisplayData
        populateDisplayData in class Source<T>
        Parameters:
        builder - The builder to populate with display data.
        See Also: