Google Cloud Dataflow SDK for Java, version 1.9.1
com.google.cloud.dataflow.sdk.io
Class BoundedSource<T>
- java.lang.Object
-
- com.google.cloud.dataflow.sdk.io.Source<T>
-
- com.google.cloud.dataflow.sdk.io.BoundedSource<T>
-
- Type Parameters:
T
- Type of records read by the source.
- All Implemented Interfaces:
- HasDisplayData, Serializable
- Direct Known Subclasses:
- DatastoreIO.Source, OffsetBasedSource
public abstract class BoundedSource<T> extends Source<T>
ASource
that reads a finite amount of input and, because of that, supports some additional operations.The operations are:
- Splitting into bundles of given size:
splitIntoBundles(long, com.google.cloud.dataflow.sdk.options.PipelineOptions)
; - Size estimation:
getEstimatedSizeBytes(com.google.cloud.dataflow.sdk.options.PipelineOptions)
; - Telling whether or not this source produces key/value pairs in sorted order:
producesSortedKeys(com.google.cloud.dataflow.sdk.options.PipelineOptions)
; - The accompanying
reader
has additional functionality to enable runners to dynamically adapt based on runtime conditions.- Progress estimation (
BoundedSource.BoundedReader.getFractionConsumed()
) - Tracking of parallelism, to determine whether the current source can be split
(
BoundedSource.BoundedReader.getSplitPointsConsumed()
andBoundedSource.BoundedReader.getSplitPointsRemaining()
). - Dynamic splitting of the current source (
BoundedSource.BoundedReader.splitAtFraction(double)
).
- Progress estimation (
To use this class for supporting your custom input type, derive your class class from it, and override the abstract methods. For an example, see
DatastoreIO
.- See Also:
- Serialized Form
-
-
Nested Class Summary
Nested Classes Modifier and Type Class and Description static class
BoundedSource.BoundedReader<T>
AReader
that reads a bounded amount of input and supports some additional operations, such as progress estimation and dynamic work rebalancing.-
Nested classes/interfaces inherited from class com.google.cloud.dataflow.sdk.io.Source
Source.Reader<T>
-
-
Constructor Summary
Constructors Constructor and Description BoundedSource()
-
Method Summary
All Methods Instance Methods Abstract Methods Modifier and Type Method and Description abstract BoundedSource.BoundedReader<T>
createReader(PipelineOptions options)
Returns a newBoundedSource.BoundedReader
that reads from this source.abstract long
getEstimatedSizeBytes(PipelineOptions options)
An estimate of the total size (in bytes) of the data that would be read from this source.abstract boolean
producesSortedKeys(PipelineOptions options)
Whether this source is known to produce key/value pairs sorted by lexicographic order on the bytes of the encoded key.abstract List<? extends BoundedSource<T>>
splitIntoBundles(long desiredBundleSizeBytes, PipelineOptions options)
Splits the source into bundles of approximatelydesiredBundleSizeBytes
.-
Methods inherited from class com.google.cloud.dataflow.sdk.io.Source
getDefaultOutputCoder, populateDisplayData, validate
-
-
-
-
Method Detail
-
splitIntoBundles
public abstract List<? extends BoundedSource<T>> splitIntoBundles(long desiredBundleSizeBytes, PipelineOptions options) throws Exception
Splits the source into bundles of approximatelydesiredBundleSizeBytes
.- Throws:
Exception
-
getEstimatedSizeBytes
public abstract long getEstimatedSizeBytes(PipelineOptions options) throws Exception
An estimate of the total size (in bytes) of the data that would be read from this source. This estimate is in terms of external storage size, before any decompression or other processing done by the reader.- Throws:
Exception
-
producesSortedKeys
public abstract boolean producesSortedKeys(PipelineOptions options) throws Exception
Whether this source is known to produce key/value pairs sorted by lexicographic order on the bytes of the encoded key.- Throws:
Exception
-
createReader
public abstract BoundedSource.BoundedReader<T> createReader(PipelineOptions options) throws IOException
Returns a newBoundedSource.BoundedReader
that reads from this source.- Throws:
IOException
-
-