com.google.cloud.bigtable.beam
Class CloudBigtableIO.SourceWithKeys
- java.lang.Object
-
- org.apache.beam.sdk.io.Source<T>
-
- org.apache.beam.sdk.io.BoundedSource<Result>
-
- com.google.cloud.bigtable.beam.CloudBigtableIO.SourceWithKeys
-
- All Implemented Interfaces:
- Serializable, HasDisplayData
- Enclosing class:
- CloudBigtableIO
protected static class CloudBigtableIO.SourceWithKeys extends BoundedSource<Result>
ABoundedSource
for a Cloud BigtableTable
with a start/stop key range, along with a potential filter via aScan
.- See Also:
- Serialized Form
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.beam.sdk.io.BoundedSource
BoundedSource.BoundedReader<T>
-
Nested classes/interfaces inherited from class org.apache.beam.sdk.io.Source
Source.Reader<T>
-
-
Field Summary
Fields Modifier and Type Field and Description protected static long
SIZED_BASED_MAX_SPLIT_COUNT
protected static org.slf4j.Logger
SOURCE_LOG
-
Constructor Summary
Constructors Modifier Constructor and Description protected
SourceWithKeys(CloudBigtableScanConfiguration configuration, long estimatedSize)
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method and Description BoundedSource.BoundedReader<Result>
createReader(PipelineOptions options)
Creates a reader that will scan the entire table based on theScan
in the configuration.protected CloudBigtableScanConfiguration
getConfiguration()
long
getEstimatedSize()
long
getEstimatedSizeBytes(PipelineOptions options)
Gets an estimate of the size of the source.Coder<Result>
getOutputCoder()
List<com.google.bigtable.repackaged.com.google.cloud.bigtable.data.v2.models.KeyOffset>
getSampleRowKeys()
Performs a call to get sample row keys fromCloudBigtableServiceImpl.getSampleRowKeys(CloudBigtableTableConfiguration)
if they are not yet cached.protected List<CloudBigtableIO.SourceWithKeys>
getSplits(long desiredBundleSizeBytes)
protected static boolean
isWithinRange(byte[] scanStartKey, byte[] scanEndKey, byte[] startKey, byte[] endKey)
Checks if the range of the region is within the range of the scan.void
populateDisplayData(DisplayData.Builder builder)
protected List<CloudBigtableIO.SourceWithKeys>
split(long regionSize, long desiredBundleSizeBytes, byte[] startKey, byte[] stopKey)
Splits the region based on the start and stop key.List<? extends BoundedSource<Result>>
split(long desiredBundleSizeBytes, PipelineOptions options)
Splits the bundle based on the assumption that the data is distributed evenly between startKey and stopKey.String
toString()
void
validate()
Validates the existence of the table in the configuration.-
Methods inherited from class org.apache.beam.sdk.io.Source
getDefaultOutputCoder
-
-
-
-
Field Detail
-
SOURCE_LOG
protected static final org.slf4j.Logger SOURCE_LOG
-
SIZED_BASED_MAX_SPLIT_COUNT
protected static final long SIZED_BASED_MAX_SPLIT_COUNT
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
SourceWithKeys
protected SourceWithKeys(CloudBigtableScanConfiguration configuration, long estimatedSize)
-
-
Method Detail
-
getEstimatedSizeBytes
public long getEstimatedSizeBytes(PipelineOptions options)
Gets an estimate of the size of the source.NOTE: This value is a guesstimate. It could be significantly off, especially if there is a
Scan
selected in the configuration. It will also be off if the start and stop keys are calculated viaCloudBigtableIO.Source.split(long, PipelineOptions)
.- Parameters:
options
- The pipeline options.- Returns:
- The estimated size of the source, in bytes.
-
getEstimatedSize
public long getEstimatedSize()
-
split
public List<? extends BoundedSource<Result>> split(long desiredBundleSizeBytes, PipelineOptions options) throws Exception
Splits the bundle based on the assumption that the data is distributed evenly between startKey and stopKey. That assumption may not be correct for any specific start/stop key combination.This method is called internally by Beam. Do not call it directly.
- Specified by:
split
in classBoundedSource<Result>
- Parameters:
desiredBundleSizeBytes
- The desired size for each bundle, in bytes.options
- The pipeline options.- Returns:
- A list of sources split into groups.
- Throws:
Exception
-
getSplits
protected List<CloudBigtableIO.SourceWithKeys> getSplits(long desiredBundleSizeBytes) throws Exception
- Throws:
Exception
-
isWithinRange
protected static boolean isWithinRange(byte[] scanStartKey, byte[] scanEndKey, byte[] startKey, byte[] endKey)
Checks if the range of the region is within the range of the scan.
-
getSampleRowKeys
public List<com.google.bigtable.repackaged.com.google.cloud.bigtable.data.v2.models.KeyOffset> getSampleRowKeys() throws IOException
Performs a call to get sample row keys fromCloudBigtableServiceImpl.getSampleRowKeys(CloudBigtableTableConfiguration)
if they are not yet cached. The sample row keys give information about tablet key boundaries and estimated sizes.- Throws:
IOException
-
validate
public void validate()
Validates the existence of the table in the configuration.
-
split
protected List<CloudBigtableIO.SourceWithKeys> split(long regionSize, long desiredBundleSizeBytes, byte[] startKey, byte[] stopKey) throws IOException
Splits the region based on the start and stop key. UsesBytes.split(byte[], byte[], int)
under the covers.- Throws:
IOException
-
createReader
public BoundedSource.BoundedReader<Result> createReader(PipelineOptions options)
Creates a reader that will scan the entire table based on theScan
in the configuration.- Specified by:
createReader
in classBoundedSource<Result>
- Returns:
- A reader for the table.
-
getConfiguration
protected CloudBigtableScanConfiguration getConfiguration()
- Returns:
- the configuration
-
populateDisplayData
public void populateDisplayData(DisplayData.Builder builder)
- Specified by:
populateDisplayData
in interfaceHasDisplayData
- Overrides:
populateDisplayData
in classSource<Result>
-
-