com.google.cloud.bigtable.beam
Class CloudBigtableIO.Source
- java.lang.Object
- org.apache.beam.sdk.io.Source<T>
- org.apache.beam.sdk.io.BoundedSource<Result>
- com.google.cloud.bigtable.beam.CloudBigtableIO.Source
- All Implemented Interfaces:
- Serializable, HasDisplayData
- Enclosing class:
- CloudBigtableIO
public static class CloudBigtableIO.Source extends BoundedSource<Result>
- See Also:
- Serialized Form
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.beam.sdk.io.BoundedSource
BoundedSource.BoundedReader<T>
-
Nested classes/interfaces inherited from class org.apache.beam.sdk.io.Source
Source.Reader<T>
-
-
Field Summary
Modifier and Type | Field and Description
protected static long | SIZED_BASED_MAX_SPLIT_COUNT
protected static org.slf4j.Logger | SOURCE_LOG
-
Method Summary
Modifier and Type | Method and Description
BoundedSource.BoundedReader<Result> | createReader(PipelineOptions options): Creates a reader that will scan the entire table based on the Scan in the configuration.
protected CloudBigtableScanConfiguration | getConfiguration()
long | getEstimatedSizeBytes(PipelineOptions options): Gets an estimated size based on data returned from CloudBigtableServiceImpl.getSampleRowKeys(CloudBigtableTableConfiguration).
Coder<Result> | getOutputCoder()
List<com.google.bigtable.repackaged.com.google.cloud.bigtable.data.v2.models.KeyOffset> | getSampleRowKeys(): Performs a call to get sample row keys from CloudBigtableServiceImpl.getSampleRowKeys(CloudBigtableTableConfiguration) if they are not yet cached.
protected List<CloudBigtableIO.SourceWithKeys> | getSplits(long desiredBundleSizeBytes)
protected static boolean | isWithinRange(byte[] scanStartKey, byte[] scanEndKey, byte[] startKey, byte[] endKey): Checks if the range of the region is within the range of the scan.
void | populateDisplayData(DisplayData.Builder builder)
protected List<CloudBigtableIO.SourceWithKeys> | split(long regionSize, long desiredBundleSizeBytes, byte[] startKey, byte[] stopKey): Splits the region based on the start and stop key.
List<? extends BoundedSource<Result>> | split(long desiredBundleSizeBytes, PipelineOptions options): Splits the table based on keys that belong to tablets, known as "regions" in the HBase API.
void | validate(): Validates the existence of the table in the configuration.
-
Methods inherited from class org.apache.beam.sdk.io.Source
getDefaultOutputCoder
-
-
-
-
Field Detail
-
SOURCE_LOG
protected static final org.slf4j.Logger SOURCE_LOG
-
SIZED_BASED_MAX_SPLIT_COUNT
protected static final long SIZED_BASED_MAX_SPLIT_COUNT
- See Also:
- Constant Field Values
-
-
Method Detail
-
split
public List<? extends BoundedSource<Result>> split(long desiredBundleSizeBytes, PipelineOptions options) throws Exception
Splits the table based on keys that belong to tablets, known as "regions" in the HBase API. The current implementation uses the HBase RegionLocator interface, which calls CloudBigtableServiceImpl.getSampleRowKeys(CloudBigtableTableConfiguration) under the covers. A CloudBigtableIO.SourceWithKeys may correspond to a single region or a portion of a region.
If a split is smaller than a single region, the split is calculated based on the assumption that the data is distributed evenly between the region's startKey and stopKey. That assumption may not be correct for any specific start/stop key combination.
This method is called internally by Beam. Do not call it directly.
- Specified by:
split in class BoundedSource<Result>
- Parameters:
desiredBundleSizeBytes - The desired size for each bundle, in bytes.
options - The pipeline options.
- Returns:
- A list of sources split into groups.
- Throws:
Exception
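The even-distribution assumption described for split can be illustrated with a minimal, self-contained sketch. This is not the library's implementation (the class summary only says that Bytes.split(byte[], byte[], int) is used under the covers); the class EvenKeySplitSketch and its methods are hypothetical names introduced here. The sketch treats row keys as unsigned big-endian integers and places boundary keys at equal numeric distances between the start and stop key.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of CloudBigtableIO.
class EvenKeySplitSketch {

  /** Reads a key as an unsigned big-endian integer, right-padded with 0x00 to width bytes. */
  static BigInteger toNumber(byte[] key, int width) {
    byte[] padded = Arrays.copyOf(key, width); // copyOf pads with trailing zeros
    return new BigInteger(1, padded);          // signum 1: interpret the bytes as unsigned
  }

  /** Converts a non-negative number back to a fixed-width big-endian key. */
  static byte[] toKey(BigInteger value, int width) {
    byte[] raw = value.toByteArray(); // may carry a leading sign byte or be shorter than width
    byte[] key = new byte[width];
    int copy = Math.min(raw.length, width);
    System.arraycopy(raw, raw.length - copy, key, width - copy, copy);
    return key;
  }

  /** Returns pieces + 1 boundary keys, evenly spaced from start to stop (inclusive). */
  static List<byte[]> evenBoundaries(byte[] start, byte[] stop, int pieces) {
    int width = Math.max(start.length, stop.length);
    BigInteger lo = toNumber(start, width);
    BigInteger span = toNumber(stop, width).subtract(lo);
    List<byte[]> keys = new ArrayList<>();
    for (int i = 0; i <= pieces; i++) {
      BigInteger offset =
          span.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(pieces));
      keys.add(toKey(lo.add(offset), width));
    }
    return keys;
  }

  public static void main(String[] args) {
    // Splitting [0x10, 0x50) into 4 pieces prints [16], [32], [48], [64], [80], one per line.
    for (byte[] key : evenBoundaries(new byte[] {0x10}, new byte[] {0x50}, 4)) {
      System.out.println(Arrays.toString(key));
    }
  }
}
```

If the real keys cluster near one end of the range (for example, keys that share a long common prefix), boundaries computed this way produce bundles of very different sizes, which is the caveat stated above.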
-
getEstimatedSizeBytes
public long getEstimatedSizeBytes(PipelineOptions options) throws IOException
Gets an estimated size based on data returned from CloudBigtableServiceImpl.getSampleRowKeys(CloudBigtableTableConfiguration). The estimate does not take a Scan into account: if a Scan is set on the CloudBigtableScanConfiguration, the returned value will be larger than what the CloudBigtableIO.Reader will actually read.
- Parameters:
options - The pipeline options.
- Returns:
- The estimated size of the data, in bytes.
- Throws:
IOException
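To show why a whole-table estimate overstates a restricted scan, here is a small sketch under stated assumptions. It assumes each sample row key carries a cumulative byte offset (the approximate amount of data stored before that key), so that consecutive differences approximate per-tablet sizes. The class SampleOffsetSizeSketch, its methods, and the offsets are hypothetical and do not reproduce the actual getEstimatedSizeBytes code.

```java
// Hypothetical illustration only; not part of CloudBigtableIO.
class SampleOffsetSizeSketch {

  /**
   * Converts cumulative offsets into per-tablet sizes.
   * offsets[i] is assumed to be the approximate number of bytes stored before the i-th sample key.
   */
  static long[] tabletSizes(long[] offsets) {
    long[] sizes = new long[offsets.length];
    long previous = 0;
    for (int i = 0; i < offsets.length; i++) {
      sizes[i] = offsets[i] - previous;
      previous = offsets[i];
    }
    return sizes;
  }

  /** Sums the sizes of tablets with index in [fromIndex, toIndex). */
  static long estimateForTablets(long[] offsets, int fromIndex, int toIndex) {
    long[] sizes = tabletSizes(offsets);
    long total = 0;
    for (int i = fromIndex; i < toIndex; i++) {
      total += sizes[i];
    }
    return total;
  }

  /** Whole-table estimate: every tablet counts, regardless of any scan range. */
  static long estimateWholeTable(long[] offsets) {
    return estimateForTablets(offsets, 0, offsets.length);
  }

  public static void main(String[] args) {
    long[] offsets = {100L, 250L, 600L}; // made-up cumulative offsets, in bytes
    System.out.println(estimateWholeTable(offsets));       // prints 600
    System.out.println(estimateForTablets(offsets, 1, 2)); // prints 150
  }
}
```

In this made-up example, a scan that touches only the middle tablet would read about 150 bytes, while the whole-table estimate reports 600.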
-
getSplits
protected List<CloudBigtableIO.SourceWithKeys> getSplits(long desiredBundleSizeBytes) throws Exception
- Throws:
Exception
-
isWithinRange
protected static boolean isWithinRange(byte[] scanStartKey, byte[] scanEndKey, byte[] startKey, byte[] endKey)
Checks if the range of the region is within the range of the scan.
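The exact comparison rules are not documented here, so the following is only one plausible reading, sketched under two assumptions: an empty key means "unbounded" on that side (the usual HBase convention), and "within" is treated as "overlaps". The class RangeOverlapSketch and its method names are hypothetical. Arrays.compareUnsigned requires Java 9 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; not the library's isWithinRange.
class RangeOverlapSketch {

  /** Assumption of this sketch: an empty key means the range is open on that side. */
  static boolean isUnbounded(byte[] key) {
    return key.length == 0;
  }

  /** True if region [startKey, endKey) shares at least one key with scan [scanStartKey, scanEndKey). */
  static boolean overlaps(
      byte[] scanStartKey, byte[] scanEndKey, byte[] startKey, byte[] endKey) {
    // The scan must begin before the region ends ...
    boolean scanStartsBeforeRegionEnds =
        isUnbounded(scanStartKey)
            || isUnbounded(endKey)
            || Arrays.compareUnsigned(scanStartKey, endKey) < 0;
    // ... and end after the region begins.
    boolean scanEndsAfterRegionStarts =
        isUnbounded(scanEndKey)
            || isUnbounded(startKey)
            || Arrays.compareUnsigned(scanEndKey, startKey) > 0;
    return scanStartsBeforeRegionEnds && scanEndsAfterRegionStarts;
  }

  static byte[] key(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // Scan [b, m) against region [k, t): they share [k, m).
    System.out.println(overlaps(key("b"), key("m"), key("k"), key("t"))); // prints true
    // Scan [b, m) against region [m, z): the scan end is exclusive, so nothing is shared.
    System.out.println(overlaps(key("b"), key("m"), key("m"), key("z"))); // prints false
  }
}
```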
-
getSampleRowKeys
public List<com.google.bigtable.repackaged.com.google.cloud.bigtable.data.v2.models.KeyOffset> getSampleRowKeys() throws IOException
Performs a call to get sample row keys from CloudBigtableServiceImpl.getSampleRowKeys(CloudBigtableTableConfiguration) if they are not yet cached. The sample row keys give information about tablet key boundaries and estimated sizes.
- Throws:
IOException
-
validate
public void validate()
Validates the existence of the table in the configuration.
-
split
protected List<CloudBigtableIO.SourceWithKeys> split(long regionSize, long desiredBundleSizeBytes, byte[] startKey, byte[] stopKey) throws IOException
Splits the region based on the start and stop key. Uses Bytes.split(byte[], byte[], int) under the covers.
- Throws:
IOException
-
createReader
public BoundedSource.BoundedReader<Result> createReader(PipelineOptions options)
Creates a reader that will scan the entire table based on the Scan in the configuration.
- Specified by:
createReader in class BoundedSource<Result>
- Returns:
- A reader for the table.
-
getConfiguration
protected CloudBigtableScanConfiguration getConfiguration()
- Returns:
- the configuration
-
populateDisplayData
public void populateDisplayData(DisplayData.Builder builder)
- Specified by:
populateDisplayData in interface HasDisplayData
- Overrides:
populateDisplayData in class Source<Result>
-
-