CloudBigtableIO.SourceWithKeys (Apache Beam + Cloud Bigtable Connector 1.6.0 API)

Class CloudBigtableIO.SourceWithKeys

  • java.lang.Object
  • All Implemented Interfaces:
    Serializable, org.apache.beam.sdk.transforms.display.HasDisplayData
    Enclosing class:
    CloudBigtableIO

    protected static class CloudBigtableIO.SourceWithKeys
    A BoundedSource for a Cloud Bigtable Table with a start/stop key range, along with an optional filter via a Scan.
    See Also:
    Serialized Form
    • Nested Class Summary

      • Nested classes/interfaces inherited from class org.apache.beam.sdk.io.BoundedSource
      • Nested classes/interfaces inherited from class org.apache.beam.sdk.io.Source
    • Field Detail

      • SOURCE_LOG

        protected static final org.slf4j.Logger SOURCE_LOG

      • SIZED_BASED_MAX_SPLIT_COUNT

        protected static final long SIZED_BASED_MAX_SPLIT_COUNT
        See Also:
        Constant Field Values
    • Method Detail

      • getEstimatedSizeBytes

        public long getEstimatedSizeBytes(org.apache.beam.sdk.options.PipelineOptions options)
        Gets an estimate of the size of the source.

        NOTE: This value is a rough estimate. It could be significantly off, especially if a Scan is set in the configuration. It will also be off if the start and stop keys were calculated via CloudBigtableIO.Source.split(long, PipelineOptions).

        Parameters:
        options - The pipeline options.
        Returns:
        The estimated size of the source, in bytes.
      • getEstimatedSize

        public long getEstimatedSize()
      • split

        public List<? extends org.apache.beam.sdk.io.BoundedSource<Result>> split(long desiredBundleSizeBytes,
                                                                                  org.apache.beam.sdk.options.PipelineOptions options)
                                                                           throws Exception
        Splits the bundle based on the assumption that the data is distributed evenly between startKey and stopKey. That assumption may not be correct for any specific start/stop key combination.

        This method is called internally by Beam. Do not call it directly.

        Specified by:
        split in class org.apache.beam.sdk.io.BoundedSource<Result>
        Parameters:
        desiredBundleSizeBytes - The desired size for each bundle, in bytes.
        options - The pipeline options.
        Returns:
        A list of sources split into groups.
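The even-distribution assumption above can be illustrated with a small arithmetic sketch. This is a hypothetical helper, not the connector's code; the cap mirrors SIZED_BASED_MAX_SPLIT_COUNT in spirit, but its assumed value here is illustrative.

```java
// Hypothetical sketch of the split-count arithmetic described above; not the
// connector's actual code. MAX_SPLIT_COUNT is an assumed stand-in for
// SIZED_BASED_MAX_SPLIT_COUNT, whose real value may differ.
class SplitMath {
    static final long MAX_SPLIT_COUNT = 15_360L; // assumed cap

    static long splitCount(long estimatedSizeBytes, long desiredBundleSizeBytes) {
        if (desiredBundleSizeBytes <= 0 || estimatedSizeBytes <= 0) {
            return 1;
        }
        // Round up so a final partial bundle still gets its own source.
        long count = (estimatedSizeBytes + desiredBundleSizeBytes - 1) / desiredBundleSizeBytes;
        return Math.min(count, MAX_SPLIT_COUNT);
    }

    public static void main(String[] args) {
        System.out.println(splitCount(10L * 1024 * 1024, 1024 * 1024));
    }
}
```

If the data is in fact skewed toward one end of the key range, bundles produced this way will be uneven in practice, which is exactly the caveat the method documentation gives.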
      • getOutputCoder

        public org.apache.beam.sdk.coders.Coder<Result> getOutputCoder()
      • isWithinRange

        protected static boolean isWithinRange(byte[] scanStartKey,
                                               byte[] scanEndKey,
                                               byte[] startKey,
                                               byte[] endKey)
        Checks if the range of the region is within the range of the scan.
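A minimal sketch of such a containment check, assuming the HBase convention that an empty byte[] means "unbounded" on that side of a range. This is not the connector's exact implementation.

```java
import java.util.Arrays;

// Sketch of a region-within-scan check. Assumes the HBase convention that an
// empty byte[] marks an unbounded side of a range; not the connector's code.
class RangeCheck {
    static boolean isWithinRange(byte[] scanStartKey, byte[] scanEndKey,
                                 byte[] startKey, byte[] endKey) {
        // The region's start key must not precede the scan's start key.
        boolean startOk = scanStartKey.length == 0
            || Arrays.compareUnsigned(startKey, scanStartKey) >= 0;
        // The region's end key must not pass the scan's end key; an empty
        // region end key (end of table) only fits an unbounded scan end.
        boolean endOk = scanEndKey.length == 0
            || (endKey.length != 0 && Arrays.compareUnsigned(endKey, scanEndKey) <= 0);
        return startOk && endOk;
    }

    public static void main(String[] args) {
        System.out.println(isWithinRange(new byte[0], new byte[0],
            "a".getBytes(), "m".getBytes()));
    }
}
```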
      • getSampleRowKeys

        public List<SampleRowKeysResponse> getSampleRowKeys()
                                    throws IOException
        Performs a call to get sample row keys from BigtableDataClient.sampleRowKeys() if they are not yet cached. The sample row keys give information about tablet key boundaries and estimated sizes.
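The sketch below shows how tablet boundaries and offsets like these can drive a size estimate for a key range. KeyOffset is a hypothetical stand-in for the sample-row-key response entries, and the aggregation logic is illustrative rather than the connector's actual code.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of deriving a size estimate from sample row keys. KeyOffset is a
// hypothetical stand-in for the response entries; the aggregation is
// illustrative, not the connector's implementation.
class SizeEstimate {
    record KeyOffset(byte[] key, long offsetBytes) {}

    // Each sample key marks a tablet boundary; the offset delta between two
    // consecutive samples approximates that tablet's size. Sum the tablets
    // whose boundary key falls inside (startKey, stopKey]; an empty stopKey
    // means "to the end of the table".
    static long estimate(List<KeyOffset> samples, byte[] startKey, byte[] stopKey) {
        long total = 0;
        long previousOffset = 0;
        for (KeyOffset sample : samples) {
            long tabletSize = sample.offsetBytes() - previousOffset;
            previousOffset = sample.offsetBytes();
            boolean afterStart = Arrays.compareUnsigned(sample.key(), startKey) > 0;
            boolean beforeStop = stopKey.length == 0
                || Arrays.compareUnsigned(sample.key(), stopKey) <= 0;
            if (afterStart && beforeStop) {
                total += tabletSize;
            }
        }
        return total;
    }
}
```

Because the samples are coarse tablet boundaries, an estimate built this way is only as fine-grained as the tablets themselves, which is one reason getEstimatedSizeBytes is documented as a rough estimate.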
      • validate

        public void validate()
        Validates the existence of the table in the configuration.
        Overrides:
        validate in class org.apache.beam.sdk.io.Source<Result>
      • createReader

        public org.apache.beam.sdk.io.BoundedSource.BoundedReader<Result> createReader(org.apache.beam.sdk.options.PipelineOptions options)
        Creates a reader that will scan the entire table based on the Scan in the configuration.
        Specified by:
        createReader in class org.apache.beam.sdk.io.BoundedSource<Result>
        Returns:
        A reader for the table.
      • populateDisplayData

        public void populateDisplayData(org.apache.beam.sdk.transforms.display.DisplayData.Builder builder)
        Specified by:
        populateDisplayData in interface org.apache.beam.sdk.transforms.display.HasDisplayData
        Overrides:
        populateDisplayData in class org.apache.beam.sdk.io.Source<Result>
