CloudBigtableIO.Source (Apache Beam + Cloud Bigtable Connector 1.0.0-pre3 API)

com.google.cloud.bigtable.beam

Class CloudBigtableIO.Source

  • java.lang.Object
    • org.apache.beam.sdk.io.Source<T>
      • org.apache.beam.sdk.io.BoundedSource<Result>
        • com.google.cloud.bigtable.beam.CloudBigtableIO.Source
  • All Implemented Interfaces:
    Serializable, org.apache.beam.sdk.transforms.display.HasDisplayData
    Enclosing class:
    CloudBigtableIO


    public static class CloudBigtableIO.Source
    extends org.apache.beam.sdk.io.BoundedSource<Result>
    A BoundedSource for a Cloud Bigtable Table, which is potentially filtered by a Scan.
    See Also:
    Serialized Form
    • Nested Class Summary

      • Nested classes/interfaces inherited from class org.apache.beam.sdk.io.BoundedSource

        org.apache.beam.sdk.io.BoundedSource.BoundedReader<T>
      • Nested classes/interfaces inherited from class org.apache.beam.sdk.io.Source

        org.apache.beam.sdk.io.Source.Reader<T>
    • Method Summary

      All Methods Static Methods Instance Methods Concrete Methods 
      Modifier and Type Method and Description
      org.apache.beam.sdk.io.BoundedSource.BoundedReader<Result> createReader(org.apache.beam.sdk.options.PipelineOptions options)
      Creates a reader that will scan the entire table based on the Scan in the configuration.
      protected CloudBigtableScanConfiguration getConfiguration() 
      org.apache.beam.sdk.coders.Coder<Result> getDefaultOutputCoder() 
      long getEstimatedSizeBytes(org.apache.beam.sdk.options.PipelineOptions options)
      Gets an estimated size based on data returned from BigtableDataClient.sampleRowKeys(com.google.bigtable.repackaged.com.google.bigtable.v2.SampleRowKeysRequest).
      List<com.google.bigtable.repackaged.com.google.bigtable.v2.SampleRowKeysResponse> getSampleRowKeys()
      Performs a call to get sample row keys from BigtableDataClient.sampleRowKeys(com.google.bigtable.repackaged.com.google.bigtable.v2.SampleRowKeysRequest) if they are not yet cached.
      protected List<CloudBigtableIO.SourceWithKeys> getSplits(long desiredBundleSizeBytes) 
      protected static boolean isWithinRange(byte[] scanStartKey, byte[] scanEndKey, byte[] startKey, byte[] endKey)
      Checks if the range of the region is within the range of the scan.
      void populateDisplayData(org.apache.beam.sdk.transforms.display.DisplayData.Builder builder) 
      protected List<CloudBigtableIO.SourceWithKeys> split(long regionSize, long desiredBundleSizeBytes, byte[] startKey, byte[] stopKey)
      Splits the region based on the start and stop key.
      List<? extends org.apache.beam.sdk.io.BoundedSource<Result>> split(long desiredBundleSizeBytes, org.apache.beam.sdk.options.PipelineOptions options)
      Splits the table based on keys that belong to tablets, known as "regions" in the HBase API.
      void validate()
      Validates the existence of the table in the configuration.
    • Field Detail

      • SOURCE_LOG

        protected static final org.slf4j.Logger SOURCE_LOG
      • SIZED_BASED_MAX_SPLIT_COUNT

        protected static final long SIZED_BASED_MAX_SPLIT_COUNT
        See Also:
        Constant Field Values
    • Method Detail

      • split

        public List<? extends org.apache.beam.sdk.io.BoundedSource<Result>> split(long desiredBundleSizeBytes,
                                                                                  org.apache.beam.sdk.options.PipelineOptions options)
                                                                           throws Exception
        Splits the table based on keys that belong to tablets, known as "regions" in the HBase API. The current implementation uses the HBase RegionLocator interface, which calls BigtableDataClient.sampleRowKeys(com.google.bigtable.repackaged.com.google.bigtable.v2.SampleRowKeysRequest) under the covers. A CloudBigtableIO.SourceWithKeys may correspond to a single region or a portion of a region.

        If a split is smaller than a single region, the split is calculated based on the assumption that the data is distributed evenly between the region's startKey and stopKey. That assumption may not be correct for any specific start/stop key combination.

        This method is called internally by Beam. Do not call it directly.

        Specified by:
        split in class org.apache.beam.sdk.io.BoundedSource<Result>
        Parameters:
        desiredBundleSizeBytes - The desired size for each bundle, in bytes.
        options - The pipeline options.
        Returns:
        A list of sources split into groups.
        Throws:
        Exception
      • getEstimatedSizeBytes

        public long getEstimatedSizeBytes(org.apache.beam.sdk.options.PipelineOptions options)
                                   throws IOException
        Gets an estimated size based on data returned from BigtableDataClient.sampleRowKeys(com.google.bigtable.repackaged.com.google.bigtable.v2.SampleRowKeysRequest). The estimate will be high if a Scan is set on the CloudBigtableScanConfiguration; in such cases, the estimate will not take the Scan into account, and will return a larger estimate than what the CloudBigtableIO.Reader will actually read.
        Parameters:
        options - The pipeline options.
        Returns:
        The estimated size of the data, in bytes.
        Throws:
        IOException
      • getDefaultOutputCoder

        public org.apache.beam.sdk.coders.Coder<Result> getDefaultOutputCoder()
      • isWithinRange

        protected static boolean isWithinRange(byte[] scanStartKey,
                                               byte[] scanEndKey,
                                               byte[] startKey,
                                               byte[] endKey)
        Checks if the range of the region is within the range of the scan.
      • getSampleRowKeys

        public List<com.google.bigtable.repackaged.com.google.bigtable.v2.SampleRowKeysResponse> getSampleRowKeys()
                                                                                                           throws IOException
        Performs a call to get sample row keys from BigtableDataClient.sampleRowKeys(com.google.bigtable.repackaged.com.google.bigtable.v2.SampleRowKeysRequest) if they are not yet cached. The sample row keys give information about tablet key boundaries and estimated sizes.
        Throws:
        IOException
      • validate

        public void validate()
        Validates the existence of the table in the configuration.
        Specified by:
        validate in class org.apache.beam.sdk.io.Source<Result>
      • createReader

        public org.apache.beam.sdk.io.BoundedSource.BoundedReader<Result> createReader(org.apache.beam.sdk.options.PipelineOptions options)
        Creates a reader that will scan the entire table based on the Scan in the configuration.
        Specified by:
        createReader in class org.apache.beam.sdk.io.BoundedSource<Result>
        Returns:
        A reader for the table.
      • populateDisplayData

        public void populateDisplayData(org.apache.beam.sdk.transforms.display.DisplayData.Builder builder)
        Specified by:
        populateDisplayData in interface org.apache.beam.sdk.transforms.display.HasDisplayData
        Overrides:
        populateDisplayData in class org.apache.beam.sdk.io.Source<Result>


Monitor your resources on the go

Get the Google Cloud Console app to help you manage your projects.

Send feedback about...

Cloud Bigtable Documentation