CloudBigtableIO.SourceWithKeys (Google Cloud Dataflow + Cloud Bigtable Connector 0.3.0 API)

com.google.cloud.bigtable.dataflow

Class CloudBigtableIO.SourceWithKeys<ResultOutputType>

    • Field Detail

      • SOURCE_LOG

        protected static final org.slf4j.Logger SOURCE_LOG
      • SIZED_BASED_MAX_SPLIT_COUNT

        protected static final long SIZED_BASED_MAX_SPLIT_COUNT
        See Also:
        Constant Field Values
      • configuration

        protected final CloudBigtableScanConfiguration configuration
        Configuration for a Cloud Bigtable connection, a table, and an optional scan.
      • coderTypeOrdinal

        protected final int coderTypeOrdinal
      • scanIterator

        protected final com.google.cloud.bigtable.dataflow.CloudBigtableIO.ScanIterator<ResultOutputType> scanIterator
    • Constructor Detail

      • SourceWithKeys

        protected SourceWithKeys(CloudBigtableScanConfiguration configuration,
                                 com.google.cloud.bigtable.dataflow.CloudBigtableIO.CoderType coderType,
                                 com.google.cloud.bigtable.dataflow.CloudBigtableIO.ScanIterator<ResultOutputType> scanIterator,
                                 long estimatedSize)
    • Method Detail

      • getEstimatedSizeBytes

        public long getEstimatedSizeBytes(PipelineOptions options)
                                   throws Exception
        Gets an estimate of the size of the source.

        NOTE: This value is a guesstimate. It could be significantly off, especially if there is aScan selected in the configuration. It will also be off if the start and stop keys are calculated via CloudBigtableIO.Source.splitIntoBundles(long, PipelineOptions).

        Parameters:
        options - The pipeline options.
        Returns:
        The estimated size of the source, in bytes.
        Throws:
        Exception
      • getEstimatedSize

        public long getEstimatedSize()
      • splitIntoBundles

        public List<? extends BoundedSource<ResultOutputType>> splitIntoBundles(long desiredBundleSizeBytes,
                                                                                PipelineOptions options)
                                                                         throws Exception
        Splits the bundle based on the assumption that the data is distributed evenly between startKey and stopKey. That assumption may not be correct for any specific start/stop key combination.

        This method is called internally by Cloud Dataflow. Do not call it directly.

        Specified by:
        splitIntoBundles in class BoundedSource<ResultOutputType>
        Parameters:
        desiredBundleSizeBytes - The desired size for each bundle, in bytes.
        options - The pipeline options.
        Returns:
        A list of sources split into groups.
        Throws:
        Exception
      • getStartRow

        public byte[] getStartRow()
      • getStopRow

        public byte[] getStopRow()
      • isWithinRange

        protected static boolean isWithinRange(byte[] scanStartKey,
                                               byte[] scanEndKey,
                                               byte[] startKey,
                                               byte[] endKey)
        Checks if the range of the region is within the range of the scan.
      • getSampleRowKeys

        public List<com.google.bigtable.v1.SampleRowKeysResponse> getSampleRowKeys()
                                                                            throws IOException
        Performs a call to get sample row keys from BigtableDataClient.sampleRowKeys(SampleRowKeysRequest) if they are not yet cached. The sample row keys give information about tablet key boundaries and estimated sizes.
        Throws:
        IOException
      • validate

        public void validate()
        Validates the existence of the table in the configuration.
        Specified by:
        validate in class Source<ResultOutputType>
      • producesSortedKeys

        public boolean producesSortedKeys(PipelineOptions options)
                                   throws Exception
        Checks whether the pipeline produces sorted keys.

        NOTE: HBase supports reverse scans, but Cloud Bigtable does not.

        Specified by:
        producesSortedKeys in class BoundedSource<ResultOutputType>
        Parameters:
        options - The pipeline options.
        Returns:
        Whether the pipeline produces sorted keys.
        Throws:
        Exception


Send feedback about...

Cloud Bigtable Documentation