CloudBigtableIO (Apache Beam + Cloud Bigtable Connector 1.0.0-pre3 API)

com.google.cloud.bigtable.beam

Class CloudBigtableIO



  • @Experimental
    public class CloudBigtableIO
    extends Object

    Utilities to create PTransforms for reading and writing Google Cloud Bigtable entities in a Beam pipeline.

    Google Cloud Bigtable offers you a fast, fully managed, massively scalable NoSQL database service that's ideal for web, mobile, and Internet of Things applications requiring terabytes to petabytes of data. Unlike comparable market offerings, Cloud Bigtable doesn't require you to sacrifice speed, scale, or cost efficiency when your applications grow. Cloud Bigtable has been battle-tested at Google for more than 10 years. It's the database driving major applications such as Google Analytics and Gmail.

    To use CloudBigtableIO, first obtain a credential for Cloud Bigtable with gcloud:

     $ gcloud auth login
     

    To read a PCollection from a table, with an optional Scan, use read(CloudBigtableScanConfiguration):

     
     PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
     Pipeline p = Pipeline.create(options);
     PCollection<Result> results = p.apply(
       Read.from(CloudBigtableIO.read(
          new CloudBigtableScanConfiguration.Builder()
              .withProjectId("project-id")
              .withInstanceId("instance-id")
              .withTableId("table-id")
              .build())));
     
     

    To write a PCollection to a table, use writeToTable(CloudBigtableTableConfiguration):

     
     PipelineOptions options =
         PipelineOptionsFactory.fromArgs(args).create();
     Pipeline p = Pipeline.create(options);
     PCollection<Mutation> mutationCollection = ...;
     mutationCollection.apply(
       CloudBigtableIO.writeToTable(
          new CloudBigtableTableConfiguration.Builder()
              .withProjectId("project-id")
              .withInstanceId("instance-id")
              .withTableId("table-id")
              .build()));
     
     
    • Constructor Detail

      • CloudBigtableIO

        public CloudBigtableIO()
    • Method Detail

      • writeToTable

        public static org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PCollection<Mutation>,org.apache.beam.sdk.values.PDone> writeToTable(CloudBigtableTableConfiguration config)
        Creates a PTransform that can write either a bounded or unbounded PCollection of Mutations to a table specified via a CloudBigtableTableConfiguration.

        NOTE: This PTransform writes Puts and Deletes, not Appends and Increments. The limitation exists because a batch that fails partway through may be re-run, and a re-run Append or Increment would be applied twice, which is never the user's intent. Re-running a Delete has no further effect. Re-running a Put is normally harmless, but it can cause problems when the column family keeps more than one version, because the replayed Put may be stored as an additional version. Where extra versions would be a problem, set an explicit timestamp on the Put.
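
The retry argument in the NOTE can be illustrated with a toy in-memory cell. This is only a sketch: `RetryIdempotenceSketch`, `Cell`, and all of their methods are hypothetical stand-ins written for this illustration, not part of the Cloud Bigtable, HBase, or Beam APIs, and the "store-assigned timestamp" is modelled as simply one more than the newest existing version.

```java
import java.util.TreeMap;
import java.util.function.Consumer;

/** Toy model of one multi-version cell; NOT the Bigtable or HBase API. */
final class RetryIdempotenceSketch {

    static final class Cell {
        // timestamp -> value; the highest timestamp is the latest version
        private final TreeMap<Long, Long> versions = new TreeMap<>();

        // Assumption: a store-assigned timestamp is one past the newest version.
        private long nextTimestamp() {
            return versions.isEmpty() ? 1L : versions.lastKey() + 1;
        }

        /** Put with an explicit timestamp: a replay overwrites the same version. */
        void put(long timestamp, long value) {
            versions.put(timestamp, value);
        }

        /** Put without a timestamp: each replay is stored as a new version. */
        void putNow(long value) {
            versions.put(nextTimestamp(), value);
        }

        /** Increment is a read-modify-write, so a replay adds the delta again. */
        void increment(long delta) {
            long updated = latest() + delta;
            versions.put(nextTimestamp(), updated);
        }

        /** Delete removes every version; deleting again changes nothing. */
        void delete() {
            versions.clear();
        }

        long latest() {
            return versions.isEmpty() ? 0L : versions.lastEntry().getValue();
        }

        int versionCount() {
            return versions.size();
        }
    }

    /** Applies the same mutation twice, as a re-run batch would. */
    static Cell replay(Cell cell, Consumer<Cell> mutation) {
        mutation.accept(cell);
        mutation.accept(cell);
        return cell;
    }
}
```

In this model, replaying `put(1000, 7)` leaves a single version, replaying `putNow(7)` leaves two, replaying `increment(5)` on an empty cell leaves a latest value of 10 instead of 5, and replaying `delete()` leaves the cell empty either way.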

      • writeToMultipleTables

        public static org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PCollection<org.apache.beam.sdk.values.KV<String,Iterable<Mutation>>>,org.apache.beam.sdk.values.PDone> writeToMultipleTables(CloudBigtableConfiguration config)
        Creates a PTransform that can write either a bounded or unbounded PCollection of KV of (String tableName, Iterable of Mutations), sending each value's Mutations to the table named by its key.

        NOTE: This PTransform writes Puts and Deletes, not Appends and Increments. The limitation exists because a batch that fails partway through may be re-run, and a re-run Append or Increment would be applied twice, which is never the user's intent. Re-running a Delete has no further effect. Re-running a Put is normally harmless, but it can cause problems when the column family keeps more than one version, because the replayed Put may be stored as an additional version. Where extra versions would be a problem, set an explicit timestamp on the Put.
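
The input shape that writeToMultipleTables expects, one key per table name with that table's mutations as the value, can be modelled with plain collections. This is a sketch under stated assumptions: `GroupByTableSketch` and `Tagged` are hypothetical names, strings stand in for Mutation objects, and a `Map<String, List<String>>` stands in for the PCollection of KV pairs. In a real pipeline the grouping would be done by Beam transforms rather than by hand.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the (tableName, mutations) shape; NOT the Beam or Bigtable API. */
final class GroupByTableSketch {

    /** Hypothetical stand-in for a mutation tagged with its destination table. */
    static final class Tagged {
        final String tableName;
        final String mutation;

        Tagged(String tableName, String mutation) {
            this.tableName = tableName;
            this.mutation = mutation;
        }
    }

    /** Groups mutations by destination table, keeping first-seen table order. */
    static Map<String, List<String>> groupByTable(List<Tagged> input) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (Tagged t : input) {
            grouped.computeIfAbsent(t.tableName, k -> new ArrayList<>())
                   .add(t.mutation);
        }
        return grouped;
    }
}
```

Each map entry then corresponds to one KV element: the key selects the destination table, and the list holds the mutations for that table.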

