BigtableIO (Google Cloud Dataflow SDK 1.9.1 API)

Google Cloud Dataflow SDK for Java, version 1.9.1

com.google.cloud.dataflow.sdk.io.bigtable

Class BigtableIO



  • @Experimental
    public class BigtableIO
    extends Object
    A bounded source and sink for Google Cloud Bigtable.

    For more information, see the online documentation at Google Cloud Bigtable.

    Reading from Cloud Bigtable

    The Bigtable source reads a set of rows from a single table and returns a PCollection<Row>.

    To configure a Cloud Bigtable source, you must supply a table id and a BigtableOptions or builder configured with the project and other information necessary to identify the Bigtable cluster. By default, BigtableIO.Read will read all rows in the table. The row range to be read can optionally be restricted using BigtableIO.Read.withKeyRange(com.google.cloud.dataflow.sdk.io.range.ByteKeyRange), and a RowFilter can be specified using BigtableIO.Read.withRowFilter(com.google.bigtable.v1.RowFilter). For example:

    
     BigtableOptions.Builder optionsBuilder =
         new BigtableOptions.Builder()
             .setProjectId("project")
             .setClusterId("cluster")
             .setZoneId("zone");
    
     Pipeline p = ...;
    
     // Scan the entire table.
     p.apply("read",
         BigtableIO.read()
             .withBigtableOptions(optionsBuilder)
             .withTableId("table"));
    
     // Scan a prefix of the table.
     ByteKeyRange keyRange = ...;
     p.apply("read",
         BigtableIO.read()
             .withBigtableOptions(optionsBuilder)
             .withTableId("table")
             .withKeyRange(keyRange));
    
     // Scan a subset of rows that match the specified row filter.
     RowFilter filter = ...;
     p.apply("filtered read",
         BigtableIO.read()
             .withBigtableOptions(optionsBuilder)
             .withTableId("table")
             .withRowFilter(filter));
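     

    The keyRange and filter values in the example above are left elided. The following minimal sketch, which is not part of the BigtableIO API documentation, shows one way they might be constructed; the row keys "user000" and "user999" and the column family "stats" are hypothetical placeholders. ByteKeyRange and ByteKey live in com.google.cloud.dataflow.sdk.io.range, and RowFilter is the com.google.bigtable.v1.RowFilter protocol buffer message; imports are omitted, as in the examples above.

    
     // Hypothetical key range: rows with keys in ["user000", "user999").
     ByteKeyRange keyRange =
         ByteKeyRange.of(
             ByteKey.copyFrom("user000".getBytes(StandardCharsets.UTF_8)),
             ByteKey.copyFrom("user999".getBytes(StandardCharsets.UTF_8)));
    
     // Hypothetical filter: return only cells from the "stats" column family.
     RowFilter filter =
         RowFilter.newBuilder()
             .setFamilyNameRegexFilter("stats")
             .build();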
     

    Writing to Cloud Bigtable

    The Bigtable sink executes a set of row mutations on a single table. It takes as input a PCollection<KV<ByteString, Iterable<Mutation>>>, where the ByteString is the key of the row being mutated, and each Mutation represents an idempotent transformation to that row.

    To configure a Cloud Bigtable sink, you must supply a table id and a BigtableOptions or builder configured with the project and other information necessary to identify the Bigtable cluster, for example:

    
     BigtableOptions.Builder optionsBuilder =
         new BigtableOptions.Builder()
             .setProjectId("project")
             .setClusterId("cluster")
             .setZoneId("zone");
    
     PCollection<KV<ByteString, Iterable<Mutation>>> data = ...;
    
     data.apply("write",
         BigtableIO.write()
             .withBigtableOptions(optionsBuilder)
             .withTableId("table"));
     

    Experimental

    This connector for Cloud Bigtable is considered experimental and may break or receive backwards-incompatible changes in future versions of the Cloud Dataflow SDK. Cloud Bigtable is in Beta, and thus it may introduce breaking changes in future revisions of its service or APIs.

    Permissions

    Permission requirements depend on the PipelineRunner that is used to execute the Dataflow job. Please refer to the documentation of the corresponding PipelineRunner for more details.

