Dataproc optional HBase component

Installation of the optional HBase component is limited to Dataproc clusters created with image version 1.5 or 2.0.

While Google Cloud provides many services that let you deploy self-managed Apache HBase, Bigtable is often the best option because it provides an open API that is compatible with HBase, along with workload portability. You can migrate HBase database tables to Bigtable, which then manages the underlying data, while applications that previously interoperated with HBase, such as Spark, can remain on Dataproc and connect securely to Bigtable. This guide provides the high-level steps for getting started with Bigtable and references for migrating data to Bigtable from Dataproc HBase deployments.

Get started with Bigtable

Cloud Bigtable is a highly scalable and performant NoSQL platform that provides Apache HBase API client compatibility and portability for HBase workloads. The Bigtable HBase client library is compatible with HBase API versions 1.x and 2.x, and existing HBase applications can add it to read and write data stored in Bigtable.

See Bigtable and the HBase API for more information on configuring your HBase application with Bigtable.
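As a rough illustration of what that configuration involves, the following Python sketch renders a minimal hbase-site.xml that points an HBase client at Bigtable. The property names and connection classes are the ones commonly used with the Bigtable HBase client library, but treat them as assumptions and confirm them against the client documentation for your version. The helper name `render_hbase_site` and the sample project and instance IDs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Connection classes from the Bigtable HBase client, keyed by HBase API major
# version (assumed values; verify against the client library you install).
CONNECTION_IMPL = {
    1: "com.google.cloud.bigtable.hbase1_x.BigtableConnection",
    2: "com.google.cloud.bigtable.hbase2_x.BigtableConnection",
}


def render_hbase_site(project_id: str, instance_id: str, hbase_major: int = 2) -> str:
    """Return hbase-site.xml text that routes HBase client calls to Bigtable."""
    properties = {
        "hbase.client.connection.impl": CONNECTION_IMPL[hbase_major],
        "google.bigtable.project.id": project_id,
        "google.bigtable.instance.id": instance_id,
    }
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print; requires Python 3.9 or later
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical IDs; replace with your own project and instance.
    print(render_hbase_site("my-project", "my-instance"))
```

The application still needs the Bigtable HBase client library on its classpath; this file only tells the HBase API which connection implementation to load.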

Create a Bigtable cluster

You can get started using Bigtable by creating a cluster and tables for storing data that was previously stored in HBase. Follow the steps in the Bigtable documentation for creating an instance, a cluster, and tables with the same schema as the HBase tables. For automated creation of tables from HBase table DDLs, refer to the schema translator tool.
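If you prefer the command line to the console, the provisioning steps above can be sketched as a short list of `gcloud` and `cbt` commands. The following Python helper only builds the command strings so you can review them before running anything. It is a minimal sketch: the helper name `provisioning_commands`, the sample IDs, and the zone are hypothetical, it creates column families without garbage-collection policies, and you should check the flags against the current `gcloud bigtable` and `cbt` references. It does not replace the schema translator tool.

```python
import shlex


def provisioning_commands(project, instance, cluster, zone, nodes, tables):
    """Build command lines that create an instance, its tables, and column families.

    `tables` maps each table ID to the list of column family names taken from
    the corresponding HBase table.
    """
    commands = [[
        "gcloud", "bigtable", "instances", "create", instance,
        f"--project={project}",
        f"--display-name={instance}",
        f"--cluster-config=id={cluster},zone={zone},nodes={nodes}",
    ]]
    cbt = ["cbt", "-project", project, "-instance", instance]
    for table, families in tables.items():
        commands.append(cbt + ["createtable", table])
        for family in families:
            commands.append(cbt + ["createfamily", table, family])
    # shlex.join quotes any argument that needs it for a POSIX shell.
    return [shlex.join(command) for command in commands]


if __name__ == "__main__":
    # Hypothetical schema: one table with two column families.
    for line in provisioning_commands(
        "my-project", "my-instance", "my-cluster", "us-central1-b", 3,
        {"orders": ["cf1", "cf2"]},
    ):
        print(line)
```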

Open the Bigtable instance in the Google Cloud console to view the newly provisioned table and its server-side monitoring charts, including rows per second, latency, and throughput. For additional information, see Monitoring.

Migrate data from Dataproc to Bigtable

After you create the tables in Bigtable, you can import and validate your data by following the guidance at Migrate HBase on Google Cloud to Bigtable. After you migrate the data, you can update applications to send reads and writes to Bigtable.
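To make the validation step more concrete, the following Python sketch shows one general way to compare a source and a target table: hash each row and report the row keys that are missing or differ. This is an illustration of the idea only, not the tooling described in the migration guide. It assumes that both tables fit in memory as dictionaries that map a row key to its cells, and the function names `row_digest` and `diff_tables` are hypothetical.

```python
import hashlib


def row_digest(row_key: bytes, cells: dict) -> str:
    """Hash one row; `cells` maps (family, qualifier) byte pairs to value bytes."""
    digest = hashlib.sha256()
    digest.update(len(row_key).to_bytes(4, "big") + row_key)
    # Sort cells so that the digest does not depend on dictionary order.
    for (family, qualifier), value in sorted(cells.items()):
        for part in (family, qualifier, value):
            # A length prefix keeps adjacent fields from running together.
            digest.update(len(part).to_bytes(4, "big") + part)
    return digest.hexdigest()


def diff_tables(source_rows: dict, target_rows: dict) -> dict:
    """Compare two {row_key: cells} snapshots and list the row keys that differ."""
    src = {key: row_digest(key, cells) for key, cells in source_rows.items()}
    dst = {key: row_digest(key, cells) for key, cells in target_rows.items()}
    return {
        "missing_in_target": sorted(src.keys() - dst.keys()),
        "unexpected_in_target": sorted(dst.keys() - src.keys()),
        "mismatched": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
    }


if __name__ == "__main__":
    hbase_rows = {b"r1": {(b"cf", b"a"): b"1"}, b"r2": {(b"cf", b"a"): b"2"}}
    bigtable_rows = {b"r1": {(b"cf", b"a"): b"1"}, b"r2": {(b"cf", b"a"): b"9"}}
    print(diff_tables(hbase_rows, bigtable_rows))
```

An empty result in all three lists means the two snapshots hold the same rows and cell values. For production-sized tables, rely on the validation procedure in the migration guide instead of an in-memory comparison.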

What's next

See Wordcount Spark examples for running Spark with Bigtable.
Review online migration options with live replication from HBase to Bigtable.
Watch How Box modernized their NoSQL databases to understand other benefits.