Bigtable Beam connector

The Bigtable Beam connector (BigtableIO) is an open source Apache Beam I/O connector that can help you perform batch and streaming operations on Bigtable data in a pipeline using Dataflow.

If you are migrating from HBase to Bigtable, or you are running an application that uses the HBase API instead of the Bigtable APIs, use the Bigtable HBase Beam connector (CloudBigtableIO) instead of the connector described on this page.

Connector details

The Bigtable Beam connector is a component of the Apache Beam GitHub repository. The Javadoc is available at Class BigtableIO.

Before you create a Dataflow pipeline, check Apache Beam runtime support to make sure you are using a version of Java that is supported for Dataflow. Use the most recent supported release of Apache Beam.

The Bigtable Beam connector is used in conjunction with the Bigtable client for Java, a client library that calls the Bigtable APIs. You write a pipeline that uses the connector and deploy it to Dataflow, which handles the provisioning and management of resources and assists with the scalability and reliability of data processing.
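
The following is a minimal sketch of a pipeline that writes a single row to Bigtable with BigtableIO. The project, instance, and table IDs (my-project, my-instance, my-table), the column family cf1, and the row key are placeholders, not values from this page; substitute your own before running.

```java
import com.google.bigtable.v2.Mutation;
import com.google.protobuf.ByteString;
import java.util.Arrays;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.IterableCoder;
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.extensions.protobuf.ByteStringCoder;
import org.apache.beam.sdk.extensions.protobuf.ProtoCoder;
import org.apache.beam.sdk.io.gcp.bigtable.BigtableIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.KV;

public class BigtableWriteExample {
  public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline pipeline = Pipeline.create(options);

    // Build a single SetCell mutation for column family "cf1", qualifier "q1".
    Mutation mutation = Mutation.newBuilder()
        .setSetCell(Mutation.SetCell.newBuilder()
            .setFamilyName("cf1")
            .setColumnQualifier(ByteString.copyFromUtf8("q1"))
            .setValue(ByteString.copyFromUtf8("value"))
            .setTimestampMicros(System.currentTimeMillis() * 1000))
        .build();

    // BigtableIO.write() expects KV pairs of row key to an Iterable of mutations.
    KV<ByteString, Iterable<Mutation>> row =
        KV.of(ByteString.copyFromUtf8("row-key-1"), Arrays.asList(mutation));

    pipeline
        .apply("CreateRow", Create.of(row)
            .withCoder(KvCoder.of(
                ByteStringCoder.of(),
                IterableCoder.of(ProtoCoder.of(Mutation.class)))))
        .apply("WriteToBigtable", BigtableIO.write()
            .withProjectId("my-project")    // placeholder project ID
            .withInstanceId("my-instance")  // placeholder instance ID
            .withTableId("my-table"));      // placeholder table ID

    pipeline.run().waitUntilFinish();
  }
}
```

When you run this pipeline with the Dataflow runner, Dataflow provisions the workers and distributes the write work across them; the same code runs unchanged on other Beam runners for local testing.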

For more information on the Apache Beam programming model, see the Beam documentation.

Batch write flow control

When you send batch writes to a table using the Bigtable Beam connector, you can enable batch write flow control. When this feature is enabled, Bigtable automatically does the following:

  • Rate-limits traffic to avoid overloading your Bigtable cluster
  • Ensures the cluster is under enough load to trigger Bigtable autoscaling (if enabled), so that more nodes are automatically added to the cluster when needed

For details, see Batch write flow control.
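
As a sketch of how this is enabled on the write transform, the snippet below configures BigtableIO.write() with flow control turned on. The IDs are placeholders, and the availability of the withFlowControl option depends on the Apache Beam SDK version you use; check the BigtableIO Javadoc for your release.

```java
import org.apache.beam.sdk.io.gcp.bigtable.BigtableIO;

public class FlowControlledWrite {
  // Returns a Bigtable write transform with batch write flow control enabled.
  // Project, instance, and table IDs are placeholders; substitute your own.
  static BigtableIO.Write create() {
    return BigtableIO.write()
        .withProjectId("my-project")
        .withInstanceId("my-instance")
        .withTableId("my-table")
        .withFlowControl(true); // let Bigtable rate-limit batch write traffic
  }
}
```

With flow control enabled, the connector defers to Bigtable to pace writes, so the pipeline slows down instead of overloading the cluster, and autoscaling (if enabled) can add nodes as load increases.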

What's next