Programmatically Scaling Bigtable

In some cases, it can be useful to scale your Cloud Bigtable cluster programmatically based on metrics such as the cluster's CPU usage. For example, if your cluster is under heavy load, and its CPU usage is extremely high, you can add nodes to the cluster until its CPU usage drops. You can also save money by removing nodes from the cluster when it is not being used heavily.

This page explains how to scale your Bigtable cluster programmatically and provides a code sample that you can use as a starting point. It also describes some limitations that you should be aware of before you set up programmatic scaling.

How to scale Bigtable programmatically

Bigtable exposes a variety of metrics through the Cloud Monitoring API. You can programmatically monitor these metrics for your cluster, then use one of the Bigtable client libraries or the gcloud command-line tool to add or remove nodes based on the current metrics. After you resize your cluster, you can monitor its performance through the Cloud Console, through a Cloud Monitoring custom dashboard, or programmatically.
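The monitor-decide-resize cycle described above can be sketched as a simple polling loop. In this sketch, `get_cpu_load` and `resize_cluster` are hypothetical callables standing in for the Monitoring API query and the client-library resize call shown in the samples below; the thresholds are illustrative values, not recommendations:

```python
import time

def autoscale_loop(get_cpu_load, resize_cluster, *, high=0.7, low=0.3,
                   poll_seconds=60, iterations=None):
    """Poll a CPU-load metric and request a resize when it crosses a threshold.

    get_cpu_load:   callable returning the cluster's recent CPU load (0.0-1.0).
    resize_cluster: callable taking +1 (scale up) or -1 (scale down).
    iterations:     loop forever when None; otherwise stop after that many polls.
    """
    count = 0
    while iterations is None or count < iterations:
        cpu = get_cpu_load()
        if cpu > high:
            resize_cluster(+1)
        elif cpu < low:
            resize_cluster(-1)
        count += 1
        if iterations is None or count < iterations:
            time.sleep(poll_seconds)
```

In practice such a loop would run as a scheduled job or long-lived service, with the two callables wired to the Monitoring and Bigtable admin APIs.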

Monitoring API metrics

The Monitoring API provides a variety of metrics that you can use to monitor the current state of your cluster. Some of the most useful metrics for programmatic scaling include:

  • The cluster's CPU load.
  • The number of nodes in the cluster.
  • The storage used as a fraction of total storage capacity.
  • The distribution of server request latencies for a table.
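In the Monitoring API, each metric is identified by a metric type string, which you use in the filter of a time-series query. The identifiers below are the Bigtable metric types I believe correspond to the metrics listed above; verify them against the current Cloud Monitoring metrics list before relying on them:

```python
# Metric type identifiers (assumed; check the Cloud Monitoring metrics list
# for Bigtable to confirm the current names).
BIGTABLE_METRICS = {
    "cpu_load": "bigtable.googleapis.com/cluster/cpu_load",
    "node_count": "bigtable.googleapis.com/cluster/node_count",
    "storage_utilization": "bigtable.googleapis.com/cluster/storage_utilization",
    "server_latencies": "bigtable.googleapis.com/server/latencies",
}

def metric_filter(name):
    """Build a Monitoring API filter string for one of the metrics above."""
    return 'metric.type="{}"'.format(BIGTABLE_METRICS[name])
```

A filter string like this is what you pass to the Monitoring API's timeSeries.list method, as the Java sample below does with its CPU_METRIC constant.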

Sample code

As a starting point for your own programmatic scaling tool, you can use one of the sample tools, which are available in Java and Python.

The sample tools add nodes to a Bigtable cluster when its CPU load is above a specified value. Similarly, the sample tools remove nodes from a Bigtable cluster when its CPU load is below a specified value. To run the sample tools, follow the instructions for each sample on GitHub.

The sample tools use the following code to gather information about the CPU load on the cluster:


Timestamp now = timeXMinutesAgo(0);
Timestamp fiveMinutesAgo = timeXMinutesAgo(5);
TimeInterval interval =
    TimeInterval.newBuilder().setStartTime(fiveMinutesAgo).setEndTime(now).build();
String filter = "metric.type=\"" + CPU_METRIC + "\"";
ListTimeSeriesPagedResponse response =
    metricServiceClient.listTimeSeries(projectName, filter, interval, TimeSeriesView.FULL);
return response.getPage().getValues().iterator().next().getPointsList().get(0);


client = monitoring_v3.MetricServiceClient()
cpu_query = query.Query(client,
                        project=PROJECT,
                        metric_type='bigtable.googleapis.com/cluster/cpu_load',
                        minutes=5)
cpu_query = cpu_query.select_resources(instance=bigtable_instance, cluster=bigtable_cluster)
cpu = next(cpu_query.iter())
return cpu.points[0].value.double_value

Based on the CPU load, the sample tools use the Bigtable client library to resize the cluster:


double latestValue = getLatestValue().getValue().getDoubleValue();
if (latestValue < CPU_PERCENT_TO_DOWNSCALE) {
  int clusterSize = clusterUtility.getClusterNodeCount(clusterId, zoneId);
  if (clusterSize > MIN_NODE_COUNT) {
    clusterUtility.setClusterSize(clusterId, zoneId,
      Math.max(clusterSize - SIZE_CHANGE_STEP, MIN_NODE_COUNT));
  }
} else if (latestValue > CPU_PERCENT_TO_UPSCALE) {
  int clusterSize = clusterUtility.getClusterNodeCount(clusterId, zoneId);
  if (clusterSize <= MAX_NODE_COUNT) {
    clusterUtility.setClusterSize(clusterId, zoneId,
      Math.min(clusterSize + SIZE_CHANGE_STEP, MAX_NODE_COUNT));
  }
}


bigtable_client = bigtable.Client(admin=True)
instance = bigtable_client.instance(bigtable_instance)

if instance.type_ == enums.Instance.Type.DEVELOPMENT:
    raise ValueError("Development instances cannot be scaled.")

cluster = instance.cluster(bigtable_cluster)

current_node_count = cluster.serve_nodes

if scale_up:
    if current_node_count < max_node_count:
        new_node_count = min(
            current_node_count + size_change_step, max_node_count)
        cluster.serve_nodes = new_node_count
        cluster.update()
        print('Scaled up from {} to {} nodes.'.format(
            current_node_count, new_node_count))
else:
    if current_node_count > min_node_count:
        new_node_count = max(
            current_node_count - size_change_step, min_node_count)
        cluster.serve_nodes = new_node_count
        cluster.update()
        print('Scaled down from {} to {} nodes.'.format(
            current_node_count, new_node_count))

After the cluster is resized, you can use the Cloud Console or a Cloud Monitoring custom dashboard to monitor how its performance changes over time.
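The threshold logic shared by both samples can be distilled into a pure function, which makes it easier to unit-test before wiring it to the admin API. The thresholds and limits below are illustrative placeholders, not defaults taken from the samples:

```python
def target_node_count(cpu_load, current_nodes, *,
                      upscale_at=0.7, downscale_at=0.3,
                      step=2, min_nodes=3, max_nodes=30):
    """Return the node count the cluster should move to, given its CPU load.

    Adds `step` nodes (capped at max_nodes) when CPU is above upscale_at,
    removes `step` nodes (floored at min_nodes) when below downscale_at,
    and leaves the cluster unchanged otherwise.
    """
    if cpu_load > upscale_at:
        return min(current_nodes + step, max_nodes)
    if cpu_load < downscale_at:
        return max(current_nodes - step, min_nodes)
    return current_nodes
```

Keeping the decision separate from the resize call also makes it easy to add safeguards, such as the scale-down rate limit discussed under the limitations below.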


Limitations

Before you set up programmatic scaling for your Bigtable cluster, be sure to consider the following limitations.

Delay in performance improvements

After you add nodes to a cluster, it can take up to 20 minutes under load before you see a significant improvement in the cluster's performance. As a result, if your workload involves short bursts of high activity, adding nodes to your cluster based on CPU load will not improve performance—by the time Bigtable rebalances your data, the short burst of activity will be over.

To address this issue, you can add nodes to your cluster, either programmatically or through the Google Cloud Console, before you increase the load on the cluster. This approach gives Bigtable time to rebalance your data across the additional nodes.
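For example, if you expect a nightly batch job, you could run a scheduled task that raises the cluster to a target size well before the job starts. A minimal sketch, assuming the same google-cloud-bigtable client used in the sample above (the instance and cluster IDs are placeholders):

```python
def nodes_to_add(current_nodes, target_nodes):
    """Nodes to add to reach the pre-scale target (never negative)."""
    return max(target_nodes - current_nodes, 0)

def prescale(instance_id, cluster_id, target_nodes):
    """Raise the cluster to target_nodes ahead of an expected load increase."""
    from google.cloud import bigtable  # requires the google-cloud-bigtable package
    client = bigtable.Client(admin=True)
    cluster = client.instance(instance_id).cluster(cluster_id)
    cluster.reload()  # fetch the current serve_nodes value from the service
    step = nodes_to_add(cluster.serve_nodes, target_nodes)
    if step:
        cluster.serve_nodes = cluster.serve_nodes + step
        cluster.update()

# Run prescale() from cron, Cloud Scheduler, or similar, well before the load
# arrives, so Bigtable has time to rebalance data onto the new nodes.
```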

Latency increases caused by scaling down too quickly

When you decrease the number of nodes in a cluster to scale down, try not to reduce the cluster size by more than 10% in a 10-minute period. Scaling down too quickly can cause performance problems, such as increased latency, if the remaining nodes in the cluster become temporarily overwhelmed.
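You can enforce this guideline in your scaling logic by clamping each downscale step. A minimal sketch, assuming the scaler runs at most once per 10-minute window:

```python
def max_safe_removal(current_nodes, fraction=0.10):
    """Largest node count to remove in one 10-minute window under the
    'no more than ~10% per 10 minutes' guideline (0 for small clusters)."""
    return int(current_nodes * fraction)

def next_downscale_size(current_nodes, desired_step, min_nodes=3):
    """Clamp a requested downscale step to the safe per-window limit."""
    step = min(desired_step, max_safe_removal(current_nodes))
    return max(current_nodes - step, min_nodes)
```

With this clamp in place, a request to drop a 30-node cluster by 10 nodes is spread over several windows instead of happening all at once.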

Schema design issues

If there are problems with the schema design for your table, adding nodes to your Bigtable cluster may not improve performance. For example, if you have a large number of reads or writes to a single row in your table, all of the reads or writes will go to the same node in your cluster; as a result, adding nodes will not improve performance. In contrast, if reads and writes are evenly distributed across rows in your table, adding nodes will generally improve performance.
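One common way to spread reads and writes across row ranges is to add a stable hash bucket to the row key so that sequential identifiers do not all land in one contiguous key range. This sketch only illustrates the idea; the key format is hypothetical, and whether key salting suits your workload depends on your read patterns (see the schema design guidance linked below):

```python
import hashlib

def salted_row_key(user_id, num_prefixes=8):
    """Prefix the key with a stable hash bucket so sequential IDs spread
    across multiple key ranges (and thus across nodes)."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % num_prefixes
    return "{:02d}#{}".format(bucket, user_id)
```

The bucket is derived deterministically from the ID itself, so readers can reconstruct the full row key without a lookup table.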

See Designing Your Schema for details about how to design a schema that enables Bigtable to scale effectively.

What's next