Monitoring a Cloud Bigtable Instance

You can monitor your Cloud Bigtable instance visually, using the charts available in the Google Cloud Platform (GCP) Console and Stackdriver Monitoring, or programmatically, using the Stackdriver Monitoring API.

The metrics for a Cloud Bigtable instance do not measure the performance of your application. Use Stackdriver Monitoring to monitor your application's performance.

Understanding CPU and disk usage

No matter what tools you use to monitor your instance, it's essential to monitor the CPU and disk usage for each cluster in the instance. If a cluster's CPU or disk usage exceeds certain thresholds, the cluster will not perform well, and it might return errors when you try to read or write data.

CPU usage

The nodes in your clusters use CPU resources to handle reads, writes, and administrative tasks. To learn more about how the number of nodes affects a cluster's performance, see Performance for typical workloads.

Cloud Bigtable reports the following metrics for CPU usage:

CPU utilization

The average CPU utilization across all nodes in the cluster.

In general, this value should be a maximum of 70%, or 35% if you use replication with multi-cluster routing. This maximum provides headroom for brief spikes in usage.

If a cluster exceeds this maximum value for more than a few minutes, add nodes to the cluster.

CPU utilization of hottest node

CPU utilization for the busiest node in the cluster.

In general, this value should be a maximum of 90%, or 45% if you use replication with multi-cluster routing.

If the hottest node is frequently above the recommended value, even when your average CPU utilization is reasonable, you might be accessing a small part of your data much more frequently than the rest of your data. Check your schema design to make sure it supports even distribution of reads and writes across each table.
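The recommended maximums above are simple enough to encode in a check. The following Python sketch (the helper name and inputs are illustrative, not part of any Cloud Bigtable API) applies the 70%/35% rule for average CPU and the 90%/45% rule for the hottest node:

```python
# Illustrative helper that applies the recommended CPU maximums from
# this page. Not part of the Cloud Bigtable API; the thresholds come
# from the guidance above.

def cpu_over_threshold(avg_cpu, hottest_cpu, multi_cluster_routing=False):
    """Return a list of reasons the cluster may need more nodes.

    avg_cpu, hottest_cpu: utilization as fractions (0.0 to 1.0).
    multi_cluster_routing: True if you use replication with
    multi-cluster routing, which halves the recommended maximums.
    """
    avg_max = 0.35 if multi_cluster_routing else 0.70
    hot_max = 0.45 if multi_cluster_routing else 0.90

    reasons = []
    if avg_cpu > avg_max:
        reasons.append(f"average CPU {avg_cpu:.0%} exceeds {avg_max:.0%}")
    if hottest_cpu > hot_max:
        reasons.append(f"hottest node {hottest_cpu:.0%} exceeds {hot_max:.0%}")
    return reasons

print(cpu_over_threshold(0.75, 0.80))  # average too high
print(cpu_over_threshold(0.40, 0.50, multi_cluster_routing=True))
```

If either check fires for more than a few minutes, the guidance above applies: add nodes, or for a hot node, revisit your schema design.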

Disk usage

For each cluster in your instance, Cloud Bigtable stores a separate copy of all of the tables in that instance.

Cloud Bigtable tracks disk usage in binary units, such as binary gigabytes (GB), where 1 GB is 2^30 bytes. This unit of measurement is also known as a gibibyte (GiB).
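Because disk usage is reported in binary units, converting a raw byte count uses powers of two rather than powers of ten. A quick Python illustration:

```python
# 1 binary gigabyte (GiB) is 2**30 bytes, not 10**9.
GIB = 2 ** 30

def bytes_to_gib(num_bytes):
    """Convert a raw byte count to binary gigabytes (GiB)."""
    return num_bytes / GIB

print(GIB)                    # 1073741824
print(bytes_to_gib(5 * GIB))  # 5.0
```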

Cloud Bigtable reports the following metrics for disk usage:

Storage utilization (bytes)

The amount of data stored in the cluster.

This value affects your costs. Also, as described below, you might need to add nodes to each cluster as the amount of data increases.

Storage utilization (% max)

The percentage of the cluster's storage capacity that is being used. The capacity is based on the number of nodes in your cluster.

In general, this value should be a maximum of 70%. If you do not plan to add significant amounts of data to your cluster, this value can be higher, up to 100%.

If this value is too high, add nodes to the cluster. You can also delete existing data, but deleted data takes up more space, not less, until a compaction occurs.

For details about how this value is calculated, see Storage utilization per node.

Disk load

The percentage of the maximum possible bandwidth for HDD reads and writes that your cluster is using. Available only for HDD clusters.

If this value is frequently at 100%, you might experience increased latency. Add nodes to the cluster to reduce the disk load percentage.
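The 70% storage guidance can be checked the same way as the CPU guidance. This sketch is illustrative only; the cluster's capacity depends on its node count and storage type, so it is supplied as an input rather than looked up:

```python
# Illustrative check of the recommended storage maximum (70%).
# capacity_bytes is the cluster's total storage capacity, which is
# based on the number of nodes; supply it yourself.

def storage_needs_nodes(stored_bytes, capacity_bytes, planning_growth=True):
    """Return True if storage utilization exceeds the recommended maximum.

    If you do not plan to add significant amounts of data, utilization
    can safely run higher (up to 100%), per the guidance above.
    """
    utilization = stored_bytes / capacity_bytes
    limit = 0.70 if planning_growth else 1.0
    return utilization > limit

print(storage_needs_nodes(8 * 2**40, 10 * 2**40))  # 80% used -> True
```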

Getting a performance overview with the GCP Console

Use your instance's overview page to understand the current health of your instance's clusters.

The overview page shows the current values of several key metrics for each cluster:

CPU utilization average

The average CPU utilization across all nodes in the cluster.

CPU utilization of hottest node

CPU utilization for the busiest node in the cluster.

Exceeding the recommended maximum for the busiest node can cause latency and other issues for the cluster.

Rows read

The number of rows read per second.

Rows written

The number of rows written per second.

Read throughput

The number of uncompressed bytes per second that were read.

Write throughput

The number of uncompressed bytes per second that were written.

System error rate

The percentage of all requests that failed on the Cloud Bigtable server side.

Replication latency for input

The 99th-percentile time, in seconds, between a write to another cluster and the replication of that write to this cluster.

Replication latency for output

The 99th-percentile time, in seconds, between a write to this cluster and the replication of that write to another cluster.

To see an overview of these key metrics:

  1. Open the list of Cloud Bigtable instances in the GCP Console.


  2. Click the instance whose metrics you want to view. The GCP Console displays the current metrics for your instance's clusters.

Monitoring performance over time with the GCP Console

Use your instance's monitoring page to understand the past performance of your instance. You can analyze the performance of each cluster, and you can break down the metrics for different types of Cloud Bigtable resources. Charts can display a period ranging from the past hour to the past 30 days.

Charts for Cloud Bigtable resources

The monitoring page provides charts for the following types of Cloud Bigtable resources:

  • Instances
  • Tables
  • Application profiles

Charts are available for the following metrics:

CPU utilization
Available for: instances

The average CPU utilization across all nodes in the cluster.

CPU utilization (hottest node)
Available for: instances

CPU utilization for the busiest node in the cluster.

Exceeding the recommended maximum for the busiest node can cause latency and other issues for the cluster.

System error rate
Available for: instances, tables, app profiles

The percentage of all requests that failed on the Cloud Bigtable server side.

Storage utilization (bytes)
Available for: instances, tables

The amount of data stored in the cluster.

This metric reflects the fact that Cloud Bigtable compresses your data when it is stored.

Storage utilization (% max)
Available for: instances

The percentage of the cluster's storage capacity that is being used. The capacity is based on the number of nodes in your cluster.

For details about how this value is calculated, see Storage utilization per node.

Disk load
Available for: instances

The percentage of the maximum possible bandwidth for HDD reads and writes that your cluster is using. Available only for HDD clusters.

Rows read
Available for: instances, tables, app profiles

The number of rows read per second.

This metric provides a more useful view of Cloud Bigtable's overall throughput than the number of read requests, because a single request can read a large number of rows.

Rows written
Available for: instances, tables, app profiles

The number of rows written per second.

This metric provides a more useful view of Cloud Bigtable's overall throughput than the number of write requests, because a single request can write a large number of rows.

Read requests
Available for: instances, tables, app profiles

The number of random reads and scan requests per second.

Write requests
Available for: instances, tables, app profiles

The number of write requests per second.

Read throughput
Available for: instances, tables, app profiles

The number of uncompressed bytes per second that were read.

Write throughput
Available for: instances, tables, app profiles

The number of uncompressed bytes per second that were written.

Node count
Available for: instances

The number of nodes in the cluster.

To view metrics for these resources:

  1. Open the list of Cloud Bigtable instances in the GCP Console.


  2. Click the instance whose metrics you want to view.

  3. Click the Monitoring tab. The GCP Console displays a series of charts for the instance, as well as a tabular view of the instance's metrics. By default, the GCP Console shows metrics for the past hour.

    To view all of the charts, scroll through the pane where the charts are displayed.

    To view metrics for individual tables or application profiles, click the View metrics for drop-down list, then select Tables or Application profiles.

    To change whether the metrics are aggregated for the instance as a whole or presented separately for each cluster, use the buttons under Group by.

    To view metrics for a longer period of time, click one of the time scales to the upper right of the charts.

Charts for replication

The monitoring page provides a chart that shows replication latency over time. You can view the average latency for replicating writes at the 50th, 99th, and 100th percentiles.

To view the replication latency over time:

  1. Open the list of Cloud Bigtable instances in the GCP Console.


  2. Click the instance whose metrics you want to view.

  3. Click the Monitoring tab.

  4. In the View metrics for drop-down list, select Replication. The GCP Console displays replication latency over time. By default, the GCP Console shows replication latency for the past hour.

    You may see a gray bar covering part of the graph. The bar indicates that replication was not occurring during that period of time, either because there were no incoming writes or because of an issue with the Cloud Bigtable service. Latency metrics during these periods may not be accurate.

    To change whether the metrics are aggregated for the instance as a whole or presented separately for each cluster, click one of the buttons under Group by.

    To change which percentile to view, click one of the buttons under Percentile.

    To view metrics for a longer period of time, click one of the time scales to the upper right of the charts.

Monitoring an instance with Stackdriver Monitoring

Cloud Bigtable exports usage metrics that you can monitor programmatically using Stackdriver Monitoring. You can use the Stackdriver Monitoring API or the Metrics Explorer to track Cloud Bigtable usage metrics. In addition, you can set up alerting policies based on usage metrics, and you can add charts for Cloud Bigtable usage metrics to a custom dashboard.

To view usage metrics in the Metrics Explorer:

  1. Open the Monitoring page in the GCP Console.


    If you are prompted to choose an account, choose the account that you use to access Google Cloud Platform.

  2. Click Resources, then click Metrics Explorer.

  3. Under Find resource type and metric, type bigtable. A list of Cloud Bigtable resources and metrics appears.
  4. Click a metric to view a chart for that metric.

You can also use a graphing library, such as Matplotlib for Python, to plot and analyze the usage metrics for Cloud Bigtable. To learn more, see the tutorial on using Matplotlib with Stackdriver Monitoring and Cloud Bigtable.
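As a starting point for programmatic access, the sketch below builds a Stackdriver Monitoring time-series filter for a Cloud Bigtable CPU metric. The metric type (`bigtable.googleapis.com/cluster/cpu_load`) and the `instance` resource label are taken from the published metrics list, but verify them for your case; the client call itself is shown only in comments because it requires the `google-cloud-monitoring` package and credentials:

```python
# Sketch: building a Stackdriver Monitoring query for a Cloud Bigtable
# usage metric. The metric type and resource label below are assumptions
# based on the published metrics list; double-check them before use.

def bigtable_cpu_filter(instance_id):
    """Return a Monitoring API filter for a cluster CPU utilization metric."""
    return (
        'metric.type = "bigtable.googleapis.com/cluster/cpu_load" '
        'AND resource.labels.instance = "{}"'.format(instance_id)
    )

# With the google-cloud-monitoring client installed and credentials
# configured, the filter would be used roughly like this (untested sketch;
# my_project and my_instance are placeholders):
#
#   import time
#   from google.cloud import monitoring_v3
#
#   client = monitoring_v3.MetricServiceClient()
#   now = int(time.time())
#   interval = monitoring_v3.TimeInterval(
#       end_time={"seconds": now},
#       start_time={"seconds": now - 3600},  # the past hour
#   )
#   results = client.list_time_series(
#       request={
#           "name": f"projects/{my_project}",
#           "filter": bigtable_cpu_filter(my_instance),
#           "interval": interval,
#           "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
#       }
#   )

print(bigtable_cpu_filter("my-instance"))
```

The returned time series could then be fed to an alerting check or plotted with a library such as Matplotlib, as described above.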

For additional information about using Stackdriver Monitoring, see the Stackdriver Monitoring documentation.
