You can monitor Cloud Bigtable visually, using the charts that are available in the Google Cloud Console and Cloud Monitoring, or programmatically, using the Cloud Monitoring API.
The data available through the Google Cloud Console and Cloud Monitoring provides a high-level overview of your Cloud Bigtable usage. You can also use the Key Visualizer tool to drill down into your access patterns by row key and troubleshoot specific performance issues. For details, see Getting Started with Key Visualizer.
Understanding CPU and disk usage
No matter what tools you use to monitor your instance, it's essential to monitor the CPU and disk usage for each cluster in the instance. If a cluster's CPU or disk usage exceeds certain thresholds, the cluster will not perform well, and it might return errors when you try to read or write data.
CPU usage
The nodes in your clusters use CPU resources to handle reads, writes, and administrative tasks. To learn more about how the number of nodes affects a cluster's performance, see Performance for typical workloads.
Cloud Bigtable reports the following metrics for CPU usage:
Metric | Description |
---|---|
Average CPU utilization | The average CPU utilization across all nodes in the cluster. The recommended maximum values provide headroom for brief spikes in usage. If a cluster exceeds the recommended maximum value for your configuration for more than a few minutes, add nodes to the cluster. |
CPU utilization of hottest node | CPU utilization for the busiest node in the cluster. If the hottest node is frequently above the recommended value, even when your average CPU utilization is reasonable, you might be accessing a small part of your data much more frequently than the rest of your data. |
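The hottest-node guidance in the table above can be sketched as a simple check over paired samples. This is a minimal illustration, not part of any Cloud Bigtable API: the function name `hotspot_suspected`, the `hottest_limit` parameter, and the `min_fraction` cutoff used to interpret "frequently" are all assumptions made for the example.

```python
def hotspot_suspected(avg_samples, hottest_samples, avg_limit, hottest_limit,
                      min_fraction=0.5):
    """Return True if the hottest node is often over its limit while the
    cluster-wide average stays within its own limit.

    All values are fractions between 0 and 1. `hottest_limit` and
    `min_fraction` are hypothetical tuning values, not documented thresholds.
    """
    pairs = list(zip(avg_samples, hottest_samples))
    if not pairs:
        return False
    # Count samples where only the hottest node is overloaded.
    flagged = sum(
        1 for avg, hot in pairs if hot > hottest_limit and avg <= avg_limit
    )
    return flagged / len(pairs) >= min_fraction


# Average CPU is comfortable, but one node is repeatedly near saturation.
avg = [0.40, 0.45, 0.50, 0.42]
hottest = [0.95, 0.92, 0.60, 0.97]
print(hotspot_suspected(avg, hottest, avg_limit=0.70, hottest_limit=0.90))
```

A positive result suggests an uneven access pattern rather than a cluster that is simply too small, which is the case where a tool such as Key Visualizer is more useful than adding nodes.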
The values for these metrics should not exceed the following:
Configuration | Recommended maximum values |
---|---|
Single cluster | 70% average CPU utilization |
Any number of clusters with single-cluster routing | 70% average CPU utilization |
2 clusters with multi-cluster routing | 35% average CPU utilization |
3 or more clusters with multi-cluster routing | Depends on your configuration. See the examples of replication settings for common use cases. |
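As a rough sketch of how the recommended maximums above might be applied, the following code flags a cluster whose average CPU utilization stays over the limit for several consecutive samples. The configuration keys, the function names, and the choice of five consecutive one-minute samples as a stand-in for "more than a few minutes" are assumptions made for illustration.

```python
# Recommended maximum average CPU utilization, taken from the table above.
# The dictionary keys are hypothetical labels chosen for this example.
RECOMMENDED_MAX_AVG_CPU = {
    "single_cluster": 0.70,
    "single_cluster_routing": 0.70,
    "two_clusters_multi_cluster_routing": 0.35,
}


def exceeds_for(samples, limit, min_consecutive):
    """Return True if `samples` stays above `limit` for at least
    `min_consecutive` consecutive samples."""
    run = 0
    for value in samples:
        run = run + 1 if value > limit else 0
        if run >= min_consecutive:
            return True
    return False


def should_add_nodes(config, avg_cpu_samples, min_consecutive=5):
    """Apply the recommended maximum for `config` to a series of average
    CPU utilization samples (fractions between 0 and 1)."""
    limit = RECOMMENDED_MAX_AVG_CPU[config]
    return exceeds_for(avg_cpu_samples, limit, min_consecutive)


# One sample per minute: a brief dip, then five minutes above 70%.
samples = [0.50, 0.75, 0.80, 0.72, 0.71, 0.90]
print(should_add_nodes("single_cluster", samples))
```

Requiring a run of consecutive samples, rather than reacting to a single reading, mirrors the headroom that the recommended values leave for brief spikes.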
Disk usage
For each cluster in your instance, Cloud Bigtable stores a separate copy of all of the tables in that instance.
Cloud Bigtable tracks disk usage in binary units, such as binary gigabytes (GB), where 1 GB is 2^30 bytes. This unit of measurement is also known as a gibibyte (GiB).
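To make the unit concrete, the short sketch below converts a raw byte count into binary gigabytes. The helper name `bytes_to_gib` is made up for this example.

```python
BYTES_PER_GIB = 2 ** 30  # 1,073,741,824 bytes in one binary gigabyte (GiB)


def bytes_to_gib(num_bytes):
    """Convert a byte count to binary gigabytes (GiB)."""
    return num_bytes / BYTES_PER_GIB


# A decimal gigabyte (10**9 bytes) is smaller than a binary gigabyte.
print(round(bytes_to_gib(10 ** 9), 3))
```

The difference matters when you compare Cloud Bigtable's reported storage with figures from tools that use decimal units.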
Cloud Bigtable reports the following metrics for disk usage:
Metric | Description |
---|---|
Storage utilization (bytes) | The amount of data stored in the cluster. This value affects your costs. Also, as described below, you might need to add nodes to each cluster as the amount of data increases. |
Storage utilization (% max) | The percentage of the cluster's storage capacity that is being used. The capacity is based on the number of nodes in your cluster. In general, do not use more than 70% of the hard limit on total storage, so you have room to add more data. If you do not plan to add significant amounts of data to your instance, you can use up to 100% of the hard limit. If you are using more than the recommended percentage of the storage limit, add nodes to the cluster. You can also delete existing data, but deleted data takes up more space, not less, until a compaction occurs. For details about how this value is calculated, see Storage utilization per node. |
Disk load | The percentage your cluster is using of the maximum possible bandwidth for HDD reads and writes. Available only for HDD clusters. If this value is frequently at 100%, you might experience increased latency. Add nodes to the cluster to reduce the disk load percentage. |
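The 70% guideline for storage utilization can be turned into a quick sizing calculation. The sketch below is illustrative only: the per-node capacity is passed in as a parameter because this page does not state it, and the 1 TiB figure in the usage lines is a made-up placeholder, not a Cloud Bigtable limit.

```python
import math


def storage_utilization(bytes_used, node_count, capacity_per_node_bytes):
    """Fraction of the cluster's storage capacity in use, where capacity
    scales with the number of nodes."""
    return bytes_used / (node_count * capacity_per_node_bytes)


def nodes_for_target(bytes_used, capacity_per_node_bytes, target=0.70):
    """Smallest node count that keeps utilization at or below `target`."""
    return math.ceil(bytes_used / (capacity_per_node_bytes * target))


# Hypothetical numbers: 3 TiB stored on 4 nodes, with 1 TiB per node
# assumed purely for illustration.
TIB = 2 ** 40
used = 3 * TIB
print(storage_utilization(used, node_count=4, capacity_per_node_bytes=TIB))
print(nodes_for_target(used, capacity_per_node_bytes=TIB))
```

Substitute the actual per-node storage limit for your storage type, as described in Storage utilization per node, before relying on the result.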
Getting a performance overview with the Cloud Console
Use your instance's overview page to understand the current status of your instance's clusters.
The overview page shows the current values of several key metrics for each cluster:
Metric | Description |
---|---|
CPU utilization average | The average CPU utilization across all nodes in the cluster. |
CPU utilization of hottest node | CPU utilization for the busiest node in the cluster. Exceeding the recommended maximum for the busiest node can cause latency and other issues for the cluster. |
Rows read | The number of rows read per second. |
Rows written | The number of rows written per second. |
Read throughput | The number of uncompressed bytes per second of response data sent. This metric refers to the full amount of data that is returned after filters are applied. |
Write throughput | The number of uncompressed bytes per second that were received when data was written. |
System error rate | The percentage of all requests that failed on the Cloud Bigtable server side. |
Replication latency for input | The highest amount of time at the 99th percentile, in seconds, for a write to another cluster to be replicated to this cluster. |
Replication latency for output | The highest amount of time at the 99th percentile, in seconds, for a write to this cluster to be replicated to another cluster. |
To see an overview of these key metrics:
Open the list of Cloud Bigtable instances in the Cloud Console.
Click the instance whose metrics you want to view. The Cloud Console displays the current metrics for your instance's clusters.
Monitoring performance over time with the Cloud Console
Use your instance's monitoring page to understand the past performance of your instance. You can analyze the performance of each cluster, and you can break down the metrics for different types of Cloud Bigtable resources. Charts can display a period ranging from the past 1 hour to the past 30 days.
Charts for Cloud Bigtable resources
The monitoring page provides charts for the following types of Cloud Bigtable resources:
- Instances
- Tables
- Application profiles
Charts are available for the following metrics:
Metric | Available for | Description |
---|---|---|
CPU utilization | Instances | The average CPU utilization across all nodes in the cluster. |
CPU utilization (hottest node) | Instances | CPU utilization for the busiest node in the cluster. Exceeding the recommended maximum for the busiest node can cause latency and other issues for the cluster. |
User error rate | Instances | The rate of errors caused by the content of a request, as opposed to errors on the Cloud Bigtable server side. The user error rate includes the following status codes: User errors are typically caused by a configuration issue, such as a request that specifies the wrong cluster, table, or app profile. |
System error rate | Instances | The percentage of all requests that failed on the Cloud Bigtable server side. The system error rate includes the following status codes: |
Automatic failovers | Instances, Tables, App profiles | The number of requests that were automatically rerouted from one cluster to another due to a failover scenario, such as a brief outage or delay. Automatic rerouting can occur if an app profile uses multi-cluster routing. This chart does not include manually rerouted requests. |
Storage utilization (bytes) | Instances, Tables | The amount of data stored in the cluster. This metric reflects the fact that Cloud Bigtable compresses your data when it is stored. |
Storage utilization (% max) | Instances | The percentage of the cluster's storage capacity that is being used. The capacity is based on the number of nodes in your cluster. For details about how this value is calculated, see Storage utilization per node. |
Disk load | Instances | The percentage your cluster is using of the maximum possible bandwidth for HDD reads and writes. Available only for HDD clusters. |
Rows read | Instances, Tables, App profiles | The number of rows read per second. This metric provides a more useful view of Cloud Bigtable's overall throughput than the number of read requests, because a single request can read a large number of rows. |
Rows written | Instances, Tables, App profiles | The number of rows written per second. This metric provides a more useful view of Cloud Bigtable's overall throughput than the number of write requests, because a single request can write a large number of rows. |
Read requests | Instances, Tables, App profiles | The number of random reads and scan requests per second. |
Write requests | Instances, Tables, App profiles | The number of write requests per second. |
Read throughput | Instances, Tables, App profiles | The number of uncompressed bytes per second of response data sent. This metric refers to the full amount of data that is returned after filters are applied. |
Write throughput | Instances, Tables, App profiles | The number of uncompressed bytes per second that were received when data was written. |
Node count | Instances | The number of nodes in the cluster. |
To view metrics for these resources:
Open the list of Cloud Bigtable instances in the Cloud Console.
Click the instance whose metrics you want to view.
In the left pane, click Monitoring. The Cloud Console displays a series of charts for the instance, as well as a tabular view of the instance's metrics. By default, the Cloud Console shows metrics for the past hour, and it shows separate metrics for each cluster in the instance.
To view all of the charts, scroll through the pane where the charts are displayed.
To view metrics at the table level, click Tables.
To view metrics for individual app profiles, click Application Profiles.
To view combined metrics for the instance as a whole, find the Group by section above the charts, then click Instance.
To view metrics for a longer period of time, click the arrow next to 1 Hour. Choose a pre-set time range or enter a custom time range, then click Apply.
Charts for replication
The monitoring page provides a chart that shows replication latency over time. You can view the average latency for replicating writes at the 50th, 99th, and 100th percentiles.
To view the replication latency over time:
Open the list of Cloud Bigtable instances in the Cloud Console.
Click the instance whose metrics you want to view.
In the left pane, click Monitoring. The page opens with the Instance tab selected.
Click the Replication tab. The Cloud Console displays replication latency over time. By default, the Cloud Console shows replication latency for the past hour.
To toggle between latency charts grouped by table or by cluster, use the Group by menu.
To change which percentile to view, use the Percentile menu.
To view metrics for a longer period of time, click the arrow next to 1 Hour. Choose a pre-set time range or enter a custom time range, then click Apply.
Monitoring an instance with Cloud Monitoring
Cloud Bigtable exports usage metrics that you can monitor programmatically using Cloud Monitoring. You can use the Cloud Monitoring API or the Metrics Explorer to track Cloud Bigtable usage metrics. In addition, you can set up alerting policies based on usage metrics, and you can add charts for Cloud Bigtable usage metrics to a custom dashboard.
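As a hedged sketch of programmatic access, the following code reads recent CPU load values through the Cloud Monitoring API using the `google-cloud-monitoring` Python client. It assumes that the client library is installed, that application default credentials are configured, and that the metric type `bigtable.googleapis.com/cluster/cpu_load` and the `instance` and `cluster` resource labels apply to your project. The function names, project ID, and instance ID are placeholders.

```python
import time

# Assumed metric type for average CPU load; verify it in Metrics Explorer.
CPU_LOAD_METRIC = "bigtable.googleapis.com/cluster/cpu_load"


def build_filter(metric_type, instance_id):
    """Build a Cloud Monitoring filter for one metric on one instance."""
    return (
        f'metric.type = "{metric_type}" '
        f'AND resource.labels.instance = "{instance_id}"'
    )


def fetch_cpu_load(project_id, instance_id, minutes=60):
    """Return {cluster_id: [cpu_load, ...]} for the last `minutes` minutes."""
    # Imported here so the filter helper works without the client library.
    from google.cloud import monitoring_v3

    client = monitoring_v3.MetricServiceClient()
    now = int(time.time())
    interval = monitoring_v3.TimeInterval(
        {
            "end_time": {"seconds": now},
            "start_time": {"seconds": now - minutes * 60},
        }
    )
    series_list = client.list_time_series(
        request={
            "name": f"projects/{project_id}",
            "filter": build_filter(CPU_LOAD_METRIC, instance_id),
            "interval": interval,
            "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
        }
    )
    loads = {}
    for series in series_list:
        cluster = series.resource.labels["cluster"]
        loads[cluster] = [point.value.double_value for point in series.points]
    return loads


# Example call (requires credentials and real IDs):
# print(fetch_cpu_load("my-project", "my-instance"))
```

The same pattern should work for other Cloud Bigtable metrics by changing the metric type in the filter. The returned values can then feed an alerting check or a plotting library.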
To view usage metrics in the Metrics Explorer:
Open the Monitoring page in the Cloud Console.
If you are prompted to choose an account, choose the account that you use to access Google Cloud.
Click Resources, then click Metrics Explorer.
Under Find resource type and metric, type bigtable. A list of Cloud Bigtable resources and metrics appears.
Click a metric to view a chart for that metric.
You can also use a graphing library, such as Matplotlib for Python, to plot and analyze the usage metrics for Cloud Bigtable.
For additional information about using Cloud Monitoring, see the Cloud Monitoring documentation.
What's next
- Learn how to programmatically scale your Cloud Bigtable cluster.
- Find out how to troubleshoot issues with Key Visualizer.
- Learn more about Cloud Bigtable performance.
- Read about client-side metrics for the HBase client for Java.
- Try the Cloud Monitoring quickstart.
- Learn about creating alerts based on Cloud Bigtable metrics.