Monitoring cloud volumes

Cloud Monitoring provides metrics for monitoring cloud volumes. The metrics fall into three categories: volume capacity and operations metrics, backup metrics, and replication metrics.

You can select and chart individual metrics in Metrics Explorer, create a dashboard with multiple charts, add alerting, or retrieve metrics data with the Cloud Monitoring API.

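Cloud Volumes Service metrics are listed under the resource type Monitored Resource for NetApp CVS, which has the identifier cloudvolumesgcp-api.netapp.com/CloudVolume. Individual metric types use the prefix cloudvolumesgcp-api.netapp.com/cloudvolume/, for example cloudvolumesgcp-api.netapp.com/cloudvolume/volume_usage.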

Metrics are sampled over a 60-second or 300-second period and pushed to Cloud Monitoring every minute.

Each volume resource also has labels that can be used to filter or group the volumes.

Resource labels

Label name    Description  CVS  CVS-Performance
location      Region/zone information.
volume_id     ID of the volume.
name          Name of the volume or replication relationship.
service_type  Service type of the volume or replication relationship.
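
The labels above can be combined with a metric type into a filter string of the kind the Cloud Monitoring API accepts when listing time series. The following is a minimal sketch, not an official client: the helper name build_filter and the example region value are hypothetical, and the metric prefix is taken from the alerting query later on this page.

```python
# Resource type and metric prefix as they appear on this page.
RESOURCE_TYPE = "cloudvolumesgcp-api.netapp.com/CloudVolume"
METRIC_PREFIX = "cloudvolumesgcp-api.netapp.com/cloudvolume/"


def build_filter(metric_name, **resource_labels):
    """Build a Cloud Monitoring filter string for one Cloud Volumes metric.

    metric_name is a short name such as "volume_usage"; resource_labels are
    optional resource label constraints (location, volume_id, name,
    service_type). This helper is illustrative only.
    """
    clauses = [
        f'resource.type = "{RESOURCE_TYPE}"',
        f'metric.type = "{METRIC_PREFIX}{metric_name}"',
    ]
    # Sort the labels so the resulting string is deterministic.
    for key in sorted(resource_labels):
        clauses.append(f'resource.labels.{key} = "{resource_labels[key]}"')
    return " AND ".join(clauses)


# Example: usage of all volumes in one (example) region.
print(build_filter("volume_usage", location="us-central1"))
```

The resulting string is meant to be passed as the filter argument of a time series list request; the request itself is omitted here because it depends on your project and credentials.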

Volume capacity/operations metrics

Metric name        Description  CVS  CVS-Performance
volume_usage       Space utilized by the volume, in bytes; the actual size of the volume.
volume_size        Space allocated to the volume, in bytes; the provisioned size of the volume.
operation_count    Number of operations per second performed on the cloud volume by end users.
read_bytes_count   I/O bytes from read operations by end users.
write_bytes_count  I/O bytes from write operations by end users.
request_latencies  Latency of I/O operation requests on the volume, in milliseconds. This is latency at the storage level; it doesn't include network latency to the client.
inode_allocation   Number of file and directory inodes allocated for the volume (hard cap); based on the allocated capacity (size) of the volume.
inode_usage        Number of inodes in use on the volume.
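
The capacity and inode metrics are most useful as ratios: volume_usage against volume_size, and inode_usage against inode_allocation. A minimal sketch of that arithmetic follows; the helper name utilization and the sample values are made up for illustration and do not come from a real volume.

```python
def utilization(used, allocated):
    """Return used / allocated as a fraction, or None if allocated is not positive."""
    if allocated <= 0:
        return None
    return used / allocated


GIB = 1024 ** 3

# Hypothetical latest data points for one volume.
sample = {
    "volume_usage": 750 * GIB,
    "volume_size": 1024 * GIB,
    "inode_usage": 1_200_000,
    "inode_allocation": 20_000_000,
}

capacity_ratio = utilization(sample["volume_usage"], sample["volume_size"])
inode_ratio = utilization(sample["inode_usage"], sample["inode_allocation"])

print(f"capacity used: {capacity_ratio:.1%}")
print(f"inodes used:   {inode_ratio:.1%}")
```

Tracking both ratios matters because a volume holding many small files can reach its inode cap well before it runs out of bytes.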

Backup metrics

Metric name              Description  CVS  CVS-Performance
logical_bytes_backed_up  Logical bytes backed up (baseline and incremental changes).  x

Replication metrics

Metric name                         Description  CVS  CVS-Performance
replication_healthy                 Health of replication relationships: 1/TRUE for healthy and 0/FALSE for unhealthy.  x
replication_lag_time                Elapsed time since the exported snapshot copy (last complete transfer) was created on the destination.  x
replication_last_transfer_duration  Duration of the last transfer job.  x
replication_last_transfer_size      Number of bytes transferred for the last data transfer job.  x
replication_relationship_progress   Number of bytes transferred so far for the current data transfer job.  x
replication_relationship_status     Status of replication: 1/TRUE for transferring, 0/FALSE for idle.  x
replication_total_transfer_bytes    Cumulative number of bytes transferred for the relationship since it was created.  x
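
As an illustration of how the replication metrics can be read together, the sketch below flags a relationship that is unhealthy or whose lag exceeds a limit you choose. The class and function names are hypothetical, the sample values are invented, and the lag unit is assumed to be seconds, which this page does not state.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReplicationSample:
    """Latest values of selected replication metrics for one relationship."""
    name: str
    replication_healthy: int              # 1 = healthy, 0 = unhealthy
    replication_lag_time: float           # assumed to be in seconds
    replication_relationship_status: int  # 1 = transferring, 0 = idle


def attention_reasons(sample: ReplicationSample, max_lag: float) -> List[str]:
    """Return the reasons, if any, why a relationship deserves a closer look."""
    reasons = []
    if sample.replication_healthy == 0:
        reasons.append("unhealthy")
    if sample.replication_lag_time > max_lag:
        state = "transferring" if sample.replication_relationship_status == 1 else "idle"
        reasons.append(f"lag above limit while {state}")
    return reasons


# Hypothetical relationship with a two-hour lag, checked against a one-hour limit.
sample = ReplicationSample("example-relationship", 1, 7200.0, 0)
print(attention_reasons(sample, max_lag=3600.0))
```

A high lag while the relationship is idle usually calls for a different follow-up than a high lag during an active transfer, which is why the status is included in the reason.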

Monitoring and alerting for out-of-space conditions

Starting in November 2021, Cloud Volumes Service limits growth beyond the allocated size for volumes of the CVS-Performance service type.

An application or user that writes more data to the volume than its allocated size receives an out-of-space error, which can cause application failures.

To implement monitoring and alerting for out-of-space conditions, use Cloud Monitoring alerting.

Follow the Cloud Monitoring instructions for setting up an alert, and use the following Monitoring Query Language (MQL) condition:

    fetch cloudvolumesgcp-api.netapp.com/CloudVolume
    | {
       metric 'cloudvolumesgcp-api.netapp.com/cloudvolume/volume_usage'
       | filter (metric.type == 'logical')
       ;
       metric 'cloudvolumesgcp-api.netapp.com/cloudvolume/volume_size'
    } | join | div
    | group_by sliding(5m), max(val())
    | condition val() > 0.8

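The query divides volume_usage by volume_size, takes the maximum ratio over a sliding 5-minute window, and the last line of the condition compares that value against a threshold. In this example, the threshold is 0.8, which corresponds to 80% of the allocated capacity in use.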

When a volume's usage exceeds the threshold, you get an alert. To make sure that volumes don't fill more quickly than you can react, consider using the recommended threshold of 80% (0.8). If your environment is write-intensive, use a lower threshold so that the alert fires earlier.
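
To make the logic of the condition concrete, the following sketch reproduces it on in-memory data: join the two series on timestamp, divide, take the maximum over a sliding 5-minute window, and compare against the threshold. It is an approximation for illustration only. The function names are hypothetical, the data points are invented, and the exact window alignment used by Cloud Monitoring may differ.

```python
def ratio_series(usage, size):
    """Join two {timestamp_seconds: value} series on timestamp and divide usage by size."""
    return {t: usage[t] / size[t] for t in sorted(usage) if t in size and size[t] > 0}


def sliding_max(series, window_seconds=300):
    """For each timestamp, return the maximum over the window ending at that timestamp."""
    times = sorted(series)
    return {
        t: max(series[u] for u in times if t - window_seconds < u <= t)
        for t in times
    }


def breaches(usage, size, threshold=0.8, window_seconds=300):
    """Return the timestamps at which the windowed usage ratio exceeds the threshold."""
    windowed = sliding_max(ratio_series(usage, size), window_seconds)
    return [t for t, value in windowed.items() if value > threshold]


# Invented samples: a volume of size 100 with a short spike to 85% usage.
usage = {0: 70, 60: 85, 120: 75, 480: 75}
size = {0: 100, 60: 100, 120: 100, 480: 100}
print(breaches(usage, size))
```

Because the window keeps the maximum, a brief spike above the threshold keeps the condition true for as long as the spike remains inside the 5-minute window, even after usage has dropped again.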

If you receive an alert, do one of the following:

  • Increase the volume size using the Google Cloud Console or the API.
  • Change the volume size in your declaration when using a GitOps system, such as Terraform.
  • Automatically adjust volume size with the GCP-CVS-CapacityManager script.
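
Whichever option you choose, you need a target size. One possible way to pick it is to compute the smallest size at which current usage falls back to the alert threshold or below. The sketch below shows that calculation; it is not the logic of the GCP-CVS-CapacityManager script, the function name is hypothetical, and the 1 GiB rounding step is an assumption, so check the size increments that your service type allows.

```python
GIB = 1024 ** 3


def recommended_size(usage_bytes, current_size_bytes, target_percent=80, step_bytes=GIB):
    """Return the smallest size, rounded up to step_bytes, at which
    usage is at most target_percent of the size. Never returns less
    than the current size. Uses integer arithmetic to avoid rounding drift.
    """
    if not 0 < target_percent <= 100:
        raise ValueError("target_percent must be in the range 1-100")
    # Ceiling division: size needed so that usage / size <= target_percent / 100.
    needed = -(-usage_bytes * 100 // target_percent)
    # Round up to the next multiple of step_bytes.
    rounded = -(-needed // step_bytes) * step_bytes
    return max(rounded, current_size_bytes)


# Invented example: 900 GiB used on a 1024 GiB volume.
new_size = recommended_size(900 * GIB, 1024 * GIB)
print(new_size // GIB, "GiB")
```

Sizing to exactly the threshold leaves no headroom before the next alert, so in practice you may prefer a lower target_percent than the alert threshold.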

What's next