Monitor cloud volumes

Last reviewed 2023-02-24 UTC

Cloud Monitoring provides metrics for monitoring cloud volumes in four categories: volume capacity and operations, storage pools, backups, and replication.

You can select and chart individual metrics in Metrics Explorer, create a dashboard with multiple charts, add alerting, or retrieve metrics data with the Cloud Monitoring API.

Cloud Volumes Service metrics are under the resource type Monitored Resource for NetApp CVS. These metrics use the resource type prefix cloudvolumesgcp-api.netapp.com/CloudVolume.

Storage pool metrics are under the resource type Monitored Resource for NetApp CVS storage pool. These metrics use the resource type prefix cloudvolumesgcp-api.netapp.com/CloudVolumePool.

Metrics are sampled and pushed to Cloud Monitoring every 5 minutes. For accurate results, when you select a metric in Metrics Explorer, set the minimum alignment period to 5 minutes or longer so that it matches the sampling interval.

Each volume resource also has labels that can be used to filter or group the volumes.

Resource labels

Label name	Description	CVS	CVS-Performance
`location`	Region/zone information.	✓	✓
`volume_id`	ID of the volume.	✓	✓
`name`	Name of the volume or replication relationship.	✓	✓
`service_type`	Service type of the volume or replication relationship.	✓	✓
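
As a rough illustration of how the resource type, metric names, labels, and the 5-minute alignment period fit together, the following Python sketch builds a request URL for the Cloud Monitoring API `projects.timeSeries.list` method. It only constructs the URL; it doesn't send the request or handle authentication. The helper names `build_filter` and `build_list_url` are hypothetical, the metric prefix is taken from the MQL example on this page, and the choice of `ALIGN_MEAN` assumes that the metric is a numeric gauge.

```python
from urllib.parse import urlencode

# Resource type named on this page, and the metric prefix used in the MQL example.
RESOURCE_TYPE = "cloudvolumesgcp-api.netapp.com/CloudVolume"
METRIC_PREFIX = "cloudvolumesgcp-api.netapp.com/cloudvolume/"


def build_filter(metric_name, **resource_labels):
    """Build a Monitoring filter for one volume metric, narrowed by resource labels."""
    parts = [
        f'metric.type = "{METRIC_PREFIX}{metric_name}"',
        f'resource.type = "{RESOURCE_TYPE}"',
    ]
    # Sort the labels so that the resulting filter string is deterministic.
    for label, value in sorted(resource_labels.items()):
        parts.append(f'resource.labels.{label} = "{value}"')
    return " AND ".join(parts)


def build_list_url(project_id, metric_name, start_time, end_time, **resource_labels):
    """Build a timeSeries.list URL that aligns data to the 5-minute sampling interval."""
    params = {
        "filter": build_filter(metric_name, **resource_labels),
        "interval.startTime": start_time,  # RFC 3339 timestamp
        "interval.endTime": end_time,      # RFC 3339 timestamp
        "aggregation.alignmentPeriod": "300s",  # 5 minutes
        "aggregation.perSeriesAligner": "ALIGN_MEAN",  # assumes a numeric gauge metric
    }
    base = f"https://monitoring.googleapis.com/v3/projects/{project_id}/timeSeries"
    return f"{base}?{urlencode(params)}"
```

For example, `build_filter("volume_usage", service_type="CVS-Performance")` restricts the query to volumes whose `service_type` label has that value. The label value shown here is a placeholder; check the values that your own volumes report.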

Volume capacity/operations metrics

Metric name	Description	CVS	CVS-Performance
`volume_usage`	Space utilized by the volume, in bytes; the actual size of the volume.	✓	✓
`volume_size`	Space allocated to the volume, in bytes; the provisioned size of the volume.	✓	✓
`volume_percent_used`	Percentage of allocated space used by the volume.	✓	✓
`operation_count`	Number of operations per second being performed on the cloud volume by the end users.	✓	✓
`read_bytes_count`	I/O bytes from read operations by the end user.	✓	✓
`write_bytes_count`	I/O bytes from write operations by the end user.	✓	✓
`request_latencies`	The volume's responsiveness for I/O operation requests in milliseconds. This is latency at the storage level. It doesn't include network latency to the client.	✓	✓
`inode_allocation`	Number of file and directory inodes allocated for the volume (hard cap); based on the allocated capacity (size) of the volume.	x	✓
`inode_usage`	Number of inodes in use on the volume.	x	✓

Storage pool metrics

Metric name	Description	CVS	CVS-Performance
`usage`	Space used by all of the volumes in the storage pool, in bytes.	✓	x
`size`	Space allocated to the pool, in bytes.	✓	x

Backup metrics

Metric name	Description	CVS	CVS-Performance
`logical_bytes_backed_up`	Logical bytes backed up (baseline and incremental changes).	✓	x

Replication metrics

Metric name	Description	CVS	CVS-Performance
`replication_healthy`	Health of replication relationships: 1/TRUE for healthy and 0/FALSE for unhealthy.	x	✓
`replication_lag_time`	Elapsed time since the exported snapshot copy (last complete transfer) was created on the destination.	x	✓
`replication_last_transfer_duration`	Duration of the last transfer job.	x	✓
`replication_last_transfer_size`	Number of bytes transferred for the last data transfer job.	x	✓
`replication_relationship_progress`	Number of bytes transferred so far for the current data transfer job.	x	✓
`replication_relationship_status`	Status of replication: 1/TRUE for transferring, 0/FALSE for idle.	x	✓
`replication_total_transfer_bytes`	Cumulative number of bytes transferred for the relationship since it was created.	x	✓

Implement monitoring and alerting for out-of-space conditions

Cloud Volumes Service prevents volumes of the CVS-Performance service type from growing beyond their allocated size.

An application or user that tries to write more data to a volume than its allocated size receives an out-of-space error, which can cause application problems.

To implement monitoring and alerting for out-of-space conditions, use Cloud Monitoring alerting.

Follow the Cloud Monitoring instructions to set up an alert, and use the following MQL condition:

    fetch cloudvolumesgcp-api.netapp.com/CloudVolume
    | {
       metric 'cloudvolumesgcp-api.netapp.com/cloudvolume/volume_usage'
       | filter (metric.type == 'logical')
       ;
       metric 'cloudvolumesgcp-api.netapp.com/cloudvolume/volume_size'
    } | join | div
    | group_by sliding(5m), max(val())
    | condition val() > 0.8

The last line of the condition checks against a threshold. In this example, the threshold is 0.8, which corresponds to 80% of capacity used for the volumes.

When a volume's usage exceeds the threshold, you receive an alert. To make sure that volumes don't fill more quickly than you can react, consider using the recommended threshold of 80% (0.8). If your environment is write-intensive, use a lower threshold.
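
The arithmetic behind the threshold can be sketched in a few lines of Python. This is a minimal illustration, not part of the service: the function names are hypothetical, and the growth rate is a value that you would estimate from your own `volume_usage` history.

```python
def usage_ratio(usage_bytes, size_bytes):
    """Return the used fraction of a volume, like the join and div steps in the MQL condition."""
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    return usage_bytes / size_bytes


def exceeds_threshold(usage_bytes, size_bytes, threshold=0.8):
    """Return True when usage is strictly above the alert threshold."""
    return usage_ratio(usage_bytes, size_bytes) > threshold


def hours_until_full(usage_bytes, size_bytes, growth_bytes_per_hour):
    """Estimate how long until the volume is full at a constant growth rate."""
    if growth_bytes_per_hour <= 0:
        return float("inf")  # usage isn't growing, so the volume never fills
    return max(size_bytes - usage_bytes, 0) / growth_bytes_per_hour
```

For a 1024 GiB volume with 850 GiB used, the ratio is about 0.83, so the condition fires. If usage grows by 10 GiB per hour, the remaining 174 GiB lasts about 17.4 hours. If that is less time than you need to react, a lower threshold is a better fit.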

If you receive an alert, do one of the following:

Increase the volume size using the Google Cloud console or the API.
Change the volume size in your declaration when using a GitOps system, such as Terraform.
Automatically adjust volume size with the GCP-CVS-CapacityManager script.
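
Whichever option you choose, you need a new size that brings usage back under the threshold. The following Python sketch shows one way to calculate it. It is an assumption for illustration and does not describe how the GCP-CVS-CapacityManager script works. The function name, the 70% target, and the 1 GiB rounding step are all hypothetical, and the sketch ignores any minimum or maximum volume sizes that the service enforces.

```python
GIB = 1024 ** 3


def size_for_target(usage_bytes, target_percent=70, step_bytes=GIB):
    """Return the smallest multiple of step_bytes at which usage is at most target_percent."""
    if not 0 < target_percent <= 100:
        raise ValueError("target_percent must be in the range (0, 100]")
    if step_bytes <= 0:
        raise ValueError("step_bytes must be positive")
    # Integer ceiling division avoids floating-point rounding errors.
    needed = -(-usage_bytes * 100 // target_percent)
    return -(-needed // step_bytes) * step_bytes
```

With 850 GiB in use and a 70% target, the function returns 1215 GiB, which leaves usage just under 70% of the new size.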