Supported monitoring metrics

This page lists Cloud Monitoring metrics available for Memorystore for Valkey, and describes what each metric measures.

Cloud Monitoring metrics

Instance-level metrics

These metrics provide a high-level overview of the health and performance of the instance. They help you understand the capacity and utilization of the instance, and identify potential bottlenecks or areas for improvement.

Metric name	Description
`memorystore.googleapis.com/instance/clients/average_connected_clients`	Mean current number of client connections across all nodes in the instance.
`memorystore.googleapis.com/instance/clients/maximum_connected_clients`	Maximum current number of client connections for a single node in the instance.
`memorystore.googleapis.com/instance/clients/maximum_connection_duration`	Maximum duration of a client connection for a single node in the instance.
`memorystore.googleapis.com/instance/clients/total_connected_clients`	Current number of client connections to the instance.
`memorystore.googleapis.com/instance/stats/total_connections_received_count`	Total number of client connections created across the instance in the last minute.
`memorystore.googleapis.com/instance/stats/total_rejected_connections_count`	Number of connections rejected because of maxclients limit.
`memorystore.googleapis.com/instance/commandstats/total_usec_count`	The total time consumed per command.
`memorystore.googleapis.com/instance/commandstats/total_calls_count`	Total number of calls for this command in one minute.
`memorystore.googleapis.com/instance/cpu/average_utilization`	Mean CPU utilization across all nodes in the instance from 0.0 to 1.0.
`memorystore.googleapis.com/instance/cpu/maximum_utilization`	Maximum CPU utilization for a single node in the instance from 0.0 to 1.0.
`memorystore.googleapis.com/instance/stats/average_expired_keys`	Mean number of key expiration events for the primaries of all nodes in the instance.
`memorystore.googleapis.com/instance/stats/maximum_expired_keys`	Maximum number of key expiration events for a single node in the instance for the primary.
`memorystore.googleapis.com/instance/stats/total_expired_keys_count`	Total number of key expiration events across primaries of all nodes in the instance.
`memorystore.googleapis.com/instance/stats/average_evicted_keys`	Mean number of evicted keys due to memory capacity across primaries of all nodes in the instance.
`memorystore.googleapis.com/instance/stats/maximum_evicted_keys`	Maximum number of evicted keys for a single node in the instance due to memory capacity for the primary.
`memorystore.googleapis.com/instance/stats/total_evicted_keys_count`	Number of evicted keys due to memory capacity across primaries of all nodes in the instance.
`memorystore.googleapis.com/instance/keyspace/total_keys`	Number of keys stored in the instance.
`memorystore.googleapis.com/instance/stats/average_keyspace_hits`	Mean number of successful key lookups across all nodes in the instance.
`memorystore.googleapis.com/instance/stats/maximum_keyspace_hits`	Maximum number of successful key lookups for a single node in the instance.
`memorystore.googleapis.com/instance/stats/total_keyspace_hits_count`	Total number of successful key lookups for the instance.
`memorystore.googleapis.com/instance/stats/average_keyspace_misses`	Mean number of failed key lookups across all nodes in the instance.
`memorystore.googleapis.com/instance/stats/maximum_keyspace_misses`	Maximum number of failed key lookups for a single node in the instance.
`memorystore.googleapis.com/instance/stats/total_keyspace_misses_count`	Total number of failed key lookups for the instance.
`memorystore.googleapis.com/instance/memory/average_utilization`	Mean memory utilization across all nodes in the instance. Value is from 0.0 to 1.0.
`memorystore.googleapis.com/instance/memory/maximum_utilization`	Maximum memory utilization for a single node in the instance from 0.0 to 1.0.
`memorystore.googleapis.com/instance/memory/total_used_memory`	Total memory usage of the instance.
`memorystore.googleapis.com/instance/memory/size`	Memory size of the instance.
`memorystore.googleapis.com/instance/replication/average_ack_lag`	Mean acknowledgment lag (in seconds) of replicas across all nodes in the instance. Acknowledgment lag occurs when replicas can't keep up with the information that the primary node sends to them, which creates a bottleneck on the primary node. When this happens, the primary node must wait for the acknowledgment that the replicas received the information. This might slow down transaction commits and cause a performance hit on the primary node.
`memorystore.googleapis.com/instance/replication/maximum_ack_lag`	Maximum acknowledgment lag (in seconds) for a single replica in the instance.
`memorystore.googleapis.com/instance/replication/average_offset_diff`	Mean replication acknowledge offset diff (in bytes) across all nodes in the instance. Replication acknowledge offset diff means the number of bytes that have not been replicated between replicas and their primaries.
`memorystore.googleapis.com/instance/replication/maximum_offset_diff`	Maximum replication offset diff (in bytes) for a single node in the instance. Replication offset diff means the number of bytes that have not been replicated between replicas and their primaries.
`memorystore.googleapis.com/instance/stats/total_net_input_bytes_count`	Count of incoming network bytes received by the instance endpoints.
`memorystore.googleapis.com/instance/stats/total_net_output_bytes_count`	Count of outgoing network bytes sent from the instance endpoints.

Node-level metrics

These metrics offer detailed insights into the health and performance of individual nodes within the instance. They help you troubleshoot issues with nodes and optimize the performance of nodes.

Metric name	Description
`memorystore.googleapis.com/instance/node/clients/connected_clients`	The number of clients connected to the instance node.
`memorystore.googleapis.com/instance/node/clients/blocked_clients`	The number of client connections that the instance node blocks.
`memorystore.googleapis.com/instance/node/server/uptime`	The uptime of the instance node.
`memorystore.googleapis.com/instance/node/stats/connections_received_count`	The number of client connections that Memorystore for Valkey creates in the last minute on the instance node.
`memorystore.googleapis.com/instance/node/stats/rejected_connections_count`	The number of connections that Memorystore for Valkey rejects because the instance node reaches the `maxclients` limit.
`memorystore.googleapis.com/instance/node/commandstats/usec_count`	The time consumed for each command in the instance node.
`memorystore.googleapis.com/instance/node/commandstats/calls_count`	The number of calls for this command on the instance node in one minute.
`memorystore.googleapis.com/instance/node/cpu/utilization`	The CPU utilization for the instance node (from 0.0 to 1.0).
`memorystore.googleapis.com/instance/node/stats/expired_keys_count`	The number of expiration events in the instance node.
`memorystore.googleapis.com/instance/node/stats/evicted_keys_count`	The number of evicted keys by the instance node.
`memorystore.googleapis.com/instance/node/keyspace/total_keys`	The number of keys that Memorystore for Valkey stores in the instance node.
`memorystore.googleapis.com/instance/node/stats/keyspace_hits_count`	The number of successful lookups of keys in the instance node.
`memorystore.googleapis.com/instance/node/stats/keyspace_misses_count`	The number of failed lookups of keys in the instance node.
`memorystore.googleapis.com/instance/node/memory/utilization`	The memory utilization for the instance node (from 0.0 to 1.0).
`memorystore.googleapis.com/instance/node/memory/usage`	The memory usage of the instance node.
`memorystore.googleapis.com/instance/node/stats/net_input_bytes_count`	The number of incoming network bytes that the instance node receives.
`memorystore.googleapis.com/instance/node/stats/net_output_bytes_count`	The number of outgoing network bytes that the instance node sends.
`memorystore.googleapis.com/instance/node/replication/offset`	The replication offset bytes of the instance node.
`memorystore.googleapis.com/instance/node/server/healthy`	Determines whether an instance node is available and functioning correctly. This metric is in Preview.

Cross-region replication metrics

This section lists metrics used for cross-region replication.

Metric name	Description
`memorystore.googleapis.com/instance/cross_instance_replication/secondary_replication_links`	This metric shows the number of shard links between the primary and secondary instances. Within a cross-region replication group, a primary instance reports the number of cross-region replication links that it has with the secondary instances in the group. For each secondary instance, this number is expected to be equal to the number of shards. If the number drops below the number of shards, then this metric identifies the number of shards when replication stopped between the replicator and the follower. In an ideal state, this metric has the same number as the shard count for the primary instance.
`memorystore.googleapis.com/instance/cross_instance_replication/secondary_maximum_replication_offset_diff`	This metric shows the maximum replication offset difference between the primary and secondary shards.
`memorystore.googleapis.com/instance/cross_instance_replication/secondary_average_replication_offset_diff`	This metric shows the average replication offset difference between the primary and secondary shards.

Backup metrics

This section lists backup and import metrics.

Instance-level metrics

Metric name	Description
`memorystore.googleapis.com/instance/backup/last_backup_start_time`	The start time of the last backup operation.
`memorystore.googleapis.com/instance/backup/last_backup_status`	The status of the last backup operation. Statuses are `1` (success) and `0` (failure).
`memorystore.googleapis.com/instance/backup/last_backup_duration`	The duration of the last backup operation (in milliseconds).
`memorystore.googleapis.com/instance/backup/last_backup_size`	The size of the last backup (in bytes).
`memorystore.googleapis.com/instance/import/last_import_start_time`	The start time of the last import operation.
`memorystore.googleapis.com/instance/import/last_import_duration`	The duration of the last import operation (in milliseconds).

Persistence metrics

This section lists persistence metrics and provides sample use cases for them.

RDB persistence metrics

Instance-level metrics

Metric name	Description
`memorystore.googleapis.com/instance/persistence/load_count`	The cumulative count of loads from across the instance for AOF or RDB persistence.
`memorystore.googleapis.com/instance/persistence/rdb_saves_count`	The cumulative number of times that your instance takes an RDB snapshot (also known as a save). This metric has a `status_code` field. To check if a snapshot fails, you can filter the `status_code` field for the following error: `3 - INTERNAL_ERROR`.
`memorystore.googleapis.com/instance/persistence/rdb_last_success_ages`	A distribution of snapshot ages for all nodes across the instance. Ideally, the values in the distribution are less than or equal to the interval between your snapshots.
`memorystore.googleapis.com/instance/persistence/rejected_writes_count`	The cumulative count of denied write commands across the instance because of a failure to persist.

Node-level metrics

Metric name	Description
`memorystore.googleapis.com/instance/node/persistence/rdb_bgsave_in_progress`	Indicates whether an RDB `BGSAVE` is in progress on the instance node. `TRUE` means that the save is in progress.
`memorystore.googleapis.com/instance/node/persistence/rdb_last_bgsave_status`	The success of the last `BGSAVE` on the instance node. `TRUE` means that the last `BGSAVE` succeeded. If no `BGSAVE` has occurred, then the value might default to `TRUE`.
`memorystore.googleapis.com/instance/node/persistence/rdb_saves_count`	The metric shows the cumulative number of RDB saves run on the instance node.
`memorystore.googleapis.com/instance/node/persistence/rdb_last_save_age`	The time (in seconds) since the last successful snapshot.
`memorystore.googleapis.com/instance/node/persistence/rdb_next_save_time_until`	The time remaining (in seconds) until the next snapshot.
`memorystore.googleapis.com/instance/node/persistence/current_save_keys_total`	The number of keys in the RDB save that runs on the instance node.

AOF persistence metrics

Instance-level metrics

Metric name	Description
`memorystore.googleapis.com/instance/persistence/aof_fsync_lags`	This metric shows a distribution of the lag (from data write to durable storage sync) for all nodes in the instance. It is only emitted for instances with `appendfsync=everysec`. Ideally, the values in the distribution are less than or equal to your AOF sync interval.
`memorystore.googleapis.com/instance/persistence/aof_rewrite_count`	This metric shows the cumulative number of times that a node in your instance has triggered an AOF rewrite. This metric has a `status_code` field. To check if AOF rewrites are failing, you can filter the `status_code` field for the following error: `3 - INTERNAL_ERROR`.

Node-level metrics

Metric name	Description
`memorystore.googleapis.com/instance/node/persistence/aof_last_write_status`	This metric shows the success of the most recent AOF write on the instance node. `TRUE` means success. If no write has occurred, then the value might default to `TRUE`.
`memorystore.googleapis.com/instance/node/persistence/aof_last_bgrewrite_status`	This metric shows the success of the last AOF `bgrewrite` operation on the instance node. `TRUE` means success. If no `bgrewrite` has occurred, then the value might default to `TRUE`.
`memorystore.googleapis.com/instance/node/persistence/aof_fsync_lag`	This metric shows the AOF lag between memory and persistent store in the instance node. It is only applicable for AOF-enabled instances where `appendfsync=everysec`.
`memorystore.googleapis.com/instance/node/persistence/aof_rewrites_count`	This metric shows the count of AOF rewrites in the instance node. To check if AOF rewrites are failing, you can filter the `status_code` field for the following error: `3 - INTERNAL_ERROR`.
`memorystore.googleapis.com/instance/node/persistence/aof_fsync_errors_count`	This metric shows the count of AOF `fsync()` call errors. It is only applicable for AOF-enabled instances where `appendfsync` is `everysec` or `always`.

Common persistence metrics

These metrics apply to both the AOF and RDB persistence mechanisms.

Node-level metrics

Metric name	Description
`memorystore.googleapis.com/instance/node/persistence/auto_restore_count`	This metric shows the count of restores from the dump file (AOF or RDB). To check if restores are failing, you can filter the `status_code` field for the following error: `2 - INTERNAL_ERROR`.
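
Several persistence metrics in the preceding tables can be filtered on their `status_code` field. As a rough illustration, the following Python sketch builds a Cloud Monitoring filter string for one metric type and one status code. The helper name `status_code_filter` is hypothetical. The sketch also assumes that the field is exposed as the metric label `status_code` and that its value is the bare number `"2"`. Confirm the actual label values in Metrics Explorer before you rely on this.

```python
def status_code_filter(metric_type: str, status_code: str) -> str:
    """Build a Cloud Monitoring filter that selects a single status code.

    Assumption: the status code is exposed as the metric label `status_code`.
    """
    return (
        f'metric.type = "{metric_type}" '
        f'AND metric.labels.status_code = "{status_code}"'
    )


restore_failures = status_code_filter(
    "memorystore.googleapis.com/instance/node/persistence/auto_restore_count",
    "2",  # Assumed label value for the "2 - INTERNAL_ERROR" status.
)
print(restore_failures)
```

You can pass a different metric type and status code to build the equivalent filter for `rdb_saves_count` or `aof_rewrites_count`.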

Sample use cases for persistence metrics

Checking if AOF write operations cause latency and memory pressure

Suppose that you detect increased latency or memory usage on your instance or on a node within the instance. In this case, you might want to check whether the extra usage is related to AOF persistence.

Because AOF rewrite operations can trigger transient load spikes, you can inspect the `aof_rewrites_count` metric, which gives you the cumulative count of AOF rewrites over the lifetime of the instance or the node within the instance. Suppose this metric shows that increments in the rewrite count correspond to latency increases. In that case, you could address the issue by reducing the write rate or by increasing the shard count to reduce the frequency of rewrites.
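
The comparison described above can be sketched in Python. This is a minimal illustration, not a definitive method: it assumes you have already exported two aligned series of samples, one for the cumulative rewrite count and one for latency. The function name `latency_by_rewrite_activity` and the sample values are hypothetical.

```python
from typing import List, Tuple


def latency_by_rewrite_activity(
    rewrite_counts: List[int], latencies_ms: List[float]
) -> Tuple[float, float]:
    """Compare latency in intervals with and without a rewrite.

    rewrite_counts holds cumulative counts, so an increase between two
    consecutive samples means that at least one rewrite happened.
    Returns (mean latency with a rewrite, mean latency without a rewrite).
    """
    with_rewrite: List[float] = []
    without_rewrite: List[float] = []
    for i in range(1, len(rewrite_counts)):
        if rewrite_counts[i] > rewrite_counts[i - 1]:
            with_rewrite.append(latencies_ms[i])
        else:
            without_rewrite.append(latencies_ms[i])

    def mean(values: List[float]) -> float:
        return sum(values) / len(values) if values else 0.0

    return mean(with_rewrite), mean(without_rewrite)


# Hypothetical samples taken at the same timestamps.
counts = [4, 4, 5, 5, 5, 6, 6]
latency = [1.0, 1.0, 3.0, 1.0, 1.0, 5.0, 1.0]

busy, quiet = latency_by_rewrite_activity(counts, latency)
print(f"with rewrite: {busy} ms, without rewrite: {quiet} ms")
```

A large gap between the two means suggests that rewrites coincide with the latency increases. The same comparison applies to `rdb_saves_count` for RDB persistence.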

Checking if RDB save operations cause latency and memory pressure

Suppose that you detect increased latency or memory usage on your instance or on a node within the instance. In this case, you might want to check whether the extra usage is related to RDB persistence.

Because RDB save operations can trigger transient load spikes, you can inspect the `rdb_saves_count` metric, which gives the cumulative count of RDB saves over the lifetime of the instance or the node within the instance. Suppose this metric shows that increments in the RDB saves count correspond to latency increases. In that case, you could lengthen the RDB snapshot interval to lower the frequency of saves. You could also scale out the instance to reduce the baseline load levels.

Interpreting metrics for Memorystore for Valkey

As shown in the preceding lists, many of the metrics come in three variations: average, maximum, and total.

Memorystore for Valkey provides average and maximum variations of the same metric so that you can use them together to identify hot spotting for that metric family.

The total variation is independent of the other two. It provides separate insight that is unrelated to identifying hot spots.

Understanding average and maximum metrics

Suppose you compare the `average_keyspace_hits` and `maximum_keyspace_hits` values for your instance. A greater difference between the two metrics indicates more hot spotting of hits in your instance. Ideally, the values of `average_keyspace_hits` and `maximum_keyspace_hits` are close, because this means that hits are evenly distributed across your instance.

This principle applies to all metrics that have the average and maximum variations of the same metric.

Hot spotting example

Comparing `average_keyspace_hits` and `maximum_keyspace_hits` across all of the shards in your instance indicates whether hot spotting occurs. For example, suppose shards in a 6-shard instance have the following number of hits:

Shard 1 – 2 hits
Shard 2 – 2 hits
Shard 3 – 2 hits
Shard 4 – 2 hits
Shard 5 – 2 hits
Shard 6 – 8 hits

In this example, `average_keyspace_hits` returns a value of 3 and `maximum_keyspace_hits` returns 8, indicating that shard 6 is hot.
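
The arithmetic in this example can be reproduced with a short Python sketch. It uses the per-shard hit counts listed above. The shard labels and the `max_to_average_ratio` value are illustrative assumptions and are not metrics that Memorystore for Valkey reports.

```python
# Hits per shard, taken from the 6-shard example.
hits_per_shard = {
    "shard-1": 2,
    "shard-2": 2,
    "shard-3": 2,
    "shard-4": 2,
    "shard-5": 2,
    "shard-6": 8,
}

# Equivalent of average_keyspace_hits: the mean across shards.
average_hits = sum(hits_per_shard.values()) / len(hits_per_shard)

# Equivalent of maximum_keyspace_hits: the largest single-shard value.
maximum_hits = max(hits_per_shard.values())

# The shard that produces the maximum is the hot spot candidate.
hottest_shard = max(hits_per_shard, key=hits_per_shard.get)

# Illustrative skew indicator: a ratio near 1.0 means an even distribution.
max_to_average_ratio = maximum_hits / average_hits

print(average_hits, maximum_hits, hottest_shard)
```

The further `max_to_average_ratio` rises above 1.0, the more unevenly the hits are distributed across shards.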

Node-level metrics can also help you identify hot spots within the instance.
