This page lists Cloud Monitoring metrics available for Memorystore for Valkey, and describes what each metric measures.
Cloud Monitoring metrics
Metric name | Description |
---|---|
memorystore.googleapis.com/instance/clients/average_connected_clients | Mean current number of client connections across all nodes in the instance. |
memorystore.googleapis.com/instance/clients/maximum_connected_clients | Maximum current number of client connections for a single node in the instance. |
memorystore.googleapis.com/instance/clients/maximum_connection_duration | Maximum duration of a client connection for a single node in the instance. |
memorystore.googleapis.com/instance/clients/total_connected_clients | Current number of client connections to the instance. |
memorystore.googleapis.com/instance/stats/total_connections_received_count | Count of client connections created across the instance in the last minute. |
memorystore.googleapis.com/instance/stats/total_rejected_connections_count | Number of connections rejected because of the maxclients limit. |
memorystore.googleapis.com/instance/commandstats/total_usec_count | Total time consumed per command, in microseconds. |
memorystore.googleapis.com/instance/commandstats/total_calls_count | Total number of calls for this command in one minute. |
memorystore.googleapis.com/instance/cpu/average_utilization | Mean CPU utilization across all nodes in the instance, from 0.0 to 1.0. |
memorystore.googleapis.com/instance/cpu/maximum_utilization | Maximum CPU utilization for a single node in the instance, from 0.0 to 1.0. |
memorystore.googleapis.com/instance/stats/average_expired_keys | Mean number of key expiration events across the primaries of all nodes in the instance. |
memorystore.googleapis.com/instance/stats/maximum_expired_keys | Maximum number of key expiration events for a single primary node in the instance. |
memorystore.googleapis.com/instance/stats/total_expired_keys_count | Total number of key expiration events across the primaries of all nodes in the instance. |
memorystore.googleapis.com/instance/stats/average_evicted_keys | Mean number of keys evicted due to memory capacity across the primaries of all nodes in the instance. |
memorystore.googleapis.com/instance/stats/maximum_evicted_keys | Maximum number of keys evicted due to memory capacity for a single primary node in the instance. |
memorystore.googleapis.com/instance/stats/total_evicted_keys_count | Total number of keys evicted due to memory capacity across the primaries of all nodes in the instance. |
memorystore.googleapis.com/instance/keyspace/total_keys | Number of keys stored in the instance. |
memorystore.googleapis.com/instance/stats/average_keyspace_hits | Mean number of successful key lookups across all nodes in the instance. |
memorystore.googleapis.com/instance/stats/maximum_keyspace_hits | Maximum number of successful key lookups for a single node in the instance. |
memorystore.googleapis.com/instance/stats/total_keyspace_hits_count | Total number of successful key lookups for the instance. |
memorystore.googleapis.com/instance/stats/average_keyspace_misses | Mean number of failed key lookups across all nodes in the instance. |
memorystore.googleapis.com/instance/stats/maximum_keyspace_misses | Maximum number of failed key lookups for a single node in the instance. |
memorystore.googleapis.com/instance/stats/total_keyspace_misses_count | Total number of failed key lookups for the instance. |
memorystore.googleapis.com/instance/memory/average_utilization | Mean memory utilization across all nodes in the instance, from 0.0 to 1.0. |
memorystore.googleapis.com/instance/memory/maximum_utilization | Maximum memory utilization for a single node in the instance, from 0.0 to 1.0. |
memorystore.googleapis.com/instance/memory/total_used_memory | Total memory usage of the instance. |
memorystore.googleapis.com/instance/memory/size | Memory size of the instance. |
memorystore.googleapis.com/instance/replication/average_ack_lag | Mean replication lag (in seconds) of replicas across all nodes in the instance. Replication lag indicates how far replicas are lagging behind primaries. |
memorystore.googleapis.com/instance/replication/maximum_ack_lag | Maximum replication acknowledgement lag (in seconds) for a single replica in the instance. Replication acknowledgement lag indicates how far replication acknowledgements are lagging behind primaries. |
memorystore.googleapis.com/instance/replication/average_offset_diff | Mean replication acknowledgement offset difference (in bytes) across all nodes in the instance. The replication acknowledgement offset difference is the number of bytes that have not yet been replicated between replicas and their primaries. |
memorystore.googleapis.com/instance/replication/maximum_offset_diff | Maximum replication offset difference (in bytes) for a single node in the instance. The replication offset difference is the number of bytes that have not yet been replicated between replicas and their primaries. |
memorystore.googleapis.com/instance/stats/total_net_input_bytes_count | Count of incoming network bytes received by the instance endpoints. |
memorystore.googleapis.com/instance/stats/total_net_output_bytes_count | Count of outgoing network bytes sent from the instance endpoints. |
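Each metric name in the table is a Cloud Monitoring metric type, so you can select it with a metric.type filter. The following minimal Python sketch only builds a request URL for the Cloud Monitoring API v3 projects.timeSeries.list method. It does not send the request or handle authentication, and the function name build_time_series_url and the project ID my-project are hypothetical placeholders.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.parse import urlencode

# Prefix shared by every metric type in the table above.
METRIC_PREFIX = "memorystore.googleapis.com/instance/"
# Cloud Monitoring API v3 timeSeries.list endpoint; {project_id} is filled in below.
ENDPOINT = "https://monitoring.googleapis.com/v3/projects/{project_id}/timeSeries"


def build_time_series_url(
    project_id: str,
    metric_suffix: str,
    minutes: int = 60,
    now: Optional[datetime] = None,
) -> str:
    """Build (but do not send) a timeSeries.list URL for one Memorystore metric.

    metric_suffix is the part after METRIC_PREFIX, e.g. "cpu/maximum_utilization".
    The query window covers the last `minutes` minutes ending at `now` (UTC).
    """
    end = now if now is not None else datetime.now(timezone.utc)
    start = end - timedelta(minutes=minutes)
    rfc3339 = "%Y-%m-%dT%H:%M:%SZ"  # assumes `end` is a UTC datetime
    params = {
        "filter": f'metric.type = "{METRIC_PREFIX}{metric_suffix}"',
        "interval.startTime": start.strftime(rfc3339),
        "interval.endTime": end.strftime(rfc3339),
    }
    return ENDPOINT.format(project_id=project_id) + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical project ID; actually sending the request also needs an OAuth access token.
    print(build_time_series_url("my-project", "cpu/maximum_utilization"))
```

Swapping the suffix (for example, memory/maximum_utilization) is enough to query any other metric in the table.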
Persistence metrics
This section lists persistence metrics and provides sample use cases for them.
RDB persistence metrics
Metric name | Description |
---|---|
memorystore.googleapis.com/instance/persistence/load_count | Cumulative count of loads from a dump file (AOF or RDB) across the instance. |
memorystore.googleapis.com/instance/persistence/rdb_saves_count | This metric shows the cumulative number of times your instance has taken an RDB snapshot (also known as a save). This metric has a status_code field. To check whether a snapshot has failed, filter the status_code field for the following error: 3 - INTERNAL_ERROR. |
memorystore.googleapis.com/instance/persistence/rdb_last_success_ages | This metric shows a distribution of snapshot ages for all nodes across the instance. Ideally, the values in the distribution are less than or equal to your snapshot interval. |
memorystore.googleapis.com/instance/persistence/rejected_writes_count | Cumulative count of write commands denied across the instance because of a failure to persist. |
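As a rough illustration of reading rdb_last_success_ages, the following Python sketch compares per-node snapshot ages against the snapshot interval. It assumes you have already extracted one age in seconds per node from the distribution; the node names, the ages, the 1-hour interval, and the function name check_snapshot_ages are all hypothetical.

```python
from typing import Dict, List, Tuple


def check_snapshot_ages(
    ages_by_node: Dict[str, float], snapshot_interval_s: float
) -> Tuple[float, List[str]]:
    """Compare last-successful-snapshot ages (seconds) against the snapshot interval.

    Returns the fraction of nodes whose age is within the interval, and a
    sorted list of nodes whose age exceeds it.
    """
    if not ages_by_node:
        raise ValueError("no snapshot-age samples")
    stale = sorted(node for node, age in ages_by_node.items() if age > snapshot_interval_s)
    within_fraction = 1 - len(stale) / len(ages_by_node)
    return within_fraction, stale


# Hypothetical node names and ages, with an assumed 1-hour snapshot interval.
fraction, stale = check_snapshot_ages(
    {"node-a": 1200.0, "node-b": 3500.0, "node-c": 7300.0}, snapshot_interval_s=3600
)
print(f"{fraction:.2f} within interval; stale: {stale}")  # 0.67 within interval; stale: ['node-c']
```

A node that stays above the interval across several samples is worth cross-checking against failed saves in rdb_saves_count.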
AOF persistence metrics
Metric name | Description |
---|---|
memorystore.googleapis.com/instance/persistence/aof_fsync_lags | This metric shows a distribution of the lag (from data write to durable storage sync) for all nodes in the instance. It is only emitted for instances with appendfsync=everysec. Ideally, the values in the distribution are less than or equal to your AOF sync interval. |
memorystore.googleapis.com/instance/persistence/aof_rewrite_count | This metric shows the cumulative number of times a node in your instance has triggered an AOF rewrite. This metric has a status_code field. To check whether AOF rewrites are failing, filter the status_code field for the following error: 3 - INTERNAL_ERROR. |
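Filtering on status_code amounts to splitting the counter's increments by that field. The following Python sketch assumes you have exported aof_rewrite_count (or rdb_saves_count) as per-interval increments paired with their status_code; the tuple layout, the sample values, and the use of 0 for a non-failing code are hypothetical, and only the value 3 (INTERNAL_ERROR) comes from the table above.

```python
from collections import Counter
from typing import Iterable, Tuple

# Failure status_code value named in the table above (3 - INTERNAL_ERROR).
INTERNAL_ERROR = 3


def count_failures(points: Iterable[Tuple[int, int]]) -> Tuple[int, int]:
    """Return (failed, total) from (status_code, increment) points of a counter.

    Each point is the increase of the counter over one sampling interval
    for one status_code value.
    """
    totals: Counter = Counter()
    for status_code, increment in points:
        totals[status_code] += increment
    return totals[INTERNAL_ERROR], sum(totals.values())


# Hypothetical increments; 0 stands in for a non-failing status code here.
failed, total = count_failures([(0, 4), (3, 1), (0, 2)])
print(f"{failed} of {total} AOF rewrites failed")  # 1 of 7 AOF rewrites failed
```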
Sample use cases for persistence metrics
Checking if AOF write operations cause latency and memory pressure
Suppose that you detect increased latency or memory usage on your instance. In this case, you may want to check whether the extra usage is related to AOF persistence.
Because AOF rewrite operations can trigger transient load spikes, you can inspect the aof_rewrite_count metric, which gives you the cumulative count of AOF rewrites over the lifetime of the instance. Suppose this metric shows that increments in the rewrite count correspond to latency increases. In this case, you could address the issue by reducing the write rate or increasing the shard count to reduce the frequency of rewrites.
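One way to check whether counter increments line up with latency increases is to compare the two series interval by interval. The following Python sketch assumes both series are already sampled on the same intervals; the latency source, the 5 ms threshold, the sample values, and the function name increments_with_latency_spikes are hypothetical. Co-occurrence suggests, but does not prove, that rewrites cause the latency.

```python
from typing import List, Sequence


def increments_with_latency_spikes(
    cumulative: Sequence[int], latency_ms: Sequence[float], threshold_ms: float
) -> List[int]:
    """Return interval indices where a cumulative counter increased and latency exceeded a threshold.

    Both series must be sampled on the same intervals. A decrease in the counter
    (for example, after a restart resets it) is not treated as an increment.
    """
    if len(cumulative) != len(latency_ms):
        raise ValueError("series must be aligned to the same intervals")
    return [
        i
        for i in range(1, len(cumulative))
        if cumulative[i] > cumulative[i - 1] and latency_ms[i] > threshold_ms
    ]


# Hypothetical samples of aof_rewrite_count and client latency, one per minute.
rewrites = [10, 10, 11, 11, 12, 12]
latency = [2.0, 2.1, 9.5, 2.0, 8.7, 2.2]
print(increments_with_latency_spikes(rewrites, latency, threshold_ms=5.0))  # [2, 4]
```

The same check applies unchanged to rdb_saves_count.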
Checking if RDB save operations cause latency and memory pressure
Suppose that you detect increased latency or memory usage on your instance. In this case, you may want to check whether the extra usage is related to RDB persistence.
Because RDB save operations can trigger transient load spikes, you can inspect the rdb_saves_count metric, which gives the cumulative count of RDB saves over the lifetime of the instance. Suppose this metric shows that increments in the RDB saves count correspond to latency increases. In this case, you could lengthen the RDB snapshot interval (take snapshots less often) to lower the frequency of saves. You could also scale out the instance to reduce the baseline load levels.
Interpreting metrics for Memorystore for Valkey
As the tables above show, many of the metrics come in three variants: average, maximum, and total.
For Memorystore for Valkey, we provide average and maximum variants of the same metric so that you can compare them to identify hotspotting for that metric family.
The total value for the metric is independent and provides separate insight, unrelated to the hotspotting purpose of the average and maximum variants.
Understanding average and maximum metrics
Suppose you compare the average_keyspace_hits and maximum_keyspace_hits values for your instance. The greater the difference between the two metrics, the more hotspotting of hits there is in your instance. Ideally, average_keyspace_hits and maximum_keyspace_hits are close in value, because this means that hits are evenly distributed across your instance.
This principle applies to all metrics that have the average and maximum variations of the same metric.
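To make the comparison concrete, write the per-node values of a metric as $x_1, \dots, x_N$. The following relations are a general property of the mean and maximum of non-negative values, offered as an informal way to read the gap, not as a documented Memorystore threshold:

$$
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
x_{\max} = \max_{1 \le i \le N} x_i, \qquad
1 \le \frac{x_{\max}}{\bar{x}} \le N \quad (\bar{x} > 0)
$$

The ratio equals 1 only when every node reports the same value, and it reaches $N$ when all of the load is on a single node, because $x_{\max} \le \sum_i x_i = N\bar{x}$.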
Hotspotting example
Comparing average_keyspace_hits and maximum_keyspace_hits across all of the shards in your instance indicates whether hotspotting occurs. For example, suppose the shards in a 6-shard instance have the following number of hits:
- Shard 1 – 2 hits
- Shard 2 – 2 hits
- Shard 3 – 2 hits
- Shard 4 – 2 hits
- Shard 5 – 2 hits
- Shard 6 – 8 hits
In this example, average_keyspace_hits returns 3 ((5 × 2 + 8) / 6 = 18 / 6), and maximum_keyspace_hits returns 8, indicating that shard 6 is hot.
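The arithmetic in this example can be reproduced with a short Python sketch. It works on per-shard hit counts that you already have; the function name hotspot_summary is hypothetical, and unlike the two metrics, which report only values, it also returns which shard is the hottest.

```python
from typing import Dict, Tuple


def hotspot_summary(hits_by_shard: Dict[int, int]) -> Tuple[float, int, int]:
    """Return (average, maximum, hottest shard) for per-shard hit counts."""
    if not hits_by_shard:
        raise ValueError("need at least one shard")
    average = sum(hits_by_shard.values()) / len(hits_by_shard)
    # On ties, max() keeps the first shard it encounters.
    hottest = max(hits_by_shard, key=lambda shard: hits_by_shard[shard])
    return average, hits_by_shard[hottest], hottest


# Hit counts from the 6-shard example above.
hits = {1: 2, 2: 2, 3: 2, 4: 2, 5: 2, 6: 8}
average, maximum, hottest = hotspot_summary(hits)
print(average, maximum, hottest, round(maximum / average, 2))  # 3.0 8 6 2.67
```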