Supported monitoring metrics

This page lists the Cloud Monitoring metrics available for Memorystore for Redis Cluster and describes what each metric measures.

Cloud Monitoring metrics

Metric name Description
redis.googleapis.com/cluster/clients/average_connected_clients Mean current number of client connections across the cluster.
redis.googleapis.com/cluster/clients/maximum_connected_clients Maximum current number of client connections across the cluster.
redis.googleapis.com/cluster/clients/total_connected_clients Current number of client connections to the cluster.
redis.googleapis.com/cluster/stats/total_connections_received_count Total number of client connections to the cluster created in the last minute.
redis.googleapis.com/cluster/stats/total_rejected_connections_count Number of connections rejected because of the maxclients limit.
redis.googleapis.com/cluster/commandstats/total_usec_count Total time consumed per command, in microseconds.
redis.googleapis.com/cluster/commandstats/total_calls_count Total number of calls for each command in the last minute.
redis.googleapis.com/cluster/cpu/average_utilization Mean CPU utilization for the cluster from 0.0 to 1.0.
redis.googleapis.com/cluster/cpu/maximum_utilization Maximum CPU utilization for the cluster from 0.0 to 1.0.
redis.googleapis.com/cluster/stats/average_expired_keys Mean number of key expiration events for the primaries.
redis.googleapis.com/cluster/stats/maximum_expired_keys Maximum number of key expiration events for the primaries.
redis.googleapis.com/cluster/stats/total_expired_keys_count Total number of key expiration events for the primaries.
redis.googleapis.com/cluster/stats/average_evicted_keys Mean number of keys evicted because of memory capacity on the primaries.
redis.googleapis.com/cluster/stats/maximum_evicted_keys Maximum number of keys evicted because of memory capacity on the primaries.
redis.googleapis.com/cluster/stats/total_evicted_keys_count Total number of keys evicted because of memory capacity on the primaries.
redis.googleapis.com/cluster/keyspace/total_keys Number of keys stored in the cluster.
redis.googleapis.com/cluster/stats/average_keyspace_hits Mean number of successful key lookups across the cluster.
redis.googleapis.com/cluster/stats/maximum_keyspace_hits Maximum number of successful key lookups across the cluster.
redis.googleapis.com/cluster/stats/total_keyspace_hits_count Total number of successful key lookups across the cluster.
redis.googleapis.com/cluster/stats/average_keyspace_misses Mean number of failed key lookups across the cluster.
redis.googleapis.com/cluster/stats/maximum_keyspace_misses Maximum number of failed key lookups across the cluster.
redis.googleapis.com/cluster/stats/total_keyspace_misses_count Total number of failed key lookups across the cluster.
redis.googleapis.com/cluster/memory/average_utilization Mean memory utilization across the cluster from 0.0 to 1.0.
redis.googleapis.com/cluster/memory/maximum_utilization Maximum memory utilization across the cluster from 0.0 to 1.0.
redis.googleapis.com/cluster/memory/total_used_memory Total memory usage of the cluster.
redis.googleapis.com/cluster/memory/size Memory size of the cluster.
redis.googleapis.com/cluster/replication/average_ack_lag Mean replication acknowledgement lag (in seconds) of replicas across the cluster.

Replication acknowledgement lag (in seconds) indicates how far replication acknowledgements from replicas are lagging behind their primaries.
redis.googleapis.com/cluster/replication/maximum_ack_lag Maximum replication acknowledgement lag (in seconds) of replicas across the cluster.

Replication acknowledgement lag (in seconds) indicates how far replication acknowledgements from replicas are lagging behind their primaries.
redis.googleapis.com/cluster/replication/average_offset_diff Mean replication offset difference (in bytes) across the cluster.

Replication offset difference is the number of bytes that have not yet been replicated from primaries to their replicas.
redis.googleapis.com/cluster/replication/maximum_offset_diff Maximum replication offset difference (in bytes) across the cluster.

Replication offset difference is the number of bytes that have not yet been replicated from primaries to their replicas.
redis.googleapis.com/cluster/stats/total_net_input_bytes_count Count of incoming network bytes received by the cluster endpoints.
redis.googleapis.com/cluster/stats/total_net_output_bytes_count Count of outgoing network bytes sent from the cluster endpoints.
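Each metric name in the table above is a full metric type, which is what a Cloud Monitoring filter selects on. The following Python sketch builds such a filter from a metric path. `metric_filter` and `METRIC_PREFIX` are hypothetical names introduced here, and the sketch assumes you only need a `metric.type` clause. Sending the filter to Cloud Monitoring (for example, as the `filter` of a `timeSeries.list` request) is not shown.

```python
# Hypothetical helper: builds a Cloud Monitoring filter that selects one
# Memorystore for Redis Cluster metric type. It only formats a string; it does
# not call the Monitoring API.
METRIC_PREFIX = "redis.googleapis.com/cluster/"


def metric_filter(metric: str) -> str:
    """Return a filter such as: metric.type = "redis.googleapis.com/cluster/cpu/maximum_utilization"."""
    if metric.startswith(METRIC_PREFIX):
        metric_type = metric  # already a full metric type from the table
    else:
        # Short path such as "cpu/maximum_utilization"; tolerate a leading slash.
        metric_type = METRIC_PREFIX + metric.lstrip("/")
    return f'metric.type = "{metric_type}"'


if __name__ == "__main__":
    print(metric_filter("cpu/maximum_utilization"))
    print(metric_filter("redis.googleapis.com/cluster/memory/maximum_utilization"))
```

Additional clauses, such as a resource or label condition, can be joined to the returned string with `AND`.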

Persistence metrics

This section lists persistence metrics and provides sample use cases for them.

RDB persistence metrics

Metric name Description
redis.googleapis.com/cluster/persistence/rdb_saves_count This metric shows the cumulative number of times your cluster has taken an RDB snapshot (also known as a save). This metric has a status_code field. To check whether a snapshot has failed, filter the status_code field for the following error: 3 - INTERNAL_ERROR.
redis.googleapis.com/cluster/persistence/rdb_save_ages This metric shows the distribution of snapshot ages for all nodes across the cluster. Ideally, the values in the distribution are less than or equal to your snapshot interval.
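The two RDB checks described in the table can be sketched in Python as follows. This is a minimal sketch that assumes you have already exported the metric values from Cloud Monitoring into plain dictionaries (the export step is not shown). The function names, node names, and the one-hour interval are hypothetical.

```python
# Minimal sketch: assumes the metric values were already exported from
# Cloud Monitoring into plain dictionaries (the export is not shown).
INTERNAL_ERROR = 3  # status_code value that marks a failed save, per the table above


def failed_saves(saves_by_status: dict[int, int]) -> int:
    """Number of RDB saves recorded with status_code 3 (INTERNAL_ERROR)."""
    return saves_by_status.get(INTERNAL_ERROR, 0)


def nodes_behind_schedule(snapshot_age_s: dict[str, float], interval_s: float) -> list[str]:
    """Nodes whose snapshot age is greater than the configured snapshot interval."""
    return sorted(node for node, age in snapshot_age_s.items() if age > interval_s)


if __name__ == "__main__":
    ages = {"node-a": 1800.0, "node-b": 3500.0, "node-c": 5400.0}  # hypothetical, in seconds
    print(nodes_behind_schedule(ages, interval_s=3600.0))  # ['node-c']
    print(failed_saves({INTERNAL_ERROR: 2}))  # 2
```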

AOF persistence metrics

Metric name Description
redis.googleapis.com/cluster/persistence/aof_fsync_lags This metric shows the distribution of the lag (from data write to durable storage sync) for all nodes in the cluster. It is emitted only for clusters with appendfsync=everysec. Ideally, the values in the distribution are less than or equal to your AOF sync interval (one second with appendfsync=everysec).
redis.googleapis.com/cluster/persistence/aof_rewrite_count This metric shows the cumulative number of times that a node in your cluster has triggered an AOF rewrite. This metric has a status_code field. To check whether AOF rewrites are failing, filter the status_code field for the following error: 3 - INTERNAL_ERROR.
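A similar check applies to aof_fsync_lags. The sketch below assumes the lag samples have already been exported as a list of values in seconds (the data shown is hypothetical). It reports the share of samples above the one-second everysec target and a nearest-rank percentile, which is one simple way to summarize a distribution; the function names are illustrative.

```python
import math

# appendfsync=everysec asks Redis to fsync the AOF about once per second,
# so one second is used as the target lag here.
EVERYSEC_TARGET_S = 1.0


def fraction_over_target(lags_s: list[float], target_s: float = EVERYSEC_TARGET_S) -> float:
    """Share of fsync-lag samples (in seconds) that exceed the target."""
    if not lags_s:
        return 0.0
    return sum(1 for lag in lags_s if lag > target_s) / len(lags_s)


def nearest_rank_percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list, for pct in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


if __name__ == "__main__":
    lags = [0.2, 0.4, 0.9, 1.0, 1.6]  # hypothetical exported samples
    print(fraction_over_target(lags))  # 0.2
    print(nearest_rank_percentile(lags, 99))  # 1.6
```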

Sample use cases for persistence metrics

Checking if AOF write operations cause latency and memory pressure

Suppose that you detect increased latency or memory usage on your cluster. In this case, you might want to check whether the extra usage is related to AOF persistence.

Because AOF rewrite operations can trigger transient load spikes, you can inspect the aof_rewrite_count metric, which gives the cumulative count of AOF rewrites over the lifetime of the cluster. If increments in the rewrite count correspond to latency increases, you can address the issue by reducing the write rate or by increasing the shard count to reduce the frequency of rewrites.
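One way to test the correspondence described above is to turn the cumulative aof_rewrite_count series into per-interval increments and count how many latency spikes fall in intervals where a rewrite occurred. This minimal sketch assumes both series are already exported and aligned to the same intervals, that the counter does not reset within the window, and that the latency source and spike threshold are your own (both hypothetical here). A high overlap suggests, but does not prove, that rewrites cause the spikes.

```python
def increments(cumulative: list[int]) -> list[int]:
    """Per-interval increase of a cumulative counter such as aof_rewrite_count.

    Assumes the counter does not reset inside the window.
    """
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


def spike_overlap(cumulative: list[int], latency_ms: list[float], threshold_ms: float) -> float:
    """Fraction of latency spikes that land in intervals where the counter increased.

    latency_ms[i] covers the interval between cumulative[i] and cumulative[i + 1],
    so it must have exactly len(cumulative) - 1 entries.
    """
    deltas = increments(cumulative)
    if len(latency_ms) != len(deltas):
        raise ValueError("latency_ms needs one value per counter interval")
    spikes = [i for i, value in enumerate(latency_ms) if value > threshold_ms]
    if not spikes:
        return 0.0
    return sum(1 for i in spikes if deltas[i] > 0) / len(spikes)


if __name__ == "__main__":
    rewrites = [10, 10, 11, 11, 12, 12]  # hypothetical cumulative aof_rewrite_count samples
    latency = [2.0, 9.5, 2.1, 8.7, 2.3]  # hypothetical latency per interval, in ms
    print(increments(rewrites))  # [0, 1, 0, 1, 0]
    print(spike_overlap(rewrites, latency, threshold_ms=5.0))  # 1.0
```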

Checking if RDB save operations cause latency and memory pressure

Suppose that you detect increased latency or memory usage on your cluster. In this case, you might want to check whether the extra usage is related to RDB persistence.

Because RDB save operations can trigger transient load spikes, you can inspect the rdb_saves_count metric, which gives the cumulative count of RDB saves over the lifetime of the cluster. If increments in the RDB save count correspond to latency increases, you can increase the RDB snapshot interval to lower the frequency of saves. You can also scale out the cluster to reduce baseline load levels.
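The effect of the snapshot interval on save frequency is simple arithmetic: a longer interval means fewer scheduled saves per day. The sketch below computes this for a few candidate intervals. The intervals and the `saves_per_day` helper are illustrative assumptions, so check which snapshot periods your cluster actually supports.

```python
from datetime import timedelta


def saves_per_day(interval: timedelta) -> float:
    """Number of scheduled RDB snapshots per day for a given snapshot interval."""
    return timedelta(days=1) / interval  # timedelta / timedelta yields a float


if __name__ == "__main__":
    # Illustrative intervals; check which snapshot periods your cluster supports.
    for hours in (1, 6, 12, 24):
        rate = saves_per_day(timedelta(hours=hours))
        print(f"{hours:>2} h interval -> {rate:g} saves per day")
```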

Interpreting metrics for Memorystore for Redis Cluster

As the list above shows, many metrics come in three variants: average, maximum, and total.

For Memorystore for Redis Cluster, we provide average and maximum variants of the same metric so that you can compare them to identify hotspotting within that metric family.

The total variant is independent: it reports the overall value for the cluster and provides separate insight that is unrelated to the hotspot-detection purpose of the average and maximum variants.
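To make the three variants concrete, the following Python sketch derives an average, a maximum, and a total from a set of per-shard values. It assumes each variant is a simple aggregate over shards; Memorystore computes these metrics itself, and its exact aggregation (for example, per node, or over primaries only) may differ. The `summarize` helper, shard names, and values are hypothetical.

```python
from statistics import fmean


def summarize(per_shard: dict[str, float]) -> dict[str, float]:
    """Derive average, maximum, and total variants from per-shard values."""
    values = list(per_shard.values())
    return {
        "average": fmean(values),  # mean across shards
        "maximum": max(values),    # highest single shard
        "total": sum(values),      # sum across the whole cluster
    }


if __name__ == "__main__":
    clients = {"shard-1": 10, "shard-2": 12, "shard-3": 11, "shard-4": 9}  # hypothetical
    print(summarize(clients))  # {'average': 10.5, 'maximum': 12, 'total': 42}
```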

Understanding average and maximum metrics

Suppose you compare the average_keyspace_hits and maximum_keyspace_hits values for your cluster. The greater the difference between the two metrics, the more hotspotting of hits there is in your cluster. Ideally, average_keyspace_hits and maximum_keyspace_hits have close values, because this means that hits are evenly distributed across the cluster.

This principle applies to all metrics that have the average and maximum variations of the same metric.
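The comparison can be reduced to a single number: the ratio of the maximum to the average, where 1.0 means a perfectly even distribution. In the sketch below, `hotspot_ratio` and `is_hot` are hypothetical helpers, and the 1.5 threshold is an arbitrary example rather than a recommended value.

```python
def hotspot_ratio(average: float, maximum: float) -> float:
    """Ratio of the maximum to the average; 1.0 means an even distribution.

    Assumes non-negative values, so an average of 0 means every value is 0.
    """
    if average == 0:
        return 1.0
    return maximum / average


def is_hot(average: float, maximum: float, threshold: float = 1.5) -> bool:
    """Flag hotspotting when the maximum exceeds the average by more than `threshold` times."""
    return hotspot_ratio(average, maximum) > threshold


if __name__ == "__main__":
    print(hotspot_ratio(100.0, 110.0), is_hot(100.0, 110.0))  # 1.1 False
    print(hotspot_ratio(100.0, 250.0), is_hot(100.0, 250.0))  # 2.5 True
```

Because the ratio is scale-free, the same threshold can be applied to any metric family that has average and maximum variants.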

Hotspotting example

Comparing average_keyspace_hits and maximum_keyspace_hits for all of the shards in your cluster indicates where hotspotting occurs. For example, suppose the shards in a 6-shard cluster have the following numbers of hits:

  • Shard 1 – 2 hits
  • Shard 2 – 2 hits
  • Shard 3 – 2 hits
  • Shard 4 – 2 hits
  • Shard 5 – 2 hits
  • Shard 6 – 8 hits

In this example, average_keyspace_hits returns 3 and maximum_keyspace_hits returns 8, indicating that shard 6 is hot.
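Writing h_i for the number of hits on shard i (notation introduced here), the two values in the example work out as follows:

```latex
\bar{h} = \frac{1}{6}\sum_{i=1}^{6} h_i = \frac{5 \times 2 + 8}{6} = \frac{18}{6} = 3,
\qquad
\max_{i} h_i = h_6 = 8
```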