This page explains how to manage your instance's memory usage with key expiration recommendations, metrics monitoring, and key eviction configurations for different Cloud Memorystore for Redis use cases. These recommendations help you avoid and mitigate OOM (out of memory) errors.
Causes of OOM errors
The following usage patterns can contribute to your instance running out of memory and getting an OOM error:
- High application write-rate with insufficient instance size
- An unsuitable maxmemory-policy
- Suboptimal key expiration configurations
Set an alert that notifies you when your instance's memory usage exceeds 50%. Once usage passes that threshold, monitor your memory usage regularly and consider scaling up your instance size.
Instance memory usage metrics
When you create a Cloud Memorystore for Redis instance, you choose an instance size to hold your Redis data. Cloud Memorystore for Redis provisions an instance with a total system memory larger than the instance size in order to run internal infrastructure that keeps the instance functioning properly.
When you add the memory used by your Redis data to the memory used by internal infrastructure, you get the total system memory usage. To see the ratio of system memory usage to the total system memory, see the system memory usage ratio metric in the Stackdriver Metrics Explorer. A ratio of 0.3 means that your Redis data plus internal infrastructure occupies 30% of the available system memory.
If the system memory usage ratio metric for your instance reaches 100%, your instance can become unresponsive, which can require an instance restart and cause data loss. To avoid this, set a Stackdriver alert to notify you if your system memory usage ratio reaches 50%. If this happens, closely monitor this metric and consider scaling up your instance if memory usage rises dramatically.
In addition to the system memory usage ratio metric, Stackdriver also provides the memory usage ratio metric, which displays the current ratio of stored Redis data to the instance size that you set during the instance's creation. A ratio of 0.3 means that your Redis data uses 30% of your instance size. The memory usage ratio metric can grow up to 1.0, in which case Redis data has filled the instance completely, and one of the maxmemory policies described in Redis configurations takes effect based on your instance settings.
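To make the two ratio metrics concrete, here is a minimal sketch of the arithmetic behind them. The metric values themselves come from the Stackdriver Metrics Explorer; the byte counts and the overhead figure below are purely illustrative assumptions, not values Cloud Memorystore reports under these names.

```python
# Illustrative only: the arithmetic behind the two Stackdriver ratio metrics.
# Values are in GiB and are example numbers, not real instance measurements.

def memory_usage_ratio(redis_data_gib, instance_size_gib):
    """Stored Redis data relative to the instance size you provisioned."""
    return redis_data_gib / instance_size_gib

def system_memory_usage_ratio(redis_data_gib, overhead_gib, total_system_gib):
    """Redis data plus internal infrastructure, relative to total system memory."""
    return (redis_data_gib + overhead_gib) / total_system_gib

# A 5 GiB instance holding 1.5 GiB of data:
print(memory_usage_ratio(1.5, 5.0))  # → 0.3

# With a hypothetical 0.3 GiB of internal overhead and 6 GiB of total
# system memory, the system memory usage ratio is also about 0.3:
print(system_memory_usage_ratio(1.5, 0.3, 6.0))
```

Note that the two ratios use different denominators: the instance size you chose versus the larger total system memory that Cloud Memorystore provisions.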
Redis removes keys through client-requested deletion (DEL), expiration (EXPIRE, or SET with the EX option), or eviction. Keys only expire if you set them with TTLs. After a key's designated TTL runs out, Redis removes the key asynchronously. For more details on how Redis expires keys, see the Redis EXPIRE command documentation.
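As a rough illustration of these removal paths, the sketch below models DEL, EXPIRE, and SET with EX in a plain in-memory dict. This is not real Redis: it only checks expiry lazily on access, while Redis also expires keys actively in the background, and the injectable clock exists purely to make the demonstration deterministic.

```python
import time

class TinyKeyStore:
    """Minimal in-memory model of the three removal paths: DEL, EXPIRE,
    and SET with EX. An illustration of the semantics, not real Redis."""

    def __init__(self, clock=time.monotonic):
        self._data = {}      # key -> value
        self._expiry = {}    # key -> absolute deadline (volatile keys only)
        self._clock = clock

    def set(self, key, value, ex=None):          # SET key value [EX seconds]
        self._data[key] = value
        if ex is not None:
            self._expiry[key] = self._clock() + ex
        else:
            self._expiry.pop(key, None)          # a plain SET clears any TTL

    def expire(self, key, seconds):              # EXPIRE key seconds
        if key in self._data:
            self._expiry[key] = self._clock() + seconds

    def delete(self, key):                       # DEL key
        self._data.pop(key, None)
        self._expiry.pop(key, None)

    def get(self, key):                          # lazy expiration on access
        deadline = self._expiry.get(key)
        if deadline is not None and self._clock() >= deadline:
            self.delete(key)
        return self._data.get(key)
```

For example, a key stored with `set("session:1", "alice", ex=300)` disappears once the clock passes the 300-second deadline, while a key written without `ex` stays until it is deleted or evicted.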
If your instance's memory is full and a new write comes in, Redis evicts keys according to your instance's maxmemory policy to make enough room for the write.
Best practices for data management
Understanding volatile vs non-volatile keys
You can use Cloud Memorystore for Redis to store both volatile keys and non-volatile keys. Volatile keys are values that you expect to stay in your Redis instance temporarily, such as cache values or expirable values. Non-volatile keys are values that you intend to keep in Redis long term.
For example, video game leaderboard scores, user sessions, and stock prices qualify as volatile data because these values are only valid for a short amount of time. You should set TTLs on volatile keys.
Non-volatile data includes examples like total visits, or a map of config values. Do not set TTLs on these values. This information is intended to stay in Redis long-term.
Managing a combination of volatile and non-volatile data
Where possible, segregate volatile and non-volatile data into separate Redis instances. This allows you to configure each use case separately.
If you decide to use one instance for both volatile and non-volatile data, only set TTLs on the keys that you want to expire, and use one of the volatile maxmemory policies. Non-volatile keys should occupy only a small amount of memory compared with your volatile keys, so that evicting volatile keys can free enough room for new writes.
So long as you are not frequently adding or removing non-volatile data, the ratio of expirable keys to non-expirable keys should be stable. You can calculate the percentage of expirable keys to all keys using Stackdriver metrics. In your Stackdriver dashboard, find the expirable keys metric and the total keys metric. Divide expirable keys by total keys to find the ratio of expirable keys.
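The calculation is a single division; a minimal sketch, with illustrative metric values rather than real Stackdriver readings:

```python
def expirable_key_ratio(expirable_keys, total_keys):
    """Fraction of keys that carry a TTL: expirable keys / total keys."""
    return expirable_keys / total_keys

# Example: 8,000 expirable keys out of 10,000 total keys.
print(expirable_key_ratio(8_000, 10_000))  # → 0.8
```

For a mixed instance, this ratio should stay roughly constant over time; watching for an unexpected drop is what the next paragraph describes.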
If you see an unexpected decrease in the ratio of expirable keys, your client may be failing to mark some keys with TTLs. In this case, Redis cannot expire those keys and keeps them around just like non-volatile data. Over time this causes the space left for cacheable values to shrink until all of the keys are non-volatile, and writes fail.
If you notice your percentage of volatile keys decreases unexpectedly, use the Redis RANDOMKEY command to sample keys and find the source of non-volatile keys. You can also use the Redis SCAN command to find non-volatile keys.
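One way to hunt for keys missing TTLs is to iterate with SCAN and check each key's TTL, where a TTL of -1 means the key exists but has no expiration set. The sketch below uses redis-py-style method names (`scan_iter`, `ttl`); the `FakeClient` class is a hypothetical stand-in so the example runs without a server, and against a real instance you would pass a real client instead.

```python
def find_nonvolatile_keys(client, limit=100):
    """Sample keys with SCAN and report those without a TTL.
    A TTL of -1 means the key exists but has no expiration set."""
    found = []
    for key in client.scan_iter():
        if client.ttl(key) == -1:
            found.append(key)
            if len(found) >= limit:
                break
    return found

class FakeClient:
    """Hypothetical stand-in for a redis-py client, for illustration only."""
    def __init__(self, ttls):
        self._ttls = ttls          # key -> TTL in seconds, -1 if none
    def scan_iter(self):
        return iter(self._ttls)
    def ttl(self, key):
        return self._ttls[key]

client = FakeClient({"session:1": 300, "config:max": -1, "cache:a": 60})
print(find_nonvolatile_keys(client))  # → ['config:max']
```

SCAN iterates incrementally, so this approach avoids blocking the server the way a full KEYS scan would; on large instances, sampling with a `limit` keeps the check cheap.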
Managing volatile data
Use one of the volatile maxmemory policies, and set TTLs on all volatile data. Setting longer TTLs increases your cache-hit ratio, which reduces the load on your servers, backend, and databases. However, you may need a larger instance, because you can run out of memory under a high write load with long TTLs. Shorter TTLs have the reverse effect.
You typically set all of your keys with the same TTL. You can see the average TTL from a sample of keys by looking at the Stackdriver Metrics Explorer and graphing your instance's average TTL metric. You can expect a healthy cache to have an average TTL of half the maximum TTL.
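The half-the-maximum expectation follows from steady traffic: if keys are written at a constant rate with one shared TTL, their ages are spread evenly between zero and the TTL, so the average remaining TTL sits near half the maximum. A minimal sketch under that assumption:

```python
def average_remaining_ttl(max_ttl, key_ages):
    """Average remaining TTL for keys that all started with the same TTL."""
    remaining = [max_ttl - age for age in key_ages if age < max_ttl]
    return sum(remaining) / len(remaining)

# Keys written at a steady rate have ages spread evenly over [0, max TTL),
# so the average remaining TTL lands at about half the maximum:
ages = list(range(600))                  # one write per second for 600 s
print(average_remaining_ttl(600, ages))  # → 300.5, about half of 600
```

An average TTL well above half the maximum is therefore a hint that something other than normal expiration, such as eviction, is removing keys early.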
If your instance reaches maxmemory, the maxmemory policy evicts keys before they expire, bringing the average TTL closer to the maximum TTL. If you see an increase in the eviction rate, we advise increasing your Redis instance memory in order to increase cache hits.
Managing cacheable data without expirations
Use the allkeys-lru policy to cache data that is not inherently temporal or does not expire in a reasonable time. Or, if your instance is running Redis version 4.0 or greater, you can also use the allkeys-lfu policy.
Take, for example, caching blog posts by their URLs. Blog posts don't "expire" per se. Most of them just become less relevant, while some blog posts may continue to be popular.
Your Redis instance can only store a finite amount of data for blog posts, so as your instance becomes full, the allkeys-lru policy eventually evicts the keys that are the least recently used (with allkeys-lfu, the least frequently used). Either maxmemory policy preserves the most requested data in your Redis instance.
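The eviction behavior that allkeys-lru approximates can be sketched with a small in-memory model. This is an illustration of the idea only: real Redis approximates LRU by sampling a few keys per eviction rather than tracking exact recency order, and it evicts based on memory use, not key count.

```python
from collections import OrderedDict

class LRUCache:
    """Toy model of allkeys-lru: when the cache is full, the least
    recently used key is evicted to make room for the new write."""

    def __init__(self, maxkeys):
        self.maxkeys = maxkeys
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None                     # cache miss
        self._data.move_to_end(key)         # reads refresh recency
        return self._data[key]

    def set(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        elif len(self._data) >= self.maxkeys:
            self._data.popitem(last=False)  # evict least recently used
        self._data[key] = value

cache = LRUCache(maxkeys=2)
cache.set("post:/intro", "…")
cache.set("post:/faq", "…")
cache.get("post:/intro")                    # touch /intro; /faq is now LRU
cache.set("post:/new", "…")                 # evicts post:/faq
print(cache.get("post:/faq"))               # → None
```

Frequently read posts keep refreshing their recency, so they survive evictions, which is exactly why the most requested data stays cached.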
If you think your cache-hit ratio is too low, you can decide on a cache-hit ratio target and increase instance size until you reach the target. For some workloads it may take multiple days to get a steady cache-hit ratio.
Cache-hit ratio for allkeys maxmemory policies
Your cache-hit ratio should typically fall between roughly 50% and 98%. A decreasing cache-hit ratio means that a growing share of requests are cache misses. If you have a stable cache-hit ratio lower than 50%, you should scale up to a bigger instance.
If you choose not to scale up to attain a higher cache-hit ratio, your Redis instance continues to serve the keys that are used most often, but you may miss out on caching opportunities and could have a more expensive load on your system due to cache misses.
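The ratio itself is just hits divided by total lookups; a minimal sketch of the calculation and the 50% rule of thumb, with illustrative counts:

```python
def cache_hit_ratio(hits, misses):
    """Fraction of lookups served from the cache."""
    return hits / (hits + misses)

def should_scale_up(hits, misses, target=0.5):
    """A stable ratio below ~50% suggests scaling to a bigger instance."""
    return cache_hit_ratio(hits, misses) < target

print(cache_hit_ratio(900, 100))  # → 0.9
print(should_scale_up(400, 600))  # → True
```

In practice you would feed this from your instance's hit and miss counters over a window long enough for the ratio to stabilize, which, as noted above, can take multiple days for some workloads.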
Cache-hit ratio for volatile maxmemory policies
A low or decreasing cache-hit ratio for the volatile maxmemory policies may be caused by insufficient instance size.
Your cache-hit ratio may also decrease because your application sets keys without TTLs. If this is the case, set TTLs on those keys so that they expire.
The expirable keys metric and the evicted keys metric, found in the Stackdriver Metrics Explorer page for your project, provide additional insight into key removal for your instance. Track the evicted keys per second to find the eviction rate for your instance.