CPU utilization and task priority
When Cloud Spanner measures CPU utilization, it organizes tasks into the following categories:
|Priority||User tasks||System tasks|
|High priority||High-priority user tasks: Tasks that your application initiates and that Cloud Spanner handles as a high priority. A read or commit request is usually high priority.||High-priority system tasks: Tasks that Cloud Spanner initiates and handles as a high priority. Examples include backfilling an index and data splitting.|
|Low priority||Low-priority user tasks: Tasks that your application initiates, and that do not need to be completed as quickly as high-priority tasks. Examples include batch reads and batch queries.||Low-priority system tasks: Tasks that Cloud Spanner initiates, and that do not need to be completed as quickly as high-priority tasks. Examples include database compaction and schema change validation.|
High-priority tasks immediately preempt low-priority tasks. If necessary, Cloud Spanner stops all low-priority tasks and allows high-priority tasks to utilize up to 100% of the available CPU resources. While low-priority system tasks can be delayed in the short term, they must run eventually for optimal performance. Therefore, you must provision your instance with enough nodes to handle both high- and low-priority tasks.
If there are no high-priority tasks, Cloud Spanner will utilize up to 100% of the available CPU resources to complete low-priority tasks more quickly. Spikes in background usage are not a sign of a problem. Low-priority tasks can yield to high-priority tasks, including user tasks, almost instantly.
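The preemption behavior described above can be pictured with a toy model. The sketch below does not show how Cloud Spanner schedules work internally; it only assumes that high-priority demand is served first and that low-priority tasks receive whatever CPU remains. The function name `allocate_cpu` and its percentage inputs are hypothetical.

```python
def allocate_cpu(high_demand: float, low_demand: float, capacity: float = 100.0):
    """Split CPU capacity (in percent) between two priority classes.

    Toy model only: high-priority demand is served first, and low-priority
    tasks receive whatever capacity is left over.
    """
    if high_demand < 0 or low_demand < 0:
        raise ValueError("demand must be non-negative")
    high = min(high_demand, capacity)
    low = min(low_demand, capacity - high)
    return high, low


# No high-priority work: low-priority tasks may use the whole instance.
print(allocate_cpu(0.0, 150.0))   # (0.0, 100.0)
# High-priority work takes what it needs; low priority gets the remainder.
print(allocate_cpu(80.0, 50.0))   # (80.0, 20.0)
# High-priority demand at or above capacity: low-priority tasks are stopped.
print(allocate_cpu(120.0, 50.0))  # (100.0, 0.0)
```

In this model a spike in low-priority usage costs nothing, because it disappears as soon as high-priority demand rises.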
Cloud Spanner provides the following metrics for CPU utilization:
- Rolling average 24 hour: A rolling average of total CPU utilization, as a percentage of the instance's CPU resources, for each database. Each data point is an average for the previous 24 hours.
- High priority: The CPU utilization, as a percentage of the instance's CPU resources, for high-priority tasks.
- Total: The total CPU utilization, as a percentage of the instance's CPU resources.
  - For instances, you can view total CPU utilization by database or by task priority.
  - For databases, you can view total CPU utilization by task priority.
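To make the rolling 24-hour average concrete, here is a minimal sketch of how such a series could be computed from raw utilization samples. It assumes samples arrive as `(timestamp, percent)` pairs; the function name `rolling_average` is hypothetical, and the way Cloud Spanner actually computes the metric may differ.

```python
from datetime import datetime, timedelta


def rolling_average(samples, window=timedelta(hours=24)):
    """Return (timestamp, average) pairs.

    Each average covers the samples whose timestamps fall in the
    `window` that ends at (and includes) that timestamp.
    """
    samples = sorted(samples)
    result = []
    start = 0      # index of the oldest sample still inside the window
    total = 0.0    # running sum of the values inside the window
    for i, (ts, value) in enumerate(samples):
        total += value
        # Drop samples that are a full window or more older than `ts`.
        while samples[start][0] <= ts - window:
            total -= samples[start][1]
            start += 1
        result.append((ts, total / (i - start + 1)))
    return result


# Hourly samples: 10% for the first day, 90% for the second day.
t0 = datetime(2024, 1, 1)
samples = [(t0 + timedelta(hours=h), 10.0 if h < 24 else 90.0) for h in range(48)]
averages = rolling_average(samples)
print(averages[23][1])  # 10.0
print(averages[35][1])  # 50.0
print(averages[47][1])  # 90.0
```

The example shows why this metric reacts slowly: twelve hours into the busier day, the average has only reached the midpoint.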
You can view charts for these metrics in the GCP Console or in the Stackdriver Monitoring console. You can also use the Stackdriver Monitoring console to create alerts for high CPU utilization, as described below.
Alerts for high CPU utilization
To ensure that your Cloud Spanner instance has enough CPU resources to support your workload, we recommend that you keep CPU utilization below the following maximum values:
|Metric||Maximum for single-region instances||Maximum per region for multi-region instances|
|High priority total||65%||45%|
|24-hour smoothed aggregate||90%||90%|
To help you stay below the recommended maximums, create alerts in Stackdriver Monitoring that track high-priority CPU utilization and the average CPU utilization over 24 hours.
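As an illustration, the recommended maximums from the table can be encoded in a small check. This is only a sketch: the dictionary keys and the function name `exceeded_maximums` are hypothetical, and for multi-region instances the inputs are assumed to be per-region values.

```python
# Recommended maximums (percent), copied from the table above.
RECOMMENDED_MAX = {
    "single-region": {"high_priority": 65.0, "smoothed_24h": 90.0},
    "multi-region": {"high_priority": 45.0, "smoothed_24h": 90.0},
}


def exceeded_maximums(config, high_priority, smoothed_24h):
    """Return the names of the metrics that exceed the recommended maximum."""
    limits = RECOMMENDED_MAX[config]
    observed = {"high_priority": high_priority, "smoothed_24h": smoothed_24h}
    return [name for name, limit in limits.items() if observed[name] > limit]


print(exceeded_maximums("single-region", 50.0, 80.0))  # []
print(exceeded_maximums("multi-region", 50.0, 80.0))   # ['high_priority']
print(exceeded_maximums("single-region", 70.0, 95.0))  # ['high_priority', 'smoothed_24h']
```

The same 50% high-priority utilization is acceptable for a single-region instance but exceeds the recommended maximum for a multi-region instance.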
If you exceed the recommended maximums, we strongly recommend provisioning more nodes for your instance so it can continue to operate. If you want to automate this process, you can create an application that monitors CPU utilization, then adds and removes nodes as needed, using either a client library or the gcloud command-line tool.
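The decision step of such an application might look like the sketch below. It leaves out fetching the metric and only decides the next node count from an observed high-priority utilization. The names `next_node_count` and `resize_command`, the scale-down threshold of 30%, and the node limits are illustrative assumptions, not recommendations from this page. The command built here follows the `gcloud spanner instances update INSTANCE --nodes=N` form; check it against your gcloud version before relying on it.

```python
def next_node_count(current_nodes, high_priority_cpu, max_cpu=65.0,
                    scale_down_below=30.0, min_nodes=1, max_nodes=10):
    """Decide the next node count from high-priority CPU utilization (percent).

    Adds one node above `max_cpu`, removes one below `scale_down_below`
    (an assumed threshold), and otherwise leaves the instance unchanged.
    """
    if high_priority_cpu > max_cpu:
        return min(current_nodes + 1, max_nodes)
    if high_priority_cpu < scale_down_below:
        return max(current_nodes - 1, min_nodes)
    return current_nodes


def resize_command(instance_id, nodes):
    """Build (but do not run) the gcloud command that resizes the instance."""
    return ["gcloud", "spanner", "instances", "update", instance_id,
            f"--nodes={nodes}"]


target = next_node_count(current_nodes=3, high_priority_cpu=80.0)
print(target)                                 # 4
print(resize_command("my-instance", target))
```

Keeping a gap between the scale-up and scale-down thresholds avoids adding and removing nodes repeatedly when utilization hovers near a single cutoff.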
To determine the number of nodes you need, consider both the peak high-priority CPU utilization and the 24-hour smoothed average. Always allocate enough nodes to keep the CPU utilization below the recommended maximums. We recommend allocating extra resources to accommodate workload spikes, especially for performance-sensitive applications.
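A rough way to turn both metrics into a node count is sketched below. It rests on a simplifying assumption that is not stated on this page: CPU utilization falls in proportion to the number of nodes. The function name `nodes_needed` and the `headroom` multiplier are hypothetical, and the default maximums are the single-region values from the table above.

```python
import math


def nodes_needed(current_nodes, peak_high_priority, smoothed_24h,
                 max_high_priority=65.0, max_smoothed=90.0, headroom=1.0):
    """Estimate the node count that keeps both metrics at or under their maximums.

    Assumes utilization scales inversely with node count. `headroom` > 1.0
    reserves extra capacity for workload spikes.
    """
    by_high = current_nodes * peak_high_priority / max_high_priority
    by_smoothed = current_nodes * smoothed_24h / max_smoothed
    # The stricter of the two requirements decides the result.
    return max(1, math.ceil(max(by_high, by_smoothed) * headroom))


# 4 nodes, 80% peak high-priority CPU, 70% 24-hour average.
print(nodes_needed(4, 80.0, 70.0))                # 5
print(nodes_needed(4, 80.0, 70.0, headroom=1.2))  # 6
```

In this example the peak high-priority utilization, not the 24-hour average, drives the estimate; with a steadier but heavier background load the opposite can hold.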
If you do not have enough nodes, Cloud Spanner postpones tasks by priority level. Low-priority system tasks, like database compaction and schema change validation, can be deferred in favor of user tasks. However, these tasks are critical to the health of your instance, and Cloud Spanner cannot defer them indefinitely. If Cloud Spanner cannot complete its low-priority system tasks within a certain time window—on the order of several hours to a day—due to insufficient compute resources, Cloud Spanner might increase their priority. When this happens, it affects the performance of user tasks.
What's next
- Monitor your instance with the GCP Console or the Stackdriver Monitoring console.
- Create alerts for Cloud Spanner CPU utilization.
- Find out how to change the number of nodes in a Cloud Spanner instance.
- Learn how to find correlations between high latency and other metrics.