CPU utilization and task priority
Cloud Spanner measures CPU utilization based on the source and the priority of the task.
Source: A task can be initiated either by the user or by the system.
Priority: The priority helps Cloud Spanner determine which tasks should execute first. The priority of system tasks is predetermined and cannot be configured. User tasks run at high priority unless otherwise specified. Many data requests, such as read and executeSql, let you specify a lower priority for the request. This can be useful, for example, when you are running batch, maintenance, or analytical queries that do not have strict performance SLOs.
In general, higher-priority tasks execute ahead of lower-priority tasks, and Cloud Spanner allows high-priority tasks to use up to 100% of the available CPU resources even when lower-priority tasks are competing for them. Lower-priority system tasks can be delayed in the short term, but they must run eventually. Therefore, you must provision your instance with enough nodes to handle all tasks.
If there are no high-priority tasks, Cloud Spanner uses up to 100% of the available CPU resources to complete lower-priority tasks more quickly. Spikes in background usage are not a sign of a problem, because lower-priority tasks can yield to higher-priority tasks, including user tasks, almost instantly.
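To make the scheduling behavior above concrete, here is a toy Python model of a single scheduling tick. It is only an illustration under simplifying assumptions, not how Cloud Spanner's scheduler is implemented: the two-level `"high"`/`"low"` priority, the hypothetical `Task` class, and the hypothetical `allocate_cpu` function are invented for this sketch, and it ignores preemption latency.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: str      # "high" or "low" in this simplified two-level model
    cpu_demand: float  # CPU the task wants this tick, as % of instance capacity


def allocate_cpu(tasks, capacity=100.0):
    """Grant CPU for one scheduling tick: all high-priority demand first,
    then whatever capacity is left over goes to low-priority tasks."""
    allocation = {}
    remaining = capacity
    for level in ("high", "low"):
        for task in tasks:
            if task.priority != level:
                continue
            granted = min(task.cpu_demand, remaining)
            allocation[task.name] = granted
            remaining -= granted
    return allocation


# Usage: the user query gets the 70% it asks for; compaction gets the idle 30%.
tasks = [Task("user-query", "high", 70.0), Task("compaction", "low", 50.0)]
print(allocate_cpu(tasks))  # {'user-query': 70.0, 'compaction': 30.0}
```

With no high-priority demand, the low-priority task receives the full 100%, which is why a background spike on its own is not alarming.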
The following table shows examples of user tasks and system tasks:
|Task source||Example|
|User tasks||Includes data requests where PRIORITY_LOW is specified.|
|System tasks||Includes data splitting.|
Cloud Spanner provides the following metrics for CPU utilization:
Smoothed CPU utilization: A rolling average of total CPU utilization, as a percentage of the instance's CPU resources, for each database. Each data point is the average over the previous 24 hours. Use this metric to create alerts and to analyze CPU usage over a long period of time, such as 24 hours. You can view a chart for this metric in the Cloud Console or in the Cloud Monitoring console as Rolling average 24 hour.
CPU Utilization by priority: The CPU utilization, as a percentage of the instance's CPU resources, grouped by priority and by whether the tasks are user-initiated or system-initiated. Use this metric to create alerts and analyze CPU usage at a high level. You can view a chart for this metric in the Cloud Console or in the Cloud Monitoring console.
CPU Utilization by operation type: The CPU utilization, as a percentage of the instance's CPU resources, grouped by user-initiated operations such as reads, writes, and commits. Use this metric to get a detailed breakdown of CPU usage and to troubleshoot further, as explained in Investigating high CPU utilization. You can create a chart for this metric in the Cloud Monitoring console.
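The smoothed metric is a trailing 24-hour average. The sketch below computes such an average over hypothetical `(timestamp, cpu_percent)` samples, for example data points you might export from Cloud Monitoring. It assumes evenly spaced samples and weights each one equally, which may differ from exactly how Cloud Spanner computes the metric; `rolling_average` is a hypothetical helper.

```python
from collections import deque
from datetime import datetime, timedelta


def rolling_average(samples, window=timedelta(hours=24)):
    """Return (timestamp, mean CPU %) for each sample, averaging over the
    trailing window (ts - window, ts]. `samples` must be sorted by timestamp."""
    result = []
    in_window = deque()
    total = 0.0
    for ts, value in samples:
        in_window.append((ts, value))
        total += value
        # Evict samples that have fallen out of the trailing window.
        # The current sample is always kept, so the deque never empties.
        while in_window[0][0] <= ts - window:
            _, old_value = in_window.popleft()
            total -= old_value
        result.append((ts, total / len(in_window)))
    return result


# Usage: 24 hourly samples at 10% followed by 24 at 90%.
start = datetime(2024, 1, 1)
samples = [(start + timedelta(hours=h), 10.0 if h < 24 else 90.0) for h in range(48)]
smoothed = rolling_average(samples)
print(smoothed[-1][1])  # 90.0
```

Because a short burst is averaged over a full day, this metric moves slowly, which is why it suits long-term analysis rather than catching brief spikes.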
You can also use the Cloud Monitoring console to create alerts for CPU utilization, as described below.
Alerts for high CPU utilization
The following table specifies our recommendations for maximum CPU usage for both single-region and multi-region instances. These limits help ensure that your instance has enough compute capacity to continue serving your traffic if an entire zone (for single-region instances) or an entire region (for multi-region instances) is lost.
|Metric||Maximum for single-region instances||Maximum per region for multi-region instances|
|High priority total||65%||45%|
|24-hour smoothed aggregate||90%||90%|
To help you stay below the recommended maximums, create alerts in Cloud Monitoring that track high-priority CPU utilization and the average CPU utilization over 24 hours.
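The check that such alerts perform can be sketched as a comparison against the recommended maximums in the table above. This is not a Cloud Monitoring API call (in Cloud Monitoring you configure threshold conditions in an alerting policy instead); the `RECOMMENDED_MAX` dictionary and `check_cpu` function are hypothetical names used only for illustration.

```python
# Recommended maximums (percent) from the table above.
RECOMMENDED_MAX = {
    "single-region": {"high_priority_total": 65.0, "smoothed_24h": 90.0},
    "multi-region": {"high_priority_total": 45.0, "smoothed_24h": 90.0},
}


def check_cpu(config, high_priority_total, smoothed_24h):
    """Return the names of metrics that exceed the recommended maximum.

    For multi-region instances, pass the per-region values, because the
    multi-region limits in the table apply per region."""
    limits = RECOMMENDED_MAX[config]
    observed = {"high_priority_total": high_priority_total, "smoothed_24h": smoothed_24h}
    return [name for name, value in observed.items() if value > limits[name]]


# Usage: 50% high-priority CPU is fine for single-region but too high per region.
print(check_cpu("single-region", 50.0, 70.0))  # []
print(check_cpu("multi-region", 50.0, 70.0))   # ['high_priority_total']
```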
High CPU utilization can increase request latency: overloading an individual backend server increases the latency of the requests it serves. Benchmark your application and monitor it actively to verify that Cloud Spanner meets your performance requirements.
For performance-sensitive applications, you might therefore need to reduce CPU utilization further by using the techniques described in the following section.
Reducing CPU utilization
This section explains how to reduce an instance's CPU utilization.
In general, we recommend that you add nodes to your instance as a starting point. After you add nodes, you can investigate and address the root causes of high CPU utilization.
If you exceed the recommended maximums for CPU utilization, we strongly recommend adding nodes to your instance so it can continue to operate effectively. If you want to automate this process, you can create an application that monitors CPU utilization, then adds and removes nodes as needed, using the
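An application like the one described above needs a rule for deciding when to change the node count. The following is a hypothetical step policy, not a Cloud Spanner feature: the thresholds, node limits, and the assumption that load spreads evenly across nodes are all illustrative, and actually resizing the instance (through the Cloud Spanner instance admin API or a similar tool) is not shown. A production policy would likely also consider the 24-hour smoothed metric.

```python
def autoscale_decision(current_nodes, high_priority_cpu, scale_out_above=65.0,
                       scale_in_below=40.0, min_nodes=1, max_nodes=10):
    """Return the node count to use next (hypothetical policy).

    Adds one node when high-priority CPU exceeds the upper threshold and
    removes one when it falls below the lower threshold. The gap between the
    thresholds (hysteresis) keeps the instance from flapping between sizes."""
    if high_priority_cpu > scale_out_above:
        return min(current_nodes + 1, max_nodes)
    if high_priority_cpu < scale_in_below and current_nodes > min_nodes:
        # Assume load spreads evenly, so removing one node scales CPU by n / (n - 1).
        projected = high_priority_cpu * current_nodes / (current_nodes - 1)
        # Only scale in if the remaining nodes would stay under the upper threshold.
        if projected <= scale_out_above:
            return current_nodes - 1
    return current_nodes


print(autoscale_decision(4, 30.0))  # 3  (projected 40% on 3 nodes)
print(autoscale_decision(2, 39.0))  # 2  (projected 78% on 1 node is too high)
```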
To determine how many nodes you need, consider the peak high-priority CPU utilization as well as the 24-hour smoothed average. Always allocate enough nodes to keep the CPU utilization below the recommended maximums. As previously described, you may need to allocate extra nodes for performance-sensitive applications (for example, to accommodate workload spikes).
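As a rough worked example, if you assume CPU utilization scales inversely with node count, an instance with 3 nodes at 80% peak high-priority CPU needs about 3 × 80 / 65 ≈ 3.7 nodes to stay under the 65% single-region maximum, so you would round up to 4. The sketch below applies that arithmetic to both metrics; `nodes_needed` and its `headroom` parameter (extra capacity for workload spikes) are hypothetical, and real workloads may not scale perfectly linearly, so verify any estimate with benchmarks.

```python
import math


def nodes_needed(current_nodes, peak_high_priority_cpu, smoothed_24h_cpu,
                 max_high_priority=65.0, max_smoothed=90.0, headroom=0.0):
    """Estimate the smallest node count that keeps both metrics at or below
    their recommended maximums, assuming utilization scales as 1 / nodes.

    Use max_high_priority=45.0 with per-region values for multi-region instances.
    headroom is an optional extra fraction, e.g. 0.25 for 25% more nodes."""
    by_peak = current_nodes * peak_high_priority_cpu / max_high_priority
    by_smoothed = current_nodes * smoothed_24h_cpu / max_smoothed
    return max(1, math.ceil(max(by_peak, by_smoothed) * (1.0 + headroom)))


print(nodes_needed(3, 80.0, 60.0))  # 4
```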
If you do not have enough nodes, Cloud Spanner postpones tasks by priority level. Low-priority system tasks, like database compaction and schema change validation, can be deferred in favor of user tasks. However, these tasks are critical to the health of your instance, and Cloud Spanner cannot defer them indefinitely. If Cloud Spanner cannot complete its low-priority system tasks within a certain time window (on the order of several hours to a day) due to insufficient compute resources, Cloud Spanner might increase the priority of the system tasks, which in turn affects the performance of user tasks.
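The escalation described above resembles the general priority-aging pattern, in which a task that waits too long is promoted so it cannot starve. The toy sketch below illustrates that pattern only; the `DeferredTask` class, the `escalate` function, and the 12-hour window (an arbitrary value inside the "several hours to a day" range) are hypothetical, and Cloud Spanner's actual mechanism may differ.

```python
from dataclasses import dataclass


@dataclass
class DeferredTask:
    name: str
    priority: str          # "low" or "high" in this simplified model
    deferred_hours: float  # how long the task has been waiting to run


def escalate(tasks, max_deferral_hours=12.0):
    """Promote low-priority tasks that have waited longer than the window.

    Mutates the tasks in place and returns the names of promoted tasks."""
    promoted = []
    for task in tasks:
        if task.priority == "low" and task.deferred_hours > max_deferral_hours:
            task.priority = "high"  # now competes with user tasks for CPU
            promoted.append(task.name)
    return promoted


backlog = [DeferredTask("compaction", "low", 20.0),
           DeferredTask("schema-validation", "low", 2.0)]
print(escalate(backlog))  # ['compaction']
```

Once promoted, such a task competes with high-priority user requests, which is why an under-provisioned instance can eventually see user-facing latency rise.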
Investigating further with introspection tools
If the CPU Utilization by operation type metric indicates that a particular type of operation is contributing to high CPU utilization, use the Cloud Spanner introspection tools to troubleshoot further. For more information, see Investigating high CPU utilization.
What's next
- Monitor your instance with the Cloud Console or the Cloud Monitoring console.
- Create alerts for Cloud Spanner CPU utilization.
- Find out how to change the number of nodes in a Cloud Spanner instance.
- Learn how to find correlations between high latency and other metrics.
- To learn how to troubleshoot high CPU usage caused by a particular operation type, see Investigating high CPU utilization.