CPU utilization metrics

This page describes the CPU utilization metrics that Cloud Spanner provides. You can view these metrics in the Google Cloud Console and in the Cloud Monitoring console.

CPU utilization and task priority

When Cloud Spanner measures CPU utilization, it organizes tasks into the following categories:

  • High-priority user tasks: Tasks that your application initiates and that Cloud Spanner handles as a high priority. These include most read and commit requests, as well as parts of the work for batch writes (but not batch reads).
  • High-priority system tasks: Tasks that Cloud Spanner initiates and handles as a high priority. These include index backfills and data splitting.
  • Low-priority user tasks: Tasks that your application initiates and that do not need to be completed as quickly as high-priority tasks. These include reads and writes issued from Dataflow jobs (including Import/Export).
  • Low-priority system tasks: Tasks that Cloud Spanner initiates and that do not need to be completed as quickly as high-priority tasks. These include database compaction, schema change validation, and backup creation.

High-priority tasks immediately preempt low-priority tasks. If necessary, Cloud Spanner stops all low-priority tasks and allows high-priority tasks to utilize up to 100% of the available CPU resources. While low-priority system tasks can be delayed in the short term, they must run eventually for optimal performance. Therefore, you must provision your instance with enough nodes to handle both high- and low-priority tasks.

If there are no high-priority tasks, Cloud Spanner uses up to 100% of the available CPU resources to complete low-priority tasks more quickly. Spikes in low-priority (background) CPU utilization are therefore not, on their own, a sign of a problem: low-priority tasks can yield to high-priority tasks, including user tasks, almost instantly.
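The preemption behavior described above can be illustrated with a deliberately simplified model. This sketch is not how Cloud Spanner schedules work internally; it only assumes that CPU capacity is a single number (1.0 means 100%) and that high-priority demand is always served before low-priority demand. The function name `allocate_cpu` is hypothetical.

```python
def allocate_cpu(high_demand: float, low_demand: float,
                 capacity: float = 1.0) -> tuple[float, float]:
    """Split CPU capacity between high- and low-priority work (toy model).

    High-priority demand is served first, up to the full capacity.
    Low-priority work only receives whatever capacity is left over.
    """
    if min(high_demand, low_demand, capacity) < 0:
        raise ValueError("demands and capacity must be non-negative")
    high = min(high_demand, capacity)
    low = min(low_demand, capacity - high)
    return high, low


# No high-priority work: low-priority tasks may use all of the CPU.
print(allocate_cpu(0.0, 5.0))
# High-priority work takes what it needs; low-priority work gets the rest.
print(allocate_cpu(0.75, 0.5))
# High-priority demand above capacity starves low-priority work entirely.
print(allocate_cpu(1.5, 0.5))
```

In this model, a burst of low-priority demand never reduces what high-priority work receives, which is why background spikes alone are not a cause for concern.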

Available metrics

Cloud Spanner provides the following metrics for CPU utilization:

  • Rolling average 24 hour: A rolling average of total CPU utilization, as a percentage of the instance's CPU resources, for each database. Each data point is an average for the previous 24 hours.
  • High priority: The CPU utilization, as a percentage of the instance's CPU resources, for high-priority tasks.
  • Total: The total CPU utilization, as a percentage of the instance's CPU resources.

    For instances, you can view total CPU utilization by database or by task priority.

    For databases, you can view total CPU utilization by task priority.
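Cloud Spanner computes the rolling 24-hour average for you; the sketch below only illustrates what "each data point is an average for the previous 24 hours" means. It assumes time-sorted samples of total CPU utilization expressed as fractions (0.5 means 50%) and a half-open window that includes the current sample and excludes samples exactly 24 hours older. The exact windowing and sampling that Cloud Spanner uses are not specified here, and `rolling_24h_average` is a hypothetical name.

```python
from collections import deque
from datetime import datetime, timedelta

WINDOW = timedelta(hours=24)


def rolling_24h_average(samples):
    """Return one (timestamp, average) pair per input sample.

    Each average covers the samples in the 24 hours ending at (and
    including) that sample's timestamp. `samples` must be sorted by time.
    """
    window = deque()
    total = 0.0
    result = []
    for ts, value in samples:
        window.append((ts, value))
        total += value
        # Drop samples that are 24 hours or more older than the current one.
        while window[0][0] <= ts - WINDOW:
            _, old_value = window.popleft()
            total -= old_value
        result.append((ts, total / len(window)))
    return result


# Hourly samples: 50% for the first day, 100% for the second day.
start = datetime(2024, 1, 1)
samples = [(start + timedelta(hours=h), 0.5 if h < 24 else 1.0) for h in range(48)]
averages = rolling_24h_average(samples)
print(averages[35][1])  # half of the window at 50%, half at 100%
```

Because the average lags behind the raw signal, a short spike barely moves it, which is why it is tracked alongside, not instead of, high-priority utilization.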

You can view charts for these metrics in the Cloud Console or in the Cloud Monitoring console. You can also use the Cloud Monitoring console to create alerts for high CPU utilization, as described below.

The following table lists our recommended maximum CPU utilization for single-region and multi-region instances. These limits leave enough compute capacity for your instance to keep serving your traffic if an entire zone (for single-region instances) or an entire region (for multi-region instances) becomes unavailable.

Metric                        Maximum for single-region instances   Maximum per region for multi-region instances
High priority total           65%                                   45%
24-hour smoothed aggregate    90%                                   90%
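The recommended maximums in the table above can be encoded as a simple check. This is a sketch under stated assumptions: utilization values are fractions (0.65 means 65%), and for multi-region instances you would pass the high-priority utilization of the busiest region, because that limit applies per region. The names `RECOMMENDED_MAX` and `over_recommended_max` are hypothetical.

```python
# Recommended maximums from the table above, expressed as fractions.
RECOMMENDED_MAX = {
    "single-region": {"high_priority": 0.65, "smoothed_24h": 0.90},
    # For multi-region instances, the high-priority limit applies per region.
    "multi-region": {"high_priority": 0.45, "smoothed_24h": 0.90},
}


def over_recommended_max(config: str, high_priority: float,
                         smoothed_24h: float) -> list[str]:
    """Return the names of the metrics that exceed their recommended maximum."""
    limits = RECOMMENDED_MAX[config]
    exceeded = []
    if high_priority > limits["high_priority"]:
        exceeded.append("high_priority")
    if smoothed_24h > limits["smoothed_24h"]:
        exceeded.append("smoothed_24h")
    return exceeded


# 50% high-priority utilization is fine for a single-region instance but
# above the per-region maximum for a multi-region instance.
print(over_recommended_max("single-region", 0.50, 0.80))
print(over_recommended_max("multi-region", 0.50, 0.80))
```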

To help you stay below the recommended maximums, create alerts in Cloud Monitoring that track high-priority CPU utilization and the average CPU utilization over 24 hours.
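As a starting point for those alerts, the sketch below builds alert policy bodies in the JSON shape used by the Cloud Monitoring API's `projects.alertPolicies.create` method. It assumes the metric types `spanner.googleapis.com/instance/cpu/utilization_by_priority` (with a `priority` label) and `spanner.googleapis.com/instance/cpu/smoothed_utilization`, that both report utilization as a fraction, and that summing their time series per instance gives the instance total. The instance ID, the 10-minute duration, and the aggregation settings are illustrative choices, and `cpu_alert_policy` is a hypothetical helper; verify the metric names, labels, and filter syntax against the Cloud Monitoring reference before using it.

```python
import json


def cpu_alert_policy(instance_id: str, metric_type: str, threshold: float,
                     extra_filter: str = "") -> dict:
    """Build an AlertPolicy body that fires when a Spanner CPU metric exceeds a threshold."""
    filter_ = (
        'resource.type = "spanner_instance" '
        f'AND resource.label.instance_id = "{instance_id}" '
        f'AND metric.type = "{metric_type}"'
    )
    if extra_filter:
        filter_ += f" AND {extra_filter}"
    metric_name = metric_type.rsplit("/", 1)[-1]
    return {
        "displayName": f"Spanner {instance_id}: {metric_name} > {threshold:.0%}",
        "combiner": "OR",
        "conditions": [{
            "displayName": f"{metric_name} above {threshold:.0%}",
            "conditionThreshold": {
                "filter": filter_,
                "comparison": "COMPARISON_GT",
                "thresholdValue": threshold,
                "duration": "600s",  # condition must hold for 10 minutes
                "aggregations": [{
                    "alignmentPeriod": "60s",
                    "perSeriesAligner": "ALIGN_MEAN",
                    # Sum per-database series into one value per instance.
                    "crossSeriesReducer": "REDUCE_SUM",
                    "groupByFields": ["resource.label.instance_id"],
                }],
            },
        }],
    }


high_priority = cpu_alert_policy(
    "my-instance",
    "spanner.googleapis.com/instance/cpu/utilization_by_priority",
    0.65,  # use 0.45 for a multi-region instance
    extra_filter='metric.label.priority = "high"',
)
smoothed = cpu_alert_policy(
    "my-instance",
    "spanner.googleapis.com/instance/cpu/smoothed_utilization",
    0.90,
)
print(json.dumps(high_priority, indent=2))
```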

High CPU utilization can increase request latencies: when an individual backend server is overloaded, the requests it handles take longer. Applications should run benchmarks and actively monitor their workloads to verify that Cloud Spanner meets their performance requirements.

For performance-sensitive applications, you might therefore need to reduce CPU utilization further, using the techniques described in the following section.

Reducing CPU utilization

This section explains how to reduce an instance's CPU utilization.

In general, we recommend that you add nodes to your instance as a starting point. After you add nodes, you can investigate and address the root causes of high CPU utilization.

Adding nodes

If you exceed the recommended maximums for CPU utilization, we strongly recommend adding nodes to your instance so it can continue to operate effectively. If you want to automate this process, you can create an application that monitors CPU utilization, then adds and removes nodes as needed, using the UpdateInstance method.
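A monitoring application like the one described above could be structured around a single scaling step. In this sketch, `read_utilization`, `get_nodes`, and `set_nodes` are hypothetical callables you would supply: the first might read high-priority utilization from Cloud Monitoring, and the last would wrap UpdateInstance (with the `google-cloud-spanner` Python library, typically by setting `Instance.node_count` and calling `Instance.update()`; check the library reference). The one-node step size and the thresholds are assumptions, and the scale-in check assumes CPU demand stays constant when a node is removed.

```python
import time
from typing import Callable


def autoscale_once(read_utilization: Callable[[], float],
                   get_nodes: Callable[[], int],
                   set_nodes: Callable[[int], None],
                   scale_out_above: float = 0.65,
                   scale_in_target: float = 0.50,
                   min_nodes: int = 1) -> int:
    """Run one scaling step and return the resulting node count.

    Adds one node when utilization exceeds `scale_out_above`. Removes one
    node only if the projected utilization afterwards (the same CPU demand
    spread over one fewer node) stays below `scale_in_target`, so the loop
    does not immediately scale back out.
    """
    nodes = get_nodes()
    utilization = read_utilization()
    floor = max(min_nodes, 1)
    if utilization > scale_out_above:
        new_nodes = nodes + 1
    elif nodes > floor and utilization * nodes / (nodes - 1) < scale_in_target:
        new_nodes = nodes - 1
    else:
        new_nodes = nodes
    if new_nodes != nodes:
        set_nodes(new_nodes)  # for example, a wrapper around UpdateInstance
    return new_nodes


def autoscale_forever(read_utilization, get_nodes, set_nodes,
                      interval_seconds: float = 300.0) -> None:
    """Repeat autoscale_once on a fixed interval (hypothetical driver loop)."""
    while True:
        autoscale_once(read_utilization, get_nodes, set_nodes)
        time.sleep(interval_seconds)
```

Stepping one node at a time keeps each change small; a production autoscaler would also need to account for the time a new node takes to absorb load before scaling again.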

To determine how many nodes you need, consider the peak high-priority CPU utilization as well as the 24-hour smoothed average. Always allocate enough nodes to keep the CPU utilization below the recommended maximums. As previously described, you may need to allocate extra nodes for performance-sensitive applications (for example, to accommodate workload spikes).
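The sizing rule above can be turned into a rough estimate. This sketch assumes that CPU demand stays the same and spreads evenly when nodes are added, so utilization scales with `current_nodes / new_nodes`; real workloads may not scale this cleanly. The optional `headroom` parameter (a hypothetical name, in percentage points) lowers both targets to leave room for workload spikes. Pass `high_priority_max=0.45` for a multi-region instance.

```python
import math


def nodes_needed(current_nodes: int, peak_high_priority: float,
                 smoothed_24h: float, high_priority_max: float = 0.65,
                 smoothed_max: float = 0.90, headroom: float = 0.0) -> int:
    """Estimate the node count that keeps both metrics under their maximums."""
    high_priority_target = high_priority_max - headroom
    smoothed_target = smoothed_max - headroom
    if high_priority_target <= 0 or smoothed_target <= 0:
        raise ValueError("headroom leaves no usable CPU")
    # CPU demand in "node units", divided by the utilization we can accept.
    by_high_priority = peak_high_priority * current_nodes / high_priority_target
    by_smoothed = smoothed_24h * current_nodes / smoothed_target
    # Round before ceil so float noise such as 4.0000000001 does not add a node.
    return max(1, math.ceil(round(max(by_high_priority, by_smoothed), 9)))


# Peak high-priority utilization drives the result here...
print(nodes_needed(3, peak_high_priority=0.80, smoothed_24h=0.50))
# ...and the 24-hour smoothed average drives it here.
print(nodes_needed(4, peak_high_priority=0.30, smoothed_24h=0.95))
```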

If you do not have enough nodes, Cloud Spanner postpones tasks according to their priority. Low-priority system tasks, like database compaction and schema change validation, can be deferred in favor of user tasks. However, these tasks are critical to the health of your instance, and Cloud Spanner cannot defer them indefinitely. If insufficient compute resources prevent Cloud Spanner from completing its low-priority system tasks within a certain time window (on the order of several hours to a day), Cloud Spanner might increase the priority of those system tasks, which can degrade the performance of user tasks.

Optimizing query performance

In some cases, your instance might have high CPU utilization because of SQL queries that are not as efficient as they could be. You can use the query statistics for your database to identify queries that result in high CPU usage. Then, based on their query plans, you can optimize these queries to reduce CPU usage.
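One way to find the most CPU-intensive queries is to read the built-in query statistics table `SPANNER_SYS.QUERY_STATS_TOP_HOUR` and rank queries by total CPU time (execution count multiplied by average CPU seconds). The sketch below assumes the `TEXT`, `EXECUTION_COUNT`, `AVG_CPU_SECONDS`, and `INTERVAL_END` columns of that table; with the `google-cloud-spanner` Python library you could fetch the rows with `database.snapshot().execute_sql(TOP_QUERIES_SQL)`. The ranking helper `rank_by_cpu` is hypothetical, and because the statistics tables only retain the top queries, each `share` is relative to the returned queries, not to all CPU usage on the instance.

```python
from typing import Iterable, NamedTuple

# Most recent hourly interval of the built-in query statistics.
TOP_QUERIES_SQL = """
SELECT text, execution_count, avg_cpu_seconds
FROM spanner_sys.query_stats_top_hour
WHERE interval_end = (
  SELECT MAX(interval_end) FROM spanner_sys.query_stats_top_hour)
"""


class QueryCpu(NamedTuple):
    text: str
    total_cpu_seconds: float
    share: float  # fraction of CPU seconds across all returned queries


def rank_by_cpu(rows: Iterable[tuple[str, int, float]],
                limit: int = 5) -> list[QueryCpu]:
    """Rank (text, execution_count, avg_cpu_seconds) rows by total CPU time."""
    totals = [(text, count * avg_cpu) for text, count, avg_cpu in rows]
    grand_total = sum(cpu for _, cpu in totals) or 1.0  # avoid dividing by zero
    ranked = sorted(totals, key=lambda item: item[1], reverse=True)
    return [QueryCpu(text, cpu, cpu / grand_total) for text, cpu in ranked[:limit]]


# A rarely run but expensive query can outrank a frequent, cheap one.
rows = [("SELECT a", 100, 0.01), ("SELECT b", 10, 0.5), ("SELECT c", 1000, 0.001)]
for entry in rank_by_cpu(rows):
    print(f"{entry.share:.0%}  {entry.text}")
```

The queries at the top of this ranking are the natural candidates for examining query plans.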

What's next