CPU utilization metrics

This page describes the CPU utilization metrics that Cloud Spanner provides. You can view these metrics in the Google Cloud Console and in the Cloud Monitoring console.

CPU utilization and task priority

Cloud Spanner measures CPU utilization based on the source and the priority of each task.

  • Source: A task can either be initiated by the user or the system.

  • Priority: The priority helps Cloud Spanner determine which tasks should execute first. The priority of system tasks is predetermined and cannot be configured. User tasks run at high priority unless otherwise specified. Many data requests, such as read and executeSql, let you specify a lower priority for the request. This can be useful, for example, when you are running batch, maintenance, or analytical queries that do not have strict performance SLOs.

    Higher-priority tasks generally execute ahead of lower-priority tasks. Cloud Spanner allows high-priority tasks to use up to 100% of the available CPU resources, even if there are competing lower-priority tasks. While lower-priority system tasks can be delayed in the short term, they must run eventually. Therefore, you must provision your instance with enough compute capacity to handle all tasks.

    If there are no high-priority tasks, Cloud Spanner uses up to 100% of the available CPU resources to complete lower-priority tasks more quickly. Spikes in background usage are not a sign of a problem. Lower-priority tasks can yield to higher-priority tasks, including user tasks, almost instantly.
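The ordering described above can be illustrated with a toy strict-priority queue. This is only a sketch of the general idea, not Cloud Spanner's actual scheduler: the function name `run_order` and the numeric priority values are assumptions made for illustration, and the model ignores preemption and CPU sharing.

```python
import heapq
from itertools import count

# Lower number = higher priority. The names mirror the request priorities
# mentioned in the text; the numeric values are an assumption of this sketch.
PRIORITY_HIGH, PRIORITY_MEDIUM, PRIORITY_LOW = 0, 1, 2


def run_order(tasks):
    """Return task names in the order a strict-priority queue would run them.

    `tasks` is a list of (name, priority) pairs. Tasks with equal priority
    run in submission order.
    """
    seq = count()  # tie-breaker that preserves submission order
    heap = [(priority, next(seq), name) for name, priority in tasks]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, name = heapq.heappop(heap)
        order.append(name)
    return order


tasks = [
    ("backup creation", PRIORITY_LOW),
    ("user read", PRIORITY_HIGH),
    ("database compaction", PRIORITY_MEDIUM),
    ("user executeSql", PRIORITY_HIGH),
]
print(run_order(tasks))
# ['user read', 'user executeSql', 'database compaction', 'backup creation']
```

In this model the low-priority task runs last but still runs, which is why the instance needs enough capacity for all tasks and not only the high-priority ones.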

The following table shows examples of each task:

Priority | User tasks | System tasks
High | Data requests, such as read or executeSql, where either no priority or PRIORITY_HIGH is specified | Data splitting
Medium | Data requests where PRIORITY_MEDIUM is specified; reads and writes issued from Dataflow jobs (including Import/Export) | Database compaction; schema change validation; the optimization phase of database restore
Low | Data requests where PRIORITY_LOW is specified | Backup creation; backfilling an index

Available metrics

    Cloud Spanner provides the following metrics for CPU utilization:

    • Smoothed CPU utilization: A rolling average of total CPU utilization, as a percentage of the instance's CPU resources, for each database. Each data point is an average over the previous 24 hours. Use this metric to create alerts and to analyze CPU usage over a long period of time, for example, 24 hours. You can view a chart for this metric, labeled Rolling average 24 hour, in the Cloud Console or in the Cloud Monitoring console.

    • CPU Utilization by priority: The CPU utilization, as a percentage of the instance's CPU resources, grouped by priority and by whether tasks are user-initiated or system-initiated. Use this metric to create alerts and analyze CPU usage at a high level. You can view a chart for this metric in the Cloud Console or in the Cloud Monitoring console.

    • CPU Utilization by operation type: The CPU utilization, as a percentage of the instance's CPU resources, grouped by user-initiated operations such as reads, writes, and commits. Use this metric to get a detailed breakdown of CPU usage and to troubleshoot further, as explained in Investigating high CPU utilization. You can create a chart for this metric in the Cloud Monitoring console.

      You can also use the Cloud Monitoring console to create alerts for CPU utilization, as described below.
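To show why the smoothed metric suits long-term analysis, here is a minimal sketch of a trailing 24-hour rolling average over raw samples. Cloud Spanner computes this metric itself. The function `rolling_average`, the sample format, and the hourly sampling interval are all assumptions made for illustration.

```python
from collections import deque


def rolling_average(samples, window_seconds=24 * 60 * 60):
    """Yield (timestamp, average) for each sample over the trailing window.

    `samples` is an iterable of (timestamp_seconds, cpu_percent) pairs in
    ascending timestamp order.
    """
    window = deque()
    total = 0.0
    for ts, value in samples:
        window.append((ts, value))
        total += value
        # Drop samples that have aged out of the trailing window.
        while window[0][0] <= ts - window_seconds:
            _, old = window.popleft()
            total -= old
        yield ts, total / len(window)


# Hypothetical data: 24 hourly samples at 50%, then a one-hour spike to 98%.
samples = [(hour * 3600, 50.0) for hour in range(24)] + [(24 * 3600, 98.0)]
smoothed = list(rolling_average(samples))
print(smoothed[-1])  # (86400, 52.0)
```

A one-hour spike to 98% moves the 24-hour average only from 50% to 52%, so short spikes are better tracked with the by-priority metric.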

    The following table specifies our recommendations for maximum CPU usage for both single-region and multi-region instances. These maximums help ensure that your instance has enough compute capacity to continue serving your traffic if an entire zone (for single-region instances) or an entire region (for multi-region instances) is lost.

    Metric | Maximum for single-region instances | Maximum per region for multi-region instances
    High priority total | 65% | 45%
    24-hour smoothed aggregate | 90% | 90%

    To help you stay below the recommended maximums, create alerts in Cloud Monitoring that track high-priority CPU utilization and the average CPU utilization over 24 hours.
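The comparison that such alerts perform can be sketched in a few lines. This is an illustrative check only, not a Cloud Monitoring API call. The function `exceeded_limits`, the dictionary keys, and the sample readings are hypothetical, and the thresholds are the recommended maximums from the table above.

```python
# Recommended maximums from the table above, in percent.
# For multi-region instances the values apply per region.
RECOMMENDED_MAX = {
    "single-region": {"high_priority": 65.0, "smoothed_24h": 90.0},
    "multi-region": {"high_priority": 45.0, "smoothed_24h": 90.0},
}


def exceeded_limits(config, high_priority, smoothed_24h):
    """Return the names of the metrics that are above their recommended maximum."""
    limits = RECOMMENDED_MAX[config]
    observed = {"high_priority": high_priority, "smoothed_24h": smoothed_24h}
    return [name for name, value in observed.items() if value > limits[name]]


# The same hypothetical readings are fine for a single-region instance
# but exceed the high-priority maximum for a multi-region instance.
print(exceeded_limits("single-region", high_priority=60.0, smoothed_24h=70.0))  # []
print(exceeded_limits("multi-region", high_priority=60.0, smoothed_24h=70.0))   # ['high_priority']
```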

    CPU utilization can affect request latencies. Overloading an individual backend server increases request latencies. Run benchmarks and actively monitor your application to verify that Cloud Spanner meets your performance requirements.

    Thus, for performance-sensitive applications, you may need to further reduce CPU utilization using techniques described in the following section.

Reducing CPU utilization

    This section explains how to reduce an instance's CPU utilization.

    In general, we recommend that you increase the compute capacity of your instance as a starting point. After you increase the compute capacity, you can investigate and address the root causes of high CPU utilization.

Increasing compute capacity

    If you exceed the recommended maximums for CPU utilization, we strongly recommend increasing the compute capacity of your instance so it can continue to operate effectively. If you want to automate this process, you can create an application that monitors CPU utilization, then increases or decreases compute capacity as needed, using the UpdateInstance method.

    To determine how much compute capacity you need, consider the peak high-priority CPU utilization as well as the 24-hour smoothed average. Always allocate enough compute capacity to keep the CPU utilization below the recommended maximums. As previously described, you may need to allocate extra compute capacity for performance-sensitive applications (for example, to accommodate workload spikes).
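As a rough starting point, the sizing reasoning above can be written as a small calculation. This is a sketch under a strong simplifying assumption: for a fixed workload, CPU utilization scales inversely with compute capacity. Real workloads may not behave this way, so treat the result as an estimate to validate with benchmarks. The function `required_capacity` and its parameters are hypothetical. The defaults are the single-region maximums, so pass 45.0 as `max_high_priority` for a multi-region instance.

```python
import math


def required_capacity(current_capacity, peak_high_priority, smoothed_24h,
                      max_high_priority=65.0, max_smoothed=90.0):
    """Estimate the capacity that keeps both metrics under their maximums.

    Assumes utilization scales inversely with capacity for a fixed workload
    (a simplification). Never returns less than the current capacity.
    """
    # The metric that is furthest over its maximum drives the estimate.
    scale = max(peak_high_priority / max_high_priority,
                smoothed_24h / max_smoothed)
    return max(current_capacity, math.ceil(current_capacity * scale))


# Peak high-priority utilization of 78% on 3 units of capacity.
print(required_capacity(3, peak_high_priority=78.0, smoothed_24h=60.0))  # 4
# 24-hour smoothed utilization of 95% on 4 units of capacity.
print(required_capacity(4, peak_high_priority=50.0, smoothed_24h=95.0))  # 5
```

The estimate only brings utilization down to the recommended maximum. Performance-sensitive applications would add further headroom on top of it.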

    If you do not have enough compute capacity, Cloud Spanner postpones tasks by priority level. Lower-priority system tasks, like database compaction and schema change validation, can be deferred in favor of user tasks. However, these tasks are critical to the health of your instance, and Cloud Spanner cannot defer them indefinitely. If Cloud Spanner cannot complete these system tasks within a certain time window (on the order of several hours to a day) because of insufficient compute resources, it might increase their priority. This change affects the performance of user tasks.

Investigating further with introspection tools

    If the CPU Utilization by operation type metric indicates that a particular type of operation is contributing to high CPU utilization, use the Cloud Spanner introspection tools to troubleshoot further. For more information, see Investigating high CPU utilization.

What's next