Overcommitting CPUs on sole-tenant VMs


CPU overcommit on sole-tenant nodes lets you schedule instances that can share their spare CPU cycles with each other. This lets you overprovision sole-tenant node resources and schedule more VM CPUs on a sole-tenant node than are normally available. CPU overcommit is particularly valuable for workloads that are underutilized but might experience relatively uncorrelated bursts. CPU overcommit can help you reduce costs by giving you access to CPUs that might otherwise be unavailable. It also lets you add more VMs on a single host, which can reduce per-socket or per-core licensing requirements.

When you set up a VM to be provisioned on a sole-tenant node, you must select a machine type for the VM. Machine types have a fixed number of CPUs. If a VM is not using all of the CPU resources provided by its machine type, the unused CPU resources can be made available to other VMs on the sole-tenant node that are using CPU overcommit.

The value for the minimum number of CPUs allocated to a VM is specified either while creating a VM or after stopping a VM, and represents the minimum number of CPUs that are guaranteed to be available for a VM. This value can be set for each VM, which lets you provision VMs with different ratios of CPU overcommit on a single sole-tenant node. Lower values reduce capacity requirements at the potential expense of performance if correlated bursts occur. Determining an optimal value for the minimum number of CPUs requires an understanding of your workload utilization and iterative modification of the value. When setting this value, keep in mind the following:

  • If you don't set the value for the minimum number of CPUs, or you set the value for the minimum number of CPUs equal to the number of CPUs on the VM's machine type, the VM's allowable overcommit ratio is 1.0. With an overcommit ratio of 1.0, all of the CPUs are accessible only to this VM, and there are no CPU resources available to be overcommitted to other VMs.

  • The minimum number of CPUs can't be greater than the number of CPUs specified by the VM's machine type.

  • The sum of the values for the minimum number of CPUs for all of the VMs on a sole-tenant node can't exceed the CPU capacity of that sole-tenant node type, which on the n1-node-96-624 node type is 96.

The value for the number of CPUs specified by the VM's machine type is a static value, and represents the number of CPUs that a VM can burst up to from the minimum number if those CPUs are available. If you require a number of CPUs different from those provided by fixed machine types, you can use a custom machine type.
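The rules above can be checked mechanically before you schedule VMs. The following is a minimal sketch in Python, not part of any Google Cloud library: the Vm class, the check_node function, and the example plan are hypothetical and only encode the constraints stated in this section.

```python
from dataclasses import dataclass


@dataclass
class Vm:
    name: str
    machine_cpus: int  # CPUs on the machine type (the burst ceiling)
    min_cpus: int      # minimum number of CPUs guaranteed to the VM

    @property
    def overcommit_ratio(self) -> float:
        # Ratio of the burst ceiling to the guaranteed minimum.
        return self.machine_cpus / self.min_cpus


def check_node(vms: list[Vm], node_cpus: int) -> list[str]:
    """Return a list of rule violations; an empty list means the plan fits."""
    problems = []
    for vm in vms:
        if vm.min_cpus > vm.machine_cpus:
            problems.append(f"{vm.name}: minimum exceeds machine type CPUs")
    total_min = sum(vm.min_cpus for vm in vms)
    if total_min > node_cpus:
        problems.append(
            f"sum of minimums {total_min} exceeds node capacity {node_cpus}"
        )
    return problems


# Hypothetical plan for an n1-node-96-624 node (96 CPUs).
plan = [Vm("a", 64, 32), Vm("b", 64, 32), Vm("c", 32, 32)]
print(check_node(plan, node_cpus=96))        # []
print([vm.overcommit_ratio for vm in plan])  # [2.0, 2.0, 1.0]
```

In this plan the minimums add up to exactly 96, so the node is fully reserved even though the machine types together specify 160 CPUs.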

To configure sole-tenant VMs to have CPU resources available for overcommitting, do the following:

  1. Create a sole-tenant node template with CPU overcommit enabled. You must enable CPU overcommit while creating the node template. You can't enable CPU overcommit after creating a node template.

  2. Create a sole-tenant node group based on the sole-tenant node template that has CPU overcommit enabled.

  3. Create a VM and do the following:

    1. Choose a machine type for the VM. The number of CPUs on the machine type represents the maximum number of CPUs that the VM can burst up to from the minimum number of CPUs if the minimum number of CPUs is less than the number of CPUs specified by the machine type. You can choose a different machine type for each VM on a sole-tenant node, provided you do not exceed the CPU and memory capacity of the sole-tenant node.

    2. Specify the minimum number of CPUs to allocate to that single VM, or use a managed instance group to create multiple VMs all with the same CPU overcommit level.

Considerations

Before configuring the CPU overcommit levels for VMs, consider the criticality of your workload. Less critical workloads, such as development and test workloads, can potentially tolerate higher overcommit levels. More critical workloads, such as a production payments system, might not tolerate as much overcommit or any at all.

Also consider the utilization of your workload. Workloads with high CPU utilization are not good candidates for CPU overcommit because they do not have spare CPU cycles for other overcommitted VMs to use. Additionally, workloads with low average CPU utilization and a low utilization peak might be better served by a different machine type size.

Using CPU overcommit benefits uncorrelated bursty workloads that have high peak utilization and low average utilization because these workloads are more likely to have available CPU resources to share across VMs when some VMs need to burst their utilization. If all of the VMs on a host burst at one time, the host will not have sufficient resources for your VMs.
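To make the difference between correlated and uncorrelated bursts concrete, here is a small illustrative sketch. The demand figures are invented for the example, and peak_demand is a hypothetical helper, not a Compute Engine API.

```python
def peak_demand(demand_by_vm: list[list[int]]) -> int:
    """Largest total CPU demand across all time steps."""
    return max(sum(step) for step in zip(*demand_by_vm))


NODE_CPUS = 96  # capacity of an n1-node-96-624 node

# Three VMs that can each burst to 64 CPUs.
# Each inner list is one VM's CPU demand per time step.
uncorrelated = [
    [64, 8, 8],
    [8, 64, 8],
    [8, 8, 64],
]
correlated = [
    [64, 8, 8],
    [64, 8, 8],
    [64, 8, 8],
]

print(peak_demand(uncorrelated) <= NODE_CPUS)  # True: peak demand is 80
print(peak_demand(correlated) <= NODE_CPUS)    # False: peak demand is 192
```

Both sets of VMs have the same average demand; only the timing of the bursts decides whether the host runs short of CPUs.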

Limitations

  • CPU overcommit is best suited for workloads without stringent performance requirements, for example, development and test workloads, and virtual desktop infrastructures.

  • High levels of CPU overcommit might not be appropriate for workloads that are sensitive to performance.

  • For workloads with average and peak utilization that is consistently low, Google recommends rightsizing. That is, instead of overcommitting CPUs, Google recommends modifying the size of the VM instance to match the resource requirements of that workload.

  • You can only overcommit CPUs on the following:

    • n1 machine type VMs that are provisioned on node groups based on the n1-node-96-624 node type

    • n2 machine type VMs that are provisioned on node groups based on the n2-node-80-640 node type

  • The minimum number of CPUs for a VM can't be set lower than half of the number of CPUs on the VM's machine type, which allows for a maximum overcommit ratio of 2.0.

  • Sole-tenant node groups based on sole-tenant node templates that aren't configured for CPU overcommit do not allow for provisioning of VMs with CPU overcommit enabled; that is, you can't schedule a VM with a specified minimum number of CPUs on a sole-tenant node group that is not configured for CPU overcommit.

  • If your instances are too highly overcommitted, move them to another sole-tenant node.

  • CPU quota is based on the number of vCPUs of the sole-tenant node type, not the potential maximum of vCPUs available for overcommitting.

Pricing

Sole-tenant nodes that have CPU overcommit selected on their node template are charged an additional 25%. This charge is in addition to the 10% premium for running VMs on sole-tenant nodes. The CPU overcommit premium is fixed, regardless of the CPU overcommit level and how many VMs are scheduled on the sole-tenant node.

Sole-tenant nodes offer committed use discounts. Sustained use discounts are available for the sole-tenancy premium and the CPU overcommit premium.

To estimate the cost of running VMs on sole-tenant nodes, see the Pricing Calculator.
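As a rough illustration of how the premiums interact with the overcommit level, the sketch below computes the relative cost per scheduled VM CPU. It assumes that the 10% sole-tenancy premium and the 25% CPU overcommit premium both apply additively to the same base cost, which this page does not state explicitly. The function name and the normalized base price of 1.0 are hypothetical, so treat the results as illustrative only.

```python
SOLE_TENANCY_PREMIUM = 0.10
CPU_OVERCOMMIT_PREMIUM = 0.25


def relative_cost_per_vm_cpu(overcommit_ratio: float,
                             overcommit_enabled: bool) -> float:
    """Cost per scheduled VM CPU, relative to a node CPU at base price 1.0.

    Assumption: both premiums are additive on the same base cost.
    """
    multiplier = 1.0 + SOLE_TENANCY_PREMIUM
    if overcommit_enabled:
        multiplier += CPU_OVERCOMMIT_PREMIUM
    else:
        overcommit_ratio = 1.0  # without overcommit, one VM CPU per node CPU
    return multiplier / overcommit_ratio


print(round(relative_cost_per_vm_cpu(1.0, False), 3))  # 1.1
print(round(relative_cost_per_vm_cpu(1.5, True), 3))   # 0.9
print(round(relative_cost_per_vm_cpu(2.0, True), 3))   # 0.675
```

Under this assumption, the overcommit premium pays for itself per VM CPU once the node-wide overcommit ratio exceeds roughly 1.23 (1.35 divided by 1.10).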

Before you begin

Setting the CPU overcommit level

The following procedures show you how to create a sole-tenant VM with CPU resources available for overcommitting. If you need to modify the CPU overcommit level of a VM that is currently running, you must first stop the VM.

Console

In the Google Cloud Console, create a sole-tenant VM on a sole-tenant node group that was created from a sole-tenant node template that has CPU overcommit enabled:

  1. Go to the Sole-tenant nodes page.

  2. Click Node Groups.

  3. Click the sole-tenant node group on which to create a VM.

  4. Click Create instance.

  5. Specify the Name, Region, and Zone for the VM.

  6. Under Machine configuration, choose a fixed or custom Machine type with at least 4 vCPUs.

  7. Under CPU overcommit, select Enable CPU overcommit.

  8. Under Minimum vCPUs Allocated, adjust the slider or manually enter the number of vCPUs to specify the level of overcommit for the CPUs on this VM.

  9. Click Create to create a VM instance that has CPU resources available for overcommitting.

gcloud

The following example shows how to use the gcloud compute instances create command to create a sole-tenant VM on a fixed machine type with CPU resources available for overcommitting.

To create a sole-tenant VM with CPU resources available for overcommitting on a custom machine type, omit the --machine-type flag, and instead, use the --custom-cpu and --custom-memory flags to specify the number of CPUs and the amount of memory, in gigabytes, for the custom machine.

gcloud compute instances create VM_NAME \
  --machine-type=MACHINE_TYPE \
  --min-node-cpu=MIN_VCPUS \
  --node-group=GROUP_NAME

Replace the following:

  • VM_NAME: name of the VM to overcommit CPUs on.

  • MACHINE_TYPE: machine type to provision the sole-tenant VM on. The number of CPUs specified by the machine type is the maximum number of CPUs the VM can burst up to from MIN_VCPUS.

  • MIN_VCPUS: minimum number of vCPUs guaranteed to be available to this VM.

  • GROUP_NAME: name of the sole-tenant node group to provision the VM on.
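If you script VM creation, the command above can be assembled programmatically. The following sketch builds the argument list using only the flags named in this section; the build_create_command helper and the example names vm-1 and group-1 are hypothetical, and the sketch only prints the command rather than running it.

```python
import shlex


def build_create_command(vm_name, group_name, min_vcpus,
                         machine_type=None, custom_cpu=None,
                         custom_memory_gb=None):
    """Assemble the gcloud command shown above as an argument list."""
    args = ["gcloud", "compute", "instances", "create", vm_name]
    if machine_type is not None:
        args.append(f"--machine-type={machine_type}")
    else:
        # Custom machine type: CPUs and memory (in gigabytes)
        # replace the --machine-type flag.
        args.append(f"--custom-cpu={custom_cpu}")
        args.append(f"--custom-memory={custom_memory_gb}")
    args.append(f"--min-node-cpu={min_vcpus}")
    args.append(f"--node-group={group_name}")
    return args


cmd = build_create_command("vm-1", "group-1", min_vcpus=4,
                           machine_type="n1-standard-8")
print(shlex.join(cmd))
```

Keeping the command as a list avoids shell quoting problems if you later pass it to a process runner.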

API

The following example shows how to use the instances.insert method to create a sole-tenant VM on a fixed machine type with CPU resources available for overcommitting.

To create a sole-tenant VM with CPU resources available for overcommitting on a custom machine type, replace the value for the machineType field with zones/MACHINE_TYPE_ZONE/machineTypes/custom-CPUS-MEMORY, replacing CPUS with the number of CPUs and MEMORY with the amount of memory, in megabytes, for the custom machine type.

POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/PROJECT_ZONE/instances

{
  "machineType": "zones/MACHINE_TYPE_ZONE/machineTypes/MACHINE_TYPE",
  "name": "VM_NAME",
  "scheduling": {
    "minNodeCpus": MIN_VCPUS,
    "nodeAffinities": [
      {
        "key": "compute.googleapis.com/node-group-name",
        "operator": "IN",
        "values": [
          "GROUP_NAME"
        ]
      }
    ]
  },
  "disks": [
    {
      "boot": true,
      "initializeParams": {
        "sourceImage": "/projects/IMAGE_PROJECT/global/images/family/IMAGE_FAMILY"
      }
    }
  ],
  "networkInterfaces": [
    {
      "network": "/global/networks/NETWORK"
    }
  ]
}

Replace the following:

  • PROJECT_ID: ID of your project.

  • PROJECT_ZONE: zone in which to create the VM.

  • MACHINE_TYPE_ZONE: zone hosting the machine type.

  • MACHINE_TYPE: machine type to provision the sole-tenant VM on. The number of CPUs specified by the machine type is the maximum number of CPUs the VM can burst up to from MIN_VCPUS.

  • VM_NAME: name of the sole-tenant VM to overcommit CPUs on.

  • MIN_VCPUS: minimum number of vCPUs guaranteed to be available to this VM.

  • GROUP_NAME: name of the sole-tenant node group to provision the VM on.

  • IMAGE_PROJECT: name of the image project containing the image family.

  • IMAGE_FAMILY: name of the image family from which to copy an image onto your VM.

  • NETWORK: name of the network, for example, default. Depending on your configuration, you might need to add a subnetwork field.
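The request above can also be built in code. The sketch below constructs the URL and JSON body from the placeholders in this section and prints the body without sending it. The build_insert_request helper and all example values are hypothetical, and the sketch assumes that the machine type zone is the same as the zone in which the VM is created.

```python
import json


def build_insert_request(project_id, zone, vm_name, machine_type, min_vcpus,
                         group_name, image_project, image_family,
                         network="default"):
    """Return (url, body) for the instances.insert request shown above."""
    url = (f"https://compute.googleapis.com/compute/v1/projects/{project_id}"
           f"/zones/{zone}/instances")
    body = {
        # Assumption: the machine type zone matches the VM's zone.
        "machineType": f"zones/{zone}/machineTypes/{machine_type}",
        "name": vm_name,
        "scheduling": {
            "minNodeCpus": min_vcpus,
            "nodeAffinities": [{
                "key": "compute.googleapis.com/node-group-name",
                "operator": "IN",
                "values": [group_name],
            }],
        },
        "disks": [{
            "boot": True,
            "initializeParams": {
                "sourceImage": (f"/projects/{image_project}"
                                f"/global/images/family/{image_family}"),
            },
        }],
        "networkInterfaces": [{"network": f"/global/networks/{network}"}],
    }
    return url, body


# Hypothetical example values.
url, body = build_insert_request(
    "my-project", "us-central1-a", "vm-1", "n1-standard-8", 4,
    "group-1", "debian-cloud", "debian-12")
print(url)
print(json.dumps(body, indent=2))
```

Serializing with json.dumps guarantees well-formed JSON, which avoids mistakes such as trailing commas in a hand-written body.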

Viewing CPU usage

Check the CPU usage of sole-tenant VMs in a sole-tenant node group by following the procedure below.

Console

  1. In the Google Cloud Console, go to the Sole-tenant nodes page.

  2. Click Node groups.

  3. Click the sole-tenant node group containing the sole-tenant node that has the VM with overcommitted CPUs.

  4. Click the sole-tenant node that has the VM with overcommitted CPUs.

  5. Under the name of the sole-tenant node, view the CPU usage, CPU overcommit type, and the Min CPU usage.

    • CPU usage shows the total of the maximum number of CPUs for all of the VMs on this sole-tenant node divided by the number of CPUs specified by the sole-tenant node type. The number of CPUs on the node available for overcommitting is the numerator minus the denominator, and the overcommit level is the quotient of the numerator and the denominator.

    • Min CPU usage shows the sum of the minimum number of CPUs allocated for all of the VMs on a sole-tenant node divided by the number of CPUs specified by the node type.
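The two values described above are simple ratios. As a worked example, the sketch below reproduces the arithmetic for a hypothetical node; the node_cpu_usage function and the VM sizes are illustrative, and the "numerator/denominator" strings are only one way to present the values, not necessarily how the console formats them.

```python
def node_cpu_usage(vms, node_cpus):
    """vms: list of (machine_type_cpus, min_cpus) pairs for VMs on one node."""
    max_total = sum(machine for machine, _ in vms)
    min_total = sum(minimum for _, minimum in vms)
    return {
        "cpu_usage": f"{max_total}/{node_cpus}",
        # Numerator minus denominator.
        "overcommitted_cpus": max_total - node_cpus,
        # Quotient of numerator and denominator.
        "overcommit_level": max_total / node_cpus,
        "min_cpu_usage": f"{min_total}/{node_cpus}",
    }


# Hypothetical node with 96 CPUs hosting three VMs.
usage = node_cpu_usage([(64, 32), (64, 32), (16, 16)], node_cpus=96)
print(usage["cpu_usage"])           # 144/96
print(usage["overcommitted_cpus"])  # 48
print(usage["overcommit_level"])    # 1.5
print(usage["min_cpu_usage"])       # 80/96
```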

Optimizing CPU overcommit levels

To help optimize tuning of your CPU overcommit levels, Compute Engine provides the Scheduler Wait Time metric. The Scheduler Wait Time metric indicates the aggregate wait time for all vCPUs on the VM and helps you determine the impact of CPU overcommit on the VM's performance.

Workload sensitivity varies, but a general rule is to use 20 milliseconds per second (ms/s) as the maximum wait time for each vCPU. For example, if a VM is set to 8 vCPUs, then a rule-of-thumb threshold is 160 ms/s, which corresponds to an average Scheduler Wait Time of 20 ms/s per vCPU. The performance requirements of your workload ultimately dictate acceptable thresholds.
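The rule of thumb above scales linearly with the number of vCPUs. A minimal sketch of the arithmetic follows; the function names are hypothetical, the value 20 is the per-vCPU figure from the text, and the sample readings are invented. Readings are assumed to be in the same unit that the metric reports.

```python
PER_VCPU_WAIT_THRESHOLD = 20  # rule-of-thumb value from the text, per vCPU


def wait_time_threshold(vcpus: int) -> int:
    """Aggregate Scheduler Wait Time threshold for a VM with this many vCPUs."""
    return PER_VCPU_WAIT_THRESHOLD * vcpus


def exceeds_threshold(aggregate_wait_time: float, vcpus: int) -> bool:
    """True if the aggregate reading suggests the VM is too overcommitted."""
    return aggregate_wait_time > wait_time_threshold(vcpus)


print(wait_time_threshold(8))     # 160
print(exceeds_threshold(120, 8))  # False
print(exceeds_threshold(200, 8))  # True
```

Because the metric is aggregated across all vCPUs on the VM, compare it against the scaled threshold rather than the per-vCPU figure.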

Console

  1. In the Google Cloud Console, go to the Monitoring page.

  2. Click Metrics explorer.

  3. In Find resource type and metric, enter VM Instance.

  4. In Select a metric, enter Scheduler Wait Time.

  5. Optionally, set up alerting to trigger alerts for VM wait time thresholds by clicking Alerting.

What's next