Troubleshooting sole-tenancy


This page describes how to troubleshoot some potential issues that might occur while using sole-tenant nodes.

Node group size limitation

  • Problem: A node group is limited to 100 nodes.

    • Solution: Create multiple node groups and use the same affinity label for each node group. Then, when scheduling VMs on these node groups, use the affinity label you assigned to the node groups.
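
The split described above can be sketched in Python. This is a minimal planning helper, not part of any Google Cloud tooling: the function name `plan_node_groups`, the group names, and the `workload: batch` label are hypothetical, and the limit of 100 is taken from the problem statement above.

```python
from __future__ import annotations


def plan_node_groups(total_nodes: int, max_group_size: int = 100) -> list[int]:
    """Split total_nodes into group sizes, none larger than max_group_size."""
    if total_nodes < 0:
        raise ValueError("total_nodes must be non-negative")
    if max_group_size < 1:
        raise ValueError("max_group_size must be at least 1")
    full_groups, remainder = divmod(total_nodes, max_group_size)
    sizes = [max_group_size] * full_groups
    if remainder:
        sizes.append(remainder)
    return sizes


# Hypothetical affinity label shared by every node group in the plan.
SHARED_LABEL = {"workload": "batch"}

if __name__ == "__main__":
    for index, size in enumerate(plan_node_groups(250), start=1):
        print(f"node-group-{index}: {size} nodes, affinity label {SHARED_LABEL}")
```

Because every planned group carries the same label, a VM that specifies that affinity label can land on any of the groups.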

VM scheduling failures

  • Problem: Can't schedule a VM on a sole-tenant node.

    • Solution:

      • You can't schedule a sole-tenant VM if no node in the zone matches the VM's affinity or anti-affinity specification. Check that you have specified the correct affinity labels and that none of them conflict with each other.

      • If you are using the restart in place maintenance policy, check that the VM's OnHostMaintenance setting is set to terminate.

      • If you are using the migrate within node group maintenance policy, check that you are scheduling VMs on a node group, not a specific node or by using an affinity label.

      • Check that the specified node name matches the name of a node in the zone.

      • Check that the specified node group name matches the name of a node group in the zone.

      • You can't schedule a sole-tenant VM if the VM's minimum CPU platform (--min-cpu-platform) is set to any value other than AUTOMATIC.
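
To reason about the affinity checks above before creating a VM, the matching logic can be modeled offline. The sketch below is a simplified model, not the Compute Engine scheduler: it assumes each affinity is a key, an operator (`IN` or `NOT_IN`), and a set of values, and that a node label missing from a node satisfies `NOT_IN` but not `IN`. The names `Affinity`, `node_matches`, and `conflicting_keys` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Affinity:
    key: str
    operator: str  # "IN" (affinity) or "NOT_IN" (anti-affinity)
    values: frozenset


def node_matches(node_labels: dict, affinities: list) -> bool:
    """Return True if the node's labels satisfy every affinity (simplified model)."""
    for affinity in affinities:
        label = node_labels.get(affinity.key)
        if affinity.operator == "IN":
            if label not in affinity.values:
                return False
        elif affinity.operator == "NOT_IN":
            if label in affinity.values:
                return False
        else:
            raise ValueError(f"unknown operator: {affinity.operator}")
    return True


def conflicting_keys(affinities: list) -> list:
    """Return keys whose IN and NOT_IN constraints leave no allowed value."""
    allowed = {}   # key -> values still permitted by all IN constraints
    excluded = {}  # key -> values ruled out by NOT_IN constraints
    for affinity in affinities:
        if affinity.operator == "IN":
            if affinity.key in allowed:
                allowed[affinity.key] = allowed[affinity.key] & affinity.values
            else:
                allowed[affinity.key] = set(affinity.values)
        elif affinity.operator == "NOT_IN":
            excluded.setdefault(affinity.key, set()).update(affinity.values)
    return sorted(
        key for key, values in allowed.items()
        if not (values - excluded.get(key, set()))
    )
```

A non-empty result from `conflicting_keys` means no node could ever match, which corresponds to the conflicting-labels case described above.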

VM tenancy

Autoscaling node groups

  • Problem: Can't enable the node group autoscaler.

    • Solution: You can only enable the node group autoscaler when you set the node group maintenance policy to the Default maintenance policy.
  • Problem: Want to retain already reserved nodes with the migrate within node group maintenance policy.

    • Solution: When using the migrate within node group maintenance policy, set the node group autoscaler to only scale out, which adds nodes to the node group when it needs extra capacity.
  • Problem: Node group autoscaling fails.

    • Solution: Autoscaling might fail if you have no remaining CPU quota in the region, the number of nodes in the group is at the maximum number allowed, or there is a billing issue. Depending on the error, you might need to request an increase in CPU quota or create a new sole-tenant node group.

Bringing your own licenses (BYOL)

  • Problem: Configuring the restart in-place maintenance policy.

    • Solution: If you are using the restart in-place maintenance policy, set the VM's OnHostMaintenance setting to terminate.
  • Problem: Scheduling VMs on node groups with the migrate within node group maintenance policy.

    • Solution:

      • Schedule VMs onto a node group, not on a specific node or by using a customized affinity label.

      • Create a node group with 2 nodes and enable the autoscaler. If you create a node group of size 1, that node is reserved for holdback.

Capacity issues

  • Problem: Not enough capacity on a node or in a node group.

    • Solution:

      • If you reschedule a VM onto a node that is scheduling VMs in parallel, in rare situations there might not be capacity.

      • If you reschedule a VM onto a node in a node group on which you haven't enabled autoscaling, there might not be capacity.

      • If you reschedule a VM onto a node in a node group on which you have enabled autoscaling but have exceeded your CPU quota, there might not be capacity.

CPU overcommit

  • Problem: An error indicating that no sole-tenant node group was specified when you set the value for the minimum number of CPUs:

    Invalid value for field 'resource.scheduling.minNodeCpus': '2'. Node virtual
    CPU count may only be specified for sole-tenant instances.
    
    • Solution: Specify a sole-tenant node group when setting the value for the minimum number of CPUs.
  • Problem: An error indicating that the total of the minimum number of CPUs for all sole-tenant VMs on a node is greater than the CPU capacity of the node type.

    Node virtual CPU count must not be greater than the guest virtual CPU count.
    
    No feasible nodes found for the instance given its node affinities and other
    constraints.
    
    • Solution: Specify values for the minimum number of CPUs for each VM so that the total for all VMs does not exceed the number of CPUs specified by the sole-tenant node type.
  • Problem: An error indicating that the total number of CPUs specified by the machine types for all VMs on a node is more than twice the minimum number of CPUs specified for all VMs on a node.

    Guest virtual CPU count must not be greater than [~2.0] times the node
    virtual CPU count.
    
    • Solution: Increase the value for the minimum number of CPUs for VMs on this node until the total minimum number of CPUs is greater than or equal to half the value for the total number of CPUs determined by the machine types.
  • Problem: An error indicating that the value for the minimum number of CPUs is not an even number greater than or equal to 2.

    Invalid value for field 'resource.scheduling.minNodeCpus': '3'. Node virtual
    CPU count must be even.
    
    • Solution: Specify a value for the minimum number of CPUs that is an even number greater than or equal to 2.
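
The constraints behind the errors above can be checked before a VM is created. The following Python sketch is an assumption-laden pre-flight check, not the validation that Compute Engine performs: the function names are hypothetical, the 2.0 ratio is read from the error message (which marks it as approximate), and the per-VM comparisons follow the wording of the error messages.

```python
from __future__ import annotations


def check_min_node_cpus(min_node_cpus: int, guest_vcpus: int,
                        max_ratio: float = 2.0) -> list[str]:
    """Return a list of problems with a VM's minimum-CPU setting (empty if none)."""
    problems = []
    if min_node_cpus < 2 or min_node_cpus % 2 != 0:
        problems.append("minimum CPUs must be an even number >= 2")
    if min_node_cpus > guest_vcpus:
        problems.append("minimum CPUs exceeds the VM's vCPU count")
    if guest_vcpus > max_ratio * min_node_cpus:
        problems.append("VM vCPU count exceeds the allowed multiple of minimum CPUs")
    return problems


def fits_on_node(min_cpus_per_vm: list[int], node_cpus: int) -> bool:
    """Return True if the summed minimum CPUs do not exceed the node's CPUs."""
    return sum(min_cpus_per_vm) <= node_cpus


if __name__ == "__main__":
    # A VM with 8 vCPUs and a minimum of 2 CPUs breaks the 2x ratio.
    for problem in check_min_node_cpus(min_node_cpus=2, guest_vcpus=8):
        print(problem)
```

Raising the minimum to 4 for the same 8-vCPU VM clears every check in this model, which matches the guidance to keep the minimum at or above half the machine type's vCPU count.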

GPUs

  • Problem: An error indicating that instance creation failed because of node property incompatibility.

    Instance could not be scheduled due to no matching node with property compatibility.
    
    • Solution: GPU-enabled sole-tenant nodes only support VMs that have GPUs attached. To resolve this issue, provision a sole-tenant VM with GPUs.

What's next