Sole-tenant nodes


This document describes sole-tenant nodes. For information about how to provision VMs on sole-tenant nodes, see Provisioning VMs on sole-tenant nodes.

Sole-tenancy lets you have exclusive access to a sole-tenant node, which is a physical Compute Engine server that is dedicated to hosting only your project's VMs. Use sole-tenant nodes to keep your VMs physically separated from VMs in other projects, or to group your VMs together on the same host hardware, as shown in the following diagram.

Figure 1: A multi-tenant host versus a sole-tenant node.

VMs running on sole-tenant nodes can use the same Compute Engine features as other VMs, including transparent scheduling and block storage, but with an added layer of hardware isolation. To give you full control over the VMs on the physical server, each sole-tenant node maintains a one-to-one mapping to the physical server that is backing the node.

Within a sole-tenant node, you can provision multiple VMs on machine types of various sizes, which lets you efficiently use the underlying resources of the dedicated host hardware. Also, because you aren't sharing the host hardware with other projects, you can meet security or compliance requirements with workloads that require physical isolation from other workloads or VMs. If your workload requires sole tenancy only temporarily, you can modify VM tenancy as necessary.

Sole-tenant nodes can help you meet dedicated hardware requirements for bring your own license (BYOL) scenarios that require per-core or per-processor licenses. When you use sole-tenant nodes, you have some visibility into the underlying hardware, which lets you track core and processor usage. To track this usage, Compute Engine reports the ID of the physical server on which a VM is scheduled. Then, by using Cloud Logging, you can view the historical server usage of a VM. To optimize the use of the host hardware, you can overcommit sole-tenant VM CPUs.

Through a configurable maintenance policy, you can control the behavior of sole-tenant VMs while their host is undergoing maintenance. You can specify when maintenance occurs, and whether the VMs maintain affinity with a specific physical server or are moved to other sole-tenant nodes within a node group.

Workload considerations

The following types of workloads might benefit from using sole-tenant nodes:

  • Gaming workloads with performance requirements.

  • Finance or healthcare workloads with security and compliance requirements.

  • Windows workloads with licensing requirements.

  • Machine learning, data processing, or image rendering workloads. For these workloads, consider reserving GPUs.

  • Workloads requiring increased input/output operations per second (IOPS) and decreased latency, or workloads that use temporary storage in the form of caches, processing space, or low-value data. For these workloads, consider reserving local SSDs.

Node templates

A node template is a regional resource that defines the properties of each node in a node group. When you create a node group from a node template, the properties of the node template are immutably copied to each node in the node group.

When you create a node template, specify a node type, and optionally specify node affinity labels. You can only specify node affinity labels on a node template; you can't specify node affinity labels on a node group.

Node types

When configuring a node template, specify a node type to apply to all nodes within a node group created based on the node template. The sole-tenant node type, referenced by the node template, specifies the total amount of vCPU cores and memory for nodes created in node groups that use that template. For example, the n2-node-80-640 node type has 80 vCPUs and 640 GB of memory.

The VMs that you add to a sole-tenant node must use a machine type from the same machine series as the node type that you specify in the node template. For example, n2 sole-tenant node types are only compatible with VMs created with n2 machine types. You can add VMs to a sole-tenant node until the node's vCPU or memory capacity is used up.

When you create a node group using a node template, each node in the node group inherits the node template's node type specifications. The capacity of a node type applies to each individual node within a node group, not to the group as a whole. So, if you create a node group with two nodes that are both of the n2-node-80-640 node type, each node is allocated 80 vCPUs and 640 GB of memory.

Depending on your workload requirements, you might fill the node with multiple smaller VMs running on machine types of various sizes, including predefined machine types, custom machine types, and machine types with extended memory. When a node is full, you cannot schedule additional instances on that node.
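The capacity rule described above can be sketched in a few lines of Python. This is a minimal illustration, not Compute Engine's scheduler: the Shape class and the fits_on_node and group_capacity helpers are hypothetical names, the VM shapes are arbitrary examples, and the sketch ignores CPU overcommit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shape:
    """vCPU count and memory (GB) of a node type or of a VM."""
    vcpus: int
    memory_gb: float


# Capacity of the n2-node-80-640 node type, as stated in this document.
N2_NODE_80_640 = Shape(vcpus=80, memory_gb=640)


def fits_on_node(node: Shape, vms: list[Shape]) -> bool:
    """Return True if the combined VM shapes stay within one node's capacity."""
    total_vcpus = sum(vm.vcpus for vm in vms)
    total_memory = sum(vm.memory_gb for vm in vms)
    return total_vcpus <= node.vcpus and total_memory <= node.memory_gb


def group_capacity(node: Shape, node_count: int) -> Shape:
    """Aggregate capacity of a node group: every node gets the full node type."""
    return Shape(node.vcpus * node_count, node.memory_gb * node_count)


# Hypothetical mix of VM sizes (vCPUs, memory in GB) used only for illustration.
vms = [Shape(32, 128), Shape(32, 128), Shape(16, 64)]
print(fits_on_node(N2_NODE_80_640, vms))                  # all 80 vCPUs are used
print(fits_on_node(N2_NODE_80_640, vms + [Shape(2, 8)]))  # one more VM no longer fits
print(group_capacity(N2_NODE_80_640, 2))
```

In this model a node is full as soon as either dimension runs out, which is why a node can reject a small VM even when plenty of memory remains.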

The following table shows all available node types. To see a list of the node types available for your project, run the gcloud compute sole-tenancy node-types list command or the nodeTypes.list REST request. For information about the prices of these node types, see sole-tenant node pricing.

Node type          Processor      vCPUs  Memory (GB)  vCPU:GB  Sockets  Cores per socket  Total cores
c2-node-60-240     Cascade Lake   60     240          1:4      2        18                36
m1-node-96-1433    Skylake        96     1433         1:14.9   2        28                56
m1-node-160-3844   Broadwell E7   160    3844         1:24     4        22                88
m2-node-416-11776  Skylake        416    11776        1:28.3   8        28                224
n1-node-96-624     Skylake        96     624          1:6.5    2        28                56
n2-node-80-640     Cascade Lake   80     640          1:8      2        24                48
n2d-node-224-896   AMD EPYC Rome  224    896          1:4      2        64                128
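Two columns of the table are derived from the others: the vCPU:GB ratio is memory divided by vCPUs, and total cores is sockets multiplied by cores per socket. The following Python sketch recomputes both from the values listed above; the derived_columns helper is a hypothetical name, and rounding the ratio to one decimal place is an assumption about how the table was produced.

```python
# (node type, vCPUs, memory in GB, sockets, cores per socket), copied from the table.
NODE_TYPES = [
    ("c2-node-60-240", 60, 240, 2, 18),
    ("m1-node-96-1433", 96, 1433, 2, 28),
    ("m1-node-160-3844", 160, 3844, 4, 22),
    ("m2-node-416-11776", 416, 11776, 8, 28),
    ("n1-node-96-624", 96, 624, 2, 28),
    ("n2-node-80-640", 80, 640, 2, 24),
    ("n2d-node-224-896", 224, 896, 2, 64),
]


def derived_columns(vcpus: int, memory_gb: int, sockets: int, cores_per_socket: int):
    """Return (GB of memory per vCPU, total physical cores) for one node type."""
    gb_per_vcpu = round(memory_gb / vcpus, 1)
    total_cores = sockets * cores_per_socket
    return gb_per_vcpu, total_cores


for name, vcpus, memory_gb, sockets, cores_per_socket in NODE_TYPES:
    gb_per_vcpu, total_cores = derived_columns(vcpus, memory_gb, sockets, cores_per_socket)
    print(f"{name}: 1:{gb_per_vcpu:g} vCPU:GB, {total_cores} total cores")
```

The total core count is the figure that matters for per-core or per-processor licensing, because it describes the physical server rather than the vCPUs exposed to VMs.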

All nodes let you schedule VMs of different shapes. The n node types (n1, n2, and n2d) are general-purpose nodes, on which you can schedule custom machine type instances. For recommendations about which node type to choose, see Recommendations for machine types. For information about performance, see CPU platforms.

There may be times when Compute Engine replaces an older node type with a newer node type. If Compute Engine replaces a node type, you can't create additional node groups from templates that specify the replaced node type. When Compute Engine replaces a node type, you must review and modify any existing node templates that specify the node type that is no longer available.

Node groups and VM provisioning

Sole-tenant node templates define the properties of a node group, and you must create a node template before creating a node group in a Google Cloud zone. When you create a group, specify the maintenance policy for VM instances on the node group, and the number of nodes for the node group. A node group can have zero or more nodes; for example, you can reduce the number of nodes in a node group to zero when you don't need to run any VM instances on nodes in the group, or you can enable the node group autoscaler to manage the size of the node group automatically.

Before provisioning VMs on sole-tenant nodes, you must create a sole-tenant node group. A node group is a homogeneous set of sole-tenant nodes in a specific zone. Node groups can contain multiple VMs running on machine types of various sizes, as long as the machine type has 2 or more vCPUs.

When you create a node group, enable autoscaling so that the size of the group adjusts automatically to meet the requirements of your workload. If your workload requirements are static, you can manually specify the size of the node group.

After creating a node group, you can provision VMs on the group or on a specific node within the group. For further control, use node affinity labels to schedule VMs on any node with matching affinity labels.

After you've provisioned VMs on node groups, and optionally assigned affinity labels to provision VMs on specific node groups or nodes, consider labeling your resources to help manage your VMs. Labels are key-value pairs that can help you categorize your VMs so that you can view them in aggregate for reasons such as billing. For example, you can use labels to mark the role of a VM, its tenancy, the license type, or its location.

Maintenance policies

Depending on your licensing scenarios and workloads, you might want to limit the number of physical cores used by your VMs. The maintenance policy that you choose might depend on, for example, your licensing or compliance requirements, or you might want a policy that lets you limit the use of physical servers. With all of these policies, your VMs remain on dedicated hardware.

When you schedule VMs on sole-tenant nodes, you can choose from three maintenance policies, which determine whether and how Compute Engine live migrates VMs during host maintenance events. These events occur approximately every 4 to 6 weeks. During maintenance, Compute Engine live migrates all of the VMs on the host, as a group, to a different sole-tenant node. In some cases, Compute Engine might break up the VMs into smaller groups and live migrate each smaller group to a separate sole-tenant node.

Default maintenance policy

This is the default maintenance policy. VMs on node groups configured with this policy follow the traditional maintenance behavior for non-sole-tenant VMs. That is, depending on the on-host maintenance setting of the VM's host, VMs live migrate to a new sole-tenant node in the node group before a host maintenance event, and this new sole-tenant node runs only your VMs.

This policy is most suitable for per-user or per-device licenses that require live migration during maintenance events. This setting doesn't restrict migration of VMs to a fixed pool of physical servers, and is recommended for general workloads that have no physical server requirements and do not require existing licenses.

Because VMs live migrate to any server without considering existing server affinity with this policy, this policy is not suitable for scenarios requiring minimization of the use of physical cores during maintenance events.

The following figure shows an animation of the Default maintenance policy.

Figure 2: Animation of the Default maintenance policy.

Restart in place maintenance policy

When you use this maintenance policy, Compute Engine stops VMs during maintenance events, and then restarts the VMs on the same physical server after the maintenance event. You must set the VM's on-host maintenance setting to TERMINATE when using this policy.

This policy is most suitable for fault-tolerant workloads that can experience approximately one hour of downtime during host maintenance events, workloads that must remain on the same physical server, workloads that do not require live migration, or workloads with licenses that are based on the number of physical cores or processors.

With this policy, you can assign the instance to the node group by using the node-name affinity label, the node-group-name affinity label, or a custom node affinity label.

The following figure shows an animation of the Restart in place maintenance policy.

Figure 3: Animation of the Restart in place maintenance policy.

Migrate within node group maintenance policy

When using this maintenance policy, Compute Engine live migrates VMs within a fixed-sized group of physical servers during maintenance events, which helps limit the number of unique physical servers used by the VM.

This policy is most suitable for high-availability workloads with licenses that are based on the number of physical cores or processors. With this maintenance policy, each sole-tenant node in the group is pinned to a fixed set of physical servers, unlike the default policy, which lets VMs migrate to any server.

To ensure capacity for live migration, Compute Engine reserves 1 holdback node for every 20 nodes that you reserve. The following table shows how many holdback nodes Compute Engine reserves depending on how many nodes you reserve for your node group.

Total nodes in group  Holdback nodes reserved for live migration
1                     Not applicable. Must reserve at least 2 nodes.
2 to 20               1
21 to 40              2
41 to 60              3
61 to 80              4
81 to 100             5
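The rows of this table follow a simple ceiling rule: one holdback node for every started block of 20 nodes. The following Python sketch expresses that rule. The holdback_nodes function is a hypothetical helper, and applying the formula to groups larger than 100 nodes is an extrapolation that the table does not confirm.

```python
import math

# One holdback node is reserved for every 20 nodes in the group.
NODES_PER_HOLDBACK = 20


def holdback_nodes(total_nodes: int) -> int:
    """Return the number of holdback nodes for a node group of the given size."""
    if total_nodes < 2:
        # The table requires at least 2 nodes for this maintenance policy.
        raise ValueError("a group must reserve at least 2 nodes")
    return math.ceil(total_nodes / NODES_PER_HOLDBACK)


for size in (2, 20, 21, 60, 61, 100):
    print(size, holdback_nodes(size))
```

Because holdback nodes are part of the group, the capacity available for scheduling VMs is the group size minus the value that this function returns.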

With this policy, each instance must target a single node group by using the node-group-name affinity label, and cannot be assigned to a specific node by using the node-name affinity label. This requirement lets Compute Engine live migrate the VMs to the holdback node during a maintenance event. VMs can also use custom node affinity labels, as long as they are assigned the node-group-name label and not the node-name label.

The following figure shows an animation of the Migrate within node group maintenance policy.

Figure 4: Animation of the Migrate within node group maintenance policy.
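The placement constraints that the policies impose on a VM can be collected into one check. The following Python sketch is only a model of the rules stated in this section: the placement_problems function is hypothetical, the policy identifiers are labels chosen for the sketch, and the affinity keys are the default label keys that Compute Engine assigns to each node.

```python
# Default affinity label keys that Compute Engine assigns to each node.
NODE_GROUP_KEY = "compute.googleapis.com/node-group-name"
NODE_NAME_KEY = "compute.googleapis.com/node-name"

# Policy identifiers used by this sketch (assumed spellings, for illustration).
POLICIES = {"DEFAULT", "RESTART_IN_PLACE", "MIGRATE_WITHIN_NODE_GROUP"}


def placement_problems(policy: str, on_host_maintenance: str,
                       affinity_keys: set[str]) -> list[str]:
    """Return the rules from this section that a VM configuration violates."""
    if policy not in POLICIES:
        raise ValueError(f"unknown maintenance policy: {policy}")
    problems = []
    if policy == "RESTART_IN_PLACE" and on_host_maintenance != "TERMINATE":
        problems.append("Restart in place requires on-host maintenance TERMINATE")
    if policy == "MIGRATE_WITHIN_NODE_GROUP":
        if NODE_GROUP_KEY not in affinity_keys:
            problems.append("VM must target a node group with node-group-name")
        if NODE_NAME_KEY in affinity_keys:
            problems.append("VM cannot be pinned to a node with node-name")
    return problems


# A VM pinned to a single node cannot be live migrated to the holdback node.
print(placement_problems("MIGRATE_WITHIN_NODE_GROUP", "MIGRATE", {NODE_NAME_KEY}))
```

An empty list means that the configuration is consistent with the rules modeled here; it does not guarantee that Compute Engine accepts the request.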

Maintenance windows

If you are managing workloads that might be sensitive to the performance impact of live migration, for example, finely tuned databases, you can determine when maintenance begins on a sole-tenant node group by specifying a maintenance window when you create the node group. You cannot modify the maintenance window after you create the node group.

Maintenance windows are 4-hour blocks of time that you can use to specify when Google performs maintenance on your sole-tenant VMs. Maintenance events occur approximately once every 4 to 6 weeks.

The maintenance window applies to all VMs in the sole-tenant node group, and it only specifies when the maintenance begins. Maintenance is not guaranteed to finish during the maintenance window, and there is no guarantee on how frequently maintenance occurs. Maintenance windows are not supported on node groups with the Migrate within node group maintenance policy.
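Because the window only constrains when maintenance begins, a planning script needs to know whether a given moment falls inside the 4-hour block, including blocks that run past midnight. The following Python sketch shows one way to check this. It assumes that a window is identified by its starting hour and that all times are in the same time zone; both are assumptions of the sketch, and the function names are hypothetical.

```python
from datetime import datetime, timedelta

# Maintenance windows are 4-hour blocks of time.
WINDOW_LENGTH = timedelta(hours=4)


def window_bounds(day: datetime, start_hour: int) -> tuple[datetime, datetime]:
    """Return the start and end of the window that opens on the given day."""
    start = day.replace(hour=start_hour, minute=0, second=0, microsecond=0)
    return start, start + WINDOW_LENGTH


def may_start_at(moment: datetime, start_hour: int) -> bool:
    """Return True if maintenance could begin at this moment."""
    # Check the window that opens today and the one that opened yesterday,
    # because a window that starts late in the evening runs past midnight.
    for days_back in (0, 1):
        start, end = window_bounds(moment - timedelta(days=days_back), start_hour)
        if start <= moment < end:
            return True
    return False


print(may_start_at(datetime(2024, 1, 2, 1, 30), start_hour=22))
print(may_start_at(datetime(2024, 1, 2, 12, 0), start_hour=22))
```

The check says nothing about when maintenance ends, because maintenance is not guaranteed to finish within the window.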

Host errors

When a rare critical hardware failure occurs on a host, whether sole-tenant or multi-tenant, Compute Engine does the following:

  1. Retires the physical server and its unique identifier.

  2. Revokes your project's access to the physical server.

  3. Replaces the failed hardware with a new physical server that has a new unique identifier.

  4. Moves the VMs from the failed hardware to the replacement node.

  5. Restarts the affected VMs if you configured them to automatically restart.

Node affinity and anti-affinity

Sole-tenant nodes ensure that your VMs do not share host hardware with VMs from other projects. However, you still might want to group several workloads together on the same sole-tenant node or isolate your workloads from one another on different nodes. For example, to help meet some compliance requirements, you might need to use affinity labels to separate sensitive workloads from non-sensitive workloads.

When you create a VM, you request sole-tenancy by specifying node affinity or anti-affinity, referencing one or more node affinity labels. You specify custom node affinity labels when you create a node template, and Compute Engine automatically includes some default affinity labels on each node. By specifying affinity when you create a VM, you can schedule VMs together on a specific node or nodes in a node group. By specifying anti-affinity when you create a VM, you can ensure that certain VMs are not scheduled together on the same node or nodes in a node group.

Node affinity labels are key-value pairs assigned to nodes, and are inherited from a node template. Affinity labels let you:

  • Control how individual VM instances are assigned to nodes.
  • Control how VM instances created from a template, such as those created by a managed instance group, are assigned to nodes.
  • Group sensitive VM instances on specific nodes or node groups, separate from other VMs.
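To make the difference between affinity and anti-affinity concrete, the following Python sketch models how a VM's requirements could be matched against the labels on a node. It is an illustration, not Compute Engine's scheduler: the Affinity class and the node_matches function are hypothetical, the IN and NOT_IN operator names are those assumed by the sketch for affinity and anti-affinity, and treating a missing label as a non-match is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Affinity:
    """One affinity requirement of a VM."""
    key: str
    operator: str           # "IN" models affinity, "NOT_IN" models anti-affinity
    values: frozenset[str]


def node_matches(node_labels: dict[str, str], affinities: list[Affinity]) -> bool:
    """Return True if a node's labels satisfy every requirement of the VM."""
    for affinity in affinities:
        if affinity.operator not in ("IN", "NOT_IN"):
            raise ValueError(f"unknown operator: {affinity.operator}")
        # A node without the label never counts as having one of the values.
        has_value = node_labels.get(affinity.key) in affinity.values
        if affinity.operator == "IN" and not has_value:
            return False
        if affinity.operator == "NOT_IN" and has_value:
            return False
    return True


# Hypothetical custom label that separates sensitive from general workloads.
sensitive_node = {"workload": "sensitive"}
general_node = {"workload": "general"}

needs_sensitive = [Affinity("workload", "IN", frozenset({"sensitive"}))]
avoids_sensitive = [Affinity("workload", "NOT_IN", frozenset({"sensitive"}))]

print(node_matches(sensitive_node, needs_sensitive))
print(node_matches(sensitive_node, avoids_sensitive))
print(node_matches(general_node, avoids_sensitive))
```

A VM with several requirements is placed only on nodes that satisfy all of them, which is how one label can group workloads while another keeps them apart.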

Default affinity labels

Compute Engine assigns two default affinity labels to each node:

  • A label for the node group name:
    • Key: compute.googleapis.com/node-group-name
    • Value: Name of the node group.
  • A label for the node name:
    • Key: compute.googleapis.com/node-name
    • Value: Name of the individual node.

Custom affinity labels

You can create custom node affinity labels when you create a node template. These affinity labels are assigned to all nodes in node groups created from the node template. You can't add more custom affinity labels to nodes in a node group after the node group has been created.

For information about how to use affinity labels, see Configuring node affinity.

Pricing

  • To help you to minimize the cost of your sole-tenant nodes, Compute Engine provides committed use discounts and sustained use discounts. Also, because you are already billed for the vCPU and memory of your sole-tenant nodes, you do not pay extra for the VMs on your sole-tenant nodes.

  • If you provision sole-tenant nodes with GPUs or local SSDs, you are billed for all of the GPUs or local SSDs on each node that you provision. The sole-tenancy premium does not apply to GPUs or local SSDs.

  • If you reserve GPUs or local SSDs on a sole-tenant node, you are billed for all of the GPUs or local SSDs on each node that you reserve the resource on.

Availability

  • Sole-tenant nodes are available in select zones. To ensure high availability, schedule VMs on sole-tenant nodes in different zones.

  • Before using GPUs or local SSDs on sole-tenant nodes, make sure you have enough GPU or local SSD quota in the zone where you are reserving the resource.

  • Compute Engine supports GPUs on n1 sole-tenant node types that are in zones with GPU support. The following table shows the types of GPUs that you can attach to n1 nodes and how many GPUs you must attach when you create the node template.

    GPU type      GPU quantity
    NVIDIA® P100  4
    NVIDIA® P4    4
    NVIDIA® T4    4
    NVIDIA® V100  8
  • Compute Engine supports local SSDs on n1, n2, and n2d sole-tenant node types that are in zones with local SSD support.

Restrictions

What's next