About GKE node sizing


This page describes how to plan the size of nodes in Google Kubernetes Engine (GKE) Standard node pools to reduce the risk of workload disruptions and out-of-resource terminations.

This planning is not required in GKE Autopilot because Google Cloud manages the nodes for you. However, if you operate Autopilot clusters, this document can help you understand how much of the resource capacity in a node is available for your workloads to use.

Benefits of right-sized nodes

Ensuring that your nodes are correctly sized to accommodate your workloads and to handle spikes in activity provides benefits such as the following:

  • Better workload reliability because of a reduced risk of out-of-resource eviction.
  • Improved ability to scale workloads during high-traffic periods.
  • Lower costs, because you avoid paying for nodes that are larger than your workloads need.

Node allocatable resources

GKE nodes run system components that let the node function as part of your cluster. These components use node resources, such as CPU and memory. You might notice a difference between your node's total resources, which are based on the size of the underlying Compute Engine virtual machine (VM), and the resources that are available for your GKE workloads to request. This difference exists because GKE reserves a pre-defined quantity of resources for system functionality and node reliability. The disk space that GKE reserves for system resources also differs based on the node image. The remaining resources that are available for your workloads are called allocatable resources.

When you define Pods in a manifest, you can specify resource requests and limits in the Pod specification. When GKE places a Pod on a node, the Pod requests those specified resources from the node's allocatable resources. When you plan the size of the nodes in your node pools, consider how much of each resource your workloads need to function correctly.

Check allocatable resources on a node

To inspect the allocatable resources on an existing node, run the following command:

kubectl get node NODE_NAME \
    -o=yaml | grep -A 7 -B 7 capacity

Replace NODE_NAME with the name of the node.

The output is similar to the following:

allocatable:
  attachable-volumes-gce-pd: "127"
  cpu: 3920m
  ephemeral-storage: "47060071478"
  hugepages-1Gi: "0"
  hugepages-2Mi: "0"
  memory: 13498416Ki
  pods: "110"
capacity:
  attachable-volumes-gce-pd: "127"
  cpu: "4"
  ephemeral-storage: 98831908Ki
  hugepages-1Gi: "0"
  hugepages-2Mi: "0"
  memory: 16393264Ki
  pods: "110"

In this output, the values in the allocatable section are the allocatable resources on the node. The values in the capacity section are the total resources on the node. Ephemeral storage values without a unit suffix are in bytes, and values with a Ki suffix are in kibibytes.
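
To see how much GKE has reserved on this node, subtract the allocatable values from the capacity values. The following Python snippet is only a quick sketch that reuses the sample values from the output above; it doesn't query a live cluster:

# Values copied from the sample output above (a 4-vCPU node)
capacity_cpu_millicores = 4000        # capacity cpu: "4"
allocatable_cpu_millicores = 3920     # allocatable cpu: 3920m
capacity_memory_kib = 16393264        # capacity memory: 16393264Ki
allocatable_memory_kib = 13498416     # allocatable memory: 13498416Ki

reserved_cpu_millicores = capacity_cpu_millicores - allocatable_cpu_millicores
reserved_memory_mib = (capacity_memory_kib - allocatable_memory_kib) / 1024

print(f"Reserved CPU: {reserved_cpu_millicores}m")          # 80m
print(f"Reserved memory: {reserved_memory_mib:.0f} MiB")    # about 2827 MiB

The 80 millicores of reserved CPU on this 4-vCPU node lines up with the CPU reservation tiers described later on this page.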

GKE resource reservations

GKE reserves specific amounts of memory and CPU resources on nodes based on the total amount of each resource available on the node. Larger machine types run more containers and Pods, so the amount of resources that GKE reserves scales up for larger machines. Windows Server nodes also require more resources than equivalent Linux nodes, to account for running the Windows OS and for the Windows Server components that can't run in containers.

Memory and CPU reservations

The following sections describe the default memory and CPU reservations based on the machine type.

Memory reservations

For memory resources, GKE reserves the following:

  • 255 MiB of memory for machines with less than 1 GiB of memory
  • 25% of the first 4 GiB of memory
  • 20% of the next 4 GiB of memory (up to 8 GiB)
  • 10% of the next 8 GiB of memory (up to 16 GiB)
  • 6% of the next 112 GiB of memory (up to 128 GiB)
  • 2% of any memory above 128 GiB

GKE also reserves an additional 100 MiB of memory on every node to handle Pod eviction.
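
These tiers are cumulative: each percentage applies only to the corresponding slice of memory. The following Python sketch applies the tiers to a node's total memory in GiB. It's an approximation for planning purposes, not GKE's actual implementation:

def approximate_memory_reservation_gib(total_memory_gib):
    """Approximate GKE memory reservation, in GiB, using the tiers above."""
    if total_memory_gib < 1:
        reserved = 255 / 1024                                       # flat 255 MiB
    else:
        reserved = 0.25 * min(total_memory_gib, 4)                  # first 4 GiB
        reserved += 0.20 * max(0, min(total_memory_gib, 8) - 4)     # next 4 GiB
        reserved += 0.10 * max(0, min(total_memory_gib, 16) - 8)    # next 8 GiB
        reserved += 0.06 * max(0, min(total_memory_gib, 128) - 16)  # next 112 GiB
        reserved += 0.02 * max(0, total_memory_gib - 128)           # above 128 GiB
    return reserved + 100 / 1024                                    # plus 100 MiB for Pod eviction

# Example: a node with 16 GiB of memory reserves about 2.70 GiB
print(f"{approximate_memory_reservation_gib(16):.2f} GiB")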

CPU reservations

For CPU resources, GKE reserves the following:

  • 6% of the first core
  • 1% of the next core (up to 2 cores)
  • 0.5% of the next 2 cores (up to 4 cores)
  • 0.25% of any cores above 4 cores

For shared-core E2 machine types, GKE reserves a total of 1060 millicores.
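
The following Python sketch applies the same tiers to a node's vCPU count. It's an approximation for planning only, and it doesn't cover the flat 1060 millicores for shared-core E2 machine types:

def approximate_cpu_reservation_millicores(total_cores):
    """Approximate GKE CPU reservation, in millicores, using the tiers above."""
    reserved = 0.06 * min(total_cores, 1)                     # 6% of the first core
    reserved += 0.01 * max(0, min(total_cores, 2) - 1)        # 1% of the next core
    reserved += 0.005 * max(0, min(total_cores, 4) - 2)       # 0.5% of the next 2 cores
    reserved += 0.0025 * max(0, total_cores - 4)              # 0.25% of cores above 4
    return reserved * 1000

# Example: a 4-vCPU node reserves 80 millicores, which matches the
# 4000m capacity and 3920m allocatable values in the sample output earlier.
print(f"{approximate_cpu_reservation_millicores(4):.0f}m")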

Local ephemeral storage reservation

GKE provides nodes with local ephemeral storage, backed by locally attached devices such as the node's boot disk or local SSDs. Ephemeral storage has no guarantee of availability, and data in ephemeral storage could be lost if a node fails and is deleted.

GKE reserves a portion of the node's total ephemeral storage as a single file system for the kubelet to use during Pod eviction, and for other system components running on the node. You can allocate the remaining ephemeral storage to your Pods to use for purposes such as logs. To learn how to specify ephemeral storage requests and limits in your Pods, refer to Local ephemeral storage.

GKE calculates the local ephemeral storage reservation as follows:

EVICTION_THRESHOLD + SYSTEM_RESERVATION

The actual values vary based on the size and type of device that backs the storage.

Ephemeral storage backed by node boot disk

By default, ephemeral storage is backed by the node boot disk. In this case, GKE determines the value of the eviction threshold as follows:

EVICTION_THRESHOLD = 10% * BOOT_DISK_CAPACITY

The eviction threshold is always 10% of the total boot disk capacity.

GKE determines the value of the system reservation as follows:

SYSTEM_RESERVATION = Min(50% * BOOT_DISK_CAPACITY, 35% * BOOT_DISK_CAPACITY + 6 GiB, 100 GiB)

The system reservation amount is the lowest of the following:

  • 50% of the boot disk capacity
  • 35% of the boot disk capacity + 6 GiB
  • 100 GiB

For example, if your boot disk is 300 GiB, the following values apply:

  • 50% of capacity: 150 GiB
  • 35% of capacity + 6 GiB: 111 GiB
  • 100 GiB

GKE would reserve the following:

  • System reservation: 100 GiB (the lowest value)
  • Eviction threshold: 30 GiB

The total reserved ephemeral storage is 130 GiB. The remaining capacity, 170 GiB, is allocatable ephemeral storage.
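
The following Python sketch reproduces this boot disk example. It's an illustration of the formula above, not GKE's implementation:

def boot_disk_ephemeral_reservation_gib(boot_disk_gib):
    """Approximate reserved ephemeral storage, in GiB, when ephemeral
    storage is backed by the node boot disk."""
    eviction_threshold = 0.10 * boot_disk_gib
    system_reservation = min(0.50 * boot_disk_gib,
                             0.35 * boot_disk_gib + 6,
                             100)
    return eviction_threshold + system_reservation

# Example from above: a 300 GiB boot disk
boot_disk_gib = 300
reserved_gib = boot_disk_ephemeral_reservation_gib(boot_disk_gib)
print(f"Reserved: {reserved_gib:.0f} GiB")                     # 130 GiB
print(f"Allocatable: {boot_disk_gib - reserved_gib:.0f} GiB")  # 170 GiB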

Ephemeral storage backed by local SSDs

If your ephemeral storage is backed by local SSDs, GKE calculates the eviction threshold as follows:

EVICTION_THRESHOLD = 10% * SSD_NUMBER * 375 GiB

In this calculation, SSD_NUMBER is the number of attached local SSDs. All local SSDs are 375 GiB in size, so the eviction threshold is 10% of the total ephemeral storage capacity. Note that this value is computed before the drives are formatted, so the usable capacity is several percent less, depending on the node image version.

GKE calculates the system reservation depending on the number of attached SSDs, as follows:

  • 1 local SSD: 50 GiB
  • 2 local SSDs: 75 GiB
  • 3 or more local SSDs: 100 GiB
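
The following Python sketch combines the eviction threshold and the system reservation for local SSDs. As with the earlier sketches, it's an approximation of the published formula and uses the raw 375 GiB capacity per SSD:

def local_ssd_ephemeral_reservation_gib(ssd_count):
    """Approximate reserved ephemeral storage, in GiB, when ephemeral
    storage is backed by local SSDs (375 GiB of raw capacity each)."""
    eviction_threshold = 0.10 * ssd_count * 375
    if ssd_count >= 3:
        system_reservation = 100
    elif ssd_count == 2:
        system_reservation = 75
    else:
        system_reservation = 50
    return eviction_threshold + system_reservation

# Example: two attached local SSDs (750 GiB of raw ephemeral capacity)
print(f"{local_ssd_ephemeral_reservation_gib(2):.0f} GiB reserved")  # 150 GiB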

Use resource reservations to plan node sizes

  1. Consider the resource requirements of your workloads at deploy time and under load. This includes the requests and planned limits for the workloads, as well as overhead to accommodate scaling up.

  2. Consider whether you want a small number of large nodes or a large number of small nodes to run your workloads.

    • A small number of large nodes works well for resource-intensive workloads that don't require high availability. Node autoscaling is less agile because more Pods must be evicted for a scale-down to occur.
    • A large number of small nodes works well for highly available workloads that aren't resource intensive. Node autoscaling is more agile because fewer Pods must be evicted for a scale-down to occur.
  3. Use the Compute Engine machine family comparison guide to determine the machine series and family that you want for your nodes.

  4. Consider the ephemeral storage requirements of your workloads. Is the node boot disk enough? Do you need local SSDs?

  5. Calculate the allocatable resources on your chosen machine type using the information in the previous sections, as shown in the sketch after this list. Compare this to the resources and overhead that you need.

    • If your chosen machine type is too large, consider a smaller machine to avoid paying for the extra resources.
    • If your chosen machine type is too small, consider a larger machine to reduce the risk of workload disruptions.
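
As an example of step 5, the following Python sketch estimates the allocatable CPU and memory for a hypothetical machine type with 4 vCPUs and 16 GiB of memory by applying the memory and CPU tiers from earlier on this page. The machine shape is an assumption for illustration; substitute the values for the machine type that you're considering, and remember to account for ephemeral storage separately:

# Hypothetical machine shape: 4 vCPUs, 16 GiB of memory (assumed for illustration)
total_cpu_millicores = 4000
total_memory_gib = 16

# CPU tiers: 6% of the first core, 1% of the next core, 0.5% of the next 2 cores
reserved_cpu_millicores = 1000 * (0.06 + 0.01 + 0.005 * 2)

# Memory tiers: 25% of the first 4 GiB, 20% of the next 4 GiB, 10% of the
# next 8 GiB, plus 100 MiB for Pod eviction
reserved_memory_gib = 0.25 * 4 + 0.20 * 4 + 0.10 * 8 + 100 / 1024

print(f"Approximate allocatable CPU: {total_cpu_millicores - reserved_cpu_millicores:.0f}m")
print(f"Approximate allocatable memory: {total_memory_gib - reserved_memory_gib:.2f} GiB")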

What's next