Troubleshoot GKE Standard node pools


This page shows you how to resolve issues with GKE Standard mode node pools.

If you need additional assistance, reach out to Cloud Customer Care.

Node pool creation issues

This section lists issues that might occur when creating new node pools in Standard clusters and provides suggestions for how you might fix them.

Node pool creation fails due to resource availability

The following issue occurs when you create a node pool with specific hardware in a Google Cloud zone that doesn't have enough hardware available to meet your requirements.

To validate that node pool creation failed because a zone didn't have enough resources, check your logs for relevant error messages.

  1. Go to Logs Explorer in the Google Cloud console:

    Go to Logs Explorer

  2. In the Query field, specify the following query:

    log_id(cloudaudit.googleapis.com/activity)
    resource.labels.cluster_name="CLUSTER_NAME"
    protoPayload.status.message:("ZONE_RESOURCE_POOL_EXHAUSTED" OR "does not have enough resources available to fulfill the request" OR "resource pool exhausted" OR "does not exist in zone")
    

    Replace CLUSTER_NAME with the name of your GKE cluster.

  3. Click Run query.
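
If you prefer the gcloud CLI, you can run an equivalent query from your terminal. The following sketch mirrors the console query; CLUSTER_NAME is the same placeholder, and you can adjust the match strings as needed:

gcloud logging read '
  log_id("cloudaudit.googleapis.com/activity")
  AND resource.labels.cluster_name="CLUSTER_NAME"
  AND protoPayload.status.message:("resource pool exhausted" OR "does not exist in zone")' \
    --limit=20 \
    --format="value(protoPayload.status.message)"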

You might see one of the following error messages:

  • resource pool exhausted
  • The zone does not have enough resources available to fulfill the request. Try a different zone, or try again later.
  • ZONE_RESOURCE_POOL_EXHAUSTED
  • ZONE_RESOURCE_POOL_EXHAUSTED_WITH_DETAILS
  • Machine type with name 'MACHINE_NAME' does not exist in zone 'ZONE_NAME'

Resolve

To resolve this issue, try the following suggestions:

  • Ensure that the selected Google Cloud region or zone has the hardware that you need. Use the Compute Engine availability table to check which zones support specific hardware, and choose a different region or zone that has better availability of the hardware that you need.
  • Create the node pool with smaller machine types. Increase the number of nodes in the node pool so that the total compute capacity remains the same.
  • Use a Compute Engine capacity reservation to reserve the resources in advance, as shown in the example after this list.
  • Use best-effort provisioning, described in the following section, to successfully create the node pool if it can provision at least a specified minimum number of nodes out of the requested number.
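
For example, the following sketch reserves capacity and then creates a node pool that consumes the reservation. The reservation name, machine type, zone, and node count are placeholders, and the node pool's machine type and zone must match the reservation:

# Reserve 10 VMs of a specific machine type in one zone.
gcloud compute reservations create RESERVATION_NAME \
    --machine-type=MACHINE_TYPE \
    --vm-count=10 \
    --zone=ZONE_NAME \
    --require-specific-reservation

# Create a node pool that consumes only that reservation.
gcloud container node-pools create NODE_POOL_NAME \
    --cluster=CLUSTER_NAME \
    --node-locations=ZONE_NAME \
    --machine-type=MACHINE_TYPE \
    --num-nodes=10 \
    --reservation-affinity=specific \
    --reservation=RESERVATION_NAME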

Best-effort provisioning

For certain hardware, you can use best-effort provisioning, which tells GKE to successfully create the node pool if it can provision at least a specified minimum number of nodes. GKE continues attempting to provision the remaining nodes to satisfy the original request over time. To tell GKE to use best-effort provisioning, use the following command:

gcloud container node-pools create NODE_POOL_NAME \
    --cluster=CLUSTER_NAME \
    --node-locations=ZONE1,ZONE2,... \
    --machine-type=MACHINE_TYPE \
    --best-effort-provision \
    --min-provision-nodes=MINIMUM_NODES

Replace the following:

  • NODE_POOL_NAME: the name of the new node pool.
  • ZONE1,ZONE2,...: the Compute Engine zones for the nodes. These zones must support the selected hardware.
  • MACHINE_TYPE: the Compute Engine machine type for the nodes. For example, a2-highgpu-1g.
  • MINIMUM_NODES: the minimum number of nodes for GKE to provision and successfully create the node pool. If omitted, the default is 1.

For example, consider a scenario in which you need 10 nodes with attached NVIDIA A100 40GB GPUs in us-central1-c. According to the GPU regions and zones availability table, this zone supports A100 GPUs. To avoid node pool creation failure if 10 GPU machines aren't available, you use best-effort provisioning.

gcloud container node-pools create a100-nodes \
    --cluster=ml-cluster \
    --node-locations=us-central1-c \
    --num-nodes=10 \
    --machine-type=a2-highgpu-1g \
    --accelerator=type=nvidia-tesla-a100,count=1 \
    --best-effort-provision \
    --min-provision-nodes=5

GKE creates the node pool even if only five GPUs are available in us-central1-c. Over time, GKE attempts to provision more nodes until there are 10 nodes in the node pool.
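
To check how many of the requested nodes GKE has provisioned so far, you can list the nodes by their node pool label. The a100-nodes name matches the preceding example:

kubectl get nodes -l cloud.google.com/gke-nodepool=a100-nodes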

Migrate workloads between node pools

Use the following instructions to migrate workloads from one node pool to another. If you want to change the machine attributes of the nodes in your node pool, see Vertically scale by changing the node machine attributes.

How to migrate Pods to a new node pool

To migrate Pods to a new node pool, you must do the following:

  1. Cordon the existing node pool: This operation marks the nodes in the existing node pool as unschedulable. After you mark them as unschedulable, Kubernetes stops scheduling new Pods to these nodes.

  2. Drain the existing node pool: This operation evicts the workloads running on the nodes of the existing node pool gracefully.

These steps cause Pods running in your existing node pool to gracefully terminate. Kubernetes reschedules them onto other available nodes.

To make sure that Kubernetes terminates your applications gracefully, your containers should handle the SIGTERM signal. Use this approach to close active connections to clients and to commit or roll back database transactions cleanly. In your Pod manifest, you can use the spec.terminationGracePeriodSeconds field to specify how long Kubernetes must wait before stopping containers in the Pod; the default is 30 seconds. You can read more about Pod termination in the Kubernetes documentation.
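
For example, the following minimal sketch extends the grace period to 60 seconds. The Pod name and image are hypothetical, and your application must still catch SIGTERM for the extra time to matter:

kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: graceful-app  # hypothetical Pod name
spec:
  # Wait up to 60 seconds after SIGTERM before sending SIGKILL
  # (the default is 30 seconds).
  terminationGracePeriodSeconds: 60
  containers:
  - name: app
    image: IMAGE_URL  # hypothetical image reference
EOF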

You can cordon and drain nodes using the kubectl cordon and kubectl drain commands.

Create node pool and migrate workloads

To migrate your workloads to a new node pool, create the new node pool, then cordon and drain the nodes in the existing node pool:

  1. Add a node pool to your cluster.

    Verify that the new node pool is created by running the following command:

    gcloud container node-pools list --cluster CLUSTER_NAME
    
  2. Run the following command to see which node the Pods are running on (see the NODE column):

    kubectl get pods -o=wide
    
  3. Get a list of nodes in the existing node pool, replacing EXISTING_NODE_POOL_NAME with the name:

    kubectl get nodes -l cloud.google.com/gke-nodepool=EXISTING_NODE_POOL_NAME
    
  4. Run the kubectl cordon NODE command, replacing NODE with the node names from the previous command. The following shell command iterates over the nodes in the existing node pool and marks each one as unschedulable:

    for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=EXISTING_NODE_POOL_NAME -o=name); do
      kubectl cordon "$node";
    done
    
  5. Optionally, update your workloads running on the existing node pool to add a nodeSelector for the label cloud.google.com/gke-nodepool:NEW_NODE_POOL_NAME, where NEW_NODE_POOL_NAME is the name of the new node pool. This ensures that GKE places those workloads on nodes in the new node pool.
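
    For example, the following sketch patches a Deployment in place; my-app is a hypothetical Deployment name:

    kubectl patch deployment my-app -p \
        '{"spec":{"template":{"spec":{"nodeSelector":{"cloud.google.com/gke-nodepool":"NEW_NODE_POOL_NAME"}}}}}'
    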

  6. Drain each node by evicting Pods with an allotted graceful termination period:

    for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=EXISTING_NODE_POOL_NAME -o=name); do
      kubectl drain --force --ignore-daemonsets --delete-emptydir-data --grace-period=GRACEFUL_TERMINATION_SECONDS "$node";
    done
    

    Replace GRACEFUL_TERMINATION_SECONDS with the number of seconds to allow for graceful termination, for example, 10.

  7. Run the following command to see that the nodes in the existing node pool have SchedulingDisabled status in the node list:

    kubectl get nodes
    

    Additionally, you should see that the Pods are now running on the nodes in the new node pool:

    kubectl get pods -o=wide
    
  8. Delete the existing node pool if you don't need it anymore:

    gcloud container node-pools delete EXISTING_NODE_POOL_NAME --cluster CLUSTER_NAME
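
    You can confirm the deletion by listing the remaining node pools:

    gcloud container node-pools list --cluster CLUSTER_NAME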
    

What's next

If you need additional assistance, reach out to Cloud Customer Care.