Verifying node upgrades and quota

Overview

All nodes created are subject to the resource quota of your project. Any node pool created with specific reservation affinity is subject to the capacity of the reservation over the node pool's entire lifetime.

Because surge upgrades create extra VMs before draining and deleting old nodes, an upgrade can fail if your project does not have enough resource quota or reservation capacity. In this article, "quota" refers to both.

If the available quota is less than the number of nodes specified in maxSurge, the upgrade still proceeds, but the number of nodes upgraded in parallel is less than maxSurge + maxUnavailable. If there is not enough quota available to create even a single node and maxUnavailable is set to 0, the upgrade fails.

The following table shows how upgrades behave under different settings and quota levels:

| Upgrade settings | Additional nodes allowed by quota | Result |
|---|---|---|
| maxSurge: 5, maxUnavailable: 0 | 5 | Upgrades 5 nodes in parallel. |
| maxSurge: 5, maxUnavailable: 0 | 2 | Upgrades only 2 nodes in parallel. |
| maxSurge: 5, maxUnavailable: 0 | 0 | Upgrade fails, since it's not possible to bring up additional nodes and restarting existing ones is prohibited by the upgrade settings. |
| maxSurge: 5, maxUnavailable: 1 | 5 | Upgrades 6 nodes in parallel, while ensuring the node pool temporarily loses only one node due to the upgrade. |
| maxSurge: 5, maxUnavailable: 1 | 2 | Upgrades only 3 nodes in parallel, while ensuring the node pool is no more than one node short due to the upgrade. |
| maxSurge: 5, maxUnavailable: 1 | 0 | Upgrades only 1 node at a time by recreating each node in a rolling fashion. |
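
The pattern in the table above can be summarized as "surge nodes are capped by the available quota, then maxUnavailable is added on top". The following is a minimal sketch of that arithmetic, not the actual upgrade logic. The function name `parallel_upgrades` and its argument order are hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: estimates how many nodes are upgraded in parallel.
# Arguments: maxSurge, maxUnavailable, additional nodes allowed by quota.
# A result of 0 corresponds to the "upgrade fails" row of the table.
parallel_upgrades() {
  max_surge=$1
  max_unavailable=$2
  quota_nodes=$3

  # Surge nodes cannot exceed what the remaining quota allows.
  if [ "$quota_nodes" -lt "$max_surge" ]; then
    surge=$quota_nodes
  else
    surge=$max_surge
  fi

  # Nodes taken offline in place add to the parallelism.
  echo $((surge + max_unavailable))
}
```

For example, `parallel_upgrades 5 1 2` prints 3, which matches the row with maxSurge: 5, maxUnavailable: 1, and quota for 2 additional nodes.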

Checking quota

Upgrade operation objects record details about each upgrade, including whether it failed and why. To list the upgrade operation objects, run the following command:

gcloud container operations list --filter="STATUS=DONE AND TYPE=UPGRADE_NODES AND targetLink:https://container.googleapis.com/v1/projects/[PROJECT_ID]/zones/[ZONE]/clusters/[CLUSTER_NAME]"

If your most recent upgrade failed due to insufficient resource quota, describing the operation produces output similar to the following:

gcloud container operations describe operation-1234567891234-1abc2d3e
detail: "Insufficient quota to satisfy the request: waiting on IG: instance https://www.googleapis.com/compute/v1/projects/my-project-123/zones/us-central1-a/instances/gke-my-cluster-default-pool-123ab45c-de67\
  \ is still CREATING. Last attempt errors: [QUOTA_EXCEEDED] Instance 'gke-my-cluster-default-pool-123ab45c-de67'\
  \ creation failed: Quota 'IN_USE_ADDRESSES' exceeded.  Limit: 50.0 in region us-central1.\
...

If the upgrade failed due to insufficient reservation capacity, the output is similar to the following:

gcloud container operations describe operation-1234567891234-1abc2d3e
detail: "Reservation does not have enough resources for the request: waiting on IG:\
  \ instance https://www.googleapis.com/compute/v1/projects/my-project-123/zones/us-central1-a/instances/gke-my-cluster-default-pool-123ab45c-de67\
  \ is still CREATING. Last attempt error: [CONDITION_NOT_MET] Instance 'gke-my-cluster-default-pool-123ab45c-de67'\
  \ creation failed: Specified reservation 'foo' does not have available resources\
  \ for the request."
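
To tell the two failure modes apart quickly, you can look for the error codes that appear in the sample outputs above. The following sketch assumes that `[QUOTA_EXCEEDED]` and `[CONDITION_NOT_MET]` appear in the `detail` text exactly as in those samples; the function name `classify_upgrade_failure` is hypothetical, and other failures may produce different codes or reuse these ones.

```shell
# Hypothetical helper: classifies the "detail" text of a failed operation.
# Argument: the text printed by "gcloud container operations describe".
classify_upgrade_failure() {
  case "$1" in
    # Code shown in the resource quota sample output.
    *QUOTA_EXCEEDED*)    echo "quota" ;;
    # Code shown in the reservation sample output.
    *CONDITION_NOT_MET*) echo "reservation" ;;
    # Anything else needs manual inspection.
    *)                   echo "other" ;;
  esac
}
```

Pass the captured output of the describe command as the single argument. The function prints `quota`, `reservation`, or `other`.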

Resolving upgrade errors

If your upgrade failed due to quota, try the following steps in order:

  1. Check whether any Compute Engine resources in your project are consuming quota but are no longer needed. If you find any, remove them and retry the upgrade.
  2. If step 1 did not unblock your upgrade, request a quota increase or increase the size of the specific reservation.
  3. If increasing the quota is not an option for any reason, change maxUnavailable to 1 to unblock upgrades. Use this option only as a last resort, because keeping maxUnavailable at 0 is the best practice for minimizing disruption caused by upgrades.
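
For the last-resort option, the following sketch builds a command that sets maxUnavailable to 1 using the gcloud CLI's `--max-surge-upgrade` and `--max-unavailable-upgrade` flags. The pool name, cluster name, zone, and the maxSurge value of 5 are placeholder assumptions taken from the examples in this article; replace them with your own values. The command is printed for review instead of being run directly.

```shell
# Placeholder values (assumptions); replace with your own.
POOL_NAME="default-pool"
CLUSTER_NAME="my-cluster"
ZONE="us-central1-a"

# Build the update command: maxSurge stays at 5, maxUnavailable becomes 1.
UPDATE_CMD="gcloud container node-pools update $POOL_NAME \
--cluster=$CLUSTER_NAME \
--zone=$ZONE \
--max-surge-upgrade=5 \
--max-unavailable-upgrade=1"

# Print the command so it can be reviewed before it is run.
echo "$UPDATE_CMD"
```

After reviewing the printed command, run it in your shell, then retry the upgrade. Consider setting maxUnavailable back to 0 once the quota issue is resolved.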

What's next