Manually upgrading a cluster or node pool


By default, automatic upgrades are enabled for Google Kubernetes Engine (GKE) clusters and node pools.

This page explains how to manually request an upgrade or downgrade for a GKE cluster or its nodes. You can learn more about how automatic and manual cluster upgrades work. You can also control when auto-upgrades can and cannot occur by configuring maintenance windows and exclusions.

New versions of GKE are announced regularly. To learn about available versions, see Versioning. To learn more about clusters, see Cluster architecture. For guidance on upgrading clusters, see Best practices for upgrading clusters.

Before you begin

Before you start, make sure you have performed the following tasks:

Set up default gcloud settings using one of the following methods:

  • Using gcloud init, if you want to be walked through setting defaults.
  • Using gcloud config, to individually set your project ID, zone, and region.

Using gcloud init

If you receive the error One of [--zone, --region] must be supplied: Please specify location, complete this section.

  1. Run gcloud init and follow the directions:

    gcloud init

    If you are using SSH on a remote server, use the --console-only flag to prevent the command from launching a browser:

    gcloud init --console-only
  2. Follow the instructions to authorize gcloud to use your Google Cloud account.
  3. Create a new configuration or select an existing one.
  4. Choose a Google Cloud project.
  5. Choose a default Compute Engine zone for zonal clusters or a region for regional or Autopilot clusters.

Using gcloud config

  • Set your default project ID:
    gcloud config set project PROJECT_ID
  • If you are working with zonal clusters, set your default compute zone:
    gcloud config set compute/zone COMPUTE_ZONE
  • If you are working with Autopilot or regional clusters, set your default compute region:
    gcloud config set compute/region COMPUTE_REGION
  • Update gcloud to the latest version:
    gcloud components update
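
For example, a minimal sequence for a zonal cluster might look like the following. The project ID my-project and zone us-central1-a are placeholder values; substitute your own:

gcloud config set project my-project
gcloud config set compute/zone us-central1-a
gcloud components update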

Save your data to persistent disks

Before upgrading a node pool, ensure that any data you want to keep is stored by your Pods on persistent volumes, which are backed by persistent disks. Persistent disks are unmounted, rather than erased, during upgrades, and their data is "handed off" between Pods.

The following restrictions pertain to persistent disks:

  • The nodes on which Pods are running must be Compute Engine VMs
  • Those VMs need to be in the same Compute Engine project and zone as the persistent disk

To learn how to add a persistent disk to an existing node instance, see Adding or resizing zonal persistent disks in the Compute Engine documentation.
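
Before you upgrade, you can check which workloads already store data on persistent volumes. The following commands are a quick check rather than an exhaustive audit:

kubectl get pvc --all-namespaces
kubectl get pv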

About upgrading

A cluster's control plane and nodes are upgraded separately.

Cluster control planes are always upgraded on a regular basis, regardless of whether your cluster is enrolled in a release channel.

For proactive notice on upgrades, see Receiving cluster upgrade notifications.

Limitations

Alpha clusters cannot be upgraded.

Supported versions

The release notes announce when new versions become available and when older versions are no longer available. At any time, you can list all supported cluster and node versions using this command:

gcloud container get-server-config
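
For example, to list the versions available in a specific zone and narrow the output to just the version lists, you might run the following. The validMasterVersions and validNodeVersions field names are assumptions based on the current output format:

gcloud container get-server-config --zone us-central1-a \
  --format="yaml(validMasterVersions,validNodeVersions)"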

Downgrading limitations

Downgrading a cluster control plane is not recommended. You cannot downgrade a cluster control plane from one minor version to another. For example, if your control plane is running GKE version 1.17.17, you cannot downgrade to 1.16.15. If you attempt to do this, you will see an error similar to the following:

ERROR: (gcloud.container.clusters.upgrade) ResponseError: code=400,
message=Master cannot be upgraded to "1.16.15-gke.6000": specified version is not
newer than the current version.

Downgrading to an earlier Kubernetes minor or patch version is not recommended, and should be done only to mitigate an unsuccessful upgrade.

To mitigate an unsuccessful cluster control plane upgrade, you can downgrade your cluster to a previous patch version if the patch version is the same minor version as your cluster. For example, if your cluster is running GKE 1.17.17, you can downgrade to 1.17.16, if that version is still available.

To mitigate an unsuccessful node pool upgrade, you can downgrade a node pool to a patch version older than the cluster control plane version. Nodes cannot be more than two minor versions behind the cluster control plane version.

Upgrading the cluster

Google upgrades clusters and nodes automatically. For more control over which auto-upgrades your cluster and its nodes receive, you can enroll it in a release channel.

To learn more about managing your cluster's GKE version, see Upgrades.

You can initiate a manual upgrade any time after a new version becomes available.

Manually upgrading the control plane

When initiating a cluster upgrade, you can't modify the cluster's configuration for several minutes, until the control plane is accessible again. If you need to prevent downtime during control plane upgrades, consider using a regional cluster.

You can manually upgrade your cluster using the Cloud Console or the gcloud command-line tool. After upgrading your cluster, you can upgrade its nodes. By default, nodes created using the Google Cloud Console have auto-upgrade enabled, so this happens automatically.
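
Before you start, you can check the versions your cluster currently runs. This is a quick sketch; the field names are assumptions about the output that gcloud returns, and CLUSTER_NAME is your cluster's name:

gcloud container clusters describe CLUSTER_NAME \
  --format="value(currentMasterVersion,currentNodeVersion)"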

gcloud

To see the available versions for your cluster's control plane, run the following command:

gcloud container get-server-config

To upgrade to the default cluster version, run the following command:

gcloud container clusters upgrade CLUSTER_NAME --master

To upgrade to a specific version that is not the default, specify the --cluster-version flag as in the following command:

gcloud container clusters upgrade CLUSTER_NAME --master \
    --cluster-version VERSION

Replace VERSION with the version that you want to upgrade your cluster to. You can use a specific version, such as 1.18.17-gke.100 or you can use a version alias, like latest. For more information, see Specifying cluster version.
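
For example, to upgrade a hypothetical cluster named my-cluster to the specific version mentioned above, you would run:

gcloud container clusters upgrade my-cluster --master \
  --cluster-version 1.18.17-gke.100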

Console

To manually update your cluster control plane, perform the following steps:

  1. Go to the Google Kubernetes Engine page in Cloud Console.

    Go to Google Kubernetes Engine

  2. Click the desired cluster name.

  3. Under Cluster basics, click Upgrade Available next to Version.

  4. Select the desired version, then click Save Changes.

Downgrading clusters

To downgrade a cluster to a previous patch version, change the cluster control plane version with the following command:

gcloud container clusters upgrade CLUSTER_NAME \
  --master --cluster-version VERSION
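
Continuing the earlier scenario in which the control plane runs 1.17.17, you could move it back to a 1.17.16 patch release, provided that patch is still listed by gcloud container get-server-config. The cluster name and the -gke suffix below are placeholders:

gcloud container clusters upgrade my-cluster \
  --master --cluster-version 1.17.16-gke.SUFFIX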

Disabling cluster auto-upgrades

Infrastructure security is a high priority for GKE. Control planes are upgraded on a regular basis, and these upgrades cannot be disabled. However, you can apply maintenance windows and exclusions to temporarily suspend upgrades for control planes and nodes.

Although it is not recommended, you can disable node auto-upgrade.

Upgrading node pools

By default, a cluster's nodes have auto-upgrade enabled, and it is recommended that you do not disable it.

When a node pool is upgraded, you can configure surge upgrade settings to control how many nodes GKE upgrades at once as well as how disruptive the upgrade is to workloads. By default, GKE upgrades one node at a time.

While a node is being upgraded, GKE stops scheduling new Pods onto it, and attempts to schedule its running Pods onto other nodes. This is similar to other events that re-create the node, such as enabling or disabling a feature on the node pool.

The upgrade is only complete when all nodes have been recreated and the cluster is in the desired state. When a newly-upgraded node registers with the control plane, GKE marks the node as schedulable.
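
You can watch this process while an upgrade is in progress. Nodes that are being drained typically report a SchedulingDisabled status; this is an observational sketch, not a required step:

kubectl get nodes --watch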

New node instances run the desired Kubernetes version.

Manually upgrade a node pool

You can manually upgrade a node pool version to match the version of the control plane or to a previous version that is still available and is compatible with the control plane. The Kubernetes version and version skew support policy guarantees that control planes are compatible with nodes up to two minor versions older than the control plane. For example, Kubernetes 1.13 control planes are compatible with Kubernetes 1.11 nodes.
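
To check the current skew before you upgrade, you can compare the versions reported by the control plane and the node pool. NODE_POOL_NAME and CLUSTER_NAME are your own resource names; the currentMasterVersion and version fields are assumptions about the gcloud output format:

gcloud container clusters describe CLUSTER_NAME \
  --format="value(currentMasterVersion)"
gcloud container node-pools describe NODE_POOL_NAME \
  --cluster=CLUSTER_NAME --format="value(version)"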

When you manually upgrade a node pool, GKE removes any labels you added to individual nodes using kubectl. To avoid this, apply labels to node pools (Preview) instead.

You can manually upgrade your node pools to a version compatible with the control plane, using the Google Cloud Console or the gcloud command-line tool.

gcloud

To update all nodes to the same version as the control plane, run the gcloud container clusters upgrade command:

gcloud container clusters upgrade CLUSTER_NAME

Replace CLUSTER_NAME with the name of the cluster to be upgraded.

To update a specific node pool, specify the --node-pool flag:

gcloud container clusters upgrade CLUSTER_NAME \
  --node-pool=NODE_POOL_NAME

To specify a different version of GKE on nodes, use the optional --cluster-version flag:

gcloud container clusters upgrade CLUSTER_NAME \
  --node-pool=NODE_POOL_NAME \
  --cluster-version VERSION

Replace VERSION with the Kubernetes version to which the nodes are upgraded. For example, --cluster-version=1.7.2 or --cluster-version=latest.
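
A complete invocation for a hypothetical node pool named my-node-pool in the cluster my-cluster, using the specific version mentioned earlier on this page, might look like this:

gcloud container clusters upgrade my-cluster \
  --node-pool=my-node-pool \
  --cluster-version=1.18.17-gke.100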

For more information about specifying versions, see Versioning.

For more information, refer to the gcloud container clusters upgrade documentation.

Console

To upgrade a node pool using the Cloud Console, perform the following steps:

  1. Go to the Google Kubernetes Engine page in Cloud Console.

    Go to Google Kubernetes Engine

  2. Next to the cluster you want to edit, click Actions, then click Edit.

  3. On the Cluster details page, click the Nodes tab.

  4. In the Node Pools section, click the name of the node pool that you want to upgrade.

  5. Click Edit.

  6. Click Change under Node version.

  7. Select the desired version from the Node version drop-down list, then click Change.

Downgrading node pools

To downgrade a node pool to an earlier version that is still available and compatible with the control plane, run the same gcloud container clusters upgrade command used for node pool upgrades and specify the earlier version with the --cluster-version flag.

Checking node pool upgrade status

You can check the status of an upgrade using gcloud beta container operations.

To see a list of every running and completed operation in the cluster, run the following command:

gcloud beta container operations list

Each operation is assigned an operation ID and an operation type as well as start and end times, target cluster, and status. The list appears similar to the following example:

NAME                              TYPE                ZONE           TARGET              STATUS_MESSAGE  STATUS  START_TIME                      END_TIME
operation-1505407677851-8039e369  CREATE_CLUSTER      us-west1-a     my-cluster                          DONE    20xx-xx-xxT16:47:57.851933021Z  20xx-xx-xxT16:50:52.898305883Z
operation-1505500805136-e7c64af4  UPGRADE_CLUSTER     us-west1-a     my-cluster                          DONE    20xx-xx-xxT18:40:05.136739989Z  20xx-xx-xxT18:41:09.321483832Z
operation-1505500913918-5802c989  DELETE_CLUSTER      us-west1-a     my-cluster                          DONE    20xx-xx-xxT18:41:53.918825764Z  20xx-xx-xxT18:43:48.639506814Z
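
To narrow the list to operations that are still in progress, you can add a filter. The filter expression below assumes the status values shown in the table above:

gcloud beta container operations list --filter="status=RUNNING"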

To get more information about a specific operation, specify the operation ID in the following command:

gcloud beta container operations describe OPERATION_ID

For example:

gcloud beta container operations describe operation-1507325726639-981f0ed6
endTime: '20xx-xx-xxT21:40:05.324124385Z'
name: operation-1507325726639-981f0ed6
operationType: UPGRADE_CLUSTER
selfLink: https://container.googleapis.com/v1/projects/.../zones/us-central1-a/operations/operation-1507325726639-981f0ed6
startTime: '20xx-xx-xxT21:35:26.639453776Z'
status: DONE
targetLink: https://container.googleapis.com/v1/projects/.../zones/us-central1-a/clusters/...
zone: us-central1-a

Canceling a node pool upgrade

You can cancel an upgrade at any time. When you cancel an upgrade:

  • Nodes that have started the upgrade complete it.
  • Nodes that have not started the upgrade do not upgrade.
  • Nodes that have already successfully completed the upgrade are unaffected and are not rolled back.
To cancel an upgrade, perform the following steps:

  1. Get the upgrade's operation ID using the following command:

    gcloud container operations list
    
  2. Run the following command to cancel the upgrade:

    gcloud beta container operations cancel OPERATION_ID
    

Refer to the gcloud container operations cancel documentation.

Rolling back a node pool upgrade

You can roll back node pools that failed to upgrade, or whose upgrades were canceled, to their previous version of Kubernetes. You cannot roll back node pools once they have been successfully upgraded. Nodes that have not started an upgrade are unaffected.

To roll back an upgrade, run the following command:

gcloud container node-pools rollback NODE_POOL_NAME \
  --cluster CLUSTER_NAME

Replace the following:

  • NODE_POOL_NAME: the name of the node pool to roll back.
  • CLUSTER_NAME: the name of the cluster from which to roll back the node pool.
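
For example, with a hypothetical node pool named my-node-pool in a cluster named my-cluster, the command would be:

gcloud container node-pools rollback my-node-pool \
  --cluster my-cluster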

Refer to the gcloud container node-pools rollback documentation.

Changing surge upgrade parameters

Surge upgrades let you control how many nodes GKE upgrades at one time and how disruptive an upgrade is to your workloads.

The max-surge-upgrade and max-unavailable-upgrade flags are defined for each node pool. For more information on choosing the right parameters, see Determining your optimal surge configuration.

You can change these settings when creating or updating a cluster or node pool.

The following variables are used in the commands mentioned below:

  • CLUSTER_NAME: the name of the cluster for the node pool.
  • COMPUTE_ZONE: the zone for the cluster.
  • NODE_POOL_NAME: the name of the node pool.
  • NUMBER_NODES: the number of nodes in the node pool in each of the cluster's zones.
  • SURGE_NODES: the number of extra (surge) nodes to be created on each upgrade of the node pool.
  • UNAVAILABLE_NODES: the number of nodes that can be unavailable at the same time on each upgrade of the node pool.

Creating a cluster with specific surge parameters

To create a cluster with specific settings for surge upgrades, use the max-surge-upgrade and max-unavailable-upgrade flags.

gcloud container clusters create CLUSTER_NAME \
  --max-surge-upgrade=SURGE_NODES --max-unavailable-upgrade=UNAVAILABLE_NODES
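
For example, setting --max-surge-upgrade=2 and --max-unavailable-upgrade=0 lets GKE create up to two additional (surge) nodes per upgrade and avoids taking an existing node out of service until a replacement is ready. The cluster name and node count below are placeholders:

gcloud container clusters create my-cluster \
  --num-nodes=3 \
  --max-surge-upgrade=2 --max-unavailable-upgrade=0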

Creating a cluster with surge upgrade disabled

To create a cluster without surge upgrades, set the value for the max-surge-upgrade flag to 0.

gcloud container clusters create CLUSTER_NAME \
  --max-surge-upgrade=0 --max-unavailable-upgrade=1

Creating a node pool with specific surge parameters

To create a node pool in an existing cluster with specific settings for surge upgrades, use the max-surge-upgrade and max-unavailable-upgrade flags.

gcloud container node-pools create NODE_POOL_NAME \
  --num-nodes=NUMBER_NODES --cluster=CLUSTER_NAME \
  --max-surge-upgrade=SURGE_NODES --max-unavailable-upgrade=UNAVAILABLE_NODES

Enabling or disabling surge upgrades for an existing node pool

To update the upgrade settings of an existing node pool, use the max-surge-upgrade and max-unavailable-upgrade flags. If you set max-surge-upgrade to greater than 0, GKE creates surge nodes. If you set max-surge-upgrade to 0, GKE doesn't create surge nodes.

gcloud beta container node-pools update NODE_POOL_NAME \
  --cluster=CLUSTER_NAME \
  --max-surge-upgrade=SURGE_NODES --max-unavailable-upgrade=UNAVAILABLE_NODES

Checking if surge upgrades are enabled on a node pool

To see whether surge upgrades are enabled on a node pool, use gcloud to describe the node pool:

gcloud container node-pools describe NODE_POOL_NAME \
  --cluster=CLUSTER_NAME
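
To show only the surge-related settings, you can add a format projection. The upgradeSettings field name is an assumption about the node pool resource returned by gcloud:

gcloud container node-pools describe NODE_POOL_NAME \
  --cluster=CLUSTER_NAME --format="yaml(upgradeSettings)"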

Known issues

If you have PodDisruptionBudget objects configured that are unable to allow any additional disruptions, nodes might repeatedly fail to upgrade to the control plane version. To prevent this failure, we recommend that you scale up the Deployment or HorizontalPodAutoscaler so that the node can drain while still respecting the PodDisruptionBudget configuration.

To see all PodDisruptionBudget objects that do not allow any disruptions:

kubectl get poddisruptionbudget --all-namespaces -o jsonpath='{range .items[?(@.status.disruptionsAllowed==0)]}{.metadata.name}/{.metadata.namespace}{"\n"}{end}'
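
If one of the returned objects covers a workload you control, you can scale that workload up so the budget allows at least one disruption. The Deployment name frontend and namespace prod below are hypothetical:

kubectl scale deployment frontend --namespace=prod --replicas=3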

Automatic upgrades can also encounter this issue; in that case, the automatic upgrade process eventually forces the nodes to upgrade. However, the upgrade takes an extra hour for every node in the istio-system namespace that violates the PodDisruptionBudget.

Troubleshooting

Nodes CPU usage higher than expected

You might encounter an issue where some nodes report higher CPU usage than expected from the running Pods.

This can occur if your cluster or nodes are not running a supported version. Review the release notes to ensure the versions you are using are available and supported. You can also run the following command to list all supported cluster and node versions:

gcloud container get-server-config

What's next