Cluster upgrades

This page discusses how automatic and manual upgrades work on GKE clusters, including links to more information about related tasks and settings. You can use this information to keep your clusters updated for stability and security with minimal disruptions to your workloads.

How cluster and node pool upgrades work

This section discusses what happens in your cluster during automatic or manual upgrades. Auto-upgrades are initiated by Google. Google observes both automatic and manual upgrades across all GKE clusters and intervenes if problems are observed.

A cluster's control plane is upgraded before its nodes.
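
For example, you can compare a cluster's control plane and node versions by using the gcloud command-line tool. This is a minimal sketch; the cluster name and zone are placeholders:

    # Show the control plane version and the node version for a cluster.
    gcloud container clusters describe my-cluster \
        --zone us-central1-a \
        --format="value(currentMasterVersion, currentNodeVersion)"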

Cluster upgrades

This section discusses what to expect when Google auto-upgrades your cluster or you initiate a manual upgrade.

  • Zonal and multi-zonal clusters have only a single control plane (master). During the upgrade, your workloads continue to run, but you cannot deploy new workloads, modify existing workloads, or make other changes to the cluster's configuration until the upgrade is complete.

  • Regional clusters have multiple replicas of the control plane, and only one replica is upgraded at a time, in an undefined order. During the upgrade, the cluster remains highly available, and each control plane replica is unavailable only while it is being upgraded.

If you configure a maintenance window or exclusion, it is honored if possible.
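
If you want to follow the progress of a control plane upgrade, one option is to list the cluster's running upgrade operations. This is a sketch; the zone and operation ID are placeholders:

    # List running control plane upgrade operations in a zone.
    gcloud container operations list \
        --zone us-central1-a \
        --filter="operationType=UPGRADE_MASTER AND status=RUNNING"

    # Optionally block until a specific operation completes.
    gcloud container operations wait OPERATION_ID --zone us-central1-a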

Node pool upgrades

Your cluster and its node pools do not necessarily run the same version of GKE. This section discusses what to expect when Google auto-upgrades your node pool or you initiate a manual node-pool upgrade.

Node pools are upgraded one at a time. Within a node pool, nodes are upgraded one at a time, in an undefined order. You can change the number of nodes upgraded at a time.

If you configure a maintenance window or exclusion, it is honored if possible.

When a node is upgraded, the following things happen:

  1. If surge upgrades are enabled, GKE creates one or more surge nodes for workloads to move to. GKE then selects one or more existing nodes to upgrade.
  2. The selected nodes are cordoned and drained. At this point, no new Pods can be scheduled onto them. The manual equivalent of these steps is illustrated after this list.
  3. Pods running on the node are rescheduled onto other nodes. If a Pod can't be rescheduled, it remains in the Pending state until the node is re-created.
  4. The node is deleted, then re-created at the new version.
  5. If the new node fails to register as healthy, auto-upgrade of the entire node pool is disabled.
  6. If a significant number of node auto-upgrades to a given version result in unhealthy nodes across the GKE fleet, upgrades to that version are halted while the problem is investigated.
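
GKE performs the cordon and drain steps for you during an upgrade. For illustration only, the manual equivalents of steps 2 and 3 look like this (NODE_NAME is a placeholder):

    # Mark the node as unschedulable so no new Pods land on it.
    kubectl cordon NODE_NAME

    # Evict the node's Pods so they are rescheduled onto other nodes.
    kubectl drain NODE_NAME --ignore-daemonsets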

Upgrading automatically

When you create a cluster using the Google Cloud Console, auto-upgrade is enabled on the cluster and its node pools by default, and Google upgrades your clusters when a new GKE version is selected for auto-upgrade.

When you create a cluster using the gcloud command or the GKE API, node auto-upgrade is currently enabled by default. To disable it manually, see Auto-upgrading nodes.
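
For example, you can disable node auto-upgrade for an existing node pool with the gcloud command-line tool. This is a sketch; the node pool, cluster, and zone names are placeholders:

    # Disable node auto-upgrade for one node pool.
    gcloud container node-pools update default-pool \
        --cluster my-cluster \
        --zone us-central1-a \
        --no-enable-autoupgrade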

For more control over when an auto-upgrade can occur (or must not occur), you can configure maintenance windows and exclusions.

A cluster's node pools can be no more than two minor versions behind the control plane version, to maintain compatibility with the cluster API. The node pool version also determines the versions of software packages installed on each node. We recommend keeping node pools updated to the cluster's version.

If you enroll your cluster in a release channel, nodes always run the same version of GKE as the cluster itself, except during a brief period between completing the cluster's control plane upgrade and beginning to upgrade a given node pool.

How versions are selected for auto-upgrade

New GKE versions are released regularly, but a version is not selected for auto-upgrade right away. When a GKE version has accumulated enough cluster usage to prove stability over time, Google selects it as an auto-upgrade target for clusters running a subset of older versions.

New auto-upgrade targets are announced in the release notes. Until an available version is selected for auto-upgrade, you can upgrade to it manually. Occasionally, a version is selected for cluster auto-upgrade and node auto-upgrade during different weeks.

Soon after a new minor version becomes generally available, the oldest available minor version typically becomes unsupported. Clusters running minor versions that become unsupported are automatically upgraded to the next minor version.

Within a minor version (such as v1.14.x), clusters can be automatically upgraded to a new patch release.
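
To see which versions are currently available, and which versions are the defaults for new clusters and nodes in a given location, you can query the server configuration. The zone is a placeholder:

    # List default and valid versions for clusters and nodes in a zone.
    gcloud container get-server-config --zone us-central1-a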

Release channels allow you to control your cluster and node pool version based on a version's stability rather than managing the version directly.
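
For example, you can enroll a new cluster in the regular channel at creation time. This is a sketch; the cluster name and zone are placeholders:

    # Create a cluster enrolled in the "regular" release channel.
    gcloud container clusters create my-cluster \
        --zone us-central1-a \
        --release-channel regular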

Configuring when auto-upgrades can occur

By default, auto-upgrades can occur at any time. Auto-upgrades are minimally disruptive, especially for regional clusters. However, some workloads may require finer-grained control. You can configure maintenance windows and exclusions to manage when auto-upgrades can and must not occur.
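
As a sketch, the following commands configure a recurring weekend maintenance window and a one-off exclusion; the cluster name, dates, and exclusion name are placeholders:

    # Allow maintenance only during an 8-hour window on weekends.
    gcloud container clusters update my-cluster \
        --maintenance-window-start 2021-01-02T00:00:00Z \
        --maintenance-window-end 2021-01-02T08:00:00Z \
        --maintenance-window-recurrence "FREQ=WEEKLY;BYDAY=SA,SU"

    # Block automatic upgrades entirely during a critical period.
    gcloud container clusters update my-cluster \
        --add-maintenance-exclusion-name critical-period \
        --add-maintenance-exclusion-start 2021-11-25T00:00:00Z \
        --add-maintenance-exclusion-end 2021-11-29T00:00:00Z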

Upgrading manually

You can request to manually upgrade your cluster or its node pools to an available and compatible version at any time. Manual upgrades bypass any configured maintenance windows and maintenance exclusions.
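
For example, you can request a control plane upgrade with the gcloud command-line tool. This is a sketch; the cluster name, zone, and target version are placeholders:

    # Upgrade the control plane to a specific available version.
    gcloud container clusters upgrade my-cluster \
        --zone us-central1-a \
        --master \
        --cluster-version 1.14.10-gke.27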

When you manually upgrade a cluster, its availability depends on whether the cluster is zonal, multi-zonal, or regional:

  • For zonal and multi-zonal clusters, the control plane is unavailable while it is being upgraded. For the most part, workloads run normally but cannot be modified during the upgrade.

  • For regional clusters, one replica of the control plane is unavailable at a time while it is upgraded, but the cluster remains highly available during the upgrade.

You can manually initiate a node upgrade to a version compatible with the control plane.
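
For example, the following sketch upgrades one node pool to the control plane's current version; the cluster, zone, and node pool names are placeholders:

    # Upgrade one node pool to the control plane's version.
    gcloud container clusters upgrade my-cluster \
        --zone us-central1-a \
        --node-pool default-pool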

Changing upgrade settings to balance speed and disruption

You can change how many nodes GKE attempts to upgrade at once by changing the surge upgrade parameters on a node pool. Surge upgrades reduce disruption to your workloads during cluster maintenance and also allow you to control the number of nodes upgraded in parallel. Surge upgrades also work with the Cluster Autoscaler to prevent changes to nodes that are being upgraded.

Surge upgrade behavior is determined by two settings:

max-surge-upgrade

The number of additional nodes that can be added to the node pool during an upgrade. Increasing max-surge-upgrade raises the number of nodes that can be upgraded simultaneously. Default is 1. Can be set to 0 or greater.

max-unavailable-upgrade

The number of nodes that can be simultaneously unavailable during an upgrade. Default is 0. Increasing max-unavailable-upgrade raises the number of nodes that can be upgraded in parallel.

The number of nodes upgraded simultaneously is the sum of max-surge-upgrade and max-unavailable-upgrade. The maximum number of nodes upgraded simultaneously is limited to 20.

For example, a 5-node pool is created with max-surge-upgrade set to 2 and max-unavailable-upgrade set to 1. During a node pool upgrade, GKE creates two upgraded nodes. GKE brings down at most three (the sum of max-surge-upgrade and max-unavailable-upgrade) existing nodes after the upgraded nodes are ready. GKE will only make a maximum of one node unavailable (max-unavailable-upgrade) at a time. During the upgrade process, the node pool will include between four and seven nodes.

You can configure surge upgrade parameters for node pools that use auto-upgrades and manual upgrades.
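
For example, the settings from the scenario above could be applied to an existing node pool as follows; the node pool, cluster, and zone names are placeholders:

    # Allow up to 2 surge nodes and at most 1 unavailable node during upgrades.
    gcloud container node-pools update default-pool \
        --cluster my-cluster \
        --zone us-central1-a \
        --max-surge-upgrade 2 \
        --max-unavailable-upgrade 1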

