Node upgrade strategies

Autopilot Standard

This page discusses the node upgrade strategies you can use with your Google Kubernetes Engine (GKE) clusters.

In GKE Standard clusters, you can configure one of the following node upgrade strategies for each node pool:

Surge upgrades: Nodes are upgraded in a rolling window. You can control how many nodes can be upgraded at once and how disruptive upgrades are to the workloads.
Blue-green upgrades: Existing nodes are kept available for rolling back while the workloads are validated on the new node configuration.

GKE chooses the following strategies for these specific scenarios:

In Autopilot clusters, GKE uses surge upgrades. For more information, see the Autopilot cluster upgrades page's Surge upgrades section.
For flex-start nodes, powered by Dynamic Workload Scheduler, GKE uses short-lived upgrades. Flex-start with queued provisioning supports new flags that are part of the flex-start preview launch.

By choosing an upgrade strategy for your Standard cluster node pool, you can pick the process with the right balance of speed, workload disruption, risk mitigation, and cost optimization. To learn more about which node upgrade strategy is right for your environment, see Choose surge upgrades and Choose blue-green upgrades.

With both strategies, you can configure upgrade settings to optimize the process based on your environment's needs. To learn more, see Configure your chosen upgrade strategy. Ensure that for the strategy that you pick, you have enough quota, resource availability, or reservation capacity to upgrade your nodes using that strategy. For more information, see Ensure resources for node upgrades.

Surge upgrades

Surge upgrades are the default upgrade strategy, and best for applications that can handle incremental changes. Surge upgrades use a rolling method to upgrade nodes, in an undefined order. Find the optimal balance of speed and disruption for your environment by choosing how many new, surge nodes can be created, with maxSurge, and how many existing nodes can be disrupted at once, with maxUnavailable.

Surge upgrades also work with the cluster autoscaler to prevent changes to nodes that are being upgraded.

Choose surge upgrades for your environment

If cost optimization is important for you and your workload can tolerate being shut down in less than 60 minutes, we recommend choosing surge upgrades for your node pools.

Surge upgrades are optimal for the following scenarios:

if you want to optimize for the speed of upgrades.
if workloads are more tolerant of disruptions, where graceful termination up to 60 minutes is acceptable.
if you want to control costs by minimizing the creation of new nodes.

When GKE uses surge upgrades

If enabled, GKE uses surge upgrades when the following types of changes occur:

Version changes (upgrades)
Vertically scaling the nodes by changing the node machine attributes, including machine type, disk type, and disk size
Image type changes
IP rotation
Credential rotation
Network policy creation
Enabling image streaming
Network performance configuration updates
Enabling gVNIC
Node system configuration changes
Confidential nodes

Other changes, including applying updates to node labels and taints of existing node pools, don't use surge upgrades as they don't require recreating the nodes.

Understand surge upgrade settings

Use surge upgrade settings to select the appropriate balance between speed and disruption for your node pool during cluster maintenance using the surge settings. You can change how many nodes GKE attempts to upgrade at once by changing the surge upgrade parameters on a Standard node pool.

Surge upgrade behavior is determined by the maxSurge and maxUnavailable settings, which determine how many nodes are upgraded at the same time in a rolling window with the described steps.

`maxSurge`: GKE creates a new surge node before removing an existing one

Set maxSurge to choose the maximum number of additional, surge nodes that can be added to the node pool during an upgrade, per zone, increasing the likelihood that workloads running on the existing node can migrate to a new node immediately. The default is one. To upgrade one node, GKE does the following steps:

Provision a new node.
Wait for the new node to be ready.
Cordon the existing node.
Drain the existing node, respecting PodDisruptionBudget and GracefulTerminationPeriod settings for up to one hour. After one hour, any remaining Pods are forcefully evicted so that the upgrade can proceed.
Delete the existing node.

For GKE to create surge nodes, your project must have the resources to temporarily create additional nodes. If you don't have additional capacity, GKE won't start upgrading a node until the resources are available. To learn more, see Resources for surge upgrades.

`maxUnavailable`: GKE makes an existing node unavailable to recreate it

Set maxUnavailable to choose the maximum number of nodes that can be simultaneously unavailable during an upgrade, per zone. The default is zero. Workloads running on the existing node might need to wait for the existing node to upgrade, if no other nodes have capacity. To upgrade one node, GKE does the following steps:

Cordon the existing node.
Drain the existing node, respecting PodDisruptionBudget and GracefulTerminationPeriod settings for up to one hour. After one hour, any remaining Pods are forcefully evicted so that the upgrade can proceed.
Recreate the existing node with the new configuration.
Wait for the existing node to be ready.
Uncordon the existing, upgraded node.

When GKE recreates the existing node, GKE temporarily releases the capacity of the node if the capacity isn't from a reservation. This means that if there is limited capacity, you risk losing the existing capacity. So, if your environment is resource-constrained, use this setting only if you're using reserved nodes. To learn more, see Upgrade in a resource-constrained environment.

Example use of `maxSurge` and `maxUnavailable` settings

For example, a GKE cluster has a single-zone node pool with 5 nodes and the following surge upgrade configuration: maxSurge=2;maxUnavailable=1.

During a surge upgrade with this node pool, in a rolling window, GKE creates two upgraded nodes, and disrupts at most one existing node at a time. GKE brings down at most three existing nodes after the upgraded nodes are ready. During the upgrade process, the node pool will include between four and seven nodes.

Considerations for surge upgrade settings

Consider the following information before configuring surge upgrade settings:

Nodes created by surge upgrade are subject to your Google Cloud resource quotas, resource availability, and reservation capacity, for node pools with specific reservation affinity. If your environment is resource-constrained, see Upgrade in a resource-constrained environment.
The number of nodes that GKE upgrades simultaneously is the sum of maxSurge and maxUnavailable. The maximum number of nodes upgraded simultaneously is limited to 20. Surge upgrades also work with the cluster autoscaler to prevent changes to nodes that are being upgraded.
GKE upgrades multi-zone node pools one zone at a time. Surge upgrade parameters are applicable only up to the number of nodes in the zone. The maximum number of nodes that can be upgraded in parallel will be no higher than the sum of maxSurge plus maxUnavailable, and no higher than the number of nodes in the zone.
If your node pool uses Spot VMs, GKE creates surge nodes with Spot VMs, but doesn't wait for Spot VMs to be ready before cordoning and draining existing nodes. To learn more, see Upgrade Standard node pools using Spot VMs.

Tune surge upgrade settings to balance speed and disruption

The following table describes four different upgrade profiles as examples to help you understand different configurations:

Description	Configuration	Typical use case
Balanced (Default), slower but least disruptive	`maxSurge=1 maxUnavailable=0`	Most workloads
Fast, no surge resources, most disruptive	`maxSurge=0 maxUnavailable=20`	Large node pools after jobs have to run to completion
Fast, most surge resources and less disruptive	`maxSurge=20 maxUnavailable=0`	Large node pools
Slowest, disruptive, no surge resources	`maxSurge=0 maxUnavailable=1`	Resource-constrained node pool with reservation

Balanced (Default)

The simplest way to take advantage of surge upgrades is to use the default configuration, maxSurge=1;maxUnavailable=0. With this configuration, upgrades progress slowly, with only one surge node added at a time, meaning only one node is upgraded at a time. Pods can restart immediately on the new, surge node. This configuration only requires the resources to temporarily create one new node.

Fast and no surge resources

If you have a large node pool and your workload isn't sensitive to disruption (for example, a batch job that has run to completion), use the following configuration to maximize speed without using any additional resources: maxSurge=0;maxUnavailable=20. This configuration does not bring up additional surge nodes and allows 20 nodes to be upgraded at the same time.

Fast and less disruptive

If your workload is sensitive to disruption and you have already set up PodDisruptionBudgets (PDB) and you are not using externalTrafficPolicy: Local, which does not work with parallel node drains, you can increase the speed of the upgrade by using maxSurge=20;maxUnavailable=0. This configuration upgrades 20 nodes in parallel while the PDB limits the number of Pods that can be drained at a given time. Although the configurations of PDBs may vary, if you create a PDB with maxUnavailable=1 for one or more workloads running on the node pool, then only one Pod of those workloads can be evicted at a time, limiting the parallelism of the entire upgrade. This configuration requires the resources to temporarily create 20 new nodes.

Slow but no surge resources

If you can't use any additional resources, you can use maxSurge=0;maxUnavailable=1 to recreate one node at a time.

Control an in-progress surge upgrade

With surge upgrades, while an upgrade is in progress you can use commands to exercise some control over it. For more control over the upgrade process, we recommend using blue-green upgrades.

Cancel (pause) a surge upgrade

You can cancel an in-progress surge upgrade at any time during the upgrade process. Cancelling pauses the upgrade, stopping GKE from upgrading new nodes, but doesn't automatically roll back the upgrade of the already-upgraded nodes. After you cancel an upgrade, you can either resume or roll back.

When you cancel an upgrade, GKE does the following with each of the nodes:

Nodes that have started the upgrade complete it.
Nodes that have not started the upgrade don't upgrade.
Nodes that have already successfully completed the upgrade are unaffected and are not rolled back.

This means that the node pool might end up in a state where nodes are running two different versions. If automatic upgrades are enabled for the node pool, the node pool can be scheduled for auto-upgrade again, which would upgrade the remaining nodes in the node pool running the older version.

Learn how to cancel a node pool upgrade.

Resume a surge upgrade

If a node pool upgrade was canceled and left partially upgraded, you can resume the upgrade to complete the upgrade process for the node pool. This will upgrade any remaining nodes that had not been upgraded in the original operation. Learn how to resume a node pool upgrade.

Roll back a surge upgrade

If a node pool is left partially upgraded, you can roll back the node pool to revert it to its previous state. You cannot roll back node pools after they have been successfully upgraded. Nodes that have not started an upgrade are unaffected. Learn how to roll back a node pool upgrade.

If you want to downgrade a node pool back to its previous version after the upgrade is already complete, see Downgrading node pools.

Blue-green upgrades

Blue-green upgrades are an alternative upgrade strategy to the default surge upgrade strategy. With blue-green upgrades, GKE first creates a new set of node resources ("green" nodes) with the new node configuration before evicting any workloads on the original resources ("blue" nodes). GKE keeps the "blue" resources, if needed, for rolling back workloads until their soaking time has been met. You can adjust the pace of upgrades and soaking time based on your environment's needs.

With this strategy, you have more control over the upgrade process. You can roll back an in-progress upgrade, if necessary, as the original environment is maintained during the upgrade. This upgrade strategy, however, is also more resource intensive. As the original environment is replicated, the node pool uses double the number of resources during the upgrade.

Choose blue-green upgrades for your environment

If you have highly-available production workloads that you need to be able to roll back quickly in case the workload does not tolerate the upgrade, and a temporary cost increase is acceptable, we recommend choosing blue-green upgrades for your node pools.

Blue-green upgrades are optimal for the following scenarios:

if you want a gradual rollout where risk mitigation is most important, where graceful termination greater than 60 minutes is needed.
if your workloads are less tolerant of disruptions.
if a temporary cost increase due to higher resource usage is acceptable.

When GKE uses blue-green upgrades

For GKE nodes, there are different types of configuration changes that require the nodes to be recreated. If enabled, GKE uses blue-green upgrades when the following types of changes occur:

Version changes (upgrades)
Vertically scaling the nodes by changing the node machine attributes, including machine type, disk type, and disk size
Image type changes
Add or replace storage pools in a node pool

Surge upgrades will be used for any other features requiring the nodes to be recreated. To learn more, see When surge upgrades are used.

Phases of blue-green upgrades

With blue-green upgrades, you can customize and control the process by:

using the upgrade configuration parameters.
using commands to cancel (pause), resume, roll back, or complete the steps.

This section explains the phases of the upgrade process. You can use upgrade settings to tune how the phases work, and commands to control the upgrade process.

Phase 1: Create green pool

In this phase, a new set of managed instance groups (MIGs)—known as the "green" pool—are created for each zone under the target pool with the new node configuration (new version or image type).

Quota will be checked before starting provisioning new green resources.

In this phase, the original MIGs—known as the blue pool—cluster autoscaler will stop scaling up or down. The green pool can only scale up in this phase.

In this phase, you can cancel the upgrade if necessary. When you cancel a blue-green upgrade, the upgrade is paused in its current phase. After you've canceled it, you can either resume it or roll back. At this phase, rolling back will delete the green pool.

Phase 2: Cordon blue pool

In this phase, all the original nodes in the blue pool (existing MIGs) will be cordoned (marked as unschedulable). Existing workloads will keep running, but new workloads won't be scheduled on the existing nodes.

Phase 3: Drain blue pool

In this phase, the original nodes in the blue pool (existing MIGs) will be drained in batches. When Kubernetes drains a nodes, eviction requests are sent to all the Pods running on the node. The Pods will be rescheduled. For Pods that have PodDisruptionBudget violations or long terminationGracePeriodSeconds during the draining, they will be deleted in the Delete blue pool phase when the node is deleted. You can use BATCH_SOAK_DURATION and NODE_POOL_SOAK_DURATION, which are described here and in the next section, to extend the period before Pods are deleted.

You can control the size of the batches with either of the following settings:

BATCH_NODE_COUNT: the absolute number of nodes to drain in a batch.
BATCH_PERCENT: the percentage of nodes to drain in a batch, expressed as a decimal between 0 and 1, inclusive. GKE rounds down to the nearest percentage of nodes, to a minimum value of 1 node, if the percentage isn't a whole number of nodes.

If either of these settings are set to zero GKE skips this phase and proceeds to the Soak node pool phase.

Additionally, you can control how long each batch drain soaks with BATCH_SOAK_DURATION. This duration is defined in seconds, with the default being zero seconds.

In this phase, you can still cancel the upgrade if necessary. When you cancel a blue-green upgrade, the upgrade is paused in its current phase. After you've canceled it, you can either resume it or roll back. At this phase, rolling back will stop the draining of the blue pool, and un-cordon the blue pool. Workloads can then be rescheduled on the blue pool (not guaranteed), and the green pool will be deleted.

Phase 4: Soak node pool

This phase is used for you to verify the workload's health after the blue pool nodes have been drained.

The soak time is set with NODE_POOL_SOAK_DURATION, in seconds. By default, it is set to one hour (3600 seconds). If the total soak duration reaches 7 days (604,800 seconds), the Delete blue pool phase begins immediately.

The total soak duration is the sum of NODE_POOL_SOAK_DURATION, plus BATCH_SOAK_DURATION multiplied by the number of batches, which is determined by either BATCH_NODE_COUNT or BATCH_PERCENT.

In this phase, you can finish the upgrade and skip any remaining soak time by completing the upgrade. This will immediately begin the process of removing the blue pool nodes.

You can still cancel the upgrade if necessary. When you cancel a blue-green upgrade, the upgrade is paused in its current phase. After you've canceled it, you can either resume it or roll back.

In this phase, cluster autoscaler can now scale up or down the green pool as normal.

Phase 5: Delete blue pool

After the expiration of the soaking time, the blue pool nodes will be removed from the target pool. This phase cannot be paused. Also, this phase does not use eviction and instead attempts to delete the Pods. Unlike eviction, deletion doesn't respect PDBs and forcibly deletes the Pods. The deletion caps a Pod's terminationGracePeriodSeconds to no more than 60 minutes. After this final attempt is made to delete the remaining Pods, the blue pool nodes are deleted from the node pool.

At the completion of this phase, your node pool will have only new nodes with the updated configuration (version or image type).

How cluster autoscaler works with blue-green upgrades

During the phases of a blue-green upgrade, the original "blue" pool does not scale up or down. When the new "green" pool is created, it can only be scaled up until the Soak node pool phase, where it can scale up or down. If an upgrade is rolled back, the original "blue" pool might scale up during this process if additional capacity is needed.

Control an in-progress blue-green upgrade

With blue-green upgrades, while an upgrade is in progress you can use commands to exercise control over it. This gives you a high level of control over the process in case you determine, for instance, that your workloads need to be rolled back to the old node configuration.

Cancel (pause) a blue-green upgrade

When you cancel a blue-green upgrade, you pause the upgrade in its current phase. This command can be used at all phases except the Delete blue pool phase. When cancelled, the node pool will be paused at an intermediate status based on the phase where the request was issued.

Learn how to cancel a node pool upgrade.

After an upgrade is canceled, you can choose one of two paths forward: resume or roll back.

Resume a blue-green upgrade

If you have determined the upgrade is okay to move forward, you can resume it.

If you resume, the upgrade process will continue at the intermediate phase it was paused. To learn how to resume a node pool upgrade, see Resume a node pool upgrade.

Roll back a blue-green upgrade

If you have determined that the upgrade shouldn't move forward and you want to bring the node pool back to its original state, you can roll back. To learn how to roll back a node pool upgrade, see roll back a node pool upgrade.

With the roll back workflow, the process reverses itself to bring the node pool back to its original state. The blue pool will be un-cordoned so that workloads may be rescheduled on it. During this process, cluster autoscaler may scale up the blue pool as needed. The green pool will be drained and deleted.

If you want to downgrade a node pool back to its previous version after the upgrade is already complete, see Downgrading node pools.

Complete a blue-green upgrade

During the Soak phase, you can complete an upgrade if you have determined that the workload does not need further validation on the new node configuration and the old nodes can be removed. Completing an upgrade skips the rest of the Soak phase and proceeds to the Delete blue pool phase.

To learn more about how to use the complete command, see Complete a blue-green node pool upgrade.

Short-lived upgrades (flex-start and queued provisioning only)

Short-lived upgrades are a node upgrade strategy exclusively for use with flex-start nodes and nodes which use queued provisioning (with 1.32.2-gke.1652000 or later), both of which are powered by Dynamic Workload Scheduler. To learn more about the nodes that use short-lived upgrades, see About GPU obtainability with Dynamic Workload Scheduler.

GKE uses the short-lived upgrades strategy for Standard node pools and groups of nodes in Autopilot clusters.

With this strategy, GKE upgrades these limited runtime nodes without disrupting existing workloads. The strategy works in the following way:

Existing nodes run until they're preempted.
New nodes use the new node configuration.
Over a maximum of seven days, the nodes transition from running the existing configuration to running the new configuration.

GKE automatically configures this strategy for flex-start nodes. This strategy has no configuration settings.

When GKE uses short-lived upgrades

GKE automatically sets flex-start nodes to use short-lived upgrades. Nodes which only use queued provisioning, but run on clusters on GKE version 1.32.2-gke.1652000 or later, also use short-lived upgrades.

For Standard node pools and groups of nodes in Autopilot clusters that use short-lived upgrades, GKE uses this strategy in situations where surge upgrades would otherwise be used. In addition to node upgrades (version changes), GKE uses short-lived upgrades for other types of node updates, similar to how surge upgrades are used. For more information, see When surge upgrades are used.

What's next

Configuring node upgrade strategies

Node upgrade strategies Stay organized with collections Save and categorize content based on your preferences.

Surge upgrades

Choose surge upgrades for your environment

When GKE uses surge upgrades

Understand surge upgrade settings

maxSurge: GKE creates a new surge node before removing an existing one

maxUnavailable: GKE makes an existing node unavailable to recreate it

Example use of maxSurge and maxUnavailable settings

Considerations for surge upgrade settings

Tune surge upgrade settings to balance speed and disruption

Balanced (Default)

Fast and no surge resources

Fast and less disruptive

Slow but no surge resources

Control an in-progress surge upgrade

Cancel (pause) a surge upgrade

Resume a surge upgrade

Roll back a surge upgrade

Blue-green upgrades

Choose blue-green upgrades for your environment

When GKE uses blue-green upgrades

Phases of blue-green upgrades

Phase 1: Create green pool

Phase 2: Cordon blue pool

Phase 3: Drain blue pool

Phase 4: Soak node pool

Phase 5: Delete blue pool

How cluster autoscaler works with blue-green upgrades

Control an in-progress blue-green upgrade

Cancel (pause) a blue-green upgrade

Resume a blue-green upgrade

Roll back a blue-green upgrade

Complete a blue-green upgrade

Short-lived upgrades (flex-start and queued provisioning only)

When GKE uses short-lived upgrades

What's next

Node upgrade strategies

`maxSurge`: GKE creates a new surge node before removing an existing one

`maxUnavailable`: GKE makes an existing node unavailable to recreate it

Example use of `maxSurge` and `maxUnavailable` settings