This page discusses the node upgrade strategies you can use with your Google Kubernetes Engine (GKE) clusters.
In GKE Standard clusters, you can configure one of the following node upgrade strategies for each node pool:
- Surge upgrades: Nodes are upgraded in a rolling window. You can control how many nodes can be upgraded at once and how disruptive upgrades are to the workloads.
- Blue-green upgrades: Existing nodes are kept available for rolling back while the workloads are validated on the new node configuration.
In Autopilot clusters, GKE uses surge upgrades. To learn more, see the Autopilot cluster upgrades page's Surge upgrades section.
By choosing an upgrade strategy for your Standard cluster node pool, you can pick the process with the right balance of speed, workload disruption, risk mitigation, and cost optimization. To learn more about which node upgrade strategy is right for your environment, see Choose surge upgrades and Choose blue-green upgrades.
With both strategies, you can configure upgrade settings to optimize the process based on your environment's needs. To learn more, see Configure your chosen upgrade strategy. Ensure that for the strategy that you pick, you have enough quota, resource availability, or reservation capacity to upgrade your nodes using that strategy. For more information, see Ensure resources for node upgrades.
Surge upgrades
Surge upgrades are the default upgrade strategy, and best for applications that can handle incremental changes. Surge upgrades use a rolling method to upgrade nodes, in an undefined order. Find the optimal balance of speed and disruption for your environment by choosing how many new, surge nodes can be created, with maxSurge, and how many existing nodes can be disrupted at once, with maxUnavailable.
Surge upgrades also work with the cluster autoscaler to prevent changes to nodes that are being upgraded.
Choose surge upgrades for your environment
If cost optimization is important for you and your workload can tolerate being shut down in less than 60 minutes, we recommend choosing surge upgrades for your node pools.
Surge upgrades are optimal for the following scenarios:
- if you want to optimize for the speed of upgrades.
- if workloads are more tolerant of disruptions, where graceful termination up to 60 minutes is acceptable.
- if you want to control costs by minimizing the creation of new nodes.
When GKE uses surge upgrades
If enabled, GKE uses surge upgrades when the following types of changes occur:
- Version changes (upgrades)
- Vertically scaling the nodes by changing the node machine attributes, including machine type, disk type, and disk size
- Image type changes
- IP rotation
- Credential rotation
- Network policy creation
- Enabling image streaming
- Network performance configuration updates
- Enabling gVNIC
- Node system configuration changes
- Enabling confidential nodes
Other changes, including applying updates to node labels and taints of existing node pools, don't use surge upgrades as they don't require recreating the nodes.
Understand surge upgrade settings
Use surge upgrade settings to select the appropriate balance between speed and disruption for your node pool during cluster maintenance. You can change how many nodes GKE attempts to upgrade at once by changing the surge upgrade parameters on a Standard node pool.
Surge upgrade behavior is determined by the maxSurge and maxUnavailable settings, which determine how many nodes are upgraded at the same time in a rolling window, using the steps described in the following sections.
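As a sketch, these parameters might be set on an existing node pool with the gcloud CLI as follows; the node pool, cluster, and region names are placeholders:

```
# Sketch: configure surge upgrade parameters on an existing node pool.
# example-pool, example-cluster, and us-central1 are placeholder values.
gcloud container node-pools update example-pool \
    --cluster=example-cluster \
    --region=us-central1 \
    --max-surge-upgrade=1 \
    --max-unavailable-upgrade=0
```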
maxSurge: GKE creates a new surge node before removing an existing one
Set maxSurge to choose the maximum number of additional, surge nodes that can be added to the node pool during an upgrade, per zone, increasing the likelihood that workloads running on the existing node can migrate to a new node immediately. The default is one. To upgrade one node, GKE does the following steps:
- Provision a new node.
- Wait for the new node to be ready.
- Cordon the existing node.
- Drain the existing node, respecting PodDisruptionBudget and GracefulTerminationPeriod settings for up to one hour.
- Delete the existing node.
For GKE to create surge nodes, your project must have the resources to temporarily create additional nodes. If you don't have additional capacity, GKE won't start upgrading a node until the resources are available. To learn more, see Resources for surge upgrades.
maxUnavailable: GKE makes an existing node unavailable to recreate it
Set maxUnavailable to choose the maximum number of nodes that can be simultaneously unavailable during an upgrade, per zone. The default is zero. Workloads running on the existing node might need to wait for the existing node to upgrade, if no other nodes have capacity. To upgrade one node, GKE does the following steps:
- Cordon the existing node.
- Drain the existing node, respecting PodDisruptionBudget and GracefulTerminationPeriod settings for up to one hour.
- Recreate the existing node with the new configuration.
- Wait for the existing node to be ready.
- Uncordon the existing, upgraded node.
When GKE recreates the existing node, GKE temporarily releases the node's capacity unless the capacity comes from a reservation. This means that if capacity is limited, you risk losing that capacity to other workloads. So, if your environment is resource-constrained, use this setting only if you're using reserved nodes. To learn more, see Upgrade in a resource-constrained environment.
Example use of maxSurge and maxUnavailable settings
For example, a GKE cluster has a single-zone node pool with 5 nodes and the following surge upgrade configuration: maxSurge=2;maxUnavailable=1.
During a surge upgrade with this node pool, in a rolling window, GKE creates two upgraded nodes, and disrupts at most one existing node at a time. GKE brings down at most three existing nodes after the upgraded nodes are ready. During the upgrade process, the node pool will include between four and seven nodes.
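Assuming hypothetical node pool and cluster names, this example configuration could be applied with a command like the following sketch:

```
# Sketch: apply the example surge configuration (maxSurge=2, maxUnavailable=1).
# With 5 existing nodes, at most 2 + 1 = 3 nodes are upgraded in parallel, and
# the pool stays between 5 - 1 = 4 and 5 + 2 = 7 nodes during the upgrade.
gcloud container node-pools update example-pool \
    --cluster=example-cluster \
    --region=us-central1 \
    --max-surge-upgrade=2 \
    --max-unavailable-upgrade=1
```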
Considerations for surge upgrade settings
Consider the following information before configuring surge upgrade settings:
- Nodes created by surge upgrade are subject to your Google Cloud resource quotas, resource availability, and reservation capacity, for node pools with specific reservation affinity. If your environment is resource-constrained, see Upgrade in a resource-constrained environment.
- The number of nodes that GKE upgrades simultaneously is the sum of maxSurge and maxUnavailable. The maximum number of nodes upgraded simultaneously is limited to 20. Surge upgrades also work with the cluster autoscaler to prevent changes to nodes that are being upgraded.
- GKE upgrades multi-zone node pools one zone at a time. Surge upgrade parameters are applicable only up to the number of nodes in the zone. The maximum number of nodes that can be upgraded in parallel will be no higher than the sum of maxSurge plus maxUnavailable, and no higher than the number of nodes in the zone.
- If your node pool uses Spot VMs, GKE creates surge nodes with Spot VMs, but doesn't wait for Spot VMs to be ready before cordoning and draining existing nodes. To learn more, see Upgrade Standard node pools using Spot VMs.
Tune surge upgrade settings to balance speed and disruption
The following table describes four different upgrade profiles as examples to help you understand different configurations:
| Description | Configuration | Typical use case |
| --- | --- | --- |
| Balanced (Default), slower but least disruptive | maxSurge=1 maxUnavailable=0 | Most workloads |
| Fast, no surge resources, most disruptive | maxSurge=0 maxUnavailable=20 | Large node pools after jobs have run to completion |
| Fast, most surge resources and less disruptive | maxSurge=20 maxUnavailable=0 | Large node pools |
| Slowest, disruptive, no surge resources | maxSurge=0 maxUnavailable=1 | Resource-constrained node pool with reservation |
Balanced (Default)
The simplest way to take advantage of surge upgrades is to use the default configuration, maxSurge=1;maxUnavailable=0. With this configuration, upgrades progress slowly, with only one surge node added at a time, meaning only one node is upgraded at a time. Pods can restart immediately on the new, surge node. This configuration only requires the resources to temporarily create one new node.
Fast and no surge resources
If you have a large node pool and your workload isn't sensitive to disruption (for example, a batch job that has run to completion), use the following configuration to maximize speed without using any additional resources: maxSurge=0;maxUnavailable=20. This configuration does not bring up additional surge nodes and allows 20 nodes to be upgraded at the same time.
Fast and less disruptive
If your workload is sensitive to disruption, you have already set up PodDisruptionBudgets (PDB), and you are not using externalTrafficPolicy: Local, which does not work with parallel node drains, you can increase the speed of the upgrade by using maxSurge=20;maxUnavailable=0. This configuration upgrades 20 nodes in parallel while the PDB limits the number of Pods that can be drained at a given time. Although the configurations of PDBs may vary, if you create a PDB with maxUnavailable=1 for one or more workloads running on the node pool, then only one Pod of those workloads can be evicted at a time, limiting the parallelism of the entire upgrade, as shown in the sketch below. This configuration requires the resources to temporarily create 20 new nodes.
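As a sketch, a PDB for a hypothetical workload labeled app=web could be created imperatively like this, allowing only one of that workload's Pods to be evicted at a time:

```
# Sketch: limit voluntary disruptions for the hypothetical app=web workload
# to one unavailable Pod at a time during node drains.
kubectl create poddisruptionbudget web-pdb \
    --selector=app=web \
    --max-unavailable=1
```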
Slow but no surge resources
If you can't use any additional resources, you can use maxSurge=0;maxUnavailable=1 to recreate one node at a time.
Control an in-progress surge upgrade
With surge upgrades, while an upgrade is in progress you can use commands to exercise some control over it. For more control over the upgrade process, we recommend using blue-green upgrades.
Cancel (pause) a surge upgrade
You can cancel an in-progress surge upgrade at any time during the upgrade process. Canceling pauses the upgrade, stopping GKE from upgrading additional nodes, but doesn't automatically roll back the upgrade of the already-upgraded nodes. After you cancel an upgrade, you can either resume or roll back.
When you cancel an upgrade, GKE does the following with each of the nodes:
- Nodes that have started the upgrade complete it.
- Nodes that have not started the upgrade don't upgrade.
- Nodes that have already successfully completed the upgrade are unaffected and are not rolled back.
This means that the node pool might end up in a state where nodes are running two different versions. If automatic upgrades are enabled for the node pool, the node pool can be scheduled for auto-upgrade again, which would upgrade the remaining nodes in the node pool running the older version.
Learn how to cancel a node pool upgrade.
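As a sketch, canceling typically means finding the running upgrade operation and canceling it; the region is a placeholder and the filter values are illustrative:

```
# Sketch: find the in-progress node pool upgrade operation.
gcloud container operations list \
    --region=us-central1 \
    --filter="operationType=UPGRADE_NODES AND status=RUNNING"

# Sketch: cancel it. OPERATION_ID is a placeholder for the listed operation.
gcloud container operations cancel OPERATION_ID \
    --region=us-central1
```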
Resume a surge upgrade
If a node pool upgrade was canceled and left partially upgraded, you can resume the upgrade to complete the upgrade process for the node pool. This will upgrade any remaining nodes that had not been upgraded in the original operation. Learn how to resume a node pool upgrade.
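As a sketch, resuming amounts to requesting the same upgrade again; the names and TARGET_VERSION are placeholders:

```
# Sketch: resume a canceled upgrade by re-requesting the node pool upgrade.
gcloud container clusters upgrade example-cluster \
    --node-pool=example-pool \
    --cluster-version=TARGET_VERSION \
    --region=us-central1
```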
Roll back a surge upgrade
If a node pool is left partially upgraded, you can roll back the node pool to revert it to its previous state. You cannot roll back node pools after they have been successfully upgraded. Nodes that have not started an upgrade are unaffected. Learn how to roll back a node pool upgrade.
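As a sketch, with placeholder names:

```
# Sketch: roll back a partially upgraded node pool to its previous state.
gcloud container node-pools rollback example-pool \
    --cluster=example-cluster \
    --region=us-central1
```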
If you want to downgrade a node pool back to its previous version after the upgrade is already complete, see Downgrading node pools.
Blue-green upgrades
Blue-green upgrades are an alternative upgrade strategy to the default surge upgrade strategy. With blue-green upgrades, GKE first creates a new set of node resources ("green" nodes) with the new node configuration before evicting any workloads on the original resources ("blue" nodes). GKE keeps the "blue" resources, if needed, for rolling back workloads until their soaking time has been met. You can adjust the pace of upgrades and soaking time based on your environment's needs.
With this strategy, you have more control over the upgrade process. You can roll back an in-progress upgrade, if necessary, as the original environment is maintained during the upgrade. This upgrade strategy, however, is also more resource intensive. As the original environment is replicated, the node pool uses double the number of resources during the upgrade.
Choose blue-green upgrades for your environment
If you have highly-available production workloads that you need to be able to roll back quickly in case the workload does not tolerate the upgrade, and a temporary cost increase is acceptable, we recommend choosing blue-green upgrades for your node pools.
Blue-green upgrades are optimal for the following scenarios:
- if you want a gradual rollout where risk mitigation is most important, or where graceful termination of more than 60 minutes is needed.
- if your workloads are less tolerant of disruptions.
- if a temporary cost increase due to higher resource usage is acceptable.
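As a sketch, blue-green upgrades might be enabled on an existing node pool as follows; the names and the one-hour soak duration are illustrative:

```
# Sketch: switch a node pool to the blue-green upgrade strategy with a
# one-hour soak time before the blue pool is deleted.
gcloud container node-pools update example-pool \
    --cluster=example-cluster \
    --region=us-central1 \
    --enable-blue-green-upgrade \
    --node-pool-soak-duration=3600s
```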
When GKE uses blue-green upgrades
For GKE nodes, there are different types of configuration changes that require the nodes to be recreated. If enabled, GKE uses blue-green upgrades when the following types of changes occur:
- Version changes (upgrades)
- Vertically scaling the nodes by changing the node machine attributes, including machine type, disk type, and disk size
- Image type changes
- Add or replace storage pools in a node pool
Surge upgrades will be used for any other changes that require the nodes to be recreated. To learn more, see When GKE uses surge upgrades.
Phases of blue-green upgrades
With blue-green upgrades, you can customize and control the process by:
- using the upgrade configuration parameters.
- using commands to cancel (pause), resume, roll back, or complete the steps.
This section explains the phases of the upgrade process. You can use upgrade settings to tune how the phases work, and commands to control the upgrade process.
Phase 1: Create green pool
In this phase, a new set of managed instance groups (MIGs)—known as the "green" pool—is created for each zone under the target pool with the new node configuration (new version or image type). GKE checks quota before it starts provisioning the new green resources.
In this phase, cluster autoscaler stops scaling the original MIGs—known as the "blue" pool—up or down. The green pool can only scale up in this phase.
In this phase, you can cancel the upgrade if necessary. When you cancel a blue-green upgrade, the upgrade is paused in its current phase. After you've canceled it, you can either resume it or roll back. At this phase, rolling back will delete the green pool.
Phase 2: Cordon blue pool
In this phase, all the original nodes in the blue pool (existing MIGs) will be cordoned (marked as unschedulable). Existing workloads will keep running, but new workloads won't be scheduled on the existing nodes.
In this phase, you can cancel the upgrade if necessary. When you cancel a blue-green upgrade, the upgrade is paused in its current phase. After you've canceled it, you can either resume it or roll back. At this phase, rolling back will un-cordon the blue pool and delete the green pool.
Phase 3: Drain blue pool
In this phase, the original nodes in the blue pool (existing MIGs) will be drained in batches. When Kubernetes drains a node, eviction requests are sent to all the Pods running on the node, and the Pods are rescheduled. Pods that can't be evicted during the draining because of PodDisruptionBudget violations, or that have a long terminationGracePeriodSeconds, are instead deleted in the Delete blue pool phase when the node is deleted. You can use BATCH_SOAK_DURATION and NODE_POOL_SOAK_DURATION, which are described here and in the next section, to extend the period before Pods are deleted.
You can control the size of the batches with either of the following settings:
- BATCH_NODE_COUNT: the absolute number of nodes to drain in a batch.
- BATCH_PERCENT: the percentage of nodes to drain in a batch, expressed as a decimal between 0 and 1, inclusive. GKE rounds down to the nearest percentage of nodes, to a minimum value of 1 node, if the percentage isn't a whole number of nodes.
If either of these settings is set to zero, GKE skips this phase and proceeds to the Soak node pool phase.
Additionally, you can control how long each batch drain soaks with BATCH_SOAK_DURATION. This duration is defined in seconds, with the default being zero seconds.
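As a sketch, the batch settings can be passed through the standard rollout policy; here a hypothetical pool drains one node per batch with a 10-minute soak between batches:

```
# Sketch: drain the blue pool one node per batch, soaking 600s after each batch.
gcloud container node-pools update example-pool \
    --cluster=example-cluster \
    --region=us-central1 \
    --enable-blue-green-upgrade \
    --standard-rollout-policy=batch-node-count=1,batch-soak-duration=600s
```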
In this phase, you can still cancel the upgrade if necessary. When you cancel a blue-green upgrade, the upgrade is paused in its current phase. After you've canceled it, you can either resume it or roll back. At this phase, rolling back will stop the draining of the blue pool, and un-cordon the blue pool. Workloads can then be rescheduled on the blue pool (not guaranteed), and the green pool will be deleted.
Phase 4: Soak node pool
Use this phase to verify the workload's health after the blue pool nodes have been drained.
The soak time is set with NODE_POOL_SOAK_DURATION, in seconds. By default, it is set to one hour (3600 seconds). If the total soak duration reaches 7 days (604,800 seconds), the Delete blue pool phase begins immediately.
The total soak duration is the sum of NODE_POOL_SOAK_DURATION, plus BATCH_SOAK_DURATION multiplied by the number of batches, which is determined by either BATCH_NODE_COUNT or BATCH_PERCENT.
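For example, for a hypothetical zone of 5 nodes drained with BATCH_NODE_COUNT=1 and BATCH_SOAK_DURATION=600, there are five batches, so with the default NODE_POOL_SOAK_DURATION the total soak duration is 3600 + (5 × 600) = 6600 seconds.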
In this phase, you can finish the upgrade and skip any remaining soak time by completing the upgrade. This will immediately begin the process of removing the blue pool nodes.
You can still cancel the upgrade if necessary. When you cancel a blue-green upgrade, the upgrade is paused in its current phase. After you've canceled it, you can either resume it or roll back.
In this phase, cluster autoscaler can scale the green pool up or down as normal.
Phase 5: Delete blue pool
After the expiration of the soaking time, the blue pool nodes will be removed from the target pool. This phase cannot be paused. Also, this phase does not use eviction and instead attempts to delete the Pods. Unlike eviction, deletion doesn't respect PDBs and forcibly deletes the Pods. The deletion caps a Pod's terminationGracePeriodSeconds to no more than 60 minutes. After this final attempt is made to delete the remaining Pods, the blue pool nodes are deleted from the node pool.
At the completion of this phase, your node pool will have only new nodes with the updated configuration (version or image type).
How cluster autoscaler works with blue-green upgrades
During the phases of a blue-green upgrade, the original "blue" pool does not scale up or down. When the new "green" pool is created, it can only be scaled up until the Soak node pool phase, where it can scale up or down. If an upgrade is rolled back, the original "blue" pool might scale up during this process if additional capacity is needed.
Control an in-progress blue-green upgrade
With blue-green upgrades, while an upgrade is in progress you can use commands to exercise control over it. This gives you a high level of control over the process in case you determine, for instance, that your workloads need to be rolled back to the old node configuration.
Cancel (pause) a blue-green upgrade
When you cancel a blue-green upgrade, you pause the upgrade in its current phase. You can use this command in all phases except the Delete blue pool phase. When canceled, the node pool pauses at an intermediate status based on the phase where the request was issued.
Learn how to cancel a node pool upgrade.
After an upgrade is canceled, you can choose one of two paths forward: resume or roll back.
Resume a blue-green upgrade
If you have determined the upgrade is okay to move forward, you can resume it.
If you resume, the upgrade process continues from the intermediate phase where it was paused. To learn how to resume a node pool upgrade, see Resume a node pool upgrade.
Roll back a blue-green upgrade
If you have determined that the upgrade shouldn't move forward and you want to bring the node pool back to its original state, you can roll back. To learn how to roll back a node pool upgrade, see roll back a node pool upgrade.
With the roll back workflow, the process reverses itself to bring the node pool back to its original state. The blue pool will be un-cordoned so that workloads may be rescheduled on it. During this process, cluster autoscaler may scale up the blue pool as needed. The green pool will be drained and deleted.
If you want to downgrade a node pool back to its previous version after the upgrade is already complete, see Downgrading node pools.
Complete a blue-green upgrade
During the Soak phase, you can complete an upgrade if you have determined that the workload does not need further validation on the new node configuration and the old nodes can be removed. Completing an upgrade skips the rest of the Soak phase and proceeds to the Delete blue pool phase.
To learn more about how to use the complete command, see Complete a blue-green node pool upgrade.
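As a sketch, with placeholder names:

```
# Sketch: skip the remaining soak time and proceed to deleting the blue pool.
gcloud container node-pools complete-upgrade example-pool \
    --cluster=example-cluster \
    --region=us-central1
```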