Upgrade clusters

When you install a new version of bmctl, you can upgrade your existing clusters that were created with an earlier version. Upgrading a cluster to the latest GKE on Bare Metal version brings added features and fixes to your cluster. It also ensures that your cluster remains supported. You can upgrade admin, hybrid, standalone, or user clusters with the bmctl upgrade cluster command, or you can use kubectl.

To learn more about the upgrade process, see Lifecycle and stages of cluster upgrades.

Plan your upgrade

This section contains information and links that you should review before you upgrade a cluster.

Best practices

For information to help you prepare for a cluster upgrade, see Best practices for GKE on Bare Metal cluster upgrades.

Upgrade preflight checks

Preflight checks are run as part of the cluster upgrade to validate cluster status and node health. The cluster upgrade doesn't proceed if the preflight checks fail. For more information on preflight checks, see Understand preflight checks.

You can check if the clusters are ready for an upgrade by running the preflight check before running the upgrade. For more information, see Preflight checks for upgrades.

Known issues

For information about potential problems related to cluster upgrades, see Anthos clusters on bare metal known issues and select the Upgrades and updates problem category.

Configure upgrade options

Before you start a cluster upgrade, you can configure the following upgrade options that control how the upgrade process works:

  • Selective worker node pool upgrades
  • Parallel upgrades

These options can reduce the risk of disruptions to critical applications and services and significantly reduce overall upgrade time. These options are especially useful for large clusters with numerous nodes and node pools running important workloads. For more information about what these options do and how to use them, see the following sections.

Selective worker node pool upgrades

By default, the cluster upgrade operation upgrades every node and node pool in the cluster. A cluster upgrade can be disruptive and time consuming, as it results in each node being drained and all associated pods being restarted and rescheduled. This section describes how you can include or exclude select worker node pools for a cluster upgrade to minimize workload disruption. This feature applies to user, hybrid, and standalone clusters only, since admin clusters don't allow worker node pools.

You might use selective node pool upgrades in the following situations:

  • To pick up security fixes without disrupting workloads: You can upgrade just your control plane nodes (and load balancer nodes) to apply Kubernetes vulnerability fixes without disrupting your worker node pools.

  • To confirm proper operation of an upgraded subset of worker nodes before upgrading all worker nodes: You can upgrade your worker node pools selectively to ensure that workloads are running properly on an upgraded node pool before you upgrade another node pool.

  • To reduce the maintenance window: Upgrading a large cluster can be time consuming and it's difficult to accurately predict when an upgrade will complete. Cluster upgrade time is proportional to the number of nodes being upgraded. Reducing the number of nodes being upgraded by excluding node pools reduces the upgrade time. You upgrade multiple times, but the smaller, more predictable maintenance windows may help with scheduling.

Two minor version node pool version skew

With GKE on Bare Metal minor release 1.28, a worker node pool version can be up to two minor versions behind the cluster (control plane) version. This n-2 version skew is available as a Preview capability.

  • To enable this Preview capability, add the preview.baremetal.cluster.gke.io/two-minor-version-node-pool: enable annotation to your cluster configuration file:

    apiVersion: baremetal.cluster.gke.io/v1
    kind: Cluster
    metadata:
      name: baremetal-demo
      namespace: cluster-baremetal-demo
      annotations:
        preview.baremetal.cluster.gke.io/two-minor-version-node-pool: enable
    spec:
    ...
    

    If you don't enable this Preview capability, the maximum version skew between a worker node pool and the cluster is one minor version.

For more information about the versioning rules for selectively upgrading worker node pools, see Node pool versioning rules in Lifecycle and stages of cluster upgrades.

Upgrade your cluster control plane and selected node pools

To selectively upgrade worker node pools in the initial cluster upgrade:

  1. For the worker node pools that you want to include in the cluster upgrade, make one of the following changes to the NodePool spec:

    • Set anthosBareMetalVersion in the NodePool spec to the cluster target upgrade version.
    • Omit the anthosBareMetalVersion field from the NodePool spec, or set it to the empty string. By default, worker node pools are included in cluster upgrades.
  2. For the worker node pools that you want to exclude from the upgrade, set anthosBareMetalVersion to the current (pre-upgrade) version of the cluster.

  3. Continue with your upgrade as described in Start the cluster upgrade.

    The cluster upgrade operation upgrades the following nodes:

    • Cluster control plane nodes.
    • Load balancer node pool, if your cluster uses one (spec.loadBalancer.nodePoolSpec). By default, load balancer nodes can run regular workloads. You can't selectively upgrade a load balancer node pool; it's always included in the initial cluster upgrade.
    • Worker node pools that you haven't excluded from the upgrade.

For example, suppose that your cluster is at version 1.16.0 and has two worker node pools: wpool01 and wpool02. Also, suppose that you want to upgrade the control plane and wpool01 to 1.28.300-gke.131, but you want wpool02 to remain at version 1.16.0.

The following cluster configuration file excerpt shows how you can modify the cluster configuration to support this partial upgrade:

...
---
apiVersion: baremetal.cluster.gke.io/v1
kind: Cluster
metadata:
  name: user001
  namespace: cluster-user001
spec:
  type: user
  profile: default
  anthosBareMetalVersion: 1.28.300-gke.131
---
apiVersion: baremetal.cluster.gke.io/v1
kind: NodePool
metadata:
  name: wpool01
  namespace: cluster-user001
spec:
  clusterName: user001
  anthosBareMetalVersion: 1.28.300-gke.131
  nodes:
  - address:  10.200.0.1
  - address:  10.200.0.2
  - address:  10.200.0.3
  ...
  - address:  10.200.0.8
---
apiVersion: baremetal.cluster.gke.io/v1
kind: NodePool
metadata:
  name: wpool02
  namespace: cluster-user001
spec:
  clusterName: user001
  anthosBareMetalVersion: 1.16.0
  nodes:
  - address:  10.200.1.1
  - address:  10.200.1.2
  - address:  10.200.1.3
  ...
  - address:  10.200.1.12
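
The inclusion rules in the preceding steps can be summarized in a short Python sketch. The function name and the dictionary layout are illustrative only and aren't part of bmctl or any GKE on Bare Metal API. The sketch treats any value other than the target version as an exclusion, which is an assumption: the steps above describe only the pre-upgrade version for that case.

```python
from typing import Dict, List, Optional


def pools_included_in_upgrade(
    target_version: str,
    node_pool_versions: Dict[str, Optional[str]],
) -> List[str]:
    """Return the worker node pools that the cluster upgrade would include.

    node_pool_versions maps each node pool name to the value of
    anthosBareMetalVersion in its NodePool spec (None if the field is omitted).
    """
    included = []
    for name, version in node_pool_versions.items():
        # Omitted or empty string: the pool is included by default.
        # Equal to the target version: the pool is explicitly included.
        if not version or version == target_version:
            included.append(name)
    return included


# Values taken from the wpool01/wpool02 example above.
pools = {"wpool01": "1.28.300-gke.131", "wpool02": "1.16.0"}
print(pools_included_in_upgrade("1.28.300-gke.131", pools))  # ['wpool01']
```

With these values, only wpool01 is upgraded along with the control plane, and wpool02 stays at 1.16.0.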

Upgrade node pools to the current cluster version

If you've excluded node pools from a cluster upgrade, you can run a cluster upgrade that brings them up to the target cluster version. Worker node pools that have been excluded from a cluster upgrade have the anthosBareMetalVersion field in their NodePool spec set to the previous (pre-upgrade) cluster version.

To bring worker node pools up to the current, upgraded cluster version:

  1. Edit the NodePool specs in the cluster configuration file for the worker node pools that you want to bring up to the current cluster version. Set anthosBareMetalVersion to the current (post-upgrade) cluster version.

    If multiple worker node pools are selected for upgrade, the value of spec.nodePoolUpgradeStrategy.concurrentNodePools in the cluster spec determines how many node pools are upgraded in parallel, if any. If you don't want to upgrade worker node pools concurrently, select one node pool at a time for upgrade.

  2. Continue with your upgrade as described in Start the cluster upgrade.

    The cluster upgrade operation upgrades only the previously excluded worker node pools for which you have set anthosBareMetalVersion to the current, upgraded cluster version.

For example, suppose that you upgraded your cluster to version 1.28.300-gke.131, but node pool wpool02 is still at the old, pre-upgrade cluster version 1.16.0. Workloads are running properly on the upgraded node pool, wpool01, so now you want to bring wpool02 up to the current cluster version, too. To upgrade wpool02, you can remove the anthosBareMetalVersion field or set its value to the empty string.

The following cluster configuration file excerpt shows how you can modify the cluster configuration to bring wpool02 up to the current cluster version:

...
---
apiVersion: baremetal.cluster.gke.io/v1
kind: Cluster
metadata:
  name: user001
  namespace: cluster-user001
spec:
  type: user
  profile: default
  anthosBareMetalVersion: 1.28.300-gke.131
---
apiVersion: baremetal.cluster.gke.io/v1
kind: NodePool
metadata:
  name: wpool01
  namespace: cluster-user001
spec:
  clusterName: user001
  anthosBareMetalVersion: 1.28.300-gke.131
  nodes:
  - address:  10.200.0.1
  - address:  10.200.0.2
  - address:  10.200.0.3
  ...
  - address:  10.200.0.8
---
apiVersion: baremetal.cluster.gke.io/v1
kind: NodePool
metadata:
  name: wpool02
  namespace: cluster-user001
spec:
  clusterName: user001
  anthosBareMetalVersion: ""
  nodes:
  - address:  10.200.1.1
  - address:  10.200.1.2
  - address:  10.200.1.3
  ...
  - address:  10.200.1.12

Parallel upgrades

In a typical, default cluster upgrade, each cluster node is upgraded sequentially, one after the other. This section shows you how to configure your cluster and worker node pools so that multiple nodes upgrade in parallel when you upgrade your cluster. Upgrading nodes in parallel speeds up cluster upgrades significantly, especially for clusters that have hundreds of nodes.

There are two parallel upgrade strategies that you can use to speed up your cluster upgrade:

  • Concurrent node upgrade: you can configure your worker node pools so that multiple nodes upgrade in parallel. Parallel upgrades of nodes are configured in the NodePool spec (spec.upgradeStrategy.parallelUpgrade) and only nodes in a worker node pool can be upgraded in parallel. Nodes in control plane or load balancer node pools can only be upgraded one at a time. For more information, see Node upgrade strategy.

  • Concurrent node pool upgrade: you can configure your cluster so that multiple node pools upgrade in parallel. Only worker node pools can be upgraded in parallel. Control plane and load balancer node pools can only be upgraded one at a time.
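
To get a feel for the potential speedup from concurrent node upgrades, the following back-of-the-envelope Python sketch counts upgrade "waves" for a single worker node pool. It assumes that each wave upgrades exactly concurrentNodes nodes and finishes before the next wave starts. That model is a simplification for illustration, not a description of how the upgrade controller schedules nodes, and it ignores minimumAvailableNodes, drain times, and failures.

```python
import math


def upgrade_waves(node_count: int, concurrent_nodes: int = 1) -> int:
    """Rough number of sequential upgrade waves for one worker node pool."""
    if node_count < 0 or concurrent_nodes < 1:
        raise ValueError("node_count must be >= 0 and concurrent_nodes >= 1")
    # Each wave handles up to concurrent_nodes nodes (assumption of this sketch).
    return math.ceil(node_count / concurrent_nodes)


print(upgrade_waves(100))      # 100 waves when nodes upgrade one at a time
print(upgrade_waves(100, 15))  # 7 waves when 15 nodes upgrade in parallel
```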

Node upgrade strategy

You can configure worker node pools so that multiple nodes upgrade concurrently (concurrentNodes). You can also set a minimum threshold for the number of nodes able to run workloads throughout the upgrade process (minimumAvailableNodes). This configuration is made in the NodePool spec. For more information about these fields, see the Cluster configuration field reference.

The node upgrade strategy applies to worker node pools only. You can't specify a node upgrade strategy for control plane or load balancer node pools. During a cluster upgrade, nodes in control plane and load balancer node pools upgrade sequentially, one at a time. Control plane node pools and load balancer node pools are specified in the Cluster spec (controlPlane.nodePoolSpec.nodes and loadBalancer.nodePoolSpec.nodes).

When you configure parallel upgrades for nodes, note the following restrictions:

  • The value of concurrentNodes can't exceed either 50 percent of the number of nodes in the node pool, or the fixed number 15, whichever is smaller. For example, if your node pool has 20 nodes, you can't specify a value greater than 10. If your node pool has 100 nodes, 15 is the maximum value you can specify.

  • When you use concurrentNodes together with minimumAvailableNodes, the combined values can't exceed the total number of nodes in the node pool. For example, if your node pool has 20 nodes and minimumAvailableNodes is set to 18, concurrentNodes can't exceed 2. Likewise, if concurrentNodes is set to 10, minimumAvailableNodes can't exceed 10.

The following example shows a worker node pool np1 with 10 nodes. In an upgrade, nodes upgrade 5 at a time and at least 4 nodes must remain available for the upgrade to proceed:

apiVersion: baremetal.cluster.gke.io/v1
kind: NodePool
metadata:
  name: np1
  namespace: cluster-cluster1
spec:
  clusterName: cluster1
  nodes:
  - address:  10.200.0.1
  - address:  10.200.0.2
  - address:  10.200.0.3
  - address:  10.200.0.4
  - address:  10.200.0.5
  - address:  10.200.0.6
  - address:  10.200.0.7
  - address:  10.200.0.8
  - address:  10.200.0.9
  - address:  10.200.0.10 
  upgradeStrategy:
    parallelUpgrade:
      concurrentNodes: 5
      minimumAvailableNodes: 4 
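
The restrictions listed above can be checked before you edit a NodePool spec. The following Python sketch is a hypothetical helper, not part of bmctl or the cluster API; it only encodes the two restrictions as stated and returns a description of each one that a given configuration violates.

```python
from typing import List, Optional

MAX_CONCURRENT_NODES = 15  # fixed upper limit stated in the restrictions


def parallel_upgrade_violations(
    pool_size: int,
    concurrent_nodes: int,
    minimum_available_nodes: Optional[int] = None,
) -> List[str]:
    """Return the restrictions that the given values violate (empty if none)."""
    violations = []
    # concurrentNodes can't exceed 50 percent of the pool or 15,
    # whichever is smaller.
    if 2 * concurrent_nodes > pool_size:
        violations.append("concurrentNodes exceeds 50 percent of the node pool")
    if concurrent_nodes > MAX_CONCURRENT_NODES:
        violations.append("concurrentNodes exceeds 15")
    # The combined values can't exceed the total number of nodes in the pool.
    if (
        minimum_available_nodes is not None
        and concurrent_nodes + minimum_available_nodes > pool_size
    ):
        violations.append(
            "concurrentNodes + minimumAvailableNodes exceeds the node pool size"
        )
    return violations


print(parallel_upgrade_violations(10, 5, 4))    # [] for the np1 example
print(parallel_upgrade_violations(20, 10, 18))  # combined values exceed 20
```

The np1 configuration passes because 5 is exactly half of 10 nodes and 5 + 4 doesn't exceed 10.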

Node pool upgrade strategy

You can configure a cluster so that multiple worker node pools upgrade in parallel. The nodePoolUpgradeStrategy.concurrentNodePools field in the Cluster spec specifies whether to upgrade all worker node pools for a cluster concurrently. By default (1), node pools upgrade sequentially, one after the other. When you set concurrentNodePools to 0, every worker node pool in the cluster upgrades in parallel.

Control plane and load balancer node pools are not affected by this setting. These node pools always upgrade sequentially, one at a time. Control plane node pools and load balancer node pools are specified in the Cluster spec (controlPlane.nodePoolSpec.nodes and loadBalancer.nodePoolSpec.nodes).

apiVersion: baremetal.cluster.gke.io/v1
kind: Cluster
metadata:
  name: cluster1
  namespace: cluster-cluster1
spec:
  ...
  nodePoolUpgradeStrategy:
    concurrentNodePools: 0
  ...

How to perform a parallel upgrade

This section describes how to configure a cluster and a worker node pool for parallel upgrades.

To perform a parallel upgrade of worker node pools and nodes in a worker node pool, do the following:

  1. Add an upgradeStrategy section to the NodePool spec.

    You can apply this manifest separately or as part of the cluster configuration file when you perform a cluster update.

    Here's an example:

    ---
    apiVersion: baremetal.cluster.gke.io/v1
    kind: NodePool
    metadata:
      name: np1
      namespace: cluster-ci-bf8b9aa43c16c47
    spec:
      clusterName: ci-bf8b9aa43c16c47
      nodes:
      - address:  10.200.0.1
      - address:  10.200.0.2
      - address:  10.200.0.3
      ...
      - address:  10.200.0.30
      upgradeStrategy:
        parallelUpgrade:
          concurrentNodes: 5
          minimumAvailableNodes: 10
    

    In this example, the value of the field concurrentNodes is 5, which means that 5 nodes upgrade in parallel. The minimumAvailableNodes field is set to 10, which means that at least 10 nodes must remain available for workloads throughout the upgrade.

  2. Add a nodePoolUpgradeStrategy section to the Cluster spec in the cluster configuration file.

    ---
    apiVersion: v1
    kind: Namespace
    metadata:
      name: cluster-user001
    ---
    apiVersion: baremetal.cluster.gke.io/v1
    kind: Cluster
    metadata:
      name: user001
      namespace: cluster-user001
    spec:
      type: user
      profile: default
      anthosBareMetalVersion: 1.28.300-gke.131
      ...
      nodePoolUpgradeStrategy:
        concurrentNodePools: 0
      ...
    

    In this example, the concurrentNodePools field is set to 0, which means that all worker node pools upgrade concurrently during the cluster upgrade. The upgrade strategy for the nodes in the node pools is defined in the NodePool specs.

  3. Upgrade the cluster as described in Start the cluster upgrade.

Parallel upgrade default values

Parallel upgrades are disabled by default and the fields related to parallel upgrades are mutable. At any time, you can either remove the fields or set them to their default values to disable the feature before a subsequent upgrade.

The following table lists the parallel upgrade fields and their default values:

Field | Default value | Meaning
nodePoolUpgradeStrategy.concurrentNodePools (Cluster spec) | 1 | Upgrade worker node pools sequentially, one after the other.
upgradeStrategy.parallelUpgrade.concurrentNodes (NodePool spec) | 1 | Upgrade nodes sequentially, one after the other.
upgradeStrategy.parallelUpgrade.minimumAvailableNodes (NodePool spec) | Depends on the value of concurrentNodes. If you don't specify concurrentNodes, the default is 2/3 the node pool size. If you do specify concurrentNodes, the default is the node pool size minus concurrentNodes. | Upgrade stalls once minimumAvailableNodes is reached and only continues once the number of available nodes is greater than minimumAvailableNodes.
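
The default rule for minimumAvailableNodes can be worked through with a short Python sketch. The function is hypothetical and for illustration only. The text doesn't say how 2/3 of the node pool size is rounded when the size isn't divisible by 3, so the sketch rounds down; treat that as an assumption.

```python
from typing import Optional


def default_minimum_available_nodes(
    pool_size: int, concurrent_nodes: Optional[int] = None
) -> int:
    """Default minimumAvailableNodes for a worker node pool of pool_size nodes."""
    if concurrent_nodes is None:
        # concurrentNodes not specified: 2/3 of the node pool size.
        # Rounding down is an assumption of this sketch.
        return (2 * pool_size) // 3
    # concurrentNodes specified: node pool size minus concurrentNodes.
    return pool_size - concurrent_nodes


print(default_minimum_available_nodes(30))     # 20
print(default_minimum_available_nodes(30, 5))  # 25
```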

Start the cluster upgrade

This section contains instructions for upgrading clusters.

bmctl

When you download and install a new version of bmctl, you can upgrade your admin, hybrid, standalone, and user clusters created with an earlier version. For a given version of bmctl, a cluster can be upgraded to the same version only.

  1. Download the latest bmctl as described in GKE on Bare Metal downloads.

  2. Update anthosBareMetalVersion in the cluster configuration file to the upgrade target version.

    The upgrade target version must match the version of the downloaded bmctl file. The following cluster configuration file snippet shows the anthosBareMetalVersion field updated to the latest version:

    ---
    apiVersion: baremetal.cluster.gke.io/v1
    kind: Cluster
    metadata:
      name: cluster1
      namespace: cluster-cluster1
    spec:
      type: admin
      # Anthos cluster version.
      anthosBareMetalVersion: 1.28.300-gke.131
    
  3. Use the bmctl upgrade cluster command to complete the upgrade:

    bmctl upgrade cluster -c CLUSTER_NAME --kubeconfig ADMIN_KUBECONFIG
    

    Replace the following:

    • CLUSTER_NAME: the name of the cluster to upgrade.
    • ADMIN_KUBECONFIG: the path to the admin cluster kubeconfig file.

    The cluster upgrade operation runs preflight checks to validate cluster status and node health. The cluster upgrade doesn't proceed if the preflight checks fail. For troubleshooting information, see Troubleshoot cluster install or upgrade issues.

    When all of the cluster components have been successfully upgraded, the cluster upgrade operation performs cluster health checks. This last step verifies that the cluster is in good operating condition. If the cluster doesn't pass all health checks, they continue to run until they pass. When all health checks pass, the upgrade finishes successfully.

    For more information about the sequence of events for cluster upgrades, see Lifecycle and stages of cluster upgrades.

kubectl

To upgrade a cluster with kubectl, perform the following steps:

  1. Edit the cluster configuration file to set anthosBareMetalVersion to the upgrade target version.

  2. To initiate the upgrade, run the following command:

    kubectl apply -f CLUSTER_CONFIG_PATH
    

    Replace CLUSTER_CONFIG_PATH with the path of the edited cluster configuration file.

    As with the upgrade process with bmctl, preflight checks are run as part of the cluster upgrade to validate cluster status and node health. If the preflight checks fail, the cluster upgrade is halted. To troubleshoot any failures, examine the cluster and related logs, since no bootstrap cluster is created. For more information, see Troubleshoot cluster install or upgrade issues.

Although you don't need the latest version of bmctl to upgrade clusters with kubectl, we recommend that you download the latest bmctl. You need bmctl to perform other tasks, such as health checks and backups, to ensure that your cluster stays in good working order.

Pause and resume upgrades

With GKE on Bare Metal minor release 1.28, you can pause and resume a cluster upgrade. When a cluster upgrade is paused, no new node upgrades are triggered until the upgrade is resumed. This capability is only available for clusters with all control plane nodes at minor version 1.28 or higher.

You might want to pause an upgrade for the following reasons:

  • You've detected something wrong with cluster workloads during the upgrade and you want to pause the upgrade to look into the issue.

  • You have short maintenance windows, so you want to pause the upgrade in between windows.

While a cluster upgrade is paused, the following operations are supported:

When a new node is added while an upgrade is paused, machine check jobs don't run on it until the upgrade is resumed and completed.

While the cluster upgrade is paused, the following cluster operations aren't supported:

You can't initiate a new cluster upgrade while an active cluster upgrade is paused.

Enable upgrade pause and resume

While the upgrade pause and resume capability is in Preview, it can be enabled with an annotation in the Cluster resource.

To enable upgrade pause and resume, use the following steps:

  1. Add the preview.baremetal.cluster.gke.io/upgrade-pause-and-resume annotation to your cluster configuration file:

    apiVersion: baremetal.cluster.gke.io/v1
    kind: Cluster
    metadata:
      name: baremetal-demo
      namespace: cluster-baremetal-demo
      annotations:
        preview.baremetal.cluster.gke.io/upgrade-pause-and-resume: enable
    spec:
    ...
    
  2. To apply the change, update your cluster:

    bmctl update cluster -c CLUSTER_NAME --kubeconfig ADMIN_KUBECONFIG
    

    The nodePoolUpgradeStrategy.pause field is mutable. You can add and update it at any time.

Pause an upgrade

You pause a cluster upgrade by setting nodePoolUpgradeStrategy.pause to true in the Cluster spec.

To pause an active cluster upgrade, use the following steps:

  1. Add nodePoolUpgradeStrategy.pause to the cluster configuration file and set it to true:

    apiVersion: baremetal.cluster.gke.io/v1
    kind: Cluster
    metadata:
      name: baremetal-demo
      namespace: cluster-baremetal-demo
      annotations:
        preview.baremetal.cluster.gke.io/upgrade-pause-and-resume: enable
    spec:
      ...
      nodePoolUpgradeStrategy:
        pause: true
      ...
    

    If you used bmctl to initiate the upgrade, you need a new terminal window to perform the next step.

  2. To apply the change, update your cluster:

    bmctl update cluster -c CLUSTER_NAME --kubeconfig ADMIN_KUBECONFIG
    

    The upgrade operation is paused. No new node upgrades are triggered.

  3. If you used bmctl to initiate the upgrade and you're planning for a long-lasting pause, press Control+C to exit bmctl. Otherwise, keep bmctl running.

    The bmctl CLI doesn't detect changes in the upgrade pause status, so it doesn't exit automatically. However, when you exit bmctl, it stops logging upgrade progress to the cluster-upgrade-TIMESTAMP log file in the cluster folder on your admin workstation and to Cloud Logging. Therefore, for short pauses, you may want to keep bmctl running. If you leave bmctl running for an extended period while the upgrade is paused, it eventually times out.

Resume a paused upgrade

You resume a paused cluster upgrade by either setting nodePoolUpgradeStrategy.pause to false in the Cluster spec or removing nodePoolUpgradeStrategy.pause from the spec.
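
Both ways of resuming are equivalent because the pause field defaults to false when it's absent. The following Python sketch illustrates that equivalence on dictionary representations of a Cluster manifest. The helper function is hypothetical and only models how the field is read; it doesn't contact a cluster.

```python
from typing import Any, Dict


def is_upgrade_paused(cluster: Dict[str, Any]) -> bool:
    """Return True only if spec.nodePoolUpgradeStrategy.pause is set to true."""
    strategy = cluster.get("spec", {}).get("nodePoolUpgradeStrategy", {})
    # A missing pause field is treated the same as pause: false.
    return strategy.get("pause", False) is True


paused = {"spec": {"nodePoolUpgradeStrategy": {"pause": True}}}
resumed_explicitly = {"spec": {"nodePoolUpgradeStrategy": {"pause": False}}}
resumed_by_removal = {"spec": {"nodePoolUpgradeStrategy": {}}}

print(is_upgrade_paused(paused))              # True
print(is_upgrade_paused(resumed_explicitly))  # False
print(is_upgrade_paused(resumed_by_removal))  # False
```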

To resume a cluster upgrade that's been paused, use the following steps:

  1. In the cluster configuration file, set nodePoolUpgradeStrategy.pause to false:

    apiVersion: baremetal.cluster.gke.io/v1
    kind: Cluster
    metadata:
      name: baremetal-demo
      namespace: cluster-baremetal-demo
      annotations:
        preview.baremetal.cluster.gke.io/upgrade-pause-and-resume: enable
    spec:
      ...
      nodePoolUpgradeStrategy:
        pause: false
      ...
    

    Alternatively, you can remove the pause field, because it defaults to false.

  2. To apply the change, update your cluster:

    bmctl update cluster -c CLUSTER_NAME --kubeconfig ADMIN_KUBECONFIG
    

    The upgrade operation resumes where it left off.

  3. To check the status of the upgrade, first get a list of the resources that have anthosBareMetalVersion in their status:

    kubectl get RESOURCE --kubeconfig ADMIN_KUBECONFIG --all-namespaces
    

    Replace the following:

    • RESOURCE: the name of the resource that you want to get. Cluster, NodePool, and BareMetalMachine resources all contain anthosBareMetalVersion status information.

    • ADMIN_KUBECONFIG: the path of the admin cluster kubeconfig file.

    The following sample shows the format of the response for BareMetalMachine custom resources. Each BareMetalMachine corresponds to a cluster node.

    NAMESPACE              NAME         CLUSTER        READY   INSTANCEID               MACHINE      ABM VERSION   DESIRED ABM VERSION
    cluster-nuc-admin001   192.0.2.52   nuc-admin001   true    baremetal://192.0.2.52   192.0.2.52   1.28.0        1.28.0
    cluster-nuc-user001    192.0.2.53   nuc-user001    true    baremetal://192.0.2.53   192.0.2.53   1.16.2        1.16.2
    cluster-nuc-user001    192.0.2.54   nuc-user001    true    baremetal://192.0.2.54   192.0.2.54   1.16.2        1.16.2
    
  4. To check the status.anthosBareMetalVersion (current version of the resource), retrieve details for individual resources:

    kubectl describe RESOURCE RESOURCE_NAME \
        --kubeconfig ADMIN_KUBECONFIG \
        --namespace CLUSTER_NAMESPACE
    

    The following sample shows the BareMetalMachine details for the cluster node with IP address 192.0.2.53:

    Name:         192.0.2.53
    Namespace:    cluster-nuc-user001
    ...
    API Version:  infrastructure.baremetal.cluster.gke.io/v1
    Kind:         BareMetalMachine
    Metadata:
      Creation Timestamp:  2023-09-22T17:52:09Z
      ...
    Spec:
      Address:                    192.0.2.53
      Anthos Bare Metal Version:  1.16.2
      ...
    Status:
      Anthos Bare Metal Version:  1.16.2
    

    In this example, the node is at GKE on Bare Metal version 1.16.2.
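
If you save the tabular output from the kubectl get command, you can scan it for nodes that haven't reached their desired version yet. The following Python sketch is one way to do that. It assumes the column layout shown in the BareMetalMachine sample above, with the current and desired versions in the last two columns and no empty cells. The second data row in the sample text is modified to show a pending node and is hypothetical.

```python
from typing import List, Tuple


def nodes_pending_upgrade(table: str) -> List[Tuple[str, str, str]]:
    """Return (node name, current version, desired version) for rows that differ."""
    pending = []
    lines = [line for line in table.splitlines() if line.strip()]
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        # The last two columns are ABM VERSION and DESIRED ABM VERSION.
        name, current, desired = fields[1], fields[-2], fields[-1]
        if current != desired:
            pending.append((name, current, desired))
    return pending


# Hypothetical output: the second node is still waiting to be upgraded.
sample = """\
NAMESPACE              NAME         CLUSTER        READY   INSTANCEID               MACHINE      ABM VERSION   DESIRED ABM VERSION
cluster-nuc-admin001   192.0.2.52   nuc-admin001   true    baremetal://192.0.2.52   192.0.2.52   1.28.0        1.28.0
cluster-nuc-user001    192.0.2.53   nuc-user001    true    baremetal://192.0.2.53   192.0.2.53   1.16.2        1.28.0
"""
print(nodes_pending_upgrade(sample))  # [('192.0.2.53', '1.16.2', '1.28.0')]
```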