Upgrade clusters

When you install a new version of bmctl, you can upgrade your existing clusters that were created with an earlier version. Upgrading a cluster to the latest GKE on Bare Metal version brings added features and fixes to your cluster. It also ensures that your cluster remains supported. You can upgrade admin, hybrid, standalone, or user clusters with the bmctl upgrade cluster command, or you can use kubectl.

Starting with GKE on Bare Metal version 1.15.0, the default upgrade behavior for self-managed (admin, hybrid, or standalone) clusters is an in-place upgrade. In-place upgrades use lifecycle controllers, instead of a bootstrap cluster, to manage the entire upgrade process. This change simplifies the process and reduces resource requirements, which makes cluster upgrades more reliable and scalable.

To learn more about the upgrade process, see Lifecycle and stages of cluster upgrades.

Upgrade considerations

This section contains information and links to resources that you should review before you upgrade a cluster.

Best practices

For information to help you prepare for a cluster upgrade, see Best practices for Anthos clusters on bare metal cluster upgrades.

Upgrade preflight checks

Preflight checks are run as part of the cluster upgrade to validate cluster status and node health. The cluster upgrade doesn't proceed if the preflight checks fail. For more information on preflight checks, see Understand preflight checks.

You can check if the clusters are ready for an upgrade by running the preflight check before running the upgrade. For more information, see Preflight checks for upgrades.
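
For example, after you update anthosBareMetalVersion in the cluster configuration file to the target version, a command along the following lines runs the upgrade preflight checks on their own (a sketch; confirm the exact syntax in the bmctl reference for your version):

bmctl check preflight -c CLUSTER_NAME --kubeconfig ADMIN_KUBECONFIG

Replace CLUSTER_NAME with the name of the cluster to upgrade and ADMIN_KUBECONFIG with the path to the admin cluster kubeconfig file.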

Known issues

For information about potential problems related to cluster upgrades, see Anthos clusters on bare metal known issues and select the Upgrades and updates problem category.

Upgrade admin, standalone, hybrid, or user clusters

This section contains instructions for upgrading clusters.

bmctl

When you download and install a new version of bmctl, you can upgrade your admin, hybrid, standalone, and user clusters that were created with an earlier version. A given version of bmctl can upgrade a cluster to that same version only.

  1. Download the latest bmctl as described in GKE on Bare Metal downloads.
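
    To confirm which version of the binary you downloaded, you can print the bmctl version (a quick check; the output format varies by release):

    bmctl version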

  2. Update anthosBareMetalVersion in the cluster configuration file to the upgrade target version.

    The upgrade target version must match the version of the downloaded bmctl file. The following cluster configuration file snippet shows the anthosBareMetalVersion field updated to the latest version:

    ---
    apiVersion: baremetal.cluster.gke.io/v1
    kind: Cluster
    metadata:
      name: cluster1
      namespace: cluster-cluster1
    spec:
      type: admin
      # Anthos cluster version.
      anthosBareMetalVersion: 1.15.11
    
  3. Use the bmctl upgrade cluster command to complete the upgrade:

    bmctl upgrade cluster -c CLUSTER_NAME --kubeconfig ADMIN_KUBECONFIG
    

    Replace the following:

    • CLUSTER_NAME: the name of the cluster to upgrade.
    • ADMIN_KUBECONFIG: the path to the admin cluster kubeconfig file.

    The cluster upgrade operation runs preflight checks to validate cluster status and node health. The cluster upgrade doesn't proceed if the preflight checks fail. For troubleshooting information, see Troubleshoot cluster install or upgrade issues.

    When all of the cluster components have been successfully upgraded, the cluster upgrade operation performs cluster health checks. This last step verifies that the cluster is in good operating condition. If the cluster doesn't pass all health checks, the checks continue to run until they pass. When all health checks pass, the upgrade finishes successfully.
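
    To watch the health-check phase, you can inspect the Cluster resource directly. The following command is a sketch; it uses the fully qualified resource name (clusters.baremetal.cluster.gke.io) to avoid ambiguity with other Cluster resource types:

    kubectl --kubeconfig ADMIN_KUBECONFIG -n cluster-CLUSTER_NAME \
        get clusters.baremetal.cluster.gke.io CLUSTER_NAME -o yaml

    The status conditions in the output report upgrade and health-check progress.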

    For more information about the sequence of events for cluster upgrades, see Lifecycle and stages of cluster upgrades.

kubectl

To upgrade a cluster with kubectl, perform the following steps:

  1. Edit the cluster configuration file to set anthosBareMetalVersion to the upgrade target version.

  2. To initiate the upgrade, run the following command:

    kubectl apply -f CLUSTER_CONFIG_PATH
    

    Replace CLUSTER_CONFIG_PATH with the path to the edited cluster configuration file.

As with the upgrade process with bmctl, preflight checks are run as part of the cluster upgrade to validate cluster status and node health. If the preflight checks fail, the cluster upgrade is halted. To troubleshoot any failures, examine the cluster and related logs, since no bootstrap cluster is created. For more information, see Troubleshoot cluster install or upgrade issues.
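
While the upgrade runs, you can follow progress by watching events in the cluster namespace. The following command is a generic sketch, assuming the cluster namespace follows the cluster-CLUSTER_NAME convention shown in the examples in this document:

kubectl --kubeconfig ADMIN_KUBECONFIG -n cluster-CLUSTER_NAME \
    get events --sort-by=.lastTimestamp

Events from failed preflight checks and node upgrades typically appear in this output.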

Although you don't need the latest version of bmctl to upgrade clusters with kubectl, we recommend that you download the latest bmctl. You need bmctl to perform other tasks, such as health checks and backups, to ensure that your cluster stays in good working order.
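
For example, the following commands run a cluster health check and create a cluster backup (sketches; see the bmctl command reference for the complete set of options):

bmctl check cluster -c CLUSTER_NAME --kubeconfig ADMIN_KUBECONFIG
bmctl backup cluster -c CLUSTER_NAME --kubeconfig ADMIN_KUBECONFIG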

Parallel upgrades

In a typical, default cluster upgrade, each cluster node is upgraded sequentially, one after the other. This section shows you how to configure your cluster and worker node pools so that multiple nodes upgrade in parallel when you upgrade your cluster. Upgrading nodes in parallel speeds up cluster upgrades significantly, especially for clusters that have hundreds of nodes.

There are two parallel upgrade strategies that you can use to speed up your cluster upgrade:

  • Concurrent node upgrade: you can configure your worker node pools so that multiple nodes upgrade in parallel. Parallel upgrades of nodes are configured in the NodePool spec (spec.upgradeStrategy.parallelUpgrade) and only nodes in a worker node pool can be upgraded in parallel. Nodes in control plane or load balancer node pools can only be upgraded one at a time. For more information, see Node upgrade strategy.

  • Concurrent node pool upgrade: you can configure your cluster so that multiple node pools upgrade in parallel. Only worker node pools can be upgraded in parallel. Control plane and load balancer node pools can only be upgraded one at a time. The ability to upgrade multiple node pools concurrently is available for Public Preview. For more information, see Node pool upgrade strategy.

Node upgrade strategy

You can configure worker node pools so that multiple nodes upgrade concurrently (concurrentNodes). You can also set a minimum threshold for the number of nodes able to run workloads throughout the upgrade process (minimumAvailableNodes). This configuration is made in the NodePool spec. For more information about these fields, see the Cluster configuration field reference.

The node upgrade strategy applies to worker node pools only. You can't specify a node upgrade strategy for control plane or load balancer node pools. During a cluster upgrade, nodes in control plane and load balancer node pools upgrade sequentially, one at a time. Control plane node pools and load balancer node pools are specified in the Cluster spec (controlPlane.nodePoolSpec.nodes and loadBalancer.nodePoolSpec.nodes).
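
For reference, control plane and load balancer nodes are defined inline in the Cluster resource rather than in separate NodePool resources. The following minimal sketch shows where those fields sit (the addresses are illustrative):

apiVersion: baremetal.cluster.gke.io/v1
kind: Cluster
metadata:
  name: cluster1
  namespace: cluster-cluster1
spec:
  ...
  controlPlane:
    nodePoolSpec:
      nodes:
      - address: 10.200.0.100
  loadBalancer:
    nodePoolSpec:
      nodes:
      - address: 10.200.0.101
  ...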

When you configure parallel upgrades for nodes, note the following restrictions:

  • The value of concurrentNodes can't exceed the smaller of two limits: 20 percent of the number of nodes in the node pool, or 10 (see the calculation sketch after this list). For example, if your node pool has 40 nodes, you can't specify a value greater than 8. If your node pool has 100 nodes, 10 is the maximum value you can specify.

  • When you use concurrentNodes together with minimumAvailableNodes, the combined values can't exceed the total number of nodes in the node pool. For example, if your node pool has 20 nodes and minimumAvailableNodes is set to 18, concurrentNodes can't exceed 2. Likewise, if concurrentNodes is set to 10, minimumAvailableNodes can't exceed 10.
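
The following shell sketch shows the calculation behind the first restriction (a hypothetical helper, not part of any GKE tooling):

# Maximum allowed concurrentNodes for a pool of a given size:
# 20 percent of the pool, capped at 10.
POOL_SIZE=40
MAX_CONCURRENT=$(( POOL_SIZE * 20 / 100 ))
if [ "$MAX_CONCURRENT" -gt 10 ]; then
  MAX_CONCURRENT=10
fi
echo "$MAX_CONCURRENT"   # prints 8 for a 40-node pool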

The following example shows a worker node pool np1 with 10 nodes. In an upgrade, nodes upgrade two at a time, and at least five nodes must remain available for workloads throughout the upgrade:

apiVersion: baremetal.cluster.gke.io/v1
kind: NodePool
metadata:
  name: np1
  namespace: cluster-cluster1
spec:
  clusterName: cluster1
  nodes:
  - address:  10.200.0.1
  - address:  10.200.0.2
  - address:  10.200.0.3
  - address:  10.200.0.4
  - address:  10.200.0.5
  - address:  10.200.0.6
  - address:  10.200.0.7
  - address:  10.200.0.8
  - address:  10.200.0.9
  - address:  10.200.0.10 
  upgradeStrategy:
    parallelUpgrade:
      concurrentNodes: 2
      minimumAvailableNodes: 5 

Node pool upgrade strategy

You can configure a cluster so that multiple worker node pools upgrade in parallel. The nodePoolUpgradeStrategy.concurrentNodePools Boolean field in the Cluster spec specifies whether to upgrade all worker node pools for a cluster concurrently. By default (1), node pools upgrade sequentially, one after the other. When you set concurrentNodePools to 0, every worker node pool in the cluster upgrades in parallel.

Control plane and load balancing node pools are not affected by this setting. These node pools always upgrade sequentially, one at a time. Control plane node pools and load balancer node pools are specified in the Cluster spec (controlPlane.nodePoolSpec.nodes and loadBalancer.nodePoolSpec.nodes).

The capability to upgrade all worker node pools concurrently is available for Preview only. Don't use this feature on your production clusters. The following Cluster spec snippet shows how to configure all worker node pools to upgrade concurrently:

apiVersion: baremetal.cluster.gke.io/v1
kind: Cluster
metadata:
  name: cluster1
  namespace: cluster-cluster1
spec:
  ...
  nodePoolUpgradeStrategy:
    concurrentNodePools: 0
  ...

How to perform a parallel upgrade

This section describes how to configure a cluster and a worker node pool for parallel upgrades.

To perform a parallel upgrade of worker node pools and nodes in a worker node pool, do the following:

  1. Add an upgradeStrategy section to the NodePool spec.

    You can apply this manifest separately or as part of the cluster configuration file when you perform a cluster update.

    Here's an example:

    ---
    apiVersion: baremetal.cluster.gke.io/v1
    kind: NodePool
    metadata:
      name: np1
      namespace: cluster-ci-bf8b9aa43c16c47
    spec:
      clusterName: ci-bf8b9aa43c16c47
      nodes:
      - address:  10.200.0.1
      - address:  10.200.0.2
      - address:  10.200.0.3
      ...
      - address:  10.200.0.30
      upgradeStrategy:
        parallelUpgrade:
          concurrentNodes: 5
          minimumAvailableNodes: 10
    

    In this example, the value of the field concurrentNodes is 5, which means that 5 nodes upgrade in parallel. The minimumAvailableNodes field is set to 10, which means that at least 10 nodes must remain available for workloads throughout the upgrade.
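
    If the upgradeStrategy section is part of the cluster configuration file, you can push the change with a cluster update before you upgrade (a sketch, using the same placeholders as the upgrade command):

    bmctl update cluster -c CLUSTER_NAME --kubeconfig ADMIN_KUBECONFIG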

  2. Add a nodePoolUpgradeStrategy section to the Cluster spec in the cluster configuration file.

    ---
    apiVersion: v1
    kind: Namespace
    metadata:
      name: cluster-user001
    ---
    apiVersion: baremetal.cluster.gke.io/v1
    kind: Cluster
    metadata:
      name: user001
      namespace: cluster-user001
    spec:
      type: user
      profile: default
      anthosBareMetalVersion: 1.15.0
      ...
      nodePoolUpgradeStrategy:
        concurrentNodePools: 0
      ...
    

    In this example, the concurrentNodePools field is set to 0, which means that all worker node pools upgrade concurrently during the cluster upgrade. The upgrade strategy for the nodes in the node pools is defined in the NodePool specs.

  3. Upgrade the cluster as described in the preceding Upgrade admin, standalone, hybrid, or user clusters section.

How to disable parallel upgrades of nodes

Parallel upgrades are disabled by default and the fields related to parallel upgrades are mutable. At any time, you can either remove the fields or set them to their default values to disable the feature before a subsequent upgrade.

The following list describes the parallel upgrade fields and their default values (a sketch that restores the defaults follows the list):

  • nodePoolUpgradeStrategy.concurrentNodePools (Cluster spec): defaults to 1, which upgrades worker node pools sequentially, one after the other.

  • upgradeStrategy.parallelUpgrade.concurrentNodes (NodePool spec): defaults to 1, which upgrades nodes sequentially, one after the other.

  • upgradeStrategy.parallelUpgrade.minimumAvailableNodes (NodePool spec): defaults to 0, which imposes no requirement that any nodes remain available during an upgrade.
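
For example, to disable parallel upgrades explicitly before a subsequent upgrade, a NodePool spec could set both fields back to their defaults (removing the upgradeStrategy section entirely has the same effect):

apiVersion: baremetal.cluster.gke.io/v1
kind: NodePool
metadata:
  name: np1
  namespace: cluster-cluster1
spec:
  clusterName: cluster1
  ...
  upgradeStrategy:
    parallelUpgrade:
      concurrentNodes: 1
      minimumAvailableNodes: 0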