Skip a version when upgrading node pools

In version 1.29 and higher, Google Distributed Cloud allows a user cluster's control plane to be up to two minor versions higher than the node pools in the cluster. For example, if a user cluster's control plane is at 1.29, the node pools in the cluster can be at version 1.16, 1.28, or 1.29. Additionally, Google Distributed Cloud lets you skip one minor version when upgrading node pools. Using the previous example, you can upgrade node pools that are at version 1.16 directly to version 1.29 and skip the upgrade to 1.28. Skipping a minor version when upgrading node pools is referred to as a skip-version upgrade.
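The version skew rule above can be sketched as a small shell check. This is only an illustration, not part of gkectl: the `MINOR_VERSIONS` list and both function names are hypothetical, and the list holds only the minor versions named on this page. Because minor versions are not numerically contiguous (1.16 is followed by 1.28), the sketch compares positions in an ordered list rather than subtracting version numbers. It also assumes node pools are never newer than the control plane, and it doesn't enforce the "1.29 and higher" condition.

```shell
#!/bin/sh
# Ordered minor versions mentioned on this page (assumption for illustration;
# extend the list to match the releases you actually use).
MINOR_VERSIONS="1.14 1.15 1.16 1.28 1.29"

# minor_index MINOR
# Prints the 0-based position of MINOR in MINOR_VERSIONS; returns 1 if absent.
minor_index() {
  i=0
  for m in $MINOR_VERSIONS; do
    if [ "$m" = "$1" ]; then
      echo "$i"
      return 0
    fi
    i=$((i + 1))
  done
  return 1
}

# skew_allowed CONTROL_PLANE_MINOR NODE_POOL_MINOR
# Succeeds when the node pool is 0, 1, or 2 minor releases behind the
# control plane. Returns 2 if either version is not in MINOR_VERSIONS.
skew_allowed() {
  cp_idx=$(minor_index "$1") || return 2
  np_idx=$(minor_index "$2") || return 2
  diff=$((cp_idx - np_idx))
  [ "$diff" -ge 0 ] && [ "$diff" -le 2 ]
}

skew_allowed 1.29 1.16 && echo "control plane 1.29, node pools 1.16: allowed"
skew_allowed 1.29 1.15 || echo "control plane 1.29, node pools 1.15: not allowed"
```

With the list above, 1.16 is two positions behind 1.29, so the first check passes, and 1.15 is three positions behind, so the second fails.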

Skip-version upgrades are supported only for Ubuntu and COS node pools. Because of Kubernetes constraints, a user cluster's control plane must be upgraded one minor version at a time. Note, however, that upgrading only the control plane takes significantly less time and is less risky than upgrading node pools where your workloads run.

This page explains some of the benefits of a skip-version upgrade and provides steps on how to perform a skip-version upgrade by making configuration file changes and running gkectl upgrade cluster.

This page is for IT administrators and Operators who manage the lifecycle of the underlying tech infrastructure. To learn more about common roles and example tasks that we reference in Google Cloud content, see Common GKE Enterprise user roles and tasks. This page assumes that you are somewhat familiar with planning and executing Google Distributed Cloud upgrades.

Benefits of skip-version upgrades

This section describes some benefits of using skip-version upgrades.

Easier to keep your clusters in a supported version

A new Google Distributed Cloud minor version is released every four months, and each minor version has a one-year support window. For your clusters to stay within the supported window, you must perform a minor version upgrade approximately every four months, as shown in the following:

Timeline: 28 consecutive months, December through March.

Minor version    Action
1.14             Upgrade
1.15             Upgrade
1.16             Upgrade
1.28             Upgrade
1.29             Upgrade

This requirement imposes challenges when you need a long validation window to verify a new minor version and a short maintenance window to upgrade your clusters to the new minor version. To overcome these challenges, you can use a skip-version upgrade, which allows your clusters to stay within the supported window by upgrading a cluster every eight months instead of every four months. The following table shows how skipping the upgrade for version 1.15 means you only upgrade after eight months instead of four.

Timeline: 28 consecutive months, December through March.

Minor version    Action
1.14             Upgrade
1.15             -
1.16             Upgrade
1.28             -
1.29             -

Skipping over one minor version when upgrading your node pools reduces the number of upgrades required to stay on a supported version. Additionally, you don't need to qualify the skipped minor version because it is only used by the control plane temporarily.
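The cadence arithmetic can be made concrete with a short calculation. This sketch uses the four-month release interval stated on this page. The 24-month planning period and the function name are assumptions chosen only for illustration.

```shell
#!/bin/sh
# Months between minor releases, as stated on this page.
RELEASE_INTERVAL_MONTHS=4

# node_pool_upgrades PERIOD_MONTHS MINORS_PER_UPGRADE
# Prints how many node pool upgrades fit in the period when each upgrade
# advances the node pools by MINORS_PER_UPGRADE minor versions.
node_pool_upgrades() {
  period_months="$1"
  minors_per_upgrade="$2"
  echo $(( period_months / (RELEASE_INTERVAL_MONTHS * minors_per_upgrade) ))
}

echo "one minor version at a time: $(node_pool_upgrades 24 1) upgrades in 24 months"
echo "skip-version upgrades:       $(node_pool_upgrades 24 2) upgrades in 24 months"
```

Over the assumed 24 months, this prints 6 upgrades for the regular cadence and 3 for the skip-version cadence.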

Shorter maintenance window

With a skip-version upgrade, you don't need to enlarge your maintenance window. Skipping a minor version when upgrading node pools takes the same amount of time as upgrading the node pools to the next minor version because each node in a node pool is drained and recreated once. Therefore, a skip-version upgrade saves time overall and reduces workload disruption.

Summary

In summary, a skip-version upgrade provides the following benefits:

  • Get clusters to a supported version: Google Distributed Cloud supports the three most recent minor versions. If your clusters are on an unsupported version, depending on the cluster version, skipping a minor version when upgrading node pools could get your clusters to a supported version with fewer upgrades.

  • Save time: Skipping a minor version when upgrading node pools takes the same amount of time as upgrading the node pools to the next minor version. Therefore, a skip-version upgrade takes approximately half the time of upgrading node pools twice. Similarly, with a skip-version upgrade, you have just one validation window, compared to two with regular upgrades.

  • Reduce disruptions: Longer spans between upgrades and less time spent upgrading and validating mean that your workloads run longer with fewer disruptions.

Controlling the control plane and node pool versions during an upgrade

In the user cluster configuration file, the nodePools[i].gkeOnPremVersion field lets a specific node pool use a different version than the top-level gkeOnPremVersion field. By changing the value of nodePools[i].gkeOnPremVersion, you control when a node pool is upgraded when you run gkectl upgrade cluster. If you don't include nodePools[i].gkeOnPremVersion in the configuration file, or if you set the field to an empty string, the node pool is upgraded to the same target version that you specify in gkeOnPremVersion.
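The fallback rule for nodePools[i].gkeOnPremVersion can be sketched as follows. The function name is hypothetical and the sketch doesn't read a real configuration file. It only models the rule that an absent or empty node pool version means the node pool follows the top-level version. The version string 1.28.0-gke.1 is a made-up example value.

```shell
#!/bin/sh
# effective_pool_version TOP_LEVEL_VERSION [POOL_VERSION]
# Prints the version a node pool is upgraded to: its own version if one is
# set, otherwise the top-level gkeOnPremVersion.
effective_pool_version() {
  top="$1"
  pool="${2:-}"   # an omitted argument is treated like an empty string
  if [ -n "$pool" ]; then
    echo "$pool"
  else
    echo "$top"
  fi
}

# Node pool pinned to the source version while the control plane moves ahead.
effective_pool_version "1.28.0-gke.1" "1.16.11-gke.25"   # prints 1.16.11-gke.25

# Empty string: the node pool follows the top-level version.
effective_pool_version "1.28.0-gke.1" ""                 # prints 1.28.0-gke.1
```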

Skip-version upgrade sequence

Suppose your cluster control plane and all node pools are at minor version 1.N. At a high level, upgrading your cluster from 1.N to 1.N+2 using a skip-version upgrade works as follows:

  1. Upgrade only the control plane from the source version (1.N) to an intermediate version (1.N+1). Leave the node pools at the source version. The intermediate version is needed because the control plane must be upgraded one minor version at a time.
  2. Upgrade the control plane and the node pools to the target version (1.N+2).
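The two phases above can be written out as a small planning helper. This is a sketch with a hypothetical function name. It only prints the version transitions in each phase and doesn't run any upgrade.

```shell
#!/bin/sh
# print_skip_upgrade_plan SOURCE INTERMEDIATE TARGET
# Prints one line per phase of a skip-version upgrade.
print_skip_upgrade_plan() {
  src="$1"
  mid="$2"
  tgt="$3"
  # Phase 1: only the control plane moves; node pools stay at the source version.
  printf 'phase 1: control plane %s -> %s, node pools stay at %s\n' "$src" "$mid" "$src"
  # Phase 2: control plane and node pools both move to the target version.
  printf 'phase 2: control plane %s -> %s, node pools %s -> %s\n' "$mid" "$tgt" "$src" "$tgt"
}

print_skip_upgrade_plan 1.16 1.28 1.29
```

For the 1.16 to 1.29 example used on this page, the node pools appear in only one transition (1.16 -> 1.29), which is why each node is drained and recreated once.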

Perform a skip-version upgrade

This section provides the steps for performing a skip-version upgrade.

Before you begin

  1. Make sure the current version (the source version) of the cluster is at version 1.16 or higher. Be sure to check the version of the control plane (gkeOnPremVersion) and all node pools (nodePools[i].gkeOnPremVersion).

  2. In version 1.29 and later, server-side preflight checks are enabled by default. Make sure to review your firewall rules to make any needed changes.

  3. To upgrade to version 1.28 and later, you must enable kubernetesmetadata.googleapis.com and grant the kubernetesmetadata.publisher IAM role to the logging-monitoring service account. For details, see Google API and IAM requirements.
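The API and IAM prerequisite in the last step might be met with gcloud commands along the following lines. Treat this as a sketch, not the authoritative procedure: the exact invocations are an assumption, and Google API and IAM requirements remains the reference. The function name is hypothetical, and PROJECT_ID and LOGGING_MONITORING_SA_EMAIL are placeholders for your own values. The helper only prints the commands so that you can review them before running anything.

```shell
#!/bin/sh
# print_prereq_commands PROJECT_ID SERVICE_ACCOUNT_EMAIL
# Prints (does not run) gcloud commands that enable the API and grant the
# role named in the prerequisite above. Command shapes are an assumption.
print_prereq_commands() {
  project_id="$1"
  sa_email="$2"
  echo "gcloud services enable kubernetesmetadata.googleapis.com --project=${project_id}"
  echo "gcloud projects add-iam-policy-binding ${project_id} --member=serviceAccount:${sa_email} --role=roles/kubernetesmetadata.publisher"
}

# Placeholders only; replace them with your project and service account.
print_prereq_commands PROJECT_ID LOGGING_MONITORING_SA_EMAIL
```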

Perform the upgrade

  1. Define the source version (1.N), the intermediate version (1.N+1), and the target version (1.N+2) in the following placeholder variables. All versions must be the full version number in the form x.y.z-gke.N such as 1.16.11-gke.25.

    • SOURCE_VERSION: Get the current cluster version. This is the source version (1.N).
    • INTERMEDIATE_VERSION: Pick an intermediate version (1.N+1).
    • TARGET_VERSION: Pick the target version (1.N+2). Select the recommended patch from the target minor version.

  2. Upgrade your admin workstation to the intermediate version, INTERMEDIATE_VERSION. Wait for a message indicating the upgrade was successful.

  3. Install the corresponding bundle:

    gkectl prepare \
        --bundle-path /var/lib/gke/bundles/gke-onprem-vsphere-INTERMEDIATE_VERSION.tgz \
        --kubeconfig ADMIN_CLUSTER_KUBECONFIG
    

    Replace ADMIN_CLUSTER_KUBECONFIG with the path of your admin cluster kubeconfig file.

  4. Upgrade your admin workstation again, but this time to the target version, TARGET_VERSION. Wait for a message indicating the upgrade was successful.

  5. Install the corresponding bundle:

    gkectl prepare \
        --bundle-path /var/lib/gke/bundles/gke-onprem-vsphere-TARGET_VERSION.tgz \
        --kubeconfig ADMIN_CLUSTER_KUBECONFIG
    
  6. Upgrade only the control plane to the intermediate version as follows:

    1. Make the following changes in the user cluster configuration file:

      • Set the gkeOnPremVersion field to the intermediate version, INTERMEDIATE_VERSION.

      • Set all the node pool versions in nodePools[i].gkeOnPremVersion to the source version, SOURCE_VERSION.

      After updating your configuration file, it should look similar to the following:

      gkeOnPremVersion: INTERMEDIATE_VERSION
      ...
      nodePools:
      - name: pool-1
        gkeOnPremVersion: SOURCE_VERSION
        ...
      - name: pool-2
        gkeOnPremVersion: SOURCE_VERSION
        ...
      
    2. Upgrade the control plane:

      gkectl upgrade cluster \
          --kubeconfig ADMIN_CLUSTER_KUBECONFIG \
          --config USER_CLUSTER_CONFIG_FILE
      

      Replace USER_CLUSTER_CONFIG_FILE with the path of your user cluster configuration file.

  7. Upgrade the control plane and the node pools to the target version as follows:

    1. Make the following changes in the user cluster configuration file:

      • Set the gkeOnPremVersion field to the target version, TARGET_VERSION.

      • Set all nodePools[i].gkeOnPremVersion to an empty string.

      After updating your configuration file, it should look similar to the following:

      gkeOnPremVersion: TARGET_VERSION
      ...
      nodePools:
      - name: pool-1
        gkeOnPremVersion: ""
        ...
      - name: pool-2
        gkeOnPremVersion: ""
        ...
      
    2. Upgrade the control plane and the node pools:

      gkectl upgrade cluster \
          --kubeconfig ADMIN_CLUSTER_KUBECONFIG \
          --config USER_CLUSTER_CONFIG_FILE
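
The command sequence in this procedure can be rehearsed with a dry-run helper such as the following sketch. All function names are hypothetical. The helper checks that each version matches the full x.y.z-gke.N form and then prints the gkectl commands from this page in order, without running them. It doesn't upgrade the admin workstation or edit the user cluster configuration file, so those steps appear only as reminder comments in the output. The version 1.28.0-gke.1 and 1.29.0-gke.1 strings in the usage line are made-up example values.

```shell
#!/bin/sh
# is_full_version VERSION
# Succeeds if VERSION has the form x.y.z-gke.N, for example 1.16.11-gke.25.
is_full_version() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+-gke\.[0-9]+$'
}

# minor_of VERSION
# Prints the minor version (x.y) of a full version string.
minor_of() {
  printf '%s\n' "$1" | cut -d. -f1,2
}

# print_gkectl_steps INTERMEDIATE TARGET ADMIN_KUBECONFIG USER_CONFIG
# Prints the gkectl commands for a skip-version upgrade; runs nothing.
print_gkectl_steps() {
  intermediate="$1"
  target="$2"
  admin_kubeconfig="$3"
  user_config="$4"
  for v in "$intermediate" "$target"; do
    is_full_version "$v" || { echo "not a full version: $v" >&2; return 1; }
  done
  echo "# after upgrading the admin workstation to $intermediate"
  echo "gkectl prepare --bundle-path /var/lib/gke/bundles/gke-onprem-vsphere-${intermediate}.tgz --kubeconfig ${admin_kubeconfig}"
  echo "# after upgrading the admin workstation to $target"
  echo "gkectl prepare --bundle-path /var/lib/gke/bundles/gke-onprem-vsphere-${target}.tgz --kubeconfig ${admin_kubeconfig}"
  echo "# after setting gkeOnPremVersion to $intermediate and pinning node pools to the source version"
  echo "gkectl upgrade cluster --kubeconfig ${admin_kubeconfig} --config ${user_config}"
  echo "# after setting gkeOnPremVersion to $target and clearing the node pool versions"
  echo "gkectl upgrade cluster --kubeconfig ${admin_kubeconfig} --config ${user_config}"
}

print_gkectl_steps 1.28.0-gke.1 1.29.0-gke.1 ADMIN_CLUSTER_KUBECONFIG USER_CLUSTER_CONFIG_FILE
```

The same gkectl upgrade cluster command appears twice because the configuration file contents, not the command line, determine what each run upgrades.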
      

What's next