Version 1.6. This version is supported as outlined in the Anthos version support policy, offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware (GKE on-prem). Refer to the release notes for more details. This is not the most recent version.

Upgrading Anthos clusters on VMware

This page explains how to upgrade Anthos clusters on VMware (GKE on-prem).

Target versions

Starting with Anthos clusters on VMware version 1.3.2, you can upgrade directly to any version that is in the same minor release or the next minor release. For example, you can upgrade from 1.3.2 to 1.3.5, or from 1.5.2 to 1.6.1.

If your current version is lower than 1.3.2, then you must do sequential upgrades to reach version 1.3.2 first. For example, to upgrade from 1.3.0 to 1.3.2, you must first upgrade from 1.3.0 to 1.3.1, and then from 1.3.1 to 1.3.2.

If you are upgrading from version 1.3.2 or later to a version that is not part of the next minor release, you must upgrade through one version of each minor release between your current version and your desired version. For example, if you are upgrading from version 1.3.2 to version 1.6.1, it is not possible to upgrade directly. You must first upgrade from version 1.3.2 to version 1.4.x, where x represents any patch release under that minor release. You can then upgrade to version 1.5.x, and finally to version 1.6.1.
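
The hop-by-hop rule above can be sketched as a small shell function. This is an illustrative helper only: the name upgrade_path is hypothetical, it assumes versions of the form 1.MINOR.PATCH, and it assumes the current version is already 1.3.2 or later and not newer than the target.

```shell
# Print the versions to upgrade through, one per line.
# Usage: upgrade_path CURRENT_VERSION TARGET_VERSION
# Assumption: both versions look like 1.MINOR.PATCH and CURRENT is 1.3.2 or later.
upgrade_path() {
  current_minor=$(echo "$1" | cut -d. -f2)
  target_minor=$(echo "$2" | cut -d. -f2)
  # Every minor release strictly between the two needs one stop;
  # "x" stands for any patch release of that minor release.
  minor=$((current_minor + 1))
  while [ "$minor" -lt "$target_minor" ]; do
    echo "1.$minor.x"
    minor=$((minor + 1))
  done
  # The final hop is the target itself.
  echo "$2"
}
```

For example, upgrade_path 1.3.2 1.6.1 prints 1.4.x, 1.5.x, and 1.6.1 on separate lines, while upgrade_path 1.5.2 1.6.1 prints only 1.6.1 because the target is in the next minor release.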

Overview of the upgrade process

  1. Download the gkeadm tool. The version of gkeadm must be the same as the target version of your upgrade.

  2. Use gkeadm to upgrade your admin workstation.

  3. From your admin workstation, upgrade your admin cluster.

  4. From your admin workstation, upgrade your user clusters.

Upgrade policy

After you upgrade your admin cluster:

  • Any new user clusters that you create must have the same version as your admin cluster.

  • If you upgrade an existing user cluster, you must upgrade to the same version as your admin cluster.

  • Before you upgrade your admin cluster again, you must upgrade all of your user clusters to the same version as your current admin cluster.
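
The last rule can be checked before you start another admin cluster upgrade. The following is a minimal sketch: the function name ready_for_admin_upgrade is hypothetical, and it assumes you have already collected the version of each cluster by whatever means you normally use.

```shell
# Succeed only if every user cluster version equals the admin cluster version.
# Usage: ready_for_admin_upgrade ADMIN_VERSION USER_VERSION...
ready_for_admin_upgrade() {
  admin_version=$1
  shift
  for user_version in "$@"; do
    if [ "$user_version" != "$admin_version" ]; then
      # Report the first cluster that still lags behind the admin cluster.
      echo "user cluster at $user_version must first be upgraded to $admin_version" >&2
      return 1
    fi
  done
  return 0
}
```

For example, ready_for_admin_upgrade 1.6.1 1.6.1 1.5.2 fails, because one user cluster is still at version 1.5.2.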

Locating your configuration and information files

When you created your current admin workstation, you filled in an admin workstation configuration file that was generated by gkeadm create config. The default name for this file is admin-ws-config.yaml.

When you created your current admin workstation, gkeadm created an information file for you. The default name of this file is the same as the name of your current admin workstation.

Locate your admin workstation configuration file and your information file. You need them to do the steps in this guide. If these files are in your current directory and they have their default names, then you won't need to specify them when you run gkeadm upgrade admin-workstation. If these files are in another directory, or if you have changed the filenames, then you specify them by using the --config and --info-file flags.

Upgrading your admin workstation

To upgrade your admin workstation, first download a new version of the gkeadm tool, and then use it to upgrade the configuration of your admin workstation. The version of gkeadm must match the target version of your upgrade.

Downloading gkeadm

To download the appropriate version of gkeadm, follow the instructions on the Downloads page.

Upgrading your admin workstation

gkeadm upgrade admin-workstation --config [AW_CONFIG_FILE] --info-file [INFO_FILE]

where:

  • [AW_CONFIG_FILE] is the path of your admin workstation configuration file. You can omit this flag if the file is in your current directory and has the name admin-ws-config.yaml.

  • [INFO_FILE] is the path of your information file. You can omit this flag if the file is in your current directory. The default name of this file is the same as the name of your admin workstation.

The preceding command performs the following tasks:

  • Back up all files in the home directory of your current admin workstation. These include:

    • Your Anthos clusters on VMware configuration file. The default name of this file is config.yaml.

    • The kubeconfig files for your admin cluster and your user clusters.

    • The root certificate for your vCenter server. Note that this file must have owner read and owner write permission.

    • The JSON key file for your component access service account. Note that this file must have owner read and owner write permission.

    • The JSON key files for your connect-register, connect-agent, and logging-monitoring service accounts.

  • Create a new admin workstation, and copy all the backed-up files to the new admin workstation.

  • Delete the old admin workstation.

Removing the old admin workstation from known_hosts

If your admin workstation has a static IP address, you need to remove your old admin workstation from the known_hosts file after upgrading your admin workstation.

To remove the old admin workstation from known_hosts:

ssh-keygen -R [ADMIN_WS_IP]

where [ADMIN_WS_IP] is the IP address of your admin workstation.

Updating the node OS image and Docker images

On your new admin workstation, run the following command:

gkectl prepare --config [ADMIN_CONFIG] [FLAGS]

where:

  • [ADMIN_CONFIG] is the path of the admin cluster configuration file.

  • [FLAGS] is an optional set of flags. For example, you could include the --skip-validation-infra flag to skip checking of your vSphere infrastructure.

The preceding command performs the following tasks:

  • If necessary, copy a new node OS image to your vSphere environment, and mark the OS image as a template.

  • If you have configured a private Docker registry, push updated Docker images to your private Docker registry.

Verify that enough IP addresses are available

Do the steps in this section on your new admin workstation.

Before you upgrade, be sure that you have enough IP addresses available for your clusters.

DHCP

When you upgrade the admin cluster, Anthos clusters on VMware creates one temporary node in the admin cluster. When you upgrade a user cluster, Anthos clusters on VMware creates a temporary node in that user cluster. The purpose of the temporary node is to ensure uninterrupted availability. Before you upgrade a cluster, make sure that your DHCP server can provide enough IP addresses for the temporary node. For more information, see IP addresses needed for admin and user clusters.

Static IPs

When you upgrade the admin cluster, Anthos clusters on VMware creates one temporary node in the admin cluster. When you upgrade a user cluster, Anthos clusters on VMware creates a temporary node in that user cluster. The purpose of the temporary node is to ensure uninterrupted availability. Before you upgrade a cluster, verify that you have reserved enough IP addresses. For each cluster, you must reserve at least one more IP address than the number of cluster nodes. For more information, see Configuring static IP addresses.

Determine the number of nodes in your admin cluster:

kubectl --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] get nodes

where [ADMIN_CLUSTER_KUBECONFIG] is the path of your admin cluster's kubeconfig file.

Next, view the addresses reserved for your admin cluster:

kubectl get cluster --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] -o yaml

In the output, in the reservedAddresses field, you can see the number of IP addresses that are reserved for the admin cluster nodes. For example, the following output shows that there are five IP addresses reserved for the admin cluster nodes:

...
reservedAddresses:
- gateway: 21.0.135.254
  hostname: admin-node-1
  ip: 21.0.133.41
  netmask: 21
- gateway: 21.0.135.254
  hostname: admin-node-2
  ip: 21.0.133.50
  netmask: 21
- gateway: 21.0.135.254
  hostname: admin-node-3
  ip: 21.0.133.56
  netmask: 21
- gateway: 21.0.135.254
  hostname: admin-node-4
  ip: 21.0.133.47
  netmask: 21
- gateway: 21.0.135.254
  hostname: admin-node-5
  ip: 21.0.133.44
  netmask: 21

The number of reserved IP addresses should be at least one more than the number of nodes in the admin cluster. If this is not the case, you can reserve an additional address by editing the Cluster object.

Open the Cluster object for editing:

kubectl edit cluster --kubeconfig [ADMIN_CLUSTER_KUBECONFIG]

Under reservedAddresses, add an additional block that has gateway, hostname, ip, and netmask.

To determine the number of nodes in a user cluster:

kubectl --kubeconfig [USER_CLUSTER_KUBECONFIG] get nodes

where [USER_CLUSTER_KUBECONFIG] is the path of your user cluster's kubeconfig file.

To view the addresses reserved for a user cluster:

kubectl get cluster --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] \
-n [USER_CLUSTER_NAME] [USER_CLUSTER_NAME] -o yaml

where:

  • [ADMIN_CLUSTER_KUBECONFIG] is the path of your admin cluster's kubeconfig file.

  • [USER_CLUSTER_NAME] is the name of the user cluster.

The number of reserved IP addresses should be at least one more than the number of nodes in the user cluster. If this is not the case, perform the following steps:

  • Open the user cluster's IP block file for editing.

  • Add an additional IP address to the block, and close the file.

  • Update the user cluster:

    gkectl update cluster --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] \
      --config [USER_CLUSTER_CONFIG]
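
The static IP check in this section (reserved addresses must number at least one more than the cluster's nodes) can be scripted. The following is a sketch, not an official tool: the function names are hypothetical, and the parser assumes the reservedAddresses list is laid out as in the sample output above, with one ip field per entry.

```shell
# Count the entries in the reservedAddresses list of a Cluster object read from stdin.
# Assumption: the YAML layout matches the sample output shown in this section.
count_reserved_addresses() {
  awk '
    /^ *reservedAddresses:/ { match($0, /^ */); key_indent = RLENGTH; in_list = 1; next }
    in_list {
      match($0, /^ */); indent = RLENGTH
      # The list ends at the first non-item line indented no deeper than the key.
      if (indent <= key_indent && substr($0, indent + 1, 1) != "-") { in_list = 0; next }
      # Each entry contributes exactly one "ip:" line.
      if ($0 ~ /^[ -]*ip:/) count++
    }
    END { print count + 0 }
  '
}

# Succeed if there is at least one spare address for the temporary upgrade node.
# Usage: has_spare_address NODE_COUNT RESERVED_COUNT
has_spare_address() {
  [ "$2" -ge $(( $1 + 1 )) ]
}

# Example for the admin cluster, using the commands from this section:
#   nodes=$(kubectl --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] get nodes --no-headers | wc -l)
#   reserved=$(kubectl get cluster --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] -o yaml | count_reserved_addresses)
#   has_spare_address "$nodes" "$reserved" || echo "reserve another IP address before upgrading"
```

With the five-entry sample output above, count_reserved_addresses prints 5, so the check passes for an admin cluster of up to four nodes.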
    

(Optional) Disabling new vSphere features

A new Anthos clusters on VMware version might include new features or support for specific VMware vSphere features. Sometimes, upgrading to an Anthos clusters on VMware version automatically enables such features. You can learn about new features in the Anthos clusters on VMware release notes. New features are sometimes surfaced in the Anthos clusters on VMware configuration file.

If you need to disable a new feature that is automatically enabled in a new Anthos clusters on VMware version and driven by the configuration file, perform the following steps before you upgrade your cluster:

  1. From your upgraded admin workstation, create a new configuration file with a different name from your current configuration file:

    gkectl create-config --config [CONFIG_NAME]

  2. Open the new configuration file and make a note of the feature's field. Close the file.

  3. Open your current configuration file and add the new feature's field. Set the value of the field to false or equivalent.

  4. Save the configuration file.

Review the Release notes before you upgrade your clusters. You cannot declaratively change an existing cluster's configuration after you upgrade it.

Upgrading your admin cluster

Do the steps in this section on your new admin workstation.

Recall that the target version of your upgrade must be the same as your gkeadm version.

Run the following command:

gkectl upgrade admin \
    --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] \
    --config [CONFIG_FILE] \
    [FLAGS]

where:

  • [ADMIN_CLUSTER_KUBECONFIG] is the path of your admin cluster's kubeconfig file.

  • [CONFIG_FILE] is the path of your admin cluster configuration file.

  • [FLAGS] is an optional set of flags. For example, you could include the --skip-validation-infra flag to skip checking of your vSphere infrastructure.

Upgrading a user cluster

Do the steps in this section on your new admin workstation.

Recall that the target version of your upgrade must be the same as your gkeadm version.

gkectl

gkectl upgrade cluster \
    --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] \
    --config [CONFIG_FILE] \
    --cluster-name [CLUSTER_NAME] \
    [FLAGS]

where:

  • [ADMIN_CLUSTER_KUBECONFIG] is the path of the admin cluster's kubeconfig file.

  • [CLUSTER_NAME] is the name of the user cluster you're upgrading.

  • [CONFIG_FILE] is the path of the user cluster configuration file.

  • [FLAGS] is an optional set of flags. For example, you could include the --skip-validation-infra flag to skip checking of your vSphere infrastructure.
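
If you have several user clusters, the command above can be wrapped in a loop that upgrades them one at a time and stops at the first failure. This is a sketch under stated assumptions: the function name upgrade_user_clusters and the GKECTL override variable are hypothetical conveniences, not part of gkectl; only the flags shown above are used.

```shell
# Upgrade user clusters sequentially, stopping at the first failure.
# Usage: upgrade_user_clusters ADMIN_CLUSTER_KUBECONFIG < pairs.txt
# Each input line holds "CLUSTER_NAME CONFIG_FILE".
# GKECTL is a hypothetical override; set GKECTL=echo to preview the commands.
upgrade_user_clusters() {
  admin_kubeconfig=$1
  while read -r cluster_name config_file; do
    # Redirect stdin so the command cannot consume the remaining input lines.
    "${GKECTL:-gkectl}" upgrade cluster \
      --kubeconfig "$admin_kubeconfig" \
      --config "$config_file" \
      --cluster-name "$cluster_name" </dev/null || return 1
  done
}
```

Running the function with GKECTL=echo prints each gkectl upgrade cluster invocation instead of executing it, which is a cheap way to review the order and arguments first.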

Console

You can choose to register your user clusters with Cloud Console during installation or after you've created them. You can view and log in to your registered Anthos clusters and your GKE clusters in the Cloud Console.

When an upgrade becomes available for a user cluster, a notification appears in Cloud Console. Clicking this notification displays a list of available versions and a gkectl command you can run to upgrade the cluster:

  1. Visit the Google Kubernetes Engine page in the Cloud Console.


  2. Under the Notifications column for the user cluster, click Upgrade available, if available.

  3. Copy the gkectl upgrade cluster command.

  4. On your admin workstation, run the gkectl upgrade cluster command, where [ADMIN_CLUSTER_KUBECONFIG] is the path of the admin cluster's kubeconfig file, [CLUSTER_NAME] is the name of the user cluster you're upgrading, and [CONFIG_FILE] is the path of the user cluster configuration file.

Resuming an upgrade

If a user cluster upgrade is interrupted, you can resume the upgrade by running the same upgrade command with the --skip-validation-all flag:

gkectl upgrade cluster \
    --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] \
    --config [CONFIG_FILE] \
    --cluster-name [CLUSTER_NAME] \
    --skip-validation-all

Resuming an admin cluster upgrade

You shouldn't interrupt an admin cluster upgrade. Admin cluster upgrades aren't always resumable. If an admin cluster upgrade is interrupted for any reason, you should contact support for assistance.

Creating new user clusters after an upgrade

After you upgrade your admin workstation and your admin cluster, any new user clusters that you create must have the same version as the upgrade target version.

VMware DRS rules

Anthos clusters on VMware automatically creates VMware Distributed Resource Scheduler (DRS) anti-affinity rules for your user cluster's nodes, causing them to be spread across at least three physical hosts in your datacenter. This feature is automatically enabled for new and existing clusters.

Before you upgrade, be sure that your vSphere environment meets the following conditions:

If your vSphere environment does not meet the preceding conditions, you can still upgrade, but for upgrading a user cluster from 1.3.x to 1.4.x, you need to disable anti-affinity groups. For more information, see the release notes for version 1.4.0.

Downtime

About downtime during upgrades

Resource Description
Admin cluster

When an admin cluster is down, user cluster control planes and workloads on user clusters continue to run, unless they were affected by a failure that caused the downtime.

User cluster control plane

Typically, you should expect no noticeable downtime to user cluster control planes. However, long-running connections to the Kubernetes API server might break and would need to be re-established. In those cases, the API caller should retry until it establishes a connection. In the worst case, there can be up to one minute of downtime during an upgrade.

User cluster nodes

If an upgrade requires a change to user cluster nodes, Anthos clusters on VMware recreates the nodes in a rolling fashion, and reschedules Pods running on these nodes. You can prevent impact to your workloads by configuring appropriate PodDisruptionBudgets and anti-affinity rules.
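
As one way to set up the PodDisruptionBudget mentioned above, kubectl can create one directly from the command line. The wrapper below is a sketch: the function name create_pdb, the KUBECTL override variable, and the example names web-pdb and app=web are all hypothetical, and the right minimum depends on how many replicas your workload runs.

```shell
# Create a PodDisruptionBudget that keeps at least MIN_AVAILABLE matching Pods
# running while nodes are recreated during an upgrade.
# Usage: create_pdb USER_CLUSTER_KUBECONFIG PDB_NAME SELECTOR MIN_AVAILABLE
# KUBECTL is a hypothetical override; set KUBECTL=echo to preview the command.
create_pdb() {
  "${KUBECTL:-kubectl}" --kubeconfig "$1" create poddisruptionbudget "$2" \
    --selector "$3" --min-available "$4"
}

# Example (all names are placeholders):
#   create_pdb [USER_CLUSTER_KUBECONFIG] web-pdb app=web 2
```

A budget like this only helps when the workload has more replicas than the minimum; otherwise node draining cannot evict any of its Pods.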

Known issues

See Known issues.

Troubleshooting

See Troubleshooting cluster creation and upgrade.