Version 1.8. This version is supported as outlined in the Anthos version support policy, offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware (GKE on-prem). Refer to the release notes for more details. This is the most recent version.

Upgrading Anthos clusters on VMware

This page explains how to upgrade Anthos clusters on VMware (GKE on-prem).

Target versions

Starting with Anthos clusters on VMware version 1.3.2, you can upgrade directly to any version that is in the same minor release or the next minor release. For example, you can upgrade from 1.3.2 to 1.3.5, or from 1.5.2 to 1.6.1.

If your current version is lower than 1.3.2, then you must do sequential upgrades to reach version 1.3.2 first. For example, to upgrade from 1.3.0 to 1.3.2, you must first upgrade from 1.3.0 to 1.3.1, and then from 1.3.1 to 1.3.2.

If you are upgrading from version 1.3.2 or later to a version that is not part of the next minor release, you must upgrade through one version of each minor release between your current version and your desired version. For example, if you are upgrading from version 1.3.2 to version 1.6.1, it is not possible to upgrade directly. You must first upgrade from version 1.3.2 to version 1.4.x, where x represents any patch release under that minor release. You can then upgrade to version 1.5.x, and finally to version 1.6.1.

From version 1.7, you can use any patch version within the two minor version range. The admin cluster can be the same minor version as user clusters, or one minor version lower than user clusters.

Overview of the current upgrade process

Starting from version 1.7, the default upgrade process has changed. You first upgrade the admin workstation, and then the user clusters, and lastly, the admin cluster. Also, starting from version 1.7, you do not have to upgrade the admin cluster immediately after upgrading the user clusters if you want to keep the admin cluster on its current version.

  1. Download the gkeadm tool. The version of gkeadm must be the same as the target version of your upgrade.
  2. Upgrade your admin workstation.
  3. From your admin workstation, upgrade your user clusters.
  4. From your admin workstation, upgrade your admin cluster.
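The flow above can be sketched as a shell sequence. The file names here (admin-ws-config.yaml, user-cluster.yaml, admin-cluster.yaml, kubeconfig) are assumptions for illustration, and the run helper only prints each command so you can review the order before executing anything:

```shell
# Dry-run sketch of the 1.7+ upgrade order; file names are assumptions.
# 'run' prints each command instead of executing it.
run() { echo "+ $*"; }

# Steps 1-2: upgrade the admin workstation with a gkeadm that matches
# the target version of the upgrade.
run gkeadm upgrade admin-workstation --config admin-ws-config.yaml

# Step 3: from the new admin workstation, upgrade each user cluster.
run gkectl upgrade cluster --kubeconfig kubeconfig --config user-cluster.yaml

# Step 4: upgrade the admin cluster. This can be deferred; from 1.7,
# the admin cluster can stay one minor version behind the user clusters.
run gkectl upgrade admin --kubeconfig kubeconfig --config admin-cluster.yaml
```

To perform a real upgrade, replace the run helper with direct execution of each command.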

Upgrade process for 1.6.x and earlier

For 1.6.x and earlier, the upgrade process is as follows. You can still follow this order in version 1.7 or later by using parameter flags, but the process is deprecated and will stop working in a future version. You can also follow this upgrade process if your current setup is at 1.5.x or lower and you want to bring it to 1.6.x so that you can proceed with a version 1.7 or later upgrade.

  1. Download the gkeadm tool. The version of gkeadm must be the same as the target version of your upgrade.
  2. Use gkeadm to upgrade your admin workstation.
  3. From your admin workstation, upgrade your admin cluster.
  4. From your admin workstation, upgrade your user clusters.

Suppose your admin workstation, admin cluster, and user clusters currently use version 1.7.x, and you want to upgrade both your admin cluster and your user clusters to version 1.8.x. If you follow an upgrade path like the following, using a canary cluster for testing before you proceed further, you minimize the risk of disruption.

The following is a high-level overview of a recommended upgrade process. Before you begin, create a canary user cluster that uses version 1.7.x, if you have not done so already.

  1. Test version 1.8.x in a canary cluster.
    • Upgrade the admin workstation to version 1.8.x.
    • Run the gkectl prepare command, as described later on this page, to set up the upgrade.
    • Upgrade the canary user cluster to version 1.8.x.
  2. Upgrade all production user clusters to version 1.8.x when you are confident in version 1.8.x.
  3. Upgrade the admin cluster to version 1.8.x.

Locating your configuration and information files to prepare for upgrade

When you created your current admin workstation prior to version 1.8, you filled in an admin workstation configuration file that was generated by gkeadm create config. The default name for this file is admin-ws-config.yaml.

In addition, gkeadm created an information file for you. The default name of this file is the same as the name of your current admin workstation.

Locate your admin workstation configuration file and your information file. You need them to do the steps in this guide. If these files are in your current directory and they have their default names, then you won't need to specify them when you run the upgrade commands. If these files are in another directory, or if you have changed the filenames, then you specify them by using the --config and --info-file flags.
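As a sketch of the flag rules just described, the following hypothetical helper builds the gkeadm upgrade admin-workstation command line, adding --config and --info-file only when non-default paths are supplied (the helper name aw_upgrade_cmd and the sample paths are illustrative, not part of gkeadm):

```shell
# Hypothetical helper: emit the gkeadm upgrade command, including
# --config/--info-file only when non-default paths are supplied.
aw_upgrade_cmd() {
  local config=$1 info=$2
  local cmd="gkeadm upgrade admin-workstation"
  [ -n "$config" ] && cmd="$cmd --config $config"
  [ -n "$info" ] && cmd="$cmd --info-file $info"
  echo "$cmd"
}
```

For example, aw_upgrade_cmd '' '' prints the bare command for the default-name case, while aw_upgrade_cmd /tmp/admin-ws-config.yaml /tmp/gke-admin-ws adds both flags.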

Upgrading your admin workstation

Make sure your gkectl and clusters are at the appropriate version level for an upgrade, and that you have downloaded the appropriate bundle.

Upgrading your admin workstation configuration

gkeadm upgrade admin-workstation --config [AW_CONFIG_FILE] --info-file [INFO_FILE]

where:

  • [AW_CONFIG_FILE] is the path of your admin workstation configuration file. You can omit this flag if the file is in your current directory and has the name admin-ws-config.yaml.

  • [INFO_FILE] is the path of your information file. You can omit this flag if the file is in your current directory. The default name of this file is the same as the name of your admin workstation.

The preceding command performs the following tasks:

  • Backs up all files in the home directory of your current admin workstation. These include:

    • Your Anthos clusters on VMware configuration file. The default name of this file is config.yaml.

    • The kubeconfig files for your admin cluster and your user clusters.

    • The root certificate for your vCenter server. Note that this file must have owner read and owner write permission.

    • The JSON key file for your component access service account. Note that this file must have owner read and owner write permission.

    • The JSON key files for your connect-register, connect-agent, and logging-monitoring service accounts.

  • Creates a new admin workstation, and copies all the backed-up files to the new admin workstation.

  • Deletes the old admin workstation.

Verify that enough IP addresses are available

Do the steps in this section on your new admin workstation.

Before you upgrade, be sure that you have enough IP addresses available for your clusters. You can set aside additional IPs as needed, as described for each of DHCP and static IPs.

DHCP

When you upgrade the admin cluster, Anthos clusters on VMware creates one temporary node in the admin cluster. When you upgrade a user cluster, Anthos clusters on VMware creates a temporary node in that user cluster. The purpose of the temporary node is to ensure uninterrupted availability. Before you upgrade a cluster, make sure that your DHCP server can provide enough IP addresses for the temporary node. For more information, see IP addresses needed for admin and user clusters.

Static IPs

When you upgrade the admin cluster, Anthos clusters on VMware creates one temporary node in the admin cluster. When you upgrade a user cluster, Anthos clusters on VMware creates a temporary node in that user cluster. The purpose of the temporary node is to ensure uninterrupted availability. Before you upgrade a cluster, verify that you have reserved enough IP addresses. For each cluster, you must reserve at least one more IP address than the number of cluster nodes. For more information, see Configuring static IP addresses.

Determine the number of nodes in your admin cluster:

kubectl --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] get nodes

where [ADMIN_CLUSTER_KUBECONFIG] is the path of your admin cluster's kubeconfig file.

Next, view the addresses reserved for your admin cluster:

kubectl get cluster --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] -o yaml

In the output, in the reservedAddresses field, you can see the number of IP addresses that are reserved for the admin cluster nodes. For example, the following output shows that there are five IP addresses reserved for the admin cluster nodes:

...
reservedAddresses:
- gateway: 21.0.135.254
  hostname: admin-node-1
  ip: 21.0.133.41
  netmask: 21
- gateway: 21.0.135.254
  hostname: admin-node-2
  ip: 21.0.133.50
  netmask: 21
- gateway: 21.0.135.254
  hostname: admin-node-3
  ip: 21.0.133.56
  netmask: 21
- gateway: 21.0.135.254
  hostname: admin-node-4
  ip: 21.0.133.47
  netmask: 21
- gateway: 21.0.135.254
  hostname: admin-node-5
  ip: 21.0.133.44
  netmask: 21

The number of reserved IP addresses should be at least one more than the number of nodes in the admin cluster.
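The arithmetic behind this check can be captured in a small helper, a sketch that assumes you pass it the node count and reserved-address count you read from the kubectl output above (the function name enough_ips is hypothetical):

```shell
# Succeeds when the reserved pool can absorb the one temporary node
# created during an upgrade (reserved >= nodes + 1); otherwise reports
# how many more addresses to reserve.
enough_ips() {
  local nodes=$1 reserved=$2
  if [ "$reserved" -ge $((nodes + 1)) ]; then
    echo "ok: $reserved reserved for $nodes nodes"
  else
    echo "add $((nodes + 1 - reserved)) more address(es)"
  fi
}
```

For example, enough_ips 4 5 succeeds, while enough_ips 5 5 asks you to reserve one more address before upgrading.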

For version 1.7, to add IP addresses to the admin cluster:

First, edit the IP block file, as shown in this example.

blocks:
- netmask: "255.255.252.0"
  ips:
  - ip: 172.16.20.10
    hostname: admin-host1
  - ip: 172.16.20.11
    hostname: admin-host2
  # Newly-added IPs.
  - ip: 172.16.20.12
    hostname: admin-host3

Next, run this command to update the configuration.

gkectl update admin --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] --config [ADMIN_CONFIG_FILE]

where:

  • [ADMIN_CLUSTER_KUBECONFIG] is the path of your admin cluster's kubeconfig file.

  • [ADMIN_CONFIG_FILE] is the path of your admin cluster configuration file. You can omit this flag if the file is in your current directory and has the name admin-config.yaml.

You can only add IP addresses; you cannot remove any.

For versions prior to 1.7, you can add an additional address by editing the Cluster object directly.

Open the Cluster object for editing:

kubectl edit cluster --kubeconfig [ADMIN_CLUSTER_KUBECONFIG]

Under reservedAddresses, add an additional block that has gateway, hostname, ip, and netmask.
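For example, continuing the sample output shown earlier, the appended block might look like the following (the hostname and addresses are hypothetical):

```yaml
- gateway: 21.0.135.254
  hostname: admin-node-6
  ip: 21.0.133.60
  netmask: 21
```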

Important: Starting from 1.5.0, the same procedure does not work for user clusters and you must use gkectl update cluster for each of them.

To determine the number of nodes in a user cluster:

kubectl --kubeconfig [USER_CLUSTER_KUBECONFIG] get nodes

where [USER_CLUSTER_KUBECONFIG] is the path of your user cluster's kubeconfig file.

To view the addresses reserved for a user cluster:

kubectl get cluster --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] \
-n [USER_CLUSTER_NAME] [USER_CLUSTER_NAME] -o yaml

where:

  • [ADMIN_CLUSTER_KUBECONFIG] is the path of your admin cluster's kubeconfig file.

  • [USER_CLUSTER_NAME] is the name of the user cluster.

The number of reserved IP addresses should be at least one more than the number of nodes in the user cluster. If this is not the case, open the user cluster's IP block file for editing:

  • If any of the addresses reserved for a user cluster are included in the hostconfig file, add them to the corresponding block based on netmask and gateway.

  • Add as many additional static IP addresses to the corresponding block as required, and then run gkectl update cluster.

(Optional) Disabling new vSphere features

A new Anthos clusters on VMware version might include new features or support for specific VMware vSphere features. Sometimes, upgrading to an Anthos clusters on VMware version automatically enables such features. You can learn about new features in the Anthos clusters on VMware release notes. New features are sometimes surfaced in the Anthos clusters on VMware configuration file.

If you need to disable a new feature that is automatically enabled in a new Anthos clusters on VMware version and driven by the configuration file, perform the following steps before you upgrade your cluster:

  1. From your upgraded admin workstation, create a new configuration file with a different name from your current configuration file:

    gkectl create-config --config [CONFIG_NAME]
  2. Open the new configuration file and make a note of the feature's field. Close the file.

  3. Open your current configuration file and add the new feature's field. Set the value of the field to false or equivalent.

  4. Save the configuration file.

Review the Release notes before you upgrade your clusters. You cannot declaratively change an existing cluster's configuration after you upgrade it.

Install bundle for upgrade

To make a version available for cluster creation or upgrade, you must install the corresponding bundle. Follow these steps to install a bundle for TARGET_VERSION, which is the number of the version to which you want to upgrade.

To check the current gkectl and cluster versions, run this command. Use the flag --details/-d for more detailed information.

gkectl version --kubeconfig ADMIN_CLUSTER_KUBECONFIG --details

Here is example output:

gkectl version: 1.7.2-gke.2 (git-5b8ef94a3)
onprem user cluster controller version: 1.6.2-gke.0
current admin cluster version: 1.6.2-gke.0
current user cluster versions (VERSION: CLUSTER_NAMES):
- 1.6.2-gke.0: user-cluster1
available admin cluster versions:
- 1.6.2-gke.0
available user cluster versions:
- 1.6.2-gke.0
- 1.7.2-gke.2
Info: The admin workstation and gkectl is NOT ready to upgrade to "1.8" yet, because there are "1.6" clusters.
Info: The admin cluster can't be upgraded to "1.7", because there are still "1.6" user clusters.

Based on the output you get, look for the following issues, and fix them as needed.

  • If the gkectl version is lower than 1.7, the new upgrade flow is not available directly. Follow the original upgrade flow to upgrade all your clusters to 1.6, and then upgrade your admin workstation to 1.7 to start using the new upgrade flow.

  • If the current admin cluster version is more than one minor version lower than the TARGET_VERSION, upgrade all your clusters to be one minor version lower than the TARGET_VERSION.

  • If the gkectl version is lower than the TARGET_VERSION, upgrade the admin workstation to the TARGET_VERSION, following the instructions.

When you have determined that your gkectl and cluster versions are appropriate for an upgrade, download the bundle.

Check whether the bundle tarball already exists on the admin workstation.

stat /var/lib/gke/bundles/gke-onprem-vsphere-TARGET_VERSION.tgz

If the bundle is not on the admin workstation, download it.

gsutil cp gs://gke-on-prem-release/gke-onprem-bundle/TARGET_VERSION/gke-onprem-vsphere-TARGET_VERSION.tgz /var/lib/gke/bundles/

Install the bundle.

gkectl prepare --bundle-path /var/lib/gke/bundles/gke-onprem-vsphere-TARGET_VERSION.tgz --kubeconfig ADMIN_CLUSTER_KUBECONFIG

where:

  • ADMIN_CLUSTER_KUBECONFIG is the path of your admin cluster's kubeconfig file. You can omit this flag if the file is in your current directory and has the name kubeconfig.
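The check, download, and install steps can be combined as follows. The version value here is an assumption for illustration, and the run helper only prints the gsutil and gkectl commands so the sketch is safe to execute as-is:

```shell
# Combined sketch: download the bundle only if the tarball is missing,
# then install it. The version value is an assumption; 'run' prints the
# commands instead of executing them.
TARGET_VERSION=1.8.0-gke.1
BUNDLE=/var/lib/gke/bundles/gke-onprem-vsphere-$TARGET_VERSION.tgz
run() { echo "+ $*"; }

if [ ! -f "$BUNDLE" ]; then
  run gsutil cp "gs://gke-on-prem-release/gke-onprem-bundle/$TARGET_VERSION/gke-onprem-vsphere-$TARGET_VERSION.tgz" /var/lib/gke/bundles/
fi
run gkectl prepare --bundle-path "$BUNDLE" --kubeconfig kubeconfig
```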

List available cluster versions, and make sure the target version is included in the available user cluster versions.

gkectl version --kubeconfig ADMIN_CLUSTER_KUBECONFIG --details

You can now create a user cluster at the target version, or upgrade a user cluster to the target version.

Upgrading a user cluster

Do the steps in this section on your admin workstation.

gkectl

gkectl upgrade cluster \
--kubeconfig [ADMIN_CLUSTER_KUBECONFIG] \
--config [USER_CLUSTER_CONFIG_FILE] \
[FLAGS]

where:

  • [ADMIN_CLUSTER_KUBECONFIG] is the admin cluster's kubeconfig file.

  • [USER_CLUSTER_CONFIG_FILE] is the Anthos clusters on VMware user cluster configuration file on your new admin workstation.

  • [FLAGS] is an optional set of flags. For example, you could include the --skip-validation-infra flag to skip checking of your vSphere infrastructure.

Console

You can choose to register your user clusters with Cloud Console during installation or after you've created them. You can view and log in to your registered Anthos clusters on VMware clusters and your Google Kubernetes Engine clusters from Cloud Console's GKE menu.

When an upgrade becomes available for Anthos clusters on VMware user clusters, a notification appears in Cloud Console. Clicking this notification displays a list of available versions and a gkectl command you can run to upgrade the cluster:

  1. Visit the GKE menu in Cloud Console.

    Visit the GKE menu

  2. Under the Notifications column for the user cluster, click Upgrade available, if available.

  3. Copy the gkectl upgrade cluster command.

  4. From your admin workstation, run the gkectl upgrade cluster command, where [ADMIN_CLUSTER_KUBECONFIG] is the admin cluster's kubeconfig file, and [USER_CLUSTER_CONFIG_FILE] is the Anthos clusters on VMware user cluster configuration file on your new admin workstation.

Resuming an upgrade

If a user cluster upgrade is interrupted, you can resume the user cluster upgrade by running the same upgrade command with the --skip-validation-all flag:

gkectl upgrade cluster \
--kubeconfig [ADMIN_CLUSTER_KUBECONFIG] \
--config [USER_CLUSTER_CONFIG_FILE] \
--skip-validation-all

Upgrading your admin cluster

Do the steps in this section on your new admin workstation. Make sure your gkectl and clusters are at the appropriate version level for an upgrade, and that you have downloaded the appropriate bundle.

The target version of your upgrade must not be higher than your gkectl version, and must be at most one minor version lower than your gkectl version. For example, if your gkectl version is 1.7, the target version of your upgrade can be anything from 1.6.x to 1.7. The admin cluster can only be upgraded to a minor version when all user clusters have been upgraded to that minor version. For example, if you attempt to upgrade the admin cluster to version 1.7 while there are still 1.6.2 user clusters, you get an error:

admin cluster can't be upgraded to "1.7.0-gke.0" yet, because there are still user clusters at "1.6.2-gke.0".
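The gate that produces this error can be sketched as a minor-version check. The parsing and function names below are illustrative, not the actual gkectl logic, and assume versions of the form 1.7.0-gke.0:

```shell
# Sketch of the admin-upgrade gate: the admin cluster may move to a
# minor version only after every user cluster is on that minor.
# Versions look like "1.7.0-gke.0"; this parsing is an assumption.
minor() { echo "$1" | cut -d. -f1-2; }

admin_upgrade_ok() {
  local target=$1; shift            # remaining args: user cluster versions
  local v
  for v in "$@"; do
    if [ "$(minor "$v")" != "$(minor "$target")" ]; then
      echo "blocked: user cluster at $v"
      return 1
    fi
  done
  echo "ok to upgrade admin cluster to $target"
}
```

For example, admin_upgrade_ok 1.7.0-gke.0 1.6.2-gke.0 is blocked, while admin_upgrade_ok 1.7.0-gke.0 1.7.1-gke.2 succeeds.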

Run the following command:

gkectl upgrade admin \
--kubeconfig [ADMIN_CLUSTER_KUBECONFIG] \
--config [ADMIN_CLUSTER_CONFIG_FILE] \
[FLAGS]

where:

  • [ADMIN_CLUSTER_KUBECONFIG] is the admin cluster's kubeconfig file.

  • [ADMIN_CLUSTER_CONFIG_FILE] is the Anthos clusters on VMware admin cluster configuration file on your new admin workstation.

  • [FLAGS] is an optional set of flags. For example, you could include the --skip-validation-infra flag to skip checking of your vSphere infrastructure. Use the --force-upgrade-admin flag to revert to the old upgrade flow where the admin cluster is updated first, and then the user clusters.

If you downloaded a full bundle, and you have successfully run the gkectl prepare and gkectl upgrade admin commands, you should now delete the full bundle to save disk space on the admin workstation. Use this command:

rm /var/lib/gke/bundles/gke-onprem-vsphere-${TARGET_VERSION}-full.tgz

Resuming an admin cluster upgrade

You shouldn't interrupt an admin cluster upgrade. Currently, admin cluster upgrades aren't always resumable. If an admin cluster upgrade is interrupted for any reason, you should contact Google Support for assistance.

Troubleshooting the upgrade process

If you experience issues when following the recommended upgrade process, follow these recommendations to resolve them. These suggestions assume that you began with a version 1.6.2 setup and are proceeding through the recommended upgrade process.

Troubleshooting a user cluster upgrade issue

Suppose you find an issue with 1.7 when testing the canary cluster, or upgrading a user cluster. You determine from Google Support that the issue will be fixed in an upcoming patch release 1.7.x. You can proceed as follows:

  1. Continue using 1.6.2 for production.
  2. Test the 1.7.x patch release in a canary cluster when it is released.
  3. Upgrade all production user clusters to 1.7.x when you are confident with it.
  4. Upgrade the admin cluster to 1.7.x.

Managing a 1.6.x patch release when testing 1.7

Suppose you are in the process of testing or migrating to 1.7, but not confident with it yet, and your admin cluster still uses 1.6.2. You find that a significant 1.6.x patch release has been released. You can still take advantage of this 1.6.x patch release while continuing to test 1.7. Follow this upgrade process:

  1. Install the 1.6.x-gke.0 bundle.
  2. Upgrade all 1.6.2 production user clusters to 1.6.x.
  3. Upgrade the admin cluster to 1.6.x.

Troubleshooting an admin cluster upgrade issue

If you encounter an issue when upgrading the admin cluster, you must contact Google Support to resolve the issue with the admin cluster.

In the meantime, with the new upgrade flow, you can still benefit from new user cluster features without being blocked by the admin cluster upgrade, which allows you to reduce the upgrade frequency of the admin cluster if you want. For example, you might want to use the Container-Optimized OS nodepool released in version 1.7. Your upgrade process can proceed as follows:

  1. Upgrade production user clusters to 1.7.
  2. Keep the admin cluster at 1.6 and continue receiving security patches.
  3. Test the admin cluster upgrade from 1.6 to 1.7 in a test environment, and report any issues you find.
  4. If your issue is solved by a 1.7 patch release, you can then choose to upgrade the production admin cluster from 1.6 to that 1.7 patch release.

Known issues

The following known issues affect upgrading clusters.

Upgrading the admin workstation might fail if the data disk is nearly full

If you upgrade the admin workstation with the gkectl upgrade admin-workstation command, the upgrade might fail if the data disk is nearly full, because the system attempts to back up the current admin workstation locally while upgrading to a new admin workstation. If you cannot clear sufficient space on the data disk, use the gkectl upgrade admin-workstation command with the additional flag --backup-to-local=false to prevent making a local backup of the current admin workstation.

Version 1.7.0: Changes to Anthos Config Management updates

In versions earlier than 1.7.0, Anthos clusters on VMware included the images required to install and upgrade Anthos Config Management. Beginning with 1.7.0, the Anthos Config Management software is no longer included in the Anthos clusters on VMware bundle, and you need to add it separately. If you were previously using Anthos Config Management on your cluster or clusters, the software is not upgraded until you take action.

To learn more about installing Anthos Config Management, see Installing Anthos Config Management.

Version 1.1.0-gke.6, 1.2.0-gke.6: stackdriver.proxyconfigsecretname field removed

The stackdriver.proxyconfigsecretname field was removed in version 1.1.0-gke.6. Anthos clusters on VMware's preflight checks will return an error if the field is present in your configuration file.

To work around this, before you upgrade to 1.2.0-gke.6, delete the proxyconfigsecretname field from your configuration file.

Stackdriver references old version

Before version 1.2.0-gke.6, a known issue prevents Stackdriver from updating its configuration after cluster upgrades. Stackdriver still references an old version, which prevents Stackdriver from receiving the latest features of its telemetry pipeline. This issue can make it difficult for Google Support to troubleshoot clusters.

After you upgrade clusters to 1.2.0-gke.6, run the following command against admin and user clusters:

kubectl --kubeconfig=[KUBECONFIG] \
-n kube-system --type=json patch stackdrivers stackdriver \
-p '[{"op":"remove","path":"/spec/version"}]'

where [KUBECONFIG] is the path to the cluster's kubeconfig file.

Disruption for workloads with PodDisruptionBudgets

Currently, upgrading clusters can cause disruption or downtime for workloads that use PodDisruptionBudgets (PDBs).

Version 1.2.0-gke.6: Prometheus and Grafana disabled after upgrading

In user clusters, Prometheus and Grafana get automatically disabled during upgrade. However, the configuration and metrics data are not lost. In admin clusters, Prometheus and Grafana stay enabled.

For instructions, refer to the Anthos clusters on VMware release notes.

Version 1.1.2-gke.0: Deleted user cluster nodes aren't removed from vSAN datastore

For instructions, refer to the Anthos clusters on VMware release notes.

Version 1.1.1-gke.2: Data disk in vSAN datastore folder can be deleted

If you're using a vSAN datastore, you need to create a folder in which to save the VMDK. A known issue requires that you provide the folder's universally unique identifier (UUID) path, rather than its file path, to vcenter.datadisk. This mismatch can cause upgrades to fail.

For instructions, refer to the Anthos clusters on VMware release notes.

Upgrading to version 1.1.0-gke.6 from version 1.0.2-gke.3: OIDC issue

Version 1.0.11, 1.0.1-gke.5, and 1.0.2-gke.3 clusters that have OpenID Connect (OIDC) configured cannot be upgraded to version 1.1.0-gke.6. This issue is fixed in version 1.1.1-gke.2.

If you configured a version 1.0.11, 1.0.1-gke.5, or 1.0.2-gke.3 cluster with OIDC during installation, you are not able to upgrade it. Instead, you should create new clusters.

Upgrading to version 1.0.2-gke.3 from version 1.0.11

Version 1.0.2-gke.3 introduces the following OIDC fields (usercluster.oidc). These fields enable logging in to a cluster from Cloud Console:

  • usercluster.oidc.kubectlredirecturl
  • usercluster.oidc.clientsecret
  • usercluster.oidc.usehttpproxy

If you want to use OIDC, the clientsecret field is required even if you don't want to log in to a cluster from Cloud Console. To use OIDC, you might need to provide a placeholder value for clientsecret:

oidc:
  clientsecret: "secret"

Nodes fail to complete their upgrade process

If you have Anthos Service Mesh or OSS Istio installed on your cluster, depending on your PodDisruptionBudget settings for the Istio components, user nodes might fail to upgrade to the control plane version after repeated attempts. To prevent this failure, we recommend that you increase the Horizontal Pod Autoscaling minReplicas setting from 1 to 2 for the components in the istio-system namespace before you upgrade. This will ensure that you always have an instance of the ASM control plane running.

If you have Anthos Service Mesh 1.5+ or OSS Istio 1.5+:

kubectl patch hpa -n istio-system istio-ingressgateway -p '{"spec":{"minReplicas": 2}}' --type=merge
kubectl patch hpa -n istio-system istiod -p '{"spec":{"minReplicas": 2}}' --type=merge

If you have Anthos Service Mesh 1.4.x or OSS Istio 1.4.x:

kubectl patch hpa -n istio-system istio-galley -p '{"spec":{"minReplicas": 2}}' --type=merge
kubectl patch hpa -n istio-system istio-ingressgateway -p '{"spec":{"minReplicas": 2}}' --type=merge
kubectl patch hpa -n istio-system istio-nodeagent -p '{"spec":{"minReplicas": 2}}' --type=merge
kubectl patch hpa -n istio-system istio-pilot -p '{"spec":{"minReplicas": 2}}' --type=merge
kubectl patch hpa -n istio-system istio-sidecar-injector -p '{"spec":{"minReplicas": 2}}' --type=merge

Appendix

About VMware DRS rules enabled in version 1.1.0-gke.6

As of version 1.1.0-gke.6, Anthos clusters on VMware automatically creates VMware Distributed Resource Scheduler (DRS) anti-affinity rules for your user cluster's nodes, causing them to be spread across at least three physical hosts in your datacenter. As of version 1.1.0-gke.6, this feature is automatically enabled for new clusters and existing clusters.

Before you upgrade, be sure that your vSphere environment meets the following conditions:

  • VMware DRS is enabled. VMware DRS requires the vSphere Enterprise Plus license edition. To learn how to enable DRS, see Enabling VMware DRS in a cluster.

  • The vSphere username provided in your credentials configuration file has the Host.Inventory.EditCluster permission.

  • There are at least three physical hosts available.

If your vSphere environment does not meet the preceding conditions, you can still upgrade, but for upgrading a user cluster from 1.3.x to 1.4.x, you need to disable anti-affinity groups. For more information, see this known issue in the Anthos clusters on VMware release notes.

About downtime during upgrades

Admin cluster: When an admin cluster is down, user cluster control planes and workloads on user clusters continue to run, unless they were affected by a failure that caused the downtime.

User cluster control plane: Typically, you should expect no noticeable downtime to user cluster control planes. However, long-running connections to the Kubernetes API server might break and would need to be re-established. In those cases, the API caller should retry until it establishes a connection. In the worst case, there can be up to one minute of downtime during an upgrade.

User cluster nodes: If an upgrade requires a change to user cluster nodes, Anthos clusters on VMware recreates the nodes in a rolling fashion, and reschedules Pods running on these nodes. You can prevent impact to your workloads by configuring appropriate PodDisruptionBudgets and anti-affinity rules.

Known issues

See Known issues.

Troubleshooting

See Troubleshooting cluster creation and upgrade.