Best practices for Google Distributed Cloud cluster upgrades

This document describes best practices and considerations to upgrade Google Distributed Cloud. You learn how to prepare for cluster upgrades, and the best practices to follow before the upgrade. These best practices help to reduce the risks associated with cluster upgrades.

If you have multiple environments such as test, development, and production, we recommend that you start with the least critical environment, such as test, and verify the upgrade functionality. After you verify that the upgrade was successful, move on to the next environment. Repeat this process until you upgrade your production environments. This approach lets you move from one critical point to the next, and verify that the upgrade and your workloads all run correctly.

Upgrade checklist

To make the upgrade process as smooth as possible, review and complete the following checks before you start to upgrade your clusters:

Plan the upgrade

Upgrades can be disruptive. Before you start the upgrade, plan carefully to make sure that your environment and applications are ready. You might also need to schedule the upgrade outside normal business hours, when traffic is at its lightest.

Estimate the time commitment and plan a maintenance window

By default, all node pools are upgraded in parallel. But within each node pool, the nodes are upgraded sequentially because each node must be drained and recreated. So the total time for an upgrade depends on the number of nodes in the largest node pool. To calculate a rough estimate for the upgrade time, multiply 15 minutes times the number of nodes in the largest node pool. For example, if you have 10 nodes in the largest pool, the total upgrade time would be about 15 * 10 = 150 minutes or 2.5 hours.
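The estimate described above can be sketched as a small helper. This is only an illustration of the rule of thumb: the function name and the `minutes_per_node` parameter are hypothetical, and real upgrade times vary with your environment.

```python
def estimate_upgrade_minutes(node_pool_sizes, minutes_per_node=15):
    """Rough upgrade time in minutes.

    Node pools upgrade in parallel and nodes within a pool upgrade
    sequentially, so the largest pool determines the total time.
    """
    if not node_pool_sizes:
        return 0
    return minutes_per_node * max(node_pool_sizes)


# Three node pools with 10, 4, and 6 nodes: the 10-node pool dominates.
minutes = estimate_upgrade_minutes([10, 4, 6])
print(f"{minutes} minutes, or {minutes / 60} hours")
```

Use the result as a lower bound when you size the maintenance window, and add time for verification.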

There are several ways to reduce upgrade time and make it easier to plan and schedule upgrades:

  • In version 1.28 and later, you can accelerate an upgrade by setting the value of maxSurge for individual node pools. When you upgrade nodes with maxSurge, multiple nodes upgrade in the same time that it takes to upgrade a single node.

  • If your clusters are at version 1.16 or later, you can skip a minor version when upgrading node pools. A skip-version upgrade takes half the time of upgrading node pools through two versions sequentially. Additionally, skip-version upgrades let you increase the time between the upgrades needed to stay on a supported version. Reducing the number of upgrades reduces workload disruptions and verification time. For more information, see Skip a version when upgrading node pools.

  • You can upgrade a user cluster's control plane separately from node pools. Having this flexibility can help you plan multiple, shorter maintenance windows instead of one long maintenance window to upgrade the entire cluster. For details, see Upgrade node pools.
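To see how maxSurge might shorten a maintenance window, the following sketch extends the 15-minutes-per-node rule of thumb. It assumes that a pool upgrades in batches of maxSurge nodes and that each batch takes about as long as a single node. That batching model, the function name, and the parameters are illustrative assumptions, not a description of the product's exact scheduling.

```python
import math


def estimate_minutes_with_surge(node_pools, minutes_per_node=15):
    """Rough upgrade time when node pools set maxSurge.

    node_pools is a list of (node_count, max_surge) tuples. Each pool is
    assumed to upgrade in batches of max_surge nodes, and pools are
    assumed to upgrade in parallel, so the pool with the most batches
    determines the total time.
    """
    most_batches = 0
    for node_count, max_surge in node_pools:
        batches = math.ceil(node_count / max(1, max_surge))
        most_batches = max(most_batches, batches)
    return minutes_per_node * most_batches


# A 10-node pool with maxSurge 3 needs 4 batches instead of 10 steps.
print(estimate_minutes_with_surge([(10, 1)]))
print(estimate_minutes_with_surge([(10, 3)]))
```

Under this model, a small pool without maxSurge can become the bottleneck, so compare the batch counts of all pools before you choose values.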

Back up the user and admin cluster

Before you start an upgrade, back up your user and admin clusters.

A user cluster backup is a snapshot of the user cluster's etcd store. The etcd store contains all of the Kubernetes objects and custom objects required to manage cluster state. The snapshot contains the data required to recreate the cluster's components and workloads. For more information, see how to back up a user cluster.

With Google Distributed Cloud version 1.8 and later, you can set up automatic backup with clusterBackup.datastore in the admin cluster configuration file. To enable this feature in an existing cluster, edit the admin cluster configuration file and add the clusterBackup.datastore field, then run gkectl update admin.

After clusterBackup.datastore is enabled, your admin cluster is automatically backed up in etcd on the configured vSphere datastore. This backup process repeats every time there's a change to the admin cluster. When you start a cluster upgrade, a backup task runs before upgrading the cluster.

To restore an admin cluster from its backup if you have problems, see Back up and restore an admin cluster with gkectl.

Review the use of PodDisruptionBudgets

In Kubernetes, PodDisruptionBudgets (PDBs) can help prevent unwanted application downtime or outages. A PDB limits how many Pods of an application can be unavailable at the same time because of voluntary disruptions, such as a node drain, so that a minimum number of Pods keeps running. This behavior is a useful way to maintain application availability.

  1. To check what PDBs are configured in your cluster, use the kubectl get pdb command:

    kubectl get pdb -A --kubeconfig KUBECONFIG
    

    Replace KUBECONFIG with the name of your kubeconfig file.

    The following example output shows PDBs named istio-ingress, istiod, and kube-dns:

    NAMESPACE     NAME            MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
    gke-system    istio-ingress   1               N/A               1                     16d
    gke-system    istiod          1               N/A               1                     16d
    kube-system   kube-dns        1               N/A               1                     16d
    

In the preceding output, each PDB specifies that at least one Pod must always be available. This availability becomes critical during upgrades when nodes are drained.

Check for PDBs that can't be fulfilled. For example, you might set a minimum availability of 1 when the Deployment has only 1 replica. In this example, the drain operation is blocked because evicting the only replica would violate the PDB.
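One way to spot such PDBs is to inspect the JSON form of the PDB list (for example, from kubectl get pdb -A -o json) and flag every PDB whose status currently allows zero disruptions. The following sketch assumes the standard status.disruptionsAllowed field of the PodDisruptionBudget API. The function name and the sample data, including the demo namespace, are hypothetical.

```python
import json


def blocking_pdbs(pdb_list):
    """Return (namespace, name) for PDBs that allow no disruptions now."""
    blocked = []
    for item in pdb_list.get("items", []):
        status = item.get("status", {})
        # Treat a missing value as 0 so that unknown PDBs get reviewed.
        if status.get("disruptionsAllowed", 0) == 0:
            meta = item["metadata"]
            blocked.append((meta["namespace"], meta["name"]))
    return blocked


# Hypothetical sample: the second PDB protects a single-replica Deployment.
sample = json.loads("""
{"items": [
  {"metadata": {"namespace": "kube-system", "name": "kube-dns"},
   "status": {"disruptionsAllowed": 1}},
  {"metadata": {"namespace": "demo", "name": "single-replica-app"},
   "status": {"disruptionsAllowed": 0}}
]}
""")

for namespace, name in blocking_pdbs(sample):
    print(f"PDB {namespace}/{name} would block a node drain")
```

A PDB that reports zero allowed disruptions isn't always misconfigured, because the application might be temporarily unhealthy, so treat the result as a list of items to review.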

To make sure that the PDBs don't interfere with the upgrade procedure, check all PDBs on a given cluster before you start the upgrade. You might need to coordinate with the development teams and application owners to temporarily change or disable PDBs during a cluster upgrade.

Google Distributed Cloud runs a preflight check during the upgrade process to warn about PDBs. However, you should also manually verify the PDBs to ensure a smooth upgrade experience. To learn more about PDBs, see Specifying a Disruption Budget for your Application.

Review the available IP addresses

The following IP address considerations apply during cluster upgrades:

  • The cluster upgrade process creates a new node and drains the resources before it deletes the old node. We recommend that you always have N+1 IP addresses for the admin or user cluster, where N is the number of nodes in the cluster.
  • When using static IP addresses, the required IP addresses must be listed in the IP block files.
  • If you use DHCP, make sure that new VMs can get additional IP leases in the desired subnet during an upgrade.
    • If you need to add IP addresses, update the IP block file, then run the gkectl update command. For more information, see Plan your IP addresses.
  • If you use static IP addresses and want to speed up the user cluster upgrade process, list enough IP addresses in your IP block file so that each node pool can have an extra IP address available. This approach lets the process speed up the VM addition and removal procedure as it's performed on a per node pool basis.
    • Although this approach is a good option to speed up user cluster upgrades, consider the resource and performance availability of your vSphere environment before you proceed.
  • If there is only one spare IP for the entire user cluster, this limitation slows the upgrade process to only one VM at a time, even when multiple node pools are used.
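The IP address guidance above reduces to simple arithmetic. The following sketch is illustrative only: the function name and parameters are hypothetical, and the node counts that you pass in should include every node in the cluster being upgraded.

```python
def ip_addresses_needed(node_pool_sizes, spare_per_pool=False):
    """IP addresses to reserve before an upgrade.

    With spare_per_pool=False, this is the N+1 minimum: one spare for the
    whole cluster, which limits the upgrade to one VM at a time.
    With spare_per_pool=True, each node pool gets its own spare address,
    which lets node pools upgrade in parallel.
    """
    total_nodes = sum(node_pool_sizes)
    spares = len(node_pool_sizes) if spare_per_pool else 1
    return total_nodes + spares


pools = [3, 3, 4]  # three node pools, 10 nodes in total
print(ip_addresses_needed(pools))
print(ip_addresses_needed(pools, spare_per_pool=True))
```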

Check cluster utilization

Make sure that Pods can be evacuated when the node drains and that there are enough resources in the cluster being upgraded to manage the upgrade. To check the current resource usage of the cluster, you can use custom dashboards in Google Cloud Observability, or run commands such as kubectl top nodes directly against the cluster.

Commands you run against the cluster show you a snapshot of the current cluster resource usage. Dashboards can provide a more detailed view of resources being consumed over time. This resource usage data can help indicate when an upgrade would cause the least disruption, such as during weekends or evenings, depending on the running workload and use cases.
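As a quick snapshot check, you could parse the text output of kubectl top nodes and list the nodes that are already heavily used. The following sketch assumes the usual five-column output (NAME, CPU(cores), CPU%, MEMORY(bytes), MEMORY%). The 80% threshold, the function name, and the sample node names are hypothetical.

```python
def busy_nodes(top_output, threshold_percent=80):
    """Return names of nodes whose CPU% or MEMORY% meets the threshold."""
    busy = []
    for line in top_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue
        name, cpu_pct, mem_pct = fields[0], fields[2], fields[4]
        try:
            cpu = int(cpu_pct.rstrip("%"))
            mem = int(mem_pct.rstrip("%"))
        except ValueError:
            continue  # metrics aren't available for this node
        if max(cpu, mem) >= threshold_percent:
            busy.append(name)
    return busy


# Hypothetical sample output.
sample_output = """
NAME     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
node-a   250m         12%    2048Mi          54%
node-b   1800m        90%    6144Mi          71%
node-c   400m         20%    3072Mi          40%
"""

print(busy_nodes(sample_output))
```

If several nodes are flagged, the remaining nodes might not have room for the Pods that are evicted during a drain.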

The timing for the admin cluster upgrade might be less critical than for the user clusters, because an admin cluster upgrade usually does not introduce application downtime. However, it's still important to check for free resources in vSphere before you begin an admin cluster upgrade. Also, an admin cluster upgrade carries some risk, so we recommend that you perform it during less active periods, when management access to the cluster is less critical.

For more information, see what services are impacted during a cluster upgrade.

Check vSphere utilization

Check that there are enough resources on the underlying vSphere infrastructure. To check this resource usage, select a cluster in vCenter and review the Summary tab.

The Summary tab shows the overall memory, CPU, and storage consumption of the entire cluster. Because Google Distributed Cloud upgrades demand additional resources, you should also check whether the cluster can handle these additional resource requests.

As a general rule, your vSphere cluster must be able to support the following additional resources:

  • +1 VM per admin cluster upgrade
  • +1 VM per node pool per user cluster upgrade

For example, assume that a user cluster has 3 node pools where each node pool has nodes using 8 vCPUs and 32GB or more of RAM. Because the upgrade happens in parallel for the 3 node pools by default, the upgrade procedure consumes the following additional resources for the 3 additional surge nodes:

  • 24 vCPUs (3 x 8 vCPUs)
  • At least 96GB of RAM (3 x 32GB)
  • VM disk space + at least 96GB of vSwap
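The surge requirement follows from the rule of one extra VM per node pool. The following sketch adds up the vCPUs and RAM of one surge node per pool. It assumes that every surge node matches the size of the nodes in its pool, and it uses exactly 32GB per node for the example; larger nodes raise the totals proportionally. The function name is hypothetical.

```python
def surge_resources(node_pools):
    """Extra vCPUs and RAM (GB) needed for one surge VM per node pool.

    node_pools is a list of (vcpus_per_node, ram_gb_per_node) tuples,
    one tuple for each node pool that upgrades in parallel.
    """
    vcpus = sum(vcpu for vcpu, _ in node_pools)
    ram_gb = sum(ram for _, ram in node_pools)
    return vcpus, ram_gb


# Three node pools whose nodes each use 8 vCPUs and 32GB of RAM.
vcpus, ram_gb = surge_resources([(8, 32), (8, 32), (8, 32)])
print(f"Additional vCPUs: {vcpus}, additional RAM: {ram_gb}GB")
```

Compare these totals with the free capacity shown on the Summary tab before you start the upgrade.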

The upgrade process creates VMs using the vSphere clone operation. Cloning multiple VMs from a template can introduce stress to the underlying storage system in the form of rising I/O operations. The upgrade can be severely slowed down if the underlying storage subsystem is incapable of providing sufficient performance during an upgrade.

While vSphere is designed for simultaneous resource usage and has mechanisms to provide resources, even when overcommitted, we strongly recommend not overcommitting the VM memory. Memory overcommitment can lead to serious performance impacts that affect the entire cluster as vSphere provides the "missing RAM" from swapping pages out to the datastore. This behavior can lead to problems during an upgrade of a cluster, and cause performance impacts on other running VMs on the vSphere cluster.

If the available resources are already scarce, power down unneeded VMs to help satisfy these additional requirements and prevent a potential performance hit.

Check the cluster health and configuration

Run the following tools on all clusters before the upgrade:

  • The gkectl diagnose command: gkectl diagnose ensures all clusters are healthy. The command runs advanced checks, such as to identify nodes that aren't configured properly, or that have Pods that are in a stuck state. If the gkectl diagnose command shows a Cluster unhealthy warning, fix the issues before you attempt an upgrade. For more information, see Diagnose cluster issues.

  • The pre-upgrade tool: In addition to checking the cluster health and configuration, the pre-upgrade tool checks for potential known issues that could happen during a cluster upgrade.

Additionally, when you upgrade user clusters to version 1.29 and later, we recommend that you run the gkectl upgrade cluster command with the --dry-run flag. The --dry-run flag runs preflight checks but doesn't start the upgrade process. Although earlier versions of Google Distributed Cloud run preflight checks, they can't be run separately from the upgrade. By adding the --dry-run flag, you can find and fix any issues that the preflight checks find with your user cluster before the upgrade.

Use Deployments to minimize application disruption

Because nodes must be drained during upgrades, cluster upgrades can lead to application disruptions. Draining the nodes means that all running Pods must be shut down and restarted on the remaining nodes in the cluster.

If possible, your applications should use Deployments. With this approach, applications are designed to handle interruptions. Any impact should be minimal to Deployments that have multiple replicas. You can still upgrade your cluster if applications don't use Deployments.

There are also rules for Deployments to make sure that a set number of replicas always keep running. These rules are known as PodDisruptionBudgets (PDBs). PDBs allow you to limit the disruption to a workload when its Pods must be rescheduled for some reason, such as upgrades or maintenance on the cluster nodes, and are important to check before an upgrade.

Use a high availability load balancer pair

If you use Seesaw as a load balancer on a cluster, the load balancers are upgraded automatically when you upgrade the cluster. This upgrade can cause a service disruption. To reduce the impact of an upgrade and of a possible load balancer failure, you can use a high-availability pair (HA pair). In this configuration, the system creates and configures two load balancer VMs so that a failover to the other peer can happen.

To increase service availability (that is, to the Kubernetes API server), we recommend that you always use an HA pair in front of the admin cluster. To learn more about Seesaw and its HA configuration, see the version 1.16 documentation Bundled load balancing with Seesaw.

To prevent a service disruption during an upgrade with an HA pair, the cluster initiates a failover before it creates the new load balancer VM. If a user cluster only uses a single load balancer instance, a service disruption occurs until the upgrade for the load balancer is complete.

We recommend that you have an HA load balancer pair if the user cluster itself is also configured to be highly available. This best practices series assumes that an HA user cluster uses an HA load balancer pair.

If you use MetalLB as a bundled load balancer, no pre-upgrade setup is required. The load balancer is upgraded during the cluster upgrade process.

Decide how to upgrade each user cluster

In version 1.14 and later, you can choose to upgrade a user cluster as a whole (meaning you can upgrade the control plane and all node pools in the cluster), or you can upgrade the user cluster's control plane and leave the node pools at the current version. For information on why you might want to upgrade the control plane separately, see User cluster upgrades.

In a multi-cluster environment, keep track of which user clusters have been upgraded and record their version number. If you decide to upgrade the control plane and node pools separately, record the version of the control plane and each node pool in each cluster.
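A simple record per cluster is enough to track this information. The following sketch shows one possible shape for such a record. The class, its fields, and the cluster names and version strings are hypothetical placeholders, not real release identifiers.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterVersions:
    """Recorded versions for one user cluster."""
    name: str
    control_plane: str
    node_pools: dict = field(default_factory=dict)  # pool name -> version

    def pending(self, target):
        """Return the components that aren't at the target version yet."""
        items = []
        if self.control_plane != target:
            items.append("control-plane")
        items += [pool for pool, version in sorted(self.node_pools.items())
                  if version != target]
        return items


# The control plane is upgraded, but one node pool still runs the old version.
cluster = ClusterVersions(
    name="user-cluster-1",
    control_plane="1.29",
    node_pools={"pool-a": "1.28", "pool-b": "1.29"},
)
print(cluster.name, "still pending:", cluster.pending("1.29"))
```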

Check user and admin cluster versions

gkectl

  • To check the version of user clusters:

    gkectl list clusters --kubeconfig ADMIN_CLUSTER_KUBECONFIG

    Replace ADMIN_CLUSTER_KUBECONFIG with the path of the kubeconfig file for your admin cluster.

  • To check the version of admin clusters:

    gkectl list admin --kubeconfig ADMIN_CLUSTER_KUBECONFIG

gcloud CLI

For clusters that are enrolled in the GKE On-Prem API, you can use the gcloud CLI to get the versions of user clusters, node pools on the user cluster, and admin clusters.

  1. Ensure that you have the latest version of the gcloud CLI. Update the gcloud CLI components, if needed:

    gcloud components update
    
  2. Run the following commands to check versions:

  • To check the version of user clusters:

    gcloud container vmware clusters list \
        --project=PROJECT_ID \
        --location=-

    Replace PROJECT_ID with the project ID of your fleet host project.

    Setting --location=- lists all clusters in all regions. If you need to scope down the list, set --location to the region you specified when you enrolled the cluster.

    The output of the command includes the cluster version.

  • To check the version of admin clusters:

    gcloud container vmware admin-clusters list \
        --project=PROJECT_ID \
        --location=-

Check the version of cluster nodes:

You can use kubectl to get the version of cluster nodes, but kubectl returns the Kubernetes version. To get the corresponding Google Distributed Cloud version for a Kubernetes version, see Versioning.

kubectl get nodes --kubeconfig USER_CLUSTER_KUBECONFIG

Replace USER_CLUSTER_KUBECONFIG with the path of the kubeconfig file for your user cluster.

Check if CA certificates need to be rotated

During an upgrade, leaf certificates are rotated, but CA certificates aren't. You must manually rotate your CA certificates at least once every five years. For more information, see Rotate user cluster certificate authorities and Rotate admin cluster CA certificates.
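Because upgrades don't rotate CA certificates, it can help to track the rotation deadline yourself. The following sketch checks whether five years have passed since the last rotation. The function name is hypothetical, and it assumes that you record the date of the last CA rotation (or of cluster creation) for each cluster.

```python
from datetime import date


def ca_rotation_due(last_rotated, today, max_age_years=5):
    """Return True if the CA certificates must be rotated by today."""
    target_year = last_rotated.year + max_age_years
    try:
        deadline = last_rotated.replace(year=target_year)
    except ValueError:
        # February 29 has no counterpart in a non-leap target year.
        deadline = last_rotated.replace(year=target_year, day=28)
    return today >= deadline


print(ca_rotation_due(date(2020, 3, 1), date(2024, 12, 31)))
print(ca_rotation_due(date(2020, 3, 1), date(2025, 3, 1)))
```

In practice, plan the rotation well before the deadline rather than on the day that the check first returns True.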

Differences between cluster types

There are two different types of clusters:

  • User cluster
  • Admin cluster

Depending on how you create a user cluster, it might contain both worker nodes and control plane nodes (Controlplane V2) or only worker nodes (kubeception). With kubeception, the control plane for a user cluster runs on one or more nodes in an admin cluster. In both cases, in version 1.14 and later, you can upgrade a user cluster's control plane separately from the node pools that run your workloads.

Different effects of user cluster versus admin cluster upgrades

The Google Distributed Cloud upgrade procedure involves a node drain process that removes all Pods from a node. The process creates a new VM for each drained worker node and adds it to the cluster. The drained worker nodes are then removed from VMware's inventory. During this process, any workload that runs on these nodes is stopped and restarted on other available nodes in the cluster.

Depending on the chosen architecture of the workload, this procedure might have an impact on an application's availability. To avoid too much strain on the cluster's resource abilities, Google Distributed Cloud upgrades one node at a time.

User cluster disruption

The following table describes the impact of an in-place user cluster upgrade:

Function                     Admin cluster   Non-HA user cluster   HA user cluster
Kubernetes API access        Not affected    Not affected          Not affected
User workloads               Not affected    Not affected          Not affected
PodDisruptionBudgets*        Not affected    Not affected          Not affected
Control-plane node           Not affected    Affected              Not affected
Pod autoscaler (VMware)      Not affected    Not affected          Not affected
Auto repair                  Not affected    Not affected          Not affected
Node autoscaling (VMware)    Not affected    Not affected          Not affected
Horizontal Pod autoscaling   Affected        Affected              Not affected

  • * : PDBs might cause the upgrade to fail or stop.
  • Affected: a service disruption during the upgrade is noticeable until the upgrade is finished.
  • Not affected: a service disruption might occur during a very short amount of time, but is almost unnoticeable.

The user cluster control plane nodes, whether they run on the admin cluster (kubeception) or the user cluster itself (Controlplane V2), don't run any user workloads. During an upgrade, these control plane nodes are drained and then updated accordingly.

In environments with high availability (HA) control planes, upgrading a user cluster's control plane doesn't disrupt user workloads. In an HA environment, upgrading an admin cluster doesn't disrupt user workloads. For user clusters using Controlplane V2, upgrading only the control plane doesn't disrupt user workloads.

During an upgrade in a non-HA control plane environment, the control plane can't control Pod scaling, recovery, or deployment actions. During the short disruption of the control plane, user workloads can be affected if they are in a scaling, deployment, or recovery state. This means that rollouts that are in progress during an upgrade in a non-HA environment can fail.

To improve availability and reduce disruption of production user clusters during upgrades, we recommend that you use three control plane nodes (high availability mode).

Admin cluster disruption

The following table describes the impact of an in-place admin cluster upgrade:

Function                     Admin cluster   Non-HA user cluster   HA user cluster
Kubernetes API access        Affected        Affected              Not affected
User workloads               Not affected    Not affected          Not affected
Control-plane node           Affected        Affected              Not affected
Pod autoscaler               Affected        Affected              Not affected
Auto repair                  Affected        Affected              Not affected
Node autoscaling             Affected        Affected              Not affected
Horizontal Pod autoscaling   Affected        Affected              Not affected

  • Affected: a service disruption during the upgrade is noticeable until the upgrade is finished.
  • Not affected: a service disruption might occur during a very short amount of time, but is almost unnoticeable.

What's next