Upgrading clusters

This page explains how to upgrade your admin and user clusters from one GKE On-Prem patch version to the next higher patch version. To learn about the available versions, see Versions.

Overview

GKE On-Prem supports sequential upgrading. For example, suppose these are the only versions that exist:

  • 1.0.10
  • 1.0.X, where X is a patch number released after 10
  • 1.0.Y, where Y is a patch number released after X

In this case, 1.0.Y is the latest version. To upgrade a version 1.0.10 cluster to 1.0.Y, you'd follow these steps:

  1. Upgrade the cluster from 1.0.10 to 1.0.X.
  2. Then, upgrade the cluster from 1.0.X to 1.0.Y.
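
Each step is an ordinary cluster upgrade. As a minimal sketch, assuming the gkectl upgrade flags described later on this page, the sequence for a user cluster would look like the following, where [CONFIG_FILE_1_0_X] and [CONFIG_FILE_1_0_Y] are hypothetical placeholders for a configuration file updated for each target version:

# Step 1: upgrade from 1.0.10 to 1.0.X.
gkectl upgrade cluster \
--kubeconfig [ADMIN_CLUSTER_KUBECONFIG] \
--config [CONFIG_FILE_1_0_X] \
--cluster-name [CLUSTER_NAME]

# Step 2: upgrade from 1.0.X to 1.0.Y.
gkectl upgrade cluster \
--kubeconfig [ADMIN_CLUSTER_KUBECONFIG] \
--config [CONFIG_FILE_1_0_Y] \
--cluster-name [CLUSTER_NAME]

Each run also assumes that your admin cluster has already been upgraded at least as far as that target version, as described under Upgrading your clusters.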

Before you begin

  1. SSH in to your admin workstation:

    ssh -i ~/.ssh/vsphere_workstation ubuntu@[IP_ADDRESS]
    
  2. Authorize gcloud to access Google Cloud:

    gcloud auth login

  3. Activate your access service account:

    gcloud auth activate-service-account --project [PROJECT_ID] \
    --key-file [ACCESS_KEY_FILE]
    

    where:

    • [PROJECT_ID] is your project ID.
    • [ACCESS_KEY_FILE] is the path to the JSON key file for your access service account, such as /home/ubuntu/access.json.
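
  4. Optionally, confirm that the expected account is active. This is a quick sanity check, not a required part of the upgrade procedure:

    gcloud auth list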

Upgrading to version 1.0.2

In version 1.0.2-gke.3, the following required OIDC fields (usercluster.oidc) have been added. These fields enable logging in to a cluster from Google Cloud console:

  • usercluster.oidc.kubectlredirecturl
  • usercluster.oidc.clientsecret
  • usercluster.oidc.usehttpproxy

If you don't want to log in to a cluster from Google Cloud console, but you want to use OIDC, you can pass in placeholder values for these fields:

oidc:
  kubectlredirecturl: "redirect.invalid"
  clientsecret: "secret"
  usehttpproxy: "false"
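
In the configuration file, these fields live under usercluster.oidc. As a sketch of how the placeholder values above nest in that section (only the three fields listed above are shown; any other oidc fields you use stay as they are):

usercluster:
  oidc:
    kubectlredirecturl: "redirect.invalid"
    clientsecret: "secret"
    usehttpproxy: "false"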

Determining your cluster upgrade scenario

Before you upgrade your cluster, determine which of the following scenarios applies to the version you're upgrading to:

If the version has no security updates:

  1. Download the latest gkectl.
  2. Download the latest bundle.
  3. Follow the instructions on this page.

If the version has security updates:

  1. Download the latest admin workstation OVA.
  2. Upgrade your admin workstation.
  3. Follow the instructions on this page.

You only need to upgrade your admin workstation if the new version has security updates. When you upgrade your admin workstation, it includes the latest gkectl and bundle.
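
In either scenario, once the new gkectl is on your admin workstation, you can confirm which release is on your PATH. This is a quick check and assumes your gkectl release supports the version subcommand:

gkectl version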

Determining the platform version

To upgrade your clusters, you need to determine their GKE platform version. You can find it in either of the following ways:

From documentation

See Versions.

From bundle

Run the following command to extract the bundle to a temporary directory:

tar -xvzf /var/lib/gke/bundles/gke-onprem-vsphere-[VERSION].tgz -C [TEMP_DIR]

Look through the extracted YAML files to get a general sense of what's in the bundle.

In particular, open gke-onprem-vsphere-[VERSION]-images.yaml and look at the osImages field. The GKE platform version appears in the name of the OS image file. For example, in the following OS image entry, the GKE platform version is 1.12.7-gke.19.

osImages:
  admin: "gs://gke-on-prem-os-ubuntu-release/gke-on-prem-osimage-1.12.7-gke.19-20190516-905ef43658.ova"

Modifying the configuration file

On your admin workstation VM, edit your configuration file. Set the values of gkeplatformversion and bundlepath. For example:

gkeplatformversion: 1.12.7-gke.19
bundlepath: /var/lib/gke/bundles/gke-onprem-vsphere-1.0.10.tgz
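
Before you continue, you can confirm that the bundle you point bundlepath at is present on the admin workstation:

ls -l /var/lib/gke/bundles/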

Running gkectl prepare

Run the following command:

gkectl prepare --config [CONFIG_FILE]

The gkectl prepare command performs the following tasks:

  • If necessary, copies a new node OS image to your vSphere environment and marks the OS image as a template.

  • Pushes updated Docker images, specified in the new bundle, to your private Docker registry, if you have configured one.

Upgrading your clusters

To upgrade a user cluster, your admin cluster must have a version at least as high as the target version of the user cluster upgrade. If your admin cluster version is not that high, upgrade your admin cluster before you upgrade your user cluster.

Admin cluster

Run the following command:

gkectl upgrade admin \
--kubeconfig [ADMIN_CLUSTER_KUBECONFIG] \
--config [CONFIG_FILE]

where [ADMIN_CLUSTER_KUBECONFIG] is the admin cluster's kubeconfig file, and [CONFIG_FILE] is the GKE On-Prem configuration file you're using to perform the upgrade.
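
When the admin cluster upgrade finishes, one way to spot-check the result is to list the admin cluster nodes and the versions they report. This is a quick check using standard kubectl:

kubectl --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] get nodes -o wide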

User cluster

Run the following command:

gkectl upgrade cluster \
--kubeconfig [ADMIN_CLUSTER_KUBECONFIG] \
--config [CONFIG_FILE] \
--cluster-name [CLUSTER_NAME]

where [ADMIN_CLUSTER_KUBECONFIG] is the admin cluster's kubeconfig file, [CLUSTER_NAME] is the name of the user cluster you're upgrading, and [CONFIG_FILE] is the GKE On-Prem configuration file you're using to perform the upgrade.
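
You can run the same kind of spot-check against the upgraded user cluster. Here [USER_CLUSTER_KUBECONFIG] is a hypothetical placeholder for the user cluster's kubeconfig file; it is not one of the placeholders used in the command above:

kubectl --kubeconfig [USER_CLUSTER_KUBECONFIG] get nodes -o wide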

About downtime during upgrades

Admin cluster

When an admin cluster is down, user cluster control planes and workloads on user clusters continue to run, unless they were affected by a failure that caused the downtime.

User cluster control plane

Typically, you should expect no noticeable downtime to user cluster control planes. However, long-running connections to the Kubernetes API server might break and would need to be re-established. In those cases, the API caller should retry until it establishes a connection. In the worst case, there can be up to one minute of downtime during an upgrade.

User cluster nodes

If an upgrade requires a change to user cluster nodes, GKE On-Prem recreates the nodes in a rolling fashion, and reschedules Pods running on these nodes. You can prevent impact to your workloads by configuring appropriate PodDisruptionBudgets and anti-affinity rules.
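
For example, a PodDisruptionBudget can keep a minimum number of replicas of a workload available while nodes are drained and recreated. The following is a minimal sketch, assuming Pods labeled app: my-app (a hypothetical label); the apiVersion may differ depending on your Kubernetes version:

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app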

Known issues

  • Currently, upgrading clusters can cause disruption or downtime for workloads that use PodDisruptionBudgets (PDBs).

Troubleshooting

For more information, refer to Troubleshooting.

New nodes created but not healthy

Symptoms

New nodes don't register themselves to the user cluster control plane when using manual load balancing mode.

Possible causes

In-node Ingress validation might be enabled, which blocks the node boot process.

Resolution

To disable the validation, run:

kubectl patch machinedeployment [MACHINE_DEPLOYMENT_NAME] -p '{"spec":{"template":{"spec":{"providerSpec":{"value":{"machineVariables":{"net_validation_ports": null}}}}}}}' --type=merge
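
To confirm that the patch removed the validation ports, you can inspect the MachineDeployment spec afterward. This is a sketch; the grep only narrows the output:

kubectl get machinedeployment [MACHINE_DEPLOYMENT_NAME] -o yaml | grep -A 3 machineVariables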

Diagnosing cluster issues using gkectl

Use gkectl diagnose commands to identify cluster issues and to share cluster information with Google. See Diagnosing cluster issues.

Default logging behavior

For gkectl and gkeadm, the default logging settings are sufficient:

  • By default, log entries are saved as follows:

    • For gkectl, the default log file is /home/ubuntu/.config/gke-on-prem/logs/gkectl-$(date).log, and the file is symlinked with the logs/gkectl-$(date).log file in the local directory where you run gkectl.
    • For gkeadm, the default log file is logs/gkeadm-$(date).log in the local directory where you run gkeadm.
  • All log entries are saved in the log file, even if they are not printed in the terminal (when --alsologtostderr is false).
  • The -v5 verbosity level (default) covers all the log entries needed by the support team.
  • The log file also contains the command executed and the failure message.

We recommend that you send the log file to the support team when you need help.

Specifying a non-default location for the log file

To specify a non-default location for the gkectl log file, use the --log_file flag. The log file that you specify will not be symlinked with the local directory.

To specify a non-default location for the gkeadm log file, use the --log_file flag.
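
For example, to write the gkectl log to a location of your choosing (the path here is illustrative):

gkectl prepare --config [CONFIG_FILE] --log_file /var/log/gkectl/prepare.log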

Locating Cluster API logs in the admin cluster

If a VM fails to start after the admin control plane has started, you can try debugging this by inspecting the Cluster API controllers' logs in the admin cluster:

  1. Find the name of the Cluster API controllers Pod in the kube-system namespace, where [ADMIN_CLUSTER_KUBECONFIG] is the path to the admin cluster's kubeconfig file:

    kubectl --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] -n kube-system get pods | grep clusterapi-controllers
  2. Open the Pod's logs, where [POD_NAME] is the name of the Pod. Optionally, use grep or a similar tool to search for errors:

    kubectl --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] -n kube-system logs [POD_NAME] vsphere-controller-manager
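
    For example, to show only the lines that mention errors, you can pipe the log output through grep (a sketch; adjust the pattern to match what you're investigating):

    kubectl --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] -n kube-system logs [POD_NAME] vsphere-controller-manager | grep -i error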