GDCV for VMware release notes

This page documents production updates to GKE on VMware, which is a component of Google Distributed Cloud Virtual for VMware. You can periodically check this page for announcements about new or updated features, bug fixes, known issues, and deprecated features.

You can see the latest product updates for all of Google Cloud on the Google Cloud page, browse and filter all release notes in the Google Cloud console, or programmatically access release notes in BigQuery.

To get the latest product updates delivered to you, add the URL of this page to your feed reader, or add the feed URL directly: https://cloud.google.com/feeds/gkeonprem-release-notes.xml

February 29, 2024

GKE on VMware 1.16.6-gke.40 is now available. To upgrade, see Upgrading GKE on VMware. GKE on VMware 1.16.6-gke.40 runs on Kubernetes v1.27.8-gke.1500.

If you are using a third-party storage vendor, check the GDCV Ready storage partners document to make sure the storage vendor has already passed the qualification for this release of GKE on VMware.

The following issues are fixed in 1.16.6-gke.40:

  • Fixed the known issue that caused kubelet to be flooded with logs stating that /etc/kubernetes/manifests does not exist on the worker nodes.
  • Fixed the known issue that caused a preflight check to fail when the hostname isn't in the IP block file.
  • Fixed the manual load balancer issue where the IngressIP is overwritten with the Spec.LoadBalancerIP even if it is empty.
  • Fixed the known issue where a 1.15 user master machine encountered an unexpected recreation when the user cluster controller was upgraded to 1.16.

The following vulnerabilities are fixed in 1.16.6-gke.40:

February 27, 2024

The following vulnerabilities were discovered in the Linux kernel that can lead to a privilege escalation on Container-Optimized OS and Ubuntu nodes:

GKE on VMware 1.15.9-gke.20 is now available. To upgrade, see Upgrading GKE on VMware. GKE on VMware 1.15.9-gke.20 runs on Kubernetes v1.26.10-gke.2000.

If you are using a third-party storage vendor, check the GDCV Ready storage partners document to make sure the storage vendor has already passed the qualification for this release of GKE on VMware.

February 26, 2024

GKE on VMware 1.28.200-gke.111 is now available. To upgrade, see Upgrading Anthos clusters on VMware. GKE on VMware 1.28.200-gke.111 runs on Kubernetes v1.28.4-gke.1400.

If you are using a third-party storage vendor, check the GDCV Ready storage partners document to make sure the storage vendor has already passed the qualification for this release of GKE on VMware.

The following issues are fixed in 1.28.200-gke.111:

  • Fixed the known issue that caused a preflight check to fail when the hostname isn't in the IP block file.
  • Fixed the known issue where the storage policy field is missing in the admin cluster configuration template.
  • Fixed the manual load balancer issue where the IngressIP is overwritten with the Spec.LoadBalancerIP even if it is empty.
  • Fixed the issue that preflight jobs might be stuck in the pending state.
  • Fixed the known issue where nfs-common is missing from the Ubuntu OS image.

The following vulnerabilities are fixed in 1.28.200-gke.111:

February 24, 2024

The following vulnerabilities were discovered in the Linux kernel that can lead to a privilege escalation on Container-Optimized OS and Ubuntu nodes:

  • CVE-2024-0193

For more information, see the GCP-2024-013 security bulletin.

February 16, 2024

The following vulnerability was discovered in the Linux kernel that can lead to a privilege escalation on Container-Optimized OS and Ubuntu nodes:

  • CVE-2023-6932

For more information, see the GCP-2024-011 security bulletin.

February 14, 2024

The following vulnerability was discovered in the Linux kernel that can lead to a privilege escalation on Container-Optimized OS and Ubuntu nodes:

  • CVE-2023-6931

For more information, see the GCP-2024-010 security bulletin.

February 01, 2024

GKE on VMware 1.15.8-gke.41 is now available. To upgrade, see Upgrading Anthos clusters on VMware. GKE on VMware 1.15.8-gke.41 runs on Kubernetes v1.26.10-gke.2000.

If you use a third-party storage vendor, check the GDCV Ready storage partners document to make sure the storage vendor has already passed the qualification for this release of GKE on VMware.

Upgraded etcd to v3.4.27-0-gke.1.

The following issues are fixed in 1.15.8-gke.41:

  • Fixed Seesaw crashing on duplicated service IP.
  • Fixed a warning in the storage preflight check.

The following vulnerabilities are fixed in 1.15.8-gke.41:

January 31, 2024

A security vulnerability, CVE-2024-21626, has been discovered in runc where a user with permission to create Pods on Container-Optimized OS and Ubuntu nodes might be able to gain full access to the node filesystem.

For instructions and more details, see the GCP-2024-005 security bulletin.

January 25, 2024

GKE on VMware 1.28.100-gke.131 is now available. To upgrade, see Upgrading GKE on VMware. GKE on VMware 1.28.100-gke.131 runs on Kubernetes v1.28.3-gke.1600.

If you use a third-party storage vendor, check the GDCV Ready storage partners document to make sure the storage vendor has already passed the qualification for this release of GKE on VMware.

The following issues are fixed in 1.28.100-gke.131:

  • Fixed an issue where duplicate Service IP addresses caused the Seesaw load balancer to fail.

  • Fixed an issue where egress NAT erroneously broke long-lived connections.

The following vulnerabilities are fixed in 1.28.100-gke.131:

GKE on VMware 1.16.5-gke.28 is now available. To upgrade, see Upgrading GKE on VMware. GKE on VMware 1.16.5-gke.28 runs on Kubernetes 1.27.6-gke.2500.

If you use a third-party storage vendor, check the GDCV Ready storage partners document to make sure the storage vendor has already passed the qualification for this release of GKE on VMware.

The following issues are fixed in 1.16.5-gke.28:

  • Fixed an issue where duplicate Service IP addresses caused the Seesaw load balancer to fail.

The following vulnerabilities are fixed in 1.16.5-gke.28:

There is an issue that affects upgrading from 1.16.x to 1.28.100. If the 1.16.x cluster relies on an NFS volume, the upgrade will fail. Clusters that don't use an NFS volume are not affected.
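Before attempting this upgrade, it can help to find out whether a cluster has any NFS-backed volumes. The following is a minimal sketch, not an official check: it assumes you have saved the output of kubectl get pv -o json and parsed it into a dictionary, and it looks only at PersistentVolumes that use the in-tree nfs volume source (the spec.nfs field). The function name nfs_backed_volumes is hypothetical, and NFS volumes declared inline in Pod specs or provided through a CSI driver are not detected.

```python
from typing import Any, Dict, List


def nfs_backed_volumes(pv_list: Dict[str, Any]) -> List[str]:
    """Return the names of PersistentVolumes that declare an in-tree NFS source.

    pv_list is the parsed JSON from `kubectl get pv -o json` (a List object
    with an "items" array). Only spec.nfs is inspected; this is an assumption
    about what "relies on an NFS volume" covers.
    """
    names = []
    for item in pv_list.get("items", []):
        if "nfs" in item.get("spec", {}):
            names.append(item.get("metadata", {}).get("name", "<unnamed>"))
    return names


# Illustrative input only; the server and path values are made up.
sample = {
    "items": [
        {"metadata": {"name": "pv-nfs"},
         "spec": {"nfs": {"server": "10.0.0.5", "path": "/exports"}}},
        {"metadata": {"name": "pv-csi"},
         "spec": {"csi": {"driver": "csi.vsphere.vmware.com"}}},
    ]
}
print(nfs_backed_volumes(sample))  # prints ['pv-nfs']
```

An empty result from this sketch does not prove that the cluster is unaffected, because it covers only one way of consuming NFS.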

December 18, 2023

GKE on VMware, formerly Anthos clusters on VMware, is a component of Google Distributed Cloud Virtual, software that brings Google Kubernetes Engine (GKE) to on-premises data centers. We are in the process of updating documentation and the Google Cloud Console UI with the new name.

GKE on VMware 1.28.0-gke.651 is now available. GKE on VMware 1.28.0-gke.651 runs on Kubernetes v1.28.3-gke.700. To upgrade, see Upgrading GKE on VMware clusters.

For easier identification of the Kubernetes version for a given release, we are aligning GKE on VMware version numbering with GKE version numbering. This change starts with the December 2023 minor release, which is version 1.28. Additionally, GKE on VMware patch versions (z in the semantic version numbering scheme x.y.z-gke.N) will increment by 100.

Example version numbers for GKE on VMware:

  • Minor release: 1.28.0-gke.651
  • First patch release (example): 1.28.100-gke.27
  • Second patch release (example): 1.28.200-gke.19

This change affects numbering only. Upgrades from 1.16 to 1.28 follow the same process as upgrades between prior minor releases.
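The numbering scheme above can be sketched in code. This is an illustrative parser, not part of any GKE on VMware tooling: the names GkeVersion, parse_version, and patch_index are hypothetical, and the sketch assumes that every version string follows the x.y.z-gke.N form exactly.

```python
import re
from typing import NamedTuple

# Pattern for the x.y.z-gke.N scheme described above.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-gke\.(\d+)$")


class GkeVersion(NamedTuple):
    major: int
    minor: int
    patch: int
    build: int  # the N in -gke.N


def parse_version(text: str) -> GkeVersion:
    """Parse a version string such as '1.28.100-gke.27'."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not an x.y.z-gke.N version: {text!r}")
    return GkeVersion(*(int(part) for part in match.groups()))


def patch_index(version: GkeVersion) -> int:
    """Return 0 for the minor release, 1 for the first patch, and so on.

    Assumes the increment-by-100 scheme, which starts with version 1.28.
    """
    if (version.major, version.minor) < (1, 28):
        raise ValueError("increment-by-100 numbering starts with 1.28")
    return version.patch // 100


print(patch_index(parse_version("1.28.200-gke.19")))  # prints 2
```

Versions earlier than 1.28, such as 1.16.6-gke.40, still parse, but patch_index rejects them because their patch numbers increment by 1 rather than by 100.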

If you use a third-party storage vendor, check the GDCV Ready storage partners document to make sure the storage vendor has already passed the qualification for this release of GKE on VMware.

New features in GKE on VMware 1.28.0-gke.651:

Breaking change in GKE on VMware 1.28.0-gke.651:

Cloud Monitoring now requires projects to enable the kubernetesmetadata.googleapis.com API and grant the kubernetesmetadata.publisher IAM role to the logging-monitoring service account. This applies to both creating new 1.28 clusters and upgrading existing clusters to 1.28. If your organization has set up an allowlist that lets traffic from Google APIs and other addresses pass through your proxy server, add kubernetesmetadata.googleapis.com to the allowlist.

Version changes in GKE on VMware 1.28.0-gke.651:

  • Bumped etcd to version v3.4.27-0-gke.1.
  • Bumped istio-ingress to version 1.19.3.
  • Bumped the AIS version to hybrid_identity_charon_20230830_0730_RC00.

Other changes in GKE on VMware 1.28.0-gke.651:

  • HA admin clusters now have a long-running controller that performs reconciliation periodically.
  • The command gkectl repair admin-master --restore-from-backup now supports restoration of etcd data for HA admin clusters.
  • When upgrading user clusters to version 1.28, we validate all changes made in the configuration file, and return an error for unsupported changes. See Remove unsupported changes to unblock upgrade.
  • The vSphere cloud controller manager is enabled in Controlplane V2 user clusters.
  • We now always write the local Kubernetes audit log file, even when Cloud Audit Logging is enabled. This allows for easier integration with third-party logging systems.
  • MetalLB will be the default load balancer for 1.29 user and admin clusters. The ability to use Seesaw as a load balancer will be removed with 1.29. We recommend migrating to the MetalLB load balancer. Upgrades from existing Seesaw clusters will continue to work for a few more releases.
  • The loadBalancer.manualLB.addonsNodePort field is deprecated. The field was used for the in-cluster Prometheus and Grafana add-ons, which were deprecated in version 1.16.
  • The loadBalancer.vips.addonsVIP field is deprecated. The field was used for the in-cluster Prometheus and Grafana add-ons, which were deprecated in version 1.16.
  • yq is no longer pre-installed on the admin workstation.
  • Control-plane nodes now have the node-role.kubernetes.io/control-plane taint.
  • In-tree GlusterFS is removed from Kubernetes 1.27. Added storage validation to detect in-tree GlusterFS volumes.
  • Metrics data are now gzip compressed when they are sent to Cloud Monitoring.

The following issues are fixed in 1.28.0-gke.651:

  • Fixed an issue where disable_bundled_ingress failed user cluster load balancer validation.
  • Fixed an issue where the cluster-health-controller sometimes leaked vSphere sessions.
  • Fixed an etcd hostname mismatch issue when using FQDN.
  • Fixed a known issue where admin cluster update or upgrade failed if the projects or locations of add-on services didn't match each other.
  • Fixed a known issue where the CSI workload preflight check failed due to a Pod startup failure.
  • Fixed an issue where deleting a user cluster with a volume attached might get stuck.
  • Fixed a known issue where deleting a Controlplane V2 user cluster might get stuck.
  • Fixed a logrotate error on the ubuntu_containerd image.
  • Fixed a disk full issue on Seesaw VMs due to no log rotation for fluent-bit.
  • Fixed a known issue where Seesaw didn't set the target IP in GARP replies.
  • Fixed a flaky SSH error on non-HA admin control-plane nodes after update/upgrade.

The following vulnerabilities are fixed in 1.28.0-gke.651:

There is an issue that affects upgrading from 1.16.x to 1.28.0. If the 1.16.x cluster relies on an NFS volume, the upgrade will fail. Clusters that don't use an NFS volume are not affected.

Anthos clusters on VMware 1.16.4-gke.37 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.16.4-gke.37 runs on Kubernetes 1.27.6-gke.2500.

If you use a third-party storage vendor, check the GDCV Ready storage partners document to make sure the storage vendor has already passed the qualification for this release of GKE on VMware.

The following issues are fixed in 1.16.4-gke.37:

  • Fixed a warning in the storage preflight check.
  • Fixed an issue where control plane creation failed for a user cluster when using a FQDN hostname for a HA admin cluster.
  • Fixed an issue where the cluster-health-controller might leak vSphere sessions.
  • Fixed an issue where disable_bundled_ingress failed user cluster load balancer validation.

The following vulnerabilities are fixed in 1.16.4-gke.37:

December 12, 2023

Anthos clusters on VMware 1.15.7-gke.40 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.15.7-gke.40 runs on Kubernetes 1.26.9-gke.700.

If you use a third-party storage vendor, check the GDCV Ready storage partners document to make sure the storage vendor has already passed the qualification for this release of GKE on VMware.

The following issues are fixed in 1.15.7-gke.40:

  • Fixed the etcd hostname mismatch issue when using a FQDN.
  • Fixed an issue where the cluster-health-controller might leak vSphere sessions.
  • Fixed the known issue where the CSI workload preflight check fails due to Pod startup failure.

The following vulnerabilities are fixed in 1.15.7-gke.40:

December 04, 2023

The StatefulSet CSI Migration Tool is now available. To learn how to migrate stateful workloads from an in-tree vSphere volume plugin to the vSphere CSI Driver, see Using the StatefulSet CSI Migration Tool.

November 22, 2023

A vulnerability (CVE-2023-5717) has been discovered in the Linux kernel that can lead to a privilege escalation on Container-Optimized OS and Ubuntu nodes.

For more information, see the GCP-2023-046 security bulletin.

November 20, 2023

Anthos clusters on VMware 1.14.10-gke.35 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.14.10-gke.35 runs on Kubernetes v1.25.13-gke.200.

If you use a third-party storage vendor, check the GDCV Ready storage partners document to make sure the storage vendor has already passed the qualification for this release of GKE on VMware.

The following issues are fixed in 1.14.10-gke.35:

  • Fixed the etcd hostname mismatch issue when using FQDN.
  • Fixed the issue where deleting a user cluster with a volume attached stalls, in which case the cluster can't be deleted and can't be used.

The following vulnerabilities are fixed in 1.14.10-gke.35:

November 16, 2023

Anthos clusters on VMware 1.16.3-gke.45 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.16.3-gke.45 runs on Kubernetes 1.27.4-gke.1600.

If you use a third-party storage vendor, check the GDCV Ready storage partners document to make sure the storage vendor has already passed the qualification for this release of GKE on VMware.

The Prometheus and Grafana add-ons field, loadBalancer.vips.addonsVIP, is deprecated. This change is because Google Managed Service for Prometheus replaced the Prometheus and Grafana add-ons.

The following issues are fixed in 1.16.3-gke.45:

  • Fixed a Cilium issue causing egress NAT to erroneously break long-lived connections.
  • Fixed the etcd hostname mismatch issue when using a FQDN.
  • Fixed the known issue that caused admin cluster updates or upgrades to fail if the projects or locations of add-on services don't match each other.
  • Fixed an issue where an external cluster snapshot wasn't taken after gkectl update admin failed.
  • Fixed an issue that caused the CSI workload preflight to fail when Istio is enabled.
  • Fixed an issue where deleting a user cluster with a volume attached might get stuck indefinitely.
  • Fixed the known issue that caused user cluster deletion to fail when using a user-managed admin workstation.

The following vulnerabilities are fixed in 1.16.3-gke.45:

November 13, 2023

The following vulnerabilities were discovered in the Linux kernel that can lead to a privilege escalation on Container-Optimized OS and Ubuntu nodes:

  • CVE-2023-4147

For more information, see the GCP-2023-042 security bulletin.

November 08, 2023

A vulnerability (CVE-2023-4004) has been discovered in the Linux kernel that can lead to a privilege escalation on Container-Optimized OS and Ubuntu nodes. For more information, see the GCP-2023-041 security bulletin.

October 31, 2023

Anthos clusters on VMware 1.15.6-gke.25 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.15.6-gke.25 runs on Kubernetes 1.26.9-gke.700.

If you use a third-party storage vendor, check the GDCV Ready storage partners document to make sure the storage vendor has already passed the qualification for this release of GKE on VMware.

The following vulnerabilities are fixed in 1.15.6-gke.25:

October 19, 2023

Anthos clusters on VMware 1.16.2-gke.28 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.16.2-gke.28 runs on Kubernetes 1.27.4-gke.1600.

If you use a third-party storage vendor, check the GDCV Ready storage partners document to make sure the storage vendor has already passed the qualification for this release of GKE on VMware.

The following issue is fixed in 1.16.2-gke.28:

  • Fixed the known issue where a non-HA Controlplane V2 cluster is stuck at node deletion until it times out.

The following vulnerabilities are fixed in 1.16.2-gke.28:

Anthos clusters on VMware 1.14.9-gke.21 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.14.9-gke.21 runs on Kubernetes 1.25.13-gke.200.

If you use a third-party storage vendor, check the GDCV Ready storage partners document to make sure the storage vendor has already passed the qualification for this release of GKE on VMware.

The following issues are fixed in 1.14.9-gke.21:

  • Fixed the known issue where a non-HA Controlplane V2 cluster is stuck at node deletion until it times out.

The following vulnerabilities are fixed in 1.14.9-gke.21:

October 12, 2023

Anthos clusters on VMware 1.15.5-gke.41 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.15.5-gke.41 runs on Kubernetes 1.26.7-gke.2500.

If you use a third-party storage vendor, check the GDCV Ready storage partners document to make sure the storage vendor has already passed the qualification for this release of GKE on VMware.

The following issues are fixed in 1.15.5-gke.41:

  • Fixed an issue where server-side preflight checks failed to validate container registry access on clusters with a private network and no private registry.
  • Fixed the known issue where a non-HA Controlplane V2 cluster is stuck at node deletion until it times out.
  • Fixed the known issue where upgrading or updating an admin cluster with a CA version greater than 1 fails.
  • Fixed the issue where the Controlplane V1 stackdriver operator has --is-kubeception-less=true specified by mistake.
  • Fixed the known issue that causes the secrets encryption key to be regenerated when upgrading the admin cluster from 1.14 to 1.15, resulting in the upgrade being blocked.

The following vulnerabilities are fixed in 1.15.5-gke.41:

October 02, 2023

Upgrading an admin cluster with always-on secrets encryption enabled might fail.

An admin cluster upgrade from 1.14.x to 1.15.0 - 1.15.4 with always-on secrets encryption enabled might fail depending on whether the feature was enabled during cluster creation or during cluster update.

We recommend that you don't upgrade your admin cluster until a fix is available in 1.15.5. If you must upgrade to 1.15.0-1.15.4, do the steps in Preventing the upgrade failure before upgrading the cluster.

For information on working around an admin cluster failure because of this issue, see Upgrading an admin cluster with always-on secrets encryption enabled fails. Note that the workaround relies on you having the old encryption key backed up. If the old key is no longer available, you will have to recreate the admin cluster and all user clusters.

September 29, 2023

Anthos clusters on VMware 1.16.1-gke.45 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.16.1-gke.45 runs on Kubernetes 1.27.4-gke.1600.

If you use a third-party storage vendor, check the GDCV Ready storage partners document to make sure the storage vendor has already passed the qualification for this release of GKE on VMware.

The Prometheus and Grafana add-ons field, loadBalancer.vips.addonsVIP, is deprecated in 1.16 and later. This change is because Google Managed Service for Prometheus replaced the Prometheus and Grafana add-ons in 1.16.

The following issues are fixed in 1.16.1-gke.45:

  • Fixed the known issue where gkectl repair admin-master returned a kubeconfig unmarshall error.
  • Fixed the known issue where the GARP reply sent by Seesaw didn't set the target IP.
  • Fixed the known issue where the Seesaw VM might break because of low disk space.
  • Fixed the known issue where false warnings might be generated against persistent volume claims.
  • Fixed the known issue that caused CNS attachvolume tasks to appear every minute for in-tree PVC/PV after upgrading to Anthos 1.15+.

The following vulnerabilities are fixed in 1.16.1-gke.45:

Anthos clusters on VMware 1.14.8-gke.37 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.14.8-gke.37 runs on Kubernetes 1.25.12-gke.2400.

If you use a third-party storage vendor, check the GDCV Ready storage partners document to make sure the storage vendor has already passed the qualification for this release of GKE on VMware.

The following issues are fixed in 1.14.8-gke.37:

  • Fixed the known issue where the disk on the Seesaw VM filled up because there was no log rotation for fluent-bit.

The following vulnerabilities are fixed in 1.14.8-gke.37:

September 14, 2023

A standalone tool that you run before upgrading an admin or user cluster is now available. The pre-upgrade tool is supported for Anthos clusters on VMware version 1.9 through 1.13. The tool runs the applicable preflight checks for the version that you are upgrading to and also checks for specific known issues. Before upgrading a 1.9 - 1.13 cluster, we recommend that you run the pre-upgrade tool.

For details on running the tool, see the documentation for the version that you are upgrading to:

September 01, 2023

Anthos clusters on VMware 1.15.4-gke.37 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.15.4-gke.37 runs on Kubernetes 1.26.7-gke.2500.

If you use a third-party storage vendor, check the GDCV Ready storage partners document to make sure the storage vendor has already passed the qualification for this release of GKE on VMware.

Upgrading an admin cluster with always-on secrets encryption enabled might fail.

An admin cluster upgrade from 1.14.x to 1.15.0 - 1.15.4 with always-on secrets encryption enabled might fail depending on whether the feature was enabled during cluster creation or during cluster update.

We recommend that you don't upgrade your admin cluster until a fix is available in 1.15.5. If you must upgrade to 1.15.0-1.15.4, do the steps in Preventing the upgrade failure before upgrading the cluster.

For information on working around an admin cluster failure because of this issue, see Upgrading an admin cluster with always-on secrets encryption enabled fails. Note that the workaround relies on you having the old encryption key backed up. If the old key is no longer available, you will have to recreate the admin cluster and all user clusters.

The following issues are fixed in 1.15.4-gke.37:

  • Fixed a known issue where incorrect log rotation configuration for fluent-bit caused low disk space on the Seesaw VM.

  • Fixed a known issue where the GARP reply sent by Seesaw didn't set the target IP.

  • Fixed an issue where /etc/vsphere/certificate/ca.crt wasn't updated after vSphere CA rotation on the Controlplane V2 user cluster control plane machines.

  • Fixed a known issue where the admin SSH public key had an error after an admin cluster upgrade or update.

The following vulnerabilities are fixed in 1.15.4-gke.37:

August 23, 2023

Anthos clusters on VMware 1.16.0-gke.669 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.16.0-gke.669 runs on Kubernetes 1.27.4-gke.1600.

If you use a third-party storage vendor, check the GDCV Ready storage partners document to make sure the storage vendor has already passed the qualification for this release of GKE on VMware.

Version changes:

  • Upgraded VMware vSphere Container Storage Plug-in from 3.0 to 3.0.2.
  • The crictl command-line tool was updated to 1.27.
  • The containerd config was updated to version 2.

Other changes:

  • The output of the gkectl diagnose cluster command has been updated to provide a summary that customers can copy and paste when opening support cases.
  • In-tree GlusterFS is removed from Kubernetes 1.27. Added storage validation to detect in-tree GlusterFS volumes.

  • Metrics data are now gzip compressed when sending to Cloud Monitoring.

  • The stackdriver-log-forwarder (fluent-bit) now sends logs to Cloud Logging with gzip compression to reduce egress bandwidth needed.

  • Prometheus and Grafana are no longer bundled for in-cluster monitoring and they are replaced with Google Cloud Managed Service for Prometheus.

  • The following flags in the stackdriver custom resource are deprecated and changes to their values aren't honored:

    • scalableMonitoring
    • enableStackdriverForApplications (replaced by enableGMPForApplications and enableCloudLoggingForApplications)
    • enableCustomMetricsAdapter
  • Deploying the vSphere cloud controller manager in both admin and user clusters, and enabling it for admin and kubeception user clusters is now supported.

  • The audit-proxy now sends audit logs to Cloud Audit Logging with gzip compression to reduce egress bandwidth needed.

  • Removed accounts.google.com from the internet preflight check requirement.

  • The pre-defined dashboards are automatically present based on the presence of metrics.

  • Enabled auto repair on the ReadonlyFilesystem node condition.

  • Added support for the d character in the --log-since flag when taking a cluster snapshot. For example: gkectl diagnose snapshot --log-since=1d

  • A new CSI Workload preflight check was added to verify that workloads using vSphere PVs can work through CSI.

  • Preflight check failures for gkectl prepare now block install and upgrade operations.

  • The kubelet readonly port is now disabled by default for security enhancement. See Enable kubelet readonly port for instructions if you need to re-enable it for legacy reasons.

  • AIS Pods are now scheduled to run on control plane nodes instead of worker nodes.

The following issues are fixed in 1.16.0-gke.669:

  • Fixed the known issue that caused intermittent SSH errors on the non-HA admin master after an update or upgrade.
  • Fixed the known issue where upgrading an enrolled admin cluster could fail due to a membership update failure.
  • Fixed the issue where the CPv1 stackdriver operator had --is-kubeception-less=true specified by mistake.

  • Fixed the issue where clusters used the non-high-availability (HA) Connect Agent after an upgrade to 1.15.

  • Fixed the known issue of Cloud Audit Logging failure due to permission denied.

  • Fixed a known issue where the update operation couldn't be fulfilled because of a KSA signing key version mismatch.

  • Fixed a known issue where $ in the private registry username caused admin control plane machine startup failure.

  • Fixed a known issue where gkectl diagnose snapshot failed to limit the time window for journalctl commands running on the cluster nodes when you take a cluster snapshot with the --log-since flag.

  • Fixed a known issue where node ID verification failed to handle hostnames with dots.

  • Fixed continuous increase of logging agent memory.

  • Fixed the issue that caused gcloud to fail to update the platform when the required-platform-version is already the current platform version.

  • Fixed an issue where clusterapi-controllers in a high-availability admin cluster had no Pod anti-affinity. As a result, the three clusterapi-controllers Pods might not have been scheduled on different control-plane nodes.

  • Fixed the wrong admin cluster resource link annotation key that can cause the cluster to be enrolled again by mistake.

  • Fixed a known issue where node pool creation failed because of duplicated VM-Host affinity rules.

  • The preflight check for StorageClass parameter validations now throws a warning instead of a failure on ignored parameters after CSI Migration. StorageClass parameter diskformat=thin is now allowed and does not generate a warning.

  • Fixed a false error message for gkectl prepare when using a high-availability admin cluster.

  • Fixed an issue during the migration from the Seesaw load balancer to MetalLB that caused 'DeprecatedKubeception' to always show up in the diff.

  • Fixed a known issue where some cluster nodes couldn't access the HA control plane when the underlying network performs ARP suppression.

  • Removed unused Pod disruption budgets (such as kube-apiserver-pdb, kube-controller-manager-pdb, and kube-etcd-pdb) for Controlplane V2 user clusters.

The following vulnerabilities are fixed in 1.16.0-gke.669:

August 17, 2023

Anthos clusters on VMware 1.14.7-gke.42 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.14.7-gke.42 runs on Kubernetes 1.25.10-gke.2100.

Upgraded VMware vSphere Container Storage Plug-in from 2.7.0 to 2.7.2.

The following issues are fixed in 1.14.7-gke.42:

  • Fixed a known issue where the admin SSH public key had an error after an admin cluster upgrade or update.
  • Fixed a known issue where the GARP reply sent by Seesaw didn't set the target IP.
  • Fixed an issue where /etc/vsphere/certificate/ca.crt was not updated after vSphere CA rotation on the Controlplane V2 user cluster control plane machines.
  • Fixed an issue where the CPv1 stackdriver operator had --is-kubeception-less=true specified by mistake.

The following vulnerabilities are fixed in 1.14.7-gke.42:

August 10, 2023

Anthos clusters on VMware 1.15.3-gke.47 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.15.3-gke.47 runs on Kubernetes 1.26.5-gke.2100.

If you use a third-party storage vendor, check the GDCV Ready storage partners document to make sure the storage vendor has already passed the qualification for this release of GKE on VMware.

Upgrading an admin cluster with always-on secrets encryption enabled might fail.

An admin cluster upgrade from 1.14.x to 1.15.0 - 1.15.4 with always-on secrets encryption enabled might fail depending on whether the feature was enabled during cluster creation or during cluster update.

We recommend that you don't upgrade your admin cluster until a fix is available in 1.15.5. If you must upgrade to 1.15.0-1.15.4, do the steps in Preventing the upgrade failure before upgrading the cluster.

For information on working around an admin cluster failure because of this issue, see Upgrading an admin cluster with always-on secrets encryption enabled fails. Note that the workaround relies on you having the old encryption key backed up. If the old key is no longer available, you will have to recreate the admin cluster and all user clusters.

Anthos clusters on VMware 1.15.3 supports adding the gkeOnPremAPI section to your admin cluster configuration file and user cluster configuration file to enroll the clusters in the Anthos On-Prem API.

Upgraded VMware vSphere Container Storage Plug-in from 3.0 to 3.0.2. For more information, see the Plug-in release notes.

The following issues are fixed in 1.15.3-gke.47:

  • Fixed a known issue that caused upgrading an admin cluster enrolled in the Anthos On-Prem API to fail.
  • Fixed an issue where audit logs are duplicated into an offline buffer even when they are successfully sent to Cloud Audit Logging.

The following vulnerabilities are fixed in 1.15.3-gke.47:

July 20, 2023

Anthos clusters on VMware 1.13.10-gke.42 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.13.10-gke.42 runs on Kubernetes 1.24.14-gke.2100.

  • Upgraded VMware vSphere Container Storage Plug-in from 2.6.2 to 2.7.2.
  • Added short names for Volume Snapshot CRDs.

The following issues are fixed in 1.13.10-gke.42:

  • Fixed an issue where the CPv1 stackdriver operator had --is-kubeception-less=true specified by mistake.
  • Fixed an issue where /etc/vsphere/certificate/ca.crt was not updated after vSphere CA rotation on the Controlplane V2 user cluster control plane machines.
  • Fixed an issue where audit logs are duplicated into an offline buffer even when they are successfully sent to Cloud Audit Logs.
  • Fixed a known issue where $ in the private registry user name would cause admin control plane machine startup failure.
  • Fixed a known issue where the update operation could not be fulfilled because of a KSA signing key version mismatch.

The following vulnerabilities are fixed in 1.13.10-gke.42:

July 10, 2023

Anthos clusters on VMware 1.15.2-gke.44 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.15.2-gke.44 runs on Kubernetes 1.26.2-gke.1001.

If you use a third-party storage vendor, check the GDCV Ready storage partners document to make sure the storage vendor has already passed the qualification for this release of GKE on VMware.

Upgrading an admin cluster with always-on secrets encryption enabled might fail.

An admin cluster upgrade from 1.14.x to 1.15.0 - 1.15.4 with always-on secrets encryption enabled might fail depending on whether the feature was enabled during cluster creation or during cluster update.

We recommend that you don't upgrade your admin cluster until a fix is available in 1.15.5. If you must upgrade to 1.15.0-1.15.4, do the steps in Preventing the upgrade failure before upgrading the cluster.

For information on working around an admin cluster failure because of this issue, see Upgrading an admin cluster with always-on secrets encryption enabled fails. Note that the workaround relies on you having the old encryption key backed up. If the old key is no longer available, you will have to recreate the admin cluster and all user clusters.

The following issues are fixed in 1.15.2-gke.44:

  • Fixed a bug where after an upgrade to 1.15, clusters used the non-high-availability (HA) Connect Agent.
  • Fixed a known issue where $ in the private registry username caused admin control plane machine startup failure.
  • Fixed a known issue where user cluster update failed after KSA signing key rotation.
  • Fixed a known issue where gkectl diagnose snapshot failed to limit the time window for journalctl commands running on the cluster nodes when you take a cluster snapshot with the --log-since flag.

The following vulnerabilities are fixed in 1.15.2-gke.44:

July 06, 2023

Anthos clusters on VMware 1.14.6-gke.23 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.14.6-gke.23 runs on Kubernetes 1.25.10-gke.1200.

The following issues are fixed in 1.14.6-gke.23:

  • Fixed a known issue where $ in the private registry username caused admin control plane machine startup failure.
  • Fixed a known issue where gkectl diagnose snapshot failed to limit the time window for journalctl commands running on the cluster nodes when you take a cluster snapshot with the --log-since flag.
  • Fixed a known issue where user cluster update failed after KSA signing key rotation.

The following vulnerabilities are fixed in 1.14.6-gke.23:

High-severity container vulnerabilities:

June 27, 2023

Security bulletin

A number of vulnerabilities have been discovered in Envoy, which is used in Anthos Service Mesh (ASM). These were reported separately as GCP-2023-002.

For more information, see the GCP-2023-016 security bulletin.

Security bulletin

With CVE-2023-31436, an out-of-bounds memory access flaw was found in the Linux kernel's traffic control (QoS) subsystem in how a user triggers the qfq_change_class function with an incorrect MTU value of the network device used as lmax. This flaw allows a local user to crash or potentially escalate their privileges on the system.

For more information, see the GCP-2023-017 security bulletin.

Security bulletin

A new vulnerability (CVE-2023-2235) has been discovered in the Linux kernel that can lead to a privilege escalation on the node. For more information, see the GCP-2023-018 security bulletin.

June 20, 2023

Security bulletin

A new vulnerability, CVE-2023-0468, has been discovered in the Linux kernel that could allow an unprivileged user to escalate privileges to root. The flaw occurs when io_poll_get_ownership keeps increasing req->poll_refs on every io_poll_wake until the counter overflows to 0, which causes fput to be called on req->file twice and leads to a struct file refcount issue. GKE clusters, including Autopilot clusters, with Container-Optimized OS using Linux Kernel version 5.15 are affected. GKE clusters using Ubuntu images or using GKE Sandbox are unaffected.

For more information, see the GCP-2023-015 security bulletin.

June 16, 2023

Security bulletin

Two new security issues were discovered in Kubernetes where users may be able to launch containers that bypass policy restrictions when using ephemeral containers and either ImagePolicyWebhook (CVE-2023-2727) or the ServiceAccount admission plugin (CVE-2023-2728).

For more information, see the GCP-2023-014 security bulletin.

June 14, 2023

Anthos clusters on VMware 1.14.5-gke.41 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.14.5-gke.41 runs on Kubernetes 1.25.8-gke.1500.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.15, 1.14, and 1.13.

The component access service account key for an admin cluster using a private registry can be updated in 1.14.5 and later. See Rotating service account keys for details.

The following issues are fixed in 1.14.5-gke.41:

  • Fixed a known issue where the kind cluster downloads container images from docker.io. These container images are now preloaded in the kind cluster container image.
  • Fixed a bug where disks may be out of order in the first boot, causing node bootstrap failure.
  • Fixed a known issue where node ID verification failed to handle hostnames with dots.
  • Fixed an issue where gcloud fails to update the platform when the required-platform-version is already the current platform version.
  • Fixed an Anthos Config Management gcloud issue where the policy controller state might be falsely reported as pending.
  • Fixed continuously increasing memory usage of the logging agent stackdriver-log-forwarder.
  • Fixed the wrong admin cluster resource link annotation key that can cause the cluster to be enrolled in the Anthos On-Prem API again by mistake.
  • Fixed a known issue where some cluster nodes couldn't access the HA control plane when the underlying network performs ARP suppression.
  • Fixed a known issue where vsphere-csi-secret was not updated during gkectl update credentials vsphere for an admin cluster.

The following vulnerabilities are fixed in 1.14.5-gke.41:

Anthos clusters on VMware 1.13.9-gke.29 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.13.9-gke.29 runs on Kubernetes 1.24.11-gke.1200.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.15, 1.14, and 1.13.

The following issues are fixed in 1.13.9-gke.29:

  • Fixed a known issue where the kind cluster downloads container images from docker.io. These container images are now preloaded in the kind cluster container image.
  • Fixed the issue where gkectl failed to limit the time window for journalctl commands running on the cluster nodes when you take a cluster snapshot with the --log-since flag.
  • Fixed an issue where gcloud fails to update the platform when the required-platform-version is already the current platform version.
  • Fixed a known issue where nodes fail to register if the configured hostname contains a period.
  • Fixed the wrong admin cluster resource link annotation key that can cause the cluster to be enrolled again by mistake.

The following high-severity container vulnerabilities are fixed in 1.13.9-gke.29:

June 06, 2023

Security bulletin

A new vulnerability (CVE-2023-2878) has been discovered in the secrets-store-csi-driver where an actor with access to the driver logs could observe service account tokens. These tokens could then potentially be exchanged with external cloud providers to access secrets stored in cloud vault solutions. The severity of this Security Bulletin is None. For more information, see the GCP-2023-009 security bulletin.

June 05, 2023

Known issue

If you create a version 1.13.8 or version 1.14.4 admin cluster, or upgrade an admin cluster to version 1.13.8 or 1.14.4, the kind cluster pulls the following container images from docker.io:

  • docker.io/kindest/kindnetd
  • docker.io/kindest/local-path-provisioner
  • docker.io/kindest/local-path-helper

If docker.io isn't accessible from your admin workstation, the admin cluster creation or upgrade fails to bring up the kind cluster.

This issue affects the following versions of Anthos clusters on VMware:

  • 1.14.4
  • 1.13.8

For more information, including a workaround, see kind cluster pulls container images from docker.io on the Known issues page.
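
To decide whether a planned admin cluster creation or upgrade is exposed to this issue, you could compare the target version against the affected versions listed above. The following is a minimal sketch under that assumption; the function name is hypothetical, and the image list and version list are copied from this note.

```python
# Versions and images copied from the known issue above.
AFFECTED_VERSIONS = {"1.13.8", "1.14.4"}
KIND_IMAGES = [
    "docker.io/kindest/kindnetd",
    "docker.io/kindest/local-path-provisioner",
    "docker.io/kindest/local-path-helper",
]


def images_pulled_from_docker_io(version: str) -> list[str]:
    """Return the kind images pulled from docker.io for an affected version.

    Accepts either '1.14.4' or a full version such as '1.14.4-gke.54'.
    Returns an empty list for versions that this note does not list.
    """
    base = version.strip().split("-gke.", 1)[0]
    if base in AFFECTED_VERSIONS:
        return list(KIND_IMAGES)
    return []
```

An empty result only means the version isn't named in this note; it doesn't verify that docker.io is reachable from your admin workstation.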

Security bulletin

A new vulnerability (CVE-2023-1872) has been discovered in the Linux kernel that can lead to a privilege escalation to root on the node. For more information, see the GCP-2023-008 security bulletin.

June 01, 2023

Anthos clusters on VMware 1.15.1-gke.40 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.15.1-gke.40 runs on Kubernetes 1.26.2-gke.1001.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.15, 1.14, and 1.13.

If you use a third-party storage vendor, check the GDCV Ready storage partners document to make sure the storage vendor has already passed the qualification for this release of GKE on VMware.

Upgrading an admin cluster with always-on secrets encryption enabled might fail.

An admin cluster upgrade from 1.14.x to 1.15.0 - 1.15.4 with always-on secrets encryption enabled might fail depending on whether the feature was enabled during cluster creation or during cluster update.

We recommend that you don't upgrade your admin cluster until a fix is available in 1.15.5. If you must upgrade to 1.15.0-1.15.4, do the steps in Preventing the upgrade failure before upgrading the cluster.

For information on working around an admin cluster failure because of this issue, see Upgrading an admin cluster with always-on secrets encryption enabled fails. Note that the workaround relies on you having the old encryption key backed up. If the old key is no longer available, you will have to recreate the admin cluster and all user clusters.

  • Fixed a known issue where node ID verification failed to handle hostnames with dots.

  • Fixed continuous increase of logging agent memory.

  • Fixed an issue where cluster-api-controllers in a high-availability admin cluster had no Pod anti-affinity. As a result, the three clusterapi-controllers Pods might not have been scheduled on different control-plane nodes.

  • Fixed the wrong admin cluster resource link annotation key that can cause the cluster to be enrolled again by mistake.

  • Fixed a known issue where node pool creation failed because of duplicated VM-Host affinity rules.

  • The preflight check for StorageClass parameter validations now throws a warning instead of a failure on ignored parameters after CSI Migration. StorageClass parameter diskformat=thin is now allowed and does not generate a warning.

  • Fixed an issue where gkectl repair admin-master might fail with Failed to repair: failed to delete the admin master node object and reboot the admin master VM.

  • Fixed a race condition where some cluster nodes couldn't access the high-availability control plane when the underlying network performed ARP suppression.

  • Fixed a false error message for gkectl prepare when using a high-availability admin cluster.

  • Fixed an issue where during user cluster update, DeprecatedKubeception always shows up in the diff.

  • Fixed an issue where there were leftover Pods with failed status due to Predicate NodeAffinity failed during node re-creation.

Fixed the following vulnerabilities:

May 18, 2023

Security bulletin

Two new vulnerabilities (CVE-2023-1281, CVE-2023-1829) have been discovered in the Linux kernel that can lead to a privilege escalation to root on the node. For more information, see the GCP-2023-005 security bulletin.

May 15, 2023

Anthos clusters on VMware 1.13.8-gke.42 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.13.8-gke.42 runs on Kubernetes 1.24.11-gke.1200.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.15, 1.14, and 1.13.

  • Fixed a race condition where some cluster nodes couldn't access the HA control plane when the underlying network performed ARP suppression.

  • Fixed an issue where vsphere-csi-secret was not updated during gkectl update credentials vsphere for an admin cluster.

  • Disabled motd news on the ubuntu_containerd image to avoid unexpected connections to Canonical.

  • Fixed an issue where the Connect Agent continued using the older image after registry credential update.

  • Fixed an issue where cluster autoscaler ClusterRoleBindings in the admin cluster were accidentally deleted upon user cluster deletion. This fix removes dependency on ClusterRole, ClusterRoleBinding and ServiceAccount objects in the admin cluster.

  • Fixed an issue where Connect Agent in admin clusters might fail to be upgraded during cluster upgrade.

  • Fixed an issue where a cluster might not be registered when the initial membership creation attempt failed.

Fixed the following vulnerabilities:

May 02, 2023

Anthos clusters on VMware 1.15.0-gke.581 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.15.0-gke.581 runs on Kubernetes 1.26.2-gke.1001.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.15, 1.14, and 1.13.

If you use a third-party storage vendor, check the GDCV Ready storage partners document to make sure the storage vendor has already passed the qualification for this release of GKE on VMware.

Upgrading an admin cluster with always-on secrets encryption enabled might fail.

An admin cluster upgrade from 1.14.x to 1.15.0 - 1.15.4 with always-on secrets encryption enabled might fail depending on whether the feature was enabled during cluster creation or during cluster update.

We recommend that you don't upgrade your admin cluster until a fix is available in 1.15.5. If you must upgrade to 1.15.0-1.15.4, do the steps in Preventing the upgrade failure before upgrading the cluster.

For information on working around an admin cluster failure because of this issue, see Upgrading an admin cluster with always-on secrets encryption enabled fails. Note that the workaround relies on you having the old encryption key backed up. If the old key is no longer available, you will have to recreate the admin cluster and all user clusters.

  • CSI migration for the vSphere storage driver is enabled by default. A new storage preflight check and a new CSI workload preflight check verify that PersistentVolumes that used the old in-tree vSphere storage driver will continue to work with the vSphere CSI driver. There is a known issue during admin cluster upgrade. If you see a preflight check about a StorageClass diskformat parameter, you can use --skip-validation-cluster-health to skip the check. This issue will be fixed in a future release.

  • The minimum required version of vCenter and ESXi is 7.0 Update 2.

  • Preview: Support for vSphere 8.0

  • Preview: Support for VM-Host affinity for user cluster node pools

  • Preview: Support for High availability control plane for admin clusters

  • Preview: Support for system metrics collection using Google Cloud Managed Service for Prometheus

  • Preview: You can now filter application logs by namespace, Pod labels and content regex.

  • Preview: Support for storage policy in user clusters

  • Preview: You can now use gkectl diagnose snapshot --upload=true to upload a snapshot. gkectl helps generate the Cloud Storage bucket with the format gs://anthos-snapshot[uuid]/vmware/$snapshot-name.

  • GA: Support for upgrade and rollback of node pool version

  • GA: gkectl get-config is a new command that locally generates cluster configuration files from an existing admin or user cluster.

  • GA: Support for multi-line parsing of Go and Java logs

  • GA: Support for manual load balancing in user clusters that enable ControlplaneV2

  • GA: Support for update of private registry credentials

  • GA: Metrics and logs in the bootstrap cluster are now uploaded to Google Cloud through Google Cloud's operations suite to provide better observability on admin cluster operations.

  • GA: vSphere CSI is now enabled for Windows node pools.

  • Fully managed Cloud Monitoring Integration dashboards. The new Integration Dashboard is automatically installed. You cannot make changes to the following dashboards, because they are fully managed by Google. However, you can make a copy of a dashboard and customize the copied version:

    • Anthos Cluster Control Plane Uptime
    • Anthos Cluster Node Status
    • Anthos Cluster Pod Status
    • Anthos Cluster Utilization Metering
    • Anthos Cluster on VMware VM Status
  • Admin cluster update operations are now managed by an admin cluster controller.

  • The Connect Agent now runs in high availability mode.

  • The metrics server now runs in high-availability mode.

  • Upgraded the VMware vSphere Container Storage Plug-in from 2.7 to 3.0. This includes support for Kubernetes version 1.26. For more information, see the plug-in release notes.

  • Upgraded Anthos Identity Service to hybrid_identity_charon_20230313_0730_RC00.

  • Switched the node selector from node-role.kubernetes.io/master to node-role.kubernetes.io/control-plane and added toleration node-role.kubernetes.io/control-plane to system components.

  • Controlplane V2 is now the default for new user clusters.

  • When you delete a Controlplane V2 user cluster, the data disk is now automatically deleted.

  • Cluster DNS now supports ordering policy for upstream servers.

  • Added admin cluster CA certificate validation to the admin cluster upgrade preflight check.

  • Upgraded Anthos Network Gateway to 1.4.4.

  • Updated anthos-multinet.

  • When you upload and share a snapshot using gkectl diagnose snapshot with a Google Support team service account service-[GOOGLE_CLOUD_PROJECT_NUMBER]@gcp-sa-anthossupport.iam.gserviceaccount.com, gkectl helps provision the service account automatically.

  • Upgraded node-exporter from 1.0.1 to 1.4.1.

  • Upgraded Managed Service for Prometheus for application metrics from 0.4 to 0.6.

  • We now allow storage DRS to be enabled in manual mode.

  • GKE connect is now required for admin clusters, and you cannot skip the corresponding validation. You can register existing admin clusters by using gkectl update admin.

  • We no longer silently skip saving empty files in diagnose snapshots, but instead collect the names of those files in a new empty_snapshots file in the snapshot tarball.

  • We now mount /opt/data using disk label data.

  • In the vSphere CSI driver, enabled improved-csi-idempotency and async-query-volume, and disabled trigger-csi-fullsync. This enhances the vSphere CSI driver to ensure volume operations are idempotent.

  • Changed the relative file path fields in the admin cluster configuration file to use absolute paths.

  • Removed kubectl describe events in cluster snapshots for a better user experience. kubectl describe events fails when the target event expires. In contrast, kubectl get events survives and provides enough debugging information.
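
The node selector change listed above (from node-role.kubernetes.io/master to node-role.kubernetes.io/control-plane, with a matching toleration) applies to system components. If you maintain your own manifests that target control-plane nodes with the older label, a comparable change could be sketched as follows. This is an illustration only: the function name is hypothetical, the Pod spec is assumed to be a plain dictionary loaded from a manifest, and the toleration's operator and effect values are assumptions rather than values stated in this note.

```python
import copy

OLD_LABEL = "node-role.kubernetes.io/master"
NEW_LABEL = "node-role.kubernetes.io/control-plane"


def migrate_pod_spec(pod_spec: dict) -> dict:
    """Return a copy of pod_spec that targets the control-plane label.

    Specs that do not select control-plane nodes are returned unchanged.
    """
    spec = copy.deepcopy(pod_spec)
    selector = spec.get("nodeSelector") or {}
    if OLD_LABEL not in selector and NEW_LABEL not in selector:
        return spec

    # Swap the old label key for the new one, keeping its value.
    if OLD_LABEL in selector:
        selector[NEW_LABEL] = selector.pop(OLD_LABEL)
    spec["nodeSelector"] = selector

    # Add a toleration for the new key if one is not already present.
    # The operator and effect below are assumptions for illustration.
    tolerations = spec.get("tolerations") or []
    if not any(t.get("key") == NEW_LABEL for t in tolerations):
        tolerations.append(
            {"key": NEW_LABEL, "operator": "Exists", "effect": "NoSchedule"}
        )
    spec["tolerations"] = tolerations
    return spec
```

The function works on a deep copy, so the original dictionary is left untouched and the two versions can be compared before you apply anything to a cluster.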

Deprecations

  • Support for gkeadm on macOS and Windows is deprecated.

  • The enableWindowsDataplaneV2 field in the user cluster configuration file is deprecated.

  • The gkectl enroll cluster command is deprecated. Use gcloud to enroll a user cluster instead.

  • The following dashboards in the Cloud Monitoring Sample Library will be deprecated in a future release:

    • Anthos cluster control plane uptime
    • Anthos cluster node status
    • Anthos cluster pod status
    • Anthos utilization metering
    • GKE on-prem node status
    • GKE on-prem control plane uptime
    • GKE on-prem pod status
    • GKE on-prem vSphere vm health status
  • In a future release, the following customized dashboards will not be created when you create a new cluster:

    • GKE on-prem node status
    • GKE on-prem control plane uptime
    • GKE on-prem pod status
    • GKE on-prem vSphere vm health status
    • GKE on-prem Windows pod status
    • GKE on-prem Windows node status
  • Fixed the false error message generated by the cluster autoscaler about a missing ClusterRoleBinding. After a user cluster is deleted, that ClusterRoleBinding is no longer needed.

  • Fixed an issue where gkectl check-config failed (nil pointer error) during validation for Manual load balancing.

  • Fixed an issue where the cluster autoscaler did not work when Controlplane V2 was enabled.

  • Fixed an issue where using gkectl update to enable Cloud Audit Logs did not work.

  • Fixed an issue where a preflight check for Seesaw load balancer creation failed if the Seesaw group file already existed.

  • We now backfill the OnPremAdminCluster OSImageType field to prevent an unexpected diff during update.

  • Fixed an issue where disks might be out of order during the first boot.

  • Fixed an issue where the private registry credentials file for the user cluster could not be loaded.

  • Fixed an issue where the user-cluster node options and startup script used the cluster version instead of the node pool version.

  • Fixed an issue where gkectl diagnose cluster didn't check the health of control-plane Pods for kubeception user clusters.

  • Fixed an issue where KSASigningKeyRotation always showed as an unsupported change during user cluster update.

  • Fixed an issue where a cluster might not be registered when the initial membership creation attempt failed.

  • Fixed an issue where user cluster data disk validation used the cluster-level vCenter.datastore instead of masterNode.vsphere.datastore.

  • Fixed an issue where component-access-sa-key was missing in the admin-cluster-creds Secret after admin cluster upgrade.

  • Fixed an issue where during user cluster upgrade, the cluster state indicated that upgrade had completed before CA rotation had completed.

  • Fixed an issue where advanced networking components were evicted or not scheduled on nodes because of Pod priority.

  • Fixed a known issue where the calico-node Pod was unable to renew the auth token in the calico CNI kubeconfig file.

  • Fixed Anthos Identity Service metric exporting issues.

  • During preflight checks and cluster diagnosis, we now skip PersistentVolumes and PersistentVolumeClaims that use non-vSphere drivers.

  • Fixed a known issue where CIDR ranges could not be used in the IP block file.

  • Fixed an issue where auto resizing of CPU and memory for an admin cluster add-on node got reset by an admin cluster controller.

  • anet-operator can now be scheduled to a Windows node in a user cluster that has Controlplane V2 enabled.

May 01, 2023

Anthos clusters on VMware 1.14.4-gke.54 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.14.4-gke.54 runs on Kubernetes 1.25.8-gke.1500.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.14, 1.13, and 1.12.

Added admin cluster CA certificate validation to the admin cluster upgrade preflight check.

  • Fixed an issue where the Connect Agent continued using the older image after registry credential update.

  • Fixed an issue where the cluster autoscaler did not work when Controlplane V2 was enabled.

  • Fixed an issue where a cluster might not be registered when the initial membership creation attempt failed.

  • Fixed an issue where ClusterRoleBindings in the admin cluster were accidentally deleted upon user cluster deletion. This fix removes dependency on ClusterRole, ClusterRoleBinding and ServiceAccount objects in the admin cluster.

  • Fixed an issue where a preflight check for Seesaw load balancer creation failed if the Seesaw group file already existed.

  • Disabled motd news on the ubuntu_containerd image.

  • Fixed an issue where gkectl check-config failed at Manual LB slow validation with a nil pointer error.

  • Fixed an issue where enabling Cloud Audit Logs with gkectl update did not work.

Fixed the following vulnerabilities:

April 13, 2023

Anthos clusters on VMware 1.12.7-gke.20 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.12.7-gke.20 runs on Kubernetes 1.23.17-gke.900.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.14, 1.13, and 1.12.

  • Added admin cluster CA certificate validation to the admin cluster upgrade preflight check.

  • We now allow storage DRS to be enabled in manual mode.

  • Fixed an issue where using gkectl update to enable Cloud Audit Logs did not work.

  • We now backfill the OnPremAdminCluster OSImageType field to prevent an unexpected diff during update.

  • Fixed an issue where a preflight check for Seesaw load balancer creation failed if the Seesaw group file already existed.

April 12, 2023

Kubernetes image registry redirect

As of March 21, 2023, traffic to k8s.gcr.io is redirected to registry.k8s.io, following the community announcement. This change is happening gradually to reduce disruption, and should be transparent for most Anthos clusters.

To check for edge cases and mitigate potential impact to your clusters, follow the step-by-step guidance in k8s.gcr.io Redirect to registry.k8s.io - What You Need to Know.
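
One way to look for edge cases is to scan your manifests for image references that still point at k8s.gcr.io. The following is a minimal sketch, assuming the manifests are available as plain text; the function names are hypothetical and the regular expression is a simple heuristic, not an official tool.

```python
import re

OLD_REGISTRY = "k8s.gcr.io"
NEW_REGISTRY = "registry.k8s.io"

# Matches image references such as k8s.gcr.io/pause:3.9 or
# k8s.gcr.io/foo/bar@sha256:... . The lookbehind avoids matching when the
# host is only the tail of a longer hostname.
_LEGACY_IMAGE_RE = re.compile(r"(?<![\w.-])k8s\.gcr\.io/[\w./:@-]+")


def find_legacy_images(manifest_text: str) -> list[str]:
    """Return the sorted, de-duplicated k8s.gcr.io image references."""
    return sorted(set(_LEGACY_IMAGE_RE.findall(manifest_text)))


def rewrite_to_new_registry(manifest_text: str) -> str:
    """Return manifest_text with k8s.gcr.io references pointed at registry.k8s.io."""
    return _LEGACY_IMAGE_RE.sub(
        lambda m: NEW_REGISTRY + m.group(0)[len(OLD_REGISTRY):],
        manifest_text,
    )
```

Review the output of `find_legacy_images` before rewriting anything, because whether a given image should be repointed depends on your environment, for example on firewall rules or a private mirror.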

April 11, 2023

1.13.7 patch release

Anthos clusters on VMware 1.13.7-gke.29 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.13.7-gke.29 runs on Kubernetes 1.24.11-gke.1200.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.14, 1.13, and 1.12.

Fixed for 1.13.7

  • Fixed an issue where gkectl check-config fails at Manual LB slow validation with a nil pointer error.

  • Fixed a bug where enabling Cloud Audit Logs with gkectl update did not work.

  • Fixed an issue where a preflight check for Seesaw load balancer creation failed if the Seesaw group file already existed.

  • We now backfill the OnPremAdminCluster OSImageType field to prevent an unexpected diff during update.

Security bulletin

Two new vulnerabilities, CVE-2023-0240 and CVE-2023-23586, have been discovered in the Linux kernel that could allow an unprivileged user to escalate privileges. For more information, see the GCP-2023-003 security bulletin.

1.12.7-gke.19 bad release

Anthos clusters on VMware 1.12.7-gke.19 is a bad release and you should not use it. The artifacts have been removed from the Cloud Storage bucket.

April 03, 2023

Anthos clusters on VMware 1.14.3-gke.25 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.14.3-gke.25 runs on Kubernetes 1.25.5-gke.100.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.14, 1.13, and 1.12.

We now allow storage DRS to be enabled in manual mode.

  • We now backfill the OnPremAdminCluster OSImageType field to prevent an unexpected diff during cluster update.

  • Fixed an issue where gkectl diagnose cluster didn't check the health of control-plane Pods for kubeception user clusters.

  • Fixed an issue where the user-cluster node options and startup script used the cluster version instead of the node pool version.

Fixed the following vulnerabilities:

March 17, 2023

Anthos clusters on VMware 1.13.6-gke.32 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.13.6-gke.32 runs on Kubernetes 1.24.10-gke.2200.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.14, 1.13, and 1.12.

  • Fixed an issue with Anthos Identity Service to better scale and handle concurrent authentication requests.

  • Fixed an issue where component-access-sa-key was missing in the admin-cluster-creds Secret after admin cluster upgrade.

Fixed the following vulnerabilities:

March 07, 2023

Anthos clusters on VMware 1.14.2-gke.37 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.14.2-gke.37 runs on Kubernetes 1.25.5-gke.100.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.14, 1.13, and 1.12.

We no longer silently skip saving empty files in diagnose snapshots, but instead collect the names of those files in a new empty_snapshots file in the snapshot tarball.

  • Fixed an issue where user cluster data disk validation used the cluster-level datastore vsphere.datastore instead of masterNode.vsphere.datastore.

  • Fixed an issue with Anthos Identity Service to better scale and handle concurrent authentication requests.

  • Fixed an issue where component-access-sa-key was missing in the admin-cluster-creds Secret after admin cluster upgrade.

  • Fixed an issue where user cluster upgrade triggered through the Google Cloud console might flap between ready and non-ready states until CA rotation fully completes.

  • Fixed an issue where gkectl diagnose cluster might generate false failure signals with non-vSphere CSI drivers.

  • Fixed an issue where admin cluster update doesn't wait for user control-plane machines to be re-created when using ControlPlaneV2.

Fixed the following vulnerabilities:

March 06, 2023

Cluster lifecycle improvements versions 1.13.1 and later

You can use the Google Cloud console or the gcloud CLI to upgrade user clusters managed by the Anthos On-Prem API. The upgrade steps differ depending on your admin cluster version. For more information, see the version of the documentation that corresponds to your admin cluster version:

1.12.6 patch release

Anthos clusters on VMware 1.12.6-gke.35 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.12.6-gke.35 runs on Kubernetes v1.23.16-gke.2400.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.14, 1.13, and 1.12.

  • Fixed a bug where KSASigningKeyRotation always shows as an unsupported change during user cluster update.
  • Fixed an issue with Anthos Identity Service to better scale and handle concurrent authentication requests.

  • Fixed an issue where component-access-sa-key was missing in the admin-cluster-creds Secret after admin cluster upgrade.

Fixed the following vulnerabilities:

March 01, 2023

A new vulnerability (CVE-2022-4696) has been discovered in the Linux kernel that can lead to a privilege escalation on the node. Anthos clusters on VMware running v1.12 and v1.13 are impacted. Anthos clusters on VMware running v1.14 or later are not affected.

For instructions and more details, see the Anthos clusters on VMware security bulletin.

February 13, 2023

Anthos clusters on VMware 1.13.5-gke.27 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.13.5-gke.27 runs on Kubernetes 1.24.9-gke.2500.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.14, 1.13, and 1.12.

  • Updated the Ubuntu image to ubuntu-gke-op-2004-1-13-v20230201 using node kernel version 5.4.0.1062.60.

  • Instead of ignoring snapshot files with empty content, we save their names in a new file named empty_snapshots.

During preflight checks and cluster diagnosis, we now skip PVs and PVCs that use non-vSphere drivers.

Fixed the following vulnerabilities:

January 31, 2023

Anthos clusters on VMware 1.14.1-gke.39 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.14.1-gke.39 runs on Kubernetes 1.25.5-gke.100.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.14, 1.13, and 1.12.

  • In the admin cluster configuration file, gkeadm now prepopulates caCertPath and the service account key paths with absolute paths instead of relative paths.

  • In the vSphere CSI driver, enabled improved-csi-idempotency and async-query-volume, and disabled trigger-csi-fullsync. This enhances the vSphere CSI driver to ensure volume operations are idempotent.

  • Fixed a known issue where the calico-node Pod is unable to renew the auth token in the calico CNI kubeconfig file.

  • Fixed a known issue where CIDR ranges cannot be used in the IP block file.

Fixed the following vulnerabilities:

January 26, 2023

Anthos clusters on VMware 1.12.5-gke.34 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.12.5-gke.34 runs on Kubernetes 1.23.15-gke.2400.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.14, 1.13, and 1.12.

In the vSphere CSI driver, enabled improved-csi-idempotency and async-query-volume, and disabled trigger-csi-fullsync. This enhances the vSphere CSI driver to ensure volume operations are idempotent.

  • If you specify a CIDR range (subnet) in the IP block file for your cluster nodes, the broadcast IP of the subnet, the network CIDR IP, and the network gateway IP will be excluded from the pool of addresses that get assigned to nodes.

  • Fixed a known issue where CIDR ranges cannot be used in the IP block file.

  • Fixed a bug where CA rotation appeared as an unsupported change during admin cluster update.
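
The CIDR exclusion rule described above can be illustrated with a short Python sketch that uses the standard ipaddress module. The function name, the sample subnet, and the gateway address are hypothetical and chosen only for illustration; this is not how gkectl computes the pool internally.

```python
import ipaddress


def assignable_node_ips(cidr, gateway):
    """List addresses in cidr, minus the network, broadcast, and gateway IPs."""
    network = ipaddress.ip_network(cidr, strict=True)
    excluded = {
        network.network_address,        # network CIDR IP, e.g. 192.0.2.0
        network.broadcast_address,      # broadcast IP, e.g. 192.0.2.7
        ipaddress.ip_address(gateway),  # network gateway IP
    }
    # Iterating over the network yields every address, including the
    # network and broadcast addresses, so they are filtered out explicitly.
    return [str(ip) for ip in network if ip not in excluded]


# Hypothetical /29 subnet with the gateway at .1
pool = assignable_node_ips("192.0.2.0/29", "192.0.2.1")
print(pool)
```

With an 8-address /29 block, three addresses are excluded, so five remain for nodes. Size the subnet with those three reserved addresses in mind.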

Fixed the following vulnerabilities:

January 25, 2023

Anthos clusters on VMware version 1.14.0 has a known issue where the calico-node Pod is unable to renew the auth token in the calico CNI kubeconfig file. For more information, see Pod create or delete errors due to Calico CNI service account auth token issue.

Because of this issue, you cannot use Anthos On-Prem API clients (Google Cloud console and gcloud CLI) to create and manage 1.14.0 clusters.

January 12, 2023

Anthos clusters on VMware 1.13.4-gke.19 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.13.4-gke.19 runs on Kubernetes 1.24.9-gke.100.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.14, 1.13, and 1.12.

  • In the vSphere CSI driver, enabled improved-csi-idempotency and async-query-volume, and disabled trigger-csi-fullsync. This enhances the vSphere CSI driver to ensure volume operations are idempotent.

  • In the admin cluster configuration file, gkeadm now prepopulates caCertPath and the service account key paths with absolute paths instead of relative paths.

  • If you specify a CIDR range (subnet) in the IP block file for your cluster nodes, the broadcast IP of the subnet, the network CIDR IP, and the network gateway IP will be excluded from the pool of addresses that get assigned to nodes.
  • Fixed a bug where CIDR ranges cannot be used in an IP block file.

December 22, 2022

A new vulnerability (CVE-2022-2602) has been discovered in the io_uring subsystem in the Linux kernel that can allow an attacker to potentially execute arbitrary code.

For more information, see the GCP-2022-025 security bulletin.

December 21, 2022

Anthos clusters on VMware 1.14.0-gke.430 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.14.0-gke.430 runs on Kubernetes 1.25.5-gke.100.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.14, 1.13, and 1.12.

  • Support for user cluster creation with Controlplane V2 enabled is now generally available. For more details on how to create a user cluster with this model, see Create a user cluster with Controlplane V2.
  • Preview: You can now roll back node pools to a previous working version if you detect an issue in the new version after a cluster upgrade. For more information, see Rolling back a node pool after an upgrade.
  • Preview: The following private registry updates are now available:
    • Support for private registry credentials using prepared Secrets is now available as a preview feature. A new privateRegistry field has been added in the Secrets configuration file.
    • Added a new privateRegistry section in the user cluster configuration file. You can use different private registry credentials for the user cluster and admin cluster. You can also use a different private registry address for user clusters with Controlplane V2 enabled.
    • You can also update private registry credentials for an admin cluster or user cluster with the gkectl update credentials command. For more information, see Update private registry credentials.
  • Cluster names are now included in kubeconfig files when creating a new admin cluster or user cluster. If you are upgrading your existing cluster to 1.14.0 or higher, the existing kubeconfig file is updated with the cluster name.
  • cluster-health-controller is now integrated with health-check-exporter to emit metrics based on the periodic health check results, making it easy to monitor and detect cluster health problems.
  • GA: The node pool update policy is generally available. With this feature, you can configure the value of maximumConcurrentNodePoolUpdate in the user cluster configuration file to 1. This will configure the maximum number of additional nodes spawned during cluster upgrade or update, which can potentially avoid two issues — resource quota limit issue and PDB deadlock issue. For more information, see Configure node pool update policy.
  • Support for vSphere cluster/host/network/datastore folders is generally available. You can use folders to group objects of the same type for easier management. For more information, see Specify vSphere folders in cluster configuration and the relevant sections in the admin cluster and user cluster configuration files.
  • Added a feature enabling cluster administrators to configure RBAC policies based on Azure Active Directory (AD) groups. Group information for users belonging to more than 200 groups can now be retrieved.
  • Upgraded Kubernetes from 1.24 to 1.25:
    • Migrated PDB API version from policy/v1beta1 to policy/v1. You must ensure that any workload PDB API version is updated to policy/v1 before upgrading your cluster to 1.14.0.
    • Migrated autoscaling/v2beta1 to autoscaling/v2.
    • Disabled CSI Migration for vSphere as this is enabled by default in Kubernetes 1.25.
  • Added storage validation that checks if in-use Kubernetes PersistentVolumes (PV) have disks present in the configured datastore, and if node.Status.VolumesAttached is consistent with the actual PV/disk attachment states during admin and user cluster upgrade preflight checks.
  • Updated gcloud version to 410.0.0 on the admin workstation.
  • Upgraded VMware vSphere Container Storage Plug-in from 2.5 to 2.7. This version bump includes support for Kubernetes version 1.25. For more information, see VMware vSphere Container Storage Plug-in 2.7 Release Notes.
  • In the generated user cluster configuration template, the prepopulated value for enableDataplaneV2 is now true.
  • Removed unnecessary RBAC policies for managing the lifecycle of user clusters in the Google Cloud console.
  • Updated the parser of container logs to extract severity level.
  • Simplified the cluster snapshot uploading process by automatically retrieving GKE connect-register service account key, and making the flag --service-account-key-file optional. When the cluster is not registered correctly, and no additional service account key file is passed in through the flag, the gkectl diagnose snapshot command will use the GOOGLE_APPLICATION_CREDENTIALS environment variable to authenticate the request.
  • Upgraded Container-Optimized OS to m101.
  • In the admin cluster and user cluster configuration file templates, loadbalancer.kind field is now prepopulated with MetalLB.
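
Because the Kubernetes 1.25 upgrade noted above removes the policy/v1beta1 and autoscaling/v2beta1 APIs, it can help to scan workload manifests before upgrading. The following Python sketch assumes the manifests are already loaded as dictionaries (for example, parsed from YAML by a tool of your choice). The helper name and the sample manifests are hypothetical.

```python
# Deprecated (kind, apiVersion) pairs mentioned in this release, mapped to
# the replacement apiVersion.
DEPRECATED_APIS = {
    ("PodDisruptionBudget", "policy/v1beta1"): "policy/v1",
    ("HorizontalPodAutoscaler", "autoscaling/v2beta1"): "autoscaling/v2",
}


def find_deprecated_manifests(manifests):
    """Return (name, old_api, new_api) for each manifest using a removed API."""
    findings = []
    for manifest in manifests:
        key = (manifest.get("kind"), manifest.get("apiVersion"))
        if key in DEPRECATED_APIS:
            name = manifest.get("metadata", {}).get("name", "<unnamed>")
            findings.append((name, key[1], DEPRECATED_APIS[key]))
    return findings


# Hypothetical workload manifests
manifests = [
    {"kind": "PodDisruptionBudget", "apiVersion": "policy/v1beta1",
     "metadata": {"name": "web-pdb"}},
    {"kind": "PodDisruptionBudget", "apiVersion": "policy/v1",
     "metadata": {"name": "db-pdb"}},
]
for name, old_api, new_api in find_deprecated_manifests(manifests):
    print(f"{name}: update {old_api} to {new_api}")
```

The sketch only flags manifests. For HorizontalPodAutoscaler objects in particular, the schema also differs between API versions, so changing the apiVersion string alone might not be enough.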

A known issue has been discovered. See the January 25, 2023 release note.

December 20, 2022

Anthos clusters on VMware 1.12.4-gke.42 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.12.4-gke.42 runs on Kubernetes 1.23.13-gke.1700.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.13, 1.12, and 1.11.

  • Changed the relative file path fields in the admin cluster configuration file to use absolute paths.
  • Added yq tool in the admin workstation.

December 15, 2022

Anthos clusters on VMware 1.13.3-gke.26 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.13.3-gke.26 runs on Kubernetes 1.24.7-gke.1700.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.13, 1.12, and 1.11.

  • Added yq tool in the admin workstation to simplify troubleshooting.
  • Upgraded VMware vSphere Container Storage Plug-in from 2.5 to 2.6.2. This version bump includes support for Kubernetes version 1.24. For more information, see VMware vSphere Container Storage Plug-in 2.6 Release Notes.
  • Added storage validation that checks Kubernetes PersistentVolumes and vSphere virtual disks as part of admin and user cluster upgrade preflight checks.
  • Fixed an issue where anet-operator could be scheduled to a Windows node with enableControlplaneV2: true.
  • Fixed OOM events associated with monitoring-operator- Pods by increasing memory limit to 1GB.
  • Fixed the issue where deleting a user cluster also deleted cluster-health-controller and vsphere-metrics-exporter ClusterRole objects.
  • Fixed the following vulnerabilities:

December 08, 2022

Anthos clusters on VMware 1.11.6-gke.18 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.11.6-gke.18 runs on Kubernetes 1.22.15-gke.3300.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.13, 1.12, and 1.11.

November 17, 2022

Anthos clusters on VMware 1.13.2-gke.26 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.13.2-gke.26 runs on Kubernetes 1.24.7-gke.1400.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.13, 1.12, and 1.11.

  • Fixed a validation error where the GKE Hub membership is not found when using a gcloud version that is not bundled with the admin workstation.
  • Fixed the issue where the admin cluster might fail to register due to naming conflicts.
  • Fixed the issue where the Connect Agent in the admin cluster does not upgrade after a failure to upgrade nodes in the user cluster control plane.
  • Fixed a bug where running gkectl diagnose snapshot using system scenario did not capture Cluster API resources in the default namespace.
  • Fixed the issue during admin cluster creation where gkectl check-config fails due to missing OS images, if gkectl prepare is not run first.
  • Fixed the unspecified Internal Server error in ClientConfig when using the Anthos Identity Service (AIS) hub feature to manage the OpenID Connect (OIDC) configuration.
  • Fixed the issue of /var/log/audit/ filling up disk space on the admin workstation.
  • Fixed an issue where cluster deletion may be stuck at node draining when the user cluster control plane and node pools are on different datastores.
  • Fixed the issue where nodes fail to register if the configured hostname in the IP block file contains one or more periods.
  • Fixed the following vulnerabilities:

November 10, 2022

Anthos clusters on VMware 1.11.5-gke.14 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.11.5-gke.14 runs on Kubernetes 1.22.15-gke.2200.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.13, 1.12, and 1.11.

November 09, 2022

Two new vulnerabilities, CVE-2022-2585 and CVE-2022-2588, have been discovered in the Linux kernel that can lead to a full container break out to root on the node.

For more information, see the GCP-2022-024 security bulletin.

November 07, 2022

A security vulnerability, CVE-2022-39278, has been discovered in Istio, which is used in Anthos Service Mesh, that allows a malicious attacker to crash the control plane.

For instructions and more details, see the Anthos clusters on VMware security bulletin.

November 01, 2022

Anthos clusters on VMware 1.13.1-gke.35 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.13.1-gke.35 runs on Kubernetes 1.24.2-gke.1900.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.13, 1.12, and 1.11.

  • Increased logging granularity for the cluster backup operation including indicating status for each step of the process.

October 28, 2022

A new vulnerability, CVE-2022-20409, has been discovered in the Linux kernel that could allow an unprivileged user to escalate to system execution privilege.

For instructions and more details, see the Anthos clusters on VMware security bulletin.

October 27, 2022

A new vulnerability, CVE-2022-3176, has been discovered in the Linux kernel that can lead to local privilege escalation. This vulnerability allows an unprivileged user to achieve full container breakout to root on the node.

For instructions and more details, see the Anthos clusters on VMware security bulletin.

October 25, 2022

Anthos clusters on VMware 1.12.3-gke.23 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.12.3-gke.23 runs on Kubernetes 1.23.8-gke.1900.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.13, 1.12, and 1.11.

  • Fixed the issue of a race condition that blocks the deletion of an old machine object during cluster upgrade or update.
  • Fixed an issue for clusters enabled with Anthos Network Gateway where the NetworkGatewayGroup object may erroneously report nodes as having NotHealthy status.
  • Fixed an issue where creating or updating NetworkGatewayGroup objects fails because of a webhook IP conflict error.
  • Fixed the following vulnerabilities:

October 13, 2022

Anthos clusters on VMware 1.11.4-gke.32 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.11.4-gke.32 runs on Kubernetes 1.22.8-gke.204.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.13, 1.12, and 1.11.

October 12, 2022

The Connect Agent version used in Anthos clusters on VMware versions 1.8 and earlier is no longer supported. If you upgrade your user cluster to these versions, the gkectl upgrade cluster command may fail. If you encounter this issue and need further assistance, you should contact Google Support.

October 11, 2022

If you use gcloud anthos version 1.4.2, and authenticate an Anthos cluster on VMware with gcloud anthos auth, the command fails with the following error:

Decryption failed, no keys in the current key set could decrypt the payload.

To resolve this, you must upgrade gcloud anthos to 1.4.3 or above (gcloud SDK 397.0.0 or above) to authenticate clusters with gcloud anthos auth.
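
If you track tool versions in automation, the minimum-version requirement above can be checked with a small comparison. This Python sketch assumes you already have the installed version as a dotted numeric string. The function names are hypothetical, and the sketch does not call gcloud.

```python
def parse_version(text):
    """Turn a dotted numeric version such as '1.4.3' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


# Version numbers taken from the note above
print(meets_minimum("1.4.2", "1.4.3"))
print(meets_minimum("397.0.0", "397.0.0"))
```

Comparing integer tuples instead of raw strings avoids ordering mistakes such as treating 1.10.0 as older than 1.4.3.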

September 29, 2022

Anthos clusters on VMware 1.13.0-gke.525 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.13.0-gke.525 runs on Kubernetes 1.24.2-gke.1900.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.13, 1.12, and 1.11.

vSphere versions below 7.0 Update 1 are no longer supported in Anthos clusters on VMware. You must upgrade vSphere (both ESXi and vCenter) to version 7.0 Update 1 or above before you can upgrade to Anthos clusters on VMware 1.13.0. If you want to use the vSphere Container Storage Interface driver or NFSv3, then you must upgrade to vSphere 7.0 Update 2 or a later update of version 7.0.

Cluster lifecycle improvements:

  • GA: A new asynchronous variation of the user cluster upgrade is now supported. With this variation, the gkectl upgrade cluster command starts the upgrade and completes. You don't need to watch the output of the command for the entire duration of the upgrade. For more details, see Upgrade a user cluster.
  • Preview: You can now update node pools either sequentially or maintain the default parallel behavior by specifying the value of maximumConcurrentNodePoolUpdate in your user cluster configuration file. Setting the value to 1 will configure the node pool update to be sequential, which can potentially avoid two issues — resource quota limit issue and PDB deadlock issue.
  • Introduced an admin cluster controller for managing the admin cluster lifecycle.
  • Added new preflight checks:
    • Check that node IPs are in the subnet for IPAM.
    • A new preflight check was added to validate the clusterLocation field under stackdriver and cloudAuditLogging. This preflight check requires the component access service account to have the compute.viewer role, and compute.googleapis.com to be allowlisted in the HTTP proxy and firewall settings. If you use an invalid value in clusterLocation, the preflight check fails. You can correct an invalid clusterLocation by removing the stackdriver and/or cloudAuditLogging configurations from the admin or user cluster configuration files, applying the changes with gkectl update, and then adding the corrected configurations back. Alternatively, you can use --skip-validation-gcp to skip the check. Note that an invalid clusterLocation causes log and metric exports to fail.
    • For a cluster in static IP mode, you need to have one IP address for each node and an additional IP address. This additional IP address will be used for a temporary node during cluster update, upgrade and auto-repair.
    • Validate that IP addresses are not in docker IP range in IPAM mode.
    • Check to make sure there is no node port collision among different user clusters in manual load balancing mode.
    • Check datastore size to ensure it has enough capacity for surge machine.
    • Check for an available IP address for creating Windows VM template in IPAM mode.
    • PDB preflight check to prevent multiple PDBs from matching with the same pod.
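
Some of the IP-related checks in the list above can be approximated before you run gkectl. The following Python sketch covers three of them: the extra address needed in static IP mode, node IPs falling inside the subnet, and overlap with the Docker range. The function name and the sample addresses are hypothetical, and 172.17.0.0/16 is an assumption based on Docker's default bridge network, not a value taken from these notes.

```python
import ipaddress

# Assumption: Docker's default bridge network. Adjust if your nodes differ.
DOCKER_RANGE = ipaddress.ip_network("172.17.0.0/16")


def check_static_ips(node_count, ip_addresses, subnet):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    network = ipaddress.ip_network(subnet)

    # One address per node, plus one for the temporary node used during
    # cluster update, upgrade, and auto-repair.
    required = node_count + 1
    if len(ip_addresses) < required:
        problems.append(
            f"need {required} IP addresses, found {len(ip_addresses)}"
        )

    for text in ip_addresses:
        address = ipaddress.ip_address(text)
        if address not in network:
            problems.append(f"{text} is outside subnet {subnet}")
        if address in DOCKER_RANGE:
            problems.append(f"{text} overlaps the Docker range {DOCKER_RANGE}")
    return problems


# Three nodes but only three addresses: the spare address is missing.
print(check_static_ips(3, ["10.0.0.2", "10.0.0.3", "10.0.0.4"], "10.0.0.0/24"))
```

An empty list only means these three approximations passed. The actual preflight checks remain the source of truth.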

Platform enhancements:

  • GA: Support for cos OS image type in admin cluster nodes is now generally available. You can update the admin node image type with the gkectl update admin command.
  • Preview: A new user cluster deployment model with support for multi-vCenter deployments is available as a preview feature. For more details on how to create a user cluster with this new model, see Create a user cluster with a new installation model.
  • Preview: vSphere CSI volume snapshot is now available as a preview feature. This feature provides the ability to create volume snapshots and restore volumes from snapshots using VMware Cloud Native Storage. To use this feature, you must update both vCenter Server and ESXi to version 7.0 Update 3 or later.

Security enhancements:

  • GA: Support for storing credentials for user clusters as Kubernetes Secrets is generally available.

    • With this feature, users can prepare credentials for the user cluster, and store them as Kubernetes Secrets in the admin cluster before a user cluster is created. After credential preparation, users can delete the Secrets configuration file which contains the user cluster credentials from the admin workstation. When creating a user cluster, the prepared credentials will be used. For more details, see Configure prepared credentials for user clusters.
  • Kubernetes service account (KSA) Signing Key rotation is supported on user clusters. For more details, see Rotate KSA signing keys.

  • GA: Component access SA key rotation for both admin and user clusters is generally available.

  • GA: You can set up Connect gateway to use Google Group membership for authorization. For more information, see Set up the Connect gateway with Google Groups.

  • Changed kube-scheduler, kube-etcd, kube-apiserver and Key Management Service (KMS) components to run in rootless mode in the user cluster.

Simplify day-2 operations:

  • Preview: Added support of multi-line parsing for Go and Java logs.
  • GA: Launched the enablement of Google Cloud Managed Service for Prometheus to track metrics in Anthos on vSphere clusters, and introduced two separate flags to enable logging and monitoring for user applications separately: EnableCloudLoggingForApplications and EnableGMPForApplications. You can monitor and alert on the applications using Prometheus with Google-managed Prometheus without managing and operating Prometheus. You can set enableGMPForApplications in the Stackdriver spec to enable Google Managed Prometheus for application metrics without any other manual steps, and the Google Managed Prometheus components are then set up automatically. See Enable Managed Service for Prometheus for user applications for details.

  • Added a new Anthos Utilization Metering dashboard in Cloud Monitoring to monitor cluster health. The dashboard shows CPU and memory utilization in the clusters by namespace and Pod labels.

  • Upgraded to Ubuntu 20.04 and containerd 1.6.
  • connectgateway.googleapis.com API is now required to create new clusters in 1.13.0.
  • Updated the gcloud version in the admin workstation to 401.0.0.
  • Increased the default boot disk size for the admin workstation to 100GB.
  • Simplified the gkectl diagnose snapshot scenario usage. The --scenario flag is no longer needed for the admin cluster snapshot. Use system (default) or all values to specify scenarios for the user cluster snapshot. For more details, see Diagnosing cluster issues.
  • Improved gkectl diagnose cluster to detect and diagnose two general issues:
    • Node draining issues that can block cluster upgrade.
    • Kubernetes Cluster API resources managed by an Anthos clusters on VMware bundle that were accidentally modified, which can cause system components to fail, or cause cluster upgrades or updates to fail.
  • Enforced admin cluster registration with preflight checks.

    • This also applies to admin clusters to be upgraded to 1.13. You can run gkectl update admin to register existing 1.12 admin clusters.
    • You can skip this check with the --skip-validation-config flag if you cannot register admin clusters for certain reasons.
  • Configuration for Logging and Monitoring is now enforced in admin and user cluster configuration files during creation preflight checking. You can run gkectl update cluster and gkectl update admin to enable Logging and Monitoring in existing 1.12 user or admin clusters before upgrading to 1.13. Otherwise, upgrade preflight checks will emit a warning. You can skip these checks with the --skip-validation-stackdriver flag if you cannot enable Logging and Monitoring for certain reasons. However, enabling Logging and Monitoring is strongly recommended to get better Google support, and there is no charge for this service on Anthos.

  • When Logging and Monitoring is enabled, the values of the gkeConnect.projectID field, stackdriver.projectID field, and cloudAuditLogging.projectID field must all be the same in the cluster configuration files. Otherwise, cluster creation preflight checks would fail with an error, and upgrade preflight checks would emit a warning. You can also skip these checks with the --skip-validation-stackdriver flag, but this is not recommended as using different project IDs for stackdriver and gkeconnect may cause friction during support and fleet management. Note you can still send logs and metrics to a different project through Cloud Logging sinks and metric viewer scoping.

  • Migrated metrics-server and addon-resizer to a new namespace: gke-managed-metrics-server.

  • Refined kube-state-metrics so that only core metrics are collected by default. Fewer resources are needed to collect this optimized set of metrics, which improves overall performance and scalability.

  • Fixed the issue of cloud-init log not showing in the serial console for Ubuntu.
  • Fixed the issue where user cluster check-config fails when the admin cluster uses cos as the osImageType.
  • Updated virtual hardware version to version 15 for creating VMs in Anthos clusters on VMware 1.13.0.
  • Fixed the issue of two missing metrics, scheduler and controller-manager, in the admin and user cluster.
  • Fixed the issue of an empty CPU readiness chart in OOTB dashboards that was caused by deprecated metrics.
  • Fixed the issue where you may not be able to add a new user cluster if a user cluster is stuck in the deletion process, and your admin cluster is set up with a MetalLB load balancer configuration.
  • Fixed the following vulnerabilities:
  • In the configuration file template generated by gkectl create-config cluster, the pre-populated value for the commented field kubeception is shown as false, while the default value is true.
  • In the configuration file template generated by gkectl create-config admin, gkeConnect is shown as an optional section, however it is actually a required section.
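
The project ID requirement in the Logging and Monitoring item above (gkeConnect.projectID, stackdriver.projectID, and cloudAuditLogging.projectID must match) can be checked before you run the preflight checks. This Python sketch assumes the cluster configuration file has already been parsed into a dictionary. The function name and the sample project IDs are hypothetical.

```python
SECTIONS = ("gkeConnect", "stackdriver", "cloudAuditLogging")


def project_id_mismatches(config):
    """Return {section: projectID} if the configured IDs differ, else {}."""
    ids = {
        section: config[section]["projectID"]
        for section in SECTIONS
        if "projectID" in config.get(section, {})
    }
    # Zero or one distinct value means every configured section agrees.
    if len(set(ids.values())) <= 1:
        return {}
    return ids


# Hypothetical configuration with one mismatched project ID
config = {
    "gkeConnect": {"projectID": "fleet-project"},
    "stackdriver": {"projectID": "fleet-project"},
    "cloudAuditLogging": {"projectID": "audit-project"},
}
print(project_id_mismatches(config))
```

Sections that are absent from the configuration are ignored here. Whether an absent section is acceptable depends on the enforcement rules described in the notes above.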

September 28, 2022

Anthos clusters on VMware 1.12.2-gke.21 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.12.2-gke.21 runs on Kubernetes 1.23.8-gke.1900.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.12, 1.11, and 1.10.

  • Fixed the issue where you may not be able to add a new user cluster if a user cluster is stuck in the deletion process, and your admin cluster is set up with a MetalLB load balancer configuration.
  • Fixed an issue where istiod starts up very slowly when connectivity to the Google Cloud metadata service is partially broken.
  • Fixed the issue where the admin control plane VM template is deleted after a resumed admin cluster upgrade attempt.
  • Fixed the issue where user cluster check-config fails when the admin cluster uses cos as the osImageType.
  • Fixed the following vulnerabilities:

September 08, 2022

Anthos clusters on VMware 1.10.7-gke.15 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.10.7-gke.15 runs on Kubernetes 1.21.14-gke.2100.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.12, 1.11, and 1.10.

Fixed for v1.10.7

Anthos clusters on VMware 1.11.3-gke.45 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.11.3-gke.45 runs on Kubernetes 1.22.8-gke.204.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.12, 1.11, and 1.10.

The gkectl diagnose cluster command automatically runs when gkectl diagnose snapshot is run, and the output is saved in a new folder in the snapshot called /diagnose-report.

Fixed for v1.11.3

August 25, 2022

Anthos clusters on VMware 1.12.1-gke.57 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.12.1-gke.57 runs on Kubernetes 1.23.5-gke.1505.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.12, 1.11, and 1.10.

  • GA: You can now have your GKE clusters in separate vSphere clusters. With this feature, you can deploy the admin cluster in one vSphere cluster, and a user cluster in a different vSphere cluster.
  • Fixed the issue where mounting an emptyDir volume with the exec option on Container-Optimized OS (COS) nodes fails with a permission error.
  • Fixed the issue where enabling and disabling cluster autoscaler sometimes prevents nodepool replicas from being updated.
  • Fixed the manual node repair issue where manually adding the onprem.cluster.gke.io/repair-machine Machine annotation can trigger VM recreation without deleting the Machine object.
  • Switched back to cgroup v1 (hybrid) for Container-Optimized OS (COS) nodes because cgroup v2 (unified) could potentially cause instability for your workloads in a COS cluster.
  • Fixed the issue where running gkectl repair admin-master after a failed admin cluster upgrade attempt caused subsequent admin upgrade attempts to fail. A preflight check has been added for gkectl repair admin-master to prevent the process from using a template that doesn't match the admin cluster checkpoint.
  • Fixed the issue where kubectl describe might return an error or time out if the number of resources is too high during a cluster snapshot.
  • Fixed the following vulnerabilities:

August 12, 2022

Anthos clusters on VMware 1.10.6-gke.36 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.10.6-gke.36 runs on Kubernetes 1.21.14-gke.2100.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.12, 1.11, and 1.10.

  • Fixed the issue where mounting an emptyDir volume with the exec option on Container-Optimized OS (COS) nodes fails with a permission error.
  • Fixed the issue where enabling and disabling cluster autoscaler sometimes prevents nodepool replicas from being updated.
  • Fixed the following vulnerabilities:

August 02, 2022

A new vulnerability, CVE-2022-2327, has been discovered in the Linux kernel that can lead to local privilege escalation. This vulnerability allows an unprivileged user to achieve a full container breakout to root on the node.

For more information, see the GCP-2022-018 security bulletin.

July 27, 2022

Anthos clusters on VMware 1.11.2-gke.53 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.11.2-gke.53 runs on Kubernetes 1.22.8-gke.204.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.12, 1.11, and 1.10.

  • Fixed a known issue in which the cluster backup feature affected the inclusion of always-on secrets encryption keys in the backup.
  • Fixed a known issue of high resource usage when AIDE runs as a cron job, by disabling AIDE by default. This fix affects compliance with CIS L1 Server benchmark 1.4.2: Ensure filesystem integrity is regularly checked. Customers can opt in to re-enable AIDE if needed. To re-enable the AIDE cron job, see Configure AIDE cron job.
  • Fixed a known issue where the gke-metrics-agent DaemonSet has frequent CrashLoopBackOff errors by upgrading to gke-metrics-agent v1.1.0-anthos.14.
  • Fixed the following vulnerabilities:

July 19, 2022

Anthos clusters on VMware 1.9.7-gke.8 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.9.7-gke.8 runs on Kubernetes 1.21.5-gke.1200.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.12, 1.11, and 1.10.

  • Fixed a known issue in which the cluster backup feature affected the inclusion of always-on secrets encryption keys in the backup.
  • Fixed a known issue of high resource usage when AIDE runs as a cron job, by disabling AIDE by default. This fix affects compliance with CIS L1 Server benchmark 1.4.2: Ensure filesystem integrity is regularly checked. Customers can opt in to re-enable AIDE if needed. To re-enable the AIDE cron job, see Configure AIDE cron job.
  • Fixed the following vulnerabilities:

July 07, 2022

Anthos clusters on VMware v1.12.0-gke.446 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware v1.12.0-gke.446 runs on Kubernetes v1.23.5-gke.1504.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.12, 1.11, and 1.10.

Announcements

  • vSphere releases earlier than version 7.0 Update 2 are deprecated in Kubernetes 1.24. VMware's General Support for vSphere 6.7 will end on October 15, 2022. We recommend that you upgrade vSphere (both ESXi and vCenter) to version 7.0 Update 2 or above. vSphere versions earlier than 7.0 Update 2 will no longer be supported in Anthos clusters on VMware in an upcoming version. You must upgrade vSphere to 7.0 Update 2 or above before you can upgrade to Anthos clusters on VMware 1.13.0.

  • Beta versions of VolumeSnapshot CRDs are deprecated in Kubernetes v1.20 and are unsupported in the Kubernetes v1.24 release.
    The upcoming Anthos clusters on VMware version 1.13 release will no longer serve v1beta1 VolumeSnapshot CRDs. Make sure that you migrate manifests and API clients to use snapshot.storage.k8s.io/v1 API version, available since Kubernetes v1.20. All existing persisted objects remain accessible via the new snapshot.storage.k8s.io/v1 APIs.

  • The dockershim component in Kubernetes enables cluster nodes to use the Docker Engine container runtime. However, Kubernetes 1.24 removed the dockershim component. Starting from Anthos clusters on VMware version 1.12.0, you cannot create new clusters that use the Docker Engine container runtime. All new clusters must use the default container runtime, containerd. A cluster update is also blocked if you try to switch a node pool from containerd to docker, or if you add new docker node pools. Existing version 1.11.x clusters with docker node pools can still be upgraded to version 1.12.0, but you must update the node pools to use containerd before you can upgrade to version 1.13.0 in the future.

Breaking changes:

In Kubernetes 1.23, the rbac.authorization.k8s.io/v1alpha1 API version is removed. Instead, use the rbac.authorization.k8s.io/v1 API. See the Kubernetes 1.23.5 release notes.
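Before upgrading, it can help to scan manifests for the API versions called out above. The following Python sketch is illustrative only: the function name and the sample manifests are hypothetical, and the mapping covers only the two API groups named in these notes. It reports deprecated versions rather than rewriting them, because some resources may also need schema changes when moving to the newer API.

```python
# Deprecated-to-replacement API versions named in these release notes.
API_VERSION_REPLACEMENTS = {
    "snapshot.storage.k8s.io/v1beta1": "snapshot.storage.k8s.io/v1",
    "rbac.authorization.k8s.io/v1alpha1": "rbac.authorization.k8s.io/v1",
}


def find_deprecated_api_versions(manifests):
    """Return (kind, name, old, new) for each manifest on a deprecated apiVersion."""
    findings = []
    for manifest in manifests:
        old = manifest.get("apiVersion")
        new = API_VERSION_REPLACEMENTS.get(old)
        if new is not None:
            name = manifest.get("metadata", {}).get("name", "<unnamed>")
            findings.append((manifest.get("kind"), name, old, new))
    return findings


# Hypothetical manifests, already parsed into dictionaries.
sample_manifests = [
    {
        "apiVersion": "snapshot.storage.k8s.io/v1beta1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": "db-snapshot"},
    },
    {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": "viewer"},
    },
]

for kind, name, old, new in find_deprecated_api_versions(sample_manifests):
    print(f"{kind}/{name}: replace {old} with {new}")
```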

Platform enhancements:

  • General Availability (GA): Separate vSphere data centers for the admin cluster and the user clusters are supported.
  • GA: Anthos Identity service LDAP authentication is supported.
  • GA: User cluster control-plane node and admin cluster add-on node auto sizing is supported.

Security enhancements:

  • Preview: Preparing credentials for user clusters as Kubernetes secrets before cluster creation.

    • The credential preparation feature prepares the credentials before a user cluster is created. After credential preparation, user cluster credentials are saved as versioned Kubernetes secrets in the admin cluster, and the template that was used for credential preparation can be deleted from the admin workstation. When you create a user cluster, you only need to configure the namespace and the versions of the prepared secrets in the user cluster config file. Using this feature can help protect user cluster credentials.
  • Preview: The gkectl update credentials command supports rotating the component access SA key for both the admin and the user clusters.

  • The COS node image shipped in version 1.12.0 is qualified with the Center for Internet Security (CIS) L1 Server Benchmark.

  • The gkectl update credentials command supports register service account key rotation.

Cluster lifecycle improvements:

  • Preview: You can configure the time duration of Pod Disruption Budget (PDB) violation timeout during a node drain. The default behavior is to always block on a PDB violation and to not force-delete pods during node drain, to avoid unexpected data corruption, and this default is unchanged. In certain cases, when users want to unblock the PDB violation deadlock with the bound timeout during cluster upgrade, they can apply the special annotation onprem.cluster.gke.io/pdb-violation-timeout: TIMEOUT on the machine objects.
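As a rough illustration of the annotation described above, the following Python sketch builds a JSON merge patch that sets onprem.cluster.gke.io/pdb-violation-timeout on a Machine object. The helper name is hypothetical, and the value "TIMEOUT" is kept as the placeholder used in these notes because the accepted value format is not specified here. How the patch is applied (for example, with kubectl patch using a merge patch) is also an assumption.

```python
import json

# Annotation key as given in the release note.
PDB_TIMEOUT_ANNOTATION = "onprem.cluster.gke.io/pdb-violation-timeout"


def pdb_timeout_patch(timeout):
    """Build a JSON merge patch that sets the annotation on a Machine object."""
    if not timeout:
        raise ValueError("timeout must be a non-empty string")
    return {"metadata": {"annotations": {PDB_TIMEOUT_ANNOTATION: timeout}}}


# "TIMEOUT" is the placeholder from the note, not a real value.
patch = pdb_timeout_patch("TIMEOUT")
print(json.dumps(patch, indent=2))
```

Because the default behavior blocks on a PDB violation to avoid data corruption, a patch like this is best applied only to the specific machines that are stuck during an upgrade.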

Simplify day-2 operations:

  • Preview: Google Cloud Managed Service for Prometheus can now be enabled to track metrics in Anthos on vSphere clusters. Two separate flags enable logging and monitoring for user applications independently: EnableCloudLoggingForApplications and EnableGMPForApplications. The legacy flag EnableStackdriverForApplications is deprecated and will be removed in a future release. With Google Cloud Managed Service for Prometheus, customers can monitor and alert on their applications using Prometheus without managing and operating Prometheus themselves. Setting enableGMPForApplications in the Stackdriver spec enables the managed service for application metrics without any other manual steps; the required components are then set up automatically. See Enable Managed Service for Prometheus for user applications for details.

  • All sample dashboards to monitor cluster health are available in Cloud Monitoring sample dashboards. Customers can install the dashboards with one click. See Install sample dashboards.

  • Improvements to cluster diagnosis: The gkectl diagnose cluster command automatically runs when gkectl diagnose snapshot is run, and the output is saved in a new folder in the snapshot called /diagnose-report.

  • The gkectl diagnose cluster command surfaces more detailed information for issues arising from virtual machine creation.

  • A validation check for the existence of an OS image has been added to the gkectl update admin and gkectl diagnose cluster commands.

  • A blocking preflight check has been added. This check validates that the vCenter.datastore specified in the cluster configuration file doesn't belong to a DRS-enabled datastore cluster.
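The split of application logging and monitoring into two flags, described earlier in this list, can be sketched in Python as a small helper that builds the relevant fragment of the Stackdriver spec. This is a sketch under stated assumptions: the function names are hypothetical, and the lower-camel-case spellings of enableCloudLoggingForApplications and enableStackdriverForApplications are assumed by analogy with enableGMPForApplications, which is the only spelling these notes give for the spec field.

```python
# Assumed spec field name for the deprecated flag (see lead-in).
LEGACY_FLAG = "enableStackdriverForApplications"


def application_observability_spec(logging, managed_prometheus):
    """Build the spec fragment that controls the two features separately."""
    return {
        "enableCloudLoggingForApplications": bool(logging),
        "enableGMPForApplications": bool(managed_prometheus),
    }


def uses_legacy_flag(spec):
    """Report whether a spec still sets the deprecated combined flag."""
    return LEGACY_FLAG in spec


# Example: turn on managed Prometheus metrics without application logging.
fragment = application_observability_spec(logging=False, managed_prometheus=True)
print(fragment)
print(uses_legacy_flag({"enableStackdriverForApplications": True}))
```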

Functionality changes:

  • Upgraded COS from m93 to m97, and containerd to 1.6 on COS.

  • Metrics agent: Upgraded gke-metrics-agent from 1.1.0 to 1.8.3, which fixes some application metrics issues. The offline buffer in the metrics agent can now discard old data based on the age of metrics data, in addition to the total size of buffer. Metrics data is stored in an offline buffer for at most 22 hours in case of a network outage.

  • New metrics: Added 7 resource utilization metrics.

    • k8s_container:
      • container/cpu/request_utilization
      • container/cpu/limit_utilization
      • container/memory/request_utilization
      • container/memory/limit_utilization
    • k8s_node:
      • node/cpu/allocatable_utilization
      • node/memory/allocatable_utilization
    • k8s_pod:
      • pod/volume/utilization
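The metric names above suggest that each utilization value is a ratio of current usage to a reference quantity (the request, the limit, or the node's allocatable amount). The following Python sketch shows that arithmetic; the function name and the sample numbers are hypothetical, and the ratio interpretation is an assumption based on the metric names rather than a statement of how the metrics agent computes them.

```python
def utilization(used, reference):
    """Fraction of a reference quantity (request, limit, or allocatable) in use.

    Returns None when the reference is missing or not positive, for example
    when a container sets no CPU limit.
    """
    if reference is None or reference <= 0:
        return None
    return used / reference


# Hypothetical container using 0.25 CPU cores.
cpu_used_cores = 0.25
request_utilization = utilization(cpu_used_cores, 0.5)   # request of 0.5 cores
limit_utilization = utilization(cpu_used_cores, 1.0)     # limit of 1 core
no_limit_utilization = utilization(cpu_used_cores, None)  # no limit set

print(request_utilization, limit_utilization, no_limit_utilization)
```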

Fixes

Known issues:

  • On the out-of-the-box monitoring dashboards, the GKE on-prem Windows pod status and GKE on-prem Windows node status also show data from Linux clusters.

  • The scheduler metrics, such as scheduler_pod_scheduling_attempts, are not collected in version 1.12.0 due to a configuration issue in the metric collector.

In version 1.12.0, cgroup v2 (unified) is enabled by default for Container-Optimized OS (COS) nodes. This could potentially cause instability for your workloads in a COS cluster. We will switch back to cgroup v1 (hybrid) in version 1.12.1. If you are considering using version 1.12 with COS nodes, we suggest that you wait until the 1.12.1 release.

June 24, 2022

Three new memory corruption vulnerabilities (CVE-2022-29581, CVE-2022-29582, CVE-2022-1116) have been discovered in the Linux kernel. These vulnerabilities allow an unprivileged user with local access to the cluster to achieve a full container breakout to root on the node. For more information, refer to the GCP-2022-016 security bulletin.

June 16, 2022

Anthos clusters on VMware 1.10.5-gke.26 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.10.5-gke.26 runs on Kubernetes 1.21.5-gke.1200.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.11, 1.10, and 1.9.

Fixed for version 1.10.5

  • Fixed the issue where admin cluster backup did not back up always-on secrets encryption keys. This caused repairing an admin cluster using gkectl repair master --restore-from-backup to fail when always-on secrets encryption was enabled.

  • Fixed the issue of high resource usage when AIDE runs as a cron job by disabling AIDE by default. This fix will affect compliance with CIS L1 Server benchmark 1.4.2: Ensure filesystem integrity is regularly checked.

    To re-enable the AIDE cron job, see Configure AIDE cron job.

Fixed the following vulnerabilities

June 03, 2022

Cluster lifecycle improvements

GA: You can use the Cloud console to create, update, and delete Anthos on VMware user clusters. For more information, see Create a user cluster in the Cloud console.

May 26, 2022

Anthos clusters on VMware 1.11.1-gke.53 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.11.1-gke.53 runs on Kubernetes 1.22.8-gke.200.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.11, 1.10, and 1.9.

Fixed for v1.11.1

  • Fixed the known issue where v1.11.0 user clusters cannot be created with a v1.10.x admin cluster.

  • Fixed the issue where the gkectl logs might be truncated when admin cluster creation has failed.

  • Fixed the issue where Anthos Identity Service with LDAP failed to authenticate against some older Active Directory servers when the user ID contains a comma.

Fixed the following vulnerabilities

High-severity CVEs

Medium-severity CVEs

Anthos clusters on VMware 1.10.4-gke.32 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.10.4-gke.32 runs on Kubernetes 1.21.5-gke.1200.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.11, 1.10, and 1.9.

Fixed for v1.10.4

Fixed the following vulnerabilities

High-severity CVEs

RBAC fixes

  • anetd

    • Changed to use kubelet kubeconfig to only allow the anetd to update its own node resource, and the pod resources that are running on the node.
  • antrea-controller / anetd-win

    • Instead of reusing the RBAC config for anetd, created a dedicated RBAC config for antrea and reduced the unnecessary permissions.
  • clusterdns-controller

    • Scoped down clusterdns permissions to default resource name.
    • Scoped down configmap permissions to coredns resource name.
    • Removed create/delete permissions for configmaps. The coredns configmap is now created by the bundle, with create-only annotation to ensure we don't overwrite existing config on upgrade.
  • dns-autoscaler

    • Removed unneeded permissions, and scoped down needed permissions to a particular resource using resourceNames.
    • Restricted get configmap for dns autoscaler.
  • gke-usage-metering

    • Restricted the permission to the kube-system namespace where possible.
  • seesaw-load-balancer

    • Restricted the permission by setting resource names.

May 19, 2022

Anthos clusters on VMware 1.9.6-gke.1 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.9.6-gke.1 runs on Kubernetes 1.21.5-gke.1200.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.11, 1.10, and 1.9.

Secret encryption key rotation no longer fails when the cluster has more than 1000 secrets.

Fixed the following vulnerabilities

Changed scope of certain RBAC permissions

We have scoped down the over-privileged RBAC permissions for the following components in this release:

  • clusterdns-controller:

    • Scoped down clusterdns permissions to 'default' resource name.
    • Scoped down configmap permissions to 'coredns' resource name.
    • Removed create/delete permissions for configmaps.
  • seesaw-load-balancer:

    • Restricted the permission to access secrets by specifying certain secret names instead of allowing access to all secrets.
  • coredns-autoscaler:

    • Reduced the get configmap permission to a specific configmap resource name.
  • anetd / anet-operator:

    • Changed to use kubelet kubeconfig to restrict the anetd to only update its own node resource, and the pod resources that are running on the node.
  • gke-usage-metering:

    • Restricted the permission to only the kube-system namespace.
  • ANG (Anthos Network Gateway):

    • Removed or modified RBAC roles and reduced the use of kube-rbac proxy in ANG.

May 02, 2022

Creating a 1.11.0 user cluster with a 1.10 admin cluster fails. If you need a 1.11.0 user cluster, use the following workaround:

  1. Create a 1.10 user cluster.

  2. Upgrade the user cluster to 1.11.0.

  3. Optionally, upgrade the admin cluster to 1.11.0. After the admin cluster is upgraded, you can create 1.11.0 user clusters.

For details on how to upgrade, see Upgrading Anthos clusters on VMware.
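The condition that triggers this known issue can be expressed as a small version check. The Python sketch below is illustrative only: the function names are hypothetical, and it encodes just the combination stated above (a 1.11.0 user cluster with a 1.10 admin cluster), not a general version-skew policy.

```python
def parse_version(version):
    """Parse 'MAJOR.MINOR[.PATCH]', ignoring an optional '-gke.N' suffix."""
    core = version.split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def hits_known_issue(admin_version, user_version):
    """True when a 1.11.0 user cluster would be created under a 1.10 admin cluster."""
    admin = parse_version(admin_version)
    user = parse_version(user_version)
    return user == (1, 11, 0) and admin[:2] == (1, 10)


def creation_plan(admin_version, user_version):
    """Return the steps to follow, using the workaround when the issue applies."""
    if hits_known_issue(admin_version, user_version):
        return [
            "Create a 1.10 user cluster.",
            "Upgrade the user cluster to 1.11.0.",
        ]
    return [f"Create a {user_version} user cluster."]


for step in creation_plan("1.10.3-gke.49", "1.11.0-gke.543"):
    print(step)
```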

April 28, 2022

Two security vulnerabilities, CVE-2022-1055 and CVE-2022-27666, have been discovered in the Linux kernel. Each can lead to a local attacker being able to perform a container breakout, privilege escalation on the host, or both. These vulnerabilities affect all Linux node operating systems (Container-Optimized OS and Ubuntu). For instructions and more details, see the GCP-2022-014 security bulletin.

April 27, 2022

Anthos clusters on VMware 1.11.0-gke.543 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.11.0-gke.543 runs on Kubernetes v1.22.8-gke.200.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.11, 1.10, and 1.9.

  • The structure of the Anthos clusters on VMware documentation is substantially different from previous versions. For details, see New documentation structure.

  • Dockershim, the Docker Engine integration code in Kubernetes, was deprecated in Kubernetes 1.20, and will be removed in Kubernetes 1.24. Thus, the ubuntu OS node image type will not be supported at that time. You should plan to convert your node pools to use either the ubuntu_containerd or the cos OS image type as soon as possible. For more details, see Using containerd for the container runtime.

  • The connect project is now called fleet host project. For more information, see Fleet host project.

  • Kubernetes 1.22 has deprecated certain APIs, a list of which can be found in Kubernetes 1.22 deprecated APIs. In your manifests and API clients, you need to replace references to the deprecated APIs with references to the newer API calls. For more information, see the What to do section in the Deprecated API Migration Guide.

  • Several Anthos metrics have been deprecated for which data is no longer collected. For a list of deprecated metrics, including instructions to migrate to replacement metrics, see Replace deprecated metrics in dashboard.

Cluster lifecycle improvements:

  • Admin cluster creation is now resumable. If admin cluster creation fails at any step, you can now rerun gkectl create admin to resume the admin cluster creation.

Platform enhancements:

  • Windows Node Pool:

    • GA: Support for Windows Dataplane V2 is generally available. Windows Dataplane V2 is now enabled by default for Windows node pools. This means that containerd is also enabled by default for Windows node pools.
    • Added deprecation notice for Windows nodes that Docker and Flannel will be removed in a subsequent version. If you are using Docker container runtime, you should update your user cluster configuration with gkectl update cluster to use containerd and Windows Dataplane V2 instead.
    • Added support for idempotent Windows startup script execution after node reboot.
    • New Windows Server 2019 OS build version 10.0.17763.2565 has been qualified for Anthos 1.11.0.
  • Egress NAT Gateway:

    • GA: Egress NAT Gateway is now generally available. With this feature, you can configure source network address translation (SNAT) so that certain egress traffic from user clusters is given a predictable source IP address. This enables return traffic from workloads outside the originating cluster to reach the cluster. For more information, see Configuring an egress NAT gateway.
  • MetalLB:

    • GA: The new load balancer option, MetalLB, is now generally available as another bundled software load balancer in addition to Seesaw.
  • Multinic logs:

    • The Fluent Bit Logging agent can now collect logs for Pods with multiple network interfaces, and send them to Cloud Logging. Logs will be collected as system logs and no extra charges will apply.

Security enhancements:

  • Admin cluster CA certificate rotation:

    • GA: You can now use gkectl to rotate system root CA certificates for admin clusters.

Simplify day-2 operations:

  • GA: gkectl update admin supports registering an existing admin cluster.
  • Cluster diagnosis improvements:
    • gkectl diagnose cluster automatically runs during admin or user cluster upgrade failure.
    • gkectl diagnose cluster searches and surfaces related events for any validation failure.
  • GA: gkectl update supports enabling and disabling of Cloud Logging and Cloud Monitoring in an existing cluster. You can also enable or disable logging to Cloud Audit Logs with gkectl update on both admin and user clusters.
  • Changes made to the metrics-server-config ConfigMap are now preserved across cluster upgrades.

Terminology changes:

The connect project is now called fleet host project. For more information, see Fleet host project.

We have removed the over-privileged RBAC permissions for the following components.

RBAC policies applied to service account on the admin cluster

When you register a 1.11.0+ admin cluster to a fleet, a service account is created with the needed role-based access control (RBAC) policies that let the Connect agent send requests to the admin cluster's Kubernetes API server on behalf of the service account. The service account and RBAC policies are needed so that you can manage the lifecycle of your user clusters in the Google Cloud console. For more information, see Admin cluster RBAC policies.

April 18, 2022

Anthos clusters on VMware 1.10.3-gke.49 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.10.3-gke.49 runs on Kubernetes 1.21.5-gke.1200.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.10, 1.9, and 1.8.

  • Fixed issue where scale down sometimes took longer than expected when cluster autoscaling is enabled in a Dataplane-v2 cluster.
  • Fixed issue where the state of an admin cluster that uses a COS image is lost during an admin cluster upgrade or admin cluster control plane repair.
  • Added keep-alive configuration to avoid timeout issues for long running vSphere operations in gkeadm.
  • RBAC fixes:

    • coredns-autoscaler:

      • Removed configmaps create permission.
      • Removed replicasets/scale permissions.
      • Removed replicationcontrollers/scale permissions.
      • Scoped down deployments/scale permissions to coredns resource name.
    • clusterdns-controller:

      • Scoped down clusterdns permissions to default resource name.
      • Scoped down configmap permissions to coredns resource name.
      • Removed create/delete permissions for configmaps. The coredns configmap is now created by the bundle, with create-only annotation to ensure we don't overwrite existing config on upgrade.
    • auto-resize controller:

      • Scoped down leases permissions to onprem-auto-resize-leader-election resource name.
      • Scoped down configmaps permissions to onprem-auto-resize-leader-election resource name.
    • load-balancer-f5:

      • Removed get list watch create patch delete permissions for configmaps.
      • Removed update create patch for events nodes.
      • Removed create permissions for services/status and services.
      • Removed view permission for secret bigip-login-9t8mzp.

  • Fixed high-severity CVEs:

April 12, 2022

A security vulnerability, CVE-2022-23648, has been discovered in containerd's handling of path traversal in the OCI image volume specification. Containers launched through containerd's CRI implementation with a specially-crafted image configuration could gain full read access to arbitrary files and directories on the host.

For more information, see the GCP-2022-013 security bulletin.

April 11, 2022

A security vulnerability, CVE-2022-0847, has been discovered in the Linux kernel version 5.8 and later that can potentially escalate container privileges to root.

For more information, see the GCP-2022-012 security bulletin.

March 24, 2022

Anthos clusters on VMware 1.9.5-gke.2 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.9.5-gke.2 runs on Kubernetes v1.21.5-gke.1200.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.10, 1.9, and 1.8.

March 15, 2022

Anthos clusters on VMware 1.8.8-gke.1 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.8.8-gke.1 runs on Kubernetes v1.20.12-gke.1500.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.10, 1.9, and 1.8.

  • Clusters with enableDataplaneV2 set to true can experience connectivity issues between Pods due to anetd daemons (running as a DaemonSet) entering a software deadlock. While in this state, anetd daemons see stale nodes (previously deleted nodes) as peers and miss newly added nodes as new peers. If you have experienced this issue, follow these instructions to restart the anetd daemons and restore connectivity.

March 03, 2022

Anthos clusters on VMware 1.10.2-gke.34 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.10.2-gke.34 runs on Kubernetes 1.21.5-gke.1200.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.10, 1.9, and 1.8.

Changes

  • gkectl diagnose now reports a broken cluster caused by an admin cluster registration error during creation.

Fixes

  • Fixed issue: Failure to register admin cluster during creation

    • You can upgrade an admin cluster to version 1.10.2 without applying the documented mitigation, even if the cluster failed to register with the provided gkeConnect configuration during its creation. You can fix the registration issue by running gkectl update admin with the correct gkeConnect configuration after upgrade.
    • If the cluster registration failed when creating a version 1.10.2 admin cluster, no mitigation is needed to upgrade to later versions after version 1.10.2.
  • Fixed ".local" DNS lookup issue caused by Ubuntu 20.04 systemd-resolved configuration changes.

  • Fixed issue where Docker bridge IP incorrectly used 172.17.0.1/16 instead of 169.254.123.1/24.

  • Fixed unexpectedly high network traffic to monitoring.googleapis.com in a newly created cluster.

  • Fixed an issue where admin cluster creation or upgrade might be interrupted by a temporary vCenter connection issue.

  • Fixed critical CVEs:

  • Fixed this high-severity CVE:

When cluster autoscaling is enabled in a Dataplane-v2 cluster, scale down may sometimes take longer than expected. For example, it may take approximately 20 minutes instead of the usual 10 minutes.

February 24, 2022

The Envoy project recently discovered a set of vulnerabilities. All issues listed in the security bulletin are fixed in Envoy release 1.21.1. For more information, see the GCP-2022-008 security bulletin.

February 23, 2022

Anthos clusters on VMware 1.9.4-gke.3 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.9.4-gke.3 runs on Kubernetes v1.21.5-gke.1200.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.10, 1.9, and 1.8.

Fixes

  • Upgraded Cilium to version 1.10.5.

    • This upgrade also fixed the issue where unreachable node endpoints caused application 503 errors. Previously, when cilium-health status was run in anetd daemons, the output showed stale remote nodes.
  • Fixed unexpectedly high network traffic to monitoring.googleapis.com in a newly created cluster.

  • Fixed these high-severity CVEs:

When cluster autoscaling is enabled in a Dataplane-v2 cluster, scale down may sometimes take longer than expected. For example, it may take approximately 20 minutes instead of the usual 10 minutes.

February 17, 2022

Anthos clusters on VMware 1.8.7-gke.0 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.8.7-gke.0 runs on Kubernetes v1.20.12-gke.1500.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.10, 1.9, and 1.8.

Fixes:

February 14, 2022

A security vulnerability, CVE-2022-0492, has been discovered in the Linux kernel's cgroup_release_agent_write function. The attack uses unprivileged user namespaces, and under certain circumstances, this vulnerability can be exploitable for container breakout. For more information, see the GCP-2022-006 security bulletin.

February 11, 2022

A security vulnerability, CVE-2021-43527, has been discovered in any binary that links to the vulnerable versions of libnss3 found in NSS (Network Security Services) versions prior to 3.73 or 3.68.1. Applications using NSS for certificate validation or other TLS, X.509, OCSP or CRL functionality may be impacted, depending on how they configure NSS.

For more information, see the GCP-2022-005 security bulletin.

February 10, 2022

Anthos clusters on VMware 1.10.1-gke.19 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.10.1-gke.19 runs on Kubernetes v1.21.5-gke.1200.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.10, 1.9, and 1.8.

  • Removed unintentional infrastructure log lines from the cluster snapshot.
  • Upgraded the Connect Agent version to 20211210-01-00.

    • This upgrade also fixed the issue where the Connect Agent restarts unexpectedly on either a newly-created cluster or an existing cluster that uses Anthos Identity Service to manage the Anthos Identity Service ClientConfig.
  • Fixed two high-severity CVEs:

  • Fixed the short metric probing interval issue that sends a high volume of traffic to the monitoring.googleapis.com endpoint in a cluster.

  • If your admin cluster failed to register with the provided gkeConnect spec during creation, upgrading to a later 1.9 or 1.10 release will fail with the following error:

    failed to migrate to first admin trust chain: failed to parse current version "": invalid version: "" failed to migrate to first admin trust chain: failed to parse current version "": invalid version: ""

    If you have experienced this issue, follow these instructions to fix the gkeConnect registration issue before you upgrade your admin cluster.

February 07, 2022

A security vulnerability, CVE-2021-4034, has been discovered in pkexec, a part of the Linux policy kit package (polkit), that allows an authenticated user to perform a privilege escalation attack. PolicyKit is generally used only on Linux desktop systems to allow non-root users to perform actions such as rebooting the system, installing packages, restarting services, and so forth, as governed by a policy.

For instructions and more details, see the GCP-2022-004 security bulletin.

February 01, 2022

Three security vulnerabilities, CVE-2021-4154, CVE-2021-22600, and CVE-2022-0185, have been discovered in the Linux kernel, each of which can lead to either a container breakout, privilege escalation on the host, or both. These vulnerabilities affect all Linux node operating systems (COS and Ubuntu) on Anthos clusters on VMware.

For instructions and more details, see the GCP-2022-002 security bulletin.

January 24, 2022

Anthos clusters on VMware 1.9.3-gke.4 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.9.3-gke.4 runs on Kubernetes v1.21.5-gke.1200.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.10, 1.9, and 1.8.

Fixes for version 1.9.3:

  • Fixed issue where special characters in the vSphere username are not properly escaped.

Changes in version 1.9.3:

  • Upgraded the Connect Agent version to 20211210-01-00.

    • This upgrade also fixed the issue where the Connect Agent restarts unexpectedly on a newly-created cluster that uses Anthos Identity Service to manage the Anthos Identity Service ClientConfig.

Known issue in version 1.9.3:

  • The Connect Agent restarts unexpectedly on an existing cluster that uses Anthos Identity Service to manage the Anthos Identity Service ClientConfig. If you have experienced this issue, follow these instructions to upgrade the Connect Agent version.

Anthos clusters on VMware 1.8.6-gke.4 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.8.6-gke.4 runs on Kubernetes 1.20.12-gke.1500.

Fixes for version 1.8.6:

  • Fixed issue where special characters in the vSphere username are not properly escaped.

December 23, 2021

  • When you deploy Anthos clusters on VMware version 1.9.0 or higher with the Seesaw bundled load balancer in an environment that uses NSX-T stateful distributed firewall rules, stackdriver-operator might fail to create the gke-metrics-agent-conf ConfigMap and cause gke-connect-agent Pods to enter a crash loop. The underlying issue is that stateful NSX-T distributed firewall rules terminate the connection from a client to the user cluster API server through the Seesaw load balancer because Seesaw uses asymmetric connection flows. This integration issue with NSX-T distributed firewall rules affects all Anthos clusters on VMware releases that use Seesaw. You might see similar connection problems in your own applications when they create large Kubernetes objects whose sizes are bigger than 32K. Follow these instructions to disable NSX-T distributed firewall rules, or to use stateless distributed firewall rules for Seesaw VMs.

  • If your clusters use a manual load balancer, follow these instructions to configure your load balancer to reset client connections when it detects a backend node failure. Without this configuration, clients of the Kubernetes API server might stop responding for several minutes when a server instance goes down.

December 22, 2021

Anthos clusters on VMware 1.10.0-gke.194 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.10.0-gke.194 runs on Kubernetes v1.21.5-gke.1200.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.10, 1.9, and 1.8.

  • vCenter and ESXi versions 6.7U2 and earlier are no longer supported. Upgrade your vCenter environment to a supported version (6.7U3 or later) before upgrading your clusters.

  • The diskformat parameter is removed from the standard vSphere driver StorageClass as the parameter has been deprecated in Kubernetes 1.21.

  • Preview: Egress NAT gateway:

    • To enable an egress NAT gateway, the advancedNetworking section in the user cluster configuration file replaces the now-deprecated enableAnthosNetworkGateway section.

    • You must create a NetworkGatewayGroup object (previously AnthosNetworkGateway) to configure the egress NAT gateway.

    • Any admin or user clusters that are version 1.9 or earlier, and that are enabled with Anthos Network Gateway, cannot be upgraded. You must delete and recreate those clusters following these instructions.

Cluster lifecycle Improvements:

  • An admin cluster upgrade is resumable after a previous failed admin cluster upgrade attempt.

  • GA: Admin cluster registration during new cluster creation is generally available.

  • Preview: Admin cluster registration when updating existing clusters is available as a preview feature.

Platform enhancements:

  • Preview: A new load balancer option, MetalLB, is available as another bundled software load balancer in addition to Seesaw. MetalLB will become the default load balancer choice instead of Seesaw when it reaches GA.

  • GA: Support for user cluster node pool autoscaling is generally available.

  • Preview: You can create admin cluster nodes and user cluster control-plane nodes with Container-Optimized OS by specifying the osImageType as cos in the admin cluster configuration file.

  • Windows Node Pool:

    • Preview: The containerd runtime is now available for Windows node pools when Dataplane V2 for Windows is enabled.
    • Node Problem Detector checks containerd service health on the nodes and surfaces problems to the API Server. For version 1.10.0, NPD does not attempt to repair the containerd service.
    • Containerd logs are exported to the Cloud Console.

    • CSI proxy is deployed automatically onto Windows nodes. You can install and use a Windows CSI driver of your choice, such as the SMB CSI driver.

  • GA: The multi-NIC capability to provide additional network interfaces to your Pods is generally available.

  • GA: You can upgrade to Ubuntu 20.04 and containerd 1.5.

Security enhancements:

  • User cluster control plane certificates are automatically rotated at each cluster upgrade. 

Simplify day-2 operations:

  • Preview: gkectl update admin supports the enabling and disabling of Cloud Monitoring and Cloud Logging in the admin cluster. 

  • Changed the collection of application metrics to use a more scalable monitoring pipeline based on OpenTelemetry. This change significantly reduces the amount of resources required to collect metrics.

  • Updated the parser of containerd and kubelet node logs to extract severity level.

  • Introduced the --share-with optional flag in the gkectl diagnose snapshot command to share the read permission after uploading the snapshot to a Google Cloud Storage bucket.

Functionality changes:

  • Replaced the SSH tunnel with Konnectivity service for communication between the user cluster control plane and the user cluster nodes. The Kubernetes SSH tunnel has been deprecated. 

    • You must create two additional firewall rules so that user worker nodes can access port 8132 on the user control-plane VIP address and get return packets. This is required for the Konnectivity service.

    • Introduced a new konnectivityServerNodePort field in the user cluster manual load balancer configuration. This field is required when creating or upgrading a user cluster, with manual load balancer mode, to version 1.10. 

  • The Ubuntu OS image is upgraded from 18.04 to 20.04 LTS.

    • The python command is no longer available. Any python command should be updated to python3 instead, and the syntax should be updated to Python 3.

    • /etc/resolv.conf now points to /run/systemd/resolve/stub-resolv.conf, instead of /run/systemd/resolve/resolv.conf.

    • The Ubuntu CIS benchmark version changed from v2.0.1 for Ubuntu 18.04 LTS to v1.0.0 for Ubuntu 20.04 LTS.

  • Upgraded COS from m89 to m93.

  • Upgraded containerd from 1.4 to 1.5 on Ubuntu and COS.

  • Changed gkectl diagnose snapshot to use the --all-with-logs scenario by default.

  • The gkeadm command copies the admin workstation configuration file to the admin workstation during creation so it can be used as a backup to re-create the admin workstation later.

  • Increased the Pod priority of kube-state-metrics to improve its reliability when the cluster is under resource contention.

  • Fixed an issue where Windows nodes were assigned duplicate IP addresses.

  • Fixed CVE-2021-32760. Because of Ubuntu PPA version pinning, this vulnerability might still be reported by certain vulnerability scanning tools, and thus appear as a false positive even though the underlying vulnerability has been patched.

  • Because of the change to use an OpenTelemetry-based scalable monitoring pipeline for application metrics, Horizontal Pod Autoscaling with user-defined metrics does not work in 1.10.0 unless you explicitly set scalableMonitoring to false, while also ensuring that both enableStackdriverForApplications and enableCustomMetricsAdapter are set to true, in the Stackdriver object.

    As a workaround, you can install a custom Prometheus adapter if you want to use Horizontal Pod Autoscaling with user-defined metrics while still keeping the scalable monitoring default setting for application metrics.

  • Because of a COS 93 configuration issue, IPv6 dualstack does not work correctly for COS node pool nodes in version 1.10.0. If you are using IPv6 dualstack with a COS node pool, wait for an upcoming patch release that addresses this issue.

  • If an admin cluster is created with an osImageType of cos, and you have rotated the audit logging service account key with gkectl update admin, the changes are overridden after the admin cluster control-plane node reboots. In that case, re-run the update command after the reboot to apply those changes.

  • On COS nodes, the NTP server is configured to time.google.com by default. In DHCP mode, this setting cannot be overridden to use the NTP server provided by your DHCP server. The issue will be fixed in an upcoming patch release. Before then, you can deploy a DaemonSet to override the NTP setting if you want to use a different NTP server in your COS node pool.

November 30, 2021

Anthos clusters on VMware 1.7.6-gke.6 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.7.6-gke.6 runs on Kubernetes v1.19.15-gke.1900.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.9, 1.8, and 1.7.

  • Fixed issue where special characters in the vSphere username are not properly escaped.
  • Alleviated the high CPU and memory usage by /etc/cron.daily/aide discussed in this issue.
  • Fixed issue where user cluster node is not synching time.
  • Fixed CVE-2021-41103. Because of Ubuntu PPA version pinning, this vulnerability might still be reported by certain vulnerability scanning tools, and appear as a false positive even though the underlying vulnerability has been patched.

November 29, 2021

Anthos clusters on VMware 1.8.5-gke.3 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.8.5-gke.3 runs on Kubernetes v1.20.9-gke.701.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.9, 1.8, and 1.7.

  • Fixed issue where special characters in the vSphere username are not properly escaped.
  • Alleviated the high CPU and memory usage by /etc/cron.daily/aide discussed in this issue.
  • Fixed issue where user cluster node is not synching time.
  • Fixed CVE-2021-41103. Because of Ubuntu PPA version pinning, this vulnerability might still be reported by certain vulnerability scanning tools, and appear as a false positive even though the underlying vulnerability has been patched.

November 18, 2021

Anthos clusters on VMware 1.9.2-gke.4 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.9.2-gke.4 runs on Kubernetes v1.21.5-gke.1200.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.9, 1.8, and 1.7.

With version 1.9.2, cert-manager is installed in the cert-manager namespace. Previously, for versions 1.8.2 to 1.9.1, cert-manager was installed in the kube-system namespace.

The cert-manager version is upgraded from 1.0.3 to 1.5.4.

If you already use any ClusterIssuer with a different cluster resource namespace from the default cert-manager namespace, follow these steps if you upgrade to version 1.9.2.

  • Manually copy the related certificates, secrets, or issuers to the cert-manager namespace to use the installed cert-manager after upgrading to 1.9.2.

  • If you need to use a different version of cert-manager, or if you need to install it in a different namespace, follow these instructions each time that you upgrade your cluster.

Fixes:

  • Fixed issue with cilium-operator not reconciling CiliumNode for Windows nodes when updating the cluster to add Windows node pools.
  • Fixed issue which could temporarily result in no healthy CoreDNS pods present during cluster operations.
  • Fixed issue where you cannot run gkectl upgrade loadbalancer on a user cluster Seesaw load balancer.
  • Fixed issue where node_filesystem metrics report gives wrong size for /run.
  • Fixed CVE-2021-37159. Because of Ubuntu PPA version pinning, this vulnerability might still be reported as a false positive by certain vulnerability scanning tools, although the underlying vulnerability has been patched in the 1.9.2 release.
  • Fixed issue where user cluster node is not synching time.
  • Alleviated the high CPU and memory usage by /etc/cron.daily/aide discussed in this issue.

October 29, 2021

The security community recently disclosed a new security vulnerability CVE-2021-30465 found in runc that has the potential to allow full access to a node filesystem.

For more information, see the GCP-2021-011 security bulletin.

October 27, 2021

Anthos clusters on VMware 1.8.4-gke.1 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.8.4-gke.1 runs on Kubernetes v1.20.9-gke.701.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.9, 1.8, and 1.7.

Fixes for version 1.8.4:

  • Fixed high-severity CVE-2021-3711.
  • Fixed gkectl check-config failure when Anthos clusters are configured with a proxy whose URL contains special characters.
  • Fixed "cert-manager" cainjector leader-election failure.

Known issue in version 1.8.4:

If you have already installed your own cert-manager in your cluster, read the suggested mitigation before upgrading to a version >=1.8.2 in order to avoid an installation conflict with the cert-manager deployed by Anthos clusters on VMware.

  • Installing your cert-manager with Apigee may also result in a conflict with the cert-manager deployed by Anthos clusters on VMware. To avoid this, read the suggested mitigation before upgrading to this version.

Anthos clusters on VMware 1.7.5-gke.0 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.7.5-gke.0 runs on Kubernetes v1.19.12-gke.2101.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.9, 1.8, and 1.7.

Fixes for version 1.7.5:

  • Fixed gkectl check-config failure when Anthos clusters are configured with a proxy whose URL contains special characters.

October 21, 2021

A security issue was discovered in the Kubernetes ingress-nginx controller, CVE-2021-25742. Ingress-nginx custom snippets allow retrieval of ingress-nginx service account tokens and secrets across all namespaces. For more information, see the GCP-2021-024 security bulletin.

October 20, 2021

Anthos clusters on VMware 1.9.1-gke.6 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.9.1-gke.6 runs on Kubernetes v1.21.5-gke.400.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.9, 1.8, and 1.7.

  • In version 1.9.0, there was a known issue with restoring an admin cluster using a backup when using a private registry. That has been fixed in version 1.9.1.
  • Fixed gkectl check-config failure that occurs when Anthos clusters are configured with a proxy whose URL contains special characters.
  • Fixed "cert-manager" cainjector leader-election failure.

If you have already installed your own cert-manager in your cluster, read the suggested mitigation before upgrading to a version >=1.8.2 in order to avoid an installation conflict with the cert-manager deployed by Anthos clusters on VMware.

  • Installing your cert-manager with Apigee may also result in a conflict with the cert-manager deployed by Anthos clusters on VMware. To avoid this, read the suggested mitigation before upgrading to this version.

October 04, 2021

A security vulnerability, CVE-2020-8561, has been discovered in Kubernetes where certain webhooks can be made to redirect kube-apiserver requests to private networks of that API server. For more information, see the GCP-2021-021 security bulletin.

September 29, 2021

Anthos clusters on VMware 1.9.0-gke.8 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.9.0-gke.8 runs on Kubernetes v1.21.4-gke.200.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.9, 1.8, and 1.7.

Features:

Cluster lifecycle Improvements:

  • GA: You can register an admin cluster during its creation by filling in the gkeConnect section in the admin cluster configuration file, similar to user cluster registration.

Platform enhancements:

  • Preview: User clusters can now be in a different vSphere datacenter from the admin cluster, resulting in datacenter isolation between the admin cluster and user clusters. This provides greater resiliency in the case of vSphere environment failures.

  • GA: Support for Windows node pools is generally available. This release adds:

    • Preview: Windows DataplaneV2 support, which allows for using Windows Network Policy
    • Node Problem Detector (NPD) support on Windows
    • Streamlined process for preparing Windows images in a private registry
    • Enhanced Flannel CNI support on Windows

    The upstream fixes for the "Windows Pod stuck at terminating status" error are also applied to this release, which improves the stability of running Windows workloads.

  • GA: Support for Container-Optimized OS (COS) node pools is generally available.

  • GA: CoreDNS is now the cluster DNS provider.

    • Clusters that are upgraded to 1.9 will have their KubeDNS provider replaced with CoreDNS. During the upgrade, CoreDNS is first deployed and then KubeDNS is removed, so applications should not observe DNS unavailability. However, before upgrading, ensure that your cluster has enough additional resources to deploy CoreDNS. CoreDNS requires 100 millicpu and 170 MiB of memory per instance. All clusters require a minimum of 2 instances, and an additional instance is deployed for every 16 nodes in the cluster.
    • You can configure cluster DNS options such as upstream name servers by using the new ClusterDNS custom resource.
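
As a rough sizing aid for the CoreDNS requirement above, the following Python sketch estimates the extra CPU and memory to reserve before upgrading. The per-instance figures come from the note. The instance count formula (the larger of the 2-instance minimum and one instance per 16 nodes, rounded up) is an assumption, because the note does not spell out the exact rule; the function name is hypothetical.

```python
import math

# Per-instance figures taken from the release note.
CPU_MILLICORES_PER_INSTANCE = 100
MEMORY_MIB_PER_INSTANCE = 170
MIN_INSTANCES = 2
NODES_PER_INSTANCE = 16


def estimate_coredns_resources(node_count: int) -> dict:
    """Estimate CoreDNS instances and total resources for a cluster.

    Assumption (not stated in the release note): the instance count is
    max(2, ceil(node_count / 16)).
    """
    if node_count < 0:
        raise ValueError("node_count must be non-negative")
    instances = max(MIN_INSTANCES, math.ceil(node_count / NODES_PER_INSTANCE))
    return {
        "instances": instances,
        "cpu_millicores": instances * CPU_MILLICORES_PER_INSTANCE,
        "memory_mib": instances * MEMORY_MIB_PER_INSTANCE,
    }


if __name__ == "__main__":
    # Print estimates for a few example cluster sizes.
    for nodes in (3, 40, 100):
        print(nodes, estimate_coredns_resources(nodes))
```

If your cluster scales CoreDNS differently, adjust the constants or the formula and treat the result as a lower bound for capacity planning.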

Security enhancements:

  • GA: Always-on secrets encryption: You can enable secrets encryption with internally generated keys instead of a hardware security module (HSM). Use the gkectl update command to rotate these keys or to enable or disable secrets encryption after cluster creation.
  • Preview: Windows network policy support. This release introduces a new network plugin, Antrea, for Windows nodes. In addition to network connectivity and services support, it provides network policy support. When creating a user cluster, you can set enableWindowsDataplaneV2 to true to enable this feature. Enabling this feature replaces Flannel with Antrea on Windows nodes.
  • Preview: Azure AD group support for Authentication: This feature allows cluster admins to configure RBAC policies based on Azure AD groups for authorization in clusters. This supports retrieval of groups information for users belonging to more than 200 groups, thus overcoming a limitation of regular OIDC configured with Azure AD as the identity provider.

Simplify day-2 operations:

  • Preview: When creating a user cluster, you can set enableVMTracking in the configuration file to true to enable vSphere tag creation and attachment to the VMs in the user cluster. This allows easy mapping of VMs to clusters and node pools. See Enable VM tracking.
  • GA: New metrics agents based on OpenTelemetry are introduced to improve reliability, scalability, and resource usage.
  • Preview: You can enable or disable Stackdriver with gkectl update on existing user clusters. You can enable or disable cloud audit logging and monitoring with gkectl update on both admin and user clusters.

Breaking changes:

  • User cluster registration is now required and enforced. You must fill in the gkeConnect section of the user cluster configuration file before creating a new user cluster. You cannot upgrade a user cluster unless that cluster is registered. To unblock the cluster upgrade, add the gkeConnect section to the configuration file and run gkectl update cluster to register an existing 1.8 user cluster.

  • User clusters must be upgraded before the admin cluster. The flag --force-upgrade-admin to allow the old upgrade flow (admin cluster upgrade first) is no longer supported.

  • The following requirements are now enforced when you create a cluster that has logging and monitoring enabled.

    • The Config Monitoring for Ops API is enabled in your logging-monitoring project.
    • The Ops Config Monitoring Resource Metadata Writer role is granted to your logging-monitoring service account.
    • The URL opsconfigmonitoring.googleapis.com is added to your proxy allowlist (if applicable).

Changes:

  • There is now a checkpoint file for the admin cluster, located in the same datastore folder as the admin cluster data disk, with the name DATA_DISK_NAME-checkpoint.yaml, or DATA_DISK_NAME.yaml if the length of DATA_DISK_NAME is greater than the filename length limit. This file is required for future upgrades and should be considered as important as the admin cluster data disk.

    Note: If you have enabled VM encryption in vCenter, you must grant Cryptographer.Access permission to the vCenter credentials specified in your admin cluster configuration file, before trying to create or upgrade your admin cluster.

  • The admin cluster backup with gkectl preview feature introduced in 1.8 now allows updates to clusterBackup.datastore. This datastore may be different from vCenter.datastore so long as it is in the same datacenter as the cluster.

  • The k8s 1.21 release includes the following metrics changes:

    • Added a new status field to storage_operation_duration_seconds, so that you can see storage operation latency for every status.
    • The storage metrics storage_operation_errors_total and storage_operation_status_count are marked deprecated. In both cases, the storage_operation_duration_seconds metric can be used to recover equivalent counts (using status=fail-unknown in the case of storage_operation_errors_total).

    • Renamed the metric etcd_object_counts to apiserver_storage_object_counts and marked it as stable. The original etcd_object_counts metric name is marked as "Deprecated" and will be removed in the future.

  • A new GKE on-prem control plane uptime dashboard is introduced with a new metric, kubernetes.io/anthos/container/uptime, for component availability. The old GKE on-prem control plane status dashboard and old kubernetes.io/anthos/up metric are deprecated. New alerts for admin cluster control plane components availability and user cluster control plane components availability are introduced with a new kubernetes.io/anthos/container/uptime metric to replace deprecated alerts and the old kubernetes.io/anthos/up metric.

  • You can now skip certain health checks performed by gkectl diagnose cluster with the --skip-validation-xxx flag.
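
To illustrate the storage metrics change described above, the sketch below recovers an error count from storage_operation_duration_seconds by summing the histogram's _count series whose status label is fail-unknown. Only the metric name and the status=fail-unknown label come from the note; the sample scrape text, the other label names and values, and the helper name are made up for illustration.

```python
import re

# Hypothetical scrape output in Prometheus text format; values are made up.
SAMPLE_SCRAPE = """\
# TYPE storage_operation_duration_seconds histogram
storage_operation_duration_seconds_count{operation_name="volume_mount",status="success"} 42
storage_operation_duration_seconds_count{operation_name="volume_mount",status="fail-unknown"} 3
storage_operation_duration_seconds_count{operation_name="volume_attach",status="fail-unknown"} 2
"""

# Matches lines of the form: metric_name{label="value",...} number
SERIES_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

COUNT_SERIES = "storage_operation_duration_seconds_count"


def count_by_status(scrape_text: str, status: str) -> float:
    """Sum the histogram _count samples that carry the given status label."""
    total = 0.0
    for line in scrape_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SERIES_RE.match(line)
        if not match or match.group("name") != COUNT_SERIES:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        if labels.get("status") == status:
            total += float(match.group("value"))
    return total


if __name__ == "__main__":
    print("failed operations:", count_by_status(SAMPLE_SCRAPE, "fail-unknown"))
```

In a real monitoring setup you would more likely express the same filter as a query in your metrics backend; the parsing here only shows which series and label carry the information.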

Fixes:

  • Fixed the issue of gkeadm trying to set permissions for the component access service account when --auto-create-service-accounts=false.
  • Fixed the timeout issue for admin cluster creation or upgrade that was caused by high network latency to reach the container registry.
  • Fixed the gkectl create-config admin and gkectl create-config cluster panic issue in the 1.8.0-1.8.3 releases.
  • Fixed the /run/aide disk usage issue that was caused by the accumulated cron log for aide.

Restoring an admin cluster from a backup using gkectl repair admin-master --restore-from-backup fails when using a private registry. The issue will be resolved in a future release.

September 23, 2021

Anthos clusters on VMware 1.7.4-gke.2 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.7.4-gke.2 runs on Kubernetes v1.19.12-gke.2101.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.8, 1.7, and 1.6.

Fixes:

  • Fixed high-severity CVE-2021-3711.
  • Fixed CVE-2021-25741 mentioned in the GCP-2021-018 security bulletin.
  • Fixed the Istio security vulnerabilities listed in the GCP-2021-016 security bulletin.
  • Fixed the issue that gkeadm tries to set permissions for the component access service account when --auto-create-service-accounts=false.

September 21, 2021

Anthos clusters on VMware 1.8.3-gke.0 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.8.3-gke.0 runs on Kubernetes v1.20.9-gke.701.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.8, 1.7, and 1.6.

Fixes:

  • Fixed high-severity CVE-2021-3711.
  • Fixed CVE-2021-25741 mentioned in the GCP-2021-018 security bulletin.
  • Fixed the Istio security vulnerabilities listed in the GCP-2021-016 security bulletin.
  • Fixed the issue that gkeadm tries to set permissions for the component access service account when --auto-create-service-accounts=false.

In versions 1.8.0-1.8.3, the gkectl create-config admin/cluster command panics with the message panic: invalid version: "latest". As a workaround, use gkectl create-config admin/cluster --gke-on-prem-version=$DESIRED_CLUSTER_VERSION. Replace DESIRED_CLUSTER_VERSION with the desired version.

September 17, 2021

A security issue was discovered in Kubernetes, CVE-2021-25741, where a user may be able to create a container with subpath volume mounts to access files and directories outside of the volume, including on the host filesystem. For more information, see the GCP-2021-018 security bulletin.

September 16, 2021

Anthos clusters on VMware 1.6.5-gke.0 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.6.5-gke.0 runs on Kubernetes 1.18.20-gke.4501.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.8, 1.7, and 1.6.

Fixes:

September 03, 2021

Anthos clusters on VMware 1.7.3-gke.6 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.7.3-gke.6 runs on Kubernetes v1.19.12-gke.1100.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.8, 1.7, and 1.6.

Fixes:

  • Fixed the Ubuntu user password expiration issue. This is a required fix for customers running 1.7.2 or 1.7.3-gke.2. Either use the suggested workaround to fix this issue, or upgrade to get this fix.

  • Fixed the issue where the stackdriver-log-forwarder Pod sometimes entered a crash loop because of a fluent-bit segfault.

August 31, 2021

Anthos clusters on VMware 1.8.2-gke.11 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.8.2-gke.11 runs on Kubernetes 1.20.9-gke.701.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.8, 1.7, and 1.6.

Starting from version 1.8.2, Anthos clusters on VMware uses cert-manager instead of Istio Citadel for issuing TLS certificates used by metrics endpoints.

Fixes:

  • Fixed the Ubuntu user password expiration issue. You must get this fix. Either use the suggested workaround to fix this issue, or upgrade to get this fix.
  • Enhanced the admin cluster upgrade logic to prevent the admin cluster state (that is, the admin master data disk) from being lost in those cases when the disk is renamed or migrated accidentally.
  • Fixed the issue that the GKE connect-register service account key is printed in the klog in 1.8.0 and 1.8.1 when users run gkectl update cluster to update the GKE connect spec, such as to register an existing user cluster.
  • Fixed the issue where, when ESXi hosts were unavailable in the vCenter cluster (such as when disconnected from vCenter or in maintenance mode), the Cluster API controller and cluster health controllers would crash loop, and the gkectl diagnose cluster command would crash.
  • Fixed the issue that an admin cluster upgrade might be blocked indefinitely if admin node machines are upgraded before the new Cluster API controller is ready.
  • Fixed the issue that the onprem-user-cluster-controller might leak vCenter sessions over time.

  • Fixed the issue that the gateway IP was assigned to a Windows Pod, which made it unable to have network connectivity.

  • Fixed CVE-2021-33909 and CVE-2021-33910 on Ubuntu and COS.

HPA with custom metrics doesn't work in version 1.8.2 due to the migration from Istio to cert-manager for the monitoring pipeline. Customers using the HPA custom metrics with the monitoring pipeline should wait for a future release that will include this fix.

August 09, 2021

Anthos clusters on VMware 1.7.3-gke.2 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.7.3-gke.2 runs on Kubernetes 1.19.12-gke.1100.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.8, 1.7, and 1.6.

Fixes:

  • These security vulnerabilities have been fixed: CVE-2021-3520, CVE-2021-33909, and CVE-2021-33910.

  • Fixed the issue that the /etc/cron.daily/aide script uses up all existing space in /run, causing a crash loop in Pods.

  • Fixed the issue that admin cluster upgrade may fail due to an expired front-proxy-client certificate on the admin cluster control plane node.

August 05, 2021

Anthos clusters on VMware 1.6.4-gke.7 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.6.4-gke.7 runs on Kubernetes 1.18.20-gke.2900.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.8, 1.7, and 1.6.

Fixes:

  • These security vulnerabilities have been fixed: CVE-2021-3520, CVE-2021-33909, and CVE-2021-33910.

  • Fixed the issue that admin cluster upgrade may fail due to an expired front-proxy-client certificate on the admin cluster control plane node.

July 22, 2021

Anthos clusters on VMware 1.8.1-gke.7 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.8.1-gke.7 runs on Kubernetes v1.20.8-gke.1500.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.8, 1.7, and 1.6.

Fixes:

  • The issue that the /etc/cron.daily/aide script uses up all existing space in /run, causing a crash loop in Pods, has been fixed. The files located under /run/aide/ will be cleaned up periodically.
  • If you use the gkectl upgrade loadbalancer command to attempt to update some parameters of the Seesaw load balancer in version 1.8.0, the update does not work in either DHCP or IPAM mode. If your setup includes this configuration, do not upgrade to version 1.8.0, but instead to version 1.8.1 or later. If you are already at version 1.8.0, you can upgrade to 1.8.1 first before updating any parameters. See Upgrading Seesaw load balancer with version 1.8.0.
  • For Windows nodes, fixed an issue by adding a step for automatically detecting the network interface name instead of hard-coding it, since this name might be different depending on the network adapter being used in the base VM template.
  • Fixed an issue with building a Windows VM template: the gkectl prepare windows command no longer retries the VM shutdown, because the retries caused the command to be stuck for a long time.
  • Fixed an issue where snapshot.storage.k8s.io/v1 resources were rejected by the snapshot admission webhook.
  • The CVE-2021-3520 security vulnerability has been fixed. 

July 08, 2021

Anthos clusters on VMware 1.8.0-gke.25 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.8.0-gke.25 runs on Kubernetes v1.20.5-gke.1301.

Fixes:

Fixed CVE-2021-34824, which could expose private keys and certificates from Kubernetes secrets through the credentialName field when using Gateway or DestinationRule. This vulnerability affects all clusters created or upgraded with Anthos clusters on VMware version 1.8.0-gke.21. For more information, see the GCP-2021-012 security bulletin.

July 07, 2021

Anthos clusters on VMware 1.8.0-gke.25 is now available to resolve this issue.

The Istio project recently disclosed a new security vulnerability, CVE-2021-34824, affecting Istio. Istio contains a remotely exploitable vulnerability where credentials specified in the credentialName field for Gateway or DestinationRule can be accessed from different namespaces.

For more information, see the GCP-2021-012 security bulletin.

June 28, 2021

Anthos clusters on VMware 1.8.0-gke.21 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.8.0-gke.21 runs on Kubernetes v1.20.5-gke.1301.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.8, 1.7, and 1.6.

Cluster lifecycle improvements:

You should no longer use gcloud to unregister a user cluster, because clusters are registered automatically. Instead, register existing user clusters by using gkectl update cluster. You can also use gkectl update cluster to consolidate out-of-band registration that was done using gcloud. For more information, see Cluster registration.

Platform enhancements:

  • Preview: Cluster autoscaling is now available in preview. With cluster autoscaling, you can horizontally scale node pools in proportion to workload demand. When demand is high, the cluster autoscaler adds nodes to the node pool. When demand is low, the cluster autoscaler removes nodes from the node pool, scaling back down to a minimum size that you designate. Cluster autoscaling can increase the availability of your workloads while controlling costs.

  • Preview: User cluster control-plane node and admin cluster add-on node auto sizing are now available in preview. The features can be enabled separately in user cluster or admin cluster configurations. When you enable user cluster control-plane node auto sizing, user cluster control-plane nodes are automatically resized in proportion to the number of node pool nodes in the given user cluster. When you enable admin cluster add-on node auto sizing, admin cluster add-on nodes are automatically resized in proportion to the number of nodes in the admin cluster.

  • Preview: Windows Server container support for Anthos clusters on VMware is now available in preview. This allows you to modernize and run your Windows-based apps more efficiently in your data centers without having to go through risky application rewrites. You can use Windows containers alongside Linux containers for your container workloads. The same experience and benefits that you have come to enjoy with Anthos clusters on VMware using Linux (application portability, consolidation, cost savings, and agility) can now be applied to Windows Server applications as well.

  • Preview: Admin cluster backup is now available in preview. With this feature enabled, admin cluster backups are automatically performed before and after user and admin cluster creation, update, and upgrade. A new gkectl backup admin command performs manual backup. Upon admin cluster storage failure, you can restore the admin cluster from a backup with the gkectl repair admin-cluster --restore-from-backup command.

Security enhancements:

  • The Ubuntu node image is qualified with the CIS (Center for Internet Security) L1/L2 Server Benchmark.

  • Generally available: Workload identity support is now generally available. For more information, see Fleet workload identity. The connect-agent service account key is no longer required during installation. The connect agent uses workload identity to authenticate to Google Cloud instead of an exported Google Cloud service account key.

  • You can now use gkectl to rotate system root CA certificates for user clusters.

  • You can now use gkectl to update vCenter CA certificates for both admin clusters and user clusters.

Network feature enhancements:

Preview: Egress NAT gateway is now available in preview. To be able to access off-cluster workloads, traffic originating within the cluster that is related to specific flows must have deterministic source IP addresses. Egress NAT gateway gives you fine-grained control over which traffic gets a deterministic source IP address, and then provides that address. The Egress NAT Gateway functionality is built on top of Dataplane V2.

Storage enhancements:

  • The Anthos vSphere CSI driver now supports both offline and online volume expansion for dynamically and statically created block volumes only.

    • Offline volume expansion is available in vSphere 7.0 and later. Online expansion is available in vSphere 7.0u2 and later.

    • The vSphere CSI driver StorageClass standard-rwo, which is installed in user clusters automatically, sets allowVolumeExpansion to true by default for newly created clusters running on vSphere 7.0 or later. You can use both online and offline expansion for volumes using this StorageClass.

  • The volume snapshot feature now supports v1 versions of VolumeSnapshot, VolumeSnapshotContent, and VolumeSnapshotClass objects. The v1beta1 versions are deprecated and will soon stop being served.
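
The vSphere version thresholds for volume expansion listed above can be sketched as a small lookup. This is only an illustration: the function names and the version-string parsing (forms such as "7.0", "7.0u2", or "7.0 Update 2") are assumptions made for this example and are not part of gkectl or the CSI driver.

```python
import re
from typing import Set, Tuple


def parse_vsphere_version(version: str) -> Tuple[int, int, int]:
    """Parse strings such as "7.0", "7.0u2", or "6.7 Update 3" (assumed formats)."""
    match = re.fullmatch(
        r"(\d+)\.(\d+)(?:\s*(?:update|u)\s*(\d+))?",
        version.strip(),
        re.IGNORECASE,
    )
    if match is None:
        raise ValueError(f"unrecognized vSphere version: {version!r}")
    major, minor, update = match.groups()
    return int(major), int(minor), int(update or 0)


def supported_expansion_modes(version: str) -> Set[str]:
    """Return the expansion modes the notes list for this vSphere version."""
    parsed = parse_vsphere_version(version)
    modes = set()
    if parsed >= (7, 0, 0):  # offline expansion: vSphere 7.0 and later
        modes.add("offline")
    if parsed >= (7, 0, 2):  # online expansion: vSphere 7.0u2 and later
        modes.add("online")
    return modes


for v in ("6.7u3", "7.0", "7.0u2"):
    print(v, sorted(supported_expansion_modes(v)))
```

Comparing parsed versions as tuples keeps the update level from being ignored, which a plain string comparison would not guarantee.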

Simplify day-2 operations:

  • You can now use Anthos Identity Service (AIS) and OpenID Connect (OIDC) for authentication to admin clusters in addition to user clusters.

  • Preview: Anthos Identity Service can now resolve groups with Okta as identity provider. This allows administrators to write RBAC policy with Okta groups.

  • Preview: Anthos Identity Service now supports LDAP authentication methods in addition to OIDC. You can use AIS with Microsoft Active Directory without the need for provisioning Active Directory Federation Services.

  • The Anthos metadata agent replaces the original metadata agent to collect and send Anthos metadata to Google Cloud Platform, so that Google Cloud Platform can use this metadata to build a better user interface for Anthos clusters. You must 1) enable the Config Monitoring for Ops API in your logging-monitoring project, 2) grant the Ops Config Monitoring Resource Metadata Writer role to your logging-monitoring service account, and 3) add opsconfigmonitoring.googleapis.com to your proxy allowlist (if applicable).

  • You can use gkectl diagnose snapshot --upload-to [GCS_BUCKET] --service-account-key-file [SA_KEY_FILE] to automatically upload snapshots to a Google Cloud Storage (GCS) bucket. The provided service account must have the roles/storage.admin IAM role enabled.

Functionality changes:

  • The admin cluster now uses containerd on all nodes, including the admin cluster control-plane node, admin cluster add-on nodes, and user cluster control-plane nodes. This applies to both new admin clusters and existing admin clusters upgraded from 1.7.x. On user cluster node pools, containerd is the default container runtime for new node pools, but existing node pools that are upgraded from 1.7.x will continue using Docker Engine. You can continue to use Docker Engine for a new node pool by setting its osImageType to ubuntu.

  • A new ubuntu_containerd OS image type is introduced. ubuntu_containerd uses the same OS image as ubuntu, but the node is configured to use containerd as the container runtime instead. The ubuntu_containerd OS image type is used for new node pools by default, but existing node pools upgraded from 1.7.x continue using Docker Engine. Docker Engine support will be removed in Kubernetes 1.24, so you should start converting your node pools to ubuntu_containerd as soon as possible.

  • When installing or upgrading to 1.8.0-gke.21 on a vCenter with a vSphere version older than 6.7 Update 3, you may receive a notification. Note that vSphere versions older than 6.7 Update 3 will no longer be supported in Anthos clusters on VMware in an upcoming version.

  • The create-config Secret is removed in both the admin and the user clusters. If you previously relied on workarounds that modify the secret(s), contact Cloud Support for updates.

  • You can update the CPU and memory configuration for the user cluster control-plane node with gkectl update cluster.

  • You can configure the CPU and memory configurations for the admin control-plane node to non-default settings during admin cluster creation through the newly introduced admin cluster configuration fields.

  • Node auto repairs are throttled at the node pool level. The number of repairs per hour for a node pool is limited to either 3 or 10% of the number of nodes in the node pool, whichever is greater.

  • Starting from Kubernetes 1.20, timeouts on exec probes are honored, and default to one second if unspecified. If you have Pods using exec probes, ensure they can easily complete in one second or explicitly set an appropriate timeout. See Configure Probes for more details.

  • Starting from Kubernetes 1.20, Kubelet no longer creates the target_path for NodePublishVolume in accordance with the CSI spec. If you have self-managed CSI drivers deployed in your cluster, ensure they are idempotent and do any necessary mount creation/verification. See Kubernetes issue #88759 for details.

  • Non-deterministic treatment of objects with invalid ownerReferences was fixed in Kubernetes 1.20. You can run the kubectl-check-ownerreferences tool prior to upgrade to locate existing objects with invalid ownerReferences. The metadata.selfLink field, deprecated since Kubernetes 1.16, is no longer populated in Kubernetes 1.20. See Kubernetes issue #1164 for details.
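
For the exec probe timeout change described above, one way to find affected workloads before upgrading is to scan Pod specs for exec probes that do not set timeoutSeconds. The following is a minimal sketch, assuming the Pod spec has already been loaded into a Python dict (for example, from kubectl get pod -o json); the function name and the sample spec are hypothetical.

```python
from typing import Any, Dict, List

# Probe fields defined on a container in the Kubernetes Pod API.
PROBE_KINDS = ("livenessProbe", "readinessProbe", "startupProbe")


def exec_probes_without_timeout(pod_spec: Dict[str, Any]) -> List[str]:
    """Return "container/probe" labels for exec probes lacking timeoutSeconds."""
    findings = []
    for container in pod_spec.get("containers", []):
        for kind in PROBE_KINDS:
            probe = container.get(kind)
            # Only exec probes are affected; httpGet and tcpSocket are skipped.
            if probe and "exec" in probe and "timeoutSeconds" not in probe:
                findings.append(f"{container.get('name', '?')}/{kind}")
    return findings


# Hypothetical Pod spec used only to exercise the function.
example_spec = {
    "containers": [
        {
            "name": "app",
            "livenessProbe": {"exec": {"command": ["/bin/check"]}},
            "readinessProbe": {"httpGet": {"path": "/healthz", "port": 8080}},
        },
        {
            "name": "sidecar",
            "livenessProbe": {
                "exec": {"command": ["/bin/ok"]},
                "timeoutSeconds": 5,
            },
        },
    ]
}

print(exec_probes_without_timeout(example_spec))
```

Any probe reported here runs with the one-second default on Kubernetes 1.20 and later, so it is a candidate for an explicit timeoutSeconds value.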

Breaking changes:

  • The Istio components have been upgraded to handle ingress support. Previously, using HTTPS for ingress required both an Istio Gateway and Kubernetes Ingress. With this release, the full ingress spec is natively supported. See Ingress migration to manage this upgrade for Istio components.

  • The Cloud Run for Anthos user cluster configuration option is no longer supported. Cloud Run for Anthos is now installed as part of registration with a fleet. This allows for configuring and upgrading Cloud Run separately from Anthos clusters on VMware. To upgrade to the newest version of Cloud Run for Anthos, see Installing Cloud Run for Anthos.

Fixes:

  • Previously, the admin cluster upgrade could be affected by the expired front-proxy-client certificate that persists in the data disk for the admin cluster control-plane node. Now the front-proxy-client certificate is renewed during an upgrade.

  • Fixed an issue where logs are sent to the parent project of the service account specified in the stackdriver.serviceAccountKeyPath field of your cluster configuration file while the value of stackdriver.projectID is ignored.

  • Fixed an issue where calico-node Pods sometimes used an excessive amount of CPU in large-scale clusters.

The stackdriver-metadata-agent-cluster-level-* Pod might have logs that look like this:

reflector.go:131] third_party/golang/kubeclient/tools/cache/reflector.go:99: Failed to list *unstructured.Unstructured: the server could not find the requested resource

You can safely ignore these logs.

June 17, 2021

When you upgrade an unregistered Anthos cluster on VMware from a version earlier than 1.7.0 to a version 1.7.0 or later, you need to manually install and configure the Anthos Config Management operator. If you had previously installed Anthos Config Management, you need to re-install it. For details on how to do this, see Installing Anthos Config Management.

If you are using a private registry for software images, upgrading an Anthos cluster on VMware will always require special steps, described in Updating Anthos Config Management using a private registry. Upgrading from a version earlier than 1.7.0 to a version 1.7.0 or later additionally requires that you manually install and configure the Anthos Config Management operator as described in Installing Anthos Config Management.

June 08, 2021

Anthos clusters on VMware 1.5.4-gke.2 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.5.4-gke.2 runs on Kubernetes v1.17.9-gke.4400. The supported versions that offer the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.7, 1.6, and 1.5.

Fixes

These security vulnerabilities have been fixed:

Fixed CVE-2021-25735 mentioned in the GCP-2021-003 Security Bulletin, CVE-2021-31535, and other medium and low vulnerability CVEs with fixes available.

June 07, 2021

Anthos clusters on VMware 1.6.3-gke.3 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.6.3-gke.3 runs on Kubernetes v1.18.18-gke.100. The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.7, 1.6, and 1.5.

Fixes

These security vulnerabilities have been fixed:

Fixed CVE-2021-25735 mentioned in the GCP-2021-003 Security Bulletin, CVE-2021-31535, and other medium and low vulnerability CVEs with fixes available.

May 27, 2021

Anthos clusters on VMware 1.7.2-gke.2 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.7.2-gke.2 runs on Kubernetes 1.19.10-gke.1602.

The supported versions that offer the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.7, 1.6, and 1.5.

The Ubuntu node image shipped in version 1.7.2 is qualified with the CIS (Center for Internet Security) L1 Server Benchmark.

Fixes:

An admin cluster upgrade may fail due to an expired front-proxy-client certificate on the admin control plane node. Make sure that the certificate is not expired, and recreate it if needed. See: Renew an expired certificate.

May 21, 2021

In Anthos clusters on VMware 1.7, logs are sent to the parent project of your logging-monitoring service account. That is, logs are sent to the parent project of the service account specified in the stackdriver.serviceAccountKeyPath field of your cluster configuration file. The value of stackdriver.projectID is ignored. This issue will be fixed in an upcoming release.

As a workaround, view logs in the parent project of your logging-monitoring service account.

May 20, 2021

In version 1.7.1, the memory consumption of stackdriver-log-forwarder grows significantly over time, and the logs show an excessive number of OAuth 2.0 token requests. Follow these steps to mitigate this issue.

May 11, 2021

A recently discovered vulnerability, CVE-2021-31920, affects Istio with respect to its authorization policies. Istio contains a remotely exploitable vulnerability where an HTTP request with multiple slashes or escaped slash characters can bypass Istio authorization policy when path-based authorization rules are used. While Anthos clusters on VMware uses an Istio Gateway object for network ingress traffic into clusters, authorization policies are not a supported or intended use case for Istio as part of the Anthos clusters on VMware prerequisites. For more details, refer to the Istio security bulletin.

May 06, 2021

The Envoy and Istio projects recently announced several new security vulnerabilities (CVE-2021-28683, CVE-2021-28682, and CVE-2021-29258) that could allow an attacker to crash Envoy.

For more information, see the GCP-2021-004 security bulletin.

May 05, 2021

Anthos clusters on VMware 1.7.1-gke.4 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.7.1-gke.4 runs on Kubernetes 1.19.7-gke.2400.

The supported versions that offer the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.7, 1.6, and 1.5.

If you upgrade the admin cluster before you upgrade the associated user clusters within the same minor version, such as from 1.7.0 to 1.7.1, the user cluster control-planes will be upgraded together with the admin cluster. This applies even if you use the flag --force-upgrade-admin. This behavior, in versions 1.7.0 and later, is different from versions 1.6 and earlier, and is expected behavior.

Fixes:

  • Fixed a bug, so that the hardware version of a virtual machine is determined based on the ESXi host apiVersion instead of the host version. When the ESXi host apiVersion is at least 6.7U2, VMs with version vmx-15 are created. Also, the CSI preflight checks validate the ESXi host API version instead of the host version.

  • Fixed a bug, so that if vSphereCSIDisabled is set to true, Container Storage Interface (CSI) preflight checks do not run when you execute commands such as gkectl check-config, gkectl create loadbalancer, or gkectl create cluster.

  • Fixed CVE-2021-3444, CVE-2021-3449, CVE-2021-3450, CVE-2021-3492, CVE-2021-3493, and CVE-2021-29154 on the Ubuntu operating system used by the admin workstation, cluster nodes, and Seesaw.

  • Fixed a bug where attempting to install or upgrade GKE on-prem 1.7.0 failed with an "/STSService/ 400 Bad Request" when the vCenter is installed with the external platform services controller. Installations where the vCenter server is a single appliance are not affected. Note that VMware deprecated the external platform services controller in 2018.

  • Fixed a bug where auto repair failed to trigger for unhealthy nodes if the cluster-health-controller was restarted while a previously issued repair was in progress.

  • Fixed a bug so that the command gkectl diagnose snapshot output includes the list of containers and the containerd daemon log on Container-Optimized OS (COS) nodes.

  • Fixed a bug that caused gkectl update admin to generate an InternalFields diff unexpectedly.

  • Fixed an issue where the stackdriver-log-forwarder Pod sometimes crashlooped because of a fluent-bit segfault.

April 20, 2021

The Kubernetes project recently announced a new security vulnerability, CVE-2021-25735, that could allow node updates to bypass a Validating Admission Webhook. For more details, see the GCP-2021-003 security bulletin.

March 25, 2021

Anthos clusters on VMware 1.7.0-gke.16 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.7.0-gke.16 runs on Kubernetes 1.19.7-gke.2400.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting GKE On-Prem are 1.6, 1.5, and 1.4.

Cluster lifecycle improvements

  • The cluster upgrade process has changed. Instead of upgrading the admin cluster first, you can upgrade user clusters to the newer version without upgrading the admin cluster. The new flow, which requires upgrading gkeadm, allows you to preview new features before performing a full upgrade with the admin cluster. In addition, the 1.7.0 version of gkectl can perform operations on both 1.6.X and 1.7.0 clusters.

  • Starting with version 1.7.0, you can deploy Anthos clusters on vSphere 7.0 environments in addition to vSphere 6.5 and 6.7. Note that Anthos clusters on VMware will phase out vSphere 6.5 support following VMware end of general support timelines.

  • Published the minimum hardware resource requirements for a proof-of-concept cluster.

Platform enhancements

  • GA: Node auto repair is now generally available and enabled by default for newly created clusters. When the feature is enabled, cluster-health-controller performs periodic health checks, surfaces problems as events on cluster objects, and automatically repairs unhealthy nodes.

  • GA: vSphere resource metrics is now generally available and enabled by default for newly created clusters. When the feature is enabled, VM level resource contention metrics are collected and displayed in the VM health dashboards automatically created through out-of-the-box monitoring. You can use these dashboards to track VM resource contention issues.

  • GA: Dataplane V2 is now generally available and can be enabled in newly created clusters.

  • GA: Network Policy Logging is now generally available. Network policy logging is available only for clusters running Dataplane V2.

  • You can attach vSphere tags to user cluster node pools during cluster creation and update. You can use tags to organize and select VMs in vCenter.

Security enhancements:

  • Preview: You can run Container-Optimized OS on your user cluster worker nodes.

Simplify Day-2 operations:

  • GA: Support for vSphere folders is now generally available. This allows you to install Anthos clusters on VMware in a vSphere folder, reducing the scope of the permission required for the vSphere user.

  • A new gkectl update admin command supports updating certain admin cluster configurations including adding static IP addresses.

  • The central log aggregator component has been removed from the logging pipeline to improve reliability, scalability and resource usage.

  • Cluster scalability has been improved:

    • 50 user clusters per admin cluster

    • With Seesaw, 500 nodes, 15,000 Pods, and 500 LoadBalancer Services per user cluster

    • With F5 BIG-IP, 250 nodes, 7,500 Pods, and 250 LoadBalancer Services per user cluster
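
The scalability figures above can be turned into a quick planning check. The sketch below only encodes the numbers listed in these notes; the dictionary keys and the function name are made up for this example.

```python
from typing import Dict, List

# Limits quoted in the 1.7.0 notes: 50 user clusters per admin cluster,
# plus per-user-cluster limits that depend on the load balancer.
MAX_USER_CLUSTERS_PER_ADMIN = 50
PER_USER_CLUSTER_LIMITS = {
    "seesaw": {"nodes": 500, "pods": 15000, "loadbalancer_services": 500},
    "f5": {"nodes": 250, "pods": 7500, "loadbalancer_services": 250},
}


def exceeded_limits(load_balancer: str, planned: Dict[str, int]) -> List[str]:
    """List every planned quantity that exceeds the documented limit."""
    limits = PER_USER_CLUSTER_LIMITS[load_balancer]
    return [
        f"{key}: {planned[key]} > {limit}"
        for key, limit in limits.items()
        if planned.get(key, 0) > limit
    ]


# Hypothetical plan for a user cluster behind F5 BIG-IP.
plan = {"nodes": 300, "pods": 7000, "loadbalancer_services": 250}
print(exceeded_limits("f5", plan))
```

The same plan stays within the Seesaw limits, which illustrates why the load balancer choice matters when sizing a user cluster.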

Anthos Config Management:

Anthos Config Management (ACM) is now decoupled from Anthos clusters on VMware. This provides multiple benefits including decoupling the ACM release cadence from Anthos clusters on VMware, simplifying the testing and qualification process, and providing a consistent installation and upgrade flow.

Storage enhancements:

GA: The vSphere CSI driver is now generally available. Your vCenter server and ESXi hosts must both be running 6.7 update 3 or newer. The preflight checks and gkectl diagnose cluster have been enhanced to cover the CSI prerequisites.

Functionality changes:

  • gkectl diagnose cluster now includes validation of load balancing, including F5, Seesaw, and manual mode.

  • gkectl diagnose snapshot now provides an HTML index file in the snapshot, and collects extra container information from the admin cluster control-plane node when the Kubernetes API server is inaccessible.

  • gkectl update admin has been updated to:

    • Enable or disable auto repair in the admin cluster
    • Add static IP addresses to the admin cluster
    • Enable/disable vSphere resource metrics in the admin cluster
  • gkectl update cluster has been enhanced to enable or disable vSphere resource metrics in a user cluster.

  • Given that we no longer need an allowlisted service account in the admin workstation configuration file, we deprecated the gcp.whitelistedServiceAccountKeyPath field and added a new gcp.componentAccessServiceAccountKeyPath field. For consistency, we also renamed the corresponding gcrKeyPath field in the admin cluster configuration file.

Breaking changes:

  • The following Google Cloud API endpoints must be allowlisted in network proxies and firewalls. These are now required for Connect Agent to authenticate to Google when the cluster is registered in Hub:

    • securetoken.googleapis.com
    • sts.googleapis.com
    • iamcredentials.googleapis.com
  • gkectl now accepts only v1 cluster configuration files. For instructions on converting your v0 configuration files, see Converting configuration files.
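
For the endpoint allowlisting requirement above, a simple set comparison can show which of the three endpoints are still missing from a proxy or firewall allowlist. This is a sketch under the assumption that the allowlist is available as a list of hostnames; the function name and the sample allowlist are hypothetical.

```python
from typing import Iterable, List

# Endpoints that the 1.7.0 notes require for Connect Agent authentication.
REQUIRED_ENDPOINTS = {
    "securetoken.googleapis.com",
    "sts.googleapis.com",
    "iamcredentials.googleapis.com",
}


def missing_endpoints(allowlist: Iterable[str]) -> List[str]:
    """Return the required endpoints that are absent from the allowlist."""
    # Hostnames are case-insensitive, so normalize before comparing.
    normalized = {host.strip().lower() for host in allowlist}
    return sorted(REQUIRED_ENDPOINTS - normalized)


# Hypothetical allowlist exported from a proxy configuration.
current_allowlist = ["gcr.io", "sts.googleapis.com", "Iamcredentials.googleapis.com"]
print(missing_endpoints(current_allowlist))
```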

Fixes:

  • Fixed a bug where Grafana dashboards based on the container_cpu_usage_seconds_total metric show no data.

  • Fixed an issue where scheduling Stackdriver components on user cluster control-plane nodes caused resource contention issues.

  • Fixed Stackdriver Daemonsets to tolerate NoSchedule and NoExecute taints.

  • Fixed an HTTP/2 connection issue that sometimes caused problems with connections from the kubelet to the Kubernetes API server. This issue also could lead to nodes becoming not ready.

Known issues:

  • Calico-node Pods sometimes use an excessive amount of CPU in large-scale clusters. You can mitigate the issue by killing such Pods.

  • When running gkectl update admin against a cluster upgraded from 1.6, you might get the following diff:

    - InternalFields: nil,
    + InternalFields: map[string]string{"features.onprem.cluster.gke.io/bundle-vsphere-credentials": "enabled"},

    You can safely ignore this and proceed with the update.

February 26, 2021

Anthos clusters on VMware (GKE on-prem) 1.6.2-gke.0 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.6.2-gke.0 clusters run on Kubernetes 1.18.13-gke.400.

Fixed in 1.6.2-gke.0:

  • Fixed a kubelet restarting issue that was found when running workloads that rely on kubectl exec/port-forward/attach, such as Jenkins.

  • Fixed CVE-2021-3156 in the node operating system image. CVE-2021-3156 is described in Security bulletins.

GKE on-prem 1.4.5-gke.0 is now available. To upgrade, see Upgrading GKE on-prem. GKE on-prem 1.4.5-gke.0 clusters run on Kubernetes 1.16.11-gke.11.

Fixed in 1.4.5-gke.0:

January 27, 2021

Anthos clusters on VMware (GKE on-prem) 1.6.1-gke.1 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.6.1-gke.1 clusters run on Kubernetes 1.18.13-gke.400.

Fixes:

  • Fixed a bug where the user cluster upgrade is blocked if the vcenter resource pool is neither directly nor indirectly specified (that is, if the vcenter resource pool is inherited and is the one used by the admin cluster) in the configs.
  • Fixed CVE-2020-15157 and CVE-2020-15257 in containerd.
  • Fixed an issue where upgrading the admin cluster from 1.5 to 1.6.0 breaks 1.5 user clusters that use any OIDC provider and that have no value for authentication.oidc.capath in the user cluster configuration file.

January 21, 2021

Anthos GKE on-prem 1.5.3-gke.0 is now available. To upgrade, see Upgrading GKE on-prem. GKE on-prem 1.5.3-gke.0 clusters run on Kubernetes 1.17.9-gke.4400.

Fixes:

  • Fixed CVE-2020-15157 and CVE-2020-15257 in containerd.

  • Cloud Run Operator is now able to successfully update custom resource definitions (CRDs).

December 10, 2020

Anthos clusters on VMware 1.6.0-gke.7 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.6.0-gke.7 clusters run on Kubernetes 1.18.6-gke.6600.

Note: The fully supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.6, 1.5, and 1.4.

Users can use a credential configuration file with gkeadm (credential.yaml), which is generated when you run the gkeadm create config command, to improve security by removing credentials from admin-ws-config.yaml.

Node Problem Detector and Node Auto Repair automatically detect and repair additional failures, such as Kubelet-API server connection loss (an OSS issue) and long-lasting DiskPressure conditions.

Preview: Repair administrator master VM failures by using the new command, gkectl repair admin-master.

Preview: Secrets Encryption for user clusters using Thales Luna Network HSM Devices.

Preview: Service Account Key Rotation in gkectl for Usage Metering, Cloud Audit Logs, and Google Cloud's operations suite service accounts.

Anthos Identity Service enables dynamic configuration changes for OpenID Connect (OIDC) configuration without needing to recreate user clusters.

Google Cloud's operations suite support for bundled Seesaw load balancing:

Metrics and logs of bundled Seesaw load balancers are now uploaded to Google Cloud through Google Cloud's operations suite to provide the best observability experience.

Cloud Audit Logs

Offline buffer for Cloud Audit Logs: Audit logs are now buffered on disk if they cannot reach Cloud Audit Logs, and the buffer can withstand at least 4 hours of network outage.

CSI volume snapshots

The CSI snapshot controllers are now automatically deployed in user clusters, enabling the users to create snapshots of persistent volumes and restore the volumes' data by provisioning new volumes from these snapshots.

Functionality changes:

  • gkectl diagnose cluster and snapshot enhancements:

    • Added a --log-since flag to gkectl diagnose snapshot. Users can use it to collect logs of containers and nodes within a relative time duration in the snapshot.

    • Replaced the --seed-config flag with the --config flag in the gkectl diagnose cluster command. Users can use this command with the seed configuration to rule out the VIP issue and provide more debugging information about the cluster.

    • Added more validations in gkectl diagnose cluster.

  • Added iscsid support: Qualified storage drivers that previously required additional steps benefit from the default iscsi service deployment on the worker nodes.

  • On each cluster node, Anthos clusters on VMware now reserves 330 MiB + 5% of the node's memory capacity for operating system components and core Kubernetes components. This is an increase of 50 MiB. For more information see Resources available for your workloads.
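
The memory reservation described above is plain arithmetic, sketched below. The function name is made up for this example, and the calculation covers only the 330 MiB + 5% figure stated in these notes; any other reservations or thresholds that apply to a node are not modeled.

```python
def reserved_memory_mib(node_memory_mib: float) -> float:
    """Memory reserved per node in 1.6.0: 330 MiB plus 5% of capacity."""
    return 330 + node_memory_mib * 5 / 100


def memory_left_for_workloads_mib(node_memory_mib: float) -> float:
    """Capacity minus the reservation above (other overheads not modeled)."""
    return node_memory_mib - reserved_memory_mib(node_memory_mib)


# Hypothetical node with 8000 MiB of memory.
capacity = 8000
print(reserved_memory_mib(capacity))            # 330 + 400
print(memory_left_for_workloads_mib(capacity))  # 8000 - 730
```

Because the percentage term scales with node size, larger nodes give up more memory in absolute terms even though the fixed 330 MiB portion stays the same.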

Breaking changes:

Fixes:

  • Security fix: Resolve credential file references when only a subset of credentials are specified by reference.

  • Fixed vSphere credential update when CSI storage is not enabled.

  • Fixed a bug in Fluent Bit in which the buffer for logs might fill up node disk space.

Known issues:

  • gkectl update reverts your edits on clientconfig CR in 1.6.0. We strongly suggest that customers back up the clientconfig CR after every manual change.

  • kubectl describe CSINode and gkectl diagnose snapshot might sometimes fail due to the OSS Kubernetes issue on dereferencing nil pointer fields.

  • The OIDC provider doesn't use the common CA by default. You must explicitly supply the CA certificate.

  • Upgrading the admin cluster from 1.5 to 1.6.0 breaks 1.5 user clusters that use any OIDC provider and that have no value for authentication.oidc.capath in the user cluster configuration file.

    To work around this issue, run the following script, replacing YOUR_OIDC_PROVIDER_ADDRESS with the address of your OIDC provider:

    USER_CLUSTER_KUBECONFIG=usercluster-kubeconfig

    IDENTITY_PROVIDER=YOUR_OIDC_PROVIDER_ADDRESS

    openssl s_client -showcerts -verify 5 -connect $IDENTITY_PROVIDER:443 < /dev/null | awk '/BEGIN CERTIFICATE/,/END CERTIFICATE/{ if(/BEGIN CERTIFICATE/){i++}; out="tmpcert"i".pem"; print >out}'

    ROOT_CA_ISSUED_CERT=$(ls tmpcert*.pem | tail -1)

    ROOT_CA_CERT="/etc/ssl/certs/$(openssl x509 -in $ROOT_CA_ISSUED_CERT -noout -issuer_hash).0"

    cat tmpcert*.pem $ROOT_CA_CERT > certchain.pem

    CERT=$(echo $(base64 certchain.pem) | sed 's/ //g')

    rm tmpcert1.pem tmpcert2.pem

    kubectl --kubeconfig $USER_CLUSTER_KUBECONFIG patch clientconfig default -n kube-public --type json -p "[{ \"op\": \"replace\", \"path\": \"/spec/authentication/0/oidc/certificateAuthorityData\", \"value\":\"${CERT}\"}]"

November 16, 2020

GKE on-prem 1.5.2-gke.3 is now available. To upgrade, see Upgrading GKE on-prem. GKE on-prem 1.5.2-gke.3 clusters run on Kubernetes 1.17.9-gke.4400.

GKE Data Plane V2 Preview is now available.

  • GKE Data Plane V2 is a new programmable data path that enables Google to offer new network security features like Network Policy Logging and Node Network Policy.

For information about enabling Dataplane V2, see User cluster configuration file. For information about Network Policy Logging, see Logging network policy events.

Binary Authorization for GKE on-prem 0.2.1 is now available.

  • Binary Authorization for GKE on-prem 0.2.1 adds a proxy side cache that caches AdmissionReview responses. This can improve the reliability of the webhook.

Fixes:

  • Fixed false warning in gkectl check-config for admin cluster for manual load balancing category.
  • Updated Istio Ingress (Kubernetes) Custom Resource Definitions (CRDs) to use v1beta1.
  • Fixed an issue where a GKE on-prem upgrade got stuck because Cloud Run for Anthos on-prem pods were crash looping. Cloud Run for Anthos on-prem causes an operational outage of GKE on-prem when it is enabled during an upgrade of GKE on-prem from 1.4 to 1.5. The webhook is fixed; the custom resource definition (CRD) issue is not fixed.

Known issues:

Cloud Run Operator is unable to update custom resource definitions (CRDs). Applying the CRDs manually either before or during the upgrade lets the operator continue the upgrade.

Workaround:

gsutil cat gs://gke-on-prem-release/hotfixes/1.5/cloudrun/crds.yaml | kubectl apply -f -

November 02, 2020

GKE on-prem 1.4.4-gke.1 is now available. To upgrade, see Upgrading GKE on-prem. GKE on-prem 1.4.4-gke.1 clusters run on Kubernetes 1.16.11-gke.11.

Fixes:

  • Updated Istio Ingress (Kubernetes) Custom Resource Definitions (CRDs) to use v1beta1.

GKE on-prem 1.3.5-gke.2 is now available. To upgrade, see Upgrading GKE on-prem. GKE on-prem 1.3.5-gke.2 clusters run on Kubernetes 1.15.12-gke.6400.

Fixes:

October 23, 2020

GKE on-prem 1.5.1-gke.8 is now available. To upgrade, see Upgrading GKE on-prem. GKE on-prem 1.5.1-gke.8 clusters run on Kubernetes 1.17.9-gke.4400.

Binary Authorization for GKE on-prem Preview is now available:

This release enables customers to generate credential configuration templates by using the gkectl create-config credential command.

Published the best practices for how to set up GKE on-prem components for high availability and how to recover from disasters.

Published the best practices for creating, configuring, and operating GKE on-prem clusters at large scale.

Known issues:

The version of Anthos Configuration Management included in the GKE on-prem release 1.5.1-gke.8 had initially referenced a version of the nomos image that had not been moved into the gcr.io/gke-on-prem-release repository, thus preventing a successful installation or upgrade of Anthos Configuration Management. This image has since been pushed to the repository to correct the issue for customers not using private registries. Customers using private registries will need to upgrade to 1.5.2 when it is available (scheduled for November 16, 2020) or manually copy the nomos:v1.5.1-rc.7 image into their private repository.

Fixes:

  • Fixed cluster creation issue when Cloud Run is enabled.
  • Fixed the false positive error in docker registry preflight check where REGISTRY_ADDRESS/NAMESPACE might be mistakenly used as the registry address to store the certs on a test VM, causing authentication errors.

September 24, 2020

GKE on-prem 1.5.0-gke.27 is now available. To upgrade, see Upgrading GKE on-prem. GKE on-prem 1.5.0-gke.27 clusters run on Kubernetes 1.17.9-gke.4400.

Improved upgrade and installation:

  • Preflight checks are now blocking with v1 configs for installation and upgrades. Users can use --skip-preflight-check-blocking to unblock the operation.
  • Added support for running gkeadm on macOS Catalina, v10.15.
  • Enabled installation and upgrade by using any Google Cloud–authenticated service account. This removes the need for allowlisting.
  • Improved security by adding support for using an external credential file in admin or user configuration. This enables customers to check in their cluster configuration files in source code repositories without exposing confidential credential information.

Improved HA and failure recovery:

Improved support for Day-2 operations:

  • The gkectl update cluster command is now generally available. Users can use it to change supported features in the user cluster configurations after cluster creation.
  • The gkectl update credentials command for vSphere and F5 credentials is now generally available.
  • Improves scalability with 20 user clusters per admin cluster, and 250 nodes, 7500 pods, 500 load balancing services (using Seesaw), and 250 load balancing services (using F5) per user cluster.
  • Introduces vSphere CSI driver in preview.

Enhanced monitoring with Cloud Monitoring:

  • Introduces out-of-the-box alerts for critical cluster metrics and events in preview.
  • Out-of-the-box monitoring dashboards are automatically created during installation when Cloud Monitoring is enabled.
  • Lets users modify CPU or memory resource settings for Cloud Monitoring components.

Functionality changes:

  • Preflight check failures now block gkectl create loadbalancer for the bundled load balancer with Seesaw.
  • Adds a blocking preflight check for the anthos.googleapis.com API of a configured gkeConnect project.
  • Adds a blocking preflight check on proxy IP and service/pod CIDR overlapping.
  • Adds a non-blocking preflight check on cluster health before an admin or user cluster upgrade.
  • Updates the gkectl diagnose snapshot:
    • Fixes the all scenario to collect all supported Kubernetes resources for the target cluster.
    • Collects F5 load balancer information, including Virtual Server, Virtual Address, Pool, Node, and Monitor.
    • Collects vSphere information, including VM objects and their events based on the resource pool and the Datacenter, Cluster, Network, and Datastore objects that are associated with VMs.
  • Fixes the OIDC proxy configuration issue. Users no longer need to edit NO_PROXY env settings in the cluster configuration to include new node IPs.
  • Adds monitoring.dashboardEditor to the roles granted to the logging-monitoring service account during admin workstation creation with --auto-create-service-accounts.
  • Bundled load balancing with Seesaw switches to the IPVS maglev hashing algorithm, achieving stateless, seamless failover. There is no connection sync daemon anymore.
  • The hostconfig section of the ipBlock file can be specified directly in the cluster yaml file network section and has a streamlined format.

Breaking changes:

  • Starting with version 1.5, instead of using kubectl patch machinedeployment to resize the user cluster and kubectl edit cluster to add static IPs to user clusters, use gkectl update cluster to resize the worker node in user clusters and to add static IPs to user clusters.
  • Starting with version 1.5, the gkectl log is saved in a single file instead of multiple files by log verbosity levels. By default, the gkectl log is saved in the /home/ubuntu/.config/gke-on-prem/logs directory with a symlink created under the ./logs directory for easy access. Users can use --log_dir or --log_file to change this default setting.
  • Starting with version 1.5, the gkeadm log is saved in a single file instead of multiple files by log verbosity levels. By default, the gkeadm log is saved under ./logs. Users can use --log_dir or --log_file to change this default setting.
  • In version 1.5 only, the etcd version is updated from 3.3 to 3.4, which means the etcd image becomes smaller for improved performance and security (distroless), and the admin and user cluster etcd restore process is changed.
  • In 1.5 and later releases, a new firewall rule needs to be enabled from admin cluster add-on nodes to vCenter server API port 443.

Fixes:

  • Fixed an issue that caused approximately 50 seconds of downtime for the user cluster API service during cluster upgrade or update.
  • Corrected the default log verbosity setting in gkectl and gkeadm help messages.

Known issues:

  • Due to a 1.17 Kubernetes issue, kube-apiserver and kube-scheduler don't expose kubernetes_build_info on the /metrics endpoint in the 1.5 release. Customers can use kubernetes_build_info from kube-controller-manager to get similar information, such as the Kubernetes major version, minor version, and build date.
  • Cloud Run for Anthos on-prem causes an operational outage of GKE on-prem when Cloud Run for Anthos on-prem is enabled in both installation and upgrade of GKE on-prem 1.5.0.

September 17, 2020

GKE on-prem 1.4.3-gke.3 is now available. To upgrade, see Upgrading GKE on-prem. GKE on-prem 1.4.3-gke.3 clusters run on Kubernetes 1.16.11-gke.11.

Fixes:

  • Fixed CVE-2020-14386 described in Security Bulletin.

  • Preflight check for hostname validation was too strict. We updated the hostname validation following the RFC 1123 DNS subdomain definition.

  • There was an issue in the 1.4.0 and 1.4.2 releases where the node problem detector didn't start when the node restarted. This is fixed in this version.
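
These notes do not spell out the exact rules that the preflight check applies. As a rough illustration of the RFC 1123 DNS subdomain definition mentioned in the hostname fix above, the following shell sketch validates a name under commonly used rules. The rules are assumptions: at most 253 characters in total, dot-separated labels made of lowercase letters, digits, and hyphens, each label at most 63 characters and beginning and ending with a letter or digit. The function name is_rfc1123_subdomain is hypothetical and is not part of gkectl.

```shell
# Hypothetical helper (not part of gkectl): succeeds when "$1" looks like an
# RFC 1123 DNS subdomain under the assumptions stated above.
is_rfc1123_subdomain() {
  name="$1"
  # Reject empty names and names longer than 253 characters.
  [ -n "$name" ] || return 1
  [ "${#name}" -le 253 ] || return 1
  # One label: 1 to 63 characters, alphanumeric at both ends.
  label='[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?'
  # LC_ALL=C keeps [a-z] from matching uppercase letters in some locales.
  printf '%s\n' "$name" | LC_ALL=C grep -Eq "^${label}(\.${label})*\$"
}
```

For example, is_rfc1123_subdomain "node-1.example.com" succeeds, while is_rfc1123_subdomain "Node_1" fails because of the uppercase letter and the underscore.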

GKE on-prem 1.3.4-gke.3 is now available. To upgrade, see Upgrading GKE on-prem. GKE on-prem 1.3.4-gke.3 clusters run on Kubernetes 1.15.12-gke.15.

Fixes:

August 20, 2020

GKE on-prem 1.4.2-gke.3 is now available. To upgrade, see Upgrading GKE on-prem. GKE on-prem 1.4.2-gke.3 clusters run on Kubernetes 1.16.11-gke.11.

GPU support (beta solution in collaboration with Nvidia)

In partnership with Nvidia, users can now manually attach a GPU to a worker node VM to run GPU workloads. This requires using the open source Nvidia GPU operator.

Note: Manually attached GPUs do not persist through node lifecycle events. You must manually re-attach them. This is a beta solution and can be used for evaluation and proof of concept.

The Ubuntu image is upgraded to include the newest packages.

gkectl delete loadbalancer is updated to support the new version of configuration files for admin and user clusters.

Fixes:

  • Corrected the names of a few kubelet metrics collected by Prometheus.
  • Updated the machine restart process during admin cluster upgrade to make the upgrade more resilient to transient connection issues.
  • Resolved a preflight check OS image validation error when using a non-default vSphere folder for cluster creation; the OS image template is expected to be in that folder.
  • Resolved a gkectl upgrade loadbalancer issue to avoid validating the upgraded SeesawGroup. This fix lets the existing SeesawGroup config be updated without negatively affecting the upgrade process.
  • Resolved an issue where ClientConfig CRD is deleted when the upgrade to the latest version is run multiple times.
  • Resolved a gkectl update credentials vsphere issue where the vsphere-metrics-exporter was using the old credentials even after updating the credentials.
  • Resolved an issue where the VIP preflight check reported a user cluster add-on load balancer IP false positive.
  • Fixed gkeadm updating config after upgrading on Windows, specifically for the gkeOnPremVersion and bundlePath fields.
  • Automatically mount the data disk after rebooting on admin workstations created using gkeadm 1.4.0 and later.
  • Reverted thin disk provisioning change for boot disks in 1.4.0 and 1.4.1 on all normal (excludes test VMs) cluster nodes.
  • Removed vCenter Server access check from user cluster nodes.

July 30, 2020

Anthos clusters on VMware 1.3.3-gke.0 is now available. To upgrade, see Upgrading GKE on-prem. GKE on-prem 1.3.3-gke.0 clusters run on Kubernetes 1.15.12-gke.9.

Fixes:

June 25, 2020

Anthos clusters on VMware 1.4.0-gke.13 is now available. To upgrade, see Upgrading GKE on-prem. GKE on-prem 1.4.0-gke.13 clusters run on Kubernetes 1.16.8-gke.6.

Updated to Kubernetes 1.16:

Simplified upgrade:

  • This release provides a simplified upgrade experience via the following changes:

    • Automatically migrate information from the previous version of admin workstation using gkeadm.
    • Extend preflight checks to better prepare for upgrades.
    • Support skip version upgrade to enable users to upgrade the cluster from any patch release of a minor release to any patch release of the next minor release. For more information about the detailed upgrade procedure and limitations, see upgrading GKE on-prem.
    • The alternate upgrade scenario for Common Vulnerabilities and Exposures has been deprecated. All upgrades starting with version 1.3.2 need to upgrade the entire admin workstation.
    • The bundled load balancer is now automatically upgraded during cluster upgrade.

Improved installation and cluster configuration:

  • The user cluster node pools feature is now generally available.
  • This release improves the installation experience via the following changes:

    • Supports gkeadm for Windows OS.
    • Introduces a standalone command for creating admin clusters.
  • Introduce a new version of configuration files to separate admin and user cluster configurations and commands. This is designed to provide a consistent user experience and better configuration management.

Improved disaster recovery capabilities:

  • This release provides enhanced disaster recovery functionality to support backup and restore HA user cluster with etcd.
  • This release also provides a manual process to recover a single etcd replica failure in a HA cluster without any data loss.

Enhanced monitoring with Cloud Monitoring (formerly Stackdriver):

  • This release provides better product monitoring and resource usage management via the following changes:

  • Ubuntu Image now conforms with PCI DSS, NIST Baseline High, and DoD SRG IL2 compliance configurations.

Functionality changes:

  • Enabled Horizontal Pod Autoscaler (HPA) for the Istio ingress gateway.
  • Removed ingress controller from admin cluster.
  • Consolidated sysctl configs with Google Kubernetes Engine.
  • Added etcd defrag pod in admin cluster and user cluster, which will be responsible for monitoring etcd's database size and defragmenting it as needed. This helps reclaim etcd database size and recover etcd when its disk space is exceeded.

Support for a vSphere folder (Preview):

  • This release allows customers to install GKE on-prem in a vSphere folder, reducing the scope of the permission required for the vSphere user.

Improved scale:

Fixes:

  • Fixed the issue of the user cluster's Kubernetes API server not being able to connect to kube-etcd after admin nodes and user cluster master reboot. In previous versions, kube-dns in admin clusters was configured through kubeadm. In 1.4, this configuration is moved from kubeadm to bundle, which enables deploying two kube-dns replicas on two admin nodes. As a result, a single admin node reboot/failure won't disrupt user cluster API access.
  • Fixed the issue that controllers such as calico-typha can't be scheduled on an admin cluster master node, when the admin cluster master node is under disk pressure.
  • Resolved pods failure with MatchNodeSelector on admin cluster master after node reboot or kubelet restart.
  • Tuned etcd quota limit settings based on the etcd data disk size and the settings in GKE Classic.

Known issues:

  • If a user cluster is created without any node pool named the same as the cluster, managing the node pools using gkectl update cluster would fail. To avoid this issue, when creating a user cluster, you need to name one node pool the same as the cluster.
  • The gkectl command might exit with panic when converting config from "/path/to/config.yaml" to v1 config files. When that occurs, you can resolve the issue by removing the unused bundled load balancer section ("loadbalancerconfig") in the config file.
  • When using gkeadm to upgrade an admin workstation on Windows, the info file filled out from this template needs to have the line endings converted to use Unix line endings (LF) instead of Windows line endings (CRLF). You can use Notepad++ to convert the line endings.
  • After upgrading an admin workstation with a static IP using gkeadm, you need to run ssh-keygen -R <admin-workstation-ip> to remove the IP from the known hosts, because the host identification changed after VM re-creation.
  • We have added Horizontal Pod Autoscaler for istio-ingress and istio-pilot deployments. HPA can scale up unnecessarily for istio-ingress and istio-pilot deployments during cluster upgrades. This happens because the metrics server is not able to report usage of some pods (newly created and terminating; for more information, see this Kubernetes issue). No actions are needed; scale down will happen five minutes after the upgrade finishes.
  • When running a preflight check for config.yaml that contains both admincluster and usercluster sections, the "data disk" check in the "user cluster vCenter" category might fail with the message: [FAILURE] Data Disk: Data disk is not in a folder. Use a data disk in a folder when using vSAN datastore. User clusters don't use data disks, and it's safe to ignore the failure.
  • When upgrading the admin cluster, the preflight check for the user cluster OS image validation will fail. The user cluster OS image is not used in this case, and it's safe to ignore the "User Cluster OS Image Exists" failure in this case.
  • A Calico-node pod might be stuck in an unready state after node IP changes. To resolve this issue, you need to delete any unready Calico-node pods.
  • The BIG-IP controller might fail to update F5 VIP after any admin cluster master IP changes. To resolve this, you need to use the admin cluster master node IP in kubeconfig and delete the bigip-controller pod from the admin master.
  • The stackdriver-prometheus-k8s pod could enter a crashloop after host failure. To resolve this, you need to remove any corrupted PersistentVolumes that the stackdriver-prometheus-k8s pod uses.
  • After node IP change, pods running with hostNetwork don't get podIP corrected until Kubelet restarts. To resolve this, you need to restart Kubelet or delete those pods using previous IPs.
  • An admin cluster fails after any admin cluster master node IP address changes. To avoid this, you should avoid changing the admin master IP address if possible by using a static IP or a non-expired DHCP lease instead. If you encounter this issue and need further assistance, please contact Google Support.
  • User cluster upgrade might be stuck with the error: Failed to update machine status: no matches for kind "Machine" in version "cluster.k8s.io/v1alpha1". To resolve this, you need to delete the clusterapi pod in the user cluster namespace in the admin cluster.
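
For the Windows line-ending issue in the list above, Notepad++ is the suggested tool. Where a POSIX shell is available on the workstation (an assumption), the same CRLF-to-LF conversion can be sketched with tr. The function names has_crlf and crlf_to_lf, and the file names in the usage comment, are hypothetical.

```shell
# Hypothetical helpers for checking and converting line endings.

# Succeeds when the file "$1" contains at least one carriage return (CR).
has_crlf() {
  grep -q "$(printf '\r')" "$1"
}

# Writes a copy of "$1" to "$2" with every carriage return removed,
# which turns CRLF line endings into LF line endings.
crlf_to_lf() {
  tr -d '\r' < "$1" > "$2"
}

# Usage (file names are placeholders):
#   has_crlf info-file && crlf_to_lf info-file info-file.lf
```

Writing to a separate output file, rather than converting in place, leaves the original available for comparison if the upgrade still fails.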

If your vSphere environment has fewer than three hosts, user cluster upgrade might fail. To resolve this, you need to disable antiAffinityGroups in the cluster config before upgrading the user cluster. For v1 config, please set antiAffinityGroups.enabled = false; for v0 config, please set usercluster.antiaffinitygroups.enabled = false.

Note: Disabling antiAffinityGroups in the cluster config during upgrade is only allowed for the 1.3.2 to 1.4.x upgrade to resolve the upgrade issue; the support might be removed in the future.

May 21, 2020

Workload Identity is now available in Alpha for GKE on-prem. Please contact support if you are interested in a trial of Workload Identity in GKE on-prem.

Preflight check for VM internet and Docker Registry access validation is updated.

Preflight check for internet validation is updated to not follow redirect. If your organization requires outbound traffic to pass through a proxy server, you no longer need to allowlist the following addresses in your proxy server:

  • console.cloud.google.com
  • cloud.google.com

The Ubuntu image is upgraded to include the newest packages.

Upgraded the Istio image to version 1.4.7 to fix a security vulnerability.

Some ConfigMaps in the admin cluster were refactored to Secrets to allow for more granular access control of sensitive configuration data.

April 23, 2020

Preflight check in gkeadm for access to the Cloud Storage bucket that holds the admin workstation OVA.

Preflight check for internet access includes additional URL www.googleapis.com.

Preflight check for test VM DNS availability.

Preflight check for test VM NTP availability.

Preflight check for test VM F5 access.

Before downloading and creating VM templates from OVAs, GKE on-prem checks if the VM template already exists in vCenter.

Rename gkeadm automatically created service accounts.

OVA download displays download progress.

gkeadm prepopulates bundlepath in the seed config on the admin workstation.

Fix for Docker failed DNS resolution on admin workstation at startup.

Admin workstation provisioned by gkeadm uses thin disk provisioning.

Improved user cluster Istio ingress gateway reliability.

Ubuntu image is upgraded to include newest packages.

Update the vCenter credentials for your clusters using the preview command gkectl update credentials vsphere.

The gkeadm configuration file, admin-ws-config.yaml, accepts paths that are prefixed with ~/ for the Certificate Authority (CA) certificate.

Test VMs wait until the network is ready before starting preflight checks.

Improve the error message in preflight check failure for F5 BIG-IP.

Skip VIP check in preflight check in manual load balancing mode.

Upgraded Calico to version 3.8.8 to fix several security vulnerabilities.

Upgraded F5 BIG-IP Controller Docker image to version 1.14.0 to fix a security vulnerability.

Fixed gkeadm admin workstation gcloud proxy username and password configuration.

Fixed the bug that was preventing gkectl check-config from automatically using the proxy that you set in your configuration file when running the full set of preflight validation checks with any GKE on-prem download image.

Fixed an admin workstation upgrade failure when the upgrade process was unable to retrieve SSH keys, which would cause a Golang segmentation fault.

April 01, 2020

When upgrading from version 1.2.2 to 1.3.0 by using the Bundle download in the alternate upgrade method, a timeout might occur that will cause your user cluster upgrade to fail. To avoid this issue, you must perform the full upgrade process that includes upgrading your admin workstation with the OVA file.

March 23, 2020

Anthos clusters on VMware 1.3.0-gke.16 is now available. To upgrade, see Upgrading GKE on-prem. GKE on-prem 1.3.0-gke.16 clusters run on Kubernetes 1.15.7-gke.32.

A new installer helps you create and prepare the admin workstation.

Support for vSAN datastore on your admin and user clusters.

In bundled load balancing mode, GKE on-prem provides and manages the Seesaw load balancer.

The Authentication Plugin for Anthos has been integrated into and replaced with the Google Cloud command-line interface, which improves the authentication process and provides the user consent flow through gcloud commands.

Added support for up to 100 nodes per user cluster.

The Cluster CA now signs the TLS certificates that the Kubelet API serves, and the TLS certificates are auto-rotated.

vSphere credential rotation is enabled. Users can now use Solution User Certificates to authenticate to GKE deployed on-prem.

gkectl automatically uses the proxy URL from config.yaml to configure the proxy on the admin workstation.

Preview Feature: Introducing User cluster Nodepools. A node pool is a group of nodes within a cluster that all have the same configuration. In GKE on-prem 1.3.0, node pools are a preview feature in the user clusters. This feature lets users create multiple node pools in a cluster, and update them as needed.

The metric kubelet_containers_per_pod_count is changed to a histogram metric.

Fixed an issue in the vSphere storage plugin that prevented vSphere storage policies from working. This is an example of how you might use this feature.

Prometheus + Grafana: two graphs on the Machine dashboard don't work because of missing metrics: Disk Usage and Disk Available.

All OOM events for containers trigger a SystemOOM event, even if they are container/pod OOM events. To check whether an OOM is actually a SystemOOM, check the kernel log for a message oom-kill:…. If oom_memcg=/ (instead of oom_memcg=/kubepods/…), then it's a SystemOOM. If it's not a SystemOOM, it's safe to ignore.
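
The check described above can be sketched as a small shell function that reads the oom_memcg value from a single kernel log line. This is a minimal sketch, assuming the oom-kill record is a comma-separated list of key=value fields on one line. The function name classify_oom_line and its output strings are hypothetical.

```shell
# Hypothetical helper: classifies one kernel log line based on oom_memcg.
# Prints "SystemOOM" when oom_memcg is exactly "/", "not SystemOOM" for any
# other cgroup (for example /kubepods/...), and "no oom-kill record" when
# the line does not contain an oom-kill record with an oom_memcg field.
classify_oom_line() {
  line="$1"
  case "$line" in
    *oom-kill:*oom_memcg=*) ;;
    *) echo "no oom-kill record"; return 0 ;;
  esac
  memcg=${line#*oom_memcg=}   # drop everything up to and including "oom_memcg="
  memcg=${memcg%%,*}          # keep only the value before the next comma
  if [ "$memcg" = "/" ]; then
    echo "SystemOOM"
  else
    echo "not SystemOOM"
  fi
}

# Usage against the kernel log (requires permission to read it):
#   dmesg | grep 'oom-kill:' | while IFS= read -r l; do classify_oom_line "$l"; done
```

Lines reported as "not SystemOOM" correspond to the container or pod OOM events that are safe to ignore for this known issue.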

Affected versions: 1.3.0-gke.16

If you configured a proxy in the config.yaml and also used a bundle other than the full bundle (static IP | DHCP), you must append the --fast flag to run gkectl check-config. For example: gkectl check-config --config config.yaml --fast.

Running the 1.3 version of the gkectl diagnose command might fail if your clusters:

  • Are older than Anthos clusters on VMware version 1.3.
  • Include manually installed add-ons in the kube-system namespace.

February 21, 2020

GKE on-prem version 1.2.2-gke.2 is now available. To upgrade, see Upgrading GKE on-prem.

Improved gkectl check-config to validate any valid Google Cloud service accounts regardless of whether an IAM role is set.

You need to use vSphere provider version 1.15 when using Terraform to create the admin workstation. vSphere provider version 1.16 introduces breaking changes that would affect all Anthos versions.

Skip the preflight check when resuming cluster creation/upgrade.

Resolved a known issue with cluster upgrade when using a vSAN datastore associated with a GKE on-prem version before 1.2.

Resolved the following warning when uploading an OS image with the enableMPTSupport configuration flag set. This flag is used to indicate whether the virtual video card supports mediated passthrough.

Warning: Line 102: Unable to parse 'enableMPTSupport' for attribute 'key' on element 'Config'.

Fixed the BigQuery API service name for the preflight check service requirements validation.

Fixed the preflight check to correctly validate the default resource pool in the case where the resourcepool field in the GKE on-prem configuration file is empty.

Fixed a comment about the workernode.replicas field in the GKE on-prem configuration file to say that the minimum number of worker nodes is three.

Fixed gkectl prepare to skip checking the data disk.

Fixed gkectl check-config so that it cleans up F5 BIG-IP resources on exit.

January 31, 2020

GKE on-prem version 1.2.1-gke.4 is now available. To upgrade, see Upgrading GKE on-prem.

This patch version includes the following changes:

Adds searchdomainsfordns field to static IPs host configuration file. searchdomainsfordns is an array of DNS search domains to use in the cluster. These domains are used as part of a domain search list.

Adds a preflight check that validates an NTP server is available.

gkectl check-config now automatically uploads GKE on-prem's node OS image to vSphere. You no longer need to run gkectl prepare before gkectl check-config.

Adds a --cleanup flag for gkectl check-config. The flag's default value is true.

Passing in --cleanup=false preserves the test VM and associated SSH keys that gkectl check-config creates for its preflight checks. Preserving the VM can be helpful for debugging.

Fixes a known issue from 1.2.0-gke.6 that prevented gkectl check-config from performing all of its validations against clusters in nested resource pools or the default resource pool.

Fixes an issue that caused F5 BIG-IP VIP validation to fail due to timing out. The timeout window for F5 BIG-IP VIP validation is now longer.

Fixes an issue that caused cluster upgrades to overwrite changes to add-on configurations.

Fixes the known issue from 1.2.0-gke.6 that affects routing updates due to the route reflector configuration.

January 28, 2020

Affected versions: 1.2.0-gke.6

In some cases, certain nodes in a user cluster fail to get routing updates from the route reflector. Consequently Pods on a node may not be able to communicate with Pods on other nodes. One possible symptom is a kube-dns resolution error.

To work around this issue, follow these steps to create a BGPPeer object in your user cluster.

Save the following BGPPeer manifest as full-mesh.yaml:

apiVersion: crd.projectcalico.org/v1
kind: BGPPeer
metadata:
  name: full-mesh
spec:
  nodeSelector: "!has(route-reflector)"
  peerSelector: "!has(route-reflector)" 

Create the BGPPeer in your user cluster:

kubectl --kubeconfig [USER_CLUSTER_KUBECONFIG] apply -f full-mesh.yaml

Verify that the full-mesh BGPPeer was created:

kubectl --kubeconfig [USER_CLUSTER_KUBECONFIG] get bgppeer

The output shows full-mesh in the list of BGPPeers:

NAME            AGE
full-mesh       61s
gke-group-1     3d21h
...

This issue will be fixed in version 1.2.1.

January 03, 2020

Affected versions: 1.1.0-gke.6 and later

Starting with version 1.1.0-gke.6, the gkeconnect.proxy field is no longer in the GKE on-prem configuration file.

If you include gkeconnect.proxy in the configuration file, the gkectl check-config command can fail with this error:

[FAILURE] Config: Could not parse config file: error unmarshaling JSON: 
while decoding JSON: json: unknown field "proxy"

To correct this issue, remove gkeconnect.proxy from the configuration file.

In versions prior to 1.1.0-gke.6, the Connect Agent used the proxy server specified in gkeconnect.proxy. Starting with version 1.1.0-gke.6, the Connect Agent uses the proxy server specified in the global proxy field.
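
Before running gkectl check-config, you can look for a leftover gkeconnect.proxy entry yourself. The following is a minimal sketch, assuming the configuration file is YAML with a top-level gkeconnect: section whose fields are indented beneath it. The function name has_gkeconnect_proxy is hypothetical, and the check is a text heuristic rather than a full YAML parser.

```shell
# Hypothetical helper: succeeds when the file "$1" has a "proxy:" key
# indented under a top-level "gkeconnect:" section.
has_gkeconnect_proxy() {
  awk '
    # Any non-indented, non-comment line starts a new top-level section.
    /^[^ \t#]/ { in_section = ($0 ~ /^gkeconnect:/) }
    # An indented "proxy:" key inside the gkeconnect section is the stale field.
    in_section && /^[ \t]+proxy:/ { found = 1 }
    END { exit(found ? 0 : 1) }
  ' "$1"
}

# Usage (config.yaml is a placeholder path):
#   has_gkeconnect_proxy config.yaml && echo "remove gkeconnect.proxy"
```

A top-level proxy field is not flagged, because that is the global proxy setting that the Connect Agent uses starting with version 1.1.0-gke.6.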

December 20, 2019

Warning: If you installed GKE on-prem versions before 1.2, and you use a vSAN datastore, you should contact Google Support before attempting an upgrade to 1.2.0-gke.6.

GKE on-prem version 1.2.0-gke.6 is now available. To upgrade, see Upgrading GKE on-prem.

This minor version includes the following changes:

The default Kubernetes version for cluster nodes is now version 1.14.7-gke.24 (previously 1.13.7-gke.20).

GKE on-prem now supports vSphere 6.7 Update 3. Read its release notes.

GKE on-prem now supports VMware NSX-T version 2.4.2.

Any user cluster, even your first user cluster, can now use a datastore that is separate from the admin cluster's datastore. If you specify a separate datastore for a user cluster, the user cluster nodes, PersistentVolumes (PVs) for the user cluster nodes, user control plane VMs, and PVs for the user control plane VMs all use the separate datastore.

Expanded preflight checks for validating your GKE on-prem configuration file before you create your clusters. These new checks can validate that your Google Cloud project, vSphere network, and other elements of your environment are correctly configured.

Published basic installation workflow. This workflow offers a simplified workflow for quickly installing GKE on-prem using static IPs.

Published guidelines for installing Container Storage Interface (CSI) drivers. CSI enables using storage devices not natively supported by Kubernetes.

Updated documentation for authenticating using OpenID Connect (OIDC) with the Anthos Plugin for Kubectl. GKE on-prem's OIDC integration is now generally available.

From the admin workstation, gcloud now requires that you log in to gcloud with a Google Cloud user account. The user account should have at least the Viewer IAM role in all Google Cloud projects associated with your clusters.

You can now create admin and user clusters separately from one another.

Fixes an issue that prevented resuming cluster creation for HA user clusters.

Affected versions: 1.1.0-gke.6, 1.2.0-gke.6

The stackdriver.proxyconfigsecretname field was removed in version 1.1.0-gke.6. GKE on-prem's preflight checks will return an error if the field is present in your configuration file.

To work around this, before you install or upgrade to 1.2.0-gke.6, delete the proxyconfigsecretname field from your configuration file.

Affected versions: 1.2.0-gke.6

In user clusters, Prometheus and Grafana get automatically disabled during upgrade. However, the configuration and metrics data are not lost. In admin clusters, Prometheus and Grafana stay enabled.

To work around this issue, after the upgrade, open monitoring-sample for editing and set enablePrometheus to true:

1. kubectl edit monitoring --kubeconfig [USER_CLUSTER_KUBECONFIG] -n kube-system monitoring-sample

2. Set the field enablePrometheus to true.

Affected versions: All versions

Before version 1.2.0-gke.6, a known issue prevents Stackdriver from updating its configuration after cluster upgrades. Stackdriver still references an old version, which prevents Stackdriver from receiving the latest features of its telemetry pipeline. This issue can make it difficult for Google Support to troubleshoot clusters.

After you upgrade clusters to 1.2.0-gke.6, run the following command against admin and user clusters:

kubectl --kubeconfig=[KUBECONFIG] \
-n kube-system --type=json patch stackdrivers stackdriver \
-p '[{"op":"remove","path":"/spec/version"}]'

where [KUBECONFIG] is the path to the cluster's kubeconfig file.
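Because the command must be run once per cluster, it can help to list every invocation before running any of them. The following is a minimal sketch, not part of the product: the function name `print_stackdriver_patches` is hypothetical, and it only prints the commands for the kubeconfig paths you pass in (paths containing spaces would need extra quoting).

```shell
# Hypothetical helper: prints the Stackdriver patch command for each
# kubeconfig path given as an argument. Nothing is executed.
print_stackdriver_patches() {
  for kubeconfig in "$@"; do
    printf '%s\n' "kubectl --kubeconfig=$kubeconfig -n kube-system --type=json patch stackdrivers stackdriver -p '[{\"op\":\"remove\",\"path\":\"/spec/version\"}]'"
  done
}
```

For example, `print_stackdriver_patches admin.kubeconfig user.kubeconfig` prints two commands, one per cluster, which you can review and then run by hand.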

November 19, 2019

GKE On-Prem version 1.1.2-gke.0 is now available. To download version 1.1.2-gke.0's OVA, gkectl, and upgrade bundle, see Downloads. Then, see Upgrading admin workstation and Upgrading clusters.

This patch version includes the following changes:

New Features

Published Managing clusters.

Fixes

Fixed the known issue from November 5.

Fixed the known issue from November 8.

Known Issues

If you are running multiple data centers in vSphere, running gkectl diagnose cluster might return the following error, which you can safely ignore:

Checking storage...FAIL path '*' resolves to multiple datacenters

If you are running a vSAN datastore, running gkectl diagnose cluster might return the following error, which you can safely ignore:

PersistentVolume [NAME]: virtual disk "[[DATASTORE_NAME]] [PVC]" IS NOT attached to machine "[MACHINE_NAME]" but IS listed in the Node.Status

November 08, 2019

In GKE On-Prem version 1.1.1-gke.2, a known issue prevents creation of clusters configured to use a Docker registry. You configure a Docker registry by populating the GKE On-Prem configuration file's privateregistryconfig field. Cluster creation fails with an error such as Failed to create root cluster: could not create external client: could not create external control plane: docker run error: exit status 125

A fix is targeted for version 1.1.2. In the meantime, if you want to create a cluster configured to use a Docker registry, pass in the --skip-validation-docker flag to gkectl create cluster.

November 05, 2019

GKE On-Prem's configuration file has a field, vcenter.datadisk, which looks for a path to a virtual machine disk (VMDK) file. During installation, you choose a name for the VMDK. By default, GKE On-Prem creates a VMDK and saves it to the root of your vSphere datastore.

If you are using a vSAN datastore, you need to create a folder in the datastore in which to save the VMDK. You provide the full path to the field—for example, datadisk: gke-on-prem/datadisk.vmdk—and GKE On-Prem saves the VMDK in that folder.

When you create the folder, vSphere assigns the folder a universally unique identifier (UUID). Although you provide the folder path to the GKE On-Prem config, the vSphere API looks for the folder's UUID. Currently, this mismatch can cause cluster creation and upgrades to fail.

A fix is targeted for version 1.1.2. In the meantime, you need to provide the folder's UUID instead of the folder's path. Follow the workaround instructions currently available in the upgrading clusters and installation topics.

October 25, 2019

GKE On-Prem version 1.1.1-gke.2 is now available. To download version 1.1.1-gke.2's OVA, gkectl, and upgrade bundle, see Downloads. Then, see Upgrading admin workstation and Upgrading clusters.

This patch version includes the following changes:

New Features

Action required: This version upgrades the minimum gcloud version on the admin workstation to 256.0.0. You should upgrade your admin workstation. Then, you should upgrade your clusters.
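To check whether a workstation already meets the 256.0.0 minimum, you can compare version strings. This is an illustrative sketch only: the function name `version_at_least` is hypothetical, it relies on GNU `sort -V`, and obtaining the installed version string (for example, from the output of `gcloud version`) is left to you.

```shell
# Hypothetical helper: succeeds if version $1 is at least version $2.
# Relies on the -V (version sort) option of GNU sort.
version_at_least() {
  # The lower of the two versions sorts first. If that is the required
  # version, the installed version is equal to it or newer.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

For example, `version_at_least 255.0.0 256.0.0` fails, while `version_at_least 256.0.0 256.0.0` succeeds.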

The open source CoreOS toolbox is now included in all GKE On-Prem cluster nodes. This suite of tools is useful for troubleshooting node issues. See Debugging node issues using toolbox.

Fixes

Fixed an issue that prevented clusters configured with OIDC from being upgraded.

Fixed CVE-2019-11253 described in Security bulletins.

Fixed an issue that caused cluster metrics to be lost due to a lost connection to Google Cloud. When a GKE On-Prem cluster's connection to Google Cloud is lost for a period of time, that cluster's metrics are now fully recovered.

Fixed an issue that caused ingestion of admin cluster metrics to be slower than ingesting user cluster metrics.

Known Issues

For user clusters that are using static IPs and a different network than their admin cluster: If you overwrite the user cluster's network configuration, the user control plane might not be able to start. This occurs because it's using the user cluster's network, but allocates an IP address and gateway from the admin cluster.

As a workaround, you can update each user control plane's MachineDeployment specification to use the correct network. Then, delete each user control plane Machine, causing the MachineDeployment to create new Machines:

  1. List MachineDeployments in the admin cluster

    kubectl get machinedeployments --kubeconfig [ADMIN_CLUSTER_KUBECONFIG]
    
  2. Update a user control plane MachineDeployment from your shell

    kubectl edit machinedeployment --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] [MACHINEDEPLOYMENT_NAME]
    
  3. List Machines in the admin cluster

    kubectl get machines --kubeconfig [ADMIN_CLUSTER_KUBECONFIG]
    
  4. Delete user control plane Machines in the admin cluster

    kubectl delete machines --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] [MACHINE_NAME]
    

September 26, 2019

GKE On-Prem version 1.1.0-gke.6 is now available. To download version 1.1.0-gke.6's gkectl and upgrade bundle, see Downloads. Then, see Upgrading clusters.

This minor version includes the following changes:

The default Kubernetes version for cluster nodes is now version 1.13.7-gke.20 (previously 1.12.7-gke.19).

Action required: As of version 1.1.0-gke.6, GKE On-Prem now creates vSphere Distributed Resource Scheduler (DRS) rules for your user cluster's nodes (vSphere VMs), causing them to be spread across at least three physical hosts in your datacenter.

This feature is enabled by default for all new and existing user clusters running version 1.1.0-gke.6.

The feature requires that your vSphere environment meet the following conditions:

  • VMware DRS must be enabled. VMware DRS requires vSphere Enterprise Plus license edition. To learn how to enable DRS, see Creating a DRS Cluster.
  • The vSphere user account provided in your GKE On-Prem configuration file's vcenter field must have the Host.Inventory.EditCluster permission.
  • There are at least three physical hosts available.

If you do not want to enable this feature for your existing user clusters—for example, if you don't have enough hosts to accommodate the feature—perform the following steps before you upgrade your user clusters:

  1. Open your existing GKE On-Prem configuration file.
  2. Under the usercluster specification, add the antiaffinitygroups field as described in the antiaffinitygroups documentation:

    usercluster:
      ...
      antiaffinitygroups:
        enabled: false

  3. Save the file.

  4. Use the configuration file to upgrade. Your clusters are upgraded, but the feature is not enabled.

You can now set the default storage class for your clusters.

You can now use Container Storage Interface (CSI) 1.0 as a storage class for your clusters.

You can now delete broken or unhealthy user clusters with gkectl delete cluster --force

You can now diagnose node issues using the debug-toolbox container image.

You can now skip validations run by gkectl commands.

The tarball that gkectl diagnose snapshot creates now includes a log of the command's output by default.

Adds the --seed-config flag to gkectl diagnose snapshot. When you pass the flag, the tarball produced by snapshot includes your clusters' GKE On-Prem configuration file.

The gkeplatformversion field has been removed from the GKE On-Prem configuration file. To specify a cluster's version, provide the version's bundle to the bundlepath field.

You need to add the vSphere permission, Host.Inventory.EditCluster, before you can use antiaffinitygroups.

You now specify a configuration file in gkectl diagnose snapshot by passing the --snapshot-config flag (previously --config). See Diagnosing cluster issues.

You now capture your cluster's configuration file with gkectl diagnose snapshot by passing --snapshot-config (previously --config). See Diagnosing cluster issues.

gkectl diagnose commands now return an error if you provide a user cluster's kubeconfig, rather than an admin cluster's kubeconfig.

Cloud Console now notifies you when an upgrade is available for a registered user cluster.

A known issue prevents version 1.0.11, 1.0.1-gke.5, and 1.0.2-gke.3 clusters using OIDC from being upgraded to version 1.1. A fix is targeted for version 1.1.1. If you configured a version 1.0.11, 1.0.1-gke.5, or 1.0.2-gke.3 cluster with OIDC, you are not able to upgrade it. Create a version 1.1 cluster by following Installing GKE On-Prem.

August 22, 2019

GKE On-Prem version 1.0.2-gke.3 is now available. This patch release includes the following changes:

Seesaw is now supported for manual load balancing.

You can now specify a different vSphere network for admin and user clusters.

You can now delete user clusters using gkectl. See Deleting a user cluster.

gkectl diagnose snapshot now gets logs from the user cluster control planes.

GKE On-Prem OIDC specification has been updated with several new fields: kubectlredirecturl, scopes, extraparams, and usehttpproxy.

Calico updated to version 3.7.4.

Stackdriver Monitoring's system metrics prefix changed from external.googleapis.com/prometheus/ to kubernetes.io/anthos/. If you are tracking metrics or alerts, update your dashboards with the new prefix.
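If your dashboard or alert definitions are stored as text files, the prefix change can be applied with a search and replace. The sketch below assumes exactly that; the function name `rewrite_metric_prefix` and the file names in the usage note are hypothetical.

```shell
# Hypothetical helper: reads text on stdin and rewrites the old metric
# prefix to the new one. Dots are escaped so that they match literally.
rewrite_metric_prefix() {
  sed 's|external\.googleapis\.com/prometheus/|kubernetes.io/anthos/|g'
}
```

A possible invocation is `rewrite_metric_prefix < dashboard.json > dashboard.updated.json`. Review the result before you upload it.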

July 30, 2019

GKE On-Prem version 1.0.1-gke.5 is now available. This patch release includes the following changes:

New Features

Changes

gkectl check-config now also checks node IP availability if you are using static IPs.

gkectl prepare now checks if a VM exists and is marked as a template in vSphere before attempting to upload the VM's OVA image.

Adds support for specifying a vCenter cluster and a resource pool in that cluster.

Upgrades F5 BIG-IP controller to version 1.9.0.

Upgrades Istio ingress controller to version 1.2.2.

Fixes

Fixes registry data persistence issues with the admin workstation's Docker registry.

Fixes validation that checks whether a user cluster's name is already in use.

July 25, 2019

GKE On-Prem version 1.0.11 is now available.

June 17, 2019

GKE On-Prem is now generally available. Version 1.0.10 includes the following changes:

Upgrading from beta-1.4 to 1.0.10

Before upgrading your beta clusters to the first general availability version, perform the steps described in Installing GKE On-Prem, and review the following points:

  • If you are running a beta version before beta-1.4, be sure to upgrade to beta-1.4 first.

  • If your beta clusters are running their own L4 load balancers (not the default, F5 BIG-IP), you need to delete and recreate your clusters to run the latest GKE On-Prem version.

  • If your clusters were upgraded to beta-1.4 from beta-1.3, run the following command for each user cluster before upgrading:

    kubectl delete crd networkpolicies.crd.projectcalico.org

  • vCenter certificate verification is now required. (vsphereinsecure is no longer supported.) If you're upgrading your beta 1.4 clusters to 1.0.10, you need to provide a vCenter trusted root CA public certificate in the upgrade configuration file.

  • You need to upgrade all of your running clusters. For this upgrade to succeed, your clusters can't run in a mixed version state.

  • You need to upgrade your admin clusters to the latest version first, then upgrade your user clusters.

New Features

You can now enable the Manual load balancing mode to configure an L4 load balancer. You can still choose to use the default load balancer, F5 BIG-IP.

GKE On-Prem's configuration-driven installation process has been updated. You now declaratively install using a single configuration file.

Adds gkectl create-config, which generates a configuration file for installing GKE On-Prem, upgrading existing clusters, and for creating additional user clusters in an existing installation. This replaces the installation wizard and create-config.yaml from previous versions. See the updated documentation for installing GKE On-Prem.

Adds gkectl check-config, which validates the GKE On-Prem configuration file. See the updated documentation for installing GKE On-Prem.

Adds an optional --validate-attestations flag to gkectl prepare. This flag verifies that the container images included in your admin workstation were built and signed by Google and are ready for deployment. See the updated documentation for installing GKE On-Prem.

Changes

Upgrades Kubernetes version to 1.12.7-gke.19. You can now upgrade your clusters to this version. You can no longer create clusters that run Kubernetes version 1.11.2-gke.19.

We recommend upgrading your admin cluster before you upgrade your user clusters.

Upgrades Istio ingress controller to version 1.1.7.

vCenter certificate verification is now required. (vsphereinsecure is no longer supported.) You provide the certificate in the GKE On-Prem configuration file's cacertpath field.

When a client calls the vCenter server, the vCenter server must prove its identity to the client by presenting a certificate. That certificate must be signed by a certificate authority (CA). The certificate must not be self-signed.

If you're upgrading your beta 1.4 clusters to 1.0.10, you need to provide a vCenter trusted root CA public certificate in the upgrade configuration file.

Known Issues

Upgrading clusters can cause disruption or downtime for workloads that use PodDisruptionBudgets (PDBs).

You might not be able to upgrade beta clusters that use the Manual load balancing mode to GKE On-Prem version 1.0.10. To upgrade and continue using your own load balancer with these clusters, you need to recreate the clusters.

May 24, 2019

GKE On-Prem beta version 1.4.7 is now available. This release includes the following changes:

New Features

In the gkectl diagnose snapshot command, the --admin-ssh-key-path parameter is now optional.

Changes

On May 8, 2019, we introduced a change to Connect, the service that enables you to interact with your GKE On-Prem clusters using Cloud Console. To use the new Connect agent, you must re-register your clusters with Cloud Console, or you must upgrade to GKE On-Prem beta-1.4.

Your GKE On-Prem clusters and the workloads running on them will continue to operate uninterrupted. However, your clusters will not be visible in Cloud Console until you re-register them or upgrade to beta-1.4.

Before you re-register or upgrade, make sure your service account has the gkehub.connect role. Also, if your service account has the old clusterregistry.connect role, it's a good idea to remove that role.

Grant your service account the gkehub.connect role:

gcloud projects add-iam-policy-binding [PROJECT_ID] \
    --member="serviceAccount:[SERVICE_ACCOUNT_NAME]@[PROJECT_ID].iam.gserviceaccount.com" \
    --role="roles/gkehub.connect"

If your service account has the old clusterregistry.connect role, remove the old role:

gcloud projects remove-iam-policy-binding [PROJECT_ID] \
    --member="serviceAccount:[SERVICE_ACCOUNT_NAME]@[PROJECT_ID].iam.gserviceaccount.com" \
    --role="roles/clusterregistry.connect"

Re-register your cluster, or upgrade to GKE On-Prem beta-1.4.

To re-register your cluster:

gcloud alpha container hub register-cluster [CLUSTER_NAME] \
    --context=[USER_CLUSTER_CONTEXT] \
    --service-account-key-file=[LOCAL_KEY_PATH] \
    --kubeconfig-file=[KUBECONFIG_PATH] \
    --project=[PROJECT_ID]

To upgrade to GKE On-Prem beta-1.4:

gkectl upgrade --kubeconfig [ADMIN_CLUSTER_KUBECONFIG]

Known Issues

There is an issue that prevents the Connect agent from being updated to the new version during an upgrade. To work around this issue, run the following command after you upgrade a cluster:

kubectl delete pod gke-connect-agent-install -n gke-connect

May 13, 2019

Known Issues

Clusters upgraded from version beta-1.2 to beta-1.3 might be affected by a known issue that damages the cluster's configuration file and prevents future cluster upgrades. This issue affects all future cluster upgrades.

You can resolve this issue by deleting and recreating clusters upgraded from beta-1.2 to beta-1.3.

To resolve the issue without deleting and recreating the cluster, you need to re-encode and apply each cluster's Secrets. Perform the following steps:

  1. Get the contents of the create-config Secrets stored in the admin cluster. This must be done for the create-config Secret in the kube-system namespace, and for the create-config Secrets in each user cluster's namespace:

    kubectl get secret create-config -n [USER_CLUSTER_NAME] -o jsonpath={.data.cfg} | base64 -d > [USER_CLUSTER_NAME]_create_secret.yaml

    For example:

    kubectl get secret create-config -n kube-system -o jsonpath={.data.cfg} | base64 -d > kube-system_create_secret.yaml

  2. For each user cluster, open the [USER_CLUSTER_NAME]_create_secret.yaml file in an editor.

    If the values for registerserviceaccountkey and connectserviceaccountkey are not REDACTED, no further action is required: the Secrets do not need to be re-encoded and written to the cluster.

  3. Open the original create_config.yaml file in another editor.

  4. In [USER_CLUSTER_NAME]_create_secret.yaml, replace the registerserviceaccountkey and connectserviceaccountkey values with the values from the original create_config.yaml file. Save the changed file.

  5. Repeat steps 2-4 for each [USER_CLUSTER_NAME]_create_secret.yaml, and for the kube-system_create_secret.yaml file.

  6. Base64-encode each [USER_CLUSTER_NAME]_create_secret.yaml file and the kube-system_create_secret.yaml file:

    cat [USER_CLUSTER_NAME]_create_secret.yaml | base64 > [USER_CLUSTER_NAME]_create_secret.b64

    cat kube-system_create_secret.yaml | base64 > kube-system_create_secret.b64

  7. Replace the data[cfg] field in each Secret in the cluster with the contents of the corresponding file:

    kubectl edit secret create-config -n [USER_CLUSTER_NAME]
      # kubectl edit opens the file in the shell's default text editor
      # Open `first-user-cluster_create_secret.b64` in another editor, and replace
      # the `cfg` value with the copied value
      # Make sure the copied string has no newlines in it
    
  8. Repeat step 7 for each [USER_CLUSTER_NAME]_create_secret.yaml Secret, and for the kube-system_create_secret.yaml Secret.

  9. To ensure that the update was successful, repeat step 1.
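Two parts of this procedure are easy to get wrong by hand: deciding whether a decoded file still contains REDACTED keys (step 2), and producing an encoded value with no newlines (steps 6 and 7). The helpers below are a sketch of those two checks. Their names are hypothetical, and they assume that each key name and its value appear on the same line of the decoded file.

```shell
# Hypothetical helper: succeeds if either service account key in the
# decoded file is REDACTED, meaning the Secret needs to be repaired.
# Assumes the key name and its value are on the same line.
has_redacted_keys() {
  grep -Eq '(registerserviceaccountkey|connectserviceaccountkey).*REDACTED' "$1"
}

# Hypothetical helper: prints the base64 encoding of a file as a single
# line with no newlines, suitable for pasting into the cfg value.
encode_single_line() {
  base64 < "$1" | tr -d '\n'
}
```

For example, `has_redacted_keys kube-system_create_secret.yaml && echo "needs repair"` reports whether the admin cluster's Secret is affected.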

May 07, 2019

GKE On-Prem beta version 1.4.1 is now available. This release includes the following changes:

New Features

In the gkectl diagnose snapshot command, the --admin-ssh-key-path parameter is now optional.

Changes

On May 8, 2019, we introduced a change to Connect, the service that enables you to interact with your GKE On-Prem clusters using Cloud Console. To use the new Connect agent, you must re-register your clusters with Cloud Console, or you must upgrade to GKE On-Prem beta-1.4.

Your GKE On-Prem clusters and the workloads running on them will continue to operate uninterrupted. However, your clusters will not be visible in Cloud Console until you re-register them or upgrade to beta-1.4.

Before you re-register or upgrade, make sure your service account has the gkehub.connect role. Also, if your service account has the old clusterregistry.connect role, it's a good idea to remove that role.

Grant your service account the gkehub.connect role:

gcloud projects add-iam-policy-binding [PROJECT_ID] \
    --member="serviceAccount:[SERVICE_ACCOUNT_NAME]@[PROJECT_ID].iam.gserviceaccount.com" \
    --role="roles/gkehub.connect"

If your service account has the old clusterregistry.connect role, remove the old role:

gcloud projects remove-iam-policy-binding [PROJECT_ID] \
    --member="serviceAccount:[SERVICE_ACCOUNT_NAME]@[PROJECT_ID].iam.gserviceaccount.com" \
    --role="roles/clusterregistry.connect"

Re-register your cluster, or upgrade to GKE On-Prem beta-1.4.

To re-register your cluster:

gcloud alpha container hub register-cluster [CLUSTER_NAME] \
    --context=[USER_CLUSTER_CONTEXT] \
    --service-account-key-file=[LOCAL_KEY_PATH] \
    --kubeconfig-file=[KUBECONFIG_PATH] \
    --project=[PROJECT_ID]

To upgrade to GKE On-Prem beta-1.4:

gkectl upgrade --kubeconfig [ADMIN_CLUSTER_KUBECONFIG]

Known Issues

There is an issue that prevents the Connect agent from being updated to the new version during an upgrade. To work around this issue, run the following command after you upgrade a cluster:

kubectl delete pod gke-connect-agent-install -n gke-connect

April 25, 2019

GKE On-Prem beta version 1.3.1 is now available. This release includes the following changes:

New Features

The gkectl diagnose snapshot command now has a --dry-run flag.

The gkectl diagnose snapshot command now supports four scenarios.

The gkectl diagnose snapshot command now supports regular expressions for specifying namespaces.

Changes

Istio 1.1 is now the default ingress controller. The ingress controller runs in the gke-system namespace for both admin and user clusters. This enables easier TLS management for Ingress. To enable ingress, or to re-enable ingress after an upgrade, follow the instructions under Enabling ingress.

The gkectl tool no longer uses Minikube and KVM for bootstrapping. This means you do not have to enable nested virtualization on your admin workstation VM.

Known Issues

GKE On-Prem's ingress controller uses Istio 1.1 with automatic Secret discovery. However, the node agent for Secret discovery may fail to get Secret updates after Secret deletion. So avoid deleting Secrets. If you must delete a Secret and Ingress TLS fails afterwards, manually restart the Ingress Pod in the gke-system namespace.

April 11, 2019

GKE On-Prem beta version 1.2.1 is now available. This release includes the following changes:

New Features

GKE On-Prem clusters now automatically connect back to Google using Connect.

You can now run up to three control planes per user cluster.

Changes

gkectl now validates vSphere and F5 BIG-IP credentials when creating clusters.

Known Issues

A regression causes gkectl diagnose snapshot commands to use the wrong SSH key, which prevents the command from collecting information from user clusters. As a workaround for support cases, you might need to SSH into individual user cluster nodes and manually gather data.

April 02, 2019

GKE On-Prem beta version 1.1.1 is now available. This release includes the following changes:

New Features

You now install GKE On-Prem with an Open Virtual Appliance (OVA), a pre-configured virtual machine image that includes several command-line interface tools. This change makes installations easier and removes a layer of virtualization. You no longer need to run gkectl inside a Docker container.

If you installed GKE On-Prem versions before beta-1.1.1, you should create a new admin workstation following the documented instructions. After you install the new admin workstation, copy over any SSH keys, configuration files, kubeconfigs, and any other files you need, from your previous workstation to the new one.

Added documentation for backing up and restoring clusters.

You can now configure authentication for clusters using OIDC and ADFS. To learn more, refer to Authenticating with OIDC and AD FS and Authentication.

Changes

You now must use an admin cluster's private key to run gkectl diagnose snapshot.

Added a configuration option during installation for deploying multi-master user clusters.

Connect documentation has been migrated.

Fixes

Fixed an issue where cluster networking could be interrupted when a node is removed unexpectedly.

Known Issues

GKE On-Prem's Configuration Management has been upgraded from version 0.11 to 0.13. Several components of the system have been renamed. You need to take some steps to clean up the previous versions' resources and install a new instance.

If you have an active instance of Configuration Management:

  1. Uninstall the instance:

    kubectl -n=nomos-system delete nomos --all

  2. Make sure that the instance's namespace has no resources:

    kubectl -n nomos-system get all

  3. Delete the namespace:

    kubectl delete ns nomos-system

  4. Delete the CRD:

    kubectl delete crd nomos.addons.sigs.k8s.io

  5. Delete all kube-system resources for the operator:

    kubectl -n kube-system delete all -l k8s-app=nomos-operator

If you don't have an active instance of Configuration Management:

  1. Delete the Configuration Management namespace:

    kubectl delete ns nomos-system

  2. Delete the CRD:

    kubectl delete crd nomos.addons.sigs.k8s.io

  3. Delete all kube-system resources for the operator:

    kubectl -n kube-system delete all -l k8s-app=nomos-operator
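The two procedures above share their last three commands, so they can be collected into a small script. The following is a sketch under stated assumptions: the names `run`, `cleanup_nomos_instance`, and `cleanup_nomos_leftovers` are hypothetical, and by default the script only prints each command (set DRY_RUN=0 to execute them).

```shell
# Runs a command, or only prints it when DRY_RUN=1 (the default here).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper for an active instance: uninstalls it and lists
# what remains in the namespace. Inspect that listing before going on.
cleanup_nomos_instance() {
  run kubectl -n=nomos-system delete nomos --all
  run kubectl -n nomos-system get all
}

# Hypothetical wrapper for the steps common to both cases: deletes the
# namespace, the CRD, and the operator's kube-system resources.
cleanup_nomos_leftovers() {
  run kubectl delete ns nomos-system
  run kubectl delete crd nomos.addons.sigs.k8s.io
  run kubectl -n kube-system delete all -l k8s-app=nomos-operator
}
```

With an active instance, run `cleanup_nomos_instance`, confirm that the namespace is empty, and then run `cleanup_nomos_leftovers`. Without an active instance, only the second function is needed.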

March 12, 2019

GKE On-Prem beta version 1.0.3 is now available. This release includes the following changes:

Fixes

Fixed an issue that caused Docker certificates to be saved to the wrong location.

March 04, 2019

GKE On-Prem beta version 1.0.2 is now available. This release includes the following changes:

New Features

You can now run gkectl version to check which version of gkectl you're running.

You can now upgrade user clusters to future beta versions.

Anthos Config Management version 0.11.6 is now available.

Stackdriver Logging is now enabled on each node. By default, the logging agent replicates logs to your project for only control plane services, cluster API, vSphere controller, Calico, BIG-IP controller, Envoy proxy, Connect, Anthos Config Management, Prometheus and Grafana services, Istio control plane, and Docker. Application container logs are excluded by default, but can be optionally enabled.

Stackdriver Prometheus Sidecar captures metrics for the same components as the logging agent.

Kubernetes Network Policies are now supported.

Changes

You can now update IP blocks in the cluster specification to expand the IP range for a given cluster.

If clusters you installed during alpha were disconnected from Google after beta, you might need to connect them again. Refer to Registering a cluster.

Getting started has been updated with steps for activating your service account and running gkectl prepare.

gkectl diagnose snapshot now only collects configuration data and excludes logs. This tool is used to capture details of your environment prior to opening a support case.

Support for optional SNAT pool name configuration for F5 BIG-IP at cluster-creation time. This can be used to configure "--vs-snat-pool-name" value on F5 BIG-IP controller.

You now need to provide a VIP for add-ons that run in the admin cluster.

Fixes

Cluster resizing operations improved to prevent unintended node deletion.

February 07, 2019

GKE On-Prem alpha version 1.3 is now available. This release includes the following changes:

New Features

During installation, you can now provide YAML files with nodeip blocks to configure static IPAM.

Changes

You now need to provision a 100GB disk in vSphere Datastore. GKE On-Prem uses the disk to store some of its vital data, such as etcd. See Data center requirements.

You can now only provide lowercase hostnames to nodeip blocks.

GKE On-Prem now enforces unique names for user clusters.

Metrics endpoints and APIs that use Istio endpoints are now secured using mTLS and role-based access control.

External communication by Grafana is disabled.

Improvements to Prometheus and Alertmanager health-checking.

Prometheus now uses secured port for scraping metrics.

Several updates to Grafana dashboards.

Known Issues

If your vCenter user account uses a format like DOMAIN\USER, you might need to escape the backslash (DOMAIN\\USER). Be sure to do this when prompted to enter the user account during installation.
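If you prepare the account name in a script, the escaping can be done mechanically. This sketch assumes that "escaping" means doubling each backslash; the function name `escape_backslashes` is hypothetical.

```shell
# Hypothetical helper: doubles every backslash in its argument, turning
# DOMAIN\USER into DOMAIN\\USER. That doubling is the escaping expected
# by the installer prompt is an assumption.
escape_backslashes() {
  printf '%s\n' "$1" | sed 's/\\/\\\\/g'
}
```

Pass the account name in single quotes, as in `escape_backslashes 'DOMAIN\USER'`, so that the shell does not interpret the backslash first.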

January 23, 2019

GKE On-Prem alpha version 1.2.1 is now available. This release includes the following changes:

New Features

You can now use gkectl to delete admin clusters.

Changes

gkectl diagnose snapshot commands now allow you to specify nodes while capturing snapshots of remote command results and files.

January 14, 2019

GKE On-Prem alpha version 1.1.2 is now available. This release includes the following changes:

New Features

You can now use the gkectl prepare command to pull and push GKE On-Prem's container images, which deprecates the populate_registry.sh script.

gkectl prepare now prompts you to enter information about your vSphere cluster and resource pool.

You can now use the gkectl create command to create and add user clusters to existing admin control planes by passing in an existing kubeconfig file when prompted during cluster creation.

You can now pass in an Ingress TLS Secret for admin and user clusters at cluster creation time. You will see the following new prompt:

Do you want to use TLS for Admin Control Plane/User Cluster ingress?

Providing the TLS Secret and certs allows gkectl to set up the Ingress TLS. HTTP is not automatically disabled with TLS installation.

Changes

GKE On-Prem now runs Kubernetes version 1.11.2-gke.19.

The default footprint for GKE On-Prem has changed:

  • Minimum memory requirement for user cluster nodes is now 8192M.

GKE On-Prem now runs minikube version 0.28.0.

GKE Policy Management has been upgraded to version 0.11.1.

gkectl no longer prompts you to provide a proxy configuration by default.

There are three new ConfigMap resources in the user cluster namespace: cluster-api-etcd-metrics-config, kube-etcd-metrics-config, and kube-apiserver-config. GKE On-Prem uses these files to quickly bootstrap the metrics proxy container.

kube-apiserver events now live in their own etcd. You can see kube-etcd-events in your user cluster's namespace.

Cluster API controllers now use leader election.

vSphere credentials are now pulled from credential files.

gkectl diagnose commands now work with both admin and user clusters.

gkectl diagnose snapshot can now take snapshots of remote files on the node, results of remote commands on the nodes, and Prometheus queries.

gkectl diagnose snapshot can now take snapshots in multiple parallel threads.

gkectl diagnose snapshot now allows you to specify words to be excluded from the snapshot results.

Fixes

Fixed issues with minikube caching that caused unexpected network calls.

Fixed an issue with pulling F5 BIG-IP credentials. Credentials are now read from a credentials file instead of using environment variables.

Known Issues

You might encounter the following govmomi warning when you run gkectl prepare:

Warning: Line 102: Unable to parse 'enableMPTSupport' for attribute 'key' on element 'Config'

Resizing user clusters can cause inadvertent node deletion or recreation.

PersistentVolumes can fail to mount, producing the error devicePath is empty. As a workaround, delete and re-create the associated PersistentVolumeClaim.

Resizing IPAM address blocks is not supported in alpha if you use static IP allocation for nodes. To work around this, consider allocating more IP addresses than you currently need.

On slow disks, VM creation can time out and cause deployments to fail. If this occurs, delete all resources and try again.

December 19, 2018

GKE On-Prem alpha 1.0.4 is now available. This release includes the following changes:

Fixes

The vulnerability caused by CVE-2018-1002105 has been patched.

November 30, 2018

GKE On-Prem alpha 1.0 is now available. The following changes are included in this release:

Changes

GKE On-Prem alpha 1.0 runs Kubernetes 1.11.

The default footprint for GKE On-Prem has changed:

  • The admin control plane runs three nodes, which use 4 CPUs and 16GB memory.
  • The user control plane runs one node that uses 4 CPUs and 16GB memory.
  • User clusters run a minimum of three nodes, which use 4 CPUs and 16GB memory.

Support for high-availability Prometheus setup.

Support for custom Alert Manager configuration.

Prometheus upgraded from 2.3.2 to 2.4.3.

Grafana upgraded from 5.0.4 to 5.3.4.

kube-state-metrics upgraded from 1.3.1 to 1.4.0.

Alert Manager upgraded from 1.14.0 to 1.15.2.

node_exporter upgraded from 1.15.2 to 1.16.0.

Fixes

The vulnerability caused by CVE-2018-1002103 has been patched.

Known Issues

PersistentVolumes can fail to mount, producing the error devicePath is empty. As a workaround, delete and re-create the associated PersistentVolumeClaim.

Resizing IPAM address blocks is not supported in alpha when you use static IP allocation for nodes. To work around this, consider allocating more IP addresses than you currently need.

GKE On-Prem alpha 1.0 does not yet pass all conformance tests.

Only one user cluster per admin cluster can be created. To create additional user clusters, create another admin cluster.

October 31, 2018

GKE On-Prem EAP 2.1 is now available. The following changes are included in this release:

Changes

When you create admin and user clusters at the same time, you can now reuse the admin cluster's F5 BIG-IP credentials to create the user cluster. Also, the CLI now requires that BIG-IP credentials be provided; this requirement cannot be skipped by using --dry-run.

F5 BIG-IP controller upgraded to use the latest OSS version, 1.7.0.

To improve stability on slow vSphere machines, the cluster machine creation timeout is now 15 minutes (previously 5 minutes).

October 17, 2018

GKE On-Prem EAP 2.0 is now available. The following changes are included in this release:

Changes

Support for GKE Connect.

Support for Monitoring.

Support for installation using private registries.

Support for front-ending the L7 load balancer as an L4 VIP on F5 BIG-IP.

Support for static IP allocation for nodes during cluster bootstrap.

Known Issues

Only one user cluster per admin cluster can be created. To create additional user clusters, create another admin cluster.

Cluster upgrades are not supported in EAP 2.0.

On slow disks, VM creation can time out and cause deployments to fail. If this occurs, delete all resources and try again.

As part of the cluster bootstrapping process, a short-lived minikube instance is run. The minikube version used is affected by the security vulnerability CVE-2018-1002103.