Release notes

This page documents production updates to Anthos clusters on VMware (GKE on-prem). You can periodically check this page for announcements about new or updated features, bug fixes, known issues, and deprecated functionality.

See also:

You can see the latest product updates for all of Google Cloud on the Google Cloud page, browse and filter all release notes in the Google Cloud console, or programmatically access release notes in BigQuery.

To get the latest product updates delivered to you, add the URL of this page to your feed reader, or add the feed URL directly: https://cloud.google.com/feeds/gkeonprem-release-notes.xml

August 12, 2022

Anthos clusters on VMware 1.10.6-gke.36 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.10.6-gke.36 runs on Kubernetes 1.21.14-gke.2100.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.12, 1.11, and 1.10.

  • Fixed the issue where mounting an emptyDir volume with the exec option on Container-Optimized OS (COS) nodes fails with a permission error.
  • Fixed the issue where enabling and disabling the cluster autoscaler sometimes prevents node pool replicas from being updated.
  • Fixed the following vulnerabilities:

August 02, 2022

A new vulnerability CVE-2022-2327 has been discovered in the Linux kernel that can lead to local privilege escalation. This vulnerability allows an unprivileged user to achieve a full container breakout to root on the node.

For more information, see the GCP-2022-018 security bulletin.

July 27, 2022

Anthos clusters on VMware 1.11.2-gke.53 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.11.2-gke.53 runs on Kubernetes 1.22.8-gke.204.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.12, 1.11, and 1.10.

  • Fixed a known issue where the cluster backup feature did not include always-on secrets encryption keys in the backup.
  • Fixed a known issue of high resource usage when AIDE runs as a cron job by disabling AIDE by default. This fix affects compliance with CIS L1 Server benchmark 1.4.2: Ensure filesystem integrity is regularly checked. Customers can opt in to re-enable AIDE if needed. To re-enable the AIDE cron job, see Configure AIDE cron job.
  • Fixed a known issue where the gke-metrics-agent DaemonSet had frequent CrashLoopBackOff errors by upgrading to gke-metrics-agent v1.1.0-anthos.14.
  • Fixed the following vulnerabilities:

July 19, 2022

Anthos clusters on VMware 1.9.7-gke.8 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.9.7-gke.8 runs on Kubernetes 1.21.5-gke.1200.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.12, 1.11, and 1.10.

  • Fixed a known issue where the cluster backup feature did not include always-on secrets encryption keys in the backup.
  • Fixed a known issue of high resource usage when AIDE runs as a cron job by disabling AIDE by default. This fix affects compliance with CIS L1 Server benchmark 1.4.2: Ensure filesystem integrity is regularly checked. Customers can opt in to re-enable AIDE if needed. To re-enable the AIDE cron job, see Configure AIDE cron job.
  • Fixed the following vulnerabilities:

July 07, 2022

Anthos clusters on VMware v1.12.0-gke.446 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware v1.12.0-gke.446 runs on Kubernetes v1.23.5-gke.1504.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.12, 1.11, and 1.10.

Announcements

  • vSphere releases earlier than version 7.0 Update 2 are deprecated in Kubernetes 1.24. VMware's General Support for vSphere 6.7 ends on October 15, 2022. We recommend that you upgrade vSphere (both ESXi and vCenter) to version 7.0 Update 2 or later. vSphere versions earlier than 7.0 Update 2 will no longer be supported in an upcoming version of Anthos clusters on VMware, and you must upgrade to vSphere 7.0 Update 2 or later before you can upgrade to Anthos clusters on VMware 1.13.0.

  • Beta versions of VolumeSnapshot CRDs are deprecated in Kubernetes v1.20 and are unsupported in the Kubernetes v1.24 release.
    The upcoming Anthos clusters on VMware version 1.13 release will no longer serve v1beta1 VolumeSnapshot CRDs. Make sure that you migrate manifests and API clients to use snapshot.storage.k8s.io/v1 API version, available since Kubernetes v1.20. All existing persisted objects remain accessible via the new snapshot.storage.k8s.io/v1 APIs.

  • The dockershim component in Kubernetes enables cluster nodes to use the Docker Engine container runtime. However, Kubernetes 1.24 removed the dockershim component. Starting from Anthos clusters on VMware version 1.12.0, you cannot create new clusters that use the Docker Engine container runtime, and all new clusters must use the default container runtime, containerd. A cluster update is also blocked if you switch a node pool from containerd to Docker, or if you add new Docker node pools. Existing version 1.11.x clusters with Docker node pools can be upgraded to version 1.12.0, but you must update those node pools to use containerd before you can upgrade to version 1.13.0.
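
The VolumeSnapshot migration noted above amounts to updating the apiVersion; the spec fields are unchanged. A minimal v1 manifest might look like the following (the resource names are illustrative):

```yaml
apiVersion: snapshot.storage.k8s.io/v1   # was snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: example-snapshot                  # hypothetical name
spec:
  volumeSnapshotClassName: example-snapshot-class   # hypothetical class name
  source:
    persistentVolumeClaimName: example-pvc          # PVC to snapshot
```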

Breaking changes:

In Kubernetes 1.23, the rbac.authorization.k8s.io/v1alpha1 API version is removed. Instead, use the rbac.authorization.k8s.io/v1 API. See the Kubernetes 1.23.5 release notes.
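For example, a Role previously declared under the removed v1alpha1 version needs only its apiVersion updated; the rules are unchanged (names here are illustrative):

```yaml
apiVersion: rbac.authorization.k8s.io/v1   # replaces the removed v1alpha1 version
kind: Role
metadata:
  name: pod-reader        # hypothetical role name
  namespace: default
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
```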

Platform enhancements:

  • General Availability (GA): Separate vSphere data centers for the admin cluster and the user clusters are supported.
  • GA: Anthos Identity service LDAP authentication is supported.
  • GA: User cluster control-plane node and admin cluster add-on node auto sizing is supported.

Security enhancements:

  • Preview: Preparing credentials for user clusters as Kubernetes secrets before cluster creation.

    • The credential preparation feature prepares credentials before a user cluster is created. After preparation, user cluster credentials are saved as versioned Kubernetes Secrets in the admin cluster, and the template used for credential preparation can be deleted from the admin workstation. When you create a user cluster, you only need to configure the namespace and the versions of the prepared Secrets in the user cluster configuration file. Using this feature can help protect user cluster credentials.
  • Preview: The gkectl update credentials command supports rotating the component access SA key for both the admin and the user clusters.

  • The COS node image shipped in version 1.12.0 is qualified with the Center for Internet Security (CIS) L1 Server Benchmark.

  • The gkectl update credentials command supports register service account key rotation.

Cluster lifecycle Improvements:

  • Preview: You can configure a timeout for Pod Disruption Budget (PDB) violations during a node drain. The default behavior, which is unchanged, is to always block on a PDB violation and to never force-delete Pods during a node drain, to avoid unexpected data corruption. To unblock a PDB violation deadlock with a bounded timeout during a cluster upgrade, you can apply the annotation onprem.cluster.gke.io/pdb-violation-timeout: TIMEOUT to the Machine objects.
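
As a sketch, the annotation might appear in a Machine manifest as follows. Only the annotation key comes from these notes; the apiVersion, the Machine name, and the timeout value format (for example, 5m) are assumptions:

```yaml
# Illustrative Machine fragment; apiVersion, name, and timeout value are assumptions.
apiVersion: cluster.k8s.io/v1alpha1
kind: Machine
metadata:
  name: example-node-pool-machine-0   # hypothetical machine name
  annotations:
    # Unblock a PDB violation deadlock after the given timeout during node drain.
    onprem.cluster.gke.io/pdb-violation-timeout: 5m
```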

Simplify day-2 operations

  • Preview: You can now use Google Cloud Managed Service for Prometheus to collect metrics in Anthos on vSphere clusters. Two separate flags enable logging and monitoring for user applications independently: enableCloudLoggingForApplications and enableGMPForApplications. The legacy flag enableStackdriverForApplications is deprecated and will be removed in a future release. With Google-managed Prometheus, you can monitor and alert on your applications using Prometheus without having to manage and operate Prometheus yourself. To enable Google-managed Prometheus for application metrics, set enableGMPForApplications in the Stackdriver spec; the Google-managed Prometheus components are then set up automatically, with no other manual steps. See Enable Managed Service for Prometheus for user applications for details.

  • All sample dashboards to monitor cluster health are available in Cloud Monitoring sample dashboards. Customers can install the dashboards with one click. See Install sample dashboards.

  • Improvements to cluster diagnosis: The gkectl diagnose cluster command automatically runs when gkectl diagnose snapshot is run, and the output is saved in a new folder in the snapshot called /diagnose-report.

  • The gkectl diagnose cluster command surfaces more detailed information for issues arising from virtual machine creation.

  • A validation check for the existence of an OS image has been added to the gkectl update admin and gkectl diagnose cluster commands.

  • A blocking preflight check has been added. This check validates that the vCenter.datastore specified in the cluster configuration file doesn't belong to a DRS-enabled datastore cluster.
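
As a sketch of the Managed Service for Prometheus opt-in described above, setting enableGMPForApplications in the Stackdriver resource might look like the following. Only the field name comes from these notes; the apiVersion and surrounding fields are assumptions:

```yaml
# Illustrative Stackdriver resource fragment; apiVersion and metadata are assumptions.
apiVersion: addons.gke.io/v1alpha1
kind: Stackdriver
metadata:
  name: stackdriver
  namespace: kube-system
spec:
  enableGMPForApplications: true   # enable Google-managed Prometheus for application metrics
```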

Functionality changes:

  • Upgraded COS from m93 to m97, and containerd to 1.6 on COS.

  • Metrics agent: Upgraded gke-metrics-agent from 1.1.0 to 1.8.3, which fixes some application metrics issues. The offline buffer in the metrics agent can now discard old data based on the age of metrics data, in addition to the total size of the buffer. Metrics data is stored in the offline buffer for at most 22 hours in the case of a network outage.

  • New metrics: Added 7 resource utilization metrics.

    • k8s_container:
      • container/cpu/request_utilization
      • container/cpu/limit_utilization
      • container/memory/request_utilization
      • container/memory/limit_utilization
    • k8s_node:
      • node/cpu/allocatable_utilization
      • node/memory/allocatable_utilization
    • k8s_pod:
      • pod/volume/utilization

Fixes

Known issues:

  • On the out-of-the-box monitoring dashboards, the GKE on-prem Windows pod status and GKE on-prem Windows node status also show data from Linux clusters.

  • The scheduler metrics, such as scheduler_pod_scheduling_attempts, are not collected in version 1.12.0 due to a configuration issue in the metric collector.

In version 1.12.0, cgroup v2 (unified) is enabled by default for Container-Optimized OS (COS) nodes. This could potentially cause instability for your workloads in a COS cluster. We will switch back to cgroup v1 (hybrid) in version 1.12.1. If you are considering using version 1.12 with COS nodes, we suggest that you wait for the 1.12.1 release.

June 24, 2022

Three new memory corruption vulnerabilities (CVE-2022-29581, CVE-2022-29582, CVE-2022-1116) have been discovered in the Linux kernel. These vulnerabilities allow an unprivileged user with local access to the cluster to achieve a full container breakout to root on the node. For more information, refer to the GCP-2022-016 security bulletin.

June 16, 2022

Anthos clusters on VMware 1.10.5-gke.26 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.10.5-gke.26 runs on Kubernetes 1.21.5-gke.1200.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.11, 1.10, and 1.9.

Fixed for version 1.10.5

  • Fixed the issue where admin cluster backup did not back up always-on secrets encryption keys. This caused repairing an admin cluster using gkectl repair master --restore-from-backup to fail when always-on secrets encryption was enabled.

  • Fixed the issue of high resource usage when AIDE runs as a cron job by disabling AIDE by default. This fix affects compliance with CIS L1 Server benchmark 1.4.2: Ensure filesystem integrity is regularly checked.

    To re-enable the AIDE cron job, see Configure AIDE cron job.

Fixed the following vulnerabilities

June 03, 2022

Cluster lifecycle improvements

GA: You can use the Cloud console to create, update, and delete Anthos on VMware user clusters. For more information, see Create a user cluster in the Cloud console.

May 26, 2022

Anthos clusters on VMware 1.11.1-gke.53 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.11.1-gke.53 runs on Kubernetes 1.22.8-gke.200.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.11, 1.10, and 1.9.

Fixed for v1.11.1

  • Fixed the known issue where v1.11.0 user clusters cannot be created with a v1.10.x admin cluster.

  • Fixed the issue where gkectl logs might be truncated when admin cluster creation failed.

  • Fixed the issue where Anthos Identity Service with LDAP failed to authenticate against some older Active Directory servers when the user ID contains a comma.

Fixed the following vulnerabilities

High-severity CVEs

Medium-severity CVEs

Anthos clusters on VMware 1.10.4-gke.32 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.10.4-gke.32 runs on Kubernetes 1.21.5-gke.1200.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.11, 1.10, and 1.9.

Fixed for v1.10.4

Fixed the following vulnerabilities

High-severity CVEs

RBAC fixes

  • anetd

    • Changed to use the kubelet kubeconfig so that anetd can update only its own node resource and the Pod resources running on that node.
  • antrea-controller / anetd-win

    • Created a dedicated RBAC config for antrea instead of reusing the anetd RBAC config, and removed unnecessary permissions.
  • clusterdns-controller

    • Scoped down clusterdns permissions to default resource name.
    • Scoped down configmap permissions to coredns resource name.
    • Removed create/delete permissions for configmaps. The coredns configmap is now created by the bundle, with create-only annotation to ensure we don't overwrite existing config on upgrade.
  • dns-autoscaler

    • Removed unneeded permissions, and scoped down needed permissions to a particular resource using resourceNames.
    • Restricted get configmap for dns autoscaler.
  • gke-usage-metering

    • Restricted permissions to the kube-system namespace where possible.
  • seesaw-load-balancer

    • Restricted the permission by setting resource names.

May 19, 2022

Anthos clusters on VMware 1.9.6-gke.1 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.9.6-gke.1 runs on Kubernetes 1.21.5-gke.1200.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.11, 1.10, and 1.9.

Secret encryption key rotation no longer fails when the cluster has more than 1,000 secrets.

Fixed the following vulnerabilities

Changed scope of certain RBAC permissions

We have scoped down the over-privileged RBAC permissions for the following components in this release:

  • clusterdns-controller:

    • Scope down clusterdns permissions to 'default' resource name.
    • Scope down configmap permissions to 'coredns' resource name.
    • Remove create/delete permissions for configmaps.
  • seesaw-load-balancer:

    • Restrict access to secrets by specifying specific secret names instead of allowing access to all secrets.
  • coredns-autoscaler:

    • Reduce the get configmap permission to a specific configmap resource name.
  • anetd / anet-operator:

    • Changed to use the kubelet kubeconfig to restrict anetd to updating only its own node resource and the Pod resources running on that node.
  • gke-usage-metering:

    • Restrict the permission to only kube-system namespace.
  • ANG (Anthos Network Gateway)

    • Removed or modified RBAC roles and reduced the use of kube-rbac-proxy in ANG.

May 02, 2022

Creating a 1.11.0 user cluster with a 1.10 admin cluster fails. If you need a 1.11.0 user cluster, use the following workaround:

  1. Create a 1.10 user cluster.

  2. Upgrade the user cluster to 1.11.0.

  3. Optionally, upgrade the admin cluster to 1.11.0. After the admin cluster is upgraded, you can create 1.11.0 user clusters.

For details on how to upgrade, see Upgrading Anthos clusters on VMware.

April 28, 2022

Two security vulnerabilities, CVE-2022-1055 and CVE-2022-27666, have been discovered in the Linux kernel. Each can lead to a local attacker being able to perform a container breakout, privilege escalation on the host, or both. These vulnerabilities affect all Linux node operating systems (Container-Optimized OS and Ubuntu). For instructions and more details, see the GCP-2022-014 security bulletin.

April 27, 2022

Anthos clusters on VMware 1.11.0-gke.543 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.11.0-gke.543 runs on Kubernetes v1.22.8-gke.200.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.11, 1.10, and 1.9.

  • The structure of the Anthos clusters on VMware documentation is substantially different from previous versions. For details, see New documentation structure.

  • Dockershim, the Docker Engine integration code in Kubernetes, was deprecated in Kubernetes 1.20 and will be removed in Kubernetes 1.24. As a result, the ubuntu OS node image type will no longer be supported at that time. You should plan to convert your node pools to use either the ubuntu_containerd or the cos OS image type as soon as possible. For more details, see Using containerd for the container runtime.

  • The connect project is now called fleet host project. For more information, see Fleet host project.

  • Kubernetes 1.22 has deprecated certain APIs, a list of which can be found in Kubernetes 1.22 deprecated APIs. In your manifests and API clients, you need to replace references to the deprecated APIs with references to the newer API calls. For more information, see the What to do section in the Deprecated API Migration Guide.

  • Several Anthos metrics have been deprecated for which data is no longer collected. For a list of deprecated metrics, including instructions to migrate to replacement metrics, see Replace deprecated metrics in dashboard.
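
As an example of the Kubernetes 1.22 API migration noted above, Ingress manifests must move from networking.k8s.io/v1beta1 (removed in 1.22) to networking.k8s.io/v1, which also requires the pathType field and the newer backend syntax (the names here are illustrative):

```yaml
apiVersion: networking.k8s.io/v1   # was networking.k8s.io/v1beta1, removed in 1.22
kind: Ingress
metadata:
  name: example-ingress            # hypothetical name
spec:
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        pathType: Prefix           # required in the v1 API
        backend:
          service:                 # v1 replaces serviceName/servicePort
            name: example-service
            port:
              number: 80
```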

Cluster lifecycle Improvements:

  • Admin cluster creation is now resumable. If admin cluster creation fails at any step, you can now rerun gkectl create admin to resume the admin cluster creation.

Platform enhancements:

  • Windows Node Pool:

    • GA: Support for Windows Dataplane V2 is generally available. Windows Dataplane V2 is now enabled by default for Windows node pools. This means that containerd is also enabled by default for Windows node pools.
    • Added a deprecation notice for Windows nodes: Docker and Flannel will be removed in a subsequent version. If you are using the Docker container runtime, you should update your user cluster configuration with gkectl update cluster to use containerd and Windows Dataplane V2 instead.
    • Added support for idempotent Windows startup script execution after node reboot.
    • New Windows Server 2019 OS build version 10.0.17763.2565 has been qualified for Anthos 1.11.0.
  • Egress NAT Gateway:

    • GA: Egress NAT Gateway is now generally available. With this feature, you can configure source network address translation (SNAT) so that certain egress traffic from user clusters is given a predictable source IP address. This enables return traffic from workloads outside the originating cluster to reach the cluster. For more information, see Configuring an egress NAT gateway.
  • MetalLB:

    • GA: The new load balancer option, MetalLB, is now generally available as another bundled software load balancer in addition to Seesaw.
  • Multinic logs:

    • The Fluent Bit Logging agent can now collect logs for Pods with multiple network interfaces, and send them to Cloud Logging. Logs will be collected as system logs and no extra charges will apply.

Security enhancements:

  • Admin cluster CA certificate rotation:

    • GA: You can now use gkectl to rotate system root CA certificates for admin clusters.

Simplify day-2 operations:

  • GA: gkectl update admin supports registering an existing admin cluster.
  • Cluster diagnosis improvements:
    • gkectl diagnose cluster automatically runs during admin or user cluster upgrade failure.
    • gkectl diagnose cluster searches and surfaces related events for any validation failure.
  • GA: gkectl update supports enabling and disabling of Cloud Logging and Cloud Monitoring in an existing cluster. You can also enable or disable logging to Cloud Audit Logs with gkectl update on both admin and user clusters.
  • Changes made to the metrics-server-config ConfigMap are now preserved across cluster upgrades.

Terminology changes:

The connect project is now called fleet host project. For more information, see Fleet host project.

We have removed the over-privileged RBAC permissions for the following components.

RBAC policies applied to service account on the admin cluster

When you register a 1.11.0+ admin cluster to a fleet, a service account is created with the needed role-based access control (RBAC) policies that lets the Connect agent send requests to the admin cluster's Kubernetes API server on behalf of the service account. The service account and RBAC policies are needed so that you can manage the lifecycle of your user clusters in the Google Cloud console. For more information, see Admin cluster RBAC policies.

April 18, 2022

Anthos clusters on VMware 1.10.3-gke.49 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.10.3-gke.49 runs on Kubernetes 1.21.5-gke.1200.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.10, 1.9, and 1.8.

  • Fixed an issue where scale-down sometimes took longer than expected when cluster autoscaling was enabled in a Dataplane V2 cluster.
  • Fixed an issue where the state of an admin cluster that uses a COS image was lost during an admin cluster upgrade or admin cluster control-plane repair.
  • Added a keep-alive configuration to avoid timeout issues for long-running vSphere operations in gkeadm.
  • RBAC fixes:

    • coredns-autoscaler:

      • Removed configmaps create permission.
      • Removed replicasets/scale permissions.
      • Removed replicationcontrollers/scale permissions.
      • Scoped down deployments/scale permissions to coredns resource name.
    • clusterdns-controller:

      • Scoped down clusterdns permissions to default resource name.
      • Scoped down configmap permissions to coredns resource name.
      • Removed create/delete permissions for configmaps. The coredns configmap is now created by the bundle, with a create-only annotation to ensure existing config is not overwritten on upgrade.
    • auto-resize controller:

      • Scoped down leases permissions to onprem-auto-resize-leader-election resource name.
      • Scoped down configmaps permissions to onprem-auto-resize-leader-election resource name.
    • load-balancer-f5:

      • Removed get, list, watch, create, patch, and delete permissions for configmaps.
      • Removed update, create, and patch permissions for events and nodes.
      • Removed create permissions for services/status and services.
      • Removed view permission for the secret bigip-login-9t8mzp.

  • Fixed high-severity CVEs:

April 12, 2022

A security vulnerability, CVE-2022-23648, has been discovered in containerd's handling of path traversal in the OCI image volume specification. Containers launched through containerd's CRI implementation with a specially-crafted image configuration could gain full read access to arbitrary files and directories on the host.

For more information, see the GCP-2022-013 security bulletin.

April 11, 2022

A security vulnerability, CVE-2022-0847, has been discovered in the Linux kernel version 5.8 and later that can potentially escalate container privileges to root.

For more information, see the GCP-2022-012 security bulletin.

March 24, 2022

Anthos clusters on VMware 1.9.5-gke.2 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.9.5-gke.2 runs on Kubernetes v1.21.5-gke.1200.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.10, 1.9, and 1.8.

March 15, 2022

Anthos clusters on VMware 1.8.8-gke.1 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.8.8-gke.1 runs on Kubernetes v1.20.12-gke.1500.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.10, 1.9, and 1.8.

  • Clusters with enableDataplaneV2 set to true can experience connectivity issues between Pods due to anetd daemons (running as a Daemonset) entering a software deadlock. While in this state, anetd daemons will see stale nodes (previously deleted nodes) as peers and miss newly added nodes as new peers. If you have experienced this issue, follow these instructions to restart the anetd daemons and restore connectivity.

March 03, 2022

Anthos clusters on VMware 1.10.2-gke.34 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.10.2-gke.34 runs on Kubernetes 1.21.5-gke.1200.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.10, 1.9, and 1.8.

Changes

  • gkectl diagnose now reports a broken cluster caused by an admin cluster registration error during creation.

Fixes

  • Fixed issue: Failure to register admin cluster during creation

    • You can upgrade an admin cluster to version 1.10.2 without applying the documented mitigation, even if the cluster failed to register with the provided gkeConnect configuration during its creation. You can fix the registration issue by running gkectl update admin with the correct gkeConnect configuration after upgrade.
    • If the cluster registration failed when creating a version 1.10.2 admin cluster, no mitigation is needed to upgrade to later versions after version 1.10.2.
  • Fixed ".local" DNS lookup issue caused by Ubuntu 20.04 systemd-resolved configuration changes.

  • Fixed issue where Docker bridge IP incorrectly used 172.17.0.1/16 instead of 169.254.123.1/24.

  • Fixed unexpectedly high network traffic to monitoring.googleapis.com in a newly created cluster.

  • Fixed an issue where admin cluster creation or upgrade might be interrupted by a temporary vCenter connection issue.

  • Fixed critical CVEs:

  • Fixed this high-severity CVE:

When cluster autoscaling is enabled in a Dataplane V2 cluster, scale-down may sometimes take longer than expected. For example, it may take approximately 20 minutes instead of the usual 10 minutes.

February 24, 2022

The Envoy project recently discovered a set of vulnerabilities. All issues listed in the security bulletin are fixed in Envoy release 1.21.1. For more information, see the GCP-2022-008 security bulletin.

February 23, 2022

Anthos clusters on VMware 1.9.4-gke.3 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.9.4-gke.3 runs on Kubernetes v1.21.5-gke.1200.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.10, 1.9, and 1.8.

Fixes

  • Upgraded Cilium to version 1.10.5.

    • This upgrade also fixed the issue where unreachable node endpoints caused application 503 errors. Previously, when cilium-health status was run in anetd daemons, the output showed stale remote nodes.
  • Fixed unexpectedly high network traffic to monitoring.googleapis.com in a newly created cluster.

  • Fixed these high-severity CVEs:

When cluster autoscaling is enabled in a Dataplane V2 cluster, scale-down may sometimes take longer than expected. For example, it may take approximately 20 minutes instead of the usual 10 minutes.

February 17, 2022

Anthos clusters on VMware 1.8.7-gke.0 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.8.7-gke.0 runs on Kubernetes v1.20.12-gke.1500.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.10, 1.9, and 1.8.

Fixes:

February 14, 2022

A security vulnerability, CVE-2022-0492, has been discovered in the Linux kernel's cgroup_release_agent_write function. The attack uses unprivileged user namespaces, and under certain circumstances, this vulnerability can be exploitable for container breakout. For more information, see the GCP-2022-006 security bulletin.

February 11, 2022

A security vulnerability, CVE-2021-43527, has been discovered in any binary that links to the vulnerable versions of libnss3 found in NSS (Network Security Services) versions prior to 3.73 or 3.68.1. Applications using NSS for certificate validation or other TLS, X.509, OCSP or CRL functionality may be impacted, depending on how they configure NSS.

For more information, see the GCP-2022-005 security bulletin.

February 10, 2022

Anthos clusters on VMware 1.10.1-gke.19 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.10.1-gke.19 runs on Kubernetes v1.21.5-gke.1200.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.10, 1.9, and 1.8.

  • Removed unintentional infrastructure log lines from the cluster snapshot.
  • Upgraded the Connect Agent version to 20211210-01-00.

    • This upgrade also fixed the issue where the Connect Agent restarts unexpectedly on either a newly-created cluster or an existing cluster that uses Anthos Identity Service to manage the Anthos Identity Service ClientConfig.
  • Fixed two high-severity CVEs:

  • Fixed the short metric probing interval issue that sends a high volume of traffic to the monitoring.googleapis.com endpoint in a cluster.

  • If your admin cluster failed to register with the provided gkeConnect spec during creation, upgrading to a later 1.9 or 1.10 release will fail with the following error:

    failed to migrate to first admin trust chain: failed to parse current version "": invalid version: ""

    If you have experienced this issue, follow these instructions to fix the gkeConnect registration issue before you upgrade your admin cluster.

February 07, 2022

A security vulnerability, CVE-2021-4034, has been discovered in pkexec, a part of the Linux policy kit package (polkit), that allows an authenticated user to perform a privilege escalation attack. PolicyKit is generally used only on Linux desktop systems to allow non-root users to perform actions such as rebooting the system, installing packages, restarting services, and so forth, as governed by a policy.

For instructions and more details, see the GCP-2022-004 security bulletin.

February 01, 2022

Three security vulnerabilities, CVE-2021-4154, CVE-2021-22600, and CVE-2022-0185, have been discovered in the Linux kernel, each of which can lead to either a container breakout, privilege escalation on the host, or both. These vulnerabilities affect all Linux node operating systems (COS and Ubuntu) on Anthos clusters on VMware.

For instructions and more details, see the GCP-2022-002 security bulletin.

January 24, 2022

Anthos clusters on VMware 1.9.3-gke.4 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.9.3-gke.4 runs on Kubernetes v1.21.5-gke.1200.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.10, 1.9, and 1.8.

Fixes for version 1.9.3:

  • Fixed issue where special characters in the vSphere username are not properly escaped.

Changes in version 1.9.3:

  • Upgraded the Connect Agent version to 20211210-01-00.

    • This upgrade also fixed the issue where the Connect Agent restarts unexpectedly on a newly-created cluster that uses Anthos Identity Service to manage the Anthos Identity Service ClientConfig.

Known issue in version 1.9.3:

  • The Connect Agent restarts unexpectedly on an existing cluster that uses Anthos Identity Service to manage the Anthos Identity Service ClientConfig. If you have experienced this issue, follow these instructions to upgrade the Connect Agent version.

Anthos clusters on VMware 1.8.6-gke.4 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.8.6-gke.4 runs on Kubernetes 1.20.12-gke.1500.

Fixes for version 1.8.6:

  • Fixed issue where special characters in the vSphere username are not properly escaped.

December 23, 2021

  • When deploying Anthos clusters on VMware releases 1.9.0 or higher that have the Seesaw bundled load balancer in an environment that uses NSX-T stateful distributed firewall rules, stackdriver-operator might fail to create the gke-metrics-agent-conf ConfigMap and cause gke-connect-agent Pods to enter a crash loop. The underlying issue is that stateful NSX-T distributed firewall rules terminate the connection from a client to the user cluster API server through the Seesaw load balancer, because Seesaw uses asymmetric connection flows. This integration issue with NSX-T distributed firewall rules affects all Anthos clusters on VMware releases that use Seesaw. You might see similar connection problems in your own applications when they create large Kubernetes objects whose sizes are bigger than 32 KB. Follow these instructions to disable NSX-T distributed firewall rules, or to use stateless distributed firewall rules for Seesaw VMs.

  • If your clusters use a manual load balancer, follow these instructions to configure your load balancer to reset client connections when it detects a backend node failure. Without this configuration, clients of the Kubernetes API server might stop responding for several minutes when a server instance goes down.

December 22, 2021

Anthos clusters on VMware 1.10.0-gke.194 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.10.0-gke.194 runs on Kubernetes v1.21.5-gke.1200.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.10, 1.9, and 1.8.

  • vCenter/ESXi host versions 6.7U2 and below are no longer supported. Upgrade your vCenter environment to a supported version (6.7U3 or above) before upgrading your clusters.

  • The diskformat parameter is removed from the standard vSphere driver StorageClass as the parameter has been deprecated in Kubernetes 1.21.

  • Preview: Egress NAT gateway:

    • To enable an egress NAT gateway, the advancedNetworking section in the user cluster configuration file replaces the now-deprecated enableAnthosNetworkGateway section.

    • You must create a NetworkGatewayGroup object (previously AnthosNetworkGateway) to configure the egress NAT gateway.

    • Any admin or user clusters that are version 1.9 or earlier, and that are enabled with Anthos Network Gateway, cannot be upgraded. You must delete and recreate those clusters following these instructions.
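
    The configuration change above can be sketched as follows. This is a hypothetical excerpt with placeholder values; consult the egress NAT gateway documentation for the authoritative schema:

    ```yaml
    # User cluster configuration file: the advancedNetworking field
    # replaces the deprecated enableAnthosNetworkGateway field.
    advancedNetworking: true
    ---
    # NetworkGatewayGroup object (previously AnthosNetworkGateway),
    # applied separately to the cluster; the floating IPs below are
    # placeholders.
    kind: NetworkGatewayGroup
    apiVersion: networking.gke.io/v1
    metadata:
      namespace: kube-system
      name: default
    spec:
      floatingIPs:
      - 10.0.1.100
      - 10.0.2.100
    ```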

Cluster lifecycle Improvements:

  • An admin cluster upgrade is resumable after a previous failed admin cluster upgrade attempt.

  • GA: Admin cluster registration during new cluster creation is generally available.

  • Preview: Admin cluster registration when updating existing clusters is available as a preview feature.

Platform enhancements:

  • Preview: A new load balancer option, MetalLB, is available as another bundled software load balancer in addition to Seesaw. MetalLB will become the default load balancer choice instead of Seesaw when it reaches GA.

  • GA: Support for user cluster node pool autoscaling is generally available.

  • Preview: You can create admin cluster nodes and user cluster control-plane nodes with Container-Optimized OS by specifying the osImageType as cos in the admin cluster configuration file.

  • Windows Node Pool:

    • Preview: The containerd runtime is now available for Windows node pools when Dataplane V2 for Windows is enabled.
    • Node Problem Detector checks containerd service health on the nodes and surfaces problems to the API Server. For version 1.10.0, NPD does not attempt to repair the containerd service.
    • Containerd logs are exported to the Cloud Console.

    • CSI proxy is deployed automatically onto Windows nodes. You can install and use a Windows CSI driver of your choice, such as the SMB CSI driver.

  • GA: The multi-NIC capability to provide additional network interfaces to your Pods is generally available.

  • GA: You can upgrade to Ubuntu 20.04 and containerd 1.5.
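
    The Container-Optimized OS preview above is controlled by a single field; a hypothetical excerpt from an admin cluster configuration file, with all unrelated fields elided:

    ```yaml
    # Use Container-Optimized OS for admin cluster nodes and user cluster
    # control-plane nodes (Preview in 1.10).
    osImageType: "cos"
    ```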

Security enhancements:

  • User cluster control plane certificates are automatically rotated at each cluster upgrade. 

Simplify day-2 operations:

  • Preview: gkectl update admin supports the enabling and disabling of Cloud Monitoring and Cloud Logging in the admin cluster. 

  • Changed the collection of application metrics to use a more scalable monitoring pipeline based on OpenTelemetry. This change significantly reduces the amount of resources required to collect metrics.

  • Updated the parser of containerd and kubelet node logs to extract severity level.

  • Introduced the --share-with optional flag in the gkectl diagnose snapshot command to share the read permission after uploading the snapshot to a Google Cloud Storage bucket.

Functionality changes:

  • Replaced the SSH tunnel with Konnectivity service for communication between the user cluster control plane and the user cluster nodes. The Kubernetes SSH tunnel has been deprecated. 

    • You must create two additional firewall rules so that user worker nodes can access port 8132 on the user control-plane VIP address and receive return packets. This is required for the Konnectivity service.

    • Introduced a new konnectivityServerNodePort field in the user cluster manual load balancer configuration. This field is required when creating or upgrading a user cluster, with manual load balancer mode, to version 1.10. 
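
    For manual load balancer mode, the new field sits alongside the existing node port fields. A hypothetical excerpt from a user cluster configuration file; the port numbers are placeholders:

    ```yaml
    loadBalancer:
      kind: ManualLB
      manualLB:
        ingressHTTPNodePort: 30243
        ingressHTTPSNodePort: 30879
        controlPlaneNodePort: 30562
        # Required in 1.10 for the Konnectivity service; your load balancer
        # must forward port 8132 on the control-plane VIP to this node port.
        konnectivityServerNodePort: 30563
    ```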

  • The Ubuntu OS image is upgraded from 18.04 to 20.04 LTS.

    • The python command is no longer available. Update any python invocations to use python3 instead, and update script syntax to Python 3.

    • /etc/resolv.conf now points to /run/systemd/resolve/stub-resolv.conf, instead of /run/systemd/resolve/resolv.conf.

    • The Ubuntu CIS benchmark version changed from v2.0.1 for Ubuntu 18.04 LTS to v1.0.0 for Ubuntu 20.04 LTS.

  • Upgraded COS from m89 to m93.

  • Upgraded containerd from 1.4 to 1.5 on Ubuntu and COS.

  • Changed gkectl diagnose snapshot to use the --all-with-logs scenario by default.

  • The gkeadm command copies the admin workstation configuration file to the admin workstation during creation so it can be used as a backup to re-create the admin workstation later.

  • Increased the Pod priority of kube-state-metrics to improve its reliability when the cluster is under resource contention.

  • Fixed an issue where Windows nodes were assigned duplicate IP addresses.

  • Fixed CVE-2021-32760. Because of Ubuntu PPA version pinning, this vulnerability might still be reported by certain vulnerability scanning tools, and thus appear as a false positive even though the underlying vulnerability has been patched.

  • Because of the change to an OpenTelemetry-based scalable monitoring pipeline for application metrics, Horizontal Pod Autoscaling with user-defined metrics does not work in 1.10.0 unless you explicitly set scalableMonitoring to false, while also ensuring that both enableStackdriverForApplications and enableCustomMetricsAdapter are set to true, in the Stackdriver object.

    As a workaround, you can install a custom Prometheus adapter if you want to use Horizontal Pod Autoscaling with user-defined metrics while still keeping the scalable monitoring default setting for application metrics.
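
    The workaround settings above live in the Stackdriver object. A hypothetical sketch, assuming the object keeps its usual name and namespace; fields not relevant to the workaround are elided, and you should verify the field names against your cluster:

    ```yaml
    apiVersion: addons.gke.io/v1alpha1
    kind: Stackdriver
    metadata:
      name: stackdriver
      namespace: kube-system
    spec:
      projectID: "my-project"                  # placeholder
      scalableMonitoring: false                # opt out of the OpenTelemetry pipeline
      enableStackdriverForApplications: true   # required for user-defined metrics
      enableCustomMetricsAdapter: true         # required for HPA with user-defined metrics
    ```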

  • Because of a COS 93 configuration issue, IPv6 dual-stack does not work correctly for COS node pool nodes in version 1.10.0. If you are using IPv6 dual-stack with a COS node pool, wait for an upcoming patch release that addresses this issue.

  • If an admin cluster is created with an osImageType of cos, and you have rotated the audit logging service account key with gkectl update admin, the changes are overridden after the admin cluster control-plane node reboots. In that case, re-run the update command after the reboot to apply the changes.

  • On COS nodes, the NTP server is configured to time.google.com by default. In DHCP mode, this setting cannot be overridden to use the NTP server provided by your DHCP server. The issue will be fixed in an upcoming patch release. Before then, you can deploy a DaemonSet to override the NTP setting if you want to use a different NTP server in your COS node pool.

November 30, 2021

Anthos clusters on VMware 1.7.6-gke.6 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.7.6-gke.6 runs on Kubernetes v1.19.15-gke.1900.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.9, 1.8, and 1.7.

  • Fixed issue where special characters in the vSphere username are not properly escaped.
  • Alleviated the high CPU and memory usage by /etc/cron.daily/aide discussed in this issue.
  • Fixed issue where user cluster nodes were not syncing time.
  • Fixed CVE-2021-41103. Because of Ubuntu PPA version pinning, this vulnerability might still be reported by certain vulnerability scanning tools, and appear as a false positive even though the underlying vulnerability has been patched.

November 29, 2021

Anthos clusters on VMware 1.8.5-gke.3 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.8.5-gke.3 runs on Kubernetes v1.20.9-gke.701.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.9, 1.8, and 1.7.

  • Fixed issue where special characters in the vSphere username are not properly escaped.
  • Alleviated the high CPU and memory usage by /etc/cron.daily/aide discussed in this issue.
  • Fixed issue where user cluster nodes were not syncing time.
  • Fixed CVE-2021-41103. Because of Ubuntu PPA version pinning, this vulnerability might still be reported by certain vulnerability scanning tools, and appear as a false positive even though the underlying vulnerability has been patched.

November 18, 2021

Anthos clusters on VMware 1.9.2-gke.4 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.9.2-gke.4 runs on Kubernetes v1.21.5-gke.1200.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.9, 1.8, and 1.7.

With version 1.9.2, cert-manager is installed in the cert-manager namespace. Previously, for versions 1.8.2 to 1.9.1, cert-manager was installed in the kube-system namespace.

The cert-manager version is upgraded from 1.0.3 to 1.5.4.

If you already use any ClusterIssuer with a cluster resource namespace different from the default cert-manager namespace, follow these steps when you upgrade to version 1.9.2.

  • Manually copy the related certificates, secrets, or issuers to the cert-manager namespace to use the installed cert-manager after upgrading to 1.9.2.

  • If you need to use a different version of cert-manager, or if you need to install it in a different namespace, follow these instructions each time that you upgrade your cluster.

Fixes:

  • Fixed issue with cilium-operator not reconciling CiliumNode for Windows nodes when updating the cluster to add Windows node pools.
  • Fixed issue which could temporarily result in no healthy CoreDNS Pods during cluster operations.
  • Fixed issue where you cannot run gkectl upgrade loadbalancer on a user cluster Seesaw load balancer.
  • Fixed issue where node_filesystem metrics report the wrong size for /run.
  • Fixed CVE-2021-37159. Because of Ubuntu PPA version pinning, this vulnerability might still be reported as a false positive by certain vulnerability scanning tools, although the underlying vulnerability has been patched in the 1.9.2 release.
  • Fixed issue where user cluster nodes were not syncing time.
  • Alleviated the high CPU and memory usage by /etc/cron.daily/aide discussed in this issue.

October 29, 2021

The security community recently disclosed a new security vulnerability CVE-2021-30465 found in runc that has the potential to allow full access to a node filesystem.

For more information, see the GCP-2021-011 security bulletin.

October 27, 2021

Anthos clusters on VMware 1.8.4-gke.1 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.8.4-gke.1 runs on Kubernetes v1.20.9-gke.701.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.9, 1.8, and 1.7.

Fixes for version 1.8.4:

  • Fixed high-severity CVE-2021-3711.
  • Fixed gkectl check-config failure when Anthos clusters are configured with a proxy whose URL contains special characters.
  • Fixed "cert-manager" cainjector leader-election failure.

Known issue in version 1.8.4:

If you have already installed your own cert-manager in your cluster, read the suggested mitigation before upgrading to a version >=1.8.2 in order to avoid an installation conflict with the cert-manager deployed by Anthos clusters on VMware.

  • Installing your cert-manager with Apigee may also result in a conflict with the cert-manager deployed by Anthos clusters on VMware. To avoid this, read the suggested mitigation before upgrading to this version.

Anthos clusters on VMware 1.7.5-gke.0 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.7.5-gke.0 runs on Kubernetes v1.19.12-gke.2101.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.9, 1.8, and 1.7.

Fixes for version 1.7.5:

Fixed gkectl check-config failure when Anthos clusters are configured with a proxy whose URL contains special characters.

October 21, 2021

A security issue was discovered in the Kubernetes ingress-nginx controller, CVE-2021-25742. Ingress-nginx custom snippets allow retrieval of ingress-nginx service account tokens and secrets across all namespaces. For more information, see the GCP-2021-024 security bulletin.

October 20, 2021

Anthos clusters on VMware 1.9.1-gke.6 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.9.1-gke.6 runs on Kubernetes v1.21.5-gke.400.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.9, 1.8, and 1.7.

  • In version 1.9.0, there was a known issue with restoring an admin cluster using a backup when using a private registry. That has been fixed in version 1.9.1.
  • Fixed gkectl check-config failure that occurs when Anthos clusters are configured with a proxy whose URL contains special characters.
  • Fixed "cert-manager" cainjector leader-election failure.

If you have already installed your own cert-manager in your cluster, read the suggested mitigation before upgrading to a version >=1.8.2 in order to avoid an installation conflict with the cert-manager deployed by Anthos clusters on VMware.

  • Installing your cert-manager with Apigee may also result in a conflict with the cert-manager deployed by Anthos clusters on VMware. To avoid this, read the suggested mitigation before upgrading to this version.

October 04, 2021

A security vulnerability, CVE-2020-8561, has been discovered in Kubernetes where certain webhooks can be made to redirect kube-apiserver requests to private networks of that API server. For more information, see the GCP-2021-021 security bulletin.

September 29, 2021

Anthos clusters on VMware 1.9.0-gke.8 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.9.0-gke.8 runs on Kubernetes v1.21.4-gke.200.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.9, 1.8, and 1.7.

Features:

Cluster lifecycle Improvements:

  • GA: You can register an admin cluster during its creation by filling in the gkeConnect section in the admin cluster configuration file, similar to user cluster registration.

Platform enhancements:

  • Preview: User clusters can now be in a different vSphere datacenter from the admin cluster, resulting in datacenter isolation between the admin cluster and user clusters. This provides greater resiliency in the case of vSphere environment failures.

  • GA: Support for Windows node pools is generally available. This release adds:

    • Preview: Windows DataplaneV2 support, which allows for using Windows Network Policy
    • Node Problem Detector (NPD) support on Windows
    • Streamlined process for preparing Windows images in a private registry
    • Enhanced Flannel CNI support on Windows

    The upstream fixes for the "Windows Pod stuck at terminating status" error are also applied to this release, which improves the stability of running Windows workloads.

  • GA: Support for Container-Optimized OS (COS) node pools is generally available.

  • GA: CoreDNS is now the cluster DNS provider.

    • Clusters that are upgraded to 1.9 will have their KubeDNS provider replaced with CoreDNS. During the upgrade, CoreDNS is first deployed and then KubeDNS is removed, so applications should not observe DNS unavailability. However, before upgrading, ensure that your cluster has enough additional resources to deploy CoreDNS. CoreDNS requires 100 millicpu and 170 MiB of memory per instance, all clusters require a minimum of 2 instances, and there is an additional instance deployed for every 16 nodes in the cluster.
    • You can configure cluster DNS options such as upstream name servers by using the new ClusterDNS custom resource.
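
    The sizing rule above translates to the following per-instance requests (an illustrative excerpt; the actual CoreDNS manifests are managed by the cluster):

    ```yaml
    # Per-instance CoreDNS requests implied by the sizing note above.
    resources:
      requests:
        cpu: 100m        # 100 millicpu per instance
        memory: 170Mi    # 170 MiB per instance
    # Instance count: 2 + one per 16 nodes. For example, a 50-node cluster
    # runs 2 + 3 = 5 instances, for roughly 500m CPU and 850Mi memory total.
    ```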

Security enhancements:

  • GA: Always-on secrets encryption: You can enable secrets encryption with internally generated keys instead of a hardware security module (HSM). Use the gkectl update command to rotate these keys or to enable or disable secrets encryption after cluster creation.
  • Preview: Windows network policy support. This release introduces a new network plugin, Antrea, for Windows nodes. In addition to network connectivity and services support, it provides network policy support. When creating a user cluster, you can set enableWindowsDataplaneV2 to true to enable this feature. Enabling this feature replaces Flannel with Antrea on Windows nodes.
  • Preview: Azure AD group support for Authentication: This feature allows cluster admins to configure RBAC policies based on Azure AD groups for authorization in clusters. This supports retrieval of groups information for users belonging to more than 200 groups, thus overcoming a limitation of regular OIDC configured with Azure AD as the identity provider.
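
The Windows network policy preview above is enabled by one field at user cluster creation; a hypothetical excerpt from a user cluster configuration file:

```yaml
# Preview in 1.9: replaces Flannel with Antrea on Windows nodes,
# adding Windows network policy support.
enableWindowsDataplaneV2: true
```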

Simplify day-2 operations:

  • Preview: When creating a user cluster, you can set enableVMTracking in the configuration file to true to enable vSphere tag creation and attachment to the VMs in the user cluster. This allows easy mapping of VMs to clusters and node pools. See Enable VM tracking.
  • GA: New metrics agents based on OpenTelemetry are introduced to improve reliability, scalability, and resource usage.
  • Preview: You can enable or disable Stackdriver with gkectl update on existing user clusters. You can enable or disable cloud audit logging and monitoring with gkectl update on both admin and user clusters.

Breaking changes:

  • User cluster registration is now required and enforced. You must fill in the gkeConnect section of the user cluster configuration file before creating a new user cluster. You cannot upgrade a user cluster unless that cluster is registered. To unblock the cluster upgrade, add the gkeConnect section to the configuration file and run gkectl update cluster to register an existing 1.8 user cluster.

  • User clusters must be upgraded before the admin cluster. The flag --force-upgrade-admin to allow the old upgrade flow (admin cluster upgrade first) is no longer supported.

  • The following requirements are now enforced when you create a cluster that has logging and monitoring enabled.

    • The Config Monitoring for Ops API is enabled in your logging-monitoring project.
    • The Ops Config Monitoring Resource Metadata Writer role is granted to your logging-monitoring service account.
    • The URL opsconfigmonitoring.googleapis.com is added to your proxy allowlist (if applicable).
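
The registration requirement above means every new user cluster configuration file needs a populated gkeConnect section; a hypothetical excerpt with placeholder values:

```yaml
gkeConnect:
  projectID: "my-connect-project"                         # placeholder fleet host project
  registerServiceAccountKeyPath: "keys/register-sa.json"  # placeholder key path
```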

Changes:

  • There is now a checkpoint file for the admin cluster, located in the same datastore folder as the admin cluster data disk, with the name DATA_DISK_NAME-checkpoint.yaml, or DATA_DISK_NAME.yaml if the length of DATA_DISK_NAME is greater than the filename length limit. This file is required for future upgrades and should be considered as important as the admin cluster data disk.

    Note: If you have enabled VM encryption in vCenter, you must grant Cryptographer.Access permission to the vCenter credentials specified in your admin cluster configuration file, before trying to create or upgrade your admin cluster.

  • The admin cluster backup with gkectl preview feature introduced in 1.8 now allows updates to clusterBackup.datastore. This datastore may be different from vCenter.datastore so long as it is in the same datacenter as the cluster.

  • The Kubernetes 1.21 release includes the following metrics changes:

    • Added a new status field to storage_operation_duration_seconds, so that storage operation latency is reported for every status.
    • The storage metrics storage_operation_errors_total and storage_operation_status_count are marked deprecated. In both cases, the storage_operation_duration_seconds metric can be used to recover equivalent counts (using status=fail-unknown in the case of storage_operation_errors_total).

    • Renamed the metric etcd_object_counts to apiserver_storage_object_counts and marked it as stable. The original etcd_object_counts metric name is marked as deprecated and will be removed in the future.

  • A new GKE on-prem control plane uptime dashboard is introduced with a new metric, kubernetes.io/anthos/container/uptime, for component availability. The old GKE on-prem control plane status dashboard and the old kubernetes.io/anthos/up metric are deprecated. New alerts for admin cluster and user cluster control plane component availability, based on the new metric, replace the deprecated alerts.

  • You can now skip certain health checks performed by gkectl diagnose cluster with the --skip-validation-xxx flag.

Fixes:

  • Fixed the issue of gkeadm trying to set permissions for the component access service account when --auto-create-service-accounts=false.
  • Fixed the timeout issue for admin cluster creation or upgrade that was caused by high network latency to reach the container registry.
  • Fixed the gkectl create-config admin and gkectl create-config cluster panic issue in the 1.8.0-1.8.3 releases.
  • Fixed the /run/aide disk usage issue that was caused by the accumulated cron log for aide.

Restoring an admin cluster from a backup using gkectl repair admin-master --restore-from-backup fails when using a private registry. The issue will be resolved in a future release.

September 23, 2021

Anthos clusters on VMware 1.7.4-gke.2 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.7.4-gke.2 runs on Kubernetes v1.19.12-gke.2101.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.8, 1.7, and 1.6.

Fixes:

  • Fixed high-severity CVE-2021-3711.
  • Fixed CVE-2021-25741 mentioned in the GCP-2021-018 security bulletin.
  • Fixed the Istio security vulnerabilities listed in the GCP-2021-016 security bulletin.
  • Fixed the issue that gkeadm tries to set permissions for the component access service account when --auto-create-service-accounts=false.

September 21, 2021

Anthos clusters on VMware 1.8.3-gke.0 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.8.3-gke.0 runs on Kubernetes v1.20.9-gke.701.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.8, 1.7, and 1.6.

Fixes:

  • Fixed high-severity CVE-2021-3711.
  • Fixed CVE-2021-25741 mentioned in the GCP-2021-018 security bulletin.
  • Fixed the Istio security vulnerabilities listed in the GCP-2021-016 security bulletin.
  • Fixed the issue that gkeadm tries to set permissions for the component access service account when --auto-create-service-accounts=false.

In versions 1.8.0-1.8.3, the gkectl create-config admin/cluster command panics with the message panic: invalid version: "latest". As a workaround, use gkectl create-config admin/cluster --gke-on-prem-version=$DESIRED_CLUSTER_VERSION. Replace DESIRED_CLUSTER_VERSION with the desired version.

September 17, 2021

A security issue was discovered in Kubernetes, CVE-2021-25741, where a user may be able to create a container with subpath volume mounts to access files and directories outside of the volume, including on the host filesystem. For more information, see the GCP-2021-018 security bulletin.

September 16, 2021

Anthos clusters on VMware 1.6.5-gke.0 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.6.5-gke.0 runs on Kubernetes 1.18.20-gke.4501.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.8, 1.7, and 1.6.

Fixes:

September 03, 2021

Anthos clusters on VMware 1.7.3-gke.6 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.7.3-gke.6 runs on Kubernetes v1.19.12-gke.1100.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.8, 1.7, and 1.6.

Fixes:

  • Fixed the Ubuntu user password expiration issue. This is a required fix for customers running 1.7.2 or 1.7.3-gke.2. Either use the suggested workaround to fix this issue, or upgrade to get this fix.

  • Fixed the issue that the stackdriver-log-forwarder Pod was sometimes in a crash loop because of a fluent-bit segfault.

August 31, 2021

Anthos clusters on VMware 1.8.2-gke.11 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.8.2-gke.11 runs on Kubernetes 1.20.9-gke.701.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.8, 1.7, and 1.6.

Starting from version 1.8.2, Anthos clusters on VMware uses cert-manager instead of Istio Citadel for issuing TLS certificates used by metrics endpoints.

Fixes:

  • Fixed the Ubuntu user password expiration issue. This fix is required for all customers. Either use the suggested workaround to fix this issue, or upgrade to get this fix.
  • Enhanced the admin cluster upgrade logic to prevent the admin cluster state (that is, the admin master data disk) from being lost in those cases when the disk is renamed or migrated accidentally.
  • Fixed the issue that the GKE connect-register service account key is printed in the klog in 1.8.0 and 1.8.1 when users run gkectl update cluster to update the GKE connect spec, such as to register an existing user cluster.
  • Fixed issue where, when ESXi hosts were unavailable in the vCenter cluster (such as when disconnected from vCenter or in maintenance mode), the Cluster API controller and cluster health controllers would crash loop, and the gkectl diagnose cluster command would crash.
  • Fixed the issue that an admin cluster upgrade might be blocked indefinitely if admin node machines are upgraded before the new Cluster API controller is ready.
  • Fixed the issue that the onprem-user-cluster-controller might leak vCenter sessions over time.

  • Fixed the issue that the gateway IP was assigned to a Windows Pod, leaving the Pod without network connectivity.

  • Fixed CVE-2021-33909 and CVE-2021-33910 on Ubuntu and COS.

HPA with custom metrics doesn't work in version 1.8.2 due to the migration from Istio to cert-manager for the monitoring pipeline. Customers using the HPA custom metrics with the monitoring pipeline should wait for a future release that will include this fix.

August 09, 2021

Anthos clusters on VMware 1.7.3-gke.2 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.7.3-gke.2 runs on Kubernetes 1.19.12-gke.1100.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.8, 1.7, and 1.6.

Fixes:

  • These security vulnerabilities have been fixed: CVE-2021-3520, CVE-2021-33909, and CVE-2021-33910.

  • Fixed the issue that the /etc/cron.daily/aide script uses up all existing space in /run, causing a crash loop in Pods.

  • Fixed the issue that admin cluster upgrade may fail due to an expired front-proxy-client certificate on the admin cluster control plane node.

August 05, 2021

Anthos clusters on VMware 1.6.4-gke.7 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.6.4-gke.7 runs on Kubernetes 1.18.20-gke.2900.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.8, 1.7, and 1.6.

Fixes:

  • These security vulnerabilities have been fixed: CVE-2021-3520, CVE-2021-33909, and CVE-2021-33910.

  • Fixed the issue that admin cluster upgrade may fail due to an expired front-proxy-client certificate on the admin cluster control plane node.

July 22, 2021

Anthos clusters on VMware 1.8.1-gke.7 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.8.1-gke.7 runs on Kubernetes v1.20.8-gke.1500.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.8, 1.7, and 1.6.

Fixes:

  • The issue where the /etc/cron.daily/aide script uses up all existing space in /run, causing a crash loop in Pods, has been fixed. The files located under /run/aide/ are now cleaned up periodically.
  • Using gkectl upgrade loadbalancer to update some parameters of the Seesaw load balancer does not work in version 1.8.0, in either DHCP or IPAM mode. If your setup includes this configuration, do not upgrade to version 1.8.0; upgrade to version 1.8.1 or later instead. If you are already at version 1.8.0, upgrade to 1.8.1 before updating any parameters. See Upgrading Seesaw load balancer with version 1.8.0.
  • For Windows nodes, fixed an issue by automatically detecting the network interface name instead of hard-coding it, because the name can differ depending on the network adapter used in the base VM template.
  • Fixed an issue where gkectl prepare windows retried the VM shutdown while building a Windows VM template, which caused the command to hang for a long time.
  • Fixed an issue where snapshot.storage.k8s.io/v1 resources were rejected by the snapshot admission webhook.
  • The CVE-2021-3520 security vulnerability has been fixed. 

July 08, 2021

Anthos clusters on VMware 1.8.0-gke.25 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.8.0-gke.25 runs on Kubernetes v1.20.5-gke.1301.

Fixes:

Fixed CVE-2021-34824, which could expose private keys and certificates from Kubernetes secrets through the credentialName field when using Gateway or DestinationRule. This vulnerability affects all clusters created or upgraded with Anthos clusters on VMware version 1.8.0-gke.21. For more information, see the GCP-2021-012 security bulletin.

July 07, 2021

Anthos clusters on VMware 1.8.0-gke.25 is now available to resolve the following issue.

The Istio project recently disclosed a new security vulnerability, CVE-2021-34824, affecting Istio. Istio contains a remotely exploitable vulnerability where credentials specified in the credentialName field for Gateway or DestinationRule can be accessed from different namespaces.

For more information, see the GCP-2021-012 security bulletin.

June 28, 2021

Anthos clusters on VMware 1.8.0-gke.21 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.8.0-gke.21 runs on Kubernetes v1.20.5-gke.1301.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.8, 1.7, and 1.6.

Cluster lifecycle improvements:

You should no longer use gcloud to unregister a user cluster, because clusters are registered automatically. Instead, register existing user clusters by using gkectl update cluster. You can also use gkectl update cluster to consolidate out-of-band registration that was done using gcloud. For more information, see Cluster registration.

Platform enhancements:

  • Preview: Cluster autoscaling is now available in preview. With cluster autoscaling, you can horizontally scale node pools in proportion to workload demand. When demand is high, the cluster autoscaler adds nodes to the node pool. When demand is low, the cluster autoscaler removes nodes from the node pool, scaling back down to a minimum size that you designate. Cluster autoscaling can increase the availability of your workloads while controlling costs.

  • Preview: User cluster control-plane node and admin cluster add-on node auto sizing are now available in preview. The features can be enabled separately in user cluster or admin cluster configurations. When you enable user cluster control-plane node auto sizing, user cluster control-plane nodes are automatically resized in proportion to the number of node pool nodes in the given user cluster. When you enable admin cluster add-on node auto sizing, admin cluster add-on nodes are automatically resized in proportion to the number of nodes in the admin cluster.

  • Preview: Windows Server container support for Anthos clusters on VMware is now available in preview. This allows you to modernize and run your Windows-based apps more efficiently in your data centers without having to go through risky application rewrites. You can use Windows containers alongside Linux containers for your container workloads. The same experience and benefits that you have come to enjoy with Anthos clusters on VMware using Linux (application portability, consolidation, cost savings, and agility) can now be applied to Windows Server applications as well.

  • Preview: Admin cluster backup is now available in preview. With this feature enabled, admin cluster backups are automatically performed before and after user and admin cluster creation, update, and upgrade. A new gkectl backup admin command performs manual backup. Upon admin cluster storage failure, you can restore the admin cluster from a backup with the gkectl repair admin-cluster --restore-from-backup command.

Security enhancements:

  • The Ubuntu node image is qualified with the CIS (Center for Internet Security) L1/L2 Server Benchmark.

  • Generally available: Workload identity support is now generally available. For more information, see Fleet workload identity. The connect-agent service account key is no longer required during installation. The connect agent uses workload identity to authenticate to Google Cloud instead of an exported Google Cloud service account key.

  • You can now use gkectl to rotate system root CA certificates for user clusters.

  • You can now use gkectl to update vCenter CA certificates for both admin clusters and user clusters.

Network feature enhancements:

Preview: Egress NAT gateway is now available in preview. To be able to access off-cluster workloads, traffic originating within the cluster that is related to specific flows must have deterministic source IP addresses. Egress NAT gateway gives you fine-grained control over which traffic gets a deterministic source IP address, and then provides that address. The Egress NAT Gateway functionality is built on top of Dataplane V2.

Storage enhancements:

  • The Anthos vSphere CSI driver now supports both offline and online volume expansion for dynamically and statically created block volumes only.

    • Offline volume expansion is available in vSphere 7.0 and later. Online expansion is available in vSphere 7.0u2 and later.

    • The vSphere CSI driver StorageClass standard-rwo, which is installed in user clusters automatically, sets allowVolumeExpansion to true by default for newly created clusters running on vSphere 7.0 or later. You can use both online and offline expansion for volumes using this StorageClass.

  • The volume snapshot feature now supports v1 versions of VolumeSnapshot, VolumeSnapshotContent, and VolumeSnapshotClass objects. The v1beta1 versions are deprecated and will soon stop being served.
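As a sketch of what an expansion-enabled StorageClass looks like, the manifest below sets allowVolumeExpansion on a class modeled after standard-rwo. The exact bundled class may carry additional fields; the provisioner name follows the upstream vSphere CSI driver.

```shell
# Write a StorageClass manifest with volume expansion enabled; the field
# values are illustrative, modeled on the bundled standard-rwo class.
cat <<'EOF' > standard-rwo.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard-rwo
provisioner: csi.vsphere.vmware.com
allowVolumeExpansion: true
EOF

# With expansion on, resizing a PVC bound to this class is a matter of
# editing spec.resources.requests.storage; verify the flag first.
grep 'allowVolumeExpansion' standard-rwo.yaml
```

If allowVolumeExpansion is false or absent, PVC resize requests against the class are rejected.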

Simplify day-2 operations:

  • You can now use Anthos Identity Service (AIS) and OpenID Connect (OIDC) for authentication to admin clusters in addition to user clusters.

  • Preview: Anthos Identity Service can now resolve groups with Okta as identity provider. This allows administrators to write RBAC policy with Okta groups.

  • Preview: Anthos Identity service now supports LDAP authentication methods in addition to OIDC. You can use AIS with Microsoft Active Directory without the need for provisioning Active Directory Federation Services.

  • The Anthos metadata agent replaces the original metadata agent to collect and send Anthos metadata to Google Cloud Platform, so that Google Cloud Platform can use this metadata to build a better user interface for Anthos clusters. You must 1) enable the Config Monitoring for Ops API in your logging-monitoring project, 2) grant the Ops Config Monitoring Resource Metadata Writer role to your logging-monitoring service account, and 3) add opsconfigmonitoring.googleapis.com to your proxy allowlist (if applicable).

  • You can use gkectl diagnose snapshot --upload-to [GCS_BUCKET] --service-account-key-file [SA_KEY_FILE] to automatically upload snapshots to a Google Cloud Storage (GCS) bucket. The provided service account must have the roles/storage.admin IAM role enabled.

Functionality changes:

  • The admin cluster now uses containerd on all nodes, including the admin cluster control-plane node, admin cluster add-on nodes, and user cluster control-plane nodes. This applies to both new admin clusters and existing admin clusters upgraded from 1.7.x. On user cluster node pools, containerd is the default container runtime for new node pools, but existing node pools that are upgraded from 1.7.x will continue using Docker Engine. You can continue to use Docker Engine for a new node pool by setting its osImageType to ubuntu.

  • A new ubuntu_containerd OS image type is introduced. ubuntu_containerd uses an identical OS image as ubuntu, but the node is configured to use containerd as the container runtime instead. The ubuntu_containerd OS is used for new node pools by default, but existing node pools upgraded from 1.7.x continue using Docker Engine. Docker Engine support will be removed in Kubernetes 1.24, and you should start converting your node pools to ubuntu_containerd as soon as possible.

  • When installing or upgrading to 1.8.0-gke.21 on a vCenter with a vSphere version older than 6.7 Update 3, you may receive a notification. Note that vSphere versions older than 6.7 Update 3 will no longer be supported in Anthos clusters on VMware in an upcoming version.

  • The create-config Secret is removed in both the admin and the user clusters. If you previously relied on workarounds that modify the secret(s), contact Cloud Support for updates.

  • You can update the CPU and memory configuration for the user cluster control-plane node with gkectl update cluster.

  • You can configure the CPU and memory configurations for the admin control-plane node to non-default settings during admin cluster creation through the newly introduced admin cluster configuration fields.

  • Node auto repairs are throttled at the node pool level. The number of repairs per hour for a node pool is limited to the greater of 3 or 10% of the number of nodes in the node pool.

  • Starting from Kubernetes 1.20, timeouts on exec probes are honored, and default to one second if unspecified. If you have Pods using exec probes, ensure they can easily complete in one second or explicitly set an appropriate timeout. See Configure Probes for more details.

  • Starting from Kubernetes 1.20, Kubelet no longer creates the target_path for NodePublishVolume in accordance with the CSI spec. If you have self-managed CSI drivers deployed in your cluster, ensure they are idempotent and do any necessary mount creation/verification. See Kubernetes issue #88759 for details.

  • Non-deterministic treatment of objects with invalid ownerReferences was fixed in Kubernetes 1.20. You can run the kubectl-check-ownerreferences tool prior to upgrade to locate existing objects with invalid ownerReferences. The metadata.selfLink field, deprecated since Kubernetes 1.16, is no longer populated in Kubernetes 1.20. See Kubernetes issue #1164 for details.
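Among the changes above, the node-pool repair throttle is easy to sanity-check numerically. A small sketch (the helper below is illustrative, not part of gkectl; integer arithmetic for the 10% term is an assumption):

```shell
# Illustrative calculation of the per-hour repair cap for a node pool:
# the greater of 3 or 10% of the pool's node count.
repair_cap() {
  local nodes=$1
  local pct=$(( nodes / 10 ))
  if [ "$pct" -gt 3 ]; then echo "$pct"; else echo 3; fi
}

repair_cap 20   # 10% of 20 is 2, so the floor of 3 applies
repair_cap 80   # 10% of 80 is 8, so the cap is 8
```

Small pools are therefore always allowed up to 3 repairs per hour, while large pools scale with size.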

Breaking changes:

  • The Istio components have been upgraded to handle ingress support. Previously, using HTTPS for ingress required both an Istio Gateway and Kubernetes Ingress. With this release, the full ingress spec is natively supported. See Ingress migration to manage this upgrade for Istio components.

  • The Cloud Run for Anthos user cluster configuration option is no longer supported. Cloud Run for Anthos is now installed as part of registration with a fleet. This allows for configuring and upgrading Cloud Run separately from Anthos clusters on VMware. To upgrade to the newest version of Cloud Run for Anthos, see Installing Cloud Run for Anthos.

Fixes:

  • Previously, the admin cluster upgrade could be affected by the expired front-proxy-client certificate that persists in the data disk for the admin cluster control-plane node. Now the front-proxy-client certificate is renewed during an upgrade.

  • Fixed an issue where logs are sent to the parent project of the service account specified in the stackdriver.serviceAccountKeyPath field of your cluster configuration file while the value of stackdriver.projectID is ignored.

  • Fixed an issue that Calico-node Pods sometimes use an excessive amount of CPU in large-scale clusters.

The stackdriver-metadata-agent-cluster-level-* Pod might have logs that look like this:

reflector.go:131] third_party/golang/kubeclient/tools/cache/reflector.go:99: Failed to list *unstructured.Unstructured: the server could not find the requested resource

You can safely ignore these logs.

June 17, 2021

When you upgrade an unregistered Anthos cluster on VMware from a version earlier than 1.7.0 to a version 1.7.0 or later, you need to manually install and configure the Anthos Config Management operator. If you had previously installed Anthos Config Management, you need to re-install it. For details on how to do this, see Installing Anthos Config Management.

If you are using a private registry for software images, upgrading an Anthos cluster on VMware will always require special steps, described in Updating Anthos Config Management using a private registry. Upgrading from a version earlier than 1.7.0 to a version 1.7.0 or later additionally requires that you manually install and configure the Anthos Config Management operator as described in Installing Anthos Config Management.

June 08, 2021

Anthos clusters on VMware 1.5.4-gke.2 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.5.4-gke.2 runs on Kubernetes v1.17.9-gke.4400. The supported versions that offer the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.7, 1.6, and 1.5.

Fixes

These security vulnerabilities have been fixed:

Fixed CVE-2021-25735 mentioned in the GCP-2021-003 security bulletin, CVE-2021-31535, and other medium- and low-severity CVEs with available fixes.

June 07, 2021

Anthos clusters on VMware 1.6.3-gke.3 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.6.3-gke.3 runs on Kubernetes v1.18.18-gke.100. The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.7, 1.6, and 1.5.

Fixes

These security vulnerabilities have been fixed:

Fixed CVE-2021-25735 mentioned in the GCP-2021-003 security bulletin, CVE-2021-31535, and other medium- and low-severity CVEs with available fixes.

May 27, 2021

Anthos clusters on VMware 1.7.2-gke.2 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.7.2-gke.2 runs on Kubernetes 1.19.10-gke.1602.

The supported versions that offer the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.7, 1.6, and 1.5.

The Ubuntu node image shipped in version 1.7.2 is qualified with the CIS (Center for Internet Security) L1 Server Benchmark.

Fixes:

An admin cluster upgrade may fail due to an expired front-proxy-client certificate on the admin control plane node. Make sure that the certificate is not expired, and recreate it if needed. See: Renew an expired certificate.
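For the expiry check recommended above, openssl's -checkend flag reports whether a certificate is still valid. The sketch below generates a throwaway one-day certificate so the command is self-contained; the real front-proxy-client certificate lives on the admin control-plane node, and its path is not shown here.

```shell
# Demo of checking certificate expiry with openssl: a throwaway 1-day
# certificate stands in for the front-proxy-client certificate.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout /tmp/demo-key.pem -out /tmp/demo-cert.pem -subj "/CN=demo" 2>/dev/null

# -checkend 0 exits 0 if the certificate has not yet expired.
if openssl x509 -in /tmp/demo-cert.pem -noout -checkend 0 >/dev/null; then
  echo "certificate still valid"
fi
```

A larger -checkend value (in seconds) warns ahead of time, for example -checkend 604800 for one week.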

May 21, 2021

In Anthos clusters on VMware 1.7, logs are sent to the parent project of your logging-monitoring service account. That is, logs are sent to the parent project of the service account specified in the stackdriver.serviceAccountKeyPath field of your cluster configuration file. The value of stackdriver.projectID is ignored. This issue will be fixed in an upcoming release.

As a workaround, view logs in the parent project of your logging-monitoring service account.

May 20, 2021

In version 1.7.1, the stackdriver-log-forwarder consumes a significantly increasing amount of memory over time, and the logs show an excessive number of OAuth 2.0 token requests. Follow these steps to mitigate this issue.

May 11, 2021

A recently discovered vulnerability, CVE-2021-31920, affects Istio in respect to its authorization policies. Istio contains a remotely exploitable vulnerability where an HTTP request with multiple slashes or escaped slash characters can bypass Istio authorization policy when path-based authorization rules are used. While Anthos clusters on VMware uses an Istio Gateway object for network ingress traffic into clusters, authorization policies are not a supported or intended use case for Istio as part of the Anthos clusters on VMware prerequisites. For more details, refer to the Istio security bulletin.

May 06, 2021

The Envoy and Istio projects recently announced several new security vulnerabilities (CVE-2021-28683, CVE-2021-28682, and CVE-2021-29258) that could allow an attacker to crash Envoy.

For more information, see the GCP-2021-004 security bulletin.

May 05, 2021

Anthos clusters on VMware 1.7.1-gke.4 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.7.1-gke.4 runs on Kubernetes 1.19.7-gke.2400.

The supported versions that offer the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.7, 1.6, and 1.5.

If you upgrade the admin cluster before you upgrade the associated user clusters within the same minor version, such as from 1.7.0 to 1.7.1, the user cluster control-planes will be upgraded together with the admin cluster. This applies even if you use the flag --force-upgrade-admin. This behavior, in versions 1.7.0 and later, is different from versions 1.6 and earlier, and is expected behavior.

Fixes:

  • Fixed a bug so that the hardware version of a virtual machine is determined based on the ESXi host apiVersion instead of the host version. When the host ESXi apiVersion is at least 6.7U2, VMs with version vmx-15 are created. Also, the CSI preflight checks validate the ESXi host API version instead of the host version.

  • Fixed a bug so that if vSphereCSIDisabled is set to true, Container Storage Interface (CSI) preflight checks do not run when you execute commands such as gkectl check-config, gkectl create loadbalancer, or gkectl create cluster.

  • Fixed CVE-2021-3444, CVE-2021-3449, CVE-2021-3450, CVE-2021-3492, CVE-2021-3493, and CVE-2021-29154 on the Ubuntu operating system used by the admin workstation, cluster nodes, and Seesaw.

  • Fixed a bug where attempting to install or upgrade GKE on-prem 1.7.0 failed with an "/STSService/ 400 Bad Request" when the vCenter is installed with the external platform services controller. Installations where the vCenter server is a single appliance are not affected. Note that VMware deprecated the external platform services controller in 2018.

  • Fixed a bug where auto repair failed to trigger for unhealthy nodes if the cluster-health-controller was restarted while a previously issued repair was in progress.

  • Fixed a bug so that the command gkectl diagnose snapshot output includes the list of containers and the containerd daemon log on Container-Optimized OS (COS) nodes.

  • Fixed a bug that caused gkectl update admin to generate an InternalFields diff unexpectedly.

  • Fixed the issue where the stackdriver-log-forwarder Pod sometimes crash looped because of a fluent-bit segfault.

April 20, 2021

The Kubernetes project recently announced a new security vulnerability, CVE-2021-25735, that could allow node updates to bypass a Validating Admission Webhook. For more details, see the GCP-2021-003 security bulletin.

March 25, 2021

Anthos clusters on VMware 1.7.0-gke.16 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.7.0-gke.16 runs on Kubernetes 1.19.7-gke.2400.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting GKE On-Prem are 1.6, 1.5, and 1.4.

Cluster lifecycle improvements

  • The cluster upgrade process has changed. Instead of upgrading the admin cluster first, you can upgrade user clusters to the newer version without upgrading the admin cluster. The new flow, which requires upgrading gkeadm, allows you to preview new features before performing a full upgrade with the admin cluster. In addition, the 1.7.0 version of gkectl can perform operations on both 1.6.X and 1.7.0 clusters.

  • Starting with version 1.7.0, you can deploy Anthos clusters on vSphere 7.0 environments in addition to vSphere 6.5 and 6.7. Note that Anthos clusters on VMware will phase out vSphere 6.5 support following VMware's end-of-general-support timelines.

  • Published the minimum hardware resource requirements for a proof-of-concept cluster.

Platform enhancements

  • GA: Node auto repair is now generally available and enabled by default for newly created clusters. When the feature is enabled, cluster-health-controller performs periodic health checks, surfaces problems as events on cluster objects, and automatically repairs unhealthy nodes.

  • GA: vSphere resource metrics is now generally available and enabled by default for newly created clusters. When the feature is enabled, VM level resource contention metrics are collected and displayed in the VM health dashboards automatically created through out-of-the-box monitoring. You can use these dashboards to track VM resource contention issues.

  • GA: Dataplane V2 is now generally available and can be enabled in newly created clusters.

  • GA: Network Policy Logging is now generally available. Network policy logging is available only for clusters running Dataplane V2.

  • You can attach vSphere tags to user cluster node pools during cluster creation and update. You can use tags to organize and select VMs in vCenter.

Security enhancements:

  • Preview: You can run Container-Optimized OS on your user cluster worker nodes.

Simplify Day-2 operations:

  • GA: Support for vSphere folders is now generally available. This allows you to install Anthos clusters on VMware in a vSphere folder, reducing the scope of the permission required for the vSphere user.

  • A new gkectl update admin command supports updating certain admin cluster configurations including adding static IP addresses.

  • The central log aggregator component has been removed from the logging pipeline to improve reliability, scalability and resource usage.

  • Cluster scalability has been improved:

    • 50 user clusters per admin cluster

    • With Seesaw, 500 nodes, 15,000 Pods, and 500 LoadBalancer Services per user cluster

    • With F5 BIG-IP, 250 nodes, 7,500 Pods, and 250 LoadBalancer Services per user cluster

Anthos Config Management:

Anthos Config Management (ACM) is now decoupled from Anthos clusters on VMware. This provides multiple benefits including decoupling the ACM release cadence from Anthos clusters on VMware, simplifying the testing and qualification process, and providing a consistent installation and upgrade flow.

Storage enhancements:

GA: The vSphere CSI driver is now generally available. Your vCenter server and ESXi hosts must both be running 6.7 update 3 or newer. The preflight checks and gkectl diagnose cluster have been enhanced to cover the CSI prerequisites.

Functionality changes:

  • gkectl diagnose cluster now includes load balancing validation, covering F5, Seesaw, and manual mode.

  • gkectl diagnose snapshot now provides an HTML index file in the snapshot, and collects extra container information from the admin cluster control-plane node when the Kubernetes API server is inaccessible.

  • gkectl update admin has been updated to:

    • Enable or disable auto repair in the admin cluster
    • Add static IP addresses to the admin cluster
    • Enable/disable vSphere resource metrics in the admin cluster
  • gkectl update cluster has been enhanced to enable or disable vSphere resource metrics in a user cluster.

  • Because an allowlisted service account is no longer needed in the admin workstation configuration file, the gcp.whitelistedServiceAccountKeyPath field has been deprecated and a new gcp.componentAccessServiceAccountKeyPath field has been added. For consistency, the corresponding gcrKeyPath field in the admin cluster configuration file has also been renamed.

Breaking changes:

  • The following Google Cloud API endpoints must be allowlisted in network proxies and firewalls. These are now required for Connect Agent to authenticate to Google when the cluster is registered in Hub:

    • securetoken.googleapis.com
    • sts.googleapis.com
    • iamcredentials.googleapis.com
  • gkectl now accepts only v1 cluster configuration files. For instructions on converting your v0 configuration files, see Converting configuration files.

Fixes:

  • Fixed a bug where Grafana dashboards based on the container_cpu_usage_seconds_total metric show no data.

  • Fixed an issue where scheduling Stackdriver components on user cluster control-plane nodes caused resource contention issues.

  • Fixed Stackdriver Daemonsets to tolerate NoSchedule and NoExecute taints.

  • Fixed an HTTP/2 connection issue that sometimes caused problems with connections from the kubelet to the Kubernetes API server. This issue also could lead to nodes becoming not ready.

Known issues:

  • Calico-node Pods sometimes use an excessive amount of CPU in large-scale clusters. You can mitigate the issue by killing such Pods.

  • When running gkectl update admin against a cluster upgraded from 1.6, you might get the following diff:

    - InternalFields: nil,
    + InternalFields: map[string]string{"features.onprem.cluster.gke.io/bundle-vsphere-credentials": "enabled"},

    You can safely ignore this and proceed with the update.

February 26, 2021

Anthos clusters on VMware (GKE on-prem) 1.6.2-gke.0 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.6.2-gke.0 clusters run on Kubernetes 1.18.13-gke.400.

Fixed in 1.6.2-gke.0:

  • Fixed a kubelet restarting issue that was found when running workloads that rely on kubectl exec/port-forward/attach, such as Jenkins.

  • Fixed CVE-2021-3156 in the node operating system image. CVE-2021-3156 is described in Security bulletins.

GKE on-prem 1.4.5-gke.0 is now available. To upgrade, see Upgrading GKE on-prem. GKE on-prem 1.4.5-gke.0 clusters run on Kubernetes 1.16.11-gke.11.

Fixed in 1.4.5-gke.0:

January 27, 2021

Anthos clusters on VMware (GKE on-prem) 1.6.1-gke.1 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.6.1-gke.1 clusters run on Kubernetes 1.18.13-gke.400.

Fixes:

  • Fixed a bug where a user cluster upgrade was blocked when the vCenter resource pool was not specified either directly or indirectly in the configuration (that is, when the user cluster inherited the resource pool used by the admin cluster).
  • Fixed CVE-2020-15157 and CVE-2020-15257 in containerd.
  • Fixed an issue where upgrading the admin cluster from 1.5 to 1.6.0 breaks 1.5 user clusters that use any OIDC provider and that have no value for authentication.oidc.capath in the user cluster configuration file.

January 21, 2021

Anthos GKE on-prem 1.5.3-gke.0 is now available. To upgrade, see Upgrading GKE on-prem. GKE on-prem 1.5.3-gke.0 clusters run on Kubernetes 1.17.9-gke.4400.

Fixes:

  • Fixed CVE-2020-15157 and CVE-2020-15257 in containerd.

  • Cloud Run Operator is now able to successfully update custom resource definitions (CRDs).

December 10, 2020

Anthos clusters on VMware 1.6.0-gke.7 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.6.0-gke.7 clusters run on Kubernetes 1.18.6-gke.6600.

Note: The fully supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.6, 1.5, and 1.4.

Users can use a credential configuration file with gkeadm (credential.yaml), generated when you run the gkeadm create config command, to improve security by removing credentials from admin-ws-config.yaml.

Node Problem Detector and Node Auto Repair automatically detect and repair additional failures, such as Kubelet-API server connection loss (an OSS issue) and long-lasting DiskPressure conditions.

Preview: Repair administrator master VM failures by using the new command, gkectl repair admin-master.

Preview: Secrets Encryption for user clusters using Thales Luna Network HSM Devices.

Preview: Service Account Key Rotation in gkectl for Usage Metering, Cloud Audit Logs, and Google Cloud's operations suite service accounts.

Anthos Identity Service enables dynamic configuration changes for OpenID Connect (OIDC) configuration without needing to recreate user clusters.

Google Cloud's operations suite support for bundled Seesaw load balancing:

Metrics and logs of bundled Seesaw load balancers are now uploaded to Google Cloud through Google Cloud's operations suite to provide the best observability experience.

Cloud Audit Logs

Offline buffer for Cloud Audit Logs: Audit logs are now buffered on disk if not able to reach Cloud Audit Logs and can withstand at least 4 hours of network outage.

CSI volume snapshots

The CSI snapshot controllers are now automatically deployed in user clusters, enabling users to create snapshots of persistent volumes and restore the volumes' data by provisioning new volumes from those snapshots.

Functionality changes:

  • gkectl diagnose cluster and snapshot enhancements:

    • Added a --log-since flag to gkectl diagnose snapshot. Use it to limit the snapshot to container and node logs from a relative time window.

    • Replaced the --seed-config flag with the --config flag in the gkectl diagnose cluster command. Use this command with the seed configuration to rule out VIP issues and gather more cluster debugging information.

    • Added more validations in gkectl diagnose cluster.

  • Added iscsid support: Qualified storage drivers that previously required additional steps benefit from the default iscsi service deployment on the worker nodes.

  • On each cluster node, Anthos clusters on VMware now reserves 330 MiB + 5% of the node's memory capacity for operating system components and core Kubernetes components. This is an increase of 50 MiB. For more information see Resources available for your workloads.
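The reservation formula above works out as follows for common node sizes (the helper is illustrative; exact rounding behavior is an assumption):

```shell
# Reserved memory per node: 330 MiB plus 5% of the node's memory capacity.
reserved_mib() {
  local capacity_mib=$1
  echo $(( 330 + capacity_mib * 5 / 100 ))
}

reserved_mib 8192    # 330 + 409 = 739 MiB reserved on an 8 GiB node
reserved_mib 16384   # 330 + 819 = 1149 MiB reserved on a 16 GiB node
```

The remainder of the node's memory stays available for workloads.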

Breaking changes:

Fixes:

  • Security fix: Resolve credential file references when only a subset of credentials are specified by reference.

  • Fixed vSphere credential update when CSI storage is not enabled.

  • Fixed a bug in Fluent Bit in which the buffer for logs might fill up node disk space.

Known issues:

  • In 1.6.0, gkectl update reverts any manual edits to the clientconfig CR. We strongly suggest that you back up the clientconfig CR after every manual change.

  • kubectl describe csinode and gkectl diagnose snapshot might sometimes fail due to an upstream Kubernetes issue with dereferencing nil pointer fields.

  • The OIDC provider doesn't use the common CA by default. You must explicitly supply the CA certificate.

  • Upgrading the admin cluster from 1.5 to 1.6.0 breaks 1.5 user clusters that use any OIDC provider and that have no value for authentication.oidc.capath in the user cluster configuration file.

    To work around this issue, run the following script, using your OIDC provider address as the IDENTITY_PROVIDER, YOUR_OIDC_PROVIDER_ADDRESS in the following script:

    USER_CLUSTER_KUBECONFIG=usercluster-kubeconfig
    IDENTITY_PROVIDER=YOUR_OIDC_PROVIDER_ADDRESS

    openssl s_client -showcerts -verify 5 -connect $IDENTITY_PROVIDER:443 < /dev/null | awk '/BEGIN CERTIFICATE/,/END CERTIFICATE/{ if(/BEGIN CERTIFICATE/){i++}; out="tmpcert"i".pem"; print >out}'

    ROOT_CA_ISSUED_CERT=$(ls tmpcert*.pem | tail -1)
    ROOT_CA_CERT="/etc/ssl/certs/$(openssl x509 -in $ROOT_CA_ISSUED_CERT -noout -issuer_hash).0"

    cat tmpcert*.pem $ROOT_CA_CERT > certchain.pem
    CERT=$(echo $(base64 certchain.pem) | sed 's/ //g')
    rm tmpcert1.pem tmpcert2.pem

    kubectl --kubeconfig $USER_CLUSTER_KUBECONFIG patch clientconfig default -n kube-public --type json -p "[{ \"op\": \"replace\", \"path\": \"/spec/authentication/0/oidc/certificateAuthorityData\", \"value\":\"${CERT}\"}]"
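The echo/sed step in the script above only flattens the wrapped base64 output into a single whitespace-free line, which is the form the certificateAuthorityData field expects. A small self-contained demonstration of that transformation (the sample file is made up for illustration):

```shell
# Create ~100 bytes of sample data so the base64 output wraps across lines.
head -c 100 /dev/zero > /tmp/chain.pem

# Word-splitting via echo turns newlines into spaces; sed then removes them.
CERT=$(echo $(base64 /tmp/chain.pem) | sed 's/ //g')

echo "$CERT" | wc -l   # a single line
case "$CERT" in *' '*) echo "has spaces" ;; *) echo "no spaces" ;; esac  # prints: no spaces
```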

November 16, 2020

GKE on-prem 1.5.2-gke.3 is now available. To upgrade, see Upgrading GKE on-prem. GKE on-prem 1.5.2-gke.3 clusters run on Kubernetes 1.17.9-gke.4400.

GKE Data Plane V2 Preview is now available.

  • GKE Data Plane V2 is a new programmable data path that enables Google to offer new network security features like Network Policy Logging and Node Network Policy.

For information about enabling Dataplane V2, see User cluster configuration file. For information about Network Policy Logging, see Logging network policy events.

Binary Authorization for GKE on-prem 0.2.1 is now available.

  • Binary Authorization for GKE on-prem 0.2.1 adds a proxy side cache that caches AdmissionReview responses. This can improve the reliability of the webhook.

Fixes:

  • Fixed a false warning in the manual load balancing category of gkectl check-config for the admin cluster.
  • Updated Istio Ingress (Kubernetes) Custom Resource Definitions (CRDs) to use v1beta1.
  • Fixed an issue where a GKE on-prem upgrade from 1.4 to 1.5 gets stuck because Cloud Run for Anthos on-prem pods crash loop, causing an operational outage when Cloud Run for Anthos on-prem is enabled. The webhook is fixed; the custom resource definition (CRD) is not.

Known issues:

Cloud Run Operator is unable to update custom resource definitions (CRDs). Applying the CRDs manually either before or during the upgrade lets the operator continue the upgrade.

Workaround:

gsutil cat gs://gke-on-prem-release/hotfixes/1.5/cloudrun/crds.yaml | kubectl apply -f -

November 02, 2020

GKE on-prem 1.4.4-gke.1 is now available. To upgrade, see Upgrading GKE on-prem. GKE on-prem 1.4.4-gke.1 clusters run on Kubernetes 1.16.11-gke.11.

Fixes:

  • Updated Istio Ingress (Kubernetes) Custom Resource Definitions (CRDs) to use v1beta1.

GKE on-prem 1.3.5-gke.2 is now available. To upgrade, see Upgrading GKE on-prem. GKE on-prem 1.3.5-gke.2 clusters run on Kubernetes 1.15.12-gke.6400.

October 23, 2020

GKE on-prem 1.5.1-gke.8 is now available. To upgrade, see Upgrading GKE on-prem. GKE on-prem 1.5.1-gke.8 clusters run on Kubernetes 1.17.9-gke.4400.

Binary Authorization for GKE on-prem Preview is now available:

This release enables customers to generate credential configuration templates by using the gkectl create-config credential command.

Published the best practices for how to set up GKE on-prem components for high availability and how to recover from disasters.

Published the best practices for creating, configuring, and operating GKE on-prem clusters at large scale.

Known issues:

The version of Anthos Configuration Management included in the GKE on-prem release 1.5.1-gke.8 had initially referenced a version of the nomos image that had not been moved into the gcr.io/gke-on-prem-release repository, thus preventing a successful installation or upgrade of Anthos Configuration Management. This image has since been pushed to the repository to correct the issue for customers not using private registries. Customers using private registries will need to upgrade to 1.5.2 when it is available (scheduled for November 16, 2020) or manually copy the nomos:v1.5.1-rc.7 image into their private repository.

Fixes:

  • Fixed cluster creation issue when Cloud Run is enabled.
  • Fixed the false positive error in docker registry preflight check where REGISTRY_ADDRESS/NAMESPACE might be mistakenly used as the registry address to store the certs on a test VM, causing authentication errors.

September 24, 2020

GKE on-prem 1.5.0-gke.27 is now available. To upgrade, see Upgrading GKE on-prem. GKE on-prem 1.5.0-gke.27 clusters run on Kubernetes 1.17.9-gke.4400.

Improved upgrade and installation:

  • Preflight checks are now blocking with v1 configs for installation and upgrades. Users can use --skip-preflight-check-blocking to unblock the operation.
  • Added support for running gkeadm on macOS Catalina, v10.15.
  • Enabled installation and upgrade by using any Google Cloud–authenticated service account. This removes the need for allowlisting.
  • Improved security by adding support for using an external credential file in admin or user configuration. This enables customers to check in their cluster configuration files in source code repositories without exposing confidential credential information.

Improved HA and failure recovery.

Improved support for Day-2 operations:

  • The gkectl update cluster command is now generally available. Users can use it to change supported features in the user cluster configurations after cluster creation.
  • The gkectl update credentials command for vSphere and F5 credentials is now generally available.
  • Improves scalability: 20 user clusters per admin cluster, and per user cluster, 250 nodes, 7,500 pods, 500 load balancing services (using Seesaw), and 250 load balancing services (using F5).
  • Introduces vSphere CSI driver in preview.

Enhanced monitoring with Cloud Monitoring:

  • Introduces out-of-the-box alerts for critical cluster metrics and events in preview.
  • Out-of-the-box monitoring dashboards are automatically created during installation when Cloud Monitoring is enabled.
  • Lets users modify CPU or memory resource settings for Cloud Monitoring components.

Functionality changes:

  • Preflight check failures now block gkectl create loadbalancer for the bundled load balancer with Seesaw.
  • Adds a blocking preflight check for the anthos.googleapis.com API of a configured gkeConnect project.
  • Adds a blocking preflight check on proxy IP and service/pod CIDR overlapping.
  • Adds a non-blocking preflight check on cluster health before an admin or user cluster upgrade.
  • Updates the gkectl diagnose snapshot:
    • Fixes the all scenario to collect all supported Kubernetes resources for the target cluster.
    • Collects F5 load balancer information, including Virtual Server, Virtual Address, Pool, Node, and Monitor.
    • Collects vSphere information, including VM objects and their events based on the resource pool and the Datacenter, Cluster, Network, and Datastore objects that are associated with VMs.
  • Fixes the OIDC proxy configuration issue. Users no longer need to edit NO_PROXY env settings in the cluster configuration to include new node IPs.
  • Adds monitoring.dashboardEditor to the roles granted to the logging-monitoring service account during admin workstation creation with --auto-create-service-accounts.
  • Bundled load balancing with Seesaw switches to the IPVS maglev hashing algorithm, achieving stateless, seamless failover. There is no connection sync daemon anymore.
  • The hostconfig section of the ipBlock file can be specified directly in the cluster yaml file network section and has a streamlined format.

Breaking changes:

  • Starting with version 1.5, use gkectl update cluster to resize the worker nodes in user clusters and to add static IPs to user clusters, instead of using kubectl patch machinedeployment to resize the user cluster and kubectl edit cluster to add static IPs.
  • Starting with version 1.5, the gkectl log is saved in a single file instead of multiple files by log verbosity levels. By default, the gkectl log is saved in the /home/ubuntu/.config/gke-on-prem/logs directory with a symlink created under the ./logs directory for easy access. Users can use --log_dir or --log_file to change this default setting.
  • Starting with version 1.5, the gkeadm log is saved in a single file instead of multiple files by log verbosity levels. By default, the gkeadm log is saved under ./logs. Users can use --log_dir or --log_file to change this default setting.
  • In version 1.5 only, the etcd version is updated from 3.3 to 3.4, which means the etcd image becomes smaller for improved performance and security (distroless), and the admin and user cluster etcd restore process is changed.
  • In 1.5 and later releases, a new firewall rule needs to be enabled from admin cluster add-on nodes to vCenter server API port 443.

Fixes:

  • Fixed an issue that caused approximately 50 seconds of downtime for the user cluster API service during cluster upgrade or update.
  • Corrected the default log verbosity setting in gkectl and gkeadm help messages.

Known issues:

  • Due to a 1.17 Kubernetes issue, kube-apiserver and kube-scheduler don't expose kubernetes_build_info on the /metrics endpoint in the 1.5 release. Customers can use kubernetes_build_info from kube-controller-manager to get similar information, such as the Kubernetes major version, minor version, and build date.
  • Cloud Run for Anthos on-prem causes an operational outage of GKE on-prem when Cloud Run for Anthos on-prem is enabled in both installation and upgrade of GKE on-prem 1.5.0.

September 17, 2020

GKE on-prem 1.4.3-gke.3 is now available. To upgrade, see Upgrading GKE on-prem. GKE on-prem 1.4.3-gke.3 clusters run on Kubernetes 1.16.11-gke.11.

Fixes:

  • Fixed CVE-2020-14386 described in Security Bulletin.

  • The preflight check for hostname validation was too strict. Hostname validation now follows the RFC 1123 DNS subdomain definition.

  • There was an issue in the 1.4.0 and 1.4.2 releases where the node problem detector didn't start when the node restarted. This is fixed in this version.
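The RFC 1123 DNS subdomain rule mentioned in the hostname fix above can be sketched as a validation function (a hypothetical illustration, not the product's actual preflight check):

```shell
# RFC 1123 DNS label: lowercase alphanumerics and hyphens, starting and
# ending with an alphanumeric, at most 63 characters per label.
label='[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?'

# Whole name at most 253 characters; every dot-separated label must match.
is_rfc1123_subdomain() {
  [ "${#1}" -le 253 ] && printf '%s' "$1" | grep -Eq "^${label}(\.${label})*$"
}

is_rfc1123_subdomain "worker-node-1.example.com" && echo valid   # prints: valid
is_rfc1123_subdomain "Worker_Node" || echo invalid               # prints: invalid
```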

GKE on-prem 1.3.4-gke.3 is now available. To upgrade, see Upgrading GKE on-prem. GKE on-prem 1.3.4-gke.3 clusters run on Kubernetes 1.15.12-gke.15.

August 20, 2020

GKE on-prem 1.4.2-gke.3 is now available. To upgrade, see Upgrading GKE on-prem. GKE on-prem 1.4.2-gke.3 clusters run on Kubernetes 1.16.11-gke.11.

GPU support (beta solution in collaboration with Nvidia)

In partnership with Nvidia, users can now manually attach a GPU to a worker node VM to run GPU workloads. This requires using the open source Nvidia GPU operator.

Note: Manually attached GPUs do not persist through node lifecycle events. You must manually re-attach them. This is a beta solution and can be used for evaluation and proof of concept.

The Ubuntu image is upgraded to include the newest packages.

gkectl delete loadbalancer is updated to support the new version of configuration files for admin and user clusters.

Fixes:

  • Corrected several incorrect kubelet metric names collected by Prometheus.
  • Updated restarting machines process during admin cluster upgrade to make the upgrade process more resilient to transient connection issues.
  • Resolved a preflight check OS image validation error when using a non-default vSphere folder for cluster creation; the OS image template is expected to be in that folder.
  • Resolved a gkectl upgrade loadbalancer issue to avoid validating the upgraded SeesawGroup. This fix lets the existing SeesawGroup config be updated without negatively affecting the upgrade process.
  • Resolved an issue where ClientConfig CRD is deleted when the upgrade to the latest version is run multiple times.
  • Resolved a gkectl update credentials vsphere issue where the vsphere-metrics-exporter was using the old credentials even after updating the credentials.
  • Resolved an issue where the VIP preflight check reported a user cluster add-on load balancer IP false positive.
  • Fixed gkeadm updating config after upgrading on Windows, specifically for the gkeOnPremVersion and bundlePath fields.
  • Automatically mount the data disk after rebooting on admin workstations created using gkeadm 1.4.0 and later.
  • Reverted thin disk provisioning change for boot disks in 1.4.0 and 1.4.1 on all normal (excludes test VMs) cluster nodes.
  • Removed vCenter Server access check from user cluster nodes.

July 30, 2020

Anthos clusters on VMware 1.3.3-gke.0 is now available. To upgrade, see Upgrading GKE on-prem. GKE on-prem 1.3.3-gke.0 clusters run on Kubernetes 1.15.12-gke.9.

June 25, 2020

Anthos clusters on VMware 1.4.0-gke.13 is now available. To upgrade, see Upgrading GKE on-prem. GKE on-prem 1.4.0-gke.13 clusters run on Kubernetes 1.16.8-gke.6.

Updated to Kubernetes 1.16.

Simplified upgrade:

  • This release provides a simplified upgrade experience via the following changes:

    • Automatically migrate information from the previous version of admin workstation using gkeadm.
    • Extend preflight checks to better prepare for upgrades.
    • Support skip version upgrade to enable users to upgrade the cluster from any patch release of a minor release to any patch release of the next minor release. For more information about the detailed upgrade procedure and limitations, see upgrading GKE on-prem.
    • The alternate upgrade scenario for Common Vulnerabilities and Exposures has been deprecated. All upgrades starting with version 1.3.2 need to upgrade the entire admin workstation.
    • The bundled load balancer is now automatically upgraded during cluster upgrade.

Improved installation and cluster configuration:

  • The user cluster node pools feature is now generally available.
  • This release improves the installation experience via the following changes:

    • Supports gkeadm for Windows OS.
    • Introduces a standalone command for creating admin clusters.
  • Introduces a new version of configuration files to separate admin and user cluster configurations and commands. This is designed to provide a consistent user experience and better configuration management.

Improved disaster recovery capabilities:

  • This release provides enhanced disaster recovery functionality to support backup and restore HA user cluster with etcd.
  • This release also provides a manual process to recover a single etcd replica failure in a HA cluster without any data loss.

Enhanced monitoring with Cloud Monitoring (formerly Stackdriver):

  • This release provides better product monitoring and resource usage management via the following changes:

  • Ubuntu Image now conforms with PCI DSS, NIST Baseline High, and DoD SRG IL2 compliance configurations.

Functionality changes:

  • Enabled Horizontal Pod Autoscaler (HPA) for the Istio ingress gateway.
  • Removed ingress controller from admin cluster.
  • Consolidated sysctl configs with Google Kubernetes Engine.
  • Added etcd defrag pod in admin cluster and user cluster, which will be responsible for monitoring etcd's database size and defragmenting it as needed. This helps reclaim etcd database size and recover etcd when its disk space is exceeded.

Support for a vSphere folder (Preview):

  • This release allows customers to install GKE on-prem in a vSphere folder, reducing the scope of the permission required for the vSphere user.

Improved scale.

Fixes:

  • Fixed the issue of the user cluster's Kubernetes API server not being able to connect to kube-etcd after admin nodes and user cluster master reboot. In previous versions, kube-dns in admin clusters was configured through kubeadm. In 1.4, this configuration is moved from kubeadm to bundle, which enables deploying two kube-dns replicas on two admin nodes. As a result, a single admin node reboot/failure won't disrupt user cluster API access.
  • Fixed the issue that controllers such as calico-typha can't be scheduled on an admin cluster master node, when the admin cluster master node is under disk pressure.
  • Resolved pods failure with MatchNodeSelector on admin cluster master after node reboot or kubelet restart.
  • Tuned etcd quota limit settings based on the etcd data disk size and the settings in GKE Classic.

Known issues:

  • If a user cluster is created without any node pool named the same as the cluster, managing the node pools using gkectl update cluster would fail. To avoid this issue, when creating a user cluster, you need to name one node pool the same as the cluster.
  • The gkectl command might exit with panic when converting config from "/path/to/config.yaml" to v1 config files. When that occurs, you can resolve the issue by removing the unused bundled load balancer section ("loadbalancerconfig") in the config file.
  • When using gkeadm to upgrade an admin workstation on Windows, the info file filled out from this template needs to have the line endings converted to use Unix line endings (LF) instead of Windows line endings (CRLF). You can use Notepad++ to convert the line endings.
  • After upgrading an admin workstation with a static IP using gkeadm, you need to run ssh-keygen -R <admin-workstation-ip> to remove the IP from the known hosts, because the host identification changed after VM re-creation.
  • We have added Horizontal Pod Autoscaler for istio-ingress and istio-pilot deployments. HPA can scale up unnecessarily for istio-ingress and istio-pilot deployments during cluster upgrades. This happens because the metrics server is not able to report usage of some pods (newly created and terminating; for more information, see this Kubernetes issue). No actions are needed; scale down will happen five minutes after the upgrade finishes.
  • When running a preflight check for config.yaml that contains both admincluster and usercluster sections, the "data disk" check in the "user cluster vCenter" category might fail with the message: [FAILURE] Data Disk: Data disk is not in a folder. Use a data disk in a folder when using vSAN datastore. User clusters don't use data disks, and it's safe to ignore the failure.
  • When upgrading the admin cluster, the preflight check for the user cluster OS image validation will fail. The user cluster OS image is not used in this case, and it's safe to ignore the "User Cluster OS Image Exists" failure in this case.
  • A Calico-node pod might be stuck in an unready state after node IP changes. To resolve this issue, you need to delete any unready Calico-node pods.
  • The BIG-IP controller might fail to update F5 VIP after any admin cluster master IP changes. To resolve this, you need to use the admin cluster master node IP in kubeconfig and delete the bigip-controller pod from the admin master.
  • The stackdriver-prometheus-k8s pod could enter a crashloop after host failure. To resolve this, you need to remove any corrupted PersistentVolumes that the stackdriver-prometheus-k8s pod uses.
  • After node IP change, pods running with hostNetwork don't get podIP corrected until Kubelet restarts. To resolve this, you need to restart Kubelet or delete those pods using previous IPs.
  • An admin cluster fails after any admin cluster master node IP address changes. To avoid this, you should avoid changing the admin master IP address if possible by using a static IP or a non-expired DHCP lease instead. If you encounter this issue and need further assistance, please contact Google Support.
  • User cluster upgrade might be stuck with the error: Failed to update machine status: no matches for kind "Machine" in version "cluster.k8s.io/v1alpha1". To resolve this, you need to delete the clusterapi pod in the user cluster namespace in the admin cluster.

If your vSphere environment has fewer than three hosts, user cluster upgrade might fail. To resolve this, you need to disable antiAffinityGroups in the cluster config before upgrading the user cluster. For v1 config, please set antiAffinityGroups.enabled = false; for v0 config, please set usercluster.antiaffinitygroups.enabled = false.

Note: Disabling antiAffinityGroups in the cluster config during upgrade is only allowed for the 1.3.2 to 1.4.x upgrade to resolve the upgrade issue; the support might be removed in the future.
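For the v1 configuration mentioned above, disabling anti-affinity groups might look like the following fragment of the user cluster configuration file (a sketch showing only the relevant field):

```yaml
# Disable anti-affinity groups before upgrading a user cluster in a
# vSphere environment with fewer than three hosts (per the workaround above).
antiAffinityGroups:
  enabled: false
```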

May 21, 2020

Workload Identity is now available in Alpha for GKE on-prem. Please contact support if you are interested in a trial of Workload Identity in GKE on-prem.

Preflight check for VM internet and Docker Registry access validation is updated.

Preflight check for internet validation is updated to not follow redirect. If your organization requires outbound traffic to pass through a proxy server, you no longer need to allowlist the following addresses in your proxy server:

  • console.cloud.google.com
  • cloud.google.com

The Ubuntu image is upgraded to include the newest packages.

Upgraded the Istio image to version 1.4.7 to fix a security vulnerability.

Some ConfigMaps in the admin cluster were refactored to Secrets to allow for more granular access control of sensitive configuration data.

April 23, 2020

Preflight check in gkeadm for access to the Cloud Storage bucket that holds the admin workstation OVA.

Preflight check for internet access includes additional URL www.googleapis.com.

Preflight check for test VM DNS availability.

Preflight check for test VM NTP availability.

Preflight check for test VM F5 access.

Before downloading and creating VM templates from OVAs, GKE on-prem checks if the VM template already exists in vCenter.

Renamed the service accounts that gkeadm automatically creates.

OVA download displays download progress.

gkeadm prepopulates bundlepath in the seed config on the admin workstation.

Fixed failed Docker DNS resolution on the admin workstation at startup.

Admin workstation provisioned by gkeadm uses thin disk provisioning.

Improved user cluster Istio ingress gateway reliability.

Ubuntu image is upgraded to include newest packages.

Update the vCenter credentials for your clusters using the preview command gkectl update credentials vsphere.

The gkeadm configuration file, admin-ws-config.yaml, accepts paths that are prefixed with ~/ for the Certificate Authority (CA) certificate.

Test VMs wait until the network is ready before starting preflight checks.

Improve the error message in preflight check failure for F5 BIG-IP.

Skip VIP check in preflight check in manual load balancing mode.

Upgraded Calico to version 3.8.8 to fix several security vulnerabilities.

Upgraded F5 BIG-IP Controller Docker image to version 1.14.0 to fix a security vulnerability.

Fixed gkeadm admin workstation gcloud proxy username and password configuration.

Fixed the bug that was preventing gkectl check-config from automatically using the proxy that you set in your configuration file when running the full set of preflight validation checks with any GKE on-prem download image.

Fixed an admin workstation upgrade failure when the upgrade process was unable to retrieve SSH keys, which would cause a Golang segmentation fault.

April 01, 2020

When upgrading from version 1.2.2 to 1.3.0 by using the Bundle download in the alternate upgrade method, a timeout might occur that will cause your user cluster upgrade to fail. To avoid this issue, you must perform the full upgrade process that includes upgrading your admin workstation with the OVA file.

March 23, 2020

Anthos clusters on VMware 1.3.0-gke.16 is now available. To upgrade, see Upgrading GKE on-prem. GKE on-prem 1.3.0-gke.16 clusters run on Kubernetes 1.15.7-gke.32.

A new installer helps you create and prepare the admin workstation.

Support for vSAN datastore on your admin and user clusters.

In bundled load balancing mode, GKE on-prem provides and manages the Seesaw load balancer.

The Authentication Plugin for Anthos has been integrated into and replaced by the Google Cloud command-line interface, which improves the authentication process and provides the user consent flow through gcloud commands.

Added support for up to 100 nodes per user cluster.

The Cluster CA now signs the TLS certificates that the Kubelet API serves, and the TLS certificates are auto-rotated.

vSphere credential rotation is enabled. Users can now use Solution User Certificates to authenticate to GKE deployed on-prem.

gkectl automatically uses the proxy URL from config.yaml to configure the proxy on the admin workstation.

Preview Feature: Introducing User cluster Nodepools. A node pool is a group of nodes within a cluster that all have the same configuration. In GKE on-prem 1.3.0, node pools are a preview feature in the user clusters. This feature lets users create multiple node pools in a cluster, and update them as needed.

The metric kubelet_containers_per_pod_count is changed to a histogram metric.

Fixed an issue in the vSphere storage plugin that prevented vSphere storage policies from working.

Prometheus + Grafana: two graphs on the Machine dashboard don't work because of missing metrics: Disk Usage and Disk Available.

All OOM events for containers trigger a SystemOOM event, even if they are container/pod OOM events. To check whether an OOM is actually a SystemOOM, check the kernel log for a message oom-kill:…. If oom_memcg=/ (instead of oom_memcg=/kubepods/…), then it's a SystemOOM. If it's not a SystemOOM, it's safe to ignore.
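The kernel-log check described above can be sketched with grep (the sample log lines below are made up for illustration; on a real node you would read the output of `journalctl -k` or `dmesg`):

```shell
# Made-up sample kernel log lines for illustration.
cat > /tmp/kern-sample.log <<'EOF'
oom-kill:constraint=CONSTRAINT_NONE,oom_memcg=/,task_memcg=/system.slice,task=stress
oom-kill:constraint=CONSTRAINT_NONE,oom_memcg=/kubepods/burstable/pod42,task=app
EOF

# A true SystemOOM shows oom_memcg=/ (the root cgroup); container/pod OOMs
# show oom_memcg=/kubepods/... and are safe to ignore per the note above.
grep -c 'oom_memcg=/,' /tmp/kern-sample.log   # prints 1
```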

Affected versions: 1.3.0-gke.16

If you configured a proxy in the config.yaml and also used a bundle other than the full bundle (static IP | DHCP), you must append the --fast flag to run gkectl check-config. For example: gkectl check-config --config config.yaml --fast.

Running the 1.3 version of the gkectl diagnose command might fail if your clusters:

  • Are older than Anthos clusters on VMware version 1.3.
  • Include manually installed add-ons in the kube-system namespace.

February 21, 2020

GKE on-prem version 1.2.2-gke.2 is now available. To upgrade, see Upgrading GKE on-prem.

Improved gkectl check-config to validate Google Cloud service accounts regardless of whether an IAM role is set.

You need to use vSphere provider version 1.15 when using Terraform to create the admin workstation. vSphere provider version 1.16 introduces breaking changes that would affect all Anthos versions.

Skip the preflight check when resuming cluster creation/upgrade.

Resolved a known issue of cluster upgrade when using a vSAN datastore associated with a GKE on-prem version before 1.2.

Resolved the following warning when uploading an OS image with the enableMPTSupport configuration flag set. This flag is used to indicate whether the virtual video card supports mediated passthrough.

Warning: Line 102: Unable to parse 'enableMPTSupport' for attribute 'key' on element 'Config'.

Fixed the BigQuery API service name for the preflight check service requirements validation.

Fixed the preflight check to correctly validate the default resource pool in the case where the resourcepool field in the GKE on-prem configuration file is empty.

Fixed a comment about the workernode.replicas field in the GKE on-prem configuration file to say that the minimum number of worker nodes is three.

Fixed gkectl prepare to skip checking the data disk.

Fixed gkectl check-config so that it cleans up F5 BIG-IP resources on exit.

January 31, 2020

GKE on-prem version 1.2.1-gke.4 is now available. To upgrade, see Upgrading GKE on-prem.

This patch version includes the following changes:

Adds searchdomainsfordns field to static IPs host configuration file. searchdomainsfordns is an array of DNS search domains to use in the cluster. These domains are used as part of a domain search list.

Adds a preflight check that validates an NTP server is available.

gkectl check-config now automatically uploads GKE on-prem's node OS image to vSphere. You no longer need to run gkectl prepare before gkectl check-config.

Adds a --cleanup flag for gkectl check-config. The flag's default value is true.

Passing in --cleanup=false preserves the test VM and associated SSH keys that gkectl check-config creates for its preflight checks. Preserving the VM can be helpful for debugging.

Fixes a known issue from 1.2.0-gke.6 that prevented gkectl check-config from performing all of its validations against clusters in nested resource pools or the default resource pool.

Fixes an issue that caused F5 BIG-IP VIP validation to fail due to timing out. The timeout window for F5 BIG-IP VIP validation is now longer.

Fixes an issue that caused cluster upgrades to overwrite changes to add-on configurations.

Fixes the known issue from 1.2.0-gke.6 that affects routing updates due to the route reflector configuration.

January 28, 2020

Affected versions: 1.2.0-gke.6

In some cases, certain nodes in a user cluster fail to get routing updates from the route reflector. Consequently, Pods on a node might not be able to communicate with Pods on other nodes. One possible symptom is a kube-dns resolution error.

To work around this issue, follow these steps to create a BGPPeer object in your user cluster.

Save the following BGPPeer manifest as full-mesh.yaml:

apiVersion: crd.projectcalico.org/v1
kind: BGPPeer
metadata:
  name: full-mesh
spec:
  nodeSelector: "!has(route-reflector)"
  peerSelector: "!has(route-reflector)" 

Create the BGPPeer in your user cluster:

kubectl --kubeconfig [USER_CLUSTER_KUBECONFIG] apply -f full-mesh.yaml

Verify that the full-mesh BGPPeer was created:

kubectl --kubeconfig [USER_CLUSTER_KUBECONFIG] get bgppeer

The output shows full-mesh in the list of BGPPeers:

NAME            AGE
full-mesh       61s
gke-group-1     3d21h
...

This issue will be fixed in version 1.2.1.

January 03, 2020

Affected versions: 1.1.0-gke.6 and later

Starting with version 1.1.0-gke.6, the gkeconnect.proxy field is no longer in the GKE on-prem configuration file.

If you include gkeconnect.proxy in the configuration file, the gkectl check-config command can fail with this error:

[FAILURE] Config: Could not parse config file: error unmarshaling JSON: 
while decoding JSON: json: unknown field "proxy"

To correct this issue, remove gkeconnect.proxy from the configuration file.

In versions prior to 1.1.0-gke.6, the Connect Agent used the proxy server specified in gkeconnect.proxy. Starting with version 1.1.0-gke.6, the Connect Agent uses the proxy server specified in the global proxy field.

December 20, 2019

Warning: If you installed GKE on-prem versions before 1.2, and you use a vSAN datastore, you should contact Google Support before attempting an upgrade to 1.2.0-gke.6.

GKE on-prem version 1.2.0-gke.6 is now available. To upgrade, see Upgrading GKE on-prem.

This minor version includes the following changes:

The default Kubernetes version for cluster nodes is now version 1.14.7-gke.24 (previously 1.13.7-gke.20).

GKE on-prem now supports vSphere 6.7 Update 3. Read its release notes.

GKE on-prem now supports VMware NSX-T version 2.4.2.

Any user cluster, even your first user cluster, can now use a datastore that is separate from the admin cluster's datastore. If you specify a separate datastore for a user cluster, the user cluster nodes, PersistentVolumes (PVs) for the user cluster nodes, user control plane VMs, and PVs for the user control plane VMs all use the separate datastore.

Expanded preflight checks for validating your GKE on-prem configuration file before you create your clusters. These new checks can validate that your Google Cloud project, vSphere network, and other elements of your environment are correctly configured.

Published basic installation workflow. This workflow offers a simplified workflow for quickly installing GKE on-prem using static IPs.

Published guidelines for installing Container Storage Interface (CSI) drivers. CSI enables using storage devices not natively supported by Kubernetes.

Updated documentation for authenticating using OpenID Connect (OIDC) with the Anthos Plugin for Kubectl. GKE on-prem's OIDC integration is now generally available.

From the admin workstation, gcloud now requires that you log in to gcloud with a Google Cloud user account. The user account should have at least the Viewer IAM role in all Google Cloud projects associated with your clusters.

You can now create admin and user clusters separately from one another.

Fixes an issue that prevented resuming cluster creation for HA user clusters.

Affected versions: 1.1.0-gke.6, 1.2.0-gke.6

The stackdriver.proxyconfigsecretname field was removed in version 1.1.0-gke.6. GKE on-prem's preflight checks will return an error if the field is present in your configuration file.

To work around this, before you install or upgrade to 1.2.0-gke.6, delete the proxyconfigsecretname field from your configuration file.
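As an illustration, the line to delete looks like this in context. This is a hypothetical excerpt: the field name proxyconfigsecretname is from the note above, but the surrounding values are stand-ins, not taken from any real configuration.

```yaml
# Hypothetical excerpt of a pre-upgrade configuration file; values are stand-ins.
stackdriver:
  projectid: "my-project"
  proxyconfigsecretname: "my-proxy-secret"   # delete this line before upgrading
```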

Affected versions: 1.2.0-gke.6

In user clusters, Prometheus and Grafana get automatically disabled during upgrade. However, the configuration and metrics data are not lost. In admin clusters, Prometheus and Grafana stay enabled.

To work around this issue, after the upgrade, open monitoring-sample for editing and set enablePrometheus to true:

  1. Open monitoring-sample for editing:

    kubectl edit monitoring --kubeconfig [USER_CLUSTER_KUBECONFIG] \
        -n kube-system monitoring-sample

  2. Set the field enablePrometheus to true.

Affected versions: All versions

Before version 1.2.0-gke.6, a known issue prevents Stackdriver from updating its configuration after cluster upgrades. Stackdriver still references an old version, which prevents Stackdriver from receiving the latest features of its telemetry pipeline. This issue can make it difficult for Google Support to troubleshoot clusters.

After you upgrade clusters to 1.2.0-gke.6, run the following command against admin and user clusters:

kubectl --kubeconfig=[KUBECONFIG] \
-n kube-system --type=json patch stackdrivers stackdriver \
-p '[{"op":"remove","path":"/spec/version"}]'

where [KUBECONFIG] is the path to the cluster's kubeconfig file.

November 19, 2019

GKE On-Prem version 1.1.2-gke.0 is now available. To download version 1.1.2-gke.0's OVA, gkectl, and upgrade bundle, see Downloads. Then, see Upgrading admin workstation and Upgrading clusters.

This patch version includes the following changes:

New Features

Published Managing clusters.

Fixes

Fixed the known issue from November 5.

Fixed the known issue from November 8.

Known Issues

If you are running multiple data centers in vSphere, running gkectl diagnose cluster might return the following error, which you can safely ignore:

Checking storage...FAIL path '*' resolves to multiple datacenters

If you are running a vSAN datastore, running gkectl diagnose cluster might return the following error, which you can safely ignore:

PersistentVolume [NAME]: virtual disk "[[DATASTORE_NAME]] [PVC]" IS NOT attached to machine "[MACHINE_NAME]" but IS listed in the Node.Status

November 08, 2019

In GKE On-Prem version 1.1.1-gke.2, a known issue prevents creation of clusters configured to use a Docker registry. You configure a Docker registry by populating the GKE On-Prem configuration file's privateregistryconfig field. Cluster creation fails with an error such as Failed to create root cluster: could not create external client: could not create external control plane: docker run error: exit status 125

A fix is targeted for version 1.1.2. In the meantime, if you want to create a cluster configured to use a Docker registry, pass in the --skip-validation-docker flag to gkectl create cluster.

November 05, 2019

GKE On-Prem's configuration file has a field, vcenter.datadisk, which looks for a path to a virtual machine disk (VMDK) file. During installation, you choose a name for the VMDK. By default, GKE On-Prem creates a VMDK and saves it to the root of your vSphere datastore.

If you are using a vSAN datastore, you need to create a folder in the datastore in which to save the VMDK. You provide the full path to the field—for example, datadisk: gke-on-prem/datadisk.vmdk—and GKE On-Prem saves the VMDK in that folder.

When you create the folder, vSphere assigns the folder a universally unique identifier (UUID). Although you provide the folder path to the GKE On-Prem config, the vSphere API looks for the folder's UUID. Currently, this mismatch can cause cluster creation and upgrades to fail.

A fix is targeted for version 1.1.2. In the meantime, you need to provide the folder's UUID instead of the folder's path. Follow the workaround instructions currently available in the upgrading clusters and installation topics.

October 25, 2019

GKE On-Prem version 1.1.1-gke.2 is now available. To download version 1.1.1-gke.2's OVA, gkectl, and upgrade bundle, see Downloads. Then, see Upgrading admin workstation and Upgrading clusters.

This patch version includes the following changes:

New Features

Action required: This version raises the minimum gcloud version on the admin workstation to 256.0.0. You should upgrade your admin workstation, and then upgrade your clusters.

The open source CoreOS toolbox is now included in all GKE On-Prem cluster nodes. This suite of tools is useful for troubleshooting node issues. See Debugging node issues using toolbox.

Fixes

Fixed an issue that prevented clusters configured with OIDC from being upgraded.

Fixed CVE-2019-11253 described in Security bulletins.

Fixed an issue that caused cluster metrics to be lost due to a lost connection to Google Cloud. When a GKE On-Prem cluster's connection to Google Cloud is lost for a period of time, that cluster's metrics are now fully recovered.

Fixed an issue that caused ingestion of admin cluster metrics to be slower than ingesting user cluster metrics.

Known Issues

For user clusters that are using static IPs and a different network than their admin cluster: If you overwrite the user cluster's network configuration, the user control plane might not be able to start. This occurs because the control plane uses the user cluster's network but allocates its IP address and gateway from the admin cluster.

As a workaround, you can update each user control plane's MachineDeployment specification to use the correct network. Then, delete each user control plane Machine, causing the MachineDeployment to create new Machines:

  1. List MachineDeployments in the admin cluster

    kubectl get machinedeployments --kubeconfig [ADMIN_CLUSTER_KUBECONFIG]
    
  2. Update a user control plane MachineDeployment from your shell

    kubectl edit machinedeployment --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] [MACHINEDEPLOYMENT_NAME]
    
  3. List Machines in the admin cluster

    kubectl get machines --kubeconfig [ADMIN_CLUSTER_KUBECONFIG]
    
  4. Delete user control plane Machines in the admin cluster

    kubectl delete machines --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] [MACHINE_NAME]
    

September 26, 2019

GKE On-Prem version 1.1.0-gke.6 is now available. To download version 1.1.0-gke.6's gkectl and upgrade bundle, see Downloads. Then, see Upgrading clusters.

This minor version includes the following changes:

The default Kubernetes version for cluster nodes is now version 1.13.7-gke.20 (previously 1.12.7-gke.19).

Action required: As of version 1.1.0-gke.6, GKE On-Prem now creates vSphere Distributed Resource Scheduler (DRS) rules for your user cluster's nodes (vSphere VMs), causing them to be spread across at least three physical hosts in your datacenter.

This feature is enabled by default for all new and existing user clusters running version 1.1.0-gke.6.

The feature requires that your vSphere environment meet the following conditions:

  • VMware DRS must be enabled. VMware DRS requires vSphere Enterprise Plus license edition. To learn how to enable DRS, see Creating a DRS Cluster.
  • The vSphere user account provided in your GKE On-Prem configuration file's vcenter field must have the Host.Inventory.EditCluster permission.
  • There are at least three physical hosts available.

If you do not want to enable this feature for your existing user clusters—for example, if you don't have enough hosts to accommodate the feature—perform the following steps before you upgrade your user clusters:

  1. Open your existing GKE On-Prem configuration file.
  2. Under the usercluster specification, add the antiaffinitygroups field as described in the antiaffinitygroups documentation:

    usercluster:
      ...
      antiaffinitygroups:
        enabled: false

  3. Save the file.

  4. Use the configuration file to upgrade. Your clusters are upgraded, but the feature is not enabled.

You can now set the default storage class for your clusters.

You can now use Container Storage Interface (CSI) 1.0 as a storage class for your clusters.

You can now delete broken or unhealthy user clusters with gkectl delete cluster --force.

You can now diagnose node issues using the debug-toolbox container image.

You can now skip validations run by gkectl commands.

The tarball that gkectl diagnose snapshot creates now includes a log of the command's output by default.

Adds the gkectl diagnose snapshot flag --seed-config. When you pass the flag, it includes your clusters' GKE On-Prem configuration file in the tarball produced by snapshot.

The gkeplatformversion field has been removed from the GKE On-Prem configuration file. To specify a cluster's version, provide the version's bundle to the bundlepath field.

You need to add the vSphere permission, Host.Inventory.EditCluster, before you can use antiaffinitygroups.

You now specify a configuration file in gkectl diagnose snapshot by passing the --snapshot-config flag (previously --config). You also use this flag to capture your cluster's configuration file in the snapshot. See Diagnosing cluster issues.

gkectl diagnose commands now return an error if you provide a user cluster's kubeconfig, rather than an admin cluster's kubeconfig.

Cloud Console now notifies you when an upgrade is available for a registered user cluster.

A known issue prevents version 1.0.11, 1.0.1-gke.5, and 1.0.2-gke.3 clusters using OIDC from being upgraded to version 1.1. A fix is targeted for version 1.1.1. If you configured a version 1.0.11, 1.0.1-gke.5, or 1.0.2-gke.3 cluster with OIDC, you are not able to upgrade it. Create a version 1.1 cluster by following Installing GKE On-Prem.

August 22, 2019

GKE On-Prem version 1.0.2-gke.3 is now available. This patch release includes the following changes:

Seesaw is now supported for manual load balancing.

You can now specify a different vSphere network for admin and user clusters.

You can now delete user clusters using gkectl. See Deleting a user cluster.

gkectl diagnose snapshot now gets logs from the user cluster control planes.

GKE On-Prem OIDC specification has been updated with several new fields: kubectlredirecturl, scopes, extraparams, and usehttpproxy.

Calico updated to version 3.7.4.

Stackdriver Monitoring's system metrics prefix changed from external.googleapis.com/prometheus/ to kubernetes.io/anthos/. If you are tracking metrics or alerts, update your dashboards with the new prefix.

July 30, 2019

GKE On-Prem version 1.0.1-gke.5 is now available. This patch release includes the following changes:

New Features

Changes

gkectl check-config now also checks node IP availability if you are using static IPs.

gkectl prepare now checks if a VM exists and is marked as a template in vSphere before attempting to upload the VM's OVA image.

Adds support for specifying a vCenter cluster and a resource pool in that cluster.

Upgrades F5 BIG-IP controller to version 1.9.0.

Upgrades Istio ingress controller to version 1.2.2.

Fixes

Fixes registry data persistence issues with the admin workstation's Docker registry.

Fixes validation that checks whether a user cluster's name is already in use.

July 25, 2019

GKE On-Prem version 1.0.11 is now available.

June 17, 2019

GKE On-Prem is now generally available. Version 1.0.10 includes the following changes:

Upgrading from beta-1.4 to 1.0.10

Before upgrading your beta clusters to the first general availability version, perform the steps described in Installing GKE On-Prem, and review the following points:

  • If you are running a beta version before beta-1.4, be sure to upgrade to beta-1.4 first.

  • If your beta clusters are running their own L4 load balancers (not the default, F5 BIG-IP), you need to delete and recreate your clusters to run the latest GKE On-Prem version.

  • If your clusters were upgraded to beta-1.4 from beta-1.3, run the following command for each user cluster before upgrading:

    kubectl delete crd networkpolicies.crd.projectcalico.org

  • vCenter certificate verification is now required. (vsphereinsecure is no longer supported.) If you're upgrading your beta 1.4 clusters to 1.0.10, you need to provide a vCenter trusted root CA public certificate in the upgrade configuration file.

  • You need to upgrade all of your running clusters. For this upgrade to succeed, your clusters can't run in a mixed version state.

  • You need to upgrade your admin clusters to the latest version first, then upgrade your user clusters.

New Features

You can now enable Manual load balancing mode to configure an L4 load balancer. You can still choose to use the default load balancer, F5 BIG-IP.

GKE On-Prem's configuration-driven installation process has been updated. You now install declaratively using a single configuration file.

Adds gkectl create-config, which generates a configuration file for installing GKE On-Prem, upgrading existing clusters, and for creating additional user clusters in an existing installation. This replaces the installation wizard and create-config.yaml from previous versions. See the updated documentation for installing GKE On-Prem.

Adds gkectl check-config, which validates the GKE On-Prem configuration file. See the updated documentation for installing GKE On-Prem.

Adds an optional --validate-attestations flag to gkectl prepare. This flag verifies that the container images included in your admin workstation were built and signed by Google and are ready for deployment. See the updated documentation for installing GKE On-Prem.

Changes

Upgrades Kubernetes version to 1.12.7-gke.19. You can now upgrade your clusters to this version. You can no longer create clusters that run Kubernetes version 1.11.2-gke.19.

We recommend upgrading your admin cluster before you upgrade your user clusters.

Upgrades Istio ingress controller to version 1.1.7.

vCenter certificate verification is now required. (vsphereinsecure is no longer supported.) You provide the certificate in the GKE On-Prem configuration file's cacertpath field.

When a client calls the vCenter server, the vCenter server must prove its identity to the client by presenting a certificate. That certificate must be signed by a certificate authority (CA) and must not be self-signed.
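As an informal illustration (not from the product documentation), a self-signed certificate can be recognized by its subject matching its issuer. The sketch below generates a throwaway self-signed certificate to demonstrate the comparison; vcenter.pem is a stand-in file name, not a real vCenter certificate.

```shell
# Generate a throwaway self-signed certificate (stand-in for a vCenter cert).
openssl req -x509 -newkey rsa:2048 -nodes -keyout /dev/null \
    -subj "/CN=vcenter.example.local" -days 1 -out vcenter.pem 2>/dev/null
# A self-signed certificate has identical subject and issuer; a CA-signed
# certificate, as required here, does not.
openssl x509 -in vcenter.pem -noout -subject -issuer
```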

If you're upgrading your beta 1.4 clusters to 1.0.10, you need to provide a vCenter trusted root CA public certificate in the upgrade configuration file.

Known Issues

Upgrading clusters can cause disruption or downtime for workloads that use PodDisruptionBudgets (PDBs).

You might not be able to upgrade beta clusters that use the Manual load balancing mode to GKE On-Prem version 1.0.10. To upgrade and continue using your own load balancer with these clusters, you need to recreate the clusters.

May 24, 2019

GKE On-Prem beta version 1.4.7 is now available. This release includes the following changes:

New Features

In the gkectl diagnose snapshot command, the --admin-ssh-key-path parameter is now optional.

Changes

On May 8, 2019, we introduced a change to Connect, the service that enables you to interact with your GKE On-Prem clusters using Cloud Console. To use the new Connect agent, you must re-register your clusters with Cloud Console, or you must upgrade to GKE On-Prem beta-1.4.

Your GKE On-Prem clusters and the workloads running on them will continue to operate uninterrupted. However, your clusters will not be visible in Cloud Console until you re-register them or upgrade to beta-1.4.

Before you re-register or upgrade, make sure your service account has the gkehub.connect role. Also, if your service account has the old clusterregistry.connect role, it's a good idea to remove that role.

Grant your service account the gkehub.connect role:

gcloud projects add-iam-policy-binding [PROJECT_ID] \
    --member="serviceAccount:[SERVICE_ACCOUNT_NAME]@[PROJECT_ID].iam.gserviceaccount.com" \
    --role="roles/gkehub.connect"

If your service account has the old clusterregistry.connect role, remove the old role:

gcloud projects remove-iam-policy-binding [PROJECT_ID] \
    --member="serviceAccount:[SERVICE_ACCOUNT_NAME]@[PROJECT_ID].iam.gserviceaccount.com" \
    --role="roles/clusterregistry.connect"

Re-register your cluster, or upgrade to GKE On-Prem beta-1.4.

To re-register your cluster:

gcloud alpha container hub register-cluster [CLUSTER_NAME] \
    --context=[USER_CLUSTER_CONTEXT] \
    --service-account-key-file=[LOCAL_KEY_PATH] \
    --kubeconfig-file=[KUBECONFIG_PATH] \
    --project=[PROJECT_ID]

To upgrade to GKE On-Prem beta-1.4:

gkectl upgrade --kubeconfig [ADMIN_CLUSTER_KUBECONFIG]

Known Issues

There is an issue that prevents the Connect agent from being updated to the new version during an upgrade. To work around this issue, run the following command after you upgrade a cluster:

kubectl delete pod gke-connect-agent-install -n gke-connect

May 13, 2019

Known Issues

Clusters upgraded from version beta-1.2 to beta-1.3 might be affected by a known issue that damages the cluster's configuration file and prevents all future cluster upgrades.

You can resolve this issue by deleting and recreating clusters upgraded from beta-1.2 to beta-1.3.

To resolve the issue without deleting and recreating the cluster, you need to re-encode and apply each cluster's Secrets. Perform the following steps:

  1. Get the contents of the create-config Secrets stored in the admin cluster. This must be done for the create-config Secret in the kube-system namespace, and for the create-config Secrets in each user cluster's namespace:

    kubectl get secret create-config -n [USER_CLUSTER_NAME] -o jsonpath={.data.cfg} | base64 -d > [USER_CLUSTER_NAME]_create_secret.yaml

    For example:

    kubectl get secret create-config -n kube-system -o jsonpath={.data.cfg} | base64 -d > kube-system_create_secret.yaml

  2. For each user cluster, open the [USER_CLUSTER_NAME]_create_secret.yaml file in an editor.

    If the values for registerserviceaccountkey and connectserviceaccountkey are not REDACTED, no further action is required: the Secrets do not need to be re-encoded and written to the cluster.

  3. Open the original create_config.yaml file in another editor.

  4. In [USER_CLUSTER_NAME]_create_secret.yaml, replace the registerserviceaccountkey and connectserviceaccountkey values with the values from the original create_config.yaml file. Save the changed file.

  5. Repeat steps 2-4 for each [USER_CLUSTER_NAME]_create_secret.yaml, and for the kube-system_create_secret.yaml file.

  6. Base64-encode each [USER_CLUSTER_NAME]_create_secret.yaml file and the kube-system_create_secret.yaml file:

    cat [USER_CLUSTER_NAME]_create_secret.yaml | base64 > [USER_CLUSTER_NAME]_create_secret.b64

    cat kube-system_create_secret.yaml | base64 > kube-system_create_secret.b64

  7. Replace the data[cfg] field in each Secret in the cluster with the contents of the corresponding file:

    kubectl edit secret create-config -n [USER_CLUSTER_NAME]
      # kubectl edit opens the file in the shell's default text editor
      # Open `first-user-cluster_create_secret.b64` in another editor, and replace
      # the `cfg` value with the copied value
      # Make sure the copied string has no newlines in it
    
  8. Repeat step 7 for each [USER_CLUSTER_NAME]_create_secret.yaml Secret, and for the kube-system_create_secret.yaml Secret.

  9. To ensure that the update was successful, repeat step 1.
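The encoding in step 6 can be sketched as follows. This is an illustrative example: the file contents are stand-ins, and base64 -w0 assumes GNU coreutils. The -w0 option disables line wrapping, which helps satisfy the no-newlines note in step 7.

```shell
# Stand-in for an edited create-config file (the output of step 4).
printf 'registerserviceaccountkey: example-key\n' > first-user-cluster_create_secret.yaml
# Encode as a single base64 line so the pasted cfg value has no embedded newlines.
base64 -w0 < first-user-cluster_create_secret.yaml > first-user-cluster_create_secret.b64
# Sanity check: the encoded value must decode back to the original file.
base64 -d < first-user-cluster_create_secret.b64 | diff - first-user-cluster_create_secret.yaml
```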

May 07, 2019

GKE On-Prem beta version 1.4.1 is now available. This release includes the following changes:

New Features

In the gkectl diagnose snapshot command, the --admin-ssh-key-path parameter is now optional.

Changes

On May 8, 2019, we introduced a change to Connect, the service that enables you to interact with your GKE On-Prem clusters using Cloud Console. To use the new Connect agent, you must re-register your clusters with Cloud Console, or you must upgrade to GKE On-Prem beta-1.4.

Your GKE On-Prem clusters and the workloads running on them will continue to operate uninterrupted. However, your clusters will not be visible in Cloud Console until you re-register them or upgrade to beta-1.4.

Before you re-register or upgrade, make sure your service account has the gkehub.connect role. Also, if your service account has the old clusterregistry.connect role, it's a good idea to remove that role.

Grant your service account the gkehub.connect role:

gcloud projects add-iam-policy-binding [PROJECT_ID] \
    --member="serviceAccount:[SERVICE_ACCOUNT_NAME]@[PROJECT_ID].iam.gserviceaccount.com" \
    --role="roles/gkehub.connect"

If your service account has the old clusterregistry.connect role, remove the old role:

gcloud projects remove-iam-policy-binding [PROJECT_ID] \
    --member="serviceAccount:[SERVICE_ACCOUNT_NAME]@[PROJECT_ID].iam.gserviceaccount.com" \
    --role="roles/clusterregistry.connect"

Re-register your cluster, or upgrade to GKE On-Prem beta-1.4.

To re-register your cluster:

gcloud alpha container hub register-cluster [CLUSTER_NAME] \
    --context=[USER_CLUSTER_CONTEXT] \
    --service-account-key-file=[LOCAL_KEY_PATH] \
    --kubeconfig-file=[KUBECONFIG_PATH] \
    --project=[PROJECT_ID]

To upgrade to GKE On-Prem beta-1.4:

gkectl upgrade --kubeconfig [ADMIN_CLUSTER_KUBECONFIG]

Known Issues

There is an issue that prevents the Connect agent from being updated to the new version during an upgrade. To work around this issue, run the following command after you upgrade a cluster:

kubectl delete pod gke-connect-agent-install -n gke-connect

April 25, 2019

GKE On-Prem beta version 1.3.1 is now available. This release includes the following changes:

New Features

The gkectl diagnose snapshot command now has a --dry-run flag.

The gkectl diagnose snapshot command now supports four scenarios.

The gkectl diagnose snapshot command now supports regular expressions for specifying namespaces.

Changes

Istio 1.1 is now the default ingress controller. The ingress controller runs in the gke-system namespace for both admin and user clusters. This enables easier TLS management for Ingress. To enable ingress, or to re-enable ingress after an upgrade, follow the instructions under Enabling ingress.

The gkectl tool no longer uses Minikube and KVM for bootstrapping. This means you do not have to enable nested virtualization on your admin workstation VM.

Known Issues

GKE On-Prem's ingress controller uses Istio 1.1 with automatic Secret discovery. However, the node agent for Secret discovery may fail to get Secret updates after Secret deletion. So avoid deleting Secrets. If you must delete a Secret and Ingress TLS fails afterwards, manually restart the Ingress Pod in the gke-system namespace.

April 11, 2019

GKE On-Prem beta version 1.2.1 is now available. This release includes the following changes:

New Features

GKE On-Prem clusters now automatically connect back to Google using Connect.

You can now run up to three control planes per user cluster.

Changes

gkectl now validates vSphere and F5 BIG-IP credentials before creating clusters.

Known Issues

A regression causes gkectl diagnose snapshot commands to use the wrong SSH key, which prevents the command from collecting information from user clusters. As a workaround for support cases, you might need to SSH into individual user cluster nodes and manually gather data.

April 02, 2019

GKE On-Prem beta version 1.1.1 is now available. This release includes the following changes:

New Features

You now install GKE On-Prem with an Open Virtual Appliance (OVA), a pre-configured virtual machine image that includes several command-line interface tools. This change makes installations easier and removes a layer of virtualization. You no longer need to run gkectl inside a Docker container.

If you installed GKE On-Prem versions before beta-1.1.1, you should create a new admin workstation following the documented instructions. After you install the new admin workstation, copy over any SSH keys, configuration files, kubeconfigs, and any other files you need, from your previous workstation to the new one.

Added documentation for backing up and restoring clusters.

You can now configure authentication for clusters using OIDC and ADFS. To learn more, refer to Authenticating with OIDC and AD FS and Authentication.

Changes

You must now use an admin cluster's private key to run gkectl diagnose snapshot.

Added a configuration option during installation for deploying multi-master user clusters.

Connect documentation has been migrated.

Fixes

Fixed an issue where cluster networking could be interrupted when a node is removed unexpectedly.

Known Issues

GKE On-Prem's Configuration Management has been upgraded from version 0.11 to 0.13. Several components of the system have been renamed. You need to take some steps to clean up the previous versions' resources and install a new instance.

If you have an active instance of Configuration Management:

  1. Uninstall the instance:

    kubectl -n=nomos-system delete nomos --all

  2. Make sure that the instance's namespace has no resources:

    kubectl -n nomos-system get all

  3. Delete the namespace:

    kubectl delete ns nomos-system

  4. Delete the CRD:

    kubectl delete crd nomos.addons.sigs.k8s.io

  5. Delete all kube-system resources for the operator:

    kubectl -n kube-system delete all -l k8s-app=nomos-operator

If you don't have an active instance of Configuration Management:

  1. Delete the Configuration Management namespace:

    kubectl delete ns nomos-system

  2. Delete the CRD:

    kubectl delete crd nomos.addons.sigs.k8s.io

  3. Delete all kube-system resources for the operator:

    kubectl -n kube-system delete all -l k8s-app=nomos-operator

March 12, 2019

GKE On-Prem beta version 1.0.3 is now available. This release includes the following changes:

Fixes

Fixed an issue that caused Docker certificates to be saved to the wrong location.

March 04, 2019

GKE On-Prem beta version 1.0.2 is now available. This release includes the following changes:

New Features

You can now run gkectl version to check which version of gkectl you're running.

You can now upgrade user clusters to future beta versions.

Anthos Config Management version 0.11.6 is now available.

Stackdriver Logging is now enabled on each node. By default, the logging agent replicates logs to your GCP project for only control plane services, cluster API, vSphere controller, Calico, BIG-IP controller, Envoy proxy, Connect, Anthos Config Management, Prometheus and Grafana services, Istio control plane, and Docker. Application container logs are excluded by default, but can be optionally enabled.

Stackdriver Prometheus Sidecar captures metrics for the same components as the logging agent.

Kubernetes Network Policies are now supported.

Changes

You can now update IP blocks in the cluster specification to expand the IP range for a given cluster.

If clusters you installed during alpha were disconnected from Google after beta, you might need to connect them again. Refer to Registering a cluster.

Getting started has been updated with steps for activating your service account and running gkectl prepare.

gkectl diagnose snapshot now collects only configuration data and excludes logs. This tool is used to capture details of your environment before opening a support case.

Support for optional SNAT pool name configuration for F5 BIG-IP at cluster-creation time. You can use this to configure the --vs-snat-pool-name value on the F5 BIG-IP controller.

You now need to provide a VIP for add-ons that run in the admin cluster.

Fixes

Cluster resizing operations improved to prevent unintended node deletion.

February 07, 2019

GKE On-Prem alpha version 1.3 is now available. This release includes the following changes:

New Features

During installation, you can now provide YAML files with nodeip blocks to configure static IPAM.

Changes

You now need to provision a 100GB disk in vSphere Datastore. GKE On-Prem uses the disk to store some of its vital data, such as etcd. See Data center requirements.

You can now only provide lowercase hostnames to nodeip blocks.

GKE On-Prem now enforces unique names for user clusters.

Metrics endpoints and APIs that use Istio endpoints are now secured using mTLS and role-based access control.

External communication by Grafana is disabled.

Improvements to Prometheus and Alertmanager health-checking.

Prometheus now uses secured port for scraping metrics.

Several updates to Grafana dashboards.

Known Issues

If your vCenter user account uses a format like DOMAIN\USER, you might need to escape the backslash (DOMAIN\\USER). Be sure to do this when prompted to enter the user account during installation.
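As a small illustration of why the escaping matters (MYDOMAIN\admin is a stand-in account name, and the installation prompt itself may behave differently), a backslash that reaches a printf-style format string is consumed as an escape unless it is doubled:

```shell
# The backslash survives when the account name is passed as data (%s)...
printf '%s\n' 'MYDOMAIN\admin'
# ...but a backslash is consumed as an escape when it lands in a format
# string, so it must be doubled to print the same literal name.
printf 'MYDOMAIN\\admin\n'
```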

January 23, 2019

GKE On-Prem alpha version 1.2.1 is now available. This release includes the following changes:

New Features

You can now use gkectl to delete admin clusters.

Changes

gkectl diagnose snapshot commands now allow you to specify nodes while capturing snapshots of remote command results and files.

January 14, 2019

GKE On-Prem alpha version 1.1.2 is now available. This release includes the following changes:

New Features

You can now use the gkectl prepare command to pull and push GKE On-Prem's container images, which deprecates the populate_registry.sh script.

gkectl prepare now prompts you to enter information about your vSphere cluster and resource pool.

You can now use the gkectl create command to create and add user clusters to existing admin control planes by passing in an existing kubeconfig file when prompted during cluster creation.

You can now pass in an Ingress TLS Secret for admin and user clusters at cluster creation time. You will see the following new prompt:

Do you want to use TLS for Admin Control Plane/User Cluster ingress?

Providing the TLS Secret and certs allows gkectl to set up the Ingress TLS. HTTP is not automatically disabled with TLS installation.

Changes

GKE On-Prem now runs Kubernetes version 1.11.2-gke.19.

The default footprint for GKE On-Prem has changed:

  • Minimum memory requirement for user cluster nodes is now 8192M.

GKE On-Prem now runs minikube version 0.28.0.

GKE Policy Management has been upgraded to version 0.11.1.

gkectl no longer prompts you to provide a proxy configuration by default.

There are three new ConfigMap resources in the user cluster namespace: cluster-api-etcd-metrics-config, kube-etcd-metrics-config, and kube-apiserver-config. GKE On-Prem uses these files to quickly bootstrap the metrics proxy container.

kube-apiserver events now live in their own etcd. You can see kube-etcd-events in your user cluster's namespace.

Cluster API controllers now use leader election.

vSphere credentials are now pulled from credential files.

gkectl diagnose commands now work with both admin and user clusters.

gkectl diagnose snapshot can now take snapshots of remote files on the node, results of remote commands on the nodes, and Prometheus queries.

gkectl diagnose snapshot can now take snapshots in multiple parallel threads.

gkectl diagnose snapshot now allows you to specify words to be excluded from the snapshot results.

Fixes

Fixed issues with minikube caching that caused unexpected network calls.

Fixed an issue with pulling F5 BIG-IP credentials. Credentials are now read from a credentials file instead of using environment variables.

Known Issues

You might encounter the following govmomi warning when you run gkectl prepare:

Warning: Line 102: Unable to parse 'enableMPTSupport' for attribute 'key' on element 'Config'

Resizing user clusters can cause inadvertent node deletion or recreation.

PersistentVolumes can fail to mount, producing the error devicePath is empty. As a workaround, delete and re-create the associated PersistentVolumeClaim.
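The workaround can be performed with standard kubectl commands. The claim name below is a placeholder, and the commands are shown commented out because they require access to the affected cluster:

```shell
# Placeholder claim name for the PVC that fails to mount.
PVC=my-claim
# Save the claim's spec, delete the stuck claim, then re-create it
# (commented out here because it requires cluster access):
#   kubectl get pvc "$PVC" -o yaml > pvc-backup.yaml
#   kubectl delete pvc "$PVC"
#   kubectl apply -f pvc-backup.yaml
echo "re-create PVC $PVC after saving its spec"
```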

Resizing IPAM address blocks when using static IP allocation for nodes is not supported in alpha. As a workaround, consider allocating more IP addresses than you currently need.

On slow disks, VM creation can time out, causing deployments to fail. If this occurs, delete all resources and try again.

December 19, 2018

GKE On-Prem alpha 1.0.4 is now available. This release includes the following changes:

Fixes

The vulnerability caused by CVE-2018-1002105 has been patched.

November 30, 2018

GKE On-Prem alpha 1.0 is now available. The following changes are included in this release:

Changes

GKE On-Prem alpha 1.0 runs Kubernetes 1.11.

The default footprint for GKE On-Prem has changed:

  • The admin control plane runs three nodes, each using 4 CPUs and 16GB of memory.
  • The user control plane runs one node that uses 4 CPUs and 16GB of memory.
  • User clusters run a minimum of three nodes, each using 4 CPUs and 16GB of memory.

Support for a high-availability Prometheus setup.

Support for custom Alert Manager configuration.

Prometheus upgraded from 2.3.2 to 2.4.3.

Grafana upgraded from 5.0.4 to 5.3.4.

kube-state-metrics upgraded from 1.3.1 to 1.4.0.

Alert Manager upgraded from 0.14.0 to 0.15.2.

node_exporter upgraded from 0.15.2 to 0.16.0.

Fixes

The vulnerability caused by CVE-2018-1002103 has been patched.

Known Issues

PersistentVolumes can fail to mount, producing the error devicePath is empty. As a workaround, delete and re-create the associated PersistentVolumeClaim.

Resizing IPAM address blocks when using static IP allocation for nodes is not supported in alpha. As a workaround, consider allocating more IP addresses than you currently need.

GKE On-Prem alpha 1.0 does not yet pass all conformance tests.

Only one user cluster per admin cluster can be created. To create additional user clusters, create another admin cluster.

October 31, 2018

GKE On-Prem EAP 2.1 is now available. The following changes are included in this release:

Changes

When you create admin and user clusters at the same time, you can now reuse the admin cluster's F5 BIG-IP credentials to create the user cluster. Also, the CLI now requires that BIG-IP credentials be provided; this requirement cannot be skipped by using --dry-run.

F5 BIG-IP controller upgraded to use the latest OSS version, 1.7.0.

To improve stability on slow vSphere machines, the cluster machine creation timeout is now 15 minutes (previously five minutes).

October 17, 2018

GKE On-Prem EAP 2.0 is now available. The following changes are included in this release:

Changes

Support for GKE Connect.

Support for Monitoring.

Support for installation using private registries.

Support for front-ending the L7 load balancer as an L4 VIP on F5 BIG-IP.

Support for static IP allocation for nodes during cluster bootstrap.

Known Issues

Only one user cluster per admin cluster can be created. To create additional user clusters, create another admin cluster.

Cluster upgrades are not supported in EAP 2.0.

On slow disks, VM creation can time out, causing deployments to fail. If this occurs, delete all resources and try again.

As part of the cluster bootstrapping process, a short-lived minikube instance runs. The minikube version used is affected by security vulnerability CVE-2018-1002103.