Version 1.0. This version is no longer supported as outlined in the Anthos version support policy. For the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware (GKE on-prem), upgrade to a supported version. You can find the most recent version here.

Release notes

This page documents production updates to GKE On-Prem. You can periodically check this page for announcements about new or updated features, bug fixes, known issues, and deprecated functionality.

You can see the latest product updates for all of Google Cloud on the Google Cloud page, browse and filter all release notes in the Google Cloud Console, or you can programmatically access release notes in BigQuery.

To get the latest product updates delivered to you, add the URL of this page to your feed reader, or add the feed URL directly: https://cloud.google.com/feeds/gkeonprem-release-notes.xml

November 30, 2021

Anthos clusters on VMware 1.7.6-gke.6 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.7.6-gke.6 runs on Kubernetes v1.19.15-gke.1900.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.9, 1.8, and 1.7.

  • Fixed issue where special characters in the vSphere username are not properly escaped.
  • Alleviated the high CPU and memory usage by /etc/cron.daily/aide discussed in this issue.
  • Fixed issue where user cluster node is not synching time.
  • Fixed CVE-2021-41103. Because of Ubuntu PPA version pinning, this vulnerability may still be reported by certain vulnerability scanning tools, and appear as a false positive even though the underlying vulnerability has been patched.

November 29, 2021

Anthos clusters on VMware 1.8.5-gke.3 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.8.5-gke.3 runs on Kubernetes v1.20.9-gke.701.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.9, 1.8, and 1.7.

  • Fixed issue where special characters in the vSphere username are not properly escaped.
  • Alleviated the high CPU and memory usage by /etc/cron.daily/aide discussed in this issue.
  • Fixed issue where user cluster node is not synching time.
  • Fixed CVE-2021-41103. Because of Ubuntu PPA version pinning, this vulnerability may still be reported by certain vulnerability scanning tools, and appear as a false positive even though the underlying vulnerability has been patched.

November 18, 2021

Anthos clusters on VMware 1.9.2-gke.4 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.9.2-gke.4 runs on Kubernetes v1.21.5-gke.1200.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.9, 1.8, and 1.7.

With version 1.9.2, cert-manager is installed in the cert-manager namespace. Previously, for versions 1.8.2 to 1.9.1, cert-manager was installed in the kube-system namespace.

The cert-manager version is upgraded from 1.0.3 to 1.5.4.

If you already use any ClusterIssuer with a different cluster resource namespace from the default cert-manager namespace, follow these steps if you upgrade to version 1.9.2.

  • Manually copy the related certificates, secrets, or issuers to the cert-manager namespace to use the installed cert-manager after upgrading to 1.9.2.

  • If you need to use a different version of cert-manager, or if you need to install it in a different namespace, follow these instructions each time that you upgrade your cluster.
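
The namespace change described above can be summarized as a small lookup. The following Python sketch is illustrative only: the function name cert_manager_namespace is hypothetical, it assumes that releases later than 1.9.2 keep using the cert-manager namespace, and it returns None for releases earlier than 1.8.2, for which these notes do not describe a bundled cert-manager.

```python
from typing import Optional, Tuple

Version = Tuple[int, int, int]  # (major, minor, patch), for example (1, 9, 2)


def cert_manager_namespace(version: Version) -> Optional[str]:
    """Return the namespace that holds the bundled cert-manager.

    Based only on the ranges stated in these release notes:
    1.8.2 through 1.9.1 use kube-system, and 1.9.2 uses cert-manager.
    Assumption: versions later than 1.9.2 also use cert-manager.
    """
    if version < (1, 8, 2):
        # These notes describe no bundled cert-manager before 1.8.2.
        return None
    if version <= (1, 9, 1):
        return "kube-system"
    return "cert-manager"
```

A check like this could help decide whether certificates, secrets, or issuers need to be copied before an upgrade that crosses the 1.9.2 boundary.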

Fixes:

  • Fixed issue with cilium-operator not reconciling CiliumNode for Windows nodes when updating the cluster to add Windows node pools.
  • Fixed issue which could temporarily result in no healthy CoreDNS pods present during cluster operations.
  • Fixed issue where you cannot run gkectl upgrade loadbalancer on a user cluster seesaw load balancer.
  • Fixed issue where node_filesystem metrics report gives wrong size for /run.
  • Fixed CVE-2021-37159. Because of Ubuntu PPA version pinning, this vulnerability may still be reported as false positive by certain vulnerability scanning tools, even though the underlying vulnerability has been patched in the 1.9.2 release.
  • Fixed issue where user cluster node is not synching time.
  • Alleviated the high CPU and memory usage by /etc/cron.daily/aide discussed in this issue.

October 29, 2021

The security community recently disclosed a new security vulnerability CVE-2021-30465 found in runc that has the potential to allow full access to a node filesystem.

For more information, see the GCP-2021-011 security bulletin.

October 27, 2021

Anthos clusters on VMware 1.8.4-gke.1 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.8.4-gke.1 runs on Kubernetes v1.20.9-gke.701.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.9, 1.8, and 1.7.

Fixes for version 1.8.4:

  • Fixed high-severity CVE-2021-3711.
  • Fixed gkectl check-config failure when Anthos clusters are configured with a proxy whose url contains special characters.
  • Fixed "cert-manager" cainjector leader-election failure.

Known issue in version 1.8.4:

If you have already installed your own cert-manager in your cluster, read the suggested mitigation before upgrading to a version >=1.8.2 in order to avoid an installation conflict with the cert-manager deployed by Anthos clusters on VMware.

  • Installing your cert-manager with Apigee may also result in a conflict with the cert-manager deployed by Anthos clusters on VMware. To avoid this, read the suggested mitigation before upgrading to this version.

Anthos clusters on VMware 1.7.5-gke.0 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.7.5-gke.0 runs on Kubernetes v1.19.12-gke.2101.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.9, 1.8, and 1.7.

Fixes for version 1.7.5:

Fixed gkectl check-config failure when Anthos clusters are configured with a proxy whose url contains special characters.

October 21, 2021

A security issue was discovered in the Kubernetes ingress-nginx controller, CVE-2021-25742. Ingress-nginx custom snippets allow retrieval of ingress-nginx service account tokens and secrets across all namespaces. For more information, see the GCP-2021-024 security bulletin.

October 20, 2021

Anthos clusters on VMware 1.9.1-gke.6 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.9.1-gke.6 runs on Kubernetes v1.21.5-gke.400.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.9, 1.8, and 1.7.

  • In version 1.9.0, there was a known issue with restoring an admin cluster using a backup when using a private registry. That has been fixed in version 1.9.1.
  • Fixed gkectl check-config failure that occurs when Anthos clusters are configured with a proxy whose url contains special characters.
  • Fixed "cert-manager" cainjector leader-election failure.

If you have already installed your own cert-manager in your cluster, read the suggested mitigation before upgrading to a version >=1.8.2 in order to avoid an installation conflict with the cert-manager deployed by Anthos clusters on VMware.

  • Installing your cert-manager with Apigee may also result in a conflict with the cert-manager deployed by Anthos clusters on VMware. To avoid this, read the suggested mitigation before upgrading to this version.

October 04, 2021

A security vulnerability, CVE-2020-8561, has been discovered in Kubernetes where certain webhooks can be made to redirect kube-apiserver requests to private networks of that API server. For more information, see the GCP-2021-021 security bulletin.

September 29, 2021

Anthos clusters on VMware 1.9.0-gke.8 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.9.0-gke.8 runs on Kubernetes v1.21.4-gke.200.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.9, 1.8, and 1.7.

Features:

Cluster lifecycle improvements:

  • GA: You can register an admin cluster during its creation by filling in the gkeConnect section in the admin cluster configuration file, similar to user cluster registration.

Platform enhancements:

  • Preview: User clusters can now be in a different vSphere datacenter from the admin cluster, resulting in datacenter isolation between the admin cluster and user clusters. This provides greater resiliency in the case of vSphere environment failures.

  • GA: Support for Windows node pools is generally available. This release adds:

    • Preview: Windows DataplaneV2 support, which allows for using Windows Network Policy
    • Node Problem Detector (NPD) support on Windows
    • Streamlined process for preparing Windows images in a private registry
    • Enhanced Flannel CNI support on Windows

    The upstream fixes for the "Windows Pod stuck at terminating status" error are also applied to this release, which improves the stability of running Windows workloads.

  • GA: Support for Container-Optimized OS (COS) node pools is generally available.

  • GA: CoreDNS is now the cluster DNS provider.

    • Clusters that are upgraded to 1.9 will have their KubeDNS provider replaced with CoreDNS. During the upgrade, CoreDNS is first deployed and then KubeDNS is removed, so applications should not observe DNS unavailability. However, before upgrading, ensure that your cluster has enough additional resources to deploy CoreDNS. CoreDNS requires 100 millicpu and 170 MiB of memory per instance, all clusters require a minimum of 2 instances, and there is an additional instance deployed for every 16 nodes in the cluster.
    • You can configure cluster DNS options such as upstream name servers by using the new ClusterDNS custom resource.
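
The CoreDNS sizing guidance above can be turned into a rough headroom estimate. The following Python sketch is not an official formula: the function name coredns_headroom is hypothetical, and it assumes the conservative reading that one extra instance is added on top of the minimum of 2 for each full group of 16 nodes. The actual scaling rule may produce fewer instances.

```python
CPU_MILLI_PER_INSTANCE = 100    # millicpu per CoreDNS instance, from the release note
MEMORY_MIB_PER_INSTANCE = 170   # MiB per CoreDNS instance, from the release note
MIN_INSTANCES = 2               # minimum instances per cluster, from the release note
NODES_PER_EXTRA_INSTANCE = 16   # one additional instance per 16 nodes, from the release note


def coredns_headroom(node_count: int) -> dict:
    """Estimate the extra capacity to reserve for CoreDNS before upgrading.

    Assumption: instances = 2 + floor(node_count / 16). This is a
    conservative interpretation of the release note, not a documented formula.
    """
    if node_count < 0:
        raise ValueError("node_count must not be negative")
    instances = MIN_INSTANCES + node_count // NODES_PER_EXTRA_INSTANCE
    return {
        "instances": instances,
        "cpu_millicores": instances * CPU_MILLI_PER_INSTANCE,
        "memory_mib": instances * MEMORY_MIB_PER_INSTANCE,
    }
```

Under this reading, a 40-node cluster would need room for 4 instances, or 400 millicpu and 680 MiB of memory in total.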

Security enhancements:

  • GA: Always-on secrets encryption: You can enable secrets encryption with internally generated keys instead of a hardware security module (HSM). Use the gkectl update command to rotate these keys or to enable or disable secrets encryption after cluster creation.
  • Preview: Windows network policy support. This release introduces a new network plugin, Antrea, for Windows nodes. In addition to network connectivity and services support, it provides network policy support. When creating a user cluster, you can set enableWindowsDataplaneV2 to true to enable this feature. Enabling this feature replaces Flannel with Antrea on Windows nodes.
  • Preview: Azure AD group support for Authentication: This feature allows cluster admins to configure RBAC policies based on Azure AD groups for authorization in clusters. This supports retrieval of groups information for users belonging to more than 200 groups, thus overcoming a limitation of regular OIDC configured with Azure AD as the identity provider.

Simplify day-2 operations:

  • Preview: When creating a user cluster, you can set enableVMTracking in the configuration file to true to enable vSphere tag creation and attachment to the VMs in the user cluster. This allows easy mapping of VMs to clusters and node pools. See Enable VM tracking.
  • GA: New metrics agents based on OpenTelemetry are introduced to improve reliability, scalability, and resource usage.
  • Preview: You can enable or disable Stackdriver with gkectl update on existing user clusters. You can enable or disable cloud audit logging and monitoring with gkectl update on both admin and user clusters.

Breaking changes:

  • User cluster registration is now required and enforced. You must fill in the gkeConnect section of the user cluster configuration file before creating a new user cluster. You cannot upgrade a user cluster unless that cluster is registered. To unblock the cluster upgrade, add the gkeConnect section to the configuration file and run gkectl update cluster to register an existing 1.8 user cluster.

  • User clusters must be upgraded before the admin cluster. The flag --force-upgrade-admin to allow the old upgrade flow (admin cluster upgrade first) is no longer supported.

  • The following requirements are now enforced when you create a cluster that has logging and monitoring enabled.

    • The Config Monitoring for Ops API is enabled in your logging-monitoring project.
    • The Ops Config Monitoring Resource Metadata Writer role is granted to your logging-monitoring service account.
    • The URL opsconfigmonitoring.googleapis.com is added to your proxy allowlist (if applicable).

Changes:

  • There is now a checkpoint file for the admin cluster, located in the same datastore folder as the admin cluster data disk, with the name DATA_DISK_NAME-checkpoint.yaml, or DATA_DISK_NAME.yaml if the length of DATA_DISK_NAME is greater than the filename length limit. This file is required for future upgrades and should be considered as important as the admin cluster data disk.

    Note: If you have enabled VM encryption in vCenter, you must grant Cryptographer.Access permission to the vCenter credentials specified in your admin cluster configuration file, before trying to create or upgrade your admin cluster.

  • The admin cluster backup with gkectl preview feature introduced in 1.8 now allows updates to clusterBackup.datastore. This datastore may be different from vCenter.datastore so long as it is in the same datacenter as the cluster.

  • The k8s 1.21 release includes the following metrics changes:

    • A new status field is added to storage_operation_duration_seconds, so that you can see storage operation latency for every status.
    • The storage metrics storage_operation_errors_total and storage_operation_status_count are marked deprecated. In both cases, the storage_operation_duration_seconds metric can be used to recover equivalent counts (using status=fail-unknown in the case of storage_operation_errors_total).

    • The metric etcd_object_counts is renamed to apiserver_storage_object_counts and marked as stable. The original etcd_object_counts metric name is marked as "Deprecated" and will be removed in the future.

  • A new GKE on-prem control plane uptime dashboard is introduced with a new metric, kubernetes.io/anthos/container/uptime, for component availability. The old GKE on-prem control plane status dashboard and old kubernetes.io/anthos/up metric are deprecated. New alerts for admin cluster control plane components availability and user cluster control plane components availability are introduced with a new kubernetes.io/anthos/container/uptime metric to replace deprecated alerts and the old kubernetes.io/anthos/up metric.

  • You can now skip certain health checks performed by gkectl diagnose cluster with the --skip-validation-xxx flag.

Fixes:

  • Fixed the issue of gkeadm trying to set permissions for the component access service account when --auto-create-service-accounts=false.
  • Fixed the timeout issue for admin cluster creation or upgrade that was caused by high network latency to reach the container registry.
  • Fixed the gkectl create-config admin and gkectl create-config cluster panic issue in the 1.8.0-1.8.3 releases.
  • Fixed the /run/aide disk usage issue that was caused by the accumulated cron log for aide.

Restoring an admin cluster from a backup using gkectl repair admin-master --restore-from-backup fails when using a private registry. The issue will be resolved in a future release.

September 23, 2021

Anthos clusters on VMware 1.7.4-gke.2 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.7.4-gke.2 runs on Kubernetes v1.19.12-gke.2101.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.8, 1.7, and 1.6.

Fixes:

  • Fixed high-severity CVE-2021-3711.
  • Fixed CVE-2021-25741 mentioned in the GCP-2021-018 security bulletin.
  • Fixed the Istio security vulnerabilities listed in the GCP-2021-016 security bulletin.
  • Fixed the issue that gkeadm tries to set permissions for the component access service account when --auto-create-service-accounts=false.

September 21, 2021

Anthos clusters on VMware 1.8.3-gke.0 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.8.3-gke.0 runs on Kubernetes v1.20.9-gke.701.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.8, 1.7, and 1.6.

Fixes:

  • Fixed high-severity CVE-2021-3711.
  • Fixed CVE-2021-25741 mentioned in the GCP-2021-018 security bulletin.
  • Fixed the Istio security vulnerabilities listed in the GCP-2021-016 security bulletin.
  • Fixed the issue that gkeadm tries to set permissions for the component access service account when --auto-create-service-accounts=false.

In versions 1.8.0-1.8.3, the gkectl create-config admin/cluster command panics with the message panic: invalid version: "latest". As a workaround, use gkectl create-config admin/cluster --gke-on-prem-version=$DESIRED_CLUSTER_VERSION. Replace DESIRED_CLUSTER_VERSION with the desired version.
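
The workaround applies to both forms of the command, gkectl create-config admin and gkectl create-config cluster. The following Python sketch builds the argument list for either form. The helper name create_config_command is hypothetical; only the command and the --gke-on-prem-version flag come from the note above, and the version value is left as a placeholder.

```python
from typing import List


def create_config_command(target: str, version: str) -> List[str]:
    """Build the workaround command line for gkectl create-config.

    target is "admin" or "cluster", matching the two subcommands named in
    the release note. version is the desired cluster version string.
    """
    if target not in ("admin", "cluster"):
        raise ValueError('target must be "admin" or "cluster"')
    return ["gkectl", "create-config", target, f"--gke-on-prem-version={version}"]


# Example: the argument list for the user cluster form, with a placeholder version.
example = create_config_command("cluster", "DESIRED_CLUSTER_VERSION")
```

The resulting list can be passed to subprocess.run, or joined with spaces to produce the command shown in the note.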

September 17, 2021

A security issue was discovered in Kubernetes, CVE-2021-25741, where a user may be able to create a container with subpath volume mounts to access files and directories outside of the volume, including on the host filesystem. For more information, see the GCP-2021-018 security bulletin.

September 16, 2021

Anthos clusters on VMware 1.6.5-gke.0 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.6.5-gke.0 runs on Kubernetes 1.18.20-gke.4501.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.8, 1.7, and 1.6.

Fixes:

September 03, 2021

Anthos clusters on VMware 1.7.3-gke.6 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.7.3-gke.6 runs on Kubernetes v1.19.12-gke.1100.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.8, 1.7, and 1.6.

Fixes:

  • Fixed the Ubuntu user password expiration issue. This is a required fix for customers running 1.7.2 or 1.7.3-gke.2. Either use the suggested workaround to fix this issue, or upgrade to get this fix.

  • Fixed the issue that the stackdriver-log-forwarder pod was sometimes in crashloop because of fluent-bit segfault.

August 31, 2021

Anthos clusters on VMware 1.8.2-gke.11 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.8.2-gke.11 runs on Kubernetes 1.20.9-gke.701.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.8, 1.7, and 1.6.

Starting from version 1.8.2, Anthos clusters on VMware uses cert-manager instead of Istio Citadel for issuing TLS certificates used by metrics endpoints.

Fixes:

  • Fixed the Ubuntu user password expiration issue. You must get this fix. Either use the suggested workaround to fix this issue, or upgrade to get this fix.
  • Enhanced the admin cluster upgrade logic to prevent the admin cluster state (that is, the admin master data disk) from being lost in those cases when the disk is renamed or migrated accidentally.
  • Fixed the issue that the GKE connect-register service account key is printed in the klog in 1.8.0 and 1.8.1 when users run gkectl update cluster to update the GKE connect spec, such as to register an existing user cluster.
  • Fixed issue that when ESXi hosts were unavailable in the vCenter cluster (such as when disconnected from vCenter or in maintenance mode), the Cluster API controller and cluster health controllers would crash loop, and the gkectl diagnose cluster command would crash.
  • Fixed the issue that an admin cluster upgrade might be blocked indefinitely if admin node machines are upgraded before the new Cluster API controller is ready.
  • Fixed the issue that the onprem-user-cluster-controller might leak vCenter sessions over time.

  • Fixed the issue that the gateway IP was assigned to a Windows Pod, which made it unable to have network connectivity.

  • Fixed CVE-2021-33909 and CVE-2021-33910 on Ubuntu and COS.

HPA with custom metrics doesn't work in version 1.8.2 due to the migration from Istio to cert-manager for the monitoring pipeline. Customers using the HPA custom metrics with the monitoring pipeline should wait for a future release that will include this fix.

August 09, 2021

Anthos clusters on VMware 1.7.3-gke.2 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.7.3-gke.2 runs on Kubernetes 1.19.12-gke.1100.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.8, 1.7, and 1.6.

Fixes:

  • These security vulnerabilities have been fixed: CVE-2021-3520, CVE-2021-33909, and CVE-2021-33910.

  • Fixed the issue that the /etc/cron.daily/aide script uses up all existing space in /run, causing a crashloop in Pods.

  • Fixed the issue that admin cluster upgrade may fail due to an expired front-proxy-client certificate on the admin control plane node.

August 05, 2021

Anthos clusters on VMware 1.6.4-gke.7 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.6.4-gke.7 runs on Kubernetes 1.18.20-gke.2900.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.8, 1.7, and 1.6.

Fixes:

  • These security vulnerabilities have been fixed: CVE-2021-3520, CVE-2021-33909, and CVE-2021-33910.

  • Fixed the issue that admin cluster upgrade may fail due to an expired front-proxy-client certificate on the admin control plane node.

July 22, 2021

Anthos clusters on VMware 1.8.1-gke.7 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.8.1-gke.7 runs on Kubernetes v1.20.8-gke.1500.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.8, 1.7, and 1.6.

Fixes:

  • The issue that the /etc/cron.daily/aide script uses up all existing space in /run, causing a crashloop in Pods, has been fixed. The files located under /run/aide/ will be cleaned up periodically.
  • If you use gkectl upgrade loadbalancer to update parameters of the Seesaw load balancer in version 1.8.0, the update does not work in either DHCP or IPAM mode. If your setup includes this configuration, do not upgrade to version 1.8.0, but instead to version 1.8.1 or later. If you are already at version 1.8.0, upgrade to 1.8.1 before updating any parameters. See Upgrading Seesaw load balancer with version 1.8.0.
  • For Windows nodes, fixed an issue by adding a step for automatically detecting the network interface name instead of hard-coding it, since this name might be different depending on the network adapter being used in the base VM template.
  • Fixed an issue for building a Windows VM template that avoids retrying the VM shutdown in the gkectl prepare windows command, as this retrying caused the command to be stuck for a long time.
  • Fixed an issue where snapshot.storage.k8s.io/v1 resources were rejected by the snapshot admission webhook.
  • The CVE-2021-3520 security vulnerability has been fixed. 

July 08, 2021

Anthos clusters on VMware 1.8.0-gke.25 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.8.0-gke.25 runs on Kubernetes v1.20.5-gke.1301.

Fixes:

Fixed CVE-2021-34824 that could expose private keys and certificates from Kubernetes secrets through the credentialName field when using Gateway or DestinationRule. This vulnerability affects all clusters created or upgraded with Anthos clusters on VMware version 1.8.0-gke.21. For more information, see the GCP-2021-012 security bulletin.

July 07, 2021

Anthos clusters on VMware 1.8.0-gke.25 is now available to resolve this issue.

The Istio project recently disclosed a new security vulnerability, CVE-2021-34824, affecting Istio. Istio contains a remotely exploitable vulnerability where credentials specified in the credentialName field for Gateway or DestinationRule can be accessed from different namespaces.

For more information, see the GCP-2021-012 security bulletin.

June 28, 2021

Anthos clusters on VMware 1.8.0-gke.21 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.8.0-gke.21 runs on Kubernetes v1.20.5-gke.1301.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.8, 1.7, and 1.6.

Cluster lifecycle improvements:

You should no longer use gcloud to unregister a user cluster, because clusters are registered automatically. Instead, register existing user clusters by using gkectl update cluster. You can also use gkectl update cluster to consolidate out-of-band registration that was done using gcloud. For more information, see Cluster registration.

Platform enhancements:

  • Preview: Cluster autoscaling is now available in preview. With cluster autoscaling, you can horizontally scale node pools in proportion to workload demand. When demand is high, the cluster autoscaler adds nodes to the node pool. When demand is low, the cluster autoscaler removes nodes from the node pool, scaling back down to a minimum size that you designate. Cluster autoscaling can increase the availability of your workloads while controlling costs.

  • Preview: User cluster control-plane node and admin cluster add-on node auto sizing are now available in preview. The features can be enabled separately in user cluster or admin cluster configurations. When you enable user cluster control-plane node auto sizing, user cluster control-plane nodes are automatically resized in proportion to the number of node pool nodes in the given user cluster. When you enable admin cluster add-on node auto sizing, admin cluster add-on nodes are automatically resized in proportion to the number of nodes in the admin cluster.

  • Preview: Windows Server container support for Anthos clusters on VMware is now available in preview. This allows you to modernize and run your Windows-based apps more efficiently in your data centers without having to go through risky application rewrites. You can use Windows containers alongside Linux containers for your container workloads. The same experience and benefits that you have come to enjoy with Anthos clusters on VMware using Linux (application portability, consolidation, cost savings, and agility) can now be applied to Windows Server applications also.

  • Preview: Admin cluster backup is now available in preview. With this feature enabled, admin cluster backups are automatically performed before and after user and admin cluster creation, update, and upgrade. A new gkectl backup admin command performs manual backup. Upon admin cluster storage failure, you can restore the admin cluster from a backup with the gkectl repair admin-cluster --restore-from-backup command.

Security enhancements:

  • The Ubuntu node image is qualified with the CIS (Center for Internet Security) L1/L2 Server Benchmark.

  • Generally available: Workload identity support is now generally available. For more information, see Fleet workload identity. The connect-agent service account key is no longer required during installation. The connect agent uses workload identity to authenticate to Google Cloud instead of an exported Google Cloud service account key.

  • You can now use gkectl to rotate system root CA certificates for user clusters.

  • You can now use gkectl to update vCenter CA certificates for both admin clusters and user clusters.

Network feature enhancements:

Preview: Egress NAT gateway is now available in preview. To be able to access off-cluster workloads, traffic originating within the cluster that is related to specific flows must have deterministic source IP addresses. Egress NAT gateway gives you fine-grained control over which traffic gets a deterministic source IP address, and then provides that address. The Egress NAT Gateway functionality is built on top of Dataplane V2.

Storage enhancements:

  • The Anthos vSphere CSI driver now supports both offline and online volume expansion for dynamically and statically created block volumes only.

    • Offline volume expansion is available in vSphere 7.0 and later. Online expansion is available in vSphere 7.0u2 and later.

    • The vSphere CSI driver StorageClass standard-rwo, which is installed in user clusters automatically, sets allowVolumeExpansion to true by default for newly created clusters running on vSphere 7.0 or later. You can use both online and offline expansion for volumes using this StorageClass.

  • The volume snapshot feature now supports v1 versions of VolumeSnapshot, VolumeSnapshotContent, and VolumeSnapshotClass objects. The v1beta1 versions are deprecated and will soon stop being served.

Simplify day-2 operations:

  • You can now use Anthos Identity Service (AIS) and OpenID Connect (OIDC) for authentication to admin clusters in addition to user clusters.

  • Preview: Anthos Identity Service can now resolve groups with Okta as identity provider. This allows administrators to write RBAC policy with Okta groups.

  • Preview: Anthos Identity service now supports LDAP authentication methods in addition to OIDC. You can use AIS with Microsoft Active Directory without the need for provisioning Active Directory Federation Services.

  • The Anthos metadata agent replaces the original metadata agent to collect and send Anthos metadata to Google Cloud Platform, so that Google Cloud Platform can use this metadata to build a better user interface for Anthos clusters. You must 1) enable the Config Monitoring for Ops API in your logging-monitoring project, 2) grant the Ops Config Monitoring Resource Metadata Writer role to your logging-monitoring service account, and 3) add opsconfigmonitoring.googleapis.com to your proxy allowlist (if applicable).

  • You can use gkectl diagnose snapshot --upload-to [GCS_BUCKET] --service-account-key-file [SA_KEY_FILE] to automatically upload snapshots to a Google Cloud Storage (GCS) bucket. The provided service account must have the roles/storage.admin IAM role enabled.

Functionality changes:

  • The admin cluster now uses containerd on all nodes, including the admin cluster control-plane node, admin cluster add-on nodes, and user cluster control-plane nodes. This applies to both new admin clusters and existing admin clusters upgraded from 1.7.x. On user cluster node pools, containerd is the default container runtime for new node pools, but existing node pools that are upgraded from 1.7.x will continue using Docker Engine. You can continue to use Docker Engine for a new node pool by setting its osImageType to ubuntu.

  • A new ubuntu_containerd OS image type is introduced. ubuntu_containerd uses an identical OS image as ubuntu, but the node is configured to use containerd as the container runtime instead. The ubuntu_containerd OS is used for new node pools by default, but existing node pools upgraded from 1.7.x continue using Docker Engine. Docker Engine support will be removed in Kubernetes 1.24, and you should start converting your node pools to ubuntu_containerd as soon as possible.

  • When installing or upgrading to 1.8.0-gke.21 on a vCenter with a vSphere version older than 6.7 Update 3, you may receive a notification. Note that vSphere versions older than 6.7 Update 3 will no longer be supported in Anthos clusters on VMware in an upcoming version.

  • The create-config Secret is removed in both the admin and the user clusters. If you previously relied on workarounds that modify the secret(s), contact Cloud Support for updates.

  • You can update the CPU and memory configuration for the user cluster control-plane node with gkectl update cluster.

  • You can configure the CPU and memory configurations for the admin control-plane node to non-default settings during admin cluster creation through the newly introduced admin cluster configuration fields.

  • Node auto repairs are throttled at the node pool level. The number of repairs per hour for a node pool is limited to either 3 or 10% of the number of nodes in the node pool, whichever is greater.

  • Starting from Kubernetes 1.20, timeouts on exec probes are honored, and default to one second if unspecified. If you have Pods using exec probes, ensure they can easily complete in one second or explicitly set an appropriate timeout. See Configure Probes for more details.

  • Starting from Kubernetes 1.20, Kubelet no longer creates the target_path for NodePublishVolume in accordance with the CSI spec. If you have self-managed CSI drivers deployed in your cluster, ensure they are idempotent and do any necessary mount creation/verification. See Kubernetes issue #88759 for details.

  • Non-deterministic treatment of objects with invalid ownerReferences was fixed in Kubernetes 1.20. You can run the kubectl-check-ownerreferences tool prior to upgrade to locate existing objects with invalid ownerReferences. The metadata.selfLink field, deprecated since Kubernetes 1.16, is no longer populated in Kubernetes 1.20. See Kubernetes issue #1164 for details.
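The node auto repair throttle described in the list above can be worked through with a small calculation. This is a sketch under stated assumptions: the function name max_repairs_per_hour is hypothetical, and rounding the 10% figure down is an assumption, because the release note does not say how fractions are handled.

```shell
# Upper bound on node repairs per hour for one node pool:
# the greater of 3 and 10% of the pool's node count.
# Assumption: the 10% figure is rounded down to a whole number.
max_repairs_per_hour() {
  nodes=$1
  tenth=$((nodes / 10))
  if [ "$tenth" -gt 3 ]; then
    echo "$tenth"
  else
    echo 3
  fi
}

for n in 5 30 100; do
  echo "$n nodes: at most $(max_repairs_per_hour "$n") repairs per hour"
done
```

Under these assumptions, pools of up to 39 nodes are limited to 3 repairs per hour, and the 10% rule only takes over for larger pools.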

Breaking changes:

  • The Istio components have been upgraded to handle ingress support. Previously, using HTTPS for ingress required both an Istio Gateway and Kubernetes Ingress. With this release, the full ingress spec is natively supported. See Ingress migration to manage this upgrade for Istio components.

  • The Cloud Run for Anthos user cluster configuration option is no longer supported. Cloud Run for Anthos is now installed as part of registration with a fleet. This allows for configuring and upgrading Cloud Run separately from Anthos clusters on VMware. To upgrade to the newest version of Cloud Run for Anthos, see Installing Cloud Run for Anthos.

Fixes:

  • Previously, the admin cluster upgrade could be affected by the expired front-proxy-client certificate that persists in the data disk for the admin cluster control-plane node. Now the front-proxy-client certificate is renewed during an upgrade.

  • Fixed an issue where logs are sent to the parent project of the service account specified in the stackdriver.serviceAccountKeyPath field of your cluster configuration file while the value of stackdriver.projectID is ignored.

  • Fixed an issue where calico-node Pods sometimes used an excessive amount of CPU in large-scale clusters.

The stackdriver-metadata-agent-cluster-level-* Pod might have logs that look like this:

reflector.go:131] third_party/golang/kubeclient/tools/cache/reflector.go:99: Failed to list *unstructured.Unstructured: the server could not find the requested resource

You can safely ignore these logs.

June 17, 2021

When you upgrade an unregistered Anthos cluster on VMware from a version earlier than 1.7.0 to a version 1.7.0 or later, you need to manually install and configure the Anthos Config Management operator. If you had previously installed Anthos Config Management, you need to re-install it. For details on how to do this, see Installing Anthos Config Management.

If you are using a private registry for software images, upgrading an Anthos cluster on VMware will always require special steps, described in Updating Anthos Config Management using a private registry. Upgrading from a version earlier than 1.7.0 to a version 1.7.0 or later additionally requires that you manually install and configure the Anthos Config Management operator as described in Installing Anthos Config Management.

June 08, 2021

Anthos clusters on VMware 1.5.4-gke.2 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.5.4-gke.2 runs on Kubernetes v1.17.9-gke.4400. The supported versions that offer the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.7, 1.6, and 1.5.

Fixes

These security vulnerabilities have been fixed:

CVE-2021-25735 (mentioned in the GCP-2021-003 Security Bulletin), CVE-2021-31535, and other medium- and low-severity CVEs with fixes available.

June 07, 2021

Anthos clusters on VMware 1.6.3-gke.3 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.6.3-gke.3 runs on Kubernetes v1.18.18-gke.100. The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.7, 1.6, and 1.5.

Fixes

These security vulnerabilities have been fixed:

CVE-2021-25735 (mentioned in the GCP-2021-003 Security Bulletin), CVE-2021-31535, and other medium- and low-severity CVEs with fixes available.

May 27, 2021

Anthos clusters on VMware 1.7.2-gke.2 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.7.2-gke.2 runs on Kubernetes 1.19.10-gke.1602.

The supported versions that offer the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.7, 1.6, and 1.5.

The Ubuntu node image shipped in version 1.7.2 is qualified with the CIS (Center for Internet Security) L1 Server Benchmark.

Fixes:

An admin cluster upgrade may fail due to an expired front-proxy-client certificate on the admin control plane node. Make sure that the certificate is not expired, and recreate it if needed. See: Renew an expired certificate.

May 21, 2021

In Anthos clusters on VMware 1.7, logs are sent to the parent project of your logging-monitoring service account. That is, logs are sent to the parent project of the service account specified in the stackdriver.serviceAccountKeyPath field of your cluster configuration file. The value of stackdriver.projectID is ignored. This issue will be fixed in an upcoming release.

As a workaround, view logs in the parent project of your logging-monitoring service account.

May 20, 2021

In version 1.7.1, the memory consumption of stackdriver-log-forwarder grows significantly over time, and the logs show an excessive number of OAuth 2.0 token requests. Follow these steps to mitigate this issue.

May 11, 2021

A recently discovered vulnerability, CVE-2021-31920, affects Istio authorization policies. Istio contains a remotely exploitable vulnerability where an HTTP request with multiple slashes or escaped slash characters can bypass an Istio authorization policy when path-based authorization rules are used. While Anthos clusters on VMware uses an Istio Gateway object for network ingress traffic into clusters, authorization policies are not a supported or intended use case for Istio as part of the Anthos clusters on VMware prerequisites. For more details, refer to the Istio security bulletin.

May 06, 2021

The Envoy and Istio projects recently announced several new security vulnerabilities (CVE-2021-28683, CVE-2021-28682, and CVE-2021-29258) that could allow an attacker to crash Envoy.

For more information, see the GCP-2021-004 security bulletin.

May 05, 2021

Anthos clusters on VMware 1.7.1-gke.4 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.7.1-gke.4 runs on Kubernetes 1.19.7-gke.2400.

The supported versions that offer the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.7, 1.6, and 1.5.

If you upgrade the admin cluster before you upgrade the associated user clusters within the same minor version, such as from 1.7.0 to 1.7.1, the user cluster control-planes will be upgraded together with the admin cluster. This applies even if you use the flag --force-upgrade-admin. This behavior, in versions 1.7.0 and later, is different from versions 1.6 and earlier, and is expected behavior.

Fixes:

  • Fixed a bug, so that the hardware version of a virtual machine is determined based on the ESXi host apiVersion instead of the host version. When host ESXi apiVersion is at least 6.7U2, VMs with version vmx-15 are created. Also, the CSI preflight checks validate the ESXi host API version instead of the host version.

  • Fixed a bug, so that if vSphereCSIDisabled is set to true, Container Storage Interface (CSI) preflight checks do not run when you execute commands such as gkectl check-config, gkectl create loadbalancer, or gkectl create cluster.

  • Fixed CVE-2021-3444, CVE-2021-3449, CVE-2021-3450, CVE-2021-3492, CVE-2021-3493, and CVE-2021-29154 on the Ubuntu operating system used by the admin workstation, cluster nodes, and Seesaw.

  • Fixed a bug where attempting to install or upgrade GKE on-prem 1.7.0 failed with an "/STSService/ 400 Bad Request" when the vCenter is installed with the external platform services controller. Installations where the vCenter server is a single appliance are not affected. Note that VMware deprecated the external platform services controller in 2018.

  • Fixed a bug where auto repair failed to trigger for unhealthy nodes if the cluster-health-controller was restarted while a previously issued repair was in progress.

  • Fixed a bug so that the command gkectl diagnose snapshot output includes the list of containers and the containerd daemon log on Container-Optimized OS (COS) nodes.

  • Fixed a bug that caused gkectl update admin to generate an InternalFields diff unexpectedly.

  • Fixed an issue where the stackdriver-log-forwarder Pod sometimes went into a crash loop because of a fluent-bit segfault.

April 20, 2021

The Kubernetes project recently announced a new security vulnerability, CVE-2021-25735, that could allow node updates to bypass a Validating Admission Webhook. For more details, see the GCP-2021-003 security bulletin.

March 25, 2021

Anthos clusters on VMware 1.7.0-gke.16 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.7.0-gke.16 runs on Kubernetes 1.19.7-gke.2400.

The supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting GKE On-Prem are 1.6, 1.5, and 1.4.

Cluster lifecycle improvements

  • The cluster upgrade process has changed. Instead of upgrading the admin cluster first, you can upgrade user clusters to the newer version without upgrading the admin cluster. The new flow, which requires upgrading gkeadm, allows you to preview new features before performing a full upgrade with the admin cluster. In addition, the 1.7.0 version of gkectl can perform operations on both 1.6.X and 1.7.0 clusters.

  • Starting with version 1.7.0, you can deploy Anthos clusters on vSphere 7.0 environments in addition to vSphere 6.5 and 6.7. Note that Anthos clusters on VMware will phase out vSphere 6.5 support following VMware end of general support timelines.

  • Published the minimum hardware resource requirements for a proof-of-concept cluster.

Platform enhancements

  • GA: Node auto repair is now generally available and enabled by default for newly created clusters. When the feature is enabled, cluster-health-controller performs periodic health checks, surfaces problems as events on cluster objects, and automatically repairs unhealthy nodes.

  • GA: vSphere resource metrics is now generally available and enabled by default for newly created clusters. When the feature is enabled, VM level resource contention metrics are collected and displayed in the VM health dashboards automatically created through out-of-the-box monitoring. You can use these dashboards to track VM resource contention issues.

  • GA: Dataplane V2 is now generally available and can be enabled in newly created clusters.

  • GA: Network Policy Logging is now generally available. Network policy logging is available only for clusters running Dataplane V2.

  • You can attach vSphere tags to user cluster node pools during cluster creation and update. You can use tags to organize and select VMs in vCenter.

Security enhancements:

  • Preview: You can run Container-Optimized OS on your user cluster worker nodes.

Simplify Day-2 operations:

  • GA: Support for vSphere folders is now generally available. This allows you to install Anthos clusters on VMware in a vSphere folder, reducing the scope of the permission required for the vSphere user.

  • A new gkectl update admin command supports updating certain admin cluster configurations including adding static IP addresses.

  • The central log aggregator component has been removed from the logging pipeline to improve reliability, scalability and resource usage.

  • Cluster scalability has been improved:

    • 50 user clusters per admin cluster

    • With Seesaw, 500 nodes, 15,000 Pods, and 500 LoadBalancer Services per user cluster

    • With F5 BIG-IP, 250 nodes, 7,500 Pods, and 250 LoadBalancer Services per user cluster
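The per-user-cluster limits listed above can be turned into a quick planning check. This is an illustrative sketch only: the function name within_limits and the labels seesaw and f5 are hypothetical, and the numbers are copied from the list above for version 1.7.0.

```shell
# Check a planned user cluster against the 1.7.0 per-user-cluster limits.
# Usage: within_limits LOAD_BALANCER NODES PODS LB_SERVICES
# LOAD_BALANCER is "seesaw" or "f5" (labels used only in this sketch).
# Returns 0 if all values fit, 1 if any exceeds a limit, 2 for an unknown label.
within_limits() {
  case $1 in
    seesaw) max_nodes=500; max_pods=15000; max_services=500 ;;
    f5)     max_nodes=250; max_pods=7500;  max_services=250 ;;
    *)      return 2 ;;
  esac
  [ "$2" -le "$max_nodes" ] && [ "$3" -le "$max_pods" ] && [ "$4" -le "$max_services" ]
}

if within_limits f5 300 6000 100; then
  echo "plan fits within the F5 BIG-IP limits"
else
  echo "plan exceeds the F5 BIG-IP limits"
fi
```

In the example, 300 nodes exceeds the 250-node limit for F5 BIG-IP, so the check reports that the plan does not fit.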

Anthos Config Management:

Anthos Config Management (ACM) is now decoupled from Anthos clusters on VMware. This provides multiple benefits including decoupling the ACM release cadence from Anthos clusters on VMware, simplifying the testing and qualification process, and providing a consistent installation and upgrade flow.

Storage enhancements:

GA: The vSphere CSI driver is now generally available. Your vCenter server and ESXi hosts must both be running 6.7 update 3 or newer. The preflight checks and gkectl diagnose cluster have been enhanced to cover the CSI prerequisites.

Functionality changes:

  • gkectl diagnose cluster now includes validation of load balancing, covering F5, Seesaw, and manual mode.

  • gkectl diagnose snapshot now provides an HTML index file in the snapshot, and collects extra container information from the admin cluster control-plane node when the Kubernetes API server is inaccessible.

  • gkectl update admin has been updated to:

    • Enable or disable auto repair in the admin cluster
    • Add static IP addresses to the admin cluster
    • Enable/disable vSphere resource metrics in the admin cluster
  • gkectl update cluster has been enhanced to enable or disable vSphere resource metrics in a user cluster.

  • Given that we no longer need an allowlisted service account in the admin workstation configuration file, we deprecated the gcp.whitelistedServiceAccountKeyPath field and added a new gcp.componentAccessServiceAccountKeyPath field. For consistency, we also renamed the corresponding gcrKeyPath field in the admin cluster configuration file.

Breaking changes:

  • The following Google Cloud API endpoints must be allowlisted in network proxies and firewalls. These are now required for Connect Agent to authenticate to Google when the cluster is registered in Hub:

    • securetoken.googleapis.com
    • sts.googleapis.com
    • iamcredentials.googleapis.com
  • gkectl now accepts only v1 cluster configuration files. For instructions on converting your v0 configuration files, see Converting configuration files.

Fixes:

  • Fixed a bug where Grafana dashboards based on the container_cpu_usage_seconds_total metric show no data.

  • Fixed an issue where scheduling Stackdriver components on user cluster control-plane nodes caused resource contention issues.

  • Fixed Stackdriver Daemonsets to tolerate NoSchedule and NoExecute taints.

  • Fixed an HTTP/2 connection issue that sometimes caused problems with connections from the kubelet to the Kubernetes API server. This issue also could lead to nodes becoming not ready.

Known issues:

  • Calico-node Pods sometimes use an excessive amount of CPU in large-scale clusters. You can mitigate the issue by killing such Pods.

  • When running gkectl update admin against a cluster upgraded from 1.6, you might get the following diff:

    - InternalFields: nil,
    + InternalFields: map[string]string{"features.onprem.cluster.gke.io/bundle-vsphere-credentials": "enabled"},

    You can safely ignore this and proceed with the update.

February 26, 2021

Anthos clusters on VMware (GKE on-prem) 1.6.2-gke.0 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.6.2-gke.0 clusters run on Kubernetes 1.18.13-gke.400.

Fixed in 1.6.2-gke.0:

  • Fixed a kubelet restarting issue that was found when running workloads that rely on kubectl exec/port-forward/attach, such as Jenkins.

  • Fixed CVE-2021-3156 in the node operating system image. CVE-2021-3156 is described in Security bulletins.

GKE on-prem 1.4.5-gke.0 is now available. To upgrade, see Upgrading GKE on-prem. GKE on-prem 1.4.5-gke.0 clusters run on Kubernetes 1.16.11-gke.11.

Fixed in 1.4.5-gke.0:

January 27, 2021

Anthos clusters on VMware (GKE on-prem) 1.6.1-gke.1 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.6.1-gke.1 clusters run on Kubernetes 1.18.13-gke.400.

Fixes:

  • Fixed a bug where the user cluster upgrade is blocked if the vCenter resource pool is neither directly nor indirectly specified in the configs (that is, if the vCenter resource pool is inherited and is the one used by the admin cluster).
  • Fixed CVE-2020-15157 and CVE-2020-15257 in containerd.
  • Fixed an issue where upgrading the admin cluster from 1.5 to 1.6.0 breaks 1.5 user clusters that use any OIDC provider and that have no value for authentication.oidc.capath in the user cluster configuration file.

January 21, 2021

Anthos GKE on-prem 1.5.3-gke.0 is now available. To upgrade, see Upgrading GKE on-prem. GKE on-prem 1.5.3-gke.0 clusters run on Kubernetes 1.17.9-gke.4400.

Fixes:

  • Fixed CVE-2020-15157 and CVE-2020-15257 in containerd.

  • Cloud Run Operator is now able to successfully update custom resource definitions (CRDs).

December 10, 2020

Anthos clusters on VMware 1.6.0-gke.7 is now available. To upgrade, see Upgrading Anthos clusters on VMware. Anthos clusters on VMware 1.6.0-gke.7 clusters run on Kubernetes 1.18.6-gke.6600.

Note: The fully supported versions offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on VMware are 1.6, 1.5, and 1.4.

Users can use a credential configuration file with gkeadm (credential.yaml), which is generated when you run the gkeadm create config command, to improve security by removing credentials from admin-ws-config.yaml.

Node Problem Detector and Node Auto Repair automatically detect and repair additional failures, such as Kubelet-API server connection loss (an OSS issue) and long-lasting DiskPressure conditions.

Preview: Repair administrator master VM failures by using the new command, gkectl repair admin-master.

Preview: Secrets Encryption for user clusters using Thales Luna Network HSM Devices.

Preview: Service Account Key Rotation in gkectl for Usage Metering, Cloud Audit Logs, and Google Cloud's operations suite service accounts.

Anthos Identity Service enables dynamic configuration changes for OpenID Connect (OIDC) configuration without needing to recreate user clusters.

Google Cloud's operations suite support for bundled Seesaw load balancing:

Metrics and logs of bundled Seesaw load balancers are now uploaded to Google Cloud through Google Cloud's operations suite to provide the best observability experience.

Cloud Audit Logs

Offline buffer for Cloud Audit Logs: Audit logs are now buffered on disk when Cloud Audit Logs cannot be reached, and the buffer can withstand at least 4 hours of network outage.

CSI volume snapshots

The CSI snapshot controllers are now automatically deployed in user clusters, enabling the users to create snapshots of persistent volumes and restore the volumes' data by provisioning new volumes from these snapshots.

Functionality changes:

  • gkectl diagnose cluster and snapshot enhancements:

    • Added a --log-since flag to gkectl diagnose snapshot. Users can use it to collect logs of containers and nodes within a relative time duration in the snapshot.

    • Replaced the --seed-config flag with the --config flag in the gkectl diagnose cluster command. Users can use this command with the seed configuration to rule out VIP issues and get more debugging information about the cluster.

    • Added more validations in gkectl diagnose cluster.

  • Added iscsid support: Qualified storage drivers that previously required additional steps benefit from the default iscsi service deployment on the worker nodes.

  • On each cluster node, Anthos clusters on VMware now reserves 330 MiB + 5% of the node's memory capacity for operating system components and core Kubernetes components. This is an increase of 50 MiB. For more information see Resources available for your workloads.
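The memory reservation formula in the last item above (330 MiB plus 5% of the node's memory capacity) can be worked through for a given node size. This is a minimal sketch: the function names reserved_mib and available_mib are hypothetical, and rounding the 5% share down to a whole MiB is an assumption. Eviction thresholds and other overheads are not modeled, so the real amount available to workloads may be lower.

```shell
# Memory reserved on a node, in MiB: 330 MiB plus 5% of capacity.
# Assumption: the 5% share is rounded down to a whole MiB.
reserved_mib() {
  capacity_mib=$1
  echo $((330 + capacity_mib * 5 / 100))
}

# Memory left after the reservation, in MiB.
# Eviction thresholds and other overheads are not modeled here.
available_mib() {
  capacity_mib=$1
  echo $((capacity_mib - $(reserved_mib "$capacity_mib")))
}

echo "8 GiB node: $(reserved_mib 8192) MiB reserved, $(available_mib 8192) MiB left"
```

For an 8 GiB (8192 MiB) node, this gives 330 + 409 = 739 MiB reserved, leaving 7453 MiB under the stated assumptions.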

Breaking changes:

Fixes:

  • Security fix: Resolve credential file references when only a subset of credentials are specified by reference.

  • Fixed vSphere credential update when CSI storage is not enabled.

  • Fixed a bug in Fluent Bit in which the buffer for logs might fill up node disk space.

Known issues:

  • gkectl update reverts your edits on clientconfig CR in 1.6.0. We strongly suggest that customers back up the clientconfig CR after every manual change.

  • kubectl describe CSINode and gkectl diagnose snapshot might sometimes fail due to an OSS Kubernetes issue with dereferencing nil pointer fields.

  • The OIDC provider doesn't use the common CA by default. You must explicitly supply the CA certificate.

  • Upgrading the admin cluster from 1.5 to 1.6.0 breaks 1.5 user clusters that use any OIDC provider and that have no value for authentication.oidc.capath in the user cluster configuration file.

    To work around this issue, run the following script, replacing YOUR_OIDC_PROVIDER_ADDRESS with the address of your OIDC provider:

    USER_CLUSTER_KUBECONFIG=usercluster-kubeconfig

    IDENTITY_PROVIDER=YOUR_OIDC_PROVIDER_ADDRESS

    openssl s_client -showcerts -verify 5 -connect $IDENTITY_PROVIDER:443 < /dev/null | awk '/BEGIN CERTIFICATE/,/END CERTIFICATE/{ if(/BEGIN CERTIFICATE/){i++}; out="tmpcert"i".pem"; print >out}'

    ROOT_CA_ISSUED_CERT=$(ls tmpcert*.pem | tail -1)

    ROOT_CA_CERT="/etc/ssl/certs/$(openssl x509 -in $ROOT_CA_ISSUED_CERT -noout -issuer_hash).0"

    cat tmpcert*.pem $ROOT_CA_CERT > certchain.pem

    CERT=$(echo $(base64 certchain.pem) | sed 's/ //g')

    rm tmpcert1.pem tmpcert2.pem

    kubectl --kubeconfig $USER_CLUSTER_KUBECONFIG patch clientconfig default -n kube-public --type json -p "[{ \"op\": \"replace\", \"path\": \"/spec/authentication/0/oidc/certificateAuthorityData\", \"value\":\"${CERT}\"}]"

November 16, 2020

GKE on-prem 1.5.2-gke.3 is now available. To upgrade, see Upgrading GKE on-prem. GKE on-prem 1.5.2-gke.3 clusters run on Kubernetes 1.17.9-gke.4400.

GKE Data Plane V2 Preview is now available.

  • GKE Data Plane V2 is a new programmable data path that enables Google to offer new network security features like Network Policy Logging and Node Network Policy.

For information about enabling Dataplane V2, see User cluster configuration file. For information about Network Policy Logging, see Logging network policy events.

Binary Authorization for GKE on-prem 0.2.1 is now available.

  • Binary Authorization for GKE on-prem 0.2.1 adds a proxy side cache that caches AdmissionReview responses. This can improve the reliability of the webhook.

Fixes:

  • Fixed false warning in gkectl check-config for admin cluster for manual load balancing category.
  • Updated Istio Ingress (Kubernetes) Custom Resource Definitions (CRDs) to use v1beta1.
  • Fixed an issue where a GKE on-prem upgrade got stuck because Cloud Run for Anthos on-prem Pods were crash looping. When Cloud Run for Anthos on-prem was enabled, upgrading GKE on-prem from 1.4 to 1.5 caused an operational outage of GKE on-prem. This release fixes the webhook; the custom resource definition (CRD) issue is not fixed.

Known issues:

Cloud Run Operator is unable to update custom resource definitions (CRDs). Applying the CRDs manually either before or during the upgrade lets the operator continue the upgrade.

Workaround:

gsutil cat gs://gke-on-prem-release/hotfixes/1.5/cloudrun/crds.yaml | kubectl apply -f -

November 02, 2020

GKE on-prem 1.4.4-gke.1 is now available. To upgrade, see Upgrading GKE on-prem. GKE on-prem 1.4.4-gke.1 clusters run on Kubernetes 1.16.11-gke.11.

Fixes:

  • Updated Istio Ingress (Kubernetes) Custom Resource Definitions (CRDs) to use v1beta1.

GKE on-prem 1.3.5-gke.2 is now available. To upgrade, see Upgrading GKE on-prem. GKE on-prem 1.3.5-gke.2 clusters run on Kubernetes 1.15.12-gke.6400.

Fixes:

October 23, 2020

GKE on-prem 1.5.1-gke.8 is now available. To upgrade, see Upgrading GKE on-prem. GKE on-prem 1.5.1-gke.8 clusters run on Kubernetes 1.17.9-gke.4400.

Binary Authorization for GKE on-prem Preview is now available:

This release enables customers to generate credential configuration templates by using the gkectl create-config credential command.

Published the best practices for how to set up GKE on-prem components for high availability and how to recover from disasters.

Published the best practices for creating, configuring, and operating GKE on-prem clusters at large scale.

Known issues:

The version of Anthos Config Management included in GKE on-prem release 1.5.1-gke.8 initially referenced a version of the nomos image that had not been moved into the gcr.io/gke-on-prem-release repository, which prevented a successful installation or upgrade of Anthos Config Management. This image has since been pushed to the repository, which corrects the issue for customers not using private registries. Customers using private registries need to upgrade to 1.5.2 when it is available (scheduled for November 16, 2020) or manually copy the nomos:v1.5.1-rc.7 image into their private repository.

Fixes:

  • Fixed cluster creation issue when Cloud Run is enabled.
  • Fixed the false positive error in docker registry preflight check where REGISTRY_ADDRESS/NAMESPACE might be mistakenly used as the registry address to store the certs on a test VM, causing authentication errors.

September 24, 2020

GKE on-prem 1.5.0-gke.27 is now available. To upgrade, see Upgrading GKE on-prem. GKE on-prem 1.5.0-gke.27 clusters run on Kubernetes 1.17.9-gke.4400.

Improved upgrade and installation:

  • Preflight checks are now blocking with v1 configs for installation and upgrades. Users can use --skip-preflight-check-blocking to unblock the operation.
  • Added support for running gkeadm on macOS Catalina, v10.15.
  • Enabled installation and upgrade by using any Google Cloud–authenticated service account. This removes the need for allowlisting.
  • Improved security by adding support for using an external credential file in admin or user configuration. This enables customers to check in their cluster configuration files in source code repositories without exposing confidential credential information.

Improved HA and failure recovery:

Improved support for Day-2 operations:

  • The gkectl update cluster command is now generally available. Users can use it to change supported features in the user cluster configurations after cluster creation.
  • The gkectl update credentials command for vSphere and F5 credentials is now generally available.
  • Improves scalability with 20 user clusters per admin cluster, and 250 nodes, 7500 pods, 500 load balancing services (using Seesaw), and 250 load balancing services (using F5) per user cluster.
  • Introduces vSphere CSI driver in preview.

Enhanced monitoring with Cloud Monitoring:

  • Introduces out-of-the-box alerts for critical cluster metrics and events in preview.
  • Out-of-the-box monitoring dashboards are automatically created during installation when Cloud Monitoring is enabled.
  • Lets users modify CPU or memory resource settings for Cloud Monitoring components.

Functionality changes:

  • Preflight check failures now block gkectl create loadbalancer for the bundled load balancer with Seesaw.
  • Adds a blocking preflight check for the anthos.googleapis.com API of a configured gkeConnect project.
  • Adds a blocking preflight check on proxy IP and service/pod CIDR overlapping.
  • Adds a non-blocking preflight check on cluster health before an admin or user cluster upgrade.
  • Updates the gkectl diagnose snapshot:
    • Fixes the all scenario to collect all supported Kubernetes resources for the target cluster.
    • Collects F5 load balancer information, including Virtual Server, Virtual Address, Pool, Node, and Monitor.
    • Collects vSphere information, including VM objects and their events based on the resource pool and the Datacenter, Cluster, Network, and Datastore objects that are associated with VMs.
  • Fixes the OIDC proxy configuration issue. Users no longer need to edit NO_PROXY env settings in the cluster configuration to include new node IPs.
  • Adds monitoring.dashboardEditor to the roles granted to the logging-monitoring service account during admin workstation creation with --auto-create-service-accounts.
  • Bundled load balancing with Seesaw switches to the IPVS maglev hashing algorithm, achieving stateless, seamless failover. There is no connection sync daemon anymore.
  • The hostconfig section of the ipBlock file can be specified directly in the cluster yaml file network section and has a streamlined format.

Breaking changes:

  • Starting with version 1.5, instead of using kubectl patch machinedeployment to resize the user cluster and kubectl edit cluster to add static IPs to user clusters, use gkectl update cluster to resize the worker node in user clusters and to add static IPs to user clusters.
  • Starting with version 1.5, the gkectl log is saved in a single file instead of multiple files by log verbosity levels. By default, the gkectl log is saved in the /home/ubuntu/.config/gke-on-prem/logs directory with a symlink created under the ./logs directory for easy access. Users can use --log_dir or --log_file to change this default setting.
  • Starting with version 1.5, the gkeadm log is saved in a single file instead of multiple files by log verbosity levels. By default, the gkeadm log is saved under ./logs. Users can use --log_dir or --log_file to change this default setting.
  • In version 1.5 only, the etcd version is updated from 3.3 to 3.4, which means the etcd image becomes smaller for improved performance and security (distroless), and the admin and user cluster etcd restore process is changed.
  • In 1.5 and later releases, a new firewall rule needs to be enabled from admin cluster add-on nodes to vCenter server API port 443.

Fixes:

  • Fixed an issue that caused approximately 50 seconds of downtime for the user cluster API service during cluster upgrade or update.
  • Corrected the default log verbosity setting in gkectl and gkeadm help messages.

Known issues:

  • Due to a 1.17 Kubernetes issue, kube-apiserver and kube-scheduler don't expose kubernetes_build_info on the /metrics endpoint in the 1.5 release. Customers can use kubernetes_build_info from kube-controller-manager to get similar information, such as the Kubernetes major version, minor version, and build date.
  • Cloud Run for Anthos on-prem causes an operational outage of GKE on-prem when Cloud Run for Anthos on-prem is enabled in both installation and upgrade of GKE on-prem 1.5.0.

September 17, 2020

GKE on-prem 1.4.3-gke.3 is now available. To upgrade, see Upgrading GKE on-prem. GKE on-prem 1.4.3-gke.3 clusters run on Kubernetes 1.16.11-gke.11.

Fixes:

  • Fixed CVE-2020-14386 described in Security Bulletin.

  • Preflight check for hostname validation was too strict. We updated the hostname validation following the RFC 1123 DNS subdomain definition.

  • There was an issue in the 1.4.0 and 1.4.2 releases where the node problem detector didn't start when the node restarted. This is fixed in this version.
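The relaxed hostname rule mentioned in the fixes above can be sketched as a small shell check. This is only an illustration, assuming the common reading of an RFC 1123 DNS subdomain (the one Kubernetes applies to object names): at most 253 characters, made of lowercase alphanumerics, '-' or '.', with every dot-separated label starting and ending with an alphanumeric character. The function name is_rfc1123_subdomain is hypothetical, and the actual preflight check may differ in detail.

```shell
# Hypothetical helper, not part of gkectl.
# Returns 0 if $1 looks like an RFC 1123 DNS subdomain, 1 otherwise.
is_rfc1123_subdomain() {
  name=$1
  # Total length must be between 1 and 253 characters.
  [ "${#name}" -ge 1 ] && [ "${#name}" -le 253 ] || return 1
  # Each label: lowercase alphanumerics or '-', starting and ending
  # with an alphanumeric character; labels are separated by dots.
  printf '%s\n' "$name" |
    LC_ALL=C grep -Eq '^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$'
}
```

For example, is_rfc1123_subdomain "node-1.example.com" succeeds, while names containing underscores, uppercase letters, or a leading hyphen are rejected.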

GKE on-prem 1.3.4-gke.3 is now available. To upgrade, see Upgrading GKE on-prem. GKE on-prem 1.3.4-gke.3 clusters run on Kubernetes 1.15.12-gke.15.

Fixes:

August 20, 2020

GKE on-prem 1.4.2-gke.3 is now available. To upgrade, see Upgrading GKE on-prem. GKE on-prem 1.4.2-gke.3 clusters run on Kubernetes 1.16.11-gke.11.

GPU support (beta solution in collaboration with Nvidia)

In partnership with Nvidia, users can now manually attach a GPU to a worker node VM to run GPU workloads. This requires using the open source Nvidia GPU operator.

Note: Manually attached GPUs do not persist through node lifecycle events. You must manually re-attach them. This is a beta solution and can be used for evaluation and proof of concept.

The Ubuntu image is upgraded to include the newest packages.

gkectl delete loadbalancer is updated to support the new version of configuration files for admin and user clusters.

Fixes:

  • Corrected the names of several kubelet metrics collected by Prometheus.
  • Updated restarting machines process during admin cluster upgrade to make the upgrade process more resilient to transient connection issues.
  • Resolved a preflight check OS image validation error when using a non-default vSphere folder for cluster creation; the OS image template is expected to be in that folder.
  • Resolved a gkectl upgrade loadbalancer issue to avoid validating the upgraded SeesawGroup. This fix lets the existing SeesawGroup config be updated without negatively affecting the upgrade process.
  • Resolved an issue where ClientConfig CRD is deleted when the upgrade to the latest version is run multiple times.
  • Resolved a gkectl update credentials vsphere issue where the vsphere-metrics-exporter was using the old credentials even after updating the credentials.
  • Resolved an issue where the VIP preflight check reported a user cluster add-on load balancer IP false positive.
  • Fixed gkeadm updating config after upgrading on Windows, specifically for the gkeOnPremVersion and bundlePath fields.
  • Automatically mount the data disk after rebooting on admin workstations created using gkeadm 1.4.0 and later.
  • Reverted thin disk provisioning change for boot disks in 1.4.0 and 1.4.1 on all normal (excludes test VMs) cluster nodes.
  • Removed vCenter Server access check from user cluster nodes.

July 30, 2020

Anthos clusters on VMware 1.3.3-gke.0 is now available. To upgrade, see Upgrading GKE on-prem. GKE on-prem 1.3.3-gke.0 clusters run on Kubernetes 1.15.12-gke.9.

Fixes:

June 25, 2020

Anthos clusters on VMware 1.4.0-gke.13 is now available. To upgrade, see Upgrading GKE on-prem. GKE on-prem 1.4.0-gke.13 clusters run on Kubernetes 1.16.8-gke.6.

Updated to Kubernetes 1.16:

Simplified upgrade:

  • This release provides a simplified upgrade experience via the following changes:

    • Automatically migrate information from the previous version of admin workstation using gkeadm.
    • Extend preflight checks to better prepare for upgrades.
    • Support skip version upgrade to enable users to upgrade the cluster from any patch release of a minor release to any patch release of the next minor release. For more information about the detailed upgrade procedure and limitations, see upgrading GKE on-prem.
    • The alternate upgrade scenario for Common Vulnerabilities and Exposures has been deprecated. All upgrades starting with version 1.3.2 need to upgrade the entire admin workstation.
    • The bundled load balancer is now automatically upgraded during cluster upgrade.
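The skip version upgrade rule above (any patch release of one minor release to any patch release of the next minor release) can be expressed as a short version comparison. The sketch below is a simplification: the function names minor_of and is_next_minor are hypothetical, it only compares the major and minor numbers of version strings such as 1.3.2-gke.1, and it ignores the additional limitations described in the upgrade documentation.

```shell
# Hypothetical helpers, not part of gkectl or gkeadm.

# Print the minor number of a version such as 1.3.2-gke.1 (prints 3).
minor_of() {
  rest=${1#*.}          # drop the major number and the first dot
  echo "${rest%%.*}"    # keep everything up to the next dot
}

# Succeed when $2 has the same major number as $1 and its minor
# number is exactly one higher than the minor number of $1.
is_next_minor() {
  [ "${1%%.*}" = "${2%%.*}" ] || return 1
  [ "$(minor_of "$2")" -eq $(( $(minor_of "$1") + 1 )) ]
}
```

With these helpers, is_next_minor 1.3.2-gke.1 1.4.0-gke.13 succeeds, while is_next_minor 1.2.2-gke.2 1.4.0-gke.13 fails because it would skip a minor release.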

Improved installation and cluster configuration:

  • The user cluster node pools feature is now generally available.
  • This release improves the installation experience via the following changes:

    • Supports gkeadm for Windows OS.
    • Introduces a standalone command for creating admin clusters.
  • Introduce a new version of configuration files to separate admin and user cluster configurations and commands. This is designed to provide a consistent user experience and better configuration management.

Improved disaster recovery capabilities:

  • This release provides enhanced disaster recovery functionality to support backing up and restoring HA user clusters with etcd.
  • This release also provides a manual process to recover from a single etcd replica failure in an HA cluster without any data loss.

Enhanced monitoring with Cloud Monitoring (formerly Stackdriver):

  • This release provides better product monitoring and resource usage management via the following changes:

  • Ubuntu Image now conforms with PCI DSS, NIST Baseline High, and DoD SRG IL2 compliance configurations.

Functionality changes:

  • Enabled Horizontal Pod Autoscaler (HPA) for the Istio ingress gateway.
  • Removed ingress controller from admin cluster.
  • Consolidated sysctl configs with Google Kubernetes Engine.
  • Added etcd defrag pod in admin cluster and user cluster, which will be responsible for monitoring etcd's database size and defragmenting it as needed. This helps reclaim etcd database size and recover etcd when its disk space is exceeded.

Support for a vSphere folder (Preview):

  • This release allows customers to install GKE on-prem in a vSphere folder, reducing the scope of the permission required for the vSphere user.

Improved scale:

Fixes:

  • Fixed the issue of the user cluster's Kubernetes API server not being able to connect to kube-etcd after admin nodes and user cluster master reboot. In previous versions, kube-dns in admin clusters was configured through kubeadm. In 1.4, this configuration is moved from kubeadm to bundle, which enables deploying two kube-dns replicas on two admin nodes. As a result, a single admin node reboot/failure won't disrupt user cluster API access.
  • Fixed the issue that controllers such as calico-typha can't be scheduled on an admin cluster master node, when the admin cluster master node is under disk pressure.
  • Resolved pods failure with MatchNodeSelector on admin cluster master after node reboot or kubelet restart.
  • Tuned etcd quota limit settings based on the etcd data disk size and the settings in GKE Classic.

Known issues:

  • If a user cluster is created without any node pool named the same as the cluster, managing the node pools using gkectl update cluster would fail. To avoid this issue, when creating a user cluster, you need to name one node pool the same as the cluster.
  • The gkectl command might exit with panic when converting config from "/path/to/config.yaml" to v1 config files. When that occurs, you can resolve the issue by removing the unused bundled load balancer section ("loadbalancerconfig") in the config file.
  • When using gkeadm to upgrade an admin workstation on Windows, the info file filled out from this template needs to have the line endings converted to use Unix line endings (LF) instead of Windows line endings (CRLF). You can use Notepad++ to convert the line endings.
  • After upgrading an admin workstation with a static IP using gkeadm, you need to run ssh-keygen -R <admin-workstation-ip> to remove the IP from the known hosts, because the host identification changed after VM re-creation.
  • We have added Horizontal Pod Autoscaler for istio-ingress and istio-pilot deployments. HPA can scale up unnecessarily for istio-ingress and istio-pilot deployments during cluster upgrades. This happens because the metrics server is not able to report usage of some pods (newly created and terminating; for more information, see this Kubernetes issue). No actions are needed; scale down will happen five minutes after the upgrade finishes.
  • When running a preflight check for config.yaml that contains both admincluster and usercluster sections, the "data disk" check in the "user cluster vCenter" category might fail with the message: [FAILURE] Data Disk: Data disk is not in a folder. Use a data disk in a folder when using vSAN datastore. User clusters don't use data disks, and it's safe to ignore the failure.
  • When upgrading the admin cluster, the preflight check for the user cluster OS image validation will fail. The user cluster OS image is not used in this case, and it's safe to ignore the "User Cluster OS Image Exists" failure in this case.
  • A Calico-node pod might be stuck in an unready state after node IP changes. To resolve this issue, you need to delete any unready Calico-node pods.
  • The BIG-IP controller might fail to update F5 VIP after any admin cluster master IP changes. To resolve this, you need to use the admin cluster master node IP in kubeconfig and delete the bigip-controller pod from the admin master.
  • The stackdriver-prometheus-k8s pod could enter a crashloop after host failure. To resolve this, you need to remove any corrupted PersistentVolumes that the stackdriver-prometheus-k8s pod uses.
  • After node IP change, pods running with hostNetwork don't get podIP corrected until Kubelet restarts. To resolve this, you need to restart Kubelet or delete those pods using previous IPs.
  • An admin cluster fails after any admin cluster master node IP address changes. To avoid this, you should avoid changing the admin master IP address if possible by using a static IP or a non-expired DHCP lease instead. If you encounter this issue and need further assistance, please contact Google Support.
  • User cluster upgrade might be stuck with the error: Failed to update machine status: no matches for kind "Machine" in version "cluster.k8s.io/v1alpha1". To resolve this, you need to delete the clusterapi pod in the user cluster namespace in the admin cluster.
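For the line-ending issue in the list above, Notepad++ is one option. If a POSIX shell is available on the Windows machine (for example through WSL or Git Bash, which is an assumption about your environment), the same conversion can be sketched with tr. The function name crlf_to_lf and the file names in the usage note are hypothetical.

```shell
# Hypothetical helper: copy file $1 to file $2 with every carriage
# return character removed, which turns CRLF line endings into LF.
# The input file is left unchanged.
crlf_to_lf() {
  tr -d '\r' < "$1" > "$2"
}
```

A call such as crlf_to_lf info-file info-file.unix writes the converted copy to info-file.unix, which you can then pass to gkeadm in place of the original.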

If your vSphere environment has fewer than three hosts, user cluster upgrade might fail. To resolve this, you need to disable antiAffinityGroups in the cluster config before upgrading the user cluster. For v1 config, please set antiAffinityGroups.enabled = false; for v0 config, please set usercluster.antiaffinitygroups.enabled = false.

Note: Disabling antiAffinityGroups in the cluster config during upgrade is only allowed for the 1.3.2 to 1.4.x upgrade to resolve the upgrade issue; the support might be removed in the future.

May 21, 2020

Workload Identity is now available in Alpha for GKE on-prem. Please contact support if you are interested in a trial of Workload Identity in GKE on-prem.

Preflight check for VM internet and Docker Registry access validation is updated.

Preflight check for internet validation is updated to not follow redirect. If your organization requires outbound traffic to pass through a proxy server, you no longer need to allowlist the following addresses in your proxy server:

  • console.cloud.google.com
  • cloud.google.com

The Ubuntu image is upgraded to include the newest packages.

Upgraded the Istio image to version 1.4.7 to fix a security vulnerability.

Some ConfigMaps in the admin cluster were refactored to Secrets to allow for more granular access control of sensitive configuration data.

April 23, 2020

Preflight check in gkeadm for access to the Cloud Storage bucket that holds the admin workstation OVA.

Preflight check for internet access includes additional URL www.googleapis.com.

Preflight check for test VM DNS availability.

Preflight check for test VM NTP availability.

Preflight check for test VM F5 access.

Before downloading and creating VM templates from OVAs, GKE on-prem checks if the VM template already exists in vCenter.

Rename gkeadm automatically created service accounts.

OVA download displays download progress.

gkeadm prepopulates bundlepath in the seed config on the admin workstation.

Fix for Docker failed DNS resolution on admin workstation at startup.

Admin workstation provisioned by gkeadm uses thin disk provisioning.

Improved user cluster Istio ingress gateway reliability.

Ubuntu image is upgraded to include newest packages.

Update the vCenter credentials for your clusters using the preview command gkectl update credentials vsphere.

The gkeadm configuration file, admin-ws-config.yaml, accepts paths that are prefixed with ~/ for the Certificate Authority (CA) certificate.

Test VMs wait until the network is ready before starting preflight checks.

Improve the error message in preflight check failure for F5 BIG-IP.

Skip VIP check in preflight check in manual load balancing mode.

Upgraded Calico to version 3.8.8 to fix several security vulnerabilities.

Upgraded F5 BIG-IP Controller Docker image to version 1.14.0 to fix a security vulnerability.

Fixed gkeadm admin workstation gcloud proxy username and password configuration.

Fixed the bug that was preventing gkectl check-config from automatically using the proxy that you set in your configuration file when running the full set of preflight validation checks with any GKE on-prem download image.

Fixed an admin workstation upgrade failure when the upgrade process was unable to retrieve SSH keys, which would cause a Golang segmentation fault.

April 01, 2020

When upgrading from version 1.2.2 to 1.3.0 by using the Bundle download in the alternate upgrade method, a timeout might occur that will cause your user cluster upgrade to fail. To avoid this issue, you must perform the full upgrade process that includes upgrading your admin workstation with the OVA file.

March 23, 2020

Anthos clusters on VMware 1.3.0-gke.16 is now available. To upgrade, see Upgrading GKE on-prem. GKE on-prem 1.3.0-gke.16 clusters run on Kubernetes 1.15.7-gke.32.

A new installer helps you create and prepare the admin workstation.

Support for vSAN datastore on your admin and user clusters.

In bundled load balancing mode, GKE on-prem provides and manages the Seesaw load balancer.

The Authentication Plugin for Anthos has been integrated into and replaced with the Google Cloud command-line interface, which improves the authentication process and provides the user consent flow through gcloud commands.

Added support for up to 100 nodes per user cluster.

The Cluster CA now signs the TLS certificates that the Kubelet API serves, and the TLS certificates are auto-rotated.

vSphere credential rotation is enabled. Users can now use Solution User Certificates to authenticate to GKE deployed on-prem.

gkectl automatically uses the proxy URL from config.yaml to configure the proxy on the admin workstation.

Preview Feature: Introducing User cluster Nodepools. A node pool is a group of nodes within a cluster that all have the same configuration. In GKE on-prem 1.3.0, node pools are a preview feature in the user clusters. This feature lets users create multiple node pools in a cluster, and update them as needed.

The metric kubelet_containers_per_pod_count is changed to a histogram metric.

Fixed an issue in the vSphere storage plugin that prevented vSphere storage policies from working. This is an example of how you might use this feature.

Prometheus + Grafana: two graphs on the Machine dashboard don't work because of missing metrics: Disk Usage and Disk Available.

All OOM events for containers trigger a SystemOOM event, even if they are container/pod OOM events. To check whether an OOM is actually a SystemOOM, check the kernel log for a message oom-kill:…. If oom_memcg=/ (instead of oom_memcg=/kubepods/…), then it's a SystemOOM. If it's not a SystemOOM, it's safe to ignore.
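The check described above can be sketched as a small classifier for a single oom-kill kernel log line. It applies only the rule stated in this note: oom_memcg=/ means a SystemOOM, and oom_memcg=/kubepods/… means a container or Pod OOM. The function name classify_oom and the labels it prints are hypothetical.

```shell
# Hypothetical helper: classify one "oom-kill:" kernel log line.
# Prints "ContainerOOM" when oom_memcg is under /kubepods,
# "SystemOOM" when oom_memcg is exactly "/", and "unknown" otherwise.
classify_oom() {
  case $1 in
    *oom_memcg=/kubepods*) echo "ContainerOOM" ;;
    *oom_memcg=/,*|*oom_memcg=/) echo "SystemOOM" ;;
    *) echo "unknown" ;;
  esac
}
```

On a node, you might feed it lines selected with dmesg | grep 'oom-kill:' and ignore every event that is not reported as SystemOOM.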

Affected versions: 1.3.0-gke.16

If you configured a proxy in the config.yaml and also used a bundle other than the full bundle (static IP | DHCP), you must append the --fast flag to run gkectl check-config. For example: gkectl check-config --config config.yaml --fast.

Running the 1.3 version of the gkectl diagnose command might fail if your clusters:

  • Are older than Anthos clusters on VMware version 1.3.
  • Include manually installed add-ons in the kube-system namespace.

February 21, 2020

GKE on-prem version 1.2.2-gke.2 is now available. To upgrade, see Upgrading GKE on-prem.

Improved gkectl check-config to validate any valid Google Cloud service accounts regardless of whether an IAM role is set.

You need to use vSphere provider version 1.15 when using Terraform to create the admin workstation. vSphere provider version 1.16 introduces breaking changes that would affect all Anthos versions.

Skip the preflight check when resuming cluster creation/upgrade.

Resolved a known issue with cluster upgrades when using a vSAN datastore associated with a GKE on-prem version before 1.2.

Resolved the following warning when uploading an OS image with the enableMPTSupport configuration flag set. This flag is used to indicate whether the virtual video card supports mediated passthrough.

Warning: Line 102: Unable to parse 'enableMPTSupport' for attribute 'key' on element 'Config'.

Fixed the BigQuery API service name for the preflight check service requirements validation.

Fixed the preflight check to correctly validate the default resource pool in the case where the resourcepool field in the GKE on-prem configuration file is empty.

Fixed a comment about the workernode.replicas field in the GKE on-prem configuration file to say that the minimum number of worker nodes is three.

Fixed gkectl prepare to skip checking the data disk.

Fixed gkectl check-config so that it cleans up F5 BIG-IP resources on exit.

January 31, 2020

GKE on-prem version 1.2.1-gke.4 is now available. To upgrade, see Upgrading GKE on-prem.

This patch version includes the following changes:

Adds searchdomainsfordns field to static IPs host configuration file. searchdomainsfordns is an array of DNS search domains to use in the cluster. These domains are used as part of a domain search list.

Adds a preflight check that validates an NTP server is available.

gkectl check-config now automatically uploads GKE on-prem's node OS image to vSphere. You no longer need to run gkectl prepare before gkectl check-config.

Adds a --cleanup flag for gkectl check-config. The flag's default value is true.

Passing in --cleanup=false preserves the test VM and associated SSH keys that gkectl check-config creates for its preflight checks. Preserving the VM can be helpful for debugging.

Fixes a known issue from 1.2.0-gke.6 that prevented gkectl check-config from performing all of its validations against clusters in nested resource pools or the default resource pool.

Fixes an issue that caused F5 BIG-IP VIP validation to fail due to timing out. The timeout window for F5 BIG-IP VIP validation is now longer.

Fixes an issue that caused cluster upgrades to overwrite changes to add-on configurations.

Fixes the known issue from 1.2.0-gke.6 that affects routing updates due to the route reflector configuration.

January 28, 2020

Affected versions: 1.2.0-gke.6

In some cases, certain nodes in a user cluster fail to get routing updates from the route reflector. Consequently, Pods on a node might not be able to communicate with Pods on other nodes. One possible symptom is a kube-dns resolution error.

To work around this issue, follow these steps to create a BGPPeer object in your user cluster.

Save the following BGPPeer manifest as full-mesh.yaml:

apiVersion: crd.projectcalico.org/v1
kind: BGPPeer
metadata:
  name: full-mesh
spec:
  nodeSelector: "!has(route-reflector)"
  peerSelector: "!has(route-reflector)" 

Create the BGPPeer in your user cluster:

kubectl --kubeconfig [USER_CLUSTER_KUBECONFIG] apply -f full-mesh.yaml

Verify that the full-mesh BGPPeer was created:

kubectl --kubeconfig [USER_CLUSTER_KUBECONFIG] get bgppeer

The output shows full-mesh in the list of BGPPeers:

NAME            AGE
full-mesh       61s
gke-group-1     3d21h
...

This issue will be fixed in version 1.2.1.

January 03, 2020

Affected versions: 1.1.0-gke.6 and later

Starting with version 1.1.0-gke.6, the gkeconnect.proxy field is no longer in the GKE on-prem configuration file.

If you include gkeconnect.proxy in the configuration file, the gkectl check-config command can fail with this error:

[FAILURE] Config: Could not parse config file: error unmarshaling JSON: 
while decoding JSON: json: unknown field "proxy"

To correct this issue, remove gkeconnect.proxy from the configuration file.

In versions prior to 1.1.0-gke.6, the Connect Agent used the proxy server specified in gkeconnect.proxy. Starting with version 1.1.0-gke.6, the Connect Agent uses the proxy server specified in the global proxy field.

December 20, 2019

Warning: If you installed GKE on-prem versions before 1.2, and you use a vSAN datastore, you should contact Google Support before attempting an upgrade to 1.2.0-gke.6.

GKE on-prem version 1.2.0-gke.6 is now available. To upgrade, see Upgrading GKE on-prem.

This minor version includes the following changes:

The default Kubernetes version for cluster nodes is now version 1.14.7-gke.24 (previously 1.13.7-gke.20).

GKE on-prem now supports vSphere 6.7 Update 3. Read its release notes.

GKE on-prem now supports VMware NSX-T version 2.4.2.

Any user cluster, even your first user cluster, can now use a datastore that is separate from the admin cluster's datastore. If you specify a separate datastore for a user cluster, the user cluster nodes, PersistentVolumes (PVs) for the user cluster nodes, user control plane VMs, and PVs for the user control plane VMs all use the separate datastore.

Expanded preflight checks for validating your GKE on-prem configuration file before you create your clusters. These new checks can validate that your Google Cloud project, vSphere network, and other elements of your environment are correctly configured.

Published basic installation workflow. This workflow offers a simplified workflow for quickly installing GKE on-prem using static IPs.

Published guidelines for installing Container Storage Interface (CSI) drivers. CSI enables using storage devices not natively supported by Kubernetes.

Updated documentation for authenticating using OpenID Connect (OIDC) with the Anthos Plugin for Kubectl. GKE on-prem's OIDC integration is now generally available.

From the admin workstation, gcloud now requires that you log in to gcloud with a Google Cloud user account. The user account should have at least the Viewer IAM role in all Google Cloud projects associated with your clusters.

You can now create admin and user clusters separately from one another.

Fixes an issue that prevented resuming cluster creation for HA user clusters.

Affected versions: 1.1.0-gke.6, 1.2.0-gke.6

The stackdriver.proxyconfigsecretname field was removed in version 1.1.0-gke.6. GKE on-prem's preflight checks will return an error if the field is present in your configuration file.

To work around this, before you install or upgrade to 1.2.0-gke.6, delete the proxyconfigsecretname field from your configuration file.
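Before installing or upgrading, you can check whether a configuration file still contains the removed field. The sketch below assumes the field appears on its own line as proxyconfigsecretname: followed by a value, which is how a YAML key is normally written. The function name has_proxyconfigsecretname is hypothetical.

```shell
# Hypothetical helper: succeed (exit status 0) if the configuration
# file $1 still sets the removed proxyconfigsecretname field.
# Commented-out lines (starting with '#') are not matched.
has_proxyconfigsecretname() {
  grep -Eq '^[[:space:]]*proxyconfigsecretname[[:space:]]*:' "$1"
}
```

For example, has_proxyconfigsecretname config.yaml && echo "delete proxyconfigsecretname before upgrading" prints a reminder only when the field is still present.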

Affected versions: 1.2.0-gke.6

In user clusters, Prometheus and Grafana get automatically disabled during upgrade. However, the configuration and metrics data are not lost. In admin clusters, Prometheus and Grafana stay enabled.

To work around this issue, after the upgrade, open monitoring-sample for editing and set enablePrometheus to true:

  1. Open monitoring-sample for editing:

    kubectl edit monitoring --kubeconfig [USER_CLUSTER_KUBECONFIG] \
        -n kube-system monitoring-sample

  2. Set the field enablePrometheus to true.

Affected versions: All versions

Before version 1.2.0-gke.6, a known issue prevents Stackdriver from updating its configuration after cluster upgrades. Stackdriver still references an old version, which prevents Stackdriver from receiving the latest features of its telemetry pipeline. This issue can make it difficult for Google Support to troubleshoot clusters.

After you upgrade clusters to 1.2.0-gke.6, run the following command against admin and user clusters:

kubectl --kubeconfig=[KUBECONFIG] \
-n kube-system --type=json patch stackdrivers stackdriver \
-p '[{"op":"remove","path":"/spec/version"}]'

where [KUBECONFIG] is the path to the cluster's kubeconfig file.

November 19, 2019

GKE On-Prem version 1.1.2-gke.0 is now available. To download version 1.1.2-gke.0's OVA, gkectl, and upgrade bundle, see Downloads. Then, see Upgrading admin workstation and Upgrading clusters.

This patch version includes the following changes:

New Features

Published Managing clusters.

Fixes

Fixed the known issue from November 5.

Fixed the known issue from November 8.

Known Issues

If you are running multiple data centers in vSphere, running gkectl diagnose cluster might return the following error, which you can safely ignore:

Checking storage...FAIL path '*' resolves to multiple datacenters

If you are running a vSAN datastore, running gkectl diagnose cluster might return the following error, which you can safely ignore:

PersistentVolume [NAME]: virtual disk "[[DATASTORE_NAME]] [PVC]" IS NOT attached to machine "[MACHINE_NAME]" but IS listed in the Node.Status

November 08, 2019

In GKE On-Prem version 1.1.1-gke.2, a known issue prevents creation of clusters configured to use a Docker registry. You configure a Docker registry by populating the GKE On-Prem configuration file's privateregistryconfig field. Cluster creation fails with an error such as Failed to create root cluster: could not create external client: could not create external control plane: docker run error: exit status 125

A fix is targeted for version 1.1.2. In the meantime, if you want to create a cluster configured to use a Docker registry, pass in the --skip-validation-docker flag to gkectl create cluster.

November 05, 2019

GKE On-Prem's configuration file has a field, vcenter.datadisk, which looks for a path to a virtual machine disk (VMDK) file. During installation, you choose a name for the VMDK. By default, GKE On-Prem creates a VMDK and saves it to the root of your vSphere datastore.

If you are using a vSAN datastore, you need to create a folder in the datastore in which to save the VMDK. You provide the full path to the field—for example, datadisk: gke-on-prem/datadisk.vmdk—and GKE On-Prem saves the VMDK in that folder.

When you create the folder, vSphere assigns the folder a universally unique identifier (UUID). Although you provide the folder path to the GKE On-Prem config, the vSphere API looks for the folder's UUID. Currently, this mismatch can cause cluster creation and upgrades to fail.

A fix is targeted for version 1.1.2. In the meantime, you need to provide the folder's UUID instead of the folder's path. Follow the workaround instructions currently available in the upgrading clusters and installation topics.

October 25, 2019

GKE On-Prem version 1.1.1-gke.2 is now available. To download version 1.1.1-gke.2's OVA, gkectl, and upgrade bundle, see Downloads. Then, see Upgrading admin workstation and Upgrading clusters.

This patch version includes the following changes:

New Features

Action required: This version upgrades the minimum gcloud version on the admin workstation to 256.0.0. You should upgrade your admin workstation. Then, you should upgrade your clusters.

The open source CoreOS toolbox is now included in all GKE On-Prem cluster nodes. This suite of tools is useful for troubleshooting node issues. See Debugging node issues using toolbox.

Fixes

Fixed an issue that prevented clusters configured with OIDC from being upgraded.

Fixed CVE-2019-11253 described in Security bulletins.

Fixed an issue that caused cluster metrics to be lost due to a lost connection to Google Cloud. When a GKE On-Prem cluster's connection to Google Cloud is lost for a period of time, that cluster's metrics are now fully recovered.

Fixed an issue that caused ingestion of admin cluster metrics to be slower than ingesting user cluster metrics.

Known Issues

For user clusters that are using static IPs and a different network than their admin cluster: If you overwrite the user cluster's network configuration, the user control plane might not be able to start. This occurs because it's using the user cluster's network, but allocates an IP address and gateway from the admin cluster.

As a workaround, you can update each user control plane's MachineDeployment specification to use the correct network. Then, delete each user control plane Machine, causing the MachineDeployment to create new Machines:

  1. List MachineDeployments in the admin cluster

    kubectl get machinedeployments --kubeconfig [ADMIN_CLUSTER_KUBECONFIG]
    
  2. Update a user control plane MachineDeployment from your shell

    kubectl edit machinedeployment --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] [MACHINEDEPLOYMENT_NAME]
    
  3. List Machines in the admin cluster

    kubectl get machines --kubeconfig [ADMIN_CLUSTER_KUBECONFIG]
    
  4. Delete user control plane Machines in the admin cluster

    kubectl delete machines --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] [MACHINE_NAME]
    

September 26, 2019

GKE On-Prem version 1.1.0-gke.6 is now available. To download version 1.1.0-gke.6's gkectl and upgrade bundle, see Downloads. Then, see Upgrading clusters.

This minor version includes the following changes:

The default Kubernetes version for cluster nodes is now version 1.13.7-gke.20 (previously 1.12.7-gke.19).

Action required: As of version 1.1.0-gke.6, GKE On-Prem now creates vSphere Distributed Resource Scheduler (DRS) rules for your user cluster's nodes (vSphere VMs), causing them to be spread across at least three physical hosts in your datacenter.

This feature is enabled by default for all new and existing user clusters running version 1.1.0-gke.6.

The feature requires that your vSphere environment meet the following conditions:

  • VMware DRS must be enabled. VMware DRS requires vSphere Enterprise Plus license edition. To learn how to enable DRS, see Creating a DRS Cluster.
  • The vSphere user account provided in your GKE On-Prem configuration file's vcenter field must have the Host.Inventory.EditCluster permission.
  • There are at least three physical hosts available.

If you do not want to enable this feature for your existing user clusters—for example, if you don't have enough hosts to accommodate the feature—perform the following steps before you upgrade your user clusters:

  1. Open your existing GKE On-Prem configuration file.
  2. Under the usercluster specification, add the antiaffinitygroups field as described in the antiaffinitygroups documentation:

    usercluster:
      ...
      antiaffinitygroups:
        enabled: false

  3. Save the file.

  4. Use the configuration file to upgrade. Your clusters are upgraded, but the feature is not enabled.

You can now set the default storage class for your clusters.

You can now use Container Storage Interface (CSI) 1.0 as a storage class for your clusters.

You can now delete broken or unhealthy user clusters with gkectl delete cluster --force

You can now diagnose node issues using the debug-toolbox container image.

You can now skip validations run by gkectl commands.

The tarball that gkectl diagnose snapshot creates now includes a log of the command's output by default.

Adds the --seed-config flag to gkectl diagnose snapshot. When you pass the flag, the tarball produced by snapshot includes your clusters' GKE On-Prem configuration file.

The gkeplatformversion field has been removed from the GKE On-Prem configuration file. To specify a cluster's version, provide the version's bundle to the bundlepath field.

You need to add the vSphere permission, Host.Inventory.EditCluster, before you can use antiaffinitygroups.

You now specify a configuration file in gkectl diagnose snapshot by passing --snapshot-config (previously --config). See Diagnosing cluster issues.

You now capture your cluster's configuration file with gkectl diagnose snapshot by passing --snapshot-config (previously --config). See Diagnosing cluster issues.

gkectl diagnose commands now return an error if you provide a user cluster's kubeconfig, rather than an admin cluster's kubeconfig.

Cloud Console now notifies you when an upgrade is available for a registered user cluster.

A known issue prevents version 1.0.11, 1.0.1-gke.5, and 1.0.2-gke.3 clusters using OIDC from being upgraded to version 1.1. A fix is targeted for version 1.1.1. If you configured a version 1.0.11, 1.0.1-gke.5, or 1.0.2-gke.3 cluster with OIDC, you are not able to upgrade it. Create a version 1.1 cluster by following Installing GKE On-Prem.

August 22, 2019

GKE On-Prem version 1.0.2-gke.3 is now available. This patch release includes the following changes:

Seesaw is now supported for manual load balancing.

You can now specify a different vSphere network for admin and user clusters.

You can now delete user clusters using gkectl. See Deleting a user cluster.

gkectl diagnose snapshot now gets logs from the user cluster control planes.

GKE On-Prem OIDC specification has been updated with several new fields: kubectlredirecturl, scopes, extraparams, and usehttpproxy.

Calico updated to version 3.7.4.

Stackdriver Monitoring's system metrics prefix changed from external.googleapis.com/prometheus/ to kubernetes.io/anthos/. If you are tracking metrics or alerts, update your dashboards with the new prefix.
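
As a rough illustration of the prefix update, the following sketch rewrites the old prefix to the new one in any text passed on standard input, such as an exported dashboard or alert definition. The function name and the example file names are assumptions made for this sketch and are not part of GKE On-Prem or Stackdriver.

```shell
# Hypothetical helper: replace the old system metrics prefix with the new one.
# Reads text on stdin and writes the rewritten text to stdout.
rewrite_metric_prefix() {
  # Dots in the old prefix are escaped so they match literally.
  sed 's#external\.googleapis\.com/prometheus/#kubernetes.io/anthos/#g'
}
```

For example, assuming a locally exported file named dashboard.json, you could run rewrite_metric_prefix < dashboard.json > dashboard.updated.json and review the result before applying it.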

July 30, 2019

GKE On-Prem version 1.0.1-gke.5 is now available. This patch release includes the following changes:

New Features

Changes

gkectl check-config now also checks node IP availability if you are using static IPs.

gkectl prepare now checks if a VM exists and is marked as a template in vSphere before attempting to upload the VM's OVA image.

Adds support for specifying a vCenter cluster and a resource pool in that cluster.

Upgrades F5 BIG-IP controller to version 1.9.0.

Upgrades Istio ingress controller to version 1.2.2.

Fixes

Fixes registry data persistence issues with the admin workstation's Docker registry.

Fixes validation that checks whether a user cluster's name is already in use.

July 25, 2019

GKE On-Prem version 1.0.11 is now available.

June 17, 2019

GKE On-Prem is now generally available. Version 1.0.10 includes the following changes:

Upgrading from beta-1.4 to 1.0.10

Before upgrading your beta clusters to the first general availability version, perform the steps described in Installing GKE On-Prem, and review the following points:

  • If you are running a beta version before beta-1.4, be sure to upgrade to beta-1.4 first.

  • If your beta clusters are running their own L4 load balancers (not the default, F5 BIG-IP), you need to delete and recreate your clusters to run the latest GKE On-Prem version.

  • If your clusters were upgraded to beta-1.4 from beta-1.3, run the following command for each user cluster before upgrading:

    kubectl delete crd networkpolicies.crd.projectcalico.org

  • vCenter certificate verification is now required. (vsphereinsecure is no longer supported.) If you're upgrading your beta 1.4 clusters to 1.0.10, you need to provide a vCenter trusted root CA public certificate in the upgrade configuration file.

  • You need to upgrade all of your running clusters. For this upgrade to succeed, your clusters can't run in a mixed version state.

  • You need to upgrade your admin clusters to the latest version first, then upgrade your user clusters.

New Features

You can now enable the Manual load balancing mode to configure an L4 load balancer. You can still choose to use the default load balancer, F5 BIG-IP.

GKE On-Prem's configuration-driven installation process has been updated. You now install declaratively using a single configuration file.

Adds gkectl create-config, which generates a configuration file for installing GKE On-Prem, upgrading existing clusters, and for creating additional user clusters in an existing installation. This replaces the installation wizard and create-config.yaml from previous versions. See the updated documentation for installing GKE On-Prem.

Adds gkectl check-config, which validates the GKE On-Prem configuration file. See the updated documentation for installing GKE On-Prem.

Adds an optional --validate-attestations flag to gkectl prepare. This flag verifies that the container images included in your admin workstation were built and signed by Google and are ready for deployment. See the updated documentation for installing GKE On-Prem.

Changes

Upgrades Kubernetes version to 1.12.7-gke.19. You can now upgrade your clusters to this version. You can no longer create clusters that run Kubernetes version 1.11.2-gke.19.

We recommend upgrading your admin cluster before you upgrade your user clusters.

Upgrades Istio ingress controller to version 1.1.7.

vCenter certificate verification is now required. (vsphereinsecure is no longer supported.) You provide the certificate in the GKE On-Prem configuration file's cacertpath field.

When a client calls the vCenter server, the vCenter server must prove its identity to the client by presenting a certificate. That certificate must be signed by a certificate authority (CA). The certificate must not be self-signed.

If you're upgrading your beta 1.4 clusters to 1.0.10, you need to provide a vCenter trusted root CA public certificate in the upgrade configuration file.

Known Issues

Upgrading clusters can cause disruption or downtime for workloads that use PodDisruptionBudgets (PDBs).

You might not be able to upgrade beta clusters that use the Manual load balancing mode to GKE On-Prem version 1.0.10. To upgrade and continue using your own load balancer with these clusters, you need to recreate the clusters.

May 24, 2019

GKE On-Prem beta version 1.4.7 is now available. This release includes the following changes:

New Features

In the gkectl diagnose snapshot command, the --admin-ssh-key-path parameter is now optional.

Changes

On May 8, 2019, we introduced a change to Connect, the service that enables you to interact with your GKE On-Prem clusters using Cloud Console. To use the new Connect agent, you must re-register your clusters with Cloud Console, or you must upgrade to GKE On-Prem beta-1.4.

Your GKE On-Prem clusters and the workloads running on them will continue to operate uninterrupted. However, your clusters will not be visible in Cloud Console until you re-register them or upgrade to beta-1.4.

Before you re-register or upgrade, make sure your service account has the gkehub.connect role. Also, if your service account has the old clusterregistry.connect role, it's a good idea to remove that role.

Grant your service account the gkehub.connect role:

gcloud projects add-iam-policy-binding [PROJECT_ID] \
    --member="serviceAccount:[SERVICE_ACCOUNT_NAME]@[PROJECT_ID].iam.gserviceaccount.com" \
    --role="roles/gkehub.connect"

If your service account has the old clusterregistry.connect role, remove the old role:

gcloud projects remove-iam-policy-binding [PROJECT_ID] \
    --member="serviceAccount:[SERVICE_ACCOUNT_NAME]@[PROJECT_ID].iam.gserviceaccount.com" \
    --role="roles/clusterregistry.connect"

Re-register your cluster, or upgrade to GKE On-Prem beta-1.4.

To re-register your cluster:

gcloud alpha container hub register-cluster [CLUSTER_NAME] \
    --context=[USER_CLUSTER_CONTEXT] \
    --service-account-key-file=[LOCAL_KEY_PATH] \
    --kubeconfig-file=[KUBECONFIG_PATH] \
    --project=[PROJECT_ID]

To upgrade to GKE On-Prem beta-1.4:

gkectl upgrade --kubeconfig [ADMIN_CLUSTER_KUBECONFIG]

Known Issues

There is an issue that prevents the Connect agent from being updated to the new version during an upgrade. To work around this issue, run the following command after you upgrade a cluster:

kubectl delete pod gke-connect-agent-install -n gke-connect

May 13, 2019

Known Issues

Clusters upgraded from version beta-1.2 to beta-1.3 might be affected by a known issue that damages the cluster's configuration file and prevents all future upgrades of the cluster.

You can resolve this issue by deleting and recreating clusters upgraded from beta-1.2 to beta-1.3.

To resolve the issue without deleting and recreating the cluster, you need to re-encode and apply each cluster's Secrets. Perform the following steps:

  1. Get the contents of the create-config Secrets stored in the admin cluster. This must be done for the create-config Secret in the kube-system namespace, and for the create-config Secrets in each user cluster's namespace:

    kubectl get secret create-config -n [USER_CLUSTER_NAME] -o jsonpath='{.data.cfg}' | base64 -d > [USER_CLUSTER_NAME]_create_secret.yaml

    For example:

    kubectl get secret create-config -n kube-system -o jsonpath='{.data.cfg}' | base64 -d > kube-system_create_secret.yaml

  2. For each user cluster, open the [USER_CLUSTER_NAME]_create_secret.yaml file in an editor.

    If the values for registerserviceaccountkey and connectserviceaccountkey are not REDACTED, no further action is required: the Secrets do not need to be re-encoded and written to the cluster.

  3. Open the original create_config.yaml file in another editor.

  4. In [USER_CLUSTER_NAME]_create_secret.yaml, replace the registerserviceaccountkey and connectserviceaccountkey values with the values from the original create_config.yaml file. Save the changed file.

  5. Repeat steps 2-4 for each [USER_CLUSTER_NAME]_create_secret.yaml, and for the kube-system_create_secret.yaml file.

  6. Base64-encode each [USER_CLUSTER_NAME]_create_secret.yaml file and the kube-system_create_secret.yaml file:

    cat [USER_CLUSTER_NAME]_create_secret.yaml | base64 > [USER_CLUSTER_NAME]_create_secret.b64

    cat kube-system_create_secret.yaml | base64 > kube-system_create_secret.b64

  7. Replace the data[cfg] field in each Secret in the cluster with the contents of the corresponding file:

    kubectl edit secret create-config -n [USER_CLUSTER_NAME]
      # kubectl edit opens the file in the shell's default text editor
      # Open `first-user-cluster_create_secret.b64` in another editor, and replace
      # the `cfg` value with the copied value
      # Make sure the copied string has no newlines in it
    
  8. Repeat step 7 for each [USER_CLUSTER_NAME]_create_secret.yaml Secret, and for the kube-system_create_secret.yaml Secret.

  9. To ensure that the update was successful, repeat step 1.
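
Because the encoded string pasted into the Secret must not contain newlines, and some base64 implementations wrap long output, it can help to strip newlines when encoding. The following is a minimal sketch under that assumption. The function name encode_single_line is hypothetical and is not part of gkectl or kubectl.

```shell
# Hypothetical helper: base64-encode a file as one line with no embedded newlines.
# Usage: encode_single_line PATH_TO_FILE
encode_single_line() {
  # base64 may wrap its output at a fixed width; tr removes every newline.
  base64 < "$1" | tr -d '\n'
}
```

For example, encode_single_line secret.yaml > secret.b64 writes a single-line value that you can paste into the cfg field. The file names here are placeholders.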

May 07, 2019

GKE On-Prem beta version 1.4.1 is now available. This release includes the following changes:

New Features

In the gkectl diagnose snapshot command, the --admin-ssh-key-path parameter is now optional.

Changes

On May 8, 2019, we introduced a change to Connect, the service that enables you to interact with your GKE On-Prem clusters using Cloud Console. To use the new Connect agent, you must re-register your clusters with Cloud Console, or you must upgrade to GKE On-Prem beta-1.4.

Your GKE On-Prem clusters and the workloads running on them will continue to operate uninterrupted. However, your clusters will not be visible in Cloud Console until you re-register them or upgrade to beta-1.4.

Before you re-register or upgrade, make sure your service account has the gkehub.connect role. Also, if your service account has the old clusterregistry.connect role, it's a good idea to remove that role.

Grant your service account the gkehub.connect role:

gcloud projects add-iam-policy-binding [PROJECT_ID] \
    --member="serviceAccount:[SERVICE_ACCOUNT_NAME]@[PROJECT_ID].iam.gserviceaccount.com" \
    --role="roles/gkehub.connect"

If your service account has the old clusterregistry.connect role, remove the old role:

gcloud projects remove-iam-policy-binding [PROJECT_ID] \
    --member="serviceAccount:[SERVICE_ACCOUNT_NAME]@[PROJECT_ID].iam.gserviceaccount.com" \
    --role="roles/clusterregistry.connect"

Re-register your cluster, or upgrade to GKE On-Prem beta-1.4.

To re-register your cluster:

gcloud alpha container hub register-cluster [CLUSTER_NAME] \
    --context=[USER_CLUSTER_CONTEXT] \
    --service-account-key-file=[LOCAL_KEY_PATH] \
    --kubeconfig-file=[KUBECONFIG_PATH] \
    --project=[PROJECT_ID]

To upgrade to GKE On-Prem beta-1.4:

gkectl upgrade --kubeconfig [ADMIN_CLUSTER_KUBECONFIG]

Known Issues

There is an issue that prevents the Connect agent from being updated to the new version during an upgrade. To work around this issue, run the following command after you upgrade a cluster:

kubectl delete pod gke-connect-agent-install -n gke-connect

April 25, 2019

GKE On-Prem beta version 1.3.1 is now available. This release includes the following changes:

New Features

The gkectl diagnose snapshot command now has a --dry-run flag.

The gkectl diagnose snapshot command now supports four scenarios.

The gkectl diagnose snapshot command now supports regular expressions for specifying namespaces.

Changes

Istio 1.1 is now the default ingress controller. The ingress controller runs in the gke-system namespace for both admin and user clusters. This enables easier TLS management for Ingress. To enable ingress, or to re-enable ingress after an upgrade, follow the instructions under Enabling ingress.

The gkectl tool no longer uses Minikube and KVM for bootstrapping. This means you do not have to enable nested virtualization on your admin workstation VM.

Known Issues

GKE On-Prem's ingress controller uses Istio 1.1 with automatic Secret discovery. However, the node agent for Secret discovery may fail to get Secret updates after a Secret is deleted, so avoid deleting Secrets. If you must delete a Secret and Ingress TLS fails afterwards, manually restart the Ingress Pod in the gke-system namespace.

April 11, 2019

GKE On-Prem beta version 1.2.1 is now available. This release includes the following changes:

New Features

GKE On-Prem clusters now automatically connect back to Google using Connect.

You can now run up to three control planes per user cluster.

Changes

gkectl now validates vSphere and F5 BIG-IP credentials when creating clusters.

Known Issues

A regression causes gkectl diagnose snapshot commands to use the wrong SSH key, which prevents the command from collecting information from user clusters. As a workaround for support cases, you might need to SSH into individual user cluster nodes and manually gather data.

April 02, 2019

GKE On-Prem beta version 1.1.1 is now available. This release includes the following changes:

New Features

You now install GKE On-Prem with an Open Virtual Appliance (OVA), a pre-configured virtual machine image that includes several command-line interface tools. This change makes installations easier and removes a layer of virtualization. You no longer need to run gkectl inside a Docker container.

If you installed GKE On-Prem versions before beta-1.1.1, you should create a new admin workstation following the documented instructions. After you install the new admin workstation, copy over any SSH keys, configuration files, kubeconfigs, and any other files you need, from your previous workstation to the new one.

Added documentation for backing up and restoring clusters.

You can now configure authentication for clusters using OIDC and AD FS. To learn more, refer to Authenticating with OIDC and AD FS and Authentication.

Changes

You now must use an admin cluster's private key to run gkectl diagnose snapshot.

Added a configuration option during installation for deploying multi-master user clusters.

Connect documentation has been migrated.

Fixes

Fixed an issue where cluster networking could be interrupted when a node is removed unexpectedly.

Known Issues

GKE On-Prem's Configuration Management has been upgraded from version 0.11 to 0.13. Several components of the system have been renamed. You need to take some steps to clean up the previous versions' resources and install a new instance.

If you have an active instance of Configuration Management:

  1. Uninstall the instance:

    kubectl -n=nomos-system delete nomos --all

  2. Make sure that the instance's namespace has no resources:

    kubectl -n nomos-system get all

  3. Delete the namespace:

    kubectl delete ns nomos-system

  4. Delete the CRD:

    kubectl delete crd nomos.addons.sigs.k8s.io

  5. Delete all kube-system resources for the operator:

    kubectl -n kube-system delete all -l k8s-app=nomos-operator

If you don't have an active instance of Configuration Management:

  1. Delete the Configuration Management namespace:

    kubectl delete ns nomos-system

  2. Delete the CRD:

    kubectl delete crd nomos.addons.sigs.k8s.io

  3. Delete all kube-system resources for the operator:

    kubectl -n kube-system delete all -l k8s-app=nomos-operator

March 12, 2019

GKE On-Prem beta version 1.0.3 is now available. This release includes the following changes:

Fixes

Fixed an issue that caused Docker certificates to be saved to the wrong location.

March 04, 2019

GKE On-Prem beta version 1.0.2 is now available. This release includes the following changes:

New Features

You can now run gkectl version to check which version of gkectl you're running.

You can now upgrade user clusters to future beta versions.

Anthos Config Management version 0.11.6 is now available.

Stackdriver Logging is now enabled on each node. By default, the logging agent replicates logs to your GCP project for only control plane services, cluster API, vSphere controller, Calico, BIG-IP controller, Envoy proxy, Connect, Anthos Config Management, Prometheus and Grafana services, Istio control plane, and Docker. Application container logs are excluded by default, but can be optionally enabled.

Stackdriver Prometheus Sidecar captures metrics for the same components as the logging agent.

Kubernetes Network Policies are now supported.

Changes

You can now update IP blocks in the cluster specification to expand the IP range for a given cluster.

If clusters you installed during alpha were disconnected from Google after beta, you might need to connect them again. Refer to Registering a cluster.

Getting started has been updated with steps for activating your service account and running gkectl prepare.

gkectl diagnose snapshot now only collects configuration data and excludes logs. This tool is used to capture details of your environment prior to opening a support case.

Adds support for optional SNAT pool name configuration for F5 BIG-IP at cluster-creation time. You can use this to configure the --vs-snat-pool-name value on the F5 BIG-IP controller.

You now need to provide a VIP for add-ons that run in the admin cluster.

Fixes

Cluster resizing operations improved to prevent unintended node deletion.

February 07, 2019

GKE On-Prem alpha version 1.3 is now available. This release includes the following changes:

New Features

During installation, you can now provide YAML files with nodeip blocks to configure static IPAM.

Changes

You now need to provision a 100GB disk in vSphere Datastore. GKE On-Prem uses the disk to store some of its vital data, such as etcd. See Data center requirements.

You can now provide only lowercase hostnames in nodeip blocks.
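
If your hostname inventory contains mixed-case names, you might normalize or check them before writing nodeip blocks. The sketch below shows one way to do that. Both function names are hypothetical helpers and are not part of gkectl.

```shell
# Hypothetical helper: print the lowercase form of a hostname.
to_lowercase() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

# Hypothetical helper: succeed only if the hostname is already all lowercase.
is_lowercase_hostname() {
  test "$1" = "$(to_lowercase "$1")"
}
```

For example, is_lowercase_hostname "$name" || name="$(to_lowercase "$name")" leaves lowercase names unchanged and converts the rest.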

GKE On-Prem now enforces unique names for user clusters.

Metrics endpoints and APIs that use Istio endpoints are now secured using mTLS and role-based access control.

External communication by Grafana is disabled.

Improvements to Prometheus and Alertmanager health-checking.

Prometheus now uses secured port for scraping metrics.

Several updates to Grafana dashboards.

Known Issues

If your vCenter user account uses a format like DOMAIN\USER, you might need to escape the backslash (DOMAIN\\USER). Be sure to do this when prompted to enter the user account during installation.
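
To illustrate the escaped form, the following sketch doubles every backslash in a user account string. The function name is hypothetical, and whether the installer needs the escaped form in your environment is an assumption you should verify.

```shell
# Hypothetical helper: double each backslash in the given string.
escape_backslashes() {
  # printf '%s' passes the argument through without interpreting backslashes.
  printf '%s\n' "$1" | sed 's/\\/\\\\/g'
}
```

For example, escape_backslashes 'DOMAIN\USER' prints the account with the backslash doubled.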

January 23, 2019

GKE On-Prem alpha version 1.2.1 is now available. This release includes the following changes:

New Features

You can now use gkectl to delete admin clusters.

Changes

gkectl diagnose snapshot commands now allow you to specify nodes while capturing snapshots of remote command results and files.

January 14, 2019

GKE On-Prem alpha version 1.1.2 is now available. This release includes the following changes:

New Features

You can now use the gkectl prepare command to pull and push GKE On-Prem's container images, which deprecates the populate_registry.sh script.

gkectl prepare now prompts you to enter information about your vSphere cluster and resource pool.

You can now use the gkectl create command to create and add user clusters to existing admin control planes by passing in an existing kubeconfig file when prompted during cluster creation.

You can now pass in an Ingress TLS Secret for admin and user clusters at cluster creation time. You will see the following new prompt:

Do you want to use TLS for Admin Control Plane/User Cluster ingress?

Providing the TLS Secret and certs allows gkectl to set up the Ingress TLS. HTTP is not automatically disabled with TLS installation.

Changes

GKE On-Prem now runs Kubernetes version 1.11.2-gke.19.

The default footprint for GKE On-Prem has changed:

  • Minimum memory requirement for user cluster nodes is now 8192M.

GKE On-Prem now runs minikube version 0.28.0.

GKE Policy Management has been upgraded to version 0.11.1.

gkectl no longer prompts you to provide a proxy configuration by default.

There are three new ConfigMap resources in the user cluster namespace: cluster-api-etcd-metrics-config, kube-etcd-metrics-config, and kube-apiserver-config. GKE On-Prem uses these files to quickly bootstrap the metrics proxy container.

kube-apiserver events now live in their own etcd. You can see kube-etcd-events in your user cluster's namespace.

Cluster API controllers now use leader election.

vSphere credentials are now pulled from credential files.

gkectl diagnose commands now work with both admin and user clusters.

gkectl diagnose snapshot can now take snapshots of remote files on the node, results of remote commands on the nodes, and Prometheus queries.

gkectl diagnose snapshot can now take snapshots in multiple parallel threads.

gkectl diagnose snapshot now allows you to specify words to be excluded from the snapshot results.

Fixes

Fixed issues with minikube caching that caused unexpected network calls.

Fixed an issue with pulling F5 BIG-IP credentials. Credentials are now read from a credentials file instead of using environment variables.

Known Issues

You might encounter the following govmomi warning when you run gkectl prepare:

Warning: Line 102: Unable to parse 'enableMPTSupport' for attribute 'key' on element 'Config'

Resizing user clusters can cause inadvertent node deletion or recreation.

PersistentVolumes can fail to mount, producing the error devicePath is empty. As a workaround, delete and re-create the associated PersistentVolumeClaim.

Resizing IPAM address blocks is not supported in alpha if you use static IP allocation for nodes. To work around this, consider allocating more IP addresses than you currently need.

On slow disks, VM creation can time out and cause deployments to fail. If this occurs, delete all resources and try again.

December 19, 2018

GKE On-Prem alpha 1.0.4 is now available. This release includes the following changes:

Fixes

The vulnerability caused by CVE-2018-1002105 has been patched.

November 30, 2018

GKE On-Prem alpha 1.0 is now available. The following changes are included in this release:

Changes

GKE On-Prem alpha 1.0 runs Kubernetes 1.11.

The default footprint for GKE On-Prem has changed:

  • The admin control plane runs three nodes, which use 4 CPUs and 16GB memory.
  • The user control plane runs one node that uses 4 CPUs and 16GB memory.
  • User clusters run a minimum of three nodes, which use 4 CPUs and 16GB memory.

Support for high-availability Prometheus setup.

Support for custom Alert Manager configuration.

Prometheus upgraded from 2.3.2 to 2.4.3.

Grafana upgraded from 5.0.4 to 5.3.4.

kube-state-metrics upgraded from 1.3.1 to 1.4.0.

Alert Manager upgraded from 1.14.0 to 1.15.2.

node_exporter upgraded from 1.15.2 to 1.16.0.

Fixes

The vulnerability caused by CVE-2018-1002103 has been patched.

Known Issues

PersistentVolumes can fail to mount, producing the error devicePath is empty. As a workaround, delete and re-create the associated PersistentVolumeClaim.

Resizing IPAM address blocks is not supported in alpha if you use static IP allocation for nodes. To work around this, consider allocating more IP addresses than you currently need.

GKE On-Prem alpha 1.0 does not yet pass all conformance tests.

Only one user cluster per admin cluster can be created. To create additional user clusters, create another admin cluster.

October 31, 2018

GKE On-Prem EAP 2.1 is now available. The following changes are included in this release:

Changes

When you create admin and user clusters at the same time, you can now re-use the admin cluster's F5 BIG-IP credentials to create the user cluster. Also, the CLI now requires that BIG-IP credentials be provided; this requirement cannot be skipped using --dry-run.

F5 BIG-IP controller upgraded to use the latest OSS version, 1.7.0.

To improve stability for slow vSphere machines, cluster machine creation timeout is now 15 minutes (previously five minutes).

October 17, 2018

GKE On-Prem EAP 2.0 is now available. The following changes are included in this release:

Changes

Support for GKE Connect.

Support for Monitoring.

Support for installation using private registries.

Support for front-ending the L7 load balancer as an L4 VIP on F5 BIG-IP.

Support for static IP allocation for nodes during cluster bootstrap.

Known Issues

Only one user cluster per admin cluster can be created. To create additional user clusters, create another admin cluster.

Cluster upgrades are not supported in EAP 2.0.

On slow disks, VM creation can time out and cause deployments to fail. If this occurs, delete all resources and try again.

As part of the cluster bootstrapping process, a short-lived minikube instance is run. The minikube version used has the security vulnerability CVE-2018-1002103.