GDCV for Bare Metal 1.16 release notes

This document lists production updates to GDCV for Bare Metal. We recommend that GKE on Bare Metal developers periodically check this list for any new announcements.

You can see the latest product updates for all of Google Cloud on the Google Cloud page, browse and filter all release notes in the Google Cloud console, or programmatically access release notes in BigQuery.

To get the latest product updates delivered to you, add the URL of this page to your feed reader, or add the feed URL directly: https://cloud.google.com/feeds/anthos-bare-metal-release-notes.xml
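
If you'd rather process the feed in a script than in a feed reader, the following is a minimal sketch. It assumes the feed is in Atom format (the standard `http://www.w3.org/2005/Atom` namespace with `entry`, `title`, and `updated` elements), which isn't stated on this page. The function name `list_feed_entries` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the release notes feed uses the standard Atom namespace.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def list_feed_entries(feed_xml):
    """Return a list of (updated, title) tuples, one per Atom entry."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        updated = entry.findtext("atom:updated", default="", namespaces=ATOM_NS)
        entries.append((updated, title))
    return entries


# Example usage (requires network access, so it's left as a comment):
#   import urllib.request
#   url = "https://cloud.google.com/feeds/anthos-bare-metal-release-notes.xml"
#   with urllib.request.urlopen(url) as response:
#       for updated, title in list_feed_entries(response.read()):
#           print(updated, title)
```

If the feed uses a different format, adjust the namespace and element names to match.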

April 25, 2024

Release 1.16.8

GKE on Bare Metal 1.16.8 is now available for download. To upgrade, see Upgrade clusters. GKE on Bare Metal 1.16.8 runs on Kubernetes 1.27.

If you use a third-party storage vendor, check the GDCV Ready storage partners document to make sure the storage vendor has already passed the qualification for this release of GKE on Bare Metal.

Fixes:

The following container image security vulnerabilities have been fixed in 1.16.8:

Known issues:

For information about the latest known issues, see GKE on Bare Metal known issues in the Troubleshooting section.

April 08, 2024

Release 1.16.7

GKE on Bare Metal 1.16.7 is now available for download. To upgrade, see Upgrade clusters. GKE on Bare Metal 1.16.7 runs on Kubernetes 1.27.

If you use a third-party storage vendor, check the GDCV Ready storage partners document to make sure the storage vendor has already passed the qualification for this release of GKE on Bare Metal.

Fixes:

  • Fixed an issue with configuring a proxy for your cluster that required you to manually set HTTPS_PROXY and NO_PROXY environment variables on the admin workstation.

The following container image security vulnerabilities have been fixed in 1.16.7:

Known issues:

For information about the latest known issues, see GKE on Bare Metal known issues in the Troubleshooting section.

April 03, 2024

A Denial-of-Service (DoS) vulnerability (CVE-2023-45288) was recently discovered in multiple implementations of the HTTP/2 protocol, including the golang HTTP server used by Kubernetes. The vulnerability could lead to a DoS of the Google Kubernetes Engine (GKE) control plane. For more information, see the GCP-2024-022 security bulletin.

February 20, 2024

Release 1.16.6

GKE on Bare Metal 1.16.6 is now available for download. To upgrade, see Upgrade clusters. GKE on Bare Metal 1.16.6 runs on Kubernetes 1.27.

If you use a third-party storage vendor, check the GDCV Ready storage partners document to make sure the storage vendor has already passed the qualification for this release of GKE on Bare Metal.

Fixes:

  • Fixed an issue where upgrades were blocked because cluster-operator couldn't delete stale, failing preflight check resources.

  • Cleaned up stale etcd-events membership to enhance control plane initialization reliability in the event of a node join failure.

The following container image security vulnerabilities have been fixed in 1.16.6:

Known issues:

For information about the latest known issues, see GKE on Bare Metal known issues in the Troubleshooting section.

January 31, 2024

Security bulletin (all minor versions)

A security vulnerability, CVE-2024-21626, has been discovered in runc where a user with permission to create Pods might be able to gain full access to the node filesystem.

For instructions and more details, see the GCP-2024-005 security bulletin.

January 30, 2024

Release 1.16.5

GKE on Bare Metal 1.16.5 is now available for download. To upgrade, see Upgrade clusters. GKE on Bare Metal 1.16.5 runs on Kubernetes 1.27.

If you use a third-party storage vendor, check the GDCV Ready storage partners document to make sure the storage vendor has already passed the qualification for this release of GKE on Bare Metal.

Known issues:

For information about the latest known issues, see GKE on Bare Metal known issues in the Troubleshooting section.

December 15, 2023

Release 1.16.4

GKE on Bare Metal 1.16.4 is now available for download. To upgrade, see Upgrade clusters. GKE on Bare Metal 1.16.4 runs on Kubernetes 1.27.

If you use a third-party storage vendor, check the GDCV Ready storage partners document to make sure the storage vendor has already passed the qualification for this release of GKE on Bare Metal.

Functionality changes:

  • Changed upgrade preflight check behavior to skip the kubeadm job creation check, which improves upgrade reliability.

Supported node pool versions:

If you use selective worker node pool upgrades to upgrade a cluster to version 1.16.4, see Node pool versioning rules for a list of the versions that are supported for the worker node pools.

Fixes:

  • Fixed an issue where the network check ConfigMap wasn't being updated when nodes were added or removed.

  • Fixed an issue where excessive stackdriver-operator reconciliations resulted in high CPU usage.

The following container image security vulnerabilities have been fixed in 1.16.4:

Known issues:

For information about the latest known issues, see GKE on Bare Metal known issues in the Troubleshooting section.

November 28, 2023

Release 1.16.3

GKE on Bare Metal 1.16.3 is now available for download. To upgrade, see Upgrade clusters. GKE on Bare Metal 1.16.3 runs on Kubernetes 1.27.

If you use a third-party storage vendor, check the GDCV Ready storage partners document to make sure the storage vendor has already passed the qualification for this release of GKE on Bare Metal.

Functionality changes:

  • Increased the certificate time to live (TTL) for metrics-providers-ca and stackdriver-prometheus-scrape for third-party monitoring.

Supported node pool versions:

If you use selective worker node pool upgrades to upgrade a cluster to version 1.16.3, see Node pool versioning rules for a list of the versions that are supported for the worker node pools.

Fixes:

  • Fixed an issue where CoreDNS Pods could get stuck in an unready state.

  • Fixed an issue that caused application metrics to be unavailable in Anthos clusters on bare metal versions 1.16.0 and 1.16.1.

The following container image security vulnerabilities have been fixed in 1.16.3:

Known issues:

For information about the latest known issues, see GKE on Bare Metal known issues in the Troubleshooting section.

October 30, 2023

Release 1.16.2

Anthos clusters on bare metal 1.16.2 is now available for download. To upgrade, see Upgrading Anthos on bare metal. Anthos clusters on bare metal 1.16.2 runs on Kubernetes 1.27.

If you use a third-party storage vendor, check the GDCV Ready storage partners document to make sure the storage vendor has already passed the qualification for this release of GKE on Bare Metal.

Functionality changes:

  • Increased the certificate time to live (TTL) for metrics-providers-ca and stackdriver-prometheus-scrape for third-party monitoring.

  • Removed hardcoded timeout value for the bmctl backup operation.

Supported node pool versions:

If you use selective worker node pool upgrades to upgrade a cluster to version 1.16.2, see Node pool versioning rules for a list of the versions that are supported for the worker node pools.

Fixes:

  • Fixed the spec.featureGates.annotationBasedApplicationMetrics feature gate in the stackdriver custom resource to enable collection of annotation-based workload metrics. This capability was broken in Anthos clusters on bare metal versions 1.16.0 and 1.16.1.

  • Fixed a memory leak in Dataplane V2.

  • Fixed an issue where garbage collection deleted Source Network Address Translation (SNAT) entries for long-lived egress NAT connections, causing connection resets.

  • Fixed an issue that caused file and directory permissions to be set incorrectly after backing up and restoring a cluster.

  • Added direct dependencies on systemd, containerd, and kubelet over their mount point folders in /var/lib/.

  • Fixed an issue where etcd blocked upgrades due to an incorrect initial-cluster-state.

  • Fixed an issue that blocked upgrades to version 1.16 for clusters that have secure computing mode (seccomp) disabled.

The following container image security vulnerabilities have been fixed in release 1.16.2:

Known issues:

For information about the latest known issues, see Anthos clusters on bare metal known issues in the Troubleshooting section.

September 21, 2023

Release 1.16.1

Anthos clusters on bare metal 1.16.1 is now available for download. To upgrade, see Upgrading Anthos on bare metal. Anthos clusters on bare metal 1.16.1 runs on Kubernetes 1.27.

If you use a third-party storage vendor, check the GDCV Ready storage partners document to make sure the storage vendor has already passed the qualification for this release of GKE on Bare Metal.

Supported node pool versions:

If you use selective worker node pool upgrades to upgrade a cluster to version 1.16.1, see Node pool versioning rules for a list of the versions that are supported for the worker node pools.

Functionality changes:

  • Added the optional userClaim field to the ClientConfig custom resource definition bundled with Anthos clusters on bare metal. This change improves support for Azure AD integrations with Anthos Identity Service.

  • Updated constraint on NodePool spec.upgradeStrategy.concurrentNodes to be the smaller of either 15 nodes or 50% of the size of the node pool.
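
The concurrentNodes constraint above can be sketched as a small calculation. This is an illustration only: the note doesn't say how 50% of an odd-sized pool is rounded or whether a minimum applies, so the sketch assumes the value is rounded down and that at least one node is always allowed. The function name `max_concurrent_nodes` is hypothetical.

```python
HARD_CAP = 15  # upper bound stated in the release note


def max_concurrent_nodes(pool_size):
    """Return the largest concurrentNodes value allowed for a node pool.

    Assumptions (not stated in the release note): 50% of the pool size is
    rounded down, and a non-empty pool always allows at least one node.
    """
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    half_of_pool = pool_size // 2
    return max(1, min(HARD_CAP, half_of_pool))


print(max_concurrent_nodes(10))  # prints 5 (50% of the pool is the smaller bound)
print(max_concurrent_nodes(40))  # prints 15 (the 15-node cap is the smaller bound)
```

Under these assumptions, the 15-node cap only takes effect for node pools with more than 30 nodes.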

Fixes:

  • Fixed an issue where etcd blocked upgrades due to an incorrect initial-cluster-state.

  • Fixed an issue that blocked upgrades to version 1.16 for clusters that have secure computing mode (seccomp) disabled.

  • Fixed an issue to prevent cluster upgrades from starting on a node before either all Pods have been drained or the Pod draining timeout has been reached.

  • Fixed an issue where the memory resource requests value wasn't set properly for etcd-events.

The following container image security vulnerabilities have been fixed in 1.16.1:

Known issues:

For information about the latest known issues, see Anthos clusters on bare metal known issues in the Troubleshooting section.

August 25, 2023

Release 1.16.0

Anthos clusters on bare metal 1.16.0 is now available for download. To upgrade, see Upgrading Anthos on bare metal. Anthos clusters on bare metal 1.16.0 runs on Kubernetes 1.27.

If you use a third-party storage vendor, check the GDCV Ready storage partners document to make sure the storage vendor has already passed the qualification for this release of GKE on Bare Metal.

Version 1.13 end of life: In accordance with the Anthos Version Support Policy, version 1.13 (all patch releases) of Anthos clusters on bare metal has reached its end of life and is no longer supported.

Red Hat Enterprise Linux (RHEL) 8 minor versions 8.2, 8.3, 8.4, and 8.5 have reached their end of life. Please ensure you're using a supported version of your operating system.

Cluster lifecycle:

  • Upgraded to Kubernetes version 1.27.4.

  • Added support for Red Hat Enterprise Linux (RHEL) version 8.8.

  • GA: Added support for parallel upgrades of worker node pools.

  • GA: Added support to upgrade specific worker node pools separately from the rest of the cluster.

  • GA: Added a separate instance of etcd for the etcd-events object. This new etcd instance is always on and requires ports 2382 and 2383 to be open on control plane nodes for inbound TCP traffic. If these ports aren't opened, cluster creation and cluster upgrades are blocked.

  • GA: Updated preflight checks for cluster installation and upgrades to use changes from the latest Anthos clusters on bare metal patch version to address known issues and provide more useful checks.

  • GA: Added support for enrolling admin and user clusters in the Anthos On-Prem API automatically to enable cluster lifecycle management from the Google Cloud CLI, the Google Cloud console, and Terraform when the Anthos On-Prem API is enabled. If needed, you have the option to disable enrollment. For more information, see the description for the gkeOnPremAPI field in the cluster configuration file.

  • GA: Added ability to configure kubelet image pull settings for node pools. For more information, see Configure kubelet image pull settings.

  • Added a new health check to detect any unsupported drift in the custom resources managed by Anthos clusters on bare metal. Unsupported resource changes can lead to cluster problems.

  • Added a new flag, --target-cluster-name, that is supported by the bmctl register bootstrap command.
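
Because cluster creation and upgrades are blocked when ports 2382 and 2383 aren't open on control plane nodes, you might want to check reachability ahead of time. The following is a rough sketch, not an official preflight check. It attempts a TCP connection to each port, so it reports a port as reachable only if a process is already listening on it (for example, on an existing 1.16 cluster). A closed port and a firewall that drops traffic both show up as not reachable. The function name `check_tcp_ports` and the address in the usage comment are placeholders.

```python
import socket

# Ports that the release note says must accept inbound TCP traffic
# on control plane nodes for the etcd-events instance.
ETCD_EVENTS_PORTS = (2382, 2383)


def check_tcp_ports(host, ports=ETCD_EVENTS_PORTS, timeout=3.0):
    """Return {port: bool} showing whether a TCP connection to host succeeds."""
    results = {}
    for port in ports:
        try:
            # create_connection raises OSError on refusal or timeout.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            results[port] = False
    return results


# Example usage (10.200.0.3 is a placeholder control plane node address):
#   for port, ok in check_tcp_ports("10.200.0.3").items():
#       print(f"port {port}: {'reachable' if ok else 'not reachable'}")
```

A result of not reachable on a node where nothing listens on these ports yet doesn't prove that the firewall blocks them, so treat the output as a hint rather than a verdict.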

Networking:

  • GA: Added support for Services of type LoadBalancer to use externalTrafficPolicy=Local with bundled load balancing with BGP.

  • Preview: Added support for enabling Direct Server Return (DSR) load balancing for clusters configured with flat-mode networking. DSR load balancing is enabled with an annotation, preview.baremetal.cluster.gke.io/dpv2-lbmode-dsr: enable.

  • Preview: Upgraded Whereabouts to v0.6.1-gke.1 to support dual-stack networking.

  • Added support for multiple BGP load balancer (BGPLoadBalancer) resources and BGP Community. Multiple BGP load balancer resources provide more flexibility to define which peers advertise specific load balancer nodes and Services. BGP Community support helps you to distinguish routes coming from BGP load balancers from other routes in your network.

Observability:

Security and Identity:

  • GA: Added support for Binary Authorization, a service on Google Cloud that provides software supply-chain security for container-based applications. For more information, see Set up Binary Authorization policy enforcement.

  • GA: Added support for VPC Service Controls, which provides additional security for your clusters to help mitigate the risk of data exfiltration.

  • Preview: Added support for using custom cluster certificate authorities (CAs) to enable secure authentication and encryption between cluster components.

  • Preview: Added support for configuring the Subject Alternative Names (SANs) of the kubeadm generated certificate for the kube-apiserver.

  • Added support to run keepalived as a non-root user.

Supported node pool versions:

If you use selective worker node pool upgrades to upgrade a cluster to version 1.16.0, see Node pool versioning rules for a list of the versions that are supported for the worker node pools.

Functionality changes:

  • Updated constraint on NodePool spec.upgradeStrategy.concurrentNodes to be the smaller of 15 nodes or 50% of the size of the node pool.

  • Replaced the legacy method of enabling application logging in the cluster configuration file with two fields, enableCloudLoggingForApplications and enableGMPForApplications, in the stackdriver custom resource.

    The spec.clusterOperations.enableApplication field in the cluster configuration file has no effect on version 1.16.0 and higher clusters. This field populated the enableStackdriverForApplications field in the stackdriver custom resource, which enabled annotation-based workload metric collection. If you need this capability, use the annotationBasedApplicationMetrics feature gate in the stackdriver custom resource as shown in the following sample to keep the same behavior:

    kind: stackdriver
    spec:
      enableCloudLoggingForApplications: true
      featureGates:
        annotationBasedApplicationMetrics: true
    
  • Added the optional ksmNodePodMetricsOnly feature gate in the stackdriver custom resource to reduce the number of metrics from kube-state-metrics. Reducing the number of metrics makes the monitoring pipeline more stable in large-scale clusters.

  • Audit logs are compressed on the wire for Cloud Audit Logs consumption, reducing egress bandwidth by approximately 60%.

  • Upgraded local volume provisioner to v2.5.0.

  • Upgraded snapshot controller to v5.0.1.

  • Deprecated v1beta1 volume snapshot custom resources. Anthos clusters on bare metal will stop serving v1beta1 resources in a future release.

  • Removed resource request limits on edge profile workloads.

  • Added a preflight check to make sure control plane and load balancer nodes aren't under maintenance before an upgrade.

  • Updated the cluster snapshot capability so that information can be captured for the target cluster even when the cluster custom resource is missing or unavailable.

  • Improved bmctl error reporting for failures during the creation of a bootstrap cluster.

  • Added support for using the baremetal.cluster.gke.io/maintenance-mode-deadline-seconds cluster annotation to specify the maximum node draining duration, in seconds. By default, a 20-minute (1200 seconds) timeout is enforced. When the timeout elapses, all Pods are stopped and the node is put into maintenance mode. For example, to change the timeout to 10 minutes, add the annotation baremetal.cluster.gke.io/maintenance-mode-deadline-seconds: "600" to your cluster.

  • Updated bmctl check cluster to create a HealthCheck custom resource in the admin cluster if it's healthy.
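
As an illustration of the maintenance-mode-deadline-seconds annotation described in this list, the following sketch converts a timeout in minutes to the annotation value and wraps it in a metadata fragment. The helper name `drain_deadline_patch` is hypothetical. How you apply the result (by editing the cluster configuration file or by patching the Cluster resource) depends on your workflow and isn't prescribed here.

```python
import json

ANNOTATION_KEY = "baremetal.cluster.gke.io/maintenance-mode-deadline-seconds"


def drain_deadline_patch(minutes):
    """Build a metadata fragment that sets the node draining deadline."""
    seconds = int(minutes * 60)
    if seconds <= 0:
        raise ValueError("deadline must be positive")
    # Kubernetes annotation values are strings, so the number is converted.
    return {"metadata": {"annotations": {ANNOTATION_KEY: str(seconds)}}}


# A 10-minute deadline produces the value "600", as in the example above.
print(json.dumps(drain_deadline_patch(10), indent=2))
```

Keeping the value as a quoted string matters because annotation values that parse as numbers are rejected by the Kubernetes API.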

Fixes:

  • Fixed an issue where the apiserver could become unresponsive during a cluster upgrade for clusters with a single control plane node.

  • Fixed an issue where cluster installations or upgrades fail when the cluster name has more than 45 characters.

  • Fixed an issue where the control plane VIP wasn't reachable during cluster installation on Red Hat Enterprise Linux.

  • Fixed an issue where audit logs were duplicated into the offline buffer even when they were sent to Cloud Audit Logs successfully.

  • Fixed an issue where node-specific labels set on the node pool were sometimes overwritten.

  • Updated avoidBuggyIPs and manualAssign fields in load balancer address pools (spec.loadBalancers.addressPools) to allow changes at any time.

  • Fixed an issue where containerd didn't restart when there was a version mismatch. This issue caused an inconsistent containerd version within the cluster.

  • Fixed an issue that caused the logging agent to use continuously increasing amounts of memory.

  • Fixed the preflight check so that it no longer ignores the no_proxy setting.

  • Fixed Anthos Identity Service annotation needed for exporting metrics.

  • Fixed an issue that caused the bmctl restore command to stop responding for clusters with manually configured load balancers.

  • Fixed an issue that prevented Anthos clusters on bare metal from restoring a high-availability quorum for nodes that use /var/lib/etcd as a mountpoint.

  • Fixed an issue that caused health checks to report failure when they found a Pod with a status of TaintToleration even when the ReplicaSet for the Pod had sufficient Pods running.

  • Fixed an issue that caused conflicts with third-party Ansible automation.

  • Fixed a cluster upgrade issue that prevented some control plane nodes from rejoining a cluster configured for high availability.

The following container image security vulnerabilities have been fixed:

Known issues:

For information about the latest known issues, see Anthos clusters on bare metal known issues in the Troubleshooting section.