Version 1.8. This version is supported as outlined in the Anthos version support policy, offering the latest patches and updates for security vulnerabilities, exposures, and issues impacting Anthos clusters on bare metal. For more details, see the 1.8 release notes. This is the most recent version. For a complete list of each minor and patch release in chronological order, see the combined release notes.


Anthos clusters on bare metal known issues

Installation

Control group v2 incompatibility

Control group v2 (cgroup v2) is incompatible with Anthos clusters on bare metal 1.6. Kubernetes 1.18 does not support cgroup v2, Docker offers only experimental support as of version 20.10, and systemd switched to cgroup v2 by default in version 247.2-2. The presence of /sys/fs/cgroup/cgroup.controllers indicates that your system uses cgroup v2.

The preflight checks verify that cgroup v2 is not in use on the cluster machine.
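To check whether a machine uses cgroup v2 yourself, look for the cgroup.controllers file mentioned above:

test -f /sys/fs/cgroup/cgroup.controllers && echo "cgroup v2 in use" || echo "cgroup v1"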

Benign error messages during installation

When examining cluster creation logs, you may notice transient failures about registering clusters or calling webhooks. These errors can be safely ignored, because the installation will retry these operations until they succeed.

Preflight checks and service account credentials

For installations triggered by admin or hybrid clusters (in other words, clusters not created with bmctl, like user clusters), the preflight check does not verify Google Cloud Platform service account credentials or their associated permissions.

Preflight checks and permission denied

During installation, you may see errors like /bin/sh: /tmp/disks_check.sh: Permission denied. These errors occur because /tmp is mounted with the noexec option. For bmctl to work, you need to remove the noexec option from the /tmp mount point.
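For example, one way to remove the noexec option for the current boot is to remount /tmp in place:

sudo mount -o remount,exec /tmp

To make the change persistent, also remove noexec from the /tmp entry in /etc/fstab.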

Creating cloud monitoring workspace before viewing dashboards

You need to create a Cloud Monitoring workspace through the Google Cloud Console before you can view any Anthos clusters on bare metal monitoring dashboards.

Application default credentials and bmctl

bmctl uses Application Default Credentials (ADC) to validate the cluster operation's location value in the cluster spec when it is not set to global.

For ADC to work, you need to either point the GOOGLE_APPLICATION_CREDENTIALS environment variable to a service account credential file, or run gcloud auth application-default login.
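For example, either of the following sets up ADC (the key file path is a placeholder):

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json

gcloud auth application-default login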

Ubuntu 20.04 LTS and bmctl

On Anthos clusters on bare metal versions prior to 1.8.2, some Ubuntu 20.04 LTS distributions with a more recent Linux kernel (including GCP Ubuntu 20.04 LTS images on the 5.8 kernel) have made /proc/sys/net/netfilter/nf_conntrack_max read-only in non-init network namespaces. This prevents bmctl from setting the max connection tracking table size, which prevents the bootstrap cluster from starting. A symptom of the incorrect table size is that the kube-proxy Pod in the bootstrap cluster will crashloop as shown in the following sample error log:

kubectl logs -l k8s-app=kube-proxy -n kube-system --kubeconfig ./bmctl-workspace/.kindkubeconfig
I0624 19:05:08.009565       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_max' to 393216
F0624 19:05:08.009646       1 server.go:495] open /proc/sys/net/netfilter/nf_conntrack_max: permission denied

The workaround is to manually set net/netfilter/nf_conntrack_max to the needed value on the host:

sudo sysctl net.netfilter.nf_conntrack_max=393216

The needed value depends on the number of cores for the node. Use the kubectl logs command shown above to confirm the desired value from the kube-proxy logs.

This issue is fixed in Anthos clusters on bare metal release 1.8.2 and later.

Docker service

On cluster node machines, if the Docker executable is present in the PATH environment variable but the Docker service is not active, the preflight check fails and reports that the Docker service is not active. To fix this error, either remove Docker or enable the Docker service.
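For example, on systemd-based distributions you can enable and start the Docker service with the following command, or remove the Docker packages with your package manager if Docker isn't needed on the node:

sudo systemctl enable --now docker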

Registry Mirror and Cloud Audit Logging

On Anthos clusters on bare metal versions prior to 1.8.2, the bmctl registry mirror package is missing the gcr.io/anthos-baremetal-release/auditproxy:gke_master_auditproxy_20201115_RC00 image. To enable the Cloud Audit Logging feature when using a registry mirror, you will need to manually download the missing image and push it to your registry server with the following commands:

docker pull gcr.io/anthos-baremetal-release/auditproxy:gke_master_auditproxy_20201115_RC00
docker tag gcr.io/anthos-baremetal-release/auditproxy:gke_master_auditproxy_20201115_RC00 REGISTRY_SERVER/anthos-baremetal-release/auditproxy:gke_master_auditproxy_20201115_RC00
docker push REGISTRY_SERVER/anthos-baremetal-release/auditproxy:gke_master_auditproxy_20201115_RC00

Containerd requires /usr/local/bin in PATH

Clusters with the containerd runtime require /usr/local/bin to be in the SSH user's PATH for the kubeadm init command to find the crictl binary. If crictl can't be found, cluster creation fails.

When you aren't logged in as the root user, sudo is used to run the kubeadm init command. The sudo PATH can differ from the root profile and may not contain /usr/local/bin.

Fix this error by updating the secure_path in /etc/sudoers to include /usr/local/bin. Alternatively, create a symbolic link for crictl in another /bin directory.
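A minimal illustration of both options (the secure_path value shown is an example; keep your distribution's existing entries):

# In /etc/sudoers (edit with visudo), make sure secure_path includes /usr/local/bin:
#   Defaults    secure_path="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"

# Alternatively, link crictl into a directory that is already in the sudo PATH:
sudo ln -s /usr/local/bin/crictl /usr/bin/crictl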

Starting with release 1.8.2, Anthos clusters on bare metal adds /usr/local/bin to the PATH when running commands. However, running snapshot as a non-root user still produces a crictl: command not found error (which can be fixed by the workaround above).

Installing on vSphere

When installing Anthos clusters on bare metal on vSphere VMs, you must set the tx-udp_tnl-segmentation and tx-udp_tnl-csum-segmentation flags to off. These flags are related to the hardware segmentation offload done by the vSphere driver VMXNET3 and they don't work with the GENEVE tunnel of Anthos clusters on bare metal.

Run the following command on each node to check the current values for these flags:

ethtool -k NET_INTFC | grep segm
...
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
...

Replace NET_INTFC with the network interface associated with the IP address of the node.

Sometimes in RHEL 8.4, ethtool shows these flags as off when they aren't. To explicitly set them to off, toggle the flags on and then off with the following commands.

ethtool -K ens192 tx-udp_tnl-segmentation on
ethtool -K ens192 tx-udp_tnl-csum-segmentation on

ethtool -K ens192 tx-udp_tnl-segmentation off
ethtool -K ens192 tx-udp_tnl-csum-segmentation off

This flag change does not persist across reboots. Configure the startup scripts to explicitly set these flags when the system boots.
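One possible way to persist the setting is a oneshot systemd unit, sketched below; the unit name disable-tnl-offload.service, the interface ens192, and the ethtool path are assumptions to adjust for your environment:

sudo tee /etc/systemd/system/disable-tnl-offload.service <<'EOF'
[Unit]
Description=Disable UDP tunnel segmentation offload for VMXNET3
After=network-online.target
Wants=network-online.target

[Service]
# Turn the offload flags off once at boot.
Type=oneshot
ExecStart=/usr/sbin/ethtool -K ens192 tx-udp_tnl-segmentation off
ExecStart=/usr/sbin/ethtool -K ens192 tx-udp_tnl-csum-segmentation off

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable disable-tnl-offload.service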

Flapping node readiness

Clusters may occasionally exhibit flapping node readiness (node status changing rapidly between Ready and NotReady) behavior. An unhealthy Pod Lifecycle Event Generator (PLEG) causes this behavior. The PLEG is a module in kubelet.

To confirm an unhealthy PLEG is causing this behavior, use the following journalctl command to check for PLEG log entries:

journalctl -f | grep -i pleg

Log entries like the following indicate the PLEG is unhealthy:

...
skipping pod synchronization - PLEG is not healthy: pleg was last seen active
3m0.793469
...

A known runc race condition is the probable cause of the unhealthy PLEG. Stuck runc processes are a symptom of the race condition. Use the following command to check the runc init process status:

ps aux | grep 'runc init'

To fix this issue:

  1. Run the following commands on each node to install the latest containerd.io and extract the latest runc command-line tool:

    Ubuntu

    sudo apt update
    sudo apt install containerd.io
    # Back up current runc
    cp /usr/local/sbin/runc ~/
    sudo cp /usr/bin/runc /usr/local/sbin/runc
    
    # runc version should be > 1.0.0-rc93
    /usr/local/sbin/runc --version
    

    CentOS/RHEL

    sudo dnf install containerd.io
    # Back up current runc
    cp /usr/local/sbin/runc ~/
    sudo cp /usr/bin/runc /usr/local/sbin/runc
    
    # runc version should be > 1.0.0-rc93
    /usr/local/sbin/runc --version
    
  2. Reboot the node if there are stuck runc init processes.

    Alternatively, you can clean up any stuck processes manually.
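    For example, to clean up stuck processes manually, identify the runc init processes and terminate them (verify the process IDs before killing anything):

    ps aux | grep 'runc init'
    sudo kill -9 PID

    Replace PID with the process ID of a stuck runc init process.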

Upgrading Anthos clusters on bare metal

Upgrades to 1.8.0 and 1.8.1 admin, hybrid, and standalone clusters don't complete

Upgrading admin, hybrid, or standalone clusters from version 1.7.x to version 1.8.0 or 1.8.1 sometimes fails to complete. This upgrade failure applies to clusters that you have updated after cluster creation.

An indication of this upgrade problem is the console output Waiting for upgrade to complete ... with no mention of which node is being upgraded. This symptom also indicates that your admin cluster has been successfully upgraded to Kubernetes version v1.20.8-gke.1500, the Kubernetes version for Anthos clusters on bare metal releases 1.8.0 and 1.8.1.

This upgrade issue is fixed for Anthos clusters on bare metal release 1.8.2.

To confirm whether this issue impacts your cluster upgrade to 1.8.0 or 1.8.1:

  1. Create the following shell script:

    if [ $(kubectl get cluster CLUSTER_NAME -n CLUSTER_NAMESPACE
        --kubeconfig bmctl-workspace/.kindkubeconfig -o=jsonpath='{.metadata.generation}')
        -le $(kubectl get cluster CLUSTER_NAME -n CLUSTER_NAMESPACE
        --kubeconfig bmctl-workspace/.kindkubeconfig
        -o=jsonpath='{.status.systemServiceConditions[?(@.type=="Reconciling")].observedGeneration}') ];
        then echo "Bug Detected"; else echo "OK"; fi
    

    Replace the following:

    • CLUSTER_NAME: the name of the cluster being checked.
    • CLUSTER_NAMESPACE: the namespace for the cluster.
  2. Run the script while the upgrade is in process, but after the preflight checks have completed.

    When the observedGeneration value is not less than the generation value, Bug Detected is written to the console output. This output indicates that your cluster upgrade is affected.

  3. To unblock the upgrade, run the following command:

    kubectl get --raw=/apis/baremetal.cluster.gke.io/v1/namespaces/CLUSTER_NAMESPACE/clusters/CLUSTER_NAME/status \
        --kubeconfig bmctl-workspace/.kindkubeconfig | \
        sed -e 's/\("systemServiceConditions":\[{[^{]*"type":"DashboardReady"}\),{[^{}]*}/\1/g' | \
        kubectl replace --raw=/apis/baremetal.cluster.gke.io/v1/namespaces/CLUSTER_NAMESPACE/clusters/CLUSTER_NAME/status \
        --kubeconfig bmctl-workspace/.kindkubeconfig -f-
    

    Replace the following:

    • CLUSTER_NAME: the name of the cluster being checked.
    • CLUSTER_NAMESPACE: the namespace for the cluster.

Upgrades to 1.8.3 or 1.8.4

Upgrading Anthos clusters on bare metal to version 1.8.3 or 1.8.4 sometimes fails with a nil Context error. If your cluster upgrade fails with a nil Context error, perform the following steps to complete the upgrade:

  1. Set the GOOGLE_APPLICATION_CREDENTIALS environment variable to point to your service account key file.

    export GOOGLE_APPLICATION_CREDENTIALS=KEY_PATH
    

    Replace KEY_PATH with the path of the JSON file that contains your service account key.

  2. Run the bmctl upgrade cluster command again.
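    For example (assuming the standard bmctl upgrade flags; replace the placeholders with your cluster name and admin cluster kubeconfig path):

    bmctl upgrade cluster -c CLUSTER_NAME --kubeconfig ADMIN_KUBECONFIG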

User cluster patch upgrade limitation

User clusters that are managed by an admin cluster must be at the same Anthos clusters on bare metal version or lower and within one minor release. For example, a version 1.7.1 (anthosBareMetalVersion: 1.7.1) admin cluster managing version 1.6.2 user clusters is acceptable.

An upgrade limitation prevents you from upgrading your user clusters to a new security patch when the patch is released after the release version the admin cluster is using. For example, if your admin cluster is at version 1.7.2, which was released on June 2, 2021, you can't upgrade your user clusters to version 1.6.4, because it was released on August 13, 2021.

Ubuntu 18.04 and 18.04.1 incompatibility

To upgrade to 1.8.1 or 1.8.2, cluster node machines and the workstation that runs bmctl must have Linux kernel version 4.17.0 or newer. Otherwise, the anetd networking controller doesn't work. The symptom is that Pods with the anet prefix in the kube-system namespace crash repeatedly with the following error message: BPF NodePort services needs kernel 4.17.0 or newer.

This issue affects Ubuntu 18.04 and 18.04.1, since they are on kernel version 4.15.
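You can confirm the kernel version on each machine with the following command:

uname -r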

This issue has been fixed in Anthos clusters on bare metal 1.8.3.

Upgrading 1.7.x clusters that use containerd

Cluster upgrades to 1.8.x are blocked for 1.7.x clusters that are configured to use the preview containerd capability. The containerd preview uses the incorrect control group (cgroup) driver cgroupfs, instead of the recommended systemd driver. There are reported cases of cluster instability when clusters that use the cgroupfs driver are put under resource pressure. The GA containerd capability in release 1.8.0 uses the correct systemd driver.

If you have existing 1.7.x clusters that use the preview containerd container runtime feature, we recommend that you create new 1.8.0 clusters configured for containerd and migrate any existing apps and workloads. This ensures the highest cluster stability when using the containerd container runtime.

SELinux upgrade failures

Upgrading 1.7.1 clusters configured with the containerd container runtime and running SELinux on RHEL or CentOS will fail. We recommend that you create new 1.8.0 clusters configured to use containerd and migrate your workloads.

Node draining can't start when Node is out of reach

The draining process for Nodes won't start if the Node is out of reach from Anthos clusters on bare metal. For example, if a Node goes offline during a cluster upgrade process, the upgrade may stop responding. This is a rare occurrence. To minimize the likelihood of encountering this problem, ensure your Nodes are operating properly before initiating an upgrade.
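For example, a quick way to confirm that all Nodes are reachable and Ready before you upgrade (the kubeconfig path is a placeholder):

kubectl get nodes --kubeconfig CLUSTER_KUBECONFIG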

Reset/Deletion

Namespace deletion

Deleting a namespace will prevent new resources from being created in that namespace, including jobs to reset machines. When deleting a user cluster, you must delete the cluster object first before deleting its namespace. Otherwise, the jobs to reset machines cannot get created, and the deletion process will skip the machine clean-up step.
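For example, delete the cluster object before its namespace, running both commands against the admin cluster (the placeholders follow the conventions used elsewhere on this page):

kubectl delete cluster CLUSTER_NAME -n CLUSTER_NAMESPACE --kubeconfig ADMIN_KUBECONFIG
kubectl delete namespace CLUSTER_NAMESPACE --kubeconfig ADMIN_KUBECONFIG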

containerd service

The bmctl reset command doesn't delete any containerd configuration files or binaries. The containerd systemd service is left up and running. The command deletes the containers running pods scheduled to the node.

Security

The cluster CA/certificate will be rotated during upgrade. On-demand rotation support is a Preview feature.

Anthos clusters on bare metal rotates kubelet serving certificates automatically. Each kubelet node agent can send out a Certificate Signing Request (CSR) when a certificate nears expiration. A controller in your admin clusters validates and approves the CSR.

Cluster CA Rotation (Preview Feature)

After you perform a user cluster certificate authority (CA) rotation on a cluster, all user authentication flows fail. These failures occur because the ClientConfig custom resource used in authentication flows isn't being updated with the new CA data during CA rotation. If you have performed a cluster CA rotation on your cluster, check to see if the certificateAuthorityData field in default ClientConfig of the kube-public namespace contains the older cluster CA.

To resolve the issue manually, update the certificateAuthorityData field with the current cluster CA.
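For example, to inspect and then manually update the field (the resource name default and the kube-public namespace come from the description above; the kubeconfig path is a placeholder):

kubectl get clientconfig default -n kube-public -o yaml --kubeconfig USER_CLUSTER_KUBECONFIG
kubectl edit clientconfig default -n kube-public --kubeconfig USER_CLUSTER_KUBECONFIG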

Networking

Modifying firewalld will erase Cilium iptable policy chains

When running Anthos clusters on bare metal with firewalld enabled on either CentOS or Red Hat Enterprise Linux (RHEL), changes to firewalld can remove the Cilium iptables chains on the host network. The iptables chains are added by the anetd Pod when it is started. The loss of the Cilium iptables chains causes the Pods on the Node to lose network connectivity outside of the Node.

Changes to firewalld that will remove the iptables chains include, but aren't limited to:

  • Restarting firewalld, using systemctl
  • Reloading the firewalld with the command line client (firewall-cmd --reload)

You can fix this connectivity issue by restarting anetd on the Node. Locate and delete the anetd Pod with the following commands to restart anetd:

kubectl get pods -n kube-system
kubectl delete pods -n kube-system ANETD_XYZ

Replace ANETD_XYZ with the name of the anetd Pod.

Duplicate egressSourceIP addresses

When using the egress NAT gateway feature preview, it is possible to set traffic selection rules that specify an egressSourceIP address that is already in use for another EgressNATPolicy object. This may cause egress traffic routing conflicts. Coordinate with your development team to determine which floating IP addresses are available for use before specifying the egressSourceIP address in your EgressNATPolicy custom resource.
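As a quick check for addresses that are already in use, you can list the existing policies; this assumes the EgressNATPolicy kind is addressable by kubectl in your cluster:

kubectl get egressnatpolicy -A -o yaml --kubeconfig CLUSTER_KUBECONFIG | grep egressSourceIP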

Pod connectivity failures and reverse path filtering

Anthos clusters on bare metal configures reverse path filtering on nodes to disable source validation (net.ipv4.conf.all.rp_filter=0). If the rp_filter setting is changed to 1 or 2, Pods fail due to out-of-node communication timeouts.

Reverse path filtering is set with rp_filter files in the IPv4 configuration folder (net/ipv4/conf/all). This value may also be overridden by sysctl, which stores reverse path filtering settings in a network security configuration file, such as /etc/sysctl.d/60-gce-network-security.conf.

To restore Pod connectivity, either set net.ipv4.conf.all.rp_filter back to 0 manually, or restart the anetd Pod to set net.ipv4.conf.all.rp_filter back to 0. To restart the anetd Pod, use the following commands to locate and delete the anetd Pod and a new anetd Pod will start up in its place:

kubectl get pods -n kube-system
kubectl delete pods -n kube-system ANETD_XYZ

Replace ANETD_XYZ with the name of the anetd Pod.
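To set the value back manually instead, run the following command on the affected node:

sudo sysctl net.ipv4.conf.all.rp_filter=0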

Bootstrap (kind) cluster IP addresses and cluster node IP addresses overlapping

192.168.122.0/24 and 10.96.0.0/27 are the default pod and service CIDRs used by the bootstrap (kind) cluster. Preflight checks will fail if they overlap with cluster node machine IP addresses. To avoid the conflict, you can pass the --bootstrap-cluster-pod-cidr and --bootstrap-cluster-service-cidr flags to bmctl to specify different values.
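For example (the flag names come from above; the CIDR values are placeholders to adapt to your network):

bmctl create cluster -c CLUSTER_NAME \
  --bootstrap-cluster-pod-cidr 192.168.200.0/24 \
  --bootstrap-cluster-service-cidr 10.97.0.0/27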

Overlapping IP addresses across different clusters

There is no validation for overlapping IP addresses across different clusters during update. The validation only applies at cluster/node pool creation time.

Operating system endpoint limitations

On RHEL and CentOS, there is a cluster-level limitation of 100,000 endpoints. This number is the sum of all pods that are referenced by a Kubernetes service. If 2 services reference the same set of pods, this counts as 2 separate sets of endpoints. The underlying nftables implementation on RHEL and CentOS causes this limitation; it is not an intrinsic limitation of Anthos clusters on bare metal.

Configuration

Control plane and load balancer specifications

The control plane and load balancer node pool specifications are special. These specifications declare and control critical cluster resources. The canonical source for these resources is their respective sections in the cluster config file:

  • spec.controlPlane.nodePoolSpec
  • spec.loadBalancer.nodePoolSpec

Consequently, do not modify the top-level control plane and load balancer node pool resources directly. Modify the associated sections in the cluster config file instead.

Anthos VM Runtime

  • Restarting a pod causes the VMs on the pod to change IP addresses or lose their IP address altogether. If the IP address of a VM changes, this does not affect the reachability of VM applications exposed as a Kubernetes service. If the IP address is lost, you must run dhclient from the VM to acquire an IP address for the VM.
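If the IP address is lost, a minimal recovery from inside the VM is to request a new lease with dhclient (the interface name eth0 is an example; use the VM's actual interface):

sudo dhclient eth0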

SELinux

SELinux errors during pod creation

Pod creation sometimes fails when SELinux prevents the container runtime from setting labels on tmpfs mounts. This failure is rare, but can happen when SELinux is in Enforcing mode and in some kernels.

To verify that SELinux is the cause of pod creation failures, use the following command to check for errors in the kubelet logs:

journalctl -u kubelet

If SELinux is causing pod creation to fail, the command response contains an error similar to the following:

error setting label on mount source '/var/lib/kubelet/pods/
6d9466f7-d818-4658-b27c-3474bfd48c79/volumes/kubernetes.io~secret/localpv-token-bpw5x':
failed to set file label on /var/lib/kubelet/pods/
6d9466f7-d818-4658-b27c-3474bfd48c79/volumes/kubernetes.io~secret/localpv-token-bpw5x:
permission denied

To verify that this issue is related to SELinux enforcement, run the following command:

ausearch -m avc

This command searches the audit logs for access vector cache (AVC) permission errors. The avc: denied in the following sample response confirms that the pod creation failures are related to SELinux enforcement.

type=AVC msg=audit(1627410995.808:9534): avc:  denied  { associate } for
pid=20660 comm="dockerd" name="/" dev="tmpfs" ino=186492
scontext=system_u:object_r:container_file_t:s0:c61,c201
tcontext=system_u:object_r:locale_t:s0 tclass=filesystem permissive=0

The root cause of this pod creation problem with SELinux is a kernel bug found in the following Linux images:

  • Red Hat Enterprise Linux (RHEL) releases prior to 8.3
  • CentOS releases prior to 8.3

Rebooting the machine helps recover from the issue.

To prevent pod creation errors from occurring, use RHEL 8.3 or later or CentOS 8.3 or later, because those versions have fixed the kernel bug.

Snapshots

Taking a snapshot as a non-root login user

For Anthos clusters on bare metal versions 1.8.1 and earlier, if you aren't logged in as root, you can't take a cluster snapshot with the bmctl command. Starting with release 1.8.2, Anthos clusters on bare metal will respect nodeAccess.loginUser in the cluster spec. If the admin cluster is unreachable, you can specify the login user with the --login-user flag.
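For example, a snapshot invocation that specifies the login user might look like the following (assuming the standard bmctl snapshot command; replace the placeholders with your values):

bmctl check cluster --snapshot --cluster CLUSTER_NAME --login-user LOGIN_USER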

Note that if you use containerd as the container runtime, snapshot still fails to run crictl commands. See Containerd requires /usr/local/bin in PATH for a workaround. The PATH settings used for sudo cause this problem.

GKE Connect

Crash looping gke-connect-agent Pod

Heavy usage of GKE Connect gateway can sometimes result in gke-connect-agent Pod out-of-memory problems. Symptoms of these out-of-memory issues include:

  • The gke-connect-agent Pod shows a high number of restarts or ends up in crash looping state.
  • The connect gateway stops functioning.

To address this out-of-memory problem, edit the deployment with the gke-connect-agent prefix in the gke-connect namespace and raise the memory limit to 256 MiB or higher. For example, the following patch updates the limit on the agent container in the deployment's Pod template (the container name gke-connect-agent is an assumption; confirm it in your deployment before patching):

kubectl patch deploy $(kubectl get deploy -l app=gke-connect-agent -n gke-connect -o jsonpath='{.items[0].metadata.name}') -n gke-connect --patch '{"spec":{"template":{"spec":{"containers":[{"name":"gke-connect-agent","resources":{"limits":{"memory":"256Mi"}}}]}}}}'

This problem is fixed in Anthos clusters on bare metal release 1.8.2 and later.