Control group v2 incompatibility
Control group v2 (cgroup v2) is incompatible with Anthos clusters on bare metal 1.6. Kubernetes 1.18 does not support cgroup v2, and Docker only offers experimental support as of 20.10. systemd switched to cgroup v2 by default in version 247.2-2. The presence of /sys/fs/cgroup/cgroup.controllers indicates that your system uses cgroup v2.
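A quick shell sketch of that check (the variable name is ours, not part of the product):

```shell
# Detect the cgroup version on this machine. The cgroup.controllers
# file exists only when the unified cgroup v2 hierarchy is mounted.
if [ -f /sys/fs/cgroup/cgroup.controllers ]; then
  cgroup_version=v2
else
  cgroup_version=v1
fi
echo "cgroup ${cgroup_version} in use"
```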
Starting with Anthos clusters on bare metal 1.6.2, the preflight checks verify that cgroup v2 is not in use on the cluster machine.
Benign error messages during installation
During highly available (HA) cluster installation, you may see errors about
etcdserver leader change. These error messages are benign and can be ignored.
When you use bmctl for cluster installation, you may see a Log streamer failed to get BareMetalMachine log message at the very end of create-cluster.log. This error message is benign and can be ignored.
When examining cluster creation logs, you may notice transient failures about registering clusters or calling webhooks. These errors can be safely ignored, because the installation will retry these operations until they succeed.
Preflight checks and service account credentials
For installations triggered by admin or hybrid clusters (in other words, clusters not created with bmctl, like user clusters), the preflight check does not verify Google Cloud Platform service account credentials or their associated permissions.
Application Default Credentials and bmctl
bmctl uses Application Default Credentials (ADC) to validate the cluster operation's location value in the cluster spec when it is not set to global.
For ADC to work, you need to either point the GOOGLE_APPLICATION_CREDENTIALS environment variable to a service account credential file, or run
gcloud auth application-default login.
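For example, either of the following makes ADC available (the key file path below is a placeholder, not a real location):

```shell
# Option 1: point ADC at a service account key file (placeholder path).
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json

# Option 2: generate user credentials interactively instead
# (requires the gcloud CLI to be installed):
# gcloud auth application-default login
```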
On cluster node machines, if the Docker executable is present in the PATH environment variable but the Docker service is not active, the preflight check will fail and report that the Docker service is not active. To fix this error, either remove Docker or enable the Docker service.
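A minimal sketch of the same condition the preflight check flags, assuming systemd manages the Docker service (the check is skipped when systemctl is unavailable):

```shell
# Report when docker is on PATH but its systemd service is inactive.
if command -v docker >/dev/null 2>&1 \
   && command -v systemctl >/dev/null 2>&1 \
   && ! systemctl is-active --quiet docker; then
  echo "Docker service is not active"
fi
docker_check_done=yes
```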
Upgrading Anthos clusters on bare metal
Upgrading is not available in the 1.6.0 release.
User cluster credentials
The bmctl reset command relies on the top-level credentials section in the cluster configuration file. For user clusters, you will need to manually update the file to add the credentials section.
Mount points and /etc/fstab
Reset does not unmount the mount points under
/mnt/localpv-share/. It also does not clean up the corresponding entries in /etc/fstab.
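A manual cleanup sketch for the leftover mounts (variable names are ours); the corresponding fstab entries still need to be edited out by hand:

```shell
# Unmount any leftover mounts under /mnt/localpv-share/ after a reset.
# Field 3 of `mount` output is the mount point path.
leftover=$(mount | awk '$3 ~ /^\/mnt\/localpv-share\// {print $3}')
for m in $leftover; do
  umount "$m"
done
echo "unmounted: ${leftover:-none}"
```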
Deleting a namespace will prevent new resources from being created in that namespace, including jobs to reset machines. When deleting a user cluster, you must delete the cluster object first before deleting its namespace. Otherwise, the jobs to reset machines cannot get created, and the deletion process will skip the machine clean-up step.
The cluster CA/certificate will be rotated during upgrade. On-demand rotation support is not currently available.
Anthos clusters on bare metal rotates kubelet serving certificates automatically. The kubelet node agent can send out a Certificate Signing Request (CSR) when a certificate nears expiration. A controller in your admin clusters validates and approves the CSR.
Pod connectivity failures and reverse path filtering
Anthos clusters on bare metal configures reverse path filtering on nodes to disable source validation (net.ipv4.conf.all.rp_filter=0). If this setting is changed to 2, pods will fail due to out-of-node communication timeouts.
Reverse path filtering is set with rp_filter files in the IPv4 configuration directory (net/ipv4/conf/all). This value may also be overridden by sysctl, which stores reverse path filtering settings in a network security configuration file.
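The value currently in effect can be read directly from /proc (a quick sketch):

```shell
# Read the effective reverse path filtering setting for all interfaces.
# 0 = source validation disabled, which is what Anthos clusters on
# bare metal expects.
rp_filter=$(cat /proc/sys/net/ipv4/conf/all/rp_filter)
echo "net.ipv4.conf.all.rp_filter = ${rp_filter}"
```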
To restore Pod connectivity, either set net.ipv4.conf.all.rp_filter back to 0 manually, or restart the anetd Pod, which resets the value to 0. To restart the anetd Pod, use the following commands to locate and delete it; a new anetd Pod will start up in its place:
kubectl get pods -n kube-system
kubectl delete pods -n kube-system ANETD_XYZ
Replace ANETD_XYZ with the name of the anetd Pod.
Bootstrap (kind) cluster IP addresses and cluster node IP addresses overlapping
192.168.122.0/24 and 10.96.0.0/27 are the default pod and service CIDRs used by the bootstrap (kind) cluster. Preflight checks will fail if they overlap with cluster node machine IP addresses. To avoid the conflict, you can pass the --bootstrap-cluster-pod-cidr and --bootstrap-cluster-service-cidr flags to bmctl to specify different values.
Overlapping IP addresses across different clusters
There is no preflight check to validate overlapping IP addresses across different clusters.
hostport feature in Anthos clusters on bare metal
The hostport feature is not currently supported.
Operating system endpoint limitations
On RHEL and CentOS, there is a cluster-level limitation of 100,000 endpoints. This number is the sum of all pods that are referenced by a Kubernetes service. If two services reference the same set of pods, this counts as two separate sets of endpoints. The underlying nftables implementation on RHEL and CentOS causes this limitation; it is not an intrinsic limitation of Anthos clusters on bare metal.
Control plane and load balancer specifications
The control plane and load balancer node pool specifications are special. These specifications declare and control critical cluster resources. The canonical source for these resources is their respective sections in the cluster config file:
Consequently, do not modify the top-level control plane and load balancer node pool resources directly. Modify the associated sections in the cluster config file instead.
Mutable fields in the cluster and node pool specification
Currently, only the following cluster and node pool specification fields in the cluster config file can be updated after the cluster is created (they are mutable fields):
For the cluster resource (kind: Cluster), the following fields are mutable:
For node pool resources (kind: NodePool), the following fields are mutable:
Under certain load conditions, Anthos clusters on bare metal 1.6.x nodes may display a NotReady status due to the Pod Lifecycle Event Generator (PLEG) being unhealthy. The node status will contain the following entry:
PLEG is not healthy: pleg was last seen active XXXmXXXs ago; threshold is 3m0s
How do I know if I'm affected?
A likely cause of this issue is the runc binary version. To confirm whether you have the problematic version installed, connect to one of the cluster machines using SSH and run:
runc --version
If the output contains 1.0.0-rc93, then you have the problematic version installed.
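The check can be scripted so it degrades gracefully when runc is not on the PATH (the variable names are ours):

```shell
# Flag the problematic runc build (1.0.0-rc93); fall back to a
# placeholder string when runc is not installed on this machine.
runc_ver=$( (runc --version 2>/dev/null || echo "runc not found") | head -n 1 )
case "$runc_ver" in
  *1.0.0-rc93*) runc_status="problematic" ;;
  *)            runc_status="ok" ;;
esac
echo "${runc_ver} -> ${runc_status}"
```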
To resolve this issue, we recommend upgrading to Anthos clusters on bare metal 1.7.0 or a later version.
If upgrading is not an option, you can revert the containerd.io package to an earlier version on the problematic node machines. To do this, connect to the node machine using SSH and run one of the following commands, depending on the node's operating system:
On Ubuntu: apt install containerd.io=1.4.3-1
On CentOS/RHEL: dnf install containerd.io-1.3.9-3.1.el8