Installation
1.6.1 version of bmctl cannot install 1.6.0 clusters
The 1.6.1 version of bmctl cannot install 1.6.0 Anthos on bare metal clusters;
it can install only version 1.6.1 clusters. This issue applies only to admin,
hybrid, or standalone clusters installed with bmctl. Clusters installed and
managed through admin or hybrid clusters are not affected by this issue.
Control group v2 incompatibility
Control group v2, or cgroup v2 for short, is incompatible with Anthos on bare
metal 1.6. Kubernetes 1.18 does not support cgroup v2. Also, Docker only offers
experimental support as of 20.10. systemd switched to cgroup v2 by default in
version 247.2-2. The presence of /sys/fs/cgroup/cgroup.controllers indicates
that your system is using cgroup v2.
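The file check described above can be scripted before you start an installation. The following is a minimal sketch, not part of bmctl; the function name uses_cgroup_v2 is hypothetical, and the only fact it relies on is the marker path given above.

```python
import os

# Path named in the text above; its presence suggests the node uses cgroup v2.
CGROUP_V2_MARKER = "/sys/fs/cgroup/cgroup.controllers"


def uses_cgroup_v2(marker_path: str = CGROUP_V2_MARKER) -> bool:
    """Return True if the cgroup v2 marker file exists (hypothetical helper)."""
    return os.path.exists(marker_path)
```

Running uses_cgroup_v2() on each node machine and stopping when it returns True would catch the incompatibility before cluster creation begins.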
Benign error messages during installation
During highly available (HA) cluster installation, you may see errors about
etcdserver leader change. These error messages are benign and can be ignored.
When you use bmctl for cluster installation, you may see a
Log streamer failed to get BareMetalMachine log message at the very end of the
create-cluster.log. This error message is benign and can be ignored.
When examining cluster creation logs, you may notice transient failures about registering clusters or calling webhooks. These errors can be safely ignored, because the installation will retry these operations until they succeed.
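When reviewing a long log, it can help to set the known benign messages aside first. The sketch below is illustrative only: it does a plain substring match on the two phrases quoted above, and it assumes those phrases appear verbatim in the log lines, which may not hold for every message. The name split_benign is hypothetical.

```python
# Phrases quoted in the text above; matching them as substrings is an assumption
# about how the messages appear in the actual log output.
BENIGN_MARKERS = (
    "etcdserver leader change",
    "Log streamer failed to get BareMetalMachine",
)


def split_benign(lines):
    """Partition log lines into (benign, other) by substring match."""
    benign, other = [], []
    for line in lines:
        if any(marker in line for marker in BENIGN_MARKERS):
            benign.append(line)
        else:
            other.append(line)
    return benign, other
```

The transient registration and webhook failures are not matched here, because the text does not quote their exact wording.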
Preflight checks and service account credentials
For installations triggered by admin or hybrid clusters (in other words,
clusters not created with bmctl, like user clusters), the preflight check does
not verify Google Cloud Platform service account credentials or their
associated permissions.
Creating cloud monitoring workspace before viewing dashboards
You need to create a Cloud Monitoring workspace through the Google Cloud Console before you can view any Anthos on bare metal monitoring dashboards.
Application default credentials and bmctl
bmctl uses Application Default Credentials (ADC) to validate the cluster
operation's location value in the cluster spec when it is not set to global.
For ADC to work, you need to either point the GOOGLE_APPLICATION_CREDENTIALS
environment variable to a service account credential file, or run
gcloud auth application-default login.
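A quick presence check for either ADC source can be sketched as follows. This is not how bmctl resolves credentials; it only looks for files. The helper name adc_source is hypothetical, and the well-known path under ~/.config/gcloud is an assumption that holds for a default gcloud configuration on Linux.

```python
import os
from pathlib import Path


def adc_source(env=None, home=None):
    """Return "env", "gcloud", or None depending on which ADC file is present.

    Hypothetical helper: it checks only that a file exists, not that the
    credentials inside are valid or carry the required permissions.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    # Option 1: GOOGLE_APPLICATION_CREDENTIALS points at a credential file.
    key_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if key_path and Path(key_path).is_file():
        return "env"

    # Option 2: file written by `gcloud auth application-default login`
    # (assumed default location on Linux).
    well_known = home / ".config" / "gcloud" / "application_default_credentials.json"
    if well_known.is_file():
        return "gcloud"

    return None
```

A return value of None suggests that neither option described above has been set up on the workstation.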
Docker service
On cluster node machines, if the Docker executable is present in the PATH
environment variable, but the Docker service is not active, the preflight check
will fail and report that the Docker service is not active. To fix this error,
either remove Docker, or enable the Docker service.
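The failing combination (executable on PATH, service not active) can be detected ahead of time. The sketch below is not the bmctl preflight check itself; it assumes the node uses systemd and that the unit is named docker. The function names are hypothetical.

```python
import shutil
import subprocess


def _systemd_docker_active() -> bool:
    """Return True if `systemctl is-active --quiet docker` exits with 0.

    Assumes a systemd-managed unit named "docker".
    """
    try:
        result = subprocess.run(
            ["systemctl", "is-active", "--quiet", "docker"], check=False
        )
    except FileNotFoundError:
        # systemctl is not installed, so the service cannot be reported active.
        return False
    return result.returncode == 0


def docker_preflight_state(which=shutil.which, is_active=_systemd_docker_active) -> str:
    """Classify a node as "absent", "ok", or "inactive" (hypothetical helper)."""
    if which("docker") is None:
        return "absent"  # Docker is not on PATH.
    return "ok" if is_active() else "inactive"
```

Only the "inactive" state corresponds to the preflight failure described above; "absent" and "ok" match the two fixes (remove Docker, or enable the service).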
Upgrading Anthos on bare metal
Upgrading is not available in the 1.6.0 release.
Reset
User cluster credentials
The bmctl reset command relies on the top-level credentials section in the
cluster configuration file. For user clusters, you will need to manually
update the file to add the credentials section.
Mount points and fstab
Reset does not unmount the mount points under /mnt/anthos-system and
/mnt/localpv-share/. It also does not clean up the corresponding entries in
/etc/fstab.
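To find the entries that a reset leaves behind, you can scan the fstab contents for mount points under the two directories named above. This is a minimal sketch that only reports entries and changes nothing; the helper name leftover_fstab_entries is hypothetical, and it assumes the standard whitespace-separated fstab layout in which the second field is the mount point.

```python
# Directories named in the text above (trailing slash dropped for matching).
RESET_MOUNT_ROOTS = ("/mnt/anthos-system", "/mnt/localpv-share")


def leftover_fstab_entries(fstab_text, roots=RESET_MOUNT_ROOTS):
    """Return fstab lines whose mount point is at or below one of the roots."""
    leftovers = []
    for line in fstab_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # Skip blank lines and comments.
        fields = stripped.split()
        if len(fields) < 2:
            continue  # Not a well-formed entry.
        mount_point = fields[1]
        if any(mount_point == r or mount_point.startswith(r + "/") for r in roots):
            leftovers.append(stripped)
    return leftovers
```

Passing the contents of /etc/fstab to this function lists the lines you may want to review and remove by hand after unmounting the corresponding mount points.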
Security
The cluster CA/certificate will be rotated during upgrade. On-demand rotation support is not currently available.
Anthos on bare metal rotates kubelet serving certificates automatically.
Each kubelet node agent can send out a Certificate Signing Request (CSR) when
a certificate nears expiration. A controller in your admin clusters validates
and approves the CSR.
The CSI snapshot webhook does not handle certificate rotation. You need to
restart the snapshot webhook pod to load the new certificate after the current
one expires. As a short-term mitigation of this issue, the expiration period
is set to 1 year in the 1.6.0 release.
Networking
Bootstrap (kind) cluster IP addresses and cluster node IP addresses overlapping
192.168.122.0/24 and 10.96.0.0/27 are the default pod and service CIDRs used by
the bootstrap (kind) cluster. Preflight checks will fail if they overlap with
cluster node machine IP addresses. To avoid the conflict, you can pass
the --bootstrap-cluster-pod-cidr and --bootstrap-cluster-service-cidr flags
to bmctl to specify different values.
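You can check for this overlap yourself before running bmctl. The sketch below uses Python's standard ipaddress module and the two default CIDRs given above; the function name conflicting_nodes and the sample node addresses are hypothetical.

```python
import ipaddress

# Default bootstrap (kind) cluster CIDRs from the text above.
BOOTSTRAP_POD_CIDR = "192.168.122.0/24"
BOOTSTRAP_SERVICE_CIDR = "10.96.0.0/27"


def conflicting_nodes(node_ips, cidrs=(BOOTSTRAP_POD_CIDR, BOOTSTRAP_SERVICE_CIDR)):
    """Return the node IP addresses that fall inside any of the given CIDRs."""
    networks = [ipaddress.ip_network(cidr) for cidr in cidrs]
    return [
        ip
        for ip in node_ips
        if any(ipaddress.ip_address(ip) in network for network in networks)
    ]
```

For example, conflicting_nodes(["192.168.122.10", "10.200.0.3"]) returns ["192.168.122.10"], which would signal that a different pod CIDR should be passed with --bootstrap-cluster-pod-cidr.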
Overlapping IP addresses across different clusters
There is no preflight check to validate overlapping IP addresses across different clusters.
hostport feature in Anthos on bare metal
The hostport feature in ContainerPort is not currently supported.
Operating system endpoint limitations
On RHEL and CentOS, there is a cluster-level limitation of 100,000 endpoints.
This number is the sum of all pods that are referenced by a Kubernetes service.
If 2 services reference the same set of pods, this counts as 2 separate sets of
endpoints. The underlying nftable implementation on RHEL and CentOS causes this
limitation; it is not an intrinsic limitation of Anthos on bare metal.
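The counting rule can be illustrated with a short calculation. This is a sketch of the arithmetic only, with hypothetical service and pod names; it does not query a cluster.

```python
ENDPOINT_LIMIT = 100_000  # Cluster-level limit on RHEL and CentOS, per the text above.


def total_endpoints(service_to_pods):
    """Sum the pods referenced by each service; shared pods count once per service."""
    return sum(len(set(pods)) for pods in service_to_pods.values())


# Two services that select the same three pods contribute 3 + 3 = 6 endpoints.
example = {
    "frontend": ["pod-a", "pod-b", "pod-c"],
    "frontend-internal": ["pod-a", "pod-b", "pod-c"],
}
example_total = total_endpoints(example)
```

Here example_total is 6 rather than 3, because each service's reference to a pod is counted separately toward the limit.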
Configuration
Control plane and load balancer specifications
The control plane and load balancer node pool specifications (control plane
spec and node pool spec) are special. These specifications declare and
control critical cluster resources. The canonical source for these resources is
their respective sections in the cluster spec. Consequently, do not modify the
top-level control plane and load balancer node pool resources directly. Modify
the associated sections in the cluster spec instead.
Mutable fields in the cluster and node pool specification
Only the following fields in the cluster spec and node pool spec can be updated
after the cluster is created (they are mutable fields):
For the cluster spec, the following fields are mutable: anthosBareMetalVersion,
controlPlane.nodePoolSpec.nodes, maintenanceBlocks, bypassPreflightCheck, and
nodeAccess.
For the node pool spec, the following field is mutable: nodes.
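Before applying an edited cluster spec, you could compare it with the current one and flag changes outside the mutable fields. The sketch below is illustrative and is not a validation that Anthos on bare metal performs in this form: it assumes both specs are already loaded as nested dictionaries, and the names flatten and immutable_changes are hypothetical.

```python
# Mutable cluster spec fields, as listed in the text above.
MUTABLE_CLUSTER_FIELDS = (
    "anthosBareMetalVersion",
    "controlPlane.nodePoolSpec.nodes",
    "maintenanceBlocks",
    "bypassPreflightCheck",
    "nodeAccess",
)

_MISSING = object()  # Marks a path that exists in only one of the two specs.


def flatten(spec, prefix=""):
    """Flatten nested dicts into {"a.b.c": value}; lists are kept as leaf values."""
    flat = {}
    for key, value in spec.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def immutable_changes(old_spec, new_spec, mutable=MUTABLE_CLUSTER_FIELDS):
    """Return sorted dotted paths that changed and are not under a mutable field."""
    old_flat, new_flat = flatten(old_spec), flatten(new_spec)
    changed = [
        path
        for path in old_flat.keys() | new_flat.keys()
        if old_flat.get(path, _MISSING) != new_flat.get(path, _MISSING)
    ]
    return sorted(
        path
        for path in changed
        if not any(path == m or path.startswith(m + ".") for m in mutable)
    )
```

An empty result means that every difference between the two specs falls under one of the mutable fields. For a node pool spec, the same function can be called with mutable=("nodes",).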