Anthos clusters on bare metal known issues


Category Identified version(s) Issue and workaround
Upgrades and updates 1.13

Some version 1.12 clusters with the Docker container runtime can't upgrade to version 1.13

If a version 1.12 cluster that uses the Docker container runtime is missing the following annotation, it can't upgrade to version 1.13:

baremetal.cluster.gke.io/allow-docker-container-runtime:  "true"

If you're affected by this issue, bmctl writes the following error in the upgrade-cluster.log file inside the bmctl-workspace folder:

Operation failed, retrying with backoff. Cause: error creating
"baremetal.cluster.gke.io/v1, Kind=Cluster": admission webhook
"vcluster.kb.io" denied the request: Spec.NodeConfig.ContainerRuntime:
Forbidden: Starting with Anthos Bare Metal version 1.13 Docker container
runtime will not be supported. Before 1.13 please set the containerRuntime
to containerd in your cluster resources.

Although highly discouraged, you can create a cluster with Docker node pools
until 1.13 by passing the flag "--allow-docker-container-runtime" to bmctl
create cluster or add the annotation "baremetal.cluster.gke.io/allow-docker-
container-runtime: true" to the cluster configuration file.

This is most likely to occur with version 1.12 Docker clusters that were upgraded from 1.11, as that upgrade doesn't require the annotation to maintain the Docker container runtime. In this case, clusters don't have the annotation when upgrading to 1.13. Note that starting with version 1.13, containerd is the only permitted container runtime.

Workaround:

If you're affected by this problem, update the cluster resource with the missing annotation. You can add the annotation either while the upgrade is running or after canceling and before retrying the upgrade.
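
For example, a sketch of adding the annotation with kubectl annotate (placeholder values in capitals; by default the cluster namespace is the cluster name prefixed with cluster-):

kubectl --kubeconfig KUBECONFIG annotate \
    clusters.baremetal.cluster.gke.io CLUSTER_NAME \
    -n cluster-CLUSTER_NAME \
    baremetal.cluster.gke.io/allow-docker-container-runtime="true"

Alternatively, add the annotation under metadata.annotations in the cluster configuration file before retrying the upgrade.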

Installation 1.11

bmctl exits before cluster creation completes

Cluster creation may fail for Anthos clusters on bare metal version 1.11.0 (this issue is fixed in Anthos clusters on bare metal release 1.11.1). In some cases, the bmctl create cluster command exits early and writes errors like the following to the logs:

Error creating cluster: error waiting for applied resources: provider cluster-api watching namespace USER_CLUSTER_NAME not found in the target cluster

Workaround

The failed operation produces artifacts, but the cluster isn't operational. If this issue affects you, use the following steps to clean up artifacts and create a cluster:

Installation 1.11, 1.12

Installation reports VM runtime reconciliation error

The cluster creation operation may report an error similar to the following:

I0423 01:17:20.895640 3935589 logs.go:82]  "msg"="Cluster reconciling:" "message"="Internal error occurred: failed calling webhook \"vvmruntime.kb.io\": failed to call webhook: Post \"https://vmruntime-webhook-service.kube-system.svc:443/validate-vm-cluster-gke-io-v1vmruntime?timeout=10s\": dial tcp 10.95.5.151:443: connect: connection refused" "name"="xxx" "reason"="ReconciliationError"

Workaround

This error is benign and you can safely ignore it.

Installation 1.10, 1.11, 1.12

Cluster creation fails when using multi-NIC, containerd, and HTTPS proxy

Cluster creation fails when you have the following combination of conditions:

  • Cluster is configured to use containerd as the container runtime (nodeConfig.containerRuntime set to containerd in the cluster configuration file, the default for Anthos clusters on bare metal version 1.11).
  • Cluster is configured to provide multiple network interfaces, multi-NIC, for pods (clusterNetwork.multipleNetworkInterfaces set to true in the cluster configuration file).
  • Cluster is configured to use a proxy (spec.proxy.url is specified in the cluster configuration file). Even though cluster creation fails, this setting is propagated when you attempt to create a cluster. You may see this proxy setting as an HTTPS_PROXY environment variable or in your containerd configuration (/etc/systemd/system/containerd.service.d/09-proxy.conf).

Workaround

Append service CIDRs (clusterNetwork.services.cidrBlocks) to the NO_PROXY environment variable on all node machines.
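
For example, a sketch of the resulting entry, assuming a service CIDR of 10.96.0.0/20 (use the value from your own cluster configuration file) and that the proxy variables are set in /etc/environment on the node:

NO_PROXY=existing-entries,10.96.0.0/20

Make the same change wherever else the proxy variables are configured, for example in the containerd proxy drop-in file mentioned above.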

Installation 1.10, 1.11, 1.12

Failure on systems with restrictive umask setting

Anthos clusters on bare metal release 1.10.0 introduced a rootless control plane feature that runs all the control plane components as a non-root user. Running all components as a non-root user may cause installation or upgrade failures on systems with a more restrictive umask setting of 0077.


Workaround

Reset the control plane nodes and change the umask setting to 0022 on all the control plane machines. After the machines have been updated, retry the installation.

Alternatively, you can change the directory and file permissions of /etc/kubernetes on the control-plane machines to allow the installation or upgrade to proceed, as shown in the sketch after the following list:

  • Make /etc/kubernetes and all its subdirectories world readable: chmod o+rx.
  • Make all the files owned by root user under the directory (recursively) /etc/kubernetes world readable (chmod o+r). Exclude private key files (.key) from these changes as they are already created with correct ownership and permissions.
  • Make /usr/local/etc/haproxy/haproxy.cfg world readable.
  • Make /usr/local/etc/bgpadvertiser/bgpadvertiser-cfg.yaml world readable.
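
The following is a sketch of the corresponding commands, run as root on each control-plane machine (the find expressions are one way to apply the changes recursively while skipping .key files):

chmod o+rx /etc/kubernetes
find /etc/kubernetes -type d -exec chmod o+rx {} +
find /etc/kubernetes -type f -user root ! -name '*.key' -exec chmod o+r {} +
chmod o+r /usr/local/etc/haproxy/haproxy.cfg
chmod o+r /usr/local/etc/bgpadvertiser/bgpadvertiser-cfg.yaml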
Installation 1.10, 1.11, 1.12, 1.13

Control group v2 incompatibility

Control group v2 (cgroup v2) is not supported in Anthos clusters on bare metal. The presence of /sys/fs/cgroup/cgroup.controllers indicates that your system uses cgroup v2.
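
For example, a quick way to check a node (a sketch; the test just looks for the cgroup v2 control file):

if [ -f /sys/fs/cgroup/cgroup.controllers ]; then
  echo "cgroup v2 is in use"
else
  echo "cgroup v1 is in use"
fi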


Workaround

The preflight checks verify that cgroup v2 is not in use on the cluster machine.

Installation 1.10, 1.11, 1.12, 1.13

Benign error messages during installation

When examining cluster creation logs, you may notice transient failures about registering clusters or calling webhooks.


Workaround

These errors can be safely ignored, because the installation will retry these operations until they succeed.

Installation 1.10, 1.11, 1.12, 1.13

Preflight checks and service account credentials

For installations triggered by admin or hybrid clusters (in other words, clusters not created with bmctl, like user clusters), the preflight check does not verify Google Cloud service account credentials or their associated permissions.

Installation 1.10, 1.11, 1.12, 1.13

Application default credentials and bmctl

bmctl uses Application Default Credentials (ADC) to validate the cluster operation's location value in the cluster spec when it is not set to global.


Workaround

For ADC to work, you need to either point the GOOGLE_APPLICATION_CREDENTIALS environment variable to a service account credential file, or run gcloud auth application-default login.
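
For example (a sketch; the key file path is a placeholder):

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json

Or, to authenticate interactively instead:

gcloud auth application-default login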

Installation 1.10, 1.11, 1.12, 1.13

Docker service

On cluster node machines, if the Docker executable is present in the PATH environment variable but the Docker service isn't active, the preflight check fails and reports that the Docker service is not active.


Workaround

Remove Docker, or enable the Docker service.
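
For example, to enable and start the Docker service so the preflight check passes (a sketch; if you remove Docker instead, the packages to remove depend on how Docker was installed, and take care not to remove containerd on nodes that use it as the container runtime):

sudo systemctl enable --now docker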

Installation 1.10, 1.11, 1.12, 1.13

Installing on vSphere

When installing Anthos clusters on bare metal on vSphere VMs, you must set the tx-udp_tnl-segmentation and tx-udp_tnl-csum-segmentation flags to off. These flags are related to the hardware segmentation offload done by the vSphere driver VMXNET3 and they don't work with the GENEVE tunnel of Anthos clusters on bare metal.


Workaround

Run the following command on each node to check the current values for these flags:

ethtool -k NET_INTFC | grep segm
...
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
...
Replace NET_INTFC with the network interface associated with the IP address of the node. On RHEL 8.4, ethtool sometimes reports these flags as off even when they aren't. To explicitly set these flags to off, toggle them on and then off with the following commands:
ethtool -K ens192 tx-udp_tnl-segmentation on
ethtool -K ens192 tx-udp_tnl-csum-segmentation on

ethtool -K ens192 tx-udp_tnl-segmentation off
ethtool -K ens192 tx-udp_tnl-csum-segmentation off
This flag change does not persist across reboots. Configure the startup scripts to explicitly set these flags when the system boots.
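
One way to do this is with a small systemd unit that runs at boot. The following is a sketch only: the unit name is arbitrary, ens192 is assumed to be the affected interface, and the ethtool path may differ on your distribution.

# /etc/systemd/system/disable-tnl-offload.service
[Unit]
Description=Disable UDP tunnel segmentation offload for Anthos on vSphere
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/sbin/ethtool -K ens192 tx-udp_tnl-segmentation off
ExecStart=/usr/sbin/ethtool -K ens192 tx-udp_tnl-csum-segmentation off

[Install]
WantedBy=multi-user.target

Enable the unit with systemctl daemon-reload and systemctl enable disable-tnl-offload.service so it runs on every boot.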

Upgrades and updates 1.10

bmctl can't create, update, or reset lower version user clusters

The bmctl CLI can't create, update, or reset a user cluster with a lower minor version, regardless of the admin cluster version. For example, you can't use bmctl with a version of 1.N.X to reset a user cluster of version 1.N-1.Y, even if the admin cluster is also at version 1.N.X.

If you are affected by this issue, you should see the logs similar to the following when you use bmctl:

[2022-06-02 05:36:03-0500] error judging if the cluster is managing itself: error to parse the target cluster: error parsing cluster config: 1 error occurred:
    * cluster version 1.8.1 is not supported in bmctl version 1.9.5, only cluster version 1.9.5 is supported

Workaround:

Use kubectl to create, edit, or delete the user cluster custom resource inside the admin cluster.
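
For example, a sketch of editing the user cluster resource through the admin cluster (placeholder values in capitals; by default the cluster namespace is the cluster name prefixed with cluster-):

kubectl --kubeconfig ADMIN_KUBECONFIG edit \
    clusters.baremetal.cluster.gke.io USER_CLUSTER_NAME \
    -n cluster-USER_CLUSTER_NAME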

The ability to upgrade user clusters is unaffected.

Upgrades and updates 1.12

Cluster upgrades to version 1.12.1 may stall

Upgrading clusters to version 1.12.1 sometimes stalls due to the API server becoming unavailable. This issue affects all cluster types and all supported operating systems. When this issue occurs, the bmctl upgrade cluster command can fail at multiple points, including during the second phase of preflight checks.


Workaround

You can check your upgrade logs to determine if you are affected by this issue. Upgrade logs are located in /baremetal/bmctl-workspace/CLUSTER_NAME/log/upgrade-cluster-TIMESTAMP by default. The upgrade-cluster.log may contain errors like the following:

Failed to upgrade cluster: preflight checks failed: preflight check failed
The machine log may contain errors like the following (repeated failures indicate that you are affected by this issue):
FAILED - RETRYING: Query CNI health endpoint (30 retries left).
FAILED - RETRYING: Query CNI health endpoint (29 retries left).
FAILED - RETRYING: Query CNI health endpoint (28 retries left).
...
HAProxy and Keepalived must be running on each control plane node before you reattempt to upgrade your cluster to version 1.12.1. Use the crictl command-line interface (or docker on nodes that use the Docker runtime) on each node to check whether the haproxy and keepalived containers are running:
crictl ps | grep haproxy
crictl ps | grep keepalived
If either HAProxy or Keepalived isn't running on a node, restart kubelet on the node:

systemctl restart kubelet
Upgrades and updates 1.11, 1.12

Upgrading clusters to version 1.12.0 or higher fails when Anthos VM Runtime is enabled

In Anthos clusters on bare metal release 1.12.0, all resources related to Anthos VM Runtime are migrated to the vm-system namespace to better support the Anthos VM Runtime GA release. If you have Anthos VM Runtime enabled in a version 1.11.x or lower cluster, upgrading to version 1.12.0 or higher fails unless you first disable Anthos VM Runtime. When you're affected by this issue, the upgrade operation reports the following error:

Failed to upgrade cluster: cluster is not upgradable with vmruntime enabled from version 1.11.x to version 1.12.0: please disable VMruntime before upgrade to 1.12.0 and higher version

Workaround

To disable Anthos VM Runtime:

  1. Edit the VMRuntime custom resource:
    kubectl edit vmruntime
    
  2. Set enabled to false in the spec:
    apiVersion: vm.cluster.gke.io/v1
    kind: VMRuntime
    metadata:
      name: vmruntime
    spec:
      enabled: false
    ...
    
  3. Save the custom resource in your editor.
  4. Once the cluster upgrade is complete, re-enable Anthos VM Runtime.
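
As an alternative to editing the resource interactively in steps 1 through 3, you can apply the same change with a single patch command (a sketch; vmruntime is the default resource name shown above):

kubectl patch vmruntime vmruntime --type merge -p '{"spec":{"enabled":false}}'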

For more information, see Working with VM-based workloads.

Upgrades and updates 1.10, 1.11, 1.12

Upgrade stuck at error during manifests operations

In some situations, cluster upgrades fail to complete and the bmctl CLI becomes unresponsive. This problem can be caused by an incorrectly updated resource. To determine if you're affected by this issue and to correct it, check the anthos-cluster-operator logs and look for errors similar to the following entries:

controllers/Cluster "msg"="error during manifests operations" "error"="1 error occurred:
...
{RESOURCE_NAME} is invalid: metadata.resourceVersion: Invalid value: 0x0: must be specified for an update
These entries are a symptom of an incorrectly updated resource, where {RESOURCE_NAME} is the name of the problem resource.


Workaround

If you find these errors in your logs, complete the following steps:

  1. Use kubectl edit to remove the kubectl.kubernetes.io/last-applied-configuration annotation from the resource contained in the log message.
  2. Save and apply your changes to the resource.
  3. Retry the cluster upgrade.
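
As an alternative to kubectl edit in step 1, you can remove the annotation with a single command (a sketch; take RESOURCE_KIND and RESOURCE_NAME from the log message, and add -n NAMESPACE for namespaced resources; the trailing dash removes the annotation):

kubectl annotate RESOURCE_KIND RESOURCE_NAME \
    kubectl.kubernetes.io/last-applied-configuration-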
Upgrades and updates 1.10, 1.11, 1.12

Upgrades are blocked for clusters with features that use Anthos Network Gateway

Cluster upgrades from 1.10.x to 1.11.x fail for clusters that use either egress NAT gateway or bundled load-balancing with BGP. These features both use Anthos Network Gateway. Cluster upgrades get stuck at the Waiting for upgrade to complete... command-line message and the anthos-cluster-operator logs errors like the following:

apply run failed ...
MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable...

Workaround

To unblock the upgrade, run the following commands against the cluster you are upgrading:

kubectl -n kube-system delete deployment \
    ang-controller-manager-autoscaler

kubectl -n kube-system delete deployment \
    ang-controller-manager

kubectl -n kube-system delete ds ang-node
Upgrades and updates 1.10, 1.11, 1.12, 1.13

bmctl update doesn't remove maintenance blocks

The bmctl update command can't remove or modify the maintenanceBlocks section from the cluster resource configuration.


Workaround

For more information, including instructions for removing nodes from maintenance mode, see Put nodes into maintenance mode.

Upgrades and updates 1.10, 1.11, 1.12, 1.13

Node draining can't start when Node is out of reach

The draining process for Nodes won't start if the Node is out of reach from Anthos clusters on bare metal. For example, if a Node goes offline during a cluster upgrade process, it may cause the upgrade to stop responding. This is a rare occurrence.


Workaround

To minimize the likelihood of encountering this problem, ensure your Nodes are operating properly before initiating an upgrade.

Upgrades and updates 1.12

containerd 1.5.13 requires libseccomp 2.5 or higher

Anthos clusters on bare metal release 1.12.1 ships with containerd version 1.5.13, and this version of containerd requires libseccomp version 2.5 or higher.

If your system doesn't have libseccomp version 2.5 or higher installed, update it in advance of upgrading existing clusters to version 1.12.1. Otherwise, you may see errors in cplb-update Pods for load balancer nodes such as the following:

runc did not terminate successfully: runc: symbol lookup error: runc: undefined symbol: seccomp_notify_respond

Workaround

To install the latest version of libseccomp in Ubuntu, run the following command:

sudo apt-get install libseccomp-dev

To install the latest version of libseccomp in CentOS or RHEL, run the following command:

sudo dnf -y install libseccomp-devel
Operation 1.10, 1.11, 1.12

Nodes uncordoned if you don't use the maintenance mode procedure

If you run Anthos clusters on bare metal version 1.12.0 (anthosBareMetalVersion: 1.12.0) or lower and manually use kubectl cordon on a node, Anthos clusters on bare metal might uncordon the node before you're ready in an effort to reconcile the expected state.


Workaround

For Anthos clusters on bare metal version 1.12.0 and lower, use maintenance mode to cordon and drain nodes safely.

In version 1.12.1 (anthosBareMetalVersion: 1.12.1) or higher, Anthos clusters on bare metal won't uncordon your nodes unexpectedly when you use kubectl cordon.

Operation 1.11

Version 1.11 admin clusters using a registry mirror can't manage version 1.10 clusters

If your admin cluster is on version 1.11 and uses a registry mirror, it can't manage user clusters that are on a lower minor version. This issue affects reset, update, and upgrade operations on the user cluster.

To determine whether this issue affects you, check your logs for cluster operations, such as create, upgrade, or reset. These logs are located in the bmctl-workspace/CLUSTER_NAME/ folder by default. If you're affected by the issue, your logs contain the following error message:

flag provided but not defined: -registry-mirror-host-to-endpoints
Operation 1.10, 1.11

kubeconfig Secret overwritten

The bmctl check cluster command, when run on user clusters, overwrites the user cluster kubeconfig Secret with the admin cluster kubeconfig. Overwriting the file causes standard cluster operations, such as updating and upgrading, to fail for affected user clusters. This problem applies to Anthos clusters on bare metal versions 1.11.1 and earlier.

To determine if this issue affects a user cluster, run the following command:

kubectl --kubeconfig ADMIN_KUBECONFIG \
  get secret -n USER_CLUSTER_NAMESPACE \
  USER_CLUSTER_NAME-kubeconfig \
  -o json  | jq -r '.data.value'  | base64 -d

Replace the following:

  • ADMIN_KUBECONFIG: the path to the admin cluster kubeconfig file.
  • USER_CLUSTER_NAMESPACE: the namespace for the cluster. By default, the cluster namespaces for Anthos clusters on bare metal are the name of the cluster prefaced with cluster-. For example, if you name your cluster test, the default namespace is cluster-test.
  • USER_CLUSTER_NAME: the name of the user cluster to check.

If the cluster name in the output (see contexts.context.cluster in the following sample output) is the admin cluster name, then the specified user cluster is affected.

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data:LS0tLS1CRU...UtLS0tLQo=
    server: https://10.200.0.6:443
  name: ci-aed78cdeca81874
contexts:
- context:
    cluster: ci-aed78cdeca81
    user: ci-aed78cdeca81-admin
  name: ci-aed78cdeca81-admin@ci-aed78cdeca81
current-context: ci-aed78cdeca81-admin@ci-aed78cdeca81
kind: Config
preferences: {}
users:
- name: ci-aed78cdeca81-admin
  user:
    client-certificate-data: LS0tLS1CRU...UtLS0tLQo=
    client-key-data: LS0tLS1CRU...0tLS0tCg==

Workaround

The following steps restore function to an affected user cluster (USER_CLUSTER_NAME):

  1. Locate the user cluster kubeconfig file. Anthos clusters on bare metal generates the kubeconfig file on the admin workstation when you create a cluster. By default, the file is in the bmctl-workspace/USER_CLUSTER_NAME directory.
  2. Verify that this kubeconfig file is the correct user cluster kubeconfig:
    kubectl get nodes \
      --kubeconfig PATH_TO_GENERATED_FILE
    
    Replace PATH_TO_GENERATED_FILE with the path to the user cluster kubeconfig file. The response returns details about the nodes for the user cluster. Confirm the machine names are correct for your cluster.
  3. Run the following command to delete the corrupted kubeconfig file in the admin cluster:
    kubectl delete secret \
      -n USER_CLUSTER_NAMESPACE \
      USER_CLUSTER_NAME-kubeconfig
    
  4. Run the following command to save the correct kubeconfig secret back to the admin cluster:
    kubectl create secret generic \
      -n USER_CLUSTER_NAMESPACE \
      USER_CLUSTER_NAME-kubeconfig \
      --from-file=value=PATH_TO_GENERATED_FILE
    
Operation 1.10, 1.11, 1.12, 1.13

Taking a snapshot as a non-root login user

If you use containerd as the container runtime, running the snapshot command as a non-root user requires /usr/local/bin to be in the user's PATH. Otherwise, the command fails with a crictl: command not found error.

When you aren't logged in as the root user, sudo is used to run the snapshot commands. The sudo PATH can differ from the root profile and may not contain /usr/local/bin.


Workaround

Update the secure_path in /etc/sudoers to include /usr/local/bin. Alternatively, create a symbolic link for crictl in another /bin directory.
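
For example, a sketch of the secure_path entry in /etc/sudoers after appending /usr/local/bin (edit the file with visudo; the other directories shown are common defaults and may differ on your system):

Defaults    secure_path = /sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin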

Operation 1.10, 1.11, 1.12, 1.13

Anthos VM Runtime - Restarting a Pod causes the VMs in the Pod to change IP addresses or lose their IP addresses altogether.

If the IP address of a VM changes, this does not affect the reachability of VM applications exposed as a Kubernetes service.


Workaround

If the IP address is lost, you must run dhclient from the VM to acquire an IP address for the VM.

Logging and monitoring 1.10

stackdriver-log-forwarder has [parser:cri] invalid time format warning logs

If the container runtime interface (CRI) parser uses an incorrect regular expression for parsing time, the logs for the stackdriver-log-forwarder Pod contain errors and warnings like the following:

[2022/03/04 17:47:54] [error] [parser] time string length is too long
[2022/03/04 20:16:43] [ warn] [parser:cri] invalid time format %Y-%m-%dT%H:%M:%S.%L%z for '2022-03-04T20:16:43.680484387Z'

Workaround:

Logging and monitoring 1.10, 1.11, 1.12, 1.13

Unexpected monitoring billing

For Anthos clusters on bare metal versions 1.10 and 1.11, some customers have found unexpectedly high billing for Metrics volume on the Billing page. This issue affects you only when both of the following circumstances apply:

  • Application logging and monitoring is enabled (enableStackdriverForApplications=true)
  • Application Pods have the prometheus.io/scrape=true annotation

To confirm whether you are affected by this issue, list your user-defined metrics. If you see billing for unwanted metrics, then this issue applies to you.


Workaround

To ensure you don't get billed extra for Metrics volume when you use application logging and monitoring, use the following steps:

  1. Find the source Pods and Services that have the unwanted billed metrics.
    kubectl --kubeconfig KUBECONFIG \
      get pods -A -o yaml | grep 'prometheus.io/scrape: "true"'
    kubectl --kubeconfig KUBECONFIG get \
      services -A -o yaml | grep 'prometheus.io/scrape: "true"'
    
  2. Remove the prometheus.io/scrape=true annotation from the Pod or Service, as shown in the sketch after these steps.
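
A sketch of removing the annotation with kubectl (placeholder values in capitals; the trailing dash removes the annotation). If a Pod is managed by a controller such as a Deployment, remove the annotation from the Pod template in the controller spec instead, so the change isn't reverted when the Pod is recreated:

kubectl --kubeconfig KUBECONFIG annotate pod POD_NAME \
    -n NAMESPACE prometheus.io/scrape-

kubectl --kubeconfig KUBECONFIG annotate service SERVICE_NAME \
    -n NAMESPACE prometheus.io/scrape-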
Logging and monitoring 1.11, 1.12, 1.13

Edits to metrics-server-config aren't persisted

High pod density can, in extreme cases, create excessive logging and monitoring overhead, which can cause Metrics Server to stop and restart. You can edit the metrics-server-config ConfigMap to allocate more resources to keep Metrics Server running. However, due to reconciliation, edits made to metrics-server-config can get reverted to the default value during a cluster update or upgrade operation. Metrics Server isn't affected immediately, but the next time it restarts, it picks up the reverted ConfigMap and is vulnerable to excessive overhead, again.


Workaround

As a workaround, you can script the ConfigMap edit and perform it along with updates or upgrades to the cluster.

Logging and monitoring 1.11, 1.12

Deprecated metrics affect the Cloud Monitoring dashboard

Several Anthos metrics have been deprecated and, starting with Anthos clusters on bare metal release 1.11, data is no longer collected for these deprecated metrics. If you use these metrics in any of your alerting policies, there won't be any data to trigger the alerting condition.

The following list shows each deprecated metric and the metric that replaces it:

  • kube_daemonset_updated_number_scheduled is replaced by kube_daemonset_status_updated_number_scheduled.
  • kube_node_status_allocatable_cpu_cores, kube_node_status_allocatable_memory_bytes, and kube_node_status_allocatable_pods are replaced by kube_node_status_allocatable.
  • kube_node_status_capacity_cpu_cores, kube_node_status_capacity_memory_bytes, and kube_node_status_capacity_pods are replaced by kube_node_status_capacity.

In Anthos clusters on bare metal releases before 1.11, the policy definition file for the recommended Anthos on baremetal node cpu usage exceeds 80 percent (critical) alert uses the deprecated metrics. The node-cpu-usage-high.json JSON definition file is updated for releases 1.11.0 and later.


Workaround

Use the following steps to migrate to the replacement metrics:

  1. In the Google Cloud console, select Monitoring or click the following button:
    Go to Monitoring
  2. In the navigation pane, select Dashboards, and delete the Anthos cluster node status dashboard.
  3. Click the Sample library tab and reinstall the Anthos cluster node status dashboard.
  4. Follow the instructions in Creating alerting policies to create a policy using the updated node-cpu-usage-high.json policy definition file.
Logging and monitoring 1.10

stackdriver-log-forwarder has CrashloopBackOff errors

In some situations, the fluent-bit logging agent can get stuck processing corrupt chunks. When the logging agent is unable to bypass corrupt chunks, you may observe that stackdriver-log-forwarder keeps crashing with a CrashloopBackOff error. If you are having this problem, your logs contain entries like the following:

[2022/03/09 02:18:44] [engine] caught signal (SIGSEGV)
#0  0x5590aa24bdd5      in  validate_insert_id() at plugins/out_stackdriver/stackdriver.c:1232
#1  0x5590aa24c502      in  stackdriver_format() at plugins/out_stackdriver/stackdriver.c:1523
#2  0x5590aa24e509      in  cb_stackdriver_flush() at plugins/out_stackdriver/stackdriver.c:2105
#3  0x5590aa19c0de      in  output_pre_cb_flush() at include/fluent-bit/flb_output.h:490
#4  0x5590aa6889a6      in  co_init() at lib/monkey/deps/flb_libco/amd64.c:117
#5  0xffffffffffffffff  in  ???() at ???:0

Workaround:

Clean up the buffer chunks for the Stackdriver Log Forwarder.

Note: In the following commands, replace KUBECONFIG with the path to the admin cluster kubeconfig file.

  1. Terminate all stackdriver-log-forwarder pods:
        kubectl --kubeconfig KUBECONFIG -n kube-system patch daemonset \
            stackdriver-log-forwarder -p \
            '{"spec": {"template": {"spec": {"nodeSelector": {"non-existing": "true"}}}}}'
    
    Verify that the stackdriver-log-forwarder pods are deleted before going to the next step.
  2. Deploy the following DaemonSet to clean up any corrupted data in fluent-bit buffers:
    kubectl --kubeconfig KUBECONFIG -n kube-system apply -f - << EOF
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: fluent-bit-cleanup
      namespace: kube-system
    spec:
      selector:
        matchLabels:
          app: fluent-bit-cleanup
      template:
        metadata:
          labels:
            app: fluent-bit-cleanup
        spec:
          containers:
          - name: fluent-bit-cleanup
            image: debian:10-slim
            command: ["bash", "-c"]
            args:
            - |
              rm -rf /var/log/fluent-bit-buffers/
              echo "Fluent Bit local buffer is cleaned up."
              sleep 3600
            volumeMounts:
            - name: varlog
              mountPath: /var/log
            securityContext:
              privileged: true
          tolerations:
          - key: "CriticalAddonsOnly"
            operator: "Exists"
          - key: node-role.kubernetes.io/master
            effect: NoSchedule
          - key: node-role.gke.io/observability
            effect: NoSchedule
          volumes:
          - name: varlog
            hostPath:
              path: /var/log
    EOF
    
  3. Use the following commands to verify that the DaemonSet has cleaned up all the nodes:
    kubectl --kubeconfig KUBECONFIG logs \
        -n kube-system -l \
        app=fluent-bit-cleanup | grep "cleaned up" | wc -l
    
    kubectl --kubeconfig KUBECONFIG -n \
        kube-system get pods -l \
        app=fluent-bit-cleanup --no-headers | wc -l
    
    The output of the two commands should be equal to the number of nodes in your cluster.
  4. Delete the cleanup DaemonSet:
    kubectl --kubeconfig KUBECONFIG -n \
        kube-system delete ds fluent-bit-cleanup
    
  5. Restart the log forwarder pods:
    kubectl --kubeconfig KUBECONFIG \
        -n kube-system patch daemonset \
        stackdriver-log-forwarder --type json \
        -p='[{"op": "remove", "path": "/spec/template/spec/nodeSelector/non-existing"}]'
    
Logging and monitoring 1.10, 1.11, 1.12, 1.13

Unknown metric data in Cloud Monitoring

The data in Cloud Monitoring for version 1.10.x clusters may contain irrelevant summary metrics entries such as the following:

Unknown metric: kubernetes.io/anthos/go_gc_duration_seconds_summary_percentile

Other metrics types that may have irrelevant summary metrics include:

  • apiserver_admission_step_admission_duration_seconds_summary
  • go_gc_duration_seconds
  • scheduler_scheduling_duration_seconds
  • gkeconnect_http_request_duration_seconds_summary
  • alertmanager_nflog_snapshot_duration_seconds_summary

While these summary type metrics are in the metrics list, they are not supported by gke-metrics-agent at this time.

Logging and monitoring 1.10, 1.11

Intermittent metrics export interruptions

Anthos clusters on bare metal may experience interruptions in normal, continuous exporting of metrics, or missing metrics on some nodes. If this issue affects your clusters, you may see gaps in data for the following metrics (at a minimum):

  • kubernetes.io/anthos/container_memory_working_set_bytes
  • kubernetes.io/anthos/container_cpu_usage_seconds_total
  • kubernetes.io/anthos/container_network_receive_bytes_total

Workaround

Upgrade your clusters to version 1.11.1 or later.

If you can't upgrade, perform the following steps as a workaround:

  1. Open your stackdriver resource for editing:
    kubectl -n kube-system edit stackdriver stackdriver
    
  2. To increase the CPU request for gke-metrics-agent from 10m to 50m, add the following resourceAttrOverride section to the stackdriver manifest:
    spec:
      resourceAttrOverride:
        gke-metrics-agent/gke-metrics-agent:
          limits:
            cpu: 100m
            memory: 4608Mi
          requests:
            cpu: 50m
            memory: 200Mi
    
    Your edited resource should look similar to the following:
    spec:
      anthosDistribution: baremetal
      clusterLocation: us-west1-a
      clusterName: my-cluster
      enableStackdriverForApplications: true
      gcpServiceAccountSecretName: ...
      optimizedMetrics: true
      portable: true
      projectID: my-project-191923
      proxyConfigSecretName: ...
      resourceAttrOverride:
        gke-metrics-agent/gke-metrics-agent:
          limits:
            cpu: 100m
            memory: 4608Mi
          requests:
            cpu: 50m
            memory: 200Mi
    
  3. Save your changes and close the text editor.
  4. To verify your changes have taken effect, run the following command:
    kubectl -n kube-system get daemonset \
        gke-metrics-agent -o yaml | grep "cpu: 50m"
    
    The command finds cpu: 50m if your edits have taken effect.
Networking 1.10

Multiple default gateways breaks connectivity to external endpoints

Having multiple default gateways in a node can lead to broken connectivity from within a Pod to external endpoints, such as google.com.

To determine if you're affected by this issue, run the following command on the node:

ip route show

Multiple instances of default in the response indicate that you're affected.
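
For example, output similar to the following (illustrative addresses only) shows two default routes and indicates that the node is affected:

default via 192.0.2.1 dev ens192 proto static metric 100
default via 198.51.100.1 dev ens224 proto dhcp metric 200
...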

Networking 1.12

Networking custom resource edits on user clusters get overwritten

Anthos clusters on bare metal version 1.12.x doesn't prevent you from manually editing networking custom resources in your user cluster. Anthos clusters on bare metal reconciles custom resources in the user clusters with the custom resources in your admin cluster during cluster upgrades. This reconciliation overwrites any edits made directly to the networking custom resources in the user cluster. The networking custom resources should be modified in the admin cluster only, but Anthos clusters on bare metal version 1.12.x doesn't enforce this requirement.

Advanced networking features, such as bundled load balancing with BGP, egress NAT gateway, SR-IOV networking, flat-mode with BGP, and multi-NIC for Pods use the following custom resources:

  • BGPLoadBalancer
  • BGPPeer
  • NetworkGatewayGroup
  • NetworkAttachmentDefinition
  • ClusterCIDRConfig
  • FlatIPMode

You edit these custom resources in your admin cluster and the reconciliation step applies the changes to your user clusters.


Workaround

If you've modified any of the previously mentioned custom resources on a user cluster, modify the corresponding custom resources on your admin cluster to match before upgrading. This step ensures that your configuration changes are preserved. Anthos clusters on bare metal versions 1.13.0 and higher prevent you from modifying the networking custom resources on your user clusters directly.

Networking 1.11, 1.12, 1.13

NAT failure with too many parallel connections

For a given node in your cluster, the node IP address provides network address translation (NAT) for packets routed to an address outside of the cluster. Similarly, when inbound packets enter a load-balancing node configured to use bundled load balancing (spec.loadBalancer.mode: bundled), source network address translation (SNAT) routes the packets to the node IP address before they are forwarded on to a backend Pod.

The port range for NAT used by Anthos clusters on bare metal is 32768-65535. This range limits the number of parallel connections to 32,767 per protocol on that node. Each connection needs an entry in the conntrack table. If you have too many short-lived connections, the conntrack table runs out of ports for NAT. A garbage collector cleans up the stale entries, but the cleanup isn't immediate.

When the number of connections on your node approaches 32,767, you will start seeing packet drops for connections that need NAT.

You can identify this problem by running the following command on the anetd Pod on the problematic node:

kubectl -n kube-system exec anetd-XXX -- hubble observe \
    --from-ip $IP --to-ip $IP -f

You should see errors of the following form:

No mapping for NAT masquerade DROPPED

Workaround

Redistribute your traffic to other nodes.

Networking 1.10, 1.11, 1.12, 1.13

Client source IP with bundled Layer 2 load balancing

Setting the external traffic policy to Local can cause routing errors, such as No route to host for bundled Layer 2 load balancing. The external traffic policy is set to Cluster (externalTrafficPolicy: Cluster), by default. With this setting, Kubernetes handles cluster-wide traffic. Services of type LoadBalancer or NodePort can use externalTrafficPolicy: Local to preserve the client source IP address. With this setting, however, Kubernetes only handles node-local traffic.


Workaround

If you want to preserve the client source IP address, additional configuration may be required to ensure service IPs are reachable. For configuration details, see Preserving client source IP address in Configure bundled load balancing.

Networking 1.10, 1.11, 1.12, 1.13

Modifying firewalld will erase Cilium iptables policy chains

When running Anthos clusters on bare metal with firewalld enabled on either CentOS or Red Hat Enterprise Linux (RHEL), changes to firewalld can remove the Cilium iptables chains on the host network. The iptables chains are added by the anetd Pod when it is started. The loss of the Cilium iptables chains causes Pods on the Node to lose network connectivity outside of the Node.

Changes to firewalld that will remove the iptables chains include, but aren't limited to:

  • Restarting firewalld, using systemctl
  • Reloading the firewalld with the command line client (firewall-cmd --reload)

Workaround

Restart anetd on the Node. Locate and delete the anetd Pod with the following commands to restart anetd:

kubectl get pods -n kube-system
kubectl delete pods -n kube-system ANETD_XYZ

Replace ANETD_XYZ with the name of the anetd Pod.

Networking 1.10, 1.11, 1.12, 1.13

Duplicate egressSourceIP addresses

When using the egress NAT gateway feature preview, it is possible to set traffic selection rules that specify an egressSourceIP address that is already in use for another EgressNATPolicy object. This may cause egress traffic routing conflicts.


Workaround

Coordinate with your development team to determine which floating IP addresses are available for use before specifying the egressSourceIP address in your EgressNATPolicy custom resource.

Networking 1.10, 1.11, 1.12, 1.13

Pod connectivity failures and reverse path filtering

Anthos clusters on bare metal configures reverse path filtering on nodes to disable source validation (net.ipv4.conf.all.rp_filter=0). If the rp_filter setting is changed to 1 or 2, pods will fail due to out-of-node communication timeouts.

Reverse path filtering is set with rp_filter files in the IPv4 configuration folder (net/ipv4/conf/all). This value may also be overridden by sysctl, which stores reverse path filtering settings in a network security configuration file, such as /etc/sysctl.d/60-gce-network-security.conf.


Workaround

To restore Pod connectivity, either set net.ipv4.conf.all.rp_filter back to 0 manually, or restart the anetd Pod to set net.ipv4.conf.all.rp_filter back to 0. To restart the anetd Pod, use the following commands to locate and delete the anetd Pod and a new anetd Pod will start up in its place:

kubectl get pods -n kube-system
kubectl delete pods -n kube-system ANETD_XYZ

Replace ANETD_XYZ with the name of the anetd Pod.
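
If you choose the manual option, the following command sets the value on the affected node (note that this change doesn't persist across reboots unless you also update the sysctl configuration files mentioned above):

sysctl -w net.ipv4.conf.all.rp_filter=0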

Networking 1.10, 1.11, 1.12, 1.13

Bootstrap (kind) cluster IP addresses and cluster node IP addresses overlapping

192.168.122.0/24 and 10.96.0.0/27 are the default pod and service CIDRs used by the bootstrap (kind) cluster. Preflight checks will fail if they overlap with cluster node machine IP addresses.


Workaround

To avoid the conflict, you can pass the --bootstrap-cluster-pod-cidr and --bootstrap-cluster-service-cidr flags to bmctl to specify different values.
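
For example, a sketch of passing non-overlapping ranges during cluster creation (the CIDR values shown are placeholders; choose ranges that don't conflict with your node, Pod, or Service networks):

bmctl create cluster -c CLUSTER_NAME \
    --bootstrap-cluster-pod-cidr 192.168.200.0/24 \
    --bootstrap-cluster-service-cidr 10.97.0.0/27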

Operating system 1.11

Incompatibility with Ubuntu 18.04.6 on GA kernel

Anthos clusters on bare metal versions 1.11.0 and 1.11.1 aren't compatible with Ubuntu 18.04.6 on the GA kernel (from 4.15.0-144-generic to 4.15.0-176-generic). The incompatibility causes the networking agent to fail to configure the cluster network, with a "BPF program is too large" error in the anetd logs. You may see Pods stuck in ContainerCreating status with a networkPlugin cni failed to set up pod error in the Pod event logs. This issue doesn't apply to the Ubuntu Hardware Enablement (HWE) kernels.


Workaround

We recommend that you switch to the Hardware Enablement (HWE) kernel and upgrade it to the latest supported HWE version for Ubuntu 18.04.

Operating system 1.10, 1.11, 1.12, 1.13

Cluster creation or upgrade fails on CentOS

In December 2020, the CentOS community and Red Hat announced the sunset of CentOS. On January 31, 2022, CentOS 8 reached its end of life (EOL). As a result of the EOL, yum repositories stopped working for CentOS, which causes cluster creation and cluster upgrade operations to fail. This applies to all supported versions of CentOS and affects all versions of Anthos clusters on bare metal.


Workaround

Operating system 1.10, 1.11, 1.12, 1.13

Operating system endpoint limitations

On RHEL and CentOS, there is a cluster-level limitation of 100,000 endpoints. This number is the sum of all Pods referenced by Kubernetes Services; if two Services reference the same set of Pods, they count as two separate sets of endpoints. The underlying nftable implementation on RHEL and CentOS causes this limitation; it is not an intrinsic limitation of Anthos clusters on bare metal.

Security 1.10, 1.11, 1.12, 1.13

Container can't write to VOLUME defined in Dockerfile with containerd and SELinux

If you use containerd as the container runtime and your operating system has SELinux enabled, the VOLUME defined in the application Dockerfile might not be writable. For example, containers built with the following Dockerfile aren't able to write to the /tmp folder.

FROM ubuntu:20.04
RUN chmod -R 777 /tmp
VOLUME /tmp

To verify if you're affected by this issue, run the following command on the node that hosts the problematic container:

ausearch -m avc

If you're affected by this issue, you see a denied error like the following:

time->Mon Apr  4 21:01:32 2022 type=PROCTITLE
msg=audit(1649106092.768:10979): proctitle="bash"
type=SYSCALL msg=audit(1649106092.768:10979):
arch=c000003e syscall=257 success=no exit=-13
a0=ffffff9c a1=55eeba72b320 a2=241 a3=1b6 items=0
ppid=75712 pid=76042 auid=4294967295 uid=0 gid=0
euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0
ses=4294967295 comm="bash" exe="/usr/bin/bash"
subj=system_u:system_r:container_t:s0:c701,c935
key=(null) type=AVC msg=audit(1649106092.768:10979):
avc:  denied { write }
for  pid=76042 comm="bash"
name="aca03d7bb8de23c725a86cb9f50945664cb338dfe6ac19ed0036c"
dev="sda2" ino=369501097 scontext=system_u:system_r:
container_t:s0:c701,c935 tcontext=system_u:object_r:
container_ro_file_t:s0 tclass=dir permissive=0 

Workaround

To work around this issue, make either of the following changes:

  • Turn off SELinux.
  • Don't use the VOLUME feature inside Dockerfile.
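
If you choose to turn off SELinux, the following is a sketch of doing so temporarily and across reboots (weigh the security implications for your environment first):

sudo setenforce 0    # Permissive (non-enforcing) until the next reboot

To persist the change across reboots, set SELINUX=permissive (or disabled) in /etc/selinux/config.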
Security 1.10, 1.11, 1.12, 1.13

SELinux errors during pod creation

Pod creation sometimes fails when SELinux prevents the container runtime from setting labels on tmpfs mounts. This failure is rare, but it can happen when SELinux is in Enforcing mode and with certain kernels.

To verify that SELinux is the cause of pod creation failures, use the following command to check for errors in the kubelet logs:

journalctl -u kubelet

If SELinux is causing pod creation to fail, the command response contains an error similar to the following:

error setting label on mount source '/var/lib/kubelet/pods/6d9466f7-d818-4658-b27c-3474bfd48c79/volumes/kubernetes.io~secret/localpv-token-bpw5x': failed to set file label on /var/lib/kubelet/pods/6d9466f7-d818-4658-b27c-3474bfd48c79/volumes/kubernetes.io~secret/localpv-token-bpw5x: permission denied

To verify that this issue is related to SELinux enforcement, run the following command:

ausearch -m avc

This command searches the audit logs for access vector cache (AVC) permission errors. The avc: denied in the following sample response confirms that the pod creation failures are related to SELinux enforcement.

type=AVC msg=audit(1627410995.808:9534): avc:  denied { associate } for pid=20660 comm="dockerd" name="/" dev="tmpfs" ino=186492 scontext=system_u:object_r: container_file_t:s0:c61,c201 tcontext=system_u: object_r:locale_t:s0 tclass=filesystem permissive=0

The root cause of this pod creation problem with SELinux is a kernel bug found in the following Linux images:

  • Red Hat Enterprise Linux (RHEL) releases prior to 8.3
  • CentOS releases prior to 8.3

Workaround

Rebooting the machine helps recover from the issue.

To prevent pod creation errors from occurring, use RHEL 8.3 or later or CentOS 8.3 or later, because those versions have fixed the kernel bug.

Reset/Deletion 1.10, 1.11, 1.12

Namespace deletion

Deleting a namespace will prevent new resources from being created in that namespace, including jobs to reset machines.


Workaround

When deleting a user cluster, you must delete the cluster object first before deleting its namespace. Otherwise, the jobs to reset machines cannot get created, and the deletion process will skip the machine clean-up step.
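
For example, a sketch of deleting the cluster object through the admin cluster before removing its namespace (placeholder values in capitals; by default the cluster namespace is the cluster name prefixed with cluster-):

kubectl --kubeconfig ADMIN_KUBECONFIG delete \
    clusters.baremetal.cluster.gke.io USER_CLUSTER_NAME \
    -n cluster-USER_CLUSTER_NAME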

Reset/Deletion 1.10, 1.11, 1.12, 1.13

containerd service

The bmctl reset command doesn't delete any containerd configuration files or binaries. The containerd systemd service is left up and running. The command deletes the containers running pods scheduled to the node.

Upgrades and updates 1.10, 1.11, 1.12

Node Problem Detector is not enabled by default after cluster upgrades

When you upgrade Anthos clusters on bare metal, Node Problem Detector is not enabled by default. This issue applies to upgrades from release 1.10 through 1.12.1 and is fixed in release 1.12.2.


Workaround:

To enable the Node Problem Detector:

  1. Verify if node-problem-detector systemd service is running on the node.
    1. Use SSH to connect to the node.
    2. Check if node-problem-detector systemd service is running on the node:
      systemctl is-active node-problem-detector
      
      If the command result displays inactive, then the node-problem-detector is not running on the node.
  2. To enable the Node Problem Detector, use the kubectl edit command and edit the node-problem-detector-config ConfigMap. For more information, see Node Problem Detector.
Operation 1.9, 1.10

Cluster backup fails when using non-root login

The bmctl backup cluster command fails if nodeAccess.loginUser is set to a non-root username.


Workaround:

This issue applies to Anthos clusters on bare metal 1.9.x, 1.10.0, and 1.10.1 and is fixed in version 1.10.2 and later.

Networking 1.10, 1.11, 1.12

Load Balancer Services don't work with containers on the control plane host network

There is a bug in anetd where packets are dropped for LoadBalancer Services if the backend Pods are running on the control plane node and use the hostNetwork: true field in the Pod spec.

The bug is not present in version 1.13 or later.


Workaround:

The following workarounds can help if you use a LoadBalancer Service that is backed by hostNetwork Pods:

  1. Run the backend Pods on worker nodes (not control plane nodes).
  2. Use externalTrafficPolicy: local in the Service spec and ensure your workloads run on load balancer nodes.
Upgrades and updates 1.13

1.12 clusters upgraded from 1.11 can't upgrade to 1.13.0

Version 1.12 clusters that were upgraded from version 1.11 can't be upgraded to version 1.13.0. This upgrade issue doesn't apply to clusters that were created at version 1.12.

To determine if you're affected, check the logs of the upgrade job that contains the upgrade-first-no* string in the admin cluster. If you see the following error message, you're affected.

TASK [kubeadm_upgrade_apply : Run kubeadm upgrade apply] *******
...
[upgrade/config] FATAL: featureGates: Invalid value: map[string]bool{\"IPv6DualStack\":false}: IPv6DualStack is not a valid feature name.
...

Workaround:

To work around this issue:

  1. Run the following commands on your admin workstation:
    echo '[{ "op": "remove", "path": "/spec/clusterConfiguration/featureGates" }]' \
        > remove-feature-gates.patch
    export KUBECONFIG=$ADMIN_KUBECONFIG
    kubectl get kubeadmconfig -A --no-headers | xargs -L1 bash -c \
        'kubectl patch kubeadmconfig $1 -n $0 --type json \
        --patch-file remove-feature-gates.patch'
    
  2. Re-attempt the cluster upgrade.
If you need additional assistance, reach out to Google Support.