Troubleshooting the container runtime

This document provides troubleshooting steps for common issues that you might encounter with the container runtime on your Google Kubernetes Engine (GKE) nodes.

/etc/mtab: No such file or directory

The Docker container runtime populates this symlink inside the container by default, but the containerd runtime does not.

For more details, refer to issue #2419.

Workarounds

To work around this issue, manually create the symlink /etc/mtab during your image build.

ln -sf /proc/mounts /etc/mtab
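In a Dockerfile-based build, this can be done with a RUN instruction; the base image below is only a placeholder:

```dockerfile
# Placeholder base image; substitute your own.
FROM alpine:3.18

# containerd does not populate /etc/mtab, so create the symlink at build time.
RUN ln -sf /proc/mounts /etc/mtab
```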

Image pull error: not a directory

Affected GKE versions: all

When you build an image with kaniko, containerd might fail to pull it with the error message "not a directory". This error occurs when the image is built in a particular way: a previous command removes a directory and a subsequent command recreates the same files in that directory.

Below is a Dockerfile example with npm that illustrates this problem.

RUN npm cache clean --force
RUN npm install

For more details, refer to issue #4659.

Workarounds

To work around this issue, build your image using docker build, which is unaffected by this issue.

If docker build isn't an option for you, combine the commands into one. For the npm Dockerfile example above, the workaround is to combine "RUN npm cache clean --force" and "RUN npm install" into a single RUN instruction:

RUN npm cache clean --force && npm install 

Some filesystem metrics are missing and the metrics format is different

Affected GKE versions: all

The Kubelet /metrics/cadvisor endpoint provides Prometheus metrics, as documented in Metrics for Kubernetes system components. If you install a metrics collector that depends on that endpoint, you might see the following issues:

  • The metrics format on the Docker node is k8s_<container-name>_<pod-name>_<namespace>_<pod-uid>_<restart-count> but the format on the containerd node is <container-id>.
  • Some filesystem metrics are missing on the containerd node, as follows:

    container_fs_inodes_free
    container_fs_inodes_total
    container_fs_io_current
    container_fs_io_time_seconds_total
    container_fs_io_time_weighted_seconds_total
    container_fs_limit_bytes
    container_fs_read_seconds_total
    container_fs_reads_merged_total
    container_fs_sector_reads_total
    container_fs_sector_writes_total
    container_fs_usage_bytes
    container_fs_write_seconds_total
    container_fs_writes_merged_total
    

Workarounds

You can mitigate this issue by using cAdvisor as a standalone daemonset.

  1. Find the latest cAdvisor release with the name pattern vX.Y.Z-containerd-cri (for example, v0.42.0-containerd-cri).
  2. Follow the steps in cAdvisor Kubernetes Daemonset to create the daemonset.
  3. Point the installed metrics collector to use the cAdvisor /metrics endpoint, which provides the full set of Prometheus container metrics.
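Once the daemonset is running, you can spot-check that the previously missing filesystem metrics are exposed. The "cadvisor" namespace below is an assumption based on the upstream cAdvisor manifests; adjust it to wherever you deployed the daemonset:

```shell
# Assumes the cAdvisor daemonset runs in the "cadvisor" namespace.
POD=$(kubectl -n cadvisor get pods -o name | head -n 1)
kubectl -n cadvisor port-forward "$POD" 8080:8080 &
sleep 2
# container_fs_usage_bytes is one of the metrics missing from /metrics/cadvisor.
curl -s http://localhost:8080/metrics | grep container_fs_usage_bytes
```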

Alternatives

  1. Migrate your monitoring solution to Cloud Monitoring, which provides the full set of container metrics.
  2. Collect metrics from the Kubelet summary API with an endpoint of /stats/summary.
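For the second alternative, the summary API can be reached through the API server proxy; NODE_NAME below is a placeholder:

```shell
# List node names, then fetch summary stats for one node through the API server proxy.
kubectl get nodes
kubectl get --raw "/api/v1/nodes/NODE_NAME/proxy/stats/summary"
```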

Attach-based operations do not function correctly after container-runtime restarts on GKE Windows

Affected GKE versions: 1.21 to 1.21.5-gke.1802, 1.22 to 1.22.3-gke.700

GKE clusters running Windows Server node pools that use the containerd runtime (versions 1.5.4 and 1.5.7-gke.0) might experience issues if the container runtime is forcibly restarted: attach operations to existing running containers cannot bind IO again. The issue does not cause API calls to fail, but no data is sent or received. This includes data for the attach and logs CLIs and APIs through the cluster API server.

A patched container runtime version (1.5.7-gke.1), included in newer GKE releases, addresses the issue.

Pods display failed to allocate for range 0: no IP addresses available in range set error message

Affected GKE versions: 1.18, 1.19.0 to 1.19.14-gke.1400, 1.20.0 to 1.20.11-gke.1000, 1.21.0 to 1.21.5-gke.1000

GKE clusters running node pools that use containerd might experience IP leak issues and exhaust all the Pod IPs on a node. A Pod scheduled on an affected node displays an error message similar to the following:

failed to allocate for range 0: no IP addresses available in range set: 10.48.131.1-10.48.131.62

For more information about the issue, see containerd issue #5438 and issue #5768.
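To confirm that a node has leaked Pod IPs, you can count the IP reservation files that the CNI plugin keeps on the node; a count close to the size of the range in the error (62 addresses in the example above) indicates exhaustion. The exact directory layout can vary by CNI plugin:

```shell
# Run on the affected node. Each numbered file below this directory is a reserved Pod IP.
find /var/lib/cni/networks -type f | wc -l
```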

Workarounds

You can mitigate this issue using the following workarounds:

  • Use Docker-based node pools instead of containerd (not recommended).
  • Clean up the secondary Pod IP address range on the affected node.

To clean up the secondary Pod IP address range, connect to the affected node and perform the following steps:

  1. Stop the kubelet and containerd:

    systemctl stop kubelet
    systemctl stop containerd
    
  2. Rename the /var/lib/cni/networks directory to remove the old range:

    mv /var/lib/cni/networks /var/lib/cni/networks.backups
    
  3. Recreate the /var/lib/cni/networks directory:

    mkdir /var/lib/cni/networks
    
  4. Restart containerd and the kubelet:

    systemctl start containerd
    systemctl start kubelet
    

Exec probe behavior difference when probe exceeds the timeout

Affected GKE versions: all

Exec probe behavior on containerd images differs from the behavior on dockershim images. When an exec probe defined for a Pod exceeds the declared timeoutSeconds threshold, dockershim images treat it as a probe failure. On containerd images, probe results returned after the declared timeoutSeconds threshold are ignored.
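For reference, a minimal sketch of an exec probe with an explicit timeout (the command is a placeholder):

```yaml
livenessProbe:
  exec:
    command: ["/bin/sh", "-c", "health-check"]  # placeholder command
  timeoutSeconds: 5    # on dockershim, exceeding this is a failure; on containerd, late results are ignored
  periodSeconds: 10
```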

Insecure registry option is not configured for local network (10.0.0.0/8)

Affected GKE versions: all

On containerd images, the insecure registry option is not configured for the local network 10.0.0.0/8. When migrating from Docker-based images that used a private image registry, ensure that the correct certificate is installed on the registry, or that the registry is configured to use HTTP.

containerd ignores any device mappings for privileged pods

Affected GKE versions: all

For privileged Pods, the container runtime ignores any device mappings that volumeDevices.devicePath passes to it, and instead makes every device on the host available to the container under /dev.
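A minimal sketch of a Pod spec affected by this behavior (all names are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: privileged-example    # placeholder name
spec:
  containers:
  - name: app
    image: busybox            # placeholder image
    securityContext:
      privileged: true        # causes containerd to expose all host devices under /dev
    volumeDevices:
    - name: data
      devicePath: /dev/xvda   # this mapping is ignored for privileged containers
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: block-pvc    # placeholder PVC
```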

IPv6 address family is enabled on pods running containerd

Affected GKE versions: 1.18, 1.19, 1.20.0 to 1.20.9

The IPv6 address family is enabled for Pods running with containerd. The dockershim image disables IPv6 on all Pods, while the containerd image does not. For example, localhost resolves to the IPv6 address ::1 first. This is typically not a problem; however, it might result in unexpected behavior in certain cases.

As a workaround, use an IPv4 address such as 127.0.0.1 explicitly, or configure an application running in the Pod to work on both address families.

Node auto-provisioning only provisions Container-Optimized OS with Docker node pools

Affected GKE versions: 1.18, 1.19, 1.20.0 to 1.20.6-gke.1800

Node auto-provisioning can auto-scale node pools of any supported image type, but can only create new node pools that use the Container-Optimized OS with Docker image type.

In GKE version 1.20.6-gke.1800 and later, the default image type can be set for the cluster.

Conflict with 172.17/16 IP address range

Affected GKE versions: 1.18.0 to 1.18.14

The 172.17/16 IP address range is occupied by the docker0 interface on the node VM with containerd enabled. Traffic sent to or originating from that range might not be routed correctly (for example, a Pod might not be able to connect to a VPN-connected host with an IP address within 172.17/16).

GPU metrics not collected

Affected GKE versions: 1.18.0 to 1.18.18

GPU usage metrics are not collected when using containerd as a runtime on GKE versions before 1.18.18.

Images with config.mediaType set to application/octet-stream cannot be used on containerd

Affected GKE versions: All

Images with config.mediaType set to "application/octet-stream" cannot be used on containerd. See Issue #4756. These images are not compatible with the Open Container Initiative specification and are considered incorrect. Docker supports these images for backward compatibility, but containerd does not.

Symptom and diagnosis

Example error in node logs:

Error syncing pod <pod-uid> ("<pod-name>_<namespace>(<pod-uid>)"), skipping: failed to "StartContainer" for "<container-name>" with CreateContainerError: "failed to create containerd container: error unpacking image: failed to extract layer sha256:<some id>: failed to get reader from content store: content digest sha256:<some id>: not found"

The image manifest can usually be found in the registry where it is hosted. Once you have the manifest, check config.mediaType to determine whether you have this issue:

"mediaType": "application/octet-stream",
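One way to retrieve the manifest and inspect the field is with docker manifest inspect; the image reference is a placeholder:

```shell
# Prints the image manifest as JSON; check the "config" section's mediaType.
docker manifest inspect REGISTRY/IMAGE:TAG
```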

Fix

Because the containerd community decided not to support such images, all versions of containerd are affected and there is no fix. Rebuild the container image with Docker version 1.11 or later, and ensure that the config.mediaType field is not set to "application/octet-stream".

Slow disk operations cause pod creation failures

Affected GKE node versions: 1.18, 1.19, 1.20.0 to 1.20.15-gke.2100, 1.21.0 to 1.21.9-gke.2000, 1.21.10 to 1.21.10-gke.100, 1.22.0 to 1.22.6-gke.2000, 1.22.7 to 1.22.7-gke.100, 1.23.0 to 1.23.3-gke.700, 1.23.4 to 1.23.4-gke.100

Symptom and diagnosis

Example error in k8s_node container-runtime logs:

Error: failed to reserve container name "container-name-abcd-ef12345678-91011_default_12131415-1234-5678-1234-12345789012_0": name "container-name-abcd-ef12345678-91011_default_12131415-1234-5678-1234-12345789012_0" is reserved for "1234567812345678123456781234567812345678123456781234567812345678"

Mitigation

  1. If Pods are failing, consider using restartPolicy: Always or restartPolicy: OnFailure in your PodSpec.
  2. Increase the boot disk IOPS (for example, upgrade the disk type or increase the disk size).
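A sketch of the first mitigation in a PodSpec (other fields omitted):

```yaml
spec:
  restartPolicy: OnFailure  # or Always; lets the kubelet retry after the transient name-reservation error
```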