Troubleshooting

Learn about troubleshooting steps that you might find helpful if you run into problems using Google Kubernetes Engine (GKE).

Debugging Kubernetes resources

If you are experiencing an issue related to your cluster, refer to Troubleshooting Clusters in the Kubernetes documentation.

If you are having an issue with your application, its Pods, or its controller object, refer to Troubleshooting Applications.

The kubectl command isn't found

  1. Install the kubectl binary by running the following command:

    sudo gcloud components update kubectl
    
  2. Answer "yes" when the installer prompts you to modify your $PATH environment variable. Modifying this variable enables you to use kubectl commands without typing their full file path.

    Alternatively, add the following line to ~/.bashrc (or ~/.bash_profile in macOS, or wherever your shell stores environment variables):

    export PATH=$PATH:/usr/local/share/google/google-cloud-sdk/bin/
    
  3. Run the following command to load your updated .bashrc (or .bash_profile) file:

    source ~/.bashrc
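
To confirm that kubectl is now on your PATH, you can print the client version; if the command succeeds, the binary was found:

kubectl version --client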
    

kubectl commands return "connection refused" error

Set the cluster context with the following command:

gcloud container clusters get-credentials cluster-name

If you are unsure of what to enter for cluster-name, use the following command to list your clusters:

gcloud container clusters list

kubectl commands return "failed to negotiate an api version" error

Ensure kubectl has authentication credentials:

gcloud auth application-default login

The kubectl logs, attach, exec, and port-forward commands stop responding

These commands rely on the cluster's control plane (master) being able to talk to the nodes in the cluster. However, because the control plane isn't in the same Compute Engine network as your cluster's nodes, GKE relies on SSH tunnels to enable secure communication.

GKE saves an SSH public key file in your Compute Engine project metadata. All Compute Engine VMs using Google-provided images regularly check their project's common metadata and their instance's metadata for SSH keys to add to the VM's list of authorized users. GKE also adds a firewall rule to your Compute Engine network allowing SSH access from the control plane's IP address to each node in the cluster.

If any of the above kubectl commands don't run, it's likely that the API server is unable to open SSH tunnels with the nodes. Check for these potential causes:

  1. The cluster doesn't have any nodes.

    If you've scaled down the number of nodes in your cluster to zero, SSH tunnels won't work.

    To fix it, resize your cluster to have at least one node.

  2. Pods in the cluster have gotten stuck in a terminating state and have prevented nodes that no longer exist from being removed from the cluster.

    This is an issue that should only affect Kubernetes version 1.1, but could be caused by repeated resizing of the cluster.

    To fix it, delete the Pods that have been in a terminating state for more than a few minutes. The old nodes are then removed from the control plane and replaced by the new nodes.

  3. Your network's firewall rules don't allow for SSH access to the control plane.

    All Compute Engine networks are created with a firewall rule called default-allow-ssh that allows SSH access from all IP addresses (requiring a valid private key, of course). GKE also inserts an SSH rule for each public cluster of the form gke-cluster-name-random-characters-ssh that allows SSH access specifically from the cluster's control plane to the cluster's nodes. If neither of these rules exists, then the control plane will be unable to open SSH tunnels.

    To fix it, re-add a firewall rule that allows access from the control plane's IP address to VMs with the tag that's applied to all of the cluster's nodes (see the example command after this list).

  4. Your project's common metadata entry for "ssh-keys" is full.

    If the project's metadata entry named "ssh-keys" is close to its maximum size limit, then GKE isn't able to add its own SSH key, which it needs to open SSH tunnels. You can see your project's metadata by running the following command:

    gcloud compute project-info describe [--project=PROJECT]
    

    And then check the length of the list of ssh-keys.

    To fix it, delete some of the SSH keys that are no longer needed.

  5. You have set a metadata field with the key "ssh-keys" on the VMs in the cluster.

    The node agent on VMs prefers per-instance SSH keys to project-wide SSH keys, so if you've set any SSH keys specifically on the cluster's nodes, then the control plane's SSH key in the project metadata won't be respected by the nodes. To check, run gcloud compute instances describe <VM-name> and look for an "ssh-keys" field in the metadata.

    To fix it, delete the per-instance SSH keys from the instance metadata.
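
For example, to restore the control-plane-to-nodes SSH rule described in cause 3, you could create a rule like the following sketch. The rule name, network, control plane IP address, and node tag are placeholders; the node tag is visible in the tags field when you run gcloud compute instances describe on any node VM:

gcloud compute firewall-rules create gke-cluster-name-ssh \
  --network=network-name \
  --source-ranges=control-plane-ip/32 \
  --target-tags=node-tag \
  --allow=tcp:22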

Note that these features (kubectl logs, attach, exec, and port-forward) aren't required for the cluster itself to function correctly. If you prefer to keep your cluster's network locked down from all outside access, be aware that features like these won't work.

Metrics from your cluster aren't showing up in Cloud Monitoring

Ensure that you have activated the Cloud Monitoring API and the Cloud Logging API on your project, and that you are able to view your project in Cloud Monitoring.

If the issue persists, check the following potential causes:

  1. Ensure that you have enabled monitoring on your cluster.

    Monitoring is enabled by default for clusters created from the Google Cloud Console and from the gcloud command-line tool, but you can verify by running the following command or clicking into the cluster's details in the Cloud Console:

    gcloud container clusters describe cluster-name
    

    The output from this command should state that the "monitoringService" is "monitoring.googleapis.com", and Cloud Monitoring should be enabled in the Cloud Console.

    If monitoring is not enabled, run the following command to enable it:

    gcloud container clusters update cluster-name --monitoring-service=monitoring.googleapis.com
    
  2. How long has it been since your cluster was created or had monitoring enabled?

    It can take up to an hour for a new cluster's metrics to start appearing in Cloud Monitoring.

  3. Is a heapster or gke-metrics-agent (the OpenTelemetry Collector) running in your cluster in the "kube-system" namespace?

    This Pod might fail to schedule if your cluster is running low on resources. Check whether Heapster or OpenTelemetry is running by calling kubectl get pods --namespace=kube-system and checking for pods with heapster or gke-metrics-agent in the name, as shown in the example after this list.

  4. Is your cluster's control plane able to communicate with the nodes?

    Cloud Monitoring relies on this communication. You can check whether this is the case by running the following command:

    kubectl logs pod-name
    

    If this command returns an error, then the SSH tunnels may be causing the issue. See the section about SSH tunnels earlier on this page for further information.
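
As a quick check for cause 3, the following one-liner lists only the metric-agent pods (the pod name prefixes heapster and gke-metrics-agent are the ones mentioned above):

kubectl get pods --namespace=kube-system | grep -E 'heapster|gke-metrics-agent'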

If you are having an issue related to the Cloud Logging agent, see its troubleshooting documentation.

For more information, refer to the Logging documentation.

Error 404: Resource "not found" when calling gcloud container commands

Re-authenticate to the gcloud command-line tool:

gcloud auth login

Error 400/403: Missing edit permissions on account

Your Compute Engine default service account or the service account associated with GKE has been deleted or edited manually.

When you enable the Compute Engine or Kubernetes Engine API, a service account is created and given edit permissions on your project. If at any point you edit the permissions, remove the "Kubernetes Engine Service Agent" role, remove the account entirely, or disable the API, cluster creation and all management functionality will fail.

The name of your Google Kubernetes Engine service account is as follows, where project-number is your project number:

service-project-number@container-engine-robot.iam.gserviceaccount.com

To resolve the issue, if you have removed the Kubernetes Engine Service Agent role from your Google Kubernetes Engine service account, add it back. Otherwise, you must re-enable the Kubernetes Engine API, which will correctly restore your service accounts and permissions. You can do this in the gcloud tool or the Cloud Console.

Console

  1. Visit the APIs & Services page in Cloud Console.

    APIs & Services page

  2. Select your project.

  3. Click Enable APIs and Services.

  4. Search for Kubernetes, then select the API from the search results.

  5. Click Enable. If you have previously enabled the API, you must first disable it and then enable it again. It can take several minutes for the API and related services to be enabled.

gcloud

Run the following command in the gcloud tool:

gcloud services enable container.googleapis.com

Replicating 1.8.x (and earlier) automatic firewall rules on 1.9.x and later

If your cluster is running Kubernetes version 1.9.x or later, the automatic firewall rules have changed to disallow workloads in a GKE cluster from initiating communication with other Compute Engine VMs that are outside the cluster but on the same network.

You can replicate the automatic firewall rules behavior of a 1.8.x (and earlier) cluster by performing the following steps:

  1. Find your cluster's network:

    gcloud container clusters describe cluster-name --format=get"(network)"
    
  2. Get the cluster's IPv4 CIDR used for the containers:

    gcloud container clusters describe cluster-name --format=get"(clusterIpv4Cidr)"
    
  3. Create a firewall rule for the network, with the CIDR as the source range, and allow all protocols:

    gcloud compute firewall-rules create "cluster-name-to-all-vms-on-network" \
      --network="network" --source-ranges="cluster-ipv4-cidr" \
      --allow=tcp,udp,icmp,esp,ah,sctp
    

Restore default service account to your GCP project

GKE's default service account, container-engine-robot, can accidentally become unbound from a project. GKE Service Agent is an Identity and Access Management (IAM) role that grants the service account the permissions to manage cluster resources. If you remove this role binding from the service account, the default service account becomes unbound from the project, which can prevent you from deploying applications and performing other cluster operations.

You can check whether the service account has been removed from your project using the gcloud tool or the Cloud Console.

gcloud

Run the following command:

gcloud projects get-iam-policy project-id

Replace project-id with your project ID.

Console

Visit the IAM & Admin page in Cloud Console.

If neither the command output nor the IAM page displays container-engine-robot among your service accounts, the service account has become unbound.

If you removed the GKE Service Agent role binding, run the following commands to restore the role binding:

PROJECT_ID=$(gcloud config get-value project)
PROJECT_NUMBER=$(gcloud projects describe "${PROJECT_ID}" --format "value(projectNumber)")
gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
  --member "serviceAccount:service-${PROJECT_NUMBER}@container-engine-robot.iam.gserviceaccount.com" \
  --role roles/container.serviceAgent

To confirm that the role binding was granted:

gcloud projects get-iam-policy $PROJECT_ID

If you see the service account name along with the container.serviceAgent role, the role binding has been granted. For example:

- members:
  - serviceAccount:service-1234567890@container-engine-robot.iam.gserviceaccount.com
  role: roles/container.serviceAgent

Cloud KMS key is disabled

GKE's default service account cannot use a disabled Cloud KMS key for application-level secrets encryption.

To re-enable a disabled key, see Enable a disabled key version.

Pods stuck in pending state after enabling Node Allocatable

If you are experiencing an issue with Pods stuck in pending state after enabling Node Allocatable, please note the following:

Starting with version 1.7.6, GKE reserves CPU and memory for Kubernetes overhead, including Docker and the operating system. See Cluster architecture for information about how much of each machine type's capacity remains available for Pods.
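
To see how much schedulable capacity remains on a given node after these reservations, you can inspect the node's allocatable resources; a minimal check, where node-name is one of your nodes:

kubectl describe node node-name

In the output, compare the Allocatable section with the Capacity section; the difference is what GKE and the system reserve.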

If Pods are pending after an upgrade, we suggest the following:

  1. Ensure that the CPU and memory requests for your Pods don't exceed their peak usage. Because GKE reserves CPU and memory for overhead, Pods can't request those reserved resources. Pods that request more CPU or memory than they use prevent other Pods from requesting these resources, and might leave the cluster underutilized. For more information, see How Pods with resource requests are scheduled.

  2. Consider resizing your cluster. For instructions, see Resizing a cluster.

  3. Revert this change by downgrading your cluster. For instructions, see Manually upgrading a cluster or node pool.

Private cluster nodes created but not joining the cluster

If you use custom routing and third-party network appliances on the VPC that your private cluster uses, the default route (0.0.0.0/0) is often redirected to the appliance instead of the default internet gateway. In addition to control plane connectivity, you need to ensure that the following destinations are reachable:

  • *.googleapis.com
  • *.gcr.io
  • gcr.io

Configure Private Google Access for all three domains. This best practice allows the new nodes to start up and join the cluster while keeping internet-bound traffic restricted.
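
Private Google Access is enabled per subnet. As a sketch, assuming subnet-name and region are the subnet and region used by your private cluster's nodes, you could run:

gcloud compute networks subnets update subnet-name \
  --region=region \
  --enable-private-ip-google-access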

Troubleshooting issues with deployed workloads

GKE returns an error if there are issues with a workload's Pods. You can check the status of a Pod using the kubectl command-line tool or Cloud Console.

kubectl

To see all Pods running in your cluster, run the following command:

kubectl get pods

Output:

NAME       READY  STATUS             RESTARTS  AGE
pod-name   0/1    CrashLoopBackOff   23        8d

To get more detailed information about a specific Pod, run the following command:

kubectl describe pod pod-name

Replace pod-name with the name of the desired Pod.

Console

Perform the following steps:

  1. Visit the GKE Workloads dashboard in Cloud Console.

    Visit the GKE Workloads dashboard

  2. Select the desired workload. The Overview tab displays the status of the workload.

  3. From the Managed Pods section, click the error status message.

The following sections explain some common errors returned by workloads and how to resolve them.

CrashLoopBackOff

CrashLoopBackOff indicates that a container is repeatedly crashing after restarting. A container might crash for many reasons, and checking a Pod's logs might aid in troubleshooting the root cause.

By default, crashed containers restart with an exponential delay limited to five minutes. You can change this behavior by setting the restartPolicy field in the Deployment's Pod specification, under spec: restartPolicy. The field's default value is Always.
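
To inspect the value currently set on a workload, you can read it from the Deployment's Pod template; a minimal check, where deployment-name is your Deployment:

kubectl get deployment deployment-name \
  -o jsonpath='{.spec.template.spec.restartPolicy}'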

You can find out why your Pod's container is crashing using the kubectl command-line tool or Cloud Console.

kubectl

To see all Pods running in your cluster, run the following command:

kubectl get pods

Look for the Pod with the CrashLoopBackOff error.

To get the Pod's logs, run the following command:

kubectl logs pod-name

Replace pod-name with the name of the problematic Pod.

You can also pass in the -p flag to get the logs for the previous instance of a Pod's container, if it exists.

Console

Perform the following steps:

  1. Visit the GKE Workloads dashboard in Cloud Console.

    Visit the GKE Workloads dashboard

  2. Select the desired workload. The Overview tab displays the status of the workload.

  3. From the Managed Pods section, click the problematic Pod.

  4. From the Pod's menu, click the Logs tab.

Check "Exit Code" of the crashed container

You can find the exit code by performing the following tasks:

  1. Run the following command:

    kubectl describe pod pod-name
    

    Replace pod-name with the name of the Pod.

  2. Review the value in the containers: container-name: last state: exit code field:

    • If the exit code is 1, the container crashed because the application crashed.
    • If the exit code is 0, check how long your app was running.

    Containers exit when your application's main process exits. If your app finishes execution very quickly, the container might continue to restart. You can also read the exit code directly with the one-line command shown after this list.
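
As an alternative to reading the exit code from the describe output, you can query it directly; a minimal sketch that assumes the crashed container is the first container in the Pod (index 0):

kubectl get pod pod-name \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'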

Connect to a running container

Open a shell to the Pod:

kubectl exec -it pod-name -- /bin/bash

If there is more than one container in your Pod, add -c container-name.

Now you can run bash commands from the container: you can test the network or check whether you have access to files or databases used by your application.

ImagePullBackOff and ErrImagePull

ImagePullBackOff and ErrImagePull indicate that the image used by a container cannot be loaded from the image registry.

You can verify this issue using Cloud Console or the kubectl command-line tool.

kubectl

To get more information about a Pod's container image, run the following command:

kubectl describe pod pod-name

Console

Perform the following steps:

  1. Visit the GKE Workloads dashboard in Cloud Console.

    Visit the GKE Workloads dashboard

  2. Select the desired workload. The Overview tab displays the status of the workload.

  3. From the Managed Pods section, click the problematic Pod.

  4. From the Pod's menu, click the Events tab.

If the image is not found

If your image is not found:

  1. Verify that the image's name is correct.
  2. Verify that the image's tag is correct. (Try :latest or no tag to pull the latest image.)
  3. If the image has a full registry path, verify that it exists in the Docker registry you are using. If you provide only the image name, check the Docker Hub registry.
  4. Try to pull the Docker image manually:

    • SSH into the node:

      For example, to SSH into example-instance in the us-central1-a zone:

      gcloud compute ssh example-instance --zone us-central1-a
      
    • Run docker pull image-name.

    If this works, you probably need to specify imagePullSecrets on the Pod. Pods can only reference image pull secrets in their own namespace, so this process needs to be done once per namespace (see the example after this list).
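
As a sketch, assuming registry-server, username, and password are the credentials for your private registry, you could create a pull secret in the Pod's namespace and then reference it from the Pod specification:

kubectl create secret docker-registry registry-secret \
  --docker-server=registry-server \
  --docker-username=username \
  --docker-password=password

spec:
  imagePullSecrets:
  - name: registry-secret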

If you encounter a "permission denied" or "no pull access" error, verify that you are logged in and/or have access to the image.

If you are using a private registry, it may require keys to read images.

If your image is hosted in Container Registry, the service account associated with your node pool needs read access to the Cloud Storage bucket containing the image. See Container Registry documentation for further details.

Pod unschedulable

PodUnschedulable indicates that your Pod cannot be scheduled because of insufficient resources or some configuration error.

Insufficient resources

You might encounter an error indicating a lack of CPU, memory, or another resource. For example: "No nodes are available that match all of the predicates: Insufficient cpu (2)", which indicates that on two nodes there isn't enough CPU available to fulfill a Pod's requests.

The default CPU request is 100m, or 10% of one CPU core. If you want to request more or fewer resources, specify the value in the Pod specification under spec: containers: resources: requests.
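
For example, a minimal container fragment that sets explicit requests; the container name, image, and the 250m/256Mi values are illustrative placeholders:

spec:
  containers:
  - name: app-container
    image: image-name
    resources:
      requests:
        cpu: 250m
        memory: 256Mi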

MatchNodeSelector

MatchNodeSelector indicates that there are no nodes that match the Pod's label selector.

To verify this, check the labels specified in the Pod specification's nodeSelector field, under spec: nodeSelector.
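
For reference, a Pod that sets this field is only schedulable on nodes carrying the matching label; a minimal fragment, where label-key and label-value are the same placeholders used in the commands below:

spec:
  nodeSelector:
    label-key: label-value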

To see how nodes in your cluster are labelled, run the following command:

kubectl get nodes --show-labels

To attach a label to a node, run the following command:

kubectl label nodes node-name label-key=label-value

Replace the following:

  • node-name with the desired node.
  • label-key with the label's key.
  • label-value with the label's value.

For more information, refer to Assigning Pods to Nodes.

PodToleratesNodeTaints

PodToleratesNodeTaints indicates that the Pod can't be scheduled to any node because the Pod doesn't tolerate the taints on any existing node.

To verify that this is the case, run the following command:

kubectl describe nodes node-name

In the output, check the Taints field, which lists key-value pairs and scheduling effects.

If the effect listed is NoSchedule, then no Pod can be scheduled on that node unless it has a matching toleration.

One way to resolve this issue is to remove the taint. For example, to remove a NoSchedule taint, run the following command:

kubectl taint nodes node-name key:NoSchedule-
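
Alternatively, instead of removing the taint, you can add a matching toleration to the Pod specification; a minimal sketch, where key and value are the taint's key and value shown in the describe output:

spec:
  tolerations:
  - key: "key"
    operator: "Equal"
    value: "value"
    effect: "NoSchedule"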

PodFitsHostPorts

PodFitsHostPorts indicates that a host port requested by the Pod is already in use on the node.

To resolve this issue, check the Pod specification's hostPort value under spec: containers: ports: hostPort. You might need to change this value to another port.

Does not have minimum availability

If a node has adequate resources but you still see the Does not have minimum availability message, check the node's status. If the status is SchedulingDisabled or Cordoned, the node cannot schedule new Pods. You can check the status of a node using Cloud Console or the kubectl command-line tool.

kubectl

To get statuses of your nodes, run the following command:

kubectl get nodes

To enable scheduling on the Node, run:

kubectl uncordon node-name

Console

Perform the following steps:

  1. Visit the GKE Clusters dashboard in Cloud Console.

    Visit the GKE Clusters dashboard

  2. Select the desired cluster. The Nodes tab displays the Nodes and their status.

To enable scheduling on the Node, perform the following steps:

  1. From the list, click the desired Node.

  2. From the Node details page, click the Uncordon button.

Unbound PersistentVolumeClaims

Unbound PersistentVolumeClaims indicates that the Pod references a PersistentVolumeClaim that is not bound. This error might happen if your PersistentVolume failed to provision. You can verify that provisioning failed by getting the events for your PersistentVolumeClaim and examining them for failures.

To get events, run the following command:

kubectl describe pvc statefulset-name-pvc-name-0

Replace the following:

  • statefulset-name with the name of the StatefulSet object.
  • pvc-name with the name of the PersistentVolumeClaim object.

This may also happen if there was a configuration error during your manual pre-provisioning of a PersistentVolume and its binding to a PersistentVolumeClaim. You can try to pre-provision the volume again.

Connectivity issues

As mentioned in the Network Overview discussion, it is important to understand how Pods are wired from their network namespaces to the root namespace on the node in order to troubleshoot effectively. For the following discussion, unless otherwise stated, assume that the cluster uses GKE's native CNI rather than Calico's. That is, no network policy has been applied.

Pods on select nodes have no availability

If Pods on select nodes have no network connectivity, ensure that the Linux bridge is up:

ip address show cbr0

If the Linux bridge is down, raise it:

sudo ip link set cbr0 up

Ensure that the node is learning Pod MAC addresses attached to cbr0:

arp -an

Pods on select nodes have minimal connectivity

If Pods on select nodes have minimal connectivity, you should first confirm whether there are any lost packets by running tcpdump in the toolbox container:

sudo toolbox bash

Install tcpdump in the toolbox if you have not done so already:

apt install -y tcpdump

Run tcpdump against cbr0:

tcpdump -ni cbr0 host hostname and port port-number and [tcp|udp|icmp]

Should it appear that large packets are being dropped downstream from the bridge (for example, the TCP handshake completes, but no SSL hellos are received), ensure that the Linux bridge MTU is correctly set to the MTU of the cluster's VPC network.

ip address show cbr0

When overlays are used (for example, Weave or Flannel), this MTU must be further reduced to accommodate encapsulation overhead on the overlay.
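
As a sketch, you can compare the bridge MTU against the VPC network's MTU (network-name is a placeholder, and 1460 is the default MTU for Google VPC networks) and adjust the bridge if the values differ:

gcloud compute networks describe network-name --format="value(mtu)"
sudo ip link set dev cbr0 mtu 1460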

Intermittent failed connections

Connections to and from the Pods are forwarded by iptables. Flows are tracked as entries in the conntrack table and, where there are many workloads per node, conntrack table exhaustion may manifest as a failure. These can be logged in the serial console of the node, for example:

nf_conntrack: table full, dropping packet

If you are able to determine that intermittent issues are driven by conntrack exhaustion, you may increase the size of the cluster (thus reducing the number of workloads and flows per node), or increase nf_conntrack_max:

# Size the conntrack table from the node's total memory (MemTotal is in kB, so this sets
# the limit to roughly MemTotal/32 entries), apply it immediately, and persist it.
new_ct_max=$(awk '$1 == "MemTotal:" { printf "%d\n", $2/32; exit; }' /proc/meminfo)
sysctl -w net.netfilter.nf_conntrack_max="${new_ct_max:?}" \
  && echo "net.netfilter.nf_conntrack_max=${new_ct_max:?}" >> /etc/sysctl.conf

"bind: Address already in use" reported for a container

A container in a Pod is unable to start because, according to the container logs, the port that the application is trying to bind to is already reserved. The container is crash looping. For example, in Cloud Logging:

resource.type="container"
textPayload:"bind: Address already in use"
resource.labels.container_name="redis"

2018-10-16 07:06:47.000 CEST 16 Oct 05:06:47.533 # Creating Server TCP listening socket *:60250: bind: Address already in use
2018-10-16 07:07:35.000 CEST 16 Oct 05:07:35.753 # Creating Server TCP listening socket *:60250: bind: Address already in use

When Docker crashes, sometimes a running container gets left behind and becomes stale. The process is still running in the network namespace allocated for the Pod, and listening on its port. Because Docker and the kubelet don't know about the stale container, they try to start a new container with a new process, which is unable to bind to the port because it gets added to the network namespace already associated with the Pod.

To diagnose this problem:

  1. You need the UUID of the Pod from the .metadata.uid field:

    kubectl get pod -o custom-columns="name:.metadata.name,UUID:.metadata.uid" ubuntu-6948dd5657-4gsgg
    
    name                      UUID
    ubuntu-6948dd5657-4gsgg   db9ed086-edba-11e8-bdd6-42010a800164
    
  2. Get the output of the following commands from the node:

    docker ps -a
    ps -eo pid,ppid,stat,wchan:20,netns,comm,args:50,cgroup --cumulative -H | grep [Pod UUID]
    
  3. Check the running processes from this Pod. Because the cgroup paths contain the UUID of the Pod, you can grep for the Pod UUID in the ps output. Also grep the line before each match, so that the docker-containerd-shim processes, which have the container ID in their arguments, are included as well. Cut the rest of the cgroup column to get simpler output:

    # ps -eo pid,ppid,stat,wchan:20,netns,comm,args:50,cgroup --cumulative -H | grep -B 1 db9ed086-edba-11e8-bdd6-42010a800164 | sed s/'blkio:.*'/''/
    1283089     959 Sl   futex_wait_queue_me  4026531993       docker-co       docker-containerd-shim 276e173b0846e24b704d4 12:
    1283107 1283089 Ss   sys_pause            4026532393         pause           /pause                                     12:
    1283150     959 Sl   futex_wait_queue_me  4026531993       docker-co       docker-containerd-shim ab4c7762f5abf40951770 12:
    1283169 1283150 Ss   do_wait              4026532393         sh              /bin/sh -c echo hello && sleep 6000000     12:
    1283185 1283169 S    hrtimer_nanosleep    4026532393           sleep           sleep 6000000                            12:
    1283244     959 Sl   futex_wait_queue_me  4026531993       docker-co       docker-containerd-shim 44e76e50e5ef4156fd5d3 12:
    1283263 1283244 Ss   sigsuspend           4026532393         nginx           nginx: master process nginx -g daemon off; 12:
    1283282 1283263 S    ep_poll              4026532393           nginx           nginx: worker process
    
  4. From this list, you can see the container ids, which should be visible in docker ps as well.

    In this case:

    • docker-containerd-shim 276e173b0846e24b704d4 for pause
    • docker-containerd-shim ab4c7762f5abf40951770 for sh with sleep (sleep-ctr)
    • docker-containerd-shim 44e76e50e5ef4156fd5d3 for nginx (echoserver-ctr)
  5. Check those in the docker ps output:

    # docker ps --no-trunc | egrep '276e173b0846e24b704d4|ab4c7762f5abf40951770|44e76e50e5ef4156fd5d3'
    44e76e50e5ef4156fd5d383744fa6a5f14460582d0b16855177cbed89a3cbd1f   gcr.io/google_containers/echoserver@sha256:3e7b182372b398d97b747bbe6cb7595e5ffaaae9a62506c725656966d36643cc                   "nginx -g 'daemon off;'"                                                                                                                                                                                                                                                                                                                                                                     14 hours ago        Up 14 hours                             k8s_echoserver-cnt_ubuntu-6948dd5657-4gsgg_default_db9ed086-edba-11e8-bdd6-42010a800164_0
    ab4c7762f5abf40951770d3e247fa2559a2d1f8c8834e5412bdcec7df37f8475   ubuntu@sha256:acd85db6e4b18aafa7fcde5480872909bd8e6d5fbd4e5e790ecc09acc06a8b78                                                "/bin/sh -c 'echo hello && sleep 6000000'"                                                                                                                                                                                                                                                                                                                                                   14 hours ago        Up 14 hours                             k8s_sleep-cnt_ubuntu-6948dd5657-4gsgg_default_db9ed086-edba-11e8-bdd6-42010a800164_0
    276e173b0846e24b704d41cf4fbb950bfa5d0f59c304827349f4cf5091be3327   k8s.gcr.io/pause-amd64:3.1
    

    In normal cases, all of the container IDs from ps show up in the docker ps output. If there is one you don't see, it's a stale container, and you will probably see a child process of its docker-containerd-shim process listening on the TCP port that is reported as already in use.

    To verify this, execute netstat in the container's network namespace. Get the PID of any container process (that is, NOT docker-containerd-shim) for the Pod.

    From the above example:

    • 1283107 - pause
    • 1283169 - sh
    • 1283185 - sleep
    • 1283263 - nginx master
    • 1283282 - nginx worker
    # nsenter -t 1283107 --net netstat -anp
    Active Internet connections (servers and established)
    Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
    tcp        0      0 0.0.0.0:8080            0.0.0.0:*               LISTEN      1283263/nginx: mast
    Active UNIX domain sockets (servers and established)
    Proto RefCnt Flags       Type       State         I-Node   PID/Program name     Path
    unix  3      [ ]         STREAM     CONNECTED     3097406  1283263/nginx: mast
    unix  3      [ ]         STREAM     CONNECTED     3097405  1283263/nginx: mast
    
    gke-zonal-110-default-pool-fe00befa-n2hx ~ # nsenter -t 1283169 --net netstat -anp
    Active Internet connections (servers and established)
    Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
    tcp        0      0 0.0.0.0:8080            0.0.0.0:*               LISTEN      1283263/nginx: mast
    Active UNIX domain sockets (servers and established)
    Proto RefCnt Flags       Type       State         I-Node   PID/Program name     Path
    unix  3      [ ]         STREAM     CONNECTED     3097406  1283263/nginx: mast
    unix  3      [ ]         STREAM     CONNECTED     3097405  1283263/nginx: mast
    

    You can also execute netstat using ip netns, but you need to link the network namespace of the process manually, as Docker is not doing the link:

    # ln -s /proc/1283169/ns/net /var/run/netns/1283169
    gke-zonal-110-default-pool-fe00befa-n2hx ~ # ip netns list
    1283169 (id: 2)
    gke-zonal-110-default-pool-fe00befa-n2hx ~ # ip netns exec 1283169 netstat -anp
    Active Internet connections (servers and established)
    Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
    tcp        0      0 0.0.0.0:8080            0.0.0.0:*               LISTEN      1283263/nginx: mast
    Active UNIX domain sockets (servers and established)
    Proto RefCnt Flags       Type       State         I-Node   PID/Program name     Path
    unix  3      [ ]         STREAM     CONNECTED     3097406  1283263/nginx: mast
    unix  3      [ ]         STREAM     CONNECTED     3097405  1283263/nginx: mast
    gke-zonal-110-default-pool-fe00befa-n2hx ~ # rm /var/run/netns/1283169
    

Mitigation:

The short-term mitigation is to identify the stale processes by the method outlined above, and end them using the kill [PID] command.

The long-term mitigation involves identifying why Docker is crashing and fixing that. Possible reasons include:

  • Zombie processes piling up, so running out of PID namespaces
  • Bug in docker
  • Resource pressure / OOM