Heapster pod in kube-system stuck in CrashLoopBackOff status

Problem

The heapster pod in the kube-system namespace suddenly enters CrashLoopBackOff status. The pod's logs show the following error:

panic: runtime error: invalid memory address or nil pointer dereference
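To confirm that a crash-looping heapster pod is hitting this specific panic, you can scan the logs of the previous (crashed) container instance for the message above. This is a minimal sketch: the function name `has_nil_pointer_panic` is hypothetical, and the `HEAPSTER_POD_NAME` placeholder and the container name `heapster` in the usage comments are assumptions about your cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 if the heapster nil-pointer panic appears
# in the log text read from stdin, 1 otherwise.
has_nil_pointer_panic() {
  grep -qF 'panic: runtime error: invalid memory address or nil pointer dereference'
}

# Example usage (requires kubectl access to the cluster):
#   kubectl get pods -n kube-system | grep heapster
#   kubectl logs -n kube-system HEAPSTER_POD_NAME -c heapster --previous \
#     | has_nil_pointer_panic && echo "heapster is affected by this issue"
```

The `--previous` flag matters here: in CrashLoopBackOff the current container may not have logged anything yet, while the previous instance holds the panic.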

Environment

  • Google Kubernetes Engine cluster v1.11.7-gke.12 in project A
  • Google Kubernetes Engine cluster v1.11.8-gke.4 in project A

Solution

Workaround 1
  1. Edit your Google Kubernetes Engine cluster through the Google Cloud Console or Cloud Shell, and disable Stackdriver Logging and Monitoring.
  2. Manually install heapster 1.6.1 in a new namespace called stackdriver-agents by running these commands:
    $ kubectl apply -f https://raw.githubusercontent.com/Stackdriver/kubernetes-configs/stable/rbac-setup.yaml --as=admin --as-group=system:masters
    
    $ kubectl apply -f https://raw.githubusercontent.com/Stackdriver/kubernetes-configs/stable/heapster.yaml
    
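The two steps of Workaround 1 can also be scripted. The following is a minimal sketch, not an official procedure: it assumes that your gcloud release still supports the `--logging-service` and `--monitoring-service` flags of `gcloud container clusters update` (check `gcloud container clusters update --help`), and `CLUSTER_NAME` and `ZONE` are placeholders. The `run` and `apply_workaround_1` functions are hypothetical; setting `DRY_RUN=1` prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, or only print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Sketch of Workaround 1. Arguments: cluster name and zone (placeholders).
apply_workaround_1() {
  local cluster="$1" zone="$2"
  local base="https://raw.githubusercontent.com/Stackdriver/kubernetes-configs/stable"

  # Step 1: disable Stackdriver Logging and Monitoring on the cluster.
  # Assumption: each setting is changed with its own update call.
  run gcloud container clusters update "$cluster" --zone "$zone" --logging-service=none
  run gcloud container clusters update "$cluster" --zone "$zone" --monitoring-service=none

  # Step 2: install heapster 1.6.1 in the stackdriver-agents namespace.
  run kubectl apply -f "$base/rbac-setup.yaml" --as=admin --as-group=system:masters
  run kubectl apply -f "$base/heapster.yaml"

  # Check that the new heapster pod comes up.
  run kubectl get pods -n stackdriver-agents
}

# Example: DRY_RUN=1 apply_workaround_1 CLUSTER_NAME ZONE
```

Running it once with `DRY_RUN=1` lets you review the exact commands before touching the cluster.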

Workaround 2
To use this option, the master version of the Google Kubernetes Engine cluster must be v1.11.8-gke.4 or higher; if it is not, upgrade the master first.
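To check whether a cluster meets this requirement, you can compare its master version against v1.11.8-gke.4. The sketch below is an assumption-based helper, not a GKE tool: `version_at_least` is a hypothetical function that relies on GNU `sort -V` (version sort), and the usage comments assume `gcloud container clusters describe` with the `currentMasterVersion` field; `CLUSTER_NAME` and `ZONE` are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if version $1 >= version $2.
# Relies on GNU sort -V, which compares runs of digits numerically
# (so 1.11.8-gke.12 sorts after 1.11.8-gke.4).
version_at_least() {
  local lowest
  lowest="$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)"
  [ "$lowest" = "$2" ]
}

# Example usage (placeholders; requires gcloud access to the project):
#   current="$(gcloud container clusters describe CLUSTER_NAME --zone ZONE \
#     --format='value(currentMasterVersion)')"
#   version_at_least "$current" 1.11.8-gke.4 \
#     || gcloud container clusters upgrade CLUSTER_NAME --zone ZONE \
#          --master --cluster-version 1.11.8-gke.4
```

A plain string comparison would get this wrong (for example, `gke.12` sorts before `gke.4` lexically), which is why the sketch uses version sort.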

  1. Enable the new Stackdriver Monitoring and Logging experience (beta, v2) on the cluster.

Enabling this option upgrades heapster from v1.6.0-beta.1 to v1.6.1. Unlike Workaround 1, heapster stays in the kube-system namespace and remains managed by Google Kubernetes Engine.
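From the command line, enabling the new experience might look like the sketch below. It assumes that the `--enable-stackdriver-kubernetes` flag of `gcloud beta container clusters update` corresponds to this option in your gcloud release (confirm with `--help`). The `run` and `enable_workaround_2` functions and the `DRY_RUN` switch are hypothetical, and `CLUSTER_NAME` and `ZONE` are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, or only print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Sketch of Workaround 2. Assumes the master is already v1.11.8-gke.4 or higher.
enable_workaround_2() {
  local cluster="$1" zone="$2"
  # Assumption: this flag enables the new Stackdriver experience.
  run gcloud beta container clusters update "$cluster" --zone "$zone" \
    --enable-stackdriver-kubernetes
  # The IMAGES column of the wide output shows which heapster image
  # is now deployed in kube-system (expected: v1.6.1).
  run kubectl get deployments -n kube-system -o wide
}

# Example: DRY_RUN=1 enable_workaround_2 CLUSTER_NAME ZONE
```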

Cause

The nil pointer dereference is caused by a bug in the heapster image (v1.6.0-beta.1). The bug has been fixed upstream on GitHub, and heapster v1.6.1, used by both workarounds, does not have it.